# Hello LTR!

In [2]:
import sys
sys.path.append("../../")
import ltr
import ltr.client as client
import ltr.index as index
import ltr.helpers.movies as helpers

### Use the Elastic client

Two LTR clients exist in this code: an `ElasticClient` and a `SolrClient`. The Learning to Rank workflow is the same in both search engines.

In [3]:
client = client.ElasticClient()
client.elastic_ep
client.es

<Elasticsearch([{'host': 'search-es2.stage.datahou.se', 'port': 443, 'use_ssl': True}])>

## Step1. Create FeatureSet

Save the FeatureSet for each experiment version under the `{myindex}/_ltr/_featureset` path.

The naming convention is `{service:card|commerce|...}.{phase:test|dev|stage|prod}.{version:v1|v2}`.
We should also prepare helper code for comparing feature sets across versions.
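
A minimal sketch of such a helper, assuming names follow the `{service}.{phase}.{version}` order exactly as written in the convention. `FeatureSetName` and `parse_featureset_name` are hypothetical names, not part of the `ltr` library, and the `v<digits>` version pattern is an assumption generalized from `v1|v2`.

```python
import re
from typing import NamedTuple

# Phases listed in the naming convention.
PHASES = ("test", "dev", "stage", "prod")


class FeatureSetName(NamedTuple):
    """Hypothetical container for a `{service}.{phase}.{version}` name."""
    service: str
    phase: str
    version: str

    @property
    def version_number(self) -> int:
        # "v2" -> 2; assumes versions are "v" followed by digits.
        return int(self.version[1:])

    def __str__(self) -> str:
        return f"{self.service}.{self.phase}.{self.version}"


def parse_featureset_name(name: str) -> FeatureSetName:
    """Split a feature set name into its parts and validate phase and version."""
    parts = name.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected service.phase.version, got {name!r}")
    service, phase, version = parts
    if phase not in PHASES:
        raise ValueError(f"unknown phase {phase!r} in {name!r}")
    if not re.fullmatch(r"v\d+", version):
        raise ValueError(f"unexpected version {version!r} in {name!r}")
    return FeatureSetName(service, phase, version)


# Sort feature set names by version so that they can be compared in order.
names = ["card.test.v2", "card.test.v1"]
ordered = sorted((parse_featureset_name(n) for n in names),
                 key=lambda n: n.version_number)
print([str(n) for n in ordered])
```

Under this ordering, a name that puts the phase first (such as `test.card.v1`) would be rejected, so the order of the parts is worth settling before relying on a check like this.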

In [4]:
# wipes out any existing LTR models/feature sets in the card_search index
# client.reset_ltr(index='card_search')

Removed Default LTR feature store [Status: 200]
Initialize Default LTR feature store [Status: 200]


When adding features, we recommend sanity checking that the features work as expected. Adding a “validation” block to your feature creation lets Elasticsearch LTR run the query before adding it.

In [26]:
# A feature set as a dict, which looks a lot like JSON
feature_set = {
    "validation": {
        "params": {
            "query": "러그"
        },
        "index": "card_search"
    },
    "featureset": {
        "features": [
            {
                "name": "description",
                "params": [
                    "query"
                ],
                "template_language": "mustache",
                "template": {
                    "match": {
                        "description": "{{ query }}"
                    }
                },
            }
        ]
    }
}

featureset_name = "test.card.v1"

In [27]:
# pushes the feature set to the card_search index's LTR store (a hidden index)
# an existing feature set with the same name can be overwritten
client.create_featureset(index='card_search', name=featureset_name, ftr_config=feature_set)

Create test.card.v1 feature set [Status: 200]
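
As more features are added, writing each mustache template by hand gets repetitive. The sketch below builds the same structure as the `feature_set` dict in this step from small functions. `match_feature` and `build_featureset` are hypothetical helpers, not part of the `ltr` library; they only produce plain dicts.

```python
def match_feature(name, field, param="query"):
    """Build one mustache `match` feature on `field`, fed by the `param` parameter."""
    return {
        "name": name,
        "params": [param],
        "template_language": "mustache",
        "template": {"match": {field: "{{ " + param + " }}"}},
    }


def build_featureset(features, index, validation_params):
    """Wrap features in a feature set body with a validation block."""
    return {
        "validation": {"params": validation_params, "index": index},
        "featureset": {"features": list(features)},
    }


generated = build_featureset(
    [match_feature("description", "description")],
    index="card_search",
    validation_params={"query": "러그"},
)
print(generated["featureset"]["features"][0]["template"])
```

The resulting dict has the same shape as the hand-written one, so it could be passed as `ftr_config` in the same way.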


## Step2. Get Judgment Data



### 1. Fetch a dataset (Train/Dev/Test) made up of relevance score, search keyword, and docId.

```
grade,keywords,docId
4,rambo,7555
3,rambo,1370
3,rambo,1369
4,rocky,4241
```
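
A file in this layout can be read with the standard `csv` module. The sketch below is one possible approach, not the `ltr` library's loader: `read_judgment_rows` and `assign_qids` are hypothetical helpers, and numbering query ids from 1 in first-seen keyword order is an assumption.

```python
import csv
import io

SAMPLE = """grade,keywords,docId
4,rambo,7555
3,rambo,1370
3,rambo,1369
4,rocky,4241
"""


def read_judgment_rows(fileobj):
    """Parse grade,keywords,docId rows; grades become ints, ids stay strings."""
    return [
        {"grade": int(row["grade"]), "keywords": row["keywords"], "docId": row["docId"]}
        for row in csv.DictReader(fileobj)
    ]


def assign_qids(rows):
    """Give every distinct keyword a query id, numbered from 1 in first-seen order."""
    qids = {}
    for row in rows:
        qids.setdefault(row["keywords"], len(qids) + 1)
    return [{**row, "qid": qids[row["keywords"]]} for row in rows]


rows = assign_qids(read_judgment_rows(io.StringIO(SAMPLE)))
for row in rows:
    print(row)
```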


### 2. Select the configured FeatureSet and attach its features to the data (synthesize).
```
GET card_search/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "_id": ["11857781", "12479875"]
          }
        },
        {
          "sltr": {
            "_name": "logged_features",
            "featureset": "test.card.v1",
            "params": {"query": "러그"}
          }
        }
      ]
    }
  },
  "ext": {
    "ltr_log": {
      "log_specs": {
        "name": "ltr_features",
        "named_query": "logged_features"
      }
    }
  },
  "size": 1000
}
```
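
The same request body can be assembled in Python, which makes it easier to vary the document ids and keywords per query. This is a sketch that mirrors the JSON above; `build_log_query` is a hypothetical helper and may differ from what `client.log_query` builds internally.

```python
def build_log_query(featureset, doc_ids, params, size=1000):
    """Build a feature-logging search body for the given documents and params."""
    return {
        "query": {
            "bool": {
                "filter": [
                    # Restrict the search to the judged documents.
                    {"terms": {"_id": list(doc_ids)}},
                    # Run the feature set; the name links it to the log spec.
                    {
                        "sltr": {
                            "_name": "logged_features",
                            "featureset": featureset,
                            "params": params,
                        }
                    },
                ]
            }
        },
        "ext": {
            "ltr_log": {
                "log_specs": {
                    "name": "ltr_features",
                    "named_query": "logged_features",
                }
            }
        },
        "size": size,
    }


body = build_log_query("test.card.v1", ["11857781", "12479875"], {"query": "러그"})
print(body["query"]["bool"]["filter"][1]["sltr"]["featureset"])
```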

```
1   qid:1   1:1998.0 # 4518 
0   qid:1   1:2016.0 # 375315   
1   qid:1   1:2005.0 # 16608    
```
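
Each line above follows the RankLib training format: grade, `qid:<n>`, then 1-based `feature:value` pairs, with the document id in a trailing comment. A minimal sketch of a formatter is shown below. `to_ranklib_line` is a hypothetical helper, and the tab separators are an assumption; the notebook itself uses `judgments_to_file` for this step.

```python
def to_ranklib_line(grade, qid, features, doc_id):
    """Format one judgment as a RankLib line; feature ordinals start at 1."""
    feature_parts = [f"{i}:{value}" for i, value in enumerate(features, start=1)]
    fields = [str(grade), f"qid:{qid}", *feature_parts]
    return "\t".join(fields) + f" # {doc_id}"


print(to_ranklib_line(1, 1, [1998.0], "4518"))
```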

In [28]:
import random

def synthesize(
    client,
    featureset_name,
    TrainingSetOut='test.train.txt',
):
    from ltr.judgments import judgments_to_file, Judgment
    NO_ZERO = False

    # resp = client.log_query('tmdb', 'release', None)
    params, resp = client.log_query('card_search', featureset_name, ids=["11857781", "12479875"], params={"query": "러그"})
    print(params)

    # Placeholder judgments: every logged hit gets a random grade
    judgments = []

    for hit in resp:
        judgments.append(Judgment(
            qid=1,
            docId=hit['id'],
            grade=random.choice([0,1,2,3,4]),
            features=hit['ltr_features'],
            keywords=''
            )
        )

    with open(TrainingSetOut, 'w') as out:
        judgments_to_file(out, judgments)

synthesize(client, featureset_name)


{'query': {'bool': {'filter': [{'sltr': {'_name': 'logged_features', 'featureset': 'test.card.v1', 'params': {'query': '러그'}}}], 'must': [{'terms': {'_id': ['11857781', '12479875']}}]}}, 'ext': {'ltr_log': {'log_specs': {'name': 'ltr_features', 'named_query': 'logged_features'}}}, 'size': 1000}
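
`synthesize` assigns random grades, which is fine for a smoke test but not for training. With real judgments, the logged features would instead be joined to the judged rows by document id. The sketch below assumes each hit is a dict with `id` and `ltr_features` keys, as `synthesize` reads them; `attach_features` is a hypothetical helper and the values are toy placeholders, not real logged output.

```python
def attach_features(judgment_rows, hits):
    """Join judgment rows to logged features by document id.

    Rows whose document was not returned by the logging query are skipped.
    """
    features_by_id = {str(hit["id"]): hit["ltr_features"] for hit in hits}
    joined = []
    for row in judgment_rows:
        features = features_by_id.get(str(row["docId"]))
        if features is None:
            continue
        joined.append({**row, "features": features})
    return joined


# Toy placeholder values for illustration only.
toy_rows = [
    {"grade": 3, "qid": 1, "docId": "11857781"},
    {"grade": 0, "qid": 1, "docId": "99999999"},  # not present in the hits
]
toy_hits = [{"id": "11857781", "ltr_features": [2.5]}]
print(attach_features(toy_rows, toy_hits))
```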


In [29]:
import ltr.judgments as judge

classic_training_set = [j for j in judge.judgments_from_file(open('data/classic-training.txt'))]
latest_training_set = [j for j in judge.judgments_from_file(open('data/latest-training.txt'))]
classic_training_set[:3]

FileNotFoundError: [Errno 2] No such file or directory: 'data/classic-training.txt'

## Step3. Learning LTR Models

### Train and Submit

We'll train a lot of models in this class! Our ltr library has a `train` method that wraps a tool called `Ranklib` (more on Ranklib later). It lets you pass the most common commands to Ranklib, stores the model in the search engine, and returns diagnostic output that's worth inspecting.

For now we'll just train on the generated training sets and store two models, `latest` and `classic`.


In [None]:
from ltr.ranklib import train

train(client, training_set=latest_training_set, 
      index='tmdb', featureSet='release', modelName='latest')

Now train another model based on the 'classic' movie judgments.

In [None]:
train(client, training_set=classic_training_set, 
      index='tmdb', featureSet='release', modelName='classic')

## Step4. Upload LTR Model

## Step5. Predict (Run `sltr` query)
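
With the Elasticsearch LTR plugin, a stored model is usually applied by rescoring the top hits of a baseline query with an `sltr` query. The sketch below builds such a request body. `build_rescore_query` is a hypothetical helper, and the `description` field, the `my_model` name, and the window size are placeholder assumptions rather than values from a trained model.

```python
def build_rescore_query(keywords, model, field="description", window_size=1000):
    """Build a search body that rescores baseline matches with an LTR model."""
    return {
        # Baseline retrieval: a plain match query on one field.
        "query": {"match": {field: keywords}},
        # The top `window_size` hits are rescored by the stored model.
        "rescore": {
            "window_size": window_size,
            "query": {
                "rescore_query": {
                    "sltr": {
                        "params": {"query": keywords},
                        "model": model,
                    }
                }
            },
        },
    }


body = build_rescore_query("러그", model="my_model")  # placeholder model name
print(body["rescore"]["query"]["rescore_query"]["sltr"])
```

The keys under `params` have to match the parameter names declared in the feature set, which is `query` for the feature defined in Step1.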

### Ben Affleck vs Adam West
If we search for `batman`, how do the results compare? Because the `classic` model prefers old movies, it puts old movies in the top positions; the opposite is true for the `latest` model. To continue learning LTR, brainstorm more features and generate some real judgments for real queries.

In [None]:
import ltr.release_date_plot as rdp
rdp.plot(client, 'batman')

### See top 12 results for both models

Looking at the `classic` model first.

In [None]:
import pandas as pd
classic_results = rdp.search(client, 'batman', 'classic')
pd.json_normalize(classic_results)[['id', 'title', 'release_year', 'score']].head(12)

And then the `latest` model.

In [None]:
latest_results = rdp.search(client, 'batman', 'latest')
pd.json_normalize(latest_results)[['id', 'title', 'release_year', 'score']].head(12)
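
Beyond reading the two tables side by side, the rankings can be compared numerically. The sketch below assumes each result is a dict with at least `id` and `release_year`, as selected in the cells above. `compare_rankings` is a hypothetical helper, and the input values are toy placeholders, not actual search results.

```python
from statistics import mean


def compare_rankings(results_a, results_b, k=12):
    """Compare the top-k of two result lists: shared ids and mean release year."""
    top_a, top_b = results_a[:k], results_b[:k]
    shared = {r["id"] for r in top_a} & {r["id"] for r in top_b}
    return {
        "overlap": len(shared),
        "mean_year_a": mean(int(r["release_year"]) for r in top_a),
        "mean_year_b": mean(int(r["release_year"]) for r in top_b),
    }


# Toy placeholder results for illustration only.
toy_classic = [{"id": "1", "release_year": 1966}, {"id": "2", "release_year": 1989}]
toy_latest = [{"id": "3", "release_year": 2016}, {"id": "2", "release_year": 1989}]
summary = compare_rankings(toy_classic, toy_latest)
print(summary)
```

A lower mean release year for the `classic` list than for the `latest` list would be consistent with what the plot is meant to show.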