# Basics

Start an Elasticsearch server with the LTR plugin installed, then run through the cells below to get started with Learning to Rank!

### Download some requirements

In [1]:
from ltr import download
download.run()

GET https://dl.bintray.com/o19s/RankyMcRankFace/com/o19s/RankyMcRankFace/0.1.1/RankyMcRankFace-0.1.1.jar
GET http://es-learn-to-rank.labs.o19s.com/tmdb.json
Done.


### Index Movies

In [2]:
from ltr import index
index.run()

DELETE INDEX: 200
POST INDEX: 200
Indexing 27846 movies
Done


### Judgment List for "Drama" and "Science Fiction" queries

In Learning to Rank, and in search more broadly, a judgment list assigns a relevance grade to each document for a given query. We're going to generate a judgment list for two queries, "Drama" and "Science Fiction". We've decided (or learned?) that older drama is better and newer science fiction is preferred, so we have some code that generates the judgment list for these two queries.
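
As a rough illustration of the idea, here is a minimal sketch of how date-based grades could be produced and written in a RankLib-style judgment line. The `Judgment` class, `grade_by_year`, the year range, the 0 to 4 grade scale, and the sample documents are all hypothetical; this is not the code inside `date_genre_judgments`.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    grade: int     # relevance grade, higher is better (hypothetical 0-4 scale)
    qid: int       # numeric query id
    doc_id: str    # id of the judged document
    keywords: str  # query text the grade applies to

    def to_ranklib(self) -> str:
        # RankLib-style line without feature values: "<grade> qid:<id> # <comment>"
        return f"{self.grade} qid:{self.qid} # {self.doc_id} {self.keywords}"


def grade_by_year(year, newer_is_better, lo=1950, hi=2020, max_grade=4):
    """Map a release year onto a grade (assumed linear scale between lo and hi)."""
    clamped = min(max(year, lo), hi)
    frac = (clamped - lo) / (hi - lo)
    if not newer_is_better:
        frac = 1 - frac
    return round(frac * max_grade)


# Made-up documents: (doc id, release year)
sample_docs = [("m1", 1950), ("m2", 2020)]

# Older is better for "drama" (qid 1), newer is better for "science fiction" (qid 2)
drama = [Judgment(grade_by_year(y, False), 1, d, "drama") for d, y in sample_docs]
scifi = [Judgment(grade_by_year(y, True), 2, d, "science fiction") for d, y in sample_docs]

for j in drama + scifi:
    print(j.to_ranklib())
```

The same document gets opposite grades for the two queries, which is exactly the signal a model needs to learn that release year matters differently per genre.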

In [1]:
from ltr import date_genre_judgments
judgments = date_genre_judgments.buildJudgments(judgmentsFile='data/genre_by_date_judgments.txt')

Generating judgments for scifi & drama movies
Done


In [2]:
# Uncomment these lines to see the judgments
# 
# for judgment in judgments:
#    print(judgment.toRanklibFormat())

### Create FeatureSet
A feature set is required to log features for training models. This step creates the `genre` feature set, which consists of two features: the release year of a movie and a match of the query keywords against its genres.


In [6]:
config = {
    "featureset": {
        "features": [
            {
                "name": "release_year",
                "params": [],
                "template": {
                    "function_score": {
                        "field_value_factor": {
                            "field": "release_year",
                            "missing": 2000
                        },
                        "query": {"match_all": {}}
                    }
                }
            },
            {
                "name": "genre_match",
                "params": ["keywords"],
                "template": {
                    "match": {
                        "genres": "{{keywords}}"
                    }
                }
            }
        ]
    }
}

In [7]:
from ltr import setup_ltr
setup_ltr.run(config=config, featureSet='genre')

Removed LTR feature store: 200
Initialize LTR: 200
Created RELEASE feature set: 201


### Log from search engine -> to training set

Each feature is a query. Its score for each document in the judgment list is logged as that feature's value, producing the training set.
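
To make the logging step more concrete, here is a sketch of a request body that asks the LTR plugin to log feature values for a list of judged documents. It follows the plugin's documented `sltr` and `ltr_log` logging pattern, but the helper name `build_logging_request` and the labels `logged_features` and `ltr_features` are made up for illustration, and this is not necessarily the request that `collectFeatures` sends.

```python
import json


def build_logging_request(doc_ids, keywords, feature_set="genre"):
    """Build a search body that logs feature values for the given documents.

    Hypothetical helper: the structure follows the LTR plugin's logging
    pattern, the names are placeholders.
    """
    return {
        "size": len(doc_ids),
        "query": {
            "bool": {
                "filter": [
                    # Restrict the search to the judged documents
                    {"terms": {"_id": doc_ids}},
                    # Run every feature in the feature set against those documents
                    {
                        "sltr": {
                            "_name": "logged_features",
                            "featureset": feature_set,
                            "params": {"keywords": keywords},
                        }
                    },
                ]
            }
        },
        # Ask the plugin to attach the feature values to each hit
        "ext": {
            "ltr_log": {
                "log_specs": {
                    "name": "ltr_features",
                    "named_query": "logged_features",
                }
            }
        },
    }


body = build_logging_request(["985", "62", "149"], "drama")
print(json.dumps(body, indent=2))
```

The `named_query` in `log_specs` must match the `_name` of the `sltr` query; that is how the plugin knows which feature set's scores to log.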

In [8]:
from ltr import collectFeatures
collectFeatures.trainingSetFromJudgments(judgmentInFile='data/genre_by_date_judgments.txt', 
                                         trainingOutFile='data/genre_by_date_judgments_train.txt', 
                                         featureSet='genre')

Recognizing 2 queries...
POST
{
  "size": 10000,
  "query": {
    "bool": {
      "must": [
        {
          "terms": {
            "_id": [
              "985",
              "62",
              "149",
              "244267",
              "13363",
              "17431",
              "16320",
              "24428",
              "101299",
              "13060",
              "313106",
              "38",
              "185",
              "78",
              "31011",
              "49047",
              "63",
              "119450",
              "13475",
              "601",
              "206487",
              "106",
              "1954",
              "154",
              "840",
              "830",
              "16633",
              "69315",
              "11561",
              "199",
              "1272",
              "10514",
              "1895",
              "14139",
              "1694",
              "102899",
              "168",
              "14337",
            

AttributeError: 'int' object has no attribute 'qid'

### Train and Submit
Using the training data from the previous step, we'll use RankyMcRankFace to produce two LambdaMART models. Once the model files are generated, we can submit them to Elasticsearch to be used in scoring.
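
For orientation, the two halves of this step might look roughly like the sketch below: a RankLib-style command line (RankyMcRankFace is a RankLib fork, so the flags are assumed to carry over) and the payload used to store the resulting model with the LTR plugin. The helper names, the jar and model file paths, and the `DCG@10` metric are assumptions, not what `train.run()` necessarily does.

```python
def ranklib_train_command(jar, train_file, model_file, metric="DCG@10"):
    """Build a training command; "-ranker 6" selects LambdaMART in RankLib."""
    return [
        "java", "-jar", jar,
        "-ranker", "6",
        "-train", train_file,
        "-metric2t", metric,   # metric to optimize on the training data
        "-save", model_file,   # where to write the trained model
    ]


def model_payload(name, definition):
    """Body for storing a RankLib model under a feature set.

    Assumed to be POSTed to _ltr/_featureset/<feature set>/_createmodel.
    """
    return {
        "model": {
            "name": name,
            "model": {
                "type": "model/ranklib",
                "definition": definition,
            },
        }
    }


cmd = ranklib_train_command(
    "RankyMcRankFace.jar",                        # assumed local jar path
    "data/genre_by_date_judgments_train.txt",
    "data/genre_model.txt",                       # hypothetical output path
)
print(" ".join(cmd))

# The definition would normally be the contents of the saved model file
payload = model_payload("genre_model", "<contents of the saved model file>")
```

Passing `cmd` to `subprocess.run` would start the training; it is only printed here so the sketch runs without Java or the jar present.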

In [None]:
from ltr import train
train.run()

### Query Time
If we search for `batman`, how do the results compare? Since the `classic` model preferred old movies, it puts old movies in the top positions; the opposite is true for the `latest` model. To continue learning LTR, brainstorm more features and generate some real judgments for real queries.
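
A stored model is typically applied at query time by rescoring the top results of a baseline query. The sketch below builds such a request for each of the two models using the plugin's `sltr` query inside a standard Elasticsearch `rescore` block. The helper name, the `title` field, and the window size are assumptions, and this may differ from what `plot.run()` sends.

```python
def rescore_request(keywords, model, window_size=1000):
    """Build a search body that reranks baseline hits with a stored LTR model."""
    return {
        # Baseline query; matching on "title" is an assumption about the index
        "query": {"match": {"title": keywords}},
        "rescore": {
            "window_size": window_size,  # how many top hits to rerank
            "query": {
                "rescore_query": {
                    "sltr": {
                        "params": {"keywords": keywords},
                        "model": model,
                    }
                }
            },
        },
    }


# One request per model, so the two rankings can be compared side by side
requests_by_model = {m: rescore_request("batman", m) for m in ("classic", "latest")}
```

Only the `model` name differs between the two requests, so any difference in ranking comes from what each model learned.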

In [None]:
from ltr import plot
plot.run()