# Basics

Fire up an Elasticsearch server with the LTR plugin installed and run through the cells below to get started with Learning to Rank!

### Download some requirements

In [1]:
from ltr import download
download.run()

GET https://dl.bintray.com/o19s/RankyMcRankFace/com/o19s/RankyMcRankFace/0.1.1/RankyMcRankFace-0.1.1.jar
GET http://es-learn-to-rank.labs.o19s.com/tmdb.json
Done.


### Index Movies

In [1]:
from ltr import index
index.run()

DELETE INDEX: 200
POST INDEX: 200
Indexing 27846 movies
Done


### Judgment List for "Drama" and "Science Fiction" queries

In Learning to Rank, and in search more broadly, a judgment list records a relevance grade for each document with respect to a query. We're going to generate a judgment list for two queries, "Drama" and "Science Fiction". We've decided (or learned?) that older drama is better and newer science fiction is preferred, so we have some code that generates the judgment list for these two queries.
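To make the grading rule concrete, here is a minimal sketch of how a grade could be derived from a movie's release year. The 0 to 4 scale and the decade thresholds are assumptions chosen for illustration; they are not taken from `date_genre_judgments`, and `grade_movie` is a hypothetical helper.

```python
# Hypothetical grading rule: the year thresholds and the 0-4 scale are
# assumptions for illustration, not the values used by date_genre_judgments.
def grade_movie(genre_query, release_year):
    """Return a relevance grade from 0 (worst) to 4 (best)."""
    # Count how many decade thresholds the year has passed:
    # 0 for anything before 1980, up to 4 for 2010 onward.
    thresholds = [1980, 1990, 2000, 2010]
    recency = sum(release_year >= t for t in thresholds)
    if genre_query == "Science Fiction":
        return recency          # newer science fiction is better
    if genre_query == "Drama":
        return 4 - recency      # older drama is better
    raise ValueError(f"no grading rule for query {genre_query!r}")


print(grade_movie("Drama", 1975))            # 4
print(grade_movie("Science Fiction", 1975))  # 0
print(grade_movie("Science Fiction", 2016))  # 4
```

The same movie gets opposite grades under the two queries, which is why a judgment is always a (query, document, grade) triple rather than a property of the document alone.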

In [2]:
from ltr import date_genre_judgments
judgments = date_genre_judgments.buildJudgments(judgmentsFile='data/genre_by_date_judgments.txt')

Generating judgments for scifi & drama movies
Done


In [3]:
# Uncomment these lines to see the judgments
# 
# for judgment in judgments:
#    print(judgment.toRanklibFormat())

### Create FeatureSet
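A feature set is required to log feature values for training models. This step defines the `genre` feature set, which has four features: `release_year` (the release year of a movie), `is_sci_fi` and `is_drama` (constant scores for movies in those genres), and `is_genre_match` (a constant score when a movie's genre matches the `keywords` parameter of the query).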


In [4]:
config = {
    "featureset": {
        "features": [
            {
                "name": "release_year",
                "params": [],
                "template": {
                    "function_score": {
                        "field_value_factor": {
                            "field": "release_year",
                            "missing": 2000
                        },
                        "query": {"match_all": {}}
                    }
                }
            },
            {
                "name": "is_sci_fi",
                "params": [],
                "template": {
                    "constant_score": {
                        "filter": {
                            "match_phrase": {"genres": "Science Fiction"}
                        },
                        "boost": 10.0
                    }
                }
            },
            {
                "name": "is_drama",
                "params": [],
                "template": {
                    "constant_score": {
                        "filter": {
                            "match_phrase": {"genres": "Drama"}
                        },
                        "boost": 4.0
                    }
                }
            },
            {
                "name": "is_genre_match",
                "params": ["keywords"],
                "template": {
                    "constant_score": {
                        "filter": {
                            "match_phrase": {"genres": "{{keywords}}"}
                        },
                        "boost": 100.0
                    }
                }
            }
        ]
    }
}

In [5]:
from ltr import setup_ltr
setup_ltr.run(config=config, featureSet='genre')

Removed LTR feature store: 200
Initialize LTR: 200
Created genre feature set: 201
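The `is_genre_match` feature declares a `keywords` parameter, and its template contains a `{{keywords}}` placeholder that is filled in at query time (the plugin uses Mustache templating for this). The sketch below imitates that substitution with plain string replacement so you can see what the rendered query looks like. `render_template` is a hypothetical helper, not part of the `ltr` package or the plugin, and it only handles simple `{{name}}` placeholders.

```python
import json


def render_template(template, params):
    """Fill {{name}} placeholders in a feature template.

    Simplified stand-in for Mustache rendering: it handles only plain
    {{name}} placeholders that sit inside JSON string values.
    """
    text = json.dumps(template)
    for name, value in params.items():
        # json.dumps(...)[1:-1] escapes quotes and backslashes so the
        # substituted text remains a valid JSON string fragment.
        escaped = json.dumps(str(value))[1:-1]
        text = text.replace("{{" + name + "}}", escaped)
    return json.loads(text)


is_genre_match = {
    "constant_score": {
        "filter": {"match_phrase": {"genres": "{{keywords}}"}},
        "boost": 100.0,
    }
}

rendered = render_template(is_genre_match, {"keywords": "Drama"})
print(rendered["constant_score"]["filter"])
# {'match_phrase': {'genres': 'Drama'}}
```

Features with an empty `params` list, such as `release_year`, need no substitution and are sent as written.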


### Log from search engine -> to training set

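Each feature is a query. The search engine runs it against every document in the judgment list and logs the resulting score as that feature's value. The logged values, together with the judgment grades, form the training set.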

In [6]:
from ltr import collectFeatures
trainingSet = collectFeatures.trainingSetFromJudgments(judgmentInFile='data/genre_by_date_judgments.txt', 
                                                       trainingOutFile='data/genre_by_date_judgments_train.txt', 
                                                       featureSet='genre')

Recognizing 2 queries...
REBUILDING TRAINING DATA for Science Fiction (0/2)
REBUILDING TRAINING DATA for Drama (1/2)
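Each row of the resulting training file pairs a judgment grade with the logged feature values, in the RankLib text format: `<grade> qid:<id> 1:<value> 2:<value> ... # comment`. The sketch below formats one such row. `to_ranklib_row` is a hypothetical helper, and the grade, feature values, and document id are made-up examples rather than output from `collectFeatures`.

```python
def to_ranklib_row(grade, qid, features, comment=""):
    """Format one training row in RankLib's text format."""
    # RankLib feature ids are 1-based; here they follow the order in which
    # the features are listed in the feature set.
    feats = " ".join(f"{i}:{v}" for i, v in enumerate(features, start=1))
    row = f"{grade} qid:{qid} {feats}"
    return f"{row} # {comment}" if comment else row


# Made-up values for [release_year, is_sci_fi, is_drama, is_genre_match]
print(to_ranklib_row(4, 1, [2016.0, 10.0, 0.0, 100.0], "some_doc_id"))
# 4 qid:1 1:2016.0 2:10.0 3:0.0 4:100.0 # some_doc_id
```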


### Train and Submit
Using the training data from the previous step, we'll use RankyMcRankFace to train a LambdaMART model. Once the model file is generated, we submit it to Elasticsearch (here under the name `doug`) so it can be used in scoring.

In [8]:
from ltr import train
train.run(trainingInFile='data/genre_by_date_judgments_train.txt',
          featureSet='genre',
          modelName='doug')

Delete model doug: 200
Created model doug: 201
Done
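For orientation, submitting a model amounts to posting the trained RankLib definition to the feature set's `_createmodel` endpoint. The sketch below only builds the request path and body and does not contact a server. It reflects my reading of the plugin's documented API rather than what `train.run` does internally; `create_model_request` is a hypothetical helper, and the definition string is a placeholder, not a real model.

```python
def create_model_request(feature_set, model_name, ranklib_definition):
    """Build the path and JSON body for uploading a RankLib model.

    Assumes the plugin's _createmodel endpoint and the "model/ranklib"
    model type; nothing is sent over the network here.
    """
    path = f"_ltr/_featureset/{feature_set}/_createmodel"
    body = {
        "model": {
            "name": model_name,
            "model": {
                "type": "model/ranklib",
                "definition": ranklib_definition,
            },
        }
    }
    return path, body


# Placeholder text; a real upload would pass the contents of the model file.
path, body = create_model_request("genre", "doug", "<model file contents>")
print(path)
# _ltr/_featureset/genre/_createmodel
```

Tying the model to a feature set matters because the model refers to features by position, so it is only meaningful alongside the feature set it was trained on.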


### Query Time
Now search for `Drama` using the `doug` model and inspect the top results. The judgments favored older dramas and newer science fiction, so check whether the ranking reflects that. To continue learning LTR, brainstorm more features and generate some real judgments for real queries.
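At query time the model is invoked through an `sltr` query that names the model and supplies its parameters. The sketch below builds that request body. It also shows a rescore variant, which is a common way to apply a model only to the top hits of a cheaper baseline query. Both functions are hypothetical helpers, and the `window_size` of 100 and the baseline `match` query are assumptions.

```python
import json


def sltr_query(keywords, model_name, size=5):
    """Score every matching document directly with the LTR model."""
    return {
        "size": size,
        "query": {
            "sltr": {"params": {"keywords": keywords}, "model": model_name}
        },
    }


def sltr_rescore(base_query, keywords, model_name, window_size=100):
    """Run a baseline query, then re-rank its top window_size hits."""
    return {
        "query": base_query,
        "rescore": {
            "window_size": window_size,
            "query": {
                "rescore_query": {
                    "sltr": {
                        "params": {"keywords": keywords},
                        "model": model_name,
                    }
                }
            },
        },
    }


print(json.dumps(sltr_query("Drama", "doug")))
# {"size": 5, "query": {"sltr": {"params": {"keywords": "Drama"}, "model": "doug"}}}

rescored = sltr_rescore({"match": {"genres": "Drama"}}, "Drama", "doug")
```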

In [2]:
from ltr import search
search.run(keywords="Drama", modelName="doug")

{"size": 5, "query": {"sltr": {"params": {"keywords": "Drama"}, "model": "doug"}}}
Rogue One: A Star Wars Story 
10.187046 
2016 
['Adventure', 'Science Fiction', 'Action'] 
A rogue band of resistance fighters unite for a mission to steal the Death Star plans and bring a new hope to the galaxy. 
---------------------------------------
Guardians of the Galaxy Vol. 2 
10.187046 
2017 
['Action', 'Adventure', 'Comedy', 'Science Fiction'] 
The Guardians must fight to keep their newfound family together as they unravel the mysteries of Peter Quill's true parentage. 
---------------------------------------
Wonder Woman 
10.187046 
2017 
['Action', 'Adventure', 'Fantasy', 'Science Fiction'] 
An Amazon princess comes to the world of Man to become the greatest of the female superheroes. 
---------------------------------------
Captain America: Civil War 
10.187046 
2016 
['Adventure', 'Action', 'Science Fiction'] 
Following the events of Age of Ultron, the collective governments of the world pa