# hello-ltr

Fire up an Elasticsearch server with the LTR plugin installed and run through the cells below to get started with Learning to Rank!

### Download some requirements

In [1]:
from ltr import download
download.run()

GET http://es-learn-to-rank.labs.o19s.com/tmdb.json
GET http://es-learn-to-rank.labs.o19s.com/RankyMcRankFace.jar
Done.


### Index Movies

In [1]:
from ltr import index
index.run()

DELETE INDEX: 404
POST INDEX: 200
Indexing 27846 movies
Done


### Create FeatureSet
A feature set is required to log the feature values used to train models. This step creates the `release` feature set, which consists of a single feature: the release year of a movie.


In [1]:
config = {
    "featureset": {
        "features": [
            {
                "name": "release_year",
                "params": [],
                "template": {
                    "function_score": {
                        "field_value_factor": {
                            "field": "release_year",
                            "missing": 2000
                        },
                        "query": { "match_all": {} }
                    }
                }
            }
        ]
    }
}


from ltr import setup_ltr
setup_ltr.run(config=config, featureSet='release')

Removed LTR feature store: 200
Initialize LTR: 200
Created RELEASE feature set: 201
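
Feature sets usually hold more than one feature. As a rough sketch, the helper below adds a second, query-dependent feature next to `release_year`. The `title_match` name, the `title` field, and the `keywords` parameter are assumptions made for illustration and are not part of this notebook; the `{{keywords}}` placeholder follows the plugin's mustache templating.

```python
import copy

# The same single-feature config used in the cell above.
base_config = {
    "featureset": {
        "features": [
            {
                "name": "release_year",
                "params": [],
                "template": {
                    "function_score": {
                        "field_value_factor": {
                            "field": "release_year",
                            "missing": 2000,
                        },
                        "query": {"match_all": {}},
                    }
                },
            }
        ]
    }
}


def with_title_feature(config):
    """Return a copy of `config` with a hypothetical title-match feature appended."""
    extended = copy.deepcopy(config)  # leave the caller's config untouched
    extended["featureset"]["features"].append(
        {
            "name": "title_match",  # hypothetical feature name
            "params": ["keywords"],  # filled in at query/logging time
            "template": {"match": {"title": "{{keywords}}"}},  # assumes a `title` field
        }
    )
    return extended


extended_config = with_title_feature(base_config)
print([f["name"] for f in extended_config["featureset"]["features"]])
```

A config like `extended_config` could presumably be passed to `setup_ltr.run` under a different `featureSet` name, though that has not been checked against this notebook's helpers.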


### Logging and Ratings
For this example we're working with one query, `match_all`. We use the `sltr` query to log feature values, which are then tagged with a rating and used for training.

- The classic model prefers old movies: `4 qid:1 1:1960`
- The latest model prefers new movies: `4 qid:1 1:2019`

Using this simple signaling, can we do a pseudo-sort on date using LTR? Let's find out. The next step generates two training files, `classic-training.txt` and `latest-training.txt`.
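
Before running the cell, it may help to see roughly what a feature-logging request looks like. The sketch below only builds a request body in the shape described by the plugin's logging documentation (an `sltr` filter plus an `ext.ltr_log` section). The function name, the `logged_features` and `ltr_features` labels, and the optional `doc_ids` filter are assumptions; the notebook's `years_as_ratings` helper may build its request differently.

```python
def build_logging_query(feature_set, doc_ids=None, params=None):
    """Build a search body that asks the LTR plugin to log feature values.

    This is an illustrative sketch, not the notebook's own implementation.
    """
    filters = []
    if doc_ids:
        # Optionally restrict logging to specific documents.
        filters.append({"terms": {"_id": list(doc_ids)}})
    filters.append(
        {
            "sltr": {
                "_name": "logged_features",  # label referenced by log_specs below
                "featureset": feature_set,
                "params": params or {},
            }
        }
    )
    return {
        "query": {"bool": {"filter": filters}},
        "ext": {
            "ltr_log": {
                "log_specs": {
                    "name": "ltr_features",
                    "named_query": "logged_features",
                }
            }
        },
    }


body = build_logging_query("release")
print(body["query"]["bool"]["filter"][0]["sltr"]["featureset"])
```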

In [2]:
from ltr import years_as_ratings
years_as_ratings.run(featureSet='release',
                     classicTrainingSetOut='data/classic-training.txt',
                     latestTrainingSetOut='data/latest-training.txt')

Generating ratings for classic and latest model
Done
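
The rating lines shown earlier, such as `4 qid:1 1:1960`, follow the RankLib training file layout: a grade, a query id, then numbered `feature:value` pairs, with an optional trailing `#` comment. As a small illustration of that layout (the function names here are made up for this sketch and are not part of the `ltr` package):

```python
def to_ranklib_line(grade, qid, features, comment=None):
    """Format one judgment as a RankLib training line; features are numbered from 1."""
    parts = [str(grade), f"qid:{qid}"]
    parts += [f"{i}:{value}" for i, value in enumerate(features, start=1)]
    line = " ".join(parts)
    if comment:
        line += f" # {comment}"
    return line


def parse_ranklib_line(line):
    """Split a RankLib training line into (grade, qid, {feature_index: value})."""
    body = line.split("#", 1)[0].split()  # drop the trailing comment, if any
    grade = int(body[0])
    qid = body[1].split(":", 1)[1]
    features = {}
    for token in body[2:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return grade, qid, features


print(to_ranklib_line(4, 1, [1960]))
print(parse_ranklib_line("4 qid:1 1:2019 # some movie"))
```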


### Train and Submit
Using the training data from the previous step, we'll use RankyMcRankFace to spit out two LambdaMART models. Once the model files are generated, we can submit them to Elasticsearch to be used in scoring.

In [4]:
from ltr import train
train.run(trainingInFile='data/latest-training.txt', featureSet='release', modelName='latest')
train.run(trainingInFile='data/classic-training.txt', featureSet='release', modelName='classic')

Delete model latest: 404
Created model latest: 201
Done
Delete model classic: 404
Created model classic: 201
Done


### Query Time
If we search for `batman`, how do the results compare? Since the `classic` model prefers old movies, it puts old movies in the top positions; the opposite is true for the `latest` model. To continue learning LTR, brainstorm more features and generate some real judgments for real queries.

In [5]:
from ltr import plot
plot.run()
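
To make the comparison concrete, here is a sketch of a search body that reranks `batman` results with one of the trained models, using an `sltr` query inside a standard Elasticsearch rescore block. Whether `plot.run()` issues its queries this way is not shown in this notebook; the function name, the `title` field, and the default window size are assumptions.

```python
def build_rescore_query(keywords, model_name, window_size=1000):
    """Build a search body that reranks the top hits with a stored LTR model.

    Illustrative only: assumes movies have a `title` field.
    """
    return {
        # Baseline retrieval on the assumed `title` field.
        "query": {"match": {"title": keywords}},
        "rescore": {
            "window_size": window_size,  # how many top hits get rescored
            "query": {
                "rescore_query": {
                    # The release feature set takes no params, so this stays empty.
                    "sltr": {"params": {}, "model": model_name}
                }
            },
        },
    }


for model in ("classic", "latest"):
    body = build_rescore_query("batman", model)
    print(model, body["rescore"]["query"]["rescore_query"]["sltr"])
```

The two bodies differ only in the model name, so any difference in the top results comes from the models themselves.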