# hello-ltr (Solr Edition)

Fire up a Solr server with the LTR plugin enabled and the `tmdb` configset available. The Dockerfile under the `solr` folder shows one way to set this up.

### Download some requirements

In [1]:
from ltr import download
download.run()

GET http://es-learn-to-rank.labs.o19s.com/tmdb.json
GET http://es-learn-to-rank.labs.o19s.com/RankyMcRankFace.jar
Done.


### Run in Solr mode
By default, the LTR examples run against Elasticsearch. The following snippet switches them to Solr mode.

In [3]:
from ltr import useSolr
useSolr()

Switched to Solr client_mode


### Index Movies

In [4]:
from ltr import index
index.run()

Deleted index: tmdb [Status: 200]
Created index: tmdb [Status: 200]
Indexing 27846 documents
Flushing 500 movies
Flushing 500 movies
Flushing 500 movies
Flushing 500 movies
Flushing 500 movies
Flushing 500 movies
Flushing 500 movies
Flushing 500 movies
Flushing 500 movies
Flushing 500 movies
Flushing 500 movies
Flushing 500 movies
Flushing 500 movies
Skipping 67456
Skipping 67479
Skipping 133252
Flushing 500 movies
Skipping 211779
Skipping 200039
Skipping 69372
Skipping 69487
Skipping 164721
Flushing 500 movies
Skipping 124531
Flushing 500 movies
Skipping 202855
Flushing 500 movies
Skipping 110639
Skipping 205300
Skipping 74309
Skipping 206216
Flushing 500 movies
Skipping 10700
Flushing 500 movies
Skipping 273740
Flushing 500 movies
Skipping 13057
Skipping 264330
Skipping 13716
Flushing 500 movies
Skipping 276690
Flushing 500 movies
Skipping 68149
Skipping 211877
Skipping 15533
Skipping 15594
Skipping 15738
Flushing 500 movies
Skipping 82205
Skipping 82400
Flushing 500 movies
Skipping 

### Create FeatureStore
A feature store is required to log the feature values used to train models. This step creates the `release` feature store, which consists of a single feature: the release year of a movie.


In [5]:
config = [
  {
    "store": "release", # Note: This overrides the _DEFAULT_ feature store location
    "name" : "release_year",
    "class" : "org.apache.solr.ltr.feature.SolrFeature",
    "params" : {
      "q" : "{!func}def(release_year,2000)"
    }
  }
]


from ltr import setup_ltr
setup_ltr.run(config=config, featureset='release')

Deleted classic model: 200
Deleted latest model: 200
Delete release feature store: 200
Delete _DEFAULT_ feature store: 200
Created release feature store under tmdb: 200
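
For readers curious about what `setup_ltr.run` might do on the Solr side, here is a minimal sketch that only builds the upload request, without sending it. It assumes Solr's LTR `schema/feature-store` REST endpoint and a local Solr at `http://localhost:8983`; the base URL and the helper name `feature_store_request` are illustrative, not part of the `ltr` package. The function query `def(release_year,2000)` falls back to 2000 when a document has no `release_year` value.

```python
import json

# Assumption: Solr runs locally on the default port; adjust for your setup.
SOLR_BASE = "http://localhost:8983/solr"


def feature_store_request(collection, features):
    """Build (method, url, headers, body) for uploading features to Solr LTR.

    Hypothetical helper: it only assembles the request so it can be inspected.
    """
    url = f"{SOLR_BASE}/{collection}/schema/feature-store"
    headers = {"Content-Type": "application/json"}
    body = json.dumps(features)
    return "PUT", url, headers, body


# Same feature definition as the `config` list in the cell above.
features = [
    {
        "store": "release",
        "name": "release_year",
        "class": "org.apache.solr.ltr.feature.SolrFeature",
        "params": {"q": "{!func}def(release_year,2000)"},
    }
]

method, url, headers, body = feature_store_request("tmdb", features)
print(method, url)
```

Sending the request is left out on purpose; any HTTP client can issue the `PUT` with the returned headers and body.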


### Logging and Ratings
For this example we're working with a single query, `match_all`. We use the `sltr` query to log feature values, and each logged row is tagged with a rating so it can be used in training.

- The classic model prefers old movies: `4 qid:1 1:1960`
- The latest model prefers new movies: `4 qid:1 1:2019`
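
The two example lines above follow the RankLib-style judgment format: a grade, a query id, then `ordinal:value` pairs for each feature. A small sketch of writing and reading such lines is shown here; the helper names `format_judgment` and `parse_judgment` are hypothetical, and how `years_as_ratings` actually assigns grades is not shown in this notebook.

```python
def format_judgment(grade, qid, features):
    """Render one training line: '<grade> qid:<qid> 1:<f1> 2:<f2> ...'."""
    feature_part = " ".join(f"{i}:{value}" for i, value in enumerate(features, start=1))
    return f"{grade} qid:{qid} {feature_part}"


def parse_judgment(line):
    """Split a training line back into (grade, qid, {ordinal: value})."""
    grade, qid_token, *feature_tokens = line.split()
    qid = int(qid_token.split(":", 1)[1])
    features = {}
    for token in feature_tokens:
        ordinal, value = token.split(":", 1)
        features[int(ordinal)] = float(value)
    return int(grade), qid, features


line = format_judgment(4, 1, [1960])
print(line)
print(parse_judgment(line))
```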

Using this simple signal, can we do a pseudo-sort on date using LTR? Let's find out. The next step generates two training files, `classic-training.txt` and `latest-training.txt`.

In [6]:
from ltr import years_as_ratings
years_as_ratings.run(featureSet='release',
                     classicTrainingSetOut='data/classic-training.txt',
                     latestTrainingSetOut='data/latest-training.txt')

Generating ratings for classic and latest model
Done
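
The feature logging itself happens inside `years_as_ratings`. As a rough sketch of what a Solr logging request could look like, the snippet below assumes Solr's `[features]` document transformer and its default `name=value,name=value` output string; the helper names and the `/solr/tmdb/select` path are assumptions, not taken from the `ltr` package.

```python
from urllib.parse import urlencode


def feature_logging_params(store, rows=10):
    """Query parameters asking Solr to return logged feature values per document."""
    return {
        "q": "*:*",  # the match-all query used in this example
        "fl": f"id,[features store={store}]",
        "rows": rows,
    }


def parse_logged_features(raw):
    """Parse a 'name=value,name2=value' feature string into a dict of floats."""
    pairs = (item.split("=", 1) for item in raw.split(",") if item)
    return {name: float(value) for name, value in pairs}


params = feature_logging_params("release")
print("/solr/tmdb/select?" + urlencode(params))
print(parse_logged_features("release_year=1960.0"))
```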


### Train and Submit
Using the training data from the previous step, we'll use RankyMcRankFace to spit out two LambdaMART models. Once these model files are generated, we can submit them to Solr to be used in scoring.

In [7]:
from ltr import train
train.run(trainingInFile='data/latest-training.txt', featureSet='release', modelName='latest')
train.run(trainingInFile='data/classic-training.txt', featureSet='release', modelName='classic')

Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t DCG@10 -tree 100 -train data/latest-training.txt -save data/latest_model.txt
DONE
PUT latest model under release: 200
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t DCG@10 -tree 100 -train data/classic-training.txt -save data/classic_model.txt
DONE
PUT classic model under release: 200


<ltr.train.TrainingLog at 0x111158128>
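
The `Running java -jar ...` lines in the output show the command that `train.run` shells out to. A minimal sketch of assembling that command in Python follows; `ranklib_command` is a hypothetical helper, and the flag values are copied from the log above. In RankLib, ranker `6` selects LambdaMART, and `-metric2t` names the metric optimized during training.

```python
def ranklib_command(training_file, model_file,
                    jar="data/RankyMcRankFace.jar",
                    ranker=6, metric="DCG@10", trees=100):
    """Return the argument list for one training run, mirroring the logged command."""
    return [
        "java", "-jar", jar,
        "-ranker", str(ranker),   # 6 = LambdaMART in RankLib
        "-metric2t", metric,      # metric to optimize on the training data
        "-tree", str(trees),      # number of trees in the ensemble
        "-train", training_file,
        "-save", model_file,
    ]


cmd = ranklib_command("data/latest-training.txt", "data/latest_model.txt")
print(" ".join(cmd))
# To actually train, pass the list to subprocess.run(cmd, check=True).
```

Building the command as a list rather than a single string avoids shell quoting problems if a path ever contains spaces.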

### Query Time
If we search for `batman`, how do the results compare? Since the `classic` model prefers old movies, it puts old movies in the top positions, and the opposite is true for the `latest` model. To continue learning LTR, brainstorm more features and generate some real judgments for real queries.

In [8]:
from ltr import plot
plot.run()
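
`plot.run()` hides the underlying queries. As a hedged sketch, a Solr rerank request for each model could be built with the `{!ltr}` rerank query parser as shown below. The `title` field, the `reRankDocs` value, and the helper name `rerank_params` are assumptions for illustration; `release_year` is the field used by the feature defined earlier.

```python
def rerank_params(keywords, model, rerank_docs=100, field="title"):
    """Query parameters that rerank the top matches with a stored LTR model."""
    return {
        "q": f"{field}:({keywords})",
        # Rerank the top `rerank_docs` hits using the named model.
        "rq": f"{{!ltr model={model} reRankDocs={rerank_docs}}}",
        "fl": "id,title,release_year,score",
        "rows": 10,
    }


for model in ("classic", "latest"):
    print(model, rerank_params("batman", model)["rq"])
```

Comparing the `release_year` values of the top results for the two models is a quick way to check whether each model learned the intended ordering.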