# hello-ltr (Solr Edition)

Fire up a Solr server with the LTR plugin enabled and with the `tmdb` configset available.  See the Dockerfile under the solr folder for assistance.

### Download some requirements

In [1]:
from ltr.download import download
download()

GET http://es-learn-to-rank.labs.o19s.com/tmdb.json
GET http://es-learn-to-rank.labs.o19s.com/RankyMcRankFace.jar
GET http://es-learn-to-rank.labs.o19s.com/title_judgments.txt
Done.


### Create the LTR Client

Instantiate a Solr client so we can talk to LTR in Solrese.

In [1]:
from ltr.client.solr_client import SolrClient
client = SolrClient()

### Index Movies

In [2]:
from ltr.index import rebuild_tmdb
rebuild_tmdb(client)

Deleted index tmdb [Status: 200]
Created index tmdb [Status: 200]
Indexing 27846 documents
Flushing 500 movies
Done [Status: 200]
Flushing 500 movies
Done [Status: 200]
Flushing 500 movies
Done [Status: 200]
Flushing 500 movies
Done [Status: 200]
Flushing 500 movies
Done [Status: 200]
Flushing 500 movies
Done [Status: 200]
Flushing 500 movies
Done [Status: 200]
Flushing 500 movies
Done [Status: 200]
Flushing 500 movies
Done [Status: 200]
Flushing 500 movies
Done [Status: 200]
Flushing 500 movies
Done [Status: 200]
Flushing 500 movies
Done [Status: 200]
Flushing 500 movies
Done [Status: 200]
Skipping 67456
Skipping 67479
Skipping 133252
Flushing 500 movies
Done [Status: 200]
Skipping 211779
Skipping 200039
Skipping 69372
Skipping 69487
Skipping 164721
Flushing 500 movies
Done [Status: 200]
Skipping 124531
Flushing 500 movies
Done [Status: 200]
Skipping 202855
Flushing 500 movies
Done [Status: 200]
Skipping 110639
Skipping 205300
Skipping 74309
Skipping 206216
Flushing 500 movies
Done [S

### Create FeatureStore
A feature store is required to log the features used to train models.  This step creates the `release` feature store, which consists of one feature: the release year of a movie.


In [3]:
config = [
  {
    "store": "release", # Note: This overrides the _DEFAULT_ feature store location
    "name" : "release_year",
    "class" : "org.apache.solr.ltr.feature.SolrFeature",
    "params" : {
      "q" : "{!func}def(release_year,2000)"
    }
  }
]


from ltr.setup import setup
setup(client, config=config, featureset='release')

Deleted classic model [Status: 200]
Deleted genre model [Status: 200]
Deleted latest model [Status: 200]
Deleted title model [Status: 200]
Deleted title_fuzzy model [Status: 200]
Deleted _DEFAULT Featurestore [Status: 200]
Deleted genre Featurestore [Status: 200]
Deleted release Featurestore [Status: 200]
Deleted title Featurestore [Status: 200]
Deleted title_fuzzy Featurestore [Status: 200]
Created release feature store under tmdb: [Status: 200]
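
For readers curious about what happens on the wire: Solr's LTR module accepts feature definitions as JSON on the collection's `schema/feature-store` endpoint. The sketch below builds (but does not send) such a request for the `config` shown above. It is an illustration only, not the code inside `ltr.setup`; the helper name `build_feature_store_request` is hypothetical, and the host and collection are assumed from the query URLs used elsewhere in this notebook.

```python
import json
import urllib.request

# Assumed host and collection, matching the query URLs in this notebook.
SOLR_URL = "http://localhost:8983/solr/tmdb"

def build_feature_store_request(features, solr_url=SOLR_URL):
    """Build (but do not send) a PUT request that uploads LTR features."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url=f"{solr_url}/schema/feature-store",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )

release_features = [
    {
        "store": "release",
        "name": "release_year",
        "class": "org.apache.solr.ltr.feature.SolrFeature",
        # def(field, default): use release_year, or 2000 when the field is missing
        "params": {"q": "{!func}def(release_year,2000)"},
    }
]

request = build_feature_store_request(release_features)
# Against a running Solr, this would send it:
# urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL and body before touching the server.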


### Logging and Ratings
For this example we're working with one query, match-all (`*:*`).  We use Solr's `[features]` document transformer to log feature values; each logged value is then tagged with a rating for use in training.

- The classic model prefers old movies: `4 qid:1 1:1960`
- The latest model prefers new movies: `4 qid:1 1:2019`

Using this simple signal, can we do a pseudo-sort on date using LTR? Let's find out.  The next step will generate two training files, `classic-training.txt` and `latest-training.txt`.

In [4]:
from ltr import years_as_ratings
years_as_ratings.run(client, 
                     featureSet='release',
                     classicTrainingSetOut='data/classic-training.txt',
                     latestTrainingSetOut='data/latest-training.txt')

Generating ratings for classic and latest model
Searching tmdb [Status: 200]
Done
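
The rating lines shown above (`4 qid:1 1:1960`) follow the RankLib-style layout `grade qid:<query id> <feature number>:<value>`. As an illustration of that layout, and not the actual code in `years_as_ratings`, a minimal sketch for formatting and parsing one line might look like this. The function names `format_judgment` and `parse_judgment` are hypothetical, and the sketch assumes numeric query ids.

```python
def format_judgment(grade, qid, features):
    """Render one training line, e.g. '4 qid:1 1:1960'.

    `features` is a list of values; feature numbers are 1-based.
    """
    pairs = " ".join(f"{i}:{v}" for i, v in enumerate(features, start=1))
    return f"{grade} qid:{qid} {pairs}"

def parse_judgment(line):
    """Split a training line into (grade, qid, {feature_number: value})."""
    grade, qid_token, *pairs = line.split()
    qid = int(qid_token.split(":", 1)[1])  # assumes a numeric query id
    features = {}
    for pair in pairs:
        number, value = pair.split(":", 1)
        features[int(number)] = float(value)
    return int(grade), qid, features

# The two example lines from the text: same grade, opposite ends of the timeline.
classic_line = format_judgment(4, 1, [1960])
latest_line = format_judgment(4, 1, [2019])
```

Both models see the same single feature (the release year); only the grade attached to each year differs between the two training files.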


### Logging Query
http://localhost:8983/solr/tmdb/select?q=*:*&fl=title,genres,release_year,[features%20store=release]
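
The logging URL above can also be assembled programmatically, which avoids hand-encoding the space inside the `[features ...]` transformer. This is a small standard-library sketch; `build_logging_url` is a hypothetical helper, and the host, collection, and field list are copied from the URL above.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

def build_logging_url(store, base="http://localhost:8983/solr/tmdb/select"):
    """Build a match-all select URL that logs features from `store`."""
    params = {
        "q": "*:*",
        "fl": f"title,genres,release_year,[features store={store}]",
    }
    # quote (rather than the default quote_plus) encodes spaces as %20
    return f"{base}?{urlencode(params, quote_via=quote)}"

logging_url = build_logging_url("release")

# Decode the query string again to check that nothing was lost.
logging_params = parse_qs(urlsplit(logging_url).query)
```

The generated URL percent-encodes more characters than the hand-written one, but it decodes to the same parameters.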

### Train and Submit
Using the training data from the previous step, we'll use RankyMcRankFace to produce two LambdaMART models.  Once these files are generated, we can submit them to Solr to be used in scoring.

In [5]:
from ltr.train import train
train(client=client, trainingInFile='data/latest-training.txt', featureSet='release', modelName='latest')
train(client=client, trainingInFile='data/classic-training.txt', featureSet='release', modelName='classic')

Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t DCG@10 -tree 100 -train data/latest-training.txt -save data/latest_model.txt
DONE
Submit Model latest Ftr Set release [Status: 200]
Deleted Model latest [Status: 200]
Created Model latest [Status: 200]
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t DCG@10 -tree 100 -train data/classic-training.txt -save data/classic_model.txt
DONE
Submit Model classic Ftr Set release [Status: 200]
Deleted Model classic [Status: 200]
Created Model classic [Status: 200]


<ltr.train.TrainingLog at 0x13606b6d8>
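
The `java -jar` lines in the output above show the command that is run for each model. In RankLib, from which RankyMcRankFace is derived, `-ranker 6` selects LambdaMART, `-metric2t` names the metric optimized during training, and `-tree` sets the number of trees. The sketch below shows one way such a command could be assembled as an argument list; it is an assumption about the shape of the call, not the code inside `ltr.train`, and `ranklib_command` is a hypothetical name.

```python
def ranklib_command(training_file, model_file,
                    jar="data/RankyMcRankFace.jar",
                    ranker=6, metric="DCG@10", trees=100):
    """Return the training command as an argument list for subprocess."""
    return [
        "java", "-jar", jar,
        "-ranker", str(ranker),   # 6 = LambdaMART
        "-metric2t", metric,      # metric optimized on the training data
        "-tree", str(trees),      # number of trees in the ensemble
        "-train", training_file,
        "-save", model_file,
    ]

latest_cmd = ranklib_command("data/latest-training.txt", "data/latest_model.txt")
latest_cmd_text = " ".join(latest_cmd)

# With Java and the jar available, this would run the training:
# import subprocess
# subprocess.run(latest_cmd, check=True)
```

Keeping the arguments in a list rather than a single string avoids shell quoting problems if a path ever contains spaces.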

### Query Time
If we search for `batman`, how do the results compare?  Since the `classic` model preferred old movies, it puts old movies in the top positions, and the opposite is true for the `latest` model.  To continue learning LTR, brainstorm more features and generate some real judgments for real queries.

In [6]:
from ltr.release_date_plot import plot
plot(client)

Search keywords - title:(batman)^0 [Status: 200]
Search keywords - title:(batman)^0 [Status: 200]


### Model Query
http://localhost:8983/solr/tmdb/select?rq={!ltr%20model=latest}&q=batman&fl=title,release_year
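
To compare the two models side by side, the same URL can be generated for each model name. This is a standard-library sketch under the same assumptions as the URL above (local Solr, `tmdb` collection); `build_rerank_url` is a hypothetical helper and not part of the `ltr` package.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

# Host and collection as used in the URL above.
SELECT_URL = "http://localhost:8983/solr/tmdb/select"

def build_rerank_url(model, keywords, base=SELECT_URL):
    """Build a select URL that reranks results for `keywords` with an LTR model."""
    params = {
        "rq": f"{{!ltr model={model}}}",  # renders as {!ltr model=<name>}
        "q": keywords,
        "fl": "title,release_year",
    }
    return f"{base}?{urlencode(params, quote_via=quote)}"

rerank_urls = {m: build_rerank_url(m, "batman") for m in ("classic", "latest")}

# Decode one URL again to confirm the rerank query survived encoding.
latest_params = parse_qs(urlsplit(rerank_urls["latest"]).query)
```

Opening the two URLs in a browser against a running Solr should show the same `batman` matches in different orders, depending on which model reranks them.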