# hello-ltr (Solr Edition)

Fire up a Solr server with the LTR plugin enabled and the `tmdb` configset available. See the Dockerfile under the `solr` folder for assistance. The notebooks in this training share a small LTR client library and serve as a starting point for demonstrating several important learning-to-rank capabilities.

This notebook documents many of the important pieces so you can reuse them in future training sessions.

### Download some requirements

In [1]:
from ltr import download
download()

GET http://es-learn-to-rank.labs.o19s.com/tmdb.json
GET http://es-learn-to-rank.labs.o19s.com/RankyMcRankFace.jar
GET http://es-learn-to-rank.labs.o19s.com/title_judgments.txt
GET http://es-learn-to-rank.labs.o19s.com/genome_judgments.txt
GET http://es-learn-to-rank.labs.o19s.com/sample_judgments_train.txt
Done.


### Create the LTR Client

Instantiate a Solr client so we can talk to LTR in Solrese.

In [2]:
from ltr.client.solr_client import SolrClient
client = SolrClient()

### Index Movies

In [3]:
from ltr.index import rebuild_tmdb
rebuild_tmdb(client)

Deleted index tmdb [Status: 200]
Created index tmdb [Status: 200]
Reindexing...
Indexed 0 movies (last Black Mirror: White Christmas)
Indexed 100 movies (last Apocalypse Now)
Indexed 200 movies (last Crooks in Clover)
Indexed 300 movies (last For a Few Dollars More)
Indexed 400 movies (last Downfall)
Flushing 500 movies
Done [Status: 200]
Indexed 500 movies (last Finding Nemo)
Indexed 600 movies (last Platoon)
Indexed 700 movies (last Night of the Living Dead)
Indexed 800 movies (last Evangelion: 1.0: You Are (Not) Alone)
Indexed 900 movies (last Batman: Assault on Arkham)
Flushing 500 movies
Done [Status: 200]
Indexed 1000 movies (last Riley's First Date?)
Indexed 1100 movies (last The Raid)
Indexed 1200 movies (last Falling Down)
Indexed 1300 movies (last Kal Ho Naa Ho)
Indexed 1400 movies (last Elizabeth)
Flushing 500 movies
Done [Status: 200]
Indexed 1500 movies (last Irreversible)
Indexed 1600 movies (last Friday Night Lights)
Indexed 1700 movies (last Ben X)
Indexed 1800 movies (

Done [Status: 200]
Indexed 16000 movies (last The Great Northfield Minnesota Raid)
Indexed 16100 movies (last Lotta Leaves Home)
Indexed 16200 movies (last Just One of the Girls)
Indexed 16300 movies (last Which Way Is The Front Line From Here? The Life and Time of Tim Hetherington)
Indexed 16400 movies (last The Ladies Man)
Flushing 500 movies
Done [Status: 200]
Indexed 16500 movies (last Assassin of the Tsar)
Indexed 16600 movies (last The Adventures of Tarzan)
Indexed 16700 movies (last Vendetta)
Indexed 16800 movies (last Trucker)
Indexed 16900 movies (last Branded)
Flushing 500 movies
Done [Status: 200]
Indexed 17000 movies (last Mariage à Mendoza)
Indexed 17100 movies (last Love Bites)
Indexed 17200 movies (last The Ballad of Ramblin' Jack)
Indexed 17300 movies (last Blade of the Ripper)
Indexed 17400 movies (last Kiler)
Flushing 500 movies
Done [Status: 200]
Indexed 17500 movies (last Kaïrat)
Indexed 17600 movies (last Body Bags)
Indexed 17700 movies (last Dave Attell: Captain M

### Create FeatureStore
We'll discuss the feature store in more detail later. For now, you can think of it as a series of queries that are stored and then executed before we train a model.

`setup` is our function for preparing learning to rank to optimize search using a set of features. In this stock demo, we have just one feature: the year of the movie's release (the `def` function falls back to 2000 when a movie has no `release_year`).


In [4]:
config = [
  {
    "store": "release", # Note: This overrides the _DEFAULT_ feature store location
    "name" : "release_year",
    "class" : "org.apache.solr.ltr.feature.SolrFeature",
    "params" : {
      "q" : "{!func}def(release_year,2000)"
    }
  }
]


from ltr import setup
setup(client, config=config, featureset='release')

Deleted classic model [Status: 200]
Deleted genre model [Status: 200]
Deleted latest model [Status: 200]
Deleted title model [Status: 200]
Deleted title_fuzzy model [Status: 200]
Deleted _DEFAULT Featurestore [Status: 200]
Deleted genre Featurestore [Status: 200]
Deleted release Featurestore [Status: 200]
Deleted title Featurestore [Status: 200]
Deleted title_fuzzy Featurestore [Status: 200]
Created release feature store under tmdb: [Status: 200]
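
How `setup` talks to Solr is not shown in this notebook. As a rough sketch, Solr's LTR plugin accepts feature definitions as JSON sent with `PUT` to the collection's `schema/feature-store` endpoint, so an upload request could be assembled as below. The host and port are assumptions taken from the Model Query URL at the end of this notebook, `feature_store_request` is a hypothetical helper name, and the request is only built, not sent.

```python
import json
from urllib.request import Request

# Assumption: Solr runs locally on the default port, as in the Model Query URL.
SOLR_URL = "http://localhost:8983/solr"
COLLECTION = "tmdb"

# Same feature definition as the `config` passed to setup() above.
features = [
    {
        "store": "release",
        "name": "release_year",
        "class": "org.apache.solr.ltr.feature.SolrFeature",
        "params": {"q": "{!func}def(release_year,2000)"},
    }
]


def feature_store_request(features):
    """Build (but do not send) a PUT request that uploads the features.

    Hypothetical helper for illustration; not part of the ltr library.
    """
    return Request(
        url=f"{SOLR_URL}/{COLLECTION}/schema/feature-store",
        data=json.dumps(features).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = feature_store_request(features)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it to a running Solr instance.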


## Is this thing on?

Before we dive into all the pieces with a real training set, we'll try out two example models: one that always prefers newer movies, and another that always prefers older movies. If you're curious, you can open `data/classic-training.txt` and `data/latest-training.txt` after running this to see what the training sets look like.

In [5]:
from ltr import years_as_ratings
years_as_ratings.synthesize(client, 
                            featureSet='release',
                            classicTrainingSetOut='data/classic-training.txt',
                            latestTrainingSetOut='data/latest-training.txt')

Generating ratings for classic and latest model
Searching tmdb [Status: 200]
Done
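
To give a feel for what such a training set contains, here is a minimal sketch of the RankLib line format (`grade qid:N featureId:value # comment`) with the release year as the single feature. The decade-based grade bucketing is purely hypothetical and only illustrates the idea of "newer is better" versus "older is better"; it is not the mapping `years_as_ratings` uses, and all function names are made up for this example.

```python
def judgment_line(grade, qid, feature_values, comment=""):
    """Format one judgment in RankLib's text format (features are 1-based)."""
    feats = " ".join(f"{i}:{v}" for i, v in enumerate(feature_values, start=1))
    line = f"{grade} qid:{qid} {feats}"
    return f"{line} # {comment}" if comment else line


def latest_grade(year):
    """Hypothetical bucketing: newer release years earn higher grades (0-4)."""
    return max(0, min(4, (year - 1970) // 10))


def classic_grade(year):
    """Hypothetical mirror image: older release years earn higher grades."""
    return 4 - latest_grade(year)


for year in (1966, 1989, 2016):
    print("latest :", judgment_line(latest_grade(year), 1, [year], f"released {year}"))
    print("classic:", judgment_line(classic_grade(year), 1, [year], f"released {year}"))
```

With grades that move in opposite directions for the same feature value, a model trained on each file learns the opposite preference.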


### Train and Submit
Using the training data from the previous step, we'll use RankyMcRankFace to produce two LambdaMART models. Once the model files are generated, we submit them to Solr to be used in scoring.

In [6]:
from ltr import train
train(client=client, trainingInFile='data/latest-training.txt', featureSet='release', modelName='latest')
train(client=client, trainingInFile='data/classic-training.txt', featureSet='release', modelName='classic')

Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t DCG@10 -tree 10 -leaf 50 -train data/latest-training.txt -save data/latest_model.txt 
DONE
Submit Model latest Ftr Set release [Status: 200]
Feature Set release... [Status: 200]
Deleted Model latest [Status: 200]
Created Model latest [Status: 200]
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t DCG@10 -tree 10 -leaf 50 -train data/classic-training.txt -save data/classic_model.txt 
DONE
Submit Model classic Ftr Set release [Status: 200]
Feature Set release... [Status: 200]
Deleted Model classic [Status: 200]
Created Model classic [Status: 200]


<ltr.helpers.ranklib_result.TrainingLog at 0x1049e9940>
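
The `java` command in the log above can be rebuilt by hand, which is handy when experimenting with other settings. The sketch below only assembles the argument list; `ranklib_command` is a hypothetical helper, not the function `train` uses internally. In RankLib, `-ranker 6` selects LambdaMART, `-metric2t` is the metric optimized during training, and `-tree` and `-leaf` set the number of trees and leaves per tree.

```python
import shlex


def ranklib_command(training_file, model_file,
                    jar="data/RankyMcRankFace.jar",
                    ranker=6, metric="DCG@10", trees=10, leaves=50):
    """Assemble the RankLib command line as an argument list.

    Hypothetical helper; defaults mirror the values shown in the log above.
    """
    return [
        "java", "-jar", jar,
        "-ranker", str(ranker),    # 6 = LambdaMART
        "-metric2t", metric,       # metric optimized on the training data
        "-tree", str(trees),       # number of trees in the ensemble
        "-leaf", str(leaves),      # leaves per tree
        "-train", training_file,
        "-save", model_file,
    ]


cmd = ranklib_command("data/latest-training.txt", "data/latest_model.txt")
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would run the training, provided Java and the downloaded jar are available.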

### Ben Affleck vs Adam West
If we search for `batman`, how do the results compare? Since the `classic` model preferred old movies in training, it puts old movies in the top positions, and the opposite is true for the `latest` model. To continue learning LTR, brainstorm more features and generate some real judgments for real queries.

In [7]:
from ltr.release_date_plot import plot
plot(client)

Search keywords - title:(batman)^0 [Status: 200]
Search keywords - title:(batman)^0 [Status: 200]


### Model Query
http://localhost:8983/solr/tmdb/select?rq={!ltr%20model=latest}&q=batman&fl=title,release_year
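
The same kind of URL can be generated from Python, which makes it easy to swap between the `latest` and `classic` models. This is a minimal sketch: `ltr_query_url` is a hypothetical helper, and the optional `reRankDocs` local parameter (how many top results the LTR model rescores) is a setting of Solr's `ltr` query parser that the URL above does not use.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def ltr_query_url(keywords, model, rerank_docs=None,
                  base="http://localhost:8983/solr/tmdb/select"):
    """Build a Solr select URL that reranks results with an LTR model.

    Hypothetical helper for illustration; not part of the ltr library.
    """
    rq = f"{{!ltr model={model}"
    if rerank_docs is not None:
        rq += f" reRankDocs={rerank_docs}"
    rq += "}"
    params = {"q": keywords, "rq": rq, "fl": "title,release_year"}
    # urlencode percent-escapes the braces and encodes spaces as '+'.
    return f"{base}?{urlencode(params)}"


for model in ("latest", "classic"):
    url = ltr_query_url("batman", model, rerank_docs=100)
    print(url)
    # Decode the query string again to confirm the rq parameter survives.
    print(parse_qs(urlsplit(url).query)["rq"][0])
```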