# hello-ltr (Solr Edition)

Fire up a Solr server with the LTR plugin enabled and the `tmdb` configset available. See the Dockerfile under the solr folder for assistance. The notebooks we'll use in this training come with something of an LTR client library, which gives us a starting point for demonstrating several important learning to rank capabilities.

This notebook documents many of the important pieces so you can reuse them in future training sessions.

### Download needed dataset

In [None]:
from ltr import download

corpus='http://es-learn-to-rank.labs.o19s.com/tmdb.json'
download([corpus], dest='data/');

### Create the LTR Client

Instantiate a Solr client so we can talk to LTR in Solrese.

In [None]:
from ltr.client.solr_client import SolrClient
client = SolrClient()

### Index Movies

In [None]:
from ltr.index import rebuild
from ltr.helpers.movies import indexable_movies

movies=indexable_movies(movies='data/tmdb.json')
rebuild(client, index='tmdb', doc_src=movies)

### Create FeatureStore
We'll discuss the feature store in more depth later. You can think of it as a series of queries that are stored and then executed to compute feature values before we train a model.

`create_featureset` is our function for preparing learning to rank to optimize search using a set of features. In this stock demo, we have just one feature: the year of the movie's release.


In [None]:
client.reset_ltr(index='tmdb')

config = [
  {
    "store": "release", # Note: This overrides the _DEFAULT_ feature store location
    "name" : "release_year",
    "class" : "org.apache.solr.ltr.feature.SolrFeature",
    "params" : {
      "q" : "{!func}def(release_year,2000)"
    }
  }
]


client.create_featureset(index='tmdb', name='release', ftr_config=config)
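The feature query `{!func}def(release_year,2000)` uses Solr's `def` function, which returns the field value when the document has one and the given default otherwise. A minimal sketch of that fallback in plain Python, for illustration only: the helper `release_year_feature` and the two sample documents are hypothetical and are not part of the `ltr` library or the TMDB corpus.

```python
def release_year_feature(doc, default=2000):
    """Mimic def(release_year,2000): field value if present, else the default."""
    value = doc.get('release_year')
    return float(value) if value is not None else float(default)

# Hypothetical documents: one with a release year, one without.
docs = [
    {'title': 'Batman', 'release_year': 1989},
    {'title': 'Untitled Project'},
]

feature_values = [release_year_feature(d) for d in docs]
print(feature_values)
```

Movies missing a release year therefore still get a feature value instead of dropping out of the feature vector.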

## Is this thing on?

Before we dive into all the pieces with a real training set, we'll try out two example models: one that always prefers newer movies, and another that always prefers older movies. If you're curious, you can open `data/classic-training.txt` and `data/latest-training.txt` after running this to see what the training sets look like.

In [None]:
from ltr.judgments import judgments_from_file
from ltr import years_as_ratings
years_as_ratings.synthesize(client, 
                            featureSet='release',
                            classicTrainingSetOut='data/classic-training.txt',
                            latestTrainingSetOut='data/latest-training.txt')

# Load the judgments into training sets, closing each file when done
with open('data/classic-training.txt') as f:
    classic_training_set = list(judgments_from_file(f))
with open('data/latest-training.txt') as f:
    latest_training_set = list(judgments_from_file(f))

classic_training_set
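If you open the training files, the lines are assumed here to follow the RankLib-style layout `<grade> qid:<id> <featureOrdinal>:<value> ... # <comment>`. The sketch below parses one such line. The function `parse_judgment_line` and the sample line are hypothetical, invented for illustration; the real files may carry extra information in the comment.

```python
def parse_judgment_line(line):
    """Parse '<grade> qid:<id> <ord>:<value> ... # <comment>' into a dict."""
    body, _, comment = line.partition('#')
    tokens = body.split()
    grade = int(tokens[0])
    qid = int(tokens[1].split(':', 1)[1])
    features = {}
    for token in tokens[2:]:
        ordinal, value = token.split(':', 1)
        features[int(ordinal)] = float(value)
    return {'grade': grade, 'qid': qid,
            'features': features, 'doc_id': comment.strip()}

# Invented sample: grade 4, query 1, feature 1 (release year) = 2019
sample = '4 qid:1 1:2019.0 # 12345'
parsed = parse_judgment_line(sample)
print(parsed)
```

In practice `judgments_from_file` does this work for you; the sketch only shows what each column means.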

### Train and Submit
Using the training data from the previous step, we'll use RankyMcRankFace to produce two LambdaMART models. Once the model files are generated, we can submit them to Solr to be used in scoring.

In [None]:
from ltr.ranklib import train
train(client=client, training_set=latest_training_set, 
      index='tmdb', featureSet='release', modelName='latest')
train(client=client, training_set=classic_training_set, 
      index='tmdb', featureSet='release', modelName='classic')
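A LambdaMART model is an ensemble of regression trees whose weighted outputs are summed into a final score. The sketch below shows that scoring idea on a hand-written toy ensemble. The tree structure, thresholds, and weights are invented for illustration and are not what `train` produces; the sketch also assumes a feature value less than or equal to the threshold goes to the left branch.

```python
def score_tree(node, features):
    """Walk one tree until a leaf ('value') is reached."""
    while 'value' not in node:
        go_left = features[node['feature']] <= node['threshold']
        node = node['left'] if go_left else node['right']
    return node['value']

def score_ensemble(trees, features):
    """Sum the weighted leaf values of every tree."""
    return sum(t['weight'] * score_tree(t['root'], features) for t in trees)

# Toy ensemble that, like the 'latest' model, rewards newer movies.
latest_like = [
    {'weight': 1.0, 'root': {'feature': 'release_year', 'threshold': 2000.0,
                             'left': {'value': -1.0}, 'right': {'value': 1.0}}},
    {'weight': 0.5, 'root': {'feature': 'release_year', 'threshold': 2010.0,
                             'left': {'value': 0.0}, 'right': {'value': 1.0}}},
]

old_score = score_ensemble(latest_like, {'release_year': 1966.0})
new_score = score_ensemble(latest_like, {'release_year': 2016.0})
print(old_score, new_score)
```

With only one feature available, each trained model can do little more than split on release year, which is why `latest` and `classic` end up as near mirror images of each other.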

### Ben Affleck vs Adam West
If we search for `batman`, how do the results compare? Since the `classic` model preferred old movies, it puts old movies in the top positions, and the opposite is true for the `latest` model. To continue learning LTR, brainstorm more features and generate some real judgments for real queries.

In [None]:
import ltr.release_date_plot as rdp
rdp.plot(client, 'batman')

### Model Query
http://localhost:8983/solr/tmdb/select?rq={!ltr%20model=latest}&q=batman&fl=title,release_year
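The URL above combines a normal query (`q`) with a rerank query (`rq`) that names the LTR model. A minimal sketch of building such a URL with the standard library, assuming the same host, collection, and field list as the URL above; the helper name `ltr_query_url` is hypothetical. Note that `urlencode` percent-encodes the braces and `!` as well, which Solr decodes to the same parameters.

```python
from urllib.parse import urlencode, quote, urlsplit, parse_qs

def ltr_query_url(keywords, model,
                  base='http://localhost:8983/solr/tmdb/select',
                  fields=('title', 'release_year')):
    """Build a select URL that reranks results for `keywords` with `model`."""
    params = {
        'q': keywords,
        'rq': '{!ltr model=%s}' % model,
        'fl': ','.join(fields),
    }
    # quote_via=quote encodes spaces as %20 rather than '+'
    return base + '?' + urlencode(params, quote_via=quote)

url = ltr_query_url('batman', 'latest')
print(url)
```

Swapping `'latest'` for `'classic'` gives the query for the other model.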

In [None]:
import pandas as pd
classic_results = rdp.search(client, 'batman', 'classic')
print('top results from classic model:')
pd.json_normalize(classic_results)[['id', 'title', 'release_year', 'score']].head(12)

In [None]:
latest_results = rdp.search(client, 'batman', 'latest')
print('top results from latest model:')
pd.json_normalize(latest_results)[['id', 'title', 'release_year', 'score']].head(12)