# We're Gonna Need a Bigger Bot

Let's map all the bits and pieces onto a meatier example, to see how hello-ltr's abstractions make it easier to play with LTR via ipynbs.

Genome-tags is a crowdsourced movie tagging project. Each movie-tag pair is scored from 0 to 1 by how closely the movie matches the tag. Luckily, the tags look remarkably like search queries (e.g. `star trek` or `berlin`). We derived judgments from the genome-tags data and use them to experiment with search.

BUT

While some tags are straightforward (`Star Trek`), others are much tougher (`boxing` or `french art movie`), because it's unlikely that any of a movie's text matches them.

We're not going to solve those problems here, but we give this to you as a sandbox to apply your skills after the class and see how close you can get to approximating the genome-tags data.
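To make the idea of deriving judgments from tag relevance a bit more concrete, here is a minimal sketch. It assumes equal-width buckets and a 0-4 grade scale; both are illustrative choices, not necessarily how `data/genome_judgments.txt` was actually built, and `relevance_to_grade` is a hypothetical helper.

```python
def relevance_to_grade(relevance: float, max_grade: int = 4) -> int:
    """Map a genome-tags relevance in [0, 1] to an integer grade in [0, max_grade].

    Assumption: equal-width buckets. The real judgment list may use other cutoffs.
    """
    if not 0.0 <= relevance <= 1.0:
        raise ValueError(f"relevance must be in [0, 1], got {relevance}")
    # Split [0, 1] into (max_grade + 1) equal buckets; a relevance of 1.0 is
    # clipped into the top bucket.
    return min(max_grade, int(relevance * (max_grade + 1)))


# Made-up (tag, relevance) pairs, purely for illustration.
examples = [("star trek", 0.95), ("berlin", 0.5), ("boxing", 0.0)]
grades = {tag: relevance_to_grade(score) for tag, score in examples}
print(grades)
```

Coarser or finer buckets are easy to try by changing `max_grade`, which is one knob worth experimenting with in the sandbox.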


### Clients 

While syntaxes differ, the LTR process is nearly identical between Solr and Elastic, so you can repeat many of the labs with Solr (some have already been translated).

But we'll stick with Elastic.

In [1]:
from ltr.client.elastic_client import ElasticClient
client = ElasticClient()

### Reindex if you need to

In [5]:
from ltr.index import rebuild_tmdb
rebuild_tmdb(client)

Deleted index tmdb [Status: 200]
Created index tmdb [Status: 200]
Reindexing...
Indexed 0 movies (last Black Mirror: White Christmas)
Indexed 100 movies (last Apocalypse Now)
Indexed 200 movies (last Crooks in Clover)
Indexed 300 movies (last For a Few Dollars More)
Indexed 400 movies (last Downfall)
Indexed 500 movies (last Finding Nemo)
Indexed 600 movies (last Platoon)
Indexed 700 movies (last Night of the Living Dead)
Indexed 800 movies (last Evangelion: 1.0: You Are (Not) Alone)
Indexed 900 movies (last Batman: Assault on Arkham)
Indexed 1000 movies (last Riley's First Date?)
Indexed 1100 movies (last The Raid)
Indexed 1200 movies (last Falling Down)
Indexed 1300 movies (last Kal Ho Naa Ho)
Indexed 1400 movies (last Elizabeth)
Indexed 1500 movies (last Irreversible)
Indexed 1600 movies (last Friday Night Lights)
Indexed 1700 movies (last Ben X)
Indexed 1800 movies (last Pump up the Volume)
Indexed 1900 movies (last Armour of God)
Indexed 2000 movies (last Swingers)
Indexed 2100 mo

Indexed 18400 movies (last Queen of the Mountains)
Indexed 18500 movies (last Urgh! A Music War)
Indexed 18600 movies (last Wuthering Heights)
Indexed 18700 movies (last Gabriel Over the White House)
Indexed 18800 movies (last Friendship!)
Indexed 18900 movies (last Mía)
Indexed 19000 movies (last Danger! 50,000 Zombies)
Indexed 19100 movies (last Top Dog)
Indexed 19200 movies (last Reaching for the Moon)
Indexed 19300 movies (last A Child's Christmas in Wales)
Indexed 19400 movies (last The Dog Who Stopped the War)
Indexed 19500 movies (last Police Python 357)
Indexed 19600 movies (last Accidents Happen)
Indexed 19700 movies (last Changing Times)
Indexed 19800 movies (last The Ape)
Indexed 19900 movies (last Heartbreak Hotel)
Indexed 20000 movies (last Left Behind III: World at War)
Indexed 20100 movies (last Dragon Ball Z: Lord Slug)
Indexed 20200 movies (last The Adventures of Sherlock Holmes)
Indexed 20300 movies (last Billy's Hollywood Screen Kiss)
Indexed 20400 movies (last Short

### Feature Sets in ipynb

You played with creating a feature set in the last lab; the same process is repeated here.

Learning to rank requires creating a feature set. Each feature has a name, like `title_bm25`, and an ordinal given by its position in the list: `title_bm25` is the 0th item. Confusingly, Ranklib uses 1-based feature numbering, so feature 0 in this list corresponds to feature 1 in the Ranklib training file, which we'll see soon.

Notice also:

- Each feature is a templated query; the text features take a `{{keywords}}` parameter that is passed at query time
- We've added a `validation` block, which runs these queries against the specified index with the specified parameters and returns any query errors
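The 0-based versus 1-based numbering is easy to trip over, so here is a small sketch of the mapping. The feature names come from the feature set in this lab; the helper names and the feature values are made up for illustration. The line layout follows Ranklib's `grade qid:N 1:value 2:value ... # comment` training format.

```python
# Feature names in the order they appear in the feature set (0-based list positions).
feature_names = ["title_bm25", "text_all_bm25", "text_std_bm25", "release_year"]


def ranklib_feature_id(name, names=feature_names):
    """Return the 1-based Ranklib feature number for a feature name."""
    return names.index(name) + 1


def to_ranklib_line(grade, qid, values, doc_id, keywords):
    """Format one training row; values are in the same order as feature_names."""
    features = " ".join(f"{i}:{v}" for i, v in enumerate(values, start=1))
    return f"{grade} qid:{qid} {features} # {doc_id} {keywords}"


# Illustrative values only, not real logged scores.
line = to_ranklib_line(4, 1, [12.5, 9.0, 0.0, 1982.0], "doc1", "rambo")
print(ranklib_feature_id("title_bm25"))  # 1
print(line)  # 4 qid:1 1:12.5 2:9.0 3:0.0 4:1982.0 # doc1 rambo
```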

In [2]:
config = {
    "featureset": {
        "features": [
            {
                "name": "title_bm25",
                "params": ["keywords"],
                "template": {
                    "match": {
                        "title": "{{keywords}}"
                    } 
                }
            },
            {
                "name": "text_all_bm25",
                "params": ["keywords"],
                "template": {
                    "match": {
                        "text_all": "{{keywords}}"
                    } 
                }
            },
            {
                "name": "text_std_bm25",
                "params": ["keywords"],
                "template": {
                    "match": {
                        "text_std": "{{keywords}}"
                    } 
                }
            },
            {
                "name": "release_year",
                "params": [],
                "template": {
                    "function_score": {
                        "field_value_factor": {
                            "field": "release_year",
                            "missing": 2000
                        },
                        "query": { "match_all": {} }
                    }
                }
            }
        ]
    },
    "validation": {
        "index": "tmdb",
        "params": {
            "keywords": "rambo"
        }
    }
}


from ltr import setup
setup(client, config=config, featureset='genome')

Removed Default LTR feature store [Status: 200]
Initialize Default LTR feature store [Status: 200]
Create genome feature set [Status: 201]
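To see what "templated query" means in practice, the sketch below fills `{{keywords}}` into a feature template with plain string substitution. This only illustrates the idea; the LTR plugin does its own template rendering on the server, and `render_template` is a hypothetical helper, not part of hello-ltr.

```python
def render_template(template, params):
    """Recursively replace {{name}} placeholders in a query template."""
    if isinstance(template, dict):
        return {key: render_template(value, params) for key, value in template.items()}
    if isinstance(template, list):
        return [render_template(value, params) for value in template]
    if isinstance(template, str):
        for name, value in params.items():
            template = template.replace("{{" + name + "}}", str(value))
        return template
    # Numbers, booleans, and None pass through unchanged.
    return template


title_template = {"match": {"title": "{{keywords}}"}}
rendered = render_template(title_template, {"keywords": "rambo"})
print(rendered)  # {'match': {'title': 'rambo'}}
```

With the validation parameters above, each text feature becomes an ordinary `match` query for `rambo`, which is the query the validation step checks for errors.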


### Logging Queries

Logging is one of the more complex operations from an engineering perspective. 

The same query you ran manually when reviewing the slides is rerun here for every query in the source judgment list (`judgmentInFile`), with requests batched when needed.
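As a rough sketch of what such a logging request could look like, the code below batches document IDs and builds a request body in the shape the Elasticsearch LTR plugin documents for feature logging (an `sltr` filter plus an `ltr_log` extension). The helper names, the batch size, and the `logged_features` / `ltr_features` labels are assumptions for illustration; hello-ltr's internals may differ.

```python
def in_batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def logging_query(doc_ids, keywords, featureset="genome"):
    """Build a search body that logs feature values for the given documents."""
    return {
        "size": len(doc_ids),
        "query": {
            "bool": {
                "filter": [
                    # Restrict the search to the judged documents.
                    {"terms": {"_id": doc_ids}},
                    # Run the feature set without affecting which docs match.
                    {"sltr": {
                        "_name": "logged_features",
                        "featureset": featureset,
                        "params": {"keywords": keywords},
                    }},
                ]
            }
        },
        # Ask the plugin to attach logged feature values to each hit.
        "ext": {
            "ltr_log": {
                "log_specs": {
                    "name": "ltr_features",
                    "named_query": "logged_features",
                }
            }
        },
    }


# Hypothetical document IDs and batch size.
doc_ids = ["1", "2", "3", "4", "5"]
bodies = [logging_query(batch, "rambo") for batch in in_batches(doc_ids, 2)]
print(len(bodies))  # 3
```

Each body would be sent as a separate search, which is consistent with the repeated `Searching tmdb` lines per query in the output below.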

In [3]:
from ltr.log import judgments_to_training_set
trainingSet = judgments_to_training_set(client,
                                        judgmentInFile='data/genome_judgments.txt', 
                                        trainingOutFile='data/genome_judgments_train.txt', 
                                        featureSet='genome')

Recognizing 1128 queries...
Parsing QID 100
Parsing QID 200
Parsing QID 300
Parsing QID 400
Parsing QID 500
Parsing QID 600
Parsing QID 700
Parsing QID 800
Parsing QID 900
Parsing QID 1000
Parsing QID 1100
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for 007 (0/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for 007 (series) (1/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for 18th century (2/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA f

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for alcatraz (40/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for alcoholism (41/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for alien (42/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for alien invasion (43/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for aliens (44/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for assassins (82/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for astronauts (83/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for atheism (84/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for atmospheric (85/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for australia (86/1128)
Searching tmdb - [{'terms'

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for beer (123/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for berlin (124/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for best of 2005 (125/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for best war films (126/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for betrayal (127/1128)
Searching tmdb - [{'ter

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for british (163/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for british comedy (164/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for broadway (165/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for brothers (166/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for brutal (167/1128)
Searching tmdb - [{'term

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for chocolate (205/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for chris tucker (206/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for christian (207/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for christianity (208/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRA

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for con men (247/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for confrontational (248/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for confusing (249/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for conspiracy (250/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAI

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for dc comics (289/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for deadpan (290/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for death (291/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for death penalty (292/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for demons (293/1128)
Searching tmdb - [{'terms':

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for dumb (331/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for dumb but funny (332/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for dynamic cgi action (333/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for dysfunctional family (334/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]


Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for family (373/1128)
Duplicate Doc in qid:375 105045
Duplicate Doc in qid:375 105045
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for family bonds (374/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for family drama (375/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for fantasy (376/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searchi

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for fun movie (414/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for funniest movies (415/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for funny (416/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for funny as hell (417/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for future (418/1128)
Searching tmdb - [{

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for goretastic (456/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for gory (457/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for goth (458/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for gothic (459/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for gra

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for heroin (498/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for heroine (499/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for heroine in tight suit (500/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for high fantasy (501/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for india (540/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for indiana jones (541/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for indians (542/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for indie (543/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA f

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for judaism (582/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for jungle (583/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for justice (584/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for kick-butt women (585/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for kidnapping (586/1128)
Searching tmdb - [{'te

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for magic realism (623/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for male nudity (624/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for man versus machine (625/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for manipulation (626/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for marijuana (627/1128)
Sear

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for moody (665/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for moon (666/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for moral ambiguity (667/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for morality (668/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for mother daughter relationship (669/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searchi

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for no dialogue (707/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for no plot (708/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for nocturnal (709/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for noir (710/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for noir thriller (711/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'ter

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for oscar (best directing) (749/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for oscar (best editing) (750/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for oscar (best effects - visual effects) (751/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for oscar (best foreign language film) (752/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for poignant (790/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for pointless (791/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for poker (792/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for poland (793/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for police (794/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for race issues (832/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for racing (833/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for racism (834/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for radio (835/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for sappy (874/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for sarcasm (875/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for satire (876/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for satirical (877/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for 

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for shopping (914/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for short (915/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for short-term memory loss (916/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for silent (917/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for silly (918/1128)
Searching tmdb - [{'t

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for spock (956/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for spoof (957/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for sports (958/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for spy (959/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for spying (

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for survival (997/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for suspense (998/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for suspenseful (999/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for swashbuckler (1000/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAIN

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for train (1037/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for trains (1038/1128)
Duplicate Doc in qid:1040 105045
Duplicate Doc in qid:1040 105045
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for transformation (1039/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for transgender (1040/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
RE

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for vietnam war (1079/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for view askew (1080/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for vigilante (1081/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for vigilantism (1082/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING 

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for world war i (1119/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for world war ii (1120/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for writer's life (1121/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for writers (1122/1128)
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
REBUILDING TRAINING DATA for writing (1123/1128)
Searching tm

### Training

Here's where we train the model. Under the hood, this executes RankLib, just as you did during the training exercises.

Notice that here we're optimizing for NDCG@10.
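If NDCG@10 is unfamiliar, here is a minimal sketch of one common formulation (gain of `2^grade - 1`, log2 position discount). The function names `dcg_at_k` and `ndcg_at_k` are made up for illustration, and RankLib's exact implementation may differ in details, so treat this as a way to build intuition rather than a reproduction of the training metric.

```python
import math

def dcg_at_k(grades, k=10):
    """DCG over the first k grades: gain 2^grade - 1, discounted by log2 of the position."""
    return sum((2 ** g - 1) / math.log2(pos + 2)
               for pos, g in enumerate(grades[:k]))

def ndcg_at_k(grades, k=10):
    """DCG of the ranking as given, divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(grades, reverse=True), k)
    return dcg_at_k(grades, k) / ideal if ideal > 0 else 0.0

# Hypothetical judgment grades, listed in the order a ranker returned the docs
ranked_grades = [3, 2, 3, 0, 1, 2]
print(round(ndcg_at_k(ranked_grades), 4))
```

A score of 1.0 means the top 10 is in the best possible order for the judged documents. Anything lower means relevant documents were ranked below less relevant ones.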


In [4]:
from ltr.train import train
trainLog = train(client,
                 trainingInFile='data/genome_judgments_train.txt',
                 metric2t='NDCG@10',
                 featureSet='genome',
                 modelName='genome')

Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 10 -leaf 10 -train data/genome_judgments_train.txt -save data/genome_model.txt 
DONE
Delete model genome: 404
Created Model genome [Status: 201]


Now that training is done, we can output some statistics about the model, including the training metrics. In future units we'll get more into what this looks like.

Notice the training NDCG isn't that great: when originally run, it was only 0.5885, so the model is still pretty far off from the genome data. One challenge of Learning to Rank (and relevance in general) is figuring out which features can close the gap.

In [5]:
print("Impact of each feature on the model")
for ftrId, impact in trainLog.impacts.items():
    print("{} - {}".format(ftrId, impact))
    
print("trainLog Metric %s" % trainLog.metric())

Impact of each feature on the model
2 - 4152515.485482863
1 - 854906.8367087343
4 - 1.7066532720419414
3 - 1.1561805533583989
trainLog Metric 0.5885
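The raw impact numbers above span several orders of magnitude, which makes them hard to compare by eye. As a rough sketch, you could normalize them into shares of the total. The `impact_shares` helper is hypothetical (not part of hello-ltr), and the values are copied from the output above rather than read from `trainLog`.

```python
def impact_shares(impacts):
    """Return each feature's impact as a fraction of the summed impact."""
    total = sum(impacts.values())
    if total == 0:
        return {ftr_id: 0.0 for ftr_id in impacts}
    return {ftr_id: value / total for ftr_id, value in impacts.items()}

# Values copied from the training output above (feature id -> impact)
impacts = {
    "2": 4152515.485482863,
    "1": 854906.8367087343,
    "4": 1.7066532720419414,
    "3": 1.1561805533583989,
}

shares = impact_shares(impacts)
for ftr_id, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print("{} - {:.6%}".format(ftr_id, share))
```

Viewed this way, features 3 and 4 contribute almost nothing to this particular model, which is one hint about where feature engineering effort might go next.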


### Search with our model

Here we're going to search using the `genome` model. The LTR query sent to Elasticsearch is printed in the output. You're encouraged to run that directly against Elasticsearch if you like.

Please note, this isn't rescoring. That's fine for our purposes of directly evaluating the model, but in real life you really should run a rescore query, so the model only reranks the top results of a cheaper baseline query.
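For comparison, a rescore version of the query might look like the sketch below. It wraps the same `sltr` clause in Elasticsearch's `rescore` block. The baseline `multi_match` over `title` and `overview`, the `window_size` of 100, and the helper name `build_rescore_query` are all assumptions chosen for illustration, not something hello-ltr's `search` function does.

```python
import json

def build_rescore_query(keywords, model_name, window_size=100, size=5):
    """Build a query body: cheap baseline query first, LTR model over the top window."""
    return {
        "size": size,
        # Baseline retrieval (assumed fields; adjust to your index mapping)
        "query": {
            "multi_match": {"query": keywords, "fields": ["title", "overview"]}
        },
        # Only the top `window_size` baseline hits per shard are rescored by the model
        "rescore": {
            "window_size": window_size,
            "query": {
                "rescore_query": {
                    "sltr": {
                        "params": {"keywords": keywords},
                        "model": model_name,
                    }
                }
            },
        },
    }

body = build_rescore_query("world war", "genome")
print(json.dumps(body))
```

By default, Elasticsearch combines the baseline score with the rescore score, so you may want to experiment with the rescore weights if you want the model's score to dominate.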

In [7]:
from ltr import search
search(client, "world war", modelName='genome')

{"size": 5, "query": {"sltr": {"params": {"keywords": "world war"}, "model": "genome"}}}
Searching tmdb - {'size': 5, 'query': [Status: 200]
World War Three 
1.3450011 
1998 
['War', 'Documentary', 'History', 'Drama', 'TV Movie'] 
This mock documentary uses archival footage, interviews and reports taken out of context and staged interviews to highlight a possible escalation into a nuclear war. In this feature, tension in East Germany, and an uprising triggered by a visit by Gorbachev sees a successful military coup taking place in the USSR. Western actions against brutal crack-downs on civilians involved increases tension between the sides, finally resulting in nuclear war. 
---------------------------------------
The Fourth World War 
1.3450011 
2003 
['Documentary'] 
From the front-lines of conflicts in Mexico, Argentina, South Africa, Palestine, Korea, 'the North' from Seattle to Genova, and the 'War on Terror' in New York, Afghanistan, and Iraq. It is the story of men and women aro