# We're Gonna Need a Bigger Bot

Let's map all the bits and pieces onto a meatier example to see how hello-ltr's abstractions make it easier to experiment with LTR from Jupyter notebooks (ipynbs).

Genome-tags is a crowdsourced movie tagging project. Each movie-tag pair gets a score from 0 to 1 for how closely the movie matches the tag. Luckily, the tags look remarkably like search queries (e.g. `star trek` or `berlin`). We derived judgments from the genome-tags data and use them to experiment with search.
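To make the idea of turning a 0 to 1 tag score into a judgment concrete, here is a minimal sketch. It is only one plausible bucketing scheme, not necessarily how `genome_judgments.txt` was produced; the function name `score_to_grade` and the 0 to 4 grade scale are assumptions made for illustration.

```python
def score_to_grade(relevance: float, max_grade: int = 4) -> int:
    """Bucket a 0-1 tag relevance score into an integer grade (hypothetical scheme)."""
    if not 0.0 <= relevance <= 1.0:
        raise ValueError(f"relevance must be in [0, 1], got {relevance}")
    # Split [0, 1] into (max_grade + 1) equal-width buckets;
    # min() keeps a score of exactly 1.0 in the top bucket.
    return min(max_grade, int(relevance * (max_grade + 1)))


# Made-up (tag, movie, score) rows, for illustration only
rows = [("star trek", "movie_a", 0.95), ("star trek", "movie_b", 0.1)]
grades = [(tag, movie, score_to_grade(score)) for tag, movie, score in rows]
```

Equal-width buckets are the simplest choice; a real pipeline might instead pick thresholds by inspecting the score distribution.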

BUT

While some tags are straightforward (`Star Trek`), others are much tougher (`boxing` or `french art movie`): it's unlikely that any of a movie's text matches the tag.

We're not going to solve those problems here. Instead, we give you this as a sandbox: after the class, apply your skills and see how close you can get to approximating the genome-tags data.


### Clients 

While the syntax differs, the LTR process is nearly identical between Solr and Elastic, so you can repeat many of the labs with Solr (some have already been translated).

But we'll stick with Elastic.

In [1]:
from ltr.client import ElasticClient
client = ElasticClient()

### Redownload corpus & judgments if you need to

In [2]:
from ltr import download
corpus='http://es-learn-to-rank.labs.o19s.com/tmdb.json'
judgments='http://es-learn-to-rank.labs.o19s.com/genome_judgments.txt'

download([corpus, judgments], dest='data/');

data/tmdb.json already exists
data/genome_judgments.txt already exists


### Reindex if you need to

In [None]:
from ltr.index import rebuild
from ltr.helpers.movies import indexable_movies

movies=indexable_movies(movies='data/tmdb.json')
rebuild(client, index='tmdb', doc_src=movies)

###  Feature Sets in ipynb

You created a feature set in the last lab; the same process is repeated here.

Learning to rank requires creating a feature set. Each feature has a name, like `title_bm25`, and an ordinal given by its position in the list: `title_bm25` is the 0th item. Confusingly, RankLib uses 1-based feature numbering, so feature 0 in this list corresponds to feature 1 in the RankLib training file that we'll see soon.
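The off-by-one between list position and RankLib feature id is easy to see in a small sketch. The helper `to_ranklib_line` and the feature values below are made up for illustration; only the line layout (`grade qid:N 1:value 2:value # comment`) is RankLib's training format.

```python
def to_ranklib_line(grade, qid, features, doc_id=None):
    """Format one judgment row in RankLib's training-file layout (hypothetical helper)."""
    # enumerate(..., start=1) shifts each 0-based list position
    # to the 1-based feature id that RankLib expects.
    pairs = " ".join(f"{i}:{value}" for i, value in enumerate(features, start=1))
    comment = f" # {doc_id}" if doc_id is not None else ""
    return f"{grade} qid:{qid} {pairs}{comment}"


feature_names = ["title_bm25", "overview_bm25"]  # positions 0 and 1 in the feature set
line = to_ranklib_line(grade=4, qid=1, features=[12.3, 9.8], doc_id="7555")
print(line)  # title_bm25 appears as feature 1, overview_bm25 as feature 2
```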

Notice also:

- Each feature is a templated query with a `{{keywords}}` parameter that is passed at query time
- We've added a `validation` block, which runs these queries against the specified index with the specified parameters and returns any query errors
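To picture what "templated" means, the sketch below imitates the substitution locally. The real rendering happens inside the search engine's LTR plugin using Mustache templates; `render_template` is a hypothetical stand-in that only handles simple `{{name}}` placeholders.

```python
import json


def render_template(template: dict, params: dict) -> dict:
    """Fill {{name}} placeholders in a query template (local imitation, not the plugin's code)."""
    rendered = json.dumps(template)
    for name, value in params.items():
        # json.dumps(...)[1:-1] escapes quotes and backslashes so the
        # substituted value stays valid inside the surrounding JSON string.
        escaped = json.dumps(str(value))[1:-1]
        rendered = rendered.replace("{{" + name + "}}", escaped)
    return json.loads(rendered)


title_feature = {"match": {"title": "{{keywords}}"}}
query = render_template(title_feature, {"keywords": "rambo"})
```

With the `validation` parameters from the config, `query` becomes a plain `match` query on `title` for `rambo`, which is the kind of query the validation step runs to surface errors.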

In [3]:
client.reset_ltr(index='tmdb')

config = {
    "featureset": {
        "features": [
            {
                "name": "title_bm25",
                "params": ["keywords"],
                "template": {
                    "match": {
                        "title": "{{keywords}}"
                    } 
                }
            },
            {
                "name": "overview_bm25",
                "params": ["keywords"],
                "template": {
                    "match": {
                        "overview": {
                            "query": "{{keywords}}"
                        }
                    }
                }
            }
        ]
    },
    "validation": {
        "index": "tmdb",
        "params": {
            "keywords": "rambo"
        }
    }
}

client.create_featureset(index='tmdb', name='genome', ftr_config=config)

Removed Default LTR feature store [Status: 200]
Initialize Default LTR feature store [Status: 200]
Create genome feature set [Status: 201]


### Logging Queries

Logging is one of the more complex operations from an engineering perspective. 

The same query you ran manually when reviewing the slides is rerun here for every query in the source judgment list (`data/genome_judgments.txt`), with some batching when needed.
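As a rough picture of what each logged search looks like, the sketch below builds request bodies in the shape the Elasticsearch LTR plugin documents for feature logging: a `terms` filter on `_id`, an `sltr` query naming the feature set, and an `ext.ltr_log` section. How `FeatureLogger` builds its requests internally is not shown in this notebook, so the helper names, the `_name` and log-spec labels, and the batch size of 500 are all assumptions.

```python
def batched(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def logging_query(doc_ids, keywords, feature_set="genome"):
    """Build a feature-logging search body for one batch of judged documents."""
    return {
        "size": len(doc_ids),
        "query": {
            "bool": {
                "filter": [
                    # Restrict the search to the documents we have judgments for
                    {"terms": {"_id": doc_ids}},
                    # Run every feature in the set with this query's keywords
                    {"sltr": {
                        "_name": "logged_features",  # assumed label
                        "featureset": feature_set,
                        "params": {"keywords": keywords},
                    }},
                ]
            }
        },
        # Ask the plugin to attach the feature values to each hit
        "ext": {
            "ltr_log": {
                "log_specs": {
                    "name": "ltr_features",  # assumed label
                    "named_query": "logged_features",
                }
            }
        },
    }


doc_ids = [str(i) for i in range(1, 1001)]  # made-up document ids
bodies = [logging_query(batch, "star trek") for batch in batched(doc_ids, 500)]
```

Batching keeps each `terms` filter to a manageable size when a single query has many judged documents.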

In [4]:
from ltr.log import FeatureLogger
from ltr.judgments import judgments_open
from itertools import groupby

ftr_logger=FeatureLogger(client, index='tmdb', feature_set='genome')
with judgments_open('data/genome_judgments.txt') as judgment_list:
    for qid, query_judgments in groupby(judgment_list, key=lambda j: j.qid):
        ftr_logger.log_for_qid(judgments=query_judgments, 
                               qid=qid,
                               keywords=judgment_list.keywords(qid))

Recognizing 1128 queries...
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 24549
Missing doc 12773
Missing doc 225130
Missing doc 61917
Discarded 4 Keep 1061
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 12773
Missing doc 67479
Missing doc 37106
Discarded 3 Keep 1065
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 164721
Missing doc 67479
Missing doc 13057
Missing doc 61917
Missing doc 37106
Discarded 5 Keep 1089
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 13716
Missing doc 253941
Missing doc 12773
Discarded

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 67479
Missing doc 206216
Discarded 2 Keep 1033
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 206216
Missing doc 17882
Missing doc 61920
Missing doc 37106
Missing doc 64699
Discarded 5 Keep 1039
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 10700
Missing doc 133252
Discarded 2 Keep 1052
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 133252
Missing doc 24549
Missing doc 94174
Missing doc 61920
Missing doc 253768
Missing doc 206216
Discarded 6 Keep 1034
Searching tm

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 110414
Missing doc 61917
Missing doc 15533
Discarded 3 Keep 1081
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 156078
Missing doc 17882
Discarded 2 Keep 1050
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 94174
Missing doc 15738
Discarded 2 Keep 1057
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 37106
Missing doc 17882
Discarded 2 Keep 1033
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Discarded 0 Keep 1069
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 133252
Missing doc 67479
Missing doc 13716
Missing doc 164721
Missing doc 253768
Discarded 5 Keep 1217
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 133252
Discarded 1 Keep 1077
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 164721
Missing doc 253768
Discarded 2 Keep 1191
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 67479
Missing doc 6191

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 164721
Missing doc 253768
Discarded 2 Keep 1030
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 24549
Missing doc 13716
Missing doc 58423
Missing doc 10700
Discarded 4 Keep 1048
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 24549
Missing doc 12773
Missing doc 110639
Missing doc 133252
Missing doc 253768
Missing doc 15738
Discarded 6 Keep 1056
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 110414
Discarded 1 Keep 1067
Searching tmdb - [{'terms': {'_id': [ [Status: 2

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 61919
Discarded 1 Keep 1044
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 225130
Missing doc 68149
Discarded 2 Keep 291
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 225130
Missing doc 15738
Missing doc 64699
Discarded 3 Keep 1018
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 37106
Missing doc 206216
Missing doc 133252
Discarded 3 Keep 1066
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Discarded 0 Keep 1056
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 20

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 156078
Missing doc 10700
Missing doc 58423
Discarded 3 Keep 1049
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 61919
Missing doc 61920
Missing doc 58423
Discarded 3 Keep 1061
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 67479
Missing doc 68149
Missing doc 206216
Missing doc 24549
Discarded 4 Keep 1066
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 133252
Discarded 1 Keep 1041
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 68149
Missing doc 21177

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Discarded 0 Keep 1075
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 61917
Missing doc 253768
Discarded 2 Keep 1053
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 94174
Missing doc 58423
Missing doc 61917
Discarded 3 Keep 1058
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 17882
Missing doc 61917
Discarded 2 Keep 1015
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 37106
Missing doc 12773
Discarded 2 Keep 909
Searching tmdb - [{'terms': {'_id': [ [Status: 200]


Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 61920
Missing doc 94174
Missing doc 12773
Missing doc 133252
Discarded 4 Keep 1125
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 110414
Discarded 1 Keep 1063
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 253768
Missing doc 156078
Discarded 2 Keep 505
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 211779
Missing doc 61920
Missing doc 37106
Missing doc 164721
Missing doc 17882
Discarded 5 Keep 1066
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 584

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 61919
Missing doc 13716
Discarded 2 Keep 1050
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 225130
Missing doc 110639
Missing doc 253768
Missing doc 17882
Missing doc 64699
Discarded 5 Keep 931
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 15738
Missing doc 58423
Missing doc 110639
Discarded 3 Keep 1035
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 61920
Missing doc 37106
Missing doc 68149
Discarded 3 Keep 1040
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 64699
Discarded 1 Keep 1077
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Discarded 0 Keep 1150
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 68149
Missing doc 37106
Discarded 2 Keep 1014
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 67479
Missing doc 64699
Discarded 2 Keep 1041
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 67479
Discarded 1 Keep 1079
Se

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 15738
Missing doc 253941
Missing doc 64699
Discarded 3 Keep 1057
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Discarded 0 Keep 916
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 64699
Missing doc 37106
Missing doc 110639
Missing doc 156078
Discarded 4 Keep 1056
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 110414
Discarded 1 Keep 1119
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 110639
Missing doc 253768

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 13057
Discarded 1 Keep 1178
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 253768
Missing doc 68149
Missing doc 37106
Discarded 3 Keep 859
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 15738
Missing doc 68149
Missing doc 67479
Missing doc 253941
Missing doc 12773
Discarded 5 Keep 1019
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 253941
Missing doc 68149
Missing doc 15533
Discarded 3 Keep 1052
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 110639

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 156078
Missing doc 24549
Discarded 2 Keep 965
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 110414
Discarded 1 Keep 1033
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 211779
Missing doc 164721
Missing doc 110639
Missing doc 24549
Missing doc 12773
Discarded 5 Keep 1037
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 211779
Missing doc 68149
Discarded 2 Keep 806
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 253768
Missing doc 94174
Missing doc 619

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 206216
Missing doc 15533
Discarded 2 Keep 1028
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 253768
Missing doc 64699
Discarded 2 Keep 1026
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 94174
Missing doc 24549
Missing doc 164721
Discarded 3 Keep 992
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 13057
Discarded 1 Keep 1033
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 225130
Missing doc 13716
Missing doc 24549
Discarded 3 Keep 918
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 58423
Discarded 1 Keep 1096
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 253941
Missing doc 12773
Discarded 2 Keep 971
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 15533
Discarded 1 Keep 1017
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 133252
Missing doc 17882
Missing doc 110639
Missing doc 37106
Missing doc 58423
Discarded 5 Keep 1093
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 133252
M

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 253941
Missing doc 110414
Missing doc 156078
Discarded 3 Keep 989
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 225130
Discarded 1 Keep 752
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 24549
Missing doc 110639
Missing doc 37106
Discarded 3 Keep 841
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 94174
Discarded 1 Keep 1051
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 15533
Missing doc 164721
Missing doc 211779
Missing doc 15738
Discarded 4 Keep 1089
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status:

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 61920
Missing doc 211779
Missing doc 68149
Missing doc 17882
Discarded 4 Keep 1154
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 37106
Discarded 1 Keep 1262
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 64699
Discarded 1 Keep 974
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 67479
Missing doc 206216
Discarded 2 Keep 1037
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 133252
Missing doc 156078
Discarded 2 Keep 1037
Searching tmdb - [{'terms': {'_

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 156078
Missing doc 64699
Missing doc 94174
Missing doc 12773
Discarded 4 Keep 1036
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Discarded 0 Keep 1043
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 15738
Missing doc 61917
Missing doc 64699
Missing doc 12773
Discarded 4 Keep 1003
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Discarded 0 Keep 1051
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 68149
Discarded 1 Keep 1048

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 15738
Missing doc 64699
Discarded 2 Keep 1039
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 225130
Missing doc 68149
Missing doc 156078
Missing doc 61920
Missing doc 61919
Missing doc 133252
Discarded 6 Keep 1019
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 156078
Missing doc 15533
Discarded 2 Keep 1033
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 61917
Discarded 1 Keep 1017
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 37106
Discarded 1 Keep

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 58423
Discarded 1 Keep 1069
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Discarded 0 Keep 1019
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 110639
Missing doc 58423
Missing doc 24549
Discarded 3 Keep 1053
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 13057
Missing doc 17882
Missing doc 10700
Discarded 3 Keep 1036
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 133252
Missing doc 15533
Missing doc 211779
M

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 13057
Missing doc 61919
Discarded 2 Keep 1021
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 13716
Discarded 1 Keep 1049
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 67479
Discarded 1 Keep 923
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 61919
Missing doc 67479
Missing doc 61917
Discarded 3 Keep 1038
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 211779
Missing doc 110414
Discarded 2 Keep 1068
Searching tmdb - [{'terms': {'_id': [ [Status: 200]

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 110639
Discarded 1 Keep 1046
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 37106
Missing doc 67479
Missing doc 253768
Missing doc 13716
Missing doc 206216
Missing doc 61919
Missing doc 12773
Discarded 7 Keep 1072
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 61919
Missing doc 61920
Missing doc 67479
Missing doc 61917
Discarded 4 Keep 1018
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 156078
Missing doc 64699
Missing doc 61920
Missing doc 61919
Discarded 4 Keep 1011
Searching tmdb - [{'terms': {'_id': [ [Status: 2

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 164721
Missing doc 206216
Discarded 2 Keep 1198
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 164721
Missing doc 133252
Discarded 2 Keep 1300
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 68149
Missing doc 24549
Missing doc 110639
Missing doc 61917
Missing doc 110414
Discarded 5 Keep 1301
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 17882
Discarded 1 Keep 1079
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 58423
Missing doc 17882
Discarded 2 Keep 1014
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 253941
Discarded 1 Keep 1048
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 110639
Missing doc 13716
Missing doc 15738
Missing doc 24549
Discarded 4 Keep 1189
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 12773
Discarded 1 Keep 1224
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ 

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 17882
Missing doc 61917
Missing doc 67479
Missing doc 156078
Discarded 4 Keep 1045
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 10700
Missing doc 68149
Discarded 2 Keep 1005
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Discarded 0 Keep 988
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 225130
Missing doc 64699
Missing doc 156078
Missing doc 24549
Discarded 4 Keep 1036
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 17882
Discarded 1 Keep 1049
Searching tmdb

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 15738
Missing doc 37106
Missing doc 225130
Missing doc 94174
Discarded 4 Keep 1117
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 253941
Missing doc 13057
Missing doc 133252
Missing doc 110639
Missing doc 13716
Discarded 5 Keep 1052
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 61919
Missing doc 156078
Missing doc 13716
Missing doc 24549
Missing doc 133252
Discarded 5 Keep 1022
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 13057
Missing doc 94174
Missing doc 1573

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 58423
Missing doc 68149
Missing doc 24549
Missing doc 10700
Missing doc 64699
Discarded 5 Keep 1145
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 211779
Missing doc 94174
Missing doc 253768
Missing doc 64699
Missing doc 110414
Missing doc 24549
Discarded 6 Keep 1094
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 253768
Missing doc 225130
Missing doc 110639
Missing doc 206216
Missing doc 24549
Discarded 5 Keep 1042
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 61917
Missing doc 253941
Missing doc 133252
Missing doc

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 253941
Discarded 1 Keep 1032
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 58423
Discarded 1 Keep 1058
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 133252
Missing doc 156078
Discarded 2 Keep 1073
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 24549
Missing doc 13716
Discarded 2 Keep 1012
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 110639
Missing doc 61920
Missing doc 67479
Missing doc 13716


Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 253768
Missing doc 12773
Missing doc 206216
Missing doc 17882
Missing doc 211779
Missing doc 110414
Discarded 6 Keep 1102
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 15738
Missing doc 61917
Missing doc 13057
Discarded 3 Keep 1074
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 253768
Missing doc 61920
Missing doc 37106
Missing doc 15533
Discarded 4 Keep 1275
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 253768
Missing doc 10700
Discarded 2 Keep 1228
Searching tm

Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 67479
Missing doc 15738
Discarded 2 Keep 1089
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 13057
Missing doc 164721
Missing doc 61919
Missing doc 61920
Missing doc 68149
Missing doc 17882
Missing doc 156078
Discarded 7 Keep 1036
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 64699
Missing doc 12773
Discarded 2 Keep 1212
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Missing doc 15533
Missing doc 253941
Discarded 2 Keep 1064
Searching tmdb - [{'terms': {'_id': [ [Status: 200]
Searching tmdb - [{'terms': {'_id': [ [Status: 200]

### Training

Here's where we train the model. Under the hood, this executes Ranklib, just as you did during the training exercises.

Notice that here we're optimizing for NDCG@10.
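
If NDCG@10 is fuzzy, here is a minimal sketch of the metric for a single query. It assumes the common exponential-gain form (gain `2^grade - 1`, discount `log2(rank + 1)`) and computes the ideal ordering from the same list of grades it is given. Ranklib's exact internals may differ, and the helper names `dcg_at_k` and `ndcg_at_k` are ours, not part of hello-ltr.

```python
import math

def dcg_at_k(grades, k=10):
    """Discounted cumulative gain over the top k grades, in ranked order."""
    # enumerate is 0-based, so rank = i + 1 and the discount is log2(rank + 1)
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(grades[:k]))

def ndcg_at_k(grades, k=10):
    """DCG normalized by the DCG of the best possible ordering of these grades."""
    ideal = dcg_at_k(sorted(grades, reverse=True), k)
    return dcg_at_k(grades, k) / ideal if ideal > 0 else 0.0

# Grades of the docs in the order a ranker returned them (made-up numbers)
print(ndcg_at_k([3, 2, 1, 0]))  # already in ideal order
print(ndcg_at_k([0, 1, 3, 2]))  # best docs ranked too low
```

A ranking that puts the highest-graded movies first scores 1.0, and pushing good movies down the list lowers the score.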


In [5]:
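from ltr.ranklib import train

# Train on the logged features and store the result as the 'genome' model
trainResponse = train(client,
                      training_set=ftr_logger.logged,
                      metric2t='NDCG@10',   # metric optimized during training
                      featureSet='genome',
                      index='tmdb',
                      modelName='genome')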

/var/folders/vc/thmh159x5xddb6_cgtx778sc0000gn/T/RankyMcRankFace.jar already exists
Running java -jar /var/folders/vc/thmh159x5xddb6_cgtx778sc0000gn/T/RankyMcRankFace.jar -ranker 6 -shrinkage 0.1 -metric2t NDCG@10 -tree 50 -bag 1 -leaf 10 -frate 1.0 -srate 1.0 -train /var/folders/vc/thmh159x5xddb6_cgtx778sc0000gn/T/training.txt -save data/genome_model.txt 
DONE
Delete model genome: 404
Created Model genome [Status: 201]
Model saved


Now that training is done, we can output some statistics about the model, including the training metrics. In future units we'll get more into what this looks like.

Notice the training NDCG isn't that great. When originally run, it was only 0.5885 (the run captured below reports 0.6801), so the model is still pretty far off the genome data. One challenge of Learning to Rank (and relevance in general) is figuring out which features can close the gap.

In [6]:
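print("Impact of each feature on the model")
trainLog = trainResponse.trainingLogs[0]
# Feature ids here are Ranklib's 1-based ids, not 0-based list positions
for ftrId, impact in trainLog.impacts.items():
    print("{} - {}".format(ftrId, impact))

# Training metric (NDCG@10 in this run)
print("trainLog Metric %s" % trainLog.metric())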

Impact of each feature on the model
1 - 637582.3838513177
2 - 558539.450163617
trainLog Metric 0.6801
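
The raw impact numbers are hard to compare at a glance. As a rough sketch, you could turn them into shares of the total and map Ranklib's 1-based ids back to 0-based positions in the feature list. The `impacts` dict below is copied by hand from the output above, and `impact_shares` is a hypothetical helper, not part of hello-ltr.

```python
# Impacts copied from the training output above (Ranklib feature id -> impact)
impacts = {1: 637582.3838513177, 2: 558539.450163617}

def impact_shares(impacts):
    """Map 1-based Ranklib ids to 0-based list positions with a share of total impact."""
    total = sum(impacts.values())
    if total == 0:
        return {int(fid) - 1: 0.0 for fid in impacts}
    return {int(fid) - 1: value / total for fid, value in impacts.items()}

for idx, share in sorted(impact_shares(impacts).items()):
    print("feature list index {} (Ranklib id {}): {:.1%}".format(idx, idx + 1, share))
```

In this run the two features carry roughly similar weight, so neither dominates the model.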


### Search with our model

Here we're going to search using the `genome` model. The output shows the LTR query that gets sent to Elasticsearch. You're encouraged to run it directly against Elasticsearch if you like.

Please note, this isn't rescoring. That's fine for our purpose of directly evaluating the model, but in real life you really should run a rescore query.
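
For comparison, here is a sketch of what a rescore request body might look like, built as a plain Python dict. The baseline `match` query on a `title` field and the `window_size` of 100 are assumptions for illustration. The `rescore_body` helper is hypothetical and isn't part of hello-ltr.

```python
import json

def rescore_body(keywords, model="genome", window_size=100, size=5):
    """Build a search body that rescores the top baseline hits with an LTR model."""
    return {
        "size": size,
        # Cheap baseline query picks the candidates (assumed field: title)
        "query": {"match": {"title": keywords}},
        "rescore": {
            # Only the top window_size baseline hits per shard get rescored
            "window_size": window_size,
            "query": {
                "rescore_query": {
                    "sltr": {"params": {"keywords": keywords}, "model": model}
                }
            },
        },
    }

print(json.dumps(rescore_body("batman"), indent=2))
```

The idea is that the model only runs over a small window of candidates instead of every matching document, which keeps the cost of an expensive model bounded.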

In [7]:
from ltr import search
search(client, "batman", modelName='genome')

{"size": 5, "query": {"sltr": {"params": {"keywords": "batman"}, "model": "genome"}}}
Searching tmdb - {'size': 5, 'query': [Status: 200]
Batman: Under the Red Hood 
4.059837 
2010 
['Action', 'Animation'] 
Batman faces his ultimate challenge as the mysterious Red Hood takes Gotham City by firestorm. One part vigilante, one part criminal kingpin, Red Hood begins cleaning up Gotham with the efficiency of Batman, but without following the same ethical code. 
---------------------------------------
Batman: The Dark Knight Returns, Part 2 
4.0364103 
2013 
['Animation', 'Action'] 
Batman has stopped the reign of terror that The Mutants had cast upon his city.  Now an old foe wants a reunion and the government wants The Man of Steel to put a stop to Batman. 
---------------------------------------
Batman Returns 
4.0364103 
1992 
['Action', 'Crime', 'Fantasy', 'Science Fiction', 'Thriller'] 
Having defeated the Joker, Batman now faces the Penguin - a warped and deformed individual who is in