# Changing retrieval model parameters in Elasticsearch

This assignment uses the `aquaint` index (assuming you've already created that for Assignment 1). Alternatively, any other Elasticsearch index may be used.

In [14]:
from elasticsearch import Elasticsearch

In [15]:
INDEX_NAME = "aquaint"
DOC_TYPE = "doc"
FIELD = "content"

In [16]:
es = Elasticsearch()

In [17]:
query = "tropical storms"

### Small utility function for printing formatted document rankings

In [18]:
def printres(res):
    for r in res:
        print("%s %6.2f" % (r["_id"], r["_score"]))

### Run query with default parameters

In [42]:
res = es.search(index=INDEX_NAME, q=query, df=FIELD, _source=False, size=10).get("hits", {}).get("hits", [])

In [43]:
printres(res)

APW19990810.0014  21.26
APW19990810.0132  21.26
NYT19990802.0233  20.35
NYT19980601.0290  19.90
NYT19980601.0292  19.90
APW19990402.0247  19.82
NYT20000415.0089  19.81
APW20000522.0084  19.81
NYT19980917.0428  19.77
XIE19960926.0183  19.57


### Changing BM25 parameters

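Change the default similarity function. Here `b` is set to 0, which turns off document-length normalization, and `k1` is raised to 2, which makes term frequency saturate more slowly.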

In [47]:
SIM = {
    "similarity": {
        "default": { 
            "type": "BM25",
            "b": 0,
            "k1": 2
        }
    }
}
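To get a feel for what `b` and `k1` do, the textbook BM25 score of a single term can be sketched as below. This is only an illustration: the function name and the collection statistics are hypothetical, and the score Elasticsearch actually computes may differ in details of the underlying Lucene implementation.

```python
import math

def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Textbook BM25 score of one query term in one document (illustrative only)."""
    # Inverse document frequency; the +1 inside the log keeps it non-negative.
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: b=0 ignores document length, b=1 normalizes fully.
    norm = 1 - b + b * doc_len / avg_doc_len
    # k1 controls how quickly the term-frequency contribution saturates.
    return idf * tf * (k1 + 1) / (tf + k1 * norm)

# Hypothetical statistics, chosen only for illustration.
stats = dict(tf=3, avg_doc_len=500, n_docs=1_000_000, doc_freq=1_000)

# With b=0 a short and a long document with the same tf get the same score.
short_b0 = bm25_term_score(doc_len=250, k1=2, b=0, **stats)
long_b0 = bm25_term_score(doc_len=1000, k1=2, b=0, **stats)

# With the default b=0.75 the shorter document is favoured.
short_default = bm25_term_score(doc_len=250, **stats)
long_default = bm25_term_score(doc_len=1000, **stats)
```

Under these assumptions, setting `b` to 0 removes any advantage of shorter documents, which is one reason the ranking can change after the settings update.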

A custom similarity can be updated by closing the index, updating the index settings, and reopening the index.

In [48]:
es.indices.close(index=INDEX_NAME)
es.indices.put_settings(index=INDEX_NAME, body=SIM)
es.indices.open(index=INDEX_NAME)

{'acknowledged': True}
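If the settings update fails while the index is closed, the index stays closed and later searches will fail. One way to guard against that is to wrap the three calls in a small helper that always reopens the index. The helper name `update_similarity` is an assumption introduced here for illustration; it only uses the same `close`, `put_settings`, and `open` calls shown above.

```python
def update_similarity(es, index, settings):
    """Close the index, apply the settings, and reopen the index in any case."""
    es.indices.close(index=index)
    try:
        # Returns the response of the settings update, e.g. {'acknowledged': True}.
        return es.indices.put_settings(index=index, body=settings)
    finally:
        # Reopen even if put_settings raised, so the index is not left closed.
        es.indices.open(index=index)
```

With the names defined earlier in this notebook, it could be called as `update_similarity(es, INDEX_NAME, SIM)`.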

Then run the query the same way as before.

In [49]:
res = es.search(index=INDEX_NAME, q=query, df=FIELD, _source=False, size=10).get("hits", {}).get("hits", [])

In [50]:
printres(res)

NYT19990802.0233  27.01
APW20000510.0148  26.71
APW20000510.0252  26.71
NYT19980601.0290  26.34
NYT19980601.0292  26.34
NYT19990929.0517  26.19
APW19990810.0014  26.11
APW19990810.0132  26.11
NYT19980917.0428  24.48
NYT19981231.0071  23.23


You can also retrieve the current index settings, which include the similarity configuration.

In [51]:
es.indices.get_settings(index=INDEX_NAME)

{'aquaint': {'settings': {'index': {'creation_date': '1504451814186',
    'number_of_replicas': '1',
    'number_of_shards': '1',
    'provided_name': 'aquaint',
    'similarity': {'custom_bm25': {'b': '1', 'k1': '2', 'type': 'BM25'},
     'default': {'b': '0', 'k1': '2', 'type': 'BM25'},
     'lm': {'lambda': '0.2', 'type': 'LMJelinekMercer'}},
    'uuid': '_HsQL_W8QrqrIoSXJKrn5w',
    'version': {'created': '5040299'}}}}}

### Changing back to default similarity

In [40]:
SIM = {
    "similarity": {
        "default": {
            "type": "BM25",
            "b": 0.75,
            "k1": 1.2
        }
    }
}

In [41]:
es.indices.close(index=INDEX_NAME)
es.indices.put_settings(index=INDEX_NAME, body=SIM)
es.indices.open(index=INDEX_NAME)

{'acknowledged': True}

### Using a different retrieval model

In the same way, you can also change the retrieval model itself. Elasticsearch implements, among others, language modeling (LM) and divergence from randomness (DFR).

See: https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-similarity.html

**TODO** Use language modeling with Jelinek-Mercer smoothing and lambda=0.2. Check how the retrieval scores change for the top-10 documents.
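As a starting point, the following is a minimal sketch of a settings body and a comparison helper. The `LMJelinekMercer` type and its `lambda` option are the names that appear in the settings output above; the function names `lm_jelinek_mercer_settings` and `score_changes` are assumptions introduced here for illustration.

```python
def lm_jelinek_mercer_settings(lam=0.2):
    """Build a settings body that makes LM with Jelinek-Mercer smoothing the default."""
    return {
        "similarity": {
            "default": {
                "type": "LMJelinekMercer",
                "lambda": lam,
            }
        }
    }

def score_changes(before, after):
    """Map each document ID in `after` to (old_score, new_score).

    old_score is None when the document was not in the `before` ranking.
    Both arguments are hit lists as returned by the search calls above.
    """
    old = {r["_id"]: r["_score"] for r in before}
    return {r["_id"]: (old.get(r["_id"]), r["_score"]) for r in after}
```

The settings body can be applied with the same close, update, and reopen steps used for BM25. Keeping the BM25 result list around before switching models makes it possible to pass both rankings to `score_changes` afterwards.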