# Elasticsearch TREC Run





## Getting Started

In this example, we will use the Elaticsearch Python API. First, we will import and set-up all of the required Python modules and variables we will use later on. Additionally, if you wish to use `curl` instead of the Python api, the complimentary command line function has been commented above each API request.

In [1]:
from elasticsearch import Elasticsearch
from irkit.trec import run
import pandas as pd
es = Elasticsearch(urls=['localhost'], port=9200)

## From Elasticsearch `hits` to TREC result file

We will use Elasticsearch to retrieve some documents and use the ir-kit `run` module to produce a properly formatted TREC run file.

In [2]:
query = \
{
    'query': {
        'match_all': {}
    }
}

# curl -X GET localhost:9200/goma/_search -H 'Content-Type: application/json' -d @query.json
res = es.search(index='goma', body=query)

hits = []
for rank, hit in enumerate(res['hits']['hits'], 1):
    hits.append(run.TrecEvalRun(rank=rank, doc_id=hit['_id'], q=0, score=hit['_score'],
                                run_id='example', topic=0))

print(run.TrecEvalRuns(hits).dumps())

0	Q0	AV19Sgi4jk6MoKTLfifp	1	1.0	example
0	Q0	AV19Sgi4jk6MoKTLfifq	2	1.0	example
0	Q0	AV19Sgi4jk6MoKTLfifu	3	1.0	example
0	Q0	AV19Sgi4jk6MoKTLfifv	4	1.0	example
0	Q0	AV19Sgi4jk6MoKTLfif5	5	1.0	example
0	Q0	AV19Sgi4jk6MoKTLfigP	6	1.0	example
0	Q0	AV19Sgi4jk6MoKTLfigQ	7	1.0	example
0	Q0	AV19Sgi4jk6MoKTLfigc	8	1.0	example
0	Q0	AV19Sgi4jk6MoKTLfige	9	1.0	example
0	Q0	AV19Sgi4jk6MoKTLfigo	10	1.0	example


More often than not, however, we have more than one topic (multiple queries).

In [3]:
keywords = ['art', 'gallery', 'australia']
query_template = \
{
    'query': {
        'match': {
            'description': ''
        }
    }
}
hits = []
for topic, keyword in enumerate(keywords):
    query_template['query']['match']['description'] = keyword
    
    res = es.search(index='goma', body=query)
    
    for rank, hit in enumerate(res['hits']['hits'], 1):
        hits.append(run.TrecEvalRun(rank=rank, doc_id=hit['_id'], q=0, score=hit['_score'],
                                    run_id='example', topic=topic))

print(run.TrecEvalRuns(hits).dumps())

0	Q0	AV19Sgi4jk6MoKTLfifp	1	1.0	example
0	Q0	AV19Sgi4jk6MoKTLfifq	2	1.0	example
0	Q0	AV19Sgi4jk6MoKTLfifu	3	1.0	example
0	Q0	AV19Sgi4jk6MoKTLfifv	4	1.0	example
0	Q0	AV19Sgi4jk6MoKTLfif5	5	1.0	example
0	Q0	AV19Sgi4jk6MoKTLfigP	6	1.0	example
0	Q0	AV19Sgi4jk6MoKTLfigQ	7	1.0	example
0	Q0	AV19Sgi4jk6MoKTLfigc	8	1.0	example
0	Q0	AV19Sgi4jk6MoKTLfige	9	1.0	example
0	Q0	AV19Sgi4jk6MoKTLfigo	10	1.0	example
1	Q0	AV19Sgi4jk6MoKTLfifp	1	1.0	example
1	Q0	AV19Sgi4jk6MoKTLfifq	2	1.0	example
1	Q0	AV19Sgi4jk6MoKTLfifu	3	1.0	example
1	Q0	AV19Sgi4jk6MoKTLfifv	4	1.0	example
1	Q0	AV19Sgi4jk6MoKTLfif5	5	1.0	example
1	Q0	AV19Sgi4jk6MoKTLfigP	6	1.0	example
1	Q0	AV19Sgi4jk6MoKTLfigQ	7	1.0	example
1	Q0	AV19Sgi4jk6MoKTLfigc	8	1.0	example
1	Q0	AV19Sgi4jk6MoKTLfige	9	1.0	example
1	Q0	AV19Sgi4jk6MoKTLfigo	10	1.0	example
2	Q0	AV19Sgi4jk6MoKTLfifp	1	1.0	example
2	Q0	AV19Sgi4jk6MoKTLfifq	2	1.0	example
2	Q0	AV19Sgi4jk6MoKTLfifu	3	1.0	example
2	Q0	AV19Sgi4jk6MoKTLfifv	4	1.0	example
2	Q0	AV19Sgi4jk6MoKTLfif5	5	1.0	exampl

#### Exercise 1

Create code to produce TREC formatted runs for the following queries:

* query 3: african art
* query 4: brisbane art
* query 5: 



#### Exercise 2

Create code to produce TREC formatted runs for the queries below. Run these queries on the ClueWeb12_sample collection (from activity 1). Change the maximum number of retrieved documents to 20.

* query 100:
* query 101:


(Note, in typical retrieval experiments, often the limit is set to 1,000; but this is not a rule: e.g. the last TREC Web tracks asked participants to retrieve up to rank 10,000).