# Elasticsearch TREC Run

_Using Elasticsearch and the Python api_

TRECEVAL is a program to evaluate TREC results using the standard, NIST evaluation procedures. The TREC file format is the standard way to represent a run in information retrieval.

## What you will need

 - Python 3
 - ir-kit (http://ir-kit.readthedocs.io/en/latest/)
 
This example uses an index based on media releases by a gallery, available at: https://data.qld.gov.au/dataset/qagoma-media-releases/resource/a1e4dffa-edb1-4e6d-a4a0-353aca79e9a3.

## Getting Started

In this example, we will use the Elaticsearch Python api. First, we will import and set-up all of the required Python modules and variables we will use later on. Additionally, if you wish to use curl instead of the Python api, the complimentary command line function has been commented above each api request.

In [1]:
from elasticsearch import Elasticsearch
from irkit.trec import run
import pandas as pd
es = Elasticsearch(urls=['localhost'], port=9200)

## TREC File Format

The TREC file format is described in much detail here: http://faculty.washington.edu/levow/courses/ling573_SPR2011/hw/trec_eval_desc.htm.

The results file has the format: query_id, iter, docno, rank, sim, run_id  delimited by spaces.  Query id is the query number (e.g. 136.6 or 1894, depending on the evaluation year).  The iter constant, 0, is required but ignored by trec_eval.  The Document numbers are string values like FR940104-0-00001 (found between <DOCNO> tags in the documents).  The Similarity (sim) is a float value.  Rank is an integer from 0 to 1000, which is required but ignored by the program.  Runid is a string which gets printed out with the output.  An example of a line from the results file:
 
> 351   0  FR940104-0-00001  1   42.38   run-name

## Elasticsearch `hits` to TREC file

We will use Elasticsearch to retrieve some documents and use the ir-kit `run` module to produce a properly formatted TREC run file.

In [2]:
query = \
{
    'query': {
        'match_all': {}
    }
}

# curl -X GET localhost:9200/goma/_search -H 'Content-Type: application/json' -d @query.json
res = es.search(index='goma', body=query)

hits = []
for rank, hit in enumerate(res['hits']['hits'], 1):
    hits.append(run.TrecEvalRun(rank=rank, doc_id=hit['_id'], q=0, score=hit['_score'],
                                run_id='example', topic=0))

print(run.TrecEvalRuns(hits).dumps())

0	Q0	AV19Sgi4jk6MoKTLfifp	1	1.0	example
0	Q0	AV19Sgi4jk6MoKTLfifq	2	1.0	example
0	Q0	AV19Sgi4jk6MoKTLfifu	3	1.0	example
0	Q0	AV19Sgi4jk6MoKTLfifv	4	1.0	example
0	Q0	AV19Sgi4jk6MoKTLfif5	5	1.0	example
0	Q0	AV19Sgi4jk6MoKTLfigP	6	1.0	example
0	Q0	AV19Sgi4jk6MoKTLfigQ	7	1.0	example
0	Q0	AV19Sgi4jk6MoKTLfigc	8	1.0	example
0	Q0	AV19Sgi4jk6MoKTLfige	9	1.0	example
0	Q0	AV19Sgi4jk6MoKTLfigo	10	1.0	example


More often than not, however, we have more than one topic (multiple queries).

In [3]:
keywords = ['art', 'gallery', 'australia']
query_template = \
{
    'query': {
        'match': {
            'description': ''
        }
    }
}
hits = []
for topic, keyword in enumerate(keywords):
    query_template['query']['match']['description'] = keyword
    
    res = es.search(index='goma', body=query)
    
    for rank, hit in enumerate(res['hits']['hits'], 1):
        hits.append(run.TrecEvalRun(rank=rank, doc_id=hit['_id'], q=0, score=hit['_score'],
                                    run_id='example', topic=topic))

print(run.TrecEvalRuns(hits).dumps())

0	Q0	AV19Sgi4jk6MoKTLfifp	1	1.0	example
0	Q0	AV19Sgi4jk6MoKTLfifq	2	1.0	example
0	Q0	AV19Sgi4jk6MoKTLfifu	3	1.0	example
0	Q0	AV19Sgi4jk6MoKTLfifv	4	1.0	example
0	Q0	AV19Sgi4jk6MoKTLfif5	5	1.0	example
0	Q0	AV19Sgi4jk6MoKTLfigP	6	1.0	example
0	Q0	AV19Sgi4jk6MoKTLfigQ	7	1.0	example
0	Q0	AV19Sgi4jk6MoKTLfigc	8	1.0	example
0	Q0	AV19Sgi4jk6MoKTLfige	9	1.0	example
0	Q0	AV19Sgi4jk6MoKTLfigo	10	1.0	example
1	Q0	AV19Sgi4jk6MoKTLfifp	1	1.0	example
1	Q0	AV19Sgi4jk6MoKTLfifq	2	1.0	example
1	Q0	AV19Sgi4jk6MoKTLfifu	3	1.0	example
1	Q0	AV19Sgi4jk6MoKTLfifv	4	1.0	example
1	Q0	AV19Sgi4jk6MoKTLfif5	5	1.0	example
1	Q0	AV19Sgi4jk6MoKTLfigP	6	1.0	example
1	Q0	AV19Sgi4jk6MoKTLfigQ	7	1.0	example
1	Q0	AV19Sgi4jk6MoKTLfigc	8	1.0	example
1	Q0	AV19Sgi4jk6MoKTLfige	9	1.0	example
1	Q0	AV19Sgi4jk6MoKTLfigo	10	1.0	example
2	Q0	AV19Sgi4jk6MoKTLfifp	1	1.0	example
2	Q0	AV19Sgi4jk6MoKTLfifq	2	1.0	example
2	Q0	AV19Sgi4jk6MoKTLfifu	3	1.0	example
2	Q0	AV19Sgi4jk6MoKTLfifv	4	1.0	example
2	Q0	AV19Sgi4jk6MoKTLfif5	5	1.0	exampl