# Simple ElasticSearch API

In this project, the ElasticSearch index has already been created.

An ElasticSearch index can be composed of multiple fields, each indexing a different part of the documents. Each field uses a specific analyser and a similarity (retrieval model).

In the current index, the existing fields are the body and the named entities.

In this project you will use a simplified wrapper around the original ElasticSearch Python API:

https://elasticsearch-py.readthedocs.io/en/master/api.html


## Query the ElasticSearch API

To search the ElasticSearch index, you can use the simple API provided and get the results either as JSON or as a Pandas DataFrame.

In [1]:
import ElasticSearchSimpleAPI as es
import numpy as np

elastic = es.ESSimpleAPI()

In [6]:
result_json = elastic.search_json_results(query="What is a physician's assistant?")
result_json

{'took': 13,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 10000, 'relation': 'gte'},
  'max_score': 7.9882913,
  'hits': [{'_index': 'msmarco',
    '_type': '_doc',
    '_id': 'MARCO_849267',
    '_score': 7.9882913,
    '_source': {'body': "Salary for Physician Assistants. Also known as: Anesthesiologist Assistant, Certified Physician's Assistant, Family Practice Physician Assistant, Orthopaedic Physician Assistant, Orthopedic Physician Assistant, Pediatric Physician Assistant, Radiology Practitioner Assistant, Surgical Physician Assistant."}}]}}
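The JSON response nests the ranked documents under `hits.hits`, each with an `_id`, a `_score`, and the stored `_source`. Note that `'relation': 'gte'` means the reported total of 10000 is a lower bound, not an exact count. As a minimal sketch, the ranked list can be pulled out of such a response with plain Python. The `response` dict below is a hand-copied, shortened version of the output above, and `ranked_ids` is a hypothetical helper, not part of `ElasticSearchSimpleAPI`.

```python
# Shortened copy of the response shown above (body text abbreviated for readability).
response = {
    "hits": {
        "total": {"value": 10000, "relation": "gte"},
        "max_score": 7.9882913,
        "hits": [
            {
                "_index": "msmarco",
                "_id": "MARCO_849267",
                "_score": 7.9882913,
                "_source": {"body": "Salary for Physician Assistants. ..."},
            }
        ],
    }
}


def ranked_ids(response):
    """Return (doc_id, score) pairs in the order the hits were returned."""
    return [(hit["_id"], hit["_score"]) for hit in response["hits"]["hits"]]


pairs = ranked_ids(response)
print(pairs)
```

In the notebook, the same helper could be applied to `result_json` directly.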

In [45]:
result = elastic.search_body(query="What are the educational requirements required to become one?", numDocs = 100)
result

Unnamed: 0,_index,_type,_id,_score,_source.body
0,msmarco,_doc,MARCO_6652628,7.655465,The minimum educational requirement to become ...
1,msmarco,_doc,MARCO_5244354,7.093407,The minimum educational requirement to become ...
2,msmarco,_doc,MARCO_7789519,6.982583,Video: Physician Assistant: Educational Requir...
3,msmarco,_doc,MARCO_22367,6.541923,Educational Requirements. Electricians usually...
4,msmarco,_doc,MARCO_22376,6.528393,Electricians require some formal education. Le...
...,...,...,...,...,...
95,msmarco,_doc,MARCO_5918188,5.340497,Microbiologist: Educational Requirements for a...
96,msmarco,_doc,MARCO_5459300,5.337555,Band Director: Educational and Training Requir...
97,msmarco,_doc,MARCO_917799,5.331678,Training Requirements. Educational requirement...
98,msmarco,_doc,MARCO_8069863,5.331678,Specific educational requirements to become a ...


## Search 

https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started-search.html

In [46]:
import TRECCASTeval as trec

test_bed = trec.ConvSearchEvaluation()

topic_turn_id = '1_2'
# NaN occurs when the query returns 0 results (e.g. the query "what about in the US?")
# or when there are no relevant docs. In the TRECCASTeval class we need to check the number of relevant docs per query:
# some queries have none, but the recall calculation divides by the total number of relevant docs.
# If a query has no relevant docs, we should not count it at all; evaluating it makes no sense.

[p10, recall, ndcg] = test_bed.eval(result[['_id','_score']], topic_turn_id)
print('P10=', p10, '  Recall=', recall, '  NDCG=',ndcg)

P10= 0.0   Recall= 0.0   NDCG= 0
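The notes in the cell above point out that recall is undefined when a query has no relevant documents, because the metric divides by the number of relevant documents. The sketch below illustrates that guard with the standard definitions of precision at 10 and recall. It is not the `TRECCASTeval` implementation; the function names and the document ids are hypothetical.

```python
def precision_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the top-k slots that hold a relevant document."""
    top_k = ranked_ids[:k]
    return sum(1 for doc_id in top_k if doc_id in relevant_ids) / k


def recall(ranked_ids, relevant_ids):
    """Fraction of the relevant documents that were retrieved.

    Returns None when the query has no relevant documents, so the caller
    can skip the query instead of averaging in a NaN.
    """
    if not relevant_ids:
        return None
    found = sum(1 for doc_id in set(ranked_ids) if doc_id in relevant_ids)
    return found / len(relevant_ids)


# Hypothetical ranking and relevance judgments, for illustration only.
ranked = ["d1", "d2", "d3", "d4"]
relevant = {"d2", "d4", "d9"}

print("P10=", precision_at_k(ranked, relevant))
print("Recall=", recall(ranked, relevant))
print("Recall with no relevant docs=", recall(ranked, set()))
```

When averaging over topics, queries for which `recall` returns `None` would simply be left out of the mean.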


## Query String Syntax

Text search supports multiple operators:

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-simple-query-string-query.html

In [25]:
query = "What is a physician's assistant?"

query_qsl = {"query": {"match": {"body": query}}}

result = elastic.search_QSL(query_qsl,10)

print(result)

    _index _type            _id    _score  \
0  msmarco  _doc   MARCO_849267  7.988291   
1  msmarco  _doc  MARCO_2331424  7.979860   
2  msmarco  _doc  MARCO_5780723  7.856357   
3  msmarco  _doc   MARCO_920443  7.837830   
4  msmarco  _doc  MARCO_4903530  7.829967   
5  msmarco  _doc   MARCO_955948  7.767573   
6  msmarco  _doc  MARCO_4016757  7.679798   
7  msmarco  _doc  MARCO_5692406  7.574573   
8  msmarco  _doc  MARCO_2331422  7.536899   
9  msmarco  _doc  MARCO_6193189  7.521804   

                                        _source.body  
0  Salary for Physician Assistants. Also known as...  
1  $54,000. Average Physician Assistant Physician...  
2  how to become a physician assistant, how long ...  
3  Salary for Anesthesiologist Assistants. Also k...  
4  Physician Assistant Salaries. Median annual ph...  
5  Physician assistants work under the supervisio...  
6  Physicians Assistant Salaries Throughout North...  
7  Professional Abbreviations. MD -- Medical Doct...  
8  As the
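The cell above uses a plain `match` query, which does not interpret operators. The operators described at the link above belong to the `simple_query_string` query: `+` for AND, `|` for OR, `-` for negation, double quotes for phrases, and parentheses for precedence. The sketch below only builds such a request body; the helper name and the example query text are hypothetical, and passing the body to `elastic.search_QSL` is an assumption based on how the `match` body is used above.

```python
import json


def simple_query_string_body(query, fields=("body",), default_operator="OR"):
    """Build a request body for Elasticsearch's simple_query_string query."""
    return {
        "query": {
            "simple_query_string": {
                "query": query,
                "fields": list(fields),
                "default_operator": default_operator,
            }
        }
    }


# Phrase match, plus either of two terms, excluding a third term.
query_qsl = simple_query_string_body(
    '"physician assistant" +(salary | education) -veterinary'
)
print(json.dumps(query_qsl, indent=2))

# Assumption: the wrapper accepts this body like the match query above, e.g.
# result = elastic.search_QSL(query_qsl, 10)
```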

## Mappings and Fields

At indexing time, each field is indexed with a predefined retrieval model (similarity) and text parser (analyser). These become the default _similarity_ and _analyser_ for that field. This correspondence is called a _mapping_ in ElasticSearch.

See the configuration details here:

https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-similarity.html

https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-put-mapping.html

You can check the index configuration as follows:

In [26]:
msmarco_mappings = elastic.client.indices.get_mapping(index = 'msmarco')
print(msmarco_mappings)
# All of these functions can be defined in the Elasticsearch schema

{'msmarco': {'mappings': {'properties': {'body': {'type': 'text', 'similarity': 'lmd', 'analyzer': 'rebuilt_english'}}}}}
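The mapping is a nested dictionary keyed by index name, then `mappings`, then `properties`, with one entry per field. As a small sketch, the similarity and analyser of each field can be listed as follows. The dictionary is copied from the output above, and `field_settings` is a hypothetical helper.

```python
# Mapping copied from the output of get_mapping shown above.
msmarco_mappings = {
    "msmarco": {
        "mappings": {
            "properties": {
                "body": {
                    "type": "text",
                    "similarity": "lmd",
                    "analyzer": "rebuilt_english",
                }
            }
        }
    }
}


def field_settings(mappings, index):
    """Map each field name to its (similarity, analyzer) pair.

    Fields that do not set one of the two explicitly get None for it.
    """
    properties = mappings[index]["mappings"]["properties"]
    return {
        name: (spec.get("similarity"), spec.get("analyzer"))
        for name, spec in properties.items()
    }


settings = field_settings(msmarco_mappings, "msmarco")
for name, (similarity, analyzer) in settings.items():
    print(f"{name}: similarity={similarity}, analyzer={analyzer}")
```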
