## Elasticsearch: The Definitive Guide - Python

Following the examples in the book, here are Python snippets that achieve the same effect.

Documentation for the Python libraries:

Low-level API:

https://elasticsearch-py.readthedocs.io/en/master/index.html

Expressive DSL API (more "Pythonic"):

http://elasticsearch-dsl.readthedocs.io/en/latest/index.html

GitHub repo for the DSL API:

https://github.com/elastic/elasticsearch-dsl-py


```python
# Make the local `index` helper module (one directory up) importable
import sys, os
sys.path.insert(1, os.path.join(sys.path[0], '..'))
```

```python
import index  # local helper module for loading the book's example data
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, Q, Index
from pprint import pprint

es = Elasticsearch(
    'localhost',
    # sniff before doing anything
    sniff_on_start=True,
    # refresh nodes after a node fails to respond
    sniff_on_connection_fail=True,
    # and also every 60 seconds
    sniffer_timeout=60
)

# Helper from the local `index` module: load example set 3 into a single-shard index
r = index.load_sid_examples(settings={"settings": {"number_of_shards": 1}}, set=3)
#print('{} items created'.format(len(r['items'])))

# Let's repopulate the index, since we deleted 'gb' in earlier chapters:
# Run the script: populate.ipynb
```

### Phrase Matching

In the same way that the `match` query is the go-to query for standard full-text search, the `match_phrase` query is the one you should reach for when you want to find words that are near each other:
```python
s = Index('my_index', using=es).search()
s = s.query(Q('match_phrase', title="quick brown fox"))
s.execute()
```

```
<Response: [<Hit(my_index/my_type/1): {'title': 'The quick brown fox'}>, <Hit(my_index/my_type/2): {'title': 'The quick brown fox jumps over the lazy dog'}>, <Hit(my_index/my_type/3): {'title': 'The quick brown fox jumps over the quick dog'}>]>
```
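
For comparison with the low-level client linked at the top, here is a minimal sketch of the JSON body a `match_phrase` search sends. `match_phrase_body` is a hypothetical helper, not part of either library; the dictionary it returns uses the standard long form of the `match_phrase` query (field name mapped to an object with `query` and an optional `slop`).

```python
import json


def match_phrase_body(field, phrase, slop=0):
    """Hypothetical helper: build a search body containing a match_phrase query."""
    clause = {"query": phrase}
    if slop:
        # Only include slop when it differs from the default of 0
        clause["slop"] = slop
    return {"query": {"match_phrase": {field: clause}}}


body = match_phrase_body("title", "quick brown fox")
print(json.dumps(body))
# With a running cluster, the low-level client could send it with:
# es.search(index="my_index", body=body)
```

Keeping `slop` out of the body when it is 0 leaves the request identical to a plain phrase query.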

Like the `match` query, the `match_phrase` query first analyzes the query string to produce a list of terms. It then searches for all the terms, but keeps only documents that contain all of the search terms, in the same positions relative to each other. A query for the phrase "quick fox" would not match any of our documents, because no document contains the word `quick` immediately followed by `fox`.
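
The same rule can be sketched in plain Python over the three titles returned above, assuming a toy analyzer that lowercases and splits on non-word characters (a rough stand-in for the `standard` analyzer). `analyze` and `phrase_match` are hypothetical helpers; Elasticsearch uses position data stored in the inverted index instead of scanning each document.

```python
import re


def analyze(text):
    """Toy analyzer: lowercase and split on runs of non-word characters."""
    return re.findall(r"\w+", text.lower())


def phrase_match(doc_text, query_text):
    """True if the query terms appear in the document consecutively, in order."""
    doc, query = analyze(doc_text), analyze(query_text)
    n = len(query)
    # Try every starting position and compare the next n terms with the query
    return any(doc[i:i + n] == query for i in range(len(doc) - n + 1))


titles = [
    "The quick brown fox",
    "The quick brown fox jumps over the lazy dog",
    "The quick brown fox jumps over the quick dog",
]
print([phrase_match(t, "quick brown fox") for t in titles])  # [True, True, True]
print([phrase_match(t, "quick fox") for t in titles])        # [False, False, False]
```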

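The same query can also be written as a `match` query with `"type": "phrase"`. This form works only on older Elasticsearch versions, because the `type` parameter of the `match` query was removed in 6.0: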

```python
s = Index('my_index', using=es).search()
s = s.query(Q('match', title={"query": "quick brown fox", "type": "phrase"}))
s.execute()
```

```
<Response: [<Hit(my_index/my_type/1): {'title': 'The quick brown fox'}>, <Hit(my_index/my_type/2): {'title': 'The quick brown fox jumps over the lazy dog'}>, <Hit(my_index/my_type/3): {'title': 'The quick brown fox jumps over the quick dog'}>]>
```

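For comparison, a `prefix` query matches documents whose field contains a term that begins with the given string, here postcodes starting with `W1`. Unlike `match_phrase`, it does not analyze its input:

```python
s = Index('my_index', using=es).search()
s = s.query(Q('prefix', postcode="W1"))
s.execute()
```

```
<Response: [<Hit(my_index/address/0): {'postcode': 'W1V 3DG'}>, <Hit(my_index/address/2): {'postcode': 'W1F 7HW'}>]>
```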
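
Conceptually, a `prefix` query finds every term in the field's sorted term dictionary that starts with the given string, then matches the documents containing those terms. A minimal sketch using `bisect` on a sorted Python list, assuming `postcode` is indexed as a single unanalyzed term; the postcode list is a hypothetical sample, `prefix_terms` is a hypothetical helper, and Lucene uses a more compact term-dictionary structure than a list.

```python
import bisect


def prefix_terms(sorted_terms, prefix):
    """Return all terms in a sorted list that start with `prefix`."""
    # Jump to the first term >= prefix; all matching terms are contiguous from there
    start = bisect.bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break
        matches.append(term)
    return matches


# Hypothetical sample of unanalyzed postcode terms
terms = sorted(["W1V 3DG", "W2F 8HW", "W1F 7HW", "WC1N 1LZ", "SW5 0BE"])
print(prefix_terms(terms, "W1"))  # ['W1F 7HW', 'W1V 3DG']
```

A short prefix such as `W` matches many more terms, which is why prefix queries get more expensive as the prefix gets shorter.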

#### Term Positions

When a string is analyzed, the analyzer returns not only a list of terms, but also the position, or order, of each term in the original string:


```python
es.indices.analyze(index='my_index', analyzer='standard', text='Quick brown fox')
```

```
{'tokens': [{'end_offset': 5,
   'position': 0,
   'start_offset': 0,
   'token': 'quick',
   'type': '<ALPHANUM>'},
  {'end_offset': 11,
   'position': 1,
   'start_offset': 6,
   'token': 'brown',
   'type': '<ALPHANUM>'},
  {'end_offset': 15,
   'position': 2,
   'start_offset': 12,
   'token': 'fox',
   'type': '<ALPHANUM>'}]}
```
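
The same token structure can be sketched in plain Python. `toy_analyze` is a hypothetical helper that only approximates the `standard` analyzer (lowercase, split on non-word characters); it records each token's position and its start and end character offsets, using the same key names as the response above.

```python
import re


def toy_analyze(text):
    """Approximate the standard analyzer's output: tokens with positions and offsets."""
    tokens = []
    for position, m in enumerate(re.finditer(r"\w+", text)):
        tokens.append({
            "token": m.group().lower(),
            "start_offset": m.start(),  # index of the token's first character
            "end_offset": m.end(),      # index one past the token's last character
            "position": position,       # order of the term in the token stream
        })
    return {"tokens": tokens}


for t in toy_analyze("Quick brown fox")["tokens"]:
    print(t["position"], t["token"], t["start_offset"], t["end_offset"])
# 0 quick 0 5
# 1 brown 6 11
# 2 fox 12 15
```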

Positions can be stored in the inverted index, and position-aware queries like the `match_phrase` query can use them to match only documents that contain all the words in exactly the order specified, with no words in between.

For a document to be considered a match for the phrase “quick brown fox”, all of the following must be true:

* `quick`, `brown`, and `fox` must all appear in the field.
* The position of `brown` must be 1 greater than the position of `quick`.
* The position of `fox` must be 2 greater than the position of `quick`.

If any of these conditions is not met, the document is not considered a match.
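
The three conditions can be checked against a positional inverted index, which maps each term to the documents and positions where it occurs. A minimal sketch, assuming a toy analyzer that lowercases and splits on non-word characters and using the three titles from the results above; `build_index` and `phrase_docs` are hypothetical helpers, not how Lucene is actually implemented.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map term -> {doc_id: [positions]} (a positional inverted index)."""
    postings = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(re.findall(r"\w+", text.lower())):
            postings[term][doc_id].append(pos)
    return postings


def phrase_docs(postings, phrase):
    """Return ids of docs where query term i sits exactly i positions after term 0."""
    terms = re.findall(r"\w+", phrase.lower())
    if not terms or any(t not in postings for t in terms):
        return set()
    # Condition 1: every term must appear in the document
    candidates = set.intersection(*(set(postings[t]) for t in terms))
    hits = set()
    for doc_id in candidates:
        for start in postings[terms[0]][doc_id]:
            # Conditions 2 and 3: each later term is at start + its offset in the phrase
            if all(start + i in postings[t][doc_id] for i, t in enumerate(terms)):
                hits.add(doc_id)
                break
    return hits


docs = {
    1: "The quick brown fox",
    2: "The quick brown fox jumps over the lazy dog",
    3: "The quick brown fox jumps over the quick dog",
}
idx = build_index(docs)
print(sorted(phrase_docs(idx, "quick brown fox")))  # [1, 2, 3]
print(sorted(phrase_docs(idx, "quick fox")))        # []
print(sorted(phrase_docs(idx, "lazy dog")))         # [2]
```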

#### Mixing It Up

Requiring exact-phrase matches may be too strict a constraint. Perhaps we do want documents that contain “quick brown fox” to be considered a match for the query “quick fox,” even though the positions aren’t exactly equivalent.


```python
# "Sloppy" phrase match: terms may be up to 1 position move apart
s = Index('my_index', using=es).search()
s = s.query(Q('match_phrase', title={"query": "quick fox", "slop": 1}))
s.execute()
```

```
<Response: [<Hit(my_index/my_type/1): {'title': 'The quick brown fox'}>, <Hit(my_index/my_type/2): {'title': 'The quick brown fox jumps over the lazy dog'}>, <Hit(my_index/my_type/3): {'title': 'The quick brown fox jumps over the quick dog'}>]>
```

The `slop` parameter tells the `match_phrase` query how far apart terms are allowed to be while still considering the document a match. By *how far apart* we mean: how many times do you need to move a term in order to make the query and document match?

We’ll start with a simple example. To make the query `quick fox` match a document containing `quick brown fox`, we need a `slop` of just 1:


|         | Pos 1 | Pos 2 | Pos 3 |
|---------|-------|-------|-------|
| Doc:    | quick | brown | fox   |
| Query:  | quick | fox   |       |
| Slop 1: | quick | ↳     |fox   |

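All query terms must still be present when using `slop`, but they do not need to appear in the same order. With a high enough `slop`, terms can move in either direction, even past each other. To make the query `fox quick` match a document containing `quick brown fox`, we need a `slop` of 3:

|         | Pos 1       | Pos 2   | Pos 3 |
|---------|-------------|---------|-------|
| Doc:    | quick       | brown   | fox   |
| Query:  | fox         | quick   |       |
| Slop 1: | fox/quick ↵ |         |       |
| Slop 2: | quick       | ↳ fox   |       |
| Slop 3: | quick       |         | ↳ fox |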
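
The slop values in both tables can be computed with a minimal sketch, assuming each query term occurs exactly once in the document. For query term *i* found at document position *p*, take `p - i`; an exact phrase gives the same value for every term, and the slop needed is the spread between the largest and smallest of these values. This follows the idea behind Lucene's sloppy phrase matching, but `required_slop` is a hypothetical helper: the real implementation also handles repeated terms and scores closer matches higher.

```python
import re


def required_slop(doc_text, query_text):
    """Smallest slop at which the query phrase matches the document.

    Returns None if the query is empty or a query term is missing.
    Simplification: assumes every query term occurs exactly once in the document.
    """
    doc = re.findall(r"\w+", doc_text.lower())
    query = re.findall(r"\w+", query_text.lower())
    if not query or any(term not in doc for term in query):
        return None
    # Shift each term's document position by its offset within the query
    shifted = [doc.index(term) - i for i, term in enumerate(query)]
    return max(shifted) - min(shifted)


print(required_slop("quick brown fox", "quick brown fox"))  # 0
print(required_slop("quick brown fox", "quick fox"))        # 1
print(required_slop("quick brown fox", "fox quick"))        # 3
```

A `match_phrase` query with a given `slop` matches whenever this value is less than or equal to that `slop`.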