----
Elasticsearch Exercises
----

![](https://cdn.meme.am/instances/500x/53420967.jpg)

----
Overview
-----
- Install and use Java (without getting sued by Oracle 😉)
- Install Elasticsearch
- Start the Elasticsearch daemon
- Submit data to Elasticsearch
- Query data from Elasticsearch
- Profit 💰💰💰

Let's work through http://www.elasticsearchtutorial.com/elasticsearch-in-5-minutes.html.

----
Install Java >= 6.0
----

Check which version of Java you have:

In [1]:
! java -version

java version "1.8.0_72"
Java(TM) SE Runtime Environment (build 1.8.0_72-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.72-b15, mixed mode)


You should see something like this:

```shell
java version "1.8.0_92"
Java(TM) SE Runtime Environment (build 1.8.0_92-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.92-b14, mixed mode)
```

If you don't have Java, or your version is older than 6.0, install it. As always, I suggest `brew cask install`:

In [2]:
! brew cask install java
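
To compare the reported version against the minimum without reading it by eye, the banner text can be parsed. This is a minimal sketch, assuming the banner looks like the output shown above; `java_major_version` is a hypothetical helper name, not part of any library.

```python
import re


def java_major_version(banner):
    """Return the major Java version found in `java -version` output.

    Handles the legacy "1.x" scheme (Java 8 and earlier report "1.8.0_72")
    and the newer scheme where the major version comes first ("11.0.2").
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', banner)
    if match is None:
        raise ValueError("no version string found in banner")
    first, second = match.group(1), match.group(2)
    # Legacy scheme: "1.8.0_72" means Java 8.
    if first == "1" and second is not None:
        return int(second)
    return int(first)


banner = 'java version "1.8.0_72"'
print(java_major_version(banner))  # prints 8
```

Note that `java -version` writes its banner to stderr, so capture that stream if you run the command from Python with `subprocess`.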



---
Install and start Elasticsearch
----

Run the following commands in the terminal:

```bash
wget https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-1.7.2.tar.gz
tar -zxvf elasticsearch-1.7.2.tar.gz
cd elasticsearch-1.7.2
bin/elasticsearch
```

### Check for understanding

<details><summary>
What does `tar -zxvf elasticsearch-1.7.2.tar.gz` do?
</summary>
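It decompresses ("unzips") and extracts the archive: `-z` runs it through gzip, `-x` extracts, `-v` lists each file as it goes, and `-f` names the archive file.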
</details>

There should now be an Elasticsearch daemon running. It is similar to starting a Jupyter Notebook daemon.

----
Double check it is running
----

In [19]:
import requests
from pprint import pprint

In [20]:
r = requests.get("http://localhost:9200")

In [21]:
pprint(r.json())

{u'cluster_name': u'elasticsearch',
 u'name': u'Lady Octopus',
 u'status': 200,
 u'tagline': u'You Know, for Search',
 u'version': {u'build_hash': u'e43676b1385b8125d647f593f7202acbd816e8ec',
              u'build_snapshot': False,
              u'build_timestamp': u'2015-09-14T09:49:53Z',
              u'lucene_version': u'4.10.4',
              u'number': u'1.7.2'}}


You should see something like this:

```
{'cluster_name': 'elasticsearch',
 'name': 'Goliath',
 'status': 200,
 'tagline': 'You Know, for Search',
 'version': {'build_hash': 'e43676b1385b8125d647f593f7202acbd816e8ec',
             'build_snapshot': False,
             'build_timestamp': '2015-09-14T09:49:53Z',
             'lucene_version': '4.10.4',
             'number': '1.7.2'}}
```
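
The same check can be made in code instead of by eye. The sketch below assumes the payload has the shape shown above (a `tagline` and a nested `version.number`); `cluster_version` is a hypothetical helper, and other Elasticsearch versions may return different fields.

```python
def cluster_version(info):
    """Return the version number from the root-endpoint payload.

    Raises ValueError when the payload does not look like the
    Elasticsearch greeting shown above.
    """
    if info.get("tagline") != "You Know, for Search":
        raise ValueError("response does not look like an Elasticsearch root payload")
    version = info.get("version", {}).get("number")
    if not version:
        raise ValueError("payload has no version number")
    return version


# Sample payload copied from the expected output above.
sample_info = {
    "cluster_name": "elasticsearch",
    "name": "Goliath",
    "status": 200,
    "tagline": "You Know, for Search",
    "version": {"lucene_version": "4.10.4", "number": "1.7.2"},
}
print(cluster_version(sample_info))  # prints 1.7.2
```

With the daemon running, `cluster_version(r.json())` applies the same check to the live response from the cell above.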

----
Use Elasticsearch's awful API
----

![](http://image.slidesharecdn.com/elasticsearch-150225053202-conversion-gate02/95/introduction-to-elasticsearch-39-638.jpg?cb=1424842484)

---
Put data
---

In [6]:
! curl -XPUT 'http://localhost:9200/blog/user/dilbert' -d '{ "name" : "Dilbert Brown" }'

{"_index":"blog","_type":"user","_id":"dilbert","_version":1,"created":true}

### Check for understanding

<details><summary>
Which part of the returned JSON lets us know that it worked?
</summary>
... "created":true ...
</details>

In [7]:
! curl -XPUT 'http://localhost:9200/blog/post/1' -d '{ "user": "dilbert",  "postDate": "2011-12-15",  "body": "Search is hard. Search should be easy." , "title": "On search"}'

{"_index":"blog","_type":"post","_id":"1","_version":1,"created":true}

In [8]:
! curl -XPUT 'http://localhost:9200/blog/post/2' -d '{  "user": "dilbert",  "postDate": "2011-12-12",  "body": "Distribution is hard. Distribution should be easy." , "title": "On distributed search"}'

{"_index":"blog","_type":"post","_id":"2","_version":1,"created":true}

In [9]:
! curl -XPUT 'http://localhost:9200/blog/post/3' -d '{ "user": "dilbert",  "postDate": "2011-12-10",  "body": "Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat" , "title": "Lorem ipsum" }'

{"_index":"blog","_type":"post","_id":"3","_version":1,"created":true}
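
Each PUT above targets a URL of the form `/index/type/id` and sends a JSON body. A minimal sketch of building those two pieces in Python follows; `document_url` and `was_created` are hypothetical helper names, and the base URL is assumed to match the curl calls.

```python
import json
from urllib.parse import quote

BASE_URL = "http://localhost:9200"  # assumed: same host and port as the curl calls


def document_url(index, doc_type, doc_id, base_url=BASE_URL):
    """Build the /index/type/id URL used by the PUT and GET calls."""
    # Percent-encode each path segment so unusual IDs stay in one segment.
    parts = [quote(str(part), safe="") for part in (index, doc_type, doc_id)]
    return base_url + "/" + "/".join(parts)


def was_created(response):
    """True when the parsed index response reports a newly created document."""
    return response.get("created") is True


url = document_url("blog", "user", "dilbert")
body = json.dumps({"name": "Dilbert Brown"})
print(url)   # prints http://localhost:9200/blog/user/dilbert
print(body)  # prints {"name": "Dilbert Brown"}
```

Passing these to `requests.put(url, data=body)` should send the same request as the first curl command, and `was_created` can then be applied to the parsed reply.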

---
Get data
---

In [49]:
! curl -XGET 'http://localhost:9200/blog/user/dilbert?pretty=true'

{
  "_index" : "blog",
  "_type" : "user",
  "_id" : "dilbert",
  "_version" : 1,
  "found" : true,
  "_source":{ "name" : "Dilbert Brown" }
}


In [51]:
! curl -XGET 'http://localhost:9200/blog/post/1?pretty=true'

{
  "_index" : "blog",
  "_type" : "post",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source":{ "user": "dilbert",  "postDate": "2011-12-15",  "body": "Search is hard. Search should be easy." , "title": "On search"}
}
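
The stored document lives under `_source`, and `found` says whether the lookup hit. A small sketch of pulling the document out of a parsed GET response is shown here; `extract_source` is a hypothetical helper, and the shape of the "miss" payload is an assumption based on the fields in the output above.

```python
def extract_source(doc):
    """Return the stored document body, or None when the lookup missed."""
    if not doc.get("found"):
        return None
    return doc.get("_source")


# Sample payload copied from the GET output above.
hit = {
    "_index": "blog",
    "_type": "user",
    "_id": "dilbert",
    "_version": 1,
    "found": True,
    "_source": {"name": "Dilbert Brown"},
}
# Assumed shape of a lookup for an ID that does not exist.
miss = {"_index": "blog", "_type": "user", "_id": "nobody", "found": False}

print(extract_source(hit))   # prints {'name': 'Dilbert Brown'}
print(extract_source(miss))  # prints None
```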


---
Query data
---

Find all blog posts by Dilbert:

In [10]:
! curl 'http://localhost:9200/blog/post/_search?q=user:dilbert&pretty=true'

{
  "took" : 189,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "blog",
      "_type" : "post",
      "_id" : "1",
      "_score" : 1.0,
      "_source":{ "user": "dilbert",  "postDate": "2011-12-15",  "body": "Search is hard. Search should be easy." , "title": "On search"}
    }, {
      "_index" : "blog",
      "_type" : "post",
      "_id" : "2",
      "_score" : 0.30685282,
      "_source":{  "user": "dilbert",  "postDate": "2011-12-12",  "body": "Distribution is hard. Distribution should be easy." , "title": "On distributed search"}
    }, {
      "_index" : "blog",
      "_type" : "post",
      "_id" : "3",
      "_score" : 0.30685282,
      "_source":{ "user": "dilbert",  "postDate": "2011-12-10",  "body": "Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoree
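
The search response nests the matching documents under `hits.hits`, each with its own `_id` and `_score`. The sketch below flattens that structure for quick inspection; `summarize_hits` is a hypothetical helper and the sample is a trimmed copy of the output above with the `_source` bodies left out.

```python
def summarize_hits(response):
    """Return (id, score) pairs for every hit in a parsed search response."""
    hits = response.get("hits", {}).get("hits", [])
    return [(hit["_id"], hit["_score"]) for hit in hits]


# Trimmed copy of the search output above.
sample_response = {
    "timed_out": False,
    "hits": {
        "total": 3,
        "max_score": 1.0,
        "hits": [
            {"_index": "blog", "_type": "post", "_id": "1", "_score": 1.0},
            {"_index": "blog", "_type": "post", "_id": "2", "_score": 0.30685282},
            {"_index": "blog", "_type": "post", "_id": "3", "_score": 0.30685282},
        ],
    },
}

for doc_id, score in summarize_hits(sample_response):
    print(doc_id, score)
```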

In [66]:
# TODO: Refactor the previous command line command 
# - Use only Python's requests library to make the call and handle the data
# - Make the search term a variable
# - Try other search terms

In [18]:
# Note: Sometimes a query succeeds (status code 200) but the payload is empty or contains invalid data

# TODO: Write unit tests for the data returned by requests to check that it successfully queried Elasticsearch
# - Use `assert` statements
# - Document your tests with comments
# - Test the metadata; the specific results or number of results will often change and break your tests

All posts whose title contains the term "search" but not "distributed":

In [63]:
! curl 'http://localhost:9200/blog/post/_search?q=+title:search%20-title:distributed&pretty=true&fields=title'

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.625,
    "hits" : [ {
      "_index" : "blog",
      "_type" : "post",
      "_id" : "1",
      "_score" : 0.625,
      "fields" : {
        "title" : [ "On search" ]
      }
    } ]
  }
}
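
Query strings like the one above mix characters that have special meaning in URLs (`+`, `:`, spaces). Many servers decode a literal `+` in a query string as a space, so it is safer to let a library do the encoding. This is a minimal sketch using the standard library; `search_url` is a hypothetical helper and the base URL is assumed to match the curl calls.

```python
from urllib.parse import urlencode


def search_url(index, doc_type, query, base_url="http://localhost:9200", **params):
    """Build a URI-search URL with the query and extra parameters percent-encoded."""
    # urlencode turns "+" into %2B, ":" into %3A, and a space into "+".
    query_string = urlencode({"q": query, **params})
    return "{}/{}/{}/_search?{}".format(base_url, index, doc_type, query_string)


url = search_url(
    "blog", "post", "+title:search -title:distributed", pretty="true", fields="title"
)
print(url)
# prints http://localhost:9200/blog/post/_search?q=%2Btitle%3Asearch+-title%3Adistributed&pretty=true&fields=title
```

If you use `requests`, passing the same dictionary as `params=` to `requests.get` performs equivalent encoding for you.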


A range search on postDate:

In [65]:
%%bash
curl -XGET 'http://localhost:9200/blog/_search?pretty=true' -d '
{ 
    "query" : { 
        "range" : { 
            "postDate" : { "from" : "2011-12-10", "to" : "2011-12-12" } 
        } 
    } 
}'

{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "blog",
      "_type" : "post",
      "_id" : "2",
      "_score" : 1.0,
      "_source":{  "user": "dilbert",  "postDate": "2011-12-12",  "body": "Distribution is hard. Distribution should be easy." , "title": "On distributed search"}
    }, {
      "_index" : "blog",
      "_type" : "post",
      "_id" : "3",
      "_score" : 1.0,
      "_source":{ "user": "dilbert",  "postDate": "2011-12-10",  "body": "Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat" , "title": "Lorem ipsum" }
    } ]
  }
}


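
The range query body is plain JSON, so it can be built as a Python dictionary instead of a hand-written string. The sketch below uses the same `from`/`to` keys as the curl call and mirrors the inclusive bounds seen in the results with a local comparison; `range_query` and `in_range` are hypothetical helpers, and the local filter only illustrates the semantics rather than reproducing how Elasticsearch evaluates the query.

```python
import json


def range_query(field, start, end):
    """Build the request body for a range search on one field."""
    return {"query": {"range": {field: {"from": start, "to": end}}}}


def in_range(value, start, end):
    """Inclusive comparison; ISO dates (YYYY-MM-DD) sort correctly as strings."""
    return start <= value <= end


# postDate values of the three posts indexed earlier.
post_dates = {"1": "2011-12-15", "2": "2011-12-12", "3": "2011-12-10"}
matching = sorted(
    post_id
    for post_id, date in post_dates.items()
    if in_range(date, "2011-12-10", "2011-12-12")
)

print(json.dumps(range_query("postDate", "2011-12-10", "2011-12-12")))
print(matching)  # prints ['2', '3']
```

Both boundary dates match, which agrees with posts 2 and 3 being returned above.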


----

If you choose to use Elasticsearch, you probably won't be making raw HTTP requests. Check out the official [Python library](https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/index.html).

<br>
<br> 
<br>

----