<u><center><h1>Elasticsearch</h1></center></u>
---
---

<img src="images/elastic_logo.png" width=60%>

[Elasticsearch](https://www.elastic.co/) is an open-source full-text search and analytics engine based on [Lucene](https://en.wikipedia.org/wiki/Apache_Lucene). It was developed by Shay Banon and first released in 2010.

Since its release, its popularity and share of use have grown rapidly, and today Elasticsearch is one of the leading search tools. It is used in many industries, including financial services, government, healthcare, manufacturing, media & entertainment, software & technology, professional services, travel & transportation, telecommunications and retail. Many well-known companies already use Elasticsearch successfully, for example DELL, eBay, Symantec, Netflix, Facebook, Cisco, Microsoft, Mozilla, Adobe, IBM, Docker, GitHub, SoundCloud and many others (you can find more information [here](https://www.elastic.co/use-cases)).

The graph below compares the search interest in [Apache Solr](http://lucene.apache.org/solr/) (another popular search tool) and Elasticsearch.
<img src="images/solr_vs_elasticsearch.jpg" width=80%>

Alongside plain Elasticsearch you can install several companion packages:
* [Kibana](https://www.elastic.co/downloads/kibana) - enables visual exploration and real-time analysis of your data in Elasticsearch. Kibana helps you understand large volumes of data.
* [Filebeat](https://www.elastic.co/downloads/beats/filebeat) - a log data shipper, installed as an agent on your servers.
* [Packetbeat](https://www.elastic.co/downloads/beats/packetbeat) - a real-time network packet analyzer.
* [Winlogbeat](https://www.elastic.co/downloads/beats/winlogbeat) - captures event data from any event log on a Windows system.
* [Metricbeat](https://www.elastic.co/downloads/beats/metricbeat) - collects metrics from the operating system and from services running on the server.
* [Heartbeat](https://www.elastic.co/downloads/beats/heartbeat) - tells you whether your services are reachable.
* [Topbeat](https://www.elastic.co/downloads/beats/topbeat) - helps you monitor your servers by collecting metrics such as system-wide, per-process and file system statistics.
* [Logstash](https://www.elastic.co/downloads/logstash) - ingests, transforms, enriches, and outputs data.
* [ES-Hadoop](https://www.elastic.co/downloads/hadoop) - allows you to use Elasticsearch from a Hadoop environment.
* [X-Pack](https://www.elastic.co/downloads/x-pack) - security, alerting, monitoring, reporting, and Graph in one pack. It comes with an interactive console called Sense, which makes it easy to talk to Elasticsearch directly from your browser.

Before looking at the basic concepts and at how Elasticsearch works, it helps to know its advantages and disadvantages.

#### Advantages:
* Elasticsearch is written in Java, which makes it compatible with almost every platform.
* Elasticsearch is near real time: by default, an added document becomes searchable about one second after it is indexed.
* Elasticsearch is distributed, which makes it easy to scale and to integrate into any large organization.
* Creating full backups is easy thanks to the gateway concept present in Elasticsearch.
* Handling multi-tenancy is very easy in Elasticsearch compared to Apache Solr.
* Elasticsearch uses JSON objects for responses, so the Elasticsearch server can be called from a large number of programming languages.
* Elasticsearch supports almost every document type except those that do not support text rendering.

#### Disadvantages:
* Elasticsearch does not offer a choice of formats for request and response data (only JSON is possible), unlike Apache Solr, which supports CSV, XML and JSON.
* Elasticsearch can also suffer from split-brain situations, although only in rare cases.
* You can't write queries in SQL.
* The distributed nature of Elasticsearch can have negative effects on data consistency.
* Plain Elasticsearch does not have any built-in authentication or authorization system.

#### How does it work?

Using a RESTful API, Elasticsearch stores data and indexes it automatically. It assigns types to fields, which lets searches run smartly and quickly using filters and different queries.

Elasticsearch runs on the JVM. It splits indexes into “shards” of data and replicates the shards across different nodes, so it is distributed and a cluster can keep functioning even if not all nodes are operational. Adding nodes is easy, and that is what makes it so scalable.

ES uses Lucene to resolve searches. This is quite an advantage compared with, for example, Django query strings. A RESTful API call allows us to perform searches using JSON objects as parameters, which makes searching much more flexible and lets us give each search parameter within the object a different weight, importance or priority.

The final result ranks the objects that satisfy the search query. You can even use synonyms, autocompletion, spelling suggestions and typo correction. While ordinary query strings return results that follow fixed logic rules, ES queries give you a ranked list of results that may match different criteria, and their order depends on how well they satisfy a given rule or filter.

ES can also answer data analysis questions, such as averages, the number of unique terms, or other statistics. This is done using aggregations. To dig a little deeper into this feature, check the documentation [here](https://www.elastic.co/guide/en/elasticsearch/reference/5.1/_basic_concepts.html).

#### Basic concepts of Elasticsearch:
* Cluster - a collection of one or more nodes (servers) that together hold your entire data and provide federated indexing and search capabilities across all nodes.
* Node - a single server that is part of your cluster, stores your data, and participates in the cluster’s indexing and search capabilities.
* Index - a collection of documents of different types and their properties. In a single cluster, you can define as many indexes as you want. It is analogous to a database.
* Type - a collection of documents in the same index that share a set of common fields. Within an index, you can define one or more types. It is like a table in a database.
* Document - a collection of fields that can be indexed, expressed in JSON format. It is similar to a row in a table.
* Mapping - the process of defining how a document and its fields are stored and indexed. It is analogous to a schema.
* Shard - a fully functional and independent "index" that can be hosted on any node in the cluster.
* Replica - a copy of an index shard.

#### Why is Elasticsearch fast?

The data in an index is split into 5 shards by default. When a request comes in, the coordinating node sends the search request to all shards, so the work is done in parallel. Each shard returns the document IDs of its 10 best matches. The coordinating node then merges the results from all shards and returns the 10 best matches overall.
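
The merge step described above can be sketched in plain Python. This is only an illustration under stated assumptions: the shard contents, the scores and the `top_k_from_shards` helper are made up, and the real work happens inside the cluster rather than in client code.

```python
import heapq

def top_k_from_shards(shard_results, k=10):
    """Merge per-shard (score, doc_id) lists into one global top-k list."""
    # Each shard contributes only its own k best hits.
    per_shard_best = [heapq.nlargest(k, hits) for hits in shard_results]
    # The coordinating node merges those short lists and keeps the k best overall.
    merged = [hit for best in per_shard_best for hit in best]
    return heapq.nlargest(k, merged)

# Hypothetical hits from three shards: (relevance score, document id).
shards = [
    [(0.9, "a"), (0.2, "b")],
    [(0.7, "c"), (0.8, "d")],
    [(0.1, "e")],
]
print(top_k_from_shards(shards, k=3))
```

Because every shard returns at most `k` candidates, the coordinating node only has to sort a short list, no matter how many documents each shard holds.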

## Installing and running Elasticsearch

1. [Download](https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.2.2.zip) Elasticsearch and unzip files.

2. [Download](https://artifacts.elastic.co/downloads/logstash/logstash-5.2.2.zip) Logstash and unzip files. Logstash is needed to ingest, transform, enrich, and output data.

3. Install the [python elasticsearch client](http://elasticsearch-py.readthedocs.io/en/master/index.html). This is a dedicated Python library for working with Elasticsearch. Open Terminal or Command Prompt and execute the following command

    `pip install elasticsearch`

Another way to work with Elasticsearch in Jupyter is the [ipython-elasticsearch](https://pypi.python.org/pypi/ipython-elasticsearch) extension. Examples of its use can be found, for example, [here](https://github.com/graphaelli/ipython-elasticsearch/blob/master/Learn%20Elasticsearch%20with%20Jupyter.ipynb).

Elasticsearch is now ready to run. To start it in the foreground, run this command in the Elasticsearch directory:

    ./bin/elasticsearch
    
First, let's import the **`python elasticsearch client`**

In [None]:
from elasticsearch import Elasticsearch
es = Elasticsearch()

A simple way to check that Elasticsearch is working is the following:

In [None]:
es.info()

## Basic CRUD, search, filtering, aggregation and mapping operations

Now that Elasticsearch is running, you need to insert data.

We will use [these data](http://finance.yahoo.com/quote/AAPL/history?ltr=1) to learn Elasticsearch and try out its features. Select the "Historical Data" tab (1), set "Time Period" from `03/01/2012` to `12/30/2016` (2), then click the "Download Data" button (3) (see the image below) and save the file in CSV format anywhere (preferably in the folder that contains the current notebook). Let's call the file `"table.csv"`.

<img src="images/download.jpg" width="80%">

To use data from a CSV file, you need to create a **`logstash.conf`** file. Let's create it in **`/logstash-5.2.2/`**, although you can create it anywhere and then pass its path to Logstash. Paste the following code into the **`logstash.conf`** file:
<code> 
input {  
  file {
    path => "/path/to/table.csv"
    start_position => "beginning"    
  }
}

filter {  
  csv {
      separator => ","
      columns => ["Date","Open","High","Low","Close","Volume","Adj Close"]
  }
  mutate {convert => {"Date" => "string"}}
  mutate {convert => {"Open" => "float"}}
  mutate {convert => {"High" => "float"}}
  mutate {convert => {"Low" => "float"}}
  mutate {convert => {"Close" => "float"}}
  mutate {convert => {"Volume" => "float"}}
  mutate {convert => {"Adj Close" => "float"}}
}

output {  
    elasticsearch {
        action => "index"
        hosts => ["127.0.0.1:9200"]
        index => "finance_history"
        workers => 1
    }
    stdout {}
}</code>

**Note: Change `"/path/to/table.csv"` to the correct folder where you have saved the `"table.csv"` file.**

> * In the input section we are telling Logstash to take the csv file as a datasource and start reading data at the beginning of the file.

> * The filter section tells Logstash which format the dataset is in (in this case CSV). We give the names of the columns we want to keep in the output, then convert all the fields containing numbers to `float` and `Date` to `string`.

> * The output section is used to stream the input data to Elasticsearch. We specify the name of the index and hosts.

You can read more about [input](https://www.elastic.co/guide/en/logstash/current/input-plugins.html), [filter](https://www.elastic.co/guide/en/logstash/current/filter-plugins.html) and [output](https://www.elastic.co/guide/en/logstash/current/output-plugins.html) plugins.

Now you are ready to load data into Elasticsearch. In the `Logstash` folder run this command:
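
To make the filter section more concrete, here is a rough Python equivalent of what the `csv` and `mutate`/`convert` steps do to a single line. It is only a sketch: `parse_row` is a hypothetical helper, the sample line is invented, and Logstash itself adds further metadata fields that are not shown here.

```python
import csv
import io

COLUMNS = ["Date", "Open", "High", "Low", "Close", "Volume", "Adj Close"]

def parse_row(line):
    """Turn one CSV line into a typed dict, similar to csv + mutate/convert."""
    values = next(csv.reader(io.StringIO(line)))
    row = dict(zip(COLUMNS, values))
    # "Date" stays a string; every other column is converted to float.
    return {k: (v if k == "Date" else float(v)) for k, v in row.items()}

# Made-up sample line, not taken from the real dataset.
print(parse_row("2016-02-01,96.5,96.7,95.4,96.4,40943500,94.0"))
```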

    ./bin/logstash -f /path/to/logstash.conf

Once the data is loaded you can start working with it. Let's find the data imported from the `"table.csv"` file. This can be done with the help of the **`search`** method.

In [None]:
es.search(index='finance_history', size=1)

The search returned a single stored record. This record does not necessarily coincide with the first record in the CSV file.

We set the **`index`** parameter to **`finance_history`** because this is the index we specified in the **`logstash.conf`** file. The **`size`** parameter controls how many results are returned; by default it is equal to 10.

Now let's create and insert some additional data. From the previous output you can see that each document has a **`_type`** field with the value **`logs`**, which means that all imported data are stored under the **`logs`** type. To add new data, use the **`create`** method.
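
The value returned by `search` is a nested dictionary, so a small helper can make it easier to read. The sketch below assumes the response layout of the 5.x releases used in this notebook; `extract_sources` is a hypothetical helper and the sample response is hand-written with invented values.

```python
def extract_sources(response):
    """Return the total hit count and the list of document bodies."""
    hits = response["hits"]
    return hits["total"], [hit["_source"] for hit in hits["hits"]]

# Hand-written response with the same layout as a 5.x search result;
# the values are invented for illustration.
sample_response = {
    "took": 1,
    "timed_out": False,
    "hits": {
        "total": 1,
        "max_score": 1.0,
        "hits": [
            {"_index": "finance_history", "_type": "logs", "_id": "x1",
             "_score": 1.0, "_source": {"Date": "2016-02-01", "Close": 96.4}}
        ],
    },
}

total, docs = extract_sources(sample_response)
print(total, docs)
```

With a running cluster, the same helper could be applied to the result of `es.search(index='finance_history', size=1)`.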

In [None]:
data = {
    "Date": "2012-01-02",
    "Open": 408.24,
    "High": 410.36,
    "Low": 407.64,
    "Close": 409.74,
    "Volume": 67817400,
    "Adj Close": 52.74
}

es.create(index='finance_history', doc_type='logs', id=1, body=data)

Where:
* `doc_type` is the type of the document
* `id` is document id
* `body` is the document

Now let's check whether the data we have just added are in the index. For this, use the **`get`** method

In [None]:
es.get(index='finance_history', doc_type='logs', id=1)

We can also update any field of any document using the **`update`** method. Let's update the value of the `"High"` field of the document we have just added

In [None]:
data = {
  "doc": { "High": 411.12 }
}

es.update(index='finance_history', doc_type='logs', id=1, body=data)

To update a document, the body must contain either a `doc` or a `script` key: `doc` merges the given fields into the existing document, while `script` lets you run your own update logic.
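
The two kinds of update body can be built with small helpers, sketched below. Both functions are hypothetical, and the `"inline"` key for the script text is an assumption that holds for the 5.x releases used here (later releases use a different key name).

```python
def partial_doc_body(**fields):
    """Body for a partial update: the given fields are merged into the document."""
    return {"doc": fields}

def script_body(source, **params):
    """Body for a scripted update.

    Assumption: "inline" is the key that 5.x releases use for the script text.
    """
    return {"script": {"lang": "painless", "inline": source, "params": params}}

print(partial_doc_body(High=411.12))
print(script_body("ctx._source.High += params.delta", delta=0.5))
```

Either dictionary could be passed as `body` to `es.update(index='finance_history', doc_type='logs', id=1, body=...)`.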

In [None]:
es.get(index='finance_history', doc_type='logs', id=1)

As you can see, `"High"` has changed and now equals `411.12`. Now let's remove the added document with the help of the **`delete`** method

In [None]:
es.delete(index='finance_history', doc_type='logs', id=1)

The document has been deleted successfully. If you try to fetch it now, the request fails with a [404 error](https://en.wikipedia.org/wiki/HTTP_404), which the Python client reports as an exception

In [None]:
es.get(index='finance_history', doc_type='logs', id=1)

We used the **`search`** method earlier; now let's look at it in more detail. You can search across indexes, types and documents:

* es.search() - Search across all indexes and all types.
* es.search(index='finance_history') - Search across all types.
* es.search(index='finance_history', doc_type='logs') - Search the documents in the `finance_history` index of type `logs`.

It is also possible to specify several `indexes` or `types`:
    
    es.search(index=['finance_history', 'movies_history'], doc_type=['logs', 'posts'])

These are the most commonly used parameters of the **`search`** method; the full list can be found [here](https://elasticsearch-py.readthedocs.io/en/master/api.html#elasticsearch.Elasticsearch.search):

|Parameter|Description|
|:---|------------|
|`body`|The search definition using the [Query DSL](https://www.elastic.co/guide/en/elasticsearch/reference/5.2/query-dsl.html).|
|`_source`|Set to false to disable retrieval of the _source field. You can also retrieve part of the document by using `_source_include` & `_source_exclude` (see the [request body](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-source-filtering.html) documentation for more details)|
|`sort`|Sorting to perform. Can either be in the form of fieldName, or fieldName:asc/fieldName:desc. The fieldName can either be an actual field within the document, or the special _score name to indicate sorting based on scores. There can be several sort parameters (order is important).|
|`size`|The number of hits to return. Defaults to 10.|

ES supports different types of full-text search queries:

|Query|Description|
|:---|------------|
|[`match`](https://www.elastic.co/guide/en/elasticsearch/reference/5.2/query-dsl-match-query.html)|The standard query for performing full text queries, including fuzzy matching and phrase or proximity queries.|
|[`match_phrase`](https://www.elastic.co/guide/en/elasticsearch/reference/5.2/query-dsl-match-query-phrase.html)|Like the match query but used for matching exact phrases or word proximity matches.|
|[`match_phrase_prefix`](https://www.elastic.co/guide/en/elasticsearch/reference/5.2/query-dsl-match-query-phrase-prefix.html)|Like the match_phrase query, but does a wildcard search on the final word.|
|[`multi_match`](https://www.elastic.co/guide/en/elasticsearch/reference/5.2/query-dsl-multi-match-query.html)|The multi-field version of the match query.|
|[`common_terms`](https://www.elastic.co/guide/en/elasticsearch/reference/5.2/query-dsl-common-terms-query.html)|A more specialized query which gives more preference to uncommon words.|
|[`query_string`](https://www.elastic.co/guide/en/elasticsearch/reference/5.2/query-dsl-query-string-query.html)|Supports the compact Lucene [query string syntax](https://www.elastic.co/guide/en/elasticsearch/reference/5.2/query-dsl-query-string-query.html#query-string-syntax), allowing you to specify `AND/OR/NOT` conditions and multi-field search within a single query string.|
|[`simple_query_string`](https://www.elastic.co/guide/en/elasticsearch/reference/5.2/query-dsl-simple-query-string-query.html)|A simpler, more robust version of the query_string syntax suitable for exposing directly to users.|

Let's find data where **`Date`** is **`1 February 2016`**.

In [None]:
query = {
    "query": {
        "query_string" : {
            "default_field" : "Date",
            "query" : "2016-02-01"
        }
    }
}

es.search(index='finance_history', body=query)

`default_field` is the name of the field in which the `query` text is searched. You can run a multi-field search by setting the `fields` parameter. The following query selects only those documents where the `"Open"` OR `"High"` field is equal to `116.519997` OR `116.50`

In [None]:
query = {
    "query": {
        "query_string" : {
            "fields" : ["Open", "High"],
            "query" : "116.519997 OR 116.50"
        }
    }
}
es.search(index='finance_history', body=query, sort='Date', _source=['Date', 'Open', 'High'])

The previous request may be slightly confusing, because it is not obvious which value is matched against which field. Let's rewrite it explicitly

In [None]:
query = {
    "query": {
        "query_string" : {
            "query" : "(Open:116.519997 OR High:116.50) OR (Open:116.50 OR High:116.519997)"            
        }
    },
    "sort": "Date"
}
es.search(index='finance_history', body=query, sort='Date', _source=['Date', 'Open', 'High'])
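
Writing every field/value combination by hand becomes tedious as the lists grow. The sketch below builds such an expression programmatically. `any_field_equals` is a hypothetical helper, and it assumes field names without spaces or other characters that would need escaping in the query string syntax.

```python
from itertools import product

def any_field_equals(fields, values):
    """Build a query_string expression that matches any field/value pair."""
    clauses = ["{}:{}".format(f, v) for f, v in product(fields, values)]
    return " OR ".join(clauses)

expr = any_field_equals(["Open", "High"], ["116.519997", "116.50"])
query = {"query": {"query_string": {"query": expr}}}
print(expr)
```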

You can use `filter` in a query. `filter` is part of the `bool` query, which has the following occurrence types:

|Occurrence|Description|
|:---|------------|
|`must`|The clause (query) must appear in matching documents and will contribute to the score.|
|`filter`|The clause (query) must appear in matching documents. However unlike must the score of the query will be ignored. Filter clauses are executed in [filter context](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-filter-context.html), meaning that scoring is ignored and clauses are considered for caching.|
|`should`|The clause (query) should appear in the matching document. In a boolean query with no must or filter clauses, one or more should clauses must match a document. The minimum number of should clauses to match can be set using the [minimum_should_match](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-minimum-should-match.html) parameter.|
|`must_not`|The clause (query) must not appear in the matching documents. Clauses are executed in [filter context](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-filter-context.html) meaning that scoring is ignored and clauses are considered for caching. Because scoring is ignored, a score of 0 for all documents is returned.|

Let's filter the data to find documents where `Date` falls in the year 2015 and `High` lies between 95 and 115, and show only one result.

In [None]:
query = {
    "query": {
        "bool" : {
            "filter": [
                { "range": { "High": { "gte": 95.00, "lte": 115.0} } },
                { "range": { "Date": { "gte": "2015-01-01", "lt": "2016-01-01" }}}
            ]
        }
    },
    "size": 1
}
es.search(index='finance_history', body=query, size=1)

The `range` query accepts the following parameters:

|Parameter|Description|
|:---|------------|
|`gte`|Greater-than or equal to|
|`gt`|Greater-than|
|`lte`|Less-than or equal to|
|`lt`|Less-than|
|`boost`|Sets the boost value of the query, defaults to 1.0 |
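
When several range filters are combined, small builder functions can keep the nested dictionaries readable. The following is a minimal sketch; `range_clause` and `filtered_query` are hypothetical helpers, not part of the client library, and `boost` is deliberately left out.

```python
def range_clause(field, **bounds):
    """Build a range clause; allowed bounds are gt, gte, lt and lte."""
    allowed = {"gt", "gte", "lt", "lte"}
    unknown = set(bounds) - allowed
    if unknown:
        raise ValueError("unsupported bounds: {}".format(sorted(unknown)))
    return {"range": {field: bounds}}

def filtered_query(*clauses, size=10):
    """Wrap clauses in a bool/filter query (no scoring is applied)."""
    return {"query": {"bool": {"filter": list(clauses)}}, "size": size}

query = filtered_query(
    range_clause("High", gte=95.0, lte=115.0),
    range_clause("Date", gte="2015-01-01", lt="2016-01-01"),
    size=1,
)
print(query)
```

The resulting dictionary has the same shape as the hand-written query in the previous cell.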

### Aggregation in Elasticsearch

Now let's look at aggregations. The aggregations framework provides aggregated data based on a search query. It is built from simple building blocks called aggregations, which can be composed to produce complex summaries of the data. The aggregation structure looks like this:
<code>
"aggregations" : {
    "&lt;aggregation_name&gt;" : {
        "&lt;aggregation_type&gt;" : {
            &lt;aggregation_body&gt;
        }
        [,"meta" : {  [&lt;meta_data_body&gt;] } ]?
        [,"aggregations" : { [&lt;sub_aggregation&gt;]+ } ]?
    }
    [,"&lt;aggregation_name_2&gt;" : { ... } ]*
}</code>

Aggregations in Elasticsearch fall into four families:

|Type|Description|
|:---|------------|
|[`Metric`](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics.html)|Aggregations that keep track and compute metrics over a set of documents.|
|[`Bucket`](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket.html)|Build buckets, where each bucket is associated with a key and a document criterion. When the aggregation is executed, all the buckets criteria are evaluated on every document in the context and when a criterion matches, the document is considered to "fall in" the relevant bucket. By the end of the aggregation process, we’ll end up with a list of buckets - each one with a set of documents that "belong" to it.|
|[`Pipeline`](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline.html)|Aggregations that aggregate the output of other aggregations and their associated metrics.|
|[`Matrix`](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-matrix.html)|Operate on multiple fields and produce a matrix result based on the values extracted from the requested document fields. Unlike metric and bucket aggregations, this aggregation family does not yet support scripting.|

Using `Metric` aggregations, let's find the average, maximum, minimum, sum and count of the `"Close"` values.

In [None]:
query = {
    "size": 0,
    "aggs": {
        "avg_close" : { "avg" : { "field" : "Close" } },
        "max_close" : { "max" : { "field" : "Close" } },
        "min_close" : { "min" : { "field" : "Close" } },
        "total_close" : { "sum" : { "field" : "Close" } },
        "count_close" : { "value_count" : { "field" : "Close" } }
    }
}

es.search(index='finance_history', body=query)

`Metric` aggregations also include multi-value aggregations, so the previous query can be replaced by `stats` or `extended_stats`

In [None]:
query = {
    "size": 0,
    "aggs": {
        "stats_close" : { "stats" : { "field" : "Close" } },
        "extended_stats_close" : { "extended_stats" : { "field" : "Close" } }
    }
}

es.search(index='finance_history', body=query)
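
For comparison, the five numbers reported by the `stats` aggregation can be reproduced in plain Python. This is only a sketch with invented prices: the local `stats` function is hypothetical and merely mirrors the shape of the result.

```python
def stats(values):
    """Compute the same five numbers that the `stats` aggregation reports."""
    values = list(values)
    total = sum(values)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": total / len(values),
        "sum": total,
    }

# Invented closing prices, only to show the shape of the result.
print(stats([10.0, 20.0, 30.0]))
```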

Suppose you need to find the average difference between the `Close` and `Low` values. In Elasticsearch you can write your own aggregation logic with the `scripted_metric` aggregation, which consists of four script stages:

|Execution|Description|
|:---|------------|
|`init_script`|Sets up any initial state.|
|`map_script`|Executed once per document collected. This is the only required script. If no **`combine_script`** is specified, the resulting state needs to be stored in an object named **`_agg`**.|
|`combine_script`|Executed once on each shard after document collection is complete. Allows the aggregation to consolidate the state returned from each shard. If a **`combine_script`** is not provided, the combine phase returns the aggregation variable.|
|`reduce_script`|Executed once on the coordinating node after all shards have returned their results. The script has access to a variable **`_aggs`**, which is an array of the results of the **`combine_script`** on each shard. If a **`reduce_script`** is not provided, the reduce phase returns the **`_aggs`** variable.|

In [None]:
query = {
    "size": 0,
    "aggs": {
        "average_of_diff" : { 
            "scripted_metric" : {
                "init_script" : "params._agg.transactions = []",
                "map_script" : "params._agg.transactions.add(doc.Close.value - doc.Low.value)", 
                "combine_script" : "float diff = 0; int count = 0; for (v in params._agg.transactions) { diff += v; count += 1 } return [diff, count]",
                "reduce_script" : "float diff = 0; int count = 0; for (v in params._aggs) { diff += v[0]; count += v[1] } return diff / count"
            }
        }
    }
}

es.search(index='finance_history', body=query)

* In the **`init_script`** we define the initial state. In Python terms, think of it as an empty list **`[]`**.
* In the **`map_script`** we compute the difference between the **`Close`** and **`Low`** values and add it to the list.
* In the **`combine_script`** we define two variables, **`diff`** and **`count`**. Then we iterate through all the differences found on the shard, sum them and count them, and return the sum and the count for that shard (there are 5 shards by default).
* In the **`reduce_script`** we compute the total sum of the differences and the total count over all shards, then return the average value.
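
The four stages can be mimicked in plain Python to see how the per-shard results are combined. This is an illustrative simulation only: the shard layout and the documents are invented, and the function names are hypothetical.

```python
def init_state():
    # init_script: start each shard with an empty list.
    return {"transactions": []}

def map_doc(state, doc):
    # map_script: runs once per document on the shard.
    state["transactions"].append(doc["Close"] - doc["Low"])

def combine(state):
    # combine_script: reduce one shard's state to [sum, count].
    return [sum(state["transactions"]), len(state["transactions"])]

def reduce_shards(shard_results):
    # reduce_script: merge the per-shard pairs on the coordinating node.
    total = sum(r[0] for r in shard_results)
    count = sum(r[1] for r in shard_results)
    return total / count

def average_diff(shards):
    results = []
    for docs in shards:
        state = init_state()
        for doc in docs:
            map_doc(state, doc)
        results.append(combine(state))
    return reduce_shards(results)

# Two hypothetical shards with invented documents.
shards = [
    [{"Close": 10.0, "Low": 8.0}, {"Close": 5.0, "Low": 4.0}],
    [{"Close": 7.0, "Low": 4.0}],
]
print(average_diff(shards))
```

Returning a sum and a count from each shard, rather than a per-shard average, is what keeps the final result correct when shards hold different numbers of documents.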

### Mapping in Elasticsearch

Mapping is the process of defining how a document and its fields are stored and indexed.

To be able to treat date fields as dates, numeric fields as numbers, and string fields as full-text or exact value strings, Elasticsearch needs to know what type of data each field contains. This information is included in the mapping. We can use mappings to define:

* which string fields should be treated as full text fields.
* which fields contain numbers, dates, or geolocations.
* whether the values of all fields in the document should be indexed into the catch-all [_all](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-all-field.html) field.
* the [format](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-date-format.html) of date values.

What you should know about mapping:
* existing type and field mappings cannot be changed, because that would invalidate already indexed documents. Instead, you should create a new index with the correct mappings and reindex your data into that index.
* fields and mapping types don’t need to be defined before use: thanks to [dynamic mapping](https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-mapping.html), new mapping types and new field names are added automatically, just by indexing a document.

To see an existing mapping, use the **`indices`** attribute and its **`get_mapping`** method. An Elasticsearch instance has the attributes cat, cluster, indices, ingest, nodes, snapshot and tasks that provide access to instances of [CatClient](https://elasticsearch-py.readthedocs.io/en/master/api.html#elasticsearch.client.CatClient), [ClusterClient](https://elasticsearch-py.readthedocs.io/en/master/api.html#elasticsearch.client.ClusterClient), [IndicesClient](https://elasticsearch-py.readthedocs.io/en/master/api.html#elasticsearch.client.IndicesClient), [IngestClient](https://elasticsearch-py.readthedocs.io/en/master/api.html#elasticsearch.client.IngestClient), [NodesClient](https://elasticsearch-py.readthedocs.io/en/master/api.html#elasticsearch.client.NodesClient), [SnapshotClient](https://elasticsearch-py.readthedocs.io/en/master/api.html#elasticsearch.client.SnapshotClient) and [TasksClient](https://elasticsearch-py.readthedocs.io/en/master/api.html#elasticsearch.client.TasksClient) respectively.
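
As a rough intuition for dynamic mapping, the sketch below guesses a field type from a JSON value. It is a deliberate simplification: `guess_type` is a hypothetical helper, date detection is reduced to a single `yyyy-MM-dd` pattern, and arrays and `null` values are not handled.

```python
import re

DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")  # simplification of date detection

def guess_type(value):
    """Rough approximation of the type dynamic mapping picks for a JSON value."""
    if isinstance(value, bool):      # must be checked before int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "date" if DATE_RE.match(value) else "text"
    if isinstance(value, dict):
        return "object"
    raise TypeError("unsupported value: {!r}".format(value))

doc = {"Date": "2012-01-02", "Open": 408.24, "Volume": 67817400}
print({field: guess_type(value) for field, value in doc.items()})
```

The authoritative answer for a real index is always the output of `es.indices.get_mapping`.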

In [None]:
es.indices.get_mapping(index='finance_history')

With the help of a mapping you can create indices in which the document fields have predefined types. Let's create a new index with a mapping. For example, we create an index `movies` with the type `movie` that will contain the following data about movies: `"title"` (the movie's title), `"director"`, `"release date"` and `"rating"`, each with a specific type

In [None]:
query = {
    "mappings": {
        "movie" : {
            "properties" : {
                "title": {"type": "string"},
                "director": {"type": "string"},
                "release date": {"type": "date", "format": "yyyy-MM-dd"},
                "rating": {"type": "float"},
            }
        }
    }
}

es.indices.create(index='movies', body=query)

Insert a new document

In [None]:
data = {
    "title": "Forrest Gump",
    "director": "Robert Zemeckis",
    "release date": "1994-07-01",
    "rating": 9
}
# Create
es.create(index='movies', doc_type='movie', id=1, body=data)
# and check at once
es.get(index='movies', doc_type='movie', id=1)

In [None]:
es.indices.get_mapping(index='movies')

Although we wrote an `integer` value into the `"rating"` field, it will be processed as a `float`, for example in aggregations. To delete an index, use the **`delete`** method of **`indices`**.

In [None]:
es.indices.delete(index='movies')

### Full-text search in Elasticsearch

Let’s compare the speed of a pure Python search with that of Elasticsearch. For this example you need to download the [Gutenberg dataset](https://docs.google.com/uc?id=0B2Mzhc7popBga2RkcWZNcjlRTGM&export=download) and extract it. After that, create a mapping for the new index.

In [None]:
query = {
    "mappings": {
        "book" : {
            "properties" : {
                "author": {"type": "string"},
                "title": {"type": "string"},
                "story": {"type": "string"}
            }           
        }
    }
}
es.indices.create(index='books', body=query)

Get all file names from the Gutenberg dataset. If you extracted the dataset into the folder that contains the notebook, don't edit the path, just choose the cell for your system (below we provide commands for both Linux and Windows). Otherwise, please adjust the path to the dataset.

In [None]:
# If you use Linux
book_names = !ls Gutenberg/txt/

In [None]:
# If you use Windows (/b prints bare file names only)
book_names = !dir /b Gutenberg\txt
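
A platform-independent alternative to the two shell commands is to list the folder from Python. The sketch below uses a hypothetical `list_books` helper and demonstrates it on a temporary folder with made-up file names. For the real dataset you would call it with the folder you extracted, for example `list_books('Gutenberg/txt/')`.

```python
import os
import tempfile

def list_books(folder):
    """Return the sorted names of all .txt files in `folder`."""
    return sorted(name for name in os.listdir(folder) if name.endswith(".txt"))

# Demo on a temporary folder with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["B___Two.txt", "A___One.txt", "notes.md"]:
        open(os.path.join(tmp, name), "w").close()
    demo_names = list_books(tmp)
print(demo_names)
```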

Add the dataset to Elasticsearch. It takes around 15 minutes.

In [None]:
# Import time to see difference
from time import time

# Please set the correct path to the dataset 
path = 'Gutenberg/txt/'

t0 = time()
for id_book, book in enumerate(book_names):
    # Open the downloaded dataset to understand why we process it in such way
    book_s = book.split('___')
    author, title = book_s[0], book_s[1][:-4].replace('"', '')
    
    with open(path + book, encoding='iso-8859-15') as f:
        story = " ".join([i for i in f.read().splitlines() if i != ''])

    doc = {
        "author": author,
        "title": title,
        "story": story
    }
    es.create(index='books', doc_type='book', id=id_book, body=doc)
es_add_t = time() - t0
print("Dataset added to Elasticsearch in {} seconds".format(es_add_t))
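
Indexing documents one request at a time is simple but slow. One option is to prepare the documents as bulk actions instead. The sketch below only builds such actions: `parse_book_name` and `make_action` are hypothetical helpers, and the story text is a placeholder.

```python
def parse_book_name(filename):
    """Split 'Author___Title.txt' into (author, title)."""
    author, rest = filename.split("___", 1)
    return author, rest[:-4].replace('"', "")

def make_action(book_id, filename, story):
    """Build one action dict in the format used by the client's bulk helpers."""
    author, title = parse_book_name(filename)
    return {
        "_index": "books",
        "_type": "book",
        "_id": book_id,
        "_source": {"author": author, "title": title, "story": story},
    }

action = make_action(
    0, "John Dryden___Discourses on Satire and Epic Poetry.txt", "placeholder text"
)
print(action["_source"]["author"], "|", action["_source"]["title"])
```

A generator of such actions could then be passed to `elasticsearch.helpers.bulk(es, actions)`. How much time this saves depends on your machine and on the document sizes.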

Find out how many books are in the dataset.

In [None]:
es.search(index='books', size=0)

Thus, we have 3036 books in Elasticsearch. Let's take a line from a book in the middle of the dataset, `John Dryden___Discourses on Satire and Epic Poetry.txt`. We will search all books for the phrase **`Pulverulenta putrem sonitu quatit ungula campum.`** with the help of Elasticsearch.

In [None]:
t0 = time()
query = {
    "query": {
        "match_phrase" : {
            "story" : "Pulverulenta putrem sonitu quatit ungula campum."
        }
    }
}
print(es.search(index='books', body=query, size=5, _source=['author', 'title']))
es_search_t = time() - t0
print("Elasticsearch find matches in {} seconds".format(es_search_t))

Now let's find this line with pure Python

In [None]:
t0 = time()
def python_search(s):
    python_data = []
    for id_book, book in enumerate(book_names):
        book_s = book.split('___')
        author, title = book_s[0], book_s[1][:-4].replace('"', '')

        with open(path + book, encoding='iso-8859-15') as f:
            story = " ".join([i for i in f.read().splitlines() if i != ''])
        
        if s in story:
            doc = {
                "author": author,
                "title": title,
                "story": story
            }
            python_data.append(doc)
    return python_data

s = "Pulverulenta putrem sonitu quatit ungula campum."
matches = python_search(s)
python_search_t = time() - t0
print("Python find matches in {} seconds".format(python_search_t))

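As you can see, Elasticsearch worked much faster than Python. Note that the Python timing includes reading every file from disk, while Elasticsearch queries an index that was built in advance.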
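
A large part of the speed difference comes from the inverted index that Lucene maintains: a query looks up terms instead of scanning every text. The toy version below shows the idea on an invented mini-corpus. It is a sketch only, and Lucene's real data structures are far more elaborate (term positions for phrase queries, scoring, compression).

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase token to the set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def search_all_terms(index, query):
    """Return ids of documents that contain every token of the query."""
    postings = [index.get(token, set()) for token in query.lower().split()]
    return set.intersection(*postings) if postings else set()

# Invented mini-corpus.
docs = {1: "the quick brown fox", 2: "the lazy dog", 3: "quick dog"}
index = build_index(docs)
print(search_all_terms(index, "quick dog"))
```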

### Visualization using Kibana

Elasticsearch output can be visualized with the help of [Kibana](https://www.elastic.co/products/kibana), a data visualization plugin for Elasticsearch. Kibana enables visual exploration and real-time analysis of your data in Elasticsearch.

First you need to install Kibana. Go to this [page](https://www.elastic.co/downloads/kibana) and download Kibana to the folder where you saved Elasticsearch and Logstash.

> Note: Kibana and Elasticsearch must be of the same version!

To run Kibana, execute this command in the Kibana directory:

    ./bin/kibana

When Kibana is running you can open the url http://localhost:5601. 

On first start, Kibana will ask you to configure an index. Set the index to **`finance_history`** and disable **`Index contains time-based events`**, then click the `Create` button

<img src="images/k01.png">

Now go to the **`Visualize`** tab and create a new **`Area chart`**.

<img src="images/k02.png">

When you choose the area chart, set the index to **`finance_history`** (the index we are currently using). Now you can build the area chart. On the Y-Axis, choose the **`Average`** of the **`Close`** values.

<img src="images/k03.png">

In the **`buckets`** area choose X-Axis and set **`Aggregation`** to **`Date Histogram`**, **`Field`** to **`Date`** and **`Interval`** to **`Monthly`**.

<img src="images/k04.png">
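
What this chart computes, a monthly date histogram with an average per bucket, can be sketched in plain Python. The rows below are invented and `monthly_average` is a hypothetical helper. Kibana delegates the actual calculation to Elasticsearch aggregations.

```python
from collections import defaultdict

def monthly_average(rows, field="Close"):
    """Group rows by 'YYYY-MM' and average `field` within each month."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row["Date"][:7]].append(row[field])
    return {month: sum(v) / len(v) for month, v in sorted(buckets.items())}

# Invented rows, only to show the grouping.
rows = [
    {"Date": "2016-01-04", "Close": 10.0},
    {"Date": "2016-01-05", "Close": 20.0},
    {"Date": "2016-02-01", "Close": 30.0},
]
print(monthly_average(rows))
```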

In the top right corner click the **`Save`** button and save the area chart. Then go back to **`Visualize`** and create a new **`Vertical bar chart`**.

On the Y-Axis, choose the **`Median`** of the **`Volume`** values. Configure the X-Axis as in the **`Area chart`**, except for **`Interval`**, which should be set to **`Weekly`**. In the top right corner you can also choose the colors of the bars. Finally, save the chart.

<img src="images/k05.png">

Next create the **`Metric`** and save it.

<img src="images/k06.png">

Then go to the **`Dashboard`** tab and add saved charts and metrics.

<img src="images/k07.png">

You can also save the dashboard. 

As you can see, visualization with Kibana is very easy. Visualization helps you understand the data more clearly.