## Chapter 1 - Searching Elasticsearch

In [26]:
!curl -H "Content-Type: application/json" -XGET 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/shakespeare/_search?pretty' -d '{"query" : {"match_phrase" : {"text_entry" : "to be or not to be"}}}'

{
  "took" : 67,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 13.889601,
    "hits" : [
      {
        "_index" : "shakespeare",
        "_type" : "_doc",
        "_id" : "34229",
        "_score" : 13.889601,
        "_source" : {
          "type" : "line",
          "line_id" : 34230,
          "play_name" : "Hamlet",
          "speech_number" : 19,
          "line_number" : "3.1.64",
          "speaker" : "HAMLET",
          "text_entry" : "To be, or not to be: that is the question:"
        }
      }
    ]
  }
}


## Chapter 2 - Mapping and Indexing Data

### Mappings
    A mapping is a schema definition. Elasticsearch has reasonable defaults, but sometimes you need to customize them.

Example 

    !curl -H "Content-Type: application/json" -XPUT 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/movies' -d '{ "mappings" : { "properties" : { "year" : {"type": "date"}}}}'
***
### Common Mappings
***
#### Field types  
text, keyword (these two replaced the single string type in 5.x), byte, short, integer, long, float, double, boolean, date

        "properties": {
            "user_id": {
                "type": "long"
            }
        }

#### Field Index  
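Do you want this field indexed for full-text search? Versions before 5.x used analyzed / not_analyzed / no. From 5.x on, use the text type for analyzed fields, the keyword type for exact-match (formerly not_analyzed) fields, and "index": false for fields that should not be searchable at all.

        "properties": {
            "genre": {
                "type": "keyword"
            }
        }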

#### Field Analyzer
Choose the analyzer (tokenizer plus token filters) for the field: standard / whitespace / simple / english, etc.

        "properties": {
            "description": {
                "type": "text",
                "analyzer": "english"
            }
        }
***
### More About Analyzers
***
#### Character Filters
    Preprocess the raw text before tokenizing, e.g. strip HTML markup or convert & to and
    
#### Tokenizer
    Split strings on whitespace / punctuation / non-letters
    
#### Token Filter
    Lowercasing, stemming, synonyms, stopword removal*
    
    *Removing stopwords can make phrase searches difficult: "to be or not to be" consists entirely of common English stopwords.
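
To make the three stages concrete, here is a toy analyzer pipeline in plain Python. It only imitates the idea (character filter, then tokenizer, then token filters); the regexes and the small stopword list are illustrative assumptions, not Elasticsearch's actual implementations.

```python
import html
import re

# Illustrative stopword subset (assumption), not Elasticsearch's full English list.
STOPWORDS = {"a", "an", "the", "of", "to", "be", "or", "not"}


def char_filter(text: str) -> str:
    """Character filter stage: strip HTML tags, decode entities, turn '&' into 'and'."""
    text = re.sub(r"<[^>]+>", " ", text)  # drop tags such as <b>
    text = html.unescape(text)            # &amp; -> &
    return text.replace("&", " and ")


def tokenize(text: str) -> list:
    """Tokenizer stage: split on anything that is not a letter or digit."""
    return re.findall(r"[A-Za-z0-9]+", text)


def token_filter(tokens: list, remove_stopwords: bool = True) -> list:
    """Token filter stage: lowercase, then optionally drop stopwords."""
    tokens = [t.lower() for t in tokens]
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens


def analyze(text: str, remove_stopwords: bool = True) -> list:
    return token_filter(tokenize(char_filter(text)), remove_stopwords)


print(analyze("<b>Tom &amp; Jerry</b>"))  # ['tom', 'and', 'jerry']
print(analyze("To be or not to be"))       # [] (every token is a stopword)
print(analyze("To be or not to be", remove_stopwords=False))
```

The empty result for the Hamlet line is exactly why stopword removal can get in the way of phrase searches.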

***
### Choices For Analyzers

#### Standard
    Splits on word boundaries, removes punctuation, lowercases.  Good choice if language is unknown.
#### Simple
    Splits on anything that isn't a letter, and lowercases.
#### Whitespace
    Splits on whitespace but doesn't lowercase.
#### Language
    Accounts for language-specific stopwords and stemming.
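
A rough side-by-side comparison of the first three analyzers, written as regex approximations. The real standard tokenizer follows Unicode text segmentation rules (UAX #29), so treat these functions as sketches of the splitting rules described above, not exact reproductions.

```python
import re


def standard_like(text: str) -> list:
    """Approximates 'standard': split on word boundaries, drop punctuation, lowercase."""
    return re.findall(r"[^\W_]+", text.lower())


def simple_like(text: str) -> list:
    """Approximates 'simple': split on anything that is not a letter, lowercase."""
    return re.findall(r"[^\W\d_]+", text.lower())


def whitespace_like(text: str) -> list:
    """Approximates 'whitespace': split on whitespace only, keep case and punctuation."""
    return text.split()


sample = "Sci-Fi R2-D2, IMAX!"
for name, fn in [("standard", standard_like), ("simple", simple_like), ("whitespace", whitespace_like)]:
    print(f"{name:10} {fn(sample)}")
```

Under the first two, "Sci-Fi" becomes the separate tokens sci and fi, which is why an analyzed field can match a search for just "sci".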
***

### Importing a Single Movie (JSON/REST)

In [28]:
!curl -H "Content-Type: application/json" -XPUT 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/movies/_doc/109487' -d '{"genre":["IMAX","Sci-Fi"], "title": "Interstellar", "year": 2014}'

{"_index":"movies","_type":"_doc","_id":"109487","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}

In [29]:
!curl -H "Content-Type: application/json" -XGET 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/movies/_mapping' 

{"movies":{"mappings":{"properties":{"genre":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"title":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"year":{"type":"long"}}}}}

In [30]:
!curl -H "Content-Type: application/json" -XPOST 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/movies/_doc/109487' -d '{"genre":["IMAX","Sci-Fi"], "title": "Interstellar", "year": 2014}'

{"_index":"movies","_type":"_doc","_id":"109487","_version":2,"result":"updated","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":1,"_primary_term":1}

In [31]:
!curl -H "Content-Type: application/json" -XGET 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/movies/_search?pretty' 

{
  "took" : 944,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "109487",
        "_score" : 1.0,
        "_source" : {
          "genre" : [
            "IMAX",
            "Sci-Fi"
          ],
          "title" : "Interstellar",
          "year" : 2014
        }
      }
    ]
  }
}


### Import Many Documents using Bulk

In [84]:
#!curl -H "Content-Type: application/json" -XPUT 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/_bulk?pretty' --data-binary @movies.json
!curl -H "Content-Type: application/json" -XGET 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/_search?pretty'

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "109487",
        "_score" : 1.0,
        "_source" : {
          "genre" : [
            "IMAX",
            "Sci-Fi"
          ],
          "title" : "Interstellar",
          "year" : 2014
        }
      },
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "135569",
        "_score" : 1.0,
        "_source" : {
          "id" : "135569",
          "title" : "Star Trek Beyond",
          "year" : 2016,
          "genre" : [
            "Action",
            "Adventure",
            "Sci-Fi"
          ]
        }
      },
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "122886",
        "_score"

***
### Updating Data in Elasticsearch 
#### Versions
  * Every document has a _version field 
  * Elasticsearch documents are __immutable__ 
  * When you update an existing document: 
      * a new document is created with an incremented _version 
      * the old document is marked for deletion 
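
A toy in-memory model of the bullets above: every write stores a fresh copy with an incremented version, and the previous copy is only marked as deleted. In Elasticsearch the marked copies are purged later, when Lucene merges segments; this sketch (the class name is hypothetical) simply keeps them in a list.

```python
class ToyVersionedIndex:
    """Toy model of versioned, immutable documents (not Elasticsearch's real storage)."""

    def __init__(self):
        self.live = {}            # _id -> (_version, source)
        self.marked_deleted = []  # superseded copies awaiting cleanup

    def index(self, doc_id, source):
        """Write a new immutable copy; the previous copy is only marked as deleted."""
        old = self.live.get(doc_id)
        version = 1 if old is None else old[0] + 1
        if old is not None:
            self.marked_deleted.append((doc_id, old))
        self.live[doc_id] = (version, dict(source))
        return version


idx = ToyVersionedIndex()
idx.index("109487", {"title": "Interstellar", "year": 2014})
v = idx.index("109487", {"title": "Interstellar foo", "year": 2014})
print(v, idx.live["109487"], idx.marked_deleted)
```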
      
#### Partial Update API
    curl -H "Content-Type: application/json" -XPOST 127.0.0.1:9200/movies/_doc/109487/_update -d '
    {
        "doc": {
            "title": "Interstellar"
        }
    }'
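
The merge behaviour of a partial update can be sketched in a few lines of Python. This is a model under stated assumptions, not Elasticsearch's code: it merges nested objects key by key and replaces arrays and scalars wholesale, which is consistent with the update results recorded in this notebook (a genres array replaced outright, a host object gaining new keys). The function name is hypothetical.

```python
import copy


def apply_partial_update(source: dict, partial: dict) -> dict:
    """Sketch of merging a partial "doc" into the stored _source.

    Nested objects are merged key by key; scalars and arrays in the partial
    document replace the stored value. The stored dict is not modified.
    """
    merged = copy.deepcopy(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_partial_update(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


stored = {"title": "Jons movie in Elastic", "genres": ["Documentary"], "year": 2019}
print(apply_partial_update(stored, {"genres": ["Documentary", "Comedy"]}))
```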

In [85]:
!curl -H "Content-Type: application/json" -XPUT  'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/movies/_doc/109487?pretty' -d '{"genres": ["IMAX", "Sci-Fi"], "title": "Interstellar foo","year":2014}'

{
  "_index" : "movies",
  "_type" : "_doc",
  "_id" : "109487",
  "_version" : 3,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 6,
  "_primary_term" : 1
}


In [86]:
!curl -H "Content-Type: application/json" -XGET 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/movies/_doc/109487?pretty' 

{
  "_index" : "movies",
  "_type" : "_doc",
  "_id" : "109487",
  "_version" : 3,
  "_seq_no" : 6,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "genres" : [
      "IMAX",
      "Sci-Fi"
    ],
    "title" : "Interstellar foo",
    "year" : 2014
  }
}


In [88]:
!curl -H "Content-Type: application/json" -XPOST  'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/movies/_doc/109487/_update' -d '{"doc": {"title": "Interstellar","year":2014}}'

{"_index":"movies","_type":"_doc","_id":"109487","_version":4,"result":"updated","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":7,"_primary_term":1}

In [89]:
!curl -H "Content-Type: application/json" -XGET 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/movies/_doc/109487?pretty' 

{
  "_index" : "movies",
  "_type" : "_doc",
  "_id" : "109487",
  "_version" : 4,
  "_seq_no" : 7,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "genres" : [
      "IMAX",
      "Sci-Fi"
    ],
    "title" : "Interstellar",
    "year" : 2014
  }
}


***
### Deleting Data in Elasticsearch
    Just use the DELETE method:
        curl -XDELETE 127.0.0.1:9200/movies/_doc/58559
        
 

In [90]:
!curl -H "Content-Type: application/json" -XGET 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/movies/_search?q=Dark' 

{"took":704,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":1,"relation":"eq"},"max_score":1.5169399,"hits":[{"_index":"movies","_type":"_doc","_id":"58559","_score":1.5169399,"_source":{ "id": "58559", "title" : "Dark Knight, The", "year":2008 , "genre":["Action", "Crime", "Drama", "IMAX"] }}]}}

In [91]:
!curl -H "Content-Type: application/json" -XDELETE 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/movies/_doc/58559?pretty' 

{
  "_index" : "movies",
  "_type" : "_doc",
  "_id" : "58559",
  "_version" : 2,
  "result" : "deleted",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 8,
  "_primary_term" : 1
}


In [92]:
!curl -H "Content-Type: application/json" -XGET 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/movies/_search?q=Dark'

{"took":4,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":0,"relation":"eq"},"max_score":null,"hits":[]}}

***
### Summary of Insert/Update/Delete

In [93]:
!curl -H "Content-Type: application/json" -XPUT 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/movies/_doc/200000?pretty' -d '{"title": "Jons movie in Elastic", "genres":["Documentary"], "year": 2019}'

{
  "_index" : "movies",
  "_type" : "_doc",
  "_id" : "200000",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 9,
  "_primary_term" : 1
}


In [94]:
!curl -H "Content-Type: application/json" -XGET 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/movies/_doc/200000?pretty'

{
  "_index" : "movies",
  "_type" : "_doc",
  "_id" : "200000",
  "_version" : 1,
  "_seq_no" : 9,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "title" : "Jons movie in Elastic",
    "genres" : [
      "Documentary"
    ],
    "year" : 2019
  }
}


In [95]:
!curl -H "Content-Type: application/json" -XPOST 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/movies/_doc/200000/_update' -d '{"doc": {"genres":["Documentary", "Comedy"]}}'

{"_index":"movies","_type":"_doc","_id":"200000","_version":2,"result":"updated","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":10,"_primary_term":1}

In [96]:
!curl -H "Content-Type: application/json" -XGET 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/movies/_doc/200000?pretty'

{
  "_index" : "movies",
  "_type" : "_doc",
  "_id" : "200000",
  "_version" : 2,
  "_seq_no" : 10,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "title" : "Jons movie in Elastic",
    "genres" : [
      "Documentary",
      "Comedy"
    ],
    "year" : 2019
  }
}


In [97]:
!curl -H "Content-Type: application/json" -XDELETE 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/movies/_doc/200000?pretty'

{
  "_index" : "movies",
  "_type" : "_doc",
  "_id" : "200000",
  "_version" : 3,
  "result" : "deleted",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 11,
  "_primary_term" : 1
}
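
The same create/read/update/delete cycle expressed as Python requests, using only the standard library. This is a sketch: ES_URL is a placeholder, the requests are built but not sent, and the update uses the /movies/_update/200000 endpoint form (as the later demo-flattened update does) rather than the older _doc/200000/_update form used above.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # placeholder; substitute your own node's address
HEADERS = {"Content-Type": "application/json"}


def es_request(method: str, path: str, body=None) -> urllib.request.Request:
    """Build (without sending) an Elasticsearch REST request."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(ES_URL + path, data=data, headers=HEADERS, method=method)


lifecycle = [
    es_request("PUT", "/movies/_doc/200000",
               {"title": "Jons movie in Elastic", "genres": ["Documentary"], "year": 2019}),
    es_request("GET", "/movies/_doc/200000"),
    es_request("POST", "/movies/_update/200000", {"doc": {"genres": ["Documentary", "Comedy"]}}),
    es_request("DELETE", "/movies/_doc/200000"),
]

for req in lifecycle:
    print(req.get_method(), req.full_url)
    # To actually run one: urllib.request.urlopen(req).read()
```

Building the JSON with json.dumps also avoids the shell-quoting mistakes that are easy to make with -d '...'.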


***
### Dealing with Concurrency

What happens when two different clients are trying to do the same thing at the same time?

#### The Problem
    2 different clients read the view count for a page
    both get 10
    both increment it and write back 11 at the same time, but it should be 12 because two views happened. This becomes a bigger problem at larger scales.
    
#### Solution - Optimistic Concurrency Control
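    2 different clients read the view count for a page
    both get 10, along with the document's _seq_no and _primary_term
    each writes 11 only if _seq_no and _primary_term are still the values it read
    The first write succeeds (11); the second gets a version conflict error and must re-read and retry.
    * Use retry_on_conflict=N on the update API to retry automatically *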
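
The read, conditional write, and retry cycle can be sketched with a toy in-memory document. ToyDoc, put_if and increment_views are hypothetical names for illustration; the real mechanism is the if_seq_no / if_primary_term query parameters used in the cells below and the update API's retry_on_conflict. One simplification: here _seq_no goes up by one per write to this document, whereas Elasticsearch assigns it per shard, so it can jump by more than one between writes to the same document.

```python
class VersionConflict(Exception):
    """Raised when the caller's _seq_no/_primary_term no longer match (HTTP 409 in Elasticsearch)."""


class ToyDoc:
    """In-memory document guarded by _seq_no and _primary_term (toy model)."""

    def __init__(self, source: dict):
        self.source = dict(source)
        self.seq_no = 0
        self.primary_term = 1

    def get(self):
        return dict(self.source), self.seq_no, self.primary_term

    def put_if(self, source: dict, if_seq_no: int, if_primary_term: int):
        """Write only if nobody else has written since our read."""
        if (if_seq_no, if_primary_term) != (self.seq_no, self.primary_term):
            raise VersionConflict(f"required seqNo [{if_seq_no}], current seqNo [{self.seq_no}]")
        self.source = dict(source)
        self.seq_no += 1


def increment_views(doc: ToyDoc, retries: int = 5) -> bool:
    """Read-modify-write that re-reads and retries on conflict, like retry_on_conflict=N."""
    for _ in range(retries + 1):
        src, seq_no, term = doc.get()
        src["views"] += 1
        try:
            doc.put_if(src, seq_no, term)
            return True
        except VersionConflict:
            continue
    return False


page = ToyDoc({"views": 10})
a_src, a_seq, a_term = page.get()  # client A reads 10
b_src, b_seq, b_term = page.get()  # client B also reads 10
a_src["views"] += 1
b_src["views"] += 1
page.put_if(a_src, a_seq, a_term)  # A's write succeeds: views = 11
try:
    page.put_if(b_src, b_seq, b_term)  # B's _seq_no is stale
except VersionConflict as err:
    print("conflict:", err)
    increment_views(page)  # B re-reads 11 and writes 12
print(page.source["views"])
```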

In [99]:
!curl -H "Content-Type: application/json" -XGET 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/movies/_doc/109487?pretty'

{
  "_index" : "movies",
  "_type" : "_doc",
  "_id" : "109487",
  "_version" : 4,
  "_seq_no" : 7,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "genres" : [
      "IMAX",
      "Sci-Fi"
    ],
    "title" : "Interstellar",
    "year" : 2014
  }
}


**We can make this update conditional on a specific _seq_no and _primary_term**

In [100]:
!curl -H "Content-Type: application/json" -XPUT 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/movies/_doc/109487?if_seq_no=7&if_primary_term=1' -d '{"genres": ["IMAX", "Sci-Fi"], "title": "Interstellar foo","year":2014}'

{"_index":"movies","_type":"_doc","_id":"109487","_version":5,"result":"updated","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":12,"_primary_term":1}

In [101]:
!curl -H "Content-Type: application/json" -XPUT 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/movies/_doc/109487?if_seq_no=7&if_primary_term=1' -d '{"genres": ["IMAX", "Sci-Fi"], "title": "Interstellar foo","year":2014}'

{"error":{"root_cause":[{"type":"version_conflict_engine_exception","reason":"[109487]: version conflict, required seqNo [7], primary term [1]. current document has seqNo [12] and primary term [1]","index_uuid":"6caWD2OkRTCH1cZTXOhwng","shard":"0","index":"movies"}],"type":"version_conflict_engine_exception","reason":"[109487]: version conflict, required seqNo [7], primary term [1]. current document has seqNo [12] and primary term [1]","index_uuid":"6caWD2OkRTCH1cZTXOhwng","shard":"0","index":"movies"},"status":409}

**We get a 409 version conflict because the document's _seq_no is now 12, not 7, so we retry with if_seq_no=12**

In [102]:
!curl -H "Content-Type: application/json" -XPUT 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/movies/_doc/109487?if_seq_no=12&if_primary_term=1' -d '{"genres": ["IMAX", "Sci-Fi"], "title": "Interstellar foo","year":2014}'

{"_index":"movies","_type":"_doc","_id":"109487","_version":6,"result":"updated","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":13,"_primary_term":1}

In [113]:
!curl -H "Content-Type: application/json" -XPOST 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/movies/_doc/109487?_update?retry_on_conflict=5' -d '{"doc": {"title": "Interstellar typo"}}'

{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"request [/movies/_doc/109487] contains unrecognized parameter: [_update?retry_on_conflict]"}],"type":"illegal_argument_exception","reason":"request [/movies/_doc/109487] contains unrecognized parameter: [_update?retry_on_conflict]"},"status":400}

**This request fails with a 400 because the URL is malformed: the query string "_update?retry_on_conflict=5" is parsed as a single parameter named "_update?retry_on_conflict". The intended request is a POST to movies/_update/109487?retry_on_conflict=5 (or movies/_doc/109487/_update?retry_on_conflict=5) on the same host, so the document was not updated.**

In [110]:
!curl -H "Content-Type: application/json" -XGET 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/movies/_doc/109487?pretty'

{
  "_index" : "movies",
  "_type" : "_doc",
  "_id" : "109487",
  "_version" : 6,
  "_seq_no" : 13,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "genres" : [
      "IMAX",
      "Sci-Fi"
    ],
    "title" : "Interstellar foo",
    "year" : 2014
  }
}


***
### Using Analyzers and Tokenizers
#### Controlling Full-Text Search
#### Using Analyzers
    Sometimes text fields should be exact-match
    * Use keyword mapping instead of text
    
    Searches on analyzed text fields return anything that is even partially relevant
    * Depending on the analyzer, results will be case-insensitive, stemmed, stopwords removed, synonyms applied, etc.
    * Searches with multiple terms need not match them all
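
A toy comparison of the two behaviours. Counting shared terms is only a crude stand-in for relevance (Elasticsearch scores hits with BM25 by default), and the regex is an approximation of the standard analyzer, not the real one.

```python
import re


def analyze(text: str) -> set:
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return set(re.findall(r"[^\W_]+", text.lower()))


def match_text(query: str, value: str) -> int:
    """Analyzed text field: number of query terms found (0 means no hit)."""
    return len(analyze(query) & analyze(value))


def match_keyword(query: str, value: str) -> bool:
    """Keyword field: the whole stored value must equal the query exactly."""
    return query == value


titles = ["Star Trek Beyond", "Star Wars: Episode VII - The Force Awakens", "Interstellar"]
for title in titles:
    print(title, "->", match_text("Star Trek", title))

print(match_text("sci", "Sci-Fi"), match_keyword("sci", "Sci-Fi"), match_keyword("Sci-Fi", "Sci-Fi"))
```

A multi-term query still hits a title that shares only one term, just with a weaker score, while a keyword field accepts nothing but the exact stored value.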
    
    

In [114]:
!curl -H "Content-Type: application/json" -XGET 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/_search?pretty' -d '{"query":{"match":{"title": "Star Trek"}}}'

{
  "took" : 28,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 2.1566045,
    "hits" : [
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "135569",
        "_score" : 2.1566045,
        "_source" : {
          "id" : "135569",
          "title" : "Star Trek Beyond",
          "year" : 2016,
          "genre" : [
            "Action",
            "Adventure",
            "Sci-Fi"
          ]
        }
      },
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "122886",
        "_score" : 0.5480699,
        "_source" : {
          "id" : "122886",
          "title" : "Star Wars: Episode VII - The Force Awakens",
          "year" : 2015,
          "genre" : [
            "Action",
            "Adventure",
            "Fantasy",
            "Sci-Fi",
            "I

**Star Wars is only a partial match (it shares just "star" with the query), so its score (0.548) is far below Star Trek Beyond's (2.157)**

In [115]:
!curl -H "Content-Type: application/json" -XGET 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/movies/_search?pretty' -d '{"query":{"match_phrase": {"genre": "sci"}}}'

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.15275992,
    "hits" : [
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "1924",
        "_score" : 0.15275992,
        "_source" : {
          "id" : "1924",
          "title" : "Plan 9 from Outer Space",
          "year" : 1959,
          "genre" : [
            "Horror",
            "Sci-Fi"
          ]
        }
      },
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "135569",
        "_score" : 0.13786995,
        "_source" : {
          "id" : "135569",
          "title" : "Star Trek Beyond",
          "year" : 2016,
          "genre" : [
            "Action",
            "Adventure",
            "Sci-Fi"
          ]
        }
      },
      {
        "_index" : "movies",
        "_type" 

In [116]:
!curl -H "Content-Type: application/json" -XDELETE 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/movies'

{"acknowledged":true}

In [117]:
!curl -H "Content-Type: application/json" -XPUT 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/movies/' -d '{"mappings": {"properties": {"id": {"type": "integer"}, "year":{"type": "date"}, "genre":{"type":"keyword"}, "title": {"type": "text", "analyzer":"english"}}}}'

{"acknowledged":true,"shards_acknowledged":true,"index":"movies"}

In [119]:
#!curl -H "Content-Type: application/json" -XPUT 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/_bulk?pretty' --data-binary '@movies.json'

In [120]:
!curl -H "Content-Type: application/json" -XGET 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/movies/_search?pretty' -d '{"query":{"match_phrase": {"genre": "sci"}}}'

{
  "took" : 761,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}


**genre is now a keyword field, so it is no longer analyzed: "sci" no longer matches, and the query must match the stored value EXACTLY ("Sci-Fi")**

In [121]:
!curl -H "Content-Type: application/json" -XGET 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/movies/_search?pretty' -d '{"query":{"match_phrase": {"genre": "Sci-Fi"}}}'

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : 0.40025333,
    "hits" : [
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "135569",
        "_score" : 0.40025333,
        "_source" : {
          "id" : "135569",
          "title" : "Star Trek Beyond",
          "year" : 2016,
          "genre" : [
            "Action",
            "Adventure",
            "Sci-Fi"
          ]
        }
      },
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "122886",
        "_score" : 0.40025333,
        "_source" : {
          "id" : "122886",
          "title" : "Star Wars: Episode VII - The Force Awakens",
          "year" : 2015,
          "genre" : [
            "Action",
            "Adventure",
            "Fantasy",
            "Sci-Fi",
            

In [122]:
!curl -H "Content-Type: application/json" -XGET 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/movies/_search?pretty' -d '{"query":{"match_phrase": {"title": "star wars"}}}'

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.7228093,
    "hits" : [
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "122886",
        "_score" : 1.7228093,
        "_source" : {
          "id" : "122886",
          "title" : "Star Wars: Episode VII - The Force Awakens",
          "year" : 2015,
          "genre" : [
            "Action",
            "Adventure",
            "Fantasy",
            "Sci-Fi",
            "IMAX"
          ]
        }
      }
    ]
  }
}


***
### Strategies For Relational Data

#### Normalized data
* Look up rating $\rightarrow$ Rating (userID, movieID, rating) $\rightarrow$ Look up title $\rightarrow$ Movie (movieID, title, genres)
    * Minimizes storage space, makes it easy to change titles
    * But requires two queries, and storage is cheap! 

Do I need the ability to change the data easily? Can I tolerate the extra latency of the additional round trips? Every lookup will take multiple queries.

#### Denormalized Data
* Look up rating $\rightarrow$ Rating (userID, rating, title)
    * Titles are duplicated, but only one query
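
The trade-off can be made concrete with plain Python dictionaries standing in for two indices. The user IDs and ratings are hypothetical sample data (only the movie ID and title come from the movies index above), and the returned counts simply label how many queries each layout would need.

```python
# Hypothetical sample data; the movie ID and title come from the movies index.
movies = {"122886": {"title": "Star Wars: Episode VII - The Force Awakens"}}

# Normalized: each rating points at a movie by ID.
ratings_norm = [
    {"userID": 1, "movieID": "122886", "rating": 5.0},
    {"userID": 2, "movieID": "122886", "rating": 4.0},
]

# Denormalized: the title is copied into every rating.
ratings_denorm = [
    {"userID": 1, "rating": 5.0, "title": "Star Wars: Episode VII - The Force Awakens"},
    {"userID": 2, "rating": 4.0, "title": "Star Wars: Episode VII - The Force Awakens"},
]


def titles_for_user_norm(user):
    """Two queries: fetch the ratings, then look up the titles by movieID."""
    ids = [r["movieID"] for r in ratings_norm if r["userID"] == user]  # query 1
    return [movies[i]["title"] for i in ids], 2                         # query 2


def titles_for_user_denorm(user):
    """One query: the title is already in the rating document."""
    return [r["title"] for r in ratings_denorm if r["userID"] == user], 1


def docs_updated_on_rename(title):
    """Renaming a movie touches 1 doc when normalized, every copy when denormalized."""
    return 1, sum(r["title"] == title for r in ratings_denorm)


print(titles_for_user_norm(1))
print(titles_for_user_denorm(1))
print(docs_updated_on_rename("Star Wars: Episode VII - The Force Awakens"))
```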

In [1]:
!curl -H "Content-Type: application/json" -XPUT 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/series' -d '{"mappings": {"properties": {"film_to_franchise": { "type": "join", "relations": {"franchise": "film"}}}}}'

{"acknowledged":true,"shards_acknowledged":true,"index":"series"}

In [4]:
#!curl -H "Content-Type: application/json" -XPUT 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/_bulk?pretty' --data-binary @series.json
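
The contents of series.json aren't shown in this notebook. The sketch below is a hypothetical reconstruction of how such a bulk file could be generated, based on the documents returned by the searches that follow (the film_to_franchise field and a _routing of "1"). It assumes the 7.x bulk action metadata key routing; children are routed to their parent's shard because join queries only work within a single shard. Send the result with --data-binary, as in the commented cell, since curl's plain -d strips newlines from file input.

```python
import json


def bulk_body(index: str, franchise: dict, films: list) -> str:
    """NDJSON body for _bulk: an action line followed by a source line per document."""
    parent_id = franchise["id"]
    lines = [
        json.dumps({"create": {"_index": index, "_id": parent_id, "routing": parent_id}}),
        json.dumps({**franchise, "film_to_franchise": {"name": "franchise"}}),
    ]
    for film in films:
        # Children must be indexed on the parent's shard, so they share its routing value.
        lines.append(json.dumps({"create": {"_index": index, "_id": film["id"], "routing": parent_id}}))
        lines.append(json.dumps({**film, "film_to_franchise": {"name": "film", "parent": parent_id}}))
    return "\n".join(lines) + "\n"  # _bulk requires the body to end with a newline


body = bulk_body(
    "series",
    {"id": "1", "title": "Star Wars"},
    [{"id": "260", "title": "Star Wars: Episode IV - A New Hope", "year": "1977",
      "genre": ["Action", "Adventure", "Sci-Fi"]}],
)
print(body, end="")
```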

***
#### Search for Children of Franchise Parent

In [10]:
!curl -H "Content-Type: application/json" -XGET 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/series/_search?pretty' -d '{"query": { "has_parent": {"parent_type": "franchise", "query": { "match": { "title": "Star Wars"}}}}}' 

{
  "took" : 872,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 7,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "series",
        "_type" : "_doc",
        "_id" : "260",
        "_score" : 1.0,
        "_routing" : "1",
        "_source" : {
          "id" : "260",
          "film_to_franchise" : {
            "name" : "film",
            "parent" : "1"
          },
          "title" : "Star Wars: Episode IV - A New Hope",
          "year" : "1977",
          "genre" : [
            "Action",
            "Adventure",
            "Sci-Fi"
          ]
        }
      },
      {
        "_index" : "series",
        "_type" : "_doc",
        "_id" : "1196",
        "_score" : 1.0,
        "_routing" : "1",
        "_source" : {
          "id" : "1196",
          "film_to_franchise" : {
            "name" : "film",
        

***
#### Search for Children to Find the Franchise Associated with a Film Title

In [11]:
!curl -H "Content-Type: application/json" -XGET 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/series/_search?pretty' -d '{"query": { "has_child": {"type": "film", "query": { "match": { "title": "The Force Awakens"}}}}}'

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "series",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_routing" : "1",
        "_source" : {
          "id" : "1",
          "film_to_franchise" : {
            "name" : "franchise"
          },
          "title" : "Star Wars"
        }
      }
    ]
  }
}
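
The two join queries can be mimicked over an in-memory list to show which side of the relation each one returns. The three documents are a hypothetical sample modeled on the hits above (122886 is the Force Awakens ID from the movies index), and matches is a crude any-shared-term stand-in for a match query.

```python
import re


def terms(text: str) -> set:
    return set(re.findall(r"[^\W_]+", text.lower()))


def matches(query: str, text: str) -> bool:
    """Crude stand-in for a default match query: any shared term is a hit."""
    return bool(terms(query) & terms(text))


# Hypothetical sample modeled on the series index hits above.
docs = [
    {"id": "1", "film_to_franchise": {"name": "franchise"}, "title": "Star Wars"},
    {"id": "260", "film_to_franchise": {"name": "film", "parent": "1"},
     "title": "Star Wars: Episode IV - A New Hope"},
    {"id": "122886", "film_to_franchise": {"name": "film", "parent": "1"},
     "title": "Star Wars: Episode VII - The Force Awakens"},
]


def has_parent(parent_type: str, query: str) -> list:
    """Children whose parent (of parent_type) matches the query."""
    parents = {d["id"] for d in docs
               if d["film_to_franchise"]["name"] == parent_type and matches(query, d["title"])}
    return [d["id"] for d in docs if d["film_to_franchise"].get("parent") in parents]


def has_child(child_type: str, query: str) -> list:
    """Parents with at least one child (of child_type) that matches the query."""
    parent_ids = {d["film_to_franchise"]["parent"] for d in docs
                  if d["film_to_franchise"]["name"] == child_type and matches(query, d["title"])}
    return [d["id"] for d in docs if d["id"] in parent_ids]


print(has_parent("franchise", "Star Wars"))
print(has_child("film", "The Force Awakens"))
```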


***
### Flattened Datatype
#### Mapping Explosions
    When documents contain many (and ever-changing) inner fields, dynamic mapping adds a mapping entry for each new one, and performance can start to suffer.  
    To avoid a mapping explosion, we can map such objects with the flattened data type.
    
    Example Log
    {
        "@timestamp": "2020-03-09T",
        "message": "[5555:1:000/11111.1111",
        "fileset": {
            "name": "syslog"
        },
        "process": {
            "name": "org.gnome.Shell.desktop",
            "pid": 3383
        },
        "host": {
            "hostname": "bionic",
            "name": "bionic"
        }
    }
    
    Second Example Log - with new inner fields
    {
        "@timestamp": "2020-03-09T",
        "message": "[5555:1:000/11111.1111",
        "fileset": {
            "name": "syslog"
        },
        "process": {
            "name": "org.gnome.Shell.desktop",
            "pid": 3383
        },
        "host": {
            "hostname": "bionic",
            "name": "bionic",
            "osArchitecture": "x86_64",
            "osVersion": "Bionic Beaver"
        }
    }
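
To see how fields accumulate, this sketch lists the leaf paths that dynamic mapping would add for each log above. It deliberately undercounts: Elasticsearch also records the object fields themselves and, by default, a .keyword sub-field for each string. The number of mapped fields per index is capped by the index.mapping.total_fields.limit setting (1000 by default).

```python
def leaf_paths(obj: dict, prefix: str = "") -> list:
    """Dotted paths of every leaf value: each one becomes a field under dynamic mapping."""
    paths = []
    for key, value in obj.items():
        path = prefix + key
        if isinstance(value, dict):
            paths.extend(leaf_paths(value, path + "."))
        else:
            paths.append(path)
    return paths


log1 = {
    "@timestamp": "2020-03-09T",
    "message": "[5555:1:000/11111.1111",
    "fileset": {"name": "syslog"},
    "process": {"name": "org.gnome.Shell.desktop", "pid": 3383},
    "host": {"hostname": "bionic", "name": "bionic"},
}
# Second log: same shape plus two new inner fields under host.
log2 = {**log1, "host": {**log1["host"], "osArchitecture": "x86_64", "osVersion": "Bionic Beaver"}}

known = set(leaf_paths(log1))
new_fields = sorted(set(leaf_paths(log2)) - known)
print(len(known), new_fields)
```

Every log with previously unseen inner keys grows the mapping, and therefore the cluster state, a little more.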


In [19]:
#  Had to be executed in PuTTY
#
#         curl -H "Content-Type: application/json" -XPUT "http://127.0.0.1:9200/demo-default/_doc/1" -d'{
#           "message": "[5592:1:0309/123054.737712:ERROR:child_process_sandbox_support_impl_linux.cc(79)] FontService unique font name matching request did not receive a response.",
#           "fileset": {
#             "name": "syslog"
#           },
#           "process": {
#             "name": "org.gnome.Shell.desktop",
#             "pid": 3383
#           },
#           "@timestamp": "2020-03-09T18:00:54.000+05:30",
#           "host": {
#             "hostname": "bionic",
#             "name": "bionic"
#           }
#         }'

!curl -H "Content-Type: application/json" -XGET 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/demo-default/_mapping?pretty=true'
# Mapping of Demo-Default

{
  "demo-default" : {
    "mappings" : {
      "properties" : {
        "@timestamp" : {
          "type" : "date"
        },
        "fileset" : {
          "properties" : {
            "name" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            }
          }
        },
        "host" : {
          "properties" : {
            "hostname" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "name" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            }
          }
        },
        "message" :

**We did not define any mapping, but Elasticsearch created one on its own (dynamic mapping)**
***
The mapping is stored in the cluster state

In [20]:
!curl -H "Content-Type: application/json" -XGET 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/_cluster/state?pretty=true' >> es-cluster-state.json

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  193k  100  193k    0     0   675k      0 --:--:-- --:--:-- --:--:--  675k


#### ElasticSearch Cluster

 Node 2 $\leftrightarrow$ Node 1(Master) $\leftrightarrow$ Node 3
 
     The master node publishes the cluster state to the other nodes so the cluster runs smoothly.  
     Upon receiving the state, each node sends an acknowledgement back to the master node. 
     Every new field adds a mapping, which changes the cluster state, and that change must be synced to all nodes. 
     Without the updated state, a node cannot index or search using the new mapping. 
     When so many fields are added that this overhead brings the cluster down, it is called a mapping explosion.
 
 

#### Flattened Datatype Examples

In [22]:
!curl -H "Content-Type: application/json" -XPUT 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/demo-flattened'

{"acknowledged":true,"shards_acknowledged":true,"index":"demo-flattened"}

In [23]:
!curl -H "Content-Type: application/json" -XPUT 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/demo-flattened/_mapping' -d '{ "properties": {"host": {"type": "flattened"}}}'

{"acknowledged":true}

In [24]:
# Entered into PuTTY
# curl -H "Content-Type: application/json" -XPUT "http://127.0.0.1:9200/demo-flattened/_doc/1" -d'{
#   "message": "[5592:1:0309/123054.737712:ERROR:child_process_sandbox_support_impl_linux.cc(79)] FontService unique font name matching request did not receive a response.",
#   "fileset": {
#     "name": "syslog"
#   },
#   "process": {
#     "name": "org.gnome.Shell.desktop",
#     "pid": 3383
#   },
#   "@timestamp": "2020-03-09T18:00:54.000+05:30",
#   "host": {
#     "hostname": "bionic",
#     "name": "bionic"
#   }
# }'
!curl -H "Content-Type: application/json" -XGET 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/demo-flattened/_mapping?pretty=true'

{
  "demo-flattened" : {
    "mappings" : {
      "properties" : {
        "@timestamp" : {
          "type" : "date"
        },
        "fileset" : {
          "properties" : {
            "name" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            }
          }
        },
        "host" : {
          "type" : "flattened"
        },
        "message" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "process" : {
          "properties" : {
            "name" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            

**With the flattened type, host still contains its inner fields, but they no longer appear individually in the mapping.** \
process was not mapped as flattened, so its inner fields are still listed individually.  

In [25]:
!curl -H "Content-Type: application/json" -XPOST 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/demo-flattened/_update/1' -d '{"doc" : {"host" : {"osVersion": "Bionic Beaver", "osArchitecture":"x86_64"}}}'

{"_index":"demo-flattened","_type":"_doc","_id":"1","_version":2,"result":"updated","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":1,"_primary_term":1}

In [26]:
!curl -H "Content-Type: application/json" -XGET 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/demo-flattened/_doc/1?pretty=true'

{
  "_index" : "demo-flattened",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 2,
  "_seq_no" : 1,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "message" : "[5592:1:0309/123054.737712:ERROR:child_process_sandbox_support_impl_linux.cc(79)] FontService unique font name matching request did not receive a response.",
    "fileset" : {
      "name" : "syslog"
    },
    "process" : {
      "name" : "org.gnome.Shell.desktop",
      "pid" : 3383
    },
    "@timestamp" : "2020-03-09T18:00:54.000+05:30",
    "host" : {
      "hostname" : "bionic",
      "name" : "bionic",
      "osVersion" : "Bionic Beaver",
      "osArchitecture" : "x86_64"
    }
  }
}


In [27]:
!curl -H "Content-Type: application/json" -XGET 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/demo-flattened/_mappings?pretty=true'

{
  "demo-flattened" : {
    "mappings" : {
      "properties" : {
        "@timestamp" : {
          "type" : "date"
        },
        "fileset" : {
          "properties" : {
            "name" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            }
          }
        },
        "host" : {
          "type" : "flattened"
        },
        "message" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "process" : {
          "properties" : {
            "name" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            

**New inner fields are not added to the mapping, which reduces the chance of a mapping explosion.** Even after `osVersion` and `osArchitecture` were added, `host` is still a single `flattened` field.  \
Values inside a flattened field are indexed as keywords.  \
**Note:** No analyzers or tokenizers are applied to flattened fields.
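The keyword behaviour can be illustrated with a simplified model. This is not how Elasticsearch stores flattened fields internally; it only mirrors the results of the two searches that follow: every leaf value is kept whole, so a query must equal an entire value to match. `flattened_leaves` and `matches` are hypothetical helpers, and the optional `key` argument models querying one inner field with dot notation (e.g. `host.osVersion`).

```python
def flattened_leaves(obj, prefix=""):
    """Yield (dotted_key, value) for every leaf of a JSON object.

    Simplified model: each leaf value is kept whole as one keyword term,
    with no tokenizer or analyzer applied.
    """
    for key, value in obj.items():
        path = prefix + key
        if isinstance(value, dict):
            yield from flattened_leaves(value, path + ".")
        else:
            yield path, str(value)


def matches(obj, query, key=None):
    """Model of a match query against a flattened field.

    key=None checks every leaf value (like querying "host"); a key checks only
    that leaf (like querying "host.osVersion"). Matching is whole-value equality.
    """
    return any(value == query and (key is None or path == key)
               for path, value in flattened_leaves(obj))


host = {"hostname": "bionic", "name": "bionic",
        "osVersion": "Bionic Beaver", "osArchitecture": "x86_64"}

print(matches(host, "Bionic Beaver"))               # True: equals a whole leaf value
print(matches(host, "Beaver"))                      # False: values are not split into tokens
print(matches(host, "Bionic Beaver", "osVersion"))  # True: lookup on one inner key
```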
 

In [29]:
!curl -H "Content-Type: application/json" -XGET 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/demo-flattened/_search?pretty=true' -d '{"query": {"match": {"host": "Bionic Beaver"}}}'

{
  "took" : 132,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.8713851,
    "hits" : [
      {
        "_index" : "demo-flattened",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.8713851,
        "_source" : {
          "message" : "[5592:1:0309/123054.737712:ERROR:child_process_sandbox_support_impl_linux.cc(79)] FontService unique font name matching request did not receive a response.",
          "fileset" : {
            "name" : "syslog"
          },
          "process" : {
            "name" : "org.gnome.Shell.desktop",
            "pid" : 3383
          },
          "@timestamp" : "2020-03-09T18:00:54.000+05:30",
          "host" : {
            "hostname" : "bionic",
            "name" : "bionic",
            "osVersion" : "Bionic Beaver",
            "osArchitecture" : "x86_64"
          

In [32]:
# Flattened values are not analyzed, so a partial value such as "Beaver" does not match.
!curl -H "Content-Type: application/json" -XGET 'ec2-13-59-84-109.us-east-2.compute.amazonaws.com:9200/demo-flattened/_search?pretty=true' -d '{"query": {"match": {"host": "Beaver"}}}'

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}


#### Supported Queries for Flattened Datatype
* term, terms and terms_set
* prefix
* range (values are compared as strings, not numerically)
* match and multi_match (whole-value keyword matches)
* query_string and simple_query_string
* exists
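A sketch of a few of these queries as request bodies, built with the standard library. The request is only constructed, not sent. `ES_URL` is an assumption (point it at your own cluster), `search_request` is a hypothetical helper, and the queries have not been run against the `demo-flattened` index here. The term query uses dot notation to target one inner key of the flattened `host` field.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: replace with your cluster address


def search_request(index, query):
    """Build (but do not send) a POST _search request for the given query."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{ES_URL}/{index}/_search?pretty=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Whole-value match on one inner key, using dot notation into the flattened field.
term_on_key = {"term": {"host.osVersion": "Bionic Beaver"}}
# Prefix query on the root field: any leaf value starting with "Bionic".
prefix_on_root = {"prefix": {"host": "Bionic"}}
# Documents that have a host field at all.
exists_root = {"exists": {"field": "host"}}

for q in (term_on_key, prefix_on_root, exists_root):
    req = search_request("demo-flattened", q)
    print(req.get_method(), req.full_url, req.data.decode())
    # To actually run it against a cluster: urllib.request.urlopen(req).read()
```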

### Dealing with Mapping Exceptions
#### Mappings

**Mapping**
* Process - defining how JSON documents will be stored
* Result - the metadata produced by that definition process

**Explicit Mapping**: Fields and their types are predefined. 

**Dynamic Mapping**: Fields and their types are automatically defined by Elasticsearch.
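A rough sketch of the type guessing that dynamic mapping performs with default settings: booleans become `boolean`, integers `long`, floating-point numbers `float`, objects get nested `properties`, date-like strings become `date`, and other strings become `text` with a `keyword` sub-field (matching the `fileset.name` mapping shown earlier). `infer_field` is a hypothetical helper, and the date regex is a simplified assumption, not Elasticsearch's exact date detection. Arrays and nulls are deliberately not modeled.

```python
import re

# Simplified assumption of default date detection: ISO-8601-style strings only.
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}(T[\d:.]+(Z|[+-]\d{2}:\d{2})?)?")


def infer_field(value):
    """Roughly guess the mapping default dynamic mapping would create for a value."""
    if isinstance(value, bool):  # check bool before int: True is an int in Python
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, dict):
        return {"properties": {k: infer_field(v) for k, v in value.items()}}
    if isinstance(value, str):
        if DATE_RE.fullmatch(value):
            return {"type": "date"}
        # Plain strings: text with a keyword sub-field.
        return {"type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}
    raise TypeError(f"not modeled in this sketch: {type(value).__name__}")


doc = {
    "@timestamp": "2020-03-09T18:00:54.000+05:30",
    "fileset": {"name": "syslog"},
    "process": {"name": "org.gnome.Shell.desktop", "pid": 3383},
}
print(infer_field(doc))
```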

#### The Mapping Result

        {
            "mappings": {
                "properties": {
                    "timestamp": { "type": "date" },
                    "service": { "type": "keyword" },
                    "host_ip": { "type": "ip" },
                    "port": { "type": "integer" },
                    "message": { "type": "text" }
                }
            }
        }


* **Timestamp** mapped as date
* **Service** mapped as a keyword
* **Host IP** mapped as an ip datatype
* **Port** mapped as an integer
* **Message** mapped as text

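**Explicit Mapping**
* A document whose value does not fit the mapped type (for example, a non-numeric string in `port`) is rejected with a mapping exception.

**Dynamic Mapping**
* Every new field is added to the mapping automatically, which may lead to a mapping explosion. By default an index accepts at most 1,000 fields (`index.mapping.total_fields.limit`).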
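A minimal sketch of both failure modes, using the explicit mapping from the "Mapping Result" example. `conflicts` flags values the mapped type could not accept; on a real cluster such a document fails with a `mapper_parsing_exception`. Only the `integer` and `ip` checks are modeled, and they are simplified (Elasticsearch's coercion rules have more cases). `count_leaf_fields` counts the leaf fields one document would add under dynamic mapping; it is a lower bound, since object and multi-fields also count toward the limit. The example documents are hypothetical.

```python
import ipaddress

# Explicit mapping from the "Mapping Result" example.
MAPPING = {"timestamp": "date", "service": "keyword", "host_ip": "ip",
           "port": "integer", "message": "text"}
TOTAL_FIELDS_LIMIT = 1000  # default index.mapping.total_fields.limit


def conflicts(doc, mapping=MAPPING):
    """List mapped fields whose value the mapped type could not accept (simplified)."""
    problems = []
    for field, value in doc.items():
        kind = mapping.get(field)
        try:
            if kind == "integer":
                int(value)  # numeric strings such as "8080" would be coerced
            elif kind == "ip":
                ipaddress.ip_address(value)
        except (TypeError, ValueError):
            problems.append(field)
    return problems


def count_leaf_fields(doc):
    """Count the leaf fields dynamic mapping would add for one document."""
    return sum(count_leaf_fields(v) if isinstance(v, dict) else 1
               for v in doc.values())


good = {"timestamp": "2020-03-09T18:00:54Z", "service": "auth",
        "host_ip": "10.0.0.1", "port": 8080, "message": "ok"}
bad = dict(good, host_ip="not-an-ip", port="eighty")
print(conflicts(good))  # []
print(conflicts(bad))   # ['host_ip', 'port']

# Hypothetical log that stores each request header under a unique key.
noisy = {"headers": {f"x-trace-{i}": "1" for i in range(1500)}}
print(count_leaf_fields(noisy) > TOTAL_FIELDS_LIMIT)  # True
```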


In [37]:
!curl --request PUT 'http://localhost:9200/microservice-logs' --data-raw '{"mappings": {"properties": {"timestamp": { "type": "date"  }, "service": { "type": "keyword" }, "host_ip": { "type": "ip" }, "port": { "type": "integer" }, "message": { "type": "text" }}}'


curl: (6) Could not resolve host: PUT
{"error":"Incorrect HTTP method for uri [/microservice-logs] and method [POST], allowed: [PUT, DELETE, HEAD, GET]","status":405}
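The 405 response says the request reached Elasticsearch as `POST`, so the index was not created: curl sends a body with `POST` unless the method is overridden, and the `Could not resolve host: PUT` line suggests `PUT` was treated as a URL rather than as the method. In curl, the earlier cells' pattern (`-XPUT` plus `-H "Content-Type: application/json"`) avoids this. A sketch of the same request with the standard library, assuming a cluster at `localhost:9200` as in the command above; the request is only built here, not sent.

```python
import json
import urllib.request

# Same explicit mapping as the curl command above.
body = {
    "mappings": {
        "properties": {
            "timestamp": {"type": "date"},
            "service": {"type": "keyword"},
            "host_ip": {"type": "ip"},
            "port": {"type": "integer"},
            "message": {"type": "text"},
        }
    }
}

req = urllib.request.Request(
    "http://localhost:9200/microservice-logs",
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="PUT",  # set explicitly: with a body, urllib (like curl) defaults to POST
)
print(req.get_method())  # PUT
# Send it against a running cluster with: urllib.request.urlopen(req).read()
```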