# Elasticsearch and Kibana

Whole notebook is based in Elasticsearch 8.0 documentation. In case something is missing, refer to current version [documentation](https://www.elastic.co/guide/en/elasticsearch/reference/8.0/).

1. Download Elasticseach, Kibana and Elasticsearch for Python
2. Configure Elasticsearch and Kibana
    1. Setup security
        - This is mandatory if system has multiple users
        - May be skipped if you are the only user of the system
3. Creating index in Python
    1. Creating index from structured data with explicit mapping
4. Searching with Python
5. Visualization with Kibana

## Download Elasticseach and Kibana
- Elasticsearch is built using Java, and includes a bundled version of OpenJDK from the JDK maintainers (GPLv2+CE) within each distribution 
- The bundled JVM is the recommended JVM and is located within the jdk directory of the Elasticsearch home directory.
- Elasticsearch directory is known as ```$ES_HOME```, add it to your .bashrc
- Kibana directory is knows as ```$KIBANA_HOME```, add it to your .bashrc
- Both programs can be stopped by ```CTRL + C```

In [None]:
#!wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.0.0-linux-x86_64.tar.gz
#!wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.0.0-linux-x86_64.tar.gz.sha512
#!shasum -a 512 -c elasticsearch-8.0.0-linux-x86_64.tar.gz.sha512
#!tar -xzf elasticsearch-8.0.0-linux-x86_64.tar.gz
#!wget https://artifacts.elastic.co/downloads/kibana/kibana-8.0.0-linux-x86_64.tar.gz
#!wget https://artifacts.elastic.co/downloads/kibana/kibana-8.0.0-linux-x86_64.tar.gz.sha512 | shasum -a 512 -c kibana-8.0.0-linux-x86_64.tar.gz.sha512
#!tar -xzf kibana-8.0.0-linux-x86_64.tar.gz
#Create new conda environment
#!conda create --name elasticsearch
#!conda activate elasticsearch
#!conda install -c conda-forge elasticsearch 

## Configuring Elasticsearch
1. Start Elasticsearch, this works only if you have set the bin to your PATH

```sh
elasticsearch
```
- When starting Elasticsearch for the first time, some security features are enabled and configured by default &rarr; the following security configuration occurs automatically:

- Authentication and authorization are enabled, and a password is generated for the elastic built-in superuser
- Certificates and keys for TLS are generated for the transport and HTTP layer, and TLS is enabled and configured with these keys and certificates
- An enrollment token is generated for Kibana, which is valid for 30 minutes.
- If you are not able to setup Kibana in  30min, run  ```elasticsearch-create-enrollment-token -s kibana```
- Save the ```elastic``` user password


2. Shutdown Elasticsearch with and start Elasticsearch as a daemon

```sh
elasticsearch -d -p pid
```
- The saved ```pid``` can be found in ```$ES_HOME```

3. Check that Elasticsearch is running

```sh
curl --cacert $ES_HOME/certs/http_ca.crt -u elastic https://localhost:9200 
```

4. Shutdown the daemon

```sh
kill pid

```

## Configure Kibana

1. **Encrypt traffic between Kibana and Elasticsearch**
    1. Add encryption keys for Kibana dashboards/visualizations, saved reports and session information
    
        1. Generate keys by running ```kibana-encryption-keys generate```, this outputs keys to console
        
        2. Add keys (```xpack.encryptedSavedObjects.encryptionKey, xpack.reporting.encryptionKey, 
        xpack.security.encryptionKey```) to ```$KIBANA_HOME/config/kibana.yml```
        
2. **Encrypt traffic between your browser and Kibana**
    1. Make local Certificate Authority
        1. Make ```certs``` directory
        2. Run ```openssl genrsa -des3 -out myCA.key 2048``` to become local Certificate Authority &rarr; give passphare
        3. Generate a root certificate ```openssl req -x509 -new -nodes -key myCA.key -sha512 -out myCA.pem``` &rarr; give passpharase &rarr; Give something recognizible for ```Common name``` eg. Self Produced CA
    2. Add the root certificate to Linux
        1. Install ca-certificates if not installed
        2. Copy ```myCA.pem``` to ```/usr/local/share/ca-certificates``` as ```myCA.crt```
        3. Update certificate store ```sudo update-ca-certificates```
        4. Test that the certificate has been installed ```awk -v cmd='openssl x509 -noout -subject' '/BEGIN/{close(cmd)};{print | cmd}' < /etc/ssl/certs/ca-certificates.crt | grep "Self Produced CA"```
    3. Generate a server certificate and private key for Kibana ```elasticsearch-certutil csr -name kibana-server -dns localhost``` &rarr;  generates a ```csr-bundle.zip``` file
        1. Unzip the ```csr-bundle.zip``` file to obtain the ```kibana-server.csr``` unsigned security certificate and the ```kibana-server.key``` unencrypted private key
        2. Sign your key
            1. Create X509 V3 certificate extension file for defining SAN for the certificate &rarr; ```kibana.ext``` to certs folder
            2. ```sh
              authorityKeyIdentifier=keyid,issuer 
              basicConstraints=CA:FALSE
              keyUsage = digitalSignature, nonRepudiation, keyEncipherment, dataEncipherment
              subjectAltName = @alt_names
              [alt_names]
              DNS.1 = localhost
              ```
            3. Sign the cert by running  ```openssl x509 -req -in kibana-server.csr -CA myCA.pem -CAkey myCA.key -CAcreateserial -out kibana-server.crt -days 825 -sha256 -extfile kibana.ext``` 
        3. Copy the signed certificate crs and key to ```$KIBANA_HOME/config```
        4. Add lines ```server.ssl.certificate: $KBN_PATH_CONF/kibana-server.crt```, ```server.ssl.key: $KBN_PATH_CONF/kibana-server.key```, ```server.ssl.enabled: true```, and ```elasticsearch.hosts: ["https://localhost:9200"]``` to ```$KIBANA_HOME/config/kibana.yml```
        5. Change ```xpack.fleet.outputs``` hosts to ```["https://localhost:9200"]```
    
3. **Start Kibana, this works only if you have set the bin to your PATH**
```sh
kibana
```
- when first time you’re starting Kibana, this command generates a unique link in your terminal to enroll your Kibana instance with Elasticsearch
- In your terminal, click the generated link to open Kibana in your browser
- In your browser, paste the enrollment token that was generated in the terminal when you started Elasticsearch, and then click the button to connect your Kibana instance with Elasticsearch
- Log in to Kibana as the elastic user with the password that was generated when you started Elasticsearch
- After first login Kibana login page can be found ```https://localhost:5601```
4. **Go to Stack management &rarr; Advanced Settings**
- Disable Usage Data

5. **Generate API key**
    1. Stack management &rarr; Security &rarr; API Keys
    2. Create API key
    3. Copy the key as json to safe place


## Connecting to Elasticsearch with API key

In [1]:
import warnings
#verify_certs=False creates a warning, but we can ignore this since we take good care of our API key
warnings.filterwarnings('ignore')
from elasticsearch import Elasticsearch
from seacrets import api_key_id, api_key
es = Elasticsearch(["https://localhost:9200"],api_key=(api_key_id,api_key),verify_certs=False)
es.info()['tagline']

'You Know, for Search'

## Creating index from structured data with explicit mapping

### Get the data

In [2]:
#Download Elevate Berlin record shop catalogy
#!wget -O elevate_berlin.json https://elevate.berlin/collections/all-vinyl.oembed
import json
with open('elevate_berlin.json','r') as f:
    data = json.load(f)

In [3]:
products = data['products']
#get the keys of product dict§
print(products[0].keys())
print(products[0])

dict_keys(['product_id', 'title', 'description', 'brand', 'offers', 'thumbnail_url'])
{'product_id': 'beste-modus-07', 'title': 'Beste Modus 07 (BESTE007)', 'description': '<iframe width="100%" height="166" scrolling="no" frameborder="no" allow="autoplay" src="https://w.soundcloud.com/player/?url=https%3A//api.soundcloud.com/tracks/311536058&amp;color=%23ff5500&amp;auto_play=false&amp;hide_related=true&amp;show_comments=false&amp;show_user=false&amp;show_reposts=false&amp;show_teaser=false"></iframe>\n<p>\xa0</p>\n<meta charset="utf-8">\n<div class="BigCoverZoom Mytransition">\n<div id="mainfinfo_left" class="contrastBG">\n<div class="DT_oneline">\n<div class="detail_artist lightCol mainColBG"></div>\n<div class="detail_artist lightCol mainColBG"></div>\n<div class="detail_artist lightCol mainColBG">A1: Cinthie - Back to garage</div>\n</div>\n</div>\n<div class="goback contrastBG">A2: Cinthie - Hold\'em</div>\n<div class="goback contrastBG">B1: stevn.aint.leavn - Isn\'t it</div>\n<div 

### Define settings and mapping


#### Settings
- Settings contain information about index itself

```python
{"settings": {
  "number_of_shards": 1,
  "number_of_replicas": 0
  "index": {
      "similarity": {
        "default": {
          "type": "boolean"
      }
    }
  }
}
            
            
}
```
1. number_of_shards &rarr; for production
2. number_of_replicas &rarr; for production
3. index
    1. similarity &rarr; defining document similarity
    
#### Mappings
- Mapping is the schema of document &rarr; how a document and the fields it contais are stored and indexed
- Each document is collection of fields which have their own datatype
- Use explicit mapping when you know more about your data than Elasticsearch can guess
- It is often useful to index the same field in different ways for different purposes
- In Elasticsearch, objects are mapped implicitly by using the “properties” mapping parameter at each level of the hierarchy

```python
"mappings": {"dynamic": "strict",
             "coerce": True,
                "properties": {
                    "product_id": {
                        "type": "text"
                    },
                    "title": {
                        "type": "text"
                    },
                    "description": {
                        "type": "text"
                    },
                    "brand": {
                        "type": "keyword"
                    },
                    "offers": {
                        "type": "nested",
                        "properties": {
                            "title":    { "type": "text"  },
                            "offer_id": { "type": "text"  },
                            "sku":     { "type": "integer"   },
                            "price":   { "type": "integer"   },
                            "currency_core":    { "type": "string"    }
                            "in_stoc":    { "type": "boolean"    }
                    ,
                    "thumbnail_url": {
                        "type": "text"
                    }
                }
            }
        }
```
1. dynamic: strict &rarr;  rejects the document if Elasticsearch encounters an unknown field
2. coare: true  &rarr; coercion is the conversion of one type of object to a another object of a different type with similar content
1. type: keyword &rarr; text specialization which we need to look for exact values &rarr; used to sort, aggregate and filter documents
2. type: nested &rarr; store multiple values for a field &rarr; flattens object hierarchies into a simple list of field names and values

In [4]:
#settings and mappings
settings = {
        "settings": {
            
            "number_of_shards": 1,
            "number_of_replicas": 0
        },
        "mappings": {
                "properties": {
                    "product_id": {
                        "type": "text"
                    },
                    "title": {
                        "type": "text"
                    },
                    "description": {
                        "type": "text"
                    },
                    "brand": {
                        "type": "keyword"
                    },
                    "offers": {
                        "type": "nested",
                        "properties": {
                            "title":    {"type": "text"},
                            "offer_id": {"type": "text"},
                            "sku":     {"type": "integer"},
                            "price":   {"type": "float"},
                            "currency_core":    {"type": "text"},
                            "in_stoc":    {"type": "boolean"},
                    
                    "thumbnail_url": {
                        "type": "text"
                    }
                }
            }
        }
}
    
}
#function to create index
def create_index(es_object,settings_dict,index_name):
    created=False
    if not es_object.indices.exists(index=index_name):
        es_object.indices.create(index=index_name,body=settings_dict)
        created = True
        
    return created



In [5]:
#create index
print(f'Index created: {create_index(es,settings,"record_data")}')
#delete index
#es.indices.delete(index='record_data', ignore=[400, 404])
#see if you return the correct mapping
print(es.indices.get_mapping(index='record_data'))

Index created: False
{'record_data': {'mappings': {'properties': {'brand': {'type': 'keyword'}, 'description': {'type': 'text'}, 'offers': {'type': 'nested', 'properties': {'currency_code': {'type': 'text', 'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}, 'currency_core': {'type': 'text'}, 'in_stoc': {'type': 'boolean'}, 'in_stock': {'type': 'boolean'}, 'offer_id': {'type': 'text'}, 'price': {'type': 'float'}, 'sku': {'type': 'integer'}, 'thumbnail_url': {'type': 'text'}, 'title': {'type': 'text'}}}, 'product_id': {'type': 'text'}, 'thumbnail_url': {'type': 'text', 'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}, 'title': {'type': 'text'}}}}}


In [7]:
#add data to index
def store_document(elastic_object, index_name, document,id):
    if elastic_object.exists(index=index_name,id=id):
        print(f"Document id: {id} already exists in index")
    else:
        elastic_object.index(index=index_name, body=document,id=id)
        print(f"Document id: {id} added to index")

In [None]:
#store all documents from our data, each record should be as dict
for i,d in enumerate(products):
    store_document(es,"record_data",d,i)

In [8]:
#get document with specific id
print(es.get(index="record_data", id=1))

{'_index': 'record_data', '_id': '1', '_version': 1, '_seq_no': 1, '_primary_term': 1, 'found': True, '_source': {'product_id': 'beste-modus-06', 'title': 'Beste Modus 06 [BESTE006]', 'description': '<iframe width="100%" height="166" scrolling="no" frameborder="no" allow="autoplay" src="https://w.soundcloud.com/player/?url=https%3A//api.soundcloud.com/tracks/226301864&amp;color=%23ff5500&amp;auto_play=false&amp;hide_related=true&amp;show_comments=false&amp;show_user=false&amp;show_reposts=false&amp;show_teaser=false"></iframe>\n<p>\xa0</p>\n<meta charset="utf-8"><meta charset="utf-8">\n<div class="BigCoverZoom Mytransition">\n<div id="mainfinfo_left" class="contrastBG">\n<div class="DT_oneline">\n<div class="detail_artist lightCol mainColBG"></div>\n</div>\n<div class="DT_oneline">\n<div class="followMe LAFollow " data-folge="beste-modus" data-art="label" onclick="toggleFollowMe(this);"></div>\n<div class="followMe LAFollow " id="la_c6a-16" data-folge="beste-modus" data-art="label" onc

## Searching with Query DSL
- DSL queries contais two clauses
    1. leaf query clauses that look for a specific value in a specific field (e.g. a ```match``` or ```range```)
    2. compound query clauses that are used to logically combine multiple queries (such as multiple leaf or compound queries) or to alter the behaviour of these queries
- When you run a query against your index (or indices), ES sorts the results by a relevance score (a float) that represents the quality of the match (the _score field shows its value for each “hit”

In [9]:
#search
def search(es_object, index_name, search):
    res = es_object.search(index=index_name, body=search)
    res = dict(res)
    return res

### Query and filter context
- In the query context, a query clause answers the question “How well does this document match this query clause?” 
- In a filter context, a query clause answers the question “Does this document match this query clause?”
- Below example query matches documents where all following conditions are met
- bool &rarr; The default query for combining multiple leaf or compound query clauses, as must, should, must_not, or filter clauses
- must &rarr; The clause (query) must appear in matching documents and will contribute to the score.
- match &rarr; The standard query for performing full text queries, including fuzzy matching and phrase or proximity queries
- term &rarr; returns documents that contain an exact term in a provided field
- range &rarr; Returns documents that contain terms within a provided range, gt Greater than, gte Greater than or equal to, lt Less than, lte, Less than or equal to.

### Nested query

- If you need to index arrays of objects and to maintain the independence of each object in the array, use the nested data type
- You can now use a multi-level nested query to match documents 



In [10]:
import re
tag_re = re.compile(r'(<!--.*?-->|<[^>]*>)')
# Remove well-formed tags, fixing mistakes by legitimate users
def query_maker(word,price):
    query={
        "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "description": word
                    }
                },
                {
                    "nested": {
                        "path": "offers",
                        "query": {
                            "bool": {
                                "must": [
                                    {
                                        "term": {
                                            "offers.in_stock":True
                                        }
                                    },
                                    {
                                        "range": {
                                            "offers.price": {
                                                "gte": price
                                            }
                                        }
                                    }
                                ]
                            }
                        }
                    }
                }
            ]
        }
    }
    }
    return query

1. ```description``` must contain "detroit"
2. ```in_stoct``` must be true &rarr;this is inside nested object, so nested query must be used
3. ```price``` must be 15 or larger &rarr; this is inside nested object, so nested query must be used

In [11]:
#make the query
query = query_maker("detroit",15.0)
#search
result=search(es,'record_data',query)
found_hits =result["hits"]['total']['value']
if found_hits>0:
    result=result["hits"]['hits']
    if found_hits==1:
        temp=result[0]
        title = temp['_source']['title']
        description=temp['_source']['description']
        price = temp['_source']['offers'][0]['price']
        # Remove well-formed tags, fixing mistakes by legitimate users
        no_tags_d = tag_re.sub('', description)
        no_tags_d = no_tags_d.strip()
        no_tags_t = tag_re.sub('', title)
        print(f"Title: {no_tags_t}")
        print(f"Description: {no_tags_d}")
        print(f"Price: {price}")

    else:
        for i,res in enumerate(result):
            temp=result[i]
            title = temp['_source']['title']
            description=temp['_source']['description']
            price = temp['_source']['offers'][0]['price']
            # Remove well-formed tags, fixing mistakes by legitimate users
            no_tags_d = tag_re.sub('', description)
            no_tags_d = no_tags_d.strip()
            no_tags_t = tag_re.sub('', title)
            print(f"Title: {no_tags_t} \n")
            print(f"Description: {no_tags_d} \n")
            print(f"Price: {price}")
            
else:
    print("No results found")

Title: V/A - Unity Vol.1 (Compiled by Norm Talley) (UAR005)
Description: Detroit Deep House from New label Upstairs Asylum Recordings.
Price: 30.0
