# Getting Started with Elasticsearch for Spatial Analysis

<span style="background:yellow">Introduction & background on Elasticsearch and document databases...</span>

## Launch Elasticsearch locally using Docker

A working Elasticsearch setup involves multiple services that need to communicate with each other. This is hard to arrange with standalone Docker containers, which are generally not set up to reach one another. It is easier to use Docker Compose, a container orchestration utility that lets you define several linked services in one file and run them in containers that share a network.

```bash
docker-compose -f elasticsearch-docker-compose.yml up -d
```

* `-f` specifies the filename of the `.yml` file that describes the cluster of services we want to run.

* `-d` tells Docker Compose to run the cluster in detached mode, so it keeps running in the background even if you quit your console.

In [None]:
```bash
docker volume create elasticsearch_volume
```

<hr/>

## Connect to the database
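
Once the containers are running, a quick way to confirm that the database is reachable is to request its root endpoint over HTTP. The sketch below assumes Elasticsearch is listening on `localhost:9200` (the host and port passed to the load script later in this notebook) over plain HTTP with security disabled. The helper names `cluster_url` and `is_reachable` are illustrative and not part of any library.

```python
import http.client
import json
import urllib.error
import urllib.request


def cluster_url(host="localhost", port=9200):
    """Build the base URL of the Elasticsearch HTTP API (assumes plain HTTP)."""
    return "http://{}:{}".format(host, port)


def is_reachable(url, timeout=2):
    """Return True if the endpoint answers with a JSON body, otherwise False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            json.loads(response.read().decode("utf-8"))
            return True
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON reply
        return False
```

Calling `is_reachable(cluster_url())` shortly after `docker-compose up` may return `False` for a while, since the service can take some time to finish starting.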

<hr/>

## Load data into the database

In [None]:
### Create an index and define its mappings

PUT twitter_sample
{
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0
    },
    "mappings": {
        "tweet": {
            "_source": { "enabled": true },
            "properties": {
                "text": {"type": "text" },
                "timestamp_ms": { "type": "date", "format": "epoch_millis" },
                "user": { 
                    "properties": { 
                        "location": { "type": "text" },
                        "description": { "type": "text" }
                    }
                },
                "place": { 
                    "properties": { 
                        "name": { "type": "keyword" },
                        "full_name": { "type": "keyword" },
                        "centroid": { "type": "geo_shape" },
                        "better_bounding_box": { "type": "geo_shape" },
                        "centroid_geohash": { "type": "geo_point" }
                    }
                }
            }
        }
    }
}
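
To make the spatial field types concrete, here is a sketch of what a `place` object matching this mapping might look like. The extent used is invented for illustration, and the helpers `geohash_encode` and `build_place` are hypothetical; the actual cleaned records may use different geometries (for example, a polygon rather than an envelope for `better_bounding_box`), and `centroid_geohash` is only assumed to hold a geohash string. Note that `geo_shape` values use `[longitude, latitude]` order, while `geo_point` accepts several forms, including a geohash string.

```python
import json

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lon, precision=9):
    """Encode a latitude/longitude pair as a geohash string."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, value, use_lon = [], 0, 0, True
    while len(chars) < precision:
        # Bits alternate between longitude and latitude, longitude first
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid
        else:
            value = value << 1
            rng[1] = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


def build_place(name, full_name, west, south, east, north):
    """Assemble a hypothetical `place` object using the mapping's field types."""
    # Simple midpoint; not valid for boxes that cross the antimeridian
    center_lon = (west + east) / 2
    center_lat = (south + north) / 2
    return {
        "name": name,
        "full_name": full_name,
        # geo_shape point: [longitude, latitude]
        "centroid": {"type": "point", "coordinates": [center_lon, center_lat]},
        # geo_shape envelope: [[west, north], [east, south]]
        "better_bounding_box": {
            "type": "envelope",
            "coordinates": [[west, north], [east, south]],
        },
        # geo_point supplied as a geohash string
        "centroid_geohash": geohash_encode(center_lat, center_lon),
    }


# Invented example extent, roughly around Minneapolis
place = build_place("Minneapolis", "Minneapolis, MN", -93.33, 44.89, -93.19, 45.05)
print(json.dumps(place, indent=2))
```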

### Execute load scripts

<span style="background:yellow">Explain bulk insert functionality</span>
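
As a rough illustration of what a bulk insert involves: rather than sending one HTTP request per tweet, the `_bulk` endpoint accepts many index actions in a single request. The sketch below assumes the official `elasticsearch` Python client (a 6.x-era version, to match the `tweet` mapping type above) and its `helpers.bulk` function. The function names `make_actions` and `bulk_load`, and the use of each tweet's `id_str` as the document id, are assumptions for illustration; this is not necessarily how `Clean_Load_Scripts` implements `load_batch_data()`.

```python
def make_actions(records, index_name, doc_type="tweet"):
    """Yield one bulk action per record, in the dict format helpers.bulk accepts."""
    for record in records:
        action = {
            "_index": index_name,
            "_type": doc_type,  # mapping types apply to 6.x-era clusters (assumption)
            "_source": record,
        }
        if "id_str" in record:
            # Reusing the tweet id makes reloading the same file overwrite, not duplicate
            action["_id"] = record["id_str"]
        yield action


def bulk_load(records, index_name, host="localhost", port=9200):
    """Send the records via batched _bulk requests; returns (success_count, errors)."""
    # Imported here so make_actions can be used without the client installed
    from elasticsearch import Elasticsearch, helpers

    client = Elasticsearch([{"host": host, "port": port}])
    return helpers.bulk(client, make_actions(records, index_name))
```

Because `make_actions` is a generator, the records are streamed to the client in chunks rather than being held in one large request body.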

In [1]:
import Clean_Load_Scripts as cleanNLoad

In [2]:
#data_folder = '/Users/linkalis/Desktop/twitter_data/twitter_sample_5GB_split/'
#logs_folder = '/Users/linkalis/Desktop/twitter_data/twitter_sample_5GB_split/logs/'

data_folder = '/Users/linkalis/Desktop/twitter_data/twitter_sample_500MB_5000_split/'
logs_folder = '/Users/linkalis/Desktop/twitter_data/twitter_sample_500MB_5000_split/logs/'

In [3]:
extractor = cleanNLoad.Extractor(data_folder, logs_folder, initialize=True)

In [4]:
while extractor.next_file_available():
    next_file_data, next_file_name = extractor.get_next_file() # read in the next file
    cleaner = cleanNLoad.Cleaner(next_file_data, next_file_name, logs_folder) # clean the data (fix bounding boxes, add centroids, etc.)
    cleaned_data = cleaner.clean_data() 
    loader = cleanNLoad.Loader(cleaned_data, next_file_name, logs_folder) # initialize the loader
    loader.get_connection("elasticsearch", "localhost", "9200", db_name="twitter_small_with_source") # create a database connection
    #loader.load_batch_data() # load the file's data as a batch
    loader.load_data()

Extractor: Next file is: 500M_unicode_splitac.json
Extractor: Reading file: /Users/linkalis/Desktop/twitter_data/twitter_sample_500MB_5000_split/500M_unicode_splitac.json
Extractor: Read 5000 data rows.
Cleaner: Finished cleaning records.
Connected to ElasticSearch instance.
Checking for existence of index called: twitter_small_with_source
Loader: Loading records...
Loader: Finished loading records.
Extractor: Next file is: 500M_unicode_splitav.json
Extractor: Reading file: /Users/linkalis/Desktop/twitter_data/twitter_sample_500MB_5000_split/500M_unicode_splitav.json
Extractor: Read 5000 data rows.
Cleaner: Finished cleaning records.
Connected to ElasticSearch instance.
Checking for existence of index called: twitter_small_with_source
Loader: Loading records...
Loader: Finished loading records.
Extractor: Next file is: 500M_unicode_splitbi.json
Extractor: Reading file: /Users/linkalis/Desktop/twitter_data/twitter_sample_500MB_5000_split/500M_unicode_splitbi.json
Extractor: Read 5000 da

## Advanced Queries

Text searches!
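
As one possible starting point, a text search can be combined with a spatial filter in a single request body. The sketch below builds such a body as a Python dictionary, using a `match` query on the `text` field and a `geo_shape` filter on `place.centroid` from the mapping above. The function name `text_in_area_query`, the search term, and the extent are invented for illustration.

```python
import json


def text_in_area_query(text, west, south, east, north, size=10):
    """Build a search body: full-text match on `text`, filtered to an envelope."""
    return {
        "size": size,
        "query": {
            "bool": {
                # Scored full-text match against the analyzed tweet text
                "must": {"match": {"text": text}},
                # Unscored spatial filter on the place centroid
                "filter": {
                    "geo_shape": {
                        "place.centroid": {
                            "shape": {
                                "type": "envelope",
                                # [[west, north], [east, south]]
                                "coordinates": [[west, north], [east, south]],
                            },
                            "relation": "within",
                        }
                    }
                },
            }
        },
    }


body = text_in_area_query("coffee", -93.33, 44.89, -93.19, 45.05)
print(json.dumps(body, indent=2))
```

The printed JSON could then be sent as the body of a `_search` request against the index.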

## Resources

* Install Elasticsearch with Docker. Elasticsearch documentation. https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html

* Learning Elastic Stack. Packt Publishing.

* Building an Elasticsearch Index with Python. https://qbox.io/blog/building-an-elasticsearch-index-with-python