Monstache showcase

This project shows how monstache can be applied to real data from data.gov. The mongoimport tool is used to import over 6.5 million records of Chicago crime data into MongoDB.

During the import, monstache listens for change events on the entire MongoDB deployment and indexes those documents into Elasticsearch. Before indexing each document, monstache applies a small transformation using a Go plugin to enable certain aggregations in Kibana.

A Go plugin was chosen over a JavaScript plugin after it showed a dramatic performance increase.
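For reference, this kind of plugin is compiled with Go's plugin build mode and handed to monstache at startup. A minimal sketch follows; the paths here are illustrative rather than the exact ones this project uses, and in this showcase the build happens inside docker-compose, so you do not need to run this yourself:

# build the mapping plugin as a shared object (illustrative paths)
go build -buildmode=plugin -o build/plugin.so plugin/plugin.go

# point monstache at the compiled plugin and a config file
monstache -mapper-plugin-path build/plugin.so -f config.toml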

I recommend a machine with at least 16GB of RAM, 20GB of free disk space, and 4 or more CPU cores. You may be able to get away with less by decreasing the Elasticsearch heap sizes in the docker-compose files.

First you will need to make sure you have docker and docker-compose installed. On desktop systems like Docker Desktop for Mac and Windows, Docker Compose is included as part of those desktop installs.
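You can verify both are installed with:

docker version
docker-compose version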

The versions at the time this project was created were:

Client:
 Version:           18.09.3
 API version:       1.39
 Go version:        go1.10.8
 Git commit:        774a1f4
 Built:             Thu Feb 28 06:40:58 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.3
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.8
  Git commit:       774a1f4
  Built:            Thu Feb 28 05:59:55 2019
  OS/Arch:          linux/amd64
  Experimental:     false

docker-compose version 1.23.1, build b02f1306
docker-py version: 3.5.0
CPython version: 3.6.7
OpenSSL version: OpenSSL 1.1.0f  25 May 2017

Next, download the public dataset in CSV format. Please read all the rules and caveats associated with the dataset before proceeding.

Once you have downloaded this large (roughly 1.5GB) file, copy it to the following location:

monstache-showcase/mongodb/scripts/data/crimes.csv
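For example, assuming the download landed in ~/Downloads (crimes-export.csv below is a placeholder for whatever the portal names the file):

# adjust the source path to match your downloaded file
cp ~/Downloads/crimes-export.csv monstache-showcase/mongodb/scripts/data/crimes.csv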

Use the following command to note the number of documents to expect later during the import.

# subtract 1 for the csv header
wc -l monstache-showcase/mongodb/scripts/data/crimes.csv
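Or print the expected count directly:

# line count minus the csv header row
echo $(( $(wc -l < monstache-showcase/mongodb/scripts/data/crimes.csv) - 1 ))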

You are now ready to run docker-compose and start the import.

cd monstache-showcase
./import-showcase.sh

The import will take a while. During the process you will see a line like this from mongoimport:

c-data       | 2019-03-12T20:34:57.586+0000     imported 6820156 documents

That means that all the data has been loaded into MongoDB. Now you must wait for the indexing to complete in Elasticsearch. The process will periodically query the document count in Elasticsearch.

You will see lines like this repeating until you stop the process:

c-config     | [
c-config     |   {
c-config     |     "health" : "green",
c-config     |     "status" : "open",
c-config     |     "index" : "chicago.crimes",
c-config     |     "uuid" : "4wShbV-LTq6-6paRsWataQ",
c-config     |     "pri" : "1",
c-config     |     "rep" : "0",
c-config     |     "docs.count" : "1198982",
c-config     |     "docs.deleted" : "0",
c-config     |     "store.size" : "359mb",
c-config     |     "pri.store.size" : "359mb"
c-config     |   }
c-config     | ]

The docs.count field in the response should eventually reach 1 less than the number you recorded from wc -l.
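You can also run the same check yourself from the host with the Elasticsearch cat API (this assumes the compose file publishes Elasticsearch on the default port 9200):

curl 'http://localhost:9200/_cat/indices/chicago.crimes?format=json&pretty'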

Once all the data is loaded into Elasticsearch you can bring down the containers with Ctrl-C or:

cd monstache-showcase
./stop-showcase.sh

At this point you have indexed all the data and should no longer run import-showcase.sh, as that would import and index everything again. The import process stores the Elasticsearch data in a docker volume, so it will persist between runs until you delete the volume.
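You can list the volumes Docker created for the project with:

# volume names are prefixed with the compose project name
docker volume ls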

The last step is to fire up Kibana to analyze the data. To do this, start only Elasticsearch and Kibana with:

cd monstache-showcase
./view-showcase.sh

Once the containers are up and healthy, you can go to http://localhost:5601 on the host to load Kibana and explore the data.
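If Kibana is not reachable yet, the containers may still be starting; a quick way to check Elasticsearch from the host (again assuming port 9200 is published) is:

curl 'http://localhost:9200/_cluster/health?pretty'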

In Kibana you can start from scratch and define an index pattern. However, I recommend importing the file named export.json from the root of monstache-showcase to get a head start.

To import it, go to Management -> Saved Objects, click Import, and upload export.json.

You will also want to go to Management -> Advanced Settings in Kibana and set "Timezone for date formatting" to UTC so that dates display correctly.

When you are finished analyzing in Kibana you can run ./stop-showcase.sh to bring down the containers.

If you want to tear down everything and delete all the associated data you can run ./clean-showcase.sh.
This stops the containers and deletes the associated docker volumes.
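Under the hood this amounts to the standard compose teardown, roughly (a sketch; the script itself is authoritative):

# stop containers and remove the named volumes declared in the compose file
docker-compose down -v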

Please open an issue with any feedback you might have. Thanks!