# ETL Data Pipeline: Tracking User Activity
-------

In this project, I use nested JSON data extracted from the database of an ed tech firm that delivers various assessments to customers in tech. I explain the detailed steps to extract, transform, and load (ETL) the data through the pipeline so that it is ready for data scientists to query.

The main goal of this project is to explain the pipeline, as demonstrated in the following steps.


### Tasks

Prepare the infrastructure to land the data in the form and structure needed to run queries. I will perform the following tasks:

- Publish and consume messages with Kafka; the messages are the nested JSON data.
- Use Spark to transform the messages so that they can be landed in Hadoop (HDFS).
- Run queries and a brief analysis of the transformed data.


### Data

To get the data, run the following in the CLI:
```
curl -L -o assessment-attempts-20180128-121051-nested.json https://goo.gl/ME6hjp
```

Note on the data: this dataset is a nested JSON file that needs to be unwrapped carefully to understand what it really contains.


## I. Publish & Consume messages with Kafka

### 1. Spin up the pipeline

Steps to spin up the pipeline: start by setting up the environment using docker-compose

* Navigate to the project folder where the docker-compose.yml file is stored
* Explanation of the docker-compose.yml file: it defines the following containers
    - zookeeper: runs ZooKeeper and exposes port 32181
    - kafka: runs Kafka as a single-broker cluster (broker id = 1); depends on zookeeper, connects to it on port 32181, and exposes port 29092
    - cloudera: runs Hadoop
    - spark: runs Spark with Python; depends on cloudera, connects to Hadoop using cloudera as the namenode, and exposes port 8888 for Jupyter Notebook
    - mids: runs Linux with jq and Python with data science libraries


* Check which containers exist before spinning up the cluster:

```
docker ps
```

* Spin up multiple Docker containers from the docker-compose.yml file, running the containers in the background:

```
docker-compose up -d
```

* Check what containers are running after spinning up the cluster:

```
docker-compose ps
```

* Follow the logs to check the Kafka broker:

```
docker-compose logs -f kafka
```

* List the files and folders currently in the /tmp/ directory in Hadoop:

```
docker-compose exec cloudera hadoop fs -ls /tmp/
```

### 2. Initial data exploration with jq to understand the context

* Look through the structure of the data:
    
    - Code summary: Use jq to pretty-print the data and display the first 200 lines

```
cat assessment-attempts-20180128-121051-nested.json | jq . | head -n200
```

* Exploratory question: How many assessments are in the dataset?

    - Code summary: Print each JSON object on its own line, then count the number of lines
    
```
cat assessment-attempts-20180128-121051-nested.json | jq '.[]' -c | wc -l
```

* Exploratory question: What are the most common courses?

    - Code summary: Print the value of the `exam_name` key for each record, sort the values, count the occurrences of each unique value, sort the counts in descending order, and display the top 10.

```
cat assessment-attempts-20180128-121051-nested.json | jq '.[]|.exam_name' -c | sort | uniq -c | sort -gr |  head -10
```
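
The same two counts can be cross-checked in plain Python. The following is a minimal sketch on a small, made-up sample (the records in `sample_json` are hypothetical, not taken from the dataset); with the real file one would read it with `json.load` instead.

```python
import json
from collections import Counter

# Hypothetical sample with the same top-level shape as the real file:
# a JSON array of assessment objects, each carrying an `exam_name`.
sample_json = """
[
  {"exam_name": "Learning Git", "keen_id": "a1"},
  {"exam_name": "Learning Git", "keen_id": "a2"},
  {"exam_name": "Learning SQL", "keen_id": "a3"}
]
"""

attempts = json.loads(sample_json)

# Counterpart of `jq '.[]' -c | wc -l`: one array element per assessment.
total_assessments = len(attempts)

# Counterpart of `sort | uniq -c | sort -gr | head -10`:
# count each exam name and keep the ten most frequent.
top_exams = Counter(a["exam_name"] for a in attempts).most_common(10)

print(total_assessments)
print(top_exams)
```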

### 3. Create a Kafka topic & publish messages

* Decide on the Kafka topic name:
    - The data exploration with jq shows that the data contains records of different exams that users took. Each record includes the exam name, exam id, user id, when the user took the exam, and details of the exam result (each question's result, total number of questions, number of correct answers, etc.)
    - Therefore, I name the Kafka topic for this project `assessments` to reflect the content of the data
    
    
* Create a Kafka topic:

    - Code summary: Run `kafka-topics` in the kafka container to create a topic named `assessments` with 1 partition and a replication factor of 1, connecting to ZooKeeper on port 32181

```
docker-compose exec kafka \
  kafka-topics \
    --create \
    --topic assessments \
    --partitions 1 \
    --replication-factor 1 \
    --if-not-exists \
    --zookeeper zookeeper:32181
```

* Check the topic created:

    - Code summary: Run `kafka-topics` in the kafka container to display an overview of the topic `assessments`

```
docker-compose exec kafka \
  kafka-topics \
    --describe \
    --topic assessments \
    --zookeeper zookeeper:32181
```

* Publish messages from the assessment data .json file:

    - Code summary: Using kafkacat in producer mode (`-P`), publish messages to the Kafka topic `assessments` through the broker on port 29092. Each message is one JSON object from the data, printed on its own line.

```
docker-compose exec mids \
  bash -c "cat /w205/project-2-latuyetmai/assessment-attempts-20180128-121051-nested.json \
    | jq '.[]' -c \
    | kafkacat -P -b kafka:29092 -t assessments"
```
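
To make the "one JSON object per line" step concrete, here is a minimal Python sketch of what `jq '.[]' -c` does to the file before kafkacat reads it from standard input. The helper name `to_messages` and the sample input are hypothetical and only serve as an illustration.

```python
import json


def to_messages(json_array_text):
    """Split a JSON array into one compact JSON string per element."""
    records = json.loads(json_array_text)
    # separators=(",", ":") drops the optional spaces, similar to jq's -c output
    return [json.dumps(record, separators=(",", ":")) for record in records]


# Hypothetical two-record input; the real file holds one object per assessment.
sample = '[{"exam_name": "Learning Git"}, {"exam_name": "Learning SQL"}]'
messages = to_messages(sample)

# Each printed line would become one Kafka message.
for message in messages:
    print(message)
```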


### 4. Spin up Spark pyspark and open Jupyter notebook 

   - Code summary: Run pyspark (Spark with Python) in the spark container, with Jupyter Notebook as the driver, served on port 8888

```
docker-compose exec spark env PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS='notebook --no-browser --port 8888 --ip 0.0.0.0 --allow-root --notebook-dir=/ETL_Pipeline_Tracking_User_Activities' pyspark
```

## II. Use Spark to transform the messages

### 1. Consume the messages from Kafka

   - Code summary: Use Spark to consume the messages from the Kafka topic `assessments`, connecting to Kafka on port 29092 and reading from the earliest to the latest offset.

In [1]:
# Consume the messages from kafka topic `assessments`
raw_data = spark \
  .read \
  .format("kafka") \
  .option("kafka.bootstrap.servers", "kafka:29092") \
  .option("subscribe","assessments") \
  .option("startingOffsets", "earliest") \
  .option("endingOffsets", "latest") \
  .load()

In [2]:
# Check the data schema
raw_data.printSchema()

root
 |-- key: binary (nullable = true)
 |-- value: binary (nullable = true)
 |-- topic: string (nullable = true)
 |-- partition: integer (nullable = true)
 |-- offset: long (nullable = true)
 |-- timestamp: timestamp (nullable = true)
 |-- timestampType: integer (nullable = true)



In [3]:
# See raw messages
raw_data.show(5)

+----+--------------------+-----------+---------+------+--------------------+-------------+
| key|               value|      topic|partition|offset|           timestamp|timestampType|
+----+--------------------+-----------+---------+------+--------------------+-------------+
|null|[7B 22 6B 65 65 6...|assessments|        0|     0|1969-12-31 23:59:...|            0|
|null|[7B 22 6B 65 65 6...|assessments|        0|     1|1969-12-31 23:59:...|            0|
|null|[7B 22 6B 65 65 6...|assessments|        0|     2|1969-12-31 23:59:...|            0|
|null|[7B 22 6B 65 65 6...|assessments|        0|     3|1969-12-31 23:59:...|            0|
|null|[7B 22 6B 65 65 6...|assessments|        0|     4|1969-12-31 23:59:...|            0|
+----+--------------------+-----------+---------+------+--------------------+-------------+
only showing top 5 rows



### 2. Transform the messages with Spark

In [4]:
# cache() keeps the DataFrame at the default storage level so later actions can reuse it
raw_data.cache()

DataFrame[key: binary, value: binary, topic: string, partition: int, offset: bigint, timestamp: timestamp, timestampType: int]

In [5]:
# cast binary data as string
raw_assessments = raw_data.select(raw_data.value.cast('string'))
raw_assessments.show(5)

+--------------------+
|               value|
+--------------------+
|{"keen_timestamp"...|
|{"keen_timestamp"...|
|{"keen_timestamp"...|
|{"keen_timestamp"...|
|{"keen_timestamp"...|
+--------------------+
only showing top 5 rows
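
The binary-to-string step can be illustrated outside Spark. The sketch below is a minimal illustration assuming the payload is UTF-8 encoded JSON; `raw_value` is a shortened, illustrative payload rather than a full message from the topic.

```python
import json

# Illustrative payload. Its first bytes match the hex shown in the raw
# `value` column: 7B 22 6B 65 65 6E is the UTF-8 encoding of '{"keen'.
raw_value = bytearray(b'{"keen_timestamp":"1516717442.735266","max_attempts":"1.0"}')

# Step 1: bytes -> text (the effect of casting the column to string
# when the payload is UTF-8).
text_value = raw_value.decode("utf-8")

# Step 2: text -> Python dict, ready for field access.
record = json.loads(text_value)

print(bytes(raw_value[:6]).hex())
print(record["max_attempts"])
```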



In [6]:
# Inspect the first row to understand the structure of the data to unwrap later
raw_data.select('value').take(1)[0].value

bytearray(b'{"keen_timestamp":"1516717442.735266","max_attempts":"1.0","started_at":"2018-01-23T14:23:19.082Z","base_exam_id":"37f0a30a-7464-11e6-aa92-a8667f27e5dc","user_exam_id":"6d4089e4-bde5-4a22-b65f-18bce9ab79c8","sequences":{"questions":[{"user_incomplete":true,"user_correct":false,"options":[{"checked":true,"at":"2018-01-23T14:23:24.670Z","id":"49c574b4-5c82-4ffd-9bd1-c3358faf850d","submitted":1,"correct":true},{"checked":true,"at":"2018-01-23T14:23:25.914Z","id":"f2528210-35c3-4320-acf3-9056567ea19f","submitted":1,"correct":true},{"checked":false,"correct":true,"id":"d1bf026f-554f-4543-bdd2-54dcf105b826"}],"user_submitted":true,"id":"7a2ed6d3-f492-49b3-b8aa-d080a8aad986","user_result":"missed_some"},{"user_incomplete":false,"user_correct":false,"options":[{"checked":true,"at":"2018-01-23T14:23:30.116Z","id":"a35d0e80-8c49-415d-b8cb-c21a02627e2b","submitted":1},{"checked":false,"correct":true,"id":"bccd6e2e-2cef-4c72-8bfa-317db0ac48bb"},{"checked":true,"at":"2018-01-23T14:23:41

In [7]:
# Parse each message with json.loads inside a map, then convert the result to a DataFrame
import json
from pyspark.sql import Row
extracted_data = raw_assessments.rdd.map(lambda x: Row(**json.loads(x.value))).toDF()

In [8]:
extracted_data.show(5)

+--------------------+-------------+--------------------+------------------+--------------------+------------------+------------+--------------------+--------------------+--------------------+
|        base_exam_id|certification|           exam_name|   keen_created_at|             keen_id|    keen_timestamp|max_attempts|           sequences|          started_at|        user_exam_id|
+--------------------+-------------+--------------------+------------------+--------------------+------------------+------------+--------------------+--------------------+--------------------+
|37f0a30a-7464-11e...|        false|Normal Forms and ...| 1516717442.735266|5a6745820eb8ab000...| 1516717442.735266|         1.0|Map(questions -> ...|2018-01-23T14:23:...|6d4089e4-bde5-4a2...|
|37f0a30a-7464-11e...|        false|Normal Forms and ...| 1516717377.639827|5a674541ab6b0a000...| 1516717377.639827|         1.0|Map(questions -> ...|2018-01-23T14:21:...|2fec1534-b41f-441...|
|4beeac16-bb83-4d5...|        false

In [9]:
# Show the schema of the extracted data
extracted_data.printSchema()

root
 |-- base_exam_id: string (nullable = true)
 |-- certification: string (nullable = true)
 |-- exam_name: string (nullable = true)
 |-- keen_created_at: string (nullable = true)
 |-- keen_id: string (nullable = true)
 |-- keen_timestamp: string (nullable = true)
 |-- max_attempts: string (nullable = true)
 |-- sequences: map (nullable = true)
 |    |-- key: string
 |    |-- value: array (valueContainsNull = true)
 |    |    |-- element: map (containsNull = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: boolean (valueContainsNull = true)
 |-- started_at: string (nullable = true)
 |-- user_exam_id: string (nullable = true)



### 3. Run queries with Spark to answer business questions

#### Question 1: How many assessments are in the dataset?

* Answer: 3,280 assessments

In [10]:
extracted_data.count()

3280

#### Question 2: How many people took *Learning Git*?

* Answer: 394 people took *Learning Git*

In [11]:
# Create a temp table named assessments for running spark.sql queries
extracted_data.registerTempTable('assessments')

In [12]:
spark.sql("select * \
    from (select exam_name, count(keen_id) as number_of_taken \
    from assessments group by exam_name) \
    where exam_name = 'Learning Git'").show()

+------------+---------------+
|   exam_name|number_of_taken|
+------------+---------------+
|Learning Git|            394|
+------------+---------------+



#### Question 3: What are the least common courses taken? 

* Answer:
    - Learning to Visualize Data with D3.js
    - Nulls, Three-valued Logic and Missing Information
    - Native Web Apps for Android
    - Operating Red Hat Enterprise Linux Servers

In [13]:
spark.sql("with t1 as \
    (select exam_name, count(keen_id) as number_of_taken \
    from assessments group by exam_name),\
    t2 as \
    (select min(number_of_taken) as min_taken\
    from t1) \
    select t1.exam_name, t1.number_of_taken \
    from t1 inner join t2 on t1.number_of_taken = t2.min_taken").show(10, False)

+-------------------------------------------------+---------------+
|exam_name                                        |number_of_taken|
+-------------------------------------------------+---------------+
|Learning to Visualize Data with D3.js            |1              |
|Nulls, Three-valued Logic and Missing Information|1              |
|Native Web Apps for Android                      |1              |
|Operating Red Hat Enterprise Linux Servers       |1              |
+-------------------------------------------------+---------------+
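
The query joins against the minimum count rather than sorting and taking one row, so every exam tied for the lowest count is returned. A minimal Python sketch of the same idea, using a hypothetical list of exam names rather than the real dataset:

```python
from collections import Counter

# Hypothetical exam names, one entry per attempt; not the real dataset.
exam_names = [
    "Learning Git", "Learning Git", "Learning Git",
    "Learning SQL", "Learning SQL",
    "Great Bash",
    "Using Web Components",
]

# t1 in the query: number of attempts per exam.
counts = Counter(exam_names)

# t2 in the query: the smallest number of attempts.
min_taken = min(counts.values())

# The join: keep every exam whose count equals the minimum (ties included).
least_common = sorted(name for name, taken in counts.items() if taken == min_taken)

print(min_taken, least_common)
```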



#### Question 4: What are the most common courses taken?

* Answer:
    - Learning Git

In [14]:
spark.sql("with t1 as \
    (select exam_name, count(keen_id) as number_of_taken \
    from assessments group by exam_name),\
    t2 as \
    (select max(number_of_taken) as max_taken\
    from t1) \
    select t1.exam_name, t1.number_of_taken \
    from t1 inner join t2 on t1.number_of_taken = t2.max_taken").show(10, False)

+------------+---------------+
|exam_name   |number_of_taken|
+------------+---------------+
|Learning Git|394            |
+------------+---------------+



* **Observation:**
    - The data in the `sequences` column could not be accessed after the map/lambda step above. The likely cause is that the keys and values in the nested data do not exist in every row, which leads to errors. A custom function is needed to extract the keys and values from the `sequences` column.
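
One way to guard against keys that are missing in some rows is a small lookup helper that returns a default instead of raising. This is only a sketch: `get_nested` and the two sample records are hypothetical and not part of the notebook.

```python
def get_nested(record, path, default=None):
    """Follow `path` (a sequence of keys) through nested dicts.

    Returns `default` as soon as a key is missing or a non-dict is reached.
    """
    current = record
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Hypothetical records: the second one has no "counts" block.
complete = {"sequences": {"counts": {"correct": 2, "total": 4}}}
partial = {"sequences": {"questions": []}}

print(get_nested(complete, ["sequences", "counts", "total"]))
print(get_nested(partial, ["sequences", "counts", "total"]))
```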


## III. Land the transformed messages in HDFS

In [15]:
# Write the data as Parquet and land it in the Hadoop /tmp/ directory
# (with the default save mode, this fails if the path already exists)
extracted_data.write.parquet("/tmp/extracted_assessments")

* Check what data is currently in Hadoop after landing the parquet file:

    Note: run the following command in a different terminal window
    
    - Code summary: List all files in the Hadoop /tmp/ directory

```
docker-compose exec cloudera hadoop fs -ls /tmp/
```

* Check the data in the extracted_assessments folder:

    - Code summary: List all files in the Hadoop /tmp/extracted_assessments directory with human-readable file sizes

```
docker-compose exec cloudera hadoop fs -ls -h /tmp/extracted_assessments
```

In [16]:
# Using Spark to read back the parquet file from Hadoop to check if the data was written as expected 
df = spark.read.parquet("/tmp/extracted_assessments")

In [17]:
df.show(5)

+--------------------+-------------+--------------------+------------------+--------------------+------------------+------------+--------------------+--------------------+--------------------+
|        base_exam_id|certification|           exam_name|   keen_created_at|             keen_id|    keen_timestamp|max_attempts|           sequences|          started_at|        user_exam_id|
+--------------------+-------------+--------------------+------------------+--------------------+------------------+------------+--------------------+--------------------+--------------------+
|37f0a30a-7464-11e...|        false|Normal Forms and ...| 1516717442.735266|5a6745820eb8ab000...| 1516717442.735266|         1.0|Map(questions -> ...|2018-01-23T14:23:...|6d4089e4-bde5-4a2...|
|37f0a30a-7464-11e...|        false|Normal Forms and ...| 1516717377.639827|5a674541ab6b0a000...| 1516717377.639827|         1.0|Map(questions -> ...|2018-01-23T14:21:...|2fec1534-b41f-441...|
|4beeac16-bb83-4d5...|        false

## IV. Further extract nested data & answer more business questions

### 1. Extract nested data with a custom function

In [18]:
# Build a custom function to access the data in the nested sequences column.
# Example of the data stored under the key "counts" in "sequences":
#"counts":{"incomplete":1,"submitted":4,"incorrect":1,"all_correct":false,"correct":2,"total":4,"unanswered":0}

import numpy as np

def extract_sequences(row):
    exams = json.loads(row.value)
    
    exams_data = {"base_exam_id": exams["base_exam_id"],
                  "certification": exams["certification"],
                  "exam_name": exams["exam_name"],
                  "keen_created_at": exams["keen_created_at"],
                  "keen_id": exams["keen_id"],
                  "keen_timestamp": exams["keen_timestamp"],
                  "max_attempts": exams["max_attempts"],
                  "started_at": exams["started_at"],
                  "user_exam_id": exams["user_exam_id"],
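                  # np.nan is the spelling accepted by every NumPy version
                  # (the np.NaN alias was removed in NumPy 2.0)
                  "question_attempt": np.nan,
                  "question_id": np.nan,
                  "question_correct": np.nan,
                  "question_number": np.nan,
                  "score": np.nan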
                 }
    
    # Unwrap sequences column
    if "sequences" in exams.keys():
        if "attempt" in exams["sequences"].keys():
            exams_data["question_attempt"] = exams["sequences"]["attempt"]

        if "id" in exams["sequences"].keys():
            exams_data["question_id"] = exams["sequences"]["id"]

        if "questions" in exams["sequences"].keys():
            exams_data["questions"] = exams["sequences"]["questions"]                   

        if "counts" in exams["sequences"].keys():
            if "correct" in exams["sequences"]["counts"].keys():
                exams_data["question_correct"] = exams["sequences"]["counts"]["correct"]
                
            if "total" in exams["sequences"]["counts"].keys():
                exams_data["question_number"] = exams["sequences"]["counts"]["total"]

        if exams_data["question_number"] > 0:
            exams_data["score"] = 100* exams_data["question_correct"] / exams_data["question_number"]
       

    return Row(**exams_data)
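
The score logic can be tried without Spark on a single message. The sketch below is a simplified, plain-Python counterpart of the guard used in `extract_sequences`; the function name `score_from_message` and the two sample messages are hypothetical, and the sample `counts` block reuses the example quoted in the comment above.

```python
import json
import math


def score_from_message(message_text):
    """Return the percentage score for one JSON message, or NaN if unknown."""
    exam = json.loads(message_text)
    counts = exam.get("sequences", {}).get("counts", {})
    correct = counts.get("correct")
    total = counts.get("total")
    # Only divide when both values exist and the total is positive.
    if correct is None or not total or total <= 0:
        return float("nan")
    return 100.0 * correct / total


with_counts = (
    '{"sequences":{"counts":{"incomplete":1,"submitted":4,"incorrect":1,'
    '"all_correct":false,"correct":2,"total":4,"unanswered":0}}}'
)
without_counts = '{"sequences":{"questions":[]}}'

print(score_from_message(with_counts))
print(math.isnan(score_from_message(without_counts)))
```

Returning NaN instead of raising keeps one row per message, so the row count stays the same as in the raw data.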

In [19]:
# Create a new DataFrame by applying the custom extraction function
new_df = raw_assessments.rdd.map(extract_sequences).toDF()

In [20]:
new_df.show(5)

+--------------------+-------------+--------------------+------------------+--------------------+------------------+------------+----------------+----------------+--------------------+---------------+--------------------+-----+--------------------+--------------------+
|        base_exam_id|certification|           exam_name|   keen_created_at|             keen_id|    keen_timestamp|max_attempts|question_attempt|question_correct|         question_id|question_number|           questions|score|          started_at|        user_exam_id|
+--------------------+-------------+--------------------+------------------+--------------------+------------------+------------+----------------+----------------+--------------------+---------------+--------------------+-----+--------------------+--------------------+
|37f0a30a-7464-11e...|        false|Normal Forms and ...| 1516717442.735266|5a6745820eb8ab000...| 1516717442.735266|         1.0|               1|               2|5b28a462-7a3b-42e...|      

In [21]:
# Double-check how many assessments are in the further extracted dataset
new_df.count()

3280

In [22]:
# Print new dataframe schema
new_df.printSchema()

root
 |-- base_exam_id: string (nullable = true)
 |-- certification: string (nullable = true)
 |-- exam_name: string (nullable = true)
 |-- keen_created_at: string (nullable = true)
 |-- keen_id: string (nullable = true)
 |-- keen_timestamp: string (nullable = true)
 |-- max_attempts: string (nullable = true)
 |-- question_attempt: long (nullable = true)
 |-- question_correct: long (nullable = true)
 |-- question_id: string (nullable = true)
 |-- question_number: long (nullable = true)
 |-- questions: array (nullable = true)
 |    |-- element: map (containsNull = true)
 |    |    |-- key: string
 |    |    |-- value: boolean (valueContainsNull = true)
 |-- score: double (nullable = true)
 |-- started_at: string (nullable = true)
 |-- user_exam_id: string (nullable = true)



In [23]:
# Create a temp table for running spark.sql queries
new_df.registerTempTable('unwrap_assessments')

### 3. Run queries with Spark to answer more business questions

#### Question 5: How many different exams are in the dataset?

* Answer: 107 exams

In [24]:
spark.sql("select count(distinct base_exam_id) as total_exams \
    from unwrap_assessments").show()

+-----------+
|total_exams|
+-----------+
|        107|
+-----------+



#### Question 6: What are the top 10 most difficult exams (lowest average score) among the common exams (taken at least 50 times)?

In [25]:
spark.sql("select exam_name, \
    round(avg(score),1) as average_score, \
    count(keen_id) as number_of_taken \
    from unwrap_assessments group by exam_name \
    having number_of_taken >= 50 \
    order by average_score, number_of_taken desc").show(10, False)

+-----------------------------------------------------------+-------------+---------------+
|exam_name                                                  |average_score|number_of_taken|
+-----------------------------------------------------------+-------------+---------------+
|Software Architecture Fundamentals Understanding the Basics|47.9         |109            |
|Intermediate Python Programming                            |51.3         |158            |
|Learning to Program with R                                 |54.5         |128            |
|Beginning C# Programming                                   |55.5         |95             |
|Learning Linux System Administration                       |55.5         |59             |
|Introduction to Python                                     |56.7         |162            |
|Mastering Git                                              |58.8         |77             |
|Practical Java Programming                                 |59.4         |53   
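
The structure of this query (group, filter groups by size, then sort) can be sketched in plain Python. The rows below are hypothetical, and the threshold is lowered from 50 to 3 so that the tiny sample produces output.

```python
from collections import defaultdict

# Hypothetical (exam_name, score) pairs; not the real dataset.
rows = [
    ("Exam A", 40.0), ("Exam A", 60.0), ("Exam A", 50.0),
    ("Exam B", 90.0), ("Exam B", 70.0), ("Exam B", 80.0),
    ("Exam C", 10.0),
]
MIN_TAKEN = 3  # the query uses 50; lowered here for the small sample

# GROUP BY exam_name
scores_by_exam = defaultdict(list)
for exam_name, score in rows:
    scores_by_exam[exam_name].append(score)

# HAVING number_of_taken >= MIN_TAKEN, with the rounded average per exam
summary = [
    (name, round(sum(scores) / len(scores), 1), len(scores))
    for name, scores in scores_by_exam.items()
    if len(scores) >= MIN_TAKEN
]

# ORDER BY average_score ascending, number_of_taken descending
summary.sort(key=lambda row: (row[1], -row[2]))

print(summary)
```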

#### Question 7: What are the top 10 easiest exams (highest average score) among the common exams (taken at least 50 times)?

In [26]:
spark.sql("select exam_name, \
    round(avg(score),1) as average_score, \
    count(keen_id) as number_of_taken \
    from unwrap_assessments group by exam_name \
    having number_of_taken >= 50 \
    order by average_score desc").show(10, False)

+--------------------------------------------------------------+-------------+---------------+
|exam_name                                                     |average_score|number_of_taken|
+--------------------------------------------------------------+-------------+---------------+
|Introduction to Java 8                                        |87.6         |158            |
|Beginning Programming with JavaScript                         |76.6         |79             |
|Python Epiphanies                                             |74.2         |51             |
|Learning SQL                                                  |73.7         |57             |
|Advanced Machine Learning                                     |72.4         |67             |
|Learning Eclipse                                              |70.6         |85             |
|Introduction to Machine Learning                              |68.7         |119            |
|Learning Git                                     

#### Question 8: What are the top 10 exams with the most questions?

In [27]:
spark.sql("select exam_name, \
    max(question_number) as number_of_question, \
    count(keen_id) as number_of_taken \
    from unwrap_assessments group by exam_name \
    having number_of_question > 0 \
    order by number_of_question desc").show(10, False)

+--------------------------------------------+------------------+---------------+
|exam_name                                   |number_of_question|number_of_taken|
+--------------------------------------------+------------------+---------------+
|Operating Red Hat Enterprise Linux Servers  |20                |1              |
|Great Bash                                  |10                |14             |
|Learning Linux System Administration        |8                 |59             |
|Being a Better Introvert                    |7                 |10             |
|What's New in JavaScript                    |7                 |2              |
|Learning to Program with R                  |7                 |128            |
|Introduction to Data Science with R         |7                 |43             |
|Understanding the Grails 3 Domain Model     |7                 |2              |
|Using Web Components                        |6                 |3              |
|Introduction to

#### Question 9: Who are the top 10 people who took the most exams, ranked by average score?

In [28]:
spark.sql("select user_exam_id, \
    count(keen_id) as number_of_exams_taken, \
    round(avg(score),1) as average_score \
    from unwrap_assessments group by user_exam_id \
    order by number_of_exams_taken desc, average_score desc").show(10, False)

+------------------------------------+---------------------+-------------+
|user_exam_id                        |number_of_exams_taken|average_score|
+------------------------------------+---------------------+-------------+
|a7e6fc04-245f-4e3c-9539-e2aac44c0eb8|3                    |100.0        |
|949aa36c-74c7-4fc1-a41f-42386c1beb37|3                    |100.0        |
|b7ac6d15-97e1-4e94-a09d-da819024b8cd|3                    |100.0        |
|bd96cfbe-1532-4ba2-a504-7e8a437a5065|3                    |100.0        |
|d4ab4aeb-1368-4866-bc5e-7eee69fd1608|3                    |100.0        |
|fa23b287-0d0a-4683-8d19-38a65b7f57d1|3                    |100.0        |
|37cf5b0c-4807-4214-8426-fb1731b57700|3                    |100.0        |
|cdc5859d-b332-4fb1-aae4-5cacb52cea5f|3                    |80.0         |
|1e325cc1-47a9-4808-8f6b-508b5459ed6d|3                    |75.0         |
|66d91177-c436-4ee1-b0b0-daa960e1b2d0|3                    |75.0         |
+------------------------

### 4. Land further extracted data in HDFS

In [29]:
# Write the data as Parquet and land it in the Hadoop /tmp/ directory
new_df.write.parquet("/tmp/futher_unnested_assessments")

* Check what data is currently in Hadoop after landing the parquet file:

    Note: run the following command in a different terminal window
    
    - Code summary: List all files in the Hadoop /tmp/ directory

```
docker-compose exec cloudera hadoop fs -ls /tmp/
```

* Check the data in the futher_unnested_assessments folder:

    - Code summary: List all files in the Hadoop /tmp/futher_unnested_assessments directory with human-readable file sizes
    
```
docker-compose exec cloudera hadoop fs -ls -h /tmp/futher_unnested_assessments
```


In [30]:
# Using Spark to read back the parquet file from Hadoop to check if the data was written as expected 
df2 = spark.read.parquet("/tmp/futher_unnested_assessments")

In [31]:
df2.show(5)

+--------------------+-------------+--------------------+------------------+--------------------+------------------+------------+----------------+----------------+--------------------+---------------+--------------------+-----+--------------------+--------------------+
|        base_exam_id|certification|           exam_name|   keen_created_at|             keen_id|    keen_timestamp|max_attempts|question_attempt|question_correct|         question_id|question_number|           questions|score|          started_at|        user_exam_id|
+--------------------+-------------+--------------------+------------------+--------------------+------------------+------------+----------------+----------------+--------------------+---------------+--------------------+-----+--------------------+--------------------+
|37f0a30a-7464-11e...|        false|Normal Forms and ...| 1516717442.735266|5a6745820eb8ab000...| 1516717442.735266|         1.0|               1|               2|5b28a462-7a3b-42e...|      

### 5. Spin down Docker containers

* Spin down docker containers from the docker-compose.yml file:

```
docker-compose down
```

* Check if the containers are down:

```
docker-compose ps
```