# LocalCart scenario part 2: MessageHub to CSV streaming pipelines

<a id="intro"></a>
## Introduction


A web or mobile app triggers events as a user navigates a website. These clickstream events indicate when a customer logs in, adds something to a basket, completes an order, and logs out. The events are placed into a configured Message Hub (Apache Kafka) instance, which provides a scalable way to buffer the data before it is saved, analysed, and rendered.

[Notebook #1 - Creating a Kafka Producer of ClickStream events](https://apsportal.ibm.com/analytics/notebooks/c3aee820-01af-478f-bd0f-07d80866863f/view?projectid=81238e6c-a19b-4c5c-9e45-753dfe7b7de3&context=analytics) generates clickstream events for LocalCart and sends them to Message Hub to show how data can be collected offline and streamed to the cloud later. A [Java app](https://localcartkafkaproducer.mybluemix.net/LocalCartKafkaProducer/) continuously feeds a simulated stream of events to Message Hub. 

This notebook creates streaming pipelines that ingest those clickstream events and write them in CSV format to Object Storage for later analysis.

These files can be concatenated and loaded into a Jupyter notebook. We can use [Pixiedust](https://github.com/ibm-cds-labs/pixiedust) to analyse the data. This type of analysis with Pixiedust is done in [Notebook #4: Visualize streaming data](https://apsportal.ibm.com/analytics/notebooks/d9fd6d78-d55f-4e83-b8ae-d465f7af256f/view?projectid=81238e6c-a19b-4c5c-9e45-753dfe7b7de3&context=analytics).
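
As a rough illustration of the concatenation step, the following sketch merges the text of several CSV files into one list of rows by using only the Python standard library. It assumes that every file starts with the same header row, which this notebook does not confirm, so the `has_header` flag lets you switch that behaviour off. The function name `concat_csv` and the sample columns are hypothetical.

```python
import csv


def concat_csv(texts, has_header=True):
    """Merge several CSV documents (given as strings) into one list of rows.

    Assumption: when has_header is True, every document starts with the same
    header row. The first header is kept and later ones are dropped.
    Note: splitting on lines does not support quoted fields that contain
    embedded newlines.
    """
    merged = []
    for index, text in enumerate(texts):
        rows = list(csv.reader(text.splitlines()))
        if has_header and index > 0:
            rows = rows[1:]  # skip the repeated header row
        merged.extend(rows)
    return merged


# Hypothetical sample data; the real column names depend on the topic.
first = "customer_id,product\n1,apple\n"
second = "customer_id,product\n2,pear\n"
rows = concat_csv([first, second])
```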

This notebook runs on Python 2 with Spark 2.0.

## Table of contents

1. [Introduction](#intro)<br>
2. [Scenario](#process)<br>
3. [Collect data from Message Hub](#collect)<br>
4. [Steps](#steps)<br>

<a id="process"></a>
## Scenario 

In this notebook, our aim is to persist the incoming events as CSV files by using the streaming pipelines service. The following graphic shows LocalCart clickstream events that are generated and sent from the Message Hub service. 
<img src='https://github.com/ibm-watson-data-lab/advo-beta-producer/blob/master/graphics/NB2a_CSV_PIPELINE.png?raw=true'></img>


<a id="collect"></a>
## Collect data from Message Hub

First we need to create a streaming pipeline that collects data from a Message Hub operator.


***

<a id="steps"></a>
### Steps

In IBM Data Science Experience, do these steps:

1. Select a project that you want to contain the streaming pipeline.
1. Click the **Analytics Assets** tab.
1. In the Streaming Pipelines section, click **add streaming pipelines**.
1. In the Create Streaming Pipeline window, click **Create with a Wizard**. 
1. In the Select Source window, click **MessageHub**.
1. Under the Instance drop-down menu, select your MessageHub instance.
1. Under the Topic drop-down menu, select **add_to_cart**. Click the Continue button.
1. Wait for the Data Preview to load. Click the Continue button.
1. In the Select Target window, click **Object Storage**.
1. Under the Object Storage Instance drop-down menu, select your Object Storage instance.
   <br>
   > Take note of the Object Storage instance name. You will need this information in notebook 3B (TODO) when you load and analyze the clickstream events.
1. Under the Container drop-down menu, select the Object Storage container you want to write to. 
   <br>
   > Take note of the Object Storage container name. You will need this information in notebook 3B (TODO) when you load and analyze the clickstream events.
1. Under File Name, type **add_to_cart-TIMESTAMP.csv** (note: **TIMESTAMP** is a reserved word which will be replaced with an actual timestamp when the file is written).
1. Under Format, select **csv**.
1. Under Delimiter, select **Comma (,)**.
1. Type in a name for the pipeline, such as **addtocart2csv**, and then click **Save**.
1. In the next window, click the **Run** icon.
1. Repeat the steps above for each of the remaining MessageHub topics: browsing, checkout, clickStream, login, logout_with_purchase, and logout_without_purchase.
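
Because the same wizard steps are repeated for seven topics, it can help to write down the values to type for each one. The sketch below generates that checklist. The file name pattern follows the `add_to_cart-TIMESTAMP.csv` example above; the pipeline naming rule (remove underscores, lowercase, append `2csv`) is only an assumption extrapolated from the single `addtocart2csv` example, and `pipeline_settings` is a hypothetical helper, not part of any IBM API.

```python
# Topics named in this notebook (add_to_cart plus the six remaining ones).
TOPICS = [
    "add_to_cart",
    "browsing",
    "checkout",
    "clickStream",
    "login",
    "logout_with_purchase",
    "logout_without_purchase",
]


def pipeline_settings(topic):
    """Return the wizard values to enter for one topic.

    TIMESTAMP is left as-is: the pipeline replaces it when the file is written.
    The pipeline name rule is an assumed convention, not a requirement.
    """
    return {
        "topic": topic,
        "file_name": topic + "-TIMESTAMP.csv",
        "pipeline_name": topic.replace("_", "").lower() + "2csv",
    }


for settings in [pipeline_settings(topic) for topic in TOPICS]:
    print("%(topic)s: file %(file_name)s, pipeline %(pipeline_name)s" % settings)
```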

## Accessing CSV files on Object Storage
1. Log in to [Bluemix](https://console.bluemix.net/) using your DSX credentials.
1. Navigate to the space that contains the Object Storage instance you selected when you created the DSX project.
1. Open the Object Storage instance.

### Accessing CSV files on Object Storage manually
1. Open the _Manage_ tab and select the container that you specified when you created the data collection pipeline.
1. Select a CSV file, and then choose "Select Action" > "Download File" to view it.

### Accessing CSV files on Object Storage programmatically
1. Open the _Service credentials_ tab and select _View credentials_.
1. Copy the credentials and provide this information whenever you want to load data files programmatically, such as in notebook 3B (TODO).
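
To give an idea of how the copied credentials, the container name, and a file name fit together, here is a minimal sketch that turns them into Hadoop Swift settings and a `swift://` URL of the kind Spark can read. Everything in it is an assumption to verify against your own service: the credential keys (`auth_url`, `projectId`, `userId`, `password`, `region`), the `/v3/auth/tokens` suffix, the property names under `fs.swift.service.<name>`, and the service name `keystone`. The helpers `swift_config` and `swift_url` are hypothetical.

```python
def swift_config(credentials, name="keystone"):
    """Map Object Storage credentials to Hadoop Swift properties.

    Assumption: the key names match the JSON shown under View credentials.
    Adjust them if your credentials use different keys.
    """
    prefix = "fs.swift.service." + name
    return {
        prefix + ".auth.url": credentials["auth_url"] + "/v3/auth/tokens",
        prefix + ".tenant": credentials["projectId"],
        prefix + ".username": credentials["userId"],
        prefix + ".password": credentials["password"],
        prefix + ".region": credentials["region"],
    }


def swift_url(container, file_name, name="keystone"):
    """Build a swift:// URL in the form container.service/object."""
    return "swift://%s.%s/%s" % (container, name, file_name)


# Placeholder values only; paste your own credentials instead.
example_credentials = {
    "auth_url": "https://identity.example.com",
    "projectId": "my-project-id",
    "userId": "my-user-id",
    "password": "my-password",
    "region": "my-region",
}
config = swift_config(example_credentials)
url = swift_url("my-container", "add_to_cart-example.csv")
```

In a Spark notebook, each key and value in `config` would then be applied to the Hadoop configuration before reading `url`; how that is done depends on the Spark and connector versions in your environment.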
