# LocalCart scenario one: dynamic data analysis and visualization
***


## Introduction 

This dynamic data analysis scenario is divided into two parts.

<img src="https://raw.githubusercontent.com/ibm-watson-data-lab/localcart-at-index-conf/master/images/dynamic_analysis_flow.png"></img>

[Part 1](#part1): A web or mobile app triggers events as a user navigates a website. These clickstream events indicate when a customer logs in, adds something to a basket, completes an order, and logs out. The events are placed into a configured Message Hub (Apache Kafka) instance, which provides a scalable way to buffer the data before it is saved, analyzed, and rendered. A streams flow aggregates these events and stores the aggregated data in a Compose for Redis database.

[Part 2](#part2): A Node.js app monitors the Compose for Redis database and visualizes the aggregated data in a simple dashboard user interface. By the end of the notebook, you'll understand how to deploy a dashboard app to IBM Cloud to visualize streaming data, and how to simulate streaming data if you don't have a streaming data source.


This notebook runs on Python 2 with Spark 2.1. When running it on the IBM Cloud, ensure that the notebook is in "edit mode", which you can enable by clicking the pencil icon in the navigation bar.

<a id="part1"></a>

***
# Part 1: Capturing clickstream events for real-time analysis
***


<img src="https://raw.githubusercontent.com/ibm-watson-data-lab/localcart-at-index-conf/master/images/dynamic_analysis_flow_part_1.png"></img>


In this first example, you will create a streams flow that ingests `login`, `add_to_cart`, and `checkout` clickstream events, aggregates them according to our business needs, and stores the aggregated data in a Redis database, which will be monitored by a real-time dashboard:

<img src='https://raw.githubusercontent.com/ibm-watson-data-lab/localcart-at-index-conf/master/images/dynamic_analysis_streams_flow.png'></img>

## Part 1 table of contents

 [1.1 Redis setup](#redis)<br>
 [1.2 Create a streams flow](#create_p1) <br>
 [1.3 Process login clickstream events](#login) <br>
 [1.4 Process add_to_cart clickstream events](#addtocart) <br>
 [1.5 Process checkout clickstream events](#checkout) <br>
 [1.6 Run the flow](#run_1)<br>


<a id="redis"></a>
***

## 1.1 Redis setup

Redis is an in-memory database. It stores its data in RAM, making it a very fast way of storing and retrieving data. It provides a set of primitive data structures, but we only concern ourselves with [hashes](https://redis.io/commands#hash) for this exercise.

A Redis hash is a data structure that stores several field-value pairs under a single key. We are going to configure a Redis hash called `funnel` that contains the following fields:

- login_count - the number of people who logged into LocalCart
- basket_count - the number of items added into a shopping cart
- checkout_count - the number of purchases made
- basket_total - the total price of items added into a shopping cart
- checkout_total - the total price of items purchased

These are the outputs of the aggregation functions in our streaming pipeline. 


### 1.1.1 Collect your Redis connection information

1. Open your <a target="_blank" href="https://apsportal.ibm.com/settings/services?context=analytics">IBM Cloud Data Services list</a>. A list of your provisioned services is displayed.
1. Locate the pre-provisioned **Compose for Redis** service and click on the service instance name.
1. Open the _Service Credentials_ tab and view the credentials.
```
{
  "db_type": "redis",
  "maps": [],
  "name": "b...b",
  "uri_cli": "redis-cli -h **HOSTNAME** -p **PORT** -a **PASSWORD**",
  "deployment_id": "5...2",
  "uri": "redis://admin:**PASSWORD**@**HOSTNAME**:**PORT**"
}
```

Note your `**HOSTNAME**`, `**PORT**`, `**PASSWORD**`, and `uri` information.
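The `uri` value already contains the other three pieces of information. As a minimal sketch, assuming the URI follows the `redis://admin:PASSWORD@HOSTNAME:PORT` layout shown above, the standard library can split it for you. The helper name `split_redis_uri` and the example credentials are hypothetical.

```python
# Sketch: split a Redis URI into hostname, port, and password.
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2


def split_redis_uri(uri):
    """Return (hostname, port, password) extracted from a redis:// URI."""
    parts = urlparse(uri)
    return parts.hostname, parts.port, parts.password


# Hypothetical credentials, for illustration only.
example_uri = "redis://admin:s3cret@redis.example.com:12345"
hostname, port, password = split_redis_uri(example_uri)
print("host=%s port=%s" % (hostname, port))
```

The port comes back as an integer, while the hostname and password are strings.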


### 1.1.2 Verify your Redis connectivity
You can verify your Redis connection information in this notebook. First, install the Python Redis library with the following command:

In [None]:
!pip install redis

Next, import the library and connect to Redis with the following code. Replace the credential placeholder with your own URI.

In [None]:
import redis
# TODO replace **uri** with your Redis uri (keep the quotes)
r = redis.StrictRedis.from_url('**uri**')

We can then create a hash called `funnel` to store our real-time data in the database by using the `hset` function:

In [None]:
r.hset('funnel', 'basket_count', 554);
r.hset('funnel', 'basket_total', 951);
r.hset('funnel', 'checkout_count', 21);
r.hset('funnel', 'checkout_total', 5400);
r.hset('funnel', 'login_count', 100);

We can also use this connection to retrieve all the values from our `funnel` hash using `hgetall`:

In [None]:
r.hgetall('funnel')
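Depending on your Python version and client settings, `hgetall` may return the field names and values as byte strings rather than text and numbers. The following sketch shows one way to normalize such a result. The helper name `decode_funnel` and the sample dictionary are hypothetical, and the sketch assumes every value in the hash is numeric.

```python
def decode_funnel(raw):
    """Convert a raw hgetall result into a dict of text keys and float values."""
    decoded = {}
    for key, value in raw.items():
        # Byte strings are decoded; text values pass through unchanged.
        if isinstance(key, bytes):
            key = key.decode("utf-8")
        if isinstance(value, bytes):
            value = value.decode("utf-8")
        decoded[key] = float(value)
    return decoded


# Hypothetical raw result, shaped like the output of r.hgetall('funnel').
sample = {b"login_count": b"100", b"checkout_total": b"5400"}
print(decode_funnel(sample))
```

With a live connection, you could call `decode_funnel(r.hgetall('funnel'))` instead of using the sample data.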

**Note:** 
The Redis connection above seems to freeze in this notebook after a minute or so. In this case, you will need to restart the notebook kernel to restore it.
<BR>
We can now create streams flows that store aggregated data in Redis.

<a id="create_p1"></a>
***

## 1.2 Create a streams flow

In IBM Watson Data Platform, do these steps:

1. Select a project that you want to contain the streams flows. Note that this project must be attached to Cloud Object Storage and not Object Storage (Swift).
1. Click the **Assets** tab and scroll to the _Streams flows_ section. (If no section with that name is displayed, the selected project is not attached to Cloud Object Storage.)
1. Click **+ New streams flow**.
1. In the _New Streams Flow_ window, 
  1. Enter name `aggregate_for_redis`
  1. Select an existing Streaming Analytics service or create a new one (choosing the _Lite_ plan, which is free.) 
  1. Select **Manually**. 
  1. Click **Create**.

An empty canvas is displayed, along with a list of _Source_, _Target_, _Processing and Analytics_ and _Alerts_ operators that you can choose from. Source operators load data and target operators store data.

<a id="login"></a>
***

## 1.3 Process login clickstream events

First, we need to collect `login` data from Message Hub and calculate the number of logins during a rolling one-hour time window. The incoming `login` event payload has the following structure:
```
  {
    "customer_id": "13872",
    "click_event_type": "login",
    "total_price_of_basket": "0.0",
    "total_number_of_items_in_basket": "0",
    "total_number_of_distinct_items_in_basket": "0",
    "event_time": "2017-07-11 20:10:52 UTC"
  }
```


### 1.3.1 Configure the source

1. Drag a **Message Hub** source operator into the pipeline canvas.
1. Configure the Message Hub operator:
    1. Add a connection to your Message Hub instance. In the "brokers" field, enter a comma-separated string (no spaces) of all brokers listed in your service credentials.
    1. Select the `login` topic.
    1. Click **Edit Schema** to specify the payload properties this operator will make available to operators that are connected to its output port. Since we only want to count the number of login events, we only make the `customer_id` available.
    1. Define the following attribute:
        - Attribute Name: `customer_id`
        - Type: `Number`
        - Path: `/customer_id`
    1. Click **Save** and **Close**.
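The "brokers" field expects a single string, whereas service credentials typically list the brokers as separate entries. As a small sketch, assuming you have copied the broker addresses into a Python list, you can build the string as follows. The helper name `brokers_field` and the addresses are hypothetical.

```python
def brokers_field(broker_list):
    """Join broker addresses into one comma-separated string without spaces."""
    return ",".join(address.strip() for address in broker_list)


# Hypothetical broker addresses copied from the service credentials.
example_brokers = ["broker-0.example.com:9093", " broker-1.example.com:9093"]
print(brokers_field(example_brokers))
```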


Our streams flow now has its first operator and looks like this: 

<img src='https://raw.githubusercontent.com/ibm-watson-data-lab/localcart-at-index-conf/master/images/dynamic_analysis_mh_config.png'></img>
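To make the effect of the schema concrete, the following sketch mimics in plain Python what the configuration above describes: it selects `customer_id` from the incoming payload by its path and converts it to a number. This is an illustration only, not the operator's actual implementation, and the function name `extract_attributes` is hypothetical.

```python
import json


def extract_attributes(message, schema):
    """Pick the attributes named in `schema` out of a JSON message.

    `schema` maps an attribute name to a (path, converter) pair, where the
    path is a slash-separated location such as "/customer_id".
    """
    payload = json.loads(message)
    result = {}
    for name, (path, convert) in schema.items():
        value = payload
        # Walk down the payload one path segment at a time.
        for key in path.strip("/").split("/"):
            value = value[key]
        result[name] = convert(value)
    return result


login_event = """{
  "customer_id": "13872",
  "click_event_type": "login",
  "total_price_of_basket": "0.0",
  "total_number_of_items_in_basket": "0",
  "total_number_of_distinct_items_in_basket": "0",
  "event_time": "2017-07-11 20:10:52 UTC"
}"""
login_schema = {"customer_id": ("/customer_id", int)}
print(extract_attributes(login_event, login_schema))
```

All other payload properties are dropped, which mirrors the fact that only `customer_id` is passed to downstream operators.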


### 1.3.2 Set up aggregation functions

Streaming data can be aggregated by applying functions such as sum, count, minimum, or maximum. The aggregation is applied to the incoming events before the results are written to the Redis database. Our aim is to calculate the number of people who logged into LocalCart during a sliding one-hour window.

In the streams flow canvas, do these steps:

1. Drag an **Aggregation** operator from the _Processing and Analytics_ area, and then drop it on the canvas next to the Message Hub operator.
2. Drag your mouse pointer from the output port of the Message Hub operator to the input port of the Aggregation operator to connect them.
3. Click the **Aggregation** operator to open its _Properties_ pane. Set the following _Aggregation Window_ parameters:
    - Type - `sliding`
    - Time Units - `hour`
    - Number of Time Units - `1`
    - Partition By - leave unchanged
    - Group By - leave unchanged
4. In the **Functions** area of the _Aggregation Properties_ pane, define one aggregation:
    - Aggregation 1: count the logins
        - Output Field Name - `login_count`
        - Function Type - `Count`
        
    Note: To identify how many different customers have logged in during the rolling 1 hour time window, we would use the `CountDistinct` function and apply it to `customer_id`.

Our streams flow now has two connected operators: a source operator and an aggregation operator. Hover over the arrow to review the data flow between them.

<img src='https://raw.githubusercontent.com/ibm-watson-data-lab/localcart-at-index-conf/master/images/dynamic_analysis_flow_mh_a.png'></img>
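The following sketch illustrates the idea of a sliding one-hour count in plain Python. It is not how the Aggregation operator is implemented. It assumes that events arrive in time order and that an event exactly one hour old has left the window. The class name `SlidingCount` and the second and third events are hypothetical.

```python
from collections import deque
from datetime import datetime, timedelta


class SlidingCount(object):
    """Count events, and distinct customers, seen during the last `window`."""

    def __init__(self, window=timedelta(hours=1)):
        self.window = window
        self.events = deque()  # (event_time, customer_id), oldest first

    def add(self, event_time, customer_id):
        self.events.append((event_time, customer_id))
        # Evict events that have fallen out of the window.
        cutoff = event_time - self.window
        while self.events and self.events[0][0] <= cutoff:
            self.events.popleft()
        return {
            "login_count": len(self.events),
            "distinct_customers": len(set(cid for _, cid in self.events)),
        }


def parse_time(text):
    """Parse the event_time format used in the clickstream payloads."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S UTC")


counter = SlidingCount()
first = counter.add(parse_time("2017-07-11 20:10:52 UTC"), 13872)
second = counter.add(parse_time("2017-07-11 20:40:00 UTC"), 13872)
third = counter.add(parse_time("2017-07-11 21:20:00 UTC"), 14001)
print(third)
```

The `distinct_customers` value corresponds to what a `CountDistinct` function applied to `customer_id` would report, while `login_count` corresponds to `Count`.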



### 1.3.3 Configure the target

Next, add a Redis target operator. In the streams flow canvas, do these steps:

1. Drag a **Redis** operator from the _Target_ area, and then drop it on the canvas next to the Aggregation operator.
1. Drag your mouse pointer from the output port of the Aggregation operator to the input port of the Redis operator to connect them.
1. Click the **Redis** operator to open its Properties pane. 
    - Add a connection to your Redis instance.
      - Type in the `**HOSTNAME**`, `**PORT**`, and `**PASSWORD**` credentials of your Compose for Redis service.
    - In the **Key Template** field, type in `funnel`. 
1. Save the streams flow. The setup for `login` event processing is complete.

  <img src='https://raw.githubusercontent.com/ibm-watson-data-lab/localcart-at-index-conf/master/images/dynamic_analysis_flow_mh_a_r.png'></img>    



***

<a id="addtocart"></a>
## 1.4 Process add_to_cart clickstream events

Next, we need to collect `add_to_cart` event data from Message Hub and calculate the number of shopping baskets and their combined value during a rolling one-hour time window. The incoming `add_to_cart` event payload has the following structure:

```
{
    "customer_id": "13859",
    "click_event_type": "add_to_cart",
    "product_name": "Oatmeal",
    "product_category": "Food",
    "product_price": "2.49",
    "total_price_of_basket": "153.41",
    "total_number_of_items_in_basket": "19",
    "total_number_of_distinct_items_in_basket": "6",
    "event_time": "2017-06-23 12:56:18 UTC"
}
```

### 1.4.1 Configure the source, aggregation function and target for add_to_cart events

1. Drag another **Message Hub** source operator into the canvas.
1. Configure the Message Hub operator by doing these steps in the Properties pane:
    1. Select the Message Hub connection you created earlier.
    1. Select the `add_to_cart` topic.
    1. Click **Edit Schema** to make the customer ID and cart value available to connected operators.
    1. The message schema can be detected automatically if a producer has already generated messages for the selected topic. Click **Detect Schema** and **Show preview**.
        > If no messages are displayed and the schema is not populated, verify that your producer is running.
    1. Remove all attributes except `customer_id` and `total_price_of_basket`:
        - Attribute Name: `customer_id`
            - Type: `Number`
            - JSON Path: `/customer_id`
        - Attribute Name: `total_price_of_basket`
            - Type: `Number`
            - JSON Path: `/total_price_of_basket`
    1. Click **Save** and **Close**.
1. Drag an **Aggregation** operator from the **Processing and Analytics** area, and then drop it on the canvas next to the Message Hub operator.
1. Drag your mouse pointer from the output port of the Message Hub operator to the input port of the Aggregation operator to connect them.
1. Click the **Aggregation** operator to open its _Properties_ pane. Set the following _Aggregation Window_ parameters:
    - Type - `sliding`
    - Time Units - `hour`
    - Number of Time Units - `1`
    - Partition By - leave unchanged
    - Group By - leave unchanged
1. In the **Functions** area of the _Aggregation Properties_ pane, define two aggregations:
    - Aggregation 1: count the baskets
        - Output Field Name - `basket_count`
        - Function Type - `Count`
    - Aggregation 2: Sum up basket values
        - Output Field Name - `basket_total`
        - Function Type - `Sum`
        - Apply Function to - `total_price_of_basket`
        
1. Copy the existing **Redis** operator that's already on the canvas and paste it next to the _Aggregation_ Operator. 
1. Drag your mouse pointer from the output port of the Aggregation operator to the input port of the Redis operator to connect them.

 Your pipeline is now configured to stream and aggregate `login` and `add_to_cart` events:
    
 <img src='https://raw.githubusercontent.com/ibm-watson-data-lab/localcart-at-index-conf/master/images/dynamic_analysis_flow_mh_a_r_2.png'></img>    

1. Save your streams flow. No errors should be reported.
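As an illustration of the two aggregations configured above, the following sketch computes `basket_count` and `basket_total` over the events of the last hour in plain Python. It is a simplified stand-in for the Aggregation operator, not its implementation. The function name `aggregate_baskets`, the window boundaries, and the first two sample events are assumptions.

```python
from datetime import datetime, timedelta


def aggregate_baskets(events, now, window=timedelta(hours=1)):
    """Return basket_count and basket_total for events inside the window."""
    recent = [e for e in events if now - window < e["event_time"] <= now]
    total = sum(e["total_price_of_basket"] for e in recent)
    return {"basket_count": len(recent), "basket_total": round(total, 2)}


FMT = "%Y-%m-%d %H:%M:%S UTC"
events = [
    {"event_time": datetime.strptime("2017-06-23 11:50:00 UTC", FMT),
     "total_price_of_basket": 20.00},
    {"event_time": datetime.strptime("2017-06-23 12:30:00 UTC", FMT),
     "total_price_of_basket": 150.92},
    {"event_time": datetime.strptime("2017-06-23 12:56:18 UTC", FMT),
     "total_price_of_basket": 153.41},
]
result = aggregate_baskets(
    events, now=datetime.strptime("2017-06-23 12:56:18 UTC", FMT))
print(result)
```

The first event is more than one hour old and is ignored. Note that each `add_to_cart` event carries the running total of its basket, so `basket_total` is the sum of those running totals across events.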




<a id="checkout"></a>
***

## 1.5 Process checkout clickstream events

Finally, we need to create a stream that collects `checkout` event data from a Message Hub operator and calculates the number of checkouts and their combined value during a rolling one-hour time window. The incoming `checkout` event payload has the following structure:

```
{
    "customer_id": "11828",
    "click_event_type": "checkout",
    "total_price_of_basket": "72.80000000000001",
    "total_number_of_items_in_basket": "20",
    "total_number_of_distinct_items_in_basket": "5",
    "session_duration": "440",
    "event_time": "2017-06-23 13:09:12 UTC"
}
```
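The `total_price_of_basket` value in this sample carries floating-point residue (`72.80000000000001`). If you ever process these events yourself in Python, one option is to parse prices as decimals and round them to cents, as sketched below. The helper name `to_cents` and the choice of rounding mode are assumptions. The streams flow itself does not require this step.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_cents(price_text):
    """Parse a price string and round it to two decimal places."""
    return Decimal(price_text).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(to_cents("72.80000000000001"))  # prints 72.80
```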

### 1.5.1 Set up pipeline source, aggregation function and target for checkout events

1. Drag another **Message Hub** source operator into the canvas.
1. Configure the Message Hub operator by doing these steps in the Properties pane:
    1. Select the Message Hub connection you created earlier.
    1. Select the `checkout` topic.
    1. Click **Edit Schema** to specify the message attributes that will be consumed. Define the following attributes (by entering them manually or customizing the auto-detected schema):
        - Attribute Name: `customer_id`
            - Type: `Number`
            - Path: `/customer_id`
        - Attribute Name: `total_price_of_basket`
            - Type: `Number`
            - Path: `/total_price_of_basket`
1. Drag an **Aggregation** operator from the _Processing and Analytics_ area, and then drop it on the canvas next to the Message Hub operator.
1. Drag your mouse pointer from the output port of the Message Hub operator to the input port of the Aggregation operator to connect them.
1. Click the **Aggregation** operator to open its _Properties_ pane. Set the following _Aggregation Window_ parameters:
    - Type - `sliding`
    - Time Units - `hour`
    - Number of Time Units - `1`
    - Partition By - leave unchanged
    - Group By - leave unchanged
1. In the **Functions** area of the _Aggregation Properties_ pane, define two aggregations:
    - Aggregation 1: count checkouts
        - Output Field Name - `checkout_count`
        - Function Type - `Count`
    - Aggregation 2: Sum basket values
        - Output Field Name - `checkout_total`
        - Function Type - `Sum`
        - Apply Function to - `total_price_of_basket`
        
1. Copy the existing **Redis** operator that's already on the canvas and paste it next to the _Aggregation_ Operator. 
1. Drag your mouse pointer from the output port of the Aggregation operator to the input port of the Redis operator to connect them. The completed stream now looks as follows: <br>
   <img src='https://raw.githubusercontent.com/ibm-watson-data-lab/localcart-at-index-conf/master/images/dynamic_analysis_flow_mh_a_r_3.png'></img>    

1. Save the streams flow. No errors should be reported.

<a id="run_1"></a>
## 1.6 Run the flow

1. Click **Run**. 
1. If the flow does not start, verify your streams flow. If no events are flowing from the Message Hub operators, make sure that your producer (simulating user activity), which you launched in notebook 1, is running.
1. Click on any operator to display throughput information.

<img src= "https://raw.githubusercontent.com/ibm-watson-data-lab/localcart-at-index-conf/master/images/dynamic_analysis_flow_run.png"></img>

Congratulations! You just created a flow that ingests clickstream data from Message Hub, aggregates it, and stores the results in Redis.

Next you will deploy a simple Node.js application that monitors the Redis database and visualizes the aggregated data in real-time.
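Before moving on, you may want to confirm that the running flow is updating Redis. One possible check, sketched below, reads the `funnel` hash a few times and reports whether its contents changed between reads. The function name `watch_funnel` and its parameters are hypothetical. It only assumes a client object with an `hgetall` method, such as the connection `r` created in section 1.1.2.

```python
import time


def watch_funnel(client, polls=3, interval=2.0, sleep=time.sleep):
    """Read the `funnel` hash `polls` times and report whether it changed."""
    snapshots = []
    for attempt in range(polls):
        snapshots.append(client.hgetall("funnel"))
        # Wait between reads, but not after the last one.
        if attempt < polls - 1:
            sleep(interval)
    changed = any(a != b for a, b in zip(snapshots, snapshots[1:]))
    return changed, snapshots[-1]


# Example usage with the connection from section 1.1.2:
# changed, latest = watch_funnel(r)
# print(changed, latest)
```

If `changed` stays `False` while the producer is running, review the operator connections and the Redis credentials in the flow.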

<a id="part2"></a>

***
# Part 2: Visualizing streaming data in a real-time dashboard
***

## Table of contents
2.1 [Setup](#setup)<br>
2.2 [Install and deploy dashboard app to IBM Cloud](#install_deploy)<br>
    2.2.1 [Install and deploy the dashboard app automatically](#install_auto)<br>
    2.2.2 [Install and deploy the dashboard app manually](#install_manually)<br>
    2.2.3 [Install the dashboard app locally](#install_locally)<br>
    

<a id="setup"></a>
## 2.1 Setup

Before you use the example code in this notebook, follow these setup steps:

### Collect Redis connection information

1. Open your <a target="_blank" href="https://apsportal.ibm.com/settings/services?context=analytics">IBM Cloud Data Services list</a>. A list of your provisioned services is displayed.
1. Locate the pre-provisioned **Compose for Redis** service and click on the service instance name.
1. In the _Overview_ tab locate the HTTPS connection string
```
redis://admin:**PASSWORD**@**HOSTNAME**:**PORT**
```

1. Note your `HOSTNAME`, `PORT`, `PASSWORD` and `uri` information.

**If you have successfully completed part 1 of this notebook skip the next section and proceed to section [Install and deploy dashboard app to IBM Cloud](#install_deploy).**

### Simulate clickstream data
             
If you have not completed the first part of this notebook, you need to simulate the output of a streams flow.
<br>
In the next cell replace `**uri**` with your Redis database's URI and then run the cell.

In [None]:
# @hidden_cell
redis_uri='**uri**'

In [None]:
from __future__ import print_function  # lets print() behave the same on Python 2 and 3
!pip install redis
import redis
import time

# Connect to Redis
r = redis.StrictRedis.from_url(redis_uri)
print('Inserting aggregated dummy data into Redis')
# Write 101 updates, two seconds apart
for i in range(1, 102):
    # Insert dummy aggregated data values for demonstration purposes
    print('.', end=' ')
    r.hset('funnel', 'basket_count', 3 * i)
    r.hset('funnel', 'basket_total', 2 * i)
    r.hset('funnel', 'checkout_count', i)
    r.hset('funnel', 'checkout_total', i * 75)
    r.hset('funnel', 'login_count', 5 * i)
    time.sleep(2)
print('\nSimulation complete')

<a id="install_deploy"></a>
## 2.2 Install and deploy dashboard app to IBM Cloud

You can deploy the dashboard app to IBM Cloud automatically with the click of a button, deploy it manually, or run it locally.

<a id="install_auto"></a>
### 2.2.1 Install and deploy the dashboard app automatically

**Note**: Automatic deployment will only succeed if your Compose for Redis service instance in IBM Cloud is named `ComposeForRedis-WDPBeta`.

To install the app automatically:

1. Click the following button: <a target="_blank" href="https://bluemix.net/deploy?repository=https://github.com/ibm-watson-data-lab/advo-beta-dashboard">
    <img src="http://bluemix.net/deploy/button.png" alt="Deploy to IBM Cloud"/>
</a> <br/>
This button installs the Node.js app that will act as the real-time dashboard to visualize the streaming data. The code of this Node.js app is open-source and published on this  <a target="_blank" href="https://github.com/ibm-watson-data-lab/advo-beta-dashboard">GitHub repository</a>.
1. In the _Deploy to Bluemix_ wizard select **betatest** as your space.
<img src='https://raw.githubusercontent.com/ibm-watson-data-lab/localcart-at-index-conf/master/images/dynamic_analysis_deploy.png'></img>

1. Click **Deploy** to deploy the app to IBM Cloud and follow the instructions to view the application.
 
<img src='https://raw.githubusercontent.com/ibm-watson-data-lab/localcart-at-index-conf/master/images/dynamic_analysis_deploy_2.png'></img>

> Note: The application deployment might take a couple of minutes. Once in the dashboard, any hyperlinks are placeholders. They are styled for example purposes only.

<img src='https://raw.githubusercontent.com/ibm-watson-data-lab/localcart-at-index-conf/master/images/dynamic_analysis_dashboardapp.gif' width='1500'></img>

<a id="install_manually"></a>
### 2.2.2 Install and deploy the dashboard app manually to IBM Cloud

You can choose to modify the app code yourself, and then install and deploy it manually from the public GitHub repository:

```sh
# clone the code
git clone https://github.com/ibm-watson-data-lab/advo-beta-dashboard

# change directory
cd advo-beta-dashboard

# deploy the dashboard to IBM Cloud
cf push

```

**Note**: If your Compose for Redis service instance is not named `ComposeForRedis-WDPBeta`, your deployment will fail with the error _Could not find service ComposeForRedis-WDPBeta to bind to dashboard_. To resolve the issue, open `manifest.yml` and replace all occurrences of `ComposeForRedis-WDPBeta` with the name of your service instance.
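If you prefer to script the replacement, a sketch along the following lines may help. The helper name `rename_service`, the instance name `my-redis-instance`, and the manifest excerpt are hypothetical, and the real `manifest.yml` in the repository may look different.

```python
def rename_service(manifest_text, new_name, old_name="ComposeForRedis-WDPBeta"):
    """Return the manifest text with every occurrence of old_name replaced."""
    return manifest_text.replace(old_name, new_name)


# Hypothetical manifest excerpt, for illustration only.
sample_manifest = """applications:
- name: dashboard
  services:
  - ComposeForRedis-WDPBeta
"""
updated = rename_service(sample_manifest, "my-redis-instance")
print(updated)
```

To apply the change to the real file, read `manifest.yml` into a string, pass it through `rename_service`, and write the result back before running `cf push`.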


<a id="install_locally"></a>
### 2.2.3 Install and run the dashboard app locally

You can choose to run the dashboard on your own machine with <a target="_blank" href="https://nodejs.org/en/download/">Node.js</a> installed. To do so, run the following commands, replacing the placeholders with your Compose for Redis credentials:

```sh
# clone the code
git clone https://github.com/ibm-watson-data-lab/advo-beta-dashboard

# change directory
cd advo-beta-dashboard

# install dependencies
npm install

# TODO: Replace **HOSTNAME**, **PORT** and **PASSWORD** with your Redis credentials
export REDIS_URL="redis://x:**PASSWORD**@**HOSTNAME**:**PORT**"

# run the dashboard
npm start
```

At the end of the setup, the dashboard app displays the port that the app is using, for example:

```
Dashboard app listening on port 6039
```

Go to http://localhost:6039 in your web browser to access the dashboard app and visualize the streaming data.

<a id="summary"></a>
## Summary and next steps

You successfully completed this notebook! You learned how to create a streams flow to selectively aggregate data on the fly and visualize it in a Node.js dashboard web application.

Check out other notebooks in this series: 
 - Localcart scenario two: Static data analysis using Python and PixieDust
 - Localcart scenario three: Build a product recommendation engine
 - Localcart scenario four: Build a revenue dashboard using PixieApps

Copyright © 2017,2018 IBM. This notebook and its source code are released under the terms of the MIT License.