# LocalCart scenario part 6a: Creating an aggregation streams flow


## Introduction 

Part 6 of the LocalCart scenario is divided into two notebooks. The first (this notebook) describes how to import a streams flow that performs data enrichment, filtering, and aggregation. The second implements a dashboard that visualizes the aggregated data.

In [Notebook #1 - Creating a Kafka Producer of ClickStream events](https://github.com/wdp-beta/get-started/blob/beta_2/notebooks/localcart-scenario-part-1.ipynb) we generate clickstream events for LocalCart and send them to Message Hub to show how data can be collected offline and streamed to the cloud later. In this notebook we listen for the clickstream events that indicate that a customer has made a purchase and enrich these events with geolocation information. Using this information, we aggregate the total revenue by ZIP code (for US-based transactions) or country code (for international transactions) and periodically write the totals to generic cloud storage (represented in this scenario by Cloud Object Storage).

<img src="https://raw.githubusercontent.com/wdp-beta/get-started/beta_2/notebooks/images/nb6_streams_flow.png"></img>


## Table of contents

* [1.1 Import a streams flow](#import_flow) <br>
* [1.2 Customize the streams flow](#customize_flow) <br>
* [1.3 Run the flow](#run_flow)<br>
* [1.4 Summary and next steps](#summary)<br>


<a id="import_flow"></a>
***

## 1.1 Import a streams flow

In previous notebooks you created streams flows from scratch, both with the wizard and manually. In this notebook you'll import and customize a streams flow that aggregates sales transactions and writes them to Cloud Object Storage.

First, get the streams flow definition:

1. Download https://raw.githubusercontent.com/wdp-beta/get-started/beta_2/streams_flows/revenue_by_state_or_country.stp to your local machine. This file contains the streams flow definition you'll be working with.

Next, complete the following steps in IBM Watson Data Platform:

1. Select a project that you want to contain the streams flow. Note that this project must be attached to Cloud Object Storage and not Object Storage (Swift).
1. Click the **Assets** tab and scroll to the _Streams flows_ section. 
 > If no section with that name is displayed, the selected project is not attached to Cloud Object Storage.
1. Click **+ New streams flow**.
1. In the _New Streams Flow_ window:
  1. Select **from file**.
  1. Select an existing Streaming Analytics service or create a new one (you can choose the _Lite_ plan, which is free).
  1. Browse to the streams flow file you downloaded.
   > The flow name and description are populated for you. You can change the defaults if desired.
  1. Click **Create**. Wait for the import to complete.
  
1. Review the flow. It comprises one source operator (Message Hub), a Python code operator (retrieving customer geolocation information from the cloud), two filter operators (separating US transactions from international transactions), two aggregation operators (one for each major geography, calculating the revenue), and two Cloud Object Storage target operators that save the aggregated data for later processing.
 
 > You'll notice that the run button is disabled because the flow is not yet properly configured for your environment.

1. To identify the issues, click the highlighted notification icon on the right-hand side.
<img src="https://raw.githubusercontent.com/wdp-beta/get-started/beta_2/notebooks/images/nb6_import_notifications.png"></img>
1. Click the notification to open the canvas, then expand the highlighted error list icon.
 > Note that the (Message Hub) source operator and the two Cloud Object Storage target operators are tagged as invalid. This is expected because they are associated with service instances that you don't have access to.
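
To make the operator chain reviewed above more concrete, the following is a minimal plain-Python sketch of the same enrich, filter, and aggregate sequence. It is not the code contained in the imported flow: the event field names (`customer_id`, `total_price`, `country_code`, `zip`), the `lookup_geo` helper, and the sample data are all assumptions made for illustration, and the time windowing that a streaming aggregation applies is omitted.

```python
from collections import defaultdict


def lookup_geo(customer_id):
    """Hypothetical stand-in for the code operator's geolocation lookup."""
    directory = {
        "c1": {"country_code": "US", "zip": "95141"},
        "c2": {"country_code": "DE", "zip": None},
        "c3": {"country_code": "US", "zip": "10001"},
    }
    return directory[customer_id]


def enrich(event):
    """Code operator: add geolocation attributes to a purchase event."""
    return {**event, **lookup_geo(event["customer_id"])}


def aggregate(events, key):
    """Aggregation operator: total revenue grouped by the given attribute."""
    totals = defaultdict(float)
    for event in events:
        totals[event[key]] += event["total_price"]
    return dict(totals)


# Hypothetical purchase events (field names are assumptions).
purchases = [
    {"customer_id": "c1", "total_price": 20.0},
    {"customer_id": "c2", "total_price": 15.5},
    {"customer_id": "c3", "total_price": 4.5},
    {"customer_id": "c1", "total_price": 10.0},
]

enriched = [enrich(p) for p in purchases]

# Two filter operators: US transactions and international transactions.
us = [e for e in enriched if e["country_code"] == "US"]
foreign = [e for e in enriched if e["country_code"] != "US"]

# Two aggregation operators, one per geography.
us_revenue = aggregate(us, "zip")
foreign_revenue = aggregate(foreign, "country_code")

print(us_revenue)
print(foreign_revenue)
```

In the streams flow each of these functions corresponds to a separate operator on the canvas, and the two result sets are handed to the two Cloud Object Storage target operators.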

<a id="customize_flow"></a>
***

## 1.2 Customize the streams flow


#### Resolve the Message Hub operator issue
1. Open the Message Hub operator.
 > Note that no connection and no topic are assigned to it.
1. Select your existing Message Hub connection or create a new one.
1. From the _Topic_ dropdown, select **logout_with_purchase**.
1. Customize the schema:
  1. Detect the schema.
  1. Change the **customer_id** attribute type from _Number_ to **Text**.
  1. Save your changes.
1. Save the streams flow.
> The Source Message Hub operator should no longer be flagged as invalid.
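
As an illustration of what the schema change above means for code that consumes these events, here is a small sketch that decodes a JSON message and keeps `customer_id` as text. The sample payload and the `total_price` field are assumptions; the actual messages on the **logout_with_purchase** topic may be shaped differently.

```python
import json


def parse_event(raw):
    """Decode a JSON message and keep customer_id as text."""
    event = json.loads(raw)
    # Treat the identifier as an opaque string rather than a number.
    event["customer_id"] = str(event["customer_id"])
    return event


# Hypothetical message payload (field names are assumptions).
sample = '{"customer_id": 10042, "total_price": 59.9}'
event = parse_event(sample)
print(repr(event["customer_id"]))
```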
 
#### Resolve the Cloud Object Storage operator issues

1. Open the first _Cloud Object Storage_ operator.
 > Note that no connection and no file path are assigned to it because your Watson Data Platform environment is different from the environment where the flow was created.
1. Select your existing Cloud Object Storage connection or create a new one.
1. Customize the file path, which defines where the operator writes its output.
  1. Open the data asset selector and select an existing bucket.
   > Don't choose an existing object from the list. You'll specify a new generic name in the next step.
  1. Append `/us_revenue_%TIME` to the file path to specify the object name pattern.
   > `%TIME` will be replaced with a timestamp. Your path should look as follows: `/my-existing-bucket/us_revenue_%TIME`
1. Review the other settings but do not make any other changes.
 > Pay attention to the file writing policy, which defines how frequently aggregated data is written to storage.
1. Save the streams flow.
 > The first Cloud Object Storage operator should no longer be flagged as invalid.

 ***

1. Open the second _Cloud Object Storage_ operator.
1. Select your existing Cloud Object Storage connection.
1. Customize the file path, which defines where the operator writes its output.
  1. Open the data asset selector and select an existing bucket.
  1. Append `/foreign_revenue_%TIME` to the file path to specify the object name pattern.
   > `%TIME` will be replaced with a timestamp.
1. Review the other settings but do not make any other changes.
1. Save the streams flow.
> The streams flow should now be valid.
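
The effect of the `%TIME` placeholder in the two file paths can be pictured with the short sketch below. The timestamp layout used here (`YYYYMMDD_HHMMSS`) is an assumption chosen for illustration; the format that the Cloud Object Storage operator actually writes may differ, and `expand_object_name` is a hypothetical helper, not part of the service.

```python
from datetime import datetime, timezone


def expand_object_name(pattern, now):
    """Replace the %TIME placeholder with a timestamp string.

    The timestamp layout is an assumption for illustration only.
    """
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return pattern.replace("%TIME", stamp)


# A fixed point in time keeps the example reproducible.
fixed = datetime(2017, 11, 1, 12, 30, 0, tzinfo=timezone.utc)

us_name = expand_object_name("/my-existing-bucket/us_revenue_%TIME", fixed)
foreign_name = expand_object_name("/my-existing-bucket/foreign_revenue_%TIME", fixed)

print(us_name)
print(foreign_name)
```

Because every write produces a new timestamp, each write yields a distinct object name instead of overwriting the previous one.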


<a id="run_flow"></a>
***

## 1.3 Run the flow

1. Run the customized flow. After a minute or two, data should be streaming from the source to the targets.
1. Before continuing, wait until at least one data file each for US transactions and international transactions has been written to the specified Cloud Object Storage bucket.
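
The waiting condition in the last step can be expressed as a simple check over the object names in the bucket. The sketch below assumes you already have a list of object keys, for example copied from the bucket view or obtained through a Cloud Object Storage client; the listing step itself is omitted, and the sample keys are hypothetical.

```python
def revenue_files_ready(object_keys, prefixes=("us_revenue_", "foreign_revenue_")):
    """Return True once at least one object exists for every prefix."""
    return all(
        any(key.startswith(prefix) for key in object_keys)
        for prefix in prefixes
    )


# Hypothetical bucket contents shortly after the flow starts.
keys_so_far = ["us_revenue_20171101_123000"]
print(revenue_files_ready(keys_so_far))

# Hypothetical bucket contents after the second target has written a file.
keys_later = keys_so_far + ["foreign_revenue_20171101_123000"]
print(revenue_files_ready(keys_later))
```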

<a id="summary"></a>
***

## 1.4 Summary and next steps

You've learned how to import and customize a streams flow and become acquainted with the code operator, the filter operator, and the aggregation operator.

Next, learn how to visualize the aggregated click-stream data in Notebook 6b: TBD.



***

### Authors

Patrick Titzler is a Developer Advocate for Watson Data Platform at IBM. 

***
Copyright © IBM Corp. 2017. This notebook and its source code are released under the terms of the MIT License.