# Pipy pipeline demo
## An example workflow

#### Introduction

`pipy` is a library that allows developers to build data pipelines in `sk-learn` or others and turn them into interactive pipelines using `ipywidgets`.

#### Goal

The ultimate goal of `pipy` is to create an environment where Python Engineers, Data Scientists, and end users throughout businesses can work together seemlessly and efficiently.

*Note:* this is a demo and it shows work in progress

### 1) Importing the library

In [1]:
import pipy

### 2) Extracting data from our Postgres database

In [2]:
csv = pipy.pipeline.extract.CSV({'path': './dummy_data.csv'})

So, we used code to setup the `SQL` stage and can use the mouse and keyboard to make changes to it.

#### 2.1) Running the Postgres extract

We can _run_ each stage by calling `transform` (or, for `Extract` stages we can use `extract` as well):

In [3]:
df = csv.extract()  # or use `csv.transform()`
df.head()

Unnamed: 0,firm,date,col_a,col_b
0,ABCD,2019-01-01,12,10
1,ABCD,2019-01-02,13,9
2,ABCD,2019-01-03,19,11
3,ABCD,2019-01-04,3,3
4,ABCD,2019-01-05,13,4


### 3) Creating a pipeline

Let's create some more stages so we can build a pipeline:

In [5]:
weekday = pipy.pipeline.transform.DayOfWeek(columns={'in': 'date'})

#### 3.1) Using `sklearn` for Machine Learning models

`sklearn` is the most popular way by Data Scientists to build Machine Learning models. (see: https://github.blog/2019-01-24-the-state-of-the-octoverse-machine-learning/)

`pipy` has a `SkLearnModelWrapper` so we can use all `sklearn` models within our own pipeline, meaning that also `sklearn` models are interactive.

In [6]:
from sklearn.linear_model import LinearRegression

In [7]:
ols = pipy.pipeline.model.SkLearnModelWrapper(
    columns={
        'target': 'sales', 
        'features': ['date|DayOfWeek'],
    },
    params={
        'sklearn_model': LinearRegression(fit_intercept=True, normalize=False),
    }
)

#### 3.2) Useing pipy to connect to tableau

In [7]:
from pipy import tableau

tableau.set_connection(
    server='http://ny4tableau01.bats.com',
    username='svc_tableau_uk',
    password='j@Ln-volcano-7jaFAXK7fV',
    tableau_site='CboeEurope',
)

In [8]:
load = pipy.pipeline.load.Tableau({'project_name': 'General Sandbox (Non-Certified)', 'datasource_name': 'tableau-data-pipeline-demo'})

#### 3.3) Combining the stages to create a pipeline

In [11]:
pipe = pipy.pipeline.Pipeline({'steps': [sql, weekday, ols, load]})

Again, we can simply print the pipeline to get interactive controls to make adjustment to the pipeline:

In [12]:
pipe

Tab(children=(Accordion(children=(VBox(children=(HTML(value='<b>Parameters:</b>'), HBox(children=(Box(layout=L…

### 4) Running the pipeline to extract, transform, and load the data to Tableau

In [13]:
df = pipe.run()

  self._column_static_type = self._dataframe.apply(lambda x: pandleau.data_static_type(x), axis=0)
processing table: 0it [00:00, ?it/s]

Table 'Extract' does not exist in extract /tmp/tmpweb3eh79/tmp.hyper, creating.


processing table: 350310it [00:07, 45849.30it/s]


In [14]:
df.head()

Unnamed: 0,firm_id,symbol_name,date,added_notional,removed_notional,date|DayOfWeek,added_notional|LinearRegression
0,ABGS,AAKs,2019-01-03,858.9,8589.0,3,2513938.0
1,ABGS,AAKs,2019-01-10,26058.0,386.76,3,2513938.0
2,ABGS,ABBs,2019-01-02,0.0,84600.0,2,2540808.0
3,ABGS,ABBs,2019-01-03,0.0,812306.45,3,2513938.0
4,ABGS,ABBs,2019-01-10,0.0,318007.6,3,2513938.0
