# Pipy pipeline demo
## An example workflow

#### Introduction

`pipy` is a library that lets developers build data pipelines with `sklearn` or other libraries and turn them into interactive pipelines using `ipywidgets`.

#### Goal

The ultimate goal of `pipy` is to create an environment where Python engineers, data scientists, and business end users can work together seamlessly and efficiently.

*Note:* this is a demo of work in progress.

### 1) Importing the library

In [1]:
import pipy

### 2) Extracting data from a CSV file

In [2]:
csv = pipy.pipeline.extract.CSV({'path': './dummy_data.csv'})

So, we used code to set up the `CSV` stage, and we can use the mouse and keyboard to make changes to it.

#### 2.1) Running the CSV extract

We can _run_ each stage by calling `transform` (for `Extract` stages, `extract` works as well):

In [3]:
df = csv.extract()  # or use `csv.transform()`
df.head()

Unnamed: 0,firm,date,sales,price
0,ABCD,2019-01-01,12,10
1,ABCD,2019-01-02,13,9
2,ABCD,2019-01-03,19,11
3,ABCD,2019-01-04,3,3
4,ABCD,2019-01-05,13,4
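
For readers who think in `pandas`, the extract step can be pictured roughly as a `read_csv` call. This is a sketch of the assumed behavior, not `pipy`'s implementation: `extract_csv` is a hypothetical helper, and the CSV text is rebuilt from the five rows shown above. The empty first header cell is also an assumption; it is the usual reason `pandas` reports an `Unnamed: 0` column.

```python
import io

import pandas as pd

# First five rows of the data, reconstructed from the output above.
# Assumption: the file's first header cell is empty, which makes pandas
# name that column "Unnamed: 0".
CSV_TEXT = """,firm,date,sales,price
0,ABCD,2019-01-01,12,10
1,ABCD,2019-01-02,13,9
2,ABCD,2019-01-03,19,11
3,ABCD,2019-01-04,3,3
4,ABCD,2019-01-05,13,4
"""


def extract_csv(source):
    """Hypothetical stand-in for an extract stage: read a CSV into a DataFrame."""
    return pd.read_csv(source)


raw = extract_csv(io.StringIO(CSV_TEXT))
print(raw.head())
```

If the extra column is unwanted, `pd.read_csv(source, index_col=0)` would treat it as the index instead.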


### 3) Creating a pipeline

Let's create some more stages so we can build a pipeline:

In [4]:
weekday = pipy.pipeline.transform.DayOfWeek(columns={'in': 'date'})
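
A rough `pandas` picture of what a day-of-week stage might compute, assuming the stage parses the input column as dates and writes the result to a new `date|DayOfWeek` column (the name that appears in the pipeline output later in this demo). `add_day_of_week` is a hypothetical helper, and the Monday=0 numbering is the `pandas` convention, which may or may not be what `pipy` uses internally.

```python
import pandas as pd


def add_day_of_week(df, column):
    """Hypothetical helper: append '<column>|DayOfWeek' with Monday=0 ... Sunday=6."""
    out = df.copy()
    out[f"{column}|DayOfWeek"] = pd.to_datetime(out[column]).dt.dayofweek
    return out


# The five dates from the extract output earlier in this demo.
dates = pd.DataFrame(
    {"date": ["2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04", "2019-01-05"]}
)
with_weekday = add_day_of_week(dates, "date")
print(with_weekday["date|DayOfWeek"].tolist())  # [1, 2, 3, 4, 5] (Tuesday to Saturday)
```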

#### 3.1) Using `sklearn` for Machine Learning models

`sklearn` is one of the most popular libraries among Data Scientists for building Machine Learning models (see: https://github.blog/2019-01-24-the-state-of-the-octoverse-machine-learning/).

`pipy` has a `SkLearnModelWrapper` that lets us use any `sklearn` model within our own pipeline, which means `sklearn` models become interactive as well.

In [5]:
from sklearn.linear_model import LinearRegression

In [6]:
ols = pipy.pipeline.model.SkLearnModelWrapper(
    columns={
        'target': 'sales', 
        'features': ['date|DayOfWeek'],
    },
    params={
        # `normalize` was removed in scikit-learn 1.2; omitting it matches the old default (False)
        'sklearn_model': LinearRegression(fit_intercept=True),
    }
)
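
To make the idea of wrapping a model as a pipeline stage concrete, here is a minimal sketch. It assumes the wrapper fits the model on the feature columns and writes in-sample predictions to a column named `<target>|<model class name>`, which is the naming pattern visible in the pipeline output later in this demo. `ModelStage` is a hypothetical class and the data is made up; this is not `pipy`'s `SkLearnModelWrapper`.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression


class ModelStage:
    """Hypothetical stand-in for a model wrapper; not pipy's actual class."""

    def __init__(self, target, features, model):
        self.target = target
        self.features = list(features)
        self.model = model

    def transform(self, df):
        out = df.copy()
        # Fit on the feature columns, then store in-sample predictions.
        self.model.fit(out[self.features], out[self.target])
        prediction_column = f"{self.target}|{type(self.model).__name__}"
        out[prediction_column] = self.model.predict(out[self.features])
        return out


# Made-up data lying exactly on y = 2x + 1, so the fit is easy to check by eye.
toy = pd.DataFrame({"x": [1, 2, 3, 4], "y": [3, 5, 7, 9]})
fitted = ModelStage(target="y", features=["x"], model=LinearRegression()).transform(toy)
print(fitted)
```

Because the stage only relies on `fit` and `predict`, any estimator following the `sklearn` interface could be dropped in the same way.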

#### 3.2) Combining the stages to create a pipeline

In [7]:
pipe = pipy.pipeline.Pipeline({'steps': [csv, weekday, ols]})
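
Conceptually, a pipeline of this kind can be thought of as calling each stage's `transform` in order and passing the result along. The sketch below illustrates that chaining under stated assumptions: `Source`, `Revenue`, and `MiniPipeline` are hypothetical classes invented for illustration, and `pipy`'s own `Pipeline` may work differently.

```python
import pandas as pd


class Source:
    """Hypothetical extract stage: ignores its input and returns fixed data."""

    def __init__(self, frame):
        self.frame = frame

    def transform(self, df=None):
        return self.frame.copy()


class Revenue:
    """Hypothetical transform stage: adds sales * price as a new column."""

    def transform(self, df):
        out = df.copy()
        out["revenue"] = out["sales"] * out["price"]
        return out


class MiniPipeline:
    """Run each step's transform in order, feeding the output to the next step."""

    def __init__(self, steps):
        self.steps = list(steps)

    def run(self):
        data = None
        for step in self.steps:
            data = step.transform(data)
        return data


rows = pd.DataFrame({"sales": [12, 13], "price": [10, 9]})
result = MiniPipeline([Source(rows), Revenue()]).run()
print(result)
```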

Again, we can simply display the pipeline to get interactive controls for adjusting it:

In [9]:
pipe

Tab(children=(Accordion(children=(VBox(children=(HTML(value='<b>Parameters:</b>'), HBox(children=(Box(layout=L…
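
The controls above are built from `ipywidgets`. As a minimal sketch of the underlying mechanism, and not `pipy`'s actual code, a stage parameter can be tied to a widget with `observe`, so that edits made in the notebook UI flow back into a parameter dictionary. The `params` dictionary and the `./other_data.csv` path are hypothetical.

```python
import ipywidgets as widgets

# Hypothetical stage parameters; the path matches the CSV stage earlier in this demo.
params = {"path": "./dummy_data.csv"}

# A text box pre-filled with the current parameter value.
path_box = widgets.Text(value=params["path"], description="path")


def on_path_change(change):
    # change["new"] holds the widget's updated value.
    params["path"] = change["new"]


path_box.observe(on_path_change, names="value")

# In a notebook, leaving `path_box` as the last expression of a cell renders the control.
# Setting the value from code triggers the same callback as typing in the box.
path_box.value = "./other_data.csv"  # hypothetical file name
print(params)
```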

### 4) Running the pipeline to extract, transform, and model the data

In [10]:
df = pipe.run()

NumExpr defaulting to 8 threads.


In [11]:
df.head()

Unnamed: 0,firm,date,sales,price,date|DayOfWeek,sales|LinearRegression
0,ABCD,2019-01-01,12,10,1,18.6
1,ABCD,2019-01-02,13,9,2,17.8
2,ABCD,2019-01-03,19,11,3,17.0
3,ABCD,2019-01-04,3,3,4,16.2
4,ABCD,2019-01-05,13,4,5,15.4
