# SAME Project Example
The [SAME Project](https://sameproject.ml/) is a library for turning notebooks into pipelines, in this case Pachyderm pipelines.

This example shows how to create a simple CSV description pipeline using Python and `pandas`.

## Step 0: Setup
Assuming you have a Pachyderm cluster set up, you can run the following commands to set up this project. 

Install `sameproject`:
```bash
pip3 install --upgrade sameproject
```

Create a Pachyderm repo and add a CSV file to it.
```bash
pachctl create repo csv_data

pachctl put file csv_data@master:housing-simplified.csv -f https://raw.githubusercontent.com/pachyderm/examples/example/automl/housing-prices/data/housing-simplified-1.csv
```


## Step 1: Write our code
All code written in code cells in the notebook will be executed in the pipeline when it runs. 

In [2]:

    import pandas as pd

In [3]:

    data = pd.read_csv('/pfs/csv_data/housing-simplified.csv')

In [6]:

    print(data.describe())

                   RM       LSTAT    PTRATIO           MEDV
    count  100.000000  100.000000  100.00000     100.000000
    mean     6.234410   10.772900   18.69000  468489.000000
    std      0.490838    5.700031    1.69893  124487.368143
    min      5.399000    1.980000   15.10000  266700.000000
    25%      5.926250    6.702500   17.90000  396900.000000
    50%      6.130500    9.465000   18.70000  451500.000000
    75%      6.433000   13.315000   19.70000  518700.000000
    max      8.069000   30.810000   21.10000  919800.000000
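
The `/pfs/csv_data` path used in the cells above is where the input repo appears inside the running pipeline, and it may not exist in the environment where you edit the notebook. One way to keep the same cells runnable in both places is a small path helper. The following is a minimal sketch, not part of SAME or Pachyderm: `resolve_input`, `resolve_output_dir`, and the local fallback directories are hypothetical names chosen for illustration, and `/pfs/out` is assumed to be the directory the pipeline collects output from.

```python
from pathlib import Path

# Input path used by the notebook cells; the input repo is mounted here in the pipeline.
PFS_INPUT = Path("/pfs/csv_data/housing-simplified.csv")
# Assumed output directory inside the pipeline.
PFS_OUT = Path("/pfs/out")


def resolve_input(pfs_path: Path = PFS_INPUT, local_dir: Path = Path(".")) -> Path:
    """Return pfs_path if it exists, else a same-named file in local_dir.

    Hypothetical helper: the local fallback is an assumption, not SAME behaviour.
    """
    if pfs_path.exists():
        return pfs_path
    return local_dir / pfs_path.name


def resolve_output_dir(pfs_out: Path = PFS_OUT, local_dir: Path = Path("out")) -> Path:
    """Return pfs_out if it is an existing directory, else create and return local_dir."""
    if pfs_out.is_dir():
        return pfs_out
    local_dir.mkdir(parents=True, exist_ok=True)
    return local_dir
```

With this helper, the read cell could become `data = pd.read_csv(resolve_input())`, and the summary could be saved as well as printed with `data.describe().to_csv(resolve_output_dir() / "summary.csv")`.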


## Step 2: Deploy Pipeline

### Initialize
Use `same init` from the command line to configure your `same.yaml`.

```bash
$ same init
Name of this config: [default_config]:    same_test 

Notebook path [same_test.ipynb]: 

Notebook name [same_test]:   

Default docker image [combinatorml/jupyterlab-tensorflow-opencv:0.9]: 

No requirements.txt found in current directory - would you like to create one? [Y/n]: Y

Would you like SAME to fill in the requirements.txt for you? [Y/n]: n
Wrote empty requirements file to /home/jovyan/examples/same/requirements.txt.

About to write to /home/jovyan/examples/same/same.yaml:

apiVersion: sameproject.ml/v1alpha1
environments:
  default:
    image_tag: combinatorml/jupyterlab-tensorflow-opencv:0.9
metadata:
  labels: []
  name: default_config
  version: 0.0.0
notebook:
  name: same_test
  path: same_test.ipynb
  requirements: requirements.txt
run:
  name: default_config run

Is this okay? [Y/n]: Y

Wrote config file to /home/jovyan/examples/same/same.yaml.

You can now run 'same verify' to check that everything is configured correctly
(requires docker locally), or you can run 'same run' to deploy the pipeline to a
configured backend (e.g. Kubeflow Pipelines in a Kubernetes cluster file pointed
to by ~/.kube/config or set in the KUBECONFIG environment variable).
```

### Run Pipeline
Run the following command to deploy your notebook as a Pachyderm pipeline. 
```bash
same run --target pachyderm --input-repo csv_data
```
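
If you redeploy often, the command above can be wrapped in a short script. This is only a sketch: `build_same_run_command` and `deploy` are hypothetical helper names, and the script passes nothing beyond the `--target` and `--input-repo` flags shown above. It assumes the `same` executable is on your `PATH`.

```python
import shlex
import subprocess
from typing import List


def build_same_run_command(input_repo: str, target: str = "pachyderm") -> List[str]:
    """Build the argument list for the `same run` command shown above."""
    return ["same", "run", "--target", target, "--input-repo", input_repo]


def deploy(input_repo: str) -> None:
    """Run `same run`; raises CalledProcessError on a non-zero exit code."""
    cmd = build_same_run_command(input_repo)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Calling `deploy("csv_data")` from the directory containing `same.yaml` would run the same command as above and stop with an exception if the deployment fails.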