# Cape Python API Demo

This notebook illustrates how a data scientist can:
- Connect to Cape
- List and select a project
- List and add data views to a project
- Train an encrypted machine learning model on encrypted data using the data views associated with a specific project

In [1]:
from cape import Cape
from cape.api.dataview import DataView

## 1. Log in to Cape

To use Cape through the Python API, you first need to log in to your user account.
The access token can be generated from the User Testings section in the Cape UI.

In [2]:
c = Cape()
c.login(token='01EWE9F3JAKW5NE7TBM7NFQ289,AQ7M3j1y14ohdJdtuEY4q2ZafHrCBXnqRw')
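A token pasted into a notebook cell is easy to leak when the notebook is shared. As a minimal sketch, the token could instead be read from an environment variable. The variable name `CAPE_TOKEN` and the helper `read_token` are assumptions made for illustration; they are not part of the Cape API.

```python
import os


def read_token(env_var="CAPE_TOKEN", environ=None):
    """Return the access token stored in an environment variable.

    `CAPE_TOKEN` is a hypothetical variable name chosen for this sketch.
    `environ` can be replaced by a plain dict, which makes testing easier.
    """
    environ = os.environ if environ is None else environ
    token = environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Cape access token."
        )
    return token
```

With such a helper in place, the login cell would become `c.login(token=read_token())`.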

## 2. Select a Project

You can then list the projects you have access to using the method `list_projects`:

In [3]:
c.list_projects()

[<Project (id=01EWE9YVM13HAVR5RJAFRGMRVS, name=demo-ds-api, label=demo-ds-api)>]

Currently, a data scientist (DS) has to create a project from the Cape UI, although the option to create one from the Python API could be added.

Once you have identified the relevant project, select it with the method `get_project`.

In [4]:
demo_project = c.get_project(id="01EWE9YVM13HAVR5RJAFRGMRVS")
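Copying an ID by hand from the listing is error-prone. The sketch below looks a project up by name instead. It assumes that the objects returned by `list_projects` expose `id` and `name` attributes, as their printed representation suggests; `find_by_name` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins used only so the example runs on its own.

```python
from types import SimpleNamespace


def find_by_name(items, name):
    """Return the single item whose `name` attribute equals `name`."""
    matches = [item for item in items if getattr(item, "name", None) == name]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one item named {name!r}, found {len(matches)}"
        )
    return matches[0]


# Stand-in objects for demonstration; real ones would come from c.list_projects().
projects = [
    SimpleNamespace(id="01EWE9YVM13HAVR5RJAFRGMRVS", name="demo-ds-api"),
    SimpleNamespace(id="HYPOTHETICAL-ID", name="another-project"),
]
selected = find_by_name(projects, "demo-ds-api")
```

Under the same assumption, `c.get_project(id=find_by_name(c.list_projects(), "demo-ds-api").id)` would select the project without a hard-coded ID.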

## 3. Add a Data View to a Project

Each project can have data views added to it. A data view never holds any raw data; it is only a reference to a dataset that will later be used to train a machine learning model on encrypted data.

In [5]:
demo_project.list_dataviews()

[<DataView (id=01EWED32H92AV9EGT07SZ3X1JZ)>]

To register a data view, create a `DataView` object with a name and the URI of the dataset (e.g. stored in a `gs://` or `s3://` bucket), then call `add_dataview` on the project.

In [6]:
X_view = DataView(name="X-data", uri="https://storage.googleapis.com/worker-data/x_data.csv")
demo_project.add_dataview(X_view)

<DataView (id=01EWEQHZW6NGVJ9NMGK2M5FRYV)>
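Because a data view is only a reference, a malformed URI may not surface until a job tries to read the dataset. The following sketch checks the URI shape before registration using the standard library. The helper name `check_dataset_uri` and the set of allowed schemes are assumptions; which schemes Cape actually accepts is not stated here.

```python
from urllib.parse import urlparse


def check_dataset_uri(uri, allowed_schemes=("https", "gs", "s3")):
    """Return `uri` if it has an allowed scheme, a host or bucket, and a path.

    The default `allowed_schemes` is an assumption made for this sketch.
    """
    parsed = urlparse(uri)
    if parsed.scheme not in allowed_schemes:
        raise ValueError(f"unsupported or missing scheme in {uri!r}")
    if not parsed.netloc or not parsed.path.strip("/"):
        raise ValueError(f"{uri!r} must name a host or bucket and an object path")
    return uri
```

It could wrap the `uri` argument, for example `DataView(name="X-data", uri=check_dataset_uri(...))`.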

In [10]:
X_view.schema

{'index': 'integer',
 'transaction_date': 'datetime',
 'state': 'string',
 'transaction_amount': 'integer'}
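The schema can be used to catch misspelled column names before a job is created. This is a minimal sketch that treats the schema as the plain dictionary printed above; `missing_columns` is a hypothetical helper, not a Cape function.

```python
def missing_columns(schema, columns):
    """Return the requested column names that are absent from `schema`."""
    return [col for col in columns if col not in schema]


# Schema copied from the output above.
schema = {
    "index": "integer",
    "transaction_date": "datetime",
    "state": "string",
    "transaction_amount": "integer",
}

present = missing_columns(schema, ["state", "transaction_amount"])
absent = missing_columns(schema, ["state_ca", "transaction_amount"])
```

Here `present` is empty, while `absent` lists `state_ca`, which does not appear in the schema.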

In [7]:
demo_project.list_dataviews()

[<DataView (id=01EWED32H92AV9EGT07SZ3X1JZ)>,
 <DataView (id=01EWEQHZW6NGVJ9NMGK2M5FRYV)>]

You can get a specific data view associated with a project as follows:

In [None]:
y_view = demo_project.get_dataview(id="01EWED32H92AV9EGT07SZ3X1JZ")

## 4. Train an Encrypted Machine Learning Model

To train a machine learning model, create a job that matches the model type required by the use case. For example, if the use case calls for a linear regression model where one organization owns the X inputs and another owns the target Y, you can create a `VerticalLinearRegressionJob`.

In [None]:
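# NOTE: VerticalLinearRegressionJob is not imported in the first cell; import it from
# the Cape package before running this cell (the module path is not shown here).
# `model_folder` must also be defined beforehand as the location for the job outputs.
lr_job = VerticalLinearRegressionJob(
    x_train_dataview=X_view['state_ca', 'transaction_amount'],
    y_train_dataview=y_view['actual_sales_amount'],
    include_metrics=['rmse', 'mape', 'r-squared'],
    save_outputs_to=model_folder
)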

demo_project.create_job(job=lr_job)
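For readers unfamiliar with the names passed to `include_metrics`, the sketch below computes the standard textbook versions of the three metrics in plain Python. How Cape defines them internally is not stated here; in particular, reporting MAPE as a fraction rather than a percentage is an assumption, and `regression_metrics` is a hypothetical helper.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute RMSE, MAPE (as a fraction) and R-squared for two equal-length lists.

    MAPE is undefined when a true value is 0, and R-squared is undefined when
    all true values are identical; both cases raise ZeroDivisionError here.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)           # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "mape": sum(abs(r / t) for r, t in zip(residuals, y_true)) / n,
        "r-squared": 1 - ss_res / ss_tot,
    }


metrics = regression_metrics([2.0, 4.0], [1.0, 5.0])
```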

Question: to select specific columns from a data view, we could use the Pandas-like syntax `dataview['actual_sales_amount']`. Or is the following syntax, with explicit `x_train_data_cols` and `y_train_data_cols` attributes, more intuitive?
```
lr_job = VerticalLinearRegressionJob(
    x_train_dataview=dataview_1,
    x_train_data_cols=['state_ca', 'estimated_sales_amount'],
    y_train_dataview=dataview_2,
    y_train_data_cols=['actual_sales_amount'],
    include_metrics=['rmse', 'mape', 'r-squared'],
    save_outputs_to=model_folder,
)
```
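To make the first option concrete, here is a minimal sketch of how bracket-based column selection could be supported through `__getitem__`. `ViewSketch` is a hypothetical stand-in written for this discussion and is not the Cape `DataView` class. Note that in pandas, selecting several columns requires a list (`df[['a', 'b']]`), so the sketch accepts a single name, a tuple, or a list.

```python
class ViewSketch:
    """Hypothetical stand-in for a data view that remembers selected columns."""

    def __init__(self, name, columns=None):
        self.name = name
        self.columns = columns  # None means "all columns"

    def __getitem__(self, key):
        # view['a'] passes a str, view['a', 'b'] passes a tuple,
        # and view[['a', 'b']] passes a list.
        cols = [key] if isinstance(key, str) else list(key)
        # Return a new object so the original view is left unchanged.
        return ViewSketch(self.name, cols)


x_selection = ViewSketch("X-data")["state_ca", "transaction_amount"]
y_selection = ViewSketch("y-data")["actual_sales_amount"]
```

The trade-off is that the bracket form keeps the job constructor short, while the explicit `*_data_cols` arguments keep the data view object itself free of selection state.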

You are now ready to submit the job to Cape.

In [None]:
lr_job.submit_job()

You can check the status of the job (ready, success, failed, etc.) in the Cape UI or by calling `get_status`.

In [None]:
lr_job.get_status()
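Rather than re-running the cell by hand, the status can be polled until the job finishes. The sketch below is generic: it accepts any zero-argument callable and assumes that the callable returns a plain string such as `"success"` or `"failed"`. The return type of `get_status` is not confirmed here, and `wait_for_status` is a hypothetical helper.

```python
import time


def wait_for_status(get_status, terminal=("success", "failed"),
                    interval=5.0, timeout=600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call `get_status` repeatedly until it returns a terminal status.

    Raises TimeoutError if no terminal status is seen within `timeout` seconds.
    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in terminal:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still in status {status!r} after {timeout} s")
        sleep(interval)
```

Under the stated assumption, `wait_for_status(lr_job.get_status)` would block until the job succeeds or fails.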

Once training is done, you can access the model weights, metrics, etc. by calling `get_results`.

In [None]:
lr_job.get_results()