[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/openlayer-ai/examples-gallery/blob/main/development/quickstart/traditional-ml/tabular-quickstart.ipynb)


# <a id="top">Development quickstart</a>

This notebook illustrates a typical development flow using Openlayer.


## <a id="toc">Table of contents</a>

1. [**Creating a project**](#project)

2. [**Uploading datasets**](#dataset)

3. [**Uploading a model**](#model)

4. [**Committing and pushing**](#push)

## <a id="project"> 1. Creating a project</a>

[Back to top](#top)

In [None]:
!pip install openlayer

In [None]:
import openlayer
from openlayer.tasks import TaskType

client = openlayer.OpenlayerClient("YOUR_API_KEY_HERE")

project = client.create_or_load_project(
    name="Churn Prediction",
    task_type=TaskType.TabularClassification,
)

# Or, to load a project that already exists:
# project = client.load_project(name="Your project name here")
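
The cell above hard-codes the API key, which makes it easy to leak when the notebook is shared. A minimal alternative is to read the key from an environment variable. The variable name `OPENLAYER_API_KEY` and the helper `get_api_key` are hypothetical choices for this sketch, not something the Openlayer client is documented to read on its own.

```python
import os


def get_api_key(var_name: str = "OPENLAYER_API_KEY") -> str:
    """Read the API key from an environment variable.

    OPENLAYER_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key
        raise RuntimeError(
            f"Set the {var_name} environment variable before creating the client."
        )
    return key
```

With the variable set, the client line becomes `client = openlayer.OpenlayerClient(get_api_key())`.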

## <a id="dataset"> 2. Uploading datasets </a>

[Back to top](#top)

### <a id="download-datasets"> Downloading the training and validation sets </a>

In [None]:
%%bash

if [ ! -e "churn_train.csv" ]; then
    curl "https://openlayer-static-assets.s3.us-west-2.amazonaws.com/examples-datasets/tabular-classification/documentation/churn_train.csv" --output "churn_train.csv"
fi

if [ ! -e "churn_val.csv" ]; then
    curl "https://openlayer-static-assets.s3.us-west-2.amazonaws.com/examples-datasets/tabular-classification/documentation/churn_val.csv" --output "churn_val.csv"
fi
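
If `bash` and `curl` are not available (for example, on Windows), the same "download only if the file is missing" logic can be written with the Python standard library. This is a sketch; the helper name `download_if_missing` is hypothetical.

```python
import shutil
import urllib.request
from pathlib import Path


def download_if_missing(url: str, path: str) -> Path:
    """Download `url` to `path` unless the file already exists (like `[ ! -e ... ]`)."""
    target = Path(path)
    if not target.exists():
        # Write to a temporary ".part" file first so an interrupted download
        # does not leave a partial file that would be skipped on the next run.
        tmp = target.with_name(target.name + ".part")
        with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
            shutil.copyfileobj(response, out)
        tmp.replace(target)
    return target
```

For example, call `download_if_missing(url, "churn_train.csv")` with the training-set URL from the cell above, and likewise for `churn_val.csv`.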

In [None]:
import pandas as pd

train_df = pd.read_csv("./churn_train.csv")
val_df = pd.read_csv("./churn_val.csv")

Now, imagine that we have trained a model on this training set and used it to get predictions for both the training and validation sets. Let's add these predictions as an extra column called `predictions`:

In [None]:
# Predictions from the (hypothetical) trained model, one row per dataset row
train_df["predictions"] = pd.read_csv("https://openlayer-static-assets.s3.us-west-2.amazonaws.com/examples-datasets/tabular-classification/documentation/training_preds.csv")
val_df["predictions"] = pd.read_csv("https://openlayer-static-assets.s3.us-west-2.amazonaws.com/examples-datasets/tabular-classification/documentation/validation_preds.csv")
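
pandas aligns assigned values on the row index, so a predictions file with fewer rows than the dataset leaves `NaN` in the unmatched rows instead of raising an error. A quick check before uploading can catch this. The sketch below assumes the predictions are integer class indices (0 or 1 here); that is an assumption about the file contents, and `check_predictions` is a hypothetical helper.

```python
import pandas as pd


def check_predictions(df: pd.DataFrame, column: str, num_classes: int) -> None:
    """Raise ValueError if the predictions column has gaps or out-of-range values.

    Assumes predictions are integer class indices in 0..num_classes - 1.
    """
    preds = df[column]
    # Rows that did not line up with a prediction end up as NaN
    missing = int(preds.isna().sum())
    if missing:
        raise ValueError(f"{missing} rows have no prediction in {column!r}")
    # Every prediction should be a valid class index
    bad = ~preds.isin(list(range(num_classes)))
    if bad.any():
        raise ValueError(
            f"{int(bad.sum())} predictions fall outside 0..{num_classes - 1}"
        )
```

For this notebook that would be `check_predictions(train_df, "predictions", num_classes=2)` and the same for `val_df`.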

In [None]:
val_df.head()

### <a id="upload-datasets"> Uploading the datasets to Openlayer </a>

In [None]:
dataset_config = {
    # Subset of featureNames that should be treated as categorical
    "categoricalFeatureNames": ["Gender", "Geography"],
    # Class names, ordered by label index (0 = "Retained", 1 = "Exited")
    "classNames": ["Retained", "Exited"],
    "featureNames": [
        "CreditScore",
        "Geography",
        "Gender",
        "Age",
        "Tenure",
        "Balance",
        "NumOfProducts",
        "HasCrCard",
        "IsActiveMember",
        "EstimatedSalary",
        "AggregateRate",
        "Year"
    ],
    "labelColumnName": "Exited",  # Column with the ground-truth labels
    "label": "training",  # This becomes "validation" for the validation set
    "predictionsColumnName": "predictions"  # Column added in the previous step
}
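
Every name in `dataset_config` has to match a column in the DataFrame, and a typo only surfaces at upload time. A small local check can catch it earlier. This is a hypothetical helper that covers only the keys used in this notebook, not Openlayer's own validation.

```python
import pandas as pd


def validate_dataset_config(df: pd.DataFrame, config: dict) -> list:
    """Return a list of mismatches between `config` and the columns of `df`."""
    problems = []
    columns = set(df.columns)
    # Every feature must be a column in the DataFrame
    for name in config["featureNames"]:
        if name not in columns:
            problems.append(f"feature {name!r} is not a column")
    # Categorical features must also be listed as features
    for name in config["categoricalFeatureNames"]:
        if name not in config["featureNames"]:
            problems.append(f"categorical feature {name!r} is not in featureNames")
    # Label and prediction columns must exist
    for key in ("labelColumnName", "predictionsColumnName"):
        if config[key] not in columns:
            problems.append(f"{key} {config[key]!r} is not a column")
    return problems
```

An empty list means the names line up, for example `assert not validate_dataset_config(train_df, dataset_config)`.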

In [None]:
project.add_dataframe(
    dataset_df=train_df,
    dataset_config=dataset_config
)

In [None]:
dataset_config["label"] = "validation"

project.add_dataframe(
    dataset_df=val_df,
    dataset_config=dataset_config
)
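
The cells above reuse one config by overwriting `dataset_config["label"]` in place, so the dict ends up labeled "validation". An alternative sketch uploads each split with its own shallow copy, using only the `project.add_dataframe(dataset_df=..., dataset_config=...)` call shown above; the helper name `upload_splits` is hypothetical.

```python
def upload_splits(project, splits: dict, base_config: dict) -> None:
    """Upload each DataFrame in `splits` under its own dataset label.

    `splits` maps a label such as "training" or "validation" to a DataFrame.
    Building {**base_config, "label": label} leaves base_config unchanged.
    """
    for label, df in splits.items():
        config = {**base_config, "label": label}
        project.add_dataframe(dataset_df=df, dataset_config=config)
```

Here that would be `upload_splits(project, {"training": train_df, "validation": val_df}, dataset_config)`.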

## <a id="model"> 3. Uploading a model</a>

[Back to top](#top)

Since we added predictions to the datasets above, we also need to describe the model that produced them. Here we upload only the model's configuration, without any model artifacts; refer to the Openlayer documentation for the other model upload options.

In [None]:
model_config = {
    "metadata": {  # Can add anything here, as long as it is a dict
        "model_type": "Gradient Boosting Classifier",
        "regularization": "None",
        "encoder_used": "One Hot",
        "imputation": "Imputed with the training set's mean"
    },
    "classNames": dataset_config["classNames"],
    "featureNames": dataset_config["featureNames"],
    "categoricalFeatureNames": dataset_config["categoricalFeatureNames"],
}
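
The model config repeats the class, feature, and categorical-feature names from the dataset config. If you upload several model versions, a hypothetical helper like the one below keeps those keys in one place. It also enforces the one constraint stated in the comment above, that `metadata` is a dict.

```python
SHARED_KEYS = ("classNames", "featureNames", "categoricalFeatureNames")


def build_model_config(dataset_config: dict, metadata: dict) -> dict:
    """Build a model config that reuses the dataset config's schema keys."""
    if not isinstance(metadata, dict):
        raise TypeError("metadata must be a dict")
    config = {"metadata": dict(metadata)}
    # Copy each list so later edits to one config do not leak into the other
    for key in SHARED_KEYS:
        config[key] = list(dataset_config[key])
    return config
```

Dataset-only keys such as `label` and `labelColumnName` are not copied.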

In [None]:
project.add_model(
    model_config=model_config
)

## <a id="push"> 4. Committing and pushing</a>

[Back to top](#top)

In [None]:
project.commit("Initial commit!")

In [None]:
project.status()

In [None]:
project.push()