[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/openlayer-ai/examples-gallery/blob/main/tabular-classification/sklearn/iris-classifier/iris-tabular-sklearn.ipynb)


# <a id="top">Iris classification using sklearn</a>

This notebook shows how to train an sklearn model on the Iris dataset and upload it, along with its training and validation sets, to the Openlayer platform.

## <a id="toc">Table of contents</a>

1. [**Getting the data and training the model**](#1)
    - [Downloading the dataset](#download)
    - [Preparing the data](#prepare)
    - [Training the model](#train)
    

2. [**Using Openlayer's Python API**](#2)
    - [Instantiating the client](#client)
    - [Creating a project](#project)
    - [Uploading datasets](#dataset)
    - [Uploading models](#model)
    - [Committing and pushing to the platform](#commit)

In [None]:
%%bash

if [ ! -e "requirements.txt" ]; then
    curl "https://raw.githubusercontent.com/openlayer-ai/examples-gallery/main/tabular-classification/sklearn/iris-classifier/requirements.txt" --output "requirements.txt"
fi

In [None]:
!pip install -r requirements.txt

## <a id="1"> 1. Getting the data and training the model </a>

[Back to top](#top)

In this first part, we will get the dataset, pre-process it, split it into training and validation sets, and train a model. Feel free to skim through this section if you are already comfortable with how these steps look for an sklearn model.   

In [None]:
import numpy as np

from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

### <a id="download">Downloading the dataset </a>

In [None]:
iris = datasets.load_iris()
X = iris.data[:, 0:2]  # keep only the first two features: sepal length and sepal width
y = iris.target
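
To see exactly which two features the slice `iris.data[:, 0:2]` keeps, a quick check like the following can help. It only relies on the fact that the columns of `iris.data` follow the order of `iris.feature_names`.

```python
from sklearn import datasets

iris = datasets.load_iris()

# Column order of iris.data follows iris.feature_names, so the slice [:, 0:2]
# keeps the first two entries of that list.
kept_features = iris.feature_names[0:2]
X = iris.data[:, 0:2]
y = iris.target

print(kept_features)  # ['sepal length (cm)', 'sepal width (cm)']
print(X.shape, y.shape)  # (150, 2) (150,)
```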

### <a id="prepare">Preparing the data</a>

In [None]:
x_train, x_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)
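
With `test_size=0.2`, `train_test_split` puts 20% of the 150 rows (30 rows) in the validation set and the remaining 120 in the training set. The sketch below checks the split sizes and counts how many examples of each class ended up in each split. Passing `stratify=y`, an option this notebook does not use, would instead keep the class proportions of the full dataset in both splits.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X = iris.data[:, 0:2]
y = iris.target

x_train, x_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

# 20% of the 150 rows go to validation, the rest to training.
print(len(x_train), len(x_val))  # 120 30

# Per-class counts in each split (the classes are 0, 1 and 2).
print("train:", np.bincount(y_train, minlength=3))
print("val:  ", np.bincount(y_val, minlength=3))

# Alternative (not used in this notebook): stratify on the labels so each split
# keeps the 50/50/50 class balance of the full dataset.
xs_train, xs_val, ys_train, ys_val = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
print("stratified val:", np.bincount(ys_val, minlength=3))
```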

### <a id="train">Training the model</a>

In [None]:
sklearn_model = LogisticRegression(random_state=1300)
sklearn_model.fit(x_train, y_train)

In [None]:
print(classification_report(y_val, sklearn_model.predict(x_val)))
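
The model package built later exposes `predict_proba`, so it is worth looking at what that method returns before wrapping it. For this 3-class problem it returns one row per sample and one column per class (in the order of `sklearn_model.classes_`); each row sums to 1, and the most probable column agrees with `predict`.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, y = iris.data[:, 0:2], iris.target
x_train, x_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

sklearn_model = LogisticRegression(random_state=1300)
sklearn_model.fit(x_train, y_train)

proba = sklearn_model.predict_proba(x_val)
print(proba.shape)  # (30, 3): one column per class in sklearn_model.classes_

# Each row is a probability distribution over the three classes.
assert np.allclose(proba.sum(axis=1), 1.0)

# The most probable class agrees with the hard prediction from predict().
assert (sklearn_model.classes_[proba.argmax(axis=1)] == sklearn_model.predict(x_val)).all()
```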

## <a id="2"> 2. Using Openlayer's Python API</a>

[Back to top](#top)

Now it's time to upload the datasets and model to the Openlayer platform.

In [None]:
!pip install openlayer

### <a id="client">Instantiating the client</a>

In [None]:
import openlayer

client = openlayer.OpenlayerClient("YOUR_API_KEY_HERE")

### <a id="project">Creating a project on the platform</a>

In [None]:
from openlayer.tasks import TaskType

project = client.create_or_load_project(
    name="Iris Prediction",
    task_type=TaskType.TabularClassification,
    description="Evaluation of ML approaches to predict the iris",
)

### <a id="dataset">Uploading datasets</a>

In [None]:
import pandas as pd

feature_names = iris.feature_names[:2]
class_names = iris.target_names.tolist()

df_train = pd.DataFrame(x_train, columns=feature_names)
df_train["target"] = y_train
df_val = pd.DataFrame(x_val, columns=feature_names)
df_val["target"] = y_val
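
Because `class_names` is uploaded alongside an integer `target` column, a quick check that every label is a valid index into `class_names` (0, 1 or 2 here) can catch a mismatch before anything is sent to the platform. The sketch below also maps the integers to names for easier inspection; the extra `target_name` column is only a local preview and is not part of what the notebook uploads.

```python
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
feature_names = iris.feature_names[:2]
class_names = iris.target_names.tolist()
x_train, x_val, y_train, y_val = train_test_split(
    iris.data[:, 0:2], iris.target, test_size=0.2, random_state=0
)

df_val = pd.DataFrame(x_val, columns=feature_names)
df_val["target"] = y_val

# Every integer label should index into class_names.
assert df_val["target"].between(0, len(class_names) - 1).all()

# Human-readable view of the labels (for inspection only, not uploaded).
preview = df_val.assign(target_name=df_val["target"].map(dict(enumerate(class_names))))
print(preview.head())
```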

In [None]:
from openlayer.datasets import DatasetType

# Validation set
project.add_dataframe(
    df=df_val,
    dataset_type=DatasetType.Validation,
    class_names=class_names,
    label_column_name='target',
    feature_names=feature_names,
)

# Training set
project.add_dataframe(
    df=df_train,
    dataset_type=DatasetType.Training,
    class_names=class_names,
    label_column_name='target',
    feature_names=feature_names,
)

We can check that both datasets are now staged using the `project.status()` method. 

In [None]:
project.status()

### <a id="model">Uploading models</a>

To upload a model to Openlayer, you will need to create a model package, which is nothing more than a folder with all the necessary information to run inference with the model. The package should include the following:
1. A `requirements.txt` file listing the dependencies for the model.
2. Serialized model files, such as model weights, encoders, etc., in a format specific to the framework used for training (e.g., `.pkl` for sklearn, `.pb` for TensorFlow, and so on).
3. A `prediction_interface.py` file that wraps the model and implements a `predict_proba` method.
4. A `model_config.yaml` file that provides information about the model to the Openlayer platform, such as the framework used, feature names, and categorical feature names.
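
Before validating with Openlayer's own tools, a plain-Python check like the one below can catch a missing file early. The helper name `missing_package_files` is hypothetical (it is not part of the `openlayer` library), and it only looks for the three fixed file names from the list above, since the name of the serialized model file depends on the framework.

```python
import tempfile
from pathlib import Path

# File names taken from the model package description above. Serialized model
# files are not listed because their names depend on the framework.
REQUIRED_FILES = ("requirements.txt", "prediction_interface.py", "model_config.yaml")


def missing_package_files(package_dir):
    """Hypothetical helper: return the required files absent from package_dir."""
    package_path = Path(package_dir)
    return [name for name in REQUIRED_FILES if not (package_path / name).is_file()]


# Usage example on a throwaway folder that only contains a requirements file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "requirements.txt").write_text("scikit-learn\n")
    missing = missing_package_files(tmp)
    print(missing)  # ['prediction_interface.py', 'model_config.yaml']
```

Once all four pieces below have been written, `missing_package_files("model_package")` should return an empty list.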

Let's prepare the model package one piece at a time.

In [None]:
# Creating the model package folder (we'll call it `model_package`);
# -p avoids an error if the folder already exists from a previous run
!mkdir -p model_package

**1. Adding the `requirements.txt` to the model package**

In [None]:
!cp requirements.txt model_package/

**2. Serializing the model**

In [None]:
import pickle 

# Trained model
with open('model_package/model.pkl', 'wb') as handle:
    pickle.dump(sklearn_model, handle, protocol=pickle.HIGHEST_PROTOCOL)
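
Pickling only helps if the file can be loaded back into a working model. The round-trip check below is sketched with an in-memory buffer instead of `model_package/model.pkl` so that it runs anywhere; it confirms that the unpickled model returns the same probabilities. Pickles are generally only safe to load with the same (or a compatible) scikit-learn version, which is one reason the package ships a `requirements.txt`.

```python
import io
import pickle

import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

iris = datasets.load_iris()
X, y = iris.data[:, 0:2], iris.target
# Trained on all rows here purely for this check.
sklearn_model = LogisticRegression(random_state=1300).fit(X, y)

# Serialize as in the cell above, but into memory instead of a file.
buffer = io.BytesIO()
pickle.dump(sklearn_model, buffer, protocol=pickle.HIGHEST_PROTOCOL)

# Load it back and compare probabilities on the same inputs.
buffer.seek(0)
restored_model = pickle.load(buffer)
same_output = np.allclose(sklearn_model.predict_proba(X), restored_model.predict_proba(X))
print(same_output)  # True
```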

**3. Writing the `prediction_interface.py` file**

In [None]:
%%writefile model_package/prediction_interface.py

import pickle
from pathlib import Path

import pandas as pd

PACKAGE_PATH = Path(__file__).parent


class SklearnModel:
    def __init__(self):
        """This is where the serialized objects needed should
        be loaded as class attributes."""

        with open(PACKAGE_PATH / "model.pkl", "rb") as model_file:
            self.model = pickle.load(model_file)

    def predict_proba(self, input_data_df: pd.DataFrame):
        """Makes predictions with the model. Returns the class probabilities."""
        return self.model.predict_proba(input_data_df)


def load_model():
    """Function that returns the wrapped model object."""
    return SklearnModel()
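
Since the prediction interface is meant to be imported and called through `load_model()`, it can be useful to do the same locally before uploading. The sketch below imports a `prediction_interface.py` file by path with `importlib` and calls `predict_proba` on a small DataFrame. To stay self-contained it writes a throwaway package (a pickled model plus a trimmed copy of the interface above) to a temporary folder; in the notebook you would point the hypothetical helper `load_interface` at `model_package/prediction_interface.py` instead.

```python
import importlib.util
import pickle
import tempfile
import textwrap
from pathlib import Path

import pandas as pd
from sklearn import datasets
from sklearn.linear_model import LogisticRegression


def load_interface(interface_path):
    """Hypothetical helper: import a prediction_interface.py file by path."""
    spec = importlib.util.spec_from_file_location("prediction_interface", interface_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Build a throwaway package (left on disk in this sketch).
iris = datasets.load_iris()
feature_names = iris.feature_names[:2]
model = LogisticRegression(random_state=1300).fit(iris.data[:, 0:2], iris.target)

package_dir = Path(tempfile.mkdtemp())
with open(package_dir / "model.pkl", "wb") as handle:
    pickle.dump(model, handle, protocol=pickle.HIGHEST_PROTOCOL)

# Same logic as the interface above, without the docstrings.
(package_dir / "prediction_interface.py").write_text(textwrap.dedent('''
    import pickle
    from pathlib import Path

    import pandas as pd

    PACKAGE_PATH = Path(__file__).parent


    class SklearnModel:
        def __init__(self):
            with open(PACKAGE_PATH / "model.pkl", "rb") as model_file:
                self.model = pickle.load(model_file)

        def predict_proba(self, input_data_df: pd.DataFrame):
            return self.model.predict_proba(input_data_df)


    def load_model():
        return SklearnModel()
'''))

# Exercise the interface the way an external caller would.
interface = load_interface(package_dir / "prediction_interface.py")
wrapped_model = interface.load_model()
sample = pd.DataFrame(iris.data[:5, 0:2], columns=feature_names)
probabilities = wrapped_model.predict_proba(sample)
print(probabilities.shape)  # (5, 3)
```

Because the model was fitted on a NumPy array, recent scikit-learn versions may emit a harmless `UserWarning` about feature names when `predict_proba` receives a DataFrame.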

**4. Creating the `model_config.yaml`**

In [None]:
import yaml

model_config = {
    "name": "Iris classification model",
    "model_type": "sklearn",
    "class_names": class_names,
    "feature_names": feature_names,
}

with open('model_package/model_config.yaml', 'w') as model_config_file:
    yaml.dump(model_config, model_config_file, default_flow_style=False)
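
A mismatch between `model_config.yaml` and the trained model, such as a different number of class names than the model has output columns, is easy to miss. The sketch below reads the YAML back with `yaml.safe_load` and compares it with the model's `classes_` and `n_features_in_` attributes. These are our own sanity checks, not documented Openlayer requirements, and the file is written to a temporary folder rather than `model_package`.

```python
import tempfile
from pathlib import Path

import yaml
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

iris = datasets.load_iris()
feature_names = iris.feature_names[:2]
class_names = iris.target_names.tolist()
sklearn_model = LogisticRegression(random_state=1300).fit(iris.data[:, 0:2], iris.target)

model_config = {
    "name": "Iris classification model",
    "model_type": "sklearn",
    "class_names": class_names,
    "feature_names": feature_names,
}

config_path = Path(tempfile.mkdtemp()) / "model_config.yaml"
with open(config_path, "w") as model_config_file:
    yaml.dump(model_config, model_config_file, default_flow_style=False)

# Read the file back the way a consumer would.
with open(config_path) as model_config_file:
    loaded_config = yaml.safe_load(model_config_file)

# One class name per model output column, one feature name per model input.
assert loaded_config == model_config
assert len(loaded_config["class_names"]) == len(sklearn_model.classes_)
assert len(loaded_config["feature_names"]) == sklearn_model.n_features_in_
print(loaded_config["class_names"])  # ['setosa', 'versicolor', 'virginica']
```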

Let's check that the model package contains everything needed, using the first rows of the validation features (without the `target` column) as sample data:

In [None]:
# Validation features only (label column dropped)
test_ = df_val.loc[:, df_val.columns != 'target']

In [None]:
from openlayer.validators import ModelValidator

model_validator = ModelValidator(
    model_package_dir="model_package",
    sample_data=test_.iloc[:10, :],
)
model_validator.validate()

Now, we are ready to add the model:

In [None]:
project.add_model(
    model_package_dir="model_package",
    sample_data=test_.iloc[:10, :]
)

We can check that both datasets and the model are now staged using the `project.status()` method.

In [None]:
project.status()

### <a id="commit"> Committing and pushing to the platform </a>

Finally, we commit the first project version and push it to the platform.

In [None]:
project.commit("Initial commit!")

In [None]:
project.status()

In [None]:
project.push()