[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/openlayer-ai/examples-gallery/blob/main/tabular-classification/sklearn/fetal-health/fetal-health-sklearn.ipynb)


# Fetal health using sklearn

This notebook illustrates how sklearn models can be uploaded to the Openlayer platform.

In [None]:
%%bash

if [ ! -e "requirements.txt" ]; then
    curl "https://raw.githubusercontent.com/openlayer-ai/examples-gallery/main/tabular-classification/sklearn/fetal-health/requirements.txt" --output "requirements.txt"
fi

In [None]:
!pip install -r requirements.txt

## Importing the modules and loading the dataset

In [1]:
import numpy as np
import pandas as pd


from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

We have stored the dataset in the following S3 bucket. If you get an error reading the CSV directly from it, feel free to paste the URL into your browser and download the CSV file manually. Alternatively, you can find the dataset on [this Kaggle page](https://www.kaggle.com/datasets/andrewmvd/fetal-health-classification?select=fetal_health.csv).

In [2]:
DATASET_URL = "https://openlayer-static-assets.s3.us-west-2.amazonaws.com/examples-datasets/tabular-classification/fetal_health.csv"

In [3]:
df = pd.read_csv(DATASET_URL)
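If the S3 read fails, the paragraph above suggests downloading the CSV by hand. A minimal sketch of automating that fallback, assuming the manual download was saved as `fetal_health.csv` next to the notebook; `load_dataset` is a hypothetical helper, not part of pandas or Openlayer. It relies on the fact that the errors pandas typically surfaces for an unreachable URL (`urllib`'s `URLError`/`HTTPError`) and for a missing file (`FileNotFoundError`) are all subclasses of `OSError`.

```python
import pandas as pd


def load_dataset(url: str, local_path: str) -> pd.DataFrame:
    """Read the CSV from `url`, falling back to a manually downloaded copy.

    Hypothetical helper for this notebook; `local_path` is an assumed location.
    """
    try:
        return pd.read_csv(url)
    except OSError as err:
        # Network and missing-file errors both land here (OSError subclasses)
        print(f"Could not read {url} ({err}); falling back to {local_path}")
        # If the local copy is also missing, this raises FileNotFoundError
        return pd.read_csv(local_path)
```

In the notebook, `df = load_dataset(DATASET_URL, "fetal_health.csv")` would take the place of the plain `pd.read_csv(DATASET_URL)` call.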

In [4]:
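# Cast the float labels to int and remap them so that each label indexes into
# the class_names used later: 0 = Pathological, 1 = Normal, 2 = Suspect
df['fetal_health'] = df.fetal_health.astype(int)
df['fetal_health'] = df['fetal_health'].map({3: 0, 1: 1, 2: 2})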
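In the Kaggle version of this dataset, `fetal_health` is coded 1 = Normal, 2 = Suspect and 3 = Pathological. The cell above moves Pathological to 0 so that each integer label is a valid index into the `class_names` list used later (`["Pathological", "Normal", "Suspect"]`). A quick check of that mapping on a toy frame (made-up rows, not the real data):

```python
import pandas as pd

# Class names in the order used later in this notebook: label i <-> CLASS_NAMES[i]
CLASS_NAMES = ["Pathological", "Normal", "Suspect"]

# Toy frame standing in for the real dataset (float labels, as in the CSV)
toy = pd.DataFrame({"fetal_health": [1.0, 2.0, 3.0, 1.0]})

# Same two steps as the cell above
toy["fetal_health"] = toy["fetal_health"].astype(int)
toy["fetal_health"] = toy["fetal_health"].map({3: 0, 1: 1, 2: 2})

# Any label missing from the dict would have become NaN
assert toy["fetal_health"].notna().all()

# Decode back to names to confirm the mapping
decoded = [CLASS_NAMES[label] for label in toy["fetal_health"]]
print(decoded)  # ['Normal', 'Suspect', 'Pathological', 'Normal']
```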

In [5]:
df

| | baseline value | accelerations | fetal_movement | uterine_contractions | light_decelerations | severe_decelerations | prolongued_decelerations | abnormal_short_term_variability | mean_value_of_short_term_variability | percentage_of_time_with_abnormal_long_term_variability | ... | histogram_min | histogram_max | histogram_number_of_peaks | histogram_number_of_zeroes | histogram_mode | histogram_mean | histogram_median | histogram_variance | histogram_tendency | fetal_health |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 120.0 | 0.000 | 0.000 | 0.000 | 0.000 | 0.0 | 0.0 | 73.0 | 0.5 | 43.0 | ... | 62.0 | 126.0 | 2.0 | 0.0 | 120.0 | 137.0 | 121.0 | 73.0 | 1.0 | 2 |
| 1 | 132.0 | 0.006 | 0.000 | 0.006 | 0.003 | 0.0 | 0.0 | 17.0 | 2.1 | 0.0 | ... | 68.0 | 198.0 | 6.0 | 1.0 | 141.0 | 136.0 | 140.0 | 12.0 | 0.0 | 1 |
| 2 | 133.0 | 0.003 | 0.000 | 0.008 | 0.003 | 0.0 | 0.0 | 16.0 | 2.1 | 0.0 | ... | 68.0 | 198.0 | 5.0 | 1.0 | 141.0 | 135.0 | 138.0 | 13.0 | 0.0 | 1 |
| 3 | 134.0 | 0.003 | 0.000 | 0.008 | 0.003 | 0.0 | 0.0 | 16.0 | 2.4 | 0.0 | ... | 53.0 | 170.0 | 11.0 | 0.0 | 137.0 | 134.0 | 137.0 | 13.0 | 1.0 | 1 |
| 4 | 132.0 | 0.007 | 0.000 | 0.008 | 0.000 | 0.0 | 0.0 | 16.0 | 2.4 | 0.0 | ... | 53.0 | 170.0 | 9.0 | 0.0 | 137.0 | 136.0 | 138.0 | 11.0 | 1.0 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2121 | 140.0 | 0.000 | 0.000 | 0.007 | 0.000 | 0.0 | 0.0 | 79.0 | 0.2 | 25.0 | ... | 137.0 | 177.0 | 4.0 | 0.0 | 153.0 | 150.0 | 152.0 | 2.0 | 0.0 | 2 |
| 2122 | 140.0 | 0.001 | 0.000 | 0.007 | 0.000 | 0.0 | 0.0 | 78.0 | 0.4 | 22.0 | ... | 103.0 | 169.0 | 6.0 | 0.0 | 152.0 | 148.0 | 151.0 | 3.0 | 1.0 | 2 |
| 2123 | 140.0 | 0.001 | 0.000 | 0.007 | 0.000 | 0.0 | 0.0 | 79.0 | 0.4 | 20.0 | ... | 103.0 | 170.0 | 5.0 | 0.0 | 153.0 | 148.0 | 152.0 | 4.0 | 1.0 | 2 |
| 2124 | 140.0 | 0.001 | 0.000 | 0.006 | 0.000 | 0.0 | 0.0 | 78.0 | 0.4 | 27.0 | ... | 103.0 | 169.0 | 6.0 | 0.0 | 152.0 | 147.0 | 151.0 | 4.0 | 1.0 | 2 |


## Splitting the data into training and validation sets

In [6]:
train, test = train_test_split(df, test_size=0.2)
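The split above is unseeded, so the rows in `train` and `test` (and therefore the scores reported below) change every time the cell runs. The classes are also imbalanced: 325 of the 426 validation rows are Normal. A sketch of a reproducible, stratified alternative using `train_test_split`'s `random_state` and `stratify` arguments, shown on synthetic labels (an assumption, chosen only to be imbalanced):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced labels standing in for fetal_health (not the real data)
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "feature": rng.normal(size=1000),
    "fetal_health": np.repeat([0, 1, 2], [80, 780, 140]),
})

# stratify keeps the class proportions (nearly) equal in both splits;
# random_state makes the shuffle reproducible across runs
train_s, test_s = train_test_split(
    toy, test_size=0.2, stratify=toy["fetal_health"], random_state=42
)

print(toy["fetal_health"].value_counts(normalize=True).sort_index())
print(test_s["fetal_health"].value_counts(normalize=True).sort_index())
```

Applied to this notebook, that would read `train_test_split(df, test_size=0.2, stratify=df['fetal_health'], random_state=42)`, with any fixed integer as the seed.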

In [7]:
x_train = train.loc[:, train.columns != 'fetal_health'].to_numpy()
y_train = train['fetal_health'].to_numpy()
x_test = test.loc[:, test.columns != 'fetal_health'].to_numpy()
y_test = test['fetal_health'].to_numpy()
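`to_numpy()` drops the column names, so the `feature_names` passed to Openlayer later must list the features in the same order as the columns of `x_train`. One way to keep them in sync is to build the list once and index with it. A sketch on two rows taken from the table above; `feature_cols` is a hypothetical name:

```python
import pandas as pd

# Two rows (and two feature columns) copied from the table above
toy = pd.DataFrame({
    "baseline value": [120.0, 132.0],
    "accelerations": [0.000, 0.006],
    "fetal_health": [2, 1],
})

# Build the feature list once and reuse it everywhere
feature_cols = [c for c in toy.columns if c != "fetal_health"]

x = toy[feature_cols].to_numpy()   # column j of x is feature_cols[j]
y = toy["fetal_health"].to_numpy()

# Same result as the boolean-mask selection used in the notebook
x_mask = toy.loc[:, toy.columns != "fetal_health"].to_numpy()
assert (x == x_mask).all()
```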

## Training and evaluating the model's performance

In [8]:
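sklearn_model = LogisticRegression(C=10,                      # inverse regularization strength (larger = weaker penalty)
                                   penalty='l1',              # L1 penalty, which can zero out coefficients
                                   solver='saga',             # solver that supports L1 with a multinomial loss
                                   multi_class='multinomial', # one softmax model over the 3 classes
                                   max_iter=10000)            # generous iteration budget for saga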

In [9]:
sklearn_model.fit(x_train, y_train)

LogisticRegression(C=10, max_iter=10000, multi_class='multinomial',
                   penalty='l1', solver='saga')
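The hyperparameters above combine an L1 penalty with the `saga` solver, one of the scikit-learn solvers that supports L1 together with a multinomial loss, and `C=10`, the inverse of the regularization strength. A sketch on synthetic data (an assumption, not the fetal health features) showing how the L1 penalty drives more coefficients to exactly zero as `C` shrinks. It omits `multi_class`, which recent scikit-learn releases deprecate; for three or more classes with `saga` the fit is multinomial by default.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic 3-class data; only 5 of the 20 features carry signal
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)

for C in (0.01, 10):
    # Smaller C = stronger L1 penalty = more coefficients set to exactly 0
    clf = LogisticRegression(C=C, penalty="l1", solver="saga", max_iter=5000)
    clf.fit(X, y)
    n_zero = int(np.sum(clf.coef_ == 0))
    print(f"C={C}: {n_zero} of {clf.coef_.size} coefficients are zero")
```

`saga` tends to converge slowly when features sit on very different scales, as the raw columns here do (for example `baseline value` around 120 versus `accelerations` around 0.003), which may be why the notebook sets `max_iter=10000`.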

In [10]:
print(classification_report(y_test, sklearn_model.predict(x_test)))

              precision    recall  f1-score   support

           0       0.81      0.65      0.72        34
           1       0.89      0.98      0.94       325
           2       0.74      0.46      0.57        67

    accuracy                           0.87       426
   macro avg       0.82      0.70      0.74       426
weighted avg       0.86      0.87      0.86       426
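In the report above, `macro avg` is the unweighted mean of the per-class scores (for recall, (0.65 + 0.98 + 0.46) / 3 ≈ 0.70), while `weighted avg` weights each class by its support (34, 325 and 67), so it is dominated by the Normal class. Weighted recall always equals accuracy, which is why both read 0.87. A small hand-made example (not the notebook's predictions) reproducing both averages with `precision_recall_fscore_support`:

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# Hand-made labels and predictions for 3 classes
y_true = np.array([0, 0, 1, 1, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 1, 2, 2, 1])

# Per-class precision, recall, F1 and support (true samples per class)
p, r, f, support = precision_recall_fscore_support(y_true, y_pred)

macro_recall = r.mean()                           # plain mean over classes
weighted_recall = np.average(r, weights=support)  # weighted by support

# Cross-check against scikit-learn's own averaging
_, r_macro, _, _ = precision_recall_fscore_support(y_true, y_pred, average="macro")
_, r_weighted, _, _ = precision_recall_fscore_support(y_true, y_pred, average="weighted")
accuracy = np.mean(y_true == y_pred)

print(macro_recall, weighted_recall, accuracy)
```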



## Openlayer part!

### pip installing openlayer

In [None]:
!pip install openlayer

### Instantiating the client

In [11]:
import openlayer

client = openlayer.OpenlayerClient("YOUR_API_KEY_HERE")
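Hard-coding the key in the notebook makes it easy to leak when the notebook is shared. A minimal sketch that reads it from an environment variable instead; `OPENLAYER_API_KEY` and `get_api_key` are hypothetical names chosen for this example, not something the openlayer library is known to read on its own.

```python
import os


def get_api_key(env_var: str = "OPENLAYER_API_KEY") -> str:
    """Return the API key stored in `env_var` (hypothetical variable name)."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your Openlayer API key")
    return key


# In the notebook (not executed here):
# client = openlayer.OpenlayerClient(get_api_key())
```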

### Creating a project on the platform

In [None]:
from openlayer.tasks import TaskType

project = client.create_or_load_project(name="Fetal Health Prediction",
                                        task_type=TaskType.TabularClassification,
                                        description="Evaluation of ML approaches to predict health")

### Uploading the validation set

In [None]:
dataset = project.add_dataframe(
    df=test,
    class_names=["Pathological", "Normal", "Suspect"],
    label_column_name='fetal_health',
    commit_message='this is my fetal health validation dataset',
    feature_names=test.loc[:, test.columns != 'fetal_health'].columns.values.tolist(),
)
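`add_dataframe` receives the label column, the feature names and the class names separately, so they have to agree with each other. A sketch of pre-upload sanity checks, assuming (as the remapping earlier in this notebook implies) that integer label `i` stands for `class_names[i]`; `check_dataset_args` is a hypothetical helper and not part of the openlayer library.

```python
import pandas as pd


def check_dataset_args(df: pd.DataFrame, class_names, label_column_name, feature_names):
    """Hypothetical pre-upload checks; raises ValueError on the first problem."""
    # The label must not be passed as a feature
    if label_column_name in feature_names:
        raise ValueError(f"{label_column_name!r} is listed as a feature")
    # Every feature name must be an actual column
    missing = [c for c in feature_names if c not in df.columns]
    if missing:
        raise ValueError(f"missing feature columns: {missing}")
    # Assumption: integer label i means class_names[i]
    labels = set(df[label_column_name].unique().tolist())
    unexpected = labels - set(range(len(class_names)))
    if unexpected:
        raise ValueError(f"labels {sorted(unexpected)} have no entry in class_names")
```

For this notebook that would be `check_dataset_args(test, ["Pathological", "Normal", "Suspect"], 'fetal_health', test.loc[:, test.columns != 'fetal_health'].columns.values.tolist())`, run before uploading.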

### Uploading the model

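First, we need to define a `predict_proba` function, which is how Openlayer interacts with your model. It receives the model object and an array of input features and returns the predicted class probabilities.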

In [16]:
def predict_proba(model, input_features: np.ndarray):
    return model.predict_proba(input_features)

Let's test the `predict_proba` function to make sure the input-output format is consistent with what Openlayer expects:

In [17]:
# First 3 validation rows, dropping the last column (the fetal_health label)
predict_proba(sklearn_model, test.iloc[:3, :-1].to_numpy())

array([[2.05737599e-02, 3.78148952e-01, 6.01277288e-01],
       [2.12130596e-06, 9.99412193e-01, 5.85685717e-04],
       [3.56683608e-03, 8.50312336e-01, 1.46120828e-01]])
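The check above is done by eye. A sketch that turns it into assertions, using the three rows printed above: the output should have one row per sample and one column per class, and each row should sum to 1. Because scikit-learn orders the columns by `model.classes_` (here the sorted labels 0, 1, 2), `argmax` indexes directly into the class names. `CLASS_NAMES` is just a local copy of the list passed to Openlayer.

```python
import numpy as np

CLASS_NAMES = ["Pathological", "Normal", "Suspect"]

# The three rows printed above
probs = np.array([[2.05737599e-02, 3.78148952e-01, 6.01277288e-01],
                  [2.12130596e-06, 9.99412193e-01, 5.85685717e-04],
                  [3.56683608e-03, 8.50312336e-01, 1.46120828e-01]])

# One row per sample, one column per class
assert probs.shape == (3, len(CLASS_NAMES))
# Each row is a probability distribution
assert np.all(probs >= 0) and np.allclose(probs.sum(axis=1), 1.0)

# Column j corresponds to label j, so argmax gives the predicted class
predicted = [CLASS_NAMES[i] for i in probs.argmax(axis=1)]
print(predicted)  # ['Suspect', 'Normal', 'Normal']
```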

Now, we can upload the model:

In [None]:
from openlayer.models import ModelType

model = project.add_model(
    function=predict_proba, 
    model=sklearn_model,
    model_type=ModelType.sklearn,
    class_names=["Pathological", "Normal", "Suspect"],
    name='Fetal Classifier - N3',
    commit_message='this is my first tabular classification model',
    feature_names=test.loc[:, test.columns != 'fetal_health'].columns.values.tolist(),
    train_sample_df=train[:100],
    train_sample_label_column_name='fetal_health',
    requirements_txt_file='requirements.txt'
)