<a href="https://colab.research.google.com/github/ntua-unit-of-control-and-informatics/jaqpot-google-collab-examples/blob/main/Scikit-learn-models/evaluate-a-model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Evaluate a Model

In this example, we demonstrate how to evaluate the robustness of a model using `jaqpotpy`. We train a `RandomForestRegressor` and run three evaluations: cross-validation, external evaluation on a held-out test set, and a randomization test.

In [1]:
# Install `jaqpotpy`
!pip install jaqpotpy

# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from jaqpotpy.models import SklearnModel
from jaqpotpy.datasets import JaqpotTabularDataset
from jaqpotpy.descriptors import RDKitDescriptors


We start by creating a sample dataset with molecular structures represented as SMILES strings, along with temperature and activity values.

In [2]:
# Create sample data
data = pd.DataFrame(
    {
        "smiles": [
            "CC", "CCO", "CCC", "CCCl", "CCBr",
            "COC", "CCOCC", "CCCO", "CCCC", "CCCCCC",
        ],
        # Random integer temperatures in [20, 37); unseeded, so values differ between runs
        "temperature": np.random.randint(20, 37, size=10),
        "activity": [80, 81, 81, 84, 83.5, 83, 89, 90, 91, 97],
    }
)
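Because the `temperature` column is drawn at random, the metrics reported further down will change from run to run. A minimal sketch of one way to make the column reproducible, assuming NumPy's `Generator` API; the seed value and the helper name `sample_temperatures` are hypothetical and not part of the original notebook:

```python
import numpy as np

SEED = 42  # hypothetical seed, chosen only for illustration


def sample_temperatures(seed, size=10):
    """Draw integer temperatures in [20, 37) from a seeded generator."""
    rng = np.random.default_rng(seed)
    return rng.integers(20, 37, size=size)


# Two draws with the same seed produce identical columns.
first = sample_temperatures(SEED)
second = sample_temperatures(SEED)
print(np.array_equal(first, second))
```

Passing the result of `sample_temperatures(SEED)` as the `"temperature"` column would keep the dataset identical across runs.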

Next, we prepare the dataset for training using `JaqpotTabularDataset` and `RDKitDescriptors` for feature extraction.

In [4]:
featurizer = RDKitDescriptors()

# Prepare the dataset for training with Jaqpotpy
train_dataset = JaqpotTabularDataset(
    df=data,
    x_cols=["temperature"],
    y_cols=["activity"],
    smiles_cols=["smiles"],
    task="REGRESSION",
    featurizers=featurizer,
)

We then initialize a RandomForestRegressor model and wrap it with `SklearnModel` from `jaqpotpy`. The model is trained on the prepared dataset.

In [5]:
model = RandomForestRegressor(random_state=42)
jaqpot_model = SklearnModel(dataset=train_dataset, model=model)
jaqpot_model.random_seed = 1231
jaqpot_model.fit()

Goodness-of-fit metrics on training set:
{'r2': 0.9476271155760702, 'mae': 0.9069999999999994, 'rmse': 1.2027759558621054}


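The training-set metrics above are optimistic because the model has already seen those samples. To estimate how it performs on data it was not fitted on, we run 5-fold cross-validation on the training data.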

In [8]:
# Perform cross-validation on the training data
jaqpot_model.cross_validate(train_dataset, n_splits=5)

{'r2': 0.34961769766804024,
 'mae': 2.5109999999999983,
 'rmse': 2.7835837835765496}
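To illustrate what k-fold cross-validation does, here is a minimal sketch in plain scikit-learn. It is not jaqpotpy's implementation: the synthetic data, the shuffling, and the pooling of out-of-fold predictions into a single set of metrics are all assumptions made for this illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import KFold, cross_val_predict

# Synthetic stand-in data (hypothetical): 10 samples, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
y = 85.0 + 4.0 * X[:, 0] + rng.normal(scale=0.5, size=10)

# With 10 samples and 5 splits, every validation fold holds 2 samples.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
fold_sizes = [len(val_idx) for _, val_idx in cv.split(X)]

# Each sample is predicted by a model that did not see it during fitting.
model = RandomForestRegressor(n_estimators=50, random_state=42)
oof_pred = cross_val_predict(model, X, y, cv=cv)

# Pool the out-of-fold predictions and score them once.
cv_metrics = {
    "r2": r2_score(y, oof_pred),
    "mae": mean_absolute_error(y, oof_pred),
    "rmse": float(np.sqrt(np.mean((y - oof_pred) ** 2))),
}
print(fold_sizes)
print(cv_metrics)
```

With only ten molecules, each fold is validated on two samples, so cross-validated scores on a dataset this small should be read as rough indications rather than precise estimates.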

We define a test dataset for external evaluation and prepare it using `JaqpotTabularDataset`.

In [9]:
# Define test data for external evaluation
X_test = pd.DataFrame(
    {
        "smiles": ["CCCOC", "CO"],
        "temperature": [27.0, 22.0],
        "activity": [89.0, 86.0],
    }
)

# Prepare the test dataset with Jaqpotpy
test_dataset = JaqpotTabularDataset(
    df=X_test,
    smiles_cols=["smiles"],
    x_cols=["temperature"],
    y_cols=["activity"],
    task="REGRESSION",
    featurizers=featurizer,
)

We evaluate the model on the test dataset to assess its performance on unseen data, then print its predictions for the two test molecules.

In [10]:
# Evaluate the model on the test dataset
jaqpot_model.evaluate(test_dataset)

predictions = jaqpot_model.predict(test_dataset)
print(predictions)

[89.48 82.01]
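As a worked example, the standard regression metrics can be computed by hand from the two test activities and the two predictions printed above. This sketch assumes the usual definitions of MAE, RMSE, and R² and uses the rounded values as printed, so the results may differ slightly from what `evaluate` reports.

```python
import math

y_true = [89.0, 86.0]    # "activity" values from the test DataFrame
y_pred = [89.48, 82.01]  # predictions printed above

n = len(y_true)
residuals = [t - p for t, p in zip(y_true, y_pred)]

# Mean absolute error and root mean squared error.
mae = sum(abs(r) for r in residuals) / n
rmse = math.sqrt(sum(r * r for r in residuals) / n)

# R^2 compares the squared error to that of predicting the mean of y_true.
mean_true = sum(y_true) / n
ss_res = sum(r * r for r in residuals)
ss_tot = sum((t - mean_true) ** 2 for t in y_true)
r2 = 1 - ss_res / ss_tot

print(f"mae={mae:.3f} rmse={rmse:.3f} r2={r2:.3f}")
```

Here R² comes out negative (about -2.589), which means the predictions fit these two points worse than simply predicting their mean of 87.5. With only two test samples the metric is very unstable, so a single missed prediction dominates the score.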


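Finally, we conduct a randomization test (also known as y-scrambling). The target values are shuffled so that any real relationship with the features is broken, and the model is refit and re-evaluated over several iterations. If the test-set metrics stay good even with shuffled targets, the model is likely fitting chance correlations rather than real structure in the data.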
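The idea behind such a test can be sketched with plain scikit-learn on synthetic data. This is not jaqpotpy's implementation: the data, the helper `fit_and_score`, and the choice of test-set R² as the score are assumptions made for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

# Synthetic stand-in data (hypothetical): the target depends on feature 0.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 3))
y_train = 85.0 + 4.0 * X_train[:, 0] + rng.normal(scale=0.5, size=40)
X_test = rng.normal(size=(20, 3))
y_test = 85.0 + 4.0 * X_test[:, 0] + rng.normal(scale=0.5, size=20)


def fit_and_score(y_fit):
    """Fit on (X_train, y_fit) and return R^2 on the untouched test set."""
    model = RandomForestRegressor(n_estimators=50, random_state=42)
    model.fit(X_train, y_fit)
    return r2_score(y_test, model.predict(X_test))


# Reference score with the true targets.
true_r2 = fit_and_score(y_train)

# Scores after shuffling the training targets, repeated 10 times.
shuffled_r2 = [fit_and_score(rng.permutation(y_train)) for _ in range(10)]

print(f"true targets:     r2={true_r2:.3f}")
print(f"shuffled targets: max r2={max(shuffled_r2):.3f}")
```

A clear gap between the score with true targets and the scores with shuffled targets suggests the model has learned a real relationship.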

In [11]:
# Run a randomization test with 10 iterations
jaqpot_model.randomization_test(
    train_dataset=train_dataset,
    test_dataset=test_dataset,
    n_iters=10,
)

{'iteration_0': {'Train': {'r2': 0.7854517150873384,
   'mae': 1.8369999999999975,
   'rmse': 2.434411633228858},
  'Test': {'r2': -2.113644444444445,
   'mae': 2.6400000000000006,
   'rmse': 2.646828290615015}},
 'iteration_1': {'Train': {'r2': 0.7746349895918184,
   'mae': 1.9780000000000002,
   'rmse': 2.4950240479802988},
  'Test': {'r2': -2.537344444444437,
   'mae': 2.799999999999997,
   'rmse': 2.821174400847984}},
 'iteration_2': {'Train': {'r2': 0.8917207892116936,
   'mae': 1.521499999999999,
   'rmse': 1.7294341560175093},
  'Test': {'r2': -2.381138888888881,
   'mae': 2.052499999999995,
   'rmse': 2.758181012914124}},
 'iteration_3': {'Train': {'r2': 0.8332226445832202,
   'mae': 1.649499999999999,
   'rmse': 2.1463474788579786},
  'Test': {'r2': 0.3735444444444471,
   'mae': 1.1749999999999972,
   'rmse': 1.187234180774793}},
 'iteration_4': {'Train': {'r2': 0.742278124717169,
   'mae': 2.1724999999999994,
   'rmse': 2.668130900087175},
  'Test': {'r2': -10.765694444444456