<a href="https://colab.research.google.com/github/ntua-unit-of-control-and-informatics/jaqpot-google-collab-examples/blob/main/Scikit-learn-models/create-a-model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Create a Model

This example demonstrates how to create a `jaqpotpy` model from a scikit-learn estimator. The code below walks through generating a synthetic dataset, training a logistic regression model, and making predictions.

First, we import the necessary libraries:

In [2]:
import pandas as pd
from sklearn.datasets import make_classification
from jaqpotpy.datasets import JaqpotpyDataset
from sklearn.linear_model import LogisticRegression
from jaqpotpy.models import SklearnModel

Next, we generate a small binary classification dataset:

In [3]:
X, y = make_classification(n_samples=100, n_features=4, random_state=42)

We then create a DataFrame with the features and target:

In [4]:
df = pd.DataFrame(X, columns=["X1", "X2", "X3", "X4"])
df["y"] = y
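Before handing the DataFrame to `jaqpotpy`, it can help to confirm its shape and class balance. The following is a minimal sketch using only pandas and scikit-learn; it rebuilds `df` so that it runs on its own, and `class_counts` is an illustrative name that is not part of the original notebook.

```python
import pandas as pd
from sklearn.datasets import make_classification

# Rebuild the same synthetic data so the snippet is self-contained.
X, y = make_classification(n_samples=100, n_features=4, random_state=42)
df = pd.DataFrame(X, columns=["X1", "X2", "X3", "X4"])
df["y"] = y

# 100 rows; 4 feature columns plus the target column.
print(df.shape)

# Number of samples per class label (illustrative variable, not from the notebook).
class_counts = df["y"].value_counts().sort_index()
print(class_counts)
```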

Now, we initialize a `JaqpotpyDataset` with the DataFrame:

In [5]:
dataset = JaqpotpyDataset(
    df=df,
    x_cols=["X1", "X2", "X3", "X4"],
    y_cols=["y"],
    task="binary_classification",
)

We wrap the scikit-learn model with Jaqpotpy's `SklearnModel`:

In [6]:
jaqpot_model = SklearnModel(dataset=dataset, model=LogisticRegression())

Next, we fit the model to the dataset:

In [7]:
jaqpot_model.fit()

Goodness-of-fit metrics on training set:
{'accuracy': 0.99, 'balancedAccuracy': 0.99, 'precision': array([1.  , 0.98]), 'recall': array([0.98039216, 1.        ]), 'f1Score': array([0.99009901, 0.98989899]), 'jaccard': array([0.98039216, 0.98      ]), 'matthewsCorrCoef': 0.9801960588196069, 'confusionMatrix': array([[[49,  1],
        [ 0, 50]],

       [[50,  0],
        [ 1, 49]]])}
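For comparison, the same estimator can be fitted and scored with scikit-learn alone. This sketch assumes that `SklearnModel.fit()` ultimately trains the wrapped estimator on the dataset's feature and target columns; that is an assumption about `jaqpotpy`, so the numbers need not match the output above exactly.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, matthews_corrcoef

# Same synthetic data as in the notebook.
X, y = make_classification(n_samples=100, n_features=4, random_state=42)

# Fit the bare scikit-learn estimator (no jaqpotpy wrapper involved).
clf = LogisticRegression()
clf.fit(X, y)

# Score on the training data itself, mirroring the goodness-of-fit report.
train_pred = clf.predict(X)
train_accuracy = accuracy_score(y, train_pred)
train_mcc = matthews_corrcoef(y, train_pred)
print({"accuracy": train_accuracy, "matthewsCorrCoef": train_mcc})
```

Because these metrics are computed on the training data, they describe goodness of fit rather than performance on unseen samples.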


We generate a small prediction dataset:

In [8]:
X_test, _ = make_classification(n_samples=5, n_features=4, random_state=42)
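Note that this second `make_classification` call generates a separate synthetic sample; it is not a held-out split of the 100 training rows. If the goal is to check the model on unseen rows from the same dataset, one alternative (not what this notebook does) is scikit-learn's `train_test_split`. A minimal sketch, with the variable names and the 5-row hold-out size chosen here purely for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=100, n_features=4, random_state=42)

# Hold out 5 rows for prediction; an integer test_size is an absolute row count.
# stratify=y keeps the class proportions similar in both parts.
X_train, X_holdout, y_train, y_holdout = train_test_split(
    X, y, test_size=5, random_state=42, stratify=y
)
print(X_train.shape, X_holdout.shape)  # (95, 4) (5, 4)
```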

We create a DataFrame with the features:

In [9]:
df_test = pd.DataFrame(X_test, columns=["X1", "X2", "X3", "X4"])

We initialize a `JaqpotpyDataset` for prediction:

In [10]:
test_dataset = JaqpotpyDataset(
    df=df_test,
    x_cols=["X1", "X2", "X3", "X4"],
    y_cols=None,
    task="binary_classification",
)

Finally, we use the trained model to predict the classes of the new data and estimate their classification probabilities, then print the predicted classes:

In [11]:
predictions = jaqpot_model.predict(test_dataset)
probabilities = jaqpot_model.predict_proba(test_dataset)
print(predictions)

[0 1 1 0 1]
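The `probabilities` variable is computed above but never displayed. As an illustration of how class probabilities relate to predicted classes, here is a plain scikit-learn sketch. It assumes that `jaqpot_model.predict_proba` returns values analogous to `LogisticRegression.predict_proba`, which the notebook output does not confirm; `X_new`, `proba`, and `labels` are names chosen for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, n_features=4, random_state=42)
X_new, _ = make_classification(n_samples=5, n_features=4, random_state=42)

clf = LogisticRegression().fit(X, y)

proba = clf.predict_proba(X_new)  # shape (5, 2): one column per class
labels = clf.predict(X_new)       # shape (5,): predicted class per row

# Each row of proba sums to 1, and the predicted label is the class
# whose column holds the larger probability.
print(proba.round(3))
print(labels)
print(np.array_equal(labels, clf.classes_[proba.argmax(axis=1)]))
```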


This notebook covers the entire process, from dataset creation to model training and prediction, using `jaqpotpy` and scikit-learn's `LogisticRegression`.