# Predict US stocks closing movements

![](images/us.jpg)

**Description**: 
This dataset is extracted from the following Kaggle competition: https://www.kaggle.com/competitions/optiver-trading-at-the-close/overview.

Kaggle competition brief:
This dataset contains historic data for the daily ten-minute closing auction on the NASDAQ stock exchange. Your challenge is to predict the future price movements of stocks relative to the future price movement of a synthetic index composed of NASDAQ-listed stocks.

This is a forecasting competition using the time series API. The private leaderboard will be determined using real market data gathered after the submission period closes.
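
To make the "movement relative to a synthetic index" in the brief concrete, here is a minimal sketch of how such a target can be computed. It assumes the definition given on the competition's data page as I understand it: the 60-second change in the stock's weighted average price (WAP) minus the 60-second change in the index WAP, expressed in basis points. The function name and the example prices are hypothetical.

```python
def relative_move_bps(stock_wap_now, stock_wap_future, index_wap_now, index_wap_future):
    """Stock move minus index move over the same horizon, in basis points."""
    stock_ratio = stock_wap_future / stock_wap_now
    index_ratio = index_wap_future / index_wap_now
    return (stock_ratio - index_ratio) * 10_000


# Hypothetical prices: the stock gains 0.10 % while the index gains 0.04 %,
# so the stock outperforms the index by roughly 6 basis points.
example_target = relative_move_bps(1.0000, 1.0010, 1.0000, 1.0004)
print(round(example_target, 2))
```

A positive value means the stock outperformed the index over the horizon; a negative value means it underperformed.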

The objective of this project is to build a proof of concept that combines MLflow (with manual logging, without autolog) and Optuna, and then retrieves the best model from MLflow.

[![](https://img.shields.io/badge/Python-white?logo=Python)](#) [![](https://img.shields.io/badge/sklearn-white?logo=scikit-learn)](#) [![](https://img.shields.io/badge/Google-white?logo=mlflow)](#) [![](https://img.shields.io/badge/Optuna-white?logo=)](#) [![](https://img.shields.io/badge/Dagshub-white?logo=)](#)


Skills developed: Model optimization 

# 0. Imports

In [211]:
!pip install pandas
!pip install seaborn
!pip install mlflow
!pip install scikit-learn
!pip install optuna
!pip install dagshub
!pip install icecream
!pip install plotly
!pip install session_info
# pickle ships with the Python standard library, so it needs no install.


In [212]:
# EDA
import pandas as pd
import seaborn as sns
import plotly.express as px

# Model
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Optimisation & tracking
import mlflow
from mlflow.models import infer_signature
import optuna
import dagshub
import mlflow.pyfunc
import mlflow.sklearn
from mlflow import MlflowClient
from mlflow.entities import ViewType
from mlflow.data.pandas_dataset import PandasDataset

# Debugging
from icecream import ic
import pickle
import session_info

In [213]:
dagshub.init("US_stocks_prediction_project", "petoulemonde", mlflow=True)

ic.disable()

# 1. Load data

In [214]:
train_df = pd.read_csv("/kaggle/input/optiver-trading-at-the-close/train.csv")

print('The shape of the train data:', train_df.shape)

The shape of the train data: (5237980, 17)


# 2. EDA

In [215]:
print(train_df.info())
print("----")
print(train_df.describe().T)
print("----")
print(train_df.head())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5237980 entries, 0 to 5237979
Data columns (total 17 columns):
 #   Column                   Dtype  
---  ------                   -----  
 0   stock_id                 int64  
 1   date_id                  int64  
 2   seconds_in_bucket        int64  
 3   imbalance_size           float64
 4   imbalance_buy_sell_flag  int64  
 5   reference_price          float64
 6   matched_size             float64
 7   far_price                float64
 8   near_price               float64
 9   bid_price                float64
 10  bid_size                 float64
 11  ask_price                float64
 12  ask_size                 float64
 13  wap                      float64
 14  target                   float64
 15  time_id                  int64  
 16  row_id                   object 
dtypes: float64(11), int64(5), object(1)
memory usage: 679.4+ MB
None
----
                             count          mean           std          min  \
stock_id   

In [216]:
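# Drop rows with missing values, then the identifier column that carries no signal.
train_df = train_df.dropna().drop("row_id", axis = 1)

def summary(df):
    """Per-column dtype, missing count, missing percentage, unique count and non-null count."""
    summary_df = pd.DataFrame(df.dtypes, columns=['dtypes'])  # named to avoid shadowing built-in sum()
    summary_df['missing#'] = df.isna().sum()
    summary_df['missing%'] = df.isna().mean() * 100
    summary_df['uniques'] = df.nunique().values
    summary_df['count'] = df.count().values
    #summary_df['skew'] = df.skew().values
    return summary_df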

print(summary(train_df))

                          dtypes  missing#  missing%  uniques    count
stock_id                   int64         0       0.0      200  2343638
date_id                    int64         0       0.0      481  2343638
seconds_in_bucket          int64         0       0.0       25  2343638
imbalance_size           float64         0       0.0  1194405  2343638
imbalance_buy_sell_flag    int64         0       0.0        3  2343638
reference_price          float64         0       0.0    27174  2343638
matched_size             float64         0       0.0  1303496  2343638
far_price                float64         0       0.0    95739  2343638
near_price               float64         0       0.0    76175  2343638
bid_price                float64         0       0.0    26816  2343638
bid_size                 float64         0       0.0  1641319  2343638
ask_price                float64         0       0.0    26706  2343638
ask_size                 float64         0       0.0  1657589  2343638
wap   

In [217]:
# print(train_df.hist())

In [218]:
# print(sns.pairplot(train_df, hue = "target", corner = True))

In [219]:
# sns.heatmap(train_df[numerical_columns].corr(), annot = True, cmap='Blues', fmt='.2f')

# 3. Preprocessing & Feature engineering

In [220]:
X_train, X_val, y_train, y_val = train_test_split(train_df.drop("target", axis = 1).head(n = 500),
                                                   train_df["target"].head(n = 500), 
                                                   test_size = 0.3, 
                                                   random_state = 42)
ic(X_train.head())
ic(X_val.head())
ic(y_train.head())
ic(y_val.head())

6099    1.440048
5805    1.000166
6112   -5.440116
5889   -2.819896
5838   -4.510283
Name: target, dtype: float64

# 4. Models

In [221]:
tags = {"team": "Pierre-Etienne",
        "dataset": "US_stocks"}
ic("Initiate cell")

# Join features and target column-wise so MLflow tracks a single training table.
train_mlflow: PandasDataset = mlflow.data.from_pandas(pd.concat([X_train, y_train], axis = 1))

# Persist the training split as CSV files.
X_train.to_csv("X_training_dataset.csv")
y_train.to_csv("y_training_dataset.csv")

def objective(trial):
    # Optuna picks the model family and its hyperparameters for this trial.
    classifier_name = trial.suggest_categorical('classifier', ['GradientBoostingRegressor', 'RandomForestRegressor'])
    n_estimators = trial.suggest_int('n_estimators', 50, 1000)
    params = {"n_estimators": n_estimators, "model": classifier_name}

    if classifier_name == 'GradientBoostingRegressor':
        subsample = trial.suggest_float('subsample', 0.1, 0.9)
        params["subsample"] = subsample
        classifier_obj = GradientBoostingRegressor(
            n_estimators = n_estimators,
            subsample = subsample)
        registered_model_name = "sk-learn-gradient-boosting-reg-model"
    else:
        classifier_obj = RandomForestRegressor(n_estimators = n_estimators)
        registered_model_name = "sk-learn-random-forest-reg-model"

    # Fit and score outside the tracking try-block so `score` always exists at return time.
    predictions = classifier_obj.fit(X_train, y_train).predict(X_val)
    score = mean_squared_error(y_val, predictions)  # mean squared error, not its square root

    # The context manager closes the run even if a logging call raises.
    with mlflow.start_run():
        try:
            mlflow.log_params(params)
            mlflow.log_metric('mse', score)
            mlflow.log_input(train_mlflow, context = "training")
            mlflow.set_tags(tags)

            mlflow.sklearn.log_model(
                sk_model = classifier_obj,
                artifact_path = "sklearn-model",
                signature = infer_signature(X_val, predictions),
                registered_model_name = registered_model_name,
                input_example = X_train.iloc[[0]],  # features only, without the target column
            )

        except Exception as e:
            print("Error in mlflow tracking")
            print(e)

    print("\n-----------\n")

    return score

study = optuna.create_study(study_name = "US_stocks_accuracy")
study.optimize(objective, n_trials = 20)

[I 2023-11-26 18:38:45,365] A new study created in memory with name: US_stocks_accuracy

Hint: Inferred schema contains integer column(s). Integer columns in Python cannot represent missing values. If your input data contains missing values at inference time, it will be encoded as floats and will cause a schema enforcement error. The best way to avoid this problem is to infer the model schema based on a realistic data sample (training dataset) that includes missing values. Alternatively, you can declare integer columns as doubles (float64) whenever these columns may have missing values. See `Handling Integers With Missing Values <https://www.mlflow.org/docs/latest/models.html#handling-integers-with-missing-values>`_ for more details.


Distutils was imported before Setuptools, but importing Setuptools also replaces the `distutils` module in `sys.modules`. This may lead to undesirable behaviors or errors. To avoid these issues, avoid using distutils directly, ensure that setuptools is i


-----------





# 5. Optuna results

In [222]:
print(study.best_trial.params)

print("Best MSE obtained : ", study.best_trial.value)

{'classifier': 'RandomForestRegressor', 'n_estimators': 129}
Best MSE obtained :  44.435061677682725
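
The score returned by the objective comes from `mean_squared_error`, which by default yields a mean squared error in squared target units. If an RMSE in the target's own units is wanted, it is the square root of that value. A minimal sketch using the value printed above:

```python
import math

# Best objective value reported by the study above (a mean squared error).
mse = 44.435061677682725

# RMSE is the square root of the MSE, expressed in the same units as the target.
rmse = math.sqrt(mse)
print(f"MSE = {mse:.3f}, RMSE = {rmse:.3f}")
```

Because the square root is monotonic, minimizing either quantity selects the same best trial; only the reported scale changes.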


In [224]:
fig = optuna.visualization.plot_optimization_history(study)
fig.show()

In [225]:
fig = optuna.visualization.plot_param_importances(study)
fig.show()

# 6. Conclusion

In [226]:
logged_model = 'runs:/fb9d6131cbd54adca3a90b3b3f174e1e/sklearn-model' # Run ID of the best run, identified in the MLflow UI

# Load model as a PyFuncModel.
loaded_model = mlflow.pyfunc.load_model(logged_model)

# Predict on a Pandas DataFrame.
predictions = loaded_model.predict(X_val)
print("Best MSE obtained from all trials : ", mean_squared_error(y_val, predictions))

Best MSE obtained from all trials :  44.48718451296336
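
The run ID in the cell above was copied by hand from the MLflow UI. As a possible alternative, the best run could be picked programmatically. The sketch below only illustrates the selection step: it assumes run objects shaped like those returned by `mlflow.search_runs(output_format="list")`, each exposing `info.run_id` and `data.metrics`. The helper `best_model_uri`, the stand-in runs, and the metric name `val_error` are all hypothetical.

```python
from types import SimpleNamespace


def best_model_uri(runs, metric_name, artifact_path="sklearn-model"):
    """Build the runs:/ URI of the run with the lowest logged value of metric_name."""
    # Ignore runs that never logged the metric (for example, a failed trial).
    scored = [run for run in runs if metric_name in run.data.metrics]
    if not scored:
        raise ValueError(f"no run has a metric named {metric_name!r}")
    best = min(scored, key=lambda run: run.data.metrics[metric_name])
    return f"runs:/{best.info.run_id}/{artifact_path}"


def fake_run(run_id, metrics):
    """Hypothetical stand-in exposing the two attributes used above."""
    return SimpleNamespace(info=SimpleNamespace(run_id=run_id),
                           data=SimpleNamespace(metrics=metrics))


demo_runs = [
    fake_run("aaa", {"val_error": 51.2}),
    fake_run("bbb", {"val_error": 44.4}),
    fake_run("ccc", {}),  # a run without the metric is skipped
]
best_uri = best_model_uri(demo_runs, "val_error")
print(best_uri)
```

Against a live tracking server, the same helper could be fed with `mlflow.search_runs(output_format="list")` and its result passed to `mlflow.pyfunc.load_model`, with `metric_name` set to whatever name was passed to `mlflow.log_metric` in the objective.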


In [227]:
with open("my_object.pickle", "wb") as f:
    pickle.dump(loaded_model, f)

In [228]:
with open("my_object.pickle", "rb") as f:
    loaded_object = pickle.load(f)

# Predict with the unpickled model to check that the round trip preserved it.
predictions = loaded_object.predict(X_val)
print(mean_squared_error(y_val, predictions))

print("The best model is saved !")

px.scatter(
           x = y_val,
           y = predictions,
           labels = {
               "x" : "Target variable of validation dataset (x)",
               "y" : "Predictions from best model (y)"},
           title = "Comparison between real values (x) and predicted values (y)")

44.48718451296336
The best model is saved !


# 7. Session info