## How to serialize my model

Giza_mlutils provides utilities for preparing a model so that it can be transpiled and, therefore, used to generate proofs of its inferences.
This notebook covers the serialization process: saving your model in a format that other Giza tools can interpret.

Currently, two model families are supported, XGBoost and LightGBM, each for both classification and regression. Training with the scikit-learn API is preferred.

Let's give a very simple example of how to perform this serialization.

### Train your model

We will train the four model types supported by the package: LightGBM for classification and regression, and XGBoost for classification and regression.
The datasets are toy datasets from scikit-learn: load_diabetes for regression and load_breast_cancer for classification.

In [2]:
# This example needs both xgboost and lightgbm, although giza_mlutils itself does not require every package to be installed.
# This cell installs them so that the notebook runs correctly.

!pip install xgboost
!pip install lightgbm

Collecting xgboost
  Obtaining dependency information for xgboost from https://files.pythonhosted.org/packages/03/e6/4aef6799badc2693548559bad5b56d56cfe89eada337c815fdfe92175250/xgboost-2.0.3-py3-none-macosx_12_0_arm64.whl.metadata
  Using cached xgboost-2.0.3-py3-none-macosx_12_0_arm64.whl.metadata (2.0 kB)
Using cached xgboost-2.0.3-py3-none-macosx_12_0_arm64.whl (1.9 MB)
Installing collected packages: xgboost
Successfully installed xgboost-2.0.3

Collecting lightgbm
  Using cached lightgbm-4.3.0-py3-none-macosx_14_0_arm64.whl
Installing collected packages: lightgbm
Successfully installed lightgbm-4.3.0

In [1]:
from sklearn.datasets import load_diabetes, load_breast_cancer
from sklearn.model_selection import train_test_split
import xgboost as xgb
import lightgbm as lgbm

In [2]:
data = load_diabetes()
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

n_estimators = 30
max_depth = 4

xgb_reg = xgb.XGBRegressor(n_estimators=n_estimators, max_depth=max_depth)
xgb_reg.fit(X_train, y_train)

lgbm_reg = lgbm.LGBMRegressor(n_estimators=n_estimators, max_depth=max_depth)
lgbm_reg.fit(X_train, y_train)

[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000295 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 595
[LightGBM] [Info] Number of data points in the train set: 353, number of used features: 10
[LightGBM] [Info] Start training from score 153.736544


In [3]:
data = load_breast_cancer()
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

n_estimators = 100
max_depth = 3

xgb_clf = xgb.XGBClassifier(n_estimators=n_estimators, max_depth=max_depth)
xgb_clf.fit(X_train, y_train)

lgbm_clf = lgbm.LGBMClassifier(n_estimators=n_estimators, max_depth=max_depth)
lgbm_clf.fit(X_train, y_train)

[LightGBM] [Info] Number of positive: 286, number of negative: 169
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000442 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 4548
[LightGBM] [Info] Number of data points in the train set: 455, number of used features: 30
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.628571 -> initscore=0.526093
[LightGBM] [Info] Start training from score 0.526093


### Serialize it

Once our models are trained, all we need to know is:

- The path where we want to save the model.
- The name we want to give to the model. The name must end in .json.
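
As a minimal sketch of these two inputs, the snippet below joins an output directory with a file name and rejects names that do not end in `.json`. The helper `build_model_path` and the `models` directory are hypothetical and not part of giza_mlutils; `serialize_model` may well perform its own checks.

```python
from pathlib import Path


def build_model_path(output_dir: str, model_name: str) -> Path:
    """Hypothetical helper: join directory and file name, enforcing the .json suffix."""
    if not model_name.endswith(".json"):
        raise ValueError(f"Model name must end in .json, got {model_name!r}")
    return Path(output_dir) / model_name


# A valid name yields the full target path.
target = build_model_path("models", "xgb_reg.json")
print(target.name)

# An invalid name is rejected before any serialization is attempted.
try:
    build_model_path("models", "xgb_reg.pkl")
except ValueError as err:
    print(err)
```

Checking the name up front gives a clear error message before any file is written.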

Note that you do not need to specify the type of model you trained. The serializer detects which model it is and applies the necessary transformations automatically.

In [4]:
from giza_mlutils.serializer.serialize import serialize_model

serialize_model(xgb_reg, "YOUR_PATH", "xgb_reg.json")
serialize_model(lgbm_reg, "YOUR_PATH", "lgbm_reg.json")
serialize_model(xgb_clf, "YOUR_PATH", "xgb_clf.json")
serialize_model(lgbm_clf, "YOUR_PATH", "lgbm_clf.json")
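
After serializing, it can be useful to confirm that each output file exists and parses as JSON. The sketch below uses only the Python standard library. The internal structure of the serialized files is not described here, so the check is limited to "is this valid JSON", and the helper `is_valid_json_file` is hypothetical. The demonstration writes a stand-in file to a temporary directory; in practice you would point it at the path and names passed to `serialize_model`.

```python
import json
import tempfile
from pathlib import Path


def is_valid_json_file(path: Path) -> bool:
    """Return True if the file exists and its content parses as JSON."""
    try:
        with open(path, "r", encoding="utf-8") as fh:
            json.load(fh)
        return True
    except (OSError, ValueError):
        # OSError: missing or unreadable file.
        # ValueError: invalid JSON (json.JSONDecodeError) or bad encoding.
        return False


# Demonstration with a stand-in file, not a real serialized model.
with tempfile.TemporaryDirectory() as tmp_dir:
    stand_in = Path(tmp_dir) / "xgb_reg.json"
    stand_in.write_text('{"placeholder": true}', encoding="utf-8")
    print(is_valid_json_file(stand_in))
    print(is_valid_json_file(Path(tmp_dir) / "missing.json"))
```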

That's it! Our models are now saved in the correct format for use with the rest of the Giza stack. But not so fast...
In this example, the models are very simple (few trees and shallow depth). For other problems, the optimal architecture might be much more complex and incompatible with our current technology. In that case, we first need another feature offered by Giza_mlutils: the model_complexity_reducer.
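
To get a rough feel for what "few trees and shallow depth" means, one can bound the size of an ensemble from its number of trees and maximum depth. The sketch below is an assumption-laden estimate, not a Giza tool: `max_ensemble_size` is a hypothetical helper, it assumes binary trees with one tree per boosting round (as in regression and binary classification), and it gives only an upper bound, since trained trees are often smaller. The limits that the Giza stack actually accepts are not stated here.

```python
def max_ensemble_size(n_estimators: int, max_depth: int) -> dict:
    """Upper bound on leaves and nodes for binary trees of limited depth.

    Assumes one tree per boosting round (regression or binary classification).
    """
    leaves_per_tree = 2 ** max_depth            # a full binary tree of this depth
    nodes_per_tree = 2 ** (max_depth + 1) - 1   # internal nodes plus leaves
    return {
        "trees": n_estimators,
        "max_leaves": n_estimators * leaves_per_tree,
        "max_nodes": n_estimators * nodes_per_tree,
    }


# Hyperparameters used for the regression models in this notebook.
print(max_ensemble_size(n_estimators=30, max_depth=4))

# Hyperparameters used for the classification models in this notebook.
print(max_ensemble_size(n_estimators=100, max_depth=3))
```

Because the bound grows exponentially with depth, raising `max_depth` tends to inflate model size much faster than adding trees.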

To learn how the model_complexity_reducer (mcr) works, see the notebook reduce_model_complexity.ipynb in this same folder. It explains the reducer's operation in detail and shows how to run it before serializing your model.