# Documentation for the training script for product substitution recommendations during stockouts

## 1. Purpose of the script

This script:
- Builds a product substitution dataset (transactions, products, substitutions).
- Performs feature engineering to derive variables useful for predicting whether a substitution is accepted.
- Prepares the data with a scikit-learn pipeline for scaling and encoding.
- Tests LGBMClassifier (binary classification) and LGBMRanker (ranking) models, using Hyperopt for hyperparameter optimization.
- Automatically logs all results, metrics, and models to MLflow.
- Implements early stopping and pruning to stop underperforming models quickly.

In [3]:
print("""
Substitute Product Recommendation Pipeline
==========================================

1️⃣ Data loading
   - substitutions.csv
   - produits.csv
   - transactions.csv

2️⃣ Dataset construction + feature engineering
   [Transactions] --merge--> [Substitutions] --merge--> [Produits]
        |                                   |
        |                                   v
        |                             [Features Original / Subst]
        v
   [Binary label: estAcceptee_bin]
   [Additional features: DiffPrix, MemeMarque, MemeNutriscore, MemeBio, Month, Day_of_week_name, ...]

3️⃣ Preprocessing
   - Numeric: Imputer (median) + StandardScaler
   - Categorical: Imputer (most_frequent) + OneHotEncoder
   - Combined via ColumnTransformer

4️⃣ Train / val split (temporal)
   - X_train, y_train
   - X_val, y_val
   - group_train, group_val (for LGBMRanker)

5️⃣ Hyperopt + MLflow
   - Search space definition (num_leaves, learning_rate, n_estimators, etc.)
   - Objective:
       - LGBMClassifier: maximize AUC
       - LGBMRanker: maximize NDCG@3
   - Native LightGBM early stopping
   - MLflow logging:
       - params
       - metrics (AUC, NDCG, hit_rate, etc.)
       - model
   - Simple pruning:
       - classifier: abandon if auc < 0.55
       - ranker: abandon if ndcg@3 < 0.05

6️⃣ Result
   - Best hyperparameters
   - Best model saved for production
   - Automatic comparison via the MLflow UI

7️⃣ Production
   - Complete pipeline ready to serve substitute product recommendations
""")




#### A. Imports and environment setup

In [2]:
import pandas as pd
import numpy as np

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.metrics import (
    roc_auc_score, average_precision_score, log_loss,
    precision_score, recall_score, f1_score
)

import mlflow
import mlflow.lightgbm
from hyperopt import fmin, tpe, hp, Trials, STATUS_OK
from lightgbm import LGBMClassifier, LGBMRanker, early_stopping, log_evaluation

from pathlib import Path
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier
from catboost import CatBoostClassifier




#### B. Project root detection and relative paths

In [15]:
def find_project_root(marker=".git"):
    path = Path().resolve()
    while path != path.parent:
        if (path / marker).exists():
            return path
        path = path.parent
    raise FileNotFoundError(f"Project root with {marker} not found")
ROOT_DIR = find_project_root()
DATA_RAW_DIR = ROOT_DIR / "data" / "raw"
MLFLOW_TRACKING_URI = "http://localhost:5555"
EXP_NAME = "stockout_substitution_hyperopt_classifier_ranker_6"


#### C. Data loading

In [4]:
# TODO: connect directly via the GCP utils
substitutions = pd.read_csv(DATA_RAW_DIR / "substitutions" / "raw_substitutions_substitutions.csv")
produits = pd.read_csv(DATA_RAW_DIR / "produits" / "raw_produits_produits.csv")
transactions = pd.read_csv(DATA_RAW_DIR / "transactions" / "raw_transactions_transactions.csv")

#### D. Dataset construction and feature engineering

Explanations:
- Merges retrieve all the information about the original product and its substitute.
- Relevant binary and numeric features are created: MemeMarque, MemeNutriscore, DiffPrix, etc.
- The month and day of the week are extracted from the date.
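
To make the suffix-and-merge step concrete, here is a minimal sketch on toy data. The product rows, prices, and brands are invented for illustration, and `with_suffix` is a hypothetical helper; only the column naming pattern (`prixOriginal`, `marqueSubstitution`, ...) follows the script.

```python
import pandas as pd

# Toy catalogue and one substitution pair (values are invented for illustration).
produits = pd.DataFrame({
    "idProduit": [1, 2],
    "marque": ["A", "A"],
    "prix": [2.0, 2.5],
})
substitutions = pd.DataFrame({"idProduitOriginal": [1], "idProduitSubstitution": [2]})

def with_suffix(df, suffix):
    # Rename every column, e.g. idProduit -> idProduitOriginal, prix -> prixOriginal.
    return df.rename(columns={c: c + suffix for c in df.columns})

# Attach the product attributes twice: once for the original, once for the substitute.
pairs = (
    substitutions
    .merge(with_suffix(produits, "Original"), on="idProduitOriginal", how="left")
    .merge(with_suffix(produits, "Substitution"), on="idProduitSubstitution", how="left")
)

# Pairwise features comparing the substitute with the original product.
pairs["MemeMarque"] = (pairs["marqueOriginal"] == pairs["marqueSubstitution"]).astype(int)
pairs["DiffPrix"] = pairs["prixSubstitution"] - pairs["prixOriginal"]
print(pairs[["MemeMarque", "DiffPrix"]])
```

Each row of `pairs` describes one (original, substitute) couple, which is the unit the models score.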

In [5]:
def merge_and_add_suffix(df_add_suffix, df_keep, suffix, column_to_merge):
    df_add = df_add_suffix.copy()
    df_add = df_add.rename(columns={c: c + suffix for c in df_add.columns if c != column_to_merge})
    return pd.merge(df_keep, df_add, left_on=column_to_merge, right_on=column_to_merge, how='left')

def build_dataset(transactions, substitutions, produits):
    subs_prod_orig = merge_and_add_suffix(produits, substitutions, 'Original', 'idProduitOriginal')
    subs_prod_orig_subst = merge_and_add_suffix(produits, subs_prod_orig, 'Substitution', 'idProduitSubstitution')
    df = pd.merge(transactions, subs_prod_orig_subst,
                  left_on=['idProduit','idTransaction'],
                  right_on=['idProduitOriginal','idTransaction'], how='inner')
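    # Assuming estAcceptee is boolean, `~` negates it: estAcceptee_bin is 1 when the
    # substitution was NOT accepted. Drop the `~` if 1 should mean "accepted".
    df['estAcceptee_bin'] = (~df['estAcceptee']).astype(int)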
    df['date'] = pd.to_datetime(df['dateHeureTransaction'])
    df['Month'] = df['date'].dt.month
    df['Day_of_week_name'] = df['date'].dt.day_name()
    df["MemeMarque"] = (df["marqueOriginal"] == df["marqueSubstitution"]).astype(int)
    df["MemeNutriscore"] = (df["nutriscoreOriginal"] == df["nutriscoreSubstitution"]).astype(int)
    df["MemeConditionnement"] = (df["conditionnementOriginal"] == df["conditionnementSubstitution"]).astype(int)
    df["MemeTypeMarque"] = (df["typeMarqueOriginal"] == df["typeMarqueSubstitution"]).astype(int)
    df["DiffPrix"] = df["prixSubstitution"] - df["prixOriginal"]
    df["MemeBio"] = ((df["estBioOriginal"] == True) & (df["estBioSubstitution"] == True)).astype(int)
    return df

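df = build_dataset(transactions, substitutions, produits)
# Make the chronological order explicit so that the positional 80/20 split is temporal
# (a stable sort is a no-op if the rows are already ordered by date).
df = df.sort_values("date", kind="stable").reset_index(drop=True)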


#### E. Preprocessing

Explanations:
- Numeric variables are imputed and standardized.
- Categorical variables are imputed and one-hot encoded.
- ColumnTransformer combines the two kinds of transformations.
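
The sketch below illustrates these three points on invented data: statistics are learned on the training rows only, and a category seen only at validation time is encoded as all zeros thanks to `handle_unknown="ignore"`. The values are made up, and `sparse_threshold=0` is set here only to force a dense, printable output (the script keeps the default).

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data (invented): one numeric and one categorical column, each with a missing value.
train = pd.DataFrame({
    "DiffPrix": [0.0, 1.0, np.nan, 2.0],
    "categorieOriginal": ["lait", "pain", "lait", np.nan],
})
val = pd.DataFrame({"DiffPrix": [1.0], "categorieOriginal": ["fromage"]})  # unseen category

numeric = Pipeline([("imputer", SimpleImputer(strategy="median")),
                    ("scaler", StandardScaler())])
categorical = Pipeline([("imputer", SimpleImputer(strategy="most_frequent")),
                        ("onehot", OneHotEncoder(handle_unknown="ignore"))])
preprocessor = ColumnTransformer(
    [("num", numeric, ["DiffPrix"]), ("cat", categorical, ["categorieOriginal"])],
    sparse_threshold=0,  # illustration only: always return a dense array
)

X_train = preprocessor.fit_transform(train)  # medians, means, categories learned here
X_val = preprocessor.transform(val)          # reused unchanged on validation rows

print(X_train.shape, X_val.shape)
print(X_val)  # the two one-hot columns are 0 for the unseen category
```

Fitting on the training split alone avoids leaking validation statistics into the features.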

In [6]:
features_num = ["DiffPrix", "MemeMarque", "MemeNutriscore", "MemeBio",
                "prixOriginal", "MemeConditionnement", "MemeTypeMarque", "estBioOriginal", "Month"]
features_cat = ["categorieOriginal", "marqueOriginal", "typeMarqueOriginal", "nutriscoreOriginal",
                "origineOriginal", "conditionnementOriginal", "categorieSubstitution",
                "typeMarqueSubstitution", "origineSubstitution", "Day_of_week_name"]

X = df[features_num + features_cat]
y = df["estAcceptee_bin"]

# Temporal split: first 80% of rows for training, last 20% for validation
# (assumes df is in chronological order)
cutoff_idx = int(len(df) * 0.8)
X_train_raw, X_val_raw = X.iloc[:cutoff_idx], X.iloc[cutoff_idx:]
y_train, y_val = y.iloc[:cutoff_idx], y.iloc[cutoff_idx:]

# Pipeline
numeric_transformer = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler())
])
categorical_transformer = Pipeline([
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown='ignore'))
])
preprocessor = ColumnTransformer([
    ("num", numeric_transformer, features_num),
    ("cat", categorical_transformer, features_cat)
])

X_train = preprocessor.fit_transform(X_train_raw)
X_val = preprocessor.transform(X_val_raw)

#### F. Group handling for LGBMRanker

Explanations:
- LGBMRanker requires a vector giving the number of candidate products per stockout event.
- Each idTransaction represents one group.
In [7]:
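# sort=False keeps the group sizes in row order (groupby sorts by key by default),
# which is what LGBMRanker expects; rows of a transaction must be contiguous.
group_train = df.iloc[:cutoff_idx].groupby('idTransaction', sort=False).size().to_numpy()
group_val = df.iloc[cutoff_idx:].groupby('idTransaction', sort=False).size().to_numpy()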

#### G. Hyperopt search space (space_lgbm_ranker)

Explanations:
- Defines the hyperparameters that Hyperopt explores to optimize the LGBMRanker.
- num_leaves, n_estimators, and min_child_samples are integers, but hp.quniform returns floats (hence the later conversion).
- learning_rate is sampled on a logarithmic scale to cover both small and large learning rates efficiently.
- subsample and colsample_bytree control model variance and regularization.
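
To show what these distributions produce, the sketch below reproduces them with NumPy alone, following the definitions in the Hyperopt documentation (`hp.loguniform` draws `exp(uniform(low, high))`, `hp.quniform` draws `round(uniform(low, high) / q) * q`). The helper names `sample_loguniform` and `sample_quniform` are hypothetical and only serve this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_loguniform(low, high):
    # Uniform in log space, then exponentiate: intervals of equal log-width are equally likely.
    return float(np.exp(rng.uniform(np.log(low), np.log(high))))

def sample_quniform(low, high, q):
    # Uniform draw rounded to the nearest multiple of q; the result is still a float.
    return float(np.round(rng.uniform(low, high) / q) * q)

params = {
    "learning_rate": sample_loguniform(0.01, 0.2),
    "n_estimators": sample_quniform(300, 1200, 50),
    "num_leaves": sample_quniform(31, 127, 1),
}

# LightGBM expects integers for these parameters, hence the cast before fitting.
int_keys = ("n_estimators", "num_leaves")
params = {k: int(v) if k in int_keys else v for k, v in params.items()}
print(params)
```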

In [8]:
space_lgbm_ranker = {
    "num_leaves": hp.quniform("num_leaves", 31, 127, 1),
    "learning_rate": hp.loguniform("learning_rate", np.log(0.01), np.log(0.2)),
    "n_estimators": hp.quniform("n_estimators", 300, 1200, 50),
    "min_child_samples": hp.quniform("min_child_samples", 20, 100, 5),
    "subsample": hp.uniform("subsample", 0.7, 1.0),
    "colsample_bytree": hp.uniform("colsample_bytree", 0.7, 1.0),
}

#### H. Ranking metrics function (compute_ranking_metrics)

Explanations:
- group: vector giving the size of each transaction / event. For each group:
    - Sort the predictions by decreasing score.
    - Compute ndcg@k and hit_rate@k for k = 1, 3, 5.

ndcg@k (Normalized Discounted Cumulative Gain): measures ranking quality by weighting relevant items according to their position.
hit_rate@k: indicates whether at least one relevant product appears in the top k.

=> Returns a dictionary with each metric averaged over all groups.
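
As a worked example of these definitions: DCG@k sums `rel_i / log2(i + 1)` over the top k positions, and NDCG@k divides that sum by the DCG of the ideal ordering. The sketch below uses invented values and hypothetical helper names (`dcg_at_k`, `ndcg_at_k`); it is not the script's own function.

```python
import numpy as np

def dcg_at_k(rels, k):
    # Discounted cumulative gain over the first k relevances.
    rels = np.asarray(rels, dtype=float)[:k]
    return float(np.sum(rels / np.log2(np.arange(2, len(rels) + 2))))

def ndcg_at_k(y_true, y_score, k):
    order = np.argsort(-np.asarray(y_score))   # best score first
    ranked = np.asarray(y_true)[order]         # relevance in predicted order
    ideal = np.sort(y_true)[::-1]              # relevance in the best possible order
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(ranked, k) / idcg if idcg > 0 else 0.0

# One stockout event with three candidate substitutes (invented values);
# the accepted product is ranked second by the model.
y_true = [0, 1, 0]
y_score = [0.9, 0.5, 0.1]

print(ndcg_at_k(y_true, y_score, 1))  # 0.0: the top-ranked item is not relevant
print(ndcg_at_k(y_true, y_score, 3))  # 1 / log2(3), about 0.63
```

With binary labels and a single accepted product per event, ndcg@k reduces to `1 / log2(rank + 1)` when the accepted product sits within the top k, and 0 otherwise.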

In [9]:
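def compute_ranking_metrics(y_true, y_score, group, ks=(1, 3, 5)):
    """Average ndcg@k and hit_rate@k over consecutive groups of sizes `group`."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    per_k = {k: {"ndcg": [], "hit": []} for k in ks}
    idx = 0
    for g in group:
        y_g = y_true[idx: idx + g]
        s_g = y_score[idx: idx + g]
        idx += g
        rels_sorted = y_g[np.argsort(-s_g)]   # relevance in predicted order
        rels_ideal = np.sort(y_g)[::-1]       # relevance in ideal order
        for k in ks:
            rels_k = rels_sorted[:k]
            discounts = np.log2(np.arange(2, len(rels_k) + 2))
            dcg = np.sum(rels_k / discounts)
            idcg = np.sum(rels_ideal[:k] / discounts)
            # Normalize by the ideal DCG; a group with no relevant item scores 0.
            per_k[k]["ndcg"].append(dcg / idcg if idcg > 0 else 0.0)
            per_k[k]["hit"].append(float(np.any(rels_k > 0)))
    metrics = {}
    for k in ks:
        metrics[f"ndcg_at_{k}"] = float(np.mean(per_k[k]["ndcg"]))
        metrics[f"hit_rate_at_{k}"] = float(np.mean(per_k[k]["hit"]))

    return metrics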


In [18]:
def compute_classification_metrics(y_true, y_proba, threshold=0.5):
    y_pred = (y_proba >= threshold).astype(int)
    return {
        "auc": float(roc_auc_score(y_true, y_proba)) if len(np.unique(y_true)) > 1 else np.nan,
        "pr_auc": float(average_precision_score(y_true, y_proba)),
        "logloss": float(log_loss(y_true, y_proba)),
        "precision": float(precision_score(y_true, y_pred, zero_division=0)),
        "recall": float(recall_score(y_true, y_pred, zero_division=0)),
        "f1": float(f1_score(y_true, y_pred, zero_division=0)),
    }

#### Model dictionaries

In [10]:
models = {
    "LogReg": {"model_class": LogisticRegression, "param_grid": {"C": [0.1,1.0,10.0], "penalty":["l2"]}, "fixed_params": {"solver":"lbfgs","max_iter":2000,"n_jobs":-1,"random_state":42}, "type":"classification"},
    "XGBClassifier": {"model_class": XGBClassifier, "param_grid": {"n_estimators":[500,1000],"max_depth":[4,6,8],"learning_rate":[0.03,0.05,0.1],"subsample":[0.7,0.9,1.0],"colsample_bytree":[0.7,0.9,1.0],"min_child_weight":[1,5,10],"reg_alpha":[0.0,0.1,1.0],"reg_lambda":[1.0,2.0,5.0]}, "fixed_params":{"objective":"binary:logistic","eval_metric":"auc","tree_method":"hist","random_state":42,"n_jobs":-1}, "type":"classification"},
    "LGBMClassifier": {"model_class": LGBMClassifier, "param_grid":{"num_leaves":[31,63,127],"learning_rate":[0.03,0.05,0.1],"n_estimators":[500,1000],"min_child_samples":[20,50,100],"subsample":[0.7,0.9,1.0],"colsample_bytree":[0.7,0.9,1.0],"reg_alpha":[0.0,0.1,1.0],"reg_lambda":[0.0,0.1,1.0]}, "fixed_params":{"objective":"binary","metric":"auc","random_state":42,"n_jobs":-1}, "type":"classification"},
    "CatBoostClassifier": {"model_class": CatBoostClassifier, "param_grid":{"depth":[6,8,10],"learning_rate":[0.03,0.05,0.1],"iterations":[500,1000],"l2_leaf_reg":[1,3,5,9],"subsample":[0.7,0.9,1.0],"rsm":[0.7,0.9,1.0]}, "fixed_params":{"loss_function":"Logloss","eval_metric":"AUC","random_seed":42,"verbose":0}, "type":"classification"},
    "LGBMRanker": {"model_class": LGBMRanker, "param_grid":{"num_leaves":[31,63,127],"learning_rate":[0.03,0.05,0.1],"n_estimators":[500,1000],"min_child_samples":[20,50,100],"subsample":[0.7,0.9,1.0],"colsample_bytree":[0.7,0.9,1.0]}, "fixed_params":{"objective":"lambdarank","metric":"ndcg","random_state":42,"n_jobs":-1}, "type":"ranking"}
}

#### I. Hyperopt objective function (objective_ranker)

Explanations:
- **Hyperparameter conversion** to integers where needed (num_leaves, n_estimators, min_child_samples).
- **MLflow**: each call to the function opens a nested run to log parameters and metrics.
- LGBMRanker.fit:
        - group_train: gives the group sizes (transactions).
        - eval_set and eval_group for monitoring on the validation set.
        - callbacks: early stopping + silenced evaluation logging.

- **Metrics**: computed with compute_ranking_metrics.
- **Pruning**: the intended rule is to abandon a trial when ndcg@3 < 0.05 to save time; the function below does not apply this threshold yet.
- **The returned loss** is the negated ndcg@3: Hyperopt minimizes the loss, so this maximizes ndcg@3.
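
The sign convention and the pruning rule can be sketched as follows. This is an illustration under stated assumptions, not the script's code: `to_hyperopt_result` and `min_ndcg` are hypothetical names, and the two status strings mirror the values of Hyperopt's `STATUS_OK` and `STATUS_FAIL` constants so that the block runs without Hyperopt installed.

```python
# String values of hyperopt.STATUS_OK and hyperopt.STATUS_FAIL.
STATUS_OK = "ok"
STATUS_FAIL = "fail"

def to_hyperopt_result(ndcg_at_3, min_ndcg=0.05):
    """Turn a validation ndcg@3 into the dict Hyperopt expects from an objective."""
    if ndcg_at_3 < min_ndcg:
        # Pruned trial: reported as failed, so it is skipped when selecting the best trial.
        return {"status": STATUS_FAIL}
    # Hyperopt minimizes "loss", so a metric to maximize is returned negated.
    return {"loss": -ndcg_at_3, "status": STATUS_OK}

print(to_hyperopt_result(0.30))  # {'loss': -0.3, 'status': 'ok'}
print(to_hyperopt_result(0.01))  # {'status': 'fail'}
```

A check applied after training only saves the cost of logging the model. If every trial is marked as failed, `fmin` has no valid result to return, so the threshold should stay well below what a reasonable model achieves.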

In [11]:
def safe_log_metrics(metrics_dict):
    safe_metrics = {}
    for k, v in metrics_dict.items():
        try:
            safe_metrics[k] = float(v)
        except (TypeError, ValueError):
            safe_metrics[k] = np.nan
    mlflow.log_metrics(safe_metrics)


In [12]:
def objective_ranker(params, X_train, y_train, X_val, y_val, group_train, group_val, model_name):
    params = {k:int(v) if k in ["num_leaves","n_estimators","min_child_samples"] else v for k,v in params.items()}
    
    with mlflow.start_run(nested=True):
        mlflow.set_tag("model_name", model_name)
        mlflow.set_tag("model_type", "ranking")

        model = LGBMRanker(**params)
        model.fit(
            X_train, y_train,
            group=group_train,
            eval_set=[(X_val, y_val)],
            eval_group=[group_val],
            eval_metric="ndcg",
            callbacks=[early_stopping(stopping_rounds=50, verbose=False),
                       log_evaluation(period=0)]
        )
        best_iter = model.best_iteration_
        mlflow.log_metric("best_iteration", best_iter)

        scores = model.predict(X_val, num_iteration=best_iter)
        metrics = compute_ranking_metrics(y_val, scores, group_val)

        mlflow.log_params(params)
        safe_log_metrics(metrics)
        mlflow.lightgbm.log_model(model, "model")
        
        return {"loss": -metrics["ndcg_at_3"], "status": STATUS_OK}


In [13]:
def objective_classifier(params, model_class, X_train, y_train, X_val, y_val, model_name):
    params = {k:int(v) if isinstance(v,float) and v.is_integer() else v for k,v in params.items()}
    
    with mlflow.start_run(nested=True):
        mlflow.set_tag("model_name", model_name)
        mlflow.set_tag("model_type", "classification")

        model = model_class(**params)
        model.fit(X_train, y_train)

        y_proba = model.predict_proba(X_val)[:,1]
        metrics = compute_classification_metrics(y_val, y_proba)

        mlflow.log_params(params)
        safe_log_metrics(metrics)

        if isinstance(model,LGBMClassifier):
            mlflow.lightgbm.log_model(model,"model")

        return {"loss": -metrics["auc"], "status": STATUS_OK}

In [16]:
mlflow.set_tracking_uri(MLFLOW_TRACKING_URI)  # or localhost if reachable
try:
    mlflow.create_experiment(EXP_NAME)
except mlflow.exceptions.MlflowException:
    pass
mlflow.set_experiment(EXP_NAME)

<Experiment: artifact_location='mlflow-artifacts:/7', creation_time=1767777786934, experiment_id='7', last_update_time=1767777786934, lifecycle_stage='active', name='stockout_substitution_hyperopt_classifier_ranker_6', tags={}>

#### J. Running the Hyperopt optimization

Explanations:
- fmin: searches for the hyperparameters that minimize the returned loss, i.e. that maximize AUC for the classifiers and ndcg@3 for the ranker.
- algo=tpe.suggest: uses the **Tree-structured Parzen Estimator algorithm** for Bayesian optimization.
- max_evals=20: number of Hyperopt trials per model.
- trials: records every trial with its results and hyperparameters.
- rstate: ensures **reproducibility**.
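
One detail worth knowing when reading the dictionary returned by `fmin` for a space built with `hp.choice`: it contains the index of the selected option, not the option itself. `hyperopt.space_eval(space, best)` performs the conversion; the helper below is a hypothetical pure-Python equivalent for flat grids, and the `best` value is an invented example.

```python
def decode_choice_indices(best, param_grid):
    """Map the index returned for each hp.choice parameter back to its value."""
    return {name: param_grid[name][int(idx)] for name, idx in best.items()}

# Grid copied from the LogReg entry of `models`; `best` is an invented example
# of what fmin could return for it (indices, not values).
param_grid = {"C": [0.1, 1.0, 10.0], "penalty": ["l2"]}
best = {"C": 2, "penalty": 0}

print(decode_choice_indices(best, param_grid))  # {'C': 10.0, 'penalty': 'l2'}
```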

In [19]:
for name, cfg in models.items():
    print(f"=== Optimisation {name} ===")
    
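    # Build the Hyperopt space: each list becomes a categorical choice,
    # and fixed_params are passed through as constants so the models actually receive them
    space = {
        **cfg["fixed_params"],
        **{k: hp.choice(k, v) if isinstance(v, list) else v for k, v in cfg["param_grid"].items()},
    }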
    trials = Trials()
    
    if cfg["type"] == "classification":
        fmin(
            fn=lambda p: objective_classifier(
                p,
                cfg["model_class"],
                X_train,
                y_train,
                X_val,
                y_val,
                model_name=name
            ),
            space=space,
            algo=tpe.suggest,
            max_evals=20,
            trials=trials,
            rstate=np.random.default_rng(42)
        )
    else:
        fmin(
            fn=lambda p: objective_ranker(
                p,
                X_train,
                y_train,
                X_val,
                y_val,
                group_train,
                group_val,
                model_name=name
            ),
            space=space,
            algo=tpe.suggest,
            max_evals=20,
            trials=trials,
            rstate=np.random.default_rng(42)
        )


=== Optimisation LogReg ===
100%|██████████| 20/20 [00:10<00:00,  1.88trial/s, best loss: -0.6990719003179975]
=== Optimisation XGBClassifier ===
[...]

[LightGBM] [Info] Number of positive: 50659, number of negative: 40282
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.002271 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 826
[LightGBM] [Info] Number of data points in the train set: 90941, number of used features: 154
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.557053 -> initscore=0.229212
[LightGBM] [Info] Start training from score 0.229212
  5%|▌         | 1/20 [00:04<01:30,  4.76s/trial, best loss: -0.7193749706601728]
[...]
 45%|████▌     | 9/20 [00:35<00:42,  3.89s/trial, best loss: -0.724539654385693]

[LightGBM] [Info] Number of positive: 50659, number of negative: 40282           
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.002900 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 826                                                 
[LightGBM] [Info] Number of data points in the train set: 90941, number of used features: 154
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.557053 -> initscore=0.229212   
[LightGBM] [Info] Start training from score 0.229212                             
 50%|‚ñà‚ñà‚ñà‚ñà‚ñà     | 10/20 [00:40<00:42,  4.21s/trial, best loss: -0.724539654385693]





üèÉ View run spiffy-jay-31 at: http://localhost:5555/#/experiments/7/runs/eac4a4433be4454e986353e23e5ed767

üß™ View experiment at: http://localhost:5555/#/experiments/7                     

[LightGBM] [Info] Number of positive: 50659, number of negative: 40282           
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.003508 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 826                                                 
[LightGBM] [Info] Number of data points in the train set: 90941, number of used features: 154
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.557053 -> initscore=0.229212   
[LightGBM] [Info] Start training from score 0.229212                             
 55%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñå    | 11/20 [00:44<00:37,  4.11s/trial, best loss: -0.724539654385693]





üèÉ View run efficient-foal-694 at: http://localhost:5555/#/experiments/7/runs/e17301c5219541d4897a934991d52440

üß™ View experiment at: http://localhost:5555/#/experiments/7                     

[LightGBM] [Info] Number of positive: 50659, number of negative: 40282           
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.002216 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 826                                                 
[LightGBM] [Info] Number of data points in the train set: 90941, number of used features: 154
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.557053 -> initscore=0.229212   
[LightGBM] [Info] Start training from score 0.229212                             
 60%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà    | 12/20 [00:48<00:32,  4.08s/trial, best loss: -0.724539654385693]





üèÉ View run traveling-kite-643 at: http://localhost:5555/#/experiments/7/runs/e231eeb815154145adde7c4945e98f54

üß™ View experiment at: http://localhost:5555/#/experiments/7                     

[LightGBM] [Info] Number of positive: 50659, number of negative: 40282           
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.002707 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 826                                                 
[LightGBM] [Info] Number of data points in the train set: 90941, number of used features: 154
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.557053 -> initscore=0.229212   
[LightGBM] [Info] Start training from score 0.229212                             
 65%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñå   | 13/20 [00:52<00:28,  4.05s/trial, best loss: -0.724539654385693]





üèÉ View run dashing-shrimp-359 at: http://localhost:5555/#/experiments/7/runs/f92d7e3c4f794a1c81beaf80101fd414

üß™ View experiment at: http://localhost:5555/#/experiments/7                     

[LightGBM] [Info] Number of positive: 50659, number of negative: 40282           
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.005969 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 826                                                 
[LightGBM] [Info] Number of data points in the train set: 90941, number of used features: 154
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.557053 -> initscore=0.229212   
[LightGBM] [Info] Start training from score 0.229212                             
 70%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà   | 14/20 [00:56<00:24,  4.06s/trial, best loss: -0.724539654385693]





üèÉ View run mysterious-croc-198 at: http://localhost:5555/#/experiments/7/runs/d612a623ed4d4a5ab6571585b0aeb8aa

üß™ View experiment at: http://localhost:5555/#/experiments/7                     

[LightGBM] [Info] Number of positive: 50659, number of negative: 40282           
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.002921 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 826                                                 
[LightGBM] [Info] Number of data points in the train set: 90941, number of used features: 154
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.557053 -> initscore=0.229212   
[LightGBM] [Info] Start training from score 0.229212                             
 75%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñå  | 15/20 [00:59<00:19,  3.92s/trial, best loss: -0.724539654385693]





üèÉ View run able-moose-750 at: http://localhost:5555/#/experiments/7/runs/4b2ac559bd474916b50ea5ab57db185e

üß™ View experiment at: http://localhost:5555/#/experiments/7                     

[LightGBM] [Info] Number of positive: 50659, number of negative: 40282           
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.002864 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 826                                                 
[LightGBM] [Info] Number of data points in the train set: 90941, number of used features: 154
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.557053 -> initscore=0.229212   
[LightGBM] [Info] Start training from score 0.229212                             
 80%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà  | 16/20 [01:03<00:15,  3.94s/trial, best loss: -0.724539654385693]





üèÉ View run loud-bug-599 at: http://localhost:5555/#/experiments/7/runs/ebb998dd076c42ae91a7ecde66e8e075

üß™ View experiment at: http://localhost:5555/#/experiments/7                     

[LightGBM] [Info] Number of positive: 50659, number of negative: 40282           
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.003516 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 826                                                 
[LightGBM] [Info] Number of data points in the train set: 90941, number of used features: 154
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.557053 -> initscore=0.229212   
[LightGBM] [Info] Start training from score 0.229212                             
 85%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñå | 17/20 [01:07<00:11,  3.95s/trial, best loss: -0.724539654385693]





üèÉ View run mercurial-rat-647 at: http://localhost:5555/#/experiments/7/runs/bebf38c152954a11917f3eca2752bd76

üß™ View experiment at: http://localhost:5555/#/experiments/7                     

[LightGBM] [Info] Number of positive: 50659, number of negative: 40282           
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.002923 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 826                                                 
[LightGBM] [Info] Number of data points in the train set: 90941, number of used features: 154
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.557053 -> initscore=0.229212   
[LightGBM] [Info] Start training from score 0.229212                             
 90%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà | 18/20 [01:11<00:07,  3.94s/trial, best loss: -0.724539654385693]





üèÉ View run skillful-yak-732 at: http://localhost:5555/#/experiments/7/runs/4a95a6193f064ee498e0a1e1cc7de96e

üß™ View experiment at: http://localhost:5555/#/experiments/7                     

[LightGBM] [Info] Number of positive: 50659, number of negative: 40282           
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.002606 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 826                                                 
[LightGBM] [Info] Number of data points in the train set: 90941, number of used features: 154
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.557053 -> initscore=0.229212   
[LightGBM] [Info] Start training from score 0.229212                             
 95%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñå| 19/20 [01:15<00:03,  3.83s/trial, best loss: -0.724539654385693]





üèÉ View run hilarious-bat-819 at: http://localhost:5555/#/experiments/7/runs/fbea00aac81a4826adcde001a0fad7da

üß™ View experiment at: http://localhost:5555/#/experiments/7                     

100%|██████████| 20/20 [01:21<00:00,  4.05s/trial, best loss: -0.724539654385693]
=== Optimisation CatBoostClassifier ===
0:	learn: 0.6822086	total: 73.8ms	remaining: 36.8s    

1:	learn: 0.6726672	total: 92.4ms	remaining: 23s      

2:	learn: 0.6644129	total: 111ms	remaining: 18.5s     

3:	learn: 0.6567553	total: 131ms	remaining: 16.2s     

4:	learn: 0.6504622	total: 149ms	remaining: 14.8s     

5:	learn: 0.6444113	total: 168ms	remaining: 13.9s     

6:	learn: 0.6387959	total: 187ms	remaining: 13.2s     

7:	learn: 0.6340856	total: 206ms	remaining: 12.7s     

8:	learn: 0.6300859	total: 225ms	remaining: 12.3s     

9:	learn: 0.6262142	total: 244ms	remaining: 12s       

10:	learn: 0.6230159	total: 264ms	remaining: 11.7s    

11:	learn: 0.6201667	total: 285ms	remaining:

🏃 View run casual-dove-172 at: http://localhost:5555/#/experiments/7/runs/561bc866c9c24f4aa0995f7d0c949de6
🧪 View experiment at: http://localhost:5555/#/experiments/7

[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.022333 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 826                                                 
[LightGBM] [Info] Number of data points in the train set: 90941, number of used features: 154
  5%|▌         | 1/20 [00:04<01:17,  4.08s/trial, best loss: -0.8548327070517018]

🏃 View run serious-doe-14 at: http://localhost:5555/#/experiments/7/runs/5330ca2ffd594cceb4036ae458bc0926
🧪 View experiment at: http://localhost:5555/#/experiments/7

100%|██████████| 20/20 [01:56<00:00,  5.83s/trial, best loss: -0.8573883922378597]
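
Hyperopt minimizes its objective, so the `best loss` values in the log above are presumably the negated validation metric (the pipeline overview names AUC for LGBMClassifier and NDCG@3 for LGBMRanker). The sketch below shows how such a loss could be computed for a ranker. It is an illustration, not the notebook's actual objective: the helper `mean_ndcg_loss` and the toy data are hypothetical, and `group_sizes` is assumed to follow the LightGBM convention of consecutive query-group sizes.

```python
import math

def ndcg_at_k(rels, k):
    """NDCG@k for relevance labels already ordered by predicted score (best first)."""
    def dcg(values):
        return sum(r / math.log2(i + 2) for i, r in enumerate(values[:k]))
    ideal = dcg(sorted(rels, reverse=True))
    return 0.0 if ideal == 0 else dcg(rels) / ideal

def mean_ndcg_loss(y_true, scores, group_sizes, k=3):
    """Hypothetical helper: negated mean NDCG@k over consecutive query groups."""
    values, start = [], 0
    for size in group_sizes:
        labels = y_true[start:start + size]
        preds = scores[start:start + size]
        # Rank the candidates of this group by predicted score, best first.
        order = sorted(range(size), key=lambda i: preds[i], reverse=True)
        values.append(ndcg_at_k([labels[i] for i in order], k))
        start += size
    # Negate so that a minimizer such as Hyperopt maximizes NDCG@k.
    return -sum(values) / len(values)

# Toy data: two stock-out events with three candidate substitutes each.
y_true = [1, 0, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.1, 0.8, 0.3, 0.5]
group_sizes = [3, 3]
loss = mean_ndcg_loss(y_true, scores, group_sizes)
```

In this toy case the first group ranks its accepted product first (NDCG@3 = 1.0) and the second ranks it third (NDCG@3 = 0.5), so the mean is 0.75 and the loss is its negation. Groups without any accepted substitution contribute 0.0 in this sketch; whether the notebook excludes them is not shown.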


In [None]:
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier, LGBMRanker
from catboost import CatBoostClassifier

# -------------------------------------------------------------------
# Why test these algorithms for product recommendation when an item
# is out of stock?
#
# The goal is to predict or rank the best substitute products to offer
# a customer when an item is unavailable.
# We test several model families in order to compare:
#   - generalization ability,
#   - classification and ranking performance,
#   - robustness to heterogeneous features (price, category,
#     similarity, customer context, etc.).
#
# 1) Logistic Regression
#    - Serves as a simple, interpretable baseline.
#    - Checks that the features actually carry predictive signal
#      (sanity check).
#    - Makes it easy to inspect coefficients and relate them to
#      business logic.
#
# 2) XGBClassifier (XGBoost)
#    - Gradient boosting model that performs very well on tabular data.
#    - Can capture complex non-linear relationships between products
#      and customer context.
#    - Serves as a strong reference for supervised classification.
#
# 3) LGBMClassifier (LightGBM)
#    - Generally faster and more scalable alternative to XGBoost.
#    - Efficient on large data volumes and many features.
#    - Used here to predict the probability that a substitute product
#      is accepted by the customer.
#
# 4) LGBMRanker (LightGBM - Learning to Rank)
#    - Model designed specifically for ranking problems.
#    - Ranks several candidate products for the same stock-out and
#      selects the best one (Top-1 or Top-K).
#    - Well suited to recommender systems.
#
# 5) CatBoostClassifier
#    - Boosting model optimized for categorical variables.
#    - Reduces the need for preprocessing (encoding) of categories.
#    - Often performs well in e-commerce settings with categories,
#      brands and product attributes.
#
# This multi-model approach helps identify the best trade-off between
# performance, interpretability and robustness for the substitute
# product recommendation engine.
# -------------------------------------------------------------------


# Define the models and their parameter grids
models = {

    # =========================
    # Logistic Regression (Baseline)
    # =========================
    "LogReg": {
        "model_class": LogisticRegression,
        "param_grid": {
            "C": [0.1, 1.0, 10.0],
            "penalty": ["l2"],
        },
        "fixed_params": {
            "solver": "lbfgs",
            "max_iter": 2000,
            "n_jobs": -1,
            "random_state": 42,
        },
    },

    # =========================
    # XGBoost Classifier
    # =========================
    "XGBClassifier": {
        "model_class": XGBClassifier,
        "param_grid": {
            "n_estimators": [500, 1000],
            "max_depth": [4, 6, 8],
            "learning_rate": [0.03, 0.05, 0.1],
            "subsample": [0.7, 0.9, 1.0],
            "colsample_bytree": [0.7, 0.9, 1.0],
            "min_child_weight": [1, 5, 10],
            "reg_alpha": [0.0, 0.1, 1.0],
            "reg_lambda": [1.0, 2.0, 5.0],
        },
        "fixed_params": {
            "objective": "binary:logistic",
            "eval_metric": "auc",
            "tree_method": "hist",
            "random_state": 42,
            "n_jobs": -1,
        },
    },

    # =========================
    # LightGBM Classifier
    # =========================
    "LGBMClassifier": {
        "model_class": LGBMClassifier,
        "param_grid": {
            "num_leaves": [31, 63, 127],
            "learning_rate": [0.03, 0.05, 0.1],
            "n_estimators": [500, 1000],
            "min_child_samples": [20, 50, 100],
            "subsample": [0.7, 0.9, 1.0],
            "colsample_bytree": [0.7, 0.9, 1.0],
            "reg_alpha": [0.0, 0.1, 1.0],
            "reg_lambda": [0.0, 0.1, 1.0],
        },
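        "fixed_params": {
            "objective": "binary",
            "metric": "auc",
            # LightGBM only applies `subsample` when bagging is enabled
            # (subsample_freq > 0); with the default of 0 it is ignored.
            "subsample_freq": 1,
            "random_state": 42,
            "n_jobs": -1,
        },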
    },

    # =========================
    # LightGBM Ranker
    # =========================
    "LGBMRanker": {
        "model_class": LGBMRanker,
        "param_grid": {
            "objective": ["lambdarank"],
            "metric": ["ndcg"],
            "num_leaves": [31, 63, 127],
            "learning_rate": [0.03, 0.05, 0.1],
            "n_estimators": [500, 1000],
            "min_child_samples": [20, 50, 100],
            "subsample": [0.7, 0.9, 1.0],
            "colsample_bytree": [0.7, 0.9, 1.0],
        },
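        "fixed_params": {
            # LightGBM only applies `subsample` when bagging is enabled
            # (subsample_freq > 0); with the default of 0 it is ignored.
            "subsample_freq": 1,
            "random_state": 42,
            "n_jobs": -1,
        },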
    },

    # =========================
    # CatBoost Classifier
    # =========================
    "CatBoostClassifier": {
        "model_class": CatBoostClassifier,
        "param_grid": {
            "depth": [6, 8, 10],
            "learning_rate": [0.03, 0.05, 0.1],
            "iterations": [500, 1000],
            "l2_leaf_reg": [1, 3, 5, 9],
            "subsample": [0.7, 0.9, 1.0],
            "rsm": [0.7, 0.9, 1.0],
        },
        "fixed_params": {
            "loss_function": "Logloss",
            "eval_metric": "AUC",
            "random_seed": 42,
            "verbose": 0,
        },
    },
}
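
Each entry of `models` pairs a `param_grid` (candidate values) with `fixed_params` (constants). How the notebook feeds these into Hyperopt is not shown here, so the following is only a sketch of one way to expand an entry into full parameter dictionaries. The helper `iter_configs` and the toy spec are hypothetical. Note that the LGBMClassifier grid above has 3^7 × 2 = 4374 combinations while the logs show 20 trials per model, so sampling a subset rather than exhausting the grid is the likely mode of use.

```python
import itertools
import random

def iter_configs(spec):
    """Hypothetical helper: yield one full parameter dict per grid combination."""
    grid = spec["param_grid"]
    names = list(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(spec["fixed_params"])  # copy so the spec is not mutated
        params.update(zip(names, values))
        yield params

# Toy spec mirroring the structure of one `models` entry (values are illustrative).
toy_spec = {
    "param_grid": {"num_leaves": [31, 63], "learning_rate": [0.03, 0.1]},
    "fixed_params": {"objective": "binary", "random_state": 42},
}

configs = list(iter_configs(toy_spec))
n_configs = len(configs)  # 2 * 2 combinations
first = configs[0]
# Draw a reproducible random subset when the full grid is too large to evaluate.
sampled = random.Random(0).sample(configs, k=2)
```

A model would then be built with `spec["model_class"](**params)` for each selected dictionary.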



In [None]:
import numpy as np
from sklearn.metrics import (
    roc_auc_score, average_precision_score, log_loss,
    precision_score, recall_score, f1_score
)
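# Simple ranking metrics (binary or graded relevance).
# `rels` holds the relevance labels of one query group, ordered by
# predicted score (best-ranked candidate first).
def dcg_at_k(rels, k):
    """Discounted cumulative gain over the top-k positions."""
    rels = np.asarray(rels)[:k]
    if rels.size == 0:
        return 0.0
    discounts = 1.0 / np.log2(np.arange(2, rels.size + 2))
    return float(np.sum(rels * discounts))

def ndcg_at_k(rels, k):
    """DCG@k divided by the DCG@k of the ideal ordering (0.0 if nothing is relevant)."""
    dcg = dcg_at_k(rels, k)
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return 0.0 if ideal == 0 else float(dcg / ideal)

def mrr_at_k(rels, k):
    """Reciprocal rank of the first relevant item within the top k (0.0 if none)."""
    rels = np.asarray(rels)[:k]
    hits = np.where(rels > 0)[0]
    return 0.0 if hits.size == 0 else float(1.0 / (hits[0] + 1))

def hit_rate_at_k(rels, k):
    """1.0 if at least one relevant item appears in the top k, else 0.0."""
    rels = np.asarray(rels)[:k]
    return float(np.any(rels > 0))

def precision_at_k(rels, k):
    """Share of relevant items among the returned top-k positions."""
    rels = np.asarray(rels)[:k]
    return float(np.mean(rels > 0)) if rels.size else 0.0

def recall_at_k(rels, k):
    """Share of all relevant items in the group that appear in the top k."""
    rels = np.asarray(rels)
    total_pos = np.sum(rels > 0)
    if total_pos == 0:
        return 0.0
    return float(np.sum(rels[:k] > 0) / total_pos)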