## Created by <a href="https://github.com/yunsuxiaozi">yunsuxiaozi</a> 2025/02/07

This notebook is meant to be beginner-friendly, so I explain my ideas in text as much as possible.

# <span><h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spcaing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">1.Import Libraries</h1></span>

Because the <a href="https://github.com/yunsuxiaozi/Yunbase">Yunbase</a> GitHub repository is updated continuously, a static copy is used <a href="https://www.kaggle.com/code/yunsuxiaozi/yunbase">here</a>. This notebook uses version 16.

Yunbase is a framework I developed to make competitions more convenient and to avoid rewriting the same code. You can think of it as a simple AutoML tool, although its current performance is not yet as good as AutoML.

In [1]:
source_file_path = '/kaggle/input/yunbase/Yunbase/baseline.py'
target_file_path = '/kaggle/working/baseline.py'
with open(source_file_path, 'r', encoding='utf-8') as file:
    content = file.read()
with open(target_file_path, 'w', encoding='utf-8') as file:
    file.write(content)

In [2]:
!pip install -q --requirement /kaggle/input/yunbase/Yunbase/requirements.txt  \
--no-index --find-links file:/kaggle/input/yunbase/

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
cuml 24.8.0 requires cupy-cuda11x>=12.0.0, which is not installed.
cesium 0.12.3 requires numpy<3.0,>=2.0, but you have numpy 1.26.4 which is incompatible.
libpysal 4.9.2 requires packaging>=22, but you have packaging 21.3 which is incompatible.
libpysal 4.9.2 requires shapely>=2.0.1, but you have shapely 1.8.5.post1 which is incompatible.
tsfresh 0.20.3 requires scipy>=1.14.0; python_version >= "3.10", but you have scipy 1.13.1 which is incompatible.

In [3]:
from baseline import Yunbase
import polars as pl  # similar to pandas, but faster on large datasets
import pandas as pd  # read csv and parquet files
import numpy as np  # numerical computation on arrays and matrices
from datetime import datetime, timedelta
from catboost import CatBoostRegressor
from xgboost import XGBRegressor
import gc  # garbage collection
import warnings
# Suppress warning messages so they do not clutter the output.
warnings.filterwarnings('ignore')

import random  # Python's built-in random number generator
# Set the random seeds so that the results can be reproduced.
def seed_everything(seed):
    np.random.seed(seed)  # NumPy's random seed
    random.seed(seed)  # Python built-in random seed
seed_everything(seed=2025)

# <span><h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spcaing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">2.Load Data</h1></span>

<a href="https://www.kaggle.com/code/yunsuxiaozi/rohlik-top1-solution/notebook">Here</a> is the top-1 solution from the previous competition. The feature engineering in this notebook is based on it.

The main task here is to read and merge the datasets, and to fill in some holidays that are missing from the calendar.

To keep CV and LB consistent, only the `unique_id` values that appear in the test data are kept in the training data.

In [4]:
print("< load dataset >")
train=pd.read_csv("/kaggle/input/rohlik-sales-forecasting-challenge-v2/sales_train.csv")
print(f"train.shape:{train.shape}")
test=pd.read_csv("/kaggle/input/rohlik-sales-forecasting-challenge-v2/sales_test.csv")
print(f"test.shape:{test.shape}")

print("< only use unique_id in testset >")
test_id=test['unique_id'].unique()
train=train[train['unique_id'].isin(test_id)]

print("< fill missing holiday >")
calendar=pd.read_csv("/kaggle/input/rohlik-sales-forecasting-challenge-v2/calendar.csv")
# Each entry is (list of dates in MM/DD/YYYY format, holiday name).
# These holidays are missing from calendar.csv and are filled in below.
czech_holiday = [ # Prague
    (['03/31/2024', '04/09/2023', '04/17/2022', '04/04/2021', '04/12/2020'], 'Easter Day'),
    (['05/12/2024', '05/10/2020', '05/09/2021', '05/08/2022', '05/14/2023'], "Mother Day"),
]
brno_holiday = [ # Brno
    (['03/31/2024', '04/09/2023', '04/17/2022', '04/04/2021', '04/12/2020'], 'Easter Day'),
    (['05/12/2024', '05/10/2020', '05/09/2021', '05/08/2022', '05/14/2023'], "Mother Day"),
]

# Budapest: nothing to fill in
budapest_holidays = []
# Bavaria - Munich
munich_holidays = [
    (['03/30/2024', '04/08/2023', '04/16/2022', '04/03/2021'], 'Holy Saturday'),
    (['05/12/2024', '05/14/2023', '05/08/2022', '05/09/2021'], 'Mother Day'),
]

# Hesse - Frankfurt
frank_holidays = [
    (['03/30/2024', '04/08/2023', '04/16/2022', '04/03/2021'], 'Holy Saturday'),
    (['05/12/2024', '05/14/2023', '05/08/2022', '05/09/2021'], 'Mother Day'),
]

def fill_loss_holidays(df_fill, warehouses, holidays):
    df = df_fill.copy()
    for item in holidays:
        dates, holiday_name = item
        generated_dates = [datetime.strptime(date, '%m/%d/%Y').strftime('%Y-%m-%d') for date in dates]
        for generated_date in generated_dates:
            df.loc[(df['warehouse'].isin(warehouses)) & (df['date'] == generated_date), 'holiday'] = 1
            df.loc[(df['warehouse'].isin(warehouses)) & (df['date'] == generated_date), 'holiday_name'] = holiday_name
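    # long_weekend: shops are closed on this calendar row and on the previous row.
    # Note that shift(1) looks at the previous row of the whole frame, not the previous day per warehouse.
    df['long_weekend'] = ((df['shops_closed'] == 1) & (df['shops_closed'].shift(1) == 1)).astype(np.int8)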
    
    return df

calendar = fill_loss_holidays(df_fill=calendar, warehouses=['Prague_1', 'Prague_2', 'Prague_3'], holidays=czech_holiday)
calendar = fill_loss_holidays(df_fill=calendar, warehouses=['Brno_1'], holidays=brno_holiday)
calendar = fill_loss_holidays(df_fill=calendar, warehouses=['Munich_1'], holidays=munich_holidays)
calendar = fill_loss_holidays(df_fill=calendar, warehouses=['Frankfurt_1'], holidays=frank_holidays)
calendar = fill_loss_holidays(df_fill=calendar, warehouses=['Budapest_1'], holidays=budapest_holidays)
print(f"calendar.shape:{calendar.shape}")

print("< merge dataset >")
train=train.merge(calendar,on=['warehouse','date'],how='left')
test=test.merge(calendar,on=['warehouse','date'],how='left')

weights=pd.read_csv("/kaggle/input/rohlik-sales-forecasting-challenge-v2/test_weights.csv")
train=train.merge(weights,on='unique_id',how='left')
print(f"weights.shape:{weights.shape}")

inventory=pd.read_csv("/kaggle/input/rohlik-sales-forecasting-challenge-v2/inventory.csv")
train=train.merge(inventory,on=['warehouse','unique_id'],how='left')
test=test.merge(inventory,on=['warehouse','unique_id'],how='left')

train=train[train['date']>='2021-06-01']
train.head()

< load dataset >
train.shape:(4007419, 14)
test.shape:(47021, 12)
< only use unique_id in testset >
< fill missing holiday >
calendar.shape:(23016, 8)
< merge dataset >
weights.shape:(5390, 2)


Unnamed: 0,unique_id,date,warehouse,total_orders,sales,sell_price_main,availability,type_0_discount,type_1_discount,type_2_discount,...,winter_school_holidays,school_holidays,long_weekend,weight,product_unique_id,name,L1_category_name_en,L2_category_name_en,L3_category_name_en,L4_category_name_en
0,4845,2024-03-10,Budapest_1,6436.0,16.34,646.26,1.0,0.0,0.0,0.0,...,0,0,0,1.925596,2375,Croissant_35,Bakery,Bakery_L2_18,Bakery_L3_83,Bakery_L4_1
2,4845,2021-12-20,Budapest_1,6507.0,34.55,455.96,1.0,0.0,0.0,0.0,...,0,0,0,1.925596,2375,Croissant_35,Bakery,Bakery_L2_18,Bakery_L3_83,Bakery_L4_1
3,4845,2023-04-29,Budapest_1,5463.0,34.52,646.26,0.96,0.20024,0.0,0.0,...,0,0,0,1.925596,2375,Croissant_35,Bakery,Bakery_L2_18,Bakery_L3_83,Bakery_L4_1
4,4845,2022-04-01,Budapest_1,5997.0,35.92,486.41,1.0,0.0,0.0,0.0,...,0,0,0,1.925596,2375,Croissant_35,Bakery,Bakery_L2_18,Bakery_L3_83,Bakery_L4_1
5,4845,2024-03-02,Budapest_1,6760.0,27.26,646.26,1.0,0.0,0.0,0.0,...,0,0,0,1.925596,2375,Croissant_35,Bakery,Bakery_L2_18,Bakery_L3_83,Bakery_L4_1


# <span><h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spcaing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">3.Feature Engineer</h1></span>

Here, weekends are treated as holidays. I also classify the food products in more detail and construct shift and diff features.

The feature engineering on `date` also borrows from the top-1 solution of a Playground competition, <a href="https://www.kaggle.com/code/ivyzang/1st-place-solution-less-is-more/notebook">less is more</a>.


The feature engineering here is fairly messy. If you are interested, take a closer look at the code.
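The date features in this section include sine/cosine pairs such as `sin_dayofyear` and `cos_dayofyear`. The following is a minimal sketch of why that pair is useful, using only the standard library. The helper names `cyclical` and `distance` are made up for this illustration and are not part of Yunbase.

```python
import math

def cyclical(value, period):
    """Map a periodic value onto the unit circle as (sin, cos)."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)

def distance(a, b):
    """Euclidean distance between two (sin, cos) points."""
    return math.dist(a, b)

# Same formula as sin_dayofyear / cos_dayofyear, with a period of 365.
dec31 = cyclical(365, 365)
jan1 = cyclical(1, 365)
jul1 = cyclical(182, 365)

# As raw integers, day 365 and day 1 are 364 apart.
# On the circle they are neighbours, while early July is on the opposite side.
print("Dec 31 vs Jan 1:", distance(dec31, jan1))
print("Dec 31 vs Jul 1:", distance(dec31, jul1))
```

With the raw `dayofyear` alone, a model sees the end of December and the start of January as far apart. The sine/cosine pair removes that jump at the year boundary.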

In [5]:
print("< get weekend date >")
start_date_str = train['date'].min()
end_date_str = train['date'].max()
start_date = datetime.strptime(start_date_str, '%Y-%m-%d')
end_date = datetime.strptime(end_date_str, '%Y-%m-%d')
current_date = start_date
weekends = []
while current_date <= end_date:
    if current_date.weekday() == 5 or current_date.weekday() == 6:
        weekends.append(current_date.strftime('%Y-%m-%d'))
    current_date += timedelta(days=1)

< get weekend date >


Among the autoregressive features, the lags 14, 20, 28, and 35 each have a correlation with sales greater than 0.93 within one year. The lags 356, 364, and 370 have a correlation greater than 0.92 at roughly one year. Lags of about two years cannot even reach 0.90, so they are not used.
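As a rough illustration of how such lag features and their correlation with sales can be checked, here is a small sketch on toy data. The product names, the weekly pattern, and the lags 7 and 14 are assumptions made up for this example; it does not reproduce the correlations quoted above.

```python
import numpy as np
import pandas as pd

# Toy data: two hypothetical products with a repeating weekly sales pattern.
rng = np.random.default_rng(0)
dates = pd.date_range("2024-01-01", periods=56, freq="D")
weekly = np.tile([1.0, 1.1, 1.0, 1.2, 1.5, 0.8, 0.7], 8)
frames = []
for name, base in [("Bread_1", 10.0), ("Milk_2", 30.0)]:
    frames.append(pd.DataFrame({
        "date": dates,
        "warehouse": "Prague_1",
        "name": name,
        "sales": base * weekly + rng.normal(0, 0.1, len(dates)),
    }))
toy = pd.concat(frames, ignore_index=True).sort_values("date")

# Lag features: shift sales inside each (warehouse, name) group.
for gap in [7, 14]:
    toy[f"sales_shift{gap}"] = toy.groupby(["warehouse", "name"])["sales"].shift(gap)

# Correlation of each lag with the current sales (NaN rows are ignored pairwise).
corr = toy[["sales", "sales_shift7", "sales_shift14"]].corr()["sales"]
print(corr)
```

Note that `shift(gap)` moves values by rows within each group, not by calendar days. The two only coincide when every series has exactly one row per day with no missing dates.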


In [6]:
def FE(df):
    df['index']=np.arange(len(df))
    df=df.sort_values(['date']).reset_index(drop=True)

    print("< autoregression feature >")
    for gap in [14,20,28,35,356,364,370]:
        df[f'sales_shift{gap}']=df.groupby(['warehouse','name'])['sales'].shift(gap)
    
    print("< date feature >")
 
    df['date_copy']=df['date']
    df['date_copy']=pd.to_datetime(df['date_copy'])
    
    df['dayofyear']=df['date_copy'].dt.dayofyear
    df['sin_dayofyear']=np.sin(2*np.pi*df['dayofyear']/365)
    df['cos_dayofyear']=np.cos(2*np.pi*df['dayofyear']/365)
    
    df['dayofweek']=df['date_copy'].dt.dayofweek
    df['weekday'] = df['date_copy'].dt.weekday
    df['weekend']=(df['dayofweek']>4).astype(np.int8)
    df['sin_dayofweek']=np.sin(2*np.pi*df['dayofweek']/7)
    df['cos_dayofweek']=np.cos(2*np.pi*df['dayofweek']/7)

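    # Replace the day-of-week index (0=Monday) with a precomputed per-weekday value.
    # The seven values sum to about 1; the target is later divided by this column.
    dayofweek2mean={0: 0.1414975636804815,1: 0.13738876781429193,
    2: 0.14013498532762625,3: 0.15052082748144407,4: 0.16312265716870394,
    5: 0.13516123608364708,6: 0.13217396244380525}
    df['dayofweek']=df['dayofweek'].map(dayofweek2mean)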

    df['weekofyear'] = df['date_copy'].dt.isocalendar().week
    df['sin_weekofyear']=np.sin(2*np.pi*df['weekofyear']/52)
    df['cos_weekofyear']=np.cos(2*np.pi*df['weekofyear']/52)

    df['year']=df['date_copy'].dt.year
    df['quarter']=df['date_copy'].dt.quarter
    df['sin_quarter']=np.sin(2*np.pi*df['quarter']/4)
    df['cos_quarter']=np.cos(2*np.pi*df['quarter']/4)
    
    df['month']=df['date_copy'].dt.month
    df['is_month_start'] = df['date_copy'].dt.is_month_start
    df['is_month_end'] = df['date_copy'].dt.is_month_end
    df['sin_month']=np.sin(2*np.pi*df['month']/12)
    df['cos_month']=np.cos(2*np.pi*df['month']/12)
    
    df['day']=df['date_copy'].dt.day
    df['dayofmonth']=df['day']//10
    df['sin_day']=np.sin(2*np.pi*df['day']/30)
    df['cos_day']=np.cos(2*np.pi*df['day']/30)

    print("< data clean >")
    # e.g. name 'Pastry_196' -> name_0='Pastry', name_1='196'
    df['name_0']=df['name'].apply(lambda x:x.split("_")[0])
    df['name_1']=df['name'].apply(lambda x:x.split("_")[1])
    df.drop(['name'],axis=1,inplace=True)
    # e.g. 'Bakery_L2_18' -> '18' (keep only the trailing id of each category level)
    for i in range(2,5):
        df[f'L{i}_category_name_en']=df[f'L{i}_category_name_en'].apply(lambda x:x.split('_')[2])

    print("< store2country feature >")
    store2country = {
        'Budapest_1': 'Hungary',
        'Prague_2': 'Czechia',
        'Brno_1': 'Czechia',
        'Prague_1': 'Czechia',
        'Prague_3': 'Czechia',
        'Munich_1': 'Germany',
        'Frankfurt_1': 'Germany'
    }
    df['country']=df['warehouse'].apply(lambda x:store2country[x])

    print("< add holiday >")
    #rohlik top1 solution:https://www.kaggle.com/code/yunsuxiaozi/rohlik-top1-solution/notebook
    rename_dict = {
        "Memorial Day for the Victims of the Holocaust": "Victims of the Holocaust",
        "Memorial Day for the Victims of the Communist Dictatorships": "Victims of the Communist",
        "Den vzniku samostatneho ceskoslovenskeho statu": "Den vzniku"
    }
    df['holiday_name'] = df['holiday_name'].replace(rename_dict)
    df.loc[(df['holiday']==1)&(df['holiday_name'].isna()),'holiday_name']='Easter Monday'
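    # The day before each Easter Sunday is also marked as a holiday for the Prague warehouses.
    datesx = ['03/31/2024', '04/09/2023', '04/17/2022', '04/04/2021', '04/12/2020']
    # 'date' holds 'YYYY-MM-DD' strings, so format the dates the same way before comparing.
    holidaysx = [(datetime.strptime(date, '%m/%d/%Y') - timedelta(days=1)).strftime('%Y-%m-%d') for date in datesx]
    warehouses = ['Prague_1', 'Prague_2', 'Prague_3']
    df.loc[(df['date'].isin(holidaysx)) & (df['warehouse'].isin(warehouses)), 'holiday'] = 1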

    print("< add weekend feature >")
    # Days without a named holiday that fall on a weekend get the name 'weekend'.
    df.loc[(df['holiday_name'].isna())&(df['date'].isin(weekends)),'holiday_name']='weekend'
    # is_holiday: 0 = ordinary day, 1 = plain weekend, 2 = named holiday
    df['is_holiday']=df['holiday_name'].notna().astype(np.int8)
    df.loc[(df['is_holiday']==1)&(df['holiday_name']!='weekend'),'is_holiday']=2
    
    df['total_type_discount']=0
    for i in range(7):
        df['total_type_discount']+=df[f'type_{i}_discount']

    print("< time diff and shift feature >")
    
    for gap in [1,2]:
        for col in ['is_holiday','weekend']:
            df[col+f"_shift{gap}"]=df.groupby(['warehouse','unique_id','product_unique_id'])[col].shift(gap)

    for col in ['total_orders','sell_price_main','total_type_discount']:#'total_orders*sell_price_main'
        for agg in ['std','skew','max','median']:
            df[f'{agg}_{col}_each_name_WU_per_day']=df.groupby(['date','warehouse','unique_id','name_0','name_1'])[col].transform(agg)
            df[f'{agg}_{col}_each_name0_WU_per_day']=df.groupby(['date','warehouse','unique_id','name_0'])[col].transform(agg)
            df[f'{agg}_{col}_each_L1_WU_per_day']=df.groupby(['date','warehouse','unique_id','L1_category_name_en'])[col].transform(agg)
            df[f'{agg}_{col}_each_name0_W_per_day']=df.groupby(['date','warehouse','name_0'])[col].transform(agg)
            df[f'{agg}_{col}_each_name0_per_day']=df.groupby(['date','name_0'])[col].transform(agg)
            
            for gap in [1]:
                df[f'{agg}_{col}_each_name_WU_per_day_diff{gap}']=df.groupby(['warehouse','unique_id','name_0','name_1'])[f'{agg}_{col}_each_name_WU_per_day'].diff(gap)
                df[f'{agg}_{col}_each_name0_WU_per_day_diff{gap}']=df.groupby(['warehouse','unique_id','name_0','name_1'])[f'{agg}_{col}_each_name0_WU_per_day'].diff(gap)
                df[f'{agg}_{col}_each_L1_WU_per_day_diff{gap}']=df.groupby(['warehouse','unique_id','name_0','name_1'])[f'{agg}_{col}_each_L1_WU_per_day'].diff(gap)
                df[f'{agg}_{col}_each_name0_W_per_day_diff{gap}']=df.groupby(['warehouse','unique_id','name_0','name_1'])[f'{agg}_{col}_each_name0_W_per_day'].diff(gap)
                df[f'{agg}_{col}_each_name0_per_day_diff{gap}']=df.groupby(['warehouse','unique_id','name_0','name_1'])[f'{agg}_{col}_each_name0_per_day'].diff(gap)
                
    df=df.sort_values(['index']).reset_index(drop=True)
    
    df.drop(['index','date_copy'],axis=1,inplace=True)
    
    return df

total=pd.concat((train,test))
total=FE(total)
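#besides the target, columns that are in train.columns but not in test.columns
drop_cols=['availability']
#also drop columns that are almost entirely NaN
total.drop([col for col in total.columns if total[col].isna().mean()>0.99]+drop_cols,axis=1,inplace=True)
train=total[:len(train)]
test=total[len(train):].drop(['sales','weight'],axis=1)
del total
gc.collect()
#scale the target by the per-weekday value stored in 'dayofweek'; predictions are multiplied back after inference
train['sales']=train['sales']/train['dayofweek']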

print(f"train.shape:{train.shape},test.shape:{test.shape}")
train.head()

< autoregression feature >
< date feature >
< data clean >
< store2country feature >
< add holiday >
< add weekend feature >
< time diff and shift feature >
train.shape:(2938869, 149),test.shape:(47021, 147)


Unnamed: 0,unique_id,date,warehouse,total_orders,sales,sell_price_main,type_0_discount,type_1_discount,type_2_discount,type_3_discount,...,median_total_type_discount_each_name_WU_per_day,median_total_type_discount_each_name0_WU_per_day,median_total_type_discount_each_L1_WU_per_day,median_total_type_discount_each_name0_W_per_day,median_total_type_discount_each_name0_per_day,median_total_type_discount_each_name_WU_per_day_diff1,median_total_type_discount_each_name0_WU_per_day_diff1,median_total_type_discount_each_L1_WU_per_day_diff1,median_total_type_discount_each_name0_W_per_day_diff1,median_total_type_discount_each_name0_per_day_diff1
0,4845,2024-03-10,Budapest_1,6436.0,123.624954,646.26,0.0,0.0,0.0,0.0,...,0.15312,0.15312,0.15312,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,4845,2021-12-20,Budapest_1,6507.0,244.173815,455.96,0.0,0.0,0.0,0.0,...,0.15025,0.15025,0.15025,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,4845,2023-04-29,Budapest_1,5463.0,255.398671,646.26,0.20024,0.0,0.0,0.0,...,0.35336,0.35336,0.35336,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,4845,2022-04-01,Budapest_1,5997.0,220.202396,486.41,0.0,0.0,0.0,0.0,...,0.15649,0.15649,0.15649,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,4845,2024-03-02,Budapest_1,6760.0,201.685045,646.26,0.0,0.0,0.0,0.0,...,0.15312,0.15312,0.15312,0.0,0.0,0.0,0.0,0.0,0.0,0.0


# <span><h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spcaing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">4.Fit and Predict</h1></span>
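The custom metric passed to Yunbase in this section is a weighted MAE. The small worked example below uses the same formula on made-up numbers, only to show how the weights change the result. The arrays are illustrative and are not taken from the competition data.

```python
import numpy as np

def weighted_mae(y_true, y_pred, weight):
    """Weighted mean absolute error: sum(w * |y - y_hat|) / sum(w)."""
    return np.sum(weight * np.abs(y_true - y_pred)) / np.sum(weight)

y_true = np.array([10.0, 20.0, 30.0])
y_pred = np.array([12.0, 20.0, 27.0])
weight = np.array([1.0, 1.0, 2.0])

# Absolute errors are 2, 0 and 3.
# Weighted sum = 1*2 + 1*0 + 2*3 = 8, total weight = 4, so the metric is 2.0.
wmae = weighted_mae(y_true, y_pred, weight)
# The unweighted MAE of the same predictions is (2 + 0 + 3) / 3.
mae = np.mean(np.abs(y_true - y_pred))
print(wmae, mae)
```

Because the third item carries twice the weight, its error of 3 pulls the weighted score above the plain MAE.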

In [7]:
def weighted_MAE(y_true,y_pred,weight):
    return np.sum(weight*np.abs(y_true-y_pred))/np.sum(weight)

xgb_params={'objective': 'reg:squarederror', 'colsample_bytree': 0.6, 
            'enable_categorical': True, 'learning_rate': 0.2,  'max_depth': 9, 
            'n_estimators': 1280,  'random_state': 2024, 'reg_alpha': 0.08,
            'reg_lambda': 0.8, 'subsample': 0.95,'tree_method':'gpu_hist'
           }
cat_params={'random_state':2024,
           'eval_metric'         : 'MAE',
           'bagging_temperature' : 0.50,
           'iterations'          : 2048,
           'learning_rate'       : 0.1,
           'max_depth'           : 12,
           'l2_leaf_reg'         : 1.25,
           'min_data_in_leaf'    : 24,
           'random_strength'     : 0.25, 
           'verbose'             : 0,
           'task_type':"GPU"
          }
models=[(XGBRegressor(**xgb_params),'xgb'),
        (CatBoostRegressor(**cat_params),'cat')
       ]

yunbase=Yunbase(num_folds=1,
                  models=models,
                  FE=None,
                  seed=2024,
                  objective='regression',
                  custom_metric=weighted_MAE,
                  drop_cols=[],
                  target_col='sales',
                  save_oof_preds=True,
                  save_test_preds=False,
                  device='gpu',
                  one_hot_max=10,
                  early_stop=100,
                  cross_cols=['total_orders','sell_price_main','type_0_discount'],
                  use_high_corr_feat=False,
                  use_reduce_memory=True,
                  log=100,
                  plot_feature_importance=True,
)
test_preds=yunbase.purged_cross_validation(train_path_or_file=train,
                                           test_path_or_file=test,
                                           date_col='date',train_gap_each_fold=28,
                                           train_test_gap=0,
                                           use_seasonal_features=False,
                                           use_weighted_metric=True,
                                           category_cols=['unique_id','L2_category_name_en',
                                               'L3_category_name_en','L4_category_name_en',
                                                'name_0','name_1','product_unique_id'],
                                           only_inference=True,
                                          )
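#clip the predictions to the range [0, 26316]
test_preds=np.clip(test_preds,0.0, 26316)
#undo the day-of-week scaling that was applied to the training target
test_preds=test_preds*test['dayofweek'].values
#the submission file uses 'sales_hat' as the prediction column
yunbase.target_col='sales_hat'
yunbase.submit("/kaggle/input/rohlik-sales-forecasting-challenge-v2/solution.csv",test_preds)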

Currently supported metrics:['custom_metric', 'mae', 'rmse', 'mse', 'medae', 'rmsle', 'msle', 'mape', 'r2', 'smape', 'auc', 'pr_auc', 'logloss', 'f1_score', 'mcc', 'accuracy', 'multi_logloss']
Currently supported models:['lgb', 'cat', 'xgb', 'ridge', 'Lasso', 'LinearRegression', 'LogisticRegression', 'tabnet', 'Word2Vec', 'tfidfvec', 'countvec']
Currently supported kfolds:['KFold', 'GroupKFold', 'StratifiedKFold', 'StratifiedGroupKFold', 'purged_CV']
Currently supported objectives:['binary', 'multi_class', 'regression']
< preprocess date_col >


0it [00:00, ?it/s]


< one hot encoder >
-> for column unique_id labelencoder feature
-> for column L2_category_name_en labelencoder feature
-> for column L3_category_name_en labelencoder feature
-> for column L4_category_name_en labelencoder feature
-> for column name_0 labelencoder feature
-> for column name_1 labelencoder feature
-> for column product_unique_id labelencoder feature
< drop high correlation feature >
drop_cols=['max_total_orders_each_name_WU_per_day', 'max_total_orders_each_name0_WU_per_day', 'max_total_orders_each_L1_WU_per_day', 'max_total_orders_each_name0_W_per_day', 'median_total_orders_each_name_WU_per_day', 'median_total_orders_each_name0_WU_per_day', 'median_total_orders_each_L1_WU_per_day', 'median_total_orders_each_name0_W_per_day', 'max_sell_price_main_each_name_WU_per_day', 'max_sell_price_main_each_name0_WU_per_day', 'max_sell_price_main_each_L1_WU_per_day', 'median_sell_price_main_each_name_WU_per_day', 'median_s

0it [00:00, ?it/s]

< one hot encoder >
-> for column unique_id labelencoder feature
-> for column L2_category_name_en labelencoder feature
-> for column L3_category_name_en labelencoder feature
-> for column L4_category_name_en labelencoder feature
-> for column name_0 labelencoder feature
-> for column name_1 labelencoder feature
-> for column product_unique_id labelencoder feature
< cross feature >
< drop useless cols >
nan_cols:[]
unique_cols:['type_1_discount', 'type_2_discount', 'type_3_discount', 'type_5_discount', 'school_holidays']
drop_cols:[]
high_corr_cols:['max_total_orders_each_name_WU_per_day', 'max_total_orders_each_name0_WU_per_day', 'max_total_orders_each_L1_WU_per_day', 'max_total_orders_each_name0_W_per_day', 'median_total_orders_each_name_WU_per_day', 'median_total_orders_each_name0_WU_per_day', 'median_total_orders_each_L1_WU_per_day', 'median_total_orders_each_name0_W_per_day', 'max_sell_price_main_each_name_WU_per_day', 'max_sell_price_main_each_name0_WU_per_day', 'max_sell_price_main_each_L1_WU_per_day', 'median_sell_price_main_each_name_WU_per_day', 'median_sell_price_main_each_name0_WU_per_day', 'median_sell_price_

Default metric period is 5 because MAE is/are not implemented for GPU


0:	learn: 419.5813468	total: 6.9s	remaining: 3h 55m 17s
100:	learn: 123.3749234	total: 56.5s	remaining: 18m 8s
200:	learn: 113.1361033	total: 1m 47s	remaining: 16m 25s
300:	learn: 107.0383564	total: 2m 38s	remaining: 15m 19s
400:	learn: 102.6713543	total: 3m 29s	remaining: 14m 20s
500:	learn: 99.2636880	total: 4m 21s	remaining: 13m 27s
600:	learn: 96.2988313	total: 5m 12s	remaining: 12m 32s
700:	learn: 93.8160139	total: 6m 4s	remaining: 11m 39s
800:	learn: 91.5203867	total: 6m 55s	remaining: 10m 47s
900:	learn: 89.5304655	total: 7m 47s	remaining: 9m 54s
1000:	learn: 87.6206686	total: 8m 39s	remaining: 9m 2s
1100:	learn: 85.9102004	total: 9m 31s	remaining: 8m 11s
1200:	learn: 84.3376741	total: 10m 22s	remaining: 7m 19s
1300:	learn: 82.8951073	total: 11m 15s	remaining: 6m 27s
1400:	learn: 81.4802666	total: 12m 7s	remaining: 5m 35s
1500:	learn: 80.1391247	total: 12m 59s	remaining: 4m 44s
1600:	learn: 78.7513873	total: 13m 51s	remaining: 3m 52s
1700:	learn: 77.5980464	total: 14m 43s	remain