<big>For classical machine learning algorithms, we often use the most popular Python library, Scikit-learn. With Scikit-learn you can fit models and search for optimal parameters, but this can sometimes take hours.</big><br><br>

<big>I want to show you how to use the Scikit-learn library and get results faster without changing the code. To do this, we will use another Python library, <strong> <a href='https://github.com/intel/scikit-learn-intelex'>Intel® Extension for Scikit-learn*</a></strong>.</big><br><br>

<big>I will show you how to <strong>speed up your kernel more than 2 times</strong> without changing your code!</big>

In [None]:
import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt

# Importing data

In [None]:
train = pd.read_csv('../input/competitive-data-science-predict-future-sales/sales_train.csv')
items = pd.read_csv('../input/competitive-data-science-predict-future-sales/items.csv')
categories = pd.read_csv('../input/competitive-data-science-predict-future-sales/item_categories.csv')
shops = pd.read_csv('../input/competitive-data-science-predict-future-sales/shops.csv')
test = pd.read_csv('../input/competitive-data-science-predict-future-sales/test.csv')
submission = pd.read_csv('../input/competitive-data-science-predict-future-sales/sample_submission.csv')

# Preprocessing

<big>I took the preprocessing from <a href='https://www.kaggle.com/gordotron85/future-sales-xgboost-top-3'>here</a> and <a href='https://www.kaggle.com/sarthakbatra/predicting-sales-tutorial'>here</a>.</big><br><br>
<big>The main steps:</big><br><br>
<ol>
<li><big>Cleaning "shops": fix store identifiers, add categories and cities, and convert them to numeric attributes.</big></li><br>
<li><big>Cleaning "categories": select the category and subcategory of the product and convert it to a numeric attribute.</big></li><br>
<li><big>Cleaning "items": clean item names and types.</big></li><br>
<li><big>Generating the product of Shop-Item pairs for each month in the training data.</big></li><br>
<li><big>Merging the shops, items and categories dataframes with the new_train set.</big></li><br>
<li><big>Generating Lag Features and Mean-Encodings.</big></li><br>
</ol>
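
<big>To make the last step concrete, here is a minimal sketch of a lag feature built by shifting the month index and merging. The toy dataframe and the helper name <code>add_lag</code> are assumptions made for illustration; they are not part of the original kernels.</big>

```python
import pandas as pd

# Toy monthly sales for a single shop/item pair (hypothetical values).
toy_sales = pd.DataFrame({
    "date_block_num": [0, 1, 2, 3],
    "shop_id": [5, 5, 5, 5],
    "item_id": [100, 100, 100, 100],
    "item_cnt_month": [2.0, 0.0, 3.0, 1.0],
})

def add_lag(df, lag, col):
    """Attach the value of `col` from `lag` months earlier as a new column."""
    keys = ["date_block_num", "shop_id", "item_id"]
    shifted = df[keys + [col]].copy()
    shifted = shifted.rename(columns={col: f"{col}_lag_{lag}"})
    # Moving the month index forward lines up month t with month t + lag.
    shifted["date_block_num"] += lag
    return df.merge(shifted, on=keys, how="left")

lagged = add_lag(toy_sales, 1, "item_cnt_month")
print(lagged)
```

<big>The first month has no earlier month to look at, so its lag value is missing. This is why the earliest months are dropped later in the preprocessing.</big>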

In [None]:
# Delete outliers.

train = train[train['item_cnt_day'] < 2000]
train = train[train['item_price'] < 300000]

# Delete rows with non-positive item prices.

train = train[train.item_price > 0].reset_index(drop = True)
# Treat returns (daily counts below 1) as zero sales.
train.loc[train.item_cnt_day < 1, "item_cnt_day"] = 0

# CLEANING SHOPS

shops_train = train['shop_id'].nunique()
shops_test = test['shop_id'].nunique()



# Some stores with the same name have different ID. We need to fix this.
train.loc[train['shop_id'] == 0, 'shop_id'] = 57
test.loc[test['shop_id'] == 0, 'shop_id'] = 57

train.loc[train['shop_id'] == 1, 'shop_id'] = 58
test.loc[test['shop_id'] == 1, 'shop_id'] = 58

train.loc[train['shop_id'] == 10, 'shop_id'] = 11
test.loc[test['shop_id'] == 10, 'shop_id'] = 11

# Add a city and a shop category.
shops.loc[ shops.shop_name == 'Сергиев Посад ТЦ "7Я"',"shop_name" ] = 'СергиевПосад ТЦ "7Я"'
shops["city"] = shops.shop_name.str.split(" ").map( lambda x: x[0] )
shops["category"] = shops.shop_name.str.split(" ").map( lambda x: x[1] )
shops.loc[shops.city == "!Якутск", "city"] = "Якутск"

# If there are fewer than 5 stores in one category, we assign them to the category "other".
category = []
for cat in shops.category.unique():
    if len(shops[shops.category == cat]) >= 5:
        category.append(cat)
shops.category = shops.category.apply( lambda x: x if (x in category) else "other" )


# Let's transform the category and city of the store into a numeric attribute.
from sklearn import preprocessing
le = preprocessing.LabelEncoder()

shops["shop_category"] = le.fit_transform(shops.category)
shops["shop_city"] = le.fit_transform(shops.city)
shops = shops[["shop_id", "shop_category", "shop_city"]]


# CLEANING CATEGORIES
items_train = train['item_id'].nunique()
items_test = test['item_id'].nunique()


# Select the category and subcategory of the product and convert it to a numeric attribute.
main_categories = categories['item_category_name'].str.split('-')
categories['main_category_id'] = main_categories.map(lambda row: row[0].strip())
categories['main_category_id'] = le.fit_transform(categories['main_category_id'])

# Some items don't have sub-categories. For those, we will use the main category as a sub-category
categories['sub_category_id'] = main_categories.map(lambda row: row[1].strip() if len(row) > 1 else row[0].strip())
categories['sub_category_id'] = le.fit_transform(categories['sub_category_id'])


# CLEANING ITEMS
import re
def name_correction(x):
    x = x.lower() # all letters lower case
    x = x.partition('[')[0] # keep the text before the first square bracket
    x = x.partition('(')[0] # keep the text before the first parenthesis
    x = re.sub('[^A-Za-z0-9А-Яа-я]+', ' ', x) # remove special characters
    x = x.replace('  ', ' ') # replace double spaces with single spaces
    x = x.strip() # remove leading and trailing white space
    return x


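# Clean item names
# split item names by first bracket
# (expand=True replaces the tuple-unpacking `.str` idiom, which recent pandas versions no longer support)
items[["name1", "name2"]] = items.item_name.str.split("[", n=1, expand=True)
items[["name1", "name3"]] = items.item_name.str.split("(", n=1, expand=True)

# replace special characters and turn to lower case
# (regex=True is passed explicitly because the default changed in recent pandas versions)
items["name2"] = items.name2.str.replace('[^A-Za-z0-9А-Яа-я]+', " ", regex=True).str.lower()
items["name3"] = items.name3.str.replace('[^A-Za-z0-9А-Яа-я]+', " ", regex=True).str.lower()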

# fill nulls with '0'
items = items.fillna('0')

items["item_name"] = items["item_name"].apply(lambda x: name_correction(x))

# return all characters except the last if name 2 is not "0" - the closing bracket
items.name2 = items.name2.apply( lambda x: x[:-1] if x !="0" else "0")



# Clean item type
items["type"] = items.name2.apply(lambda x: x[0:8] if x.split(" ")[0] == "xbox" else x.split(" ")[0] )
items.loc[(items.type == "x360") | (items.type == "xbox360") | (items.type == "xbox 360") ,"type"] = "xbox 360"
items.loc[ items.type == "", "type"] = "mac"
items.type = items.type.apply( lambda x: x.replace(" ", "") )
items.loc[ (items.type == 'pc' )| (items.type == 'pс') | (items.type == "pc"), "type" ] = "pc"
items.loc[ items.type == 'рs3' , "type"] = "ps3"

group_sum = items.groupby(["type"]).agg({"item_id": "count"})
group_sum = group_sum.reset_index()
drop_cols = []
for cat in group_sum.type.unique():
    if group_sum.loc[(group_sum.type == cat), "item_id"].values[0] <40:
        drop_cols.append(cat)
items.name2 = items.name2.apply( lambda x: "other" if (x in drop_cols) else x )
items = items.drop(["type"], axis = 1)

items.name2 = le.fit_transform(items.name2)
items.name3 = le.fit_transform(items.name3)

items.drop(["item_name", "name1"],axis = 1, inplace= True)
items.head()


# Convert the date to "datetime" format.
train['date'] =  pd.to_datetime(train['date'], format='%d.%m.%Y')



from itertools import product
from tqdm import tqdm_notebook

def downcast_dtypes(df):
    '''
        Changes column types in the dataframe: 
                
                `float64` type to `float32`
                `int64`   type to `int32`
    '''
    
    # Select columns to downcast
    float_cols = [c for c in df if df[c].dtype == "float64"]
    int_cols =   [c for c in df if df[c].dtype == "int64"]
    
    # Downcast
    df[float_cols] = df[float_cols].astype(np.float32)
    df[int_cols]   = df[int_cols].astype(np.int32)
    
    return df




# Generating the product of Shop-Item pairs for each month in the training data
months = train['date_block_num'].unique()

cartesian = []
for month in months:
    shops_in_month = train.loc[train['date_block_num']==month, 'shop_id'].unique()
    items_in_month = train.loc[train['date_block_num']==month, 'item_id'].unique()
    cartesian.append(np.array(list(product(*[shops_in_month, items_in_month, [month]])), dtype='int32'))

cartesian_df = pd.DataFrame(np.vstack(cartesian), columns = ['shop_id', 'item_id', 'date_block_num'], dtype=np.int32)


# Add revenue to the train dataset.
train["revenue"] = train["item_cnt_day"] * train["item_price"]


# Aggregating sales to a monthly level and clipping target variable
x = train.groupby(['shop_id', 'item_id', 'date_block_num'])['item_cnt_day'].sum().rename('item_cnt_month').reset_index()
x.head()


# Now we need to merge our two dataframes. For the intersecting rows, we keep the values that exist in the dataframe x.
# For the remaining rows, we fill in zero. The merge keys are shop_id, item_id, and date_block_num.
new_train = pd.merge(cartesian_df, x, on=['shop_id', 'item_id', 'date_block_num'], how='left').fillna(0)

new_train.head()


# Clip the monthly target to the range [0, 20].
new_train['item_cnt_month'] = np.clip(new_train['item_cnt_month'], 0, 20)


del x
del cartesian_df
del cartesian


new_train.sort_values(['date_block_num','shop_id','item_id'], inplace = True)


# APPENDING TEST SET TO TRAINING SET

# First, let's insert the date_block_num feature for the test set! Using insert method of pandas to place this new column at a specific index. 
# This will allow us to concatenate the test set easily to the training set before we generate mean encodings and lag features.

test.insert(loc=3, column='date_block_num', value=34)

test['item_cnt_month'] = 0


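# DataFrame.append was removed in pandas 2.0; pd.concat aligns the columns by name in the same way.
new_train = pd.concat([new_train, test.drop('ID', axis = 1)])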

# Merge shops, items and categories dataframes with new_train

new_train = pd.merge(new_train, shops, on=['shop_id'], how='left')

new_train = pd.merge(new_train, items, on=['item_id'], how='left')

new_train = pd.merge(new_train, categories.drop('item_category_name', axis = 1), on=['item_category_id'], how='left')


# Generating Lag Features and Mean-Encodings
def lag_feature( df,lags, cols ):
    for col in cols:
        tmp = df[["date_block_num", "shop_id","item_id",col ]]
        for i in lags:
            shifted = tmp.copy()
            shifted.columns = ["date_block_num", "shop_id", "item_id", col + "_lag_"+str(i)]
            shifted.date_block_num = shifted.date_block_num + i
            df = pd.merge(df, shifted, on=['date_block_num','shop_id','item_id'], how='left')
    return df


del items
del categories
del shops


new_train = downcast_dtypes(new_train)

import gc
gc.collect()


# Add item_cnt_month lag features.
new_train = lag_feature( new_train, [1,2,3], ["item_cnt_month"] )


# Add the previous month's average item_cnt.
group = new_train.groupby( ["date_block_num"] ).agg({"item_cnt_month" : ["mean"]})
group.columns = ["date_avg_item_cnt"]
group.reset_index(inplace = True)

new_train = pd.merge(new_train, group, on = ["date_block_num"], how = "left")
new_train.date_avg_item_cnt = new_train["date_avg_item_cnt"].astype(np.float16)
new_train = lag_feature( new_train, [1], ["date_avg_item_cnt"] )
new_train.drop( ["date_avg_item_cnt"], axis = 1, inplace = True )


# Add lag values of item_cnt_month for month / item_id.
group = new_train.groupby(['date_block_num', 'item_id']).agg({'item_cnt_month': ['mean']})
group.columns = [ 'date_item_avg_item_cnt' ]
group.reset_index(inplace=True)

new_train = pd.merge(new_train, group, on=['date_block_num','item_id'], how='left')
new_train.date_item_avg_item_cnt = new_train['date_item_avg_item_cnt'].astype(np.float16)
new_train = lag_feature(new_train, [1,2,3], ['date_item_avg_item_cnt'])
new_train.drop(['date_item_avg_item_cnt'], axis=1, inplace=True)


# Add lag values for item_cnt_month for every month / shop combination.
group = new_train.groupby( ["date_block_num","shop_id"] ).agg({"item_cnt_month" : ["mean"]})
group.columns = ["date_shop_avg_item_cnt"]
group.reset_index(inplace = True)

new_train = pd.merge(new_train, group, on = ["date_block_num","shop_id"], how = "left")
new_train.date_shop_avg_item_cnt = new_train["date_shop_avg_item_cnt"].astype(np.float16)
new_train = lag_feature( new_train, [1,2,3], ["date_shop_avg_item_cnt"] )
new_train.drop( ["date_shop_avg_item_cnt"], axis = 1, inplace = True )


# Add lag values for item_cnt_month for month/shop/item.
group = new_train.groupby( ["date_block_num","shop_id","item_id"] ).agg({"item_cnt_month" : ["mean"]})
group.columns = ["date_shop_item_avg_item_cnt"]
group.reset_index(inplace = True)

new_train = pd.merge(new_train, group, on = ["date_block_num","shop_id","item_id"], how = "left")
new_train.date_shop_item_avg_item_cnt = new_train["date_shop_item_avg_item_cnt"].astype(np.float16)
new_train = lag_feature( new_train, [1,2,3], ["date_shop_item_avg_item_cnt"] )
new_train.drop( ["date_shop_item_avg_item_cnt"], axis = 1, inplace = True )


# Add lag values for item_cnt_month for month/shop/item subtype.
group = new_train.groupby(['date_block_num', 'shop_id', 'sub_category_id']).agg({'item_cnt_month': ['mean']})
group.columns = ['date_shop_subtype_avg_item_cnt']
group.reset_index(inplace=True)

new_train = pd.merge(new_train, group, on=['date_block_num', 'shop_id', 'sub_category_id'], how='left')
new_train.date_shop_subtype_avg_item_cnt = new_train['date_shop_subtype_avg_item_cnt'].astype(np.float16)
new_train = lag_feature(new_train, [1], ['date_shop_subtype_avg_item_cnt'])
new_train.drop(['date_shop_subtype_avg_item_cnt'], axis=1, inplace=True)


# Add lag values for item_cnt_month for month/city
group = new_train.groupby(['date_block_num', 'shop_city']).agg({'item_cnt_month': ['mean']})
group.columns = ['date_city_avg_item_cnt']
group.reset_index(inplace=True)

new_train = pd.merge(new_train, group, on=['date_block_num', "shop_city"], how='left')
new_train.date_city_avg_item_cnt = new_train['date_city_avg_item_cnt'].astype(np.float16)
new_train = lag_feature(new_train, [1], ['date_city_avg_item_cnt'])
new_train.drop(['date_city_avg_item_cnt'], axis=1, inplace=True)

# Add lag values for item_cnt_month for month/city/item.
group = new_train.groupby(['date_block_num', 'item_id', 'shop_city']).agg({'item_cnt_month': ['mean']})
group.columns = [ 'date_item_city_avg_item_cnt' ]
group.reset_index(inplace=True)

new_train = pd.merge(new_train, group, on=['date_block_num', 'item_id', 'shop_city'], how='left')
new_train.date_item_city_avg_item_cnt = new_train['date_item_city_avg_item_cnt'].astype(np.float16)
new_train = lag_feature(new_train, [1], ['date_item_city_avg_item_cnt'])
new_train.drop(['date_item_city_avg_item_cnt'], axis=1, inplace=True)

# Add average item price to the matrix df.
# Add lag values of item price per month.
# Add delta price values - how the current month's average price relates to the global average.

group = train.groupby( ["item_id"] ).agg({"item_price": ["mean"]})
group.columns = ["item_avg_item_price"]
group.reset_index(inplace = True)

new_train = new_train.merge( group, on = ["item_id"], how = "left" )
new_train["item_avg_item_price"] = new_train.item_avg_item_price.astype(np.float16)


group = train.groupby( ["date_block_num","item_id"] ).agg( {"item_price": ["mean"]} )
group.columns = ["date_item_avg_item_price"]
group.reset_index(inplace = True)

new_train = new_train.merge(group, on = ["date_block_num","item_id"], how = "left")
new_train["date_item_avg_item_price"] = new_train.date_item_avg_item_price.astype(np.float16)
lags = [1, 2, 3]
new_train = lag_feature( new_train, lags, ["date_item_avg_item_price"] )
for i in lags:
    new_train["delta_price_lag_" + str(i) ] = (new_train["date_item_avg_item_price_lag_" + str(i)]- new_train["item_avg_item_price"] )/ new_train["item_avg_item_price"]

def select_trends(row) :
    for i in lags:
        if row["delta_price_lag_" + str(i)]:
            return row["delta_price_lag_" + str(i)]
    return 0

new_train["delta_price_lag"] = new_train.apply(select_trends, axis = 1)
new_train["delta_price_lag"] = new_train.delta_price_lag.astype( np.float16 )
new_train["delta_price_lag"].fillna( 0 ,inplace = True)

features_to_drop = ["item_avg_item_price", "date_item_avg_item_price"]
for i in lags:
    features_to_drop.append("date_item_avg_item_price_lag_" + str(i) )
    features_to_drop.append("delta_price_lag_" + str(i) )
new_train.drop(features_to_drop, axis = 1, inplace = True)

# Add total shop revenue per month to the matrix df.
# Add lag values of revenue per month.
# Add delta revenue values - how the current month's revenue relates to the global average.

group = train.groupby( ["date_block_num","shop_id"] ).agg({"revenue": ["sum"] })
group.columns = ["date_shop_revenue"]
group.reset_index(inplace = True)

new_train = new_train.merge( group , on = ["date_block_num", "shop_id"], how = "left" )
new_train['date_shop_revenue'] = new_train['date_shop_revenue'].astype(np.float32)

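# Average the monthly revenue per shop (the mean must be taken over the revenue column, not over the month index).
group = group.groupby(["shop_id"]).agg({ "date_shop_revenue":["mean"] })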
group.columns = ["shop_avg_revenue"]
group.reset_index(inplace = True )

new_train = new_train.merge( group, on = ["shop_id"], how = "left" )
new_train["shop_avg_revenue"] = new_train.shop_avg_revenue.astype(np.float32)
new_train["delta_revenue"] = (new_train['date_shop_revenue'] - new_train['shop_avg_revenue']) / new_train['shop_avg_revenue']
new_train["delta_revenue"] = new_train["delta_revenue"]. astype(np.float32)

new_train = lag_feature(new_train, [1], ["delta_revenue"])
new_train["delta_revenue_lag_1"] = new_train["delta_revenue_lag_1"].astype(np.float32)
new_train.drop( ["date_shop_revenue", "shop_avg_revenue", "delta_revenue"] ,axis = 1, inplace = True)

# Add month and number of days in each month to matrix df.

new_train["month"] = new_train["date_block_num"] % 12
days = pd.Series([31,28,31,30,31,30,31,31,30,31,30,31])
new_train["days"] = new_train["month"].map(days).astype(np.int8)

# Add holidays in dataset.
holiday_dict = {
    0: 6,
    1: 3,
    2: 2,
    3: 8,
    4: 3,
    5: 3,
    6: 2,
    7: 8,
    8: 4,
    9: 8,
    10: 5,
    11: 4,
}

new_train['holidays_in_month'] = new_train['month'].map(holiday_dict)


# Add the number of months since the first sale of each item in each shop, and of each item overall.
new_train["item_shop_first_sale"] = new_train["date_block_num"] - new_train.groupby(["item_id","shop_id"])["date_block_num"].transform('min')
new_train["item_first_sale"] = new_train["date_block_num"] - new_train.groupby(["item_id"])["date_block_num"].transform('min')

# Delete the first months (date_block_num 0 to 3) from the matrix. The earliest months lack complete lag values.

new_train = new_train[new_train["date_block_num"] > 3]

new_train.head()

def fill_na(df):
    # Replace missing lag values with zero.
    for col in df.columns:
        if ('_lag_' in col) and (df[col].isnull().any()):
            df[col] = df[col].fillna(0)
    return df

new_train = downcast_dtypes(new_train)

In [None]:
new_train = pd.read_csv('../input/new-train/new_train.csv', index_col='Unnamed: 0')

# Split data into train, validation and test sets
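
<big>A minimal sketch of a time-based split on a toy dataframe, assuming the last labeled month (33) is held out for validation and month 34 is the unlabeled test month. The toy values and the names <code>train_part</code>, <code>val_part</code> and <code>test_part</code> are made up for illustration.</big>

```python
import pandas as pd

# Toy frame with a month index, one feature and the target (hypothetical values).
toy = pd.DataFrame({
    "date_block_num": [31, 32, 33, 33, 34],
    "feature": [0.1, 0.2, 0.3, 0.4, 0.5],
    "item_cnt_month": [1, 0, 2, 1, 0],
})

# Earlier months train the model, the last labeled month validates it,
# and the final month is the one to predict.
train_part = toy[toy.date_block_num < 33]
val_part = toy[toy.date_block_num == 33]
test_part = toy[toy.date_block_num == 34]

print(len(train_part), len(val_part), len(test_part))
```

<big>Keeping the validation month out of the training rows prevents the validation score from being measured on data the model has already seen.</big>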

In [None]:
# Month 33 is held out for validation, so it must not be part of the training rows.
x_train = new_train[new_train.date_block_num < 33].drop(['item_cnt_month'], axis=1)
y_train = new_train[new_train.date_block_num < 33]['item_cnt_month']

x_val = new_train[new_train.date_block_num == 33].drop(['item_cnt_month'], axis=1)
y_val = new_train[new_train.date_block_num == 33]['item_cnt_month']

x_test = new_train[new_train.date_block_num == 34].drop(['item_cnt_month'], axis=1)

<big>Normalize the data: scale the features to the range [0, 1] and standardize the target.</big>
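
<big>A small sketch of what the two scalers do, using made-up arrays. Both scalers are fitted on the training data only, and the target scaler can be inverted to bring predictions back to the original units.</big>

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical training features, one unseen row and a training target.
X_demo_train = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
X_demo_new = np.array([[2.5, 40.0]])
y_demo_train = np.array([0.0, 2.0, 4.0])

# MinMaxScaler maps each training column to [0, 1]; unseen values outside
# the training range can fall outside that interval.
demo_scaler_x = MinMaxScaler().fit(X_demo_train)
X_demo_new_scaled = demo_scaler_x.transform(X_demo_new)

# StandardScaler expects a 2D array, hence the reshape for a 1D target.
demo_scaler_y = StandardScaler().fit(y_demo_train.reshape(-1, 1))
y_demo_scaled = demo_scaler_y.transform(y_demo_train.reshape(-1, 1)).ravel()
y_demo_restored = demo_scaler_y.inverse_transform(y_demo_scaled.reshape(-1, 1)).ravel()

print(X_demo_new_scaled, y_demo_scaled, y_demo_restored)
```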

In [None]:
from sklearn.preprocessing import StandardScaler, MinMaxScaler
scaler_x = MinMaxScaler()
scaler_y = StandardScaler()

In [None]:
scaler_x.fit(x_train)
x_train = scaler_x.transform(x_train)
x_val = scaler_x.transform(x_val)
x_test = scaler_x.transform(x_test)

In [None]:
scaler_y.fit(y_train.to_numpy().reshape(-1, 1))
y_train = scaler_y.transform(y_train.to_numpy().reshape(-1, 1)).ravel()
y_val = scaler_y.transform(y_val.to_numpy().reshape(-1, 1)).ravel()

# Installing Intel(R) Extension for Scikit-learn

<big>Use Intel® Extension for Scikit-learn* to speed up Scikit-learn estimators.</big>

In [None]:
!pip install scikit-learn-intelex -q --progress-bar off

<big>Patch the original scikit-learn. The patch must be applied before the estimators are imported.</big>
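
<big>A minimal sketch of the patching order on synthetic data. The fallback branch is an assumption added so that the example also runs where <code>scikit-learn-intelex</code> is not installed; the names <code>demo_model</code> and <code>patched</code> are illustrative only.</big>

```python
import numpy as np

# Patch first, then import the estimators, so the accelerated versions are picked up.
try:
    from sklearnex import patch_sklearn
    patch_sklearn()
    patched = True
except ImportError:
    # Assumption for this sketch: fall back to stock scikit-learn if the extension is missing.
    patched = False

from sklearn.linear_model import Ridge

# Synthetic noiseless regression problem with known coefficients.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
true_coef = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y_demo = X_demo @ true_coef

# The estimator code is the same with or without the patch.
demo_model = Ridge(alpha=1e-3).fit(X_demo, y_demo)
print("patched:", patched, "coef:", demo_model.coef_.round(3))
```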

In [None]:
from sklearnex import patch_sklearn
patch_sklearn()

# Using optuna to select parameters for Stacking algorithm
<big>Stacking, or stacked generalization, is an ensemble of machine learning algorithms.

It combines the outputs of individual estimators and makes the final prediction from them. Stacking exploits the strength of each individual estimator by using their outputs as the input of a final estimator.</big><br><br>
<big>We tune the hyperparameters to get the best result.</big><br><br>
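
<big>A small sketch on synthetic data of how the base estimators feed the final estimator. The estimators and values here are chosen only for illustration and differ from the ones tuned in this notebook.</big>

```python
import numpy as np
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge

# Synthetic regression data (hypothetical coefficients and noise level).
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(300, 4))
y_toy = X_toy @ np.array([2.0, -1.0, 0.0, 0.5]) + rng.normal(scale=0.1, size=300)

toy_stack = StackingRegressor(
    estimators=[("lasso", Lasso(alpha=0.01)), ("ridge", Ridge(alpha=1.0))],
    final_estimator=LinearRegression(),
)
toy_stack.fit(X_toy, y_toy)

# transform() returns one column of predictions per base estimator ...
base_preds = toy_stack.transform(X_toy)
# ... and the final estimator turns those columns into the stacked prediction.
manual_pred = toy_stack.final_estimator_.predict(base_preds)

print(base_preds.shape)
```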

<big>Parameters that we select:</big><br>
<big>* <code>alpha</code> - Regularization strength. Regularization stabilizes the solution and reduces the variance of the estimates.<br> </big>
<big>* <code>l1_ratio</code> - Mixing parameter of ElasticNet. It sets the balance between L1 and L2 regularization in the penalty.<br> </big>
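
<big>To illustrate how the two parameters interact, the sketch below fits ElasticNet on synthetic data. With <code>l1_ratio=1.0</code> the penalty is pure L1, so the model coincides with Lasso at the same <code>alpha</code>. The data and the chosen values are assumptions for illustration.</big>

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

# scikit-learn's ElasticNet minimizes
#   1 / (2 * n_samples) * ||y - Xw||^2
#   + alpha * l1_ratio * ||w||_1
#   + 0.5 * alpha * (1 - l1_ratio) * ||w||^2

# Synthetic data with a sparse coefficient vector (hypothetical values).
rng = np.random.default_rng(0)
X_reg = rng.normal(size=(200, 6))
y_reg = X_reg @ np.array([3.0, 0.0, 0.0, -2.0, 0.0, 1.0]) + rng.normal(scale=0.1, size=200)

lasso_demo = Lasso(alpha=0.1).fit(X_reg, y_reg)
enet_l1 = ElasticNet(alpha=0.1, l1_ratio=1.0).fit(X_reg, y_reg)     # pure L1 penalty
enet_mixed = ElasticNet(alpha=0.1, l1_ratio=0.2).fit(X_reg, y_reg)  # mostly L2 penalty

print("lasso     :", lasso_demo.coef_.round(3))
print("enet l1=1 :", enet_l1.coef_.round(3))
print("enet l1=.2:", enet_mixed.coef_.round(3))
```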

In [None]:
from sklearn.linear_model import Lasso, ElasticNet, Ridge
from sklearn.ensemble import StackingRegressor
from sklearn.metrics import mean_squared_error
import optuna

In [None]:
def get_stacking_regressor( alpha4=None,
                            alpha1=None, alpha2=None, alpha3=None,
                            l1_ratio=None, l1_ratio2=None
                            ):
    elastic = ElasticNet(alpha=alpha1, l1_ratio=l1_ratio, random_state=0)
    lasso2 = Lasso(alpha=alpha2, random_state=0)
    ridge = Ridge(alpha=alpha3, random_state=0)

    
    elastic_f = ElasticNet(alpha=alpha4, l1_ratio=l1_ratio2, random_state=0)
    stacking_estimators = [
        ('elastic', elastic),
        ('lasso2', lasso2),
        ('ridge', ridge),
    ]
    
    return StackingRegressor(estimators=stacking_estimators, final_estimator=elastic_f)

<big>Selecting the parameters takes a long time and is computationally intensive, so I selected the search ranges in advance.</big>

In [None]:
def objective_stack(trial):
    params ={
        'alpha4': trial.suggest_float('alpha4', 0.0, 0.02969371481087929),
        'alpha1': trial.suggest_float('alpha1', 0.0, 0.027694846519887552),
        'alpha2': trial.suggest_float('alpha2', 0.0, 0.31557621736570013),
        'alpha3': trial.suggest_float('alpha3', 0.0,  0.029221357138328012),
        'l1_ratio': trial.suggest_float('l1_ratio', 0.0, 0.31140039770607025),
        'l1_ratio2': trial.suggest_float('l1_ratio2', 0.0, 0.09864359696600125),

    }
    model = get_stacking_regressor(**params).fit(x_train, y_train)
    y_pred = model.predict(x_val)
    loss = np.sqrt(mean_squared_error(y_val, y_pred))
    return loss

<big><strong>Select parameters</strong></big>

In [None]:
study = optuna.create_study(sampler=optuna.samplers.TPESampler(seed=123),
                            direction="minimize",
                            pruner=optuna.pruners.HyperbandPruner())

<big>Let's see the execution time with Intel(R) Extension for Scikit-learn.</big>

In [None]:
%%time
study.optimize(objective_stack, n_trials=10)

<big><strong>Training the model with the selected parameters.</strong></big>

In [None]:
x_train_full = np.concatenate((x_train, x_val), axis=0)
y_train_full = np.concatenate((y_train, y_val), axis=0)

In [None]:
%%time
final_model = get_stacking_regressor(**study.best_params).fit(x_train_full, y_train_full)

<big><strong>Prediction.</strong></big>

In [None]:
%%time
y_pred = final_model.predict(x_test)
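# StandardScaler works on 2D arrays, so reshape the predictions before inverting the scaling.
y_pred = scaler_y.inverse_transform(y_pred.reshape(-1, 1)).ravel()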

<big>Save the results in 'submission.csv'.</big>

In [None]:
submission['item_cnt_month'] = y_pred
submission.to_csv('submission.csv', index=False)
submission.head(10)

# Now we use the same algorithms with original scikit-learn

<big>Let’s run the same code with the original scikit-learn and compare its execution time with that of the version patched by Intel(R) Extension for Scikit-learn.</big>
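
<big>Outside a notebook, where the <code>%%time</code> magic is not available, the comparison can be sketched with the standard library. The helpers <code>timed</code> and <code>speedup</code> are hypothetical names introduced for this example.</big>

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn and return its result together with the elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

def speedup(baseline_seconds, accelerated_seconds):
    """Ratio of the two run times; values above 1 mean the accelerated run was faster."""
    return baseline_seconds / accelerated_seconds

# Placeholder workload; in practice fn would be study.optimize or a model's fit method.
_, elapsed = timed(sum, range(1_000_000))
print(f"elapsed: {elapsed:.4f} s")
```

<big>For example, with run times of 27 and 12 minutes, <code>speedup(27 * 60, 12 * 60)</code> gives a factor of 2.25.</big>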

In [None]:
from sklearnex import unpatch_sklearn
unpatch_sklearn()

In [None]:
from sklearn.linear_model import Lasso, ElasticNet, Ridge

<big><strong>Select parameters</strong></big>

In [None]:
study = optuna.create_study(sampler=optuna.samplers.TPESampler(seed=123),
                            direction="minimize",
                            pruner=optuna.pruners.HyperbandPruner())

<big>Let's see the execution time without the patch.</big>

In [None]:
%%time
study.optimize(objective_stack, n_trials=10)

<big><strong>Training the model with the selected parameters.</strong></big>

In [None]:
%%time
final_model = get_stacking_regressor(**study.best_params).fit(x_train_full, y_train_full)

<h2>Conclusions</h2>
<big>We can see that using only one classical machine learning algorithm may give you a pretty high accuracy score. We also used the well-known libraries Scikit-learn and Optuna, as well as the increasingly popular library Intel® Extension for Scikit-learn. Note that Intel® Extension for Scikit-learn gives you the opportunity to:</big>

* <big>Use your Scikit-learn code for training and inference without modification.</big>
* <big>Speed up selection of parameters <strong>from 27 minutes to 12 minutes.</strong></big>
* <big>Get predictions of similar quality.</big>