## Fast Random Forest Regression with [Intel(R) Extension for Scikit-learn*](https://github.com/intel/scikit-learn-intelex)

### I made the scikit-learn RandomForest estimator **2 times** faster with the help of [Intel(R) Extension for Scikit-learn](https://github.com/intel/scikit-learn-intelex), using only one function: `patch_sklearn()`

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns

### Reading data

In [None]:
train = pd.read_csv('../input/tabular-playground-series-jul-2021/train.csv')
test = pd.read_csv('../input/tabular-playground-series-jul-2021/test.csv')
sample_submission = pd.read_csv('../input/tabular-playground-series-jul-2021/sample_submission.csv')

y_train = train[['target_carbon_monoxide', 'target_benzene', 'target_nitrogen_oxides']]
x_train = train.drop(['target_carbon_monoxide', 'target_benzene', 'target_nitrogen_oxides'], axis=1)
x_test = test
x_train.shape, x_test.shape, y_train.shape

Let's check the correlation matrix of the training data.

In [None]:
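# date_time is still a string column here, so restrict the matrix to numeric columns
correlation = train.select_dtypes(include='number').corr()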
correlation

In [None]:
sns.heatmap(correlation, square=True, cmap='coolwarm')

target_benzene correlates strongly with sensor_2, so sensor_2 should rank high in the feature importances.

### Installing Intel(R) Extension for Scikit-learn
Use [Intel(R) Extension for Scikit-learn](https://github.com/intel/scikit-learn-intelex) to speed up scikit-learn estimators.

In [None]:
!pip install scikit-learn-intelex -q --progress-bar off

### Patch the original scikit-learn without any other code changes

In [None]:
from sklearnex import patch_sklearn
# Patch before importing estimators from sklearn so the accelerated versions are picked up
patch_sklearn()
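
To confirm that the patch took effect, one option is to look at which package the imported class comes from. The helper below is only a sketch: `is_accelerated` is a hypothetical name, and it assumes that patched estimators are defined under the `sklearnex` or `daal4py` packages, which may not hold for every release of the extension. Stand-in classes are used so that the example runs without either library installed.

```python
def is_accelerated(cls):
    """Return True if `cls` appears to come from the Intel extension.

    Assumption: patched estimators are defined in the `sklearnex` or
    `daal4py` packages, while stock estimators live under `sklearn`.
    """
    top_level_package = cls.__module__.split(".")[0]
    return top_level_package in {"sklearnex", "daal4py"}


# Stand-in classes that only mimic the module paths of real estimators.
StockForest = type(
    "RandomForestRegressor", (), {"__module__": "sklearn.ensemble._forest"}
)
PatchedForest = type(
    "RandomForestRegressor", (), {"__module__": "sklearnex.ensemble._forest"}
)

print("stock:", is_accelerated(StockForest))
print("patched:", is_accelerated(PatchedForest))
```

In the notebook itself, `is_accelerated(RandomForestRegressor)` could be called right after the `from sklearn.ensemble import RandomForestRegressor` line in the training cell.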

### Preprocessing

In [None]:
x_train['date_time'] = x_train['date_time'].astype('datetime64[ns]').astype(np.int64)/10**9
x_test['date_time'] = x_test['date_time'].astype('datetime64[ns]').astype(np.int64)/10**9
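
The cell above turns each timestamp into seconds since the Unix epoch, so the forest receives `date_time` as an ordinary numeric feature. The same conversion for a single value can be sketched with the standard library alone. The helper name `to_epoch_seconds`, the sample timestamp, and the `%Y-%m-%d %H:%M:%S` format are assumptions made for illustration.

```python
from datetime import datetime, timezone


def to_epoch_seconds(text, fmt="%Y-%m-%d %H:%M:%S"):
    """Parse a naive timestamp string as UTC and return seconds since 1970-01-01."""
    parsed = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    return parsed.timestamp()


# Illustrative value only; the real column may use a different format.
sample = "2010-03-10 18:00:00"
print(to_epoch_seconds(sample))
```

Treating the naive timestamp as UTC mirrors what the `datetime64[ns]` to `int64` cast does, since that cast counts nanoseconds from the epoch without applying a time zone.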

In [None]:
x_train

In [None]:
y_train

### Using GridSearchCV with RandomForestRegressor from Intel(R) Extension for Scikit-learn*

In [None]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

param_grid = {'n_estimators':[300, 400, 500, 600, 700, 800, 900, 1000, 1500],
              'max_depth':[8, None],
              'n_jobs':[-1],
              'random_state':[42]}
target_benzene         = y_train['target_benzene']
target_carbon_monoxide = y_train['target_carbon_monoxide']
target_nitrogen_oxides = y_train['target_nitrogen_oxides']

rf = RandomForestRegressor()
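
Before launching the search it helps to know how much work the grid implies. The sketch below counts model fits for the grid defined above, assuming the `GridSearchCV` defaults of 5-fold cross-validation and `refit=True`; `count_fits` is a hypothetical helper, not part of scikit-learn.

```python
from itertools import product

param_grid = {'n_estimators': [300, 400, 500, 600, 700, 800, 900, 1000, 1500],
              'max_depth': [8, None],
              'n_jobs': [-1],
              'random_state': [42]}


def count_fits(grid, n_folds=5, refit=True):
    """Count fits for an exhaustive search: candidates x folds, plus one final refit."""
    n_candidates = len(list(product(*grid.values())))
    return n_candidates * n_folds + (1 if refit else 0)


fits_per_target = count_fits(param_grid)
n_targets = 3  # benzene, carbon monoxide, nitrogen oxides
print(fits_per_target, "fits per target,", n_targets * fits_per_target, "in total")
```

Only `n_estimators` and `max_depth` vary, so `n_jobs` and `random_state` do not enlarge the grid; listing them there is just a way to pass fixed values to every candidate.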

#### Train

In [None]:
%%time

clf_tb = GridSearchCV(rf, param_grid).fit(x_train, target_benzene)
clf_tcm = GridSearchCV(rf, param_grid).fit(x_train, target_carbon_monoxide)
clf_tno = GridSearchCV(rf, param_grid).fit(x_train, target_nitrogen_oxides)

In [None]:
print(clf_tb.best_params_, clf_tb.best_score_)
print(clf_tcm.best_params_, clf_tcm.best_score_)
print(clf_tno.best_params_, clf_tno.best_score_)
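
Because no `scoring` argument is passed, `GridSearchCV` falls back to the estimator's own `score` method, which for a regressor is the coefficient of determination R². `best_score_` is therefore the mean cross-validated R² of the best candidate. The following sketch computes R² by hand on toy numbers to show how the value reads; `r2_manual` is a hypothetical helper and the data are made up.

```python
def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy targets, not taken from the competition data.
y_true = [1.0, 2.0, 3.0, 4.0]

perfect = r2_manual(y_true, y_true)        # predictions match the targets exactly
baseline = r2_manual(y_true, [2.5] * 4)    # always predicting the mean of y_true
print(perfect, baseline)
```

A score close to 1 means the model explains most of the variance in the target, while a score near 0 means it does no better than predicting the mean.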

In [None]:
clf_tb.best_estimator_.feature_importances_

In [None]:
clf_tcm.best_estimator_.feature_importances_

In [None]:
clf_tno.best_estimator_.feature_importances_
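
The raw arrays above are easier to read when paired with column names and sorted. The sketch below does that in plain Python; `rank_importances` is a hypothetical helper, and the names and values in the example (other than `sensor_2`, which appears in the correlation note earlier) are made up.

```python
def rank_importances(feature_names, importances):
    """Pair each feature with its importance and sort from most to least important."""
    pairs = zip(feature_names, importances)
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Made-up names and values, for illustration only.
names = ["feature_a", "feature_b", "sensor_2"]
values = [0.1, 0.2, 0.7]

for name, value in rank_importances(names, values):
    print(f"{name}: {value:.2f}")
```

In the notebook this could be called as `rank_importances(x_train.columns, clf_tb.best_estimator_.feature_importances_)` to check whether `sensor_2` ranks near the top for `target_benzene`.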

#### Prediction

In [None]:
%%time
target_benzene_pred = clf_tb.predict(x_test)
target_carbon_monoxide_pred = clf_tcm.predict(x_test)
target_nitrogen_oxides_pred = clf_tno.predict(x_test)

#### Save results

In [None]:
sample_submission['target_carbon_monoxide'] = target_carbon_monoxide_pred
sample_submission['target_benzene'] = target_benzene_pred
sample_submission['target_nitrogen_oxides'] = target_nitrogen_oxides_pred
sample_submission.to_csv('submission_sklearnex.csv', index=False)

### Using the same algorithms with the original scikit-learn

In [None]:
from sklearnex import unpatch_sklearn
unpatch_sklearn()

In [None]:
# Re-import after unpatching so the stock scikit-learn classes are used
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

param_grid = {'n_estimators':[300, 400, 500, 600, 700, 800, 900, 1000, 1500],
              'max_depth':[8, None],
              'n_jobs':[-1],
              'random_state':[42]}
target_benzene         = y_train['target_benzene']
target_carbon_monoxide = y_train['target_carbon_monoxide']
target_nitrogen_oxides = y_train['target_nitrogen_oxides']

rf = RandomForestRegressor()

In [None]:
%%time

clf_tb = GridSearchCV(rf, param_grid).fit(x_train, target_benzene)
clf_tcm = GridSearchCV(rf, param_grid).fit(x_train, target_carbon_monoxide)
clf_tno = GridSearchCV(rf, param_grid).fit(x_train, target_nitrogen_oxides)

In [None]:
%%time
target_benzene_pred = clf_tb.predict(x_test)
target_carbon_monoxide_pred = clf_tcm.predict(x_test)
target_nitrogen_oxides_pred = clf_tno.predict(x_test)

In [None]:
print(clf_tb.best_params_, clf_tb.best_score_)
print(clf_tcm.best_params_, clf_tcm.best_score_)
print(clf_tno.best_params_, clf_tno.best_score_)

In [None]:
sample_submission['target_carbon_monoxide'] = target_carbon_monoxide_pred
sample_submission['target_benzene'] = target_benzene_pred
sample_submission['target_nitrogen_oxides'] = target_nitrogen_oxides_pred
sample_submission.to_csv('submission_orig.csv', index=False)
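
To compare the two runs, the wall times reported by `%%time` can be turned into a speedup ratio, and the two sets of predictions can be checked for agreement. The sketch below uses hypothetical helper names, and every number in it is a placeholder rather than a measurement from this notebook.

```python
def speedup(baseline_seconds, accelerated_seconds):
    """How many times faster the accelerated run is than the baseline run."""
    return baseline_seconds / accelerated_seconds


def max_abs_difference(first, second):
    """Largest element-wise absolute gap between two equally long prediction lists."""
    return max(abs(a - b) for a, b in zip(first, second))


# Placeholder timings in seconds; substitute the wall times printed by %%time.
stock_seconds = 90.0
patched_seconds = 60.0
print("speedup:", speedup(stock_seconds, patched_seconds))

# Placeholder predictions; substitute matching columns from the two submissions.
stock_pred = [1.0, 2.0, 3.0]
patched_pred = [1.0, 2.5, 2.0]
print("largest gap:", max_abs_difference(stock_pred, patched_pred))
```

Since the prediction variables are overwritten by the second run, one way to obtain both sets would be to read `submission_sklearnex.csv` and `submission_orig.csv` back in and pass matching columns to `max_abs_difference`.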