<a class="anchor" id="0"></a>

# The importance of all features in different models - Advanced Visualization with Matplotlib and Pandas.parallel_coordinates
## Feature Importance diagrams of 3 models (XGB, LGB, LinReg) and their merged solution as a weighted average
## The code is universal for both Classification and Regression tasks
### Example for my dataset ["Ammonium prediction in river water"](https://www.kaggle.com/vbmokin/ammonium-prediction-in-river-water)

This is based on my notebooks:
* [Merging FE & Prediction - xgb, lgb, logr, linr](https://www.kaggle.com/vbmokin/merging-fe-prediction-xgb-lgb-logr-linr)
* [FE - Feature Importance - Advanced Visualization](https://www.kaggle.com/vbmokin/fe-feature-importance-advanced-visualization)

<a class="anchor" id="0.1"></a>

## Table of Contents

1. [Import libraries](#1)
1. [Download datasets](#2)
1. [FE & EDA](#3)
1. [Preparing to modeling](#4)
1. [Tuning models, building the feature importance diagrams and prediction](#5)
    -  [LGBM](#5.1)
    -  [XGB](#5.2)
    -  [Linear Regression](#5.3)
1. [Comparison and merging of all feature importance diagrams](#6)
1. [Feature Importance - Advanced Visualization](#7)
    -  [Matplotlib](#7.1)
    -  [Pandas.parallel_coordinates](#7.2)
1. [Analysis of data forecasting results](#8)

## 1. Import libraries <a class="anchor" id="1"></a>

[Back to Table of Contents](#0.1)

In [None]:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
warnings.filterwarnings("ignore")

import numpy as np 
import pandas as pd 

# Visualization
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates
import eli5

# Modeling
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.preprocessing import LabelEncoder, StandardScaler
import lightgbm as lgbm
import xgboost as xgb

# Metrics
from sklearn.metrics import r2_score

pd.set_option('display.max_columns', 100)

## 2. Download datasets <a class="anchor" id="2"></a>

[Back to Table of Contents](#0.1)

In [None]:
# Download data
df = pd.read_csv('../input/ammonium-prediction-in-river-water/PB_1996_2019_NH4.csv', sep=';')
df.head(3)

## 3. FE & EDA <a class="anchor" id="3"></a>

[Back to Table of Contents](#0.1)

In [None]:
# Group dates by month
df['date'] = pd.to_datetime(df['Date'], format='%d.%m.%Y', errors='coerce').dt.to_period('M')
df
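
As a minimal sketch of what the conversion above does, the snippet below runs the same two calls on made-up date strings (the sample values are assumptions, not rows from the dataset). `errors='coerce'` turns unparseable strings into `NaT`, and `.dt.to_period('M')` collapses each date to its month.

```python
import pandas as pd

# Made-up sample dates in the dd.mm.YYYY format used by the dataset
raw = pd.Series(['15.01.1996', '03.02.1996', 'not a date'])

# Unparseable strings become NaT instead of raising an error
parsed = pd.to_datetime(raw, format='%d.%m.%Y', errors='coerce')

# Keep only the year and month of each date
months = parsed.dt.to_period('M')
print(months)
```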

In [None]:
# Select the main columns
df = df[['ID_Station','date','NH4']]
df.head(3)

In [None]:
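# Dataset transformation: one row per month, one column per station
df = pd.pivot_table(df, values='NH4', index=['date'], columns='ID_Station')
# pivot_table sorts the station columns, so take the names from the result itself
# rather than from the order in which the stations first appear in the data
df_id_str_list = [str(x) for x in df.columns]
df.columns = df_id_str_list
df = df.reset_index(drop=False)
df.head(3)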
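
The reshaping step can be sketched on a toy table (the station IDs and values below are made up). It shows that `pivot_table` returns the new columns in sorted order regardless of the row order of the input, which matters whenever the columns are renamed afterwards.

```python
import pandas as pd

# Made-up long-format measurements: one row per (station, month)
long_df = pd.DataFrame({
    'ID_Station': [22, 14, 14, 22],
    'date': ['1996-01', '1996-01', '1996-02', '1996-02'],
    'NH4': [0.30, 0.10, 0.20, 0.40],
})

# Long -> wide: one row per month, one column per station
wide = pd.pivot_table(long_df, values='NH4', index=['date'], columns='ID_Station')

# Station 22 appears first in long_df, but the columns come back sorted (14, 22)
wide.columns = [str(c) for c in wide.columns]
wide = wide.reset_index(drop=False)
print(wide)
```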

In [None]:
df.info()

In [None]:
# Select the stations with the longest data series (values present for more than 60% of all dates)
col2 = []
for col in df_id_str_list:
    if len(df[col]) - df[col].isna().sum() > 0.6*len(df):
        col2.append(col)
df2 = df[['date'] + col2]
df2.info()

In [None]:
df2 = df2.dropna().reset_index(drop=True)
df2.info()
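
The two filtering steps above (keep well-covered stations, then drop incomplete months) can also be written without an explicit loop. The sketch below uses a made-up table and hypothetical column names `A` and `B`; only the 0.6 threshold is taken from the cell above.

```python
import numpy as np
import pandas as pd

# Made-up wide table: station 'A' is 80% filled, station 'B' only 40%
toy = pd.DataFrame({
    'date': ['1996-01', '1996-02', '1996-03', '1996-04', '1996-05'],
    'A': [0.1, 0.2, np.nan, 0.4, 0.5],
    'B': [np.nan, np.nan, np.nan, 0.3, 0.2],
})

min_share = 0.6  # same threshold as in the loop above

# Share of non-missing values per station column
share_filled = toy[['A', 'B']].notna().mean()
kept = share_filled[share_filled > min_share].index.tolist()

# Keep the well-covered stations, then drop months with any missing value
toy2 = toy[['date'] + kept].dropna().reset_index(drop=True)
print(toy2)
```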

In [None]:
df2

In [None]:
df2[col2].plot(figsize=(16,8))

In [None]:
df2[['16','35']].plot(figsize=(16,3))

In [None]:
df2[col2].mean()

The values for stations '16' and '35' are very small and differ markedly from those of the other stations, so we remove them.

In [None]:
col3 = col2.copy()
col3.remove('16')
col3.remove('35')
df3 = df2[['date'] + col3]
df3[col3].plot(figsize=(16,8))

Let's try to predict the data of the last station, 29, from the data of the other stations (located upstream on this river).

In [None]:
target_name = '29'

## 4. Preparing to modeling <a class="anchor" id="4"></a>

[Back to Table of Contents](#0.1)

In [None]:
# We have numerical data only

# # Encoding categorical features
# numerics = ['int8', 'int16', 'int32', 'int64', 'float16', 'float32', 'float64']
# categorical_columns = []
# features = train.columns.values.tolist()
# for col in features:
#     if train[col].dtype in numerics: continue
#     categorical_columns.append(col)
# for col in categorical_columns:
#     if col in train.columns:
#         le = LabelEncoder()
#         le.fit(list(train[col].astype(str).values) + list(test[col].astype(str).values))
#         train[col] = le.transform(list(train[col].astype(str).values))
#         test[col] = le.transform(list(test[col].astype(str).values)) 

## 5. Tuning models, building the feature importance diagrams and prediction<a class="anchor" id="5"></a>

[Back to Table of Contents](#0.1)

In [None]:
# Get data (work on a copy so that pop() does not modify df3)
data = df3[col3].copy()
target = data.pop(target_name)

In [None]:
# Get train and test data
train, test, target_train, target_test = train_test_split(data, target, test_size=0.2, random_state=0)

In [None]:
train

In [None]:
test.info()

### 5.1 LGBM <a class="anchor" id="5.1"></a>

[Back to Table of Contents](#0.1)

In [None]:
#%% Split the training set into training and validation parts
Xtrain, Xval, Ztrain, Zval = train_test_split(train, target_train, test_size=0.2, random_state=0)
train_set = lgbm.Dataset(Xtrain, Ztrain)
valid_set = lgbm.Dataset(Xval, Zval, reference=train_set)

In [None]:
# Tuning LGB model
# See parameters in the documentation https://lightgbm.readthedocs.io/en/latest/Parameters.html
params = {
        'boosting_type':'gbdt',
        'objective': 'regression', # for classification task - "binary" or other
        'num_leaves': 31,
        'learning_rate': 0.05,
        'max_depth': -1,
        'subsample': 0.8,   # alias of bagging_fraction, which is also set on the next line
        'bagging_fraction' : 1,
        'max_bin' : 50 ,
        'bagging_freq': 20,
        'colsample_bytree': 0.6,
        'metric': 'rmse',     # eval_metric, for classification task - "binary" or other
        'min_split_gain': 0.5,
        'min_child_weight': 1,
        'min_child_samples': 2,
        'scale_pos_weight':1,
        'zero_as_missing': True,
        'seed':0,        
    }

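# Early stopping and logging are passed as callbacks (required since LightGBM 4.0)
modelL = lgbm.train(params, train_set=train_set, num_boost_round=2000,
                    valid_sets=[valid_set],
                    callbacks=[lgbm.early_stopping(stopping_rounds=10),
                               lgbm.log_evaluation(period=10)])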

In [None]:
# FI diagram drawing
fig =  plt.figure(figsize = (15,15))
axes = fig.add_subplot(111)
lgbm.plot_importance(modelL,ax = axes,height = 0.5)
plt.show();plt.close()

In [None]:
# FI diagram saving
feature_score = pd.DataFrame(train.columns, columns = ['feature']) 
feature_score['LGB'] = modelL.feature_importance()

In [None]:
# Prediction
y_train_lgb = modelL.predict(train, num_iteration=modelL.best_iteration)
y_preds_lgb = modelL.predict(test, num_iteration=modelL.best_iteration)

### 5.2 XGB<a class="anchor" id="5.2"></a>

[Back to Table of Contents](#0.1)

In [None]:
#%% Wrap the training, validation and test data in DMatrix objects
data_tr  = xgb.DMatrix(Xtrain, label=Ztrain)
data_cv  = xgb.DMatrix(Xval   , label=Zval)
data_train = xgb.DMatrix(train)
data_test  = xgb.DMatrix(test)
evallist = [(data_tr, 'train'), (data_cv, 'valid')]

In [None]:
# Tuning XGB model
# See parameters in the documentation https://xgboost.readthedocs.io/en/latest/parameter.html
parms = {'max_depth':5, # maximum depth of a tree
         'objective':'reg:squarederror', # for classification task - "binary:logistic" or other
         'eval_metric':'rmse',      # for classification task - "error" or other
         'learning_rate':0.01,
         'subsample':0.8, # fraction of the training rows sampled for each tree
         'colsample_bylevel':0.9,
         'min_child_weight': 2,
         'seed': 0}
modelx = xgb.train(parms, data_tr, num_boost_round=2000, evals = evallist,
                  early_stopping_rounds=300, maximize=False, 
                  verbose_eval=100)

print('score = %1.5f, n_boost_round =%d.'%(modelx.best_score,modelx.best_iteration))

In [None]:
# FI diagram drawing
fig =  plt.figure(figsize = (15,15))
axes = fig.add_subplot(111)
xgb.plot_importance(modelx,ax = axes,height = 0.5)
plt.show();plt.close()

In [None]:
# FI diagram saving
feature_score['XGB'] = feature_score['feature'].map(modelx.get_score(importance_type='weight'))
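
XGBoost's `get_score` returns a dictionary that leaves out features which were never used in a split, so the `map` call above produces `NaN` for them. A minimal sketch with a made-up importance dictionary (the feature names and scores are assumptions) shows the effect and why a later `fillna(0)` is needed:

```python
import pandas as pd

features = pd.Series(['14', '22', '26'], name='feature')

# Hypothetical importance dict: feature '26' was never used in a split, so it has no entry
scores = {'14': 120.0, '22': 35.0}

# Features without an entry in the dict are mapped to NaN
mapped = features.map(scores)

# Treat "never used" as zero importance
filled = mapped.fillna(0)
print(pd.DataFrame({'feature': features, 'XGB': filled}))
```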

In [None]:
# Prediction
y_train_xgb = modelx.predict(data_train)
y_preds_xgb = modelx.predict(data_test)

### 5.3 Linear Regression <a class="anchor" id="5.3"></a>

[Back to Table of Contents](#0.1)

In [None]:
# Min-max scaling for regression models (the scaler is fitted on the training data only)
Scaler_train = preprocessing.MinMaxScaler().fit(train)
train = pd.DataFrame(Scaler_train.transform(train), columns=train.columns, index=train.index)
test = pd.DataFrame(Scaler_train.transform(test), columns=test.columns, index=test.index)
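
Because the scaler is fitted on the training data only, scaled test values are not guaranteed to stay inside [0, 1]. The sketch below illustrates this on made-up numbers (the column name `x` and the values are assumptions):

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Made-up data: the test set contains values outside the training range [1, 3]
toy_train = pd.DataFrame({'x': [1.0, 2.0, 3.0]})
toy_test = pd.DataFrame({'x': [0.0, 4.0]})

# Learn min and max from the training data only
scaler = MinMaxScaler().fit(toy_train)

train_scaled = scaler.transform(toy_train)  # spans exactly [0, 1]
test_scaled = scaler.transform(toy_test)    # falls outside [0, 1]
print(train_scaled.ravel(), test_scaled.ravel())
```

This is expected behaviour rather than an error: reusing the training minimum and maximum keeps information about the test set out of the preprocessing.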

In [None]:
# Fitting Linear Regression
linreg = LinearRegression()
linreg.fit(train, target_train)

In [None]:
# FI diagram drawing
coeff_linreg = pd.DataFrame(train.columns)
coeff_linreg.columns = ['feature']
coeff_linreg["LinRegress"] = pd.Series(linreg.coef_)
coeff_linreg.sort_values(by='LinRegress', ascending=False)

In [None]:
# Eli5 visualization
eli5.show_weights(linreg)

In [None]:
# FI diagram saving
coeff_linreg["LinRegress"] = coeff_linreg["LinRegress"].abs()
feature_score = pd.merge(feature_score, coeff_linreg, on='feature')
feature_score = feature_score.fillna(0)
feature_score = feature_score.set_index('feature')
feature_score

In [None]:
# Prediction
y_train_linreg = linreg.predict(train)
y_preds_linreg = linreg.predict(test)

## 6. Comparison and merging of all feature importance diagrams <a class="anchor" id="6"></a>

[Back to Table of Contents](#0.1)

In [None]:
# MinMax scaling all feature importances
feature_score = pd.DataFrame(
    preprocessing.MinMaxScaler().fit_transform(feature_score),
    columns=feature_score.columns,
    index=feature_score.index
)

# Create mean column
feature_score['Mean'] = feature_score.mean(axis=1)

In [None]:
# Merging FI diagram

# Set weight of models (the three weights sum to 1)
w_lgb = 0.4
w_xgb = 0.5
w_linreg = 1 - w_lgb - w_xgb

# Create merging column with different weights
feature_score['Merging'] = w_lgb*feature_score['LGB'] + w_xgb*feature_score['XGB'] + w_linreg*feature_score['LinRegress']
feature_score.sort_values('Merging', ascending=False)
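
To make the scaling and merging arithmetic concrete, here is a small worked sketch on made-up importances for three hypothetical features `a`, `b` and `c`. The min-max step is written out in pandas, which should match what `MinMaxScaler` does column by column, and the weights are the ones set above.

```python
import pandas as pd

# Made-up raw importances on very different scales
fi = pd.DataFrame(
    {'LGB': [10, 30, 20], 'XGB': [200, 100, 400], 'LinRegress': [0.5, 0.1, 0.3]},
    index=pd.Index(['a', 'b', 'c'], name='feature'),
)

# Column-wise min-max scaling: each model's importances now span [0, 1]
fi_scaled = (fi - fi.min()) / (fi.max() - fi.min())

w_lgb, w_xgb = 0.4, 0.5
w_linreg = 1 - w_lgb - w_xgb

# Plain average and weighted average of the three scaled columns
fi_scaled['Mean'] = fi_scaled[['LGB', 'XGB', 'LinRegress']].mean(axis=1)
fi_scaled['Merging'] = (w_lgb * fi_scaled['LGB'] + w_xgb * fi_scaled['XGB']
                        + w_linreg * fi_scaled['LinRegress'])
print(fi_scaled.sort_values('Merging', ascending=False))
```

For feature `c`, the scaled values are 0.5, 1.0 and 0.5, so the merged score is 0.4·0.5 + 0.5·1.0 + 0.1·0.5 = 0.75.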

## 7. Feature Importance - Advanced Visualization <a class="anchor" id="7"></a>

[Back to Table of Contents](#0.1)

### 7.1 Matplotlib <a class="anchor" id="7.1"></a>

[Back to Table of Contents](#0.1)

In [None]:
# Plot the feature importances
plot_title = "Feature Importance - Advanced Visualization with Matplotlib"
feature_score.sort_values('Merging', ascending=False).plot(kind='bar', figsize=(20, 10), title = plot_title)

### 7.2 Pandas.parallel_coordinates <a class="anchor" id="7.2"></a>

[Back to Table of Contents](#0.1)

In [None]:
def plot_feature_parallel(df, title):
    # Draw Pandas.parallel_coordinates for features of the given df
    
    plt.figure(figsize=(15,12))
    parallel_coordinates(df, 'feature', colormap=plt.get_cmap("tab20c"), lw=3)
    plt.title(title)
    plt.xlabel("Models")
    plt.ylabel("Feature importance")
    plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
    plt.savefig('graph.png')
    plt.show()
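
For readers unfamiliar with `parallel_coordinates`, the following self-contained sketch draws a toy table with three hypothetical features. Each row becomes one polyline across the model columns, and the `feature` column supplies the legend labels. The `Agg` backend is selected only so that the sketch runs without a display; drop that line inside a notebook.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import parallel_coordinates

# Made-up scaled importances for three hypothetical features
toy = pd.DataFrame({
    'feature': ['a', 'b', 'c'],
    'LGB': [0.0, 1.0, 0.5],
    'XGB': [0.3, 0.0, 1.0],
    'LinRegress': [1.0, 0.0, 0.5],
})

fig, ax = plt.subplots(figsize=(8, 5))
# One polyline per row; the x axis lists the remaining (model) columns
parallel_coordinates(toy, 'feature', ax=ax, colormap=plt.get_cmap('tab20c'), lw=3)
ax.set_xlabel('Models')
ax.set_ylabel('Feature importance')

n_lines = len(ax.get_lines())  # row polylines plus the vertical axis lines
plt.close(fig)
```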

In [None]:
# List of models
feature_score_columns = feature_score.columns
feature_score_columns

In [None]:
feature_score = feature_score.reset_index(drop=False)
plot_feature_parallel(feature_score, f"All Feature Importance - Advanced Visualization")

In [None]:
feature_score

In [None]:
def features_selection_by_weights(df, threshold):
    # Select the features whose weight exceeds the threshold in at least one column (model)

    # True for every row that has at least one model column above the threshold
    is_best = (df[feature_score_columns] > threshold).any(axis=1)

    return df[is_best].reset_index(drop=True)

In [None]:
# Select the best features
threshold_fi = 0.25
feature_score_best = features_selection_by_weights(feature_score, threshold_fi)
feature_score_best

In [None]:
plot_feature_parallel(feature_score_best, "Feature Importance of the best features - Advanced Visualization")

At this point you can remove insignificant features or change the weights of the models' solutions. Alternatively, you can first check what accuracy the previously selected weights give, and then experiment with other values.
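
One possible way to experiment with the weights is a coarse grid search over all weight triples that sum to 1. The sketch below is only an illustration under stated assumptions: the target and the three prediction arrays are synthetic stand-ins, not outputs of the models in this notebook, and in practice the search would more safely be run on validation predictions than on the test set.

```python
import numpy as np
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Synthetic stand-ins for a validation target and three models' predictions
y_true = rng.normal(size=200)
pred_a = y_true + rng.normal(scale=0.3, size=200)  # most accurate stand-in
pred_b = y_true + rng.normal(scale=0.6, size=200)
pred_c = y_true + rng.normal(scale=1.0, size=200)  # least accurate stand-in

steps = 10  # weights move in increments of 1/steps
best_score, best_weights = -np.inf, None
for i in range(steps + 1):
    for j in range(steps + 1 - i):
        # The third weight is whatever remains, so the triple always sums to 1
        w_a, w_b, w_c = i / steps, j / steps, (steps - i - j) / steps
        blend = w_a * pred_a + w_b * pred_b + w_c * pred_c
        score = r2_score(y_true, blend)
        if score > best_score:
            best_score, best_weights = score, (w_a, w_b, w_c)

print(best_weights, round(best_score, 3))
```

Because the grid contains the triples (1, 0, 0), (0, 1, 0) and (0, 0, 1), the best blend found this way is never worse on this data than the best single model.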

## 8. Analysis of data forecasting results<a class="anchor" id="8"></a>

[Back to Table of Contents](#0.1)

In [None]:
# Target for test data
target_test[:10].values

In [None]:
# Mean solution
y_train_mean = (y_train_lgb + y_train_xgb + y_train_linreg)/3  # for training data
y_preds_mean = (y_preds_lgb + y_preds_xgb + y_preds_linreg)/3  # for test data
y_preds_mean[:10]

In [None]:
# Merging solutions
y_train = w_lgb*y_train_lgb + w_xgb*y_train_xgb + w_linreg*y_train_linreg  # for training data
y_preds = w_lgb*y_preds_lgb + w_xgb*y_preds_xgb + w_linreg*y_preds_linreg  # for test data
y_preds[:10]

In [None]:
def plot_data(target, y, y_mean, y_lgb, y_xgb, y_lr, title):
    # Draw a plot with the given title, target and predicted y
    
    def acc(y_pred):
        return str(round(r2_score(target,y_pred),2))
    
    x = np.arange(len(target))
    plt.figure(figsize=(16,10))
    plt.scatter(x, target, label = "Target data", color = 'k', s=100)
    plt.plot(x, y_lgb, label = f"Model LGB forecast ({acc(y_lgb)})", color = 'b')
    plt.plot(x, y_xgb, label = f"Model XGB forecast ({acc(y_xgb)})", color = 'orange')
    plt.plot(x, y_lr, label = f"Model Linear Regression forecast ({acc(y_lr)})", color = 'g')
    plt.plot(x, y_mean, label = f"Mean forecast ({acc(y_mean)})", color = 'r')
    plt.plot(x, y, label = f"Merging forecasts ({acc(y)})", color = 'purple')
    plt.plot(x, np.full(len(target), 0.5), label = "Maximum allowable value", color = 'brown')
    plt.title(title)
    plt.legend(loc='best')
    plt.grid(True)

In [None]:
# Building plots
plot_data(target_train, y_train, y_train_mean, y_train_lgb, y_train_xgb, y_train_linreg, 'Prediction for the training data (r2_score metrics)')
plot_data(target_test, y_preds, y_preds_mean, y_preds_lgb, y_preds_xgb, y_preds_linreg, 'Prediction for the test data (r2_score metrics)')

The analysis showed that:
1. The average solution ("Mean") is better than the solution combined with the chosen weights ("Merging"), which means that those weights were poorly chosen and need to be changed.
2. The r2_score of the models on the test dataset is very low, which means that the models still need to be improved or combined into more complex ensembles.
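
When reading the r2_score values in the plot legends, it helps to recall the definition $R^2 = 1 - SS_{res}/SS_{tot}$: a value of 1 is a perfect fit, 0 is no better than always predicting the mean of the target, and negative values are worse than that. The sketch below recomputes it by hand on made-up numbers; `r2_manual` is a hypothetical helper written for this illustration, not part of scikit-learn.

```python
import numpy as np

def r2_manual(y_true, y_pred):
    # Coefficient of determination: 1 - residual sum of squares / total sum of squares
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot

y = [1.0, 2.0, 3.0, 4.0]
perfect = r2_manual(y, y)                      # predictions equal the target
baseline = r2_manual(y, [2.5, 2.5, 2.5, 2.5])  # always predict the mean
worse = r2_manual(y, [4.0, 3.0, 2.0, 1.0])     # predictions in reversed order
print(perfect, baseline, worse)
```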

I hope you find this kernel useful and enjoyable.

Your comments and feedback are most welcome.

[Go to Top](#0)