In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

<center style="font-family:verdana;"><h1 style="font-size:200%; padding: 10px; background: #001f3f;"><b style="color:orange;">Dominance-Analysis : A Python Library for Accurate and Intuitive Relative Importance of Predictors</b></h1></center>

"This package is designed to determine relative importance of predictors for both regression and classification models. The determination of relative importance depends on how one defines importance; Budescu (1993) and Azen and Budescu (2003) proposed using dominance analysis (DA) because it invokes a general and intuitive definition of "relative importance" that is based on the additional contribution of a predictor in all subset models. The purpose of determining predictor importance in the context of DA is not model selection but rather uncovering the individual contributions of the predictors."

"In case the target is a continuous variable, the package determines the dominance of one predictor over another by comparing their incremental R-squared contribution across all subset models. In case the target variable is binary, the package determines the dominance over another by comparing their incremental Pseudo R-Squared contribution across all subset models."

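The pairwise comparison described above can be sketched with plain NumPy. This is a minimal illustration on synthetic data, not the package's implementation: `r_squared` and `completely_dominates` are hypothetical helper names, and the strict "larger in every subset" rule is an assumption based on the usual definition of complete dominance.

```python
from itertools import combinations

import numpy as np


def r_squared(X, y, cols):
    """R-squared of an OLS fit of y on the chosen columns of X plus an intercept."""
    if len(cols) == 0:
        return 0.0  # an intercept-only model explains no variance
    A = np.column_stack([np.ones(len(y)), X[:, list(cols)]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    total = ((y - y.mean()) ** 2).sum()
    return 1.0 - (resid @ resid) / total


def completely_dominates(X, y, i, j):
    """True if predictor i adds more R-squared than predictor j in every subset model."""
    others = [c for c in range(X.shape[1]) if c not in (i, j)]
    for k in range(len(others) + 1):
        for subset in combinations(others, k):
            base = r_squared(X, y, subset)
            gain_i = r_squared(X, y, subset + (i,)) - base
            gain_j = r_squared(X, y, subset + (j,)) - base
            if gain_i <= gain_j:
                return False
    return True


# Synthetic data: column 0 drives the target, column 2 is pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=200)

print(completely_dominates(X, y, 0, 2))
print(completely_dominates(X, y, 2, 0))
```

Every subset of the remaining predictors is refitted twice, so the cost grows exponentially with the number of predictors, which is presumably why the package limits itself to the `top_k` features.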
Authors & License

"The Dominance Analysis package is based on the concept developed by Azen and Budescu. This package is released under a MIT License. Dominance Analysis Python package has been developed by Shashank Shekhar, Sajan Bhagat, Kunjithapatham Sivakumar and Bala Koteshwar Kolluri . Pull requests submitted to the GitHub Repo are highly encouraged!"

https://github.com/dominance-analysis/dominance-analysis

In [None]:
train = pd.read_csv('../input/tabular-playground-series-jul-2021/train.csv')
test = pd.read_csv('../input/tabular-playground-series-jul-2021/test.csv')
sample = pd.read_csv('../input/tabular-playground-series-jul-2021/sample_submission.csv')

In [None]:
!pip install dominance-analysis

#Without Tornado 5.1, the package raises "AttributeError: module 'tornado.ioloop' has no attribute '_Selectable'".

In [None]:
!pip3 install tornado==5.1

#<font color="#EC7063">Important Parameters</font>

data : Complete Dataset, should be a Pandas DataFrame.

target : Name of the target variable, it should be present in passed dataset.

top_k : Number of features to choose from all available features. By default, the package will run for the top 15 features.

objective : It can take the value 0 or 1: 0 for Classification and 1 for Regression. By default, the package will run for Regression.

pseudo_r2 : It can take one of the Pseudo R-Squared measures - "mcfadden", "nagelkerke", "cox_and_snell" or "estrella", where default="mcfadden". It's not needed in case of regression (objective=1).

data_format : It can take the value 0, 1 or 2. 0 is for raw data, 1 is when a correlation matrix (correlation of predictors with the target variable) is being passed, 2 is when a covariance matrix (covariance of predictors with the target variable) is being passed. By default, the package will run for raw data (data_format=0). This parameter is not needed in case of classification.
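The `pseudo_r2` options are named after standard goodness-of-fit measures for binary models. As a rough sketch of what three of them compute, assuming the textbook definitions (this is not taken from the package's source; the `pseudo_r2` helper and the log-likelihood values are hypothetical, and "estrella" is left out):

```python
import math


def pseudo_r2(ll_model, ll_null, n):
    """Pseudo R-squared measures from the fitted and intercept-only log-likelihoods.

    ll_model: log-likelihood of the fitted model
    ll_null:  log-likelihood of the intercept-only model
    n:        number of observations
    """
    mcfadden = 1.0 - ll_model / ll_null
    cox_and_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    # Cox and Snell cannot reach 1 for binary targets; Nagelkerke rescales by its maximum.
    max_cox_and_snell = 1.0 - math.exp(2.0 * ll_null / n)
    nagelkerke = cox_and_snell / max_cox_and_snell
    return {
        "mcfadden": mcfadden,
        "cox_and_snell": cox_and_snell,
        "nagelkerke": nagelkerke,
    }


# Hypothetical log-likelihoods for a binary classifier fitted on 100 rows.
scores = pseudo_r2(ll_model=-50.0, ll_null=-69.3, n=100)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

All three compare the fitted model with an intercept-only model, so they play the role that R-squared plays in the regression case.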

https://github.com/dominance-analysis/dominance-analysis

In [None]:
from dominance_analysis import Dominance

# `train` was already loaded above, so preview it instead of reading the CSV again.
train.head()

#<font color="#EC7063">Dominance Analysis - The Significance!</font>

"Dominance Analysis, according to Azen and Budescu meets three important criteria for measuring relative importance. First, the technique should be defined in terms of its ability to reduce error in predicting the outcome variable. Next, it should permit direct comparison of measures within a model (that is, X1 is twice as important as X2). Finally, the technique should permit inferences concerning an attribute's direct effect (that is, when considered by itself), total effect (that is, when considered with other attributes) and partial effect (that is, when considered with various combinations of other predictors). Hence, Dominance analysis is both robust and intuitive and its interpretation is also very straightforward."

https://github.com/dominance-analysis/dominance-analysis

#The file has 3 targets (pollutant gases), but I'll work with only one.

I can barely handle one target. Imagine three. No way! That's out of my league.

In [None]:
#Code by Mileta https://www.kaggle.com/mileta1976/dominance-analysis-instant-gratification/notebook

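# Predictors: every column except the timestamp and the chosen target.
# The other two pollutant targets stay in `cols`, so they are treated as predictors.
cols = [c for c in train.columns if c not in ['date_time', 'target_carbon_monoxide']]
# All columns except the timestamp (predictors plus all three targets).
cols2 = [c for c in train.columns if c not in ['date_time']]

# Negative values in columns are not allowed, so we rescale every column in `cols` to [0, 1] with MinMaxScaler.

from sklearn import preprocessing
scaler = preprocessing.MinMaxScaler()
train_scaled = pd.DataFrame(scaler.fit_transform(train[cols]))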
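The rescaling used here can be sketched by hand to show why it removes negative values. This is a minimal NumPy illustration on made-up numbers; `min_max_scale` is a hypothetical helper, not part of scikit-learn, and it assumes the default target range of [0, 1].

```python
import numpy as np


def min_max_scale(X):
    """Rescale each column to [0, 1] with (x - min) / (max - min)."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # constant columns map to 0 instead of dividing by zero
    return (X - col_min) / col_range


# Made-up readings: the first column contains negative values.
raw = np.array([[-5.0, 10.0],
                [0.0, 20.0],
                [5.0, 40.0]])
scaled = min_max_scale(raw)
print(scaled)
```

The smallest value in each column becomes 0 and the largest becomes 1, so the ordering within a column is preserved while every entry ends up non-negative.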

In [None]:
#Code by Mileta https://www.kaggle.com/mileta1976/dominance-analysis-instant-gratification/notebook

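# Name the scaled predictor columns first, then append the unscaled target.
# Assigning train[cols2].columns here would mislabel columns whenever the target is not the last column of train.
train_scaled.columns = cols
train_scaled['target_carbon_monoxide'] = train['target_carbon_monoxide']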
train_scaled.head()

In [None]:
#https://github.com/dominance-analysis/dominance-analysis
dominance_regression=Dominance(data=train_scaled,target='target_carbon_monoxide',objective=1)

In [None]:
#https://github.com/dominance-analysis/dominance-analysis

incr_variable_rsquare=dominance_regression.incremental_rsquare()

In [None]:
#https://github.com/dominance-analysis/dominance-analysis

dominance_regression.plot_incremental_rsquare()

#<font color="#EC7063">Dominance Statistics</font>

"A relative importance measure should be able to describe a predictor's direct, total and partial effect, therefore in the Dominance Statistics, the authors have come up with four different types of Dominance measures. These measures have been conceptualized, defined and formulated by us and are unique to this library. Below are the definitions and interpretations of the measures:"

"Interactional Dominance - This is the incremental $R^2$ contribution of the predictor to the complete model. Hence, the Interactional Dominance of a particular predictor 'X' will be the difference between the $R^2$ of the complete model and the $R^2$ of the model with all other predictors except the particular predictor 'X'.
Consider a scenario when they have Y as the dependent variable and four predictors $X_1$, $X_2$, $X_3$ and $X_4$. Let $R^2_{Y \cdot X_1,X_2}$ be the $R^2$ of the model between Y and $X_1$, $X_2$; $R^2_{Y \cdot X_1,X_3}$ be the $R^2$ of the model between Y and $X_1$, $X_3$; and so on. In this case, the interactional dominance of predictor $X_1$ will be $R^2_{Y \cdot X_1,X_2,X_3,X_4} - R^2_{Y \cdot X_2,X_3,X_4}$.
Hence, interactional dominance can be interpreted as the incremental impact or incremental variability explained by the predictor in presence of all other predictors."

"Individual Dominance - The individual dominance of a predictor is the $R^2$ of the model between the dependent variable and the predictor. So, the individual dominance of predictor $X_1$ will be $R^2_{Y \cdot X_1}$.
Hence, individual dominance can be interpreted as the variability explained by the predictor alone or the quantum of impact that a predictor will have in absence of all other predictors."

"Average Partial Dominance - This is the average of the average incremental $R^2$ contributions of the predictor to all subset models except the complete model and the bi-variate (when only one predictor is present) model.
Hence, this can be interpreted as the average impact that a predictor has when it is available in all possible combinations with other predictors except the combination when all predictors are available."

"Total Dominance - The last measure of dominance summarizes the additional contributions of each predictor to all subset models by averaging all the conditional values. This consists of averaging the four averaged entries in each column."

https://github.com/dominance-analysis/dominance-analysis
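The four measures defined above can be reproduced on a small synthetic problem. The sketch below is an illustration under stated assumptions, not the package's code: `r_squared` and `dominance_stats` are hypothetical helpers, the data is random, and the averaging follows the definitions quoted above (average within each subset size, then across sizes).

```python
from itertools import combinations

import numpy as np


def r_squared(X, y, cols):
    """R-squared of an OLS fit of y on the chosen columns of X plus an intercept."""
    if len(cols) == 0:
        return 0.0  # an intercept-only model explains no variance
    A = np.column_stack([np.ones(len(y)), X[:, list(cols)]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    total = ((y - y.mean()) ** 2).sum()
    return 1.0 - (resid @ resid) / total


def dominance_stats(X, y):
    """Individual, interactional, average partial and total dominance per predictor.

    Needs at least three predictors so that the 'in between' subset sizes exist.
    """
    p = X.shape[1]
    stats = {}
    for j in range(p):
        others = [c for c in range(p) if c != j]
        # Average incremental R-squared of predictor j for each subset size k = 0 .. p-1.
        by_size = []
        for k in range(p):
            gains = [r_squared(X, y, s + (j,)) - r_squared(X, y, s)
                     for s in combinations(others, k)]
            by_size.append(float(np.mean(gains)))
        stats[j] = {
            "individual": by_size[0],                          # predictor alone
            "interactional": by_size[-1],                      # added to all other predictors
            "average_partial": float(np.mean(by_size[1:-1])),  # subset sizes in between
            "total": float(np.mean(by_size)),                  # average over all sizes
        }
    return stats


# Synthetic data with four predictors; the last one is pure noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(size=300)

stats = dominance_stats(X, y)
for j, s in stats.items():
    print(j, {name: round(value, 4) for name, value in s.items()})
```

With four predictors, `by_size` holds four averaged entries per predictor, matching the description of Total Dominance. Averaged this way, the total dominance values add up to the $R^2$ of the complete model, which is what makes them comparable as shares of explained variance.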

In [None]:
#https://github.com/dominance-analysis/dominance-analysis

dominance_regression.dominance_stats()

#<font color="#EC7063">Dominance Level</font>

In [None]:
dominance_regression.dominance_level()

#<font color="#EC7063">References</font>

Azen, R. (2000). Inference for predictor comparisons: Dominance analysis and the distribution of R2 differences. Dissertation Abstracts International B, 61/10, 5616.

Azen, R., Budescu, D. V., & Reiser, B. (2001). Criticality of predictors in multiple regression. British Journal of Mathematical and Statistical Psychology, 54, 201–225.

Azen, R., Budescu, D. V. (2003). The Dominance Analysis Approach for Comparing Predictors in Multiple Regression. Psychological Methods, 2003, Vol. 8, No. 2, 129–148. https://doi.org/10.1037/1082-989X.8.2.129

Azen, R., Budescu, D. V. (2006). Comparing Predictors in Multivariate Regression Models: An Extension of Dominance Analysis. Journal of Educational and Behavioral Statistics Summer 2006, Vol. 31, No. 2, pp. 157-180. https://doi.org/10.3102/10769986031002157

Azen, R., Traxel, N. (2009). Using Dominance Analysis to Determine Predictor Importance in Logistic Regression. Journal of Educational and Behavioral Statistics September 2009, Vol. 34, No. 3, pp. 319-347. https://doi.org/10.3102/1076998609332754

Budescu, D. V. (1993). Dominance analysis: A new approach to the problem of relative importance of predictors in multiple regression. Psychological Bulletin, 114(3), 542-551. https://doi.org/10.1037/0033-2909.114.3.542

Luo, W., & Azen, R. (2013). Determining Predictor Importance in Hierarchical Linear Models Using Dominance Analysis. Journal of Educational and Behavioral Statistics, 38(1), 3-31. https://doi.org/10.3102/1076998612458319


#Dominance Analysis 

"This package can be used for dominance analysis or Shapley Value Regression for finding relative importance of predictors on given dataset. This library can be used for key driver analysis or marginal resource allocation models."

https://github.com/dominance-analysis/dominance-analysis