
feature importance [enhancement] #27

Closed
mglowacki100 opened this issue Sep 4, 2019 · 8 comments

@mglowacki100

It'd be nice to have 'feature importance' exposed in the same way as in sklearn.

@mglowacki100
Author

It can be done with the rfpimp library and by monkey-patching AutoML with a scikit-learn style score method (a quick temporary fix).
Here is sample code:

# https://github.com/parrt/random-forest-importances
# !pip install rfpimp
import rfpimp

import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

from supervised.automl import AutoML

df = pd.read_csv(...)

df_train, df_test = train_test_split(df, test_size=0.20)

X_train, y_train = df_train.drop('Target', axis=1), df_train['Target']
X_test, y_test = df_test.drop('Target', axis=1), df_test['Target']

# add a random column as a baseline to compare importances against
X_train['random'] = np.random.random(size=len(X_train))
X_test['random'] = np.random.random(size=len(X_test))

print('training')
automl = AutoML(...)
automl.fit(X_train, y_train)

# monkey-patch AutoML with a scikit-learn style score() so rfpimp can call it
def score(self, X, y, sample_weight=None):
    return accuracy_score(y, self.predict(X)['label'], sample_weight=sample_weight)

setattr(AutoML, 'score', score)

print('feature importance')
imp = rfpimp.importances(automl, X_test, y_test)  # permutation importance
viz = rfpimp.plot_importances(imp)
viz.view()

@pplonski pplonski added this to To do in mljar-supervised Oct 22, 2019
@pplonski pplonski added the enhancement New feature or request label Apr 8, 2020
@pplonski pplonski self-assigned this Apr 8, 2020
@pplonski pplonski added this to the version 0.2.0 milestone Apr 8, 2020
@pplonski pplonski modified the milestones: version 0.2.0, version 0.3.0 Apr 16, 2020
@pplonski pplonski moved this from To do to In progress in mljar-supervised Apr 24, 2020
@pplonski pplonski pinned this issue Apr 24, 2020
@pplonski
Contributor

pplonski commented Apr 24, 2020

sklearn supports feature importance: https://scikit-learn.org/stable/modules/generated/sklearn.inspection.permutation_importance.html

I will use their implementation. (I need to add predict_proba to the algorithms interface just to be compatible with the sklearn interface.)
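
For reference, a minimal sketch of calling sklearn's permutation_importance on an already fitted estimator; the model, X_test and y_test names below are placeholders, not the actual mljar-supervised integration:

import pandas as pd
from sklearn.inspection import permutation_importance

# `model` needs fit/predict; passing an explicit scorer avoids requiring a score() method
result = permutation_importance(
    model, X_test, y_test,
    scoring="accuracy",  # assumption: a classification task
    n_repeats=5,
    random_state=42,
)

importances = pd.Series(result.importances_mean, index=X_test.columns)
print(importances.sort_values(ascending=False))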

pplonski added a commit that referenced this issue Apr 27, 2020
pplonski added a commit that referenced this issue Apr 27, 2020
@pplonski
Contributor

For each fold, the feature importance is computed with the permutation method. The plot displays the top 25 features. All importance values are saved to a CSV file.

An example of the plot:
[image: per-fold permutation feature importance plot]
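
A rough sketch of reading that CSV back and plotting the 25 strongest features; the file name and column layout here are illustrative, not the exact artifacts mljar-supervised writes:

import matplotlib.pyplot as plt
import pandas as pd

# illustrative file name; the real CSV layout is discussed further down in this thread
imp = pd.read_csv("feature_importance.csv", index_col=0)

# average importance across folds/learners and keep the 25 strongest features
top25 = imp.mean(axis=1).sort_values(ascending=False).head(25)

top25.sort_values().plot(kind="barh", figsize=(6, 8))
plt.xlabel("mean permutation importance")
plt.tight_layout()
plt.show()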

@pplonski pplonski unpinned this issue Apr 27, 2020
@pplonski pplonski moved this from In progress to Done in mljar-supervised May 4, 2020
@Tonywhitemin

Hi pplonski,
Sorry to keep bothering you...
I tried to understand the "features_scores_threshold_2.5.csv" file, shown in the following table.
Some questions are listed below, could you help?

  1. Do the numbers under learnerX indicate the importance of the feature? How are they calculated?
  2. How can I tell which feature corresponds to each column?
  3. How is the "counter" value calculated?
  4. It seems that the features in this CSV file may include the golden features, right?

[image: excerpt of the features_scores_threshold_2.5.csv table]

@pplonski
Contributor

pplonski commented Jun 2, 2022

@Tonywhitemin the feature importance is computed for each learner, using the permutation method. Each learner has a vector with an importance value for each feature. The columns in the CSV are joined on features; the first row is the first feature from the dataset.

When computing the importance, a random feature is injected into the dataset. The counter keeps track of how many times a feature had lower importance than the random feature.
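
A tiny sketch of what that counter logic amounts to, assuming an importance table with one column per learner and a row for the injected random feature (all names and values below are illustrative):

import pandas as pd

# illustrative importance table: rows = features, columns = learners
imp = pd.DataFrame(
    {"learner_1": [0.30, 0.02, 0.10, 0.05],
     "learner_2": [0.25, 0.01, 0.12, 0.04]},
    index=["age", "zip_code", "income", "random_feature"],
)

random_row = imp.loc["random_feature"]

# counter: in how many learners a feature scored below the random feature
counter = imp.drop(index="random_feature").lt(random_row, axis="columns").sum(axis=1)
print(counter)  # age 0, zip_code 2, income 0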

@Tonywhitemin

@pplonski Thanks for your reply!
As you said, "The counter keeps track of how many times a feature had lower importance than the random feature."
But can we see the random feature's importance values in this CSV file? (Or which row holds the random feature's values?)

By the way, it seems that the features in this CSV file may include some of the golden features, is that correct?
Thank you!

@pplonski
Contributor

pplonski commented Jun 2, 2022

@Tonywhitemin you will need to check that in the code ... I don't remember all the details, sorry!

@Tonywhitemin

Hi @pplonski,
Please don't say sorry, you've helped a lot, I really appreciate your help!
Have a nice day!
