featexp

Feature exploration for supervised learning. Helps with feature understanding, identifying noisy features, feature debugging, leakage detection and model monitoring.

Installation

pip install featexp

Using featexp

Detailed Medium post on using featexp.

featexp draws plots similar to partial dependence plots, but directly from data instead of using a trained model like current implementations of pdp do. Since, it draws plots from data directly, it helps with understanding the features well and building better ML models.

from featexp import get_univariate_plots
get_univariate_plots(data=data_train, target_col='target', data_test=data_test, features_list=['DAYS_EMPLOYED'])

# data_test and features_list are optional. 
# Draws plots for all columns if features_list not passed
# Draws only train data plots if no test_data passed

featexp bins a feature into equal population bins and shows mean value of dependent variable (target) in each bin. Here's how to read these plots:

Trend plot on left helps you understand the relationship between target and feature.
Population distribution helps you make sure the feature is correct.
Also, shows number of trend direction changes and correlation between train and test trend which can be used to identify noisy features. High number of trend changes or low trend correlation implies high noise.

Example of noisy feature: Has low trend correlation

Getting binned feature stats

Returns mean target and population in each bin of a feature

from featexp import univariate_plotter
binned_data_train, binned_data_test = univariate_plotter(data=data_train, target_col='target', feature='DAYS_EMPLOYED', data_test=data_test)
# For only train data
binned_data_train = univariate_plotter(data=data_train, target_col='target', feature='DAYS_EMPLOYED')

Getting stats for all features:

Returns trend changes and trend correlation for all features in a dataframe

from featexp import get_trend_stats_feature
stats = get_trend_stats(data=data_train, target_col='target', data_test=data_test)

# data_test is optional. If not passed, trend correlations aren't calculated.

Returns a dataframe with trend changes and trend correlation which can be used for dropping the noisy features, etc.

Leakage detection

Helps with identifying why a feature is leaky which helps with debugging.

Nulls have 0% mean target and 100% mean target in other bins. Implies this feature is populated only for target = 1.

Name		Name	Last commit message	Last commit date
Latest commit History 80 Commits
demo/sample_outputs		demo/sample_outputs
featexp		featexp
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
featexp_demo.ipynb		featexp_demo.ipynb
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

demo/sample_outputs

demo/sample_outputs

featexp

featexp

.gitignore

.gitignore

LICENSE

LICENSE

README.md

README.md

featexp_demo.ipynb

featexp_demo.ipynb

setup.py

setup.py

Repository files navigation

featexp

Installation

Using featexp

Getting binned feature stats

Getting stats for all features:

Leakage detection

About

Releases

Packages

Languages

License

skkeyan/featexp

Folders and files

Latest commit

History

Repository files navigation

featexp

Installation

Using featexp

Getting binned feature stats

Getting stats for all features:

Leakage detection

About

Resources

License

Stars

Watchers

Forks

Languages