Encode binary features based on binned continuous variables in Pandas.
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
pandas_bin_continuous
.gitattributes
.gitignore
LICENSE
README.md
example.ipynb
setup.py

README.md

Pandas-Bin-Continuous

A Pandas extension to allow you to encode binary features based on binned continuous values.

Sometimes keeping continous variables as floating point values or integers may actually throw off your model weights. It may be more useful to encode your continuous variables as seperate binary features decided by their value in different bins, which this extension should address.

Installation

git clone https://github.com/jtloong/pandas-bin-continous
cd pandas-bin-continuous
pip install .

Usage

dataframe = pandas_bin_continuous.create_features(dataframe, bin_edges, feature_name)

Where bin_edges is a list of the different bounds of the bins. The bin_edges follows the same usage of bin lists as in pandas.cut()

You can also view example.ipynb for a more full example of usage.

Notes

Realized after I made this, that sklearn has a binarizer class in their preprocessing module. However, I think this is slightly more useful for features with a wide range of continuous values that you need to seperate into many bins.