Monotonic-WOE-Binning-Algorithm

Developed and documented by John Stephen Joseph Arul Selvam

How to use

Install the package from PyPI: pip install monotonic-binning (earlier versions were hosted on test.pypi.org, but the latest version is on pypi.org)
Import monotonic_woe_binning: from monotonic_binning import monotonic_woe_binning as bin
Use fit and transform to bin variables for train and test datasets respectively
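
The fit/transform split in the last step can be illustrated with a small stand-in class. This is only a sketch of the general pattern (learn cutoffs on the training data, then reuse them unchanged on the test data). QuantileBinner, its n_bins argument, and the age values are hypothetical and are not this package's API; for the package's actual class and arguments, see tests/demo_run.py.

```python
from bisect import bisect_right


class QuantileBinner:
    """Hypothetical stand-in used only to show the fit/transform pattern."""

    def __init__(self, n_bins=3):
        self.n_bins = n_bins
        self.cutoffs = None

    def fit(self, values):
        # Learn equal-frequency cut points from the training data only.
        ordered = sorted(values)
        n = len(ordered)
        self.cutoffs = [ordered[(i * n) // self.n_bins] for i in range(1, self.n_bins)]
        return self

    def transform(self, values):
        # Apply the cut points learned in fit; nothing is re-estimated here.
        if self.cutoffs is None:
            raise ValueError("call fit before transform")
        return [bisect_right(self.cutoffs, v) for v in values]


train_ages = [22, 25, 31, 38, 44, 52, 60, 67, 71]  # made-up training values
test_ages = [19, 40, 80]                           # made-up test values

binner = QuantileBinner(n_bins=3).fit(train_ages)
test_bins = binner.transform(test_ages)  # bin index for each test value
```

Keeping the cutoffs fixed after fit is what makes the bins on the test set comparable to those on the training set.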

Demo Run Details

The demo_run.py file available under tests/ uses German credit card data from Penn State's online course and gives an overview of how to use the package.

Summary of Monotonic WOE

The weight-of-evidence (WOE) method of evaluating the strength of predictors is underappreciated in the field of analytics. While it is standard fare in credit risk modelling, it is under-utilized in other settings, even though its formulation is generic enough for use in other domains. The WOE method primarily aims to bin variables into buckets that deliver the most information to a potential classification model. WOE binning methods often measure the effectiveness of such bins using Information Value (IV). For a more detailed introduction to WOE and IV, this article is a useful read.
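
As a rough illustration of the quantities involved, the sketch below computes WOE and IV from per-bin counts using the common definitions WOE = ln(share of non-defaults / share of defaults) and IV = sum over bins of (share of non-defaults - share of defaults) * WOE. The function name woe_iv and the counts are made up for illustration and are not part of this package. The sign convention is an assumption, chosen so that a higher WOE corresponds to a lower default rate.

```python
import math


def woe_iv(bins):
    """Compute per-bin WOE and total IV.

    bins: list of (n_good, n_bad) counts, where good = non-default (0)
    and bad = default (1). Assumes every bin has at least one of each,
    otherwise the logarithm is undefined.
    """
    total_good = sum(good for good, _ in bins)
    total_bad = sum(bad for _, bad in bins)
    woes = []
    iv = 0.0
    for good, bad in bins:
        dist_good = good / total_good  # share of all non-defaults in this bin
        dist_bad = bad / total_bad     # share of all defaults in this bin
        woe = math.log(dist_good / dist_bad)
        woes.append(woe)
        iv += (dist_good - dist_bad) * woe
    return woes, iv


# Hypothetical counts for three bins: (non-defaults, defaults)
example_bins = [(100, 50), (200, 30), (300, 20)]
woes, iv = woe_iv(example_bins)
```

In this made-up example the default rate falls from the first bin to the last, so the WOE values rise from bin to bin.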

In the world of credit risk modelling, regulatory oversight often requires that the variables that go into models are split into bins that

have weight of evidence (WOE) values that maintain a monotonic relationship with the 1/0 target variable (for example, loan default or no default),
are reasonably sized and large enough to be representative of population segments, and
maximize the IV of the given variable in the process of this binning.
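
The first two requirements are simple to check for a candidate binning; the third is a comparison across candidates. A minimal sketch follows, assuming the WOE values and bin sizes have already been computed. The helper names is_monotonic and check_bins, the min_size threshold, and the numbers are all hypothetical.

```python
def is_monotonic(values):
    """True if values are entirely non-decreasing or entirely non-increasing."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


def check_bins(woes, sizes, min_size):
    """Check WOE monotonicity and minimum bin size for one candidate binning."""
    return {
        "monotonic_woe": is_monotonic(woes),
        "large_enough": all(n >= min_size for n in sizes),
    }


# Made-up candidate: WOE per bin (ordered by the predictor) and bin sizes
good_candidate = check_bins([-1.1, 0.1, 0.9], [150, 230, 320], min_size=100)
bad_candidate = check_bins([0.2, -0.3, 0.5], [150, 40, 320], min_size=100)
```

Among the candidates that pass both checks, the one with the highest IV would be preferred.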

To illustrate the constraints of such a problem, consider a simple dataset containing age and a default indicator (1 if defaulted, 0 if not). The following is a possible scenario in which the variable is binned into three groups in such a manner that their WOE values decrease monotonically as the ages of customers increase.

The WOE is derived in such a manner that as the WOE value increases, the default rate decreases. So we can infer that younger customers are more likely to default in comparison to older customers.

Arriving at the perfect bin cutoffs to meet all three requirements discussed earlier is a non-trivial exercise. Most statistical software packages provide this type of optimal discretization of interval variables. R's smbinning package and SAS' proc transreg are two such examples. To my knowledge, Python's solutions to this problem are fairly sparse.

This package is an attempt to complement already exhaustive packages like scorecardpy with the capability to bin variables with monotonic WOE.
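
To give a feel for how monotonicity can be enforced, the sketch below merges adjacent bins whenever their default rates break the desired order (a pool-adjacent-violators style pass). This is a simplified illustration only: it ignores bin-size and significance thresholds, and it is not this package's implementation. The function name merge_to_monotone and the counts are hypothetical.

```python
def merge_to_monotone(bins, increasing=True):
    """Merge adjacent bins until the default rate is monotone.

    bins: list of (n_total, n_bad) counts, ordered by the predictor value.
    increasing: desired direction of the default rate across bins.
    """
    merged = []
    for total, bad in bins:
        merged.append([total, bad])
        # Merge backwards while the last two bins violate the desired order.
        while len(merged) > 1:
            prev_rate = merged[-2][1] / merged[-2][0]
            last_rate = merged[-1][1] / merged[-1][0]
            violates = last_rate < prev_rate if increasing else last_rate > prev_rate
            if not violates:
                break
            last_total, last_bad = merged.pop()
            merged[-1][0] += last_total
            merged[-1][1] += last_bad
    return [tuple(b) for b in merged]


# Made-up fine-grained bins: (customers, defaults), ordered by the predictor
fine_bins = [(100, 30), (100, 35), (100, 20), (100, 10), (100, 12)]
coarse_bins = merge_to_monotone(fine_bins, increasing=False)
rates = [bad / total for total, bad in coarse_bins]
```

Here the first two and the last two fine bins are pooled, leaving three bins whose default rates decrease from one bin to the next.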

References

This algorithm is based on the excellent paper by Mironchyk and Tchistiakov (2017) named "Monotone optimal binning algorithm for credit risk modeling".
