compute best number of bins for a histogram

Optimal Binning

Routine to choose the optimal number of equally spaced bins for a histogram. Two methods are used, both based on Hogg. These are packaged into a scikit-learn transformer that takes a vector x and a binary label y and, once fit, returns for a given x the mean of y in the corresponding optimised bin.
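
As a rough illustration of the fit/transform behaviour described above, here is a minimal sketch in plain Python. The class name `MeanPerBinSketch`, the `n_bins` parameter and the fallback for empty bins are all assumptions made for this example, not the repository's API. A real scikit-learn transformer would normally subclass `BaseEstimator` and `TransformerMixin` and work on arrays.

```python
from bisect import bisect_right


class MeanPerBinSketch:
    """Hypothetical stand-in for the transformer: x -> mean of y in its bin."""

    def __init__(self, n_bins=10):
        self.n_bins = n_bins

    def fit(self, x, y):
        lo, hi = min(x), max(x)
        width = (hi - lo) / self.n_bins
        # Keep interior edges only, so values outside [lo, hi] land in the end bins.
        self.edges_ = [lo + width * k for k in range(1, self.n_bins)]
        sums = [0.0] * self.n_bins
        counts = [0] * self.n_bins
        for xi, yi in zip(x, y):
            k = bisect_right(self.edges_, xi)
            sums[k] += yi
            counts[k] += 1
        overall = sum(y) / len(y)
        # Empty bins fall back to the overall mean of y (a choice made for this sketch).
        self.bin_means_ = [s / c if c else overall for s, c in zip(sums, counts)]
        return self

    def transform(self, x):
        # Look up the fitted bin mean for each new value of x.
        return [self.bin_means_[bisect_right(self.edges_, xi)] for xi in x]


x = [0.1, 0.2, 0.4, 0.6, 0.8, 0.9]
y = [0, 0, 1, 0, 1, 1]
binner = MeanPerBinSketch(n_bins=2).fit(x, y)
print(binner.transform([0.15, 0.85]))
```

Here the number of bins is fixed by hand. In the routine described in this README it would be chosen by the optimisation step.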

There are three versions:

  • Work out the optimal number of equally spaced bins. This seems to work well.
  • Start with a given number of equally spaced bins, then recursively merge adjacent bins so long as the log likelihood increases.
  • Start with 1 bin and add bins from left to right so long as each addition increases the likelihood; once these have been added, use the recursive merging above to prune back.
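
The first two versions can be sketched as follows. This is not the repository's code: it assumes the criterion is a leave-one-out Bernoulli log likelihood with a smoothing parameter `alpha`, which is one plausible reading of "based on Hogg" for a binary label. The function names (`bin_counts`, `loo_log_likelihood`, `best_n_bins`, `merge_adjacent`) are hypothetical. The merge step is written as a greedy loop rather than a recursion, and the third version (adding bins left to right) is not covered.

```python
import math


def bin_counts(x, y, n_bins):
    """Return [n, positives] per bin for n_bins equal-width bins over x."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width when all x are equal
    counts = [[0, 0] for _ in range(n_bins)]
    for xi, yi in zip(x, y):
        k = min(int((xi - lo) / width), n_bins - 1)
        counts[k][0] += 1
        counts[k][1] += yi
    return counts


def loo_log_likelihood(counts, alpha=1.0):
    """Score each point with its bin's rate estimated without that point."""
    total = 0.0
    for n, pos in counts:
        neg = n - pos
        denom = n - 1 + 2 * alpha
        if pos:
            total += pos * math.log((pos - 1 + alpha) / denom)
        if neg:
            total += neg * math.log((neg - 1 + alpha) / denom)
    return total


def best_n_bins(x, y, max_bins=20, alpha=1.0):
    """Version 1: try every bin count up to max_bins and keep the best score."""
    scores = {m: loo_log_likelihood(bin_counts(x, y, m), alpha)
              for m in range(1, max_bins + 1)}
    return max(scores, key=scores.get)


def merge_adjacent(counts, alpha=1.0):
    """Version 2: merge adjacent bins while the score strictly increases."""
    counts = [list(c) for c in counts]
    while len(counts) > 1:
        current = loo_log_likelihood(counts, alpha)
        best_gain, best_trial = 0.0, None
        for k in range(len(counts) - 1):
            merged = [counts[k][0] + counts[k + 1][0],
                      counts[k][1] + counts[k + 1][1]]
            trial = counts[:k] + [merged] + counts[k + 2:]
            gain = loo_log_likelihood(trial, alpha) - current
            if gain > best_gain:
                best_gain, best_trial = gain, trial
        if best_trial is None:
            break  # no merge improves the score
        counts = best_trial
    return counts


# Toy data: y switches from 0 to 1 at x = 0.5.
x = [i / 100 for i in range(100)]
y = [1 if xi >= 0.5 else 0 for xi in x]
print(best_n_bins(x, y))
print(merge_adjacent(bin_counts(x, y, 4)))
```

On this toy step function the search settles on 2 bins, and merging from 4 bins collapses to the same two. Real data with noise will behave less cleanly.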

Neither recursive method works perfectly: both create too many bins. Neither of the algorithms I've got here does exactly what I want.
