# Binning Feature

<span>Binning a continuous feature into a few categorical groups summarizes sparse continuous values as a small number of data points. This is useful in some machine learning problems and makes the data easy to plot as a handful of bars in a bar plot. I use pandas' cut function to bin the data.</span>

### Import Preliminaries

In [1]:
# Import modules
import numpy as np
import pandas as pd 

# Create a dataframe
df = pd.DataFrame( data={ 'feature_1': np.random.randn(90)})

# Create a second feature as a copy of the first
df['feature_2'] = df.feature_1.copy()

# View the head of the dataframe
df.head(7)

,feature_1,feature_2
0,-1.333569,-1.333569
1,-1.289372,-1.289372
2,1.42309,1.42309
3,-0.255041,-0.255041
4,-0.632775,-0.632775
5,1.388692,1.388692
6,-1.373234,-1.373234
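The values above come from an unseeded random draw, so rerunning the cell gives different numbers. A minimal sketch of a reproducible variant follows, assuming NumPy's `default_rng` generator; the seed value `0` and the name `df_seeded` are arbitrary choices for illustration and are not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Seeded generator so the same 90 values are drawn on every run
# (the seed 0 is an arbitrary, illustrative choice)
rng = np.random.default_rng(0)

# Hypothetical dataframe name, mirroring the structure of df above
df_seeded = pd.DataFrame({'feature_1': rng.standard_normal(90)})

# Second feature starts out as a copy of the first
df_seeded['feature_2'] = df_seeded['feature_1'].copy()
```

With a fixed seed, the binned output and the bin counts stay the same between runs, which makes the later steps easier to compare.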


### Binning into N Groups

Passing an integer to the bins parameter splits the range of the feature into that many equal-width bins (5 here).

In [2]:
# Bin the second feature
df.feature_2 = pd.cut(df.feature_1, bins=5)

# View the head of the dataframe
df.head(7)

,feature_1,feature_2
0,-1.333569,"(-1.69, -0.575]"
1,-1.289372,"(-1.69, -0.575]"
2,1.42309,"(0.541, 1.657]"
3,-0.255041,"(-0.575, 0.541]"
4,-0.632775,"(-1.69, -0.575]"
5,1.388692,"(0.541, 1.657]"
6,-1.373234,"(-1.69, -0.575]"
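To see which bin edges pandas picked for an integer `bins` value, `pd.cut` accepts `retbins=True` and returns the edges alongside the binned data. The sketch below uses a small made-up series (`values` is illustrative data, not the notebook's `feature_1`).

```python
import numpy as np
import pandas as pd

# Illustrative data spanning 0 to 10
values = pd.Series([0.0, 1.0, 2.5, 4.0, 5.0, 7.5, 10.0])

# retbins=True returns the binned series and the array of bin edges
binned, edges = pd.cut(values, bins=5, retbins=True)

# Distance between consecutive edges, i.e. the width of each bin
widths = np.diff(edges)

print(edges)
print(widths)
```

Five bins need six edges. The bins are evenly spaced across the data's range, except that pandas extends the outer edge slightly so that the extreme values land inside a bin rather than on an open boundary.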


### Binning into Predefined Groups

Grouping the feature into bins with a defined range for each bin. This time we pass a list of bin edges, rather than an integer, to the bins parameter of the cut function.

In [3]:
# Bin based on predefined boundaries specified in our bin list
df.feature_2 = pd.cut(df.feature_1, bins=[-5,-3,-2,-1,0,1,2,3,5])

# View the head of the dataframe
df.head(7)

,feature_1,feature_2
0,-1.333569,"(-2, -1]"
1,-1.289372,"(-2, -1]"
2,1.42309,"(1, 2]"
3,-0.255041,"(-1, 0]"
4,-0.632775,"(-1, 0]"
5,1.388692,"(1, 2]"
6,-1.373234,"(-2, -1]"
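A few details of binning with explicit edges are easy to miss: intervals are closed on the right by default, values outside the outermost edges become `NaN`, and the `labels` parameter can replace the interval notation with names. The following is a rough sketch on made-up data; the series `values` and the `bin_0` ... `bin_7` label names are hypothetical.

```python
import pandas as pd

# Illustrative data; -6.0 lies outside the outermost edges
values = pd.Series([-6.0, -1.3, -1.0, 0.4, 1.4, 5.0])
edges = [-5, -3, -2, -1, 0, 1, 2, 3, 5]

# Default: intervals are open on the left, closed on the right, e.g. (-2, -1]
binned = pd.cut(values, bins=edges)

# right=False flips this to closed-left intervals, e.g. [-1, 0)
left_closed = pd.cut(values, bins=edges, right=False)

# Hypothetical labels, one per bin (there are len(edges) - 1 bins)
labels = [f'bin_{i}' for i in range(len(edges) - 1)]
labeled = pd.cut(values, bins=edges, labels=labels)

print(pd.DataFrame({'value': values, 'default': binned,
                    'left_closed': left_closed, 'labeled': labeled}))
```

Note how the value -1.0 falls into a different bin depending on `right`, and how 5.0 becomes `NaN` once the top interval is open on the right.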


### Counting Bins

After binning the data, you can call the value_counts method on the feature to get the number of observations that fall into each bin. Bins that contain no observations are still listed, with a count of 0.

In [4]:
# Create a dataframe of the value counts from feature 2
counts = pd.DataFrame(df.feature_2.value_counts())
counts.index.name = "Bins"
counts.columns = ['count']

# View feature 2 value counts
counts

Bins,count
"(0, 1]",30
"(-1, 0]",23
"(-2, -1]",17
"(1, 2]",14
"(2, 3]",4
"(-3, -2]",2
"(3, 5]",0
"(-5, -3]",0
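By default the counts are sorted by frequency, which scrambles the natural order of the bins. For a bar plot it is often more readable to keep the bins in interval order. Here is a small sketch of one way to do that with `sort_index`, plus `normalize=True` for proportions instead of raw counts; the data is made up for illustration.

```python
import pandas as pd

# Illustrative data and bin edges
values = pd.Series([-1.5, -0.5, -0.2, 0.3, 0.7, 0.9, 1.5])
binned = pd.cut(values, bins=[-2, -1, 0, 1, 2, 3])

# Sort by the bin intervals (lowest bin first) rather than by frequency
ordered_counts = binned.value_counts().sort_index()

# normalize=True returns the share of observations in each bin
shares = binned.value_counts(normalize=True).sort_index()

print(ordered_counts)
print(shares)
```

Because the binned column is an ordered categorical, sorting the index arranges the bins from lowest to highest interval, and empty bins such as (2, 3] remain in the result with a count of 0.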


Author: Kavi Sekhon