In [1]:
import pandas as pd
import numpy as np

In [2]:
ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32]

Let’s divide these into bins of 18 to 25, 26 to 35, 36 to 60, and finally 61 and older.
To do so, use pandas.cut:

In [3]:
bins = [18, 25, 35, 60, 100]

In [4]:
cats = pd.cut(ages, bins)
cats

[(18, 25], (18, 25], (18, 25], (25, 35], (18, 25], ..., (25, 35], (60, 100], (35, 60], (35, 60], (25, 35]]
Length: 12
Categories (4, interval[int64, right]): [(18, 25] < (25, 35] < (35, 60] < (60, 100]]

The object pandas returns is a special Categorical object. The output you see
describes the bins computed by pandas.cut. You can treat it like an array of strings
giving the bin name for each value. Internally, it holds a categories array with the
distinct bin names and a codes attribute that labels each element of ages with the
position of its bin:

In [5]:
cats.codes

array([0, 0, 0, 1, 0, 0, 2, 1, 3, 2, 2, 1], dtype=int8)

In [6]:
cats.categories

IntervalIndex([(18, 25], (25, 35], (35, 60], (60, 100]], dtype='interval[int64, right]')
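
To make the relationship between codes and categories concrete, here is a minimal sketch that rebuilds each age's bin by indexing categories with codes. It reuses the ages and bins defined above; the names rebuilt and first_bin are only illustrative.

```python
import pandas as pd

ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32]
bins = [18, 25, 35, 60, 100]
cats = pd.cut(ages, bins)

# Each code is the position of that age's bin within cats.categories,
# so taking categories at those positions recovers the bin for every age.
rebuilt = cats.categories.take(cats.codes)

# The first age (20) has code 0, which points at the first interval.
first_bin = rebuilt[0]
print(first_bin)
```

Comparing list(rebuilt) with list(cats) should show the same intervals in the same order.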

In [7]:
pd.value_counts(cats)

# pd.value_counts(cats) gives the number of values in each bin produced by pandas.cut.

(18, 25]     5
(25, 35]     3
(35, 60]     3
(60, 100]    1
Name: count, dtype: int64

Consistent with mathematical notation for intervals, a parenthesis means that the side
is open (exclusive), while a square bracket means it is closed (inclusive). You can
change which side is closed by passing right=False:

In [8]:
pd.cut(ages, [18, 26, 36, 61, 100], right=False)

[[18, 26), [18, 26), [18, 26), [26, 36), [18, 26), ..., [26, 36), [61, 100), [36, 61), [36, 61), [26, 36)]
Length: 12
Categories (4, interval[int64, left]): [[18, 26) < [26, 36) < [36, 61) < [61, 100)]
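
The effect of right=False is easiest to see for a value that sits exactly on a bin edge. The sketch below uses a small, made-up list of edges (not the bins from the example above) and checks where the value 25 lands under each setting.

```python
import pandas as pd

edges = [18, 25, 35]  # illustrative edges, chosen so that 25 is a boundary

# Default (right=True): the bins are (18, 25] and (25, 35],
# so 25 belongs to the first bin.
right_closed = pd.cut([25], edges)

# right=False: the bins are [18, 25) and [25, 35),
# so 25 belongs to the second bin.
left_closed = pd.cut([25], edges, right=False)

print(right_closed[0])
print(left_closed[0])
```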

You can also pass your own bin names by passing a list or array to the labels option:

In [9]:
group_names = ['Youth', 'YoungAdult', 'MiddleAged', 'Senior']
pd.cut(ages, bins, labels=group_names)

['Youth', 'Youth', 'Youth', 'YoungAdult', 'Youth', ..., 'YoungAdult', 'Senior', 'MiddleAged', 'MiddleAged', 'YoungAdult']
Length: 12
Categories (4, object): ['Youth' < 'YoungAdult' < 'MiddleAged' < 'Senior']
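
As a related sketch, not something the text above relies on: pandas.cut also accepts labels=False, in which case it returns only the integer position of each value's bin instead of a Categorical. The name bin_numbers is illustrative.

```python
import pandas as pd

ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32]
bins = [18, 25, 35, 60, 100]

# labels=False returns plain integer bin positions (0 for the first bin,
# 1 for the second, and so on) rather than interval or string labels.
bin_numbers = pd.cut(ages, bins, labels=False)
print(bin_numbers)
```

These integers should match the codes attribute shown earlier for the same ages and bins.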

If you pass an integer number of bins to cut instead of explicit bin edges, it will compute 
equal-length bins based on the minimum and maximum values in the data.
Consider the case of some uniformly distributed data chopped into fourths:

In [10]:
data = np.random.rand(20)
pd.cut(data, 4, precision=2)
# The precision=2 option limits the decimal precision to two digits.

[(0.75, 0.99], (0.25, 0.5], (0.75, 0.99], (0.5, 0.75], (0.5, 0.75], ..., (0.5, 0.75], (0.0043, 0.25], (0.75, 0.99], (0.0043, 0.25], (0.0043, 0.25]]
Length: 20
Categories (4, interval[float64, right]): [(0.0043, 0.25] < (0.25, 0.5] < (0.5, 0.75] < (0.75, 0.99]]
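
To inspect the edges that cut computes when given an integer number of bins, one option is the retbins=True argument, which returns the edges alongside the Categorical. The sketch below assumes a seeded NumPy generator purely for reproducibility, so its data differ from the example above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
data = rng.random(20)

# retbins=True returns a (Categorical, edges) pair.
cats, edges = pd.cut(data, 4, retbins=True)

# Differences between consecutive edges are the bin widths.
widths = np.diff(edges)
print(edges)
print(widths)
```

The lowest edge is expected to sit slightly below the minimum of the data, so that the minimum still falls inside the left-open first bin; the remaining widths should be equal.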

A closely related function, qcut, bins the data based on sample quantiles. Depending
on the distribution of the data, using cut will not usually result in each bin having the
same number of data points. Since qcut uses sample quantiles instead, by definition
you will obtain roughly equal-size bins:

In [15]:
data = np.random.randn(1000)  # Normally distributed
cats = pd.qcut(data, 4)  # Cut into quartiles
cats

[(0.0694, 0.711], (-3.387, -0.696], (0.0694, 0.711], (-0.696, 0.0694], (0.711, 3.011], ..., (0.0694, 0.711], (0.0694, 0.711], (0.0694, 0.711], (0.0694, 0.711], (0.0694, 0.711]]
Length: 1000
Categories (4, interval[float64, right]): [(-3.387, -0.696] < (-0.696, 0.0694] < (0.0694, 0.711] < (0.711, 3.011]]

In [16]:
pd.value_counts(cats)

(-3.387, -0.696]    250
(-0.696, 0.0694]    250
(0.0694, 0.711]     250
(0.711, 3.011]      250
Name: count, dtype: int64
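
The contrast between cut and qcut can be checked directly by counting the values per bin for the same data. This is a minimal sketch with a seeded generator (an assumption made for reproducibility); cut_counts and qcut_counts are illustrative names.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
data = rng.standard_normal(1000)

# Equal-width bins: most normally distributed values fall in the middle bins.
cut_counts = pd.Series(pd.cut(data, 4)).value_counts(sort=False)

# Quantile-based bins: each quartile holds the same number of values.
qcut_counts = pd.Series(pd.qcut(data, 4)).value_counts(sort=False)

print(cut_counts)
print(qcut_counts)
```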

As with cut, you can pass your own quantiles (numbers between 0 and 1, inclusive):

In [17]:
pd.qcut(data, [0, 0.1, 0.5, 0.9, 1.])

[(0.0694, 1.274], (-1.339, 0.0694], (0.0694, 1.274], (-1.339, 0.0694], (0.0694, 1.274], ..., (0.0694, 1.274], (0.0694, 1.274], (0.0694, 1.274], (0.0694, 1.274], (0.0694, 1.274]]
Length: 1000
Categories (4, interval[float64, right]): [(-3.387, -1.339] < (-1.339, 0.0694] < (0.0694, 1.274] < (1.274, 3.011]]
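
With custom quantiles, the bin sizes follow the gaps between consecutive quantiles. The sketch below, again using a seeded generator as an assumption, counts the values per bin for the quantiles 0, 0.1, 0.5, 0.9, and 1.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
data = rng.standard_normal(1000)

quantiles = [0, 0.1, 0.5, 0.9, 1.0]
cats = pd.qcut(data, quantiles)

# Bins cover 10%, 40%, 40%, and 10% of the 1000 values, in that order.
counts = pd.Series(cats).value_counts(sort=False)
print(counts)
```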