This notebook shows how to categorize a continuous variable into predefined intervals.

In [1]:
import pandas as pd
import numpy as np

Here are the versions of the libraries I am using:

In [2]:
print('Pandas Version:', pd.__version__, 
      '\nNumpy Version:', np.__version__)

Pandas Version: 2.2.2 
Numpy Version: 1.24.3


Let's create a DataFrame with a single column, `Floats`.
- **Floats:** Contains `1,000` values drawn from a standard normal distribution.

In [3]:
# Create the DataFrame
df = pd.DataFrame({
    'Floats': np.random.randn(1000)
})

# Display the first 3 rows of the DataFrame
df.head(3) 

,Floats
0,-0.77317
1,1.51415
2,-0.121042
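
Because `np.random.randn` is called without a seed, the values shown in the outputs will differ from run to run. Below is a minimal sketch of a reproducible variant, assuming NumPy's `default_rng` generator and an arbitrary seed of `42`; neither the seed nor the names `SEED`, `df_seeded`, and `df_again` come from the original notebook.

```python
import numpy as np
import pandas as pd

SEED = 42  # arbitrary seed chosen for this sketch, not used in the notebook

# A seeded generator yields the same 1,000 draws on every run
rng = np.random.default_rng(SEED)
df_seeded = pd.DataFrame({'Floats': rng.standard_normal(1000)})

# Rebuilding the frame from a fresh generator with the same seed
# reproduces exactly the same values
df_again = pd.DataFrame({
    'Floats': np.random.default_rng(SEED).standard_normal(1000)
})
print(df_seeded.equals(df_again))  # True
```

With a fixed seed, the descriptive statistics and category counts stay stable between runs, though they will not match the unseeded numbers printed in this notebook.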


Let's review the descriptive statistics for the `Floats` column, transposed for better readability.

In [4]:
df.describe().T

,count,mean,std,min,25%,50%,75%,max
Floats,1000.0,-0.0071,1.014145,-2.714057,-0.667897,-0.034432,0.670126,3.329568


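With the **custom_transform** function we'll categorize values from the `Floats` column into predefined ranges like `>3`, `>2`, and `<-1` using if-elif statements. Because the branches are tested in order and the first match wins, each label names a bounded interval: `>2` covers values greater than 2 and at most 3, and `<-1` covers values of at least -2 and below -1. A value of exactly 0 is labeled `ZERO`.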

In [5]:
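def custom_transform(row):
    """Return the interval label for the value in the row's 'Floats' column."""
    value = row['Floats']
    # Positive side: test the largest threshold first
    if value > 3:
        return '>3'
    elif value > 2:
        return '>2'
    elif value > 1:
        return '>1'
    elif value > 0:
        return '>0'
    # Negative side: test the smallest threshold first
    elif value < -3:
        return '<-3'
    elif value < -2:
        return '<-2'
    elif value < -1:
        return '<-1'
    elif value < 0:
        return '<0'
    else:
        # Reached for exactly 0, and also for NaN (all comparisons with NaN are False)
        return 'ZERO'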

We'll apply the **custom_transform** function to each row of `df` and store the results in a new column called `Category`.

In [6]:
df['Category'] = df.apply(custom_transform, axis=1)
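
Row-wise `apply` calls the Python function once per row, which can become slow on large DataFrames. As a hedged sketch (not part of the original notebook), the same first-match-wins logic can be expressed with `np.select`, which checks its conditions in order. The `sample` frame and its hand-picked values are hypothetical and only serve to exercise every branch.

```python
import numpy as np
import pandas as pd

# Hypothetical hand-picked values, one per branch of custom_transform
sample = pd.DataFrame(
    {'Floats': [3.5, 2.5, 1.5, 0.5, 0.0, -0.5, -1.5, -2.5, -3.5]}
)
x = sample['Floats']

# np.select picks the label of the FIRST condition that is True,
# mirroring the order of the if-elif chain
conditions = [x > 3, x > 2, x > 1, x > 0, x < -3, x < -2, x < -1, x < 0]
labels = ['>3', '>2', '>1', '>0', '<-3', '<-2', '<-1', '<0']

# Rows matching no condition (exactly 0, or NaN) get the default label
sample['Category'] = np.select(conditions, labels, default='ZERO')
print(sample)
```

The conditions are evaluated on whole columns at once, so no Python-level loop over rows is needed.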

In [7]:
df.pivot(columns='Category').describe().T

,Category,count,mean,std,min,25%,50%,75%,max
Floats,<-1,139.0,-1.393177,0.251087,-1.965988,-1.572732,-1.373747,-1.187195,-1.005093
Floats,<-2,26.0,-2.273333,0.208688,-2.714057,-2.353816,-2.265,-2.10037,-2.005605
Floats,<0,341.0,-0.463998,0.270254,-0.992686,-0.667807,-0.423468,-0.248239,-0.0086
Floats,>0,334.0,0.474242,0.279538,0.004202,0.237195,0.475182,0.696317,0.997641
Floats,>1,131.0,1.35021,0.268824,1.001577,1.146281,1.284686,1.499861,1.991024
Floats,>2,26.0,2.271703,0.236164,2.015028,2.062529,2.199505,2.43131,2.847408
Floats,>3,3.0,3.180913,0.129236,3.095261,3.106585,3.11791,3.223739,3.329568
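
The per-category summary above is built by pivoting first, which creates one mostly empty column per category. A possibly simpler route, sketched here under the assumption that only the `Floats` statistics are needed, is `groupby` followed by `describe`. The `demo` frame and its values are hypothetical and are not the notebook's data.

```python
import pandas as pd

# Hypothetical miniature of df: values with their category labels
demo = pd.DataFrame({
    'Floats':   [0.5, 1.5, 0.2, -0.5, -0.1, 2.5],
    'Category': ['>0', '>1', '>0', '<0', '<0', '>2'],
})

# One row of descriptive statistics per category
summary = demo.groupby('Category')['Floats'].describe()
print(summary)
```

The result has one row per category and the usual `describe` columns (`count`, `mean`, `std`, `min`, `25%`, `50%`, `75%`, `max`), so no transpose is required.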


Alternative: pd.cut