In [4]:
# Import the necessary modules
import pandas as pd
from sklearn.preprocessing import LabelEncoder      # For label encoding

#### Practice on my custom datasets.
##### 1. Using scikit-learn's LabelEncoder

* Nominal Data

In [5]:
df = pd.DataFrame({
    'Fruit': ['Apple', 'Banana', 'Orange', 'Apple', 'Orange', 'Banana'],
    'Price': [1.2, 0.5, 0.8, 1.3, 0.9, 0.6]
})


df

Unnamed: 0,Fruit,Price
0,Apple,1.2
1,Banana,0.5
2,Orange,0.8
3,Apple,1.3
4,Orange,0.9
5,Banana,0.6


In [6]:
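le = LabelEncoder()     # Create an object called le of the LabelEncoder class.
# Note: fit() only learns the classes and returns the encoder itself,
# so this column ends up holding the LabelEncoder object, not the codes.
df['Fruit_encode'] = le.fit(df['Fruit'])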

In [7]:
df

Unnamed: 0,Fruit,Price,Fruit_encode
0,Apple,1.2,LabelEncoder()
1,Banana,0.5,LabelEncoder()
2,Orange,0.8,LabelEncoder()
3,Apple,1.3,LabelEncoder()
4,Orange,0.9,LabelEncoder()
5,Banana,0.6,LabelEncoder()


In [8]:
df['Fruit_label'] = le.transform(df['Fruit'])
df

Unnamed: 0,Fruit,Price,Fruit_encode,Fruit_label
0,Apple,1.2,LabelEncoder(),0
1,Banana,0.5,LabelEncoder(),1
2,Orange,0.8,LabelEncoder(),2
3,Apple,1.3,LabelEncoder(),0
4,Orange,0.9,LabelEncoder(),2
5,Banana,0.6,LabelEncoder(),1


In [9]:
# drop() returns a new DataFrame without the column; df itself is unchanged.
df.drop(columns='Fruit_encode')

Unnamed: 0,Fruit,Price,Fruit_label
0,Apple,1.2,0
1,Banana,0.5,1
2,Orange,0.8,2
3,Apple,1.3,0
4,Orange,0.9,2
5,Banana,0.6,1
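
To check which label received which code, the fitted encoder can be inspected directly. The following is a minimal sketch, assuming the same fruit values as above; the names `fruits`, `mapping`, and `decoded` are illustrative and not part of the notebook.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Same fruit values as the example above (illustrative standalone Series)
fruits = pd.Series(['Apple', 'Banana', 'Orange', 'Apple', 'Orange', 'Banana'])

le = LabelEncoder()
le.fit(fruits)                 # learn the unique labels
codes = le.transform(fruits)   # map each label to its integer code

# classes_ holds the learned labels in sorted order; the position is the code
mapping = {str(label): code for code, label in enumerate(le.classes_)}
print(mapping)

# inverse_transform maps the codes back to the original labels
decoded = le.inverse_transform(codes)
print(list(decoded))
```

Because the codes follow the sorted order of the labels, the mapping does not depend on the order in which the values appear in the column.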


We can do the fit and transform together using fit_transform.

In [10]:
df2 = pd.DataFrame({
    'Fruit2': [ 'Orange', 'Banana', 'Apple', 'Orange', 'Banana', 'Apple'],
    'Price2': [1.2, 0.5, 0.8, 1.3, 0.9, 0.6]
})
df2

Unnamed: 0,Fruit2,Price2
0,Orange,1.2
1,Banana,0.5
2,Apple,0.8
3,Orange,1.3
4,Banana,0.9
5,Apple,0.6


In [11]:
df2['Fruit_label2'] = le.fit_transform(df2['Fruit2'])
df2

Unnamed: 0,Fruit2,Price2,Fruit_label2
0,Orange,1.2,2
1,Banana,0.5,1
2,Apple,0.8,0
3,Orange,1.3,2
4,Banana,0.9,1
5,Apple,0.6,0


* Ordinal Data

In [12]:
df_o = pd.DataFrame({
    'Size': ['Small', 'Medium', 'Large', 'Small', 'Large', 'Medium'],
    'Price': [100, 150, 200, 120, 210, 160]
})
df_o

Unnamed: 0,Size,Price
0,Small,100
1,Medium,150
2,Large,200
3,Small,120
4,Large,210
5,Medium,160


In [13]:
df_o['Size_label'] = le.fit_transform(df_o['Size'])
df_o

Unnamed: 0,Size,Price,Size_label
0,Small,100,2
1,Medium,150,1
2,Large,200,0
3,Small,120,2
4,Large,210,0
5,Medium,160,1
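
In the output above, LabelEncoder assigns codes in alphabetical order (Large = 0, Medium = 1, Small = 2), which does not follow the natural size ranking. One possible way to keep the ranking is an ordered pandas Categorical. This is only a sketch: the ranking in `size_order` and the column name `Size_ordered` are assumptions made for illustration.

```python
import pandas as pd

df_o = pd.DataFrame({
    'Size': ['Small', 'Medium', 'Large', 'Small', 'Large', 'Medium'],
    'Price': [100, 150, 200, 120, 210, 160]
})

# Assumed ranking, from smallest to largest
size_order = ['Small', 'Medium', 'Large']

# An ordered Categorical assigns codes by position in size_order,
# so Small = 0, Medium = 1, Large = 2
size_cat = pd.Categorical(df_o['Size'], categories=size_order, ordered=True)
df_o['Size_ordered'] = size_cat.codes
print(df_o)
```

With the order stated explicitly, a larger code always means a larger size, which is usually what a model should see for ordinal data.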


##### 2. Using pandas' Categorical Type

In [16]:
df_c = pd.DataFrame({
    "Color": ["Red", "Blue", "Green", "Yellow", "Blue", "Red"],
    "Price": [10, 15, 12, 20, 18, 11]
})

df_c

Unnamed: 0,Color,Price
0,Red,10
1,Blue,15
2,Green,12
3,Yellow,20
4,Blue,18
5,Red,11


In [20]:
df_c['cat'] = df_c['Color'].astype('category').cat.codes
df_c

Unnamed: 0,Color,Price,cat
0,Red,10,2
1,Blue,15,0
2,Green,12,1
3,Yellow,20,3
4,Blue,18,0
5,Red,11,2


2️⃣ .astype('category')

This converts the column into a categorical type instead of plain text (object type).

Internally, pandas stores categorical data more efficiently: each unique label (like "Apple", "Banana", "Orange") is assigned an integer category code.

So, after this step, pandas internally maps:

Apple → 0

Banana → 1

Orange → 2

(By default pandas sorts the unique values, so the codes follow alphabetical order rather than the order in which the labels first appear.)

3️⃣ .cat.codes

This extracts those internal numeric category codes.

Now you get:

| Fruit   | Fruit_Encoded_Pandas |
|----------|----------------------|
| Apple    | 0                    |
| Banana   | 1                    |
| Orange   | 2                    |
| Apple    | 0                    |
| Orange   | 2                    |

4️⃣ data['Fruit_Encoded_Pandas'] = ...

This creates a new column in the DataFrame named Fruit_Encoded_Pandas and assigns the numeric codes to it.

✅ Final Result

| Fruit   | Fruit_Encoded_Pandas |
|----------|----------------------|
| Apple    | 0                    |
| Banana   | 1                    |
| Orange   | 2                    |
| Apple    | 0                    |
| Orange   | 2                    |
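
The steps described above can be sketched end to end as follows. The frame `data` is an assumption, built here only to match the Fruit table in the explanation; it is not the `df_c` frame used in the notebook cells.

```python
import pandas as pd

# Hypothetical frame matching the Fruit table above
data = pd.DataFrame({'Fruit': ['Apple', 'Banana', 'Orange', 'Apple', 'Orange']})

# Step 2: convert the text column to the categorical dtype
fruit_cat = data['Fruit'].astype('category')
print(list(fruit_cat.cat.categories))   # unique labels, in sorted order

# Steps 3 and 4: extract the integer codes and store them in a new column
data['Fruit_Encoded_Pandas'] = fruit_cat.cat.codes
print(data)
```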

🧠 In summary:

This is a quick and efficient way to convert a categorical column (text labels) into numeric form using pandas, without manually encoding or using other libraries.

`Equivalent to: Label Encoding.`

In [24]:
print('Category Mapping:', 
      dict(enumerate(df_c['Color'].astype('category').cat.categories)))

Category Mapping: {0: 'Blue', 1: 'Green', 2: 'Red', 3: 'Yellow'}
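
The mapping printed above goes from code to label. For encoding new values with the same codes, the reverse direction is often more useful. The sketch below assumes a hypothetical `new_colors` Series; it reuses the learned categories so the codes stay consistent, and any label that was not among the original categories receives the code -1.

```python
import pandas as pd

# Same colors as df_c['Color'], converted to the categorical dtype
colors = pd.Series(["Red", "Blue", "Green", "Yellow", "Blue", "Red"]).astype('category')

# Reverse mapping: label -> code
label_to_code = {label: code for code, label in enumerate(colors.cat.categories)}
print(label_to_code)

# Hypothetical new data; "Purple" is not one of the known categories
new_colors = pd.Series(["Green", "Red", "Purple"])

# Reusing the original categories keeps the codes consistent;
# values outside those categories are treated as missing and get code -1
new_codes = pd.Categorical(new_colors, categories=colors.cat.categories).codes
print(list(new_codes))
```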
