# 🏷️ Encoding in Machine Learning

This notebook demonstrates **Label Encoding** and **Ordinal Encoding** step by step using Python, pandas, and scikit-learn.

## 1. Import Libraries

In [1]:
import numpy as np
import pandas as pd

## 2. Create Sample Dataset

In [2]:
df = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Male'],
    'Size': ['Small', 'Medium', 'Large', 'Medium', 'Small']
})

df

|   | Gender | Size |
|---|--------|------|
| 0 | Male | Small |
| 1 | Female | Medium |
| 2 | Female | Large |
| 3 | Male | Medium |
| 4 | Male | Small |


## 3. Label Encoding using LabelEncoder

In [3]:
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
df['Gender_encoded_LE'] = le.fit_transform(df['Gender'])

df

|   | Gender | Size | Gender_encoded_LE |
|---|--------|------|-------------------|
| 0 | Male | Small | 1 |
| 1 | Female | Medium | 0 |
| 2 | Female | Large | 0 |
| 3 | Male | Medium | 1 |
| 4 | Male | Small | 1 |
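To see which integer `LabelEncoder` assigned to which category, the fitted encoder can be inspected. The following is a minimal sketch, not part of the original notebook: the `gender`, `mapping`, and `restored` names are illustrative. It relies on the encoder storing its classes in sorted order in `classes_`, where each position is the integer code.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Same values as the Gender column above (rebuilt here so the sketch is self-contained)
gender = pd.Series(['Male', 'Female', 'Female', 'Male', 'Male'])

le = LabelEncoder()
codes = le.fit_transform(gender)

# classes_ holds the unique labels in sorted order; the index is the integer code
mapping = {str(label): code for code, label in enumerate(le.classes_)}
print(mapping)  # {'Female': 0, 'Male': 1}

# inverse_transform turns the integer codes back into the original labels
restored = le.inverse_transform(codes)
print(list(restored))
```

Because the codes follow sorted order rather than any order you choose, `'Female'` receives 0 and `'Male'` receives 1 here regardless of which value appears first in the data.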


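## 4. Label Encoding using Manual Mapping (Best Practice)

A hand-written mapping makes the integer assigned to each category explicit. `LabelEncoder`, by contrast, assigns codes in sorted order of the class names, and scikit-learn documents it as a transformer for target labels (`y`) rather than input features.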

In [4]:
df['Gender_encoded_manual'] = df['Gender'].map({'Male': 1, 'Female': 0})

df

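|   | Gender | Size | Gender_encoded_LE | Gender_encoded_manual |
|---|--------|------|-------------------|-----------------------|
| 0 | Male | Small | 1 | 1 |
| 1 | Female | Medium | 0 | 0 |
| 2 | Female | Large | 0 | 0 |
| 3 | Male | Medium | 1 | 1 |
| 4 | Male | Small | 1 | 1 |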
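One thing to watch with `Series.map` and a dictionary: any value missing from the dictionary silently becomes `NaN` instead of raising an error. The sketch below is an illustration only. The `'Other'` value and the `gender_map`, `encoded`, and `unmapped` names are hypothetical and do not appear in the dataset above.

```python
import pandas as pd

gender_map = {'Male': 1, 'Female': 0}

# 'Other' is a hypothetical value that the mapping does not cover
s = pd.Series(['Male', 'Female', 'Other'])

# map() returns NaN for keys that are missing from the dictionary
encoded = s.map(gender_map)

# List the raw values that failed to map, so they can be handled explicitly
unmapped = s[encoded.isna()].unique().tolist()
print(unmapped)  # ['Other']
```

A check like this after a manual mapping helps catch typos or new categories before they reach the model as missing values.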


## 5. Ordinal Encoding using Manual Mapping

In [5]:
df['Size_encoded_manual'] = df['Size'].map({
    'Small': 0,
    'Medium': 1,
    'Large': 2
})

df

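|   | Gender | Size | Gender_encoded_LE | Gender_encoded_manual | Size_encoded_manual |
|---|--------|------|-------------------|-----------------------|---------------------|
| 0 | Male | Small | 1 | 1 | 0 |
| 1 | Female | Medium | 0 | 0 | 1 |
| 2 | Female | Large | 0 | 0 | 2 |
| 3 | Male | Medium | 1 | 1 | 1 |
| 4 | Male | Small | 1 | 1 | 0 |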
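As an alternative to a dictionary, pandas can carry the order itself through an ordered categorical dtype. This is a sketch of that option under the same Small < Medium < Large assumption; it is not part of the original notebook, and the `size_dtype`, `size_cat`, `size_codes`, and `is_bigger` names are illustrative.

```python
import pandas as pd

size = pd.Series(['Small', 'Medium', 'Large', 'Medium', 'Small'])

# Declare the category order once; ordered=True enables comparisons such as >
size_dtype = pd.CategoricalDtype(categories=['Small', 'Medium', 'Large'], ordered=True)
size_cat = size.astype(size_dtype)

# .cat.codes gives the integer position of each value in the declared order
size_codes = size_cat.cat.codes
print(size_codes.tolist())  # [0, 1, 2, 1, 0]

# Ordered categoricals can be compared directly against a category
is_bigger = size_cat > 'Small'
print(is_bigger.tolist())  # [False, True, True, True, False]
```

The codes match the manual mapping above, and the column keeps its readable labels until the integers are actually needed.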


## 6. Ordinal Encoding using Sklearn OrdinalEncoder

In [6]:
from sklearn.preprocessing import OrdinalEncoder

oe = OrdinalEncoder(categories=[['Small', 'Medium', 'Large']])
df[['Size_encoded_sklearn']] = oe.fit_transform(df[['Size']])

df

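|   | Gender | Size | Gender_encoded_LE | Gender_encoded_manual | Size_encoded_manual | Size_encoded_sklearn |
|---|--------|------|-------------------|-----------------------|---------------------|----------------------|
| 0 | Male | Small | 1 | 1 | 0 | 0.0 |
| 1 | Female | Medium | 0 | 0 | 1 | 1.0 |
| 2 | Female | Large | 0 | 0 | 2 | 2.0 |
| 3 | Male | Medium | 1 | 1 | 1 | 1.0 |
| 4 | Male | Small | 1 | 1 | 0 | 0.0 |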
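The `Size_encoded_sklearn` column holds floats because `OrdinalEncoder` returns `float64` by default. If integer codes are preferred, the encoder's `dtype` parameter can be set, as in this minimal sketch. The `df_demo`, `oe_int`, and `encoded` names are illustrative and not from the original notebook.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df_demo = pd.DataFrame({'Size': ['Small', 'Medium', 'Large', 'Medium', 'Small']})

# dtype controls the output type; the default is np.float64
oe_int = OrdinalEncoder(categories=[['Small', 'Medium', 'Large']], dtype=np.int64)
encoded = oe_int.fit_transform(df_demo[['Size']])

print(encoded.ravel().tolist())  # [0, 1, 2, 1, 0]

# categories_ records the order the encoder is using, one array per column
print(oe_int.categories_)
```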


## 7. Safe Encoding with Train-Test Split (Avoid Data Leakage)

In [7]:
from sklearn.model_selection import train_test_split

X = df[['Size']]
X_train, X_test = train_test_split(X, test_size=0.3, random_state=0)

oe = OrdinalEncoder(categories=[['Small', 'Medium', 'Large']])
oe.fit(X_train)

X_train_encoded = oe.transform(X_train)
X_test_encoded = oe.transform(X_test)

X_train_encoded, X_test_encoded

(array([[1.],
        [1.],
        [0.]]),
 array([[2.],
        [0.]]))
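A related concern when encoding after a split is a category that appears only in the test set. By default `OrdinalEncoder.transform` raises an error for unknown categories. A possible way to handle this, sketched below, is `handle_unknown='use_encoded_value'` together with `unknown_value` (both available in scikit-learn 0.24 and later). The `'XL'` value and the `X_train_demo`, `X_test_demo`, `oe_safe`, and `test_codes` names are hypothetical and used only for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

X_train_demo = pd.DataFrame({'Size': ['Medium', 'Medium', 'Small', 'Large']})
# 'XL' is a hypothetical category that never appears in the training data
X_test_demo = pd.DataFrame({'Size': ['Large', 'XL']})

oe_safe = OrdinalEncoder(
    categories=[['Small', 'Medium', 'Large']],
    handle_unknown='use_encoded_value',  # do not raise on unseen categories
    unknown_value=-1,                    # code assigned to any unseen category
)
oe_safe.fit(X_train_demo)  # fit on training data only

test_codes = oe_safe.transform(X_test_demo).ravel().tolist()
print(test_codes)  # [2.0, -1.0]
```

Whether -1 is a sensible placeholder depends on the downstream model; for linear models it would be read as "smaller than Small", so treat this as one option rather than a default.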

## 8. Summary

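- Label Encoding maps each category to an integer with no implied order; it suits binary columns and target labels. `LabelEncoder` assigns the integers in sorted order of the class names.
- Ordinal Encoding is used when categories have a meaningful order (for example, Small < Medium < Large). Pass that order explicitly, because `OrdinalEncoder` otherwise uses sorted order.
- Always fit encoders on the training data only, then call `transform` on both the training and test sets, to avoid data leakage.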