# Categorical Encoding

Categorical encoding is the process of converting categorical (non-numeric) variables into numerical form so that machine learning algorithms can process them.

Different encoding techniques are used depending on whether the categorical feature has an inherent order.

---

### Ordinal Encoding

Ordinal Encoding is used for categorical features that have a **natural order**.

Examples:  
Poor < Average < Good  
Low < Medium < High  

Each category is mapped to an integer based on its order.

| Review  | Encoded |
|--------|---------|
| Poor   | 0 |
| Average| 1 |
| Good   | 2 |

**Use case:** Ordered categorical features.
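The mapping in the table above can be sketched with a plain dictionary and pandas `Series.map`. The `reviews` series and the `review_order` dictionary are illustrative names and values, not taken from the dataset used later in this notebook.

```python
import pandas as pd

# Hypothetical sample values, for illustration only
reviews = pd.Series(["Good", "Poor", "Average", "Good"])

# Explicit order: Poor < Average < Good
review_order = {"Poor": 0, "Average": 1, "Good": 2}

# Replace each category with its position in the order
encoded = reviews.map(review_order)
print(encoded.tolist())  # [2, 0, 1, 2]
```

Writing the order out by hand keeps the integers tied to the meaning of the categories rather than to the order in which they happen to appear in the data.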

---

### Label Encoding

Label Encoding assigns a **unique integer** to each category.

| Purchased | Encoded |
|----------|---------|
| No | 0 |
| Yes | 1 |

Label Encoding is most appropriate for:
- **Target variables (labels)**  
- Binary categorical features  

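Using Label Encoding for **unordered input features** can introduce unintended ordinal relationships, because a model may treat the integers as magnitudes (for example, reading category 2 as "greater than" category 0).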
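One way to see the risk is that scikit-learn's `LabelEncoder` sorts the labels before numbering them, so the integers follow alphabetical order rather than any meaningful order. The sketch below uses a hypothetical `education` list to show the mismatch.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical education levels; the meaningful order is HS < UG < PG
education = ["HS", "UG", "PG", "UG", "HS"]

le = LabelEncoder()
codes = le.fit_transform(education)

# classes_ holds the sorted labels; each label's position is its code
print(le.classes_.tolist())  # ['HS', 'PG', 'UG']
print(codes.tolist())        # [0, 2, 1, 2, 0]
```

Here `PG` receives code 1 and `UG` receives code 2, which reverses their real order. For an ordered input feature, an encoder that accepts an explicit category order is the safer choice.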

---

### Important Tips

- Use **Ordinal Encoding** when categories have a meaningful order  
- Use **Label Encoding** mainly for target variables or binary features  
- Avoid Label Encoding for unordered input features with more than two categories  
- For unordered input features, prefer **One-Hot Encoding**

---

## Summary

Choosing the correct categorical encoding method is crucial for model performance.  
Using the wrong encoding can mislead the model and negatively impact learning.


In [None]:
%%capture
# Install the libraries used in this notebook
!pip install numpy pandas matplotlib scikit-learn

In [5]:
import numpy as np
import pandas as pd


In [18]:
df = pd.read_csv("customer.csv")
df.sample(5)

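|     | Purchased | Gender | Education | Review  |
|-----|-----------|--------|-----------|---------|
| 46  | Yes       | Female | PG        | Good    |
| 342 | No        | Female | HS        | Average |
| 398 | No        | Male   | PG        | Average |
| 268 | Yes       | Female | HS        | Poor    |
| 147 | Yes       | Male   | PG        | Good    |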


In [None]:
# Gender is not used in the rest of this walkthrough, so drop it
df = df.drop('Gender', axis=1)

In [25]:
df.head()

|   | Purchased | Education | Review  |
|---|-----------|-----------|---------|
| 0 | No        | UG        | Average |
| 1 | Yes       | HS        | Good    |
| 2 | Yes       | PG        | Good    |
| 3 | Yes       | HS        | Poor    |
| 4 | No        | UG        | Good    |


In [27]:
from sklearn.model_selection import train_test_split

# Separate the input features (X) from the target (y)
X = df.drop('Purchased', axis=1)
y = df['Purchased']
X, y

(    Education   Review
 0          UG  Average
 1          HS     Good
 2          PG     Good
 3          HS     Poor
 4          UG     Good
 ..        ...      ...
 495        HS     Poor
 496        PG     Poor
 497        PG  Average
 498        HS  Average
 499        PG     Poor
 
 [500 rows x 2 columns],
 0       No
 1      Yes
 2      Yes
 3      Yes
 4       No
       ... 
 495     No
 496     No
 497     No
 498     No
 499     No
 Name: Purchased, Length: 500, dtype: object)

In [28]:
# Hold out 20% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train.shape

(400, 2)

### **OrdinalEncoding**

In [49]:
from sklearn.preprocessing import OrdinalEncoder

cols = ['Review', 'Education']

# One category list per column, in the same order as `cols`,
# each listed from lowest to highest
oe = OrdinalEncoder(categories=[['Poor', 'Average', 'Good'], ['HS', 'UG', 'PG']])

# Fit on the training set only, then reuse the fitted encoder on the test set
X_train_encoded = pd.DataFrame(
    oe.fit_transform(X_train[cols]),
    columns=cols,
    index=X_train.index
)

X_test_encoded = pd.DataFrame(
    oe.transform(X_test[cols]),
    columns=cols,
    index=X_test.index
)

X_train_encoded.head(5)


|     | Review | Education |
|-----|--------|-----------|
| 249 | 2.0    | 0.0       |
| 433 | 1.0    | 1.0       |
| 19  | 0.0    | 0.0       |
| 322 | 0.0    | 1.0       |
| 332 | 2.0    | 2.0       |


In [51]:
X_train.loc[249]


Education      HS
Review       Good
Name: 249, dtype: object
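The same check can be run in the other direction with `inverse_transform`, which maps the integers back to their categories. The sketch below is self-contained and uses a hypothetical two-row frame (`demo`) and a separate encoder (`oe_demo`); its first row mirrors row 249 above.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

cols = ["Review", "Education"]

# Hypothetical rows; the first one mirrors row 249 above
demo = pd.DataFrame({"Review": ["Good", "Poor"], "Education": ["HS", "UG"]})

oe_demo = OrdinalEncoder(
    categories=[["Poor", "Average", "Good"], ["HS", "UG", "PG"]]
)
encoded = oe_demo.fit_transform(demo[cols])
print(encoded.tolist())  # [[2.0, 0.0], [0.0, 1.0]]

# categories_ lists the order used for each column
print([c.tolist() for c in oe_demo.categories_])

# Map the integers back to the original categories
decoded = oe_demo.inverse_transform(encoded)
print(decoded.tolist())  # [['Good', 'HS'], ['Poor', 'UG']]
```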

### **LabelEncoding**

In [52]:
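from sklearn.preprocessing import LabelEncoder

# Encode the target: fit on the training labels, then apply the same mapping to the test labels
le = LabelEncoder()
y_train_encoded = le.fit_transform(y_train)
y_test_encoded = le.transform(y_test)
y_train_encoded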

array([0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1,
       0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0,
       0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
       1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1,
       1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0,
       0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0,
       0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1,
       0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1,
       1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1,
       1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0,
       0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1,
       0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0,
       1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1,
       0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0,