# Topic 03 - Problem 9: Handling Categorical Data - Label Encoding and One-Hot Encoding

---

## 1. About the Problem

This problem asks me to handle **categorical data** in a dataset.  
- **Label Encoding**: Converts each category into a unique integer. This suits ordinal data, where the categories have an inherent order (e.g., "low", "medium", "high"), as long as the integers are assigned in that order.
- **One-Hot Encoding**: Converts each category into a binary vector, with one 0/1 column per category and a 1 only in the column of the row's category. This suits nominal data, where the categories don't have a meaningful order (e.g., "red", "blue", "green").
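
Here is a minimal sketch of both ideas using only pandas. The `priority` and `color` series and the `low < medium < high` mapping are made up for illustration. Passing `dtype=int` to `pd.get_dummies` asks for 0/1 columns instead of booleans.

```python
import pandas as pd

# Ordinal feature: the order low < medium < high is written out explicitly
# (an illustrative mapping), so the integers preserve that order.
priority = pd.Series(["low", "high", "medium", "low"], name="priority")
priority_order = {"low": 0, "medium": 1, "high": 2}
priority_encoded = priority.map(priority_order)
print(priority_encoded.tolist())  # [0, 2, 1, 0]

# Nominal feature: no order, so each color gets its own 0/1 column.
color = pd.Series(["red", "blue", "green", "blue"], name="color")
color_one_hot = pd.get_dummies(color, prefix="color", dtype=int)
print(color_one_hot)
```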

I will apply both encoding techniques to a sample dataset: label encoding to the `size` column (S, M, L) and one-hot encoding to the nominal `color` and `category` columns.

---


## 2. Solution Code

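```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Sample dataset with three categorical columns
data = {
    'color': ['Red', 'Blue', 'Green', 'Blue', 'Green', 'Red'],
    'size': ['S', 'M', 'L', 'M', 'S', 'L'],
    'category': ['A', 'B', 'A', 'B', 'C', 'C']
}
df = pd.DataFrame(data)

# One-hot encode the nominal columns: one True/False column per category
color_one_hot = pd.get_dummies(df['color'], prefix='color')
category_one_hot = pd.get_dummies(df['category'], prefix='category')

# Label encode 'size'. LabelEncoder numbers the classes in sorted
# (alphabetical) order: L -> 0, M -> 1, S -> 2
label_encoder = LabelEncoder()
df['size_label_encoder'] = label_encoder.fit_transform(df['size'])

# Combine the encoded columns and drop the original text columns
encoded_df = pd.concat([df, color_one_hot, category_one_hot], axis=1)
encoded_df = encoded_df.drop(['color', 'size', 'category'], axis=1)

print("Data with Label Encoding and One-Hot Encoding:")
print(encoded_df)
```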

```
Data with Label Encoding and One-Hot Encoding:
   size_label_encoder  color_Blue  color_Green  color_Red  category_A  \
0                   2       False        False       True        True   
1                   1        True        False      False       False   
2                   0       False         True      False        True   
3                   1        True        False      False       False   
4                   2       False         True      False       False   
5                   0       False        False       True       False   

   category_B  category_C  
0       False       False  
1        True       False  
2       False       False  
3        True       False  
4       False        True  
5       False        True  
```
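
One caveat on the `size` column: `LabelEncoder` assigns integers in alphabetical order (L = 0, M = 1, S = 2), so here the codes run from large to small rather than S < M < L, and the order is monotone only by coincidence. scikit-learn's documentation also describes `LabelEncoder` as intended for target labels rather than input features. A minimal sketch of the alternative, assuming scikit-learn's `OrdinalEncoder` with an explicit `categories` list:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

sizes = pd.DataFrame({"size": ["S", "M", "L", "M", "S", "L"]})

# LabelEncoder sorts the classes alphabetically: L -> 0, M -> 1, S -> 2.
label_encoder = LabelEncoder()
size_label = label_encoder.fit_transform(sizes["size"])
print(label_encoder.classes_.tolist())  # ['L', 'M', 'S']
print(size_label.tolist())              # [2, 1, 0, 1, 2, 0]

# OrdinalEncoder with an explicit category list encodes S < M < L.
# It expects 2-D input (hence sizes[["size"]]) and returns floats by default.
ordinal_encoder = OrdinalEncoder(categories=[["S", "M", "L"]])
size_ordinal = ordinal_encoder.fit_transform(sizes[["size"]])
print(size_ordinal.ravel().tolist())    # [0.0, 1.0, 2.0, 1.0, 0.0, 2.0]
```

With the explicit list, the integers follow the real size order regardless of how the labels happen to sort as strings.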


---

## 3. Summary / Takeaways

By solving this problem, I learned how to handle **categorical data** using two techniques:
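1. **Label Encoding**: Useful when the categorical values are **ordinal**. It assigns a unique integer to each category, and those integers are meaningful only if they follow the categories' natural order.  
2. **One-Hot Encoding**: Useful when the categorical values are **nominal**. It creates one binary column per category, so no artificial order between categories is implied.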
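
To see why label encoding is a poor fit for nominal data, here is a small sketch comparing distances between the three colors from the sample dataset under each encoding. Using absolute difference and Euclidean distance as the comparison is my own illustrative choice, not part of the original solution.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

colors = pd.Series(["Blue", "Green", "Red"])

# Label encoding a nominal feature: Blue -> 0, Green -> 1, Red -> 2 (alphabetical).
codes = LabelEncoder().fit_transform(colors)
label_dist = {
    ("Blue", "Green"): abs(int(codes[0]) - int(codes[1])),
    ("Blue", "Red"): abs(int(codes[0]) - int(codes[2])),
}
print(label_dist)  # {('Blue', 'Green'): 1, ('Blue', 'Red'): 2}

# One-hot encoding: every pair of distinct colors is the same distance apart.
one_hot = pd.get_dummies(colors, dtype=int).to_numpy()
onehot_dist = {
    ("Blue", "Green"): float(np.linalg.norm(one_hot[0] - one_hot[1])),
    ("Blue", "Red"): float(np.linalg.norm(one_hot[0] - one_hot[2])),
}
print(onehot_dist)  # both distances equal sqrt(2)
```

Under label encoding, Blue appears twice as far from Red as from Green, an order the colors do not have; one-hot encoding keeps all distinct colors equally far apart.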

Most machine learning models require numeric input, so categorical features must be encoded before training, and the choice of encoding decides whether the model sees an order that is not really there.  
This step matters for features like **gender**, **color**, or **region**.  
Next, I want to explore **handling missing values** and how missing values affect machine learning models.
