In [1]:
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Sample categorical data
df = pd.DataFrame({
    "Color": ["Red", "Green", "Blue", "Green", "Red", "Blue"]
})

# Label Encoding
le = LabelEncoder()
df["Label Encoded"] = le.fit_transform(df["Color"])

# One-Hot Encoding (with 0/1 output)
one_hot = pd.get_dummies(df["Color"], prefix="Color", dtype=int)
df = pd.concat([df, one_hot], axis=1)

# Show result
print(df)


   Color  Label Encoded  Color_Blue  Color_Green  Color_Red
0    Red              2           0            0          1
1  Green              1           0            1          0
2   Blue              0           1            0          0
3  Green              1           0            1          0
4    Red              2           0            0          1
5   Blue              0           1            0          0


### Topic 7 – Data Encoding (Label & One-Hot Encoding)

In this notebook, we encode categorical values using two commonly used methods:

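- **Label Encoding**: assigns a unique integer to each category. `LabelEncoder` numbers the categories in sorted order, which is why the output above shows Blue = 0, Green = 1, Red = 2.  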
- **One-Hot Encoding**: creates separate binary columns (with 0s and 1s) for each category.
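
As a minimal sketch of how the label-encoded integers relate to the original categories, the snippet below inspects the fitted encoder's `classes_` attribute and reverses the encoding with `inverse_transform`. The variable names (`colors`, `mapping`, `decoded`) are illustrative and not part of the cell above.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

colors = pd.Series(["Red", "Green", "Blue", "Green", "Red", "Blue"])

le = LabelEncoder()
codes = le.fit_transform(colors)

# classes_ stores the categories in sorted order; each category's
# position in that array is the integer code assigned to it.
mapping = {str(label): int(code) for code, label in enumerate(le.classes_)}
print(mapping)

# inverse_transform maps the integer codes back to the original labels.
decoded = le.inverse_transform(codes)
print(list(decoded))
```

Keeping the fitted encoder around is what makes the encoding reversible, for example when turning model predictions back into readable category names.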

We use:
- `sklearn.preprocessing.LabelEncoder` to apply label encoding.
- `pandas.get_dummies(..., dtype=int)` to create binary (0/1) columns instead of the boolean `True`/`False` columns that pandas 2.0 and later return by default.
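
The effect of the `dtype` argument can be checked directly. This is a small sketch, assuming a recent pandas release. The default dtype printed on the first line depends on the installed pandas version, so no particular value is claimed for it here. The names `default_dummies` and `int_dummies` are illustrative.

```python
import pandas as pd

colors = pd.Series(["Red", "Green", "Blue", "Green", "Red", "Blue"])

# Default call: the column dtype depends on the installed pandas version.
default_dummies = pd.get_dummies(colors, prefix="Color")
print(default_dummies.dtypes)

# Explicit dtype=int: columns hold integer 0/1 values.
int_dummies = pd.get_dummies(colors, prefix="Color", dtype=int)
print(int_dummies.dtypes)
print(int_dummies.head(3))
```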

Most machine learning implementations expect numeric input, so categorical data has to be encoded before it can be used with algorithms such as decision trees, logistic regression, and neural networks.

Passing `dtype=int` produces plain 0/1 integer columns, which can be fed directly to numerical models and distance-based algorithms.
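
To illustrate why the choice of encoding matters for distance-based algorithms, the sketch below compares distances between categories under both encodings. It hard-codes the label mapping from the printed output (Blue = 0, Green = 1, Red = 2). The variable names are illustrative.

```python
import numpy as np
import pandas as pd

colors = pd.Series(["Blue", "Green", "Red"])

# Label codes taken from the printed output of the cell above.
label_codes = {"Blue": 0, "Green": 1, "Red": 2}

# With label encoding, Blue appears twice as far from Red as from Green,
# although the colors have no natural order.
label_blue_green = abs(label_codes["Blue"] - label_codes["Green"])
label_blue_red = abs(label_codes["Blue"] - label_codes["Red"])
print(label_blue_green, label_blue_red)

# With one-hot encoding, every pair of distinct categories is equally far apart.
one_hot = pd.get_dummies(colors, prefix="Color", dtype=int).to_numpy()
oh_blue_green = np.linalg.norm(one_hot[0] - one_hot[1])
oh_blue_red = np.linalg.norm(one_hot[0] - one_hot[2])
print(oh_blue_green, oh_blue_red)
```

For unordered categories such as colors, one-hot encoding avoids the artificial ordering that the integer codes would otherwise introduce into distance calculations.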
