One-Hot Encoding (OHE) transforms each category into a new binary column (0 or 1) — marking presence (1) or absence (0) of that category.

It’s especially useful for nominal categorical variables (no natural order).

How One-Hot Encoding Works
1. Identify the unique categories in the feature.
2. Create a new column for each category.
3. Assign 1 where the row matches that category, and 0 otherwise.
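
The steps above can be sketched in plain Python, without any library. The function name `one_hot_encode` is hypothetical and used only for illustration; sorting the categories is an assumption made here so that the column order is deterministic.

```python
def one_hot_encode(values):
    """Return (categories, rows) for a list of category labels."""
    # Step 1: identify the unique categories (sorted for a stable column order).
    categories = sorted(set(values))
    # Steps 2 and 3: one column per category, 1 where the row matches, else 0.
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


fruits = ["Apple", "Banana", "Mango", "Banana", "Apple"]
categories, rows = one_hot_encode(fruits)
print(categories)
for row in rows:
    print(row)
```

Each row contains exactly one 1, in the column of the category that row belongs to.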

Example 1 — Fruits
Original data:
| Fruit  |
| ------ |
| Apple  |
| Banana |
| Mango  |
| Banana |
| Apple  |

One-Hot Encoded:
| Apple | Banana | Mango |
| ----- | ------ | ----- |
| 1     | 0      | 0     |
| 0     | 1      | 0     |
| 0     | 0      | 1     |
| 0     | 1      | 0     |
| 1     | 0      | 0     |

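Because each category gets its own column, OHE does not impose any numerical order or distance between categories, unlike integer codes such as Apple = 0, Banana = 1, Mango = 2.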
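
A minimal sketch of what this means in practice: it compares distances between categories under integer codes and under one-hot vectors. The integer codes and the helper names `label_distance` and `one_hot_distance` are assumptions chosen for illustration.

```python
import math

# Assumed integer codes and the matching one-hot vectors for the fruit example.
label_codes = {"Apple": 0, "Banana": 1, "Mango": 2}
one_hot = {"Apple": (1, 0, 0), "Banana": (0, 1, 0), "Mango": (0, 0, 1)}


def label_distance(a, b):
    # Absolute difference between the integer codes of two categories.
    return abs(label_codes[a] - label_codes[b])


def one_hot_distance(a, b):
    # Euclidean distance between the one-hot vectors of two categories.
    return math.dist(one_hot[a], one_hot[b])


# With integer codes, Mango looks twice as far from Apple as Banana does.
print(label_distance("Apple", "Banana"), label_distance("Apple", "Mango"))
# With one-hot vectors, every pair of distinct categories is equally far apart.
print(one_hot_distance("Apple", "Banana"), one_hot_distance("Apple", "Mango"))
```

A distance-based model would read the integer codes as "Mango is further from Apple than Banana is", which is an artifact of the labeling rather than a property of the data.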

Python Example with pandas

import pandas as pd

data = pd.DataFrame({
    'Fruit': ["Apple", "Banana", "Mango", "Banana", "Apple"]
})

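# One-hot encode the 'Fruit' column. dtype=int yields 0/1 values;
# pandas 2.0 and later return True/False by default.
encoded = pd.get_dummies(data, columns=['Fruit'], dtype=int)
print(encoded)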


When to Use One-Hot Encoding
✅ Nominal variables (no order, like colors, countries, brands)
✅ When you want to avoid artificial numeric relationships
⚠️ Poorly suited to features with many unique categories: one column per category can produce very wide, mostly zero datasets (the “curse of dimensionality”).
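
To get a feel for how quickly the table widens, the sketch below counts the columns one-hot encoding would create for a high-cardinality feature. The data (1,000 rows drawn from 500 distinct user IDs) and the helper name `one_hot_shape` are hypothetical.

```python
def one_hot_shape(values):
    """Return (rows, columns) of the one-hot encoded table for `values`."""
    return len(values), len(set(values))


# Hypothetical data: 1,000 rows drawn from 500 distinct user IDs.
user_ids = [f"user_{i % 500}" for i in range(1000)]

n_rows, n_cols = one_hot_shape(user_ids)
total_cells = n_rows * n_cols
ones = n_rows  # each row has exactly one 1

print(n_rows, n_cols, total_cells)
print(f"share of cells equal to 1: {ones / total_cells:.4f}")
```

A single original column becomes 500 columns here, and almost every cell in them is 0.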

Label Encoding vs One-Hot Encoding
| Category / Property | Label Encoding | One-Hot Encoding |
| ------------------- | -------------- | ---------------- |
| Apple               | 0              | 1 0 0            |
| Banana              | 1              | 0 1 0            |
| Mango               | 2              | 0 0 1            |
| Implies an order?   | Yes            | No               |
| Columns created     | 1              | One per category |
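
The comparison in the table can be reproduced with pandas. This is a sketch under one assumption: label encoding is done here with the pandas categorical dtype, whose codes follow the sorted category order; other tools may assign codes differently.

```python
import pandas as pd

data = pd.DataFrame({"Fruit": ["Apple", "Banana", "Mango", "Banana", "Apple"]})

# Label encoding: a single integer column, codes assigned in sorted category order.
label_encoded = data["Fruit"].astype("category").cat.codes

# One-hot encoding: one 0/1 column per category.
one_hot_encoded = pd.get_dummies(data["Fruit"], dtype=int)

print(label_encoded.tolist())
print(one_hot_encoded)
```

Label encoding keeps the data in one column but introduces the ordering 0 < 1 < 2, while one-hot encoding uses three columns and introduces no ordering.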
