One-Hot Encoding: a way to represent categorical data as numbers without introducing false meaning, such as an artificial ordering between categories.
- Converts each category (for example, a word in a text) into a binary vector.

_Example categories_:

["apple", "banana", "orange"]


One-hot encoding turns this into:

| apple | banana | orange |
|--|--|--|
| 1 | 0 | 0 |
| 0 | 1 | 0 |
| 0 | 0 | 1 |

Each row encodes one category: only one position is “hot” (1), and the rest are 0.
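
The mapping in the table can be sketched in plain Python. This is a minimal illustration rather than a library implementation: the helper name `one_hot_encode` is hypothetical, and it assumes the category order is the sorted unique values when none is supplied.

```python
def one_hot_encode(values, categories=None):
    """Return (categories, vectors) for a list of categorical values."""
    # Assumption: with no explicit category list, use the sorted unique values.
    if categories is None:
        categories = sorted(set(values))

    # Map each category to its column position.
    index = {category: i for i, category in enumerate(categories)}

    vectors = []
    for value in values:
        vector = [0] * len(categories)
        vector[index[value]] = 1  # exactly one "hot" position per value
        vectors.append(vector)
    return categories, vectors


categories, vectors = one_hot_encode(["apple", "banana", "orange"])
for category, vector in zip(categories, vectors):
    print(category, vector)
```

Each vector has the same length as the category list, so the width of the encoding grows with the number of distinct categories.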

Advantages

- Easy and deterministic to implement (sklearn, pandas).
- Preserves categorical nature without imposing order.
- Works well for small, stable category sets.
- Highly interpretable and auditable.
- Strong baseline for many ML problems.

Disadvantages

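- High dimensionality for large category sets (one column per category).
- Sparse, mostly-zero representations can increase model variance and instability, especially when many categories are rare.
- No semantic or similarity information: every pair of categories is equally distant.
- Poor handling of categories that were not seen during fitting.
- Not scalable for large-vocabulary NLP.
- Ignores frequency and importance: every category gets the same weight.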

In [2]:
import pandas as pd

data = {
    "payment_method": ["UPI", "Card", "Wallet", "UPI", "Card"]
}

df = pd.DataFrame(data)

# One-Hot Encoding using pandas
# (columns are boolean by default in pandas >= 2.0; pass dtype=int for 0/1 values)
encoded_df = pd.get_dummies(df, columns=["payment_method"])

print(encoded_df)


   payment_method_Card  payment_method_UPI  payment_method_Wallet
0                False                True                  False
1                 True               False                  False
2                False               False                   True
3                False                True                  False
4                 True               False                  False


In [4]:
from sklearn.preprocessing import OneHotEncoder
import numpy as np

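# OneHotEncoder expects 2D input: one row per sample, one column per feature
data = np.array([["UPI"], ["Card"], ["Wallet"], ["UPI"]])

# sparse_output=False returns a dense NumPy array (parameter available in scikit-learn >= 1.2)
encoder = OneHotEncoder(sparse_output=False)
encoded = encoder.fit_transform(data)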

print(encoded)
print(encoder.get_feature_names_out())


[[0. 1. 0.]
 [1. 0. 0.]
 [0. 0. 1.]
 [0. 1. 0.]]
['x0_Card' 'x0_UPI' 'x0_Wallet']
