## One-Hot Encoding
One-hot encoding is a preprocessing technique that converts a categorical variable into a set of binary (0/1) columns, so that machine learning models can use it without implying any ordinal relationship between the categories.
- Each unique category gets its own column.
- A row is marked 1 in the column of the category it belongs to and 0 in all the others.

### Why Is One-Hot Encoding Needed?
Most machine learning algorithms:
- Cannot process text categories
- Interpret numbers as ordered or having magnitude

One-hot encoding:
- Removes false ordinal relationships
- Allows models to treat categories as independent
- Prevents misleading numerical interpretation
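
To make the ordering problem concrete, here is a minimal sketch using a made-up `color` feature (the data and the integer mapping are illustrative assumptions, not taken from the datasets used later). With integer codes, red and blue look twice as far apart as red and green. With one-hot vectors, every pair of distinct categories is equally distant.

```python
import numpy as np
import pandas as pd

# Hypothetical nominal feature, used only for illustration.
colors = pd.Series(["red", "green", "blue"], name="color")

# Arbitrary integer codes impose an order: red < green < blue.
codes = {"red": 0, "green": 1, "blue": 2}
int_red_green = abs(codes["red"] - codes["green"])
int_red_blue = abs(codes["red"] - codes["blue"])

# One-hot vectors: one 0/1 column per category, rows labelled by category.
one_hot = pd.get_dummies(colors, dtype=float)
one_hot.index = colors

def distance(a, b):
    """Euclidean distance between the one-hot rows of categories a and b."""
    return float(np.linalg.norm(one_hot.loc[a].to_numpy() - one_hot.loc[b].to_numpy()))

oh_red_green = distance("red", "green")
oh_red_blue = distance("red", "blue")

print(int_red_green, int_red_blue)  # integer codes: distances differ
print(oh_red_green, oh_red_blue)    # one-hot: distances are equal
```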

### When Should One-Hot Encoding Be Used?
- The feature is nominal (no natural order)
- The categories are independent of each other
- The model is distance-based or linear

### How One-Hot Encoding Works 
- Identify the unique categories in a feature
- Create one new column per category
- Assign 1 where the category is present and 0 where it is absent
- Remove the original categorical column
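
The steps above can be sketched by hand in pandas. This is only an illustration on a small made-up frame (the `fruit` and `size_cm` columns are assumptions); in practice the library helpers shown later do the same work.

```python
import pandas as pd

# Small made-up frame; the column and category names are illustrative.
df = pd.DataFrame({"size_cm": [10, 12, 9], "fruit": ["apple", "kiwi", "apple"]})

# Step 1: identify the unique categories in the feature.
categories = sorted(df["fruit"].unique())

# Steps 2 and 3: create one column per category, 1 if present, else 0.
for cat in categories:
    df[f"fruit_{cat}"] = (df["fruit"] == cat).astype(int)

# Step 4: remove the original categorical column.
df = df.drop(columns="fruit")

print(df)
```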

### Importing Essential Libraries

In [13]:
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
import seaborn as sns

### Loading the Iris Dataset with seaborn

In [16]:
data = sns.load_dataset("iris")
data.head(2)

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa


### Encoding with pandas get_dummies()

In [19]:
ohe_data = pd.get_dummies(data)
ohe_data.head(2)

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species_setosa,species_versicolor,species_virginica
0,5.1,3.5,1.4,0.2,True,False,False
1,4.9,3.0,1.4,0.2,True,False,False


### The species column is replaced by three new columns

In [21]:
ohe_data.columns

Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width',
       'species_setosa', 'species_versicolor', 'species_virginica'],
      dtype='object')
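
The encoded columns above hold True/False values. If 0/1 integers are preferred, `pd.get_dummies` accepts a `dtype` argument, and `columns` lets you name the features to encode explicitly instead of relying on automatic detection. A minimal sketch on a tiny stand-in frame (the values are made up for illustration):

```python
import pandas as pd

# Tiny stand-in for the iris frame; values are made up for illustration.
df = pd.DataFrame({
    "petal_length": [1.4, 4.7, 6.0],
    "species": ["setosa", "versicolor", "virginica"],
})

# columns= names the features to encode; dtype=int yields 0/1 instead of bool.
encoded = pd.get_dummies(df, columns=["species"], dtype=int)

print(encoded)
```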

### One-Hot Encoding with sklearn OneHotEncoder()

In [25]:
ohe = OneHotEncoder(sparse_output=False)
data_sk_ohe = ohe.fit_transform(data[['species']])
# Take the column names from the fitted encoder so they always match its output order
data_sk_ohe_col = pd.DataFrame(data_sk_ohe, columns=ohe.get_feature_names_out())
data_sk_ohe_col.head(2)

Unnamed: 0,species_setosa,species_versicolor,species_virginica
0,1.0,0.0,0.0
1,1.0,0.0,0.0


### Encoding the Tips Dataset

In [28]:
data_1 = sns.load_dataset("tips")
data_1.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2
1,10.34,1.66,Male,No,Sun,Dinner,3
2,21.01,3.5,Male,No,Sun,Dinner,3
3,23.68,3.31,Male,No,Sun,Dinner,2
4,24.59,3.61,Female,No,Sun,Dinner,4


### Using pandas get_dummies()

In [30]:
data_1_d = pd.get_dummies(data_1)
data_1_d.head(2)

Unnamed: 0,total_bill,tip,size,sex_Male,sex_Female,smoker_Yes,smoker_No,day_Thur,day_Fri,day_Sat,day_Sun,time_Lunch,time_Dinner
0,16.99,1.01,2,False,True,False,True,False,False,False,True,False,True
1,10.34,1.66,3,True,False,False,True,False,False,False,True,False,True


In [36]:
data_1_d.columns

Index(['total_bill', 'tip', 'size', 'sex_Male', 'sex_Female', 'smoker_Yes',
       'smoker_No', 'day_Thur', 'day_Fri', 'day_Sat', 'day_Sun', 'time_Lunch',
       'time_Dinner'],
      dtype='object')

### Using sklearn OneHotEncoder()

In [50]:
ohe = OneHotEncoder(sparse_output=False)
data_sk_ohe = ohe.fit_transform(data_1[['sex','smoker','day','time']])
# OneHotEncoder sorts the categories of each feature, so the column names are
# taken from the encoder; hard-coding them in another order mislabels the columns
data_sk_ohe_col = pd.DataFrame(data_sk_ohe, columns=ohe.get_feature_names_out())
data_sk_ohe_col.head(2)

Unnamed: 0,sex_Female,sex_Male,smoker_No,smoker_Yes,day_Fri,day_Sat,day_Sun,day_Thur,time_Dinner,time_Lunch
0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0
1,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0
