## Data Encoding
1. Nominal/OHE encoding
2. Label and Ordinal encoding
3. Target-guided Ordinal encoding

### 1. Nominal/OHE encoding: 

In [None]:
One-hot encoding, also known as nominal encoding, is a technique used to represent categorical data as numerical data, which is more suitable for machine learning algorithms.

In [None]:
### 🔹 When to use One-Hot Encoding (OHE)

Use it when the categories are nominal (no order).
Example:

["Red", "Blue", "Green"] → each gets its own column.

Best for models that treat feature values as magnitudes, such as linear models (Linear Regression, Logistic Regression), SVMs, and Neural Networks.

If you use Label Encoding here, the model may assume an order such as "Green > Blue > Red", which is wrong.
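To see why integer codes are a poor fit for nominal data, here is a minimal sketch using scikit-learn's `LabelEncoder` on the same three colors. The variable names are illustrative. Note that `LabelEncoder` assigns codes in sorted (alphabetical) order, so the ordering it implies is arbitrary with respect to the data.

```python
from sklearn.preprocessing import LabelEncoder

colors = ["Red", "Blue", "Green"]

label_encoder = LabelEncoder()
codes = label_encoder.fit_transform(colors)

# classes_ holds the categories in sorted order; the position is the code.
for code, name in enumerate(label_encoder.classes_):
    print(code, name)

# The integer codes suggest Red (2) > Green (1) > Blue (0),
# an order that has no meaning for colors.
print(list(codes))
```

A model that reads these integers as magnitudes would treat "Red" as twice "Green", which is exactly the false assumption one-hot encoding avoids.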

### 🔹 When to use Label Encoding

Use it when the categories have a natural order (ordinal).
Example:

["Low", "Medium", "High"] → [0, 1, 2]

This makes sense because "High" really is greater than "Medium".

It also works well for tree-based models (such as Decision Trees, Random Forest, XGBoost, LightGBM).

These models don’t assume linear relationships between numbers, so they don’t get confused by the integer values.
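The ordinal mapping above can be sketched with scikit-learn's `OrdinalEncoder`, passing the category order explicitly. This is one possible approach rather than the only one; the `level` column and variable names are made up for illustration. The explicit `categories` argument matters because without it the encoder sorts alphabetically, which would put "High" before "Low".

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical column holding ordered categories.
levels = pd.DataFrame({"level": ["Low", "High", "Medium", "Low"]})

# Spell out the order so that Low -> 0, Medium -> 1, High -> 2.
ordinal_encoder = OrdinalEncoder(categories=[["Low", "Medium", "High"]])
levels["level_encoded"] = ordinal_encoder.fit_transform(levels[["level"]]).ravel()

print(levels)
```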

### 🔹 Quick Rule of Thumb

Nominal data (no order) → One-Hot Encoding

Ordinal data (order exists) → Label Encoding

In [1]:
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

#### Create a simple dataframe:

In [2]:
df = pd.DataFrame({
    'color' : ['red','blue','green','green','red','blue']
})

In [3]:
df.head()

|   | color |
|---|-------|
| 0 | red   |
| 1 | blue  |
| 2 | green |
| 3 | green |
| 4 | red   |


#### Create an instance of one hot encoder: 


In [4]:
encoder = OneHotEncoder()

#### Perform fit and transform 

In [5]:
encoder.fit_transform(df[['color']])

<Compressed Sparse Row sparse matrix of dtype 'float64'
	with 6 stored elements and shape (6, 3)>

In [7]:
encoded = encoder.fit_transform(df[['color']]).toarray()

In [8]:
encoder_df = pd.DataFrame(encoded,columns=encoder.get_feature_names_out())

In [9]:
encoder_df

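|   | color_blue | color_green | color_red |
|---|------------|-------------|-----------|
| 0 | 0.0        | 0.0         | 1.0       |
| 1 | 1.0        | 0.0         | 0.0       |
| 2 | 0.0        | 1.0         | 0.0       |
| 3 | 0.0        | 1.0         | 0.0       |
| 4 | 0.0        | 0.0         | 1.0       |
| 5 | 1.0        | 0.0         | 0.0       |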


In [11]:
pd.concat([df,encoder_df],axis=1)

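|   | color | color_blue | color_green | color_red |
|---|-------|------------|-------------|-----------|
| 0 | red   | 0.0        | 0.0         | 1.0       |
| 1 | blue  | 1.0        | 0.0         | 0.0       |
| 2 | green | 0.0        | 1.0         | 0.0       |
| 3 | green | 0.0        | 1.0         | 0.0       |
| 4 | red   | 0.0        | 0.0         | 1.0       |
| 5 | blue  | 1.0        | 0.0         | 0.0       |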
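For quick exploration, the same columns can also be produced with pandas alone. The sketch below uses `pd.get_dummies` as an alternative to `OneHotEncoder`; it is not part of the original walkthrough, and `colors_df` is an illustrative name. Passing `dtype=float` keeps the output comparable to the encoder's float columns.

```python
import pandas as pd

colors_df = pd.DataFrame({
    "color": ["red", "blue", "green", "green", "red", "blue"]
})

# get_dummies replaces the listed columns with one indicator column per category.
dummies = pd.get_dummies(colors_df, columns=["color"], dtype=float)
print(dummies)
```

One design difference worth keeping in mind: `OneHotEncoder` remembers the categories it saw during `fit`, so it can be reused on new data, whereas `get_dummies` derives the columns from whatever data it is given each time.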


### Assignment: 

In [12]:
import seaborn as sns


In [14]:
df = sns.load_dataset('tips')

In [16]:
df.head()

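|   | total_bill | tip  | sex    | smoker | day | time   | size |
|---|------------|------|--------|--------|-----|--------|------|
| 0 | 16.99      | 1.01 | Female | No     | Sun | Dinner | 2    |
| 1 | 10.34      | 1.66 | Male   | No     | Sun | Dinner | 3    |
| 2 | 21.01      | 3.5  | Male   | No     | Sun | Dinner | 3    |
| 3 | 23.68      | 3.31 | Male   | No     | Sun | Dinner | 2    |
| 4 | 24.59      | 3.61 | Female | No     | Sun | Dinner | 4    |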
