# ENCODING

### What Is Encoding?

Encoding is the process of transforming categorical variables (like names, colors, or types) into numeric values that
machine learning models can interpret.

Example:

| Color | Encoded |
| ----- | ------- |
| Red   | 1       |
| Blue  | 2       |
| Green | 3       |
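
As a minimal sketch, the mapping in the example can be reproduced with a plain dictionary and pandas. The Series name `colors` and the dictionary `color_codes` are assumptions made for illustration; the integer codes themselves are arbitrary.

```python
import pandas as pd

# Sample column using the colors from the example
colors = pd.Series(["Red", "Blue", "Green", "Blue"], name="Color")

# Hypothetical mapping that mirrors the example; the codes are arbitrary
color_codes = {"Red": 1, "Blue": 2, "Green": 3}

# Replace each color with its integer code
encoded = colors.map(color_codes)
print(encoded.tolist())
```

Any category missing from the dictionary would come out as NaN, so a hand-written mapping like this only suits small, known sets of categories.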


### Why Is Encoding Needed?


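1. Most ML algorithms operate on numbers and cannot take raw text as input.

2. Numeric encodings let algorithms compute distances, similarities, and weights over categorical features.

3. A suitable encoding can improve model accuracy and training efficiency; an unsuitable one (for example, integer codes that imply an order that does not exist) can hurt both.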
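
To illustrate the point about distance, the sketch below compares two encodings of the same three colors using only the Python standard library. The dictionaries `label_codes` and `one_hot` are assumptions chosen for illustration.

```python
from math import dist

# Integer codes (arbitrary assignment, as in the Red/Blue/Green example)
label_codes = {"Red": [1], "Blue": [2], "Green": [3]}

# One-hot vectors: one position per color
one_hot = {"Red": [1, 0, 0], "Blue": [0, 1, 0], "Green": [0, 0, 1]}

# With integer codes, Red appears twice as far from Green as from Blue
print(dist(label_codes["Red"], label_codes["Blue"]))
print(dist(label_codes["Red"], label_codes["Green"]))

# With one-hot vectors, every pair of colors is equally far apart
red_blue = dist(one_hot["Red"], one_hot["Blue"])
red_green = dist(one_hot["Red"], one_hot["Green"])
print(red_blue == red_green)
```

This is one reason the choice of encoding matters for distance-based models: integer codes can introduce a ranking that the categories do not have.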

### Steps to Perform Encoding

#### Step 1: Identify Categorical Variables

Check which columns in your dataset contain text or category values (such as Gender, City, or Color).

Example (object covers text columns; category covers columns stored with the pandas categorical dtype):

data.select_dtypes(include=['object', 'category'])
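
A runnable sketch of this step follows. The sample DataFrame is hypothetical, and the sketch uses a variation on the call above: it excludes numeric columns instead of including specific dtypes, so that the result does not depend on how string columns are labeled.

```python
import pandas as pd

# Hypothetical sample data for illustration
data = pd.DataFrame({
    "Gender": ["Male", "Female", "Female"],
    "City": ["Pune", "Delhi", "Pune"],
    "Age": [25, 31, 47],
})

# Every non-numeric column is a candidate for encoding
categorical_cols = data.select_dtypes(exclude=["number"]).columns.tolist()
print(categorical_cols)
```

Note that this variation would also pick up boolean or datetime columns, which may need different handling.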

#### Step 2: Choose the Right Encoding Method

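Decide which type of encoding fits your data:

| Type | When to Use | Example |
| ---- | ----------- | ------- |
| Label Encoding | When one integer per category is enough, such as a target column or a two-value feature; codes follow alphabetical order, not rank | Female → 0, Male → 1 |
| One Hot Encoding | When categories have no order (nominal data) | Red, Blue, Green |
| Ordinal Encoding | When there is a clear ranking (ordinal data) | Low < Medium < High, Poor < Average < Good |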
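
The difference between alphabetical codes and an explicit ranking can be sketched with pandas alone. The Series `sizes` and the list `ranking` are assumptions made for illustration.

```python
import pandas as pd

sizes = pd.Series(["Low", "High", "Medium", "Low"])

# Alphabetical codes: High=0, Low=1, Medium=2 (the ranking is lost)
alphabetical_codes, _ = pd.factorize(sizes, sort=True)

# Explicit ranking: Low=0, Medium=1, High=2
ranking = ["Low", "Medium", "High"]
ordinal_codes = pd.Categorical(sizes, categories=ranking, ordered=True).codes

print(alphabetical_codes.tolist())
print(ordinal_codes.tolist())
```

When the order of categories carries meaning, stating the ranking explicitly avoids relying on how the category names happen to sort.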

#### Step 3: Apply Encoding

Use libraries like scikit-learn or pandas to perform encoding.

Example (Label Encoding; codes are assigned in alphabetical order of the category names):

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
data['Gender'] = le.fit_transform(data['Gender'])
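
A self-contained version of the label encoding snippet might look like the sketch below. The sample DataFrame is hypothetical; `classes_` and `inverse_transform` are used to inspect and reverse the mapping.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample data for illustration
data = pd.DataFrame({"Gender": ["Male", "Female", "Female", "Male"]})

le = LabelEncoder()
data["Gender"] = le.fit_transform(data["Gender"])

# classes_ lists the categories in the order of their integer codes
print(list(le.classes_))
print(data["Gender"].tolist())

# inverse_transform maps codes back to the original category names
print(list(le.inverse_transform([0, 1])))
```

Keeping the fitted encoder around makes it possible to translate predictions or encoded values back into readable labels later.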


Example (One Hot Encoding):

# dtype=int gives 0/1 columns (recent pandas versions default to True/False)
data = pd.get_dummies(data, columns=['City'], dtype=int)

#### Step 4: Merge Encoded Data (if needed)

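If you encoded a column separately, join the result back to your main dataset. Both DataFrames must share the same index, and the original text column should be dropped so that it does not remain alongside its encoded version.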

data = pd.concat([data, encoded_df], axis=1)
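
The sketch below walks through a separate encode-and-merge, assuming a hypothetical DataFrame with a City column. It drops the original text column before joining so that only the encoded columns remain.

```python
import pandas as pd

# Hypothetical sample data for illustration
data = pd.DataFrame({
    "City": ["Pune", "Delhi", "Pune"],
    "Age": [25, 31, 47],
})

# Encode the column on its own; the result keeps the index of `data`
encoded_df = pd.get_dummies(data["City"], prefix="City", dtype=int)

# Drop the original text column, then join column-wise on the shared index
data = pd.concat([data.drop(columns=["City"]), encoded_df], axis=1)
print(data.columns.tolist())
```

If the two DataFrames had different indexes (for example, after filtering or shuffling one of them), the concat would misalign rows and introduce missing values.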

#### Step 5: Verify Results

Check that every categorical column has been replaced with numeric values; no column should still report the object dtype.

data.dtypes
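
Beyond reading the dtypes by eye, the check can be automated. The sketch below assumes a hypothetical DataFrame and lists any columns that are still non-numeric after encoding.

```python
import pandas as pd

# Hypothetical sample data for illustration
data = pd.DataFrame({
    "Color": ["Red", "Blue", "Green"],
    "Price": [10.0, 12.5, 9.0],
})
encoded = pd.get_dummies(data, columns=["Color"], dtype=int)

# Columns that are still non-numeric would need further encoding
remaining = encoded.select_dtypes(exclude=["number"]).columns.tolist()
print(encoded.dtypes)
print(remaining)
```

An empty list means every column is numeric and ready for a model.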

## Flowchart Summary

     ┌──────────────────────────────┐
     │ Step 1: Identify categorical │
     │         variables            │
     └──────────────┬───────────────┘
                    ↓
     ┌──────────────────────────────┐
     │ Step 2: Choose encoding type │
     │ (Label / OneHot / Ordinal)   │
     └──────────────┬───────────────┘
                    ↓
     ┌──────────────────────────────┐
     │ Step 3: Apply encoding using │
     │ pandas or sklearn            │
     └──────────────┬───────────────┘
                    ↓
     ┌──────────────────────────────┐
     │ Step 4: Merge encoded data   │
     │ with main dataset            │
     └──────────────┬───────────────┘
                    ↓
     ┌──────────────────────────────┐
     │ Step 5: Verify final dataset │
     │ (all numeric columns)        │
     └──────────────────────────────┘
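
The five steps can be combined into one function, as in the sketch below. The helper `encode_categoricals` and its `ordinal_orders` parameter are hypothetical names introduced here, not part of pandas or scikit-learn.

```python
import pandas as pd


def encode_categoricals(df, ordinal_orders=None):
    """Hypothetical helper: rank-encode listed columns, one-hot encode the rest."""
    ordinal_orders = ordinal_orders or {}
    result = df.copy()

    # Step 1: identify non-numeric columns
    categorical_cols = result.select_dtypes(exclude=["number"]).columns.tolist()

    # Steps 2-3: ranked columns get ordinal codes, the rest are one-hot encoded
    for col, order in ordinal_orders.items():
        result[col] = pd.Categorical(result[col], categories=order, ordered=True).codes
    nominal_cols = [c for c in categorical_cols if c not in ordinal_orders]

    # Step 4: get_dummies replaces the originals and keeps the other columns
    result = pd.get_dummies(result, columns=nominal_cols, dtype=int)

    # Step 5: verify that nothing non-numeric is left
    leftover = result.select_dtypes(exclude=["number"]).columns.tolist()
    if leftover:
        raise ValueError(f"Columns still not numeric: {leftover}")
    return result


df = pd.DataFrame({
    "Color": ["Red", "Blue", "Green"],
    "Quality": ["Poor", "Good", "Average"],
    "Price": [10, 12, 9],
})
encoded = encode_categoricals(df, {"Quality": ["Poor", "Average", "Good"]})
print(encoded)
```

Values missing from a ranking list would receive the code -1, so a real pipeline would need to validate the categories first.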


## A Simple Worked Example of Encoding

### Example: One Hot Encoding

Dataset:

| Color |
| ----- |
| Red   |
| Blue  |
| Green |

#### Goal
Convert the text values (Red, Blue, Green) into numeric form so a machine learning model can use them.

#### Python Code
import pandas as pd

# Step 1: Create sample data
data = pd.DataFrame({'Color': ['Red', 'Blue', 'Green']})

# Step 2: Apply One Hot Encoding
# dtype=int prints 0/1; recent pandas versions default to True/False
encoded_data = pd.get_dummies(data, columns=['Color'], dtype=int)

# Step 3: Display result
print(encoded_data)

| Color_Blue | Color_Green | Color_Red |
| ---------- | ----------- | --------- |
| 0          | 0           | 1         |
| 1          | 0           | 0         |
| 0          | 1           | 0         |

#### Explanation

1. Each color becomes a separate column.

2. A 1 indicates the presence of that color, and a 0 indicates its absence.

3. For example:

“Red” → [0, 0, 1]

“Blue” → [1, 0, 0]

“Green” → [0, 1, 0]
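
The color-to-vector mapping can be checked directly. The sketch below rebuilds the same sample data, pairs each color with its encoded row, and then recovers the original color from the position of the 1. The names `vectors` and `recovered` are assumptions made for illustration.

```python
import pandas as pd

data = pd.DataFrame({"Color": ["Red", "Blue", "Green"]})
encoded_data = pd.get_dummies(data, columns=["Color"], dtype=int)

# Pair each original color with its row of 0/1 values
vectors = dict(zip(data["Color"], encoded_data.to_numpy().tolist()))
print(vectors)

# Recover the original color from the column that holds the 1 in each row
recovered = encoded_data.idxmax(axis=1).str.replace("Color_", "", regex=False)
print(recovered.tolist())
```

The columns appear in alphabetical order (Blue, Green, Red), which is why Red maps to the last position.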