<a href="https://colab.research.google.com/github/mishad01/Data-Science-Machine-Learning/blob/main/Pattern%20Recognition/patternt_lab_1_2_(Label_Encoder).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

A **Label Encoder** is a tool used in **machine learning** to convert categorical data (non-numeric values) into numerical format so that algorithms can understand and process it effectively. Many machine learning models require numerical input, and label encoding helps bridge the gap by mapping each category to a unique integer.

---

### How It Works
For example, suppose you have a column with colors:
```
Red, Blue, Green, Red, Blue
```
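Using **Label Encoding**, each category is mapped to an integer. Scikit-learn's `LabelEncoder` assigns the integers in sorted (alphabetical) order, so the mapping is: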
```
Red → 2  
Blue → 0  
Green → 1  
```
The transformed column would look like:
```
2, 0, 1, 2, 0
```

---

### Advantages
- Simple and efficient transformation.
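- Useful when there is an **ordinal relationship** between categories (like "Low", "Medium", "High"), provided the integers follow that order. Scikit-learn's `LabelEncoder` assigns integers alphabetically (High = 0, Low = 1, Medium = 2), so preserving the order requires an explicit mapping or `OrdinalEncoder` with a specified category order.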
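
To illustrate the ordinal case, the sketch below compares `LabelEncoder` with scikit-learn's `OrdinalEncoder` given an explicit category order. The sample values and variable names are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Illustrative sample values (not from any real dataset)
levels = ['Low', 'High', 'Medium', 'Low']

# LabelEncoder sorts labels alphabetically: High=0, Low=1, Medium=2
alphabetical = LabelEncoder().fit_transform(levels)

# OrdinalEncoder accepts an explicit order: Low=0, Medium=1, High=2
ordinal = OrdinalEncoder(categories=[['Low', 'Medium', 'High']])
# OrdinalEncoder expects a 2D input (rows x features), hence the reshape
ordered = ordinal.fit_transform(np.array(levels).reshape(-1, 1)).ravel()

print(alphabetical)  # [1 0 2 1]
print(ordered)       # [0. 2. 1. 0.]
```

Only the second encoding keeps `Low < Medium < High`, which is what an order-sensitive model would rely on.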

### Disadvantages
- Introduces **unintended ordinal relationships** when none exist (e.g., mapping "Apple" to 0 and "Banana" to 1 implies that "Banana" is somehow greater than "Apple").
  
### Alternatives
If your categorical values have no inherent order, **One-Hot Encoding** is usually the safer choice for input features: it creates one binary column per category, so no ordering is implied.
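
As a minimal sketch of one-hot encoding, the example below uses `pandas.get_dummies` on the same color values. The `Color` series name and prefix are assumptions for illustration; scikit-learn's `OneHotEncoder` is another option.

```python
import pandas as pd

# Same sample colors as above, held in a pandas Series
colors = pd.Series(['Red', 'Blue', 'Green', 'Red', 'Blue'], name='Color')

# One binary column per category; dtype=int yields 0/1 instead of booleans
one_hot = pd.get_dummies(colors, prefix='Color', dtype=int)

print(one_hot)
```

Each row contains exactly one `1`, in the column for its category, so no category is ranked above another.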

---

### Implementation in Python (using Scikit-learn)
```python
from sklearn.preprocessing import LabelEncoder

# Sample data
data = ['Red', 'Blue', 'Green', 'Red', 'Blue']

# Create and fit the encoder
encoder = LabelEncoder()
encoded_data = encoder.fit_transform(data)

print(encoded_data)
# Output: [2 0 1 2 0]

# Mapping back to original labels
decoded_data = encoder.inverse_transform(encoded_data)
print(decoded_data)
# Output: ['Red' 'Blue' 'Green' 'Red' 'Blue']
```
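
To see which integer each label received, the fitted encoder's `classes_` attribute can be inspected. The sketch below builds a label-to-code dictionary from it; the `mapping` name is an assumption for illustration.

```python
from sklearn.preprocessing import LabelEncoder

data = ['Red', 'Blue', 'Green', 'Red', 'Blue']

encoder = LabelEncoder()
encoder.fit(data)

# classes_ holds the unique labels in sorted order;
# each label's position in that array is its integer code
mapping = {str(label): int(code) for code, label in enumerate(encoder.classes_)}

print(mapping)
# Output: {'Blue': 0, 'Green': 1, 'Red': 2}
```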

The same encoder can be applied to a column of a pandas DataFrame:

```python
import pandas as pd
from sklearn import preprocessing

# Sample DataFrame with categorical column 'Gender'
data = {'Gender': ['Male', 'Female', 'Female', 'Male']}
df = pd.DataFrame(data)

# Initialize the LabelEncoder
lab_encoder = preprocessing.LabelEncoder()

# Apply label encoding
df['Gender'] = lab_encoder.fit_transform(df['Gender'])

# Display the updated DataFrame
print(df)
```

---

### Output
```
   Gender
0       1
1       0
2       0
3       1
```

Here:
- `Male` is encoded as **1**  
- `Female` is encoded as **0**

---

### How the Code Works
- `lab_encoder.fit_transform(df['Gender'])` learns the distinct categories in the `Gender` column and replaces each value with its integer code.
- The result is assigned back to the same column (`df['Gender'] = ...`), overwriting the original strings.

---

### Why Use a Loop?
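If you have multiple categorical columns, a loop over the column names (`for col in col_cat`) applies the same encoding step to each one instead of repeating the code per column. For instance:

```python
col_cat = ['Gender', 'Occupation']
```

In such a case, the loop applies label encoding to both columns, with `df[col]` taking the place of `df['Gender']` in the code above.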
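
The loop could look like the sketch below. The `Occupation` values and the `encoders` dictionary are assumptions added for illustration. Keeping one fitted encoder per column makes it possible to decode each column later.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample data; the 'Occupation' values are made up for illustration
df = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Female', 'Male'],
    'Occupation': ['Engineer', 'Doctor', 'Engineer', 'Teacher'],
})

col_cat = ['Gender', 'Occupation']
encoders = {}  # one fitted encoder per column, so codes can be decoded later

for col in col_cat:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])

print(df)
```

Calling `encoders['Occupation'].inverse_transform(df['Occupation'])` would then recover the original occupation strings.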

---

### Key Note
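Be cautious when using Label Encoding on input features like `Gender`: the codes 0 and 1 carry no meaningful order, yet a model may treat them as ordered numbers. For non-ordinal categories (like `Fruit` or `Color`), consider **One-Hot Encoding** to avoid introducing artificial relationships. Note also that the scikit-learn documentation describes `LabelEncoder` as intended for target labels (`y`) rather than input features (`X`).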

