# Encoding
**Encoding transforms categorical data into numerical data so that machine learning algorithms can use it.**

**Summary**
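* Label Encoding: Maps each category to an integer; the integers carry no inherent order (scikit-learn's `LabelEncoder` assigns them in sorted order of the category values).
* One-Hot Encoding: Creates one binary (0/1) column per category, so each row has a 1 only in the column for its own category.
* Binary Encoding: Assigns each category an integer, then stores the binary digits of that integer in separate columns.
* Ordinal Encoding: Maps categories to integers that follow a meaningful order (for example small < medium < large).
* Frequency Encoding: Replaces each category with the number of times it occurs in the data.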
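
As a rough illustration of two of the schemes listed above, the sketch below applies ordinal and frequency encoding with plain pandas. The `size` column and the `size_order` mapping are hypothetical and chosen only for this example.

```python
import pandas as pd

# Hypothetical toy column, used only for illustration.
sizes = pd.Series(["small", "large", "medium", "small", "small"], name="size")

# Ordinal encoding: the order is stated explicitly in the mapping.
size_order = {"small": 0, "medium": 1, "large": 2}
ordinal = sizes.map(size_order)

# Frequency encoding: each category is replaced by how often it occurs.
counts = sizes.value_counts()
frequency = sizes.map(counts)

print(ordinal.tolist())    # [0, 2, 1, 0, 0]
print(frequency.tolist())  # [3, 1, 1, 3, 3]
```

Note that frequency encoding gives `large` and `medium` the same value here, because both appear once; categories with equal counts become indistinguishable.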

# Scaling - to make the magnitude (range) equal for all features (columns)
**Scaling refers to the process of adjusting the range and distribution of numerical data to improve the performance and convergence of machine learning algorithms.**

For example, dividing each value by 100 rescales it as follows:

- 100 becomes 1
- 200 becomes 2
- 300 becomes 3

## Before scaling
- salary - 10000,20000,30000
- age - 10, 20, 30

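- In the above case, salary has a much larger magnitude than age, so models that are sensitive to feature magnitude (for example distance-based methods such as k-nearest neighbours, or models trained with gradient descent) can let salary dominate.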

## After scaling
- salary - 1,2,3
- age - 1,2,3

- After scaling, both features cover the same range, so neither dominates the model simply because of its units.
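
One common way to bring both features onto a shared range is min-max scaling, which maps each column to the interval 0 to 1. The sketch below assumes that method and uses plain pandas; the 1, 2, 3 values shown above are illustrative and correspond to dividing each column by a constant instead. The name `df_scale` is introduced only for this example.

```python
import pandas as pd

df_scale = pd.DataFrame({
    "salary": [10000, 20000, 30000],
    "age": [10, 20, 30],
})

# Min-max scaling: (x - min) / (max - min), applied column by column.
scaled = (df_scale - df_scale.min()) / (df_scale.max() - df_scale.min())

print(scaled["salary"].tolist())  # [0.0, 0.5, 1.0]
print(scaled["age"].tolist())     # [0.0, 0.5, 1.0]
```

scikit-learn's `MinMaxScaler` performs the same computation and additionally remembers the minimum and maximum, so the identical transformation can be applied to new data later.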

# Encoding

In [90]:
import pandas as pd

# Example DataFrame
df = pd.DataFrame({'color': ['red', 'green', 'blue', 'green', 'red']})
df

Unnamed: 0,color
0,red
1,green
2,blue
3,green
4,red


In [92]:
encode = {"red":1, "green": 2, "blue": 3}

df["color"] = df["color"].map(encode)

In [94]:
df

Unnamed: 0,color
0,1
1,2
2,3
3,2
4,1


In [80]:
from sklearn.preprocessing import LabelEncoder

In [98]:
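le = LabelEncoder()

# df["color"] already holds the integers 1, 2, 3 from the manual mapping above,
# so LabelEncoder relabels those sorted values as 0, 1, 2.
df["color"] = le.fit_transform(df['color'])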
df

Unnamed: 0,color
0,0
1,1
2,2
3,1
4,0


In [100]:
df = pd.DataFrame({'company': ['ocean', 'tcs', 'wipro', 'royalenfield', 'mahindra']})

In [102]:
df

Unnamed: 0,company
0,ocean
1,tcs
2,wipro
3,royalenfield
4,mahindra


In [104]:
le = LabelEncoder()

df["company"] = le.fit_transform(df['company'])
df

Unnamed: 0,company
0,1
1,3
2,4
3,2
4,0
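
The codes above follow from the fact that `LabelEncoder` sorts the category values and numbers them from 0, which for strings means alphabetical order. The sketch below makes this visible through the fitted `classes_` attribute and shows how to recover the original labels; the names `companies`, `encoder`, and `codes` are used only in this example.

```python
from sklearn.preprocessing import LabelEncoder

companies = ["ocean", "tcs", "wipro", "royalenfield", "mahindra"]

encoder = LabelEncoder()
codes = encoder.fit_transform(companies)

# classes_ holds the sorted categories; a category's position is its code.
print(encoder.classes_.tolist())  # ['mahindra', 'ocean', 'royalenfield', 'tcs', 'wipro']
print(codes.tolist())             # [1, 3, 4, 2, 0]

# inverse_transform maps codes back to the original labels.
print(encoder.inverse_transform([1, 3]).tolist())  # ['ocean', 'tcs']
```

Because the numbering is purely alphabetical, the integers should not be read as a ranking of the companies.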


**Manual encoding with a custom mapping**

In [141]:
df = pd.DataFrame({'company': ['ocean', 'tcs', 'wipro', 'royalenfield', 'mahindra']})

In [143]:
encode = {"ocean":1, "tcs": 2, "royalenfield": 3,"mahindra": 4,"wipro":5}

df["company"] = df["company"].map(encode)

In [145]:
df

Unnamed: 0,company
0,1
1,2
2,5
3,3
4,4
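
The integers in the mapping above are assigned by hand. Target encoding proper works differently: each category is replaced by a statistic of the target variable, typically the mean, computed over the rows in that category. The following is a minimal sketch under that assumption; the `hired` target column and its values are invented purely for illustration.

```python
import pandas as pd

# Hypothetical data: 'hired' is an invented binary target for illustration.
data = pd.DataFrame({
    "company": ["ocean", "tcs", "ocean", "tcs", "wipro"],
    "hired":   [1, 0, 1, 1, 0],
})

# Mean of the target for each category.
target_means = data.groupby("company")["hired"].mean()

# Replace each category with its target mean.
data["company_encoded"] = data["company"].map(target_means)

print(data["company_encoded"].tolist())  # [1.0, 0.5, 1.0, 0.5, 0.0]
```

In practice the means are usually computed on the training split only, since computing them on the full data leaks target information into the features.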


## One hot encoding

In [None]:
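df = pd.DataFrame({'color': ['red', 'green', 'blue', 'green', 'red']})  # df held 'company' at this point, so rebuild the color column
one_hot_encoded_df = pd.get_dummies(df, columns=['color'])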
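
To show what the call above produces, here is a small self-contained sketch on the same color values. The names `colors` and `one_hot` are used only in this example, and `dtype=int` is passed so the result contains 0/1 integers regardless of the default dtype in the installed pandas version.

```python
import pandas as pd

colors = pd.DataFrame({"color": ["red", "green", "blue", "green", "red"]})

# One new column per category, named <original column>_<category>.
one_hot = pd.get_dummies(colors, columns=["color"], dtype=int)

print(one_hot.columns.tolist())  # ['color_blue', 'color_green', 'color_red']
print(one_hot.iloc[0].tolist())  # [0, 0, 1]  (first row is 'red')
```

Each row contains exactly one 1, in the column matching its original category, and the original `color` column is dropped from the result.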