### Label Encoding

#### Label encoding converts categorical data into numerical data.

#### It assigns a unique integer to each category in the variable. The integers are usually assigned in alphabetical order or based on the frequency of the categories. For example, a categorical variable "color" with three possible values (red, green, blue) could be label-encoded as follows (scikit-learn's `LabelEncoder`, used below, sorts the categories and numbers them from 0, so its output differs from this illustration):

 - Red: 1
 - Green: 2
 - Blue: 3
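
The assignment step can be sketched in plain Python. This is only an illustration of the idea, assuming labels are handed out in sorted (alphabetical) order starting from 0; the helper name `fit_label_mapping` is hypothetical and is not part of any library.

```python
def fit_label_mapping(values):
    """Map each distinct category to an integer, in sorted order (hypothetical helper)."""
    return {category: index for index, category in enumerate(sorted(set(values)))}


colors = ['red', 'blue', 'green', 'green', 'red', 'blue']

# Learn the category -> integer mapping from the data
mapping = fit_label_mapping(colors)

# Replace every value with its integer label
encoded = [mapping[c] for c in colors]

print(mapping)  # {'blue': 0, 'green': 1, 'red': 2}
print(encoded)  # [2, 0, 1, 1, 2, 0]
```

Under the sorted-order assumption, blue comes first and red last, regardless of the order in which the values appear in the data.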

In [6]:
from sklearn.preprocessing import LabelEncoder
import pandas as pd

lbl_encoder = LabelEncoder()

In [4]:
df = pd.DataFrame({
    'color': ['red', 'blue', 'green', 'green', 'red', 'blue']
})

df.head()

|   | color |
|---|-------|
| 0 | red   |
| 1 | blue  |
| 2 | green |
| 3 | green |
| 4 | red   |


In [7]:
# LabelEncoder expects a 1-D array, so pass the Series rather than a one-column DataFrame
lbl_encoder.fit_transform(df['color'])

array([2, 0, 1, 1, 2, 0])

In [8]:
lbl_encoder.transform(['red'])

array([2])

In [9]:
lbl_encoder.transform(['blue'])

array([0])

In [10]:
lbl_encoder.transform(['green'])

array([1])

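### Disadvantage: the model may treat the labels as ordered, for example reading Green as greater than Red, and Blue as greater than both Green and Red.
#### For colors this order is meaningless. When the categories do have a natural order, ordinal encoding makes that order explicit.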
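
The problem can be made concrete with a small plain-Python sketch. It assumes the illustrative mapping Red: 1, Green: 2, Blue: 3 from the list above; the variable names are hypothetical.

```python
# Illustrative label mapping for a nominal variable (colors have no real order)
label = {'red': 1, 'green': 2, 'blue': 3}

# Numeric comparisons now "work", even though they mean nothing for colors
blue_is_largest = label['blue'] > label['green'] > label['red']

# Arithmetic also "works": the average of red and blue equals green
midpoint = (label['red'] + label['blue']) / 2
midpoint_is_green = midpoint == label['green']

print(blue_is_largest)    # True
print(midpoint_is_green)  # True
```

Any model that relies on magnitudes or distances between feature values can pick up on these artificial relationships.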

### Ordinal Encoding
#### Ordinal encoding is used for categorical data that has an intrinsic order or ranking. Each category is assigned a numerical value based on its position in that order. For example, a categorical variable "education level" with four possible values (high school, college, graduate, post-graduate) can be represented using ordinal encoding as follows:

- High School: 1
- College: 2
- Graduate: 3
- Post-Graduate: 4
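
As a rough plain-Python sketch of the same idea: the ranking is written out by hand, and each value is replaced by its position in that ranking. The list `education_order` and the sample values are assumptions made for this illustration.

```python
# The order is supplied explicitly, lowest rank first
education_order = ['high school', 'college', 'graduate', 'post-graduate']

# Position in the list becomes the code, starting from 1 to match the list above
rank = {level: position for position, level in enumerate(education_order, start=1)}

samples = ['college', 'high school', 'post-graduate']
encoded = [rank[s] for s in samples]

print(rank)     # {'high school': 1, 'college': 2, 'graduate': 3, 'post-graduate': 4}
print(encoded)  # [2, 1, 4]
```

Unlike label encoding, comparisons between these codes are meaningful: a larger code means a higher education level.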

In [11]:
from sklearn.preprocessing import OrdinalEncoder

In [12]:
df = pd.DataFrame({
    'size': ['small', 'medium', 'large', 'medium', 'small', 'large']})

In [16]:
df.head(6)

|   | size   |
|---|--------|
| 0 | small  |
| 1 | medium |
| 2 | large  |
| 3 | medium |
| 4 | small  |
| 5 | large  |


In [20]:
# list the categories in ascending order: small < medium < large
encoder = OrdinalEncoder(categories=[['small', 'medium', 'large']])

In [21]:
encoder.fit_transform(df[['size']])

array([[0.],
       [1.],
       [2.],
       [1.],
       [0.],
       [2.]])

In [23]:
# pass a DataFrame with the same column name that was used during fit
encoder.transform(pd.DataFrame({'size': ['small']}))

array([[0.]])
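
The explicit `categories` list matters because, when it is left at its default, `OrdinalEncoder` orders the categories it finds by sorting them, which for strings means alphabetically. The plain-Python sketch below imitates both orderings to show the difference; it is an illustration of the ordering logic only, not the library's implementation, and the variable names are hypothetical.

```python
sizes = ['small', 'medium', 'large', 'medium', 'small', 'large']

# Sorted (alphabetical) ordering: 'large' would receive the smallest code
alphabetical = sorted(set(sizes))
alphabetical_codes = {c: i for i, c in enumerate(alphabetical)}

# Explicit ordering: codes follow the real size ranking
explicit = ['small', 'medium', 'large']
explicit_codes = {c: i for i, c in enumerate(explicit)}

print(alphabetical_codes)  # {'large': 0, 'medium': 1, 'small': 2}
print(explicit_codes)      # {'small': 0, 'medium': 1, 'large': 2}

# Encoding the column with the explicit order, as floats
encoded = [float(explicit_codes[s]) for s in sizes]
print(encoded)  # [0.0, 1.0, 2.0, 1.0, 0.0, 2.0]
```

With the sorted ordering the codes would run large < medium < small, which reverses the intended ranking.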