
## Ordinal Encoding vs. Label Encoding

Both Ordinal Encoding and Label Encoding are techniques used to convert categorical data into numerical data, which is necessary for many machine learning algorithms. However, they differ in how they handle the relationship between categories.

**Label Encoding**

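*   Assigns a unique integer to each category in a column. scikit-learn's `LabelEncoder` sorts the categories and numbers them in that sorted order (alphabetical for strings).
*   Does **not** encode any meaningful order or ranking between the categories; the integers are arbitrary identifiers.
*   Can be problematic as an input feature, because many algorithms treat the integers as quantities (e.g., a category encoded as '3' is not "more" than a category encoded as '1').
*   In scikit-learn, `LabelEncoder` is intended for the target column (`y`), as with `purchased` in this notebook. For unordered input features (e.g., colors like red, blue, green), one-hot encoding is usually the safer choice.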
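
The point about arbitrary integers can be seen in a small sketch. The `colors` list and the name `le_demo` are made up for illustration and are not part of the customer dataset.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy labels, not taken from customer.csv
colors = ["red", "blue", "green", "blue"]

le_demo = LabelEncoder()
codes = le_demo.fit_transform(colors)

# classes_ is sorted, so the integer codes follow alphabetical order,
# not any ranking of the colors
print(le_demo.classes_)  # ['blue' 'green' 'red']
print(codes)             # [2 0 1 0]
```

Here `red` gets the largest code only because it sorts last alphabetically.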

**Ordinal Encoding**

*   Assigns an integer to each category based on a specific order or ranking.
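*   Assumes that there is a meaningful order between the categories. In scikit-learn's `OrdinalEncoder`, that order is passed through the `categories` parameter; if it is omitted, the categories are sorted (alphabetically for strings), which rarely matches the intended ranking.
*   Suitable for input features where the order **does** matter (e.g., education levels like high school, bachelor's, master's, or sizes like small, medium, large).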
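
As a rough illustration of why the order should be stated explicitly, the sketch below encodes a made-up `size` column twice: once with the encoder's default (sorted) categories and once with the intended ranking. The `sizes` frame is hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy column, not taken from customer.csv
sizes = pd.DataFrame({"size": ["small", "large", "medium", "small"]})

# Default: categories are inferred and sorted, giving large < medium < small
default_codes = OrdinalEncoder().fit_transform(sizes)

# Explicit: pass the intended ranking small < medium < large
ordered_codes = OrdinalEncoder(
    categories=[["small", "medium", "large"]]
).fit_transform(sizes)

print(default_codes.ravel())  # [2. 0. 1. 2.]
print(ordered_codes.ravel())  # [0. 2. 1. 0.]
```

Only the second encoding preserves the ranking the feature is supposed to carry.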

**In summary:**

*   Use **Label Encoding** when there is no order among categories, typically for the target column.
*   Use **Ordinal Encoding** when there is a clear order among categories, and state that order explicitly.

In [None]:
import numpy as np
import pandas as pd

In [None]:
df = pd.read_csv("/content/customer.csv")

In [None]:
df.sample(4)

Unnamed: 0,age,gender,review,education,purchased
14,15,Male,Poor,PG,Yes
7,60,Female,Poor,School,Yes
1,68,Female,Poor,UG,No
38,45,Female,Good,School,No


In [None]:
# Keep only review, education, and purchased (drop age and gender)
df = df.iloc[:, 2:]

In [None]:
df.sample(4)

Unnamed: 0,review,education,purchased
42,Good,PG,Yes
27,Poor,PG,No
13,Average,School,No
4,Average,UG,No


In [None]:
from sklearn.model_selection import train_test_split

In [None]:
# Features: review and education; target: purchased
x_train, x_test, y_train, y_test = train_test_split(df.iloc[:, 0:2], df.iloc[:, -1], test_size=0.2)
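
The split above is random, so the sampled rows and encoded arrays change from run to run. A minimal sketch of a reproducible variant follows, assuming a small made-up frame named `toy` with the same three columns; `random_state=42` and `stratify=y` are choices made for this example, not settings used in the notebook.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the customer data
toy = pd.DataFrame({
    "review": ["Poor", "Good", "Average", "Good", "Poor",
               "Average", "Good", "Poor", "Average", "Good"],
    "education": ["UG", "PG", "School", "UG", "PG",
                  "School", "UG", "PG", "School", "UG"],
    "purchased": ["Yes", "No", "Yes", "No", "Yes",
                  "No", "Yes", "No", "Yes", "No"],
})
X = toy[["review", "education"]]
y = toy["purchased"]

# random_state fixes the shuffle; stratify keeps the Yes/No ratio in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_tr.shape, X_te.shape)  # (8, 2) (2, 2)
```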

In [None]:
from sklearn.preprocessing import OrdinalEncoder

In [None]:
x_train.sample(10)

Unnamed: 0,review,education
41,Good,PG
23,Good,School
43,Poor,PG
38,Good,School
5,Average,School
0,Average,School
20,Average,School
10,Good,UG
15,Poor,UG
26,Poor,PG


In [None]:
# List each column's categories from lowest to highest rank
oe = OrdinalEncoder(categories=[['Poor', 'Average', 'Good'], ['School', 'UG', 'PG']])

In [None]:
# Fit on the training split only
oe.fit(x_train)

In [None]:
# Apply the same mapping to both splits
x_train = oe.transform(x_train)
x_test = oe.transform(x_test)
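
With its default settings, `OrdinalEncoder` raises an error at transform time if it meets a category that was not listed or seen during fitting. The sketch below shows one way to tolerate such values using the `handle_unknown` and `unknown_value` parameters; the `'Excellent'` review and the frames `train` and `new` are invented for this example.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical data: 'Excellent' never appears during fitting
train = pd.DataFrame({"review": ["Poor", "Average", "Good"]})
new = pd.DataFrame({"review": ["Good", "Excellent"]})

order = [["Poor", "Average", "Good"]]

# Default behaviour: an unknown category raises ValueError
strict = OrdinalEncoder(categories=order).fit(train)
try:
    strict.transform(new)
except ValueError as err:
    print("strict encoder failed:", err)

# Tolerant behaviour: unknown categories are mapped to -1
tolerant = OrdinalEncoder(
    categories=order,
    handle_unknown="use_encoded_value",
    unknown_value=-1,
).fit(train)
encoded_new = tolerant.transform(new)

print(encoded_new.ravel())  # [ 2. -1.]
```

Whether -1 is a sensible placeholder depends on the downstream model, so treat it as one option rather than a recommendation.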

In [None]:
x_train

array([[2., 1.],
       [2., 2.],
       [2., 2.],
       [0., 2.],
       [1., 0.],
       [0., 2.],
       [2., 2.],
       [1., 1.],
       [0., 1.],
       [2., 0.],
       [2., 2.],
       [0., 2.],
       [0., 0.],
       [0., 2.],
       [0., 2.],
       [0., 0.],
       [2., 1.],
       [1., 2.],
       [1., 0.],
       [1., 1.],
       [2., 1.],
       [0., 1.],
       [0., 1.],
       [2., 1.],
       [0., 2.],
       [2., 0.],
       [2., 0.],
       [0., 0.],
       [0., 2.],
       [2., 0.],
       [1., 0.],
       [0., 2.],
       [1., 2.],
       [0., 2.],
       [2., 2.],
       [1., 0.],
       [0., 0.],
       [2., 1.],
       [1., 1.],
       [1., 2.]])

In [None]:
oe.categories_

[array(['Poor', 'Average', 'Good'], dtype=object),
 array(['School', 'UG', 'PG'], dtype=object)]
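
Because `categories_` stores the order used for each column, the encoded numbers can be mapped back to the original strings with `inverse_transform`. A small sketch, using a two-row frame invented for illustration:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical rows with the same columns as x_train
frame = pd.DataFrame({"review": ["Good", "Poor"], "education": ["UG", "PG"]})

enc = OrdinalEncoder(
    categories=[["Poor", "Average", "Good"], ["School", "UG", "PG"]]
)
encoded = enc.fit_transform(frame)

# The position of each value in categories_ is its integer code
decoded = enc.inverse_transform(encoded)

print(encoded)  # [[2. 1.], [0. 2.]]
print(decoded)  # [['Good' 'UG'], ['Poor' 'PG']]
```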

In [None]:
from sklearn.preprocessing import LabelEncoder

In [None]:
le = LabelEncoder()

In [None]:
# Learn the target classes from the training labels
le.fit(y_train)

In [None]:
le.classes_

array(['No', 'Yes'], dtype=object)

In [None]:
# Encode both target splits with the mapping learned from y_train
y_train = le.transform(y_train)
y_test = le.transform(y_test)

In [None]:
y_train

array([1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0,
       0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1])
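
Once a model is trained on these encoded labels, its predictions come back as 0s and 1s. A minimal sketch of decoding them, assuming a made-up `predicted` array in place of real model output:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Fit on toy labels with the same two classes as the purchased column
target_enc = LabelEncoder().fit(["No", "Yes", "Yes", "No"])

# Hypothetical integer predictions from some classifier
predicted = np.array([1, 0, 1])

# Map the integer codes back to the original class names
labels = target_enc.inverse_transform(predicted)

print(labels)  # ['Yes' 'No' 'Yes']
```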