# Ordinal Encoding
Most machine learning algorithms require numeric input, so if your data contains categorical variables, you must encode them as numbers before you can fit and evaluate a model.

The two most popular techniques are ordinal encoding and one-hot encoding.

- **Nominal variable (categorical):** a finite set of discrete values with no relationship or ordering between the values.
- **Ordinal variable:** a finite set of discrete values with a ranked ordering between the values.
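A minimal sketch of the difference, assuming pandas is available: an ordinal variable can be stored as an ordered `pd.Categorical`, which supports ranking operations, while a nominal variable stays unordered. The values come from the `gender` and `review` columns used below.

```python
import pandas as pd

# Nominal: plain categorical with no ordering between the values
gender = pd.Categorical(["Female", "Male", "Female"])

# Ordinal: explicit ranking Poor < Average < Good
review = pd.Categorical(
    ["Average", "Poor", "Good"],
    categories=["Poor", "Average", "Good"],
    ordered=True,
)

print(gender.ordered)    # False
print(review.ordered)    # True
print(review.codes)      # [1 0 2]  (position in the ranked category list)
print(review < "Good")   # [ True  True False]
print(review.max())      # Good
```

Calling `gender.max()` would raise a `TypeError`, because an unordered categorical has no notion of "largest".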

![image.png](attachment:image.png)

If a categorical column is an input feature, we use ordinal encoding; if the output (target) column is categorical, we use label encoding.
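As a rough illustration of the split between the two encoders (toy rows, not the real dataset): `OrdinalEncoder` expects a 2-D table of features and takes one category list per column, while `LabelEncoder` expects a 1-D target and orders its classes alphabetically.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Hypothetical toy rows shaped like the customer data
X = pd.DataFrame({"review": ["Good", "Poor", "Average"],
                  "education": ["UG", "PG", "School"]})
y = pd.Series(["Yes", "No", "No"])

# OrdinalEncoder: 2-D input, one ranked category list per column
oe = OrdinalEncoder(categories=[["Poor", "Average", "Good"],
                                ["School", "UG", "PG"]])
X_enc = oe.fit_transform(X)   # float array, shape (3, 2)

# LabelEncoder: 1-D input, classes sorted alphabetically (No=0, Yes=1)
le = LabelEncoder()
y_enc = le.fit_transform(y)   # int array, shape (3,)

print(X_enc)   # [[2. 1.] [0. 2.] [1. 0.]]
print(y_enc)   # [1 0 0]
```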

In [1]:
import pandas as pd

In [2]:
url="https://raw.githubusercontent.com/campusx-official/100-days-of-machine-learning/main/day26-ordinal-encoding/customer.csv"
df=pd.read_csv(url)
df

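|   | age | gender | review | education | purchased |
|---|-----|--------|--------|-----------|-----------|
| 0 | 30 | Female | Average | School | No |
| 1 | 68 | Female | Poor | UG | No |
| 2 | 70 | Female | Good | PG | No |
| 3 | 72 | Female | Good | PG | No |
| 4 | 16 | Female | Average | UG | No |
| 5 | 31 | Female | Average | School | Yes |
| 6 | 18 | Male | Good | School | No |
| 7 | 60 | Female | Poor | School | Yes |
| 8 | 65 | Female | Average | UG | No |
| 9 | 74 | Male | Good | UG | Yes |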


In [3]:
df.shape

(50, 5)

In this dataset:
- `gender` and `purchased` are nominal categorical columns.
- `review` and `education` are ordinal categorical columns.

`purchased` is the output (target) column, so we apply label encoding to it instead of ordinal encoding.

In [4]:
# keep only review, education and purchased (drop age and gender)
df = df.iloc[:, 2:]

In [5]:
df

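|   | review | education | purchased |
|---|--------|-----------|-----------|
| 0 | Average | School | No |
| 1 | Poor | UG | No |
| 2 | Good | PG | No |
| 3 | Good | PG | No |
| 4 | Average | UG | No |
| 5 | Average | School | Yes |
| 6 | Good | School | No |
| 7 | Poor | School | Yes |
| 8 | Average | UG | No |
| 9 | Good | UG | Yes |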
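Selecting columns by name instead of by position gives the same result as `df.iloc[:, 2:]` but keeps working if the CSV's column order ever changes. A small sketch using the first two rows shown earlier:

```python
import pandas as pd

# First two rows of the original customer data
df = pd.DataFrame({"age": [30, 68], "gender": ["Female", "Female"],
                   "review": ["Average", "Poor"], "education": ["School", "UG"],
                   "purchased": ["No", "No"]})

# Same columns as df.iloc[:, 2:], chosen explicitly by name
subset = df[["review", "education", "purchased"]]
print(subset.columns.tolist())   # ['review', 'education', 'purchased']
```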


In [6]:
from sklearn.model_selection import train_test_split
# features: review, education; target: purchased; hold out 20% for testing
x_train, x_test, y_train, y_test = train_test_split(df.iloc[:, 0:2], df.iloc[:, 2], test_size=0.2)
x_train

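|    | review | education |
|----|--------|-----------|
| 39 | Poor | PG |
| 21 | Average | PG |
| 42 | Good | PG |
| 26 | Poor | PG |
| 38 | Good | School |
| 27 | Poor | PG |
| 29 | Average | UG |
| 48 | Good | UG |
| 4 | Average | UG |
| 28 | Poor | School |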
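The split above has no `random_state`, so rerunning the cell produces a different `x_train`. A sketch of a reproducible split on the ten rows shown earlier; `random_state=42` is an arbitrary choice, and `stratify=y` is an optional addition that keeps the Yes/No ratio similar in both parts:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# The ten rows displayed after dropping age and gender
df = pd.DataFrame({
    "review": ["Average", "Poor", "Good", "Good", "Average",
               "Average", "Good", "Poor", "Average", "Good"],
    "education": ["School", "UG", "PG", "PG", "UG",
                  "School", "School", "School", "UG", "UG"],
    "purchased": ["No", "No", "No", "No", "No",
                  "Yes", "No", "Yes", "No", "Yes"],
})

X = df[["review", "education"]]
y = df["purchased"]

# Fixed seed for reproducibility; stratify keeps class proportions
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

print(x_train.shape, x_test.shape)   # (8, 2) (2, 2)
```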


In [7]:
from sklearn.preprocessing import OrdinalEncoder

In [8]:
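# one category list per column, lowest rank first:
# review: Poor < Average < Good, education: School < UG < PG
oe = OrdinalEncoder(categories=[['Poor', 'Average', 'Good'], ['School', 'UG', 'PG']])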
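Passing `categories` explicitly matters. Without it, `OrdinalEncoder` sorts the values it sees during `fit` (alphabetically for strings), which breaks the intended ranking. A minimal comparison on the `review` values:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

X = pd.DataFrame({"review": ["Poor", "Average", "Good"]})

# Default: categories inferred and sorted -> Average=0, Good=1, Poor=2
auto = OrdinalEncoder().fit(X)
print(auto.categories_[0])           # ['Average' 'Good' 'Poor']
print(auto.transform(X).ravel())     # [2. 0. 1.]  (Poor ranked highest: wrong)

# Explicit list: codes follow the real ranking
ranked = OrdinalEncoder(categories=[["Poor", "Average", "Good"]]).fit(X)
print(ranked.transform(X).ravel())   # [0. 1. 2.]
```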

In [9]:
oe.fit(x_train)

In [10]:
# apply the mapping learned by oe.fit(x_train); x_test is transformed in a later cell
x_train = oe.transform(x_train)

In [11]:
x_train

array([[0., 2.],
       [1., 2.],
       [2., 2.],
       [0., 2.],
       [2., 0.],
       [0., 2.],
       [1., 1.],
       [2., 1.],
       [1., 1.],
       [0., 0.],
       [1., 1.],
       [1., 1.],
       [0., 1.],
       [1., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 2.],
       [2., 1.],
       [0., 2.],
       [2., 1.],
       [2., 2.],
       [1., 0.],
       [0., 0.],
       [1., 1.],
       [0., 2.],
       [2., 2.],
       [2., 2.],
       [1., 0.],
       [2., 2.],
       [2., 0.],
       [2., 1.],
       [2., 0.],
       [0., 1.],
       [0., 1.],
       [1., 1.],
       [2., 2.],
       [2., 0.],
       [0., 2.],
       [1., 2.]])
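To sanity-check an encoded array, `inverse_transform` maps the codes back to the original labels. A small sketch with the same category lists (the two rows fitted here are illustrative, not the notebook's full training set):

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

oe = OrdinalEncoder(categories=[["Poor", "Average", "Good"],
                                ["School", "UG", "PG"]])
oe.fit([["Poor", "School"], ["Good", "PG"]])

# Two rows in the same format as the x_train output above
encoded = np.array([[0.0, 2.0], [2.0, 0.0]])
decoded = oe.inverse_transform(encoded)
print(decoded)
# [['Poor' 'PG']
#  ['Good' 'School']]
```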

In [12]:
x_test=oe.transform(x_test)

In [13]:
x_test

array([[1., 0.],
       [0., 2.],
       [2., 0.],
       [2., 0.],
       [1., 2.],
       [2., 1.],
       [0., 2.],
       [0., 1.],
       [2., 1.],
       [1., 0.]])
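Because the encoder is fitted on the training data only, a value in `x_test` that is outside the category lists makes `transform` raise an error. A sketch of the alternative, assuming scikit-learn 0.24 or later (where `handle_unknown="use_encoded_value"` was added); the value `"Excellent"` and the sentinel `-1` are hypothetical choices for illustration:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

cats = [["Poor", "Average", "Good"], ["School", "UG", "PG"]]
x_train = pd.DataFrame({"review": ["Poor", "Good"], "education": ["UG", "PG"]})
# "Excellent" is a hypothetical value missing from the category list
x_test = pd.DataFrame({"review": ["Excellent"], "education": ["School"]})

# Default handle_unknown="error": an unseen value raises ValueError
strict = OrdinalEncoder(categories=cats).fit(x_train)
try:
    strict.transform(x_test)
except ValueError as err:
    print("ValueError:", err)

# Map unseen values to a sentinel code instead
lenient = OrdinalEncoder(categories=cats,
                         handle_unknown="use_encoded_value",
                         unknown_value=-1).fit(x_train)
print(lenient.transform(x_test))   # [[-1.  0.]]
```

The sentinel must not collide with a real code, which is why a negative value is a common choice.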

Use label encoding on the output column.

As the scikit-learn documentation for `LabelEncoder` puts it:

> Encode target labels with value between 0 and n_classes-1.
> This transformer should be used to encode target values, i.e. `y`, and not the input `X`.
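A minimal sketch of the `LabelEncoder` round trip on toy labels: classes are sorted alphabetically, so `No` becomes 0 and `Yes` becomes 1, and `inverse_transform` recovers the original strings (useful for turning model predictions back into labels).

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
y_train = ["No", "Yes", "No", "Yes"]   # toy labels
y_enc = le.fit_transform(y_train)

print(le.classes_)                     # ['No' 'Yes']
print(y_enc)                           # [0 1 0 1]
print(le.inverse_transform([1, 0]))    # ['Yes' 'No']
```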

In [14]:
from sklearn.preprocessing import LabelEncoder

In [15]:
le=LabelEncoder()

In [16]:
le.fit(y_train)

In [17]:
le.classes_

array(['No', 'Yes'], dtype=object)

In [18]:
# le was fitted on y_train only; reuse the same mapping for y_test
y_train = le.transform(y_train)
y_test = le.transform(y_test)

In [19]:
y_train

array([0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1,
       0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1])