# Ordinal Encoder

The ordinal encoder transforms each categorical feature into a single feature of integers between `0` and `(number of categories - 1)`.

An ordinal variable takes a finite set of discrete values that have a natural rank.

E.g.: 1st, 2nd, 3rd

The order is fixed: we cannot put the 3rd rank in the 4th place or the 1st rank in the 3rd place.

E.g.: 

**Original Values**

Here, the categories are ordered: `PhD` is the highest qualification and `Bachelors` is the lowest.

==> so the encoding has to preserve this order
```
Variable: Education 
Categories: PhD, Masters, Bachelors
```

**After Applying Ordinal Encoder**

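After applying the ordinal encoder, `PhD` gets the highest rank, 1, and `Bachelors` gets the lowest rank, 3. (scikit-learn's `OrdinalEncoder` numbers categories from 0, so it would store the same order as 0, 1, 2.)
```
Variable: Education
Categories: 1, 2, 3
```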
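As a minimal sketch of the Education example in scikit-learn, the snippet below passes the rank order explicitly through the `categories` parameter. The toy DataFrame and the names `toy`, `education_order`, and `edu_encoder` are hypothetical and only serve the illustration. Because `OrdinalEncoder` counts from 0, the three categories come out as 0, 1, 2 rather than 1, 2, 3.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data, made up for this illustration
toy = pd.DataFrame({"Education": ["Masters", "PhD", "Bachelors", "PhD"]})

# Explicit order: highest qualification first, matching the ranking above
education_order = ["PhD", "Masters", "Bachelors"]
edu_encoder = OrdinalEncoder(categories=[education_order])

# fit_transform expects a 2D input and returns a 2D float array
codes = edu_encoder.fit_transform(toy[["Education"]])
toy["Education_code"] = codes.ravel()
print(toy)
```

The position of each category in `education_order` becomes its code, so reordering that list is all it takes to change the encoding.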






In [1]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

In [2]:
dataset_url = "https://raw.githubusercontent.com/shadhini/ML/main/feature_engineering/feature_encoding/res/train.csv"
df = pd.read_csv(dataset_url)

In [3]:
df.head()  # top 5 rows
# Fill missing embarkation ports with "S"; assign the result back rather than using inplace on a column
df["Embarked"] = df["Embarked"].fillna("S")
df["Embarked"].unique()

array(['S', 'C', 'Q'], dtype=object)


## Apply ordinal encoding with automatic ordering


In [4]:
oe = OrdinalEncoder()
print(oe)

OrdinalEncoder()


In [5]:
df[["Embarked"]] = oe.fit_transform(df[["Embarked"]])
df.head()

| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | 2.0 |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | 0.0 |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | 2.0 |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | 2.0 |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | 2.0 |
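The codes in this output (`S` becomes 2.0, `C` becomes 0.0) follow from how `OrdinalEncoder` chooses an order when none is given: it sorts the unique values of each feature, so the order is alphabetical here and carries no real meaning. A small sketch of how to inspect the learned order through the fitted `categories_` attribute is shown below. The four-row `sample` frame and the names `auto_encoder` and `mapping` are hypothetical stand-ins for the Titanic data.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical stand-in for the Embarked column
sample = pd.DataFrame({"Embarked": ["S", "C", "Q", "S"]})

auto_encoder = OrdinalEncoder()  # categories="auto" by default
auto_codes = auto_encoder.fit_transform(sample[["Embarked"]])

# categories_ holds one array per encoded column, in code order
learned = [str(c) for c in auto_encoder.categories_[0]]
mapping = {category: code for code, category in enumerate(learned)}
print(mapping)
```

Checking `categories_` after fitting is a quick way to confirm which integer stands for which category before the encoded column is used in a model.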


## Apply ordinal encoding with custom ordering of categories

Passengers boarded the Titanic in the order of Southampton, followed by Cherbourg, and finally Queenstown.

Thus, let's rank the `Embarked` variable as follows and apply ordinal encoding.

1. Southampton (S)
2. Cherbourg (C)
3. Queenstown (Q)

In [6]:
embarked = ['S', 'C', 'Q']
oe = OrdinalEncoder(categories=[embarked])

In [7]:
df = pd.read_csv(dataset_url)
# Fill missing embarkation ports with "S"; assign the result back rather than using inplace on a column
df["Embarked"] = df["Embarked"].fillna("S")
df[["Embarked"]] = oe.fit_transform(df[["Embarked"]])
df.head()

| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | 0.0 |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | 1.0 |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | 0.0 |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | 0.0 |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | 0.0 |
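With the custom order, `S` is now 0.0 and `C` is 1.0, matching the boarding sequence. One way to double-check such a mapping is to decode the integers again with `inverse_transform`. The sketch below does this on a hypothetical four-row frame; the names `ports`, `custom_encoder`, and `restored` are illustrative only.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical stand-in for the Embarked column
ports = pd.DataFrame({"Embarked": ["S", "C", "Q", "S"]})

# Boarding order: Southampton, Cherbourg, Queenstown
custom_encoder = OrdinalEncoder(categories=[["S", "C", "Q"]])
custom_codes = custom_encoder.fit_transform(ports[["Embarked"]])

# Map the integer codes back to the original labels
restored = custom_encoder.inverse_transform(custom_codes)
print(custom_codes.ravel(), restored.ravel())
```

If the round trip returns the original labels, the encoder is using the order that was intended.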
