# Ordinal Encoding

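Ordinal encoding is a technique for converting categorical data into numerical values. Unlike label encoding, which assigns integers in an arbitrary order (scikit-learn's `LabelEncoder`, for instance, sorts labels alphabetically), ordinal encoding assigns the integers according to the order or ranking of the categories.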

#### Concept
In ordinal encoding, categories are mapped to integers based on their inherent order. For example, a column representing education levels like `["High School", "Bachelor's", "Master's", "PhD"]` may be encoded as 0, 1, 2, 3, reflecting increasing levels of education.
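
As a minimal sketch (the sample rows are made up for illustration), the education example can be encoded with scikit-learn's `OrdinalEncoder` by passing the order explicitly:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Education levels listed from lowest to highest; the order is supplied explicitly
education_order = ["High School", "Bachelor's", "Master's", "PhD"]

# Hypothetical sample rows
df = pd.DataFrame({"Education": ["Master's", "High School", "PhD", "Bachelor's", "Master's"]})

encoder = OrdinalEncoder(categories=[education_order])
# fit_transform returns a 2-D float array of shape (n_samples, 1); keep its single column
df["Education_Code"] = encoder.fit_transform(df[["Education"]])[:, 0]
print(df)
```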

#### Usage
Ordinal encoding is useful when the categorical data has a meaningful order but the distance between the categories is not necessarily equal. This is common in cases where there is a ranking or hierarchy, such as satisfaction levels (`Low`, `Medium`, `High`) or experience levels (`Junior`, `Mid`, `Senior`).
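
pandas can also express this kind of ranking directly through an ordered categorical dtype. This is an alternative to the two methods used later in this notebook, shown here as a sketch with hypothetical survey responses:

```python
import pandas as pd

# Declare the ranking explicitly: Low < Medium < High
satisfaction_type = pd.CategoricalDtype(categories=["Low", "Medium", "High"], ordered=True)

# Hypothetical responses
responses = pd.Series(["High", "Low", "Medium", "High", "Low"], dtype=satisfaction_type)

# .cat.codes gives each value's position in the declared order (Low=0, Medium=1, High=2)
codes = responses.cat.codes
print(codes.tolist())  # [2, 0, 1, 2, 0]

# Because the dtype is ordered, comparisons follow the ranking
print((responses > "Low").tolist())  # [True, False, True, True, False]
```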

#### Considerations
- Ordinal encoding preserves the order of the categories, making it suitable for ordinal data.
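- It should not be used for nominal (non-ordered) data such as colors or city names: the model would treat one arbitrary category as "greater" than another, a false hierarchy that can degrade model performance. One-hot encoding is the usual choice for nominal data.
- Many models (for example, linear models and distance-based methods) treat the codes as evenly spaced numbers, so the encoding implies that the gap between `Low` and `Medium` equals the gap between `Medium` and `High`. This can be misleading when the real differences between categories are not uniform. Tree-based models are less affected because they only use the order of the values when choosing split thresholds.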
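
To make the last two points concrete, here is a small sketch with a hypothetical three-row frame. It shows that `OrdinalEncoder` without a `categories` argument falls back to alphabetical order, which silently scrambles the `Low`/`Medium`/`High` ranking, and that even the correct encoding spaces the levels evenly:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical data: one row per satisfaction level
levels = pd.DataFrame({"Satisfaction": ["Low", "Medium", "High"]})

# Without `categories`, OrdinalEncoder sorts the labels alphabetically
auto_encoder = OrdinalEncoder()
auto_codes = auto_encoder.fit_transform(levels)[:, 0]
print(auto_encoder.categories_[0].tolist())  # ['High', 'Low', 'Medium']
print(auto_codes.tolist())                   # [1.0, 2.0, 0.0] -> "High" gets the smallest code

# Supplying the order explicitly restores the intended ranking
explicit_encoder = OrdinalEncoder(categories=[["Low", "Medium", "High"]])
explicit_codes = explicit_encoder.fit_transform(levels)[:, 0]
print(explicit_codes.tolist())               # [0.0, 1.0, 2.0]

# The codes are evenly spaced: a model sees High - Medium == Medium - Low
print(explicit_codes[2] - explicit_codes[1] == explicit_codes[1] - explicit_codes[0])  # True
```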


```python
import pandas as pd
```

### 1. Using Scikit-Learn for Ordinal Encoding

This example demonstrates how to apply ordinal encoding to a categorical column using Scikit-Learn's `OrdinalEncoder`.

1. **Data Preparation**  
   - A `DataFrame` named `data` is created with a `Size` column, containing string values representing different sizes like 's', 'm', 'l', 'xl', 'xxl', etc.

2. **Ordinal Encoder Setup**  
   - `ordinal_data = [['s', 'm', 'l', 'xl', 'xxl', 'xxxl']]`: Defines the order of categories explicitly. The list specifies the ranking or order of the sizes, where 's' is the smallest and 'xxxl' is the largest.
   - `OrdinalEncoder(categories=ordinal_data)`: The `OrdinalEncoder` is initialized with the specified categories. This ensures the encoder treats the categories in the given order during encoding.

3. **Fit and Transform**  
   - `arr = ordinal_encoder.fit_transform(data[['Size']])`: The `fit_transform` method is applied to the `Size` column of the `data` DataFrame, passed as a one-column DataFrame because the encoder expects 2-D input. It converts each size into its position in the predefined order.
   - The result, a 2-D NumPy array of floats with one column, is stored in the variable `arr`.

4. **Add Encoded Values to DataFrame**  
   - `data['Encoded_Size'] = arr`: The encoded values (numerical representation of sizes) are added to the original DataFrame as a new column named `Encoded_Size`.

5. **Result**  
   - The final DataFrame now contains both the original categorical `Size` column and the newly created `Encoded_Size` column, which holds the ordinally encoded numerical values.


```python
data = pd.DataFrame({'Size': ['s', 'm', 'l', 'xl', 'xxl', 'm', 's', 'xl', 'm', 'l', 'l', 'xxl', 'xxxl', 'm', 's', 'l']})

data.head()
```

|   | Size |
|---|------|
| 0 | s |
| 1 | m |
| 2 | l |
| 3 | xl |
| 4 | xxl |


```python
from sklearn.preprocessing import OrdinalEncoder

# Category order from smallest to largest; one inner list per encoded column
ordinal_data = [['s', 'm', 'l', 'xl', 'xxl', 'xxxl']]

ordinal_encoder = OrdinalEncoder(categories=ordinal_data)
arr = ordinal_encoder.fit_transform(data[['Size']])
arr
```

```
array([[0.],
       [1.],
       [2.],
       [3.],
       [4.],
       [1.],
       [0.],
       [3.],
       [1.],
       [2.],
       [2.],
       [4.],
       [5.],
       [1.],
       [0.],
       [2.]])
```

```python
data['Encoded_Size'] = arr
data
```

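|   | Size | Encoded_Size |
|---|------|--------------|
| 0 | s | 0.0 |
| 1 | m | 1.0 |
| 2 | l | 2.0 |
| 3 | xl | 3.0 |
| 4 | xxl | 4.0 |
| 5 | m | 1.0 |
| 6 | s | 0.0 |
| 7 | xl | 3.0 |
| 8 | m | 1.0 |
| 9 | l | 2.0 |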
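
The fitted encoder can also be inspected and reversed. The sketch below is separate from the cells above; the `handle_unknown`/`unknown_value` settings and the unseen size `'xs'` are additions for illustration (these parameters require scikit-learn 0.24 or later):

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

sizes = pd.DataFrame({"Size": ["s", "m", "l", "xl", "xxl", "xxxl"]})

# Sizes not listed in `categories` (e.g. 'xs') become -1 instead of raising an error
encoder = OrdinalEncoder(
    categories=[["s", "m", "l", "xl", "xxl", "xxxl"]],
    handle_unknown="use_encoded_value",
    unknown_value=-1,
)
encoder.fit(sizes)
print(encoder.categories_[0].tolist())  # ['s', 'm', 'l', 'xl', 'xxl', 'xxxl']

# 'xs' is a hypothetical size that was never seen during fitting
new_sizes = pd.DataFrame({"Size": ["xl", "xs", "m"]})
codes = encoder.transform(new_sizes)
print(codes[:, 0].tolist())  # [3.0, -1.0, 1.0]

# inverse_transform maps codes back to the original labels
print(encoder.inverse_transform([[0.0], [5.0]])[:, 0].tolist())  # ['s', 'xxxl']
```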


### 2. Ordinal Encoding using `map`

This example demonstrates how to apply ordinal encoding to a categorical column using the `map` method in Pandas.

1. **Data Preparation**  
   - The `data` DataFrame from the previous example is reused. Its `Size` column contains sizes such as 's', 'm', 'l', 'xl', 'xxl', and 'xxxl'.

2. **Ordinal Mapping Dictionary**  
   - `ordinal_data_map = {'s':0, 'm':1, 'l':2, 'xl':3, 'xxl':4, 'xxxl':5}`: A dictionary is created to define the ordinal mapping for the `Size` column. Each unique size is mapped to an integer value (e.g., 's' is mapped to 0, 'm' to 1, etc.).

3. **Applying Ordinal Encoding**  
   - `data["Size_Mapped"] = data['Size'].map(ordinal_data_map)`: The `map` method is applied to the `Size` column of the `data` DataFrame. It replaces each size with its corresponding integer from the `ordinal_data_map` dictionary. The result is stored in the new column `Size_Mapped`.

4. **Result**  
   - The `data` DataFrame now contains a new column, `Size_Mapped`, with the ordinally encoded values of the `Size` column.


```python
ordinal_data_map = {'s': 0, 'm': 1, 'l': 2, 'xl': 3, 'xxl': 4, 'xxxl': 5}
data["Size_Mapped"] = data['Size'].map(ordinal_data_map)
data
```

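|   | Size | Encoded_Size | Size_Mapped |
|---|------|--------------|-------------|
| 0 | s | 0.0 | 0 |
| 1 | m | 1.0 | 1 |
| 2 | l | 2.0 | 2 |
| 3 | xl | 3.0 | 3 |
| 4 | xxl | 4.0 | 4 |
| 5 | m | 1.0 | 1 |
| 6 | s | 0.0 | 0 |
| 7 | xl | 3.0 | 3 |
| 8 | m | 1.0 | 1 |
| 9 | l | 2.0 | 2 |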
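
One practical difference between `map` and `OrdinalEncoder`: `map` does not raise an error for labels missing from the dictionary, it silently returns `NaN`. A minimal sketch with hypothetical dirty input (`'XL'` in upper case and an unlisted `'xs'`):

```python
import pandas as pd

ordinal_data_map = {'s': 0, 'm': 1, 'l': 2, 'xl': 3, 'xxl': 4, 'xxxl': 5}

# Hypothetical input: 'XL' and 'xs' are not keys in the dictionary
sizes = pd.Series(['m', 'XL', 'xs', 'l'])
mapped = sizes.map(ordinal_data_map)
print(mapped.tolist())  # [1.0, nan, nan, 2.0]

# List the labels that failed to map so they can be cleaned or added to the dictionary
unmapped = sizes[mapped.isna()].unique().tolist()
print(unmapped)  # ['XL', 'xs']

# Lower-casing first recovers 'XL'; 'xs' is still unmapped and stays NaN
mapped_clean = sizes.str.lower().map(ordinal_data_map)
print(mapped_clean.tolist())  # [1.0, 3.0, nan, 2.0]
```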


### 3. Ordinal Encoding of `Property_Area` in the Loan Dataset

```python
loan_data = pd.read_csv('loan.csv')
loan_data.head()
```

|   | Loan_ID | Gender | Married | Dependents | Education | Self_Employed | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History | Property_Area | Loan_Status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LP001002 | Male | No | 0 | Graduate | No | 5849 | 0.0 | NaN | 360.0 | 1.0 | Urban | Y |
| 1 | LP001003 | Male | Yes | 1 | Graduate | No | 4583 | 1508.0 | 128.0 | 360.0 | 1.0 | Rural | N |
| 2 | LP001005 | Male | Yes | 0 | Graduate | Yes | 3000 | 0.0 | 66.0 | 360.0 | 1.0 | Urban | Y |
| 3 | LP001006 | Male | Yes | 0 | Not Graduate | No | 2583 | 2358.0 | 120.0 | 360.0 | 1.0 | Urban | Y |
| 4 | LP001008 | Male | No | 0 | Graduate | No | 6000 | 0.0 | 141.0 | 360.0 | 1.0 | Urban | Y |


```python
loan_data['Property_Area'].unique()
```

```
array(['Urban', 'Rural', 'Semiurban'], dtype=object)
```

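```python
# Alternative using map (kept commented out; not run here).
# Note: these codes start at 1, whereas OrdinalEncoder below starts at 0.
# ordinal_data_map = {"Urban": 1, "Semiurban": 2, "Rural": 3}
# loan_data["Property_Area"] = loan_data['Property_Area'].map(ordinal_data_map)
# loan_data.head()
```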

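```python
from sklearn.preprocessing import OrdinalEncoder  # already imported above; repeated so the cell runs on its own

# Ranking used here: Urban (0) < Semiurban (1) < Rural (2)
ordinal_data = [['Urban', 'Semiurban', 'Rural']]
ordinal_encoder = OrdinalEncoder(categories=ordinal_data)
# Overwrites the text column with the float codes
loan_data['Property_Area'] = ordinal_encoder.fit_transform(loan_data[['Property_Area']])

loan_data.head()
```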


|   | Loan_ID | Gender | Married | Dependents | Education | Self_Employed | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History | Property_Area | Loan_Status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LP001002 | Male | No | 0 | Graduate | No | 5849 | 0.0 | NaN | 360.0 | 1.0 | 0.0 | Y |
| 1 | LP001003 | Male | Yes | 1 | Graduate | No | 4583 | 1508.0 | 128.0 | 360.0 | 1.0 | 2.0 | N |
| 2 | LP001005 | Male | Yes | 0 | Graduate | Yes | 3000 | 0.0 | 66.0 | 360.0 | 1.0 | 0.0 | Y |
| 3 | LP001006 | Male | Yes | 0 | Not Graduate | No | 2583 | 2358.0 | 120.0 | 360.0 | 1.0 | 0.0 | Y |
| 4 | LP001008 | Male | No | 0 | Graduate | No | 6000 | 0.0 | 141.0 | 360.0 | 1.0 | 0.0 | Y |
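
Overwriting `Property_Area` in place discards the original labels. As an alternative sketch (not the approach used above), `ColumnTransformer` can apply the same encoder as a preprocessing step while passing other columns through unchanged. The two-column frame below is hypothetical, not taken from `loan.csv`:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical rows reusing two column names from loan.csv
loan_sample = pd.DataFrame({
    "Property_Area": ["Urban", "Rural", "Semiurban", "Urban"],
    "ApplicantIncome": [5000, 4000, 3000, 6000],
})

preprocess = ColumnTransformer(
    transformers=[
        # Same category order as above: Urban=0, Semiurban=1, Rural=2
        ("area", OrdinalEncoder(categories=[["Urban", "Semiurban", "Rural"]]), ["Property_Area"]),
    ],
    remainder="passthrough",  # ApplicantIncome is appended unchanged after the encoded column
)

encoded = preprocess.fit_transform(loan_sample)
print(encoded.tolist())
# [[0.0, 5000.0], [2.0, 4000.0], [1.0, 3000.0], [0.0, 6000.0]]

# The fitted encoder inside the transformer can map codes back to labels
area_encoder = preprocess.named_transformers_["area"]
print(area_encoder.inverse_transform(encoded[:, [0]])[:, 0].tolist())
# ['Urban', 'Rural', 'Semiurban', 'Urban']
```

Keeping the encoder inside a transformer means the same category order is applied consistently to any later data passed through `preprocess.transform`.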
