# Module 1: Data Analysis and Data Preprocessing

## Section 3: Encoding categorical variables

### Part 4: Ordinal Encoding

Ordinal Encoding is a data preprocessing technique that converts categorical variables into numerical labels while preserving the ordinal relationship between categories. It is particularly useful for categorical variables that have a meaningful order.

### 4.1 Understanding Ordinal Encoding

To preserve the ordinal relationship between categories, Ordinal Encoding assigns a unique integer to each category, with the integers following the predefined order. It is suitable for categorical variables with a clear ordering, such as "low," "medium," and "high."

Ordinal Encoding allows algorithms to use the relative rank of the categories. Because the labels are plain integers, many models will also treat the gaps between adjacent categories as equal, so it is important to ensure that the predefined order is meaningful and appropriate for the specific variable.
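The idea can be sketched in plain Python before reaching for a library. This is a minimal illustration, not a full implementation; the names `size_order`, `size_to_code`, and `observations` are hypothetical and chosen only for this example.

```python
# Hypothetical ordered categories, listed from lowest to highest
size_order = ["low", "medium", "high"]

# Map each category to its position in the predefined order
size_to_code = {category: code for code, category in enumerate(size_order)}

# Hypothetical observations to encode
observations = ["medium", "low", "high", "low"]
encoded = [size_to_code[value] for value in observations]

print(size_to_code)  # {'low': 0, 'medium': 1, 'high': 2}
print(encoded)       # [1, 0, 2, 0]
```

The integer codes keep the ranking intact: `low < medium < high` becomes `0 < 1 < 2`.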

### 4.2 Using Ordinal Encoding

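To apply Ordinal Encoding, we need a dataset with categorical variables that exhibit an ordinal relationship. The encoding process maps each category to a unique integer label while respecting the predefined order. The order can be defined manually or inferred from the observed data, although an inferred order (for example, alphabetical sorting of string labels) rarely matches the intended ranking, so defining it explicitly is usually safer.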

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Define the ranking categories in the desired order
ranking_order = ['Very Bad', 'Bad', 'Neutral', 'Good', 'Very Good']

# Sample data with the ranking categories
rankings = ['Good', 'Bad', 'Very Good', 'Neutral', 'Very Bad']

# Create an OrdinalEncoder object with the specified categories order
encoder = OrdinalEncoder(categories=[ranking_order])

# Fit and transform the ranking data using the ordinal encoder
encoded_rankings = encoder.fit_transform(np.array(rankings).reshape(-1, 1))

# Convert the encoded_rankings to integers for better presentation
encoded_rankings = encoded_rankings.astype(int).flatten()

print("Original Ranking Data:")
print(rankings)
print("\nEncoded Ranking Data (using Ordinal Encoder):")
print(encoded_rankings)
```

In this example, the ordinal encoder has converted the ranking "Good" to 3, "Bad" to 1, "Very Good" to 4, "Neutral" to 2, and "Very Bad" to 0, according to the specified order. Now, you have a numerical representation of the ordinal ranking, which can be used as input for machine learning models that require numerical data.
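For contrast, the sketch below fits the same sample data without passing an explicit order. With the default `categories='auto'`, scikit-learn's `OrdinalEncoder` takes the categories from the data and sorts them, which for strings means alphabetical order. The variable names `auto_encoder` and `auto_codes` are introduced here only for illustration.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Same sample data as in the example above
rankings = np.array(['Good', 'Bad', 'Very Good', 'Neutral', 'Very Bad']).reshape(-1, 1)

# No explicit order: categories are inferred from the data and sorted
auto_encoder = OrdinalEncoder()
auto_codes = auto_encoder.fit_transform(rankings).astype(int).flatten()

print("Inferred category order:")
print(auto_encoder.categories_[0])
print("\nEncoded Ranking Data (inferred order):")
print(auto_codes)
```

Here "Very Bad" is encoded as 3, above "Good" (1) and "Neutral" (2), so the codes no longer reflect the intended ranking. Passing the order explicitly, as in the example above, avoids this.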

### 4.3 One-Hot Encoding vs. Ordinal Encoding

Ordinal Encoding differs from One-Hot Encoding. Ordinal Encoding assigns one integer label per category in a single column, which implies an order. One-Hot Encoding creates one binary indicator variable per category and therefore implies no order. The choice between the two depends on the nature of the categorical variable (ordinal or nominal) and the requirements of the learning algorithm: ordinal variables are natural candidates for Ordinal Encoding, while nominal variables are usually better served by One-Hot Encoding.
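The difference in output shape can be seen in a small side-by-side sketch. It assumes scikit-learn's `OrdinalEncoder` and `OneHotEncoder`; the `levels` list and `data` array are hypothetical sample values.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical ordered levels and sample observations
levels = ['low', 'medium', 'high']
data = np.array(['medium', 'high', 'low']).reshape(-1, 1)

# Ordinal Encoding: one column of integer codes that follow the given order
ordinal_codes = OrdinalEncoder(categories=[levels]).fit_transform(data)

# One-Hot Encoding: one binary column per category (sparse output made dense)
one_hot = OneHotEncoder(categories=[levels]).fit_transform(data).toarray()

print(ordinal_codes.shape)  # (3, 1)
print(one_hot.shape)        # (3, 3)
print(ordinal_codes)
print(one_hot)
```

The ordinal result keeps a single feature whose values can be compared, whereas the one-hot result spreads the variable over three independent columns.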

### 4.4 Summary

Ordinal Encoding is a data preprocessing technique that converts categorical variables into numerical labels while preserving the ordinal relationship between categories. It assigns unique integer values to categories based on their predefined order. Libraries such as scikit-learn (whose `OrdinalEncoder` is used in the example above) and category_encoders provide convenient implementations of Ordinal Encoding in Python.