# Ordinal (Features) and Label (Output) Encoding

Steps:
1. Run `EDA` on the DataFrame
2. `Extract` the feature column(s) and the label/target column
3. `Split` the data into train and test sets
4. Import an encoder and `fit` it on the `train data`
5. `Transform` the `Train & Test` data and store the results

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
df = pd.read_csv('customer.csv')
df.sample(5)

| | age | gender | review | education | purchased |
|---|---|---|---|---|---|
| 7 | 60 | Female | Poor | School | Yes |
| 34 | 86 | Male | Average | School | No |
| 24 | 16 | Female | Average | PG | Yes |
| 3 | 72 | Female | Good | PG | No |
| 31 | 22 | Female | Poor | School | Yes |


In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   age        50 non-null     int64 
 1   gender     50 non-null     object
 2   review     50 non-null     object
 3   education  50 non-null     object
 4   purchased  50 non-null     object
dtypes: int64(1), object(4)
memory usage: 2.1+ KB
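
The `object` columns reported above hold category strings, and an explicit category order can only be written down once their distinct values are known. A minimal sketch of that check, assuming a small made-up DataFrame (`toy` and its rows are illustrative and are not taken from `customer.csv`; only the column names mirror the notebook):

```python
import pandas as pd

# Hypothetical stand-in for customer.csv; the rows are invented for illustration.
toy = pd.DataFrame({
    "review": ["Poor", "Average", "Good", "Poor"],
    "education": ["School", "PG", "UG", "School"],
    "purchased": ["Yes", "No", "No", "Yes"],
})

# Collect the distinct values of each column so their rank order can be decided by hand.
distinct = {col: sorted(toy[col].unique()) for col in toy.columns}
print(distinct)
```

On the real data the same dictionary comprehension could be run over `df[['review', 'education', 'purchased']]`.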


# Consideration

> Columns marked ✅ are encoded in this notebook; columns marked ❌ are left out.<br>
> `age` is a numerical column ❌<br>
> `gender` is a nominal column ❌<br>
> `review` is an ordinal column ✅<br>
> `education` is an ordinal column ✅<br>
> `purchased` is the label column (binary) ✅<br>


## Extract Features and Label

In [9]:
X = df.iloc[:, 2:4]  # feature columns: review and education
y = df.iloc[:, -1]   # label column: purchased

## Split Train and Test Data

In [10]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
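
The split above is random, so the rows that land in each set change from run to run. A hedged sketch of one way to make it repeatable, assuming made-up toy data (`X_toy`, `y_toy`) and an arbitrary seed of 42; `random_state` and `stratify` are real `train_test_split` parameters, but the notebook itself does not use them:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data with a balanced binary label (not from customer.csv).
X_toy = pd.DataFrame({
    "review": ["Poor", "Average", "Good", "Poor", "Good"] * 4,
    "education": ["School", "UG", "PG", "PG", "UG"] * 4,
})
y_toy = pd.Series(["Yes", "No"] * 10, name="purchased")

# random_state fixes the shuffle; stratify keeps the Yes/No ratio equal in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)
print(X_tr.shape, X_te.shape)
print(y_te.value_counts().to_dict())
```

With only 50 rows in the real data, stratifying on the label is one way to avoid a test set dominated by a single class.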

In [12]:
X.shape, X_train.shape

((50, 2), (40, 2))

# Apply Ordinal Encoding

In [13]:
from sklearn.preprocessing import OrdinalEncoder
# One list per feature column (review, education), each ordered from lowest to highest rank
encoder = OrdinalEncoder(categories=[['Poor', 'Average', 'Good'], ['School', 'UG', 'PG']])
encoder.fit(X_train)
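
The `categories` argument matters because, when it is omitted, `OrdinalEncoder` infers the categories from the data and sorts them, which for strings means alphabetical order. A small sketch comparing the two on a made-up single-column frame (`reviews` is an illustrative assumption):

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy column; the values mirror the notebook's review levels.
reviews = pd.DataFrame({"review": ["Poor", "Average", "Good"]})

# Default: categories are inferred from the data and sorted.
auto_enc = OrdinalEncoder().fit(reviews)
# Explicit: categories keep the order that is passed in.
manual_enc = OrdinalEncoder(categories=[["Poor", "Average", "Good"]]).fit(reviews)

print(auto_enc.categories_[0])
print(manual_enc.categories_[0])
print(manual_enc.transform(reviews).ravel())
```

The default fit orders the levels as Average, Good, Poor, which would rank `Poor` highest; the explicit list keeps `Poor` at 0 and `Good` at 2.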

## Transform X_train & X_test

In [14]:
X_train = encoder.transform(X_train)
X_test = encoder.transform(X_test)
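
With the encoder's default settings, `transform` raises an error if the data contains a value that is not in the listed categories. A hedged sketch of one alternative, using the `handle_unknown` and `unknown_value` parameters of `OrdinalEncoder`; the frames, the value `"Excellent"`, and the choice of `-1` as the placeholder code are all assumptions made for illustration:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

order = [["Poor", "Average", "Good"], ["School", "UG", "PG"]]
train = pd.DataFrame({"review": ["Poor", "Good"], "education": ["School", "PG"]})
# "Excellent" is a made-up value that the encoder was never told about.
test = pd.DataFrame({"review": ["Average", "Excellent"], "education": ["UG", "PG"]})

# Map any unseen category to -1 instead of raising an error.
safe_enc = OrdinalEncoder(
    categories=order,
    handle_unknown="use_encoded_value",
    unknown_value=-1,
)
safe_enc.fit(train)
encoded_test = safe_enc.transform(test)
print(encoded_test)
```

Whether a silent `-1` is preferable to a loud error depends on the downstream model, so this is a design choice rather than a default recommendation.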

In [15]:
X_train

array([[2., 1.],
       [2., 1.],
       [2., 1.],
       [0., 2.],
       [1., 2.],
       [1., 1.],
       [0., 0.],
       [2., 0.],
       [0., 2.],
       [1., 2.],
       [2., 1.],
       [1., 0.],
       [0., 2.],
       [0., 2.],
       [0., 2.],
       [0., 0.],
       [2., 2.],
       [2., 1.],
       [1., 0.],
       [2., 2.],
       [0., 2.],
       [1., 0.],
       [0., 0.],
       [2., 1.],
       [0., 2.],
       [0., 2.],
       [2., 0.],
       [0., 2.],
       [1., 0.],
       [2., 0.],
       [1., 1.],
       [1., 1.],
       [2., 0.],
       [0., 1.],
       [2., 2.],
       [1., 2.],
       [2., 2.],
       [0., 1.],
       [2., 0.],
       [0., 0.]])
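
The output is a plain NumPy array of float codes, so the column names and original strings are no longer visible. A minimal sketch of checking the mapping with `inverse_transform`, assuming a made-up two-row frame (`toy`) rather than the notebook's `X_train`:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

order = [["Poor", "Average", "Good"], ["School", "UG", "PG"]]
# Hypothetical rows; the column order matches the order of the category lists.
toy = pd.DataFrame({"review": ["Good", "Poor"], "education": ["UG", "PG"]})

enc = OrdinalEncoder(categories=order).fit(toy)
codes = enc.transform(toy)              # strings -> float codes
decoded = enc.inverse_transform(codes)  # float codes -> original strings
print(codes)
print(decoded)
```

In the notebook, the same round trip would be `encoder.inverse_transform(X_train)`.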

# Apply Label Encoder to the Label

In [16]:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
le.fit(y_train)

## Transform the Label

In [17]:
y_train = le.transform(y_train)
y_test = le.transform(y_test)

In [18]:
y_test

array([0, 0, 0, 0, 0, 1, 1, 0, 1, 1])
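
To read these integers back as labels, the fitted encoder's `classes_` attribute lists the classes in sorted order, and the position of each class is its code. A short sketch on a made-up label list (`labels` and `toy_le` are illustrative names, not part of the notebook):

```python
from sklearn.preprocessing import LabelEncoder

labels = ["Yes", "No", "No", "Yes"]  # hypothetical label values

toy_le = LabelEncoder().fit(labels)
print(toy_le.classes_)  # index in this array is the integer code

codes = toy_le.transform(labels)
back = toy_le.inverse_transform(codes)
print(codes, back)
```

Because the classes are sorted, `No` maps to 0 and `Yes` maps to 1; in the notebook, `le.classes_` and `le.inverse_transform(y_test)` would give the same kind of check.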