# Ordinal Encoding and Label Encoding

## Ordinal Encoding:
Ordinal encoding transforms categorical variables whose levels have a natural order into numeric representations.

Each unique category is assigned a unique integer value based on its order or rank.

The assigned integers preserve the ordinal relationship between the categories.
For example, if we have categories like "Low," "Medium," and "High," they can be encoded as 0, 1, and 2, respectively.

Ordinal encoding is commonly used when there is a natural order or hierarchy among the categories.
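The idea can be sketched in plain Python, independent of any library. The names `LEVELS`, `LEVEL_TO_CODE`, and `ordinal_encode` are hypothetical and chosen only for this illustration; the point is that the integer codes follow the rank order supplied by the user.

```python
# Explicit rank order: the position in this list becomes the encoded integer.
LEVELS = ["Low", "Medium", "High"]
LEVEL_TO_CODE = {level: code for code, level in enumerate(LEVELS)}


def ordinal_encode(values):
    """Map each category to its rank; raises KeyError for an unknown category."""
    return [LEVEL_TO_CODE[v] for v in values]


encoded = ordinal_encode(["Medium", "Low", "High", "Low"])
print(encoded)  # [1, 0, 2, 0]

# Because the codes follow the rank, comparing codes agrees with comparing levels.
print(LEVEL_TO_CODE["Low"] < LEVEL_TO_CODE["High"])  # True
```

Since the order comes from the list rather than from the data, swapping two entries in `LEVELS` changes the encoding, which is why the order has to be stated explicitly.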

## Label Encoding:
Label encoding transforms categorical variables into numeric representations without considering any specific order or hierarchy.

In label encoding, each unique category or label is assigned a unique integer value, starting from 0. 

The assigned integers do not imply any inherent order or rank. 

For example, if we have categories like "Red," "Green," and "Blue," they can be encoded as 0, 1, and 2, respectively. 

It is mostly used for the target (output) column; scikit-learn's LabelEncoder, for instance, is intended for target labels rather than input features.

Label encoding is often used when there is no meaningful ordinal relationship among the categories.
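As a minimal sketch of the color example with scikit-learn's `LabelEncoder` (the `colors` list is made-up data for illustration): the encoder sorts the unique labels, so the integers follow alphabetical order rather than the order in which the labels are listed.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical label values, used only for this illustration.
colors = ["Red", "Green", "Blue", "Green"]

le = LabelEncoder()
codes = le.fit_transform(colors)

# classes_ holds the sorted unique labels; each label's index is its code.
print(le.classes_)                  # ['Blue' 'Green' 'Red']
print(codes)                        # [2 1 0 1]

# inverse_transform maps the integers back to the original labels.
print(le.inverse_transform(codes))  # ['Red' 'Green' 'Blue' 'Green']
```

Here "Blue" gets 0 and "Red" gets 2 only because of sorting, which is why these integers should not be read as a ranking.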

import pandas as pd
import numpy as np

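def read_csv_file(file_name):
    # Helper: load the CSV and keep only the columns at positions 2, 3 and 4.
    # (Not called below; the data is loaded directly with pd.read_csv.)
    data = pd.read_csv(file_name)
    data = data.iloc[:, [2, 3, 4]]
    return data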


def ordinal_encoder_and_label_encoder(data):
    from sklearn.model_selection import train_test_split

    # Features: the two ordered categorical columns; target: the last column.
    X_train, X_test, Y_train, Y_test = train_test_split(data[['review', 'education']],
                                                        data.iloc[:, -1],
                                                        test_size=0.2,
                                                        random_state=2)

    # ordinal encoder:
    from sklearn.preprocessing import OrdinalEncoder

    # Categories ordered by frequency (only needed for the commented-out alternative below).
    cat1 = data['review'].value_counts().index.to_numpy()
    cat2 = data['education'].value_counts().index.to_numpy()

    # List the categories of each column in the order you want them encoded,
    # e.g. 0: Poor, 1: Average, 2: Good.
    oe = OrdinalEncoder(categories=[['Poor', 'Average', 'Good'], ['School', 'UG', 'PG']])

    # oe = OrdinalEncoder(categories=[cat1, cat2])  # codes follow frequency order, not rank
    # oe = OrdinalEncoder()  # codes follow sorted (alphabetical) order, not rank

    # fit the ordinal encoder to the train set, it will learn the parameters
    oe.fit(X_train)

    # transform train and test sets
    X_train_transform = oe.transform(X_train)
    X_test_transform = oe.transform(X_test)

    print("Ordinal encoder categories:", oe.categories_)
    #print("X_train encoded with the ordinal encoder:", X_train_transform)
    print("X_test encoded with the ordinal encoder:", X_test_transform)


    #label encoder:
    from sklearn.preprocessing import LabelEncoder
    le = LabelEncoder()

    # fit the label encoder on the training labels so that it learns the classes
    le.fit(Y_train)

    # transform train and test label sets
    Y_train_transform = le.transform(Y_train)
    Y_test_transform = le.transform(Y_test)

    print("Classes in the label set:", le.classes_)
    #print("Y_train encoded with the label encoder:", Y_train_transform)
    print("Y_test encoded with the label encoder:", Y_test_transform)


data = pd.read_csv('customer.csv')
ordinal_encoder_and_label_encoder(data)

Ordinal encoder categories: [array(['Poor', 'Average', 'Good'], dtype=object), array(['School', 'UG', 'PG'], dtype=object)]
X_test encoded with the ordinal encoder: [[2. 1.]
 [2. 2.]
 [0. 0.]
 [2. 1.]
 [1. 0.]
 [1. 0.]
 [1. 1.]
 [0. 2.]
 [0. 2.]
 [2. 0.]]
Classes in the label set: ['No' 'Yes']
Y_test encoded with the label encoder: [1 1 0 1 0 0 0 0 0 0]
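
The run above depends on customer.csv. To try the same encoder without that file, here is a minimal sketch on a few hypothetical in-memory rows (the values in `rows` are invented for illustration and are not taken from the dataset). It also shows what happens when the `categories` argument is left at its default.

```python
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical (review, education) rows; not taken from customer.csv.
rows = [["Good", "UG"], ["Poor", "School"], ["Average", "PG"]]

# Explicit order per column: the position in each list becomes the code.
oe = OrdinalEncoder(categories=[["Poor", "Average", "Good"], ["School", "UG", "PG"]])
encoded = oe.fit_transform(rows)
print(encoded)
# [[2. 1.]
#  [0. 0.]
#  [1. 2.]]

# With the default categories, the order is learned by sorting the values,
# so "Average" would get 0 and "Poor" would get 2.
default_oe = OrdinalEncoder().fit(rows)
print(default_oe.categories_[0])  # ['Average' 'Good' 'Poor']
```

Passing the category lists explicitly is what makes the integers reflect the intended rank; the sorted default is only correct when alphabetical order happens to match it.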
