# Experiment 8: Encoding Techniques
To implement the following encoding methods:
- Ordinal Encoding
- One Hot Encoding
- Dummy Variable Encoding


## 1. Ordinal Encoding

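Each categorical feature is converted to ordinal integers. This results in
a single column of integers (0 to n_categories - 1) per feature. By default
the categories are sorted, so string labels are numbered in alphabetical order.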

In [1]:
import numpy as np
from sklearn.preprocessing import OrdinalEncoder
# define data
data = np.array([['red'], ['green'], ['blue']])
print("data: \n", data)
# define ordinal encoding
encoder = OrdinalEncoder()
# transform data
result = encoder.fit_transform(data)
print("\nResult: \n", result)


data: 
 [['red']
 ['green']
 ['blue']]

Result: 
 [[2.]
 [1.]
 [0.]]
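
The default alphabetical numbering is arbitrary for features that have a natural order. As a minimal sketch, the `categories` parameter of `OrdinalEncoder` can fix the order explicitly. The `sizes` feature and its small < medium < large scale are hypothetical and only serve as an illustration.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# hypothetical feature with a natural order (not part of the experiment data)
sizes = np.array([['medium'], ['small'], ['large'], ['small']], dtype=object)

# one list of categories per feature, given in the desired order
ordered_encoder = OrdinalEncoder(categories=[['small', 'medium', 'large']])
encoded = ordered_encoder.fit_transform(sizes)
print(encoded.ravel())

# inverse_transform maps the integers back to the original labels
decoded = ordered_encoder.inverse_transform(encoded)
print(decoded.ravel())
```

Here 'small' maps to 0, 'medium' to 1 and 'large' to 2, regardless of alphabetical order.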


## 2. One Hot Encoding

The features are encoded using a one-hot (also known as 'one-of-K' or 'dummy') encoding scheme.
This creates one binary column per category; the result is returned as either a sparse matrix or a dense array.

In [2]:
# example of a one hot encoding
from sklearn.preprocessing import OneHotEncoder
# define data
data = np.array([['red'], ['green'], ['blue'],['red']])
print(data)
# define one hot encoding
# sparse_output: returns a sparse matrix if True, otherwise a dense array
# (this parameter was named `sparse` before scikit-learn 1.2)
encoder = OneHotEncoder(sparse_output=False)
# transform data
onehot = encoder.fit_transform(data)
print(onehot)
print(type(onehot))

[['red']
 ['green']
 ['blue']
 ['red']]
[[0. 0. 1.]
 [0. 1. 0.]
 [1. 0. 0.]
 [0. 0. 1.]]
<class 'numpy.ndarray'>
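
To read the output above, it helps to know which column belongs to which category. The sketch below inspects the fitted `categories_` attribute and, as an assumption about how one might want to treat new data, sets `handle_unknown='ignore'` so that a category not seen during fitting (the hypothetical 'yellow') becomes an all-zero row. It keeps the default sparse output and converts it with `.toarray()`.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

train = np.array([['red'], ['green'], ['blue'], ['red']], dtype=object)

# handle_unknown='ignore' encodes unseen categories as all zeros
encoder = OneHotEncoder(handle_unknown='ignore')
onehot = encoder.fit_transform(train).toarray()

# column i of the output corresponds to categories_[0][i]
print(encoder.categories_[0])
print(onehot)

# 'yellow' was not present when fitting (hypothetical unseen value)
unseen = encoder.transform(np.array([['yellow']], dtype=object)).toarray()
print(unseen)
```

The columns follow the sorted category order (blue, green, red), which is why 'red' has its 1 in the last column.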




## 3. Dummy Variable Encoding

Dummy encoding also uses binary (dummy) variables, but instead of creating one variable per category it uses k-1 variables for a feature with k categories. The remaining category is represented by all zeros and acts as the reference level. To encode the same Color variable with three categories, we therefore need only two dummy variables.

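With `drop='first'`, scikit-learn drops the first category in sorted order (here "blue"), so the two remaining columns correspond to "green" and "red":

- "Red" color is encoded as [0 1] vector of size 2.
- "Green" color is encoded as [1 0] vector of size 2.
- "Blue" color is encoded as [0 0] vector of size 2.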

In [4]:
# example of a dummy variable encoding
# OneHotEncoder parameters used here:
# -> sparse_output: returns a sparse matrix if True, otherwise a dense array
#    (this parameter was named `sparse` before scikit-learn 1.2).
# -> drop: {'first', 'if_binary'} or an array-like of shape (n_features,), default=None
#    Specifies a methodology to use to drop one of the categories per
#    feature. This is useful in situations where perfectly collinear
#    features cause problems, such as when feeding the resulting data
#    into an unregularized linear regression model.
import numpy as np
from sklearn.preprocessing import OneHotEncoder
# define data
data = np.array([['red'], ['green'], ['blue']])
print(data)
# define dummy variable encoding: drop the first (sorted) category of each feature
encoder = OneHotEncoder(drop='first', sparse_output=False)
# transform data
onehot = encoder.fit_transform(data)
print(onehot)

[['red']
 ['green']
 ['blue']]
[[0. 1.]
 [1. 0.]
 [0. 0.]]
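
As a quick check of which category became the all-zero reference level, the following sketch reads the fitted `drop_idx_` attribute and round-trips the encoding with `inverse_transform`. It uses the default sparse output with `.toarray()`, and the variable names are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

colors = np.array([['red'], ['green'], ['blue']], dtype=object)

encoder = OneHotEncoder(drop='first')
dummy = encoder.fit_transform(colors).toarray()

# drop_idx_[i] is the index into categories_[i] of the dropped category
dropped = encoder.categories_[0][encoder.drop_idx_[0]]
print("dropped category:", dropped)

# an all-zero row is decoded back to the dropped category
restored = encoder.inverse_transform(dummy)
print(restored.ravel())
```

Because the dropped category is recoverable from the all-zero row, no information is lost by using k-1 columns.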


