* **Both encoding techniques are outlined below**:
> A little background first: categorical features are features whose values are not numeric. A neuron in an ANN computes a weighted sum of its inputs and passes the result through an activation function, and that arithmetic is undefined for non-numeric values. An obvious solution you may be tempted to try is dropping those features. Aha! Wrong! Every piece of data is precious: categorical features may carry valuable insights that help the model find the patterns mapping inputs to outputs/targets. So we should include them. But how?

The answer is "encoding".

Several types of encoding are used in practice. Here are just two popular ones:
1. **Label Encoding**, where categories are encoded as consecutive integers. Say a categorical feature named "Category" takes three values: {"Cat", "Dog", "Zebra"}. These can be encoded as 0, 1, and 2 respectively, as in the figure below. The issue is that this type of encoding may unintentionally impose an ordering on the categories (e.g., Cat < Dog < Zebra), which may add bias to the training.


![label-encoding](figures/le2.png)

2. **One-Hot Encoding** ignores the ordering of the categories altogether. With one-hot, we convert each categorical value into a new binary column and assign a value of 1 or 0 to those columns. Each category is thus represented as a binary vector: all entries are zero except the one at the category's index, which is marked with a 1. Also, don't forget to remove the original categorical feature. Below is an example of how to convert the categorical feature "Category", with values {"Cat", "Dog", "Zebra"}, into three new binary features: "Cat", "Dog", "Zebra".

![one-hot-encoding](figures/ohe2.png)
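The two encodings described above can be compared side by side with a small sketch. It assumes pandas is available and uses a throwaway DataFrame named `animals` (a name chosen here for illustration, not part of the dataset used later). Note that `cat.codes` numbers the categories in sorted order, which is one possible convention for label encoding.

```python
import pandas as pd

# Toy data mirroring the "Category" feature from the figures
animals = pd.DataFrame({"category": ["Cat", "Dog", "Zebra", "Dog"]})

# Label encoding: categories are sorted, then numbered 0, 1, 2
animals["label"] = animals["category"].astype("category").cat.codes

# One-hot encoding: one binary column per category
one_hot = pd.get_dummies(animals["category"], dtype=int)

print(animals)   # label column is 0, 1, 2, 1
print(one_hot)   # columns Cat, Dog, Zebra with exactly one 1 per row
```

The label column implies Cat < Dog < Zebra, an ordering the data never stated, while the one-hot columns carry no such ordering.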

**A note on the Dummy Variable Trap**
The Dummy Variable Trap occurs when the dummy variables created by one-hot encoding are perfectly collinear (i.e., multicollinear): any one of them can be predicted exactly from the others. This makes it difficult to interpret the estimated coefficients in regression models. In other words, the individual effect of each dummy variable on the prediction cannot be separated out because of multicollinearity.

Using one-hot encoding, a new dummy variable is created for each category of a categorical variable to represent the presence (1) or absence (0) of that category. For example, if tree species is a categorical variable with the values pine and oak, it can be represented by converting each value to a one-hot vector. This gives a separate column for each category: the first column indicates whether the tree is pine and the second whether it is oak. Each column contains 1 if the tree in question is of the column's species and 0 otherwise. These two columns are multicollinear, since if a tree is pine we know it is not oak, and vice versa. Machine learning models trained on a dataset with this multicollinearity can suffer, linear models in particular. A remedy is to drop the first (or any one) of the dummy (i.e., one-hot) features created.
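To make the trap concrete, here is a minimal sketch using the pine/oak example. It assumes NumPy and pandas, and the five-row `trees` DataFrame is made up for illustration. The idea is to add the column of ones that a regression intercept would contribute and check the rank of the resulting design matrix: with both dummy columns the matrix is rank-deficient, and dropping one restores full column rank.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data for the tree-species example
trees = pd.DataFrame({"species": ["pine", "oak", "oak", "pine", "oak"]})

# Full one-hot encoding (columns: oak, pine) and the drop-first variant (column: pine)
full = pd.get_dummies(trees["species"], dtype=float)
reduced = pd.get_dummies(trees["species"], dtype=float, drop_first=True)

# Prepend an intercept column of ones, as a linear regression would
ones = np.ones(len(trees))
X_full = np.column_stack([ones, full.to_numpy()])
X_reduced = np.column_stack([ones, reduced.to_numpy()])

# oak + pine equals the intercept column, so one column of X_full is redundant
rank_full = np.linalg.matrix_rank(X_full)        # 2, although X_full has 3 columns
rank_reduced = np.linalg.matrix_rank(X_reduced)  # 2, matching its 2 columns

print(X_full.shape[1], rank_full)
print(X_reduced.shape[1], rank_reduced)
```

No information is lost by dropping a column: a row with 0 in the remaining `pine` column must be oak.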

In [5]:
import pandas as pd

## Let's create the datasets/animals.csv file

```txt
sample,category
1,Cat
2,Dog
3,Zebra
4,Dog
```

In [6]:
df = pd.read_csv('datasets/animals.csv')

In [7]:
df

Unnamed: 0,sample,category
0,1,Cat
1,2,Dog
2,3,Zebra
3,4,Dog


In [8]:
df_ohe = pd.get_dummies(df)

In [9]:
df_ohe

Unnamed: 0,sample,category_Cat,category_Dog,category_Zebra
0,1,True,False,False
1,2,False,True,False
2,3,False,False,True
3,4,False,True,False


In [10]:
df

Unnamed: 0,sample,category
0,1,Cat
1,2,Dog
2,3,Zebra
3,4,Dog


In [11]:
from sklearn.preprocessing import OneHotEncoder

enc = OneHotEncoder(handle_unknown='ignore', drop='first', sparse_output=False)
encoded_col = enc.fit_transform(df[['category']])


In [12]:
encoded_col

array([[0., 0.],
       [1., 0.],
       [0., 1.],
       [1., 0.]])

In [13]:
enc.categories_

[array(['Cat', 'Dog', 'Zebra'], dtype=object)]

In [14]:
enc_list = []

In [15]:
ohe_df_features = []

In [16]:
ohe_df_feature = pd.DataFrame(
    encoded_col,
    columns=list(enc.categories_[0][1:]),  # skip the first category, dropped by drop='first'
)




In [17]:
ohe_df_features.append(ohe_df_feature)

In [18]:
# concatenate all one-hot features
ohe_dataset = pd.concat(ohe_df_features, axis=1)

In [19]:
# match the index with the original data
ohe_dataset.index = df.index
# drop the original categorical feature column
df = df.drop(columns=['category'])

# merge the rest of the dataset with the new one-hot features
df = df.join(ohe_dataset)
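For a single DataFrame, the encode, drop, and join steps above can also be sketched in one call to `pd.get_dummies` with `drop_first=True`. This is an alternative under stated assumptions, not a replacement for the scikit-learn encoder: `get_dummies` does not remember the categories it saw, so it is less suited to encoding new data consistently later. The `df_demo` DataFrame below is rebuilt inline so the example runs without the CSV file, and the resulting column names are prefixed with `category_`, unlike the plain `Dog` and `Zebra` columns produced above.

```python
import pandas as pd

# Same rows as datasets/animals.csv, built inline for a self-contained example
df_demo = pd.DataFrame(
    {"sample": [1, 2, 3, 4], "category": ["Cat", "Dog", "Zebra", "Dog"]}
)

# Encode 'category', drop its first dummy column, and keep the other columns
df_compact = pd.get_dummies(
    df_demo, columns=["category"], drop_first=True, dtype=float
)

print(df_compact)  # columns: sample, category_Dog, category_Zebra
```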