## One Hot Encoding in Machine Learning
Most machine learning algorithms cannot work with categorical data directly.
We need to represent this kind of data in a numeric form that the algorithms can consume.

One hot encoding helps transform the categorical data into binary vectors.

First, the categorical values are mapped to integer values.

Then, each integer value is represented as a binary vector that is all zeros except at the index of the integer, which is set to 1.

For example, if we had this data:
```python
[3, 4, 7, 1]
```
We can one hot encode it to:

```python
[[0,0,0,1,0,0,0,0],
 [0,0,0,0,1,0,0,0],
 [0,0,0,0,0,0,0,1],
 [0,1,0,0,0,0,0,0]]
```
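
The two steps described above (categories to integers, then integers to binary vectors) can be sketched in plain Python. This is only an illustration: `one_hot_encode` is a hypothetical helper name, and sorting the categories to assign the integers is an assumption made here so that the mapping is stable. Unlike the integer example above, where each value is used directly as an index, this sketch first builds the category-to-integer mapping itself.

```python
def one_hot_encode(values):
    """Map each distinct value to an integer index, then to a binary vector."""
    # Step 1: assign an integer to every distinct category
    # (sorted order is an assumption, chosen so the mapping is repeatable)
    categories = sorted(set(values))
    index_of = {category: i for i, category in enumerate(categories)}

    # Step 2: build one vector per value, all zeros except a 1
    # at the index assigned to that value's category
    vectors = []
    for value in values:
        vector = [0] * len(categories)
        vector[index_of[value]] = 1
        vectors.append(vector)
    return index_of, vectors


index_of, vectors = one_hot_encode(["cold", "warm", "hot", "cold"])
print(index_of)  # {'cold': 0, 'hot': 1, 'warm': 2}
print(vectors)   # [[1, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
```

Each vector has as many entries as there are distinct categories, so the width of the encoding grows with the number of categories.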

#### Encoding with Keras
The Keras library has a function called `to_categorical` that one hot encodes integer data.

Let's look at an example:

```python
# import the encoding helper and NumPy
from keras.utils import to_categorical
import numpy as np

# Create an array of the integers 0 through 19
arr = np.array(range(20))
# One-hot encode it
hot_encoded = to_categorical(arr)
print(hot_encoded)
```

```
[[1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]
```
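
To see what this encoding amounts to for integer input, the same kind of matrix can be built with plain NumPy. This is a sketch of the idea and not Keras's actual implementation. It assumes the labels are non-negative integers numbered from 0, so the number of columns is the largest label plus one. It reuses the `[3, 4, 7, 1]` data from the earlier example.

```python
import numpy as np

arr = np.array([3, 4, 7, 1])

# Assumption: classes are numbered 0..max, so we need max + 1 columns
num_classes = int(arr.max()) + 1

# Row i of the identity matrix is the one-hot vector for class i;
# indexing the identity matrix with arr picks one row per input value
hot_encoded = np.eye(num_classes, dtype="float32")[arr]
print(hot_encoded)
```

Taking `np.argmax` along each row of the result gives back the original integers.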

#### Encoding with Scikit-Learn
Scikit-learn provides two classes that we can combine to one hot encode our data:
* `LabelEncoder`: encodes labels as integer values
* `OneHotEncoder`: encodes integer data as binary vectors

Let's go ahead and try it out:

```python
# import the classes
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
import numpy as np

# create the data
data = ['cold', 'cold', 'warm', 'cold', 'hot', 'hot', 'warm', 'cold', 'warm', 'hot']
data_arr = np.array(data)

# encode the string labels as integers using the label encoder
encoder = LabelEncoder()
labels_encoded = encoder.fit_transform(data_arr)
print(labels_encoded)

# reshape to a single column, since OneHotEncoder expects a 2D array
labels_encoded = labels_encoded.reshape(len(labels_encoded), 1)
print(labels_encoded, "\n")

# encode using the one hot encoder
# (scikit-learn 1.2 renamed `sparse` to `sparse_output`; older versions need `sparse=False`)
onehot_encoder = OneHotEncoder(sparse_output=False)
onehot_encoded = onehot_encoder.fit_transform(labels_encoded)
print(onehot_encoded)
```

```
[0 0 2 0 1 1 2 0 2 1]
[[0]
 [0]
 [2]
 [0]
 [1]
 [1]
 [2]
 [0]
 [2]
 [1]]

[[1. 0. 0.]
 [1. 0. 0.]
 [0. 0. 1.]
 [1. 0. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 0. 1.]
 [1. 0. 0.]
 [0. 0. 1.]
 [0. 1. 0.]]
```


```python
# we can also invert the encoding: argmax finds the index of the 1 in a row,
# and the label encoder maps that integer back to its original label
inverse_transformed = encoder.inverse_transform([np.argmax(onehot_encoded[2, :])])
print(inverse_transformed)
```

```
['warm']
```
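
As a variation on the example above, recent scikit-learn releases (0.20 onward) let `OneHotEncoder` work on string categories directly, so the `LabelEncoder` step can be skipped for input features. The sketch below assumes such a version is installed. It calls `.toarray()` on the default sparse result rather than passing a `sparse`/`sparse_output` argument, so that it does not depend on which of those two parameter names your version accepts.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

data = np.array(['cold', 'cold', 'warm', 'cold', 'hot', 'hot', 'warm', 'cold', 'warm', 'hot'])

# OneHotEncoder expects a 2D array of shape (n_samples, n_features)
column = data.reshape(-1, 1)

# fit_transform returns a sparse matrix by default; convert it to a dense array
encoder = OneHotEncoder()
onehot = encoder.fit_transform(column).toarray()

# the learned categories, one array per input column
print(encoder.categories_[0])
print(onehot[:3])

# inverse_transform maps every row back to its original label in one call
recovered = encoder.inverse_transform(onehot).ravel()
print(recovered[:3])
```

Here `inverse_transform` decodes all rows at once, instead of decoding a single row with `np.argmax` as in the previous cell.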
