# Label encoding

Classification data comes with labels, which can be text or numbers. Many algorithms, and many `sklearn` utilities, expect numerical labels. If the labels are already numbers, they can be used directly during training, but often they are not.

Human-readable text labels are common, but, again, many algorithms expect numbers. This is where **label encoding** comes in: it converts _text_ labels to _numerical_ labels so that algorithms can operate on the labeled data.
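
To make the idea concrete, here is a minimal from-scratch sketch in plain Python. The helper `encode_labels` is hypothetical (it is not part of `sklearn`); it assigns each distinct label an integer in sorted order, which mirrors the mapping `LabelEncoder` produces later in this notebook.

```python
def encode_labels(labels):
    """Map each distinct text label to an integer (hypothetical helper).

    Classes are sorted so the codes are deterministic, similar to
    how sklearn's LabelEncoder orders its classes_.
    """
    classes = sorted(set(labels))
    # Lookup table: label -> integer code (its position in the sorted classes)
    mapping = {label: code for code, label in enumerate(classes)}
    codes = [mapping[label] for label in labels]
    return classes, codes


labels = ['red', 'black', 'red', 'green', 'black', 'yellow', 'white']
classes, codes = encode_labels(labels)
print(classes)  # ['black', 'green', 'red', 'white', 'yellow']
print(codes)    # [2, 0, 2, 1, 0, 4, 3]
```

Sorting the classes is a design choice: it makes the codes independent of the order in which labels first appear.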

### Import
Import `numpy` and the `preprocessing` module from `sklearn`:

```python
import numpy as np
from sklearn import preprocessing
```

### Labels
Define a list of human-readable labels:

```python
input_labels = ['red', 'black', 'red', 'green', 'black', 'yellow', 'white']
```

### Encoder
Create a `LabelEncoder` instance:

```python
encoder = preprocessing.LabelEncoder()
```

### Fit
Fit the encoder with `fit()`, which learns the set of distinct labels (the classes):

```python
encoder.fit(input_labels)
```

```
LabelEncoder()
```
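
Fitting and encoding are often done in one step. As a sketch, `LabelEncoder.fit_transform()` combines `fit()` and `transform()` in a single call; the block below checks that both routes give the same codes for the example labels.

```python
from sklearn import preprocessing

input_labels = ['red', 'black', 'red', 'green', 'black', 'yellow', 'white']

# Two steps: learn the classes, then encode
encoder = preprocessing.LabelEncoder()
encoder.fit(input_labels)
two_step = encoder.transform(input_labels)

# One step: fit_transform() learns the classes and encodes at once
one_step = preprocessing.LabelEncoder().fit_transform(input_labels)

print(one_step.tolist())                       # [2, 0, 2, 1, 0, 4, 3]
print(one_step.tolist() == two_step.tolist())  # True
```

Keeping `fit()` separate is still useful when the same encoder must later encode other data (for example, a test set) with the mapping learned from the training labels.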

### Map
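Print the label mapping. The learned classes are stored in `encoder.classes_` in sorted order, and each label's code is its position in that array: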

```python
print("\nLabel mapping:")
for i, item in enumerate(encoder.classes_):
    print(item, '-->', i)
```

```
Label mapping:
black --> 0
green --> 1
red --> 2
white --> 3
yellow --> 4
```
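
The sorted mapping above can be reproduced with NumPy alone. This sketch uses `np.unique(..., return_inverse=True)`, which returns the sorted distinct values and, for each input, its index in that sorted array; it is shown to illustrate the idea and is not a claim about exactly how `sklearn` computes it internally.

```python
import numpy as np
from sklearn import preprocessing

input_labels = ['red', 'black', 'red', 'green', 'black', 'yellow', 'white']

encoder = preprocessing.LabelEncoder().fit(input_labels)

# Sorted distinct labels, plus each input's position in that sorted array
classes, codes = np.unique(input_labels, return_inverse=True)

print(classes.tolist())  # ['black', 'green', 'red', 'white', 'yellow']
print(codes.tolist())    # [2, 0, 2, 1, 0, 4, 3]

# Same classes and codes as the fitted encoder
print(classes.tolist() == encoder.classes_.tolist())               # True
print(codes.tolist() == encoder.transform(input_labels).tolist())  # True
```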


### Encode
Encode _text_ labels as _numerical_ codes with `transform()`:

```python
test_labels = ['green', 'red', 'black']
encoded_values = encoder.transform(test_labels)
print("\nLabels =", test_labels)
# .tolist() converts NumPy integers to plain ints for clean printing
print("Encoded values =", encoded_values.tolist())
```

```
Labels = ['green', 'red', 'black']
Encoded values = [1, 2, 0]
```
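
Because the classes are learned during `fit()`, `transform()` only knows those classes. A label that was never seen, such as the hypothetical `'blue'` below, has no code, and `sklearn` raises a `ValueError` instead of assigning one. The exact error message may differ between versions, so this sketch only relies on the exception type.

```python
from sklearn import preprocessing

encoder = preprocessing.LabelEncoder()
encoder.fit(['red', 'black', 'red', 'green', 'black', 'yellow', 'white'])

try:
    # 'blue' was not among the labels passed to fit(), so it has no code
    encoder.transform(['green', 'blue'])
    unseen_error = False
except ValueError as exc:
    unseen_error = True
    print("ValueError:", exc)
```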


### Decode
_Decode_ _numerical_ codes back to the original _text_ labels with `inverse_transform()`:

```python
encoded_values = [3, 0, 4, 1]
decoded_list = encoder.inverse_transform(encoded_values)
print("\nEncoded values =", encoded_values)
# .tolist() converts the NumPy string array to a plain list of str
print("Decoded labels =", decoded_list.tolist())
```

```
Encoded values = [3, 0, 4, 1]
Decoded labels = ['white', 'black', 'yellow', 'green']
```
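
Decoding is essentially a lookup: each code is an index into `encoder.classes_`. The sketch below compares `inverse_transform()` with plain NumPy fancy indexing on `classes_` (an illustration of the idea, not a statement about `sklearn`'s internals) and checks that encoding followed by decoding returns the original labels.

```python
from sklearn import preprocessing

encoder = preprocessing.LabelEncoder()
encoder.fit(['red', 'black', 'red', 'green', 'black', 'yellow', 'white'])

encoded_values = [3, 0, 4, 1]

# Each code indexes into the sorted classes_ array
decoded_via_api = encoder.inverse_transform(encoded_values).tolist()
decoded_via_index = encoder.classes_[encoded_values].tolist()
print(decoded_via_api)                       # ['white', 'black', 'yellow', 'green']
print(decoded_via_api == decoded_via_index)  # True

# Round trip: encoding then decoding gives back the original labels
labels = ['green', 'red', 'black']
round_trip = encoder.inverse_transform(encoder.transform(labels)).tolist()
print(round_trip == labels)  # True
```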
