## LabelBinarizer
* It converts each label of a categorical target into a binary indicator vector with one column per class, rather than into a single integer code.
* It binarizes labels in a one-vs-all fashion.
* Several regression and binary classification algorithms are available in scikit-learn. A simple way to extend these algorithms to the multi-class classification case is to use the so-called one-vs-all scheme.
* At learning time, this simply consists of learning one regressor or binary classifier per class. To do so, one needs to convert multi-class labels to binary labels (belongs or does not belong to the class). LabelBinarizer makes this process easy with the transform method.
* At prediction time, one assigns the class for which the corresponding model gave the greatest confidence. LabelBinarizer makes this easy with the inverse_transform method.
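
The one-vs-all steps above can be sketched as follows. This is a minimal illustration, not the only way to do it: the toy data, the variable names, and the choice of LogisticRegression as the per-class binary classifier are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelBinarizer

# Toy data (made up for illustration): three well-separated 2-D clusters.
X = np.array([[0.0, 0.0], [0.5, 0.2],
              [5.0, 0.0], [5.2, 0.4],
              [0.0, 5.0], [0.3, 5.1]])
y = np.array(["a", "a", "b", "b", "c", "c"])

lb = LabelBinarizer()
Y = lb.fit_transform(y)  # one indicator column per class, shape (6, 3)

# Learning time: one binary classifier per class (one per column of Y).
models = [LogisticRegression().fit(X, Y[:, k]) for k in range(Y.shape[1])]

# Prediction time: stack each model's confidence score, then let
# inverse_transform pick the class whose column has the greatest value.
scores = np.column_stack([m.decision_function(X) for m in models])
pred = lb.inverse_transform(scores)
print(pred)
```

For a multi-class target, inverse_transform selects the column with the largest value in each row, so it accepts real-valued scores as well as 0/1 indicators.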




In [1]:
from sklearn.preprocessing import LabelBinarizer
import numpy as np

In [2]:
lb = LabelBinarizer()

In [3]:
lb.fit([1, 2, 6, 4, 2])

LabelBinarizer(neg_label=0, pos_label=1, sparse_output=False)

In [4]:
lb.classes_

array([1, 2, 4, 6])

In [5]:
lb.fit_transform(['yes', 'no', 'no', 'yes'])

array([[1],
       [0],
       [0],
       [1]])

In [6]:
# Passing a 2D matrix for multilabel classification
lb.fit(np.array([[0, 1, 1], [1, 0, 0]]))
lb.classes_

array([0, 1, 2])

In [7]:
lb.transform([0, 1, 2, 1])

array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1],
       [0, 1, 0]])

## MultiLabelBinarizer
MultiLabelBinarizer encodes multiple labels per instance as a binary indicator matrix with one column per class. To make the resulting array readable, you can build a DataFrame from it and use the fitted classes (the "classes_" attribute) as column names.


In [8]:
from sklearn.preprocessing import MultiLabelBinarizer
import pandas as pd

In [9]:
mlb = MultiLabelBinarizer()

In [10]:
mlb.fit_transform([(1, 2), (3,11)])

array([[1, 1, 0, 0],
       [0, 0, 1, 1]])

In [11]:
mlb.classes_

array([ 1,  2,  3, 11])

In [12]:
mlb.fit_transform([{'sci-fi', 'thriller'}, {'comedy'}])

array([[0, 1, 1],
       [1, 0, 0]])

In [13]:
mlb.classes_

array(['comedy', 'sci-fi', 'thriller'], dtype=object)

In [14]:
df = pd.DataFrame({"genre": [["action", "drama", "fantasy"], ["fantasy", "action"], ["drama"], ["sci-fi", "drama"]]})
df

| | genre |
|---|---|
| 0 | [action, drama, fantasy] |
| 1 | [fantasy, action] |
| 2 | [drama] |
| 3 | [sci-fi, drama] |


In [15]:
df = pd.DataFrame(mlb.fit_transform(df['genre']), columns=mlb.classes_)

In [16]:
df

| | action | drama | fantasy | sci-fi |
|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 0 |
| 1 | 1 | 0 | 1 | 0 |
| 2 | 0 | 1 | 0 | 0 |
| 3 | 0 | 1 | 0 | 1 |
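
Besides labelling the columns with "classes_", the indicator matrix can be mapped back to label sets with inverse_transform. The sketch below repeats the genre data under new, illustrative variable names (genres, mlb_demo, encoded, decoded) so that it runs on its own.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Same genre lists as in the DataFrame example.
genres = [["action", "drama", "fantasy"],
          ["fantasy", "action"],
          ["drama"],
          ["sci-fi", "drama"]]

mlb_demo = MultiLabelBinarizer()
encoded = mlb_demo.fit_transform(genres)

# Each row of 0/1 indicators becomes a tuple of the matching classes.
decoded = mlb_demo.inverse_transform(encoded)
print(decoded)
```

The decoded tuples follow the sorted order of "classes_", so the original ordering within each list (for example "fantasy" before "action") is not preserved.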


## LabelEncoder
* Encodes target labels with values between 0 and n_classes-1.
* This transformer should be used to encode target values, i.e. y, and not the input X.
* There are many ways to convert categorical values into numerical values.
* Two common methods are one-hot encoding and label encoding. scikit-learn provides both, as OneHotEncoder and LabelEncoder.
* Both convert text or categorical data into the numerical form that most models expect.
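
To make the contrast between the two methods concrete, here is a small sketch that applies LabelEncoder to a target and OneHotEncoder to a single-column feature. The city data and the variable names are made up for the example.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

cities = ["paris", "tokyo", "amsterdam", "paris"]

# Label encoding: one integer per class, intended for the target y.
y_codes = LabelEncoder().fit_transform(cities)
print(y_codes)

# One-hot encoding: one 0/1 column per category, intended for features X.
# OneHotEncoder expects 2-D input and returns a sparse matrix by default.
X = np.array(cities).reshape(-1, 1)
X_onehot = OneHotEncoder().fit_transform(X).toarray()
print(X_onehot)
```

The integer codes imply an ordering (amsterdam < paris < tokyo) that has no real meaning, which is one reason label encoding is reserved for y while one-hot encoding is preferred for nominal input features.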

In [17]:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
le.fit([1, 2, 2, 6])

LabelEncoder()

In [18]:
le.classes_

array([1, 2, 6])

In [19]:
le.transform([1, 1, 2, 6])

array([0, 0, 1, 2])

In [20]:
le.inverse_transform([0, 0, 1, 2])

array([1, 1, 2, 6])

In [21]:
# For non-numerical labels
le.fit(["paris", "paris", "tokyo", "amsterdam"])

LabelEncoder()

In [22]:
list(le.classes_)

['amsterdam', 'paris', 'tokyo']

In [23]:
le.transform(["tokyo", "tokyo", "paris"])

array([2, 2, 1])

In [24]:
list(le.inverse_transform([2, 2, 1]))

['tokyo', 'tokyo', 'paris']