# Label Encoder
* This transforms categorical labels into integer values between 0 and n_classes - 1, where n_classes is the number of unique labels (classes) present
* Label encoders can be used to normalize both numerical and non-numerical labels

https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html
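The integer assigned to each label is its position in the sorted array of unique labels. The NumPy sketch below illustrates that idea with `np.unique(..., return_inverse=True)`. It is not scikit-learn's actual implementation, but it produces the same codes for the numerical example used later in this notebook:

```python
import numpy as np

# Labels to encode (same values as the numerical example in this notebook)
labels = np.array([50, 20, 60, 60])

# np.unique returns the sorted unique values and, with return_inverse=True,
# the index of each original element within that sorted array
classes, codes = np.unique(labels, return_inverse=True)

print(classes)  # [20 50 60]
print(codes)    # [1 0 2 2]

# Decoding is just indexing the sorted classes with the codes
decoded = classes[codes]
print(decoded)  # [50 20 60 60]
```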

### Pip installing scikit-learn

In [1]:
! pip install scikit-learn

### Importing modules

In [1]:
from sklearn import preprocessing

import pandas as pd
import numpy as np

## Numerical labels

### Creating the label encoder object

In [2]:
num_encoder = preprocessing.LabelEncoder()

### Fitting the label encoder

In [3]:
num_encoder.fit([50, 20, 60, 60])

LabelEncoder()

#### The classes_ property
Holds every unique label seen during fitting, in sorted order

In [4]:
num_encoder.classes_

array([20, 50, 60])

#### The transform() method
Transforms labels to their integer encoding

In [5]:
num_encoder.transform([50, 20, 60, 60])

array([1, 0, 2, 2])
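`LabelEncoder` also provides `fit_transform()`, which fits the encoder and returns the encoded labels in a single call. A minimal sketch on the same data:

```python
from sklearn import preprocessing

num_encoder = preprocessing.LabelEncoder()

# fit_transform() is equivalent to calling fit() and then transform()
# on the same labels
encoded = num_encoder.fit_transform([50, 20, 60, 60])

print(encoded)               # [1 0 2 2]
print(num_encoder.classes_)  # [20 50 60]
```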

#### The inverse_transform() method
Transforms encoded labels back to their original form

In [6]:
num_encoder.inverse_transform([1, 0, 2, 2])

array([50, 20, 60, 60])

In [7]:
num_encoder.inverse_transform([1, 0, 2, 2, 1, 0])

array([50, 20, 60, 60, 50, 20])
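Because `transform()` and `inverse_transform()` are inverses of each other, encoding and then decoding returns the original labels. The sketch below checks that round trip and also shows that decoding an integer outside the range 0 to n_classes - 1 (here `3`, with only three classes) is rejected with a `ValueError`:

```python
import numpy as np
from sklearn import preprocessing

labels = [50, 20, 60, 60, 50, 20]
encoder = preprocessing.LabelEncoder().fit(labels)

# Encoding and then decoding should give back the original labels
round_trip = encoder.inverse_transform(encoder.transform(labels))
print(round_trip)  # [50 20 60 60 50 20]

# Only codes 0, 1 and 2 exist, so code 3 cannot be decoded
try:
    encoder.inverse_transform([3])
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```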

## Non-numerical labels
Encoding non-numerical (string) labels

#### Creating an object of the label encoder

In [8]:
string_encoder = preprocessing.LabelEncoder()

#### Fitting the values to the label encoder object

In [9]:
string_encoder.fit(["Cloudy", "Sunny", "Windy", "Cloudy", "Rainy"])

LabelEncoder()

In [10]:
string_encoder.classes_

array(['Cloudy', 'Rainy', 'Sunny', 'Windy'], dtype='<U6')

In [11]:
string_encoder.transform(["Cloudy", "Rainy", "Sunny", "Windy"])

array([0, 1, 2, 3])

#### Inverse transform of the labels

In [12]:
list(string_encoder.inverse_transform([3, 0, 2, 1, 3]))

['Windy', 'Cloudy', 'Sunny', 'Rainy', 'Windy']
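A fitted `LabelEncoder` only knows the classes it saw during `fit()`. Passing a label that was not present, such as the made-up `"Snowy"` below, makes `transform()` raise a `ValueError` instead of silently assigning a new code:

```python
from sklearn import preprocessing

string_encoder = preprocessing.LabelEncoder()
string_encoder.fit(["Cloudy", "Sunny", "Windy", "Cloudy", "Rainy"])

# "Snowy" was never seen during fit(), so transform() cannot map it
try:
    string_encoder.transform(["Sunny", "Snowy"])
    unseen_rejected = False
except ValueError as err:
    unseen_rejected = True
    print("ValueError:", err)
```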

# One Hot Encoder

## 1) OneHotEncoding Using `sklearn.preprocessing.OneHotEncoder`

#### `sklearn.preprocessing.OneHotEncoder` encodes categorical features (strings or integers) as a one-hot numeric array

https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html
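Conceptually, one-hot encoding a single column maps each value to its index among the sorted categories and then selects the corresponding row of an identity matrix. The NumPy sketch below illustrates that idea for one column only. It is not how `OneHotEncoder` is implemented internally, but it reproduces the dense array that `enc.transform(majors).toarray()` returns for the `majors` data:

```python
import numpy as np

majors = ["Engineering", "Math", "Chemistry"]

# Step 1: integer-encode each value by its position in the sorted categories
categories, codes = np.unique(majors, return_inverse=True)

# Step 2: pick the matching row of an identity matrix for each code
one_hot = np.eye(len(categories))[codes]

print(categories)  # ['Chemistry' 'Engineering' 'Math']
print(one_hot)
```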

In [36]:
from sklearn.preprocessing import OneHotEncoder

import pandas as pd
import numpy as np

### Creating a OneHotEncoder object

In [2]:
enc = OneHotEncoder()

In [9]:
majors = [['Engineering'], 
          ['Math'], 
          ['Chemistry']]

In [10]:
enc.fit(majors)

OneHotEncoder(categorical_features=None, categories=None, drop=None,
              dtype=<class 'numpy.float64'>, handle_unknown='error',
              n_values=None, sparse=True)

In [11]:
enc.transform(majors).toarray()

array([[0., 1., 0.],
       [0., 0., 1.],
       [1., 0., 0.]])

In [12]:
enc.categories_

[array(['Chemistry', 'Engineering', 'Math'], dtype=object)]

In [42]:
new_majors = [['Media Studies'], 
              ['Math'],
              ['Stats']]

In [43]:
enc.transform(new_majors).toarray()

ValueError: Found unknown categories ['Stats', 'Media Studies'] in column 0 during transform

In [44]:
enc_unk = OneHotEncoder(handle_unknown='ignore')

enc_unk.fit(majors)

OneHotEncoder(categorical_features=None, categories=None, drop=None,
              dtype=<class 'numpy.float64'>, handle_unknown='ignore',
              n_values=None, sparse=True)

In [45]:
enc_unk.transform(new_majors).toarray()

array([[0., 0., 0.],
       [0., 0., 1.],
       [0., 0., 0.]])
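With `handle_unknown='ignore'`, a category that was not seen during fitting is encoded as a row of all zeros, while known categories are encoded as usual. The pure NumPy sketch below mimics that behavior for a single column; `one_hot_ignore_unknown` is a hypothetical helper written for illustration, not part of scikit-learn:

```python
import numpy as np

known_categories = ["Chemistry", "Engineering", "Math"]  # sorted, as in enc.categories_
index = {cat: i for i, cat in enumerate(known_categories)}


def one_hot_ignore_unknown(values):
    """Hypothetical helper: one-hot encode values, using an all-zero row for unknowns."""
    out = np.zeros((len(values), len(known_categories)))
    for row, value in enumerate(values):
        col = index.get(value)
        if col is not None:  # unknown categories leave the row as all zeros
            out[row, col] = 1.0
    return out


encoded_new = one_hot_ignore_unknown(["Media Studies", "Math", "Stats"])
print(encoded_new)
```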

# LabelBinarizer 
It is a utility class to help create a label indicator matrix from a list of multi-class labels

https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelBinarizer.html

In [6]:
from sklearn.preprocessing import LabelBinarizer

import numpy as np
import pandas as pd

### Creating a LabelBinarizer object

In [7]:
num_binarizer = LabelBinarizer()

* neg_label : int (default: 0)
Value with which negative labels must be encoded.
* pos_label : int (default: 1)
Value with which positive labels must be encoded.
* sparse_output : boolean (default: False)
True if the array returned by transform should be in sparse CSR format.
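As an illustration of the `neg_label` and `pos_label` parameters listed above, the sketch below encodes the same labels used in this section with -1 for "not this class" instead of the default 0:

```python
from sklearn.preprocessing import LabelBinarizer

# Encode "not this class" as -1 and "this class" as 1 instead of the default 0/1
pm_binarizer = LabelBinarizer(neg_label=-1, pos_label=1)
encoded = pm_binarizer.fit_transform([2, 5, 6, 4, 5])

print(pm_binarizer.classes_)  # [2 4 5 6]
print(encoded)
```

Every 0 in the default output becomes -1, and every 1 stays 1.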

### Fitting the labels

In [8]:
num_binarizer.fit([2, 5, 6, 4, 5])

LabelBinarizer(neg_label=0, pos_label=1, sparse_output=False)

### Label classes

In [9]:
num_binarizer.classes_

array([2, 4, 5, 6])

### Transforming the labels to binary form
* Each row of the matrix corresponds to one input label, and each column corresponds to one class in `classes_` (in sorted order)
* The first row encodes the label `2`, the second row encodes the label `5`, and so on
* The label `5` appears at indices 1 and 4 of the input, so rows 1 and 4 of the binarized matrix are identical
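The row/column structure described above can be reproduced with NumPy broadcasting: compare every input label (rows) against every sorted class (columns) and mark matches with 1. This is a sketch of the idea only, not scikit-learn's internal implementation:

```python
import numpy as np

labels = np.array([2, 5, 6, 4, 5])
classes = np.unique(labels)  # sorted classes: [2 4 5 6]

# Broadcasting a column of labels against a row of classes gives a
# (n_labels, n_classes) boolean matrix, True where label == class
indicator = (labels[:, np.newaxis] == classes).astype(int)

print(indicator)
```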

In [10]:
num_binarizer.transform([2, 5, 6, 4, 5])

array([[1, 0, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1],
       [0, 1, 0, 0],
       [0, 0, 1, 0]])

### Labels `2` and `6` are encoded with the same rows as the first and third rows above

In [11]:
num_binarizer.transform([2, 6])

array([[1, 0, 0, 0],
       [0, 0, 0, 1]])
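One `LabelBinarizer` behavior worth knowing (documented in the scikit-learn API reference): when only two classes are present, the output has a single column instead of one column per class. That column is 1 for the second class in sorted order. A small sketch with made-up `"yes"`/`"no"` labels:

```python
from sklearn.preprocessing import LabelBinarizer

binary_binarizer = LabelBinarizer()
encoded = binary_binarizer.fit_transform(["yes", "no", "yes"])

print(binary_binarizer.classes_)  # ['no' 'yes']
print(encoded.shape)              # (3, 1)
print(encoded.ravel())            # [1 0 1]
```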

# MultiLabelBinarizer
It is a utility class to help create a label indicator matrix from a list of label sets (multi-label data), where each sample can have several labels at once

https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MultiLabelBinarizer.html#sklearn.preprocessing.MultiLabelBinarizer

In [1]:
from sklearn.preprocessing import MultiLabelBinarizer

import numpy as np
import pandas as pd

### Creating a MultiLabelBinarizer object

In [2]:
multilabel_binarizer = MultiLabelBinarizer()

* classes : array-like of shape (n_classes,) (default: None)
Indicates an ordering for the class labels. All entries should be unique. If None, the sorted set of labels found during fitting is used.
* sparse_output : boolean (default: False)
True if the array returned by transform should be in sparse CSR format.
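`MultiLabelBinarizer` accepts a `classes` argument (visible as `classes=None` in the repr printed after fitting). Passing it fixes the column order explicitly instead of relying on sorted order. A small sketch with an arbitrary order chosen for illustration:

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Fix the column order explicitly instead of relying on sorted order
ordered_binarizer = MultiLabelBinarizer(classes=["Statistics", "Math", "English"])
encoded = ordered_binarizer.fit_transform([("Math", "English"), ("Statistics",)])

print(list(ordered_binarizer.classes_))  # ['Statistics', 'Math', 'English']
print(encoded)
```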

### Fitting the labels

In [7]:
courses = [
    ('Math', 'English'),
    ('Math', 'Science'),
    ('Geography', 'History'),
    ('Statistics', )
]

In [8]:
multilabel_binarizer.fit(courses)

MultiLabelBinarizer(classes=None, sparse_output=False)

### Label classes

In [9]:
multilabel_binarizer.classes_

array(['English', 'Geography', 'History', 'Math', 'Science', 'Statistics'],
      dtype=object)

### Transforming the labels to binary form

In [6]:
multilabel_binarizer.transform(courses)

array([[1, 0, 0, 1, 0, 0],
       [0, 0, 0, 1, 1, 0],
       [0, 1, 1, 0, 0, 0],
       [0, 0, 0, 0, 0, 1]])
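Each row above marks which of the sorted classes appear in the corresponding tuple of `courses`. The plain-Python sketch below rebuilds the same matrix to make that rule explicit. It illustrates the logic only and is not scikit-learn's implementation:

```python
courses = [
    ("Math", "English"),
    ("Math", "Science"),
    ("Geography", "History"),
    ("Statistics",),
]

# Columns: every distinct label across all samples, in sorted order
classes = sorted({label for sample in courses for label in sample})

# Row i has a 1 in column j when sample i contains classes[j]
matrix = [[1 if cls in sample else 0 for cls in classes] for sample in courses]

print(classes)
for row in matrix:
    print(row)
```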

In [10]:
new_courses = [
    ('Math', 'Statistics'),
    ('Geography', 'History', 'Math')
]

In [11]:
multilabel_binarizer.transform(new_courses)

array([[0, 0, 0, 1, 0, 1],
       [0, 1, 1, 1, 0, 0]])
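`MultiLabelBinarizer` also has an `inverse_transform()` method that maps each indicator row back to a tuple of labels. The labels in each tuple come back in column (sorted) order. A sketch that refits on `courses` so it runs on its own:

```python
from sklearn.preprocessing import MultiLabelBinarizer

courses = [
    ("Math", "English"),
    ("Math", "Science"),
    ("Geography", "History"),
    ("Statistics",),
]
multilabel_binarizer = MultiLabelBinarizer().fit(courses)

indicator = multilabel_binarizer.transform(
    [("Math", "Statistics"), ("Geography", "History", "Math")]
)

# inverse_transform maps each indicator row back to a tuple of labels
decoded = multilabel_binarizer.inverse_transform(indicator)
print(decoded)  # [('Math', 'Statistics'), ('Geography', 'History', 'Math')]
```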