## Binarizer

Binarize data (set feature values to 0 or 1) according to a threshold. Values greater than the threshold map to 1, while values less than or equal to the threshold map to 0. With the default threshold of 0, only positive values map to 1.
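
The rule above is a plain elementwise comparison. The sketch below assumes NumPy and scikit-learn are installed; the array `values` is a made-up example that includes a value exactly at the threshold. It shows that `Binarizer` gives the same result as casting `values > threshold` to float:

```python
import numpy as np
from sklearn.preprocessing import Binarizer

# Hypothetical example values, including one exactly at the threshold (0.0)
values = np.array([[-1.5, 0.0, 0.5], [2.0, -0.1, 3.0]])
threshold = 0.0

# Elementwise rule: 1 where value > threshold, else 0
manual = (values > threshold).astype(float)

# Binarizer applies the same strict ">" comparison
result = Binarizer(threshold=threshold).fit_transform(values)

print(result)
print(np.array_equal(manual, result))  # True
```

Because the comparison is strict, the `0.0` entry maps to 0.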

**`Syntax: sklearn.preprocessing.Binarizer(*, threshold=0.0, copy=True)`**
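
scikit-learn also provides a function form, `sklearn.preprocessing.binarize`, which takes the same `threshold` and `copy` arguments. The following minimal sketch uses a hypothetical array `X`. It compares the two forms and checks that, with `copy=True`, the input array is left unchanged:

```python
import numpy as np
from sklearn.preprocessing import Binarizer, binarize

X = np.array([[0.2, -0.4], [0.0, 1.7]])  # hypothetical data
X_before = X.copy()

# Class-based API: an estimator object that can be used in a Pipeline
out_class = Binarizer(threshold=0.1, copy=True).fit_transform(X)

# Function-based API: a one-off call with the same parameters
out_func = binarize(X, threshold=0.1, copy=True)

print(out_class)
print(np.array_equal(out_class, out_func))  # True
print(np.array_equal(X, X_before))          # True: copy=True leaves X untouched
```

The class form is the one to use inside pipelines. The function form is convenient for a quick one-off transformation.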

#### Two-Dimensional Array

```python
import numpy as np
import pandas as pd
```

```python
x = np.array([[-2, 1.3], [1, 2.3], [-1, 0]])
x
```

```
array([[-2. ,  1.3],
       [ 1. ,  2.3],
       [-1. ,  0. ]])
```

```python
from sklearn import preprocessing
```

```python
# Initialize the object: values <= 0 become 0, values > 0 become 1
binarize = preprocessing.Binarizer(threshold=0.0)
```

```python
# Fit and transform
binarize.fit_transform(x)
```

```
array([[0., 1.],
       [1., 1.],
       [0., 0.]])
```
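
`Binarizer` learns no statistics from the data. Its `fit` only validates the input and records the number of features, so a fitted object can transform new arrays directly. The sketch below reuses the notebook's `x` values with a hypothetical threshold of 1.0. Note that the value 1.0 itself maps to 0:

```python
import numpy as np
from sklearn import preprocessing

x = np.array([[-2, 1.3], [1, 2.3], [-1, 0]])

# Threshold of 1.0: only values strictly greater than 1.0 become 1
binarize_1 = preprocessing.Binarizer(threshold=1.0).fit(x)
x_bin = binarize_1.transform(x)
print(x_bin)

# Nothing is learned in fit, so new data with 2 features is handled the same way
x_new = np.array([[0.5, 1.5], [3.0, 1.0]])
x_new_bin = binarize_1.transform(x_new)
print(x_new_bin)
```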

#### DataFrame

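```python
# Load the dataset and drop rows with missing values
data = pd.read_csv("Data.csv")
data = data.dropna()
data
```

|   | Country | Age  | Salary  | Purchased |
|---|---------|------|---------|-----------|
| 0 | France  | 44.0 | 72000.0 | No        |
| 1 | Spain   | 27.0 | 48000.0 | Yes       |
| 2 | Germany | 30.0 | 54000.0 | No        |
| 3 | Spain   | 38.0 | 61000.0 | No        |
| 5 | France  | 35.0 | 58000.0 | Yes       |
| 7 | France  | 48.0 | 79000.0 | Yes       |
| 8 | Germany | 50.0 | 83000.0 | No        |
| 9 | France  | 37.0 | 67000.0 | Yes       |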

```python
# Extract only the Age column
Age = data['Age']
Age
```

```
0    44.0
1    27.0
2    30.0
3    38.0
5    35.0
7    48.0
8    50.0
9    37.0
Name: Age, dtype: float64
```

```python
# Calculate the average age
Age.mean()
```

```
38.625
```
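
The next step uses 38 as the threshold, which is close to this mean. The sketch below passes the computed mean directly instead. It uses the eight `Age` values printed above, so `Data.csv` is not needed. No age lies between 38 and 38.625, so both thresholds produce the same result:

```python
import pandas as pd
from sklearn import preprocessing

# Age values copied from the output above (original index preserved)
age = pd.DataFrame(
    {"Age": [44.0, 27.0, 30.0, 38.0, 35.0, 48.0, 50.0, 37.0]},
    index=[0, 1, 2, 3, 5, 7, 8, 9],
)

mean_age = age["Age"].mean()  # 38.625

# Threshold taken directly from the mean vs. the fixed value 38
by_mean = preprocessing.Binarizer(threshold=mean_age).fit_transform(age)
by_38 = preprocessing.Binarizer(threshold=38).fit_transform(age)

print(by_mean.ravel())           # [1. 0. 0. 0. 0. 1. 1. 0.]
print((by_mean == by_38).all())  # True
```

Using the mean directly avoids hard-coding a number that would go stale if the data changed.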

```python
# Binarize the Age column with a threshold of 38 (close to the mean age above):
# values less than or equal to 38 become 0
# values greater than 38 become 1

# Initialize the object with threshold 38
binarize = preprocessing.Binarizer(threshold=38)
```

```python
# Fit and transform
binarize.fit_transform(Age)
```

```
ValueError: Expected 2D array, got 1D array instead:
array=[44. 27. 30. 38. 35. 48. 50. 37.].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
```
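
Checking `.shape` makes the mismatch in the error message concrete. A column selected as `data['Age']` is 1D, while `data[['Age']]` (selection with a list of column names) stays 2D. The sketch below uses a small hypothetical stand-in for `data` built from the Age values shown above. It compares the shapes and tries two common ways to get a 2D input:

```python
import pandas as pd
from sklearn import preprocessing

# Hypothetical stand-in for the CSV data (Age values from the output above)
data = pd.DataFrame({"Age": [44.0, 27.0, 30.0, 38.0, 35.0, 48.0, 50.0, 37.0]})

print(data["Age"].shape)    # (8,)   -> 1D Series, rejected by Binarizer
print(data[["Age"]].shape)  # (8, 1) -> 2D DataFrame with one feature

binarize = preprocessing.Binarizer(threshold=38)

# Option 1: select with a list of column names to keep a DataFrame
out_df = binarize.fit_transform(data[["Age"]])

# Option 2: reshape the underlying array into a single column, as the error suggests
out_arr = binarize.fit_transform(data["Age"].to_numpy().reshape(-1, 1))

print((out_df == out_arr).all())  # True
```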

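`Binarizer` expects a 2D input of shape `(n_samples, n_features)`, so convert the Series to a single-column DataFrame:

```python
Age = pd.DataFrame(Age)
Age
```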

|   | Age  |
|---|------|
| 0 | 44.0 |
| 1 | 27.0 |
| 2 | 30.0 |
| 3 | 38.0 |
| 5 | 35.0 |
| 7 | 48.0 |
| 8 | 50.0 |
| 9 | 37.0 |

```python
# Fit and transform
binarize.fit_transform(Age)
```

```
array([[1.],
       [0.],
       [0.],
       [0.],
       [0.],
       [1.],
       [1.],
       [0.]])
```
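
The binarized values are often stored next to the original data. The sketch below again uses the Age values shown above as a stand-in for `data`, and the column name `Age_above_38` is hypothetical. It flattens the `(n, 1)` result with `ravel()` and adds it as a new integer column:

```python
import pandas as pd
from sklearn import preprocessing

# Stand-in for the cleaned data (Age values and index from the output above)
data = pd.DataFrame(
    {"Age": [44.0, 27.0, 30.0, 38.0, 35.0, 48.0, 50.0, 37.0]},
    index=[0, 1, 2, 3, 5, 7, 8, 9],
)

binarize = preprocessing.Binarizer(threshold=38)

# fit_transform returns an (8, 1) array; ravel() flattens it to 8 values,
# which are assigned to the new column row by row
data["Age_above_38"] = binarize.fit_transform(data[["Age"]]).ravel().astype(int)

print(data)
```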