One of the most common ways to give a machine artificial intelligence is through machine learning. The world of machine learning is broadly divided into supervised and unsupervised learning.

Supervised learning refers to the process of building a machine learning model that is based on labeled training data.

Unsupervised learning refers to the process of building a machine learning model without relying on labeled training data. Since no labels are available, we need to extract insights from the data alone. The tricky part is that we don't know in advance what the criteria of separation should be. Hence, an unsupervised learning algorithm needs to separate the given dataset into a number of groups in the best way possible.
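To make the contrast with labeled data concrete, here is a minimal sketch of unsupervised grouping. It uses k-means clustering (`sklearn.cluster.KMeans`), which is one example of an unsupervised algorithm and not necessarily the one this text has in mind. The six 2-D points are hypothetical. No labels are passed in, so the algorithm has to discover the two groups on its own.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical unlabeled 2-D points that form two loose groups
X = np.array([[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
              [8.0, 8.2], [8.1, 7.9], [7.8, 8.0]])

# No labels are given: KMeans must decide the grouping from the data alone.
# n_clusters=2 is an assumption we supply; the algorithm cannot infer it.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)
print("Cluster assignments:", cluster_ids)
```

The cluster IDs themselves (which group is called 0 and which is called 1) are arbitrary. Only the grouping is meaningful.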

# Classification
Classification is one such machine learning technique, in which we assign data to one of a given number of classes.

In machine learning, classification solves the problem of identifying the category to which a new data point belongs. We build the classification model from a training dataset containing data points and their corresponding labels.
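The train-then-predict workflow described above can be sketched in a few lines. This is a minimal illustration, assuming a tiny hypothetical dataset and using logistic regression (`sklearn.linear_model.LogisticRegression`) as an example classifier. The model is fit on labeled points and then asked for the category of points it has not seen.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical labeled training data: 2-D points with class labels 0 or 1
X_train = np.array([[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
                    [8.0, 8.2], [8.1, 7.9], [7.8, 8.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

# Build the classification model from data points and their labels
clf = LogisticRegression().fit(X_train, y_train)

# Identify the category of new, unseen data points
new_points = np.array([[1.1, 0.9], [7.9, 8.1]])
predicted = clf.predict(new_points)
print("Predicted classes:", predicted)
```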

A good classification system makes it easy to find and retrieve data. Classification is used extensively in face recognition, spam identification, recommendation engines, and so on. Classification algorithms aim to learn the right criteria for separating the given data into the given number of classes.

We need to provide a sufficiently large number of samples so that the algorithm can generalize those criteria. With too few samples, the algorithm is likely to overfit the training data. This means it performs poorly on unseen data, because the model has been tuned too closely to the patterns observed in the training data. Overfitting is a very common problem in machine learning.
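Overfitting is easy to observe on a small, noisy dataset. The sketch below is illustrative only. It assumes a synthetic dataset from `sklearn.datasets.make_classification` with 20% of labels assigned at random (`flip_y=0.2`), and it uses an unpruned decision tree as an example of a model flexible enough to memorize its training samples. Comparing training and test accuracy exposes the gap.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical small, noisy dataset: 60 samples, some labels randomized
X, y = make_classification(n_samples=60, n_features=5, n_informative=2,
                           flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

# An unpruned tree keeps splitting until every training sample is fit
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

train_acc = tree.score(X_train, y_train)  # accuracy on data it has seen
test_acc = tree.score(X_test, y_test)     # accuracy on unseen data
print("Train accuracy:", train_acc)
print("Test accuracy:", test_acc)
```

The perfect training score paired with a lower test score is the signature of overfitting. Limiting the tree (for example with `max_depth`) or collecting more samples usually narrows the gap.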

## Preprocessing data
Machine learning algorithms expect data to be formatted in a certain way before they start the training process.

In [1]:
import numpy as np

In [2]:
from sklearn import preprocessing

Some sample data

In [3]:
input_data = np.array([[5.1, -2.9, 3.3],
                      [-1.2, 7.8, -6.1],
                      [3.9, 0.4, 2.1],
                      [7.3,-9.9,-4.5]])

We will look at several different preprocessing techniques:
 - Binarization
 - Mean Removal
 - Scaling
 - Normalization

### Binarization
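Binarization is used when we want to convert numerical values into boolean values (0 or 1). Values strictly greater than the threshold become 1, and values less than or equal to the threshold become 0.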

In [4]:
# Binarize data: values > 2.1 become 1, all others become 0
data_binarized = preprocessing.Binarizer(threshold=2.1).fit_transform(input_data)
print("\nBinarized data:\n", data_binarized)


Binarized data:
 [[1. 0. 1.]
 [0. 1. 0.]
 [1. 0. 0.]
 [1. 0. 0.]]
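Binarization is just an element-wise comparison against the threshold. The sketch below reproduces the `Binarizer` result with plain NumPy, which helps confirm the rule: note that the value 2.1 in the third row maps to 0 because it is not *strictly* greater than the threshold.

```python
import numpy as np
from sklearn import preprocessing

input_data = np.array([[5.1, -2.9, 3.3],
                       [-1.2, 7.8, -6.1],
                       [3.9, 0.4, 2.1],
                       [7.3, -9.9, -4.5]])
threshold = 2.1

# Values strictly greater than the threshold become 1.0, all others 0.0
manual_binarized = (input_data > threshold).astype(float)

sklearn_binarized = preprocessing.Binarizer(threshold=threshold).fit_transform(input_data)
print(np.array_equal(manual_binarized, sklearn_binarized))  # True
```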


### Mean Removal
Removing the mean is a common preprocessing technique in machine learning. It is useful to remove the mean from our feature vector so that each feature is centered on zero. We do this to remove bias from the features in our feature vector. Note that `preprocessing.scale` also divides each feature by its standard deviation, so its result has zero mean and unit variance.

In [5]:
# Print the mean and standard deviation of each column (feature)
print("\nBEFORE:")
print("Mean =", input_data.mean(axis=0))
print("Std deviation =", input_data.std(axis=0))


BEFORE:
Mean = [ 3.775 -1.15  -1.3  ]
Std deviation = [3.12039661 6.36651396 4.0620192 ]


In [6]:
# Remove the mean and scale to unit variance (standardization)
data_scaled = preprocessing.scale(input_data)
print("\nAFTER:")
print("Mean =", data_scaled.mean(axis=0))
print("Std deviation=", data_scaled.std(axis=0))


AFTER:
Mean = [1.11022302e-16 0.00000000e+00 2.77555756e-17]
Std deviation= [1. 1. 1.]
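The "AFTER" means such as `1.11022302e-16` are zero up to floating-point rounding. To see exactly what `preprocessing.scale` computes, the sketch below repeats it by hand: subtract each column's mean, then divide by that column's population standard deviation (NumPy's default `ddof=0`). It also shows mean removal *without* rescaling, using `StandardScaler(with_std=False)`, for cases where only centering is wanted.

```python
import numpy as np
from sklearn import preprocessing

input_data = np.array([[5.1, -2.9, 3.3],
                       [-1.2, 7.8, -6.1],
                       [3.9, 0.4, 2.1],
                       [7.3, -9.9, -4.5]])

# Standardization by hand: (x - column mean) / column std
mean = input_data.mean(axis=0)
std = input_data.std(axis=0)
manual_scaled = (input_data - mean) / std
print(np.allclose(manual_scaled, preprocessing.scale(input_data)))  # True

# Mean removal only: center each column but keep its original spread
centered = preprocessing.StandardScaler(with_std=False).fit_transform(input_data)
print("Centered mean =", centered.mean(axis=0))
print("Centered std =", centered.std(axis=0))
```

The centered data keeps the same standard deviations as the original columns, whereas `preprocessing.scale` forces them all to 1.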


### Scaling
The values of different features can span very different ranges. So it becomes important to scale the features so that there is a level playing field for the machine learning algorithm to train on. We don't want any feature to be artificially large or small just because of the nature of its measurements.

In [7]:
# Min-max scaling: map each column linearly into the range [0, 1]
data_scaler_minmax = preprocessing.MinMaxScaler(feature_range=(0,1))
data_scaled_minmax = data_scaler_minmax.fit_transform(input_data)
print("\nMin max scaled data:\n", data_scaled_minmax)


Min max scaled data:
 [[0.74117647 0.39548023 1.        ]
 [0.         1.         0.        ]
 [0.6        0.5819209  0.87234043]
 [1.         0.         0.17021277]]
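Min-max scaling maps each column's minimum to the lower end of `feature_range` and its maximum to the upper end, with everything in between placed linearly. The sketch below implements that formula by hand with a hypothetical helper, `min_max_scale`, and checks it against `MinMaxScaler`. The helper assumes no column is constant, since that would make `max - min` zero.

```python
import numpy as np
from sklearn import preprocessing

input_data = np.array([[5.1, -2.9, 3.3],
                       [-1.2, 7.8, -6.1],
                       [3.9, 0.4, 2.1],
                       [7.3, -9.9, -4.5]])

def min_max_scale(X, feature_range=(0.0, 1.0)):
    """Hypothetical helper: rescale each column of X linearly into feature_range."""
    low, high = feature_range
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    # First map each column to [0, 1] (assumes col_max > col_min) ...
    X_std = (X - col_min) / (col_max - col_min)
    # ... then stretch and shift into [low, high]
    return X_std * (high - low) + low

manual = min_max_scale(input_data)
scaler = preprocessing.MinMaxScaler(feature_range=(0, 1))
print(np.allclose(manual, scaler.fit_transform(input_data)))  # True
```

For example, the first column has minimum -1.2 and maximum 7.3, so 5.1 maps to (5.1 + 1.2) / 8.5 ≈ 0.741, matching the first entry of the output above.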


### Normalization
We use the process of normalization to modify the values in the feature vector so that we can measure them on a common scale. Unlike scaling, which works on each feature (column), normalization in scikit-learn works on each sample (row). Two of the most common forms are:
 - L1 normalization, associated with Least Absolute Deviations, makes the sum of absolute values in each row equal to 1.
 - L2 normalization, associated with Least Squares, makes the sum of squares in each row equal to 1.
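Both forms can be computed with `preprocessing.normalize`, which works row by row by default. The sketch below applies L1 and L2 normalization to the sample data and also builds each result by hand (divide every row by its L1 or L2 norm), so the two definitions are easy to check.

```python
import numpy as np
from sklearn import preprocessing

input_data = np.array([[5.1, -2.9, 3.3],
                       [-1.2, 7.8, -6.1],
                       [3.9, 0.4, 2.1],
                       [7.3, -9.9, -4.5]])

# scikit-learn normalizes each sample (row) independently
data_l1 = preprocessing.normalize(input_data, norm='l1')
data_l2 = preprocessing.normalize(input_data, norm='l2')

# Manual equivalents: divide each row by its L1 norm or its L2 norm
manual_l1 = input_data / np.abs(input_data).sum(axis=1, keepdims=True)
manual_l2 = input_data / np.sqrt((input_data ** 2).sum(axis=1, keepdims=True))

print("L1 normalized data:\n", data_l1)
print("L2 normalized data:\n", data_l2)
print("Row sums of |x| after L1:", np.abs(data_l1).sum(axis=1))
print("Row sums of x^2 after L2:", (data_l2 ** 2).sum(axis=1))
```

Each row sum printed at the end is 1 up to floating-point rounding.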