# Introduction

## Definition

There are multiple definitions of machine learning in different publications. The general idea, however, is that machine learning is a technique in which the computer learns from data and then performs a task without being explicitly programmed.

More formally, the definition that best describes machine learning is this:

> A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
>
> *Tom Mitchell, 1997*

The most basic example of machine learning is detecting whether an email is spam or not (ham). The system trains itself on existing emails that the user has manually labeled as spam. When a new email arrives, it checks whether the email is similar to already detected spam using various features, such as words likely to appear in the title or the same sender email id. It then decides (predicts) whether the new email is spam or not.

## Uses of Machine learning

Why use machine learning in the first place? In what situations does machine learning help?

### Preventing Complex Rules.

Let's take the email spam detection example. Suppose we have to build a spam filter using a traditional programming approach. The obvious choice would be to write a series of conditions using `if`/`else` branches: for example, if the title of the email contains words like "free", "amazing", or "4U", or else if the email comes from a list of already identified email addresses, and so on. But as spammers keep updating their techniques, we have to keep adding new rules to the list, and soon the rules become unmanageable and complex.
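
A minimal sketch of such a hand-written rule list might look like this. The function name `is_spam_by_rules`, the word list, and the blocked address are hypothetical and only illustrate how every new spam trick requires another hand-coded entry.

```python
# Hypothetical hand-written rules; each new spam trick needs another entry.
SPAM_WORDS = {"free", "amazing", "4u"}
BLOCKED_SENDERS = {"offers@spam.example"}


def is_spam_by_rules(title: str, sender: str) -> bool:
    """Return True if any hand-coded rule flags the email as spam."""
    words = title.lower().split()
    if any(word in SPAM_WORDS for word in words):
        return True
    elif sender.lower() in BLOCKED_SENDERS:
        return True
    else:
        return False


print(is_spam_by_rules("Amazing offer 4U", "friend@mail.example"))  # True
print(is_spam_by_rules("Meeting notes", "boss@mail.example"))       # False
```

Note that a spammer who writes "fr3e" instead of "free" slips through until someone edits `SPAM_WORDS` by hand.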

Using machine learning, we can let the system learn the rules by itself from existing classified emails, and keep learning from new emails that were wrongly classified as safe but were actually spam.
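
As a rough illustration of learning the rules from labeled emails, the sketch below counts how often each word appears in spam and in ham titles. The training titles, the function names `train` and `predict`, and the scoring rule are all assumptions made for this example; it is a deliberately simplified word-counting scheme, not a production spam filter.

```python
from collections import Counter

# Tiny hypothetical training set: (email title, label) pairs labeled by a user.
TRAINING_EMAILS = [
    ("free prize waiting for you", "spam"),
    ("amazing free offer", "spam"),
    ("project meeting at noon", "ham"),
    ("notes from the meeting", "ham"),
]


def train(emails):
    """Count how often each word appears in spam and in ham titles."""
    spam_counts, ham_counts = Counter(), Counter()
    for title, label in emails:
        target = spam_counts if label == "spam" else ham_counts
        target.update(title.lower().split())
    return spam_counts, ham_counts


def predict(title, spam_counts, ham_counts):
    """Label a title as spam when its words were seen more often in spam."""
    words = title.lower().split()
    score = sum(spam_counts[w] - ham_counts[w] for w in words)
    return "spam" if score > 0 else "ham"


spam_counts, ham_counts = train(TRAINING_EMAILS)
print(predict("free offer inside", spam_counts, ham_counts))  # spam
print(predict("meeting notes", spam_counts, ham_counts))      # ham
```

Adding a wrongly classified email to `TRAINING_EMAILS` with its correct label and calling `train` again updates the "rules" without anyone editing conditions by hand.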

### No traditional Solution.

Some computing problems are too difficult to solve with the traditional procedure/rules approach. Consider the problem of speech recognition. To develop speech recognition software, we would have to check each audio clip against combinations of multiple words from a given language, spoken in multiple accents, which is not feasible to code by hand.

The best solution is to write an algorithm that learns by itself.

### Discovering insight from data.

Machine learning can also help humans gain insight from data. Once a machine learning algorithm has learned rules/patterns by itself, a human can inspect what the system has learned and get more insight into the data.

Applying machine learning to discover patterns in large amounts of data is called *data mining*.

## Types of Machine learning

Machine learning algorithms are broadly classified according to three criteria:

- Whether or not they are trained with human supervision (supervised, unsupervised, semisupervised, and reinforcement learning)
- Whether or not they can learn incrementally on the fly (online versus batch learning)
- Whether they work by simply comparing new data points to known data points, or instead detect patterns in the training data and build a predictive model, much like scientists do (instance-based versus model-based learning)

### Supervised and Unsupervised Learning

Machine learning algorithms can be classified by the type of supervision required during the training phase.

#### Supervised Learning

In supervised learning, the training data is supplied with labels.

Examples of supervised learning are classification (categorizing data) and regression (predicting a value from given features).

#### Unsupervised Learning

In unsupervised learning, the training data is unlabeled. The algorithm discovers patterns or learns from the data by itself.

Examples include clustering algorithms, which find unknown groups in data; visualization algorithms, which give a 2D or 3D representation of data; and association learning, whose goal is to find unknown rules/associations in the given data.
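
To make clustering concrete, here is a small sketch of k-means on one-dimensional data. K-means is chosen here only as one well-known clustering algorithm; the function name `k_means_1d`, the sample points, and the starting centroids are assumptions for this example. Note that no labels are supplied: the groups emerge from the data alone.

```python
def k_means_1d(points, centroids, iterations=10):
    """Cluster 1-D points around the given starting centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical unlabeled data with two visible groups.
points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centroids, clusters = k_means_1d(points, centroids=[0.0, 5.0])
print(centroids)  # [1.5, 9.5]
print(clusters)   # [[1.0, 1.5, 2.0], [9.0, 9.5, 10.0]]
```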

#### Semisupervised Learning

Some algorithms combine supervised and unsupervised learning, typically because only part of the training data is labeled.

A good example of this is Google Photos. The algorithm behind Photos takes multiple photos and recognizes the same person across them (unsupervised), then asks you to name that person and assigns that name to each present and future photo of that person (supervised).

#### Reinforcement Learning

In reinforcement learning, there is an agent that performs some action in an environment and then receives feedback, which is either positive or negative. Based on that feedback, the agent either continues doing the thing it is doing or improves its actions.

An example is a game-playing agent. A chess-playing agent will play multiple games against itself and learn from the moves that result in a higher score, thus improving its moves as training progresses.
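
Chess is far too large for a short example, so the sketch below uses a much simpler stand-in: an agent that repeatedly picks one of three actions and learns which one pays best from the feedback alone. The epsilon-greedy strategy, the function name `train_agent`, and the fixed reward values are assumptions chosen for illustration.

```python
import random


def train_agent(rewards, steps=200, epsilon=0.1, seed=0):
    """Epsilon-greedy agent learning the average reward of each action."""
    rng = random.Random(seed)
    n_actions = len(rewards)
    values = [0.0] * n_actions  # estimated reward per action
    counts = [0] * n_actions    # how often each action was tried
    for step in range(steps):
        if step < n_actions:
            action = step  # try every action once first
        elif rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore a random action
        else:
            action = max(range(n_actions), key=lambda a: values[a])  # exploit
        reward = rewards[action]  # feedback from the environment
        counts[action] += 1
        # Incremental average: nudge the estimate toward the observed reward.
        values[action] += (reward - values[action]) / counts[action]
    return values


# Hypothetical environment: negative, neutral, and positive feedback.
values = train_agent(rewards=[-1.0, 0.0, 1.0])
best_action = max(range(len(values)), key=lambda a: values[a])
print(best_action)  # 2
```

The rewards here are deterministic to keep the sketch easy to follow; in a real game the feedback is noisy and often arrives only at the end of a match.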

### Online Vs Batch Learning

Another criterion is how the system learns. Some algorithms keep learning continuously as data is fed into the system. An example would be a stock market prediction algorithm: the data stream is continuous and the learning never stops. Such a system is called an online learning system.
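
The idea of learning from a continuous stream can be sketched as follows: a one-parameter model `y = weight * x` is corrected a little after every observation instead of being refit on the whole dataset. The function name `online_update`, the learning rate, and the synthetic stream are assumptions for this example.

```python
def online_update(weight, x, y, learning_rate=0.1):
    """Adjust the weight of the model y = weight * x using one observation."""
    error = y - weight * x
    return weight + learning_rate * error * x


# Hypothetical data stream following y = 2 * x; a real stream never ends.
stream = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)] * 10

weight = 0.0
for x, y in stream:
    weight = online_update(weight, x, y)  # learn from each sample as it arrives

print(round(weight, 3))  # 2.0
```

Each sample can be discarded once it has been used, which is what makes this style of learning suitable for data that arrives continuously.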

For other systems, the learning happens offline, in batches. An example could be Google's speech recognition algorithm. Once the model is trained, it is deployed to Android phones. The team at Google would then record more audio in different languages and accents, train the model again, and later update the model on the Android devices. Such a system is called a batch learning system.

### Instance Vs Model Based Learning

Another criterion is how the algorithm generalizes from the data. One method is instance-based learning. Here the algorithm has access to all the data, and whenever a new instance is to be categorized or predicted, it looks for instances in the data that are similar to the new one, and either assigns the new instance to the same group or predicts its value based on the values of the similar instances. An example would be spam detection: if a new email is 95% similar to an existing spam email, then the email is probably spam.
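
A minimal sketch of the instance-based idea, assuming word overlap (Jaccard similarity) as the similarity measure and a nearest-neighbor rule for the decision. The stored emails and the function names `similarity` and `classify` are hypothetical.

```python
def similarity(a: str, b: str) -> float:
    """Jaccard similarity: shared words divided by all distinct words."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


# Hypothetical stored instances; the "model" is simply the labeled data itself.
KNOWN_EMAILS = [
    ("win a free prize now", "spam"),
    ("free amazing offer", "spam"),
    ("agenda for the team meeting", "ham"),
]


def classify(title: str) -> str:
    """Copy the label of the most similar stored email (1-nearest neighbor)."""
    _, label = max(KNOWN_EMAILS, key=lambda item: similarity(title, item[0]))
    return label


print(classify("win a free prize today"))  # spam
print(classify("team meeting agenda"))     # ham
```

Nothing is computed ahead of time here: all the work happens when a new email arrives and is compared against the stored ones.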

The other way is for the system to build a model (an equation) from the data and then use that model to predict the value for a new instance. An example would be linear regression, which tries to fit a line through the data; whenever a new instance is to be predicted, it checks where on the line the new instance falls and predicts the value accordingly.
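
The line-fitting step can be sketched with the standard least-squares formulas for a single feature. The function name `fit_line` and the sample data are assumptions for this example.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sums of cross-deviations and squared deviations around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

slope, intercept = fit_line(xs, ys)
prediction = slope * 4.0 + intercept  # predict for an unseen x
print(slope, intercept, prediction)   # 2.0 1.0 9.0
```

Once `slope` and `intercept` are known, the training data is no longer needed to make predictions, which is the key difference from the instance-based approach.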

## Challenges in Machine Learning

There are some challenges in applying machine learning to get the desired outcome from the data. Some of them are given below.

### Insufficient Quantity of Training Data.

When there is too little training data, the model will not fit the real production data well, so the outcome will not be accurate. It is the same as taking a small sample and using it to estimate the parameters of the whole population.

### Nonrepresentative Training Data.

Training data must be a good representative of future unknown instances. If the algorithm sees new data that is nothing like what it saw in the training phase, the outcome is uncertain.

### Poor quality data.

Missing or bad data in training can also affect the outcome of the algorithm on live data.

### Irrelevant Features.

Not choosing proper, relevant features to predict the outcome will surely degrade the performance of the system. For example, when detecting spam email, taking into consideration whether it is raining will give uncertain results.

### Overfitting Training Data.

Every dataset has some noise in it, even training data. So creating a model that predicts the exact values of the training set is not a good idea, because it means the model has selected parameters that are too closely tied to the training data. Such a model will perform badly on production data, as the noise will be different there.
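
The effect can be illustrated with a small, hand-made dataset. In this sketch the true relation is assumed to be `y = 2x`, the training labels carry noise of plus or minus 0.5, and two models are compared: a lookup "memorizer" that reproduces the training values exactly, and a plain least-squares line. All names and numbers are hypothetical.

```python
def mse(model, xs, ys):
    """Mean squared error of a prediction function on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_line(xs, ys):
    """Least-squares line, returned as a prediction function."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def memorize(xs, ys):
    """Overfit model: return the stored y of the closest training x."""
    table = list(zip(xs, ys))
    return lambda x: min(table, key=lambda pair: abs(pair[0] - x))[1]


# Hypothetical data: true relation y = 2x, training labels carry +/-0.5 noise.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [0.5, 1.5, 4.5, 5.5, 8.5]
test_x = [0.4, 1.4, 2.4, 3.4]
test_y = [0.8, 2.8, 4.8, 6.8]

line = fit_line(train_x, train_y)
lookup = memorize(train_x, train_y)

print("memorizer:", mse(lookup, train_x, train_y), mse(lookup, test_x, test_y))
print("line:     ", mse(line, train_x, train_y), mse(line, test_x, test_y))
```

In this sketch the memorizer has zero training error yet a much larger test error than the straight line, because it has learned the noise along with the signal.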

### Underfitting Training Data.

Underfitting occurs when the model is too simple to learn the underlying structure of the data, for example using linear regression on quadratic data.
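
The linear-on-quadratic case can be checked directly. The sketch below fits a least-squares line to points on `y = x**2`; the function name `fit_line` and the sample points are assumptions for this example.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical quadratic data: y = x squared, with no noise at all.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x ** 2 for x in xs]

slope, intercept = fit_line(xs, ys)
errors = [(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)]
print(slope, intercept, sum(errors) / len(errors))
```

Even on the training data itself, the best possible line is flat (slope 0, intercept 2) and leaves a mean squared error of 2.8. More training data would not help; the model simply cannot bend.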

## Testing and Validation

Once training of the machine learning model is done, how can we be sure that the model/algorithm will work on new, unseen inputs? The only way to make sure of this is to split the available data into two sets: one for training the model and the other for testing it. Usually, 80% of the data is used for training and 20% for testing.
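
An 80/20 split can be sketched in a few lines. The function name `train_test_split`, the default seed, and the stand-in data are assumptions for this example; shuffling before splitting is a common precaution so that any ordering in the data does not leak into the split.

```python
import random


def train_test_split(data, test_ratio=0.2, seed=42):
    """Shuffle a copy of the data and split it into training and test sets."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    test_size = int(len(shuffled) * test_ratio)
    return shuffled[test_size:], shuffled[:test_size]


samples = list(range(100))  # stand-in for 100 labeled examples
train_set, test_set = train_test_split(samples)
print(len(train_set), len(test_set))  # 80 20
```

Fixing the seed keeps the split the same between runs, so the model never gets to see test examples from an earlier run.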

But there is a problem with this method. If you use the test set to compare several models and tune hyperparameters, you measure the generalization error multiple times on the test set and adapt the model and hyperparameters to produce the best model for that particular set. This means that the model is unlikely to perform as well on new data.

A common solution to this problem is to have a second holdout set called the validation set. You train multiple models using the training set, select the model that performs best on the validation set, and, when you are happy with the model, run a single final test against the test set to get an estimate of the generalization error.

To avoid "wasting" too much training data in validation sets, a common technique is to use cross-validation: the training set is split into complementary subsets, and each model is trained against a different combination of these subsets and validated against the remaining parts.
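
The splitting scheme behind cross-validation can be sketched as a generator of index lists. This is k-fold cross-validation without shuffling; the function name `k_fold_splits` and the choice of letting the last fold absorb any remainder are assumptions for this example.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    fold_size = n_samples // k
    for fold in range(k):
        start = fold * fold_size
        # The last fold absorbs any remainder.
        end = start + fold_size if fold < k - 1 else n_samples
        validation = indices[start:end]
        training = indices[:start] + indices[end:]
        yield training, validation


for training, validation in k_fold_splits(n_samples=10, k=5):
    print(validation, training)
```

Every sample is used for validation exactly once and for training in the other folds, so no data is permanently set aside for validation.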