# Introduction to Machine Learning

**Machine learning** is about making computers modify or adapt their actions (whether these actions are making predictions or controlling a robot) so that the actions become more accurate, where accuracy is measured by how well the chosen actions reflect the correct ones.

__Computational complexity__ is important because we might want to use some of the methods on very large datasets.

The complexity is often broken into two parts:
- the complexity of training
- the complexity of applying the trained algorithm.

## 1. Key Components

1. The data that we can learn from.
2. A model of how to transform the data.
3. An objective function that quantifies how well (or badly) the model is doing.
4. An algorithm to adjust the model’s parameters to optimize the objective function.
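
The four components above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the data values, the straight-line model, the mean squared error objective, and the gradient descent settings (`steps`, `lr`) are all assumptions chosen for the example, not something prescribed by these notes.

```python
# 1. Data: inputs xs and targets ts (made-up values that follow t = 2x + 1).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ts = [1.0, 3.0, 5.0, 7.0, 9.0]


# 2. Model: a straight line with two parameters, w and b.
def model(x, w, b):
    return w * x + b


# 3. Objective: mean squared error between predictions and targets.
def objective(w, b):
    return sum((model(x, w, b) - t) ** 2 for x, t in zip(xs, ts)) / len(xs)


# 4. Algorithm: gradient descent, which nudges w and b downhill on the objective.
def fit(steps=2000, lr=0.05):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [model(x, w, b) - t for x, t in zip(xs, ts)]
        grad_w = sum(2 * e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(2 * e for e in errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


w, b = fit()
print(f"w={w:.3f}, b={b:.3f}, objective={objective(w, b):.6f}")
```

Because the toy data lie exactly on a line, the fitted parameters should end up close to the slope and intercept used to generate them.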

## 2. Types of Machine Learning

__Supervised learning__ A labeled training set is provided and, based on this training set, the algorithm generalises to respond correctly to all possible inputs.

__Unsupervised learning__ Correct responses are not provided, but instead the algorithm tries to identify similarities between the inputs so that inputs that have something in common are categorised together. The statistical approach to unsupervised learning is known as density estimation.

__Semi-supervised learning__ Both labeled and unlabeled data samples are used in the training process.

__Reinforcement learning__ The learning system, called an agent in this context, can observe the environment, select and perform actions, and get rewards in return (or penalties in the form of negative rewards). It must then learn by itself what the best strategy, called a policy, is for getting the most reward over time. A policy defines what action the agent should choose when it is in a given situation.

__Evolutionary learning__ Biological evolution can be seen as a learning process: biological organisms adapt to improve their survival rates and chance of having offspring in their environment. This is modeled in a computer, using an idea of fitness, which corresponds to a score for how good the current solution is.

## 3. Supervised Learning

- The training data consists of input data paired with target data, where each target is the answer that the algorithm should produce for the corresponding input.

- This is usually written as a set of data $(x_{i},t_{i})$, where the inputs are $x_{i}$, the targets are $t_{i}$, and the $i$ index suggests that we have lots of pieces of data, indexed by $i$ running from 1 to some upper limit $N$.

- Supervised learning models define a mapping from input data to an output prediction.

### 3.1 Regression and classification problems

#### __Regression__ 
__Regression__ is about fitting a mathematical function describing a curve, so that the curve passes as close as possible to all of the datapoints. It is generally a problem of function approximation or interpolation, working out the value between values that we know.

For example, a model might predict the price of a house from input characteristics such as the square footage and the number of bedrooms. This is a regression problem because the model returns a continuous number (rather than a category assignment).

In contrast, a model that takes the chemical structure of a molecule as input and predicts both the melting and boiling points solves a multivariate regression problem, since it predicts more than one number.
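
As a small illustration of regression, the sketch below fits a straight line to a handful of points by ordinary least squares and then predicts a value between the known inputs. The house data are invented for the example, and `fit_line` is a hypothetical helper name; a real problem would use more features and more data.

```python
def fit_line(xs, ts):
    """Least-squares fit of t = slope * x + intercept for one input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_t = sum(ts) / n
    covariance = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, ts))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_t - slope * mean_x
    return slope, intercept


# Invented data: square footage and price (in thousands).
square_feet = [1000.0, 1400.0, 1800.0, 2200.0]
prices = [150.0, 200.0, 260.0, 310.0]

slope, intercept = fit_line(square_feet, prices)

# Predict a continuous output for an input that lies between the known ones.
predicted_price = slope * 1600.0 + intercept
print(f"slope={slope:.4f}, intercept={intercept:.2f}, price={predicted_price:.1f}")
```

The output is a continuous number rather than a category, which is what makes this a regression problem.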

#### __Classification__ 
For example, suppose a model receives a text string containing a restaurant review as input and predicts whether the review is positive or negative. This is a __binary classification__ problem because the model attempts to assign the input to one of two categories. The output vector contains the probabilities that the input belongs to each category.

In __multiclass classification__ problems, the model assigns the input to one of $N > 2$ categories.
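
One common way to obtain a vector of class probabilities is to pass the model's raw scores through a softmax function and pick the most probable category. The sketch below assumes that approach; the scores and label names are made up, and `predict` is a hypothetical helper, not part of any library.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that are positive and sum to 1."""
    largest = max(scores)
    # Subtracting the largest score avoids overflow without changing the result.
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores, labels):
    """Return the most probable label together with all class probabilities."""
    probs = softmax(scores)
    best = probs.index(max(probs))
    return labels[best], probs


# Binary classification: two categories.
review_label, review_probs = predict([2.0, 0.5], ["positive", "negative"])

# Multiclass classification: more than two categories.
animal_label, animal_probs = predict([0.1, 1.2, 3.0], ["cat", "dog", "bird"])

print(review_label, review_probs)
print(animal_label, animal_probs)
```

The same function covers both cases: only the length of the score vector changes between binary and multiclass classification.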

## 4. Testing Machine Learning Algorithms

We want the ML algorithm to __generalise__ to examples that were not seen in the training set. So we divide the dataset into training and test sets.

- In learning, we want to stop the learning process before the algorithm overfits, which means that we need to know how well it is generalising at each timestep.
- We need a third set of data to use for this purpose, which is called the validation set because we’re using it to validate the learning so far.
- This is known as cross-validation in statistics. It is part of model selection: choosing the right parameters for the model so that it generalises as well as possible.

- What does overfitting mean?
- As the model continues to learn, it eventually becomes much more complex and reaches a lower training error (close to zero). This means that it has memorised the training examples, including any noise in them, so that it has overfitted the training data.
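
The split into training, validation, and test sets, and the idea of stopping before overfitting, can be sketched as follows. The 60/20/20 proportions, the fixed random seed, and the `patience` rule are assumptions made for the example; `split_dataset` and `early_stopping_step` are hypothetical helper names.

```python
import random


def split_dataset(data, train_frac=0.6, valid_frac=0.2, seed=0):
    """Shuffle the data and cut it into training, validation, and test sets."""
    items = list(data)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * train_frac)
    n_valid = round(len(items) * valid_frac)
    train = items[:n_train]
    valid = items[n_train:n_train + n_valid]
    test = items[n_train + n_valid:]
    return train, valid, test


def early_stopping_step(valid_errors, patience=2):
    """Return the step with the lowest validation error seen before stopping.

    Training is considered finished once the validation error has not
    improved for `patience` consecutive steps.
    """
    best_step = 0
    for step, error in enumerate(valid_errors):
        if error < valid_errors[best_step]:
            best_step = step
        elif step - best_step >= patience:
            break
    return best_step


train, valid, test = split_dataset(range(10))

# Made-up validation errors: they fall at first, then rise as overfitting sets in.
stop_at = early_stopping_step([0.9, 0.6, 0.4, 0.45, 0.5, 0.7])
print(len(train), len(valid), len(test), stop_at)
```

The test set is kept aside the whole time, so it still gives an honest estimate of generalisation after the validation set has been used to decide when to stop.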

<img src="cv.png" width=500 height=500 />

## 5. Measures

- For classification problems, a suitable method is the confusion matrix.
- For regression problems things are more complicated because the results are continuous, so the most common measure to use is the sum-of-squares error.
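
A minimal sketch of the sum-of-squares error is shown below. The function name and the sample values are illustrative; note that some texts include an extra factor of 1/2 in the definition, which is not used here.

```python
def sum_of_squares_error(targets, predictions):
    """Sum of squared differences between targets and predictions."""
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must have the same length")
    return sum((t - y) ** 2 for t, y in zip(targets, predictions))


# Made-up targets and model outputs for three data points.
error = sum_of_squares_error([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
print(error)
```

Squaring makes every difference non-negative, so errors in opposite directions cannot cancel out.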

#### Metrics for interpreting the performance of a classifier

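- **Accuracy** is the proportion of all classifications that were correct, whether positive or negative. Here and in the formulas that follow, $TP$, $TN$, $FP$, and $FN$ are the numbers of true positives, true negatives, false positives, and false negatives.

    - Accuracy = $\frac{TP + TN}  {TP + FP + TN + FN}$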


- **Sensitivity** (also known as the true positive rate) is the ratio of the number of correct positive examples to the number of actual positive examples, while **specificity** is the same ratio for negative examples.

- **Sensitivity and specificity** are measures of a test's ability to correctly classify a person as having a disease or not having a disease. Sensitivity refers to a test's ability to designate an individual with disease as positive. A highly sensitive test means that there are few false negative results, and thus fewer cases of disease are missed. The specificity of a test is its ability to designate an individual who does not have a disease as negative.

    - Sensitivity = $\frac{TP}  {TP + FN}$

    - Specificity = $\frac{TN}  {TN + FP}$



- **Precision** is the ratio of the number of correct positive examples to the number of examples classified as positive, while **recall** is the ratio of the number of correct positive examples to the number of actual positive examples, which is the same as sensitivity.
  
    - Precision = $\frac{TP} {TP + FP}$

    - Recall = $\frac{TP} {TP + FN}$



- __Precision and recall__ are to some extent inversely related, in that if the number of false positives increases, then the number of false negatives often decreases, and vice versa. They can be combined to give a single measure, the $F_{1}$ measure, which can be written in terms of precision and recall, or equivalently in terms of the counts, as

    - $F_{1} = 2\,\frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}} = \frac{TP}{TP + (FN + FP) / 2}$
  

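- Unbalanced datasets
    - When one class is much more common than the other, accuracy can be misleading. In that case we can use the area under the ROC curve (AUC) or the Matthews correlation coefficient.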
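
The metrics in this section can all be computed from the four confusion-matrix counts. The sketch below assumes binary labels with `1` as the positive class, uses invented targets and predictions, and adopts the convention of reporting `0.0` when a ratio is undefined (zero denominator); the helper names are hypothetical. The Matthews correlation coefficient is computed with its standard formula, $\frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$.

```python
import math


def confusion_counts(targets, predictions, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for target, predicted in zip(targets, predictions):
        if predicted == positive:
            if target == positive:
                tp += 1
            else:
                fp += 1
        elif target == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def ratio(numerator, denominator):
    # Assumed convention: report 0.0 when the metric is undefined (0 / 0).
    return numerator / denominator if denominator else 0.0


def classifier_metrics(tp, fp, tn, fn):
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "f1": ratio(tp, tp + (fn + fp) / 2),
        "mcc": ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Invented example: 4 actual positives and 6 actual negatives.
targets = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
balanced = classifier_metrics(*confusion_counts(targets, predictions))

# Unbalanced example: 5 positives, 95 negatives, classifier always says "negative".
skewed = classifier_metrics(tp=0, fp=0, tn=95, fn=5)

print(balanced)
print(skewed)
```

In the unbalanced case the accuracy looks high even though the classifier never finds a positive example, while recall and the Matthews correlation coefficient both expose the problem.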