# Machine Learning Algorithms Mini-Course
by Jason Brownlee on August 12, 2019, in [Machine Learning Algorithms](https://machinelearningmastery.com/category/machine-learning-algorithms/). Original article [here](https://machinelearningmastery.com/machine-learning-algorithms-mini-course/).

Machine learning algorithms are a very large part of machine learning.

## Overview

### Algorithm Foundations
- Lesson 1: How To Talk About Data in Machine Learning
- Lesson 2: The Principle That Underpins All Algorithms
- Lesson 3: Parametric and Nonparametric Algorithms
- Lesson 4: Bias, Variance and the Trade-off

### Linear Algorithms
- Lesson 5: Linear Regression
- Lesson 6: Logistic Regression
- Lesson 7: Linear Discriminant Analysis

### Nonlinear Algorithms
- Lesson 8: Classification and Regression Trees
- Lesson 9: Naive Bayes
- Lesson 10: k-Nearest Neighbors
- Lesson 11: Learning Vector Quantization
- Lesson 12: Support Vector Machines

### Ensemble Algorithms
- Lesson 13: Bagging and Random Forest
- Lesson 14: Boosting and AdaBoost


### Algorithm Foundations

#### Lesson 1: How To Talk About Data in Machine Learning
Data plays a big part in machine learning.

The statistical perspective of `machine learning frames data in the context of a hypothetical function (f) that the machine learning algorithm aims to learn`. Given some input variables (Input), the function answers the question of what the predicted output variable (Output) is.

Output = f(Input)

The inputs and outputs can be referred to as __variables__ or __vectors__.

#### Lesson 2: The Principle That Underpins All Algorithms
Machine learning algorithms are `described as learning a target function (f) that best maps input variables (X) to an output variable (Y)`.

The most common type of machine learning is to learn the mapping $Y = f(X)$ to make predictions of Y for new X. This is called __predictive modeling__ or __predictive analytics__ and our goal is to make the most accurate predictions possible.

#### Lesson 3: Parametric and Nonparametric Algorithms
Assumptions can greatly simplify the learning process, but can also limit what can be learned. __Algorithms that simplify the function to a known form are called parametric machine learning algorithms__.

These algorithms involve two steps:

- Select a form for the function.
- Learn the coefficients for the function from the training data.
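The two steps can be sketched in Python. This is a minimal illustration under stated assumptions: the chosen form $y = b \cdot x^2$, the toy data and the helper names `model` and `fit` are hypothetical, and least squares is assumed as the way to learn the coefficient.

```python
# Step 1: select a form for the function.
# Assumption (illustrative only): y = b * x**2, with a single coefficient b.
def model(b, x):
    return b * x ** 2

# Step 2: learn the coefficient from the training data.
# Minimizing the squared error for this form gives the closed-form
# solution b = sum(x^2 * y) / sum(x^4).
def fit(xs, ys):
    num = sum(x ** 2 * y for x, y in zip(xs, ys))
    den = sum(x ** 4 for x in xs)
    return num / den

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 8.0, 18.0, 32.0]  # toy data generated from y = 2 * x**2
b = fit(xs, ys)
print(b)              # 2.0
print(model(b, 5.0))  # 50.0
```

However much training data is supplied, the learned model is just the single coefficient `b`; this fixed number of parameters, set by the chosen form, is what makes the approach parametric.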

Examples of parametric machine learning algorithms:
- Linear Regression.
- Logistic Regression.

__Algorithms that do not make strong assumptions about the form of the mapping function are called nonparametric machine learning algorithms__. By not making assumptions, they are free to learn any functional form from the training data.

`Non-parametric methods are often more flexible and can achieve better accuracy, but require a lot more data and training time`.

Examples of nonparametric algorithms:
- Support Vector Machines.
- Neural Networks.
- Decision Trees.

#### Lesson 4: Bias, Variance and the Trade-off
Machine learning algorithms can best be understood through the lens of the bias-variance trade-off.

- Bias is the set of simplifying assumptions made by a model to make the target function easier to learn.
    - `Low bias`: Decision Trees.
    - `High bias`: Linear Regression.

- Variance is the amount that the estimate of the target function will change if different training data were used. The target function is estimated from the training data by a machine learning algorithm, so we should expect the algorithm to __have some variance, not zero variance__.
    - `High variance`: k-Nearest Neighbors.
    - `Low variance`: Linear Discriminant Analysis.

The goal of any predictive modeling machine learning algorithm is to achieve low bias and low variance.

The parameterization of machine learning algorithms is often a battle to balance out bias and variance.
- Increasing the bias will decrease the variance.
- Increasing the variance will decrease the bias.
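A small simulation can make bias and variance concrete. The sketch below assumes a hypothetical target $f(x) = \sin(x)$, noisy training sets of 20 points, and two toy models: one that always predicts the mean of the training outputs (high bias) and a 1-nearest-neighbor model (high variance). It retrains each model on many freshly drawn training sets and measures, at a single input, the squared bias and the variance of the predictions.

```python
import math
import random

def true_f(x):
    # Hypothetical target function, chosen only for illustration.
    return math.sin(x)

def make_training_set(rng, n=20, noise=0.3):
    # Draw a new noisy training set from the target function.
    xs = [rng.uniform(0.0, 3.0) for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0.0, noise) for x in xs]
    return xs, ys

def mean_model(xs, ys, x0):
    # High-bias model: ignores x and always predicts the mean of y.
    return sum(ys) / len(ys)

def one_nn_model(xs, ys, x0):
    # High-variance model: copies the y of the single closest training x.
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - x0))
    return ys[i]

def bias_variance(model, x0=1.5, trials=500, seed=0):
    # Retrain on many training sets and collect the prediction at x0.
    rng = random.Random(seed)
    preds = []
    for _ in range(trials):
        xs, ys = make_training_set(rng)
        preds.append(model(xs, ys, x0))
    avg = sum(preds) / trials
    bias_sq = (avg - true_f(x0)) ** 2
    variance = sum((p - avg) ** 2 for p in preds) / trials
    return bias_sq, variance

for name, m in [("mean", mean_model), ("1-NN", one_nn_model)]:
    b2, v = bias_variance(m)
    print(f"{name}: bias^2={b2:.4f} variance={v:.4f}")
```

With these settings the mean model should report a much larger squared bias and a much smaller variance than 1-NN; the exact numbers depend on the random seed.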

### Linear Algorithms
#### Lesson 5: Linear Regression
__Predictive modeling__ is primarily concerned with minimizing the error of a model or making the most accurate predictions possible, at the expense of explainability. 

>The representation of linear regression is an `equation that describes a line that best fits the relationship between the input variables` (__x__) and the `output variable` (__y__), by `finding specific weightings for the input variables called coefficients` (__B__).

- $y = B_0 + B_1 \cdot x$
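For a single input, one common way to find $B_0$ and $B_1$ is ordinary least squares, which has a closed-form solution: $B_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$ and $B_0 = \bar{y} - B_1 \bar{x}$. A minimal Python sketch; the toy dataset and function names are illustrative only:

```python
def fit_simple_linear_regression(xs, ys):
    # Ordinary least squares for y = B0 + B1 * x.
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    x_spread = sum((x - x_mean) ** 2 for x in xs)
    b1 = covariance / x_spread
    b0 = y_mean - b1 * x_mean
    return b0, b1

def predict(b0, b1, x):
    return b0 + b1 * x

xs = [1.0, 2.0, 4.0, 3.0, 5.0]
ys = [1.0, 3.0, 3.0, 2.0, 5.0]
b0, b1 = fit_simple_linear_regression(xs, ys)
print(round(b0, 4), round(b1, 4))        # 0.4 0.8
print(round(predict(b0, b1, 6.0), 4))    # 5.2
```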

Linear regression has been around for more than 200 years and has been extensively studied. Some __good rules of thumb when using this technique are to remove variables that are very similar (correlated) and to remove noise from your data__, if possible.

#### Lesson 6: Logistic Regression
Logistic regression is another technique borrowed by machine learning from the field of statistics. It is the go-to method for binary classification problems (problems with two class values).

Logistic regression is like linear regression in that the goal is to find the values for the coefficients that weight each input variable.

Unlike linear regression, the prediction for the output is transformed using a __non-linear function called the logistic function__.

The logistic function looks like a big S and will transform any value into the range 0 to 1. This is useful because we can apply a rule to the output of the logistic function to snap values to 0 or 1 (e.g. IF less than 0.5 THEN output 0, ELSE output 1) and predict a class value.
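A minimal sketch of logistic regression with one input. The logistic function and the 0.5 threshold follow the description above; learning the coefficients with stochastic gradient descent on the log loss, the learning rate, the epoch count and the toy data are assumptions made for illustration:

```python
import math

def logistic(z):
    # The S-shaped logistic function squashes any real z into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(b0, b1, x):
    # Weighted input, then the non-linear logistic transform.
    return logistic(b0 + b1 * x)

def predict_class(b0, b1, x):
    # Snap the probability to a class: below 0.5 -> 0, otherwise 1.
    return 0 if predict_proba(b0, b1, x) < 0.5 else 1

def fit_sgd(xs, ys, lr=0.1, epochs=1000):
    # Assumption: coefficients learned by stochastic gradient descent,
    # nudging each coefficient by the prediction error times its input.
    b0, b1 = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            error = y - predict_proba(b0, b1, x)
            b0 += lr * error
            b1 += lr * error * x
    return b0, b1

xs = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
ys = [0, 0, 0, 1, 1, 1]
b0, b1 = fit_sgd(xs, ys)
print([predict_class(b0, b1, x) for x in xs])  # [0, 0, 0, 1, 1, 1]
```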

#### Lesson 7: Linear Discriminant Analysis
Logistic regression is a classification algorithm traditionally limited to only two-class classification problems. If you have `more than two classes then the Linear Discriminant Analysis algorithm is the preferred` linear classification technique.

The representation of LDA is pretty straightforward. It consists of statistical properties of your data, calculated for each class. For a single input variable this includes:

1. The `mean value` for each class.
2. The `variance` calculated across all classes.

Predictions are `made by calculating a discriminant value for each class` and making a prediction for the class with the `largest value`.
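The lesson does not spell out the discriminant, so the sketch below assumes the standard form for one input with Gaussian classes that share a single variance $\sigma^2$: $D_k(x) = x \frac{\mu_k}{\sigma^2} - \frac{\mu_k^2}{2\sigma^2} + \ln P(k)$, where $\mu_k$ is the class mean and $P(k)$ the class prior. The shared variance is pooled by dividing the squared deviations from each class mean by $n - K$ ($n$ rows, $K$ classes). Function names and data are hypothetical.

```python
import math

def fit_lda(xs, ys):
    # Per-class statistics for a single input variable.
    classes = sorted(set(ys))
    n = len(xs)
    means, priors = {}, {}
    for k in classes:
        vals = [x for x, y in zip(xs, ys) if y == k]
        means[k] = sum(vals) / len(vals)
        priors[k] = len(vals) / n
    # One variance shared by all classes, pooled across classes.
    variance = sum((x - means[y]) ** 2 for x, y in zip(xs, ys)) / (n - len(classes))
    return means, priors, variance

def discriminant(x, mean, prior, variance):
    # Assumed standard LDA discriminant for one input.
    return x * mean / variance - mean ** 2 / (2 * variance) + math.log(prior)

def predict_lda(model, x):
    # Pick the class with the largest discriminant value.
    means, priors, variance = model
    return max(means, key=lambda k: discriminant(x, means[k], priors[k], variance))

model = fit_lda([1.0, 2.0, 3.0, 7.0, 8.0, 9.0], [0, 0, 0, 1, 1, 1])
print(predict_lda(model, 4.0), predict_lda(model, 6.0))  # 0 1
```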

The technique __assumes that the data has a Gaussian distribution (bell curve)__, so it is a good idea to __remove outliers__ from your data beforehand.

### Nonlinear Algorithms
#### Lesson 8: Classification and Regression Trees
The representation for the decision tree model is a binary tree. This is your binary tree from algorithms and data structures, nothing too fancy. Each node represents a single input variable (x) and a split point on that variable (assuming the variable is numeric).

The leaf nodes of the tree contain an output variable (y) which is used to make a prediction. Predictions are made by walking the splits of the tree until arriving at a leaf node and outputting the class value at that leaf node.
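A minimal sketch of the tree representation and the prediction walk, with nodes stored as nested dictionaries and leaves as plain class values. Learning is reduced to a single split (a stump) chosen by the Gini index; the lesson does not name a split criterion, so Gini is an assumption here, as are the helper names and the toy data (last column is the class):

```python
def gini_index(groups, classes):
    # Weighted impurity of the groups produced by a split (0.0 = pure).
    n = sum(len(g) for g in groups)
    score = 0.0
    for group in groups:
        if not group:
            continue
        labels = [row[-1] for row in group]
        purity = sum((labels.count(c) / len(group)) ** 2 for c in classes)
        score += (1.0 - purity) * len(group) / n
    return score

def best_split(rows):
    # Try every (input variable, value) pair as a split point.
    classes = sorted(set(row[-1] for row in rows))
    best = None
    for index in range(len(rows[0]) - 1):
        for row in rows:
            left = [r for r in rows if r[index] < row[index]]
            right = [r for r in rows if r[index] >= row[index]]
            score = gini_index([left, right], classes)
            if best is None or score < best[0]:
                best = (score, index, row[index], left, right)
    return best

def majority(group):
    labels = [row[-1] for row in group]
    return max(set(labels), key=labels.count)

def build_stump(rows):
    # One split node whose two children are leaves (class values).
    _, index, value, left, right = best_split(rows)
    return {"index": index, "value": value,
            "left": majority(left or right), "right": majority(right or left)}

def predict(node, row):
    # Walk the splits until reaching a leaf, which holds the class value.
    if not isinstance(node, dict):
        return node
    branch = "left" if row[node["index"]] < node["value"] else "right"
    return predict(node[branch], row)

rows = [[2.0, 1.0, 0], [3.0, 2.0, 0], [1.0, 3.0, 0],
        [7.0, 2.0, 1], [8.0, 1.0, 1], [6.0, 3.0, 1]]
stump = build_stump(rows)
print(stump)  # {'index': 0, 'value': 6.0, 'left': 0, 'right': 1}
print(predict(stump, [4.0, 2.0]), predict(stump, [9.0, 0.0]))  # 0 1
```

A full CART learner would call the split search recursively on each group until a stopping rule (for example a maximum depth or minimum node size) is met; `predict` already handles trees of any depth.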

Trees are fast to learn and very fast for making predictions. They are also often accurate for a broad range of problems and do not require any special preparation for your data.

#### Lesson 9: Naive Bayes
The Naive Bayes model is composed of two types of probabilities that can be calculated directly from your training data:

- The probability of each class.
- The conditional probability for each class given each x value.

Once calculated, the probability model can be used to make predictions for new data using Bayes' Theorem.

When your data is real-valued it is common to __assume a Gaussian distribution (bell curve)__ so that you can easily estimate these probabilities.

__Naive Bayes is called naive because it assumes that each input variable is independent of the others, given the class__. This is a strong assumption and unrealistic for real data; nevertheless, the technique is very effective on a large range of complex problems.
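A minimal Gaussian Naive Bayes sketch. Each class stores its prior and, per input variable, a mean and sample standard deviation; prediction multiplies the prior by the Gaussian likelihood of every input, which is exactly where the independence assumption enters. The denominator of Bayes' Theorem is the same for every class, so the unnormalized scores are enough to pick a winner. Names and data are illustrative:

```python
import math

def gaussian_pdf(x, mean, stdev):
    # Bell-curve density used to estimate P(x | class) for real values.
    exponent = math.exp(-((x - mean) ** 2) / (2 * stdev ** 2))
    return exponent / (math.sqrt(2 * math.pi) * stdev)

def fit_nb(rows):
    # rows: lists of input values with the class label as the last element.
    model = {}
    n = len(rows)
    for label in sorted(set(row[-1] for row in rows)):
        group = [row[:-1] for row in rows if row[-1] == label]
        stats = []
        for column in zip(*group):
            mean = sum(column) / len(column)
            # Sample standard deviation; assumes >= 2 rows per class.
            var = sum((v - mean) ** 2 for v in column) / (len(column) - 1)
            stats.append((mean, math.sqrt(var)))
        model[label] = (len(group) / n, stats)  # (class probability, stats)
    return model

def predict_nb(model, features):
    scores = {}
    for label, (prior, stats) in model.items():
        score = prior
        # Naive independence assumption: multiply per-input likelihoods.
        for x, (mean, stdev) in zip(features, stats):
            score *= gaussian_pdf(x, mean, stdev)
        scores[label] = score
    return max(scores, key=scores.get)

rows = [[1.0, 2.0, 0], [2.0, 1.0, 0], [1.5, 1.5, 0],
        [6.0, 7.0, 1], [7.0, 6.0, 1], [6.5, 6.5, 1]]
model = fit_nb(rows)
print(predict_nb(model, [1.2, 1.8]), predict_nb(model, [6.8, 6.1]))  # 0 1
```

With many inputs, the product of small densities can underflow; summing log-probabilities instead is a common remedy.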

#### Lesson 10: k-Nearest Neighbors

#### Lesson 11: Learning Vector Quantization

#### Lesson 12: Support Vector Machines