# Machine Learning Basics
-------------
## What is machine learning?

Machine learning is an approach to artificial intelligence that applies statistical methods iteratively over data, so that a machine learns to perform specific tasks without being explicitly programmed to do so. In other words, a machine learning algorithm is a *strategy* that gives a machine the ability to learn from data.

The type of algorithm used depends on the task to be completed.

---------------
### Tasks:
A task is the end goal, or objective, that we want our machine to carry out. **It is not the learning process itself.** Some general tasks that a machine learning algorithm can be trained to carry out are:

* **Classification**
    * These tasks use algorithms that find relationships among input data and categorize it. Image recognition, spam filters, and medical diagnosis are common applications.
* **Regression**
    * Some tasks require a numerical value as the predicted output, based on a set of data. Several regression methods are available for this, ranging from simple linear regression to kernel regression and Gaussian process regression. Regression methods are used in finance, biology, engineering, and many other fields.
* **Clustering**
    * Clustering methods search for natural groupings within a dataset that may or may not be readily observable otherwise. Sometimes clustering is used to learn about the dataset itself and to separate out features in unlabeled data.
* **Transcription**
    * Sometimes it is necessary to turn an input into something that is more useful to machines or humans. Handwriting recognition and language translation are examples of this. Deep learning applications can be used to accomplish these tasks.
* **Anomaly detection**
    * Very common in manufacturing and fraud detection. Machine learning methods can be used to detect outliers, or abnormal patterns, within a data stream.
    
Machine learning can accomplish far more tasks than can be listed here, and the list keeps growing with the continual rise in computational power and more sophisticated analysis methods.

**Below is a general diagram to help select the appropriate algorithm for a specific task.**

<img src ="http://scikit-learn.org/stable/_static/ml_map.png" alt = "scikit learn cheat sheet" />

--------------------

### Training:
Once a machine learning algorithm has been selected, it must be ***trained*** on the type of data it will encounter.

This is where the *learning* happens. There are two main types of learning processes.

#### Supervised
Supervised learning involves inputting data with known outcomes (data with labels). During training, the algorithm produces and repeatedly adjusts a function so that it can make predictions on future data with unknown outcomes. The attribute of a dataset that is to be predicted is known as the **target**.

Put simply, a supervised algorithm first looks at a dataset "with all the answers" and uses it to build a model that predicts answers for new incoming data.
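
The "produce and adjust a function" idea can be sketched in a few lines. The example below is a minimal illustration, not a production method: it assumes a one-feature linear model trained by gradient descent, and the data, learning rate, and names (`train`, `predict`) are made up for this sketch.

```python
# Labeled training data: each input x comes with a known outcome y (the target).
# These values are invented for illustration and follow y = 2x + 1 exactly.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

def train(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y ~ w*x + b by repeatedly nudging w and b to reduce the squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Errors of the current function on the labeled data.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Adjust the function a little in the direction that lowers the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

w, b = train(xs, ys)

def predict(x):
    """Use the trained function on new data with an unknown outcome."""
    return w * x + b

print(f"w = {w:.3f}, b = {b:.3f}, prediction for x = 5: {predict(5.0):.3f}")
```

With these settings the learned parameters should settle very close to the values used to generate the data, so the prediction for an unseen input follows the same line.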

#### Unsupervised
Conversely, an unsupervised algorithm trains on data that is unlabeled or unclassified. It learns from the structure of the dataset and draws inferences from it. Unsupervised algorithms are commonly used to cluster data within a set to find similarities that were otherwise hidden, and to find anomalies within a dataset.
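
As a rough illustration of learning from structure alone, here is a small sketch of k-means clustering on one-dimensional data with two clusters. K-means is only one of many clustering methods, and the data, the starting centers, and the function name `k_means_1d` are assumptions made for this example.

```python
def k_means_1d(values, iterations=10):
    """Split 1-D values into two clusters using a basic k-means loop."""
    # Simple deterministic start: the smallest and largest values.
    centers = [min(values), max(values)]
    clusters = [[], []]
    for _ in range(iterations):
        # Assignment step: each value joins the cluster with the nearest center.
        clusters = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its assigned values.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Unlabeled data (invented): no outcome is attached to any value.
data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
centers, clusters = k_means_1d(data)
print("centers:", centers)
print("clusters:", clusters)
```

No labels are supplied anywhere; the two groups emerge only from how the values are spread out.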

----------------------

### Performance:

A trained machine learning model must be evaluated using quantitative performance measures. Engineers and data scientists are often interested in *how* the model is correct or incorrect in its predictions.

#### Classifier models
In the case of classifier models, it is sometimes useful to know whether a model predicted true when the actual value was false, or predicted false when the actual value was true. To see all of these outcomes at once, a **confusion matrix** can be produced.

|                        | Predicted value = no | Predicted value = yes |
| ---------------------- | :------------------: | :-------------------: |
| **Actual value = no**  | true negative (TN)   | false positive (FP)   |
| **Actual value = yes** | false negative (FN)  | true positive (TP)    |

False positives and false negatives are often called **Type I errors** and **Type II errors**, respectively.
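
The four cells of the matrix can be counted directly from a list of actual values and a list of predictions. The sketch below assumes binary labels stored as booleans (`True` = yes, `False` = no); the function name `confusion_counts` and the sample labels are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Count TN, FP, FN and TP for binary labels given as booleans."""
    counts = {"TN": 0, "FP": 0, "FN": 0, "TP": 0}
    for a, p in zip(actual, predicted):
        if a and p:
            counts["TP"] += 1   # actual yes, predicted yes
        elif a and not p:
            counts["FN"] += 1   # actual yes, predicted no (Type II error)
        elif not a and p:
            counts["FP"] += 1   # actual no, predicted yes (Type I error)
        else:
            counts["TN"] += 1   # actual no, predicted no
    return counts

# Invented example labels for eight samples.
actual    = [True, True,  True, False, False, False, False, True]
predicted = [True, False, True, False, True,  False, False, True]

counts = confusion_counts(actual, predicted)
print(counts)
```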

From this matrix we can compute several performance measures for our model.

* **Accuracy:** How often was our model correct?
    * $\frac{TP+TN}{n}$ (where $n$ = number of samples tested)
* **Error Rate:** How often was our model wrong?
    * $\frac{FP+FN}{n}$
* **Recall:** When the actual value is yes, how often does our model predict yes?
    * $\frac{TP}{TP+FN}$ (also called sensitivity)
* **Specificity:** When the actual value is no, how often does our model predict no?
    * $\frac{TN}{TN+FP}$
* **Precision:** When our model predicts yes, how often is it correct?
    * $\frac{TP}{TP+FP}$

----------------------------------
Another common performance measure for classification algorithms is called the **F score**, also known as the **F1 score**.
It is the harmonic mean of the precision and recall values calculated from the confusion matrix. The F score is a value between 0 and 1, where 0 is the worst possible score and 1 indicates perfect precision and recall.

$F_1 = 2 \cdot \frac{p \cdot r}{p + r}$

where p = precision, r = recall
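
To make the formulas concrete, the sketch below evaluates each measure for one set of confusion-matrix counts. The counts are invented for illustration and do not come from any real model.

```python
# Hypothetical confusion-matrix counts for 100 test samples.
TP, TN, FP, FN = 40, 45, 5, 10
n = TP + TN + FP + FN

accuracy    = (TP + TN) / n     # how often the model was correct
error_rate  = (FP + FN) / n     # how often the model was wrong
recall      = TP / (TP + FN)    # also called sensitivity
specificity = TN / (TN + FP)
precision   = TP / (TP + FP)

# F1 score: harmonic mean of precision and recall.
f1 = 2 * (precision * recall) / (precision + recall)

for name, value in [
    ("accuracy", accuracy),
    ("error rate", error_rate),
    ("recall", recall),
    ("specificity", specificity),
    ("precision", precision),
    ("F1", f1),
]:
    print(f"{name:12s} {value:.3f}")
```

Note that accuracy and error rate always sum to 1, so reporting one of them is enough.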

#### Regression models
For regression models, where the task requires a numerical value, we often validate our model based on its *goodness of fit*. A classic statistical measure is the **coefficient of determination**, or **$R^2$**. While the exact definition of this value differs between regression models, it generally measures the proportion of the variance in the dependent variable that can be explained by the independent variables.

In most regression models, we are interested in the **residuals**. A residual ($e_i$) is the difference between the actual value and the predicted value for a point in a dataset.

$e_i = y_i - \hat{y}_i$

where $y_i$ is the observed value and $\hat{y}_i$ is the predicted value

Summary statistics of the residuals, such as the mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE), can tell us other things about the model, such as its sensitivity to outliers.
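
These measures are straightforward to compute by hand. The sketch below uses invented observed and predicted values, and it assumes one common definition of $R^2$, namely 1 minus the ratio of the residual sum of squares to the total sum of squares.

```python
import math

# Invented observed values and model predictions.
observed  = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 10.0]
n = len(observed)

# Residual for each point: observed minus predicted.
residuals = [y - y_hat for y, y_hat in zip(observed, predicted)]

mae  = sum(abs(e) for e in residuals) / n    # mean absolute error
mse  = sum(e ** 2 for e in residuals) / n    # mean squared error
rmse = math.sqrt(mse)                        # root mean squared error

# R^2 under the assumed definition: 1 - SS_res / SS_tot.
mean_observed = sum(observed) / n
ss_res = sum(e ** 2 for e in residuals)
ss_tot = sum((y - mean_observed) ** 2 for y in observed)
r_squared = 1 - ss_res / ss_tot

print(f"MAE = {mae:.3f}, MSE = {mse:.3f}, RMSE = {rmse:.3f}, R^2 = {r_squared:.3f}")
```

Because MSE and RMSE square each residual, the single residual of -1.0 contributes more to them than to the MAE, which is why they are more sensitive to outliers.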

--------------------------------

### Regression Model Example

Let's build a model and make some predictions!!

Let's say that our school wants us to figure out how it can improve its graduation rate. The school has provided a robust dataset with information from over 700 schools in the US, including their graduation rates. 

For this example, we'll jump over to the [college_data_stats](college_data_stats.ipynb) notebook.

------------------

## Model Capacity

As you can see from the results of our linear model, we were not very successful in predicting graduation rates. In the realm of machine learning, we would say our model is **underfitting** the dataset. This is bad, since we want accurate predictions when new data is received. However, the opposite can also happen, where our model **overfits** the data.

<img src ="fit.png" alt = "overfit and underfit example" />

Whether a model will overfit or underfit a dataset relates to the model's **capacity**. The capacity of a model is its ability to describe a dataset with the functions available to it.

The range of functions that a model has available to it is called the **hypothesis space**. For example, in our linear regression model, the hypothesis space was the set of all linear functions.
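
One way to see the effect of the hypothesis space is to fit the same data with two spaces of different size. The sketch below compares the set of constant functions with the set of linear functions, using ordinary least squares for both. The data and helper names (`fit_constant`, `fit_line`, `mse`) are made up for illustration and are unrelated to the college dataset.

```python
def mse(xs, ys, model):
    """Mean squared error of a model (a function of x) on the data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_constant(xs, ys):
    """Best model in the hypothesis space of constant functions: the mean of y."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Best model in the hypothesis space of linear functions (least squares)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

# Invented data with a roughly linear trend.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]

mse_constant = mse(xs, ys, fit_constant(xs, ys))
mse_line = mse(xs, ys, fit_line(xs, ys))
print(f"constant model MSE: {mse_constant:.4f}")
print(f"linear model MSE:   {mse_line:.4f}")
```

The constant model has too little capacity for this data and underfits it. Every constant function is also a linear function with zero slope, so the larger hypothesis space can never do worse on the training data. A low training error on its own does not rule out overfitting, though, which is why models are also evaluated on data they were not trained on.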

### Bias and Variance Tradeoff

<img src ="biasVar.png" alt = "bias variance optimization" />

----------------


Conclude with no free lunch theorem