***
# Machine Learning: An Overview

Author: Olatomiwa Bifarin. <br>
PhD Candidate Biochemistry and Molecular Biology <br>
@ The University of Georgia

_This is a draft copy, a work in progress_

## Notebook Content

1.  [Definition](#1) <br>
2.  [ML Problems](#2) <br>
3.  [ML Concepts](#3) <br>
4.  [Applications](#4) <br>

## 1. Definition
<a id="1"></a>

I have read many definitions of machine learning; however, none is as comprehensive (and pithy) as Tom Mitchell's, namely: <br>

`"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."`

An example: <br> 
Task `T`: Classify a subject as having Parkinson's disease or not <br>
Experience `E`: Voice measurement data and corresponding labels. <br>
Performance `P`: Accuracy of classification. 

Mathematically speaking, the goal is to figure out an approximating function $f$

$$f:X\rightarrow Y$$
where $X$ is the space of features and $Y$ is the space of labels that $f$ predicts.
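
To make `T`, `E`, `P` and the function $f$ concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the single "voice feature", the made-up numbers, and the very simple learning rule (a cut-off halfway between the two class means). It is not how a real Parkinson's classifier would be built.

```python
from statistics import mean

# Hypothetical 1-D voice feature values with labels (1 = Parkinson's, 0 = healthy).
# These numbers are made up for illustration only.
train = [(0.2, 0), (0.3, 0), (0.4, 0), (0.7, 1), (0.8, 1), (0.9, 1)]
test = [(0.25, 0), (0.35, 0), (0.75, 1), (0.85, 1)]

def fit_threshold(data):
    """Experience E: learn a cut-off halfway between the two class means."""
    mean0 = mean(x for x, y in data if y == 0)
    mean1 = mean(x for x, y in data if y == 1)
    return (mean0 + mean1) / 2

def make_f(threshold):
    """Return the approximating function f: X -> Y (task T)."""
    return lambda x: 1 if x > threshold else 0

def accuracy(f, data):
    """Performance P: fraction of correct predictions."""
    return sum(f(x) == y for x, y in data) / len(data)

threshold = fit_threshold(train)
f = make_f(threshold)
test_accuracy = accuracy(f, test)
print(f"threshold={threshold:.2f}, test accuracy={test_accuracy:.2f}")
```

Note that performance is measured on samples the function has not seen during training, which is what "learning" is meant to capture here.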

## 2. ML Problems
<a id="2"></a>

Now, as you might have guessed, there are different kinds of task `T`, and each kind is solved differently. <br>
Take the Parkinson's disease example above. This is what is called a <mark>classification</mark> problem, and it is expressed here in its simplest form: binary classification. Let's say a few things about this (and similar) problems before I proceed to some other machine learning problems.  
The training data and the labels are defined as follows: 

$$[(x_{1},y_{1}),(x_{2},y_{2}),\ldots,(x_{n},y_{n})], \quad x_{i} \in X$$
$$ (y_{1},\ldots,y_{n}), \quad y_{i} \in Y = \{0,1\} $$

Once a prediction is made, we want to measure how well we are doing. We do this with a `loss function`, sometimes called a `cost function`. For a binary classification problem, a common choice is the 0-1 loss: 

$$ c(y, f(x)) = 
\begin{cases}
0 & \text{ if } y=f(x) \\
1 & \text{ if } y \neq f(x)
\end{cases}$$
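
The loss above translates almost directly into code. Here is a minimal sketch; the labels and predictions are made up, and averaging the per-sample losses to get an error rate is a common convention I am assuming, not something defined above.

```python
def zero_one_loss(y, y_pred):
    """Return 0 if the prediction matches the label, otherwise 1."""
    return 0 if y == y_pred else 1

# Hypothetical labels and predictions, for illustration only.
labels = [0, 1, 1, 0, 1]
predictions = [0, 1, 0, 0, 1]

losses = [zero_one_loss(y, p) for y, p in zip(labels, predictions)]
error_rate = sum(losses) / len(losses)  # average 0-1 loss over the samples
accuracy = 1 - error_rate

print(losses, error_rate, accuracy)
```

Averaged this way, the 0-1 loss is simply one minus the classification accuracy.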

This kind of classification is called `supervised classification`: supervised because we have the labels. It doesn't take much to figure out that the other kind could be, and indeed is, called `unsupervised classification`. In this case we have no labels. This is a <mark>clustering</mark> problem. As the name suggests, we use algorithms that cluster samples based on a similarity heuristic. <br> 
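
As an illustration of clustering, here is a rough sketch of k-means on one-dimensional data in plain Python. The choice of k-means, the made-up points, and the starting centers are all my assumptions; the similarity heuristic here is simply absolute distance to a cluster center.

```python
def k_means_1d(points, centers, n_iter=10):
    """Repeat two steps: assign each point to its nearest center, then move
    each center to the mean of the points assigned to it."""
    for _ in range(n_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical unlabeled samples: no y values anywhere.
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centers, clusters = k_means_1d(points, centers=[0.0, 6.0])
print(centers, clusters)
```

No labels are used at any point; the two groups emerge only from how close the samples are to each other.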

In the above examples, what we are attempting to predict is a qualitative variable. Now, check this out: given the quantified metabolites in a cancer patient's urine, can we predict the size of the tumor? The target variable here is quantitative. This is called a <mark>regression</mark> problem: as opposed to a qualitative variable, we seek to predict a quantitative variable. However, classification and regression algorithms are very much connected. For example, many classification algorithms predict classes by first estimating the probability of each class, and in this way they behave like a regression method. Logistic regression is a case in point: despite its name it is typically used for classification, yet what it actually estimates is a quantitative variable, the probability of a class. 
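
The link between the two can be sketched in a few lines. The code below assumes a logistic (sigmoid) model with a single feature; the `weight` and `bias` values are hypothetical and have not been fitted to any data, and the 0.5 cut-off is a conventional default rather than a requirement.

```python
import math

def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, weight, bias):
    """Regression-like step: estimate the probability of class 1."""
    return sigmoid(weight * x + bias)

def predict_class(x, weight, bias, threshold=0.5):
    """Classification step: turn the probability into a class label."""
    return 1 if predict_proba(x, weight, bias) >= threshold else 0

weight, bias = 2.0, -1.0  # hypothetical parameters, not learned from data

for x in (-1.0, 0.5, 2.0):
    p = predict_proba(x, weight, bias)
    print(f"x={x}: probability={p:.3f}, class={predict_class(x, weight, bias)}")
```

The quantitative output (a probability) comes first, and the qualitative output (a class) is derived from it.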

Another popular kind of machine learning problem is called <mark>ranking</mark>. Here is an example: I type the following words into a search engine like Google: _what are ranking machine learning problems?_ To solve the problem effectively, the algorithm has to give me a ranked list of results that I might like. This turns out to be an interesting machine learning problem. Other kinds of ML problems include <mark>reinforcement learning</mark>, <mark>representation learning</mark>, <mark>collaborative filtering</mark>, <mark>anomaly detection</mark>, etc.
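
To show the shape of a ranking problem, here is a toy sketch. The documents are made up, and the scoring function (counting shared words) is a hand-written stand-in: in an actual learning-to-rank setting the scoring function would be learned from data rather than written by hand.

```python
def score(query, document):
    """Count how many distinct query words appear in the document."""
    query_words = set(query.lower().split())
    doc_words = set(document.lower().split())
    return len(query_words & doc_words)

def rank(query, documents):
    """Return the documents ordered from highest to lowest score."""
    return sorted(documents, key=lambda d: score(query, d), reverse=True)

# Hypothetical document collection, for illustration only.
documents = [
    "cooking pasta at home",
    "ranking problems in machine learning",
    "machine learning for beginners",
]

ranked = rank("ranking machine learning problems", documents)
for position, doc in enumerate(ranked, start=1):
    print(position, doc)
```

The output is an ordering of items rather than a single class or number, which is what sets ranking apart from classification and regression.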

## 3. ML Concepts
<a id="3"></a>

### Bias and Variance

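Of what use is a car that cannot move? Of what use is a machine learning algorithm that cannot learn (i.e. generalize)? A machine learning algorithm generalizes well when it does not `underfit` or `overfit`. Underfitting corresponds to high bias (the model is too simple to capture the pattern in the training data), while overfitting corresponds to high variance (the model fits the noise in the training data and so performs poorly on new data).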

https://towardsdatascience.com/what-are-overfitting-and-underfitting-in-machine-learning-a96b30864690
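
To make underfitting and overfitting concrete, here is a small sketch in plain Python. The data are made up (a straight line plus alternating noise), and the three models are my own illustrative choices: a constant predictor that ignores the input, a least-squares line, and a nearest-neighbour predictor that memorizes the training set.

```python
def mse(model, xs, ys):
    """Mean squared error of a model on the given samples."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_mean(xs, ys):
    """Too simple: always predict the average training target."""
    avg = sum(ys) / len(ys)
    return lambda x: avg

def fit_line(xs, ys):
    """Least-squares straight line."""
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x

def fit_nearest(xs, ys):
    """Too flexible: return the target of the closest training sample."""
    def predict(x):
        i = min(range(len(xs)), key=lambda j: abs(x - xs[j]))
        return ys[i]
    return predict

# Hypothetical data: y = x plus alternating noise for training, noise-free for testing.
train_x = [0, 1, 2, 3, 4, 5]
train_y = [0.5, 0.5, 2.5, 2.5, 4.5, 4.5]
test_x = [0.4, 1.4, 2.4, 3.4, 4.4]
test_y = [0.4, 1.4, 2.4, 3.4, 4.4]

results = {}
for name, fit in [("mean", fit_mean), ("line", fit_line), ("nearest", fit_nearest)]:
    model = fit(train_x, train_y)
    results[name] = (mse(model, train_x, train_y), mse(model, test_x, test_y))
    print(f"{name}: train MSE={results[name][0]:.3f}, test MSE={results[name][1]:.3f}")
```

On this toy data the constant predictor does poorly on both sets (underfitting), the nearest-neighbour predictor has zero training error but a worse test error than the line (overfitting), and the line sits in between on training error while doing best on the test set.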

### Inductive Bias

In philosophy, inductive reasoning is when you come to a conclusion using premises that do `NOT` guarantee the conclusion (as opposed to deductive reasoning, where they do). <br> 

An example: 

<center>All swans I have seen are black</center>
<center>Therefore, all swans are black</center>

Now, recall that this is all we do in machine learning: train a machine learning algorithm on data (_premise_), and argue that the algorithm generalizes to the test data (_conclusion_). Aha! The set of assumptions an algorithm relies on to make this leap is what is called its <mark>inductive bias</mark>, and different algorithms hold different assumptions.<br> 

Here are a few examples: 

| ML Algorithm| Inductive Bias |
| --- | --- |
| ___Decision Trees___ | Shorter trees are preferred; weight is given to fewer features |
| ___k-Nearest Neighbors___ | 1) Samples that are closer together (in Euclidean space, or under whichever distance k-NN uses) are more likely to belong to the same class, 2) All samples are equally important |
| ___Support Vector Machines___ | Classes are separable by a hyperplane with some margin |
| ___Linear Regression___ | A linear relationship between the features and the response variable |
| ___Naive Bayes___ | Inputs are independent of each other, given the class |
| ___Perceptron___ |  |
| ___Neural Networks___ |  Number of hidden units?|

https://en.wikipedia.org/wiki/Inductive_bias
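
The k-NN row of the table can be read straight off an implementation. Below is a minimal sketch in plain Python, assuming Euclidean distance and an unweighted majority vote; the two-feature training samples and the labels `"a"` and `"b"` are made up for illustration.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Predict the label of `query` from its k nearest training samples.

    train: list of (features, label) pairs; query: a feature tuple.
    """
    # Assumption 1: closeness in Euclidean space means "likely the same class".
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Assumption 2: each of the k neighbors gets one equal vote.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical training samples, for illustration only.
train = [
    ((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
    ((1.0, 1.0), "b"), ((0.9, 1.1), "b"), ((1.1, 0.9), "b"),
]

print(knn_predict(train, (0.1, 0.1)))
print(knn_predict(train, (1.0, 0.9)))
```

Nothing in the data forces these two assumptions; they are built into the algorithm, which is exactly what makes them an inductive bias.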

### Optimization and Regularization?

## 4. ML Applications
<a id="4"></a>

## References and Resources

- Tom Mitchell, Machine Learning, Pg?
- Wikipedia, Inductive bias
- http://www.lauradhamilton.com/inductive-biases-various-machine-learning-algorithms
- An Introduction to Statistical Learning (Chapter)