# Chapter 5    


## Introduction
Deep learning builds on the fundamentals of traditional machine learning. These include the difference between supervised and unsupervised learning, which hyperparameters to tune, and whether to use frequentist estimators or Bayesian inference.

## 5.1 Learning Algorithms     
> “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”
>
> - Mitchell (1997)

### 5.1.1 The Task, $T$
Machine learning tasks are usually described in terms of how the machine learning system should process an __example__. An example is a collection of **features** that have been quantitatively measured from some object or event. An example is typically represented as a vector __x__ in $\mathbb{R}^{n}$. Many kinds of problems can be posed on such vectors:
- Classification: given $x_{i}$, assign it one of $k$ category labels $y$
- Classification with missing values: given a partial $x_{i}$ (some features missing), assign a label $y$
- Regression: given $x_{i}$, predict a continuous value $y$
- Transcription: given a relatively unstructured $x_{i}$ (such as an image of text or an audio recording), write out its contents as text $y$
- Machine Translation: given a sequence $x_{i}$ in the source language, translate it into a sequence $y_{i}$ in the target language
- Structured Output: take a collection $x_{i}$ and map each element to one of a smaller set of categories $y_{j}$, where the outputs are interrelated. For example, tag each word in a sentence as a noun, adjective, or verb.
- Anomaly Detection: flag examples that are unusual or atypical, such as fraudulent credit card use
- Synthesis and Sampling: generate new examples that resemble the training data, such as GAN-generated images or synthetic medical claims data
- Imputation of Missing Values: given an $x_{i}$ with $j$ entries missing, predict the missing values
- Denoising: given a noisy version of $x_{i}$, predict the clean $x_{i}$
- Density Estimation (or probability mass function estimation): the algorithm is asked to learn a function that maps a vector __x__ to its probability density if __x__ is continuous, or to its probability mass if __x__ is discrete
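
Several of the tasks above can be summarized by the signature of the function the algorithm is asked to learn. The symbols $f$, $k$, and $p_{\text{model}}$ are notation assumed here for illustration; the list itself does not define them.

$$
\begin{aligned}
\text{Classification:} \quad & f : \mathbb{R}^{n} \to \{1, \dots, k\} \\
\text{Regression:} \quad & f : \mathbb{R}^{n} \to \mathbb{R} \\
\text{Density estimation:} \quad & p_{\text{model}} : \mathbb{R}^{n} \to \mathbb{R}
\end{aligned}
$$

Read this way, classification and regression differ only in the output space, while density estimation scores an input instead of predicting a target.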

### 5.1.2 The Performance Measure, $P$
For tasks like classification, classification with missing inputs, and transcription, we usually measure a model's performance by its __accuracy__: the proportion of examples for which the model produces the correct output. Equivalently, we can report the **error rate**, 1 - accuracy, which is the proportion of examples for which the output is incorrect. Neither measure suits density estimation, where an output is not simply right or wrong. The usual metric there is the average log-probability the model assigns to examples.
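
As a rough illustration of these metrics, the sketch below computes accuracy, error rate, and an average log-probability with NumPy. The labels, predictions, and sample values are made up, and the standard normal density stands in for a learned density model; none of these come from the text.

```python
import numpy as np

# Hypothetical labels and predictions for six test examples.
y_true = np.array([0, 1, 1, 0, 1, 0])
y_pred = np.array([0, 1, 0, 0, 1, 1])

# Accuracy: fraction of examples where the prediction matches the label.
accuracy = np.mean(y_pred == y_true)
error_rate = 1.0 - accuracy


def log_prob_standard_normal(values):
    """Log-density of N(0, 1) evaluated at each entry of `values`."""
    return -0.5 * np.log(2.0 * np.pi) - 0.5 * values ** 2


# Density estimation: score the (assumed) model by the average
# log-probability it assigns to the observed examples.
x = np.array([-0.5, 0.0, 0.3, 1.2])
avg_log_prob = np.mean(log_prob_standard_normal(x))

print(f"accuracy={accuracy:.3f}, error_rate={error_rate:.3f}")
print(f"average log-probability={avg_log_prob:.3f}")
```

A higher average log-probability means the model places more density on the observed data.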

Because a deployed model will see data it was not trained on, we split our data into training and test sets, fit the model on the training set, and compute the performance metric on the held-out test set.
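
One simple way to make such a split is to shuffle the example indices and hold out a fixed fraction. The sketch below assumes a random dataset and an 80/20 split; both the data and the ratio are illustrative choices, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the split is repeatable

# Hypothetical dataset: 10 examples with 3 features each, plus binary labels.
X = rng.normal(size=(10, 3))
y = rng.integers(0, 2, size=10)

# Shuffle the example indices, then hold out the first 20% as the test set.
indices = rng.permutation(len(X))
n_test = len(X) // 5
test_idx, train_idx = indices[:n_test], indices[n_test:]

X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

print(X_train.shape, X_test.shape)
```

Shuffling before splitting guards against any ordering in the original data leaking into the split.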

### 5.1.3 The Experience, $E$

Machine learning algorithms can be broadly categorized as __unsupervised__ or **supervised** by what kind of experience they are allowed to have during the learning process. Most algorithms experience an entire, fixed dataset during training. Reinforcement learning algorithms are an exception: they interact with an environment, so their experience includes feedback from their own actions and is not limited to a fixed dataset. A dataset is commonly described by a __design matrix__, which holds one example $x_{i}$ per row and one feature per column. In supervised learning, the design matrix is paired with a vector of labels, __y__, with one label per example.
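
A small sketch of a design matrix and its label vector in NumPy follows. The feature meanings (height, weight) and all values are hypothetical and chosen only to show the shapes involved.

```python
import numpy as np

# Hypothetical measurements: each row is one example, each column one
# feature (say, height in cm and weight in kg).
X = np.array([
    [170.0, 65.0],
    [182.0, 80.0],
    [158.0, 52.0],
])

# Supervised learning pairs the design matrix with one label per example.
y = np.array([0, 1, 0])

n_examples, n_features = X.shape
assert y.shape == (n_examples,)

# X[i] is the i-th example, a vector with n_features entries.
x_0 = X[0]
print(n_examples, n_features, x_0)
```

An unsupervised algorithm would receive only `X`; a supervised one receives both `X` and `y`.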

### 5.1.4 Linear Regression: an Example

    import numpy as np
    import tensorflow as tf
    import matplotlib.pyplot as plt