# Structuring Machine Learning Projects

## Introduction to ML strategy

### Orthogonalization

- fit training set well on cost function (bigger network, better optimization algorithm)
- then, fit dev set well on cost function (regularization, bigger training set)
- then, fit test set well on cost function (bigger dev set)
- then, perform well in real world (change dev set or cost function)

## Setting up your goal

### Single number evaluation metric

- precision: of examples recognized as cat, what % actually are cats?
- recall: what % of actual cats are correctly recognized?
- F1 score: harmonic mean of precision ($P$) and recall ($R$)
    - $\dfrac{2}{\dfrac{1}{P}+\dfrac{1}{R}}$
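
As a minimal sketch of the three metrics above, the following computes them from binary labels. The function name `precision_recall_f1` and the sample labels are illustrative assumptions, not part of the original notes.

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, F1) for binary labels, where 1 means "cat"."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cats recognized as cats
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-cats recognized as cats
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cats that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean 2 / (1/P + 1/R), rewritten to avoid dividing by zero.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up labels, purely for illustration.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 0]
p, r, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Because the harmonic mean is pulled toward the smaller of the two values, a classifier cannot get a high F1 by maximizing only precision or only recall.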

### Satisficing and optimizing metric

- ex. maximize accuracy subject to running time $\le 100$ ms
- $N$ metrics: $1$ optimizing, $N-1$ satisficing
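
One way to apply this rule in code is to filter by the satisficing metric first and then maximize the optimizing metric. This is a sketch under assumptions: the helper `pick_model`, the dictionary keys, and the candidate numbers are all hypothetical.

```python
def pick_model(candidates, max_running_time_ms=100):
    """Keep models that meet the running-time threshold, then maximize accuracy."""
    feasible = [c for c in candidates if c["running_time_ms"] <= max_running_time_ms]
    if not feasible:
        return None  # no candidate meets the satisficing constraint
    return max(feasible, key=lambda c: c["accuracy"])


# Hypothetical candidates, for illustration only.
candidates = [
    {"name": "A", "accuracy": 0.90, "running_time_ms": 80},
    {"name": "B", "accuracy": 0.92, "running_time_ms": 95},
    {"name": "C", "accuracy": 0.95, "running_time_ms": 1500},
]
best = pick_model(candidates)
print(best["name"])
```

Model C has the highest accuracy but fails the running-time threshold, so it is never considered; among the remaining models only accuracy matters.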

### Train/dev/test distributions

- choose a dev set and test set to reflect data you expect to get in the future and consider important to do well on
- dev and test set must come from the same distribution

### Size of dev/test sets

- set your test set to be big enough to give high confidence in the overall performance of your system

### When to change dev/test sets and metrics

- if doing well on your metric and dev/test set does not correspond to doing well on your application, change your metric and/or dev/test set

## Comparing to human-level performance

### Why human-level performance

- while ML is worse than humans, you can
    - get labeled data from humans
    - gain insight from manual error analysis (why did a person get this right?)
    - better analysis of bias/variance
    
### Avoidable bias

- human-level error as a proxy for Bayes error
- gap between human and training error: avoidable bias
- gap between training and dev error: variance
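
The two gaps above can be computed directly from three error rates. The sketch below assumes errors are given in percent; the function name `diagnose` and the numbers are illustrative, not taken from the notes.

```python
def diagnose(human_error, train_error, dev_error):
    """Return (avoidable_bias, variance, focus) from error rates in percent."""
    avoidable_bias = train_error - human_error  # human error stands in for Bayes error
    variance = dev_error - train_error
    focus = "bias" if avoidable_bias > variance else "variance"
    return avoidable_bias, variance, focus


# Same training and dev error, different human-level error (made-up numbers).
print(diagnose(human_error=1.0, train_error=8.0, dev_error=10.0))
print(diagnose(human_error=7.5, train_error=8.0, dev_error=10.0))
```

The same 8% training error calls for bias reduction in the first case and variance reduction in the second, which is why the human-level baseline matters.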

### Two fundamental assumptions of supervised learning

- you can fit the training set pretty well ~ avoidable bias
- training set performance generalizes pretty well to dev/test set ~ variance
- to reduce avoidable bias
    - train a bigger model
    - train longer / use better optimization algorithms
    - NN architecture / hyperparameter search
- to reduce variance
    - more data
    - regularization
    - NN architecture / hyperparameter search

## Error analysis

### Carrying out error analysis

- look at dev examples to evaluate ideas
- ex. cat detection
    - dogs being recognized as cats
    - big cats (lions, panthers, etc.) being recognized as cats
    - blurry images
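
A simple tally makes this analysis concrete: tag each misclassified dev example by hand, then count how often each category appears. The annotations below are invented for illustration, and `error_breakdown` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical hand-written tags, one set per misclassified dev example.
# An example can carry several tags, or none if no category applies.
annotations = [
    {"dog"},
    {"blurry"},
    {"big cat", "blurry"},
    {"big cat"},
    {"blurry"},
    {"blurry"},
    {"big cat"},
    set(),
    {"blurry"},
    {"blurry", "big cat"},
]


def error_breakdown(annotations):
    """Return the fraction of misclassified examples carrying each tag."""
    counts = Counter(tag for tags in annotations for tag in tags)
    total = len(annotations)
    return {tag: n / total for tag, n in counts.most_common()}


breakdown = error_breakdown(annotations)
for tag, fraction in breakdown.items():
    print(f"{tag}: {fraction:.0%}")
```

Each fraction is a ceiling on how much of the dev error could disappear by fixing that category alone, so a category covering only a small share of the errors is rarely worth prioritizing.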
    
### Cleaning up incorrectly labeled data

- DL algorithms are quite robust to random errors in the training set, but not to systematic errors
- consider errors due to incorrect labels vs. errors due to other causes
- apply same process to your dev and test sets to make sure they continue to come from the same distributions
- consider examining examples your algorithm got right as well as the ones it got wrong
- if you correct labels only in the dev/test sets, train and dev/test data may now come from slightly different distributions

### Build your first system quickly, then iterate

- set up dev/test set and metric
- build initial system quickly
- use bias/variance analysis & error analysis to prioritize next steps

## Mismatched training and dev/test set

### Training and testing on different distributions

- ex. cat classifier (200,000 images from webpages, 10,000 from the mobile app)
    - train: 200,000 from web + 5,000 from mobile
    - dev: 2,500 from mobile
    - test: 2,500 from mobile
    - this ensures dev & test sets come from the same distribution: the mobile data you actually care about
- ex. speech recognition
    - training: purchased data, smart speaker control, voice keyboard
    - dev/test: speech-activated rearview mirror
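
The cat example's split can be sketched as follows. This is an illustrative sketch, not a prescribed procedure: `split_web_mobile`, the placeholder tuples, and the fixed seed are assumptions.

```python
import random


def split_web_mobile(web, mobile, n_mobile_train, seed=0):
    """Train on web data plus some mobile data; dev and test are mobile only."""
    rng = random.Random(seed)
    mobile = list(mobile)
    rng.shuffle(mobile)
    mobile_train = mobile[:n_mobile_train]
    held_out = mobile[n_mobile_train:]
    half = len(held_out) // 2
    dev, test = held_out[:half], held_out[half:]  # both drawn from the target distribution
    train = list(web) + mobile_train
    rng.shuffle(train)
    return train, dev, test


# Placeholder records standing in for images: (source, index).
web = [("web", i) for i in range(200_000)]
mobile = [("mobile", i) for i in range(10_000)]
train, dev, test = split_web_mobile(web, mobile, n_mobile_train=5_000)
print(len(train), len(dev), len(test))
```

Pooling everything and splitting at random would instead leave the dev and test sets dominated by web images, so the metric would target the wrong distribution.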
    
### Bias and variance with mismatched data distribution

- training-dev set: same distribution as training set, but not used for training
- gap between human-level and training error? avoidable bias
- gap between training and training-dev error? variance
- gap between training-dev and dev error? data mismatch
- gap between test and dev error? degree of overfitting to dev set
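
The four gaps above form a ladder that can be read off five error rates. In this sketch the errors are in percent, and the function name `error_gaps` and the numbers are made up for illustration.

```python
def error_gaps(human, train, train_dev, dev, test):
    """Return the gap between each pair of adjacent error rates (in percent)."""
    return {
        "avoidable bias": train - human,
        "variance": train_dev - train,
        "data mismatch": dev - train_dev,
        "overfitting to dev set": test - dev,
    }


# Made-up error rates for a system trained on mismatched data.
gaps = error_gaps(human=1.0, train=1.5, train_dev=2.0, dev=9.0, test=9.5)
largest = max(gaps, key=gaps.get)
for name, gap in gaps.items():
    print(f"{name}: {gap:.1f}")
print("largest gap:", largest)
```

Without the training-dev set, the jump from training error to dev error could not be attributed to either variance or data mismatch.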

### Addressing data mismatch

- carry out manual error analysis to try to understand difference between training and dev/test sets
- make training data more similar, or collect more data similar to dev/test sets
    - artificial data synthesis
    
## Learning from multiple tasks

### Transfer learning (from A to B)

- tasks A and B have the same input $x$
- you have a lot more data for task A than task B
- low level features from A could be helpful for learning B

### Multi-task learning

- training on a set of tasks that could benefit from having shared lower-level features
- usually, amount of data you have for each task is quite similar
- can train a big enough neural network to do well on all the tasks

## End-to-end deep learning

Speech recognition example
- audio $\rightarrow$ features $\rightarrow$ phonemes $\rightarrow$ words $\rightarrow$ transcript vs. audio $\rightarrow$ transcript

Machine translation example
- English $\rightarrow$ text analysis $\rightarrow$ $\dots$ $\rightarrow$ French vs. English $\rightarrow$ French

Pros
- let the data speak
- less hand-designing of components needed

Cons
- may need a large amount of data
- excludes potentially useful hand-designed components