## Advice for applying machine learning

#### This week's learning objectives:
* Evaluate and then modify your learning algorithm or data to improve your model's performance
* Evaluate your learning algorithm using cross validation and test datasets.
* Diagnose bias and variance in your learning algorithm
* Use regularization to adjust bias and variance in your learning algorithm
* Identify a baseline level of performance for your learning algorithm
* Understand how bias and variance apply to neural networks
* Learn about the iterative loop of Machine Learning Development that's used to update and improve a machine learning model
* Learn to use error analysis to identify the types of errors that a learning algorithm is making
* Learn how to add more training data to improve your model, including data augmentation and data synthesis
* Use transfer learning to improve your model's performance.
* Learn to include fairness and ethics in your machine learning model development
* Measure precision and recall to work with skewed (imbalanced) datasets

#### How can we evaluate how well a model is doing?
Since the model is trained on a certain dataset, evaluating it on that same dataset is not a true evaluation: the model has already seen these examples, so its loss on them can be close to 0 even if it generalizes poorly.
The goal is a model that predicts well on examples that were not in the dataset it learned from. \
If we cannot get more data to test the model with, we can split our current dataset into a training set and a test set (for example 70% / 30%), fit the model on the training set, and measure how good it is on the test set.

#### Splitting the dataset into training, dev, and test sets
You might have used the entire dataset to train your models. In practice, however, it is best to hold out a portion of your data to measure how well your model generalizes to new examples. This will let you know if the model has overfit your training set.
It is common to split your data into three parts:

* ***training set*** - used to train the model
* ***cross validation set (also called validation, development, or dev set)*** - used to evaluate the different model configurations you are choosing from. For example, you can use this to make a decision on what polynomial features to add to your dataset.
* ***test set*** - used to give a fair estimate of your chosen model's performance against new examples. This should not be used to make decisions while you are still developing the models.
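
As a rough illustration of the three-way split above, here is a minimal NumPy sketch. The 60/20/20 proportions, the function name `split_dataset`, and the toy data are assumptions made for the example, not something the notes prescribe.

```python
import numpy as np

def split_dataset(X, y, train_frac=0.6, dev_frac=0.2, seed=0):
    """Shuffle the examples, then cut them into training, dev, and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))          # random order of example indices
    n_train = int(round(train_frac * len(X)))
    n_dev = int(round(dev_frac * len(X)))
    train_idx = idx[:n_train]
    dev_idx = idx[n_train:n_train + n_dev]
    test_idx = idx[n_train + n_dev:]       # whatever is left goes to the test set
    return ((X[train_idx], y[train_idx]),
            (X[dev_idx], y[dev_idx]),
            (X[test_idx], y[test_idx]))

# Toy data (hypothetical): 100 examples with one feature.
X = np.arange(100, dtype=float).reshape(-1, 1)
y = 2.0 * X[:, 0] + 1.0

(X_train, y_train), (X_dev, y_dev), (X_test, y_test) = split_dataset(X, y)
print(len(X_train), len(X_dev), len(X_test))
```

Shuffling before splitting matters when the data is stored in some order (by date or by class, say), since otherwise the three sets could come from different distributions.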

#### Why do we need a dev set?
We can use the dev set to make decisions about, for instance, the best architecture or polynomial degree for our model. We train the model on the training set and validate model decisions using the dev set; only once we have decided how to build the model do we evaluate it on the test set. Because the test set played no part in those decisions, its error is a fair estimate of how the model does on new examples.
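
The workflow above can be sketched for the polynomial-degree case as follows. This is only an illustration under assumptions: the data is synthetic, the candidate degrees 1 to 6 are arbitrary, and `np.polyfit` stands in for whatever training procedure you actually use.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=120)
# Synthetic data (assumption): a quadratic trend plus noise.
y = 0.5 * x**2 - x + 2 + rng.normal(scale=0.5, size=x.shape)

# 60/20/20 split; x was drawn at random, so slicing is already a random split.
x_train, y_train = x[:72], y[:72]
x_dev, y_dev = x[72:96], y[72:96]
x_test, y_test = x[96:], y[96:]

def cost(coeffs, x, y):
    """Squared-error cost (with the 1/2 factor) of a fitted polynomial."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2) / 2)

models, dev_errors = {}, {}
for degree in range(1, 7):
    models[degree] = np.polyfit(x_train, y_train, degree)      # fit on training set only
    dev_errors[degree] = cost(models[degree], x_dev, y_dev)    # compare on dev set

best_degree = min(dev_errors, key=dev_errors.get)              # decision made with dev set
test_error = cost(models[best_degree], x_test, y_test)         # reported once, at the end
print(best_degree, test_error)
```

The test set is touched only after `best_degree` has been fixed, which is what keeps `test_error` an honest estimate.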

#### Diagnosing high bias and high variance
We can identify high bias (underfitting) and high variance (overfitting) from the cost on both the training and dev sets. \
A model with **high cost on the training set and high cost on the dev set** is a sign of **underfitting (high bias)** \
A model with **low cost on the training set and high cost on the dev set** is a sign of **overfitting (high variance)** \
**Regularization** also has an effect on the model's overfitting or underfitting. A very high regularization rate leads to high bias and underfits, because the model is punished for increasing the weights; the weights end up close to 0, so f(x) ends up being approximately b, which is a horizontal line. \
On the other hand, a very low regularization rate leads to high variance (overfitting), because the model is barely punished for changing the weights and can bend the function to fit the training set as closely as it wants.
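
To see the effect of the regularization rate on the weights, here is a small sketch using L2-regularized linear regression solved in closed form. The data and the values of `lam` are made up for the illustration, and `lam` here multiplies the squared weights directly (it is not scaled by 1/(2m)), so its magnitude is not comparable to a λ used elsewhere.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
# Synthetic targets (assumption): true weights [2, -1, 0.5] and bias 4.
y = X @ np.array([2.0, -1.0, 0.5]) + 4.0 + rng.normal(scale=0.1, size=50)

def fit_ridge(X, y, lam):
    """Closed-form L2-regularized linear regression; the bias b is not regularized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean                  # centering separates b from w
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b

for lam in [0.0, 1.0, 1e6]:
    w, b = fit_ridge(X, y, lam)
    print(lam, np.round(w, 3), round(float(b), 3))

w_small, b_small = fit_ridge(X, y, 0.0)
w_big, b_big = fit_ridge(X, y, 1e6)
```

With the very large `lam`, the weights shrink towards 0 and the prediction collapses to roughly the constant `b` (close to the mean of `y`), which is the underfitting case described above.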


#### How can we establish a baseline level of performance?
We can judge whether the error is high or not by comparing it to one of the following:

* **Human level performance**
* **Competing algorithms performance**
* **Guess based on experience**
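
Once a baseline is chosen, the gap between baseline and training error hints at bias, and the gap between training and dev error hints at variance. A minimal sketch of that comparison follows; the function name `diagnose`, the `tolerance` threshold, and the error values are all hypothetical.

```python
def diagnose(baseline_error, train_error, dev_error, tolerance=0.02):
    """Label a model from its error gaps; `tolerance` is an arbitrary cutoff."""
    high_bias = (train_error - baseline_error) > tolerance      # far from baseline
    high_variance = (dev_error - train_error) > tolerance       # dev much worse than train
    if high_bias and high_variance:
        return "high bias and high variance"
    if high_bias:
        return "high bias (underfitting)"
    if high_variance:
        return "high variance (overfitting)"
    return "no obvious bias or variance problem"

# Illustrative error rates (made up): baseline, training, dev.
print(diagnose(0.106, 0.108, 0.148))  # high variance (overfitting)
print(diagnose(0.106, 0.150, 0.155))  # high bias (underfitting)
print(diagnose(0.106, 0.150, 0.197))  # high bias and high variance
```

What counts as a "large" gap depends on the application, so the cutoff is a judgment call rather than a fixed rule.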

#### Bias and variance in neural network:
In a neural network, one recipe that often (though not always) works: if you have high bias, increase the network size, by adding either layers or units per layer, which should fix the high bias or at least do better than the smaller network; if you have high variance, train the model with more data. \
However, a larger network is computationally more expensive, which I personally don't favor, and getting more training data can also be challenging, but this is one way of getting rid of high bias or high variance. \
It hardly ever hurts to have a larger neural network so long as you regularize appropriately. The one caveat is that a larger network can slow down your algorithm. So that may be the one way it hurts, but for the most part it shouldn't hurt your algorithm's performance, and it could even help it significantly.


#### Iterative loop of ML development:

1- **Choose architecture**: choose model, data that will be used, etc. \
2- **Train Model**: this is where you train the model. Most of the time you will not get the best result the first time, so you will have to diagnose your model to get better results. \
3- **Diagnostics**: this is where you analyse bias, variance, and errors, and decide what to tweak before going back to step 1.

#### Adding data:
During error analysis in the diagnostics step, you can add more data of the kinds the analysis shows the model struggles with. However, you don't always have to collect new data. \
You can apply *data augmentation* to your current dataset to produce new examples that help your system be more robust (e.g. slightly blur images, lower their contrast, or add background noise to the audio for a speech recognition model). \
You can also try *generating synthetic data*; for example, if you are training a model that recognises letters, you can type letters in different fonts and colours in a notepad, take screenshots of them, and use these as training data too.
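
As a small illustration of image augmentation, the sketch below produces three distorted copies of one grayscale image. The specific distortions (a 2-pixel shift, halved contrast, Gaussian noise) and their strengths are assumptions chosen for the example; in practice they should resemble the variation you expect in real inputs, and the label must stay valid after the distortion.

```python
import numpy as np

def augment(image, rng):
    """Return distorted copies of a grayscale image with values in [0, 1]."""
    shifted = np.roll(image, 2, axis=1)              # shift 2 px right (wraps at the edge)
    low_contrast = 0.5 + 0.5 * (image - 0.5)         # halve contrast around mid-gray
    noisy = np.clip(image + rng.normal(scale=0.05, size=image.shape), 0.0, 1.0)
    return [shifted, low_contrast, noisy]

rng = np.random.default_rng(0)
image = rng.uniform(size=(28, 28))                   # stand-in for a real training image
augmented = augment(image, rng)

# Each augmented copy keeps the label of the original image.
print(len(augmented), augmented[0].shape)
```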

#### Transfer learning:
You can transfer what one model has learned to another model for a related task with the same type of input (e.g. from a model that recognises animals to a model that recognises digits). \
Option 1: If the output classes are the same, you can also pass the output layer's parameters as initial values. \
Option 2: You can pass the hidden layers' parameters as initial values, and create a new output layer that is trained from scratch. \
The steps are the following: \
1- Supervised pretraining: a model is trained on a large dataset for the related task (or you download one that is already trained), and its parameters are passed to your model \
2- Fine tuning: continue training your model on your own data with your learning algorithm to lower the cost

The reason this can work is that models trained on similar inputs tend to learn similar features in their early layers (e.g. detecting edges in image recognition), so the transferred parameters are a good starting point.
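
The parameter hand-over can be sketched with plain NumPy arrays. Everything here is hypothetical: the layer sizes, the `pretrained` dictionary (random numbers standing in for real pretrained weights), and the helper `init_from_pretrained`. A real project would normally use a deep learning framework's own way of loading weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for pretrained parameters: two hidden layers and a 1000-class output layer.
pretrained = {
    "W1": rng.normal(size=(64, 32)), "b1": np.zeros(32),
    "W2": rng.normal(size=(32, 16)), "b2": np.zeros(16),
    "W3": rng.normal(size=(16, 1000)), "b3": np.zeros(1000),
}

def init_from_pretrained(pretrained, n_classes, rng, output_layer="3"):
    """Copy the hidden-layer parameters and create a fresh output layer."""
    params = {name: value.copy()
              for name, value in pretrained.items()
              if not name.endswith(output_layer)}
    n_hidden = pretrained["W" + output_layer].shape[0]
    params["W" + output_layer] = rng.normal(scale=0.01, size=(n_hidden, n_classes))
    params["b" + output_layer] = np.zeros(n_classes)
    return params

new_params = init_from_pretrained(pretrained, n_classes=10, rng=rng)

# During fine tuning you can update only the new output layer, or every layer.
trainable_output_only = ["W3", "b3"]
trainable_all = sorted(new_params)
print({name: value.shape for name, value in new_params.items()})
```

Updating only the output layer is the cheaper choice and tends to suit very small datasets; updating all layers gives the model more freedom when more data is available.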

#### Full cycle of a ML project:

1- **Scope project**: Define project \
2- **Collect data**: Define and collect data \
3- **Train Model**: training, error analysis, iterative improvement \
4- **Deploy to production**: Deploy, monitor and maintain system

#### ML model deployment:
The ML model is hosted on an inference server. The mobile app or website makes an API call to the inference server with the input x, and the inference server replies with the model's prediction.
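
The request/response exchange can be sketched without any networking code. The JSON field names `x` and `prediction`, the function `handle_request`, and the tiny linear "model" are all assumptions made for the illustration; a real deployment would sit behind a web framework and add logging, monitoring, and input validation.

```python
import json

def predict(x):
    """Stand-in model (hypothetical): a fixed linear scorer."""
    weights = [0.5, -0.25]
    bias = 0.1
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def handle_request(body):
    """What the inference server does for one API call: parse x, run the model, reply."""
    x = json.loads(body)["x"]
    return json.dumps({"prediction": predict(x)})

# What the app would send, and what it would get back.
request_body = json.dumps({"x": [2.0, 4.0]})
response_body = handle_request(request_body)
print(response_body)
```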

#### Error metrics for skewed data:
A skewed dataset is one where a large percentage of the examples have label 0, so an algorithm that always predicts 0 still gets a seemingly good accuracy. \
We can use precision and recall to make sure that our algorithm is not just predicting 0 all the time, because accuracy alone won't tell us how good the model is on skewed data.

* **Precision**: the number of true positives over the number of predicted positives, TP / (TP + FP)
* **Recall**: the number of true positives over the number of actual positives, TP / (TP + FN)

These two metrics help us judge how good our algorithm is on skewed data. Neither should be 0: a value of 0 means the number of true positives (TP) is 0, i.e. the algorithm never correctly detects a positive example.
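
A short sketch of why accuracy misleads on skewed data, using made-up labels where only 5% of examples are positive. The helper `precision_recall` is written for this example; treating precision as 0 when nothing is predicted positive is a convention assumed here, since the ratio is otherwise undefined.

```python
import numpy as np

def precision_recall(y_true, y_pred):
    """Compute precision and recall for binary labels (1 = positive)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0   # convention when no predicted positives
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return precision, recall

# Skewed labels: 95 negatives, 5 positives.
y_true = np.array([0] * 95 + [1] * 5)

# A "model" that always predicts 0 looks good on accuracy alone.
always_zero = np.zeros(100, dtype=int)
accuracy = float(np.mean(always_zero == y_true))
print(accuracy, precision_recall(y_true, always_zero))   # 0.95 (0.0, 0.0)

# A model that finds 3 of the 5 positives and raises 1 false alarm.
y_pred = np.zeros(100, dtype=int)
y_pred[[0, 95, 96, 97]] = 1
print(precision_recall(y_true, y_pred))                  # (0.75, 0.6)
```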

#### Tradeoff between precision and recall:
In the logistic regression case, raising the threshold generally leads to higher precision, as we now predict positive only when we are more confident, and to lower recall. On the other hand, lowering the threshold leads to lower precision and higher recall.
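
The tradeoff can be seen by sweeping the threshold over a handful of predicted probabilities. The labels and probabilities below are invented for the illustration; on real data precision usually, but not always, rises as the threshold goes up.

```python
import numpy as np

# Hypothetical true labels and model probabilities for 10 examples.
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 1, 0, 1])
probs = np.array([0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.70, 0.80, 0.95])

def precision_recall_at(threshold):
    """Predict 1 when the probability is at least `threshold`, then score."""
    y_pred = probs >= threshold
    tp = int(np.sum(y_pred & (y_true == 1)))
    predicted_pos = int(np.sum(y_pred))
    actual_pos = int(np.sum(y_true == 1))
    precision = tp / predicted_pos if predicted_pos > 0 else 0.0
    recall = tp / actual_pos
    return precision, recall

results = {t: precision_recall_at(t) for t in (0.3, 0.5, 0.9)}
for threshold, (precision, recall) in results.items():
    print(threshold, round(precision, 3), round(recall, 3))
# 0.3 0.571 1.0
# 0.5 0.6 0.75
# 0.9 1.0 0.25
```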

One way to decide which algorithm (or threshold) to use is the F1 score! \
**F1 score** is the harmonic mean of the precision (P) and the recall (R): 2PR / (P + R)
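
A small sketch of using the F1 score to compare candidates. The three precision/recall pairs are illustrative numbers, and returning 0 when both values are 0 is a convention assumed here to avoid dividing by zero.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical (precision, recall) pairs for three candidate algorithms.
candidates = {"A": (0.5, 0.4), "B": (0.7, 0.1), "C": (0.02, 1.0)}
scores = {name: f1_score(p, r) for name, (p, r) in candidates.items()}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(name, round(score, 3))
# A 0.444
# B 0.175
# C 0.039
print(best)  # A
```

Unlike a plain average, the harmonic mean is pulled towards the smaller of the two values, so candidate C does not win despite its perfect recall.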