#### Cost function for the NN
1. The cost function is the starting point of the learning algorithm that fits the parameters of a neural network to a given training set.

#### Backpropagation Algorithm
"Backpropagation" is neural-network terminology for the algorithm that computes the gradient of the cost function. An optimizer such as gradient descent then uses that gradient to minimize the cost, just as we did in logistic and linear regression. The goal is to find the set of parameters 'theta' that minimizes the cost function J.

<br>
<img src="../img/nn/numerical_estimate_gradient.png" width="500"/>
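The file name of the figure above suggests it illustrates numerical gradient estimation. A minimal sketch of the usual two-sided difference follows, assuming a 1-D parameter vector; the function name `numerical_gradient`, the step size `eps`, and the toy cost are illustrative choices, not taken from the notes.

```python
import numpy as np

def numerical_gradient(cost, theta, eps=1e-4):
    """Two-sided difference estimate of d cost / d theta, one component at a time."""
    grad = np.zeros_like(theta, dtype=float)
    for i in range(theta.size):
        step = np.zeros_like(theta, dtype=float)
        step[i] = eps
        grad[i] = (cost(theta + step) - cost(theta - step)) / (2 * eps)
    return grad

# Toy cost with a known gradient: J(theta) = sum(theta^2), so dJ/dtheta = 2 * theta
theta = np.array([1.0, -2.0, 0.5])
approx = numerical_gradient(lambda t: np.sum(t ** 2), theta)
exact = 2 * theta
print(np.max(np.abs(approx - exact)))  # should be very close to zero
```

The estimate needs two cost evaluations per parameter, which is why it is typically used only to verify backpropagation and not for training itself.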

#### Random Initialization of 'theta'

#### Putting all the pieces together: implementing a neural network learning algorithm
1. Pick a network architecture. Here "architecture" just means the connectivity pattern between the neurons.
2. How many hidden layers to use and how many hidden units to put in each layer are architecture choices. For example, a network with three input units, five hidden units, and four output units, versus a network with three input units and several hidden layers of five units each, and so on.

<br>
<img src="../img/nn/connectivety_pattern.png" width="500"/>

#### How to choose network architecture
1. The number of input units is well defined: once you decide on a fixed set of features x, the number of input units equals the dimension of the feature vector x(i).
2. For multiclass classification, the number of output units is determined by the number of classes. For example, if y takes values between 1 and 10, there are 10 possible classes and therefore 10 output units.
3. For the number of hidden units and hidden layers, a reasonable default is a single hidden layer; this type of network (shown on the left in the figure above) is probably the most common.
4. Usually the number of hidden units in each layer is comparable to the dimension of x, that is, to the number of features.

#### Six steps we need to implement in order to train a neural network

1. Set up the neural network and randomly initialize the weights. We usually initialize the weights to small values near zero.
2. Implement forward propagation so that, for any input x(i), we can compute h(x(i)), the output vector of predicted y values.
3. Implement code to compute the cost function J(theta).
4. Implement backpropagation to compute the partial derivatives of J(theta) with respect to the parameters.
5. Use gradient checking to compare the partial derivatives computed by backpropagation with numerical estimates of the derivatives, and make sure both give very similar values.
6. Use an optimization algorithm such as gradient descent, or one of the advanced optimization methods such as L-BFGS or conjugate gradient (as embodied in fminunc), to minimize J(theta) as a function of the parameters theta.
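The six steps above can be sketched in NumPy for a network with one hidden layer. This is a minimal illustration under stated assumptions, not a reference implementation: the data is random, the cost is the unregularized cross-entropy, the initialization range `eps=0.12` and learning rate `alpha=0.5` are arbitrary choices, and all function names are hypothetical. Plain gradient descent stands in for the advanced optimizers mentioned in step 6.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_weights(n_in, n_out, rng, eps=0.12):
    # Step 1: small random values in [-eps, eps]; the extra column holds bias weights.
    return rng.uniform(-eps, eps, size=(n_out, n_in + 1))

def forward(theta1, theta2, X):
    # Step 2: forward propagation; a column of ones is prepended as the bias unit.
    ones = np.ones((X.shape[0], 1))
    a1 = np.hstack([ones, X])
    a2 = np.hstack([ones, sigmoid(a1 @ theta1.T)])
    h = sigmoid(a2 @ theta2.T)
    return a1, a2, h

def cost(theta1, theta2, X, Y):
    # Step 3: unregularized cross-entropy cost J(theta), averaged over examples.
    h = forward(theta1, theta2, X)[2]
    return -np.sum(Y * np.log(h) + (1 - Y) * np.log(1 - h)) / X.shape[0]

def backprop(theta1, theta2, X, Y):
    # Step 4: partial derivatives of J with respect to theta1 and theta2.
    m = X.shape[0]
    a1, a2, h = forward(theta1, theta2, X)
    delta3 = h - Y
    # Drop the bias column, then multiply by the sigmoid derivative a2 * (1 - a2).
    delta2 = (delta3 @ theta2)[:, 1:] * a2[:, 1:] * (1 - a2[:, 1:])
    return delta2.T @ a1 / m, delta3.T @ a2 / m

def numerical_grad(theta1, theta2, X, Y, eps=1e-4):
    # Step 5 helper: two-sided difference estimate of dJ/dtheta1.
    grad = np.zeros_like(theta1)
    for idx in np.ndindex(*theta1.shape):
        shift = np.zeros_like(theta1)
        shift[idx] = eps
        grad[idx] = (cost(theta1 + shift, theta2, X, Y)
                     - cost(theta1 - shift, theta2, X, Y)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))                   # 20 examples, 3 features (synthetic)
Y = np.eye(4)[rng.integers(0, 4, size=20)]     # one-hot labels for 4 classes
theta1 = init_weights(3, 5, rng)               # 3 inputs -> 5 hidden units
theta2 = init_weights(5, 4, rng)               # 5 hidden units -> 4 outputs

# Step 5: gradient checking, done once before training.
grad1, _ = backprop(theta1, theta2, X, Y)
max_diff = np.max(np.abs(grad1 - numerical_grad(theta1, theta2, X, Y)))

# Step 6: plain gradient descent on J(theta).
initial_cost = cost(theta1, theta2, X, Y)
alpha = 0.5
for _ in range(200):
    grad1, grad2 = backprop(theta1, theta2, X, Y)
    theta1 -= alpha * grad1
    theta2 -= alpha * grad2
final_cost = cost(theta1, theta2, X, Y)
print(max_diff, initial_cost, final_cost)
```

Only `theta1` is gradient-checked here to keep the sketch short; checking `theta2` works the same way.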




### How to improve a Learning Algorithm and improve the selected model

1. Get more training examples
2. Reduce the number of features (for example, keep only the most important ones according to "feature importance")
3. Try getting additional features (collect more data and use it as additional features)
4. Try adding polynomial features (x1^2, x2^2, x1*x2, etc.)
5. Decrease the regularization parameter (lambda)
6. Increase the regularization parameter (lambda)



### Evaluating hypothesis
To evaluate a hypothesis on a given dataset, we can split the data into two sets: a training set and a test set. Typically, the training set consists of 70% of the data and the test set is the remaining 30%.

<br>
<img src="../img/evaluate_learning/test_set_error_calculation.png" width="900"/>
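A minimal sketch of the 70/30 split and the test-error computation, assuming synthetic data and a linear regression fit with `np.polyfit`; the helper name `squared_error` and the half-mean-squared-error form of the cost are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 100
x = rng.uniform(-3, 3, size=m)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=m)   # synthetic data, for illustration only

# Shuffle before splitting so both sets are drawn from the same distribution.
perm = rng.permutation(m)
n_train = int(round(0.7 * m))
train_idx, test_idx = perm[:n_train], perm[n_train:]

# Fit the parameters on the training set only.
coeffs = np.polyfit(x[train_idx], y[train_idx], deg=1)

def squared_error(coeffs, xs, ys):
    """Half mean squared error of the fitted polynomial on (xs, ys)."""
    residual = np.polyval(coeffs, xs) - ys
    return np.sum(residual ** 2) / (2 * len(xs))

j_train = squared_error(coeffs, x[train_idx], y[train_idx])
j_test = squared_error(coeffs, x[test_idx], y[test_idx])
print(j_train, j_test)
```

The test set plays no part in fitting, so `j_test` is an estimate of how well the hypothesis generalizes.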

### Model Selection and Train/Cross Validation/Test Sets

Cross-validation is a technique for evaluating ML models: several models are trained on subsets of the available data and each is evaluated on the complementary subset.
1. Use cross-validation to detect overfitting, i.e., failing to generalize a pattern.


Just because a learning algorithm fits the training set well does not mean it is a good hypothesis. It could overfit, and as a result its predictions on the test set would be poor.
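As an illustration of model selection with separate train, cross-validation, and test sets, the sketch below picks a polynomial degree by its error on the cross-validation set and reports the test error only for the chosen model. It uses a single hold-out validation split, not repeated folds; the 60/20/20 split, the synthetic data, and the candidate degrees are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 200
x = rng.uniform(-1, 1, size=m)
y = np.sin(3 * x) + rng.normal(scale=0.1, size=m)   # synthetic data

# Assumed 60/20/20 split into train / cross-validation / test indices.
perm = rng.permutation(m)
train, cv, test = perm[:120], perm[120:160], perm[160:]

def half_mse(coeffs, idx):
    """Half mean squared error of a fitted polynomial on the rows in idx."""
    return np.mean((np.polyval(coeffs, x[idx]) - y[idx]) ** 2) / 2

degrees = range(1, 9)
# Fit every candidate model on the training set only.
models = {d: np.polyfit(x[train], y[train], deg=d) for d in degrees}
# Compare candidates on the cross-validation set.
cv_errors = {d: half_mse(models[d], cv) for d in degrees}
best_degree = min(cv_errors, key=cv_errors.get)
# The test set is touched once, for the selected model.
test_error = half_mse(models[best_degree], test)
print(best_degree, cv_errors[best_degree], test_error)
```

Choosing the degree on the cross-validation set keeps the test error an honest estimate, because the test data influenced neither the parameters nor the choice of model.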

### Improve the performance of the learning algorithm (Bias "underfit" vs Variance "overfit")

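1. High bias (underfitting): high error on both the training set and the cross-validation set.
2. High variance (overfitting): very low error on the training set, but much higher error on the cross-validation set.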

<img src="../img/evaluate_learning/bias_variance.png" width="900"/>
<br>
<img src="../img/evaluate_learning/combinations_train_cros_val.png" width="900"/>

1. We need to distinguish whether bias or variance is the problem behind the bad predictions.
2. High bias is underfitting and high variance is overfitting. Ideally, we need to find a balance between the two.

<img src="../img/evaluate_learning/bias_variance_diagnostic.png" width="900"/>
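The diagnostic can be written as a small rule of thumb. This is only a sketch: the function name `diagnose`, the `acceptable_error` target, and the `gap_factor` threshold are hypothetical, and sensible values depend on the problem.

```python
def diagnose(j_train, j_cv, acceptable_error, gap_factor=2.0):
    """Classify a model from its training and cross-validation errors."""
    high_train = j_train > acceptable_error      # the model does not even fit the training set
    large_gap = j_cv > gap_factor * j_train      # CV error is much larger than training error
    if high_train and large_gap:
        return "high bias and high variance"
    if high_train:
        return "high bias"
    if large_gap:
        return "high variance"
    return "neither"

print(diagnose(0.30, 0.33, acceptable_error=0.10))  # training and CV error both high
print(diagnose(0.02, 0.30, acceptable_error=0.10))  # low training error, large gap
```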


<br>

### Learning curves
Training an algorithm on very few data points (such as 1, 2 or 3) will easily give zero training error, because we can always find a quadratic curve that passes exactly through that many points.
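A learning curve can be computed by refitting the model on growing prefixes of the training set and recording both errors each time. The sketch below assumes synthetic data and a quadratic hypothesis; the variable names and the 80/40 split are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, size=120)
y = np.sin(3 * x) + rng.normal(scale=0.1, size=120)   # synthetic data
x_train, y_train = x[:80], y[:80]
x_cv, y_cv = x[80:], y[80:]

def half_mse(coeffs, xs, ys):
    return np.mean((np.polyval(coeffs, xs) - ys) ** 2) / 2

degree = 2
sizes = range(degree + 1, len(x_train) + 1)   # at least 3 points for a quadratic
train_err, cv_err = [], []
for m in sizes:
    # Fit on the first m training examples only.
    coeffs = np.polyfit(x_train[:m], y_train[:m], deg=degree)
    # Training error is measured on those same m examples,
    # cross-validation error on the full cross-validation set.
    train_err.append(half_mse(coeffs, x_train[:m], y_train[:m]))
    cv_err.append(half_mse(coeffs, x_cv, y_cv))

print(train_err[0], train_err[-1], cv_err[-1])
```

With three points the quadratic fits them exactly, so the first training error is essentially zero; it grows as more points are added and the curve can no longer pass through all of them.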


### Our decision process can be broken down as follows:
1. Getting more training examples: Fixes high variance
2. Trying smaller sets of features: Fixes high variance
3. Adding features: Fixes high bias
4. Adding polynomial features: Fixes high bias
5. Decreasing λ: Fixes high bias
6. Increasing λ: Fixes high variance
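The effect of λ in the list above can be seen by sweeping it for a regularized linear regression on polynomial features. This sketch assumes synthetic data, a degree-6 polynomial, and a closed-form regularized solution; the helper names and the λ grid are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, size=30)
y = np.sin(3 * x) + rng.normal(scale=0.1, size=30)   # synthetic data
x_tr, y_tr, x_cv, y_cv = x[:20], y[:20], x[20:], y[20:]

def features(xs, degree=6):
    # Columns: 1, x, x^2, ..., x^degree
    return np.vander(xs, degree + 1, increasing=True)

def fit_regularized(X, ys, lam):
    # Closed-form solution of regularized least squares; the bias term is not penalized.
    penalty = lam * np.eye(X.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + penalty, X.T @ ys)

def half_mse(theta, X, ys):
    # Errors are reported without the regularization term.
    return np.mean((X @ theta - ys) ** 2) / 2

X_tr, X_cv = features(x_tr), features(x_cv)
lambdas = [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]
train_errors, cv_errors = [], []
for lam in lambdas:
    theta = fit_regularized(X_tr, y_tr, lam)
    train_errors.append(half_mse(theta, X_tr, y_tr))
    cv_errors.append(half_mse(theta, X_cv, y_cv))

best_lambda = lambdas[int(np.argmin(cv_errors))]
print(best_lambda, train_errors, cv_errors)
```

Training error can only grow as λ increases, since a larger penalty pulls the fit away from the training data (toward high bias). The cross-validation error is what indicates where the trade-off is best.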


### Diagnosing Neural Networks
1. A neural network with ***fewer parameters*** is prone to ***underfitting***. It is also computationally cheaper.
2. A large neural network with ***more parameters*** is prone to ***overfitting***. It is also computationally expensive. In this case you can use regularization (increase λ) to address the overfitting.

Using a single hidden layer is a good starting default. You can train neural networks with different numbers of hidden layers, evaluate each on your cross-validation set, and then select the one that performs best.


### Model Complexity Effects

### Train/Test/CV error calculation

<img src="../img/evaluate_learning/train_test_valid_err.png" width="900"/>

<br>
<img src="../img/evaluate_learning/high_bias.png" width="900"/>
<br>
<img src="../img/evaluate_learning/high_variance.png" width="900"/>




## Questions:
0. You train a learning algorithm, and find that it has unacceptably high error on the test set.  You plot the learning curve, and obtain the figure below.  Is the algorithm suffering from high bias, high variance, or neither?

<br>
<img src="../img/evaluate_learning/high_bias_suffering.png" width="300"/>
<br>

-  High bias


1. In which of the following circumstances is getting more training data likely to significantly help a learning algorithm’s performance?
- Algorithm is suffering from high variance.
- J(CV) (cross-validation error) is much larger than J(train)

2. You train a learning algorithm, and find that it has unacceptably high error on the test set.  You plot the learning curve, and obtain the figure below.  Is the algorithm suffering from high bias, high variance, or neither?
<br>
<img src="../img/evaluate_learning/question_metrics_suffering.png" width="300"/>
<br>

-  High variance



3. Suppose you have implemented regularized logistic regression to classify what object is in an image (i.e., to do object recognition). However, when you test your hypothesis on a new set of images, you find that it makes unacceptably large errors with its predictions on the new images.  However, your hypothesis performs well (has low error) on the training set. Which of the following are promising steps to take? Check all that apply.

NOTE: Since the hypothesis performs well (has low error) on the training set but makes large errors on new images, it is suffering from high variance (overfitting).

- Try increasing the regularization parameter λ.
- Try using a smaller set of features.
- Get more training examples.

"The gap in errors between training and test suggests a high variance problem in which the algorithm has overfit the training set. Increasing the regularization parameter will reduce overfitting and help with the variance problem."


4. Suppose you have implemented regularized logistic regression to predict what items customers will purchase on a web shopping site. However, when you test your hypothesis on a new set of customers, you find that it makes unacceptably large errors in its predictions. Furthermore, the hypothesis performs **poorly** on the training set. Which of the following might be promising steps to take? Check all that apply.

NOTE: Since the hypothesis performs poorly on the training set, it is suffering from high bias (underfitting)

- Try decreasing the regularization parameter λ.
- Try adding polynomial features.
- Try to obtain and use additional features.

"The poor performance on both the training and test sets suggests a high bias problem. Adding more complex features will increase the complexity of the hypothesis, thereby improving the fit to both the train and test data."



NOTE: Regularization parameter λ: increasing λ reduces overfitting and helps with the variance problem.

### ML system design

### Supervised learning example spam/not spam
