# Week 2
## Multivariate Linear Regression
### Feature scaling
When features have very different scales, gradient descent can struggle. The contours of the cost function become elongated, skinny ellipses, so the gradient direction often does not point toward the global minimum and convergence is slow. In contrast, when the features have similar scales, the contours are close to circles and the gradient direction points, more or less, toward the global minimum.

As a result, when gradient descent is used, it is a good idea to scale the features so that they have similar ranges. However, when the analytical solution (the normal equation) is used for multiple linear regression, feature scaling is not necessary.
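One common way to bring features onto similar scales is standardization (subtract the mean, divide by the standard deviation). The sketch below is a minimal illustration; the `standardize` helper and the sample data are hypothetical and not part of the course material.

```python
import numpy as np

def standardize(X):
    """Scale each column to zero mean and unit standard deviation."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)  # leave constant columns unscaled
    return (X - mu) / sigma, mu, sigma

# Hypothetical data: house size in square feet and number of bedrooms.
X = np.array([[2104.0, 3.0],
              [1600.0, 3.0],
              [2400.0, 4.0],
              [1416.0, 2.0]])
X_scaled, mu, sigma = standardize(X)
```

The returned `mu` and `sigma` should be reused to scale any new example before making a prediction.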

## Computing Parameters Analytically
### Gradient descent vs normal equation
Solving the normal equation $\theta = (X^TX)^{-1}X^Ty$ involves computing ${(X^TX)}^{-1}$, whose computational complexity is roughly $O(n^3)$, where $n$ is the number of features. Therefore, when $n$ is large (e.g., $n > 10{,}000$), the normal equation is slow, whereas gradient descent still works well.
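The two approaches can be compared on a small synthetic problem. This is a sketch under assumed settings: the data, the learning rate `alpha`, and the iteration count are illustrative choices, and `np.linalg.solve` is used in place of an explicit inverse.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 200, 3  # m training examples, n features
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])  # intercept column first
true_theta = np.array([1.0, 2.0, -3.0, 0.5])
y = X @ true_theta + 0.01 * rng.normal(size=m)

# Normal equation: solve (X^T X) theta = X^T y without forming the inverse.
theta_ne = np.linalg.solve(X.T @ X, X.T @ y)

# Batch gradient descent on J(theta) = (1 / 2m) * ||X theta - y||^2.
theta_gd = np.zeros(n + 1)
alpha = 0.1  # assumed learning rate
for _ in range(2000):
    grad = X.T @ (X @ theta_gd - y) / m
    theta_gd -= alpha * grad
```

On a problem this small both methods reach essentially the same $\theta$; the cost difference only matters once the number of features grows large.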

### Invertibility of ${(X^TX)}$
This matrix is not invertible (singular) in two scenarios:
1. There are linearly dependent features (collinearity).
2. There are more features than observations/training examples. (Delete features or use regularization to resolve this.)
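Both scenarios can be checked numerically. Since $X^TX$ has the same rank as $X$, it is singular whenever the rank of $X$ is smaller than its number of columns. The matrices below are made-up examples for illustration.

```python
import numpy as np

# Scenario 1: the third column is exactly twice the second (collinear features).
X_collinear = np.array([[1.0, 1.0, 2.0],
                        [1.0, 2.0, 4.0],
                        [1.0, 3.0, 6.0],
                        [1.0, 4.0, 8.0]])

# Scenario 2: 2 training examples but 3 columns.
X_wide = np.array([[1.0, 2.0, 5.0],
                   [1.0, 3.0, 7.0]])

ranks = []
for name, X in (("collinear", X_collinear), ("wide", X_wide)):
    rank = np.linalg.matrix_rank(X)
    ranks.append(rank)
    print(f"{name}: rank {rank} < {X.shape[1]} columns, so X^T X is singular")
```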

# Week 3
## Classification and Representation
### Classification
Linear regression is poorly suited to classification problems. For example, suppose we have a 3-class classification problem with the classes coded as 1, 2 and 3. If we treat these codes as the values of the response variable in a linear regression, we impose an ordering on the three classes and also assume that the difference between neighboring classes is exactly one. In a general classification problem, there is no natural way to convert a qualitative response variable into a unique quantitative one. This is only one of the problems with using linear regression for classification.

### Decision Boundary
The decision boundary can be nonlinear if we add nonlinear (e.g., polynomial) terms of the features to the logistic regression hypothesis. Logistic regression can therefore handle data sets that are not linearly separable.
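As a small illustration, adding squared terms lets logistic regression represent a circular boundary. The parameter values below are hypothetical, chosen so that the boundary is the unit circle $x_1^2 + x_2^2 = 1$; they are not fitted to any data.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x1, x2, theta):
    """Predict class 1 when h_theta(x) >= 0.5, using added squared terms."""
    features = np.array([1.0, x1, x2, x1**2, x2**2])
    return int(sigmoid(features @ theta) >= 0.5)

# Hypothetical parameters: theta^T x = -1 + x1^2 + x2^2.
theta = np.array([-1.0, 0.0, 0.0, 1.0, 1.0])

inside = predict(0.2, 0.3, theta)   # point inside the unit circle
outside = predict(1.5, 0.0, theta)  # point outside the unit circle
```

Points inside the circle are assigned class 0 and points outside it class 1, which no straight-line boundary in $(x_1, x_2)$ could achieve.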

## Logistic Regression Model
### Simplified Cost Function
Because the hypothesis is the logistic function, the squared-error cost is no longer convex in $\theta$. The need for a convex cost function leads to the following cost function:

$J(\theta)=\frac{1}{m}\sum_{i=1}^m \left[-y^{(i)}\log (h_\theta(x^{(i)}))-(1-y^{(i)})\log (1-h_\theta(x^{(i)}))\right]$

The cost approaches infinity as the predicted probability approaches 0 (or 1) while the actual class is 1 (or 0).
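The cost function can be written in a few vectorized lines. This is a minimal sketch: the function names are illustrative, and the clipping constant `eps` is an added safeguard against `log(0)` that is not part of the formula itself.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y):
    """Average cross-entropy cost over the m rows of X."""
    h = sigmoid(X @ theta)
    eps = 1e-12  # assumed guard so that log never receives exactly 0
    h = np.clip(h, eps, 1 - eps)
    return np.mean(-y * np.log(h) - (1 - y) * np.log(1 - h))

X = np.array([[1.0, 0.0],
              [1.0, 1.0]])  # first column is the intercept term
y = np.array([0.0, 1.0])

cost_uninformed = logistic_cost(np.zeros(2), X, y)  # every prediction is 0.5
cost_confident_wrong = logistic_cost(np.array([0.0, -20.0]), X[1:], y[1:])
```

With all parameters at zero every prediction is 0.5 and the cost equals $\log 2$; a confident wrong prediction yields a much larger cost, in line with the limiting behavior described above.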

## Solving the Problem of Overfitting
### Regularized Linear Regression
$L_2$ regularization of linear regression (ridge regression) also has an analytical solution for $\theta$: $\theta=(X^TX+\lambda\,\mathrm{diag}(0,1,1,...,1))^{-1}X^Ty$. The leading 0 in the diagonal matrix leaves the intercept $\theta_0$ unpenalized. With a positive $\lambda$, the matrix $X^TX+\lambda\,\mathrm{diag}(0,1,1,...,1)$ is always invertible, even when there are fewer training examples than features. $L_0$ regularization is NP-hard. $L_1$ regularization is lasso regression.
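The closed-form solution can be sketched as follows, on a deliberately underdetermined problem with fewer training examples than features. The function name, the random data, and the choice $\lambda = 1$ are assumptions made for illustration.

```python
import numpy as np

def ridge_normal_equation(X, y, lam):
    """Solve (X^T X + lam * diag(0, 1, ..., 1)) theta = X^T y."""
    D = np.eye(X.shape[1])
    D[0, 0] = 0.0  # do not penalize the intercept theta_0
    return np.linalg.solve(X.T @ X + lam * D, X.T @ y)

rng = np.random.default_rng(0)
m, n = 5, 10  # fewer training examples than features
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])  # intercept column first
y = rng.normal(size=m)

theta = ridge_normal_equation(X, y, lam=1.0)
```

Here $X^TX$ alone is singular because $X$ has only 5 rows, yet the regularized system still has a unique solution.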

# Week 4
## Motivations
### Nonlinear Hypotheses
The motivating example used in this class to illustrate why classification algorithms like logistic regression do not scale well to nonlinear hypotheses is a computer vision problem in which the computer is asked to identify the object in an image. A small 50x50 image has 2500 pixels, so it has 2500 features if the image is grayscale and 7500 features if it is RGB. If the decision boundary is nonlinear, we need to include nonlinear terms of the original features (e.g., polynomial terms) to capture that boundary, and the number of features then grows very quickly (e.g., the number of quadratic terms is $O(n^2)$). Algorithms like logistic regression therefore become impractical for this kind of problem.
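To make the growth concrete, the number of distinct quadratic terms $x_i x_j$ with $i \le j$ is $n(n+1)/2$. The helper below is an illustrative sketch that evaluates this count for the two image sizes mentioned above.

```python
def num_quadratic_terms(n):
    """Count the distinct products x_i * x_j with i <= j."""
    return n * (n + 1) // 2

for n in (2500, 7500):
    print(n, num_quadratic_terms(n))
# 2500 3126250
# 7500 28128750
```

Even the grayscale case already needs about 3 million quadratic features, before any higher-order terms are considered.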

## Neural Networks
### Model Representation I
In each neuron, the function that converts the inputs to the output is called the activation function; one possible choice is the logistic (sigmoid) function.
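A single neuron with a logistic activation can be sketched as below. The `neuron` helper and the weights are hypothetical; the weights are picked so that the unit roughly behaves like a logical AND of two binary inputs.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, theta):
    """Prepend the bias input 1, take the weighted sum, apply the activation."""
    x_with_bias = np.concatenate(([1.0], x))
    return sigmoid(theta @ x_with_bias)

# Hypothetical weights: bias -30, then 20 for each of the two inputs.
theta = np.array([-30.0, 20.0, 20.0])

out_both_on = neuron(np.array([1.0, 1.0]), theta)  # weighted sum is 10
out_one_on = neuron(np.array([0.0, 1.0]), theta)   # weighted sum is -10
```

The output is close to 1 only when both inputs are 1, showing how the activation function turns a weighted sum into a value between 0 and 1.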