# Logistic Regression

In this exercise, you will build a logistic regression model to predict whether a student gets admitted into a university. Suppose that you are the administrator of a university department and you want to determine each applicant's chance of admission based on their results on two exams. You have historical data from previous applicants that you can use as a training set for logistic regression. For each training example, you have the applicant's scores on two exams and the admissions decision.

Your task is to **build a classification model** that estimates an applicant's probability of admission based on the scores from those two exams.

## Logistic Function

Before you start with the actual cost function, recall that the logistic regression hypothesis is defined as:

$$h_\theta(x) = g(\theta^Tx)$$

where function $g$ is the sigmoid function. The sigmoid function is defined as:

$$g(z) = \frac{1}{1 + e^{-z}}$$
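The definition above can be sketched in a few lines of NumPy. This is a minimal illustration, not a required implementation; the function name `sigmoid` and the sample inputs are choices made here, not fixed by the exercise.

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z)), applied element-wise."""
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

# Evaluate on a few points between -10 and 10 (illustrative inputs only).
z = np.linspace(-10, 10, 5)   # [-10, -5, 0, 5, 10]
values = sigmoid(z)

print(sigmoid(0))             # 0.5, the midpoint of the curve
```

The output always lies strictly between 0 and 1, which is what allows $h_\theta(x)$ to be read as a probability.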

## Cost Function

The cost function for logistic regression is defined as:

$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left[ -y^{(i)} \log(h_{\theta}(x^{(i)})) - (1 - y^{(i)}) \log(1 - h_{\theta}(x^{(i)})) \right]$$
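One possible vectorized version of this formula is sketched below. The toy matrix `X_demo` and labels `y_demo` are made-up values for illustration, not rows from the exercise data, and the function assumes `X` already contains a leading column of ones.

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))

def cost_function(theta, X, y):
    """Average cross-entropy cost J(theta) over the m training examples."""
    y = np.asarray(y, dtype=float).ravel()   # accept (m,) or (m, 1) labels
    h = sigmoid(X @ theta)                   # h_theta(x^(i)) for every row i
    return float(np.mean(-y * np.log(h) - (1.0 - y) * np.log(1.0 - h)))

# Hypothetical toy data: a column of ones plus two made-up exam scores.
X_demo = np.array([[1.0, 35.0, 78.0],
                   [1.0, 60.0, 86.0],
                   [1.0, 30.0, 44.0],
                   [1.0, 79.0, 75.0]])
y_demo = np.array([0.0, 1.0, 0.0, 1.0])

cost_at_zero = cost_function(np.zeros(3), X_demo, y_demo)
print(round(cost_at_zero, 4))  # 0.6931, i.e. ln(2)
```

With $\theta$ set to zeros, every prediction is 0.5, so each term of the sum equals $-\log(0.5) = \ln 2 \approx 0.6931$ regardless of the data.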

## Gradient Descent

The gradient of the cost is a vector of the same length as $\theta$ where the $j$th element (for $j=0, 1,..., n$) is defined as follows:

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} (h_{\theta}(x^{(i)}) - y^{(i)})x_j^{(i)}$$
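The gradient formula can be computed for all $j$ at once with one matrix product. The sketch below is a minimal version under the same assumptions as before: `X` has a leading column of ones, and the toy arrays `X_demo` and `y_demo` are invented for illustration. The function is named `gradient_descent` only to match the name used in this exercise; it returns the gradient vector and does not perform any update steps.

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))

def gradient_descent(theta, X, y):
    """Vector of partial derivatives dJ/dtheta_j, one entry per parameter."""
    y = np.asarray(y, dtype=float).ravel()   # accept (m,) or (m, 1) labels
    errors = sigmoid(X @ theta) - y          # h_theta(x^(i)) - y^(i), shape (m,)
    return X.T @ errors / y.size             # sum_i errors_i * x_j^(i), divided by m

# Hypothetical toy data: a column of ones plus two made-up exam scores.
X_demo = np.array([[1.0, 35.0, 78.0],
                   [1.0, 60.0, 86.0],
                   [1.0, 30.0, 44.0],
                   [1.0, 79.0, 75.0]])
y_demo = np.array([0.0, 1.0, 1.0, 1.0])

grad_at_zero = gradient_descent(np.zeros(3), X_demo, y_demo)
print(grad_at_zero.shape)  # (3,), same length as theta
```

Because $x_0^{(i)} = 1$ for every example, the first entry of the gradient is simply the mean prediction error.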

Rather than hand-coding the gradient descent iterations, you may let SciPy search for the value of $\theta$ that minimizes the cost function. Use the `fmin_tnc` function from `scipy.optimize` (imported here as `opt`), as shown below:

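```python
import scipy.optimize as opt
# Minimize the cost starting from theta, using the analytic gradient
result = opt.fmin_tnc(func=cost_function, x0=theta, fprime=gradient_descent, args=(X, y))
# result[0] holds the optimized theta; evaluate the cost at that point
cost_function(result[0], X, y)
```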

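where `cost_function` is your implementation of the cost function, `gradient_descent` is your implementation of the gradient (despite its name, it must return the gradient vector, not run the descent loop), and `theta` is your initial parameter vector (for this problem, you may initialize $\theta$ with zeros). Finally, `args` receives the (100, 3) feature matrix and the (100, 1) label matrix corresponding to the training set and its true labels. `fmin_tnc` returns a tuple whose first element is the optimized $\theta$, which is why `result[0]` is passed to `cost_function`.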
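To see the call working end to end without the exercise file, the sketch below runs `fmin_tnc` on synthetic data. Everything about the data is an assumption made for illustration: the random scores, the noise level, and the admission rule are invented and will not reproduce the exercise's numbers. Two further choices are deliberate deviations from the plain formulas: `scipy.special.expit` is used as a numerically stable sigmoid, and predictions are clipped before taking logarithms so the cost stays finite. Labels are kept one-dimensional here, because subtracting a (100, 1) array from a (100,) array would broadcast to (100, 100).

```python
import numpy as np
import scipy.optimize as opt
from scipy.special import expit  # numerically stable sigmoid

def cost_function(theta, X, y):
    # Clip predictions away from 0 and 1 so log() stays finite (a safeguard added here).
    h = np.clip(expit(X @ theta), 1e-12, 1 - 1e-12)
    return float(np.mean(-y * np.log(h) - (1 - y) * np.log(1 - h)))

def gradient_descent(theta, X, y):
    # Returns the gradient vector of the cost; fmin_tnc performs the iterations.
    return X.T @ (expit(X @ theta) - y) / y.size

# Synthetic stand-in for the training set (NOT the exercise data).
rng = np.random.default_rng(0)
scores = rng.uniform(30, 100, size=(100, 2))           # two made-up exam scores
noise = rng.normal(0, 15, size=100)                    # keeps the classes overlapping
y = (scores.sum(axis=1) + noise > 130).astype(float)   # 1-D labels, shape (100,)
X = np.column_stack([np.ones(100), scores])            # shape (100, 3)

theta = np.zeros(3)
initial_cost = cost_function(theta, X, y)

# fmin_tnc returns (optimized parameters, number of function evaluations, return code).
theta_opt, n_evals, return_code = opt.fmin_tnc(
    func=cost_function, x0=theta, fprime=gradient_descent, args=(X, y), disp=0)
final_cost = cost_function(theta_opt, X, y)

print(initial_cost, final_cost)
```

Passing `disp=0` only silences the optimizer's progress messages; it does not change the result.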

## TIPS

- To handle the intercept term $\theta_0$ in the same matrix product as the other parameters, prepend a column of ones to the X matrix:

```
[[1, v1, v2],
 [1, v3, v4],
 [1, v5, v6],
 ...
]
```
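One way to build this matrix in NumPy is sketched below; the array `scores` holds made-up values standing in for the two exam-score columns.

```python
import numpy as np

# Hypothetical exam scores, one row per applicant (not taken from data.txt).
scores = np.array([[34.6, 78.0],
                   [60.2, 86.3],
                   [79.0, 75.3]])

# Prepend a column of ones so that theta_0 acts as the intercept in X @ theta.
X = np.column_stack([np.ones(scores.shape[0]), scores])

print(X.shape)  # (3, 3)
```

With this layout, `X @ theta` evaluates $\theta_0 + \theta_1 x_1 + \theta_2 x_2$ for every row in a single product.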



## STEPS

- Load the content of the file data.txt
- Plot the values separated by each class to check the distribution of the values
- Implement the Sigmoid function
- Test the Sigmoid function using values from -10 to 10 and check that it passes through the point (0, 0.5).
- Implement the Cost Function
- Test the Cost Function using $\theta = [0, 0, 0]$: $J(\theta)$ should be approximately 0.6931
- Implement Gradient Descent
- Test Gradient Descent: with the same zero-initialized $\theta$, the gradient should be approximately [-0.1, -12.00921659, -11.26284221]
- Apply the SciPy optimization function: it should return approximately $\theta$ = [-25.16131859, 0.20623159, 0.20147149]
- Classify every example in X with this new value of $\theta$ and count the number of correct predictions
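The last step can be sketched as follows. The helper name `predict`, the 0.5 decision threshold, and the three applicants in `X_demo` with their labels are assumptions made for illustration; only the value of `theta_opt` is taken from the steps above. On the real training set you would pass your full (100, 3) matrix and labels instead.

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))

def predict(theta, X, threshold=0.5):
    """Return 1 where the estimated admission probability reaches the threshold, else 0."""
    return (sigmoid(X @ theta) >= threshold).astype(int)

# Optimized parameters quoted in the steps above.
theta_opt = np.array([-25.16131859, 0.20623159, 0.20147149])

# Hypothetical applicants: column of ones, exam 1 score, exam 2 score.
X_demo = np.array([[1.0, 45.0, 85.0],
                   [1.0, 30.0, 40.0],
                   [1.0, 90.0, 90.0]])
y_demo = np.array([1, 0, 1])  # made-up admission decisions

probabilities = sigmoid(X_demo @ theta_opt)   # estimated admission probabilities
predictions = predict(theta_opt, X_demo)      # hard 0/1 decisions
n_correct = int(np.sum(predictions == y_demo))
accuracy = n_correct / y_demo.size

print(predictions, n_correct, accuracy)
```

For the first hypothetical applicant (scores 45 and 85), the estimated admission probability is about 0.776.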
