# Problem: Logistic regression

In this exercise we consider students who apply to a Master's program at a university. Their chance of being accepted depends on several criteria, among which are the marks they obtained in two exams (called Exam1 and Exam2).

We want to build a simple probabilistic model based only on the results of these two exams. To do so, we use logistic regression to calculate the probability that a student is accepted to the Master's program, given the marks they obtained in each exam:

$$\text{p(accepted)} = \sigma(w_0 + w_1 \cdot \text{Exam1} + w_2 \cdot \text{Exam2}),$$

where $\sigma(x) = \frac{1}{1 + e^{-x}}$ is the sigmoid function.

By analysing the admission outcomes of applicants from previous years, the weights of this model were estimated to be:

$$ w_0 = -44.997, w_1 = 0.36357, w_2 = 0.35662$$
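As a worked illustration with a hypothetical student (Exam1 = 60, Exam2 = 70; these marks are made up and do not come from the data file), the weighted sum and the resulting probability are:

$$ z = w_0 + w_1 \cdot 60 + w_2 \cdot 70 = -44.997 + 21.8142 + 24.9634 = 1.7806 $$

$$ \text{p(accepted)} = \sigma(1.7806) = \frac{1}{1 + e^{-1.7806}} \approx 0.856 $$

so with a threshold of 0.5 this student would be predicted to be accepted.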

In this exercise, we want to estimate the probability that each of 100 new students is accepted to the Master's program, given the marks they obtained in their exams.


### Initialize

```python
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
```

## 0) Questions on the lecture

Explain how the model is trained:

a) Which loss function is used here, and why?

b) Which method is used to minimize the loss function?

c) How are the weights $w_0$, $w_1$ and $w_2$ determined?
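For reference, a minimal sketch of how such a model is commonly trained, assuming the standard choice for logistic regression: the mean binary cross-entropy (negative log-likelihood) minimized by batch gradient descent. The lecture's exact setup (learning rate, stopping criterion, regularization) may differ. The helper names are hypothetical and the data below are synthetic and standardized; with raw exam marks, feature scaling or a much smaller learning rate would likely be needed.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def add_bias(X):
    """Prepend a column of ones so that w[0] plays the role of w_0."""
    return np.column_stack([np.ones(len(X)), X])

def bce_loss(w, Xb, y):
    """Mean binary cross-entropy (negative log-likelihood per sample)."""
    p = sigmoid(Xb @ w)
    eps = 1e-12  # keeps log() finite when p is exactly 0 or 1
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def fit_logistic(X, y, lr=0.1, n_iter=1000):
    """Batch gradient descent on the mean binary cross-entropy."""
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Xb @ w)
        grad = Xb.T @ (p - y) / len(y)  # gradient of the mean BCE w.r.t. w
        w -= lr * grad
    return w

# Synthetic toy data (NOT the exam marks): label 1 when x1 + x2 > 0
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(float)

w_demo = fit_logistic(X_demo, y_demo)
print(w_demo, bce_loss(w_demo, add_bias(X_demo), y_demo))
```

Starting from all-zero weights, every prediction is 0.5 and the loss is $\ln 2$; gradient descent then lowers it step by step.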


## 1) The data
The text file loaded below contains the marks obtained by 100 students in the two exams.

a) Print the first 10 entries of the file.

b) Plot this dataset in a figure with two axes: Exam1 and Exam2.

```python
# Load the file: one row per student, marks separated by commas
from numpy import genfromtxt
student_marks = genfromtxt('marks_nolabels.txt', delimiter=',')
```
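A sketch of what question a) amounts to, assuming the file holds one row per student with Exam1 in the first column and Exam2 in the second (the column order is an assumption). Since the real file is not available here, an in-memory `io.StringIO` with made-up marks stands in for it; `np.genfromtxt` accepts file-like objects as well as file names.

```python
import io
import numpy as np

# Hypothetical stand-in for 'marks_nolabels.txt' (values are made up)
fake_file = io.StringIO("45.0,85.0\n60.0,70.0\n72.5,38.0\n")

marks = np.genfromtxt(fake_file, delimiter=',')
print(marks.shape)  # (3, 2): one row per student, one column per exam
print(marks[:10])   # first 10 entries (only 3 exist in this stand-in)

# Split into the two exams, assuming column 0 = Exam1 and column 1 = Exam2
exam1, exam2 = marks[:, 0], marks[:, 1]
```

With the real data, `student_marks[:10]` gives the first 10 entries, and `plt.scatter(student_marks[:, 0], student_marks[:, 1])` followed by `plt.xlabel('Exam1')` and `plt.ylabel('Exam2')` is a starting point for question b).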


## 2) The model

a) Write two functions, `sigmoid(x)` and `dsig(x)`, that return $\sigma(x)$ and $\frac{\text{d} \sigma}{\text{d} x}(x)$ respectively.

b) Write a function `predict` that returns the probability that each student is accepted by the university.
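One possible implementation, using the identity $\sigma'(x) = \sigma(x)\,(1 - \sigma(x))$. The signature `predict(marks, w)`, with `marks` an array of shape (number of students, 2) and `w = (w_0, w_1, w_2)`, is an assumption, since the exercise does not specify one.

```python
import numpy as np

def sigmoid(x):
    """Logistic function 1 / (1 + exp(-x)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-x))

def dsig(x):
    """Derivative of the sigmoid: sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

def predict(marks, w):
    """Acceptance probability for each row of `marks` (columns: Exam1, Exam2).

    Assumed signature: w = (w_0, w_1, w_2).
    """
    marks = np.asarray(marks, dtype=float)
    z = w[0] + marks @ np.asarray(w[1:])  # w_0 + w_1*Exam1 + w_2*Exam2, per student
    return sigmoid(z)

w = (-44.997, 0.36357, 0.35662)
print(predict([[60.0, 70.0]], w))  # hypothetical student: about 0.856
```

Applied to the loaded data, `predict(student_marks, w)` would return one probability per student.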

## 3) Probabilities

a) Using the values of the weights $w_0, w_1, w_2$ given above, calculate the probability that each student is accepted. Plot the distribution of all probabilities as a histogram.

b) With a threshold of p(accepted) = 0.5, how many students are predicted to be accepted, and how many to be rejected?
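A sketch of the thresholding step with NumPy. `classify` is a hypothetical helper and `p_demo` holds made-up probabilities standing in for the output of a `predict` function. Whether a probability of exactly 0.5 counts as accepted is a convention (here `>=`).

```python
import numpy as np

def classify(p, threshold=0.5):
    """Boolean mask (True = predicted accepted) plus the pass / fail counts."""
    accepted = np.asarray(p) >= threshold
    return accepted, int(accepted.sum()), int((~accepted).sum())

# Made-up probabilities standing in for real predictions
p_demo = np.array([0.05, 0.40, 0.50, 0.86, 0.97])

mask, n_pass, n_fail = classify(p_demo)
print(n_pass, n_fail)  # 3 2

# Histogram counts over [0, 1] in 10 bins (what plt.hist would draw)
counts, edges = np.histogram(p_demo, bins=10, range=(0.0, 1.0))
```

On the real predictions, `plt.hist(p, bins=10, range=(0, 1))` draws the histogram of question a), and the boolean mask can select the red and blue markers in section 4.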

## 4) Results

a) Plot the data in a figure with two axes (Exam1 and Exam2). Use a red marker for students predicted to be rejected and a blue marker for students predicted to be accepted.

b) Modify this figure so that each marker's color depends on the probability value: from dark red for low probabilities to dark blue for high probabilities.

c) Add a line showing the decision boundary separating both classes. Hint: this line corresponds to the points for which p(accepted) = 0.5, that is, students whose weighted sum satisfies $w_0 + w_1 \cdot \text{Exam1} + w_2 \cdot \text{Exam2} = 0$.
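Solving the hint's equation for Exam2 gives the boundary as a straight line, $\text{Exam2} = -(w_0 + w_1 \cdot \text{Exam1}) / w_2$, valid because $w_2 \neq 0$. A NumPy sketch that computes points on it; the function name `decision_boundary` and the plotting range of 30 to 100 marks are assumptions.

```python
import numpy as np

def decision_boundary(exam1, w):
    """Exam2 values on the line w_0 + w_1*Exam1 + w_2*Exam2 = 0 (needs w_2 != 0)."""
    w0, w1, w2 = w
    return -(w0 + w1 * np.asarray(exam1, dtype=float)) / w2

w = (-44.997, 0.36357, 0.35662)
exam1_line = np.linspace(30.0, 100.0, 50)   # assumed range of Exam1 marks
exam2_line = decision_boundary(exam1_line, w)
```

These points can be drawn on top of the scatter plot with `plt.plot(exam1_line, exam2_line, 'k--')`. For question b), one option is `plt.scatter(x, y, c=p, cmap='RdBu', vmin=0, vmax=1)`, since matplotlib's `RdBu` colormap runs from red at low values to blue at high values.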

************************************************

# Exercises

All questions below are independent of each other. 

## 1) Neural network with PyTorch

Look at the PyTorch implementation of a fully connected neural network below (do not try to run the cell).

a) What is this specific NN architecture called? In which cases can it be used?

b) How many weights, including bias terms, need to be determined for each layer, and how many in total for this network?

c) Modify the code by introducing two layers of dimension `hidden_layer2=25` with a sigmoid activation function.
Be careful to respect this specific NN architecture.
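For question b), the counting rule for one fully connected layer with bias terms (as `torch.nn.Linear` has by default) is: a layer mapping $n_\text{in}$ inputs to $n_\text{out}$ outputs has $n_\text{in} \cdot n_\text{out}$ weights plus $n_\text{out}$ biases. A plain-Python sketch; the helper name and the layer sizes are hypothetical, since the network itself is not reproduced here.

```python
def count_parameters(layer_sizes):
    """Weights + biases per layer, and in total, for a fully connected network.

    layer_sizes = [n_input, n_hidden_1, ..., n_output]
    """
    per_layer = [n_in * n_out + n_out  # weight matrix + bias vector
                 for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
    return per_layer, sum(per_layer)

# Hypothetical sizes: 2 inputs, two hidden layers of 25 units, 1 output
per_layer, total = count_parameters([2, 25, 25, 1])
print(per_layer, total)
```

Activation functions such as the sigmoid add no parameters, so only the linear layers enter the count.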

## 2) Implementation of a simple NN with NumPy

Chapter 3 of the lecture notes (pages 54-56) gives an illustration of a NN calculation.

a) Implement this example using **only** the `numpy` library, and check that you obtain the same results as on page 56 after one iteration. To do so, define all the necessary functions, apply the forward and backward passes, etc.

b) Give the results (weights, output value) after a second iteration.
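A minimal NumPy sketch of forward pass, backward pass and update for a small network. It assumes a 2-2-1 layout with bias terms, sigmoid activations, a squared-error loss $\frac{1}{2}(\text{out} - y)^2$ and plain gradient descent; the lecture example may differ on any of these. All numerical values are hypothetical placeholders, not those of pages 54-56, so the printed results will not match the lecture until its values are substituted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, params):
    """Forward pass of a 2-2-1 network with sigmoid activations."""
    W1, b1, W2, b2 = params
    h = sigmoid(W1 @ x + b1)      # hidden activations, shape (2,)
    out = sigmoid(W2 @ h + b2)    # network output, shape (1,)
    return h, out

def loss(out, y):
    """Squared error 0.5 * (out - y)^2 (assumed loss)."""
    return 0.5 * np.sum((out - y) ** 2)

def backward(x, y, params):
    """Backward pass: gradients of the loss w.r.t. every parameter (chain rule)."""
    W1, b1, W2, b2 = params
    h, out = forward(x, params)
    delta2 = (out - y) * out * (1.0 - out)      # dL/dz2, using sigma' = s(1 - s)
    delta1 = (W2.T @ delta2) * h * (1.0 - h)    # dL/dz1, error sent back through W2
    return np.outer(delta1, x), delta1, np.outer(delta2, h), delta2

def step(x, y, params, lr):
    """One gradient-descent update of all parameters."""
    grads = backward(x, y, params)
    return tuple(p - lr * g for p, g in zip(params, grads))

# Hypothetical placeholders: substitute the input, target, initial weights
# and learning rate from the lecture example.
x = np.array([1.0, 0.5])
y = np.array([1.0])
params0 = (np.array([[0.1, 0.3], [0.2, 0.4]]),   # W1 (hidden x input)
           np.array([0.0, 0.0]),                 # b1
           np.array([[0.5, -0.2]]),              # W2 (output x hidden)
           np.array([0.1]))                      # b2
lr = 0.5

params = params0
for it in range(2):
    params = step(x, y, params, lr)
    _, out = forward(x, params)
    print(f"after iteration {it + 1}: output = {out[0]:.6f}, loss = {loss(out, y):.6f}")
    print("  W1 =", params[0].tolist(), " W2 =", params[2].tolist())
```

A finite-difference check of one gradient entry is a cheap way to validate the backward pass before comparing with the lecture's numbers.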

## 3) Regression

Describe the figure below. What is the difference between these models? Which one seems to generalize better?

Also, is this figure an example of linear regression? Why?

<center><img src="fit.png" width="600" /></center>
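On the last question: a model can be nonlinear in its input and still count as linear regression, provided it is linear in its parameters. As an illustration, assuming the curves in the figure are polynomial fits (the figure is not reproduced here, so this is an assumption), a polynomial fit is ordinary least squares on a design matrix of powers of $x$. The toy data below are synthetic.

```python
import numpy as np

# Synthetic data (not the data behind fit.png): samples of an exact quadratic
x = np.linspace(0.0, 1.0, 10)
y = 1.0 + 2.0 * x - 3.0 * x ** 2

deg = 2
A = np.vander(x, N=deg + 1)                          # columns: x**2, x, 1
coef_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)   # ordinary least squares
coef_polyfit = np.polyfit(x, y, deg)                 # same fit via NumPy's helper

print(coef_lstsq)  # approximately [-3, 2, 1], highest power first
```

The model $y = a x^2 + b x + c$ is curved in $x$ but linear in $(a, b, c)$, which is why a single least-squares solve suffices.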
