<a href="https://colab.research.google.com/github/faaabi93/ml-from-scratch/blob/master/Logistic_Regression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Logistic Regression

## Log-Odds

In Linear Regression, each feature value is multiplied by its coefficient, the products are summed, and the intercept is added to give the prediction.  
In Logistic Regression the same weighted sum is computed and the intercept is added, but the result is the log-odds rather than the prediction itself.  

The Log-Odds are another way of expressing the probability of a sample belonging to the *positive* class.  
  
$Odds = \frac{P(event)}{P(not\,event)}$

The odds tell us how many times more likely an event is to occur than not to occur.
E.g. if a student passes an exam with a probability of 0.7, they fail with a probability of $1 - 0.7 = 0.3$.

$Odds\,of\,passing = \frac{0.7}{0.3} = 2.\overline{3}$


The log-odds are the natural logarithm of the odds.

$Log\,odds\,of\,passing = \log(2.\overline{3}) \approx 0.847$
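
The arithmetic of the exam example can be checked with a few lines of numpy. This is only a sketch; the variable names `p_pass`, `odds`, and `log_odds_pass` are chosen here for illustration.

```python
import numpy as np

p_pass = 0.7                    # probability of passing, taken from the example
odds = p_pass / (1 - p_pass)    # odds of passing, roughly 2.33
log_odds_pass = np.log(odds)    # np.log is the natural logarithm, roughly 0.847

print(odds, log_odds_pass)
```

A probability of 0.5 would give odds of 1 and log-odds of 0; probabilities below 0.5 give negative log-odds.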

For the Logistic Regression model, the log-odds (z below) are calculated by multiplying each feature value by its respective coefficient, summing the products, and adding the intercept. This maps the feature values to a measure of how likely it is that a data sample belongs to the positive class.

$z = b_0 + b_1x_1 + ... + b_nx_n$

$b_0$ is the intercept.
$b_1$, $b_2$, ...  $b_n$ are the coefficients of the features $x_1$, $x_2$, ... $x_n$.

This weighted sum is a *dot product*, which can be computed with numpy's `np.dot()` function.  
If we have the feature matrix `features`, coefficient vector `coefficients`, and an `intercept`, we can calculate the log-odds in numpy:

```
log_odds = np.dot(features, coefficients) + intercept
```

### Creating a log_odds function

In [0]:
import numpy as np

def log_odds(features, coefficients, intercept):
  return np.dot(features, coefficients) + intercept
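
A short usage sketch for `log_odds`, with the function repeated so the snippet runs on its own. The data, coefficients, and intercept are made up for illustration and do not come from a fitted model.

```python
import numpy as np

def log_odds(features, coefficients, intercept):
    return np.dot(features, coefficients) + intercept

# Hypothetical data: 3 samples (rows) with 2 features (columns) each
features = np.array([[1.0, 0.0],
                     [2.0, 1.0],
                     [4.0, 3.0]])
coefficients = np.array([0.5, 0.25])  # one made-up coefficient per feature
intercept = -1.0                      # made-up intercept

z = log_odds(features, coefficients, intercept)
print(z)  # one log-odds value per sample
```

Because `features` has shape `(3, 2)` and `coefficients` has shape `(2,)`, `np.dot` returns one value per row, and the scalar intercept is broadcast across all of them.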

## Sigmoid Function

To turn the log-odds into a probability we need the [Sigmoid Function](https://de.wikipedia.org/wiki/Sigmoidfunktion), which produces the characteristic S-shaped curve.

This function is a special case of the more general logistic function.
By plugging the log-odds into the Sigmoid Function, we map the log-odds `z` to a probability in the range `(0, 1)`.

$h(z) = \frac{1}{1 + e^{-z}}$

`e^(-z)` is the exponential function evaluated at `-z`. It can be computed with `np.exp(-z)`.
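
One way to see why the sigmoid returns a probability is to solve the definition of the log-odds for the probability. Here $p$ is a symbol introduced for this sketch and stands for the probability of the positive class:

$z = \log\left(\frac{p}{1 - p}\right)$

$e^{z} = \frac{p}{1 - p}$

$p = \frac{e^{z}}{1 + e^{z}} = \frac{1}{1 + e^{-z}} = h(z)$

So the sigmoid is the inverse of the log-odds transformation: it converts log-odds back into the probability they were derived from.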



In [0]:
def sigmoid(z):
  return 1/(1+ np.exp(-z))
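
As a quick sanity check, the sigmoid should map the log-odds from the exam example (about 0.847) back to a probability of about 0.7. The snippet repeats `sigmoid` so it runs on its own; the input values are chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Log-odds of failing, of a 50/50 event, and of passing (exam example)
z = np.array([-0.847, 0.0, 0.847])
probabilities = sigmoid(z)

print(np.round(probabilities, 2))
```

Log-odds of 0 correspond to a probability of 0.5, and flipping the sign of the log-odds gives the complementary probability.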

## Log Loss 

$-\frac{1}{m}\displaystyle\sum_{i=1}^{m} \left[ y^{(i)} \log{(h(z^{(i)}))} + (1-y^{(i)}) \log{(1 - h(z^{(i)}))} \right]$



*   $m$ is the total number of data samples
*   $y^{(i)}$ is the class of data sample $i$ (1 for the positive class, 0 for the negative class)
*   $z^{(i)}$ is the log-odds of sample $i$
*   $h(z^{(i)})$ is the sigmoid of the log-odds of sample $i$, which is the probability of sample $i$ belonging to the positive class

In Python it looks like this:

In [0]:
def log_loss(probabilities, actual_class):
  # Mean negative log-likelihood over all m samples
  m = actual_class.shape[0]
  return -np.sum(actual_class*np.log(probabilities) + (1-actual_class)*np.log(1-probabilities)) / m
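
The function above breaks down if a predicted probability is exactly 0 or 1, because `np.log(0)` is `-inf`. The sketch below is one possible variant, not the notebook's original function: the `eps` parameter and the clipping step are assumptions added here. It also compares the loss for mostly correct and mostly wrong predictions on made-up data.

```python
import numpy as np

def log_loss(probabilities, actual_class, eps=1e-15):
    # eps is an assumed safeguard: clipping keeps np.log away from log(0)
    p = np.clip(probabilities, eps, 1 - eps)
    return -np.mean(actual_class * np.log(p) + (1 - actual_class) * np.log(1 - p))

actual_class = np.array([1, 0, 1, 0])               # made-up labels
mostly_right = np.array([0.9, 0.1, 0.8, 0.2])       # high probability for the true class
mostly_wrong = np.array([0.1, 0.9, 0.2, 0.8])       # high probability for the wrong class

loss_right = log_loss(mostly_right, actual_class)
loss_wrong = log_loss(mostly_wrong, actual_class)
print(loss_right, loss_wrong)
```

The loss is small when the model assigns high probability to the true class and grows quickly when it is confidently wrong, which is what makes it useful as a training objective.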

## Classification Thresholding

We need a threshold for making a decision.
The default threshold for many algorithms is `0.5`. If the predicted probability is at or above the threshold, the sample is classified as the positive class.

The threshold should be adjustable, however, so that the model can be tuned to the needs of a specific use case.

In [0]:
def predict_class(features, coefficients, intercept, threshold):
  calculated_log_odds = log_odds(features, coefficients, intercept)
  probabilities = sigmoid(calculated_log_odds)
  return np.where(probabilities >= threshold, 1, 0)
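
To illustrate the effect of the threshold, the sketch below classifies the same made-up samples with two different thresholds. The helper functions are repeated in compact form so the snippet runs on its own; the data, coefficient, and intercept are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict_class(features, coefficients, intercept, threshold):
    probabilities = sigmoid(np.dot(features, coefficients) + intercept)
    return np.where(probabilities >= threshold, 1, 0)

# Hypothetical single-feature data with a made-up coefficient and intercept
features = np.array([[1.0], [2.0], [3.0], [4.0]])
coefficients = np.array([1.0])
intercept = -2.5

default_preds = predict_class(features, coefficients, intercept, 0.5)
strict_preds = predict_class(features, coefficients, intercept, 0.8)
print(default_preds, strict_preds)
```

Raising the threshold makes the model more conservative about predicting the positive class: the third sample has a probability of about 0.62, so it is positive at a threshold of 0.5 but negative at 0.8.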