Before diving into **Odds, log(Odds) and the sigmoid function**, we need to have some basic knowledge of **Logistic Regression**, as it uses all of the above mentioned concepts.

# Logistic Regression

- A classification algorithm that classifies objects into one of several given classes
- Don't let the name confuse you: even though it has 'Regression' in it, it is actually a classification algorithm

Here classifying objects into classes essentially means generating a probability of an object belonging to a particular class.


# The Sigmoid function

aka Logistic Function

- It is a mathematical function whose output always lies between 0 and 1
- Because of this, the sigmoid function has several applications, including ML, where its output can be read as a probability
- In this context, it is used to transform a **Linear Regression** function into a **Logistic Regression** function.

The curve looks like:
![image.png](attachment:image.png)

The mathematical notation:
![image-2.png](attachment:image-2.png)
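As a minimal sketch, the standard logistic function 1 / (1 + e^(-z)) can be written in plain Python. The function name `sigmoid` and the sample inputs are choices made here for illustration.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic (sigmoid) function: squashes any real number into the 0-1 range."""
    # Branch on the sign of z so that math.exp is never called on a large
    # positive number (which would overflow).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Large negative inputs approach 0, large positive inputs approach 1,
# and an input of 0 gives exactly 0.5.
for z in (-6, -2, 0, 2, 6):
    print(f"sigmoid({z:+d}) = {sigmoid(z):.4f}")
```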

## Understanding how the logistic function works:

Consider a dataset with
- 1 feature: previous year income
- 1 label: Did the person pay back the loan

|Prev. Income   | Loan Paid  |
|---|---|
| -3 | 0 |
| -2 | 0 |
| -1 | 0 |
| 0 | 0 |
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |

There is a clear relationship between the feature and the label

![image-3.png](attachment:image-3.png)

Here the y-axis is the probability.
- When a new point comes in, we look up where its x value meets the logistic function curve.
- If it meets the curve at y >= 0.5, then we classify it as 1, else 0.

**Example: If someone's income is 1:**
![image-4.png](attachment:image-4.png)

There is a 90% probability that the person will pay off the loan.
So **loan paid off = 1 (Confidence = 90%)**
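The thresholding rule can be sketched in code. The coefficients `BETA_0` and `BETA_1` below are hypothetical values picked only for illustration. They are not fitted to the table, so the probabilities they produce will not match the 90% read off the figure.

```python
import math

# Hypothetical coefficients, chosen only for illustration (not fitted to the table).
BETA_0 = -1.0
BETA_1 = 2.0

def predict_proba(income: float) -> float:
    """Probability of 'loan paid' from the logistic curve."""
    z = BETA_0 + BETA_1 * income
    return 1.0 / (1.0 + math.exp(-z))

def classify(income: float, threshold: float = 0.5) -> int:
    """Return 1 if the curve is at or above the threshold, else 0."""
    return 1 if predict_proba(income) >= threshold else 0

incomes = [-3, -2, -1, 0, 1, 2, 3]
for x in incomes:
    print(f"income={x:+d}  p={predict_proba(x):.3f}  class={classify(x)}")
```

With these assumed coefficients the predicted classes reproduce the labels in the table above (0 for incomes up to 0, 1 for incomes from 1 upward).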


# Mathematics behind Linear -> Logistic Regression

## Odds and Log Odds

What do we need for Logistic regression?
- A function that outputs values between 0 and 1.

So we plug the Linear regression equation into the sigmoid function.
![image-5.png](attachment:image-5.png)

It looks like:
![image-6.png](attachment:image-6.png)

### The question arises: 

> How do we interpret the relationship between $\beta$ and $\hat y$?

## Odds:

The odds of some event happening.

e.g.
1. The odds of the horse winning the race are 100 to 1
2. The odds in favour of my team winning the match are 1 to 4 (or) 1/4
![image-7.png](attachment:image-7.png)

### Definition:
> Odds of an event = **P / (1 - P)**

Probability of the event happening divided by the probability of event not happening.

**Odds of Heads in a coin toss**: 1 / 1

How? 0.5 / (1 - 0.5) = 1 / 1

**What are the odds of 6 appearing in a random throw of a die?**  
1 against 5 (or) 1/5
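A small sketch of the definition P / (1 - P), using exact fractions so the results read the same way as the hand calculations. The helper name `odds` is a choice made here.

```python
from fractions import Fraction

def odds(p: Fraction) -> Fraction:
    """Odds of an event: probability it happens over probability it does not."""
    return p / (1 - p)

coin_heads = odds(Fraction(1, 2))  # fair coin
die_six = odds(Fraction(1, 6))     # fair six-sided die

print("Odds of heads:", coin_heads)
print("Odds of a six:", die_six)
```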

### Note:
> Odds are not probabilities

- If the odds are against something, the value will be between 0 and 1
- If the odds are in favour of something, the value will be > 1

## Why do we need log of odds?

- 0 to 1 => Odds against (losing)
- 1 to inf => Odds in favour (winning)

This asymmetry makes comparisons difficult. The magnitude of unfavourable odds looks far smaller than that of equally favourable odds: odds of 1 to 4 give 0.25, while odds of 4 to 1 give 4.

![image-8.png](attachment:image-8.png)

Taking the log of the odds solves this and makes everything symmetrical.

The log maps odds in the range:
- 0 to 1 => -inf to 0
- 1 to inf => 0 to +inf

It is symmetric around 0:
![image-9.png](attachment:image-9.png)
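The symmetry can be checked numerically. This is a minimal sketch using the natural log. The helper name `log_odds` and the sample probabilities are assumptions made for illustration.

```python
import math

def log_odds(p: float) -> float:
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))

# Complementary probabilities (p and 1 - p) give very different odds,
# but log odds of equal magnitude and opposite sign.
for p in (0.1, 0.2, 0.5, 0.8, 0.9):
    print(f"p={p:.1f}  odds={p / (1 - p):7.3f}  log odds={log_odds(p):+.3f}")
```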

## Modelling log of the odds as linear comb. of features

> ![image-10.png](attachment:image-10.png)

Power of e becomes positive.
![image-11.png](attachment:image-11.png)

Taking log on both sides
![image-12.png](attachment:image-12.png)
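For reference, the steps can be written out in full. This sketch assumes a single feature $x$ with coefficients $\beta_0$ and $\beta_1$; the notation in the images may differ.

$$
\begin{aligned}
\hat y &= \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}} \\
1 - \hat y &= \frac{e^{-(\beta_0 + \beta_1 x)}}{1 + e^{-(\beta_0 + \beta_1 x)}} \\
\frac{\hat y}{1 - \hat y} &= \frac{1}{e^{-(\beta_0 + \beta_1 x)}} = e^{\beta_0 + \beta_1 x} \\
\ln\left(\frac{\hat y}{1 - \hat y}\right) &= \beta_0 + \beta_1 x
\end{aligned}
$$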

### Note: The log of the odds, log(P / (1 - P)), is called the 'logit' function
It is the inverse of the logistic function and forms the basis for it.

The log odds of y-hat will look like a straight line (as in Linear Regression).

## What would the curve look like in terms of log odds of y-hat

On the log-odds scale, the logistic/sigmoid function is a straight line

![image-13.png](attachment:image-13.png)

![image-14.png](attachment:image-14.png)
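A quick numeric check of the straight-line claim, as a sketch with hypothetical coefficients `BETA_0` and `BETA_1` (not fitted to any data). Equal steps in x produce unequal steps in probability but equal steps in log odds.

```python
import math

# Hypothetical coefficients for illustration only (not fitted to any data).
BETA_0, BETA_1 = -1.0, 2.0

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def logit(p: float) -> float:
    """Log odds of a probability p."""
    return math.log(p / (1.0 - p))

xs = [-3, -2, -1, 0, 1, 2, 3]
probs = [sigmoid(BETA_0 + BETA_1 * x) for x in xs]
log_odds = [logit(p) for p in probs]

# Differences between consecutive points, one unit of x apart.
prob_steps = [b - a for a, b in zip(probs, probs[1:])]
log_odds_steps = [b - a for a, b in zip(log_odds, log_odds[1:])]

print("Change in probability per unit x:", [round(s, 4) for s in prob_steps])
print("Change in log odds per unit x:   ", [round(s, 4) for s in log_odds_steps])
```

The log-odds steps all equal `BETA_1` (up to floating-point error), which is what makes the curve a straight line on that scale.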

> Here the coefficients $\beta$ are in terms of change in log odds

Now that the $\beta$ is in terms of log odds of $\hat y$, is it easier to interpret? **Not really**

$\beta$ is a constant change in log odds per unit increase in x, but log odds map to probability non-linearly, so $\beta$ cannot be read as a fixed change in $\hat y$ per unit increase the way it could in linear regression

### Sign of $\beta$

- +ve => increase in likelihood of belonging to class 1 with increase in x
- -ve => decrease in likelihood of belonging to class 1 with increase in x

Magnitudes of the coefficients $\beta$ cannot be interpreted easily, but we can try **comparing magnitudes of the coefficients** to decide which features have the strongest effect on the output prediction.


# Odds ratio & Log odds ratio

##  Odds ratio:

Ratio of odds

Consider 2 odds:
- 2/4 
- 3/1

Ratio of these odds will be:  (2/4)/(3/1) ≈ 0.17

If numerator < denominator: range (0, 1)

If numerator > denominator: range (1, inf)

![image-15.png](attachment:image-15.png)

Just like with odds, taking the log of the odds ratio makes things symmetrical

The log scale is very helpful:
- log((2/4)/(3/1)) = -1.79
- log((3/1)/(2/4)) = +1.79

![image-16.png](attachment:image-16.png)
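The same numbers can be reproduced in a few lines. This is a sketch using the two example odds from above with the natural log; the variable names are choices made here.

```python
import math
from fractions import Fraction

odds_a = Fraction(2, 4)
odds_b = Fraction(3, 1)

ratio = odds_a / odds_b    # numerator < denominator, so the ratio is below 1
inverse = odds_b / odds_a  # swapping the two puts the ratio above 1

print(f"ratio   = {float(ratio):.2f}, log = {math.log(float(ratio)):+.2f}")
print(f"inverse = {float(inverse):.2f}, log = {math.log(float(inverse)):+.2f}")
```

Swapping numerator and denominator changes the ratio from 1/6 to 6, but on the log scale only the sign flips.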

## How is this odds ratio helpful?

The odds ratio can be used to compare the odds of an event in one group with the odds of the same event in another group.

Consider a dataset.

![image-17.png](attachment:image-17.png)

Here we can use odds ratio to determine if there is a relationship between mutated gene and cancer.

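- Compare the odds of cancer for a person with the mutated gene against the odds of cancer for a person without it:
    1. (23/117) / (6/210) = 6.88

The odds ratio is 6.88: reading 23/117 as the odds with the mutated gene and 6/210 as the odds without it, the odds of having cancer are about 6.88 times higher for a person with the mutated gene. log(odds ratio) = 1.93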
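The calculation can be sketched in code. The four counts are the numbers used in the calculation above; which cell of the table each one belongs to (and the variable names) is an assumption based on how the ratio is written.

```python
import math

# Counts from the calculation above. The cell labels are assumed:
# people with the mutated gene, then people without it.
cancer_with_gene, no_cancer_with_gene = 23, 117
cancer_without_gene, no_cancer_without_gene = 6, 210

odds_with_gene = cancer_with_gene / no_cancer_with_gene
odds_without_gene = cancer_without_gene / no_cancer_without_gene

odds_ratio = odds_with_gene / odds_without_gene
log_odds_ratio = math.log(odds_ratio)

print(f"odds with gene    = {odds_with_gene:.3f}")
print(f"odds without gene = {odds_without_gene:.3f}")
print(f"odds ratio        = {odds_ratio:.2f}")
print(f"log(odds ratio)   = {log_odds_ratio:.2f}")
```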

### Odds ratio here indicates a relationship between the mutated gene and cancer

The values of the odds ratio correspond to the magnitude of the effect

If the ratio values are:
- Large (well above 1): mutated gene is a good predictor of cancer
- Close to 1: mutated gene is not such a good predictor of cancer

## To know whether an odds ratio or log odds ratio is statistically significant, we perform any one or all of the following tests:

1. Fisher's Exact test
2. Chi-Square test
3. Wald test
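As one illustration, a Wald test on the log odds ratio can be sketched by hand with the standard library. It uses the usual large-sample standard error for a 2x2 table, the square root of the sum of the reciprocals of the four counts, and a normal approximation for the p-value. The counts reuse the gene example, with the cell labels assumed as before. This is a sketch of the idea, not a substitute for a statistics package.

```python
import math

# Counts from the gene/cancer example (cell labels assumed).
a, b = 23, 117   # with mutated gene: cancer, no cancer
c, d = 6, 210    # without mutated gene: cancer, no cancer

log_odds_ratio = math.log((a / b) / (c / d))

# Large-sample standard error of the log odds ratio.
std_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

# Wald statistic: how many standard errors the estimate is from 0
# (a log odds ratio of 0 means no relationship).
z = log_odds_ratio / std_error

# Two-sided p-value under the standard normal approximation.
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"log odds ratio = {log_odds_ratio:.3f}")
print(f"standard error = {std_error:.3f}")
print(f"z = {z:.2f}, p = {p_value:.2g}")
```

A small p-value suggests the log odds ratio is unlikely to be 0 by chance alone.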