# 2. Logistic Regression

---

## References

[Geeks for Geeks - Logistic Regression in Machine Learning](https://www.geeksforgeeks.org/understanding-logistic-regression/)

[Scikit Learn - LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html)

[StatQuest: Logistic Regression](https://youtu.be/yIYKR4sgzI8?si=k2EGU-u74p-jecQW)

[Google Developer Program](https://developers.google.com/machine-learning/crash-course/logistic-regression/)

---

## Notes

**Characteristics**

- used for classification, most often binary
- maps independent variables (features) to a probability between $0$ and $1$
- outcome should be a discrete value (class $0$ or $1$)
- gives a probability for each class
- extension of linear regression
- uses the sigmoid function

**Equations**
$$z = b + w_1x_1 + w_2x_2 + \cdots + w_nx_n = b + \vec{w}^T\vec{x}$$
$$\hat{y} = \sigma(z)=\frac{1}{1+e^{-z}}$$
$$\text{returns}\, \begin{cases}1&\hat{y}\geq0.5\\0&\hat{y}<0.5\end{cases}$$

- $e^z$: odds, the ratio of the probability of a favorable outcome to that of an unfavorable outcome ($\frac{p}{1-p}$), so $z$ is the log-odds
- $\sigma(z)$: probability ($p$)
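
The equations above can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: the helper names (`sigmoid`, `predict_proba`, `predict`) and the values of `w`, `b`, and `X` are made up for the example.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, w, b):
    """y_hat = sigmoid(b + w^T x) for every row x of X."""
    return sigmoid(X @ w + b)

def predict(X, w, b, threshold=0.5):
    """Class 1 where y_hat >= threshold, otherwise class 0."""
    return (predict_proba(X, w, b) >= threshold).astype(int)

# Illustrative parameters and inputs (assumed, not taken from any dataset)
w = np.array([1.0, -2.0])
b = 0.5
X = np.array([[2.0, 0.0],   # z =  2.5
              [0.0, 1.0],   # z = -1.5
              [1.5, 1.0]])  # z =  0.0, exactly on the decision boundary

z = X @ w + b
p = predict_proba(X, w, b)
labels = predict(X, w, b)   # [1, 0, 1]
odds = p / (1 - p)          # matches e^z up to floating-point error
```

The third row shows the boundary case: $z = 0$ gives $\hat{y} = 0.5$, which the rule above assigns to class $1$.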


**Loss Function**

Binary Cross-Entropy Loss (Log Loss):
$$\ell(\hat{y}_i,y_i) = -y_i\log(\hat{y}_i)-(1-y_i)\log(1-\hat{y}_i)$$
$$J(\vec{w},b)=-\frac{1}{n}\sum\limits_{i=1}^{n}(y_i\log(\hat{y}_i)+(1-y_i)\log(1-\hat{y}_i))$$
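
A small sketch of $J$ in NumPy, to show how the loss behaves. The function name `binary_cross_entropy` and the sample predictions are hypothetical, and the `eps` clipping is an added assumption to avoid $\log(0)$; it is not part of the formula above.

```python
import numpy as np

def binary_cross_entropy(y_hat, y, eps=1e-12):
    """Mean log loss over n samples; eps clipping is an assumed safeguard."""
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y = np.array([1, 0, 1, 0])

# Illustrative predictions (made up for the example)
mostly_right = np.array([0.9, 0.1, 0.8, 0.2])
mostly_wrong = np.array([0.1, 0.9, 0.2, 0.8])
uninformed = np.full(4, 0.5)

loss_right = binary_cross_entropy(mostly_right, y)
loss_wrong = binary_cross_entropy(mostly_wrong, y)
loss_half = binary_cross_entropy(uninformed, y)  # equals log(2)
```

Predicting $0.5$ everywhere costs $\log 2$ per sample; confident correct predictions fall below that, and confident wrong ones are penalized well above it.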

**Derivatives**

\begin{align*}
    \frac{\partial\ell}{\partial \hat{y}} &= -\frac{y_i}{\hat{y}_i}+\frac{1-y_i}{1-\hat{y}_i}\\
    &=\frac{\hat{y}_i-y_i}{\hat{y}_i(1-\hat{y}_i)}\\
    \\
    \frac{\partial \hat{y}}{\partial z} &= \frac{e^{-z}}{(1+e^{-z})^2}\\
    &= \frac{1}{1+e^{-z}}\cdot\left(1-\frac{1}{1+e^{-z}}\right)\\
    &= \hat{y}_i\left(1-\hat{y}_i\right)\\
    \\
    \frac{\partial z}{\partial \vec{w}} &= \vec{x}\\
    \frac{\partial z}{\partial b} &= 1
\end{align*}
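
One way to sanity-check these derivatives is to compare them against central finite differences. The sketch below does this for $\frac{\partial \hat{y}}{\partial z}$ and for the product of the first two results, $\frac{\partial\ell}{\partial z} = \hat{y} - y$. The sample points, the step size `h`, and the helper names are arbitrary choices for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(z, y):
    """Per-sample binary cross-entropy written as a function of z."""
    y_hat = sigmoid(z)
    return -y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat)

z = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])  # arbitrary test points
y = np.array([0.0, 1.0, 1.0, 0.0, 1.0])    # arbitrary labels
h = 1e-6                                    # finite-difference step

# d(y_hat)/dz: numeric estimate vs. y_hat * (1 - y_hat)
numeric_dsigma = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
analytic_dsigma = sigmoid(z) * (1 - sigmoid(z))

# dl/dz: numeric estimate vs. y_hat - y (chain rule of the first two results)
numeric_dl_dz = (loss(z + h, y) - loss(z - h, y)) / (2 * h)
analytic_dl_dz = sigmoid(z) - y

max_err_sigma = np.max(np.abs(numeric_dsigma - analytic_dsigma))
max_err_loss = np.max(np.abs(numeric_dl_dz - analytic_dl_dz))
```

Both errors should be tiny compared with the derivative values themselves, which is what makes the cancellation of $\hat{y}(1-\hat{y})$ in the chain rule easy to trust.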

**Gradients**

\begin{align*}
    \frac{\partial J}{\partial \vec{w}}&=\frac{1}{n}\sum\frac{\partial\ell}{\partial \vec{w}}\\
    &=\frac{1}{n}\sum\left(\frac{\partial\ell}{\partial \hat{y}}\cdot\frac{\partial \hat{y}}{\partial z}\cdot\frac{\partial z}{\partial \vec{w}}\right)\\
    &=\frac{1}{n}\sum\limits_{i=1}^{n}(\hat{y}_i-y_i)\vec{x}_i
\end{align*}

\begin{align*}
    \frac{\partial J}{\partial b}&=\frac{1}{n}\sum\frac{\partial\ell}{\partial b}\\
    &=\frac{1}{n}\sum\left(\frac{\partial\ell}{\partial \hat{y}}\cdot\frac{\partial \hat{y}}{\partial z}\cdot\frac{\partial z}{\partial b}\right)\\
    &=\frac{1}{n}\sum\limits_{i=1}^{n}(\hat{y}_i-y_i)
\end{align*}
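
In vectorized form, the two gradients reduce to a matrix product and a mean. The following is a sketch under the assumption that `X` has shape `(n, d)` with one sample per row; the function name `gradients` and the toy data are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradients(X, y, w, b):
    """Return (dJ/dw, dJ/db) for binary cross-entropy loss."""
    n = X.shape[0]
    error = sigmoid(X @ w + b) - y   # (y_hat - y), shape (n,)
    grad_w = X.T @ error / n         # (1/n) * sum of (y_hat - y) * x
    grad_b = error.mean()            # (1/n) * sum of (y_hat - y)
    return grad_w, grad_b

# Toy data (made up): with w = 0 and b = 0, every y_hat is 0.5
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
y = np.array([0.0, 1.0])
w = np.zeros(2)
b = 0.0

grad_w, grad_b = gradients(X, y, w, b)
# error = [0.5, -0.5], so grad_w = [-0.5, -0.5] and grad_b = 0.0
```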

**Parameter Update**

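$$\vec{w}=\vec{w}-\alpha\left(\frac{1}{n}\sum\limits_{i=1}^{n}(\hat{y}_i-y_i)\vec{x}_i\right)$$
$$b=b-\alpha\left(\frac{1}{n}\sum\limits_{i=1}^{n}(\hat{y}_i-y_i)\right)$$

Here $\alpha$ is the learning rate. Each step subtracts the gradient, so the error term is $(\hat{y}_i-y_i)$, matching the gradients derived above.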
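
Repeating the update gives batch gradient descent, where each step moves the parameters against the gradient of $J$, that is, against the mean of $(\hat{y}-y)\vec{x}$. The sketch below is one possible loop, not a tuned implementation: the function name `fit`, the zero initialization, the defaults `alpha=0.1` and `n_iters=1000`, and the synthetic data are all assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit(X, y, alpha=0.1, n_iters=1000):
    """Batch gradient descent for binary logistic regression."""
    n, d = X.shape
    w = np.zeros(d)   # assumed initialization
    b = 0.0
    for _ in range(n_iters):
        error = sigmoid(X @ w + b) - y    # (y_hat - y)
        w -= alpha * (X.T @ error) / n    # step against dJ/dw
        b -= alpha * error.mean()         # step against dJ/db
    return w, b

# Synthetic, linearly separable data (generated only for this example)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w, b = fit(X, y)
predictions = (sigmoid(X @ w + b) >= 0.5).astype(float)
accuracy = np.mean(predictions == y)
```

Because the labels come from the sign of $x_1 + x_2$, both learned weights should end up positive and the training accuracy should be high.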

---

## Comments

Although multinomial logistic regression exists, only the binomial (binary) model is implemented here.