### Iris
The Iris dataset is a convenient example for running logistic regression, another machine learning algorithm.

In [12]:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

In [13]:
# load iris dataset
iris = load_iris()
X = iris.data
y = iris.target

Logistic regression builds on the linear regression model. In a linear model, we predict a label $\hat{y_i} \in \mathbb{R}$ for each sample $x_i \in D$, where $x_i = (x_{i1}, x_{i2}, \cdots, x_{in})$. Since the model is linear,
$$\hat{y_i} = w_0 + w_1x_{i1} + \cdots + w_nx_{in}$$
By fitting on $D$ and the label set $Y$ with the least squares method (LSM) or gradient descent (BGD, SGD, or mini-batch GD), we obtain the parameter vector $\hat{w} = (w_0,w_1,\cdots,w_n)^T$ used for prediction. Let $\hat{x_i} = (1, x_{i1}, x_{i2},\cdots,x_{in})$, then $\hat{y_i} = \hat{x_i}\hat{w}$. With $m = |D|$ samples in total, let $X$ be
$$
\begin{pmatrix}
1 & x_{11} & x_{12} & \cdots & x_{1n} \\
1 & x_{21} & x_{22} & \cdots & x_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{m1} & x_{m2} & \cdots & x_{mn}
\end{pmatrix}
$$
and $\hat{y}$ be $(\hat{y_1},\hat{y_2},\cdots,\hat{y_m})^T$, then
$$\hat{y} = X\hat{w}$$
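
The matrix form above can be sketched in NumPy. This is only an illustration: the toy data, `true_w`, and the variable names are assumptions rather than part of the Iris example, and `np.linalg.lstsq` stands in for the least squares method mentioned above.

```python
import numpy as np

# Hypothetical toy data: m = 5 samples, n = 2 features (not the Iris data).
rng = np.random.default_rng(0)
features = rng.normal(size=(5, 2))
true_w = np.array([1.0, 2.0, -3.0])  # (w_0, w_1, w_2), chosen for illustration

# Design matrix X: prepend a column of ones so that w_0 acts as the intercept.
X_design = np.hstack([np.ones((features.shape[0], 1)), features])
y_lin = X_design @ true_w            # noise-free labels, y = X w

# Least-squares estimate of w, then the predictions y_hat = X w_hat.
w_hat, *_ = np.linalg.lstsq(X_design, y_lin, rcond=None)
y_hat = X_design @ w_hat
```

Because the toy labels are noise-free, `w_hat` should recover `true_w` up to floating-point error.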

The above is the linear regression model. For logistic regression, however, the label is $y_i \in \{0,1\}$, so how do we map the output onto the new label set?
One method is to use the sigmoid function $\sigma(z) = \frac{1}{1 + \exp{(-z)}}$.
In logistic regression, what we predict is the probability $\hat{y_i} = \sigma(\hat{x_i}\hat{w}) = \frac{1}{1 + \exp(-\hat{x_i}\hat{w})}$, and
$$
y_i =
\begin{cases}
1, & \hat{y_i} \geq 0.5 \\
0, & \hat{y_i} < 0.5
\end{cases}
$$
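
As a minimal sketch of the sigmoid and the 0.5 threshold rule, the following uses hypothetical helper names (`sigmoid`, `predict_label`) and made-up weights. It is an illustration, not the implementation scikit-learn uses.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_label(x_hat, w_hat, threshold=0.5):
    """Return 1 where sigmoid(x_hat @ w_hat) >= threshold, else 0."""
    return (sigmoid(x_hat @ w_hat) >= threshold).astype(int)

# Hypothetical weights and two augmented samples (the leading 1 is the bias term).
w_demo = np.array([-1.0, 2.0])
x_demo = np.array([[1.0, 0.0],   # z = -1, so the probability is below 0.5
                   [1.0, 1.0]])  # z = +1, so the probability is above 0.5
probs = sigmoid(x_demo @ w_demo)
labels = predict_label(x_demo, w_demo)
```
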
Since $\hat{y_i} = p(y_i = 1 \mid \hat{x_i},\hat{w})$ and $\hat{y_i} = \sigma({\hat{x_i}\hat{w}})$, we have
$$
\begin{aligned}
    & \hat{y_i} = \sigma({\hat{x_i}\hat{w}}) \Rightarrow \\
    & \ln{\frac{\hat{y_i}}{1 - \hat{y_i}}} = \hat{x_i}\hat{w}
\end{aligned}
$$
where $1 - \hat{y_i}$ is $p(y_i = 0 \mid \hat{x_i},\hat{w})$, so $\frac{\hat{y_i}}{1 - \hat{y_i}} = \frac{p(y_i = 1 \mid \hat{x_i},\hat{w})}{p(y_i = 0 \mid \hat{x_i},\hat{w})}$ is the odds of $y_i = 1$. The linear score $\hat{x_i}\hat{w}$ is therefore the log-odds.
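
For readers who want the intermediate steps, the log-odds identity follows directly from the definition of $\sigma$:
$$
\begin{aligned}
\hat{y_i} &= \frac{1}{1 + \exp(-\hat{x_i}\hat{w})} \\
1 - \hat{y_i} &= \frac{\exp(-\hat{x_i}\hat{w})}{1 + \exp(-\hat{x_i}\hat{w})} \\
\frac{\hat{y_i}}{1 - \hat{y_i}} &= \frac{1}{\exp(-\hat{x_i}\hat{w})} = \exp(\hat{x_i}\hat{w}) \\
\ln\frac{\hat{y_i}}{1 - \hat{y_i}} &= \hat{x_i}\hat{w}
\end{aligned}
$$
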
Then we use cross-entropy as the loss function. The per-sample log-likelihood is $l_i(\hat{w}) = y_i\ln p(y_i = 1 \mid \hat{x_i},\hat{w}) + (1 - y_i)\ln p(y_i = 0 \mid \hat{x_i},\hat{w})$. The cost function, which adds an L2 regularization term weighted by $\lambda$, is
$$
J(\hat{w}) = -\frac{1}{m}\sum_{i = 1}^{m}l_i(\hat{w}) + \frac{\lambda}{2m}\sum_{j = 0}^{n}w_j^2
$$
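
The cost function can be written out in NumPy roughly as follows. The function name `cost`, the toy arrays, and the clipping constant are assumptions made for this sketch. The penalty sums over all $w_j$, including $w_0$, to match the formula above.

```python
import numpy as np

def cost(w_hat, X_design, y, lam=0.0):
    """Regularized cross-entropy cost J(w) for labels y in {0, 1}."""
    m = X_design.shape[0]
    p = 1.0 / (1.0 + np.exp(-(X_design @ w_hat)))      # p(y_i = 1 | x_i, w)
    p = np.clip(p, 1e-12, 1 - 1e-12)                   # guard against log(0)
    log_lik = y * np.log(p) + (1 - y) * np.log(1 - p)  # per-sample log-likelihood
    penalty = lam / (2 * m) * np.sum(w_hat ** 2)       # sums over j = 0..n
    return -np.mean(log_lik) + penalty

# Hypothetical check: with w = 0 every probability is 0.5, so J = ln 2.
X_toy = np.array([[1.0, 0.5], [1.0, -1.5], [1.0, 2.0]])
y_toy = np.array([1, 0, 1])
j_zero = cost(np.zeros(2), X_toy, y_toy)
```
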

The next step is gradient descent (GD). We look for the minimizer
$$
\hat{w}^* = \operatorname*{arg\,min}_{\hat{w}} J(\hat{w})
$$
by repeatedly applying the update
$$
\hat{w}^{t + 1} = \hat{w}^t - \alpha\frac{\partial J(\hat{w}^t)}{\partial\hat{w}^t}
$$
where $\alpha$ is the learning rate.
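
A minimal batch gradient descent sketch for this cost is shown below. For the sigmoid model the gradient works out to $\frac{1}{m}X^T(\sigma(X\hat{w}) - y) + \frac{\lambda}{m}\hat{w}$. The helper names, the toy data, and the values of `alpha`, `lam`, and `n_iter` are illustrative assumptions, not tuned settings.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient(w_hat, X_design, y, lam=0.0):
    """Gradient of J: (1/m) X^T (sigmoid(X w) - y) + (lambda/m) w."""
    m = X_design.shape[0]
    return X_design.T @ (sigmoid(X_design @ w_hat) - y) / m + lam / m * w_hat

def fit_gd(X_design, y, alpha=0.1, lam=0.0, n_iter=2000):
    """Batch gradient descent starting from w = 0."""
    w_hat = np.zeros(X_design.shape[1])
    for _ in range(n_iter):
        w_hat = w_hat - alpha * gradient(w_hat, X_design, y, lam)
    return w_hat

# Hypothetical 1-D toy problem: the label is 1 when the feature is positive.
x_feat = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
X_gd = np.column_stack([np.ones_like(x_feat), x_feat])
y_gd = (x_feat > 0).astype(int)

w_gd = fit_gd(X_gd, y_gd, lam=0.1)
pred_gd = (sigmoid(X_gd @ w_gd) >= 0.5).astype(int)
```

On this separable toy set the regularization term keeps the weights finite. Without it, the weights would keep growing as the iterations continue.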

In [14]:
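# Keep only classes 0 and 1 so that the problem is binary
X = X[y != 2]
y = y[y != 2]

# Hold out 20% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Default settings apply L2 regularization (C=1.0)
model = LogisticRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)
# A notebook cell displays only its last expression, so only class_report is shown
accuracy
conf_matrix
class_report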

'              precision    recall  f1-score   support\n\n           0       1.00      1.00      1.00        12\n           1       1.00      1.00      1.00         8\n\n    accuracy                           1.00        20\n   macro avg       1.00      1.00      1.00        20\nweighted avg       1.00      1.00      1.00        20\n'
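
To connect the fitted model back to the formulas, the sketch below recomputes the predicted probabilities by hand from `coef_` and `intercept_` and compares them with `predict_proba`. For simplicity it fits on the whole two-class subset rather than the train split used above, and the variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Same two-class subset as above (classes 0 and 1 only).
iris_demo = load_iris()
mask = iris_demo.target != 2
X_bin, y_bin = iris_demo.data[mask], iris_demo.target[mask]

clf = LogisticRegression().fit(X_bin, y_bin)

# Linear score w_0 + x . w, followed by the sigmoid, computed by hand.
z = X_bin @ clf.coef_.ravel() + clf.intercept_[0]
manual_proba = 1.0 / (1.0 + np.exp(-z))

sk_proba = clf.predict_proba(X_bin)[:, 1]         # p(y = 1 | x) from scikit-learn
manual_label = (manual_proba >= 0.5).astype(int)  # 0.5 threshold rule
```
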