# Logistic Regression

## Fundamental Concept

![img](logistic_regression.png)

- Logistic regression is an algorithm that learns the probability of an event occurring.
- It is most often used for binary classification, but it can also handle multiclass problems with three or more classes.

## Algorithm

- The bias w_0 is added to the inner product of the weight vector w and the data x to compute w^T x + w_0.
- As in linear regression, the weight vector w and the bias w_0 are learned from the data.
- Unlike linear regression, the output is a probability, so it must lie between 0 and 1.
- Therefore, the sigmoid function is applied to w^T x + w_0 so that the result is a value between 0 and 1.
- Sigmoid function: f(z) = 1/(1+e^(-z))

![img](sigmoid_function.png)
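
The sigmoid formula above can be sketched directly in NumPy. This is a minimal illustration, not a numerically hardened implementation: the function name `sigmoid` and the sample inputs are chosen here for the example, and very large negative inputs can trigger overflow warnings in `np.exp`.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid f(z) = 1 / (1 + e^(-z)), applied element-wise."""
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative inputs: a negative value, zero, and a positive value.
values = sigmoid([-5.0, 0.0, 5.0])
print(values)  # the middle entry is exactly 0.5 because f(0) = 1 / (1 + 1)
```

Every output lies strictly between 0 and 1, and larger inputs give larger outputs, which is what allows the result to be read as a probability.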

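- The probability p that the label of data x is y = 1 is computed with the sigmoid function as p = f(w^T x + w_0).
- In binary classification, 0.5 is usually used as the threshold on the predicted probability: a probability above the threshold is classified as 1, otherwise as 0.
- During training, the parameters are chosen to minimize the loss function of logistic regression (the log loss, also called cross-entropy), which measures the error between the predicted probabilities and the true labels.
- The minimum of the loss function is found iteratively by computing the gradient of the loss with respect to w and w_0 and updating the parameters in the direction that reduces the loss.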
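
The training procedure described above can be sketched as plain batch gradient descent on the log loss. This is a simplified illustration under several assumptions: the helper names (`log_loss`, `fit_logistic`), the learning rate, the iteration count, the random seed, and the synthetic one-dimensional data are all chosen for this example. Library solvers such as `lbfgs` use more sophisticated optimization and add regularization, so their fitted weights will generally differ.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(X, y, w, w0):
    """Mean cross-entropy between labels y and probabilities f(w^T x + w_0)."""
    p = sigmoid(X @ w + w0)
    eps = 1e-12  # guards against log(0)
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Batch gradient descent on the log loss (hypothetical helper)."""
    w = np.zeros(X.shape[1])
    w0 = 0.0
    for _ in range(n_iter):
        error = sigmoid(X @ w + w0) - y      # p - y for every sample
        w -= lr * (X.T @ error) / len(y)     # gradient of the loss w.r.t. w
        w0 -= lr * error.mean()              # gradient of the loss w.r.t. w_0
    return w, w0

# Synthetic 1-D data (assumed for illustration): two Gaussian clusters.
rng = np.random.default_rng(0)
X = np.r_[rng.normal(3, 1, size=50), rng.normal(-1, 1, size=50)].reshape(-1, 1)
y = np.r_[np.ones(50), np.zeros(50)]

loss_before = log_loss(X, y, np.zeros(1), 0.0)
w, w0 = fit_logistic(X, y)
loss_after = log_loss(X, y, w, w0)

# Apply the 0.5 threshold to turn probabilities into class labels.
y_pred = (sigmoid(X @ w + w0) >= 0.5).astype(int)
print(loss_before, loss_after, np.mean(y_pred == y))
```

With all parameters at zero every prediction is 0.5, so the starting loss is ln 2 (about 0.693); the loop then lowers the loss step by step.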

## Sample Code

In [1]:
import numpy as np
from sklearn.linear_model import LogisticRegression

X_train = np.r_[np.random.normal(3, 1, size = 50),
               np.random.normal(-1, 8, size = 50)].reshape((100, -1))

y_train = np.r_[np.ones(50), np.zeros(50)]

model = LogisticRegression(solver = 'lbfgs')
model.fit(X_train, y_train)
model.predict_proba([[0], [1], [2]])[:, 1]

array([0.45884196, 0.4852828 , 0.51180628])
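
To connect the cell above with the algorithm section, the following sketch recomputes the predicted probabilities by hand from the fitted weight and bias. It reuses the same data-generating parameters as the cell above but adds a fixed seed (an assumption made here for reproducibility), so the numbers will not match the output shown above. `coef_` and `intercept_` are the scikit-learn attributes that hold w and w_0 after fitting.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same distributions as the cell above, with a seeded generator (assumption).
rng = np.random.default_rng(0)
X_train = np.r_[rng.normal(3, 1, size=50),
                rng.normal(-1, 8, size=50)].reshape((100, -1))
y_train = np.r_[np.ones(50), np.zeros(50)]

model = LogisticRegression(solver="lbfgs").fit(X_train, y_train)

X_new = np.array([[0.0], [1.0], [2.0]])
w = model.coef_[0]         # weight vector, shape (n_features,)
w0 = model.intercept_[0]   # bias term

# p = f(w^T x + w_0) with the sigmoid function f
p_manual = 1.0 / (1.0 + np.exp(-(X_new @ w + w0)))
p_sklearn = model.predict_proba(X_new)[:, 1]
print(np.allclose(p_manual, p_sklearn))
```

For a binary problem with default settings, the two arrays should agree, since column 1 of `predict_proba` is the probability of the class labelled 1.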

In [2]:
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()
X = data.data
y = data.target


from sklearn.linear_model import LogisticRegression
model = LogisticRegression(solver = 'lbfgs', max_iter=10000)
model.fit(X, y)
y_pred = model.predict(X)

### Accuracy Evaluation

In [3]:
from sklearn.metrics import accuracy_score
accuracy_score(y, y_pred)

0.9578207381370826
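
The accuracy above is measured on the same data the model was trained on, which tends to be optimistic. As a hedged sketch of one common alternative, the example below holds out part of the data with `train_test_split` and reports accuracy on both portions. The 80/20 split, the stratification, and the seed are assumptions made for this illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Hold out 20% of the samples; the model never sees them during fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

model = LogisticRegression(solver="lbfgs", max_iter=10000)
model.fit(X_train, y_train)

train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```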

### Cross Validation

In [4]:
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
cv = KFold(5, shuffle = True)
cross_val_score(model, X, y, cv = cv, scoring = 'accuracy')

array([0.96491228, 0.96491228, 0.96491228, 0.92982456, 0.96460177])
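
Because `KFold` is created with `shuffle = True` and no seed, the fold scores above change from run to run. The sketch below fixes `random_state` (an assumption added here) and summarizes the five scores by their mean and standard deviation, which is a common way to report cross-validation results.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(solver="lbfgs", max_iter=10000)

# A fixed random_state makes the shuffled folds reproducible.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print(f"mean accuracy: {scores.mean():.3f} (std: {scores.std():.3f})")
```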

## Decision Boundary

- When unseen data is classified by a trained model, the predicted class changes as the input crosses a certain boundary in the data space.
- The boundary at which the classification result changes is called the decision boundary.
- In logistic regression, the decision boundary is where the predicted probability equals 50%.
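
For a model with a single feature, the 50% point can be located in closed form: the sigmoid returns 0.5 exactly when w x + w_0 = 0, so the boundary is x = -w_0 / w. The sketch below checks this on synthetic one-dimensional data; the data, the seed, and the variable names are assumptions made for this illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic 1-D data (assumed): class 1 centered at 3, class 0 at -1.
rng = np.random.default_rng(0)
X_train = np.r_[rng.normal(3, 1, size=50),
                rng.normal(-1, 1, size=50)].reshape(-1, 1)
y_train = np.r_[np.ones(50), np.zeros(50)]

model = LogisticRegression(solver="lbfgs").fit(X_train, y_train)

w = model.coef_[0, 0]
w0 = model.intercept_[0]
boundary = -w0 / w  # solves w * x + w_0 = 0

p_at_boundary = model.predict_proba([[boundary]])[0, 1]
labels = model.predict([[boundary - 0.5], [boundary + 0.5]])
print(boundary, p_at_boundary, labels)
```

Points just below the boundary are assigned class 0 and points just above it class 1, because the fitted weight is positive for this data. With more than one feature, the same condition w^T x + w_0 = 0 describes a line or hyperplane rather than a single point.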