![](https://hleecaster.com/wp-content/uploads/2019/12/logreg01.png)

![](https://hleecaster.com/wp-content/uploads/2019/12/logreg02.png)

### Log-Odds
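* Concept

> In linear regression, each feature value is multiplied by its coefficient and the intercept is added, which yields a predicted value anywhere from -∞ to +∞.

> In logistic regression, the same linear combination produces the log-odds rather than the predicted value itself.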

![Odds](https://hleecaster.com/wp-content/uploads/2019/12/logreg03.png)

![](https://hleecaster.com/wp-content/uploads/2019/12/logreg04.png)

![](https://hleecaster.com/wp-content/uploads/2019/12/logreg05.png)

The log-odds is passed through the sigmoid function, which converts it to a value between 0 and 1.

![Sigmoid function](https://hleecaster.com/wp-content/uploads/2019/12/logreg08.png)
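
As a minimal sketch of the conversion described above, the snippet below turns a probability into odds and log-odds, then recovers the probability with the sigmoid. The helper names `log_odds` and `sigmoid` are illustrative and do not come from any library.

```python
import math

def log_odds(p):
    """Log-odds (logit) of a probability p strictly between 0 and 1."""
    return math.log(p / (1 - p))

def sigmoid(z):
    """Map a log-odds value z back to a probability between 0 and 1."""
    return 1 / (1 + math.exp(-z))

p = 0.8
odds = p / (1 - p)   # about 4: the event is roughly four times as likely as not
z = log_odds(p)      # log of the odds, a value on the (-inf, +inf) scale
print(odds, z, sigmoid(z))
```

A probability of 0.5 corresponds to odds of 1 and log-odds of 0, which is why 0 acts as the decision boundary on the log-odds scale.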

### Log Loss
![](https://hleecaster.com/wp-content/uploads/2019/12/logreg10.png)

![](https://hleecaster.com/wp-content/uploads/2019/12/logreg11.png)

![](https://hleecaster.com/wp-content/uploads/2019/12/logreg12.png)
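
To make the loss concrete, here is a small sketch that assumes the standard binary cross-entropy definition of log loss. The function name `binary_log_loss` and the sample labels and probabilities are made up for illustration.

```python
import math

def binary_log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary log loss; y_true holds 0/1 labels, p_pred holds P(y=1)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never occurs
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

labels = [1, 0, 1, 0]
mostly_right = [0.9, 0.1, 0.8, 0.2]  # high probability on the true class
mostly_wrong = [0.1, 0.9, 0.2, 0.8]  # high probability on the wrong class
print(binary_log_loss(labels, mostly_right))
print(binary_log_loss(labels, mostly_wrong))
```

The second call prints a much larger loss than the first, since confident wrong predictions are penalized heavily.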


![](https://quantifyinghealth.com/wp-content/uploads/2021/05/Logistic-regression-equation.png)

![](http://people.linguistics.mcgill.ca/~morgan/qmld-book/04-CDA_files/figure-html/logodds-1.png)

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
csv_data = pd.read_csv("./Data/fish_all.csv")
csv_data.head()

In [None]:
fish_all = csv_data[["Weight", "Length", "Diagonal", "Height", "Width"]]
fish_target = csv_data["Species"]
print(fish_all.shape, fish_target.shape)

In [None]:
from sklearn.model_selection import train_test_split
train_data, test_data, train_target, test_target = train_test_split(fish_all, fish_target, stratify=fish_target, random_state=42)
print(train_data.shape, test_data.shape)

In [None]:
from sklearn.preprocessing import StandardScaler
ss = StandardScaler()
train_scaled = ss.fit_transform(train_data)

In [None]:
test_scaled = ss.transform(test_data)

In [None]:
import numpy as np

z = np.arange(-5, 5, 0.1)
phi = 1/(1+np.exp(-z))
plt.plot(z, phi)
plt.xlabel("z")
plt.ylabel("phi")
plt.show() 

### Binary Classification

In [None]:
br_sm_indexes = (train_target == "Bream") | (train_target == "Smelt")
train_scaled2 = train_scaled[br_sm_indexes]
train_target2 = train_target[br_sm_indexes]
print(train_scaled2.shape)

In [None]:
br_sm_indexes2 = (test_target == "Bream") | (test_target == "Smelt")
test_scaled2 = test_scaled[br_sm_indexes2]
test_target2 = test_target[br_sm_indexes2]
print(test_scaled2.shape)

In [None]:
from sklearn.linear_model import LogisticRegression

LR_model = LogisticRegression()
LR_model.fit(train_scaled2, train_target2)
print("Training accuracy: ", LR_model.score(train_scaled2, train_target2))
print("Test accuracy: ", LR_model.score(test_scaled2, test_target2))

In [None]:
# Predict on the first five test samples so they can be compared with test_target2[:5] below
print(LR_model.predict(test_scaled2[:5]))

In [None]:
print(test_target2[:5])

In [None]:
print(LR_model.predict_proba(test_scaled2[:5]))

In [None]:
# Output recorded in the original notebook:
# [0.02544183 0.00467123 0.00439073 0.00129744 0.00526059]

In [None]:
print(LR_model.coef_, LR_model.intercept_)

In [None]:
print(LR_model.classes_)

In [None]:
z_value = LR_model.decision_function(test_scaled2[:5])
print(z_value)

In [None]:
from scipy.special import expit
print(expit(z_value))
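
The last two cells rely on the fact that, for a binary model, `decision_function` returns the linear score z and `expit` is the sigmoid. The sketch below repeats that computation by hand. The values in `coef`, `intercept`, and `sample` are hypothetical stand-ins, not numbers taken from the fitted `LR_model`.

```python
import math

# Hypothetical stand-ins for LR_model.coef_[0] and LR_model.intercept_[0]
coef = [-0.4, -0.6, -0.7, -1.0, -0.7]
intercept = -2.2
# Hypothetical standardized sample: Weight, Length, Diagonal, Height, Width
sample = [0.9, 0.8, 0.9, 1.5, 0.9]

# z is the log-odds of the positive class (the second entry of classes_)
z = sum(w * x for w, x in zip(coef, sample)) + intercept
p_positive = 1 / (1 + math.exp(-z))  # same role as expit(z)
print(z, p_positive, 1 - p_positive)
```

scikit-learn sorts class labels, so with "Bream" and "Smelt" the positive class should be "Smelt". A negative z therefore means the sample is predicted as "Bream".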

### Multiclass Classification

In [None]:
# Defaults: C=1 (a larger C means weaker regularization), max_iter=100
model = LogisticRegression(C=20, max_iter=1000)
model.fit(train_scaled, train_target)
print("Training accuracy: ", model.score(train_scaled, train_target))
print("Test accuracy: ", model.score(test_scaled, test_target))

In [None]:
print(model.classes_)
print(model.predict(test_scaled[:5]))

In [None]:
print(test_target[:5])

In [None]:
import numpy as np
proba = model.predict_proba(test_scaled[:5])
print(np.round(proba, decimals=3))

In [None]:
# Output recorded in the original notebook (one row per sample, one column per class):
# [[0.    0.029 0.237 0.003 0.685 0.01  0.035]
#  [0.    0.032 0.576 0.001 0.35  0.003 0.039]
#  [0.    0.062 0.558 0.001 0.336 0.017 0.026]
#  [0.003 0.93  0.001 0.    0.051 0.    0.015]
#  [0.001 0.882 0.004 0.    0.094 0.002 0.017]]

In [None]:
z_value2 = model.decision_function(test_scaled[:5])
print(np.round(z_value2, decimals=3))

In [None]:
from scipy.special import softmax
proba2 = softmax(z_value2, axis=1)
print(np.round(proba2, decimals=3))
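
For reference, the softmax applied to each row above can be sketched in plain Python. The function name `softmax_row` and the scores in `z_row` are hypothetical, chosen only to illustrate seven classes; this is not how `scipy.special.softmax` is implemented.

```python
import math

def softmax_row(scores):
    """Turn a list of per-class scores into probabilities that sum to 1."""
    m = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

z_row = [-6.5, 1.0, 5.2, -2.7, 3.3, 0.3, -0.6]  # hypothetical scores for 7 classes
probs = softmax_row(z_row)
print([round(p, 3) for p in probs])
```

The class with the largest score always receives the largest probability, so taking the argmax of the scores or of the probabilities gives the same prediction.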