## 2-1) Linear models
- Make predictions with a linear function of the input features

- Linear model for regression (hypothesis function)  
    > $\widehat{y} = w[0]*x[0]+w[1]*x[1]+ ... +w[p]*x[p]+b$  
    (also written as $h_\theta(x) = \theta_0+\theta_1 x_1+\theta_2 x_2+ \dots$)

 $x[0], x[1], ... , x[p]$ : the features of a single sample (there are p+1 features)  
 $w, b$ : the parameters the model learns
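As a minimal sketch (the weights, bias and sample values are made up for illustration), the prediction formula is just a dot product between the weight vector and one sample's features, plus the bias; for many samples it becomes a matrix-vector product:

```python
import numpy as np

# Hypothetical parameters for a model with p+1 = 3 features (x[0], x[1], x[2])
w = np.array([0.5, -1.0, 2.0])  # one weight per feature
b = 1.0                         # bias (intercept)

# One sample: y_hat = w[0]*x[0] + w[1]*x[1] + w[2]*x[2] + b
x = np.array([2.0, 3.0, 0.5])
y_hat = np.dot(w, x) + b
print(y_hat)  # 0.5*2 - 1.0*3 + 2.0*0.5 + 1.0 = 0.0

# Several samples at once (rows = samples): the same formula is X @ w + b
X = np.array([[2.0, 3.0, 0.5],
              [1.0, 0.0, 0.0]])
y_hats = X @ w + b
print(y_hats)  # values: 0.0 and 1.5
```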

 - With 1 feature : a line
 - With 2 features : a plane
 - With n features : a hyperplane  
   → **On datasets with many features, linear models can perform very well!**

There are many linear models for regression. They differ in
- how they learn the parameters $w, b$ from the training set
- how they control model complexity

----

### 1. Linear regression  
##### (also called ordinary least squares, OLS)
- The simplest linear algorithm for regression
- Finds the $w, b$ that minimize the mean squared error (MSE) between the training targets and the model's predictions on the training set
  - Mean squared error (MSE): $MSE = {1 \over n}\sum_{i=1}^n (y_i-\widehat{y}_i)^2$

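- $w$ (slope parameter, also called weight or coefficient)  
  - Stored in the coef_ attribute of the fitted model (```model.coef_```)  
  - A NumPy array with one entry per input feature (2-D, of shape `(n_targets, n_features)`, when `y` is passed as a 2-D DataFrame)

- $b$ (offset parameter, also called intercept)  
  - Stored in the intercept_ attribute of the fitted model (```model.intercept_```)  
  - A single float (an array with one entry per target when `y` is 2-D)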

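- Advantages
  - No hyperparameters to tune
- Disadvantages
  - For the same reason, there is no way to control model complexity
  - On high-dimensional datasets the linear model becomes very flexible, which increases the risk of overfitting
  - **→ Use a model whose complexity can be controlled!**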
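As a minimal sketch on synthetic data (the generating rule y = 3·x0 − 2·x1 + 5 plus noise is an assumption for illustration), the OLS fit that `LinearRegression` performs can be reproduced with NumPy's least-squares solver, and the MSE can be computed straight from its definition:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic data: y = 3*x0 - 2*x1 + 5 + small noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + 5 + rng.normal(scale=0.1, size=100)

# OLS by hand: add a column of ones so b is learned as an extra weight,
# then solve min ||X1 @ theta - y||^2
X1 = np.hstack([X, np.ones((X.shape[0], 1))])
theta, *_ = np.linalg.lstsq(X1, y, rcond=None)
w_manual, b_manual = theta[:-1], theta[-1]

# The same fit with scikit-learn
model = LinearRegression().fit(X, y)
print(w_manual, b_manual)
print(model.coef_, model.intercept_)

# Training-set MSE from its definition vs. the scikit-learn helper
y_pred = model.predict(X)
mse_manual = np.mean((y - y_pred) ** 2)
print(mse_manual, mean_squared_error(y, y_pred))
```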

### Linear regression practice

```python
# Import the required modules
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
```

```python
# Load the Boston housing price data from a CSV file
boston_df = pd.read_csv('./data/boston.csv')
boston_df
```

| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | MEDV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.0900 | 1.0 | 296.0 | 15.3 | 396.90 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.90 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7 | 396.90 | 5.33 | 36.2 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 501 | 0.06263 | 0.0 | 11.93 | 0.0 | 0.573 | 6.593 | 69.1 | 2.4786 | 1.0 | 273.0 | 21.0 | 391.99 | 9.67 | 22.4 |
| 502 | 0.04527 | 0.0 | 11.93 | 0.0 | 0.573 | 6.120 | 76.7 | 2.2875 | 1.0 | 273.0 | 21.0 | 396.90 | 9.08 | 20.6 |
| 503 | 0.06076 | 0.0 | 11.93 | 0.0 | 0.573 | 6.976 | 91.0 | 2.1675 | 1.0 | 273.0 | 21.0 | 396.90 | 5.64 | 23.9 |
| 504 | 0.10959 | 0.0 | 11.93 | 0.0 | 0.573 | 6.794 | 89.3 | 2.3889 | 1.0 | 273.0 | 21.0 | 393.45 | 6.48 | 22.0 |


```python
X = boston_df.drop(columns=['MEDV'])
X
```

| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.0900 | 1.0 | 296.0 | 15.3 | 396.90 | 4.98 |
| 1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.90 | 9.14 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 |
| 3 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.63 | 2.94 |
| 4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7 | 396.90 | 5.33 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 501 | 0.06263 | 0.0 | 11.93 | 0.0 | 0.573 | 6.593 | 69.1 | 2.4786 | 1.0 | 273.0 | 21.0 | 391.99 | 9.67 |
| 502 | 0.04527 | 0.0 | 11.93 | 0.0 | 0.573 | 6.120 | 76.7 | 2.2875 | 1.0 | 273.0 | 21.0 | 396.90 | 9.08 |
| 503 | 0.06076 | 0.0 | 11.93 | 0.0 | 0.573 | 6.976 | 91.0 | 2.1675 | 1.0 | 273.0 | 21.0 | 396.90 | 5.64 |
| 504 | 0.10959 | 0.0 | 11.93 | 0.0 | 0.573 | 6.794 | 89.3 | 2.3889 | 1.0 | 273.0 | 21.0 | 393.45 | 6.48 |


```python
y = boston_df[['MEDV']]
y
```

| | MEDV |
|---|---|
| 0 | 24.0 |
| 1 | 21.6 |
| 2 | 34.7 |
| 3 | 33.4 |
| 4 | 36.2 |
| ... | ... |
| 501 | 22.4 |
| 502 | 20.6 |
| 503 | 23.9 |
| 504 | 22.0 |


#### a. Simple linear regression practice

```python
# 1. Predict house prices from the crime rate (CRIM)
X_CRIM = X[['CRIM']]
X_CRIM
```

| | CRIM |
|---|---|
| 0 | 0.00632 |
| 1 | 0.02731 |
| 2 | 0.02729 |
| 3 | 0.03237 |
| 4 | 0.06905 |
| ... | ... |
| 501 | 0.06263 |
| 502 | 0.04527 |
| 503 | 0.06076 |
| 504 | 0.10959 |


```python
X_train, X_test, y_train, y_test = train_test_split(
    X_CRIM, y, test_size=0.2, random_state=5
)
```

```python
lr_model = LinearRegression().fit(X_train, y_train)
```

```python
print("lr_model.coef_:{}".format(lr_model.coef_))
print("lr_model.intercept_:{}".format(lr_model.intercept_))
```

```
lr_model.coef_:[[-0.41546547]]
lr_model.intercept_:[24.12202188]
```


```python
# Evaluate with R^2 (LinearRegression.score returns R^2)
print("training set score: {}".format(lr_model.score(X_train, y_train)))
print("test set score: {}".format(lr_model.score(X_test, y_test)))
```

```
training set score: 0.15130948174423697
test set score: 0.14522288591819743
```


- The $R^2$ is low, but the training and test scores are very close  
  → the model is underfitting

```python
# Evaluate the model with the mean squared error (MSE)
from sklearn.metrics import mean_squared_error

y_test_predict = lr_model.predict(X_test)

mse = mean_squared_error(y_test, y_test_predict)
print("MSE: {}".format(mse))
```

```
MSE: 66.92380714139914
```


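- This is high (the closer the MSE is to 0, the better the predictions). Since MEDV is in units of $1,000, an MSE of about 66.9 means the predictions are typically off by about $\sqrt{66.9} \approx 8.2$, i.e. roughly $8,200.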
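The metric above is the mean squared error, which is in squared target units; its square root, the root mean squared error (RMSE), is back in the units of the target and is easier to interpret. A minimal sketch with made-up values (not the Boston data):

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical true values and predictions; each prediction is off by 2
y_true = np.array([3.0, 5.0, 2.0])
y_pred = np.array([1.0, 7.0, 0.0])

mse = mean_squared_error(y_true, y_pred)  # mean of the squared errors
rmse = np.sqrt(mse)                       # same units as y_true
print(mse, rmse)                          # 4.0 2.0
```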
----

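```python
# 2. Predict house prices from AGE (proportion of owner-occupied units built before 1940)
X_AGE = X[['AGE']]
X_AGE
```

| | AGE |
|---|---|
| 0 | 65.2 |
| 1 | 78.9 |
| 2 | 61.1 |
| 3 | 45.8 |
| 4 | 54.2 |
| ... | ... |
| 501 | 69.1 |
| 502 | 76.7 |
| 503 | 91.0 |
| 504 | 89.3 |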


```python
# train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X_AGE, y, test_size=0.2, random_state=5
)

# Train the model
lr_model_2 = LinearRegression().fit(X_train, y_train)
```

```python
print("lr_model_2.coef_:{}".format(lr_model_2.coef_))
print("lr_model_2.intercept_:{}".format(lr_model_2.intercept_))
```

```
lr_model_2.coef_:[[-0.12402883]]
lr_model_2.intercept_:[31.04617413]
```


```python
# Evaluate with R^2
print("training set score: {}".format(lr_model_2.score(X_train, y_train)))
print("test set score: {}".format(lr_model_2.score(X_test, y_test)))

print("\n")

# Evaluate the model with the mean squared error (MSE)
y_test_predict = lr_model_2.predict(X_test)

mse = mean_squared_error(y_test, y_test_predict)
print("MSE: {}".format(mse))
```

```
training set score: 0.143432785607861
test set score: 0.1334414836868656


MSE: 67.8462187008521
```


- This result is similar to the linear regression on CRIM
------

#### b. Multiple linear regression practice 
##### ($\neq$ polynomial regression)

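```python
# 1. Predict house prices from the full Boston housing data
X.head()
```

| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.09 | 1.0 | 296.0 | 15.3 | 396.9 | 4.98 |
| 1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.9 | 9.14 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 |
| 3 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.63 | 2.94 |
| 4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7 | 396.9 | 5.33 |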


```python
X.shape
```

```
(506, 13)
```

```python
# train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=5)

# Train the model
lr = LinearRegression().fit(X_train, y_train)
```

```python
print("lr.coef_:{}".format(lr.coef_))  # 13 coefficients, one per feature
print("lr.intercept_:{}".format(lr.intercept_))
```

```
lr.coef_:[[-1.30799852e-01  4.94030235e-02  1.09535045e-03  2.70536624e+00
  -1.59570504e+01  3.41397332e+00  1.11887670e-03 -1.49308124e+00
   3.64422378e-01 -1.31718155e-02 -9.52369666e-01  1.17492092e-02
  -5.94076089e-01]]
lr.intercept_:[37.91248701]
```


```python
# Evaluate with R^2
print("training set score: {}".format(lr.score(X_train, y_train)))
print("test set score: {}".format(lr.score(X_test, y_test)))

print("\n")

# Evaluate the model with the mean squared error (MSE)
y_test_predict = lr.predict(X_test)

mse = mean_squared_error(y_test, y_test_predict)
print("MSE: {}".format(mse))
```

```
training set score: 0.738339392059052
test set score: 0.7334492147453087


MSE: 20.86929218377072
```


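- Performance is much better than with the simple linear regression models (test $R^2$ about 0.73 instead of about 0.14, MSE about 20.9 instead of about 67)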

-----

```python
# 2. Predict disease progression from the diabetes data
from sklearn.datasets import load_diabetes

diabetes_dataset = load_diabetes()
```

```python
print('diabetes_dataset.keys():\n{}'.format(diabetes_dataset.keys()))
```

```
diabetes_dataset.keys():
dict_keys(['data', 'target', 'frame', 'DESCR', 'feature_names', 'data_filename', 'target_filename', 'data_module'])
```


```python
diabetes_dataset.DESCR
```

```
'.. _diabetes_dataset:\n\nDiabetes dataset\n----------------\n\nTen baseline variables, age, sex, body mass index, average blood\npressure, and six blood serum measurements were obtained for each of n =\n442 diabetes patients, as well as the response of interest, a\nquantitative measure of disease progression one year after baseline.\n\n**Data Set Characteristics:**\n\n  :Number of Instances: 442\n\n  :Number of Attributes: First 10 columns are numeric predictive values\n\n  :Target: Column 11 is a quantitative measure of disease progression one year after baseline\n\n  :Attribute Information:\n      - age     age in years\n      - sex\n      - bmi     body mass index\n      - bp      average blood pressure\n      - s1      tc, total serum cholesterol\n      - s2      ldl, low-density lipoproteins\n      - s3      hdl, high-density lipoproteins\n      - s4      tch, total cholesterol / HDL\n      - s5      ltg, possibly log of serum triglycerides level\n      - s6      glu, blood sugar
```

```python
diabetes_dataset.data.shape
```

```
(442, 10)
```

```python
diabetes_dataset.feature_names
```

```
['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']
```

```python
# train/test split
X_train, X_test, y_train, y_test = train_test_split(
    diabetes_dataset.data, diabetes_dataset.target, test_size=0.2, random_state=5
)
```

```python
lr = LinearRegression().fit(X_train, y_train)
```

```python
print("lr.coef_:{}".format(lr.coef_))  # 10 coefficients, one per feature
print("lr.intercept_:{}".format(lr.intercept_))
```

```
lr.coef_:[   2.72308829 -255.94291747  522.84096403  353.09406901 -827.60149738
  543.34104068  115.94257227  214.6877495   694.94897032   32.73339672]
lr.intercept_:152.22190213007212
```


```python
# Evaluate with R^2
print("training set score: {}".format(lr.score(X_train, y_train)))
print("test set score: {}".format(lr.score(X_test, y_test)))

print("\n")

# Evaluate the model with the mean squared error (MSE)
y_test_predict = lr.predict(X_test)

mse = mean_squared_error(y_test, y_test_predict)
print("MSE: {}".format(mse))
```

```
training set score: 0.5115517387428321
test set score: 0.5271558947230806


MSE: 2981.5873043126107
```


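- Why is the MSE so large? The MSE is measured in squared target units, and the diabetes target is a disease-progression score ranging from 25 to 346. An MSE of about 2982 corresponds to an RMSE of about $\sqrt{2982} \approx 54.6$, which is large but on the scale of the target.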
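A minimal sketch for checking the scale argument: it repeats the split used above (`test_size=0.2, random_state=5`) with plain arrays instead of the `Bunch` object, then prints the target range and the RMSE next to the MSE.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)

# The target is a disease-progression score, so squared errors are large numbers
print("target range:", y.min(), "to", y.max())
print("target std:", y.std())

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=5)
lr = LinearRegression().fit(X_train, y_train)

mse = mean_squared_error(y_test, lr.predict(X_test))
rmse = np.sqrt(mse)  # back in the target's own units
print("MSE:", mse, "RMSE:", rmse)
```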
----

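#### Bias
- When a model is too simple to learn the relationships in the data, the model is said to have high bias  
#### Variance
- How much the learned model (and its performance) changes from one training set to another  
  → the lower the variance, the more consistent the model's performance
        
#### Bias-Variance Tradeoff
- In general, reducing one of bias and variance tends to increase the other   
  ⇒ Finding a well-balanced model (between underfitting and overfitting) is important in machine learning, because this balance largely determines how well the model performs on new data.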
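A minimal sketch of the tradeoff on toy data (a noisy sine curve, an assumption chosen for illustration, not one of the notes' datasets): a degree-1 polynomial is too simple (high bias), while a high-degree polynomial fits the 20 training points closely but tends to generalize worse (high variance).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Toy data: y = sin(2*pi*x) + noise
rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0, 1, 20)).reshape(-1, 1)
y_train = np.sin(2 * np.pi * x_train).ravel() + rng.normal(scale=0.2, size=20)
x_test = np.sort(rng.uniform(0, 1, 200)).reshape(-1, 1)
y_test = np.sin(2 * np.pi * x_test).ravel() + rng.normal(scale=0.2, size=200)

scores = {}
for degree in (1, 3, 12):
    # Polynomial features of x followed by ordinary least squares
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)
    scores[degree] = (model.score(x_train, y_train), model.score(x_test, y_test))
    print(f"degree {degree:2d}: train R^2 = {scores[degree][0]:.3f}, "
          f"test R^2 = {scores[degree][1]:.3f}")
```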
        
```preprocessing.PolynomialFeatures``` : generates polynomial and interaction features of the input variables up to a given degree

Let's raise the degree of the diabetes features and try polynomial regression.

```python
X = pd.DataFrame(diabetes_dataset.data, columns=diabetes_dataset.feature_names)
X
```

| | age | sex | bmi | bp | s1 | s2 | s3 | s4 | s5 | s6 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.038076 | 0.050680 | 0.061696 | 0.021872 | -0.044223 | -0.034821 | -0.043401 | -0.002592 | 0.019907 | -0.017646 |
| 1 | -0.001882 | -0.044642 | -0.051474 | -0.026328 | -0.008449 | -0.019163 | 0.074412 | -0.039493 | -0.068332 | -0.092204 |
| 2 | 0.085299 | 0.050680 | 0.044451 | -0.005670 | -0.045599 | -0.034194 | -0.032356 | -0.002592 | 0.002861 | -0.025930 |
| 3 | -0.089063 | -0.044642 | -0.011595 | -0.036656 | 0.012191 | 0.024991 | -0.036038 | 0.034309 | 0.022688 | -0.009362 |
| 4 | 0.005383 | -0.044642 | -0.036385 | 0.021872 | 0.003935 | 0.015596 | 0.008142 | -0.002592 | -0.031988 | -0.046641 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 437 | 0.041708 | 0.050680 | 0.019662 | 0.059744 | -0.005697 | -0.002566 | -0.028674 | -0.002592 | 0.031193 | 0.007207 |
| 438 | -0.005515 | 0.050680 | -0.015906 | -0.067642 | 0.049341 | 0.079165 | -0.028674 | 0.034309 | -0.018114 | 0.044485 |
| 439 | 0.041708 | 0.050680 | -0.015906 | 0.017293 | -0.037344 | -0.013840 | -0.024993 | -0.011080 | -0.046883 | 0.015491 |
| 440 | -0.045472 | -0.044642 | 0.039062 | 0.001215 | 0.016318 | 0.015283 | -0.028674 | 0.026560 | 0.044529 | -0.025930 |


```python
y = pd.DataFrame(diabetes_dataset.target, columns=['diabetes'])
y.head()
```

| | diabetes |
|---|---|
| 0 | 151.0 |
| 1 | 75.0 |
| 2 | 141.0 |
| 3 | 206.0 |
| 4 | 135.0 |


```python
from sklearn.preprocessing import PolynomialFeatures

polynomial_transformer = PolynomialFeatures(degree=2)

polynomial_data = polynomial_transformer.fit_transform(X)
features = polynomial_transformer.get_feature_names_out(X.columns)  # names of the generated columns

X_2 = pd.DataFrame(polynomial_data, columns=features)
X_2.head()
```

| | 1 | age | sex | bmi | bp | s1 | s2 | s3 | s4 | s5 | ... | s3^2 | s3 s4 | s3 s5 | s3 s6 | s4^2 | s4 s5 | s4 s6 | s5^2 | s5 s6 | s6^2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 0.038076 | 0.05068 | 0.061696 | 0.021872 | -0.044223 | -0.034821 | -0.043401 | -0.002592 | 0.019907 | ... | 0.001884 | 0.000113 | -0.000864 | 0.000766 | 7e-06 | -5.2e-05 | 4.6e-05 | 0.000396 | -0.000351 | 0.000311 |
| 1 | 1.0 | -0.001882 | -0.044642 | -0.051474 | -0.026328 | -0.008449 | -0.019163 | 0.074412 | -0.039493 | -0.068332 | ... | 0.005537 | -0.002939 | -0.005085 | -0.006861 | 0.00156 | 0.002699 | 0.003641 | 0.004669 | 0.0063 | 0.008502 |
| 2 | 1.0 | 0.085299 | 0.05068 | 0.044451 | -0.00567 | -0.045599 | -0.034194 | -0.032356 | -0.002592 | 0.002861 | ... | 0.001047 | 8.4e-05 | -9.3e-05 | 0.000839 | 7e-06 | -7e-06 | 6.7e-05 | 8e-06 | -7.4e-05 | 0.000672 |
| 3 | 1.0 | -0.089063 | -0.044642 | -0.011595 | -0.036656 | 0.012191 | 0.024991 | -0.036038 | 0.034309 | 0.022688 | ... | 0.001299 | -0.001236 | -0.000818 | 0.000337 | 0.001177 | 0.000778 | -0.000321 | 0.000515 | -0.000212 | 8.8e-05 |
| 4 | 1.0 | 0.005383 | -0.044642 | -0.036385 | 0.021872 | 0.003935 | 0.015596 | 0.008142 | -0.002592 | -0.031988 | ... | 6.6e-05 | -2.1e-05 | -0.00026 | -0.00038 | 7e-06 | 8.3e-05 | 0.000121 | 0.001023 | 0.001492 | 0.002175 |

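Where the number of generated columns comes from: with $n$ input features and degree $d$, `PolynomialFeatures` (including the bias column) produces one column per monomial of total degree at most $d$, i.e. $\binom{n+d}{d}$ columns. For the 10 diabetes features and $d = 2$ that is $1 + 10 + 10 + 45 = 66$ (constant, linear terms, squares, pairwise products). A minimal sketch that checks the count on random data of the same width:

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

n_features, degree = 10, 2  # the diabetes data has 10 features; degree 2 as above

# Number of monomials of total degree <= d in n variables (including the constant 1)
expected = comb(n_features + degree, degree)

# Compare with PolynomialFeatures on random data that has the same number of columns
X_dummy = np.random.default_rng(0).normal(size=(5, n_features))
n_out = PolynomialFeatures(degree).fit_transform(X_dummy).shape[1]
print(expected, n_out)  # 66 66
```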

```python
X_train, X_test, y_train, y_test = train_test_split(
    X_2, y, test_size=0.2, random_state=5
)

lr = LinearRegression().fit(X_train, y_train)
```

```python
print("lr.coef_:{}".format(lr.coef_))
print("lr.intercept_:{}".format(lr.intercept_))
```

```
lr.coef_:[[-1.52588410e-07  9.10306855e+01 -3.06233456e+02  4.67700330e+02
   3.61591007e+02 -5.94543410e+04  5.22843802e+04  2.18398508e+04
  -1.40239903e+02  2.01536451e+04  1.92020823e+01  1.23445725e+03
   2.21027145e+03 -8.69530526e+02  1.40467810e+03  1.28846366e+03
  -7.01495718e+03  5.24801018e+03  7.79811197e+03  8.93143615e+02
   1.23427225e+03 -1.84918528e+00  8.34183469e+02  1.77882929e+03
   4.23987239e+03 -3.18967964e+03 -2.79394336e+03 -5.62500702e+03
  -3.37373891e+01  1.84539784e+03  2.39341257e+02  4.49071207e+03
  -5.58359864e+03  4.88929685e+03  8.65305784e+02 -6.76633805e+02
   1.50419307e+03  1.45750563e+03 -6.13552032e+02  1.69774076e+04
  -1.23305988e+04 -5.07515091e+03 -5.60684946e+02 -6.35004360e+03
  -3.59802309e+03  2.16828276e+04 -2.21304478e+04 -1.09865431e+04
  -1.80396896e+04  1.69625997e+05 -3.86597695e+03  5.95328464e+03
  -1.41636969e+03  4.12871917e+03 -1.52180536e+05  2.59976232e+03
   3.92491797e+03  1.63587654e+04 -6.74261229e+04  5.68761698e+03
```

```python
# Evaluate with R^2
print("training set score: {}".format(lr.score(X_train, y_train)))
print("test set score: {}".format(lr.score(X_test, y_test)))

print("\n")

# Evaluate the model with the mean squared error (MSE)
y_test_predict = lr.predict(X_test)

mse = mean_squared_error(y_test, y_test_predict)
print("MSE: {}".format(mse))
```

```
training set score: 0.6018450281347472
test set score: 0.46853924300468697


MSE: 3351.203130405066
```


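- The training-set $R^2$ went up, but the test-set $R^2$ actually went down   
  → a sign of overfitting (66 features for 353 training samples, with very large coefficients)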
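To see the extreme version of this effect, here is a minimal sketch on pure-noise data (an assumption chosen for illustration; the diabetes polynomial model is not this extreme). Once there are at least as many features as training samples, ordinary least squares can fit the training set perfectly even though there is nothing real to learn, and the test score shows it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_train, n_test, n_features = 30, 100, 40  # more features than training samples

# The target is random noise, unrelated to the features
X_train = rng.normal(size=(n_train, n_features))
y_train = rng.normal(size=n_train)
X_test = rng.normal(size=(n_test, n_features))
y_test = rng.normal(size=n_test)

lr = LinearRegression().fit(X_train, y_train)
train_r2 = lr.score(X_train, y_train)
test_r2 = lr.score(X_test, y_test)
print(f"train R^2 = {train_r2:.3f}, test R^2 = {test_r2:.3f}")
```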
----

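### 2. Ridge regression  
- A linear model for regression that uses the same prediction function as linear regression
- In ridge regression, the weights ($w$) are chosen to
  1) predict the training data well
  2) satisfy an additional constraint << regularization >>
     - keep the magnitude of the weights as small as possible (every $w_j$ as close to 0 as possible)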
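A minimal sketch of what the ridge penalty does, on synthetic data (an assumption for illustration). scikit-learn's `Ridge` minimizes $\lVert y - Xw\rVert^2 + \alpha\lVert w\rVert^2$, i.e. the *sum* rather than the mean of squared errors, so its `alpha` is scaled differently from an MSE-based formula. Without an intercept this has the closed form $w = (X^TX + \alpha I)^{-1}X^Ty$, and the resulting weights are shrunk relative to plain least squares:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)
alpha = 10.0

# Closed-form ridge solution (no intercept): w = (X^T X + alpha*I)^{-1} X^T y
w_closed = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

# scikit-learn's Ridge with the intercept turned off to match the formula
ridge = Ridge(alpha=alpha, fit_intercept=False).fit(X, y)

# Plain least squares for comparison
w_ols = np.linalg.lstsq(X, y, rcond=None)[0]

print("closed form:", w_closed)
print("Ridge      :", ridge.coef_)
print("|w| ridge vs OLS:", np.linalg.norm(w_closed), np.linalg.norm(w_ols))
```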

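### 3. Lasso regression  
- A linear model for regression that uses the same prediction function as linear regression
- In lasso regression, the weights ($w$) are chosen to
  1) predict the training data well
  2) satisfy an additional constraint << regularization >>
     - keep the weights small as measured by the sum of their absolute values; as a result, some weights become exactly 0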

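> #### Regularization : explicitly constraining a model so that it does not overfit
> - **L2 regularization (used in ridge regression)**
>   - Mathematically, ridge penalizes the squared L2 norm of the coefficients
>   - Adds $+\alpha\sum_{j=0}^{p} w_j^2$ to the $MSE$ (sum over the $p+1$ feature weights; the intercept $b$ is not penalized; $\alpha$ controls the strength of the penalty)
> - **L1 regularization (used in lasso regression)**
>   - Mathematically, lasso penalizes the L1 norm of the coefficients
>   - Adds $+\alpha\sum_{j=0}^{p} \left\vert w_j \right\vert$ to the $MSE$ ($\alpha$ controls the strength of the penalty)
>   - **With lasso, some coefficients become exactly 0 ⇒ some features are excluded from the model entirely**  
>     → this can be viewed as automatic feature selection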
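A minimal sketch comparing the two penalties on the diabetes data, with `alpha=1.0` chosen arbitrarily for illustration. Note that in scikit-learn `Lasso` minimizes $\frac{1}{2n}\lVert y - Xw\rVert^2 + \alpha\lVert w\rVert_1$ while `Ridge` uses the un-normalized sum of squared errors, so the same `alpha` value does not mean the same penalty strength in the two models.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso, Ridge

data = load_diabetes()
X, y = data.data, data.target
names = np.array(data.feature_names)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)

# Ridge shrinks the coefficients but keeps them non-zero;
# Lasso sets some of them exactly to 0, dropping those features
print("ridge non-zero coefficients:", np.sum(ridge.coef_ != 0))
print("lasso non-zero coefficients:", np.sum(lasso.coef_ != 0))
print("features kept by lasso:", names[lasso.coef_ != 0])
```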

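- Understanding the effect of regularization  
  1. Examine how the model's coef_ attribute changes with the value of $\alpha$  
  2. Fix $\alpha$ and vary the size of the training data  
     - Learning curve : a plot of model performance as a function of dataset size  
       e.g.) with enough data, the regularization term matters less, so ridge regression and linear regression should end up performing the same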
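A minimal sketch of item 2, using scikit-learn's `learning_curve` on the diabetes data. The choice of `Ridge(alpha=1.0)`, five training sizes and 5-fold cross-validation is an assumption for illustration; the scores could also be plotted with `matplotlib` instead of printed.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import learning_curve

X, y = load_diabetes(return_X_y=True)
train_sizes = np.linspace(0.1, 1.0, 5)  # fractions of the available training data

for name, model in [("LinearRegression", LinearRegression()),
                    ("Ridge(alpha=1)", Ridge(alpha=1.0))]:
    # For each training-set size: fit with 5-fold CV, record train/validation R^2
    sizes, train_scores, val_scores = learning_curve(
        model, X, y, train_sizes=train_sizes, cv=5, scoring="r2"
    )
    for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
        print(f"{name:17s} n={n:3d}  train R^2={tr:.3f}  validation R^2={va:.3f}")
```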

-----

#### c. Ridge / lasso regression practice 
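A minimal sketch of item 1 above (how $\alpha$ changes the coefficients), reusing the degree-2 diabetes features and the same split (`test_size=0.2, random_state=5`). Two assumptions differ from the polynomial-regression cells: the features are standardized first (regularization treats all weights alike, so their scale matters), and the bias column is dropped because the models fit their own intercept. The `alpha` grid is arbitrary.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=5)

models = {"LinearRegression": LinearRegression()}
for alpha in (0.1, 1.0, 10.0):
    models[f"Ridge(alpha={alpha})"] = Ridge(alpha=alpha)
    models[f"Lasso(alpha={alpha})"] = Lasso(alpha=alpha, max_iter=10_000)

results = {}
for name, reg in models.items():
    # degree-2 features -> standardization -> regression model
    pipe = make_pipeline(PolynomialFeatures(degree=2, include_bias=False), StandardScaler(), reg)
    pipe.fit(X_train, y_train)
    coef = pipe[-1].coef_
    results[name] = (pipe.score(X_train, y_train), pipe.score(X_test, y_test), np.sum(coef != 0))
    print(f"{name:18s} train R^2={results[name][0]:.3f} test R^2={results[name][1]:.3f} "
          f"max |w|={np.abs(coef).max():8.1f} non-zero w={results[name][2]}")
```

Typically, larger $\alpha$ values shrink the largest coefficient, and for lasso they also reduce the number of non-zero weights; the exact numbers depend on the data split.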

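## 2-2) Linear models for classification
- Prediction equation  
  $\widehat{y} = w[0]*x[0]+w[1]*x[1]+ ... +w[p]*x[p]+b > 0$  
  (Instead of returning the weighted sum of the features directly, the prediction is compared with a threshold of 0: if it is less than 0, predict class -1; if it is greater than 0, predict class +1)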
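A minimal sketch of this rule with made-up weights and samples: the score is the same linear function as in regression, and only its sign decides the class. (In scikit-learn's linear classifiers this score is what `decision_function` returns.)

```python
import numpy as np

# Hypothetical weights, bias and samples for a 2-feature binary classifier
w = np.array([1.0, -2.0])
b = 0.5
X = np.array([[3.0, 1.0],   # 3 - 2 + 0.5 =  1.5 -> class +1
              [0.0, 1.0]])  # 0 - 2 + 0.5 = -1.5 -> class -1

scores = X @ w + b                     # the same linear function as in regression
y_pred = np.where(scores > 0, 1, -1)   # threshold at 0
print(scores, y_pred)
```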

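- In linear models for regression, the output $\widehat{y}$ is a linear function of the features  
  - (a line, plane, hyperplane, etc.)  
- In linear models for classification, the decision boundary is a linear function of the input
  - (a classifier that separates two classes with a line, plane, or hyperplane) 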

#### 1) Linear models for binary classification  
##### a. Logistic regression  

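- cf) Linear regression : find the *linear function* that best fits the training set
- **Logistic regression** : find the *sigmoid function* (applied to a linear combination of the features) that best fits the training set
- Sigmoid function : $S(x) = \frac{1}{1+e^{-x}}$
- ```from sklearn.linear_model import LogisticRegression```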
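A minimal sketch on synthetic data (generated with `make_classification`, an assumption for illustration) showing that a fitted binary `LogisticRegression` is exactly "linear score, then sigmoid": applying the sigmoid to `decision_function` reproduces the class-1 column of `predict_proba`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    """S(z) = 1 / (1 + e^{-z})"""
    return 1.0 / (1.0 + np.exp(-z))

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
clf = LogisticRegression().fit(X, y)

z = clf.decision_function(X)            # linear score w.x + b for each sample
p_manual = sigmoid(z)                   # squashed into (0, 1)
p_sklearn = clf.predict_proba(X)[:, 1]  # probability of class 1

print(np.allclose(p_manual, p_sklearn))  # True
```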
 
##### b. Support vector machine (SVM) [Link](https://sanghyu.tistory.com/7)
- SVM decision rule : find the linear decision boundary (hyperplane), orthogonal to the weight vector w, with the largest margin
  - support vectors : the outermost samples of each of the two classes, i.e. the samples closest to the decision boundary  
  - margin (m) : the distance between the two classes, measured through the support vectors  
    → this must be maximized **<margin optimization>**  
- ```from sklearn.svm import LinearSVC```
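A minimal sketch on two synthetic, well-separated clusters (an assumption for illustration). It uses `SVC(kernel="linear")` rather than `LinearSVC` because `SVC` exposes the fitted `support_vectors_`; `LinearSVC` uses a different solver and, by default, the squared hinge loss, so its weights differ slightly. A very large `C` approximates the hard-margin SVM, for which the margin width is $2/\lVert w \rVert$.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=40, centers=[[-2, -2], [2, 2]], cluster_std=0.5, random_state=0)

clf = SVC(kernel="linear", C=1000).fit(X, y)

w = clf.coef_[0]                 # normal vector of the boundary w.x + b = 0
b = clf.intercept_[0]
margin = 2 / np.linalg.norm(w)   # distance between w.x + b = +1 and w.x + b = -1

print("support vectors:\n", clf.support_vectors_)
print("their decision values (close to +1 / -1):", clf.decision_function(clf.support_vectors_))
print("margin width:", margin)
```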

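#### 2) Linear models for multiclass classification
- Apart from logistic regression, many linear classification models natively support only binary classification
- Logistic regression supports a multiclass classification algorithm based on the softmax function
  - With the ordinary sigmoid, multiclass classification outputs a value between 0 and 1 for each class separately → the sample is assigned to the class with the largest output
    *Can we instead compute, for a single sample, probabilities for all possible classes that sum to 1? → the __softmax function__*
- Softmax regression (also called multinomial logistic regression)
  - With $K$ classes in total, it takes a $K$-dimensional vector of class scores and estimates the probability of each class
    - Score for class $k$: $s_k(x) = (\theta^{(k)})^{T}x$  
      → pass the scores through the softmax function to get the probability $\widehat{p}_k$ that the sample belongs to class $k$  
  - $\widehat{p}_k = \sigma(s(x))_k = \frac{e^{s_k(x)}}{\sum_{j=1}^K e^{s_j(x)}}$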
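A minimal sketch of the softmax formula, assuming scikit-learn 0.22 or later, where `LogisticRegression` with the default `lbfgs` solver fits a multinomial (softmax) model when there are more than two classes. Applying the softmax to the per-class scores from `decision_function` reproduces `predict_proba` on the 3-class iris data.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

def softmax(s):
    """Row-wise softmax: p_k = exp(s_k) / sum_j exp(s_j)."""
    s = s - s.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(s)
    return e / e.sum(axis=1, keepdims=True)

X, y = load_iris(return_X_y=True)  # K = 3 classes
clf = LogisticRegression(max_iter=1000).fit(X, y)

scores = clf.decision_function(X)  # s_k(x) for each class k, shape (n_samples, 3)
p_manual = softmax(scores)

print(p_manual[:3].round(3))
print(np.allclose(p_manual.sum(axis=1), 1.0))        # probabilities sum to 1
print(np.allclose(p_manual, clf.predict_proba(X)))  # matches scikit-learn
```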