# Measuring the Performance of Linear Models

## Mean Absolute Error (mean of the absolute residuals)

MAE = $\frac{1}{m} \sum_{i=1}^m \left\lvert y_{i} - \hat{y_{i}} \right\rvert = \frac{1}{m} \sum_{i=1}^m \left\lvert e_{i} \right\rvert$

```python
from sklearn.metrics import mean_absolute_error

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
mean_absolute_error(y_true, y_pred)
```

```
0.5
```
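To connect the formula to the sklearn call, here is a minimal sketch that computes MAE by hand with NumPy and checks it against `mean_absolute_error`. The variable names (`residuals`, `mae_manual`) are only illustrative.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

y_true = np.array([3, -0.5, 2, 7])
y_pred = np.array([2.5, 0.0, 2, 8])

# residuals e_i = y_i - y_hat_i  ->  [0.5, -0.5, 0.0, -1.0]
residuals = y_true - y_pred

# MAE = (1/m) * sum |e_i| = (0.5 + 0.5 + 0 + 1) / 4
mae_manual = np.mean(np.abs(residuals))

print(mae_manual)  # 0.5
print(np.isclose(mae_manual, mean_absolute_error(y_true, y_pred)))  # True
```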

## RMSE (square root of the mean of the squared residuals)

RMSE = $\sqrt{\frac{1}{m}\sum_{i=1}^m(y_{i} - \hat{y_{i}})^2}$

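Older scikit-learn versions provide only MSE (`mean_squared_error`), so the square root has to be taken manually; version 1.4 added `root_mean_squared_error`.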

```python
from sklearn.metrics import mean_squared_error

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]

mse = mean_squared_error(y_true, y_pred)  # MSE: no square root yet
rmse = mse ** 0.5                         # RMSE: take the square root manually
mse, rmse
```

```
(0.375, 0.6123724356957945)
```
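A sketch of the same calculation done straight from the RMSE formula with NumPy, assuming the same toy data. It also shows why `mse**1/2` is not a square root: `**` binds tighter than `/`, so the expression evaluates to `(mse**1)/2`.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3, -0.5, 2, 7])
y_pred = np.array([2.5, 0.0, 2, 8])

mse = mean_squared_error(y_true, y_pred)  # (0.25 + 0.25 + 0 + 1) / 4 = 0.375
rmse = np.sqrt(mse)

# RMSE directly from the formula: sqrt((1/m) * sum (y_i - y_hat_i)^2)
rmse_manual = np.sqrt(np.mean((y_true - y_pred) ** 2))
print(np.isclose(rmse, rmse_manual))  # True

# Operator precedence pitfall: this is (mse ** 1) / 2, i.e. half the MSE
print(mse ** 1 / 2)    # 0.1875
# Parenthesised exponent gives the actual square root
print(mse ** (1 / 2))
```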

## R squared (typically between 0 and 1; the closer to 1, the better the fit)

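$R^{2} = 1 - \frac{\sum_{i=1}^m (y_{i} - \hat{y_{i}})^2}{\sum_{i=1}^m (y_{i} - \mu)^2}$

where $\mu = \frac{1}{m}\sum_{i=1}^m y_{i}$ is the mean of the observed values. R² can fall below 0 when the model fits worse than always predicting $\mu$.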

```python
from sklearn.metrics import r2_score

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]

r2_score(y_true, y_pred)
```

```
0.9486081370449679
```
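The formula can be checked term by term. A minimal sketch, assuming the same toy data; `mu`, `ss_res` (numerator) and `ss_tot` (denominator) are illustrative names. The last lines use deliberately bad, hypothetical predictions (`y_bad`) to show that R² is not bounded below by 0.

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3, -0.5, 2, 7])
y_pred = np.array([2.5, 0.0, 2, 8])

mu = y_true.mean()                        # 11.5 / 4 = 2.875
ss_res = np.sum((y_true - y_pred) ** 2)   # numerator: 1.5
ss_tot = np.sum((y_true - mu) ** 2)       # denominator: 29.1875
r2_manual = 1 - ss_res / ss_tot

print(np.isclose(r2_manual, r2_score(y_true, y_pred)))  # True

# Hypothetical bad predictions (y_true in reverse order):
# worse than always predicting mu, so R^2 is negative (roughly -0.52)
y_bad = np.array([7, 2, -0.5, 3])
print(r2_score(y_true, y_bad))
```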

## Splitting into train and test sets

```python
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(10).reshape((5, 2)), range(5)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
```
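Why 2 test samples out of 5? A sketch of the arithmetic, assuming a float `test_size` is turned into a count by rounding `test_size * n_samples` up, with the rest going to the training set (worth confirming in the `train_test_split` docs for the installed version). It also checks that a fixed `random_state` reproduces the same split. `n_test`, `n_train`, `X_train2` and `X_test2` are illustrative names.

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(10).reshape((5, 2)), range(5)
n_samples, test_size = len(X), 0.33

# Assumed rule: test count = ceil(0.33 * 5) = ceil(1.65) = 2, train gets the rest
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=test_size, random_state=42
)
print(len(X_train), len(X_test))  # 3 2

# Same random_state -> identical split on every call
X_train2, X_test2, _, _ = train_test_split(X, y, test_size=test_size, random_state=42)
print(np.array_equal(X_test, X_test2))  # True
```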

```python
X, y
```

```
(array([[0, 1],
        [2, 3],
        [4, 5],
        [6, 7],
        [8, 9]]),
 range(0, 5))
```

```python
X_train
```

```
array([[4, 5],
       [0, 1],
       [6, 7]])
```

```python
X_test
```

```
array([[2, 3],
       [8, 9]])
```

```python
y_train
```

```
[2, 0, 3]
```

```python
y_test
```

```
[1, 4]
```
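Putting the pieces together: a minimal sketch that fits scikit-learn's `LinearRegression` on the training split and scores it on the test split with all three metrics. The data here is hypothetical (a line plus a small sine wiggle) rather than the 5-row array above, whose two columns are perfectly collinear (`x1 = x0 + 1`).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical toy data: y = 3x + 2 plus a small deterministic wiggle
X = np.linspace(0, 10, 20).reshape(-1, 1)
y = 3 * X.ravel() + 2 + 0.5 * np.sin(5 * X.ravel())

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

# Fit on the training split only
model = LinearRegression().fit(X_train, y_train)

# Score on the held-out test split
y_pred = model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)
print(f"MAE={mae:.3f}  RMSE={rmse:.3f}  R^2={r2:.3f}")
```

Scoring on `X_test` rather than `X_train` is the point of the split: it estimates performance on data the model has not seen. RMSE is always at least as large as MAE, and the gap widens when a few residuals are large, since squaring weights them more heavily.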