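## 1.3 Multiple Regression
- Linear regression that uses several different features of the data at once
    - ex) using a perch's length, height, and width together
    - ax + by + cz + d
- Feature engineering: creating new features from the existing ones.
    - ex) perch length * height
    - a * length + b * height + c * width + d * length^2 + e * height * length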
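The engineered-feature formula above can be sketched in plain Python. The function names and the coefficient values here are hypothetical, chosen only for illustration; the measurements are the first perch row shown in the table further down (Length2, Height, Width).

```python
def engineered_features(length, height, width):
    """Return the five terms of the formula: three raw features plus two engineered ones."""
    return [length, height, width, length ** 2, height * length]


def predict_weight(features, coefs, intercept=0.0):
    """Linear model: weighted sum of the features plus an optional intercept."""
    return sum(c * f for c, f in zip(coefs, features)) + intercept


# Hypothetical coefficients (a, b, c, d, e); they are NOT fitted to any data.
coefs = [1.0, 2.0, 3.0, 0.5, 0.25]

feats = engineered_features(length=8.4, height=2.112, width=1.408)
print(feats)
print(predict_weight(feats, coefs))
```

The model stays linear in its coefficients even though length^2 and height * length are nonlinear in the raw measurements, which is why an ordinary linear regression can still fit it.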

In [2]:
import pandas as pd

df = pd.read_csv('data/Fish.csv')
perch_df = df.loc[df['Species'] == 'Perch']
perch_df.head()

Unnamed: 0,Species,Weight,Length1,Length2,Length3,Height,Width
72,Perch,5.9,7.5,8.4,8.8,2.112,1.408
73,Perch,32.0,12.5,13.7,14.7,3.528,1.9992
74,Perch,40.0,13.8,15.0,16.0,3.824,2.432
75,Perch,51.5,15.0,16.2,17.2,4.5924,2.6316
76,Perch,70.0,15.7,17.4,18.5,4.588,2.9415


In [3]:
perch_full = perch_df[['Length2', 'Height', 'Width']]  # length, height, width
perch_weight = perch_df[['Weight']]

In [4]:
# 1. Split the data into training and test sets

from sklearn.model_selection import train_test_split
train_input, test_input, train_target, test_target = \
train_test_split(perch_full, perch_weight)

#### 2. from sklearn.preprocessing import PolynomialFeatures
- Given a set of features, it generates the polynomial combinations (powers and pairwise products) that can be built from them

In [7]:
# 2-1 Passing numbers in directly

from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures()  # passing include_bias=False drops the constant 1 column
poly.fit([[3,5]])
poly.transform([[3,5]])

array([[ 1.,  3.,  5.,  9., 15., 25.]])
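To see where the six numbers above come from, the expansion can be reproduced by hand. This is a minimal sketch that mirrors the ordering seen in the output, not scikit-learn's actual implementation; `polynomial_terms` is a hypothetical helper name.

```python
from itertools import combinations_with_replacement
from math import prod


def polynomial_terms(row, degree=2, include_bias=True):
    """Multiply every combination of features, from degree 0 (the bias) up to `degree`."""
    terms = []
    start = 0 if include_bias else 1
    for d in range(start, degree + 1):
        for combo in combinations_with_replacement(row, d):
            # The product of an empty combination is 1, which gives the bias term.
            terms.append(float(prod(combo)))
    return terms


print(polynomial_terms([3, 5]))                      # [1.0, 3.0, 5.0, 9.0, 15.0, 25.0]
print(polynomial_terms([3, 5], include_bias=False))  # [3.0, 5.0, 9.0, 15.0, 25.0]
```

Reading the result left to right: the constant 1, the two original values 3 and 5, then 3^2, 3 * 5, and 5^2.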

In [9]:
# 2-2 Applying feature engineering to the training data

poly = PolynomialFeatures(include_bias = False) 
poly.fit(train_input)
train_poly = poly.transform(train_input)
train_poly.shape

(42, 9)

In [10]:
# 2-3 Check which combinations were generated

poly.get_feature_names_out()  # 9 features

array(['Length2', 'Height', 'Width', 'Length2^2', 'Length2 Height',
       'Length2 Width', 'Height^2', 'Height Width', 'Width^2'],
      dtype=object)

In [11]:
# 3. Regression

from sklearn.linear_model import LinearRegression

lr = LinearRegression()
lr.fit(train_poly, train_target)
lr.score(train_poly, train_target)

0.9911701415308894

In [12]:
test_poly = poly.transform(test_input)

In [13]:
lr.score(test_poly, test_target)

0.9737023817158488

### - Generating many more feature combinations

In [14]:
poly = PolynomialFeatures(degree = 5, include_bias = False)
poly.fit(train_input)
train_poly = poly.transform(train_input)
test_poly = poly.transform(test_input)
train_poly.shape

(42, 55)
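The column counts (9 at the default degree 2, 55 at degree 5) follow from a counting formula: the number of monomials of degree at most d in n variables is C(n + d, d), minus one when the bias column is excluded. A small sketch, with `n_poly_features` as a hypothetical helper name:

```python
from math import comb


def n_poly_features(n_features, degree, include_bias=False):
    """Count the columns a full polynomial expansion produces."""
    total = comb(n_features + degree, degree)
    return total if include_bias else total - 1


print(n_poly_features(3, 2))                     # 9
print(n_poly_features(3, 5))                     # 55
print(n_poly_features(2, 2, include_bias=True))  # 6
```

With 55 columns and only 42 training rows, the model has more coefficients than samples, so it has enough freedom to fit the training set almost exactly.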

In [15]:
lr.fit(train_poly, train_target)

In [16]:
print(lr.score(train_poly, train_target))
print(lr.score(test_poly, test_target))

0.9999999999691369
-5698.402870551511
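The negative test score is possible because `score` for a regressor returns R^2, which is 1 minus the ratio of the model's squared error to the squared error of always predicting the mean. A sketch with made-up numbers (not the perch data) shows how predictions that are far off push the value below zero:

```python
def r2_score_manual(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]          # made-up targets
close = [1.1, 1.9, 3.2, 3.8]           # predictions near the targets
wild = [40.0, -30.0, 55.0, -20.0]      # predictions that swing far from the targets

print(r2_score_manual(y_true, close))
print(r2_score_manual(y_true, wild))
```

A score near 1 on the training set combined with a large negative score on the test set is the signature of severe overfitting.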


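### - Regularization
- Goal: keep the model from fitting the training set too closely (prevents overfitting)
    - Standardize the features first, because the penalty depends on coefficient size and coefficient size depends on feature scale: from sklearn.preprocessing import StandardScaler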

In [17]:
from sklearn.preprocessing import StandardScaler

ss = StandardScaler()
ss.fit(train_poly)

train_scaled = ss.transform(train_poly)
test_scaled = ss.transform(test_poly)
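What the scaler does to each column can be sketched with the standard library. The mean and standard deviation are computed from the training column only and then reused for any other data; the population standard deviation is used here because that is what `StandardScaler` computes. The training values are the five Length2 entries from the table above; the extra value 20.0 and the helper name are hypothetical.

```python
from statistics import fmean, pstdev


def standardize(train_col, other_col):
    """Scale both columns with the mean and std learned from the training column."""
    mean = fmean(train_col)
    std = pstdev(train_col)  # population standard deviation (ddof=0)

    def scale(col):
        return [(v - mean) / std for v in col]

    return scale(train_col), scale(other_col)


train_col = [8.4, 13.7, 15.0, 16.2, 17.4]  # Length2 values from the first five perch rows
other_col = [20.0]                         # hypothetical unseen measurement

train_z, other_z = standardize(train_col, other_col)
print(train_z)
print(other_z)
```

Reusing the training statistics is why the notebook calls `ss.fit` on `train_poly` only and then `transform` on both sets.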

In [18]:
train_scaled[:5]

array([[ 0.75464299,  1.24622159,  1.66315334,  0.65506815,  0.96013509,
         1.22868418,  1.26763617,  1.55405021,  1.83796444,  0.52959881,
         0.7554657 ,  0.9585061 ,  0.99481568,  1.21696807,  1.44490544,
         1.24765653,  1.48883474,  1.73476266,  1.98206741,  0.3984709 ,
         0.5763385 ,  0.73795613,  0.76966137,  0.94907785,  1.13627571,
         0.97905683,  1.17707627,  1.38271962,  1.59381822,  1.20473421,
         1.42182441,  1.64604666,  1.87483223,  2.10535854,  0.272209  ,
         0.41566556,  0.54679537,  0.57405455,  0.72092631,  0.87572021,
         0.74832641,  0.91212321,  1.08416009,  1.26307028,  0.93917975,
         1.12092602,  1.31105284,  1.50788593,  1.7094651 ,  1.14696084,
         1.34741227,  1.55611585,  1.77101788,  1.98975735,  2.20976054],
       [-0.47818962, -0.23396854, -0.62181953, -0.57229646, -0.47125511,
        -0.63768418, -0.37927373, -0.54728005, -0.6791729 , -0.62055423,
        -0.57135559, -0.65681008, -0.5247271 , -0.

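## 1.4 Linear Regression with Regularization
- The goal is to shrink the size of the coefficients

- Ridge: applies the penalty based on the squared coefficients (L2); generally preferred
    - ridge = Ridge(alpha = alpha)                          # the best alpha value has to be found
- Lasso: applies the penalty based on the absolute values of the coefficients (L1)
    - Can shrink a coefficient all the way to 0.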
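The difference between the two penalties is easiest to see with a single feature and no intercept, where both problems have a closed-form answer. The sketch below assumes the objective forms given in scikit-learn's documentation (squared error plus `alpha * w^2` for Ridge; squared error divided by `2n` plus `alpha * |w|` for Lasso). The data and function names are made up for illustration.

```python
def ridge_1d(x, y, alpha):
    """Minimize sum((y - w*x)^2) + alpha * w^2."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + alpha)


def lasso_1d(x, y, alpha):
    """Minimize (1/(2n)) * sum((y - w*x)^2) + alpha * |w| via soft thresholding."""
    n = len(x)
    rho = sum(a * b for a, b in zip(x, y)) / n
    sxx = sum(a * a for a in x) / n
    shrunk = max(abs(rho) - alpha, 0.0)  # clipped at zero once alpha exceeds |rho|
    return (shrunk if rho >= 0 else -shrunk) / sxx


x = [-1.0, 0.0, 1.0]   # made-up, already centered
y = [-0.5, 0.0, 0.5]   # unregularized slope is 0.5

for alpha in (0.0, 0.1, 0.5, 10.0):
    print(alpha, ridge_1d(x, y, alpha), lasso_1d(x, y, alpha))
```

Ridge keeps shrinking the slope toward zero as alpha grows but never reaches it, while Lasso returns exactly 0 once alpha is large enough.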

In [21]:
# 1. Ridge

from sklearn.linear_model import Ridge

ridge = Ridge(alpha = 0.1)
ridge.fit(train_scaled, train_target)

print(ridge.score(train_scaled, train_target))
print(ridge.score(test_scaled, test_target))

0.9914580435929138
0.9805551568695215
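The value `alpha = 0.1` is one choice among many. One common way to pick it is to try several values and compare scores on held-out data. The sketch below is only an illustration under stated assumptions: it uses synthetic data rather than the perch measurements, a separate validation split, and an arbitrary list of candidate alphas; all variable names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in data (NOT the perch data): 3 features, noisy quadratic target.
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 10.0, size=(60, 3))
y = 2.0 * X[:, 0] * X[:, 1] + X[:, 2] ** 2 + rng.normal(0.0, 1.0, size=60)

X_fit, X_val, y_fit, y_val = train_test_split(X, y, random_state=0)

# Same preprocessing steps as the notebook: degree-5 expansion, then standardization.
poly_demo = PolynomialFeatures(degree=5, include_bias=False)
scaler_demo = StandardScaler()
fit_scaled = scaler_demo.fit_transform(poly_demo.fit_transform(X_fit))
val_scaled = scaler_demo.transform(poly_demo.transform(X_val))

alphas = [0.001, 0.01, 0.1, 1, 10, 100]  # arbitrary candidate grid
val_scores = []
for alpha in alphas:
    model = Ridge(alpha=alpha)
    model.fit(fit_scaled, y_fit)
    val_scores.append(model.score(val_scaled, y_val))

best_alpha = alphas[int(np.argmax(val_scores))]
print(dict(zip(alphas, val_scores)))
print("best alpha:", best_alpha)
```

Selecting alpha on a validation split rather than on the test set keeps the final test score an honest estimate.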


In [25]:
# 2. Lasso

from sklearn.linear_model import Lasso

lasso = Lasso(alpha = 0.1)
lasso.fit(train_scaled, train_target)

print(lasso.score(train_scaled, train_target))
print(lasso.score(test_scaled, test_target))

0.9911174144497916
0.9805792102845133


