# Data Scaling

- Types of scaling
scikit-learn provides the following scaler classes.




- StandardScaler: transforms each feature so that its mean is 0 and its standard deviation is 1.
```python 
from sklearn.preprocessing import StandardScaler
```

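- MinMaxScaler: transforms each feature so that its minimum is 0 and its maximum is 1 by default. The target range can be changed with `feature_range`, for example `(-1, 1)`.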
```python
from sklearn.preprocessing import MinMaxScaler
```

- RobustScaler: transforms each feature so that its median is 0 and its IQR (interquartile range) is 1.
```python
from sklearn.preprocessing import RobustScaler
```
- MaxAbsScaler: divides each feature by its largest absolute value, so that value becomes 1 or -1 while 0 stays at 0.
```python
from sklearn.preprocessing import MaxAbsScaler
```
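To make the four definitions above concrete, here is a minimal sketch that applies each scaler to the same column. The `toy` array is hypothetical data made up for illustration (it is not part of the iris example), and it includes one outlier so the differences are visible.

```python
import numpy as np
from sklearn.preprocessing import (
    MaxAbsScaler,
    MinMaxScaler,
    RobustScaler,
    StandardScaler,
)

# Hypothetical toy data: one feature, five samples, with 100.0 as an outlier.
toy = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

scalers = {
    "StandardScaler": StandardScaler(),
    "MinMaxScaler": MinMaxScaler(),   # default feature_range=(0, 1)
    "RobustScaler": RobustScaler(),   # default: median and 25th-75th percentile range
    "MaxAbsScaler": MaxAbsScaler(),
}

# Fit each scaler on the toy column and keep the flattened result.
scaled = {name: s.fit_transform(toy).ravel() for name, s in scalers.items()}

for name, values in scaled.items():
    print(f"{name:>14}: {np.round(values, 3)}")
```

With this input the outlier squeezes the first four MinMaxScaler and MaxAbsScaler values close to 0, while RobustScaler keeps them spread between -1 and 0.5 because the median and IQR are barely affected by the single extreme value.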

In [4]:
# Practice on the iris dataset
import numpy as np 
import pandas as pd 

from sklearn.datasets import load_iris
iris = load_iris() # load the iris dataset

# Convert to a DataFrame
iris_df = pd.DataFrame(data=iris.data,
                       columns=iris.feature_names)
iris_df['label'] = iris.target
iris_df

Unnamed: 0,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm),label
0,5.1,3.5,1.4,0.2,0
1,4.9,3.0,1.4,0.2,0
2,4.7,3.2,1.3,0.2,0
3,4.6,3.1,1.5,0.2,0
4,5.0,3.6,1.4,0.2,0
...,...,...,...,...,...
145,6.7,3.0,5.2,2.3,2
146,6.3,2.5,5.0,1.9,2
147,6.5,3.0,5.2,2.0,2
148,6.2,3.4,5.4,2.3,2


In [5]:
iris_df.head(3)

Unnamed: 0,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm),label
0,5.1,3.5,1.4,0.2,0
1,4.9,3.0,1.4,0.2,0
2,4.7,3.2,1.3,0.2,0


In [6]:
X = iris_df.drop('label',axis=1)
X.head()

Unnamed: 0,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm)
0,5.1,3.5,1.4,0.2
1,4.9,3.0,1.4,0.2
2,4.7,3.2,1.3,0.2
3,4.6,3.1,1.5,0.2
4,5.0,3.6,1.4,0.2


<div class="alert alert-success" data-title="">
  <h2><i class="fa fa-tasks" aria-hidden="true"></i> Scaling with StandardScaler
  </h2>
</div>


In [7]:
from sklearn.preprocessing import StandardScaler
standard_scaler = StandardScaler() # create the scaler
X_scaled = standard_scaler.fit_transform(X) # fit and transform
# the result is a numpy.ndarray

In [13]:
# numpy array --> DataFrame
pd.DataFrame(data=X_scaled,
             columns=X.columns)

Unnamed: 0,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm)
0,-0.900681,1.019004,-1.340227,-1.315444
1,-1.143017,-0.131979,-1.340227,-1.315444
2,-1.385353,0.328414,-1.397064,-1.315444
3,-1.506521,0.098217,-1.283389,-1.315444
4,-1.021849,1.249201,-1.340227,-1.315444
...,...,...,...,...
145,1.038005,-0.131979,0.819596,1.448832
146,0.553333,-1.282963,0.705921,0.922303
147,0.795669,-0.131979,0.819596,1.053935
148,0.432165,0.788808,0.933271,1.448832
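As a sanity check on the output above, the following sketch confirms that each standardized column has mean 0 and standard deviation 1, and that the transform is just `(x - mean) / std` per column. It reloads the data so it runs on its own; the names `X_arr`, `X_std`, `X_manual`, and `X_back` are illustrative and not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X_arr = load_iris().data  # shape (150, 4)

standard_scaler = StandardScaler()
X_std = standard_scaler.fit_transform(X_arr)

# Per-column statistics after scaling.
col_means = X_std.mean(axis=0)
col_stds = X_std.std(axis=0)  # NumPy default ddof=0, the same convention StandardScaler uses

print(np.round(col_means, 6))
print(np.round(col_stds, 6))

# The same transform written out by hand from the fitted attributes.
X_manual = (X_arr - standard_scaler.mean_) / standard_scaler.scale_

# inverse_transform recovers the original measurements.
X_back = standard_scaler.inverse_transform(X_std)
```

Note that `DataFrame.std()` in pandas defaults to `ddof=1`, so checking the scaled DataFrame with it gives values slightly above 1 rather than exactly 1.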


<div class="alert alert-success" data-title="">
  <h2><i class="fa fa-tasks" aria-hidden="true"></i> Scaling with MinMaxScaler
  </h2>
</div>

```python
from sklearn.preprocessing import MinMaxScaler
```

In [39]:
# Min-max scale X
from sklearn.preprocessing import MinMaxScaler
mm_scaler = MinMaxScaler(feature_range=(-1, 1)) # create the scaler with target range [-1, 1]
# fit_transform fits the scaler and transforms X in one step
X_mn = mm_scaler.fit_transform(X) # X_mn: numpy.ndarray

In [40]:
X_mn.min() # smallest value in X_mn

-1.0

In [41]:
X_mn.max() # largest value in X_mn

1.0000000000000002

In [42]:
X_mn.mean() # mean over all values

-0.10261377903327057

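In [32]:
# Convert the scaled array to a DataFrame.
# Note: this cell was executed before cell 39 (execution count 32), apparently with
# feature_range=(0, 100), so the values shown below lie in [0, 100] rather than [-1, 1].
tmp = pd.DataFrame(X_mn, columns=X.columns)
tmp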

Unnamed: 0,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm)
0,22.222222,62.500000,6.779661,4.166667
1,16.666667,41.666667,6.779661,4.166667
2,11.111111,50.000000,5.084746,4.166667
3,8.333333,45.833333,8.474576,4.166667
4,19.444444,66.666667,6.779661,4.166667
...,...,...,...,...
145,66.666667,41.666667,71.186441,91.666667
146,55.555556,20.833333,67.796610,75.000000
147,61.111111,41.666667,71.186441,79.166667
148,52.777778,58.333333,74.576271,91.666667
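The checks above take the minimum and maximum over the whole array, but MinMaxScaler works column by column. The sketch below writes the per-column formula out by hand for `feature_range=(-1, 1)` and compares it with the scaler. The names `X_arr`, `X_mm`, `X_manual`, `lo`, and `hi` are illustrative, not from the original notebook.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler

X_arr = load_iris().data
lo, hi = -1.0, 1.0  # target range, same as feature_range=(-1, 1) above

mm_scaler = MinMaxScaler(feature_range=(lo, hi))
X_mm = mm_scaler.fit_transform(X_arr)

# Manual version: rescale each column to [0, 1], then stretch and shift to [lo, hi].
col_min = X_arr.min(axis=0)
col_max = X_arr.max(axis=0)
X_manual = (X_arr - col_min) / (col_max - col_min) * (hi - lo) + lo

# Every column, not just the array as a whole, spans the target range.
print(X_mm.min(axis=0))
print(X_mm.max(axis=0))
```

Because the transform is linear in each column, values produced with a different `feature_range` can be mapped onto each other: a value `v` scaled to `(0, 100)` corresponds to `v / 50 - 1` on `(-1, 1)`.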


<div class="alert alert-success" data-title="">
  <h2><i class="fa fa-tasks" aria-hidden="true"></i> Scaling with RobustScaler
  </h2>
</div>

```python
from sklearn.preprocessing import RobustScaler
```

In [None]:
from sklearn.preprocessing import RobustScaler
scaler = RobustScaler()
X_scaled = scaler.fit_transform(X)
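
The cell above has no output, so here is a minimal sketch that checks the result against the definition: after RobustScaler with default settings, each column should have median 0 and IQR 1. The variable names (`X_arr`, `X_rb`, `col_median`, `col_iqr`) are illustrative and not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import RobustScaler

X_arr = load_iris().data

# Defaults: center on the median, scale by the 25th-75th percentile range.
robust_scaler = RobustScaler()
X_rb = robust_scaler.fit_transform(X_arr)

# Per-column median and interquartile range after scaling.
col_median = np.median(X_rb, axis=0)
q25, q75 = np.percentile(X_rb, [25, 75], axis=0)
col_iqr = q75 - q25

print(np.round(col_median, 6))
print(np.round(col_iqr, 6))

# The same transform written out by hand from the fitted attributes.
X_manual = (X_arr - robust_scaler.center_) / robust_scaler.scale_
```

Unlike MinMaxScaler, the output is not confined to a fixed interval; values far from the median simply end up with a large magnitude.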

<div class="alert alert-success" data-title="">
  <h2><i class="fa fa-tasks" aria-hidden="true"></i> Scaling with MaxAbsScaler
  </h2>
</div>

```python
from sklearn.preprocessing import MaxAbsScaler
```

In [None]:
from sklearn.preprocessing import MaxAbsScaler
scaler = MaxAbsScaler()
X_scaled = scaler.fit_transform(X)
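
This cell also has no output, so the following sketch checks that MaxAbsScaler divides each column by its largest absolute value. The names `X_arr`, `X_ma`, `col_max_abs`, and `X_manual` are illustrative and not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import MaxAbsScaler

X_arr = load_iris().data

maxabs_scaler = MaxAbsScaler()
X_ma = maxabs_scaler.fit_transform(X_arr)

# After scaling, the largest absolute value in every column is 1.
col_max_abs = np.abs(X_ma).max(axis=0)
print(col_max_abs)

# The same transform by hand: divide by the per-column maximum absolute value.
X_manual = X_arr / np.abs(X_arr).max(axis=0)

# The divisor the scaler learned is stored in max_abs_.
print(maxabs_scaler.max_abs_)
```

Since every iris measurement is positive, the scaled values fall in (0, 1] and the column minimum is not mapped to 0. MaxAbsScaler does not shift the data, so zeros stay zero.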