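# Data scaling
- Standard scaling (standardization)
    - Rescales each feature so that its mean is 0 and its standard deviation is 1
- Robust scaling
    - Rescales using the median and quartiles (interquartile range) instead of the mean and standard deviation, so extreme values have very little influence
- Min-max scaling
    - Restricts the range so that the maximum value becomes 1 and the minimum value becomes 0
- Normalization
    - Rescales the values so that each vector has Euclidean length 1
    - Mainly used when only the direction of a vector matters, not its length
    - Unlike the other scaling methods, it operates row by row (per sample) rather than per column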
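
The four definitions above can be sketched in plain Python on a toy column. This is only an illustration of the formulas, not scikit-learn's implementation; the toy values, the variable names, and the choice of the "inclusive" quantile method are assumptions made for the example.

```python
import math
import statistics

values = [1.0, 2.0, 3.0, 4.0, 100.0]  # toy feature column; 100.0 is a deliberate outlier

# Standard scaling: subtract the mean, divide by the population standard deviation.
mean = statistics.fmean(values)
std = statistics.pstdev(values)
standard = [(v - mean) / std for v in values]

# Robust scaling: subtract the median, divide by the interquartile range (Q3 - Q1).
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
robust = [(v - median) / (q3 - q1) for v in values]

# Min-max scaling: map the minimum to 0 and the maximum to 1.
lo, hi = min(values), max(values)
minmax = [(v - lo) / (hi - lo) for v in values]

# Normalization works on a row (one sample), not on a column.
row = [3.0, 4.0]  # toy sample with two features
length = math.sqrt(sum(v * v for v in row))
normalized = [v / length for v in row]

print(standard)
print(robust)
print(minmax)
print(normalized)
```

In this toy column the outlier squeezes the four small values into roughly the first 3% of the min-max range, while robust scaling keeps them spread between -1 and 0.5.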


In [1]:
from sklearn.datasets import load_breast_cancer

# Load the breast cancer dataset
data = load_breast_cancer()

# Print the dataset description
print(data.DESCR)

# Check the feature names
print(data.feature_names)

# Check the target values
print(data.target)


.. _breast_cancer_dataset:

Breast cancer wisconsin (diagnostic) dataset
--------------------------------------------

**Data Set Characteristics:**

    :Number of Instances: 569

    :Number of Attributes: 30 numeric, predictive attributes and the class

    :Attribute Information:
        - radius (mean of distances from center to points on the perimeter)
        - texture (standard deviation of gray-scale values)
        - perimeter
        - area
        - smoothness (local variation in radius lengths)
        - compactness (perimeter^2 / area - 1.0)
        - concavity (severity of concave portions of the contour)
        - concave points (number of concave portions of the contour)
        - symmetry
        - fractal dimension ("coastline approximation" - 1)

        The mean, standard error, and "worst" or largest (mean of the three
        worst/largest values) of these features were computed for each image,
        resulting in 30 features.  For instance, field 0 is Mean Radi

In [5]:
import pandas as pd
# Create a DataFrame
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target
df.head()

Unnamed: 0,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,...,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension,target
0,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,0.07871,...,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189,0
1,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,0.05667,...,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902,0
2,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,0.05999,...,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758,0
3,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,0.09744,...,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173,0
4,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,0.05883,...,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678,0


In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 569 entries, 0 to 568
Data columns (total 31 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   mean radius              569 non-null    float64
 1   mean texture             569 non-null    float64
 2   mean perimeter           569 non-null    float64
 3   mean area                569 non-null    float64
 4   mean smoothness          569 non-null    float64
 5   mean compactness         569 non-null    float64
 6   mean concavity           569 non-null    float64
 7   mean concave points      569 non-null    float64
 8   mean symmetry            569 non-null    float64
 9   mean fractal dimension   569 non-null    float64
 10  radius error             569 non-null    float64
 11  texture error            569 non-null    float64
 12  perimeter error          569 non-null    float64
 13  area error               569 non-null    float64
 14  smoothness error         5

In [10]:
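# standard scaling
from sklearn.preprocessing import StandardScaler

print(df['mean area'])
std = StandardScaler()
# fit() expects 2D input, so select the column with double brackets.
# The original call, std.fit(df['mean area']), passed a 1D Series and
# raised the ValueError recorded in the output below.
std.fit(df[['mean area']])
x_std = std.transform(df[['mean area']])
print(x_std)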

0      1001.0
1      1326.0
2      1203.0
3       386.1
4      1297.0
        ...  
564    1479.0
565    1261.0
566     858.1
567    1265.0
568     181.0
Name: mean area, Length: 569, dtype: float64


ValueError: Expected 2D array, got 1D array instead:
array=[1001.  1326.  1203.   386.1 1297.   477.1 1040.   577.9  519.8  475.9
  797.8  781.  1123.   782.7  578.3  658.8  684.5  798.8 1260.   566.3
  520.   273.9  704.4 1404.   904.6  912.7  644.8 1094.   732.4  955.1
 1088.   440.6  899.3 1162.   807.2  869.5  633.   523.8  698.8  559.2
  563.   371.1 1104.   545.2  531.5 1076.   201.9  534.6  449.3  561.
  427.9  571.8  437.6 1033.   712.8  409.  1152.   656.9  527.2  224.5
  311.9  221.8  645.7  260.9  499.   668.3  269.4  394.1  250.5  502.5
 1130.   244.   929.4  584.1  470.9  817.7  559.2 1006.  1245.   506.3
  401.5  520.  1878.  1132.   443.3 1075.   648.2 1076.   466.1  651.9
  662.7  728.2  551.7  555.1  705.6 1264.   451.1  294.5  412.6  642.5
  582.7  143.5  458.7  298.3  336.1  530.2  412.5  466.7 1509.   396.5
  290.2  480.4  629.9  334.2  230.9  438.6  245.2  682.5  782.6  982.
  403.3 1077.  1761.   640.7  553.5  588.7  572.6 1138.   674.5 1192.
  455.8  748.9  809.8  761.7 1075.   506.3  423.6  399.8  678.1  384.8
  288.5  813.   398.   512.2  355.3  432.8  432.   689.5  640.1  585.
  519.4  203.9  300.2  381.9  538.9  460.3  963.7  880.2  448.6  366.8
  419.8 1157.  1214.   464.5 1686.   690.2  357.6  886.3  984.6  685.9
  464.1  565.4  736.9  372.7  349.6  227.2  302.4  832.9  526.4  508.8
 2250.  1311.   766.6  402.   710.6  317.5 1041.   420.3  428.9  463.7
  609.9  507.4  288.1  477.4  671.4  516.4  588.9 1024.  1148.   642.7
  461.   951.6 1685.   597.8  481.9  716.6  295.4  904.3  529.4  725.5
 1290.   428.  2499.   948.   610.7  578.9  432.2  321.2 1230.  1223.
  568.9  561.3  313.1  761.3  546.4  641.2  329.6  684.5  496.4  503.2
  895.   395.7  386.8 1319.   279.6  603.4 1670.  1306.   623.9  920.6
  575.3  476.5  389.4  590.  1155.   337.7  541.6  512.2  347.   406.3
 1364.   407.4 1206.   928.2 1169.   602.4 1207.   713.3  773.5  744.9
 1288.   933.1  947.8  758.6  928.3 1419.   346.4  561.   512.2  344.9
  632.6  388.  1491.   289.9  998.9  435.6  396.6 1102.   572.3  587.4
 1138.   427.3 1145.   805.1  516.6  489.   441.   515.9  394.1  396.
  651.   687.3  513.7  432.7  492.1  582.7  363.7  431.1  633.1  334.2
 1217.   471.3 1247.   334.3  403.1  417.2  537.3  246.3  566.2  530.6
  418.7  664.9  504.1  409.1  221.2  481.6  461.4 1027.   244.5  477.3
  324.2 1274.   504.8 1264.   457.9  489.9  616.5  446.   813.7  826.8
  793.2  514.   387.3  390.   464.4  918.6  514.3 1092.   310.8 1747.
  641.2  280.5  373.9 1194.   420.3  321.6  445.3  668.7  402.7  426.7
  421.   758.6 2010.   716.6  384.6  485.8  512.   593.7  241.   278.6
  491.9  546.1  496.6  838.1  552.4 1293.  1234.   458.4 1546.  1482.
  840.4  711.8 1386.  1335.   579.1  788.5  338.3  562.1  580.6  361.6
  386.3  372.7  447.8  462.9  541.8  664.7  462.   596.6  392.  1174.
  321.6  234.3  744.7 1407.   446.2  609.1  558.1  508.3  378.2  431.9
  994.   442.7  525.2  507.6  469.1  370.   800.   514.5  991.7  466.1
  399.8  373.2  268.8  693.7  719.5  433.8  271.2  803.1  495.   380.3
  409.7  656.1  408.2  575.3  289.7  307.3  333.6  359.9  381.1  501.3
  685.   467.8 1250.  1110.   673.7  599.5  509.2  611.2  592.6  606.5
  371.5  928.8  585.9  340.9  990.   441.3  981.6  674.8  659.7 1384.
  432.  1191.   442.5  644.2  492.9  557.2  415.1  537.9  520.2  290.9
  930.9 2501.   646.1  412.7  537.3  542.9  536.9  286.3  980.5  408.8
  289.1  449.9  686.9  465.4  358.9  506.9  618.4  599.4  404.9  815.8
  455.3  602.9  546.3  571.1  747.2  476.7  666.  1167.   420.5  857.6
  466.5  992.1 1007.   477.3  538.7  680.9  485.6  480.1 1068.  1320.
  689.4  595.9  476.3 1682.   248.7  272.5  453.1  366.5  819.8  731.3
  426.   680.7  556.7  658.8  701.9  391.2 1052.  1214.   493.1  493.8
  257.8 1841.   388.1  571.   293.2  221.3  551.1  468.5  594.2  445.2
  422.9  416.2  575.5 1299.   365.6 1308.   629.8  406.4  178.8  170.4
  402.9  656.4  668.6  538.4  584.8  573.2  324.9  320.8  285.7  361.6
  360.5  378.4  507.9  264.   514.3  321.4  311.7  271.3  657.1  403.5
  600.4  386.   716.9 1347.  1479.  1261.   858.1 1265.   181. ].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
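
The reshape hint in the error message can be illustrated with a small NumPy sketch. It assumes only NumPy; the five values are the first rows of the printed `mean area` Series, and the variable names are made up for the example. In pandas, selecting with double brackets (`df[['mean area']]`) gives the same kind of 2D, single-column input.

```python
import numpy as np

# First five "mean area" values, copied from the printed Series above.
area = np.array([1001.0, 1326.0, 1203.0, 386.1, 1297.0])
print(area.shape)        # 1D: a single axis

# reshape(-1, 1) turns it into a single-feature column: n_samples rows, 1 column.
area_2d = area.reshape(-1, 1)
print(area_2d.shape)

# With a 2D array, per-feature statistics are taken along axis 0 (down the rows).
scaled = (area_2d - area_2d.mean(axis=0)) / area_2d.std(axis=0)
print(scaled.shape)
```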

In [11]:
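# robust scaling
from sklearn.preprocessing import RobustScaler

# Instantiate the scaler (note the parentheses) and fit on a 2D selection.
# The original code assigned the class itself (robust = RobustScaler) and
# passed a 1D Series to fit(), which raised the AttributeError below.
robust = RobustScaler()
robust.fit(df[['mean area']])
x_robust = robust.transform(df[['mean area']])
print(df['mean area'])
print(x_robust)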

AttributeError: 'Series' object has no attribute '_validate_params'
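
One plausible reading of this error is that `fit` was called on the class rather than on an instance, so the Series was bound to `self`. The sketch below reproduces that pattern with a hypothetical `TinyScaler` class and a hypothetical `validate_first` decorator; neither is part of scikit-learn, and the library's real internals may differ.

```python
import functools


def validate_first(method):
    """Hypothetical helper: call _validate_params() on the first argument before the method."""
    @functools.wraps(method)
    def wrapper(estimator, *args, **kwargs):
        estimator._validate_params()
        return method(estimator, *args, **kwargs)
    return wrapper


class TinyScaler:
    """Hypothetical stand-in for an estimator; it only stores the median."""

    def _validate_params(self):
        pass

    @validate_first
    def fit(self, X):
        ordered = sorted(X)
        self.center_ = ordered[len(ordered) // 2]
        return self


data = [3.0, 1.0, 2.0]

message = None
try:
    # Called on the class: `data` takes the place of `self`.
    TinyScaler.fit(data)
except AttributeError as exc:
    message = str(exc)
print(message)

# Called on an instance: `self` is the scaler and `data` is X.
scaler = TinyScaler().fit(data)
print(scaler.center_)
```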