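# Assignment 5
For Assignment 5, carry out an analysis that covers the items below.

There is no single correct answer, but make sure every inference you draw is at least defensible.

- Obtain the data
- Inspect the data
  - Inspect the features
  - Inspect the labels
- Form a hypothesis
- Analyze the data
- Train a model
- Predict
- Check the accuracy
- Check the hypothesis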

# Obtain the data
Analyze a dataset other than the wine data.

The analysis is easier if you pick a dataset whose labels and attributes lend themselves to forming hypotheses.

In [1]:
from sklearn.datasets import load_breast_cancer
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

SEED = 64

In [2]:
# Breast cancer dataset: binary classification of tumors as malignant or benign
# See https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_breast_cancer.html

data = load_breast_cancer()

# Inspect the data
Check what can be learned before any analysis: feature attributes, domain knowledge, and so on.
- Inspect the features
- Inspect the labels

In [3]:
data.keys()

dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename'])

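## About data

- Number of samples: 569
- Number of features: 30
- Target: binary classification with labels [0, 1] (in scikit-learn's encoding, 0 = malignant and 1 = benign)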
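
The counts above can be checked directly on the loaded dataset. The following is a minimal sketch, assuming scikit-learn's bundled copy of the data; the names `bunch`, `label_names`, and `class_counts` are introduced here for illustration only.

```python
from sklearn.datasets import load_breast_cancer
import numpy as np

bunch = load_breast_cancer()

# Shape of the feature matrix: (number of samples, number of features)
n_samples, n_features = bunch.data.shape

# target_names[i] is the class name for label i
label_names = list(bunch.target_names)

# Number of samples per label (index 0 = label 0, index 1 = label 1)
class_counts = np.bincount(bunch.target)

print(n_samples, n_features)
print(dict(zip(label_names, class_counts.tolist())))
```

Knowing which label is which matters later, because the sign of any correlation with `target` depends on this encoding.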


In [4]:
# Features
X = pd.DataFrame(data.data, columns=data.feature_names)
X.columns

Index(['mean radius', 'mean texture', 'mean perimeter', 'mean area',
       'mean smoothness', 'mean compactness', 'mean concavity',
       'mean concave points', 'mean symmetry', 'mean fractal dimension',
       'radius error', 'texture error', 'perimeter error', 'area error',
       'smoothness error', 'compactness error', 'concavity error',
       'concave points error', 'symmetry error', 'fractal dimension error',
       'worst radius', 'worst texture', 'worst perimeter', 'worst area',
       'worst smoothness', 'worst compactness', 'worst concavity',
       'worst concave points', 'worst symmetry', 'worst fractal dimension'],
      dtype='object')

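## About the features

From https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic):

a) radius (mean of distances from center to points on the perimeter) → approximate radius; per the dataset description, the features describe the cell nuclei in the image rather than the tumor as a whole

b) texture (standard deviation of gray-scale values) → standard deviation of the gray-scale values of the image (per the dataset description, a digitized fine needle aspirate image, not an X-ray)

c) perimeter → length of the outline

d) area → area

e) smoothness (local variation in radius lengths) → smoothness (local variation in radius lengths); affects the shape of the outline

f) compactness (perimeter^2 / area - 1.0) → a density-like measure; larger when the perimeter is long relative to the area

g) concavity (severity of concave portions of the contour) → how severe the indentations in the outline are?

h) concave points (number of concave portions of the contour) → number of indented portions

i) symmetry → symmetry?

j) fractal dimension ("coastline approximation" - 1) → fractal dimension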


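## Relationship between the mean, error, and worst values

In [5]:
# Per the dataset description, "error" is the standard error and "worst" is the
# mean of the three largest values; the raw measurements are not in the data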
X['mean area'][0], X["area error"][0], X["worst area"][0]

(1001.0, 153.4, 2019.0)

In [6]:
X['mean compactness'][0], X["compactness error"][0], X["worst compactness"][0]

(0.2776, 0.04904, 0.6656)
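
According to the dataset description (`data.DESCR`), every measurement is summarized per image as a mean, a standard error, and a "worst" value (the mean of the three largest values). The sketch below reproduces that kind of summary on made-up numbers. The areas and the helper name `summarize` are hypothetical, and the standard-error convention (sample standard deviation divided by √n) is an assumption, since the dataset does not spell it out.

```python
import numpy as np

def summarize(values):
    """Return (mean, standard error, worst) for one image's measurements."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    # Standard error of the mean, assuming the sample standard deviation (ddof=1)
    std_error = values.std(ddof=1) / np.sqrt(len(values))
    # "worst" = mean of the three largest values
    worst = np.sort(values)[-3:].mean()
    return mean, std_error, worst

# Made-up nucleus areas for a single image (not taken from the dataset)
areas = [820.0, 910.0, 1005.0, 1100.0, 1500.0, 2050.0, 2400.0]
mean, std_error, worst = summarize(areas)
print(mean, std_error, worst)
```

Under this definition the worst value can never be smaller than the mean, which is consistent with the two rows shown above.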

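## Domain knowledge
The more malignant a tumor is:

The harder its boundary is to make out

- smoothness is small?

- perimeter is large relative to area → compactness is large?

- fractal dimension is probably large too

The faster it spreads

- affects texture?

Apart from the symmetry and concavity features, the features seem easy to interpret.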

In [7]:
# Target variable
y = pd.Series(data.target, name="target")

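# Form a hypothesis
Lay out the hypothesis and the conclusion you want to reach about the data.

Based on the domain knowledge above, a tumor is more likely to be malignant the larger its mean compactness, fractal dimension, and texture are, and the smaller its smoothness is.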

# Analyze the data

In [8]:
df = X.join(y)
df.describe()

| | mean radius | mean texture | mean perimeter | mean area | mean smoothness | mean compactness | mean concavity | mean concave points | mean symmetry | mean fractal dimension | ... | worst texture | worst perimeter | worst area | worst smoothness | worst compactness | worst concavity | worst concave points | worst symmetry | worst fractal dimension | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 569.0 | 569.0 | 569.0 | 569.0 | 569.0 | 569.0 | 569.0 | 569.0 | 569.0 | 569.0 | ... | 569.0 | 569.0 | 569.0 | 569.0 | 569.0 | 569.0 | 569.0 | 569.0 | 569.0 | 569.0 |
| mean | 14.127292 | 19.289649 | 91.969033 | 654.889104 | 0.09636 | 0.104341 | 0.088799 | 0.048919 | 0.181162 | 0.062798 | ... | 25.677223 | 107.261213 | 880.583128 | 0.132369 | 0.254265 | 0.272188 | 0.114606 | 0.290076 | 0.083946 | 0.627417 |
| std | 3.524049 | 4.301036 | 24.298981 | 351.914129 | 0.014064 | 0.052813 | 0.07972 | 0.038803 | 0.027414 | 0.00706 | ... | 6.146258 | 33.602542 | 569.356993 | 0.022832 | 0.157336 | 0.208624 | 0.065732 | 0.061867 | 0.018061 | 0.483918 |
| min | 6.981 | 9.71 | 43.79 | 143.5 | 0.05263 | 0.01938 | 0.0 | 0.0 | 0.106 | 0.04996 | ... | 12.02 | 50.41 | 185.2 | 0.07117 | 0.02729 | 0.0 | 0.0 | 0.1565 | 0.05504 | 0.0 |
| 25% | 11.7 | 16.17 | 75.17 | 420.3 | 0.08637 | 0.06492 | 0.02956 | 0.02031 | 0.1619 | 0.0577 | ... | 21.08 | 84.11 | 515.3 | 0.1166 | 0.1472 | 0.1145 | 0.06493 | 0.2504 | 0.07146 | 0.0 |
| 50% | 13.37 | 18.84 | 86.24 | 551.1 | 0.09587 | 0.09263 | 0.06154 | 0.0335 | 0.1792 | 0.06154 | ... | 25.41 | 97.66 | 686.5 | 0.1313 | 0.2119 | 0.2267 | 0.09993 | 0.2822 | 0.08004 | 1.0 |
| 75% | 15.78 | 21.8 | 104.1 | 782.7 | 0.1053 | 0.1304 | 0.1307 | 0.074 | 0.1957 | 0.06612 | ... | 29.72 | 125.4 | 1084.0 | 0.146 | 0.3391 | 0.3829 | 0.1614 | 0.3179 | 0.09208 | 1.0 |
| max | 28.11 | 39.28 | 188.5 | 2501.0 | 0.1634 | 0.3454 | 0.4268 | 0.2012 | 0.304 | 0.09744 | ... | 49.54 | 251.2 | 4254.0 | 0.2226 | 1.058 | 1.252 | 0.291 | 0.6638 | 0.2075 | 1.0 |


In [61]:
mean_columns = pd.Index(data.feature_names[:10])
error_columns = pd.Index(data.feature_names[10:20])
worst_columns = pd.Index(data.feature_names[20:30])

In [9]:
# The target has only two levels (0 and 1), so simply compute each column's correlation with it
corr_target = pd.Series([df[col].corr(df.target) for col in df.columns], index=df.columns)

In [10]:
corr_target.abs().sort_values(ascending=False)

target                     1.000000
worst concave points       0.793566
worst perimeter            0.782914
mean concave points        0.776614
worst radius               0.776454
mean perimeter             0.742636
worst area                 0.733825
mean radius                0.730029
mean area                  0.708984
mean concavity             0.696360
worst concavity            0.659610
mean compactness           0.596534
worst compactness          0.590998
radius error               0.567134
perimeter error            0.556141
area error                 0.548236
worst texture              0.456903
worst smoothness           0.421465
worst symmetry             0.416294
mean texture               0.415185
concave points error       0.408042
mean smoothness            0.358560
mean symmetry              0.330499
worst fractal dimension    0.323872
compactness error          0.292999
concavity error            0.253730
fractal dimension error    0.077972
smoothness error           0.067016

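## Findings

- The concavity-related features, whose meaning was unclear to me, correlate strongly with the target

- The error values correlate comparatively weakly with the target

- The mean and worst features are interleaved in the ranking, with no obvious pattern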

In [63]:
# Split the correlations into the mean and worst groups
corr_target_mean = corr_target[mean_columns]
corr_target_worst = corr_target[worst_columns]

In [64]:
corr_target_mean.abs().sort_values(ascending=False)

mean concave points       0.776614
mean perimeter            0.742636
mean radius               0.730029
mean area                 0.708984
mean concavity            0.696360
mean compactness          0.596534
mean texture              0.415185
mean smoothness           0.358560
mean symmetry             0.330499
mean fractal dimension    0.012838
dtype: float64

In [65]:
corr_target_worst.abs().sort_values(ascending=False)

worst concave points       0.793566
worst perimeter            0.782914
worst radius               0.776454
worst area                 0.733825
worst concavity            0.659610
worst compactness          0.590998
worst texture              0.456903
worst smoothness           0.421465
worst symmetry             0.416294
worst fractal dimension    0.323872
dtype: float64

In [66]:
corr_target_mean.abs().sort_values(ascending=False).values < corr_target_worst.abs().sort_values(ascending=False).values

array([ True,  True,  True,  True, False, False,  True,  True,  True,
        True])
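
The comparison above lines the two groups up by sorted position, which is only valid because both groups happen to rank their features in the same order. Aligning by feature name avoids relying on that. The sketch below uses the absolute correlations printed above as literal values so that it runs on its own; the names `mean_corr`, `worst_corr`, and `features` are introduced here for illustration.

```python
import pandas as pd

features = ["concave points", "perimeter", "radius", "area", "concavity",
            "compactness", "texture", "smoothness", "symmetry", "fractal dimension"]

# Absolute correlations with the target, copied from the outputs above
mean_corr = pd.Series(
    [0.776614, 0.742636, 0.730029, 0.708984, 0.696360,
     0.596534, 0.415185, 0.358560, 0.330499, 0.012838], index=features)
worst_corr = pd.Series(
    [0.793566, 0.782914, 0.776454, 0.733825, 0.659610,
     0.590998, 0.456903, 0.421465, 0.416294, 0.323872], index=features)

# Compare each feature with its own counterpart, aligned by name
worst_is_stronger = worst_corr > mean_corr
mean_is_stronger = sorted(worst_is_stronger[~worst_is_stronger].index)

# Check that both groups rank the features in the same order
same_ranking = mean_corr.rank().equals(worst_corr.rank())

print(mean_is_stronger)
print(same_ranking)
```

In the notebook itself, the same alignment could be obtained by stripping the `mean ` and `worst ` prefixes from the index of `corr_target_mean` and `corr_target_worst` before comparing.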

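## Findings
- The mean and worst features rank in exactly the same order by correlation strength → using only one of the two groups may be enough?

- For every feature except concavity and compactness, the worst value correlates more strongly with the target than the mean value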

In [67]:
# Do the same for the error features, just in case
corr_target_error = corr_target[error_columns]

In [68]:
corr_target_error.abs().sort_values(ascending=False)

radius error               0.567134
perimeter error            0.556141
area error                 0.548236
concave points error       0.408042
compactness error          0.292999
concavity error            0.253730
fractal dimension error    0.077972
smoothness error           0.067016
texture error              0.008303
symmetry error             0.006522
dtype: float64

I could not read anything meaningful from this.

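# Train a model
Make sure to split the data. Cross-validation has not been covered yet, so using it is optional.

Code that does not look reproducible, for example code that sets no random seed for model training or data splitting, will not be accepted.

Any model is fine, but sklearn is assumed. Your own library is also acceptable, although in that case I will only skim the relevant parts.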

In [32]:
# Use RandomForest and SVM
# Evaluate with the F1 score, which is common for binary classification
# I believe I understand how to use GridSearchCV, but not the theory behind it
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import f1_score, accuracy_score

In [18]:
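# Min-max scale the features because SVM is sensitive to feature scale
# Note: the scaler is fit on all rows before the split, so test-set minima and maxima influence training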
scaler = MinMaxScaler()
X = scaler.fit_transform(X)
X = pd.DataFrame(X, columns=data.feature_names)

In [19]:
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=SEED)
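
One way to keep test-set statistics out of the scaler is to split first and put the scaler inside a `Pipeline`, so that it is refit on the training part of every cross-validation fold. This is a sketch under stated assumptions rather than the notebook's actual procedure: the small `C` grid, the `stratify` argument, and the names `pipe` and `grid` are choices made here for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

SEED = 64
X_raw, y_raw = load_breast_cancer(return_X_y=True)

# Split first; stratify keeps the class ratio similar in both parts
X_tr, X_te, y_tr, y_te = train_test_split(
    X_raw, y_raw, stratify=y_raw, random_state=SEED)

# The scaler is refit on the training part of every CV fold
pipe = Pipeline([("scale", MinMaxScaler()), ("svc", SVC(random_state=SEED))])
grid = GridSearchCV(pipe, {"svc__C": [0.1, 1.0, 10.0]}, scoring="f1", cv=5)
grid.fit(X_tr, y_tr)

test_f1 = grid.score(X_te, y_te)  # uses the same "f1" scorer
print(grid.best_params_, test_f1)
```

Parameters of a pipeline step are addressed as `<step name>__<parameter>`, which is why the grid key is `svc__C`.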

In [23]:
# Random Forest
rf_base = RandomForestClassifier(n_jobs=4, random_state=SEED)
rf_param_grid = {"n_estimators": (100, ), "max_depth": (3, 5, 7), "max_features": (0.5, 0.7, 0.9),
             "class_weight": (None, "balanced")}
rf_clf = GridSearchCV(rf_base, rf_param_grid, scoring="f1", cv=5, verbose=2)

In [24]:
rf_clf.fit(X_train, y_train)

Fitting 5 folds for each of 18 candidates, totalling 90 fits
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[CV] class_weight=None, max_depth=3, max_features=0.5, n_estimators=100 
[CV]  class_weight=None, max_depth=3, max_features=0.5, n_estimators=100, total=   2.2s
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.2s remaining:    0.0s
...
[Parallel(n_jobs=1)]: Done  90 out of  90 | elapsed:   32.3s finished


GridSearchCV(cv=5, error_score=nan,
             estimator=RandomForestClassifier(bootstrap=True, ccp_alpha=0.0,
                                              class_weight=None,
                                              criterion='gini', max_depth=None,
                                              max_features='auto',
                                              max_leaf_nodes=None,
                                              max_samples=None,
                                              min_impurity_decrease=0.0,
                                              min_impurity_split=None,
                                              min_samples_leaf=1,
                                              min_samples_split=2,
                                              min_weight_fraction_leaf=0.0,
                                              n_estimators=100, n_jobs=4,
                                              oob_score=False, random_state=64,
                                     

In [41]:
# svm
svc_base = SVC(random_state=SEED)
svc_param_grid = {
    "C": (0.01, 0.1, 1.0, 10.0, 100.0), "kernel": ('linear', 'poly', 'rbf', 'sigmoid'), 
    "gamma": ("auto", "scale")
}
svc_clf = GridSearchCV(svc_base, svc_param_grid, scoring="f1", verbose=2, cv=5)

In [42]:
svc_clf.fit(X_train, y_train)

Fitting 5 folds for each of 40 candidates, totalling 200 fits
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[CV] C=0.01, gamma=auto, kernel=linear ...............................
[CV] ................ C=0.01, gamma=auto, kernel=linear, total=   0.0s
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.0s remaining:    0.0s
...
[Parallel(n_jobs=1)]: Done 200 out of 200 | elapsed:    1.9s finished


GridSearchCV(cv=5, error_score=nan,
             estimator=SVC(C=1.0, break_ties=False, cache_size=200,
                           class_weight=None, coef0=0.0,
                           decision_function_shape='ovr', degree=3,
                           gamma='scale', kernel='rbf', max_iter=-1,
                           probability=False, random_state=64, shrinking=True,
                           tol=0.001, verbose=False),
             iid='deprecated', n_jobs=None,
             param_grid={'C': (0.01, 0.1, 1.0, 10.0, 100.0),
                         'gamma': ('auto', 'scale'),
                         'kernel': ('linear', 'poly', 'rbf', 'sigmoid')},
             pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
             scoring='f1', verbose=2)
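
The printed estimator does not show which candidate won or how close the others were. `best_params_`, `best_score_`, and `cv_results_` expose that information. The sketch below uses a smaller, hypothetical grid and fits on the full dataset purely to keep the example short and self-contained; the name `search` is introduced here.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X_raw, y_raw = load_breast_cancer(return_X_y=True)
model = make_pipeline(MinMaxScaler(), SVC(random_state=64))
param_grid = {"svc__C": [0.1, 1.0, 10.0], "svc__kernel": ["linear", "rbf"]}
search = GridSearchCV(model, param_grid, scoring="f1", cv=5)
search.fit(X_raw, y_raw)

# One row per candidate, sorted from best to worst mean CV score
results = pd.DataFrame(search.cv_results_)
cols = ["param_svc__C", "param_svc__kernel",
        "mean_test_score", "std_test_score", "rank_test_score"]
summary = results[cols].sort_values("rank_test_score")

print(search.best_params_, round(search.best_score_, 4))
print(summary)
```

Comparing `mean_test_score` with `std_test_score` gives a rough sense of whether the gap between candidates is larger than the fold-to-fold noise.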

# Predict

In [44]:
# Random Forest
rf_y_pred = rf_clf.predict(X_test)
rf_y_pred = pd.Series(rf_y_pred)

In [45]:
# SVC
svc_y_pred = svc_clf.predict(X_test)
svc_y_pred = pd.Series(svc_y_pred)

# Check the accuracy

In [31]:
# Random Forest
f1_score(y_test, rf_y_pred)

0.9583333333333334

In [33]:
accuracy_score(y_test, rf_y_pred)

0.9440559440559441

In [46]:
# SVC
f1_score(y_test, svc_y_pred)

0.9746192893401014

In [47]:
accuracy_score(y_test, svc_y_pred)

0.965034965034965

Both models score high on the test set.
SVC is slightly ahead of Random Forest (F1 0.975 vs 0.958, accuracy 0.965 vs 0.944).
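
`f1_score` treats label 1 as the positive class by default, and in scikit-learn's breast cancer data label 1 is benign, so the F1 values above measure how well benign cases are detected. The sketch below, on made-up labels rather than the notebook's predictions, shows how to score the malignant class instead and how to see both error types in a confusion matrix.

```python
from sklearn.metrics import confusion_matrix, f1_score

# Made-up labels for illustration: 0 = malignant, 1 = benign
y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 1, 1]  # one malignant case predicted as benign

f1_benign = f1_score(y_true, y_pred)                  # default pos_label=1
f1_malignant = f1_score(y_true, y_pred, pos_label=0)  # score the malignant class

# Rows are true labels, columns are predicted labels, both ordered [0, 1]
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
print(f1_benign, f1_malignant)
print(cm)
```

In the notebook, the same calls could be applied to `y_test` and `rf_y_pred` or `svc_y_pred`; a missed malignant case appears in the top-right cell of the matrix.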

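# Check the hypothesis
Check the hypothesis you formed at the start.

The hypothesis can also be checked from accuracy, for example through prediction tendencies, but a decision tree or a visualization may be easier.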

In [48]:
# Inspect the Random Forest feature importances

rf_importances = pd.Series(rf_clf.best_estimator_.feature_importances_, index=data.feature_names)
rf_importances.sort_values(ascending=False)

worst concave points       0.320178
worst perimeter            0.176677
worst area                 0.139982
mean concave points        0.120589
worst radius               0.101025
worst texture              0.023199
mean texture               0.021481
worst concavity            0.010650
area error                 0.010010
mean area                  0.009718
mean concavity             0.009263
worst smoothness           0.006004
mean radius                0.004489
mean perimeter             0.004164
worst fractal dimension    0.003923
worst compactness          0.003733
radius error               0.003675
mean symmetry              0.003561
worst symmetry             0.003399
texture error              0.003360
smoothness error           0.003210
fractal dimension error    0.002890
mean compactness           0.002633
mean fractal dimension     0.002227
symmetry error             0.002182
perimeter error            0.002077
concavity error            0.002060
concave points error       0
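
Impurity-based `feature_importances_` tend to split credit among strongly correlated columns (such as the radius, perimeter, and area features here), so a low value does not necessarily mean a feature is uninformative. Permutation importance on held-out data is one alternative view. This is a sketch that assumes a plain `RandomForestClassifier` with hypothetical settings instead of the tuned model above.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

SEED = 64
bunch = load_breast_cancer()
X_tr, X_te, y_tr, y_te = train_test_split(
    bunch.data, bunch.target, stratify=bunch.target, random_state=SEED)

forest = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=SEED)
forest.fit(X_tr, y_tr)

# Shuffle one column at a time on the test set and measure the drop in F1
result = permutation_importance(
    forest, X_te, y_te, scoring="f1", n_repeats=5, random_state=SEED)
perm_importances = pd.Series(
    result.importances_mean, index=bunch.feature_names
).sort_values(ascending=False)
print(perm_importances.head(10))
```

Permutation importance is also diluted when a correlated twin of the shuffled column remains in the data, so it complements the impurity-based ranking rather than replacing it.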

In [37]:
corr_target.abs().sort_values(ascending=False)

target                     1.000000
worst concave points       0.793566
worst perimeter            0.782914
mean concave points        0.776614
worst radius               0.776454
mean perimeter             0.742636
worst area                 0.733825
mean radius                0.730029
mean area                  0.708984
mean concavity             0.696360
worst concavity            0.659610
mean compactness           0.596534
worst compactness          0.590998
radius error               0.567134
perimeter error            0.556141
area error                 0.548236
worst texture              0.456903
worst smoothness           0.421465
worst symmetry             0.416294
mean texture               0.415185
concave points error       0.408042
mean smoothness            0.358560
mean symmetry              0.330499
worst fractal dimension    0.323872
compactness error          0.292999
concavity error            0.253730
fractal dimension error    0.077972
smoothness error           0.067016
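
Absolute correlations hide the direction of each relationship, and the hypothesis depends on direction. The sketch below recomputes signed correlations against a malignant indicator (in scikit-learn's encoding, label 0 is malignant). The names `is_malignant` and `hypothesis_features` are introduced here for illustration.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

bunch = load_breast_cancer()
features = pd.DataFrame(bunch.data, columns=bunch.feature_names)

# 1 for malignant, 0 for benign (the dataset's own target uses 0 = malignant)
is_malignant = pd.Series((bunch.target == 0).astype(int), name="is_malignant")

hypothesis_features = ["mean compactness", "mean fractal dimension",
                       "mean texture", "mean smoothness"]

# Positive values: the feature tends to be larger for malignant tumors
signed_corr = features[hypothesis_features].corrwith(is_malignant)
print(signed_corr)
```

A positive sign for `mean smoothness` would contradict the "smaller smoothness" part of the hypothesis, and the near-zero magnitude already seen for `mean fractal dimension` means the data gives little support for that part either way.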

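# Extra
Using only one of worst_columns and mean_columns seems sufficient → check the correlation between the two groups first

In [92]:
[X[worst_columns[i]].corr(X[mean_columns[i]]) for i in range(10)]    # all roughly 0.70 or higher (8 of 10 above 0.80) → one group should be enough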

[0.9695389726112059,
 0.912044588840421,
 0.9703868870426395,
 0.9592133256499001,
 0.8053241954943624,
 0.8658090398022628,
 0.8841026390943821,
 0.9101553142985938,
 0.699825797643731,
 0.7672967792384361]

In [72]:
# svm
svc_base = SVC(random_state=SEED)
svc_param_grid = {
    "C": (0.01, 0.1, 1.0, 10.0, 100.0), "kernel": ('linear', 'poly', 'rbf', 'sigmoid'), 
    "gamma": ("auto", "scale")
}
svc_clf = GridSearchCV(svc_base, svc_param_grid, scoring="f1", verbose=2, cv=5)

In [73]:
svc_clf.fit(X_train[worst_columns], y_train)

Fitting 5 folds for each of 40 candidates, totalling 200 fits
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[CV] C=0.01, gamma=auto, kernel=linear ...............................
[CV] ................ C=0.01, gamma=auto, kernel=linear, total=   0.0s
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.0s remaining:    0.0s
...
[Parallel(n_jobs=1)]: Done 200 out of 200 | elapsed:    1.4s finished


GridSearchCV(cv=5, error_score=nan,
             estimator=SVC(C=1.0, break_ties=False, cache_size=200,
                           class_weight=None, coef0=0.0,
                           decision_function_shape='ovr', degree=3,
                           gamma='scale', kernel='rbf', max_iter=-1,
                           probability=False, random_state=64, shrinking=True,
                           tol=0.001, verbose=False),
             iid='deprecated', n_jobs=None,
             param_grid={'C': (0.01, 0.1, 1.0, 10.0, 100.0),
                         'gamma': ('auto', 'scale'),
                         'kernel': ('linear', 'poly', 'rbf', 'sigmoid')},
             pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
             scoring='f1', verbose=2)

In [75]:
svc_y_pred_worst = svc_clf.predict(X_test[worst_columns])
svc_y_pred_worst = pd.Series(svc_y_pred_worst)

In [78]:
f1_score(y_test, svc_y_pred_worst)

0.9693877551020408

In [79]:
accuracy_score(y_test, svc_y_pred_worst)

0.958041958041958

The scores barely changed (F1 0.975 → 0.969, accuracy 0.965 → 0.958), while the grid search time dropped from 1.9s to 1.4s.

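# Reflections

This took around three hours.

I followed what I had learned in lab work, so the overall flow was not a struggle. Still, although I knew GridSearch is a good way to tune hyperparameters, I was unsure what range of values to try for real-valued parameters and whether SVM needs feature scaling.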
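
For real-valued hyperparameters such as `C`, a common starting point is a logarithmically spaced grid, refined around the best value afterwards. A small sketch follows; the coarse range matches the `C` values used in this notebook, and the finer second grid is only a hypothetical illustration.

```python
import numpy as np

# Coarse grid: powers of ten from 0.01 to 100 (same values as the C grid above)
coarse_C = np.logspace(-2, 2, num=5)

# Hypothetical refinement: if C=10 had won, search one decade around it
fine_C = np.logspace(0, 2, num=5)

print(coarse_C)
print(fine_C)
```

Either array can be passed directly as the list of candidate values in a `GridSearchCV` parameter grid.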

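Because the data is easy, I tended to lose sight of the goal along the way. In particular, checking whether the hypothesis holds by looking at the feature_importances_ of a Random Forest that was simply fed the raw data seems, on reflection, to defeat the purpose of forming a hypothesis in the first place.

In hindsight, the column names make it obvious, and with this little data it may not even be necessary, but I am glad I noticed that the mean and worst values carry almost the same information and was able to drop the redundant columns.

Since SVM was covered in the optional study sessions, I took the chance to try it, but most of the hyperparameters of a nonlinear SVM concern the kernel function, so unfortunately I do not feel I really understood them.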
