# StratifiedKFold and KFold

## StratifiedKFold

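`StratifiedKFold` is the stratified variant of `KFold`: it splits the data so that every fold keeps roughly the same class proportions as the full dataset.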

In [1]:
from sklearn.model_selection import StratifiedKFold

# 100 single-feature samples; labels are sorted by class and imbalanced (30/30/30/10)
X = [[i] for i in range(100)]
y = ['A'] * 30 + ['B'] * 30 + ['C'] * 30 + ['D'] * 10

In [4]:
skf = StratifiedKFold(n_splits=10)
for train_index, test_index in skf.split(X, y):
    # print('train_index,', len(train_index), 'test_index,', len(test_index))
    class_dict = {}
    for index in train_index:
        # Count how many training samples belong to each class
        class_dict[y[index]] = class_dict.setdefault(y[index], 0) + 1
    print(class_dict)

{'A': 27, 'B': 27, 'C': 27, 'D': 9}
{'A': 27, 'B': 27, 'C': 27, 'D': 9}
{'A': 27, 'B': 27, 'C': 27, 'D': 9}
{'A': 27, 'B': 27, 'C': 27, 'D': 9}
{'A': 27, 'B': 27, 'C': 27, 'D': 9}
{'A': 27, 'B': 27, 'C': 27, 'D': 9}
{'A': 27, 'B': 27, 'C': 27, 'D': 9}
{'A': 27, 'B': 27, 'C': 27, 'D': 9}
{'A': 27, 'B': 27, 'C': 27, 'D': 9}
{'A': 27, 'B': 27, 'C': 27, 'D': 9}


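As the output shows, `StratifiedKFold` produces $10$ splits, and the training portion of every split has the same class counts (27 A, 27 B, 27 C, 9 D). This preserves the 3:3:3:1 ratio of the full dataset.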
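The loop above counts only the training indices. As a complementary check, the following minimal sketch counts the classes in each test fold instead, reusing the same `X` and `y`. The use of `collections.Counter` and the name `test_counts` are illustrative choices, not part of the original notebook.

```python
from collections import Counter

from sklearn.model_selection import StratifiedKFold

X = [[i] for i in range(100)]
y = ['A'] * 30 + ['B'] * 30 + ['C'] * 30 + ['D'] * 10

skf = StratifiedKFold(n_splits=10)

# One {class: count} dict per test fold (test_counts is an illustrative name)
test_counts = []
for train_index, test_index in skf.split(X, y):
    counts = Counter(y[i] for i in test_index)
    test_counts.append(dict(counts))
    print(sorted(counts.items()))
```

With 30/30/30/10 samples and 10 splits, each test fold should hold 3 A, 3 B, 3 C and 1 D, which is the complement of the 27/27/27/9 training counts shown above.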

---

## KFold

In [9]:
from sklearn.model_selection import KFold

kf = KFold(n_splits=10)
print(kf)

KFold(n_splits=10, random_state=None, shuffle=False)


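`KFold`, by contrast, does not guarantee that the class proportions are the same in every split. With `shuffle=False` it cuts the data into consecutive blocks, and because `y` here is sorted by class, each test fold contains samples of a single class. In the last split, class D is missing from the training set entirely.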

In [10]:
for train_index, test_index in kf.split(X, y):
    # print('train_index,', len(train_index), 'test_index,', len(test_index))
    class_dict = {}
    for index in train_index:
        # Count how many training samples belong to each class
        class_dict[y[index]] = class_dict.setdefault(y[index], 0) + 1
    print(class_dict)

{'A': 20, 'B': 30, 'C': 30, 'D': 10}
{'A': 20, 'B': 30, 'C': 30, 'D': 10}
{'A': 20, 'B': 30, 'C': 30, 'D': 10}
{'A': 30, 'B': 20, 'C': 30, 'D': 10}
{'A': 30, 'B': 20, 'C': 30, 'D': 10}
{'A': 30, 'B': 20, 'C': 30, 'D': 10}
{'A': 30, 'B': 30, 'C': 20, 'D': 10}
{'A': 30, 'B': 30, 'C': 20, 'D': 10}
{'A': 30, 'B': 30, 'C': 20, 'D': 10}
{'A': 30, 'B': 30, 'C': 30}
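
To look at the same behaviour from the test side, the sketch below counts the classes in each test fold for a default `KFold` and for a shuffled one. The helper `count_test_folds` and the seed `random_state=0` are assumptions introduced for illustration. Shuffling mixes the classes, but the proportions in each fold are still left to chance, so it is not a substitute for stratification.

```python
from collections import Counter

from sklearn.model_selection import KFold

X = [[i] for i in range(100)]
y = ['A'] * 30 + ['B'] * 30 + ['C'] * 30 + ['D'] * 10


def count_test_folds(splitter):
    """Return one {class: count} dict per test fold (illustrative helper)."""
    return [dict(Counter(y[i] for i in test_index))
            for _, test_index in splitter.split(X)]


# Default KFold: consecutive blocks, so each test fold holds a single class
plain_counts = count_test_folds(KFold(n_splits=10))

# Shuffled KFold: classes are mixed, but fold proportions are not controlled.
# random_state=0 is an arbitrary seed, used only to make the run reproducible.
shuffled_counts = count_test_folds(KFold(n_splits=10, shuffle=True, random_state=0))

for plain, shuffled in zip(plain_counts, shuffled_counts):
    print(plain, '|', shuffled)
```

When the labels are imbalanced or stored in sorted order, as they are here, `StratifiedKFold` is usually the safer choice for classification.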
