Description
Unfortunately, probably due to the stratification balancing, the train and test sizes can vary between folds. This is highly undesirable: the splitter should always rebalance to the same sizes. Otherwise, combining all the folds and doing the computations with efficient vectorized operations on a matrix with an extra (fold) dimension is not easily possible, as the sketch below illustrates.
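To make the vectorization point concrete, here is a minimal sketch of my own (not part of the reproduction further down): with equal-sized folds the test indices can be stacked into a single (n_splits, fold_size) array and reduced in one vectorized expression, whereas unequal folds cannot be stacked at all.

import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

X = np.arange(7590)[:, np.newaxis]
y = np.random.RandomState(0).randint(0, 3, 7590)

# Plain KFold yields equal-sized folds here because 7590 % 5 == 0, so the test
# index arrays stack into one (n_splits, fold_size) matrix and per-fold
# quantities can be computed in a single vectorized expression.
tests = [test for _, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X)]
stacked = np.stack(tests)                      # shape (5, 1518)
class0_share = (y[stacked] == 0).mean(axis=1)  # fraction of class 0 per fold, one op

# With the StratifiedKFold sizes reported below (scikit-learn 0.21.3) the fold
# lengths differ, so the same stacking is impossible.
tests = [test for _, test in
         StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, y)]
try:
    np.stack(tests)
except ValueError as exc:
    print("cannot stack unequal folds:", exc)
else:
    print("folds happen to be equal-sized on this scikit-learn version")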
This honestly feels like a poor stratification algorithm. At worst it is a flat-out bug; at best it is a mode of operation, in which case a Boolean flag such as force_exact_division=True should be added. Of course, in some cases it is impossible to balance the classes exactly, but even then the splitter could prefer to guarantee equal fold sizes rather than equal per-fold class counts, since only fractional remainders are at stake - and the class proportions are never going to be perfectly balanced anyway, which can be unavoidable when, for example, only one sample belongs to a class. This is surprisingly awkward behaviour.
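To spell out the remainder argument, here is a small illustration of my own using the same data as the reproduction below: the per-class counts are generally not divisible by n_splits, but whenever the total sample count is, the per-class remainders necessarily sum to a multiple of n_splits, so equal total fold sizes are always achievable by spreading the leftovers across folds (at most one extra sample of a given class per fold).

import numpy as np

n_splits = 5
y = np.random.RandomState(0).randint(0, 3, 7590)

counts = np.bincount(y)      # samples per class
print(counts)                # generally not divisible by n_splits
print(counts % n_splits)     # per-class remainders that have to land somewhere

# The total, 7590, is divisible by n_splits, so those remainders necessarily
# sum to a multiple of n_splits and could be spread so that every test fold
# receives exactly 7590 // n_splits == 1518 samples in total.
print((counts % n_splits).sum() % n_splits)   # always 0 here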
Steps/Code to Reproduce
import numpy as np
from sklearn.model_selection import StratifiedKFold

kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
X, y = np.arange(7590)[:, np.newaxis], np.random.RandomState(0).randint(0, 3, 7590)

# Print the train/test size of every fold.
for train, test in kfold.split(X, y):
    print(len(train), len(test))

# 7590 is divisible by 5, so every test fold should hold exactly 1518 samples.
kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train, test in kfold.split(X, y):
    assert(len(test) == 7590/5)
    assert(len(train) == 7590*4/5)
Expected Results
6072 1518
6072 1518
6072 1518
6072 1518
6072 1518
Actual Results
6071 1519
6071 1519
6072 1518
6072 1518
6074 1516
Traceback (most recent call last):
File "<ipython-input-106-1c41a73d61b2>", line 8, in <module>
assert(len(test) == 7590/5)
AssertionError
The fold sizes always seem to come out ordered from min(len(train)) to max(len(train)).
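Until this changes, a possible workaround for the stacking use case (my own sketch, not a scikit-learn API) is to truncate every test fold to the common minimum length before stacking; it discards a handful of samples per fold but restores the rectangular shape:

import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.arange(7590)[:, np.newaxis]
y = np.random.RandomState(0).randint(0, 3, 7590)

tests = [test for _, test in
         StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, y)]

# Drop the trailing indices of the larger folds so all folds share the minimum
# length and can be stacked into a single (n_splits, fold_size) array.
min_len = min(len(t) for t in tests)
stacked = np.stack([t[:min_len] for t in tests])
print(stacked.shape)   # e.g. (5, 1516) with the fold sizes shown above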
Versions
Windows-10-10.0.18362-SP0
Python 3.7.4 (default, Aug 9 2019, 18:34:13) [MSC v.1915 64 bit (AMD64)]
NumPy 1.17.3
SciPy 1.3.1
Scikit-Learn 0.21.3