Parameter n_classes in make_classification does not work as expected in extreme cases. #16789
Comments
Thanks! Well the code is
and |
Fix in version 1.0?
Simple cases like the following would also produce something unexpected:
X, y = make_classification(n_samples=40, n_informative=8, n_classes=20,
                           random_state=0, flip_y=0.5)
len(np.unique(y))
# 16
How should we move forward? Open a PR tagged 1.0 and merge when the time comes?
@oXwvdrbbj8S4wo9k8lSN @thomasjpfan @rth This was merged into master in Oct 2019 (084a351). So it seems this is the expected behavior, given that the flip_y parameter defaults to 0.01, i.e. the fraction of samples whose class is assigned at random. The higher this number, the more likely you are to see an unbalanced or unexpected label distribution.
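To illustrate why label flipping can remove a class entirely, here is a minimal sketch in plain Python. It is a simplified model, not scikit-learn's implementation: the helper `flip_labels` and its `seed` argument are hypothetical, and it assumes each label is independently replaced, with probability `flip_y`, by a class drawn uniformly at random.

```python
import random


def flip_labels(y, n_classes, flip_y, seed=0):
    """Hypothetical model of label noise.

    Each label is replaced, with probability flip_y, by a class drawn
    uniformly at random from range(n_classes). This is an assumption
    about how the noise behaves, not scikit-learn's actual code.
    """
    rng = random.Random(seed)
    return [
        rng.randrange(n_classes) if rng.random() < flip_y else label
        for label in y
    ]


n_classes = 20
# 40 samples, two per class, so every class is present before flipping.
y_clean = [k % n_classes for k in range(40)]

for flip_y in (0.0, 0.01, 0.5):
    y_noisy = flip_labels(y_clean, n_classes, flip_y)
    # Number of distinct classes that survive the flipping step.
    print(flip_y, len(set(y_noisy)))
```

With only two samples per class, a class disappears as soon as both of its samples are reassigned and no other flipped sample happens to land on it, which becomes more likely as `flip_y` grows.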
Describe the bug
According to the documentation, n_classes corresponds to the number of classes (or labels) of the classification problem. Therefore, I expected exactly n classes in the generated y. However, in at least one case, this did not work. I am well aware that the combination of parameters is absolutely stupid. This behavior was only discovered in a unit test of a wrapper function that was fed with random inputs.
Steps/Code to Reproduce
Expected Results
The passed number of classes (n_classes):
34
Actual Results
33
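A mismatch like the one above can be detected with a small check in a unit test. The following is only a sketch: `missing_classes` is a hypothetical helper and not part of scikit-learn, and it is not the reproduction code from this report.

```python
def missing_classes(y, n_classes):
    """Return the sorted class indices in range(n_classes) absent from y."""
    return sorted(set(range(n_classes)) - set(y))


# Toy labels: class 2 never occurs among the four requested classes.
y_example = [0, 1, 1, 3]
print(missing_classes(y_example, 4))  # [2]
```

For labels returned by make_classification, one could pass `y.tolist()` together with the requested `n_classes`; an empty result means every requested class is present.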
Versions
System:
python: 3.7.6 | packaged by conda-forge | (default, Mar 5 2020, 15:27:18) [GCC 7.3.0]
executable: /opt/conda/bin/python
machine: Linux-4.19.76-linuxkit-x86_64-with-debian-buster-sid
Python dependencies:
pip: 20.0.2
setuptools: 46.0.0.post20200311
sklearn: 0.22.2.post1
numpy: 1.18.1
scipy: 1.4.1
Cython: 0.29.15
pandas: 1.0.3
matplotlib: 3.1.3
joblib: 0.14.1
Built with OpenMP: True