Describe the bug
sklearn.datasets.make_blobs consumes an unusual amount of memory.
Steps/Code to Reproduce
command time -f '%M' python -c '
from sklearn import datasets
blobs_opts = {
"n_samples": 10**4,
"n_features": 10**4,
"centers": 10,
"random_state": 10
}
X, y = datasets.make_blobs(**blobs_opts)
None
'
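For comparison (not part of the report itself), here is a minimal NumPy-only sketch that draws an equivalent data set while staying close to the theoretical single-array footprint. It assumes the extra memory in make_blobs comes from intermediate copies, which is not verified here:

import numpy as np

rng = np.random.default_rng(10)
n_samples, n_features, n_centers = 10**4, 10**4, 10

# make_blobs defaults: centers drawn uniformly from center_box=(-10, 10),
# unit-variance Gaussian noise around each assigned center.
centers = rng.uniform(-10.0, 10.0, size=(n_centers, n_features))
y = rng.integers(n_centers, size=n_samples)

# One (n_samples, n_features) float64 allocation holds the noise; the centers
# are then added in place, one cluster at a time, to avoid a second full-size copy.
X = rng.standard_normal(size=(n_samples, n_features))
for k in range(n_centers):
    X[y == k] += centers[k]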
Expected Results
The max memory consumption should be around 10**(4+4)*8 = 800,000,000 bytes (≈ 763 MiB).
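A quick sanity check on that arithmetic (a sketch, nothing sklearn-specific):

import numpy as np

n_samples, n_features = 10**4, 10**4
itemsize = np.dtype(np.float64).itemsize   # make_blobs returns float64 data
expected_bytes = n_samples * n_features * itemsize
print(expected_bytes)             # 800000000
print(expected_bytes / 2**20)     # ~762.9 (MiB)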
Actual Results
The max memory consumption is 2426524 kilobytes ≈ 2.4 GB, roughly three times the expected footprint. (The time command reports the maximum resident set size in kilobytes.)
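An in-process cross-check is also possible with tracemalloc (a sketch; it only traces the allocations NumPy routes through Python's tracing hooks, so the peak it reports can differ from the resident-set figure that time -f '%M' measures):

import tracemalloc
from sklearn import datasets

tracemalloc.start()
X, y = datasets.make_blobs(
    n_samples=10**4, n_features=10**4, centers=10, random_state=10
)
current, peak = tracemalloc.get_traced_memory()
print(f"peak traced allocations: {peak / 2**20:.0f} MiB")
tracemalloc.stop()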
Versions
System:
python: 3.8.10 | packaged by conda-forge | (default, May 11 2021, 07:01:05) [GCC 9.3.0]
executable: /root/miniconda3/bin/python
machine: Linux-5.4.144+-x86_64-with-glibc2.10
Python dependencies:
pip: 21.3.1
setuptools: 59.8.0
sklearn: 1.0.2
numpy: 1.21.5
scipy: 1.7.3
Cython: None
pandas: 1.3.5
matplotlib: 3.4.3
joblib: 1.1.0
threadpoolctl: 3.0.0
Built with OpenMP: True