sklearn.datasets.make_blobs takes too much RAM #22244

@NightMachinery

Description

Describe the bug

sklearn.datasets.make_blobs consumes an unusually large amount of memory: about three times the size of the array it returns.

Steps/Code to Reproduce

command time -f '%M' python -c '
from sklearn import datasets

blobs_opts = {
    "n_samples": 10**4,
    "n_features": 10**4,
    "centers": 10,
    "random_state": 10
}
X, y = datasets.make_blobs(**blobs_opts)
None
'
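For reference, the same peak can be read from inside the process. The sketch below is not part of the original report; it uses the standard-library resource module, whose ru_maxrss field is reported in kilobytes on Linux, matching the %M output above.

import resource
from sklearn import datasets

X, y = datasets.make_blobs(n_samples=10**4, n_features=10**4,
                           centers=10, random_state=10)

# Maximum resident set size of this process; on Linux the value is in kilobytes.
peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print(f"peak RSS : {peak_kb} kB (~{peak_kb / 1024**2:.2f} GiB)")
print(f"X.nbytes : {X.nbytes} bytes (~{X.nbytes / 1024**3:.2f} GiB)")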

Expected Results

The peak memory consumption should be around 10**4 * 10**4 * 8 = 800,000,000 bytes ≈ 763 MiB, i.e. the size of a single float64 array of shape (10**4, 10**4).
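The figure follows directly from the shape and dtype of the returned array; a quick check of the arithmetic (nothing here is sklearn-specific):

import numpy as np

n_samples, n_features = 10**4, 10**4
expected = n_samples * n_features * np.dtype(np.float64).itemsize
print(f"{expected} bytes = {expected / 2**20:.0f} MiB")  # 800000000 bytes = 763 MiB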

Actual Results

The peak memory consumption is 2426524 kilobytes ≈ 2.4 GB, roughly three times the expected footprint. (GNU time's %M format reports the maximum resident set size in kilobytes.)
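The roughly 3x overhead presumably comes from intermediate copies made while building and shuffling the data, though that is not verified here. As a point of comparison, the following is a minimal low-memory sketch (make_blobs_lowmem is a hypothetical helper, not sklearn API): it assigns labels in random order up front and writes each cluster's samples directly into a preallocated array, so the peak stays near one copy of X. Cluster sizes come out multinomial rather than exactly balanced, a small departure from make_blobs semantics.

import numpy as np

def make_blobs_lowmem(n_samples, n_features, centers, cluster_std=1.0,
                      random_state=None):
    """Sketch of a low-memory blob generator; peak memory stays near
    one copy of X plus one cluster-sized chunk."""
    rng = np.random.default_rng(random_state)
    # Cluster centers drawn from the same (-10, 10) box make_blobs uses by default.
    center_points = rng.uniform(-10.0, 10.0, size=(centers, n_features))

    # Assign labels up front in random order, so no post-hoc shuffle
    # (and no extra full copy of X) is needed.
    y = rng.integers(0, centers, size=n_samples)

    X = np.empty((n_samples, n_features), dtype=np.float64)
    for i in range(centers):
        mask = y == i
        # Write each cluster's samples straight into its rows of X.
        X[mask] = rng.normal(loc=center_points[i], scale=cluster_std,
                             size=(int(mask.sum()), n_features))
    return X, y

With the parameters from the reproduction above, the same time -f '%M' measurement of this sketch should stay close to the ~763 MiB expected footprint.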

Versions

System:
    python: 3.8.10 | packaged by conda-forge | (default, May 11 2021, 07:01:05)  [GCC 9.3.0]
executable: /root/miniconda3/bin/python
   machine: Linux-5.4.144+-x86_64-with-glibc2.10

Python dependencies:
          pip: 21.3.1
   setuptools: 59.8.0
      sklearn: 1.0.2
        numpy: 1.21.5
        scipy: 1.7.3
       Cython: None
       pandas: 1.3.5
   matplotlib: 3.4.3
       joblib: 1.1.0
threadpoolctl: 3.0.0

Built with OpenMP: True
