BUG: Multiple threads introduce significant delays in numpy concatenate #24252

@BrandonSmithJ

Description

Describe the issue:

Calling np.concatenate from multiple threads uses more CPU, more than 4x the peak memory, and takes roughly 5x longer than concatenating with an explicit for loop into a preallocated output array:

from concurrent.futures import ThreadPoolExecutor
import numpy as np
import time, psutil

N_ELE = 5e5


def concat1(x):
    # Built-in concatenate on an object array of sequences
    return np.concatenate(x, dtype=np.int64)

def concat2(x):
    # Explicit loop: compute offsets, preallocate the output,
    # then copy each piece into place
    total = np.r_[0, np.cumsum(list(map(len, x)))]
    y = np.empty(total[-1], dtype=np.int64)
    for i, n in enumerate(x):
        y[total[i]:total[i+1]] = n
    return y

def mwe(concat, nest_type=list):
    # Object array of ~N_ELE short sequences; nest_type controls whether
    # the elements are Python lists or ndarrays
    a = [nest_type([1, 1])] + [nest_type([1])] * int(N_ELE)
    return concat(np.array(a, dtype=object))

def multithread(f, concat, workers, jobs=6):
    label = f'{concat.__name__} using {workers} workers'
    with ThreadPoolExecutor(max_workers=workers) as pool:
        proc = psutil.Process()
        start = time.time()
        psutil.cpu_percent()  # reset the CPU counter
        list(pool.map(f, [concat] * jobs))
        print(f'{label} finished in {time.time()-start:.1f} seconds')
        print(f'CPU usage: {psutil.cpu_percent()}%')
        print(f'Mem usage: {proc.memory_info().peak_wset/1e6:,.1f} MB\n')  # peak_wset is Windows-only

# Both implementations produce the same result
assert((mwe(concat1) == mwe(concat2)).all())

# 1 worker vs 6 workers
multithread(mwe, concat2, 1)
multithread(mwe, concat1, 1)
print()

multithread(mwe, concat2, 6)
multithread(mwe, concat1, 6)
print()
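A measurement caveat: peak_wset exists only on Windows (the platform in this report). For anyone reproducing on Linux or macOS, a best-effort substitute (my own helper, not part of the report) could look like:

```python
import sys

def peak_memory_mb():
    """Best-effort peak memory usage of this process, in MB."""
    if sys.platform == 'win32':
        import psutil
        return psutil.Process().memory_info().peak_wset / 1e6
    import resource
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux, bytes on macOS
    scale = 1e6 if sys.platform == 'darwin' else 1e3
    return peak / scale
```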

Output:

concat2 using 1 workers finished in 1.8 seconds
CPU usage: 13.3%
Mem usage: 119.6 MB

concat1 using 1 workers finished in 1.9 seconds
CPU usage: 13.9%
Mem usage: 140.2 MB


concat2 using 6 workers finished in 1.8 seconds
CPU usage: 13.4%
Mem usage: 140.2 MB

concat1 using 6 workers finished in 9.9 seconds
CPU usage: 24.6%
Mem usage: 560.1 MB

I've tried to minimize the comparisons here, but one further observation: the results above use list as the nest_type. If I use np.array instead (which produces a nested object array of arrays rather than of lists), the slowdown appears for both concat1 and concat2, implying the problem lies in the array representation itself and not necessarily in concatenate.
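For concreteness, the two representations can be built side by side in a standalone snippet (a scaled-down sketch using the same shapes as the MWE above):

```python
import numpy as np

# Fast case from the report: object array whose elements are Python lists
a_lists = np.array([[1, 1]] + [[1]] * 5, dtype=object)

# Slow case: object array whose elements are ndarrays
a_arrays = np.empty(len(a_lists), dtype=object)
for i, v in enumerate(a_lists):
    a_arrays[i] = np.asarray(v)

# Both representations concatenate to the same values; only the timing
# and memory behaviour under threads differs
out_lists = np.concatenate(a_lists, dtype=np.int64)
out_arrays = np.concatenate(a_arrays, dtype=np.int64)
assert (out_lists == out_arrays).all()
```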

Runtime information:

System:
    python: 3.10.12 | packaged by Anaconda, Inc. | (main, Jul  5 2023, 19:09:20) [MSC v.1916 64 bit (AMD64)]
executable: env_crest\python.exe
   machine: Windows-10-10.0.19045-SP0

Python dependencies:
      sklearn: 1.2.1
          pip: 23.1.2
   setuptools: 67.8.0
        numpy: 1.24.3
        scipy: 1.10.1
       Cython: None
       pandas: 1.5.3
   matplotlib: 3.7.1
       joblib: 1.2.0
threadpoolctl: 2.2.0

Built with OpenMP: True

threadpoolctl info:
       filepath: env_crest\Library\bin\mkl_rt.2.dll
         prefix: mkl_rt
       user_api: blas
   internal_api: mkl
        version: 2023.1-Product
    num_threads: 6
threading_layer: intel

       filepath: env_crest\vcomp140.dll
         prefix: vcomp
       user_api: openmp
   internal_api: openmp
        version: None
    num_threads: 12

Context for the issue:

Maybe tangentially related to this issue.
