Describe the issue:
The observation is that calling np.concatenate from multiple threads uses more CPU, more than 4x the peak memory, and takes ~5x longer than an explicit for loop that copies into a preallocated array:
```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np
import time, psutil

N_ELE = 5e5

def concat1(x):
    return np.concatenate(x, dtype=np.int64)

def concat2(x):
    total = np.r_[0, np.cumsum(list(map(len, x)))]
    y = np.empty(total[-1], dtype=np.int64)
    for i, n in enumerate(x):
        y[total[i]:total[i+1]] = n
    return y

def mwe(concat, nest_type=list):
    a = [nest_type([1, 1])] + [nest_type([1])] * int(N_ELE)
    return concat(np.array(a, dtype=object))

def multithread(f, concat, workers, jobs=6):
    label = f'{concat.__name__} using {workers} workers'
    with ThreadPoolExecutor(max_workers=workers) as pool:
        proc = psutil.Process()
        start = time.time()
        psutil.cpu_percent()
        list(pool.map(f, [concat] * jobs))
    print(f'{label} finished in {time.time()-start:.1f} seconds')
    print(f'CPU usage: {psutil.cpu_percent()}%')
    print(f'Mem usage: {proc.memory_info().peak_wset/1e6:,.1f} MB\n')

# Both implementations produce the same result
assert (mwe(concat1) == mwe(concat2)).all()

# 1 worker vs 6 workers
multithread(mwe, concat2, 1)
multithread(mwe, concat1, 1)
print()
multithread(mwe, concat2, 6)
multithread(mwe, concat1, 6)
print()
```
Output:

```
concat2 using 1 workers finished in 1.8 seconds
CPU usage: 13.3%
Mem usage: 119.6 MB

concat1 using 1 workers finished in 1.9 seconds
CPU usage: 13.9%
Mem usage: 140.2 MB

concat2 using 6 workers finished in 1.8 seconds
CPU usage: 13.4%
Mem usage: 140.2 MB

concat1 using 6 workers finished in 9.9 seconds
CPU usage: 24.6%
Mem usage: 560.1 MB
```
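As a possible workaround (untested against the multithreaded case above), `np.concatenate` accepts an `out=` argument, so the preallocation trick from `concat2` can be combined with a single `concatenate` call. A minimal sketch (`concat_out` is a hypothetical name, not from the benchmark):

```python
import numpy as np

def concat_out(x):
    # Preallocate the destination, then let concatenate fill it in place.
    # The output dtype comes from the buffer; dtype= and out= cannot be
    # passed together.
    n = sum(map(len, x))
    y = np.empty(n, dtype=np.int64)
    np.concatenate(x, out=y)
    return y
```

Whether this avoids the multithreaded slowdown reported above would need to be measured separately.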
I've tried to minimize the comparisons here, but one other observation: the behavior above occurs when using `list` as the `nest_type`. If I use `np.array` instead (which produces a nested object array of arrays rather than of lists), the problem appears for both concat1 and concat2 - implying that something about the array representation specifically is causing the issue, and not necessarily concatenate itself.
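For illustration, a minimal sketch (toy values, not the benchmark sizes above) of the two object-array representations the two `nest_type` choices produce; the ragged first element forces a 1-D object array in both cases:

```python
import numpy as np

# nest_type=list: a 1-D object array whose elements are Python lists
a_lists = np.array([[1, 1]] + [[1]] * 3, dtype=object)

# nest_type=np.array: same layout, but every element is an ndarray
a_arrays = np.array([np.array([1, 1])] + [np.array([1])] * 3, dtype=object)

print(type(a_lists[0]).__name__)   # list
print(type(a_arrays[0]).__name__)  # ndarray
```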
Runtime information:

```
System:
    python: 3.10.12 | packaged by Anaconda, Inc. | (main, Jul 5 2023, 19:09:20) [MSC v.1916 64 bit (AMD64)]
executable: env_crest\python.exe
   machine: Windows-10-10.0.19045-SP0

Python dependencies:
      sklearn: 1.2.1
          pip: 23.1.2
   setuptools: 67.8.0
        numpy: 1.24.3
        scipy: 1.10.1
       Cython: None
       pandas: 1.5.3
   matplotlib: 3.7.1
       joblib: 1.2.0
threadpoolctl: 2.2.0

Built with OpenMP: True

threadpoolctl info:
       filepath: env_crest\Library\bin\mkl_rt.2.dll
         prefix: mkl_rt
       user_api: blas
   internal_api: mkl
        version: 2023.1-Product
    num_threads: 6
threading_layer: intel

       filepath: env_crest\vcomp140.dll
         prefix: vcomp
       user_api: openmp
   internal_api: openmp
        version: None
    num_threads: 12
```
Context for the issue:
Maybe tangentially related to this issue.