
Actually remember to release the GIL before doing some multithreading tests. #664

Merged
merged 2 commits into main from jpivarski/parallel-processing-with-AwkwardForth on Jan 24, 2021

Conversation

jpivarski
Member

No description provided.

@jpivarski
Member Author

With this PR, the ForthMachine releases the Python GIL while it runs, so it scales across threads. The measurements below were taken on a c5.18xlarge AWS instance (72 cores). The ceiling is probably the memory-access rate: my laptop maxes out at about 2000 MB/sec. (A sketch of the kind of scaling measurement appears after the plots.)

[plot: AwkwardForth (future Uproot) scaling with number of threads]

Of course, Uproot's original deserializer in Python doesn't scale at all.

[plot: Python (current Uproot) scaling with number of threads]
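
Not from the PR itself, but for context, here is a minimal sketch of the kind of thread-scaling measurement behind these plots. The `work` function and the buffer size are placeholders: the real test ran a ForthMachine on each thread, while this stand-in just copies a NumPy buffer (the timings later in this thread show that such copies overlap across threads, so any GIL-releasing workload would do).

import time
import concurrent.futures
import numpy as np

def work(buffer):
    # Placeholder for the per-thread work; the real benchmark ran a ForthMachine here.
    # The copy stands in for any work that releases the GIL so threads can overlap.
    return np.copy(buffer)

buffer = np.ones(1024**3, np.uint8)   # 1 GiB per task (arbitrary choice for the sketch)

for num_threads in (1, 2, 4, 8, 16, 32, 64):
    with concurrent.futures.ThreadPoolExecutor(num_threads) as executor:
        begin = time.time()
        list(executor.map(work, [buffer] * num_threads))
        elapsed = time.time() - begin
    rate = num_threads * buffer.nbytes / 1024**2 / elapsed
    print(f"{num_threads:3d} threads: {rate:9.1f} MiB/sec aggregate")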

@jpivarski merged commit 3e6226f into main on Jan 24, 2021
@jpivarski deleted the jpivarski/parallel-processing-with-AwkwardForth branch on January 24, 2021 at 21:35
@jpivarski
Member Author

I ran it a few more times, averaged the results, and got a smoother curve.

[plot: AwkwardForth (future Uproot) scaling, averaged over several runs]

@jpivarski
Member Author

It looks like the (single-threaded) raw data copying rate is 2000 MiB/sec, so it's not surprising that we hit a wall at 5000 MiB/sec.

>>> import numpy as np
>>> import time
>>> array = np.ones(10000*1024**2, np.uint8)  # a 10000 MiB buffer
>>> begin = time.time(); array2 = np.copy(array); print(time.time() - begin)
4.9701831340789795
>>> del array2
>>> begin = time.time(); array2 = np.copy(array); print(time.time() - begin)
4.9057841300964355
>>> del array2
>>> begin = time.time(); array2 = np.copy(array); print(time.time() - begin)
4.90533709526062
>>> del array2
>>> begin = time.time(); array2 = np.copy(array); print(time.time() - begin)
4.904132127761841
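
(Not in the original comment, just the arithmetic implied by these timings: 10000 MiB copied in about 4.97 seconds.)

>>> round(10000 / 4.97)   # MiB copied per second, single thread
2012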

@jpivarski
Member Author

The limiting read speed per thread is the same when multiple threads are reading the same array.

>>> import numpy as np
>>> import time
>>> import concurrent.futures
>>> array = np.ones(10000*1024**2, np.uint8)  # same 10000 MiB buffer as in the previous session
>>> executor = concurrent.futures.ThreadPoolExecutor(2)
>>> begin = time.time(); tmp = list(executor.map(np.copy, [array] * 2)); print(time.time() - begin)
5.315511703491211
>>> del tmp
>>> begin = time.time(); tmp = list(executor.map(np.copy, [array] * 2)); print(time.time() - begin)
5.248187303543091
>>> del tmp
>>> executor = concurrent.futures.ThreadPoolExecutor(4)
>>> begin = time.time(); tmp = list(executor.map(np.copy, [array] * 4)); print(time.time() - begin)
5.308006763458252
>>> del tmp
>>> begin = time.time(); tmp = list(executor.map(np.copy, [array] * 4)); print(time.time() - begin)
5.11659049987793
>>> del tmp
>>> executor = concurrent.futures.ThreadPoolExecutor(8)
>>> begin = time.time(); tmp = list(executor.map(np.copy, [array] * 8)); print(time.time() - begin)
5.400676012039185
>>> del tmp
>>> begin = time.time(); tmp = list(executor.map(np.copy, [array] * 8)); print(time.time() - begin)
5.453733205795288
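
(Again just the implied arithmetic: each of the 8 threads copies the full 10000 MiB array in about 5.45 seconds, so the per-thread rate stays close to the single-threaded ~2000 MiB/sec.)

>>> round(10000 / 5.45)   # MiB copied per second per thread, 8 threads sharing one source array
1835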
