Support multithreading #496
Conversation
Cool. Can you add an example of a benchmark (you can put it in …
I am sorry if my comment is dumb, and I am not the maintainer of numexpr, so my opinion is not worth much, but I have the impression your benchmark compares apples to oranges: numpy with 2 threads vs numexpr with 32 (?) threads (2 explicit threads x 16 builtin threads). How does the builtin numexpr threading interact with manual threading anyway? Also, I would be interested in a benchmark against "normal"/builtin numexpr threading, which I think is more interesting than one against numpy. Unless there is something I don't understand (very likely), I don't expect much difference.
There are many benchmark cases, both against numpy and with different numbers of threads, under the bench/ folder. The whole point of this PR is to avoid reimplementing numexpr's mechanism on top of numpy when multithreading; numexpr is not thread safe due to a global dict, as stated clearly in the PR description.
I'm not quite sure I understand the comment "how does the builtin numexpr threading interact with manual threading". Well, better that they don't: if they do, it usually implies a race condition. The change here guarantees that they don't. Oversubscription (I have only 16 cores) is a common technique when CPU utilisation is low due to I/O, because in reality the other thread(s) might be loading data rather than using the CPUs much.
As commented in the benchmark file, it uses 2 threads because the presumption is that memory is only big enough for 2 chunks of computation. Then there are two choices: thread over smaller chunks with numpy, or hand threading over to numexpr. To me the latter is clearly the easier option (see the sketch below). With smaller chunks and oversubscribed threads, numexpr is not doing as well as in other conditions (again, those are available under the bench/ folder). However, it's still much better than single-threaded numpy, and MUCH less work.
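A minimal sketch of the pattern being benchmarked, with a placeholder expression and chunk shape (the actual cases live under bench/): two explicit threads each evaluate one chunk with `ne.evaluate`, while numexpr's builtin threads do the per-chunk work.

```python
from concurrent.futures import ThreadPoolExecutor

import numexpr as ne
import numpy as np


def compute_chunk(chunk):
    # With this PR, each thread's evaluation context is isolated,
    # so concurrent ne.evaluate calls are safe.
    return ne.evaluate("2 * chunk**3 + chunk", local_dict={"chunk": chunk})


# Pretend only a couple of chunks fit into memory at once.
data = np.random.rand(4, 1_000_000)

with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(compute_chunk, data))
```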
Thanks @emmaai for your example. It was more for me (and others!) to understand the way you wanted to use your feature. Now it is much clearer, and sounds good to me. The only thing that I'd ask is to add a new test exercising this new feature; tests are actually the way to ensure that we are not introducing regressions in the future.
Test added to verify thread safety by always manifesting the race condition.
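For context, here is a hedged sketch of the kind of test that can manifest such a race (the expressions and structure are illustrative, not the actual test added in the PR): two threads interleave `evaluate` and `re_evaluate`, which before this change shared state through the module-level `_numexpr_last`, so a thread could re-run the other thread's expression.

```python
import threading

import numexpr as ne
import numpy as np


def worker(expr, value, failures):
    a = np.full(1000, value, dtype=np.float64)
    for _ in range(100):
        expected = ne.evaluate(expr, local_dict={"a": a})
        # re_evaluate reuses the last compiled expression for this context;
        # if contexts leak between threads, this recomputes the wrong expr.
        got = ne.re_evaluate(local_dict={"a": a})
        if not np.array_equal(expected, got):
            failures.append(expr)
            return


failures = []
threads = [
    threading.Thread(target=worker, args=(expr, value, failures))
    for expr, value in [("a + 1", 1.0), ("a * 3", 2.0)]
]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert not failures, f"race condition manifested for: {failures}"
```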
Thanks @emmaai for the added explanation.
I'm following up on this PR, wondering if there is any concern preventing it from being merged?
I've just activated the tests in CI, and Mac OSX is reporting a failure. Can you address it?
Sorry, I forgot to commit the change in another file when pushing the test. It should pass now.
Thanks @emmaai!
As the title says (ref: #494), the change:
- replaces `_numexpr_last` with a dictionary-like object that is aware of its context

Reasoning:
- users may `re_evaluate` and want that to stay safe as well
- `async` re-/evaluation, which further caters to my specific use case (see the sketch after this list)
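One plausible shape for such a context-aware, dictionary-like object (a sketch assuming per-thread storage via `threading.local`, not necessarily the PR's exact implementation):

```python
import threading


class ContextDict(threading.local):
    """Dict-like per-thread storage: threading.local runs __init__ once per
    thread, so each thread sees its own independent _data mapping."""

    def __init__(self):
        self._data = {}

    def __setitem__(self, key, value):
        self._data[key] = value

    def __getitem__(self, key):
        return self._data[key]

    def get(self, key, default=None):
        return self._data.get(key, default)


# Replaces the former module-level dict, so "last evaluation" state kept for
# re_evaluate can no longer leak between threads.
_numexpr_last = ContextDict()
```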
Benchmark case:
It's based on my reality: most of the time I have a large amount of data of which only chunks fit into memory, and most of my CPUs are idling while I/O (especially input) happens. If the data fits into memory and the CPUs are fully utilised, I guess it doesn't make much difference whether you thread over chunks with `numpy` or with `numexpr`. Certainly I could implement what `numexpr` achieves by further chunking each chunk, but why? Specifically, I use `dask` threading to schedule the tasks; all I have to do is pack `ne.evaluate` nicely into a "task", provided thread safety is taken care of (a sketch of this workflow follows).
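A sketch of that `dask` workflow, with the expression, chunk count, and loading step as stand-ins: each delayed task loads one chunk and hands the computation to `ne.evaluate`, so dask's threaded scheduler and numexpr's builtin threads coexist once `evaluate` is thread safe.

```python
import dask
import numexpr as ne
import numpy as np


@dask.delayed
def load_chunk(i):
    # Stand-in for I/O-bound loading; in reality this would read chunk i
    # from disk, which is when the CPUs would otherwise sit idle.
    return np.random.rand(1_000_000)


@dask.delayed
def transform(chunk):
    # The whole numexpr evaluation is packed into a single dask task.
    return ne.evaluate("sin(chunk) ** 2", local_dict={"chunk": chunk})


tasks = [transform(load_chunk(i)) for i in range(8)]
# Two scheduler threads, mirroring the "only 2 chunks fit in memory" setup.
results = dask.compute(*tasks, scheduler="threads", num_workers=2)
```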