DO NOT MERGE. Wip/get thread id schedules #6674
Conversation
…re calling set_parallel_chunksize.
…size, 2) set chunksize back to the default of 0 and then after the gufunc returns, restore the chunksize back to the previously saved value. This way, the current thread gets its default chunksize behavior inside the parallel region but goes back to its previous value when the region is over.
Co-authored-by: stuartarchibald <stuartarchibald@users.noreply.github.com>
* Add more detail on how the actual chunksize can differ from the specification.
* Move code examples in the docs to tests/doc_examples/test_parallel_chunksize.py.
* Export (g,s)et_parallel_chunksize from numba.np.ufunc.
* Fix the parallel_chunksize with-context docstring.
* Change set_parallel_chunksize to return the previous chunk size, and use that return value to remove the need for get_parallel_chunksize in some places.
* Raise an exception if a negative value is passed to set_parallel_chunksize.
This:
* Makes `_get_thread_id()` return enumerated thread ids from 0..mask.
* Makes it possible to obtain the parfors schedule from within a parallel region.
* Implements broadcasting of schedule info across all threads for convenience.
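A pure-Python model of the enumerated-id behavior (dense ids from 0 upward rather than raw OS thread ids) might look like the following. This is a sketch of the semantics only, not the PR's C/LLVM implementation; the name `get_thread_id` is borrowed for illustration.

```python
import threading

# Sketch: map OS thread idents to dense ids 0..N-1, modeling the
# enumerated-id semantics of the PR's _get_thread_id().
_ids = {}                       # OS ident -> enumerated id
_ids_lock = threading.Lock()


def get_thread_id():
    """Return a dense id in 0..N-1 for the calling thread,
    assigned in first-come order."""
    ident = threading.get_ident()
    with _ids_lock:
        if ident not in _ids:
            _ids[ident] = len(_ids)
        return _ids[ident]


results = []
barrier = threading.Barrier(4)


def work():
    barrier.wait()              # keep all four threads alive at once so
    results.append(get_thread_id())  # their OS idents are distinct


threads = [threading.Thread(target=work) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# results holds the dense ids 0..3, one per worker thread.
```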
@stuartarchibald My first reaction is that perhaps this is overly complicated. Here is what I was thinking.

When we lower a parfor, we first calculate the schedule size and then alloca that much space to store the schedule. We then call do_schedule_(un)signed to compute the schedule and store it in that alloca'd space. Both the size and the actual schedule are then on the stack and stick around until the end of the function. We won't try to access them directly by their LLVM names; we just know that the storage will stick around.

Then, in gufunc_scheduler, when we compute a schedule, we save in that file the computed size and a pointer to the last computed schedule. On top of saving those two things, you add a function that returns the saved size and schedule pointer. You then have the issue of interpreting that raw array, but I assume you could add a wrapper to convert from C to Python format there. Thus, you would execute the parfor and afterwards you could inspect the schedule.

My approach would also support inspecting the schedule inside the loop, but that doesn't seem as clean to me. Does your approach require the inspection to be done inside the loop? If so, would that preclude testing vector-style parfors?
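As a rough model of the scheme sketched above, the scheduler can stash the size and contents of the last schedule it computed and expose a getter. This is a hypothetical Python analogue of what gufunc_scheduler's do_schedule_(un)signed computes (a static partition of the iteration space); the names and the exact partitioning are illustrative, not the PR's code.

```python
# Sketch of "save size + pointer to the last schedule, plus a getter".
_last_schedule = None   # (num_threads, [(start, end), ...]) of the last parfor


def do_scheduling(num_threads, total_iters):
    """Statically partition [0, total_iters) across num_threads and record
    the result, modeling the save-on-compute idea described above."""
    global _last_schedule
    base, extra = divmod(total_iters, num_threads)
    sched, start = [], 0
    for t in range(num_threads):
        count = base + (1 if t < extra else 0)
        sched.append((start, start + count - 1))  # inclusive per-thread range
        start += count
    _last_schedule = (num_threads, sched)
    return sched


def get_last_schedule():
    """Getter analogous to 'return the saved size and schedule pointer'."""
    return _last_schedule


do_scheduling(4, 10)
# After the "parfor", the schedule can be inspected:
# get_last_schedule() -> (4, [(0, 2), (3, 5), (6, 7), (8, 9)])
```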
One caveat with my approach: if one function generated a schedule, that function returned, another function started, and you then asked for the schedule before running a parfor in that function, you could get a crash, because the schedule alloca would no longer be on the stack. Does your approach have this issue too?
Closing. #7625 implements this further.