More accurate array overlap test #39

Closed
manopapad opened this issue Jun 4, 2021 · 3 comments
Labels
backlog enhancement New feature or request

Comments

@manopapad
Contributor

The current RegionField overlap test (meant to check whether an intra-array copy can safely be implemented as a single Legion copy/task) is too conservative to be useful.

https://github.com/nv-legate/legate.numpy/blob/5ec509907b9e49d98edcbc2fdc9ff2b55d2f5f33/legate/numpy/runtime.py#L496-L517

It ignores slice steps, and is very inaccurate when going from a 2d view to a 1d base array, e.g. for:

a = np.arange(25)
b = a.reshape((5, 5))

it will decide that b[3:5, 0:2] and b[3:5, 2:4] overlap, because it translates the rectangles to the base 1d space and compares bounding boxes on that space (a[15:22] and a[17:24] in this example).
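To make the bounding-box behavior concrete, here is a plain-NumPy sketch that reproduces the intervals mentioned above; `flat_interval` is a hypothetical helper written for this illustration, not legate code:

```python
import numpy as np

a = np.arange(25)
b = a.reshape((5, 5))
v1 = b[3:5, 0:2]
v2 = b[3:5, 2:4]

def flat_interval(view, base):
    # Offsets (in elements) of the view's first and last element
    # within the flat base buffer -- the "bounding box" that the
    # conservative test compares.
    start = (view.__array_interface__['data'][0]
             - base.__array_interface__['data'][0]) // base.itemsize
    last = start + sum((n - 1) * (s // base.itemsize)
                       for n, s in zip(view.shape, view.strides))
    return start, last

i1 = flat_interval(v1, a)  # (15, 21), i.e. a[15:22]
i2 = flat_interval(v2, a)  # (17, 23), i.e. a[17:24]

# The bounding intervals intersect, so the conservative test reports
# an overlap, even though the two views share no elements.
overlap = i1[0] <= i2[1] and i2[0] <= i1[1]
```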

@piyueh

piyueh commented Jun 11, 2021

@manopapad I'm not sure whether the following code snippet is relevant to this issue or to the original issue (#16):

import legate.numpy as np
a = np.arange(49).reshape((7, 7))
a[2:-2, 0] = a[2:-2, 2]

I got an error from this test code:

[0 - 7fe46cbbf700]    0.963354 {5}{runtime}: [error 356] LEGION ERROR: Aliased and interfering region requirements for individual tasks are not permitted. Region requirements 0 and 1 of task legate::numpy::CopyTask<long> (UID 7) in parent task legion_python_main (UID 1) are interfering. (from file /home/u00u92m5kmdwylez30357/Downloads/legate/legate.core/legion/runtime/legion/legion_tasks.cc:5921)
For more information see:
http://legion.stanford.edu/messages/error_code.html#error_code_356

Signal 6 received by node 0, process 208786 (thread 7fe46cbbf700) - obtaining backtrace
Signal 6 received by process 208786 (thread 7fe46cbbf700) at: stack trace: 19 frames
  [0] = /lib/x86_64-linux-gnu/libc.so.6(+0x3f040) [0x7fe47752d040]
  [1] = /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7) [0x7fe47752cfb7]
  [2] = /lib/x86_64-linux-gnu/libc.so.6(abort+0x141) [0x7fe47752e921]
  [3] = /home/u00u92m5kmdwylez30357/.local/anaconda/envs/torchswe-legate/lib/liblegion.so(+0x774ff6) [0x7fe479744ff6]
  [4] = /home/u00u92m5kmdwylez30357/.local/anaconda/envs/torchswe-legate/lib/liblegion.so(Legion::Internal::IndividualTask::report_interfering_requirements(unsigned int, unsigned int)+0xc8) [0x7fe4794927c8]
  [5] = /home/u00u92m5kmdwylez30357/.local/anaconda/envs/torchswe-legate/lib/liblegion.so(Legion::Internal::TaskOp::perform_intra_task_alias_analysis(bool, Legion::Internal::LegionTrace*, std::vector<Legion::Internal::RegionTreePath, std::allocator<Legion::Internal::RegionTreePath> >&)+0x35c) [0x7fe4794b21dc]
  [6] = /home/u00u92m5kmdwylez30357/.local/anaconda/envs/torchswe-legate/lib/liblegion.so(Legion::Internal::IndividualTask::trigger_prepipeline_stage()+0x3bf) [0x7fe4794c381f]
  [7] = /home/u00u92m5kmdwylez30357/.local/anaconda/envs/torchswe-legate/lib/liblegion.so(Legion::Internal::Operation::execute_prepipeline_stage(unsigned int, bool)+0x266) [0x7fe479418b96]
  [8] = /home/u00u92m5kmdwylez30357/.local/anaconda/envs/torchswe-legate/lib/liblegion.so(Legion::Internal::Operation::execute_dependence_analysis()+0x73) [0x7fe47944c893]
  [9] = /home/u00u92m5kmdwylez30357/.local/anaconda/envs/torchswe-legate/lib/liblegion.so(Legion::Internal::SpeculativeOp::execute_dependence_analysis()+0x1d5) [0x7fe4794611a5]
  [10] = /home/u00u92m5kmdwylez30357/.local/anaconda/envs/torchswe-legate/lib/liblegion.so(Legion::Internal::MemoizableOp<Legion::Internal::SpeculativeOp>::execute_dependence_analysis()+0xec) [0x7fe47948952c]
  [11] = /home/u00u92m5kmdwylez30357/.local/anaconda/envs/torchswe-legate/lib/liblegion.so(Legion::Internal::InnerContext::process_dependence_stage()+0x18c) [0x7fe4794f040c]
  [12] = /home/u00u92m5kmdwylez30357/.local/anaconda/envs/torchswe-legate/lib/liblegion.so(Legion::Internal::Runtime::legion_runtime_task(void const*, unsigned long, void const*, unsigned long, Realm::Processor)+0x5ea) [0x7fe4797b0f8a]
  [13] = /home/u00u92m5kmdwylez30357/.local/anaconda/envs/torchswe-legate/lib/librealm.so(+0x4323e1) [0x7fe47811d3e1]
  [14] = /home/u00u92m5kmdwylez30357/.local/anaconda/envs/torchswe-legate/lib/librealm.so(+0x2ada93) [0x7fe477f98a93]
  [15] = /home/u00u92m5kmdwylez30357/.local/anaconda/envs/torchswe-legate/lib/librealm.so(+0x2adc96) [0x7fe477f98c96]
  [16] = /home/u00u92m5kmdwylez30357/.local/anaconda/envs/torchswe-legate/lib/librealm.so(+0x2b0833) [0x7fe477f9b833]
  [17] = /home/u00u92m5kmdwylez30357/.local/anaconda/envs/torchswe-legate/lib/librealm.so(+0x294e92) [0x7fe477f7fe92]
  [18] = /lib/x86_64-linux-gnu/libc.so.6(+0x587b0) [0x7fe4775467b0]

These are errors from Legion only; there is no error or traceback from Python. I ran with the -lg:numpy:test flag.

However, if a is initialized directly using np.random.random((7, 7)), it works fine.
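The difference between the two cases is visible in plain NumPy (assuming legate.numpy mirrors these view semantics): the reshaped array is a 2d view over a 1d base buffer, so its slices alias sub-ranges of that base region, while a fresh allocation has no base at all.

```python
import numpy as np

a1 = np.arange(49).reshape((7, 7))   # 2d view over a 1d base array
a2 = np.random.random((7, 7))        # fresh 2d allocation, no base

# Slices of a1 are ultimately sub-ranges of the original 1d buffer,
# which is what triggers the base-region aliasing; a2 has no base.
assert a1.base is not None
assert a2.base is None
```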

@manopapad
Contributor Author

Thank you for finding this. It turns out the simple overlap check above actually captures the behavior of the partitioning code in most cases (the partitioning code also takes the convex hull of the boundary points on the base region), so I reinstated it. Now instead of the Legion error you should get a NotImplementedError: copies between overlapping sub-arrays. The error message is not the whole story, since the overlap is not coming from the actual slice expressions (obviously a[2:-2, 0] and a[2:-2, 2] do not overlap), but from the way Legion implements 2d views over 1d base arrays.

Closing this issue, since the overlap check is already good enough. #40 tracks work on actually supporting overlapping array copies. @piyueh: so we know what to prioritize, is this feature needed in your actual application?
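For reference, an exact overlap test for unit-step rectangular views of the same base is straightforward: two rectangles intersect iff their index ranges intersect in every dimension. This is a hypothetical sketch (the `views_overlap` and `ranges_intersect` helpers are illustration only, and slice steps would need an additional stride check), not the legate implementation:

```python
def ranges_intersect(r1, r2):
    # r = (start, stop) half-open interval with unit step
    return max(r1[0], r2[0]) < min(r1[1], r2[1])

def views_overlap(v1, v2):
    # v = per-dimension (start, stop) pairs on the SAME base array;
    # rectangles overlap iff every dimension's ranges intersect.
    return all(ranges_intersect(r1, r2) for r1, r2 in zip(v1, v2))

# b[3:5, 0:2] vs b[3:5, 2:4]: rows intersect, columns do not
assert not views_overlap([(3, 5), (0, 2)], [(3, 5), (2, 4)])
# a[2:-2, 0] vs a[2:-2, 2] on a 7x7 array: columns 0 and 2 are disjoint
assert not views_overlap([(2, 5), (0, 1)], [(2, 5), (2, 3)])
# a genuinely overlapping pair
assert views_overlap([(3, 5), (0, 2)], [(4, 6), (1, 3)])
```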

@piyueh

piyueh commented Jun 25, 2021

@manopapad Currently, I haven't encountered this issue in my application.

fduguet-nv pushed a commit to fduguet-nv/cunumeric that referenced this issue Mar 29, 2022
Suppress some -Wswitch warnings
manopapad added a commit to manopapad/cunumeric that referenced this issue Feb 14, 2024