
Address distributed non-ordered indexing #914

Open
ClaudiaComito opened this issue Feb 8, 2022 · 2 comments · May be fixed by #938
Assignees: ClaudiaComito
Labels: bug (Something isn't working) · enhancement (New feature or request) · High priority, urgent · indexing · MPI (Anything related to MPI communication) · redistribution (Related to distributed tensors)
Milestone: 1.5.0

Comments

ClaudiaComito (Contributor) commented Feb 8, 2022
Related

All kinds of things depend on distributed non-ordered indexing. Here is a sample of issues/PRs where the problem has come up in various forms:

#607 #760 #903 #703 #824 #902 #177 #621 #749 #271 #857

Feature functionality

We want to be able to index a distributed DNDarray with a distributed, non-ordered key and return the correct, stable result; see the examples below. An implementation of this functionality via Alltoallv is available in ht.sort() and needs to be generalized; a rough sketch of that generalization follows the examples.

>>> a = ht.arange(50, split=0)
>>> a
DNDarray([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26,
          27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49], dtype=ht.int32, device=cpu:0, split=0)
>>> b = ht.random.randint(0,50,(20,), dtype=ht.int64, split=0)
>>> b
DNDarray([46,  5, 44,  8, 14, 10, 30, 15, 34, 30, 41, 44, 28, 26, 11, 20, 16,  7,  9,  8], dtype=ht.int64, device=cpu:0, split=0)
>>> c = a[b]
>>> c
DNDarray([46,  5, 44,  8, 14, 10, 30, 15, 34, 30, 41, 44, 28, 26, 11, 20, 16,  7,  9,  8], dtype=ht.int32, device=cpu:0, split=0)

In the current implementation, c = a[b] returns a distributed DNDarray populated only by whichever subset of the key happens to be process-local, so the order of the result depends on the number of processes.

On 2 processes:

c =  DNDarray([ 5,  8, 14, 10, 15, 11, 20, 16,  7,  9,  8, 46, 44, 30, 34, 30, 41, 44, 28, 26], dtype=ht.int32, device=cpu:0, split=0)

On 3 processes:

c =  DNDarray([ 5,  8, 14, 10, 15, 11, 16,  7,  9,  8, 30, 30, 28, 26, 20, 46, 44, 34, 41, 44], dtype=ht.int32, device=cpu:0, split=0)
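For reference, here is a minimal, self-contained sketch of the desired semantics, written with mpi4py and NumPy directly rather than Heat's internals, and covering only the 1-D, split=0 case shown above. It follows the Alltoallv pattern mentioned for ht.sort(): each rank determines which ranks own its requested global indices, exchanges the indices via Alltoallv, looks up the owned values locally, and returns them with a second Alltoallv before restoring the original key order. The chunking scheme, the seed, and names such as indexing_sketch.py are illustrative assumptions, not Heat's actual implementation.

# Minimal sketch (NOT Heat's implementation): gather a[b] when both the data a
# and the key b are block-distributed along axis 0, using two Alltoallv rounds.
# Run with e.g.: mpirun -n 3 python indexing_sketch.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, nprocs = comm.rank, comm.size

# Block-distributed data a = arange(50) and a reproducible non-ordered key b.
n = 50
counts = np.full(nprocs, n // nprocs)
counts[: n % nprocs] += 1                          # local chunk sizes
offsets = np.zeros(nprocs, dtype=np.int64)
offsets[1:] = np.cumsum(counts)[:-1]               # global start index per rank
a_local = np.arange(offsets[rank], offsets[rank] + counts[rank])

rng = np.random.default_rng(42)                    # same key on every rank...
b_global = rng.integers(0, n, size=20)
b_local = np.array_split(b_global, nprocs)[rank]   # ...then keep the local slice

# Round 1: send each requested global index to the rank that owns it.
owner = np.searchsorted(offsets, b_local, side="right") - 1
order = np.argsort(owner, kind="stable")           # group requests by owner rank
send_idx = b_local[order].astype(np.int64)
send_counts = np.bincount(owner, minlength=nprocs).astype(np.int64)
recv_counts = np.empty(nprocs, dtype=np.int64)
comm.Alltoall(send_counts, recv_counts)

send_displs = np.insert(np.cumsum(send_counts), 0, 0)[:-1]
recv_displs = np.insert(np.cumsum(recv_counts), 0, 0)[:-1]
recv_idx = np.empty(recv_counts.sum(), dtype=np.int64)
comm.Alltoallv([send_idx, (send_counts, send_displs)],
               [recv_idx, (recv_counts, recv_displs)])

# Round 2: look up the requested values locally and send them back.
found = a_local[recv_idx - offsets[rank]].astype(np.int64)
got = np.empty(len(b_local), dtype=np.int64)
comm.Alltoallv([found, (recv_counts, recv_displs)],
               [got, (send_counts, send_displs)])

# Undo the grouping so the result matches the original (non-ordered) key order.
c_local = np.empty_like(got)
c_local[order] = got
print(rank, c_local)                                # local chunk of c = a[b]

A generalized implementation inside DNDarray.__getitem__ would additionally need to handle negative indices, multi-dimensional and boolean keys, and split axes other than 0; the sketch only covers the case shown in the examples.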
ClaudiaComito self-assigned this Feb 8, 2022
ClaudiaComito added the High priority, urgent, bug, enhancement, help wanted, MPI, and redistribution labels Feb 8, 2022
ClaudiaComito mentioned this issue Feb 9, 2022
This was referenced Apr 1, 2022
ClaudiaComito linked a pull request Dec 27, 2022 that will close this issue
ClaudiaComito mentioned this issue Jul 31, 2023
ClaudiaComito added this to the Repo Clean-Up milestone Jul 31, 2023
mrfh92 (Collaborator) commented Aug 17, 2023

Still relevant: #938 is a currently active PR that will resolve this issue.

Reviewed within #1109.

ClaudiaComito modified the milestones: Repo Clean-Up, 1.4.0 Aug 21, 2023
ClaudiaComito removed the help wanted label Aug 21, 2023

github-actions bot commented Oct 9, 2023

ClaudiaComito modified the milestones: 1.4.0, 1.5.0 Apr 12, 2024