Update PLC, C, (and C++?) APIs to better accommodate python MG graph creation use cases #3947

Closed
rlratzel opened this issue Oct 20, 2023 · 1 comment · Fixed by #3982
Labels
improvement Improvement / enhancement to an existing function

Comments


rlratzel commented Oct 20, 2023

There are three separate requests here. I can break these up into three separate issues if that's better:

Do not require num_edges

The python/dask code currently has to compute/persist the dask_cudf dataframe in order to compute the total number of edges the PLC API requires for creating a graph. This should not be necessary since each array already has the size known internally. Not requiring num_edges would eliminate the need to do the somewhat expensive dask compute/persist call.

NOTE: adding this feature is related to the fix done for this issue
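As a rough illustration of the idea (not the actual PLC/C implementation), the total edge count can be derived from sizes each worker already knows. The sketch below simulates two ranks in plain Python; `local_edge_count` and `allreduce_sum` are hypothetical helper names, and the "allreduce" is only a stand-in for a real collective across workers.

```python
# Hypothetical sketch: these names are illustrative, not part of the PLC/C API.
from typing import List, Sequence


def local_edge_count(src_arrays: Sequence[Sequence[int]]) -> int:
    """Sum the lengths of the edge arrays held by one worker (rank)."""
    return sum(len(a) for a in src_arrays)


def allreduce_sum(per_rank_values: List[int]) -> List[int]:
    """Simulate a sum-allreduce: every rank ends up with the global total."""
    total = sum(per_rank_values)
    return [total] * len(per_rank_values)


# Two simulated ranks: rank 0 holds two partitions, rank 1 holds one.
rank_src = [
    [[0, 1, 2], [3, 4]],  # rank 0
    [[5, 6, 7, 8]],       # rank 1
]

local_counts = [local_edge_count(arrays) for arrays in rank_src]
global_counts = allreduce_sum(local_counts)
print(local_counts, global_counts)
```

Since only the array lengths are needed, nothing in this path requires materializing the full dataframe on the python side.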

Allow multiple src/dst arrays

In order to ease the burden on the cugraph dask/python code for handling dask_cudf input with multiple partitions per worker, the PLC, C, and possibly C++ APIs could accept multiple src and dst vertex arrays. This would allow the dask/python layer to pass the src/dst arrays from each partition as-is, instead of combining each partition's arrays in python in order to pass them as a single src and single dst array to PLC/C.
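A minimal sketch of what accepting multiple arrays could look like, using plain Python lists in place of device arrays. `create_edgelist` is a hypothetical name chosen for illustration, and the concatenation here stands in for whatever the lower layers would do internally.

```python
# Hypothetical sketch: plain lists stand in for per-partition device arrays.
from itertools import chain
from typing import List, Sequence, Tuple


def create_edgelist(
    src_arrays: Sequence[Sequence[int]],
    dst_arrays: Sequence[Sequence[int]],
) -> Tuple[List[int], List[int]]:
    """Accept one src/dst array pair per partition and concatenate internally."""
    if len(src_arrays) != len(dst_arrays):
        raise ValueError("src and dst must have the same number of arrays")
    for s, d in zip(src_arrays, dst_arrays):
        if len(s) != len(d):
            raise ValueError("each src array must match its dst array in length")
    src = list(chain.from_iterable(src_arrays))
    dst = list(chain.from_iterable(dst_arrays))
    return src, dst


# The caller passes each partition's arrays as-is, with no python-side combining.
src, dst = create_edgelist([[0, 1], [2]], [[1, 2], [0]])
print(src, dst)
```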

Implement a "move" option to transfer array ownership

Another improvement could be to allow the PLC/C/C++ layers to own the src/dst arrays currently maintained in python. This would allow PLC/C/C++ to modify and delete the incoming arrays as needed, instead of requiring a copy step to preserve the arrays owned by the user/python layer. This could be a new option, possibly called "move", which would default to False (the current behavior).
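To make the intended semantics concrete, here is a small sketch of copy versus "move" behavior. `EdgeStore` and `renumber` are hypothetical names, and Python lists stand in for the real arrays; the only point is that with `move=True` the lower layer may modify the caller's data in place.

```python
# Hypothetical sketch: EdgeStore is an illustrative stand-in for the lower layers.
from typing import List


class EdgeStore:
    def __init__(self, src: List[int], dst: List[int], move: bool = False):
        if move:
            # Take ownership: keep the caller's objects and modify them freely.
            # The caller must not rely on their contents afterwards.
            self.src, self.dst = src, dst
        else:
            # Current behavior: copy so the caller's arrays stay untouched.
            self.src, self.dst = list(src), list(dst)

    def renumber(self, offset: int) -> None:
        """Example of an in-place modification the lower layer may perform."""
        for i in range(len(self.src)):
            self.src[i] += offset
            self.dst[i] += offset


user_src, user_dst = [0, 1], [1, 2]

copied = EdgeStore(user_src, user_dst)             # move defaults to False
copied.renumber(10)
src_after_copy = list(user_src)                    # caller's data is preserved

moved = EdgeStore(user_src, user_dst, move=True)   # ownership handed over
moved.renumber(10)
src_after_move = list(user_src)                    # caller's data was modified
print(src_after_copy, src_after_move)
```

Defaulting `move` to False keeps existing callers safe, while callers that no longer need their arrays can opt in and skip the copy.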

@rlratzel rlratzel added the improvement Improvement / enhancement to an existing function label Oct 20, 2023
@rlratzel
Contributor Author

cc @VibhuJawa @jnke2016

rapids-bot bot pushed a commit that referenced this issue Nov 21, 2023
Updating the C API graph creation functions to support the following:
* Add support for isolated vertices
* Add MG optimization to support multiple device arrays per rank as input and concatenate them internally
* Add MG optimization to internally compute the number of edges via allreduce rather than requiring it as an input parameter (this can be expensive to compute in python)

This PR implements these features. Some simple tests exist to check for isolated vertices (by running pagerank, which generates a different result if the graph has isolated vertices). A simple test for multiple input arrays exists for the MG case.
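The reason pagerank works as a check for isolated vertices can be seen with a small pure-Python power iteration. This is not cuGraph's implementation; it is a textbook variant that, as an assumption, spreads the rank of vertices without out-edges uniformly over all vertices.

```python
# Illustrative sketch only: a simple power-iteration PageRank, not cuGraph's.
from typing import List, Tuple


def pagerank(
    num_vertices: int,
    edges: List[Tuple[int, int]],
    alpha: float = 0.85,
    iters: int = 100,
) -> List[float]:
    out_deg = [0] * num_vertices
    for s, _ in edges:
        out_deg[s] += 1
    rank = [1.0 / num_vertices] * num_vertices
    for _ in range(iters):
        # Rank held by vertices with no out-edges is spread uniformly.
        dangling = sum(rank[v] for v in range(num_vertices) if out_deg[v] == 0)
        base = (1.0 - alpha) / num_vertices + alpha * dangling / num_vertices
        new_rank = [base] * num_vertices
        for s, d in edges:
            new_rank[d] += alpha * rank[s] / out_deg[s]
        rank = new_rank
    return rank


edges = [(0, 1), (1, 2), (2, 0)]
without_isolated = pagerank(3, edges)
with_isolated = pagerank(4, edges)  # vertex 3 exists but has no edges
print(without_isolated)
print(with_isolated)
```

The edge list is identical in both calls, so the scores of vertices 0 to 2 only differ if the graph actually knows about vertex 3.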

Closes #3947 
Closes #3974

Authors:
  - Chuck Hastings (https://github.com/ChuckHastings)
  - Naim (https://github.com/naimnv)

Approvers:
  - Naim (https://github.com/naimnv)
  - Joseph Nke (https://github.com/jnke2016)
  - Seunghwa Kang (https://github.com/seunghwak)

URL: #3982