Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Renumber PG to be contiguous per type #2697

Merged
merged 2 commits into from
Sep 21, 2022

Conversation

eriknw
Copy link
Contributor

@eriknw eriknw commented Sep 14, 2022

Closes #2627 and CC @VibhuJawa

Currently only SG as we hammer out the API and behavior.

This returns a dataframe with start and stop for each type. Should stop be inclusive or exclusive?

How should we handle vertex ids that only exist in edge data? Should we raise (for now) if this condition exists? I think we can handle this without too much difficulty, but it will take more work.

Since we number edge data, I think edge IDs will often be added in a way that is already contiguous per type. We could keep track of this to avoid unnecessary computation.

Also, I want to confirm that we cannot have multiple rows for a single vertex ID, right? I think we settled on this. Multiple rows with the same ID would cause a problem with the current implementation--it currently gives each row a unique ID.

See rapidsai#2627

Currently only SG as we hammer out the API and behavior.
@eriknw eriknw requested a review from a team as a code owner September 14, 2022 04:51
@eriknw eriknw marked this pull request as draft September 14, 2022 05:08
@codecov-commenter
Copy link

codecov-commenter commented Sep 14, 2022

Codecov Report

Base: 60.11% // Head: 60.07% // Decreases project coverage by -0.03% ⚠️

Coverage data is based on head (e17b130) compared to base (4657c68).
Patch coverage: 50.81% of modified lines in pull request are covered.

Additional details and impacted files
@@               Coverage Diff                @@
##           branch-22.10    #2697      +/-   ##
================================================
- Coverage         60.11%   60.07%   -0.04%     
================================================
  Files               112      111       -1     
  Lines              6150     6184      +34     
================================================
+ Hits               3697     3715      +18     
- Misses             2453     2469      +16     
Impacted Files Coverage Δ
...ugraph/cugraph/dask/structure/mg_property_graph.py 15.10% <6.25%> (-0.81%) ⬇️
python/cugraph/cugraph/structure/property_graph.py 97.36% <100.00%> (+0.19%) ⬆️
python/cugraph/cugraph/community/triangle_count.py 88.88% <0.00%> (-11.12%) ⬇️
python/pylibcugraph/pylibcugraph/__init__.py 100.00% <0.00%> (ø)
python/cugraph/cugraph/experimental/__init__.py 100.00% <0.00%> (ø)
...pylibcugraph/pylibcugraph/experimental/__init__.py 100.00% <0.00%> (ø)
...h/cugraph/experimental/community/triangle_count.py
...n/cugraph/cugraph/dask/community/triangle_count.py 18.51% <0.00%> (+0.97%) ⬆️
python/cugraph/cugraph/utilities/api_tools.py 100.00% <0.00%> (+12.50%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@VibhuJawa
Copy link
Member

VibhuJawa commented Sep 15, 2022

This returns a dataframe with start and stop for each type. Should stop be inclusive or exclusive?

Either is fine. I prefer inclusive.

How should we handle vertex ids that only exist in edge data? Should we raise (for now) if this condition exists? I think we can handle this without too much difficulty, but it will take more work.

I think we dont need it for DGL usecases .

I want to confirm that we cannot have multiple rows for a single vertex ID, right? I think we settled on this. Multiple rows with the same ID would cause a problem with the current implementation--it currently gives each row a unique ID.

Yup. One vertex id for now for our storage.

@eriknw eriknw marked this pull request as ready for review September 19, 2022 22:43
@eriknw
Copy link
Contributor Author

eriknw commented Sep 20, 2022

Great, thanks @VibhuJawa. This is ready for review.

stop is now inclusive. Renumbering vertex IDs raises if there are vertex IDs that only exist in edge data.

@rlratzel rlratzel added this to the 22.10 milestone Sep 20, 2022
Copy link
Member

@VibhuJawa VibhuJawa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@alexbarghi-nv alexbarghi-nv added non-breaking Non-breaking change feature request New feature or request labels Sep 21, 2022
Copy link
Member

@alexbarghi-nv alexbarghi-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@rlratzel rlratzel added improvement Improvement / enhancement to an existing function and removed feature request New feature or request labels Sep 21, 2022
@rlratzel
Copy link
Contributor

@gpucibot merge

@rapids-bot rapids-bot bot merged commit ae69a29 into rapidsai:branch-22.10 Sep 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Renumber PropertyGraphs to be contiguous per type basis
5 participants