Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PropertyGraph set index to vertex and edge ids #2523

Merged
merged 14 commits into from
Sep 30, 2022

Conversation

eriknw
Copy link
Contributor

@eriknw eriknw commented Aug 9, 2022

Currently, this only does SG version for #2401. MG is still TODO.

Closes #2401

This also doesn't change anything user-facing (yet).

Currently, this only does SG version for rapidsai#2401.  MG is still TODO.

This also doesn't change anything user-facing (yet).
@eriknw eriknw requested a review from a team as a code owner August 9, 2022 23:16
@codecov-commenter
Copy link

codecov-commenter commented Aug 10, 2022

Codecov Report

❗ No coverage uploaded for pull request base (branch-22.10@238f909). Click here to learn what that means.
Patch has no changes to coverable lines.

❗ Current head 6a365ec differs from pull request most recent head 6a7ff4e. Consider uploading reports for the commit 6a7ff4e to get more accurate results

Additional details and impacted files
@@               Coverage Diff               @@
##             branch-22.10    #2523   +/-   ##
===============================================
  Coverage                ?   60.13%           
===============================================
  Files                   ?      111           
  Lines                   ?     6184           
  Branches                ?        0           
===============================================
  Hits                    ?     3719           
  Misses                  ?     2465           
  Partials                ?        0           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@alexbarghi-nv
Copy link
Member

@eriknw is this ready for review?

@eriknw
Copy link
Contributor Author

eriknw commented Aug 18, 2022

Yeah, for SG. I'm going to put MG into a different WIP branch, because of issues I'm having with dask_cudf.

I still want to add one thing: the ability to add vertex data that already has the vertex id as the index.

@alexbarghi-nv
Copy link
Member

alexbarghi-nv commented Aug 18, 2022

ok, I'll wait until you've made that addition, I wasn't sure if that was going into this branch or not

Copy link
Contributor

@rlratzel rlratzel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I had just one optional minor question/suggestion which need not hold up approval.

Comment on lines +448 to +452
if self.__series_type is cudf.Series:
latest = {n: self.__vertex_prop_dataframe[n]
for n in self.__vertex_prop_dataframe.columns}
else:
latest = self.__vertex_prop_dataframe.to_dict('series')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious how much faster/more efficient it is to create latest by separately using .to_dict() only for pandas vs. just having one way that works with both. Granted it's not many more lines of code to support both, but still might be worth only doing it one way if the speedup is negligible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubt that it's any faster, but I aspire to write canonical pandas, and, who knows, maybe pandas can do something smarter and/or maybe cudf will add this API in the future.

@BradReesWork
Copy link
Member

rerun tests

@BradReesWork
Copy link
Member

rerun tests

1 similar comment
@alexbarghi-nv
Copy link
Member

rerun tests

@BradReesWork
Copy link
Member

rerun tests

Copy link
Member

@VibhuJawa VibhuJawa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a check that we fetch ids in correct order and behave correctly with repeated ids ? (for both PropertyGraph and MGProperty Graph and for both get_edge_data and get_node_data).

See example:

import cudf
import cugraph
from cugraph.experimental import PropertyGraph
import numpy as np

src_ser = cudf.Series([1, 1, 1, 2])
dst_ser = cudf.Series([2, 3, 4, 1])
df = cudf.DataFrame(
    {"src": src_ser, "dst": dst_ser, "edge_feat": np.arange(len(src_ser))}
)
pg = PropertyGraph()
pg.add_edge_data(df, vertex_col_names=['src', 'dst'])

Current Main Behavior:

pg.get_edge_data(edge_ids = [2,1,3,1], columns=['edge_feat'])

_SRC_	_DST_	_EDGE_ID_	_TYPE_	edge_feat
1	1	3	1		1
2	1	4	2		2
3	2	1	3		3

Expected behavior on this PR.

_SRC_	_DST_	_EDGE_ID_	_TYPE_	edge_feat
0	1	4	2		2
1	1	3	1		1
2	2	1	3		3
3	1	3	1		1

@BradReesWork
Copy link
Member

rerun tests

rapids-bot bot pushed a commit that referenced this pull request Sep 30, 2022
This PR fixes rapidsai/graph_dl#27

This PR fixes rapidsai/graph_dl#43

This PR fixes rapidsai/graph_dl#39



**Tests Added:** 

Single GPU:
- [x]  APIs like num_nodes, num_edges
- [x]  test_sampling_basic
- [x]  test_sampling_homogeneous_gs_in_dir
- [x]  test_sampling_homogeneous_gs_out_dir
- [x]  test_sampling_gs_homogeneous_neg_one_fanout
- [x]  test_sampling_gs_heterogeneous_in_dir 
- [x]  test_sampling_gs_heterogeneous_out_dir
- [x]  test_sampling_gs_heterogeneous_neg_one_fanout 



Multi GPU:
- [x] APIs like num_nodes, num_edges
- [x] test_sampling_basic
- [x]  test_sampling_homogeneous_gs_in_dir
- [x]  test_sampling_homogeneous_gs_out_dir 
- [x] test_sampling_gs_homogeneous_neg_one_fanout 
- [x] test_sampling_gs_heterogeneous_in_dir 
- [x] test_sampling_gs_heterogeneous_out_dir
- [x] test_sampling_gs_heterogeneous_neg_one_fanout 


Bugs to reproduce:
- [ ] Repro heterogeneous single gpu hang outside pytest
- [x] Repro hetrogenous multi gpu incorrect results for with_replace=False 
#2760
- [x] Repro hetrogenous incorrect results for different amount of GPUs
#2761

Tests that depend upon #2523 
- [x] Add minimal example to PR to ensure it gets fixed
Added comment here: #2523 (review)
- [x]  test_get_node_storage_gs  (Failing cause of a PG bug) 
- [x]  test_get_edge_storage_gs  (Failing cause of a PG bug) 
- [x] test_get_node_storage_gs (Failing cause of a PG bug) 
- [x] test_get_edge_storage_gs (Failing cause of a PG bug)

Authors:
  - Vibhu Jawa (https://github.com/VibhuJawa)

Approvers:
  - Rick Ratzel (https://github.com/rlratzel)
  - Brad Rees (https://github.com/BradReesWork)

URL: #2592
@BradReesWork
Copy link
Member

@gpucibot merge

@rapids-bot rapids-bot bot merged commit d51b1f9 into rapidsai:branch-22.10 Sep 30, 2022
rapids-bot bot pushed a commit that referenced this pull request Oct 3, 2022
rapids-bot bot pushed a commit that referenced this pull request Oct 11, 2022
This PR fixes the below errors that have popped up in MNMG testing. 
- [x] fix_out of index keys on MNMG graphs
- [x] fix loc/get_node_storage error  on MNMG graphs 
(Work around  rapidsai/cudf#11877)
- [x] Clear Cached Properties when they become invalid
- [x] Remove  6 pytest skipping as both these PRs have landed 
- #2751
- #2523
- [x] Change `vertex_col_names` to  `node_col_names`  to match DGL 
- [x] Ensure MNMG tests pass 
- [x] Work around the PG bug and also prevent redundant conversion to lists

Authors:
  - Vibhu Jawa (https://github.com/VibhuJawa)

Approvers:
  - Rick Ratzel (https://github.com/rlratzel)
  - Erik Welch (https://github.com/eriknw)

URL: #2786
@VibhuJawa VibhuJawa mentioned this pull request Nov 7, 2022
18 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
Projects
None yet
6 participants