Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Distributed] Partition MovieLens dataset #8815

Merged
merged 2 commits into from
Jan 25, 2024

Conversation

kgajdamo
Copy link
Contributor

@kgajdamo kgajdamo commented Jan 24, 2024

Changes made:

  • added MovieLens dataset to the partitioning script (partition_graph.py)
  • edge time can be defined independently of edge_attrs

Copy link

codecov bot commented Jan 24, 2024

Codecov Report

Attention: 4 lines in your changes are missing coverage. Please review.

Comparison is base (fb31db1) 89.87% compared to head (2cf2cd5) 89.39%.

❗ Current head 2cf2cd5 differs from pull request most recent head 6a5cacc. Consider uploading reports for the commit 6a5cacc to get more accurate results

Files Patch % Lines
torch_geometric/distributed/partition.py 76.47% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8815      +/-   ##
==========================================
- Coverage   89.87%   89.39%   -0.49%     
==========================================
  Files         479      479              
  Lines       31136    31145       +9     
==========================================
- Hits        27984    27842     -142     
- Misses       3152     3303     +151     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@rusty1s rusty1s changed the title Partition MovieLens dataset [Distributed] Partition MovieLens dataset Jan 25, 2024
Comment on lines +134 to +140
train_data, val_data, test_data = T.RandomLinkSplit(
num_val=0.1,
num_test=0.1,
neg_sampling_ratio=0.0,
edge_types=[edge_type],
rev_edge_types=[('movie', 'rev_rates', 'user')],
)(dataset[0])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we drop the adjustment of message passing here computed in train_data, val_data and test_data. This is not necessarily a blocker (since I don't necessarily see a good way to fix this for now), but something you should be aware of. Currently, you would like information.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I didn't get that. What do you mean by "we drop the adjustment of message passing here computed in train_data, val_data and test_data"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The edge_index is different across different splits for link prediction tasks (in order to not leak information).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. What about the solution used in the temporal_link_pred example?: temporal_link_pred.py #L27 Maybe we can use that one instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, using temporal sampling is usually a good option to resolve this.

@rusty1s rusty1s merged commit 2ab0351 into pyg-team:master Jan 25, 2024
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants