[`Distributed`] Partition `MovieLens` dataset #8815

kgajdamo · 2024-01-24T10:25:52Z

Changes made:

- Added the `MovieLens` dataset to the partitioning script (`partition_graph.py`).
- Edge time can now be defined independently of `edge_attrs`.
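
To illustrate the second point, here is a minimal pure-Python sketch of carrying per-edge timestamps through a partitioning step whether or not edge attributes are present. It is not the code in `partition.py`: the function name `partition_edges`, the dictionary layout, and the rule of assigning an edge to the partition of its destination node are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[int, int]  # (source node, destination node)


def partition_edges(
    edges: List[Edge],
    node_to_part: Dict[int, int],
    num_parts: int,
    edge_time: Optional[List[int]] = None,
    edge_attr: Optional[List[List[float]]] = None,
) -> List[dict]:
    """Hypothetical helper: group edges by the partition of their destination.

    `edge_time` and `edge_attr` are handled independently, so either one may
    be None without affecting the other.
    """
    parts: List[dict] = [{"edges": [], "edge_id": []} for _ in range(num_parts)]
    for part in parts:
        if edge_time is not None:
            part["edge_time"] = []
        if edge_attr is not None:
            part["edge_attr"] = []

    for eid, (src, dst) in enumerate(edges):
        part = parts[node_to_part[dst]]  # assumption: edge follows its destination
        part["edges"].append((src, dst))
        part["edge_id"].append(eid)
        if edge_time is not None:
            part["edge_time"].append(edge_time[eid])
        if edge_attr is not None:
            part["edge_attr"].append(edge_attr[eid])
    return parts


# Timestamps only, no edge attributes.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
node_to_part = {0: 0, 1: 0, 2: 1, 3: 1}
parts = partition_edges(edges, node_to_part, num_parts=2,
                        edge_time=[10, 20, 30, 40])
```

In this sketch each partition keeps an `edge_time` list aligned with its `edge_id` list, and no `edge_attr` entry is created when none was supplied.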

codecov · 2024-01-24T11:28:20Z

Codecov Report

Attention: 4 lines in your changes are missing coverage. Please review.

Comparison is base (fb31db1) 89.87% compared to head (2cf2cd5) 89.39%.

❗ Current head 2cf2cd5 differs from pull request most recent head 6a5cacc. Consider uploading reports for the commit 6a5cacc to get more accurate results.

| Files | Patch % | Lines |
|---|---|---|
| torch_geometric/distributed/partition.py | 76.47% | 4 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #8815      +/-   ##
==========================================
- Coverage   89.87%   89.39%   -0.49%     
==========================================
  Files         479      479              
  Lines       31136    31145       +9     
==========================================
- Hits        27984    27842     -142     
- Misses       3152     3303     +151


rusty1s · 2024-01-25T08:20:03Z

examples/distributed/pyg/partition_graph.py

+        train_data, val_data, test_data = T.RandomLinkSplit(
+            num_val=0.1,
+            num_test=0.1,
+            neg_sampling_ratio=0.0,
+            edge_types=[edge_type],
+            rev_edge_types=[('movie', 'rev_rates', 'user')],
+        )(dataset[0])


Looks like we drop the adjustment of message passing here computed in train_data, val_data and test_data. This is not necessarily a blocker (since I don't see a good way to fix this for now), but something you should be aware of. Currently, you would leak information.

kgajdamo · 2024-01-25

Sorry, I didn't get that. What do you mean by "we drop the adjustment of message passing here computed in train_data, val_data and test_data"?

rusty1s · 2024-01-25

The `edge_index` differs across splits for link prediction tasks (so as not to leak information).
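
A minimal pure-Python sketch of that idea, assuming a random split in which validation predicts from the training edges and testing predicts from the training plus validation edges. The function `link_split` and its dictionary layout are hypothetical and only mirror the concept; this is not the `RandomLinkSplit` implementation.

```python
import random
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def link_split(
    edges: List[Edge],
    num_val: float = 0.1,
    num_test: float = 0.1,
    seed: int = 0,
) -> Dict[str, Dict[str, List[Edge]]]:
    """Hypothetical helper: split edges and pick message-passing edges per split."""
    rng = random.Random(seed)
    perm = edges[:]
    rng.shuffle(perm)

    n_val = int(num_val * len(perm))
    n_test = int(num_test * len(perm))
    n_train = len(perm) - n_val - n_test

    train = perm[:n_train]
    val = perm[n_train:n_train + n_val]
    test = perm[n_train + n_val:]

    return {
        # Edges to predict ("supervision") never appear among the edges the
        # model may propagate over ("message_passing") in val and test.
        "train": {"message_passing": train, "supervision": train},
        "val": {"message_passing": train, "supervision": val},
        "test": {"message_passing": train + val, "supervision": test},
    }


edges = [(u, u + 100) for u in range(20)]
splits = link_split(edges)
```

Each split therefore has its own message-passing edge set. Partitioning only `dataset[0]` keeps a single edge set for all three splits, which is where held-out edges can become visible during message passing.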

kgajdamo · 2024-01-26

I see. What about the solution used in the `temporal_link_pred` example (temporal_link_pred.py #L27)? Maybe we can use that one instead?

rusty1s · 2024-01-29

Yeah, using temporal sampling is usually a good option to resolve this.
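
The intuition behind temporal sampling can be sketched in a few lines of plain Python: when predicting a link at a given seed time, only edges with a timestamp no later than that time are eligible as neighbors, so a single shared graph can serve all splits. The helper `temporal_neighbors` and the `(src, dst, time)` tuple layout are assumptions for illustration, not the sampler used in the example script.

```python
from typing import List, Tuple

TimedEdge = Tuple[int, int, int]  # (source node, destination node, timestamp)


def temporal_neighbors(
    edges: List[TimedEdge],
    node: int,
    seed_time: int,
) -> List[int]:
    """Hypothetical helper: sources of edges into `node` with time <= seed_time."""
    return [src for src, dst, t in edges if dst == node and t <= seed_time]


edges = [(0, 9, 5), (1, 9, 10), (2, 9, 15), (3, 8, 1)]

# The edge from node 2 happens at time 15, after the seed time, so it is hidden.
visible = temporal_neighbors(edges, node=9, seed_time=10)
```

Whether the comparison should be strict, so that the edge being predicted is itself excluded, depends on how seed times are assigned.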

kgajdamo added the `0 - Priority P0` and `distributed` labels Jan 24, 2024

kgajdamo self-assigned this Jan 24, 2024

kgajdamo requested review from wsad1 and rusty1s as code owners January 24, 2024 10:25

github-actions bot added the example label Jan 24, 2024

kgajdamo mentioned this pull request Jan 24, 2024

Handle node-level and edge-level temporal information when generating partitions #8718

Merged

kgajdamo force-pushed the partition-movie-lens branch 2 times, most recently from 8652f1c to b0f1d7e Compare January 24, 2024 11:18

kgajdamo requested review from JakubPietrakIntel and ZhengHongming888 January 24, 2024 14:09

kgajdamo force-pushed the partition-movie-lens branch from 2cf2cd5 to 6a5cacc Compare January 24, 2024 16:39

kgajdamo added 2 commits January 24, 2024 18:15

partition MovieLens dataset

5a063a7

update CHANGELOG.md

6a5cacc

rusty1s changed the title ~~Partition MovieLens dataset~~ [Distributed] Partition MovieLens dataset Jan 25, 2024

rusty1s approved these changes Jan 25, 2024


rusty1s merged commit 2ab0351 into pyg-team:master Jan 25, 2024
12 checks passed

rusty1s Jan 25, 2024

kgajdamo Jan 25, 2024

rusty1s Jan 25, 2024

kgajdamo Jan 26, 2024

rusty1s Jan 29, 2024
