Change method of partitioning MovieLens #8874

Merged: 7 commits, Feb 9, 2024



@kgajdamo kgajdamo commented Feb 6, 2024

Changes made:

  • Perform a link-level split instead of using T.RandomLinkSplit(...).
  • Each partition{X}.pt now consists of a dictionary with the following entries: edge_label_index, edge_label, edge_label_time.
  • Add support for sorting edge_index with respect to node time or edge time.
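
For illustration, the link-level split described above could be sketched roughly as follows. This is a minimal pure-Python sketch, not the code from this PR: the helper `split_links`, its arguments, and the round-robin assignment are assumptions made for the example. Only the three dictionary keys come from the description.

```python
import random


def split_links(edge_index, edge_label, edge_time, num_parts, seed=0):
    """Hypothetical helper: spread supervision links evenly over partitions.

    edge_index is a pair of lists [sources, destinations]; edge_label and
    edge_time hold one entry per link, aligned with edge_index.
    """
    num_edges = len(edge_label)
    perm = list(range(num_edges))
    random.Random(seed).shuffle(perm)  # seeded, so the split is reproducible

    parts = []
    for p in range(num_parts):
        ids = perm[p::num_parts]  # every num_parts-th shuffled link
        parts.append({
            "edge_label_index": [
                [edge_index[0][i] for i in ids],
                [edge_index[1][i] for i in ids],
            ],
            "edge_label": [edge_label[i] for i in ids],
            "edge_label_time": [edge_time[i] for i in ids],
        })
    return parts
```

Because all three entries are indexed with the same `ids`, each link keeps its label and timestamp after the split.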


@rusty1s rusty1s left a comment


Let me know once this is ready for review :)


kgajdamo commented Feb 7, 2024

> Let me know once this is ready for review :)

Sure, I am still getting this error and I can't find the reason :|


rusty1s commented Feb 7, 2024

This would mean that local neighborhoods are not sorted according to time.


kgajdamo commented Feb 7, 2024

> This would mean that local neighborhoods are not sorted according to time.

Yes, I know; this error comes from the pyg-lib code. But I wonder why it happens: I used an approach similar to the temporal_link_pred.py example, and no additional sorting was used there, so the time data from the dataset should be valid.


rusty1s commented Feb 7, 2024

And all local graph store instances respect the constraint from pyg-lib?


kgajdamo commented Feb 7, 2024

> And all local graph store instances respect the constraint from pyg-lib?

When I check manually, the time values are indeed not sorted. My first guess was an error during partitioning, for example that the data was not permuted correctly. However, looking at the dataset and the corresponding edge_index and time values, the correspondence between them is preserved after the partitioning process.
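
A manual check of this kind could look roughly like the sketch below. It assumes the constraint is that, within each node's neighborhood of a CSC-style layout, timestamps appear in non-decreasing order. The function name and the plain-list inputs (`colptr`, `time`) are hypothetical stand-ins, not the pyg-lib or graph store API.

```python
def neighborhoods_sorted_by_time(colptr, time):
    """Return True if every neighborhood's timestamps are non-decreasing.

    colptr: CSC-style offsets; the neighbors of node v occupy the slice
            colptr[v]:colptr[v + 1].
    time:   one timestamp per edge, in the same order as the CSC layout.
    """
    for v in range(len(colptr) - 1):
        start, end = colptr[v], colptr[v + 1]
        for i in range(start + 1, end):
            if time[i - 1] > time[i]:
                return False  # out-of-order pair inside v's neighborhood
    return True
```

Timestamps only need to be ordered within a neighborhood, not across the whole edge list, so a check over the flat `time` array alone would be too strict.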


kgajdamo commented Feb 8, 2024

@rusty1s I see what's missing. There is an additional sort in here: utils.py#L26. We don't use this function during partitioning. I'll try to fix this as soon as possible.

@github-actions github-actions bot added the loader label Feb 8, 2024

codecov bot commented Feb 8, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (b3f0eb3) 89.89% compared to head (34a0aa9) 89.25%.

❗ Current head 34a0aa9 differs from pull request most recent head 4c6eac5. Consider uploading reports for the commit 4c6eac5 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8874      +/-   ##
==========================================
- Coverage   89.89%   89.25%   -0.65%     
==========================================
  Files         467      467              
  Lines       29924    29919       -5     
==========================================
- Hits        26899    26703     -196     
- Misses       3025     3216     +191     


@kgajdamo kgajdamo force-pushed the dist-link-temporal branch 5 times, most recently from 1df1010 to dfa38f0 Compare February 8, 2024 12:41
@kgajdamo kgajdamo marked this pull request as ready for review February 8, 2024 12:42
@kgajdamo kgajdamo requested review from wsad1, mananshah99 and a team as code owners February 8, 2024 12:42

kgajdamo commented Feb 8, 2024

@rusty1s It’s ready for review. Sorry for the delay. I’m glad you noticed the edge_label_time initialisation. I now see that the values I provided were so small that it probably discarded the entire neighborhood before checking whether the time data was sorted. I added lexsort([edge_time, global_col]) to sort with respect to edge time, and now it works fine.
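
To make the effect of that sort concrete, here is a small pure-Python sketch. It assumes that `lexsort` follows the NumPy convention, where the last key is the primary one, so edges are grouped by `global_col` and ordered by `edge_time` within each group. The function name and the sample values are made up for the example.

```python
def lexsort_col_then_time(edge_time, global_col):
    """Return a permutation that sorts edges by column, then by time.

    Mirrors lexsort([edge_time, global_col]) under the assumption that the
    last key (global_col) is the primary sort key.
    """
    return sorted(
        range(len(global_col)),
        key=lambda i: (global_col[i], edge_time[i]),
    )


global_col = [1, 0, 1, 0]
edge_time = [5, 3, 2, 9]
perm = lexsort_col_then_time(edge_time, global_col)
print(perm)  # [1, 3, 2, 0]
print([(global_col[i], edge_time[i]) for i in perm])
# [(0, 3), (0, 9), (1, 2), (1, 5)]
```

Applying the same permutation to the row indices and any other per-edge data keeps them aligned with the reordered timestamps.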

@kgajdamo kgajdamo force-pushed the dist-link-temporal branch 4 times, most recently from 8da97bd to e6705bf Compare February 9, 2024 12:11
@rusty1s rusty1s enabled auto-merge (squash) February 9, 2024 20:25
@rusty1s rusty1s merged commit f2773e5 into pyg-team:master Feb 9, 2024
13 checks passed