[Roadmap] PyG for Recommendation 🚀 #8452

Closed
9 of 11 tasks
rusty1s opened this issue Nov 27, 2023 · 2 comments · Fixed by #8566

rusty1s commented Nov 27, 2023

🚀 The feature, motivation and pitch

This roadmap aims to bring better support for recommendation tasks to PyG.

Currently, most of our link prediction models are trained and evaluated using binary classification metrics. However, this usually requires a set of candidate links in advance, from which we then infer which links exist. This is not always practical, since in most cases we want to find the top-k most likely links from the full set of O(N^2) pairs.

While training can still be done via negative sampling and binary classification, this roadmap revolves around bringing better support for link prediction evaluation into PyG, with the following end-to-end pipeline:

  1. Embed all source and destination nodes
  2. Use "Maximum Inner Product Search" (MIPS) to find the top-k most likely links (via MIPSKNNIndex)
  3. Evaluate using common metrics for recommendation, e.g., map@k, precision@k, recall@k, f1@k, ndcg@k.
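
As a rough illustration of steps 1 and 2, the following sketch performs exact (brute-force) MIPS in plain PyTorch. It does not use MIPSKNNIndex; the helper name `top_k_links` and the random embeddings are assumptions made for this example only, standing in for the output of a trained model.

```python
import torch
from torch import Tensor


def top_k_links(src_emb: Tensor, dst_emb: Tensor, k: int) -> Tensor:
    """Hypothetical helper: exact MIPS via a dense score matrix."""
    # Inner product between every source and every destination embedding.
    scores = src_emb @ dst_emb.t()  # [num_src, num_dst]
    # Indices of the k highest-scoring destinations for each source.
    return scores.topk(k, dim=1).indices  # [num_src, k]


torch.manual_seed(0)
src_emb = torch.randn(4, 8)  # stand-in for embedded source nodes
dst_emb = torch.randn(6, 8)  # stand-in for embedded destination nodes
top_k_pred_mat = top_k_links(src_emb, dst_emb, k=3)
```

The dense score matrix costs O(num_src * num_dst) memory, so this only serves as a reference for small graphs; an index-based search is the practical route at scale.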

Metrics

We need to support recommendation metrics that can be updated and computed in a mini-batch fashion. A related issue can be found here. Their interface can/should follow the torchmetrics.Metric interface, e.g.:

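import torchmetrics
from torch import Tensor


class LinkPredMetric(torchmetrics.Metric):
    def __init__(self, k: int):
        super().__init__()  # lets torchmetrics set up its internal state
        self.k = k

    def update(self, top_k_pred_mat: Tensor, edge_label_index: Tensor):
        pass  # accumulate statistics for the current mini-batch

    def compute(self):
        pass  # reduce the accumulated statistics to the final metric value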

where top_k_pred_mat holds the top-k indices for each left-hand-side (LHS) entity, and edge_label_index holds the ground-truth information as a [2, num_targets] matrix.
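
To make the two inputs concrete, here is a minimal sketch of how precision@k and recall@k could be derived from them in plain PyTorch. The function name `precision_recall_at_k`, the extra `num_dst` argument, and the choice to average only over sources with at least one ground-truth link are assumptions of this example, not the interface proposed above.

```python
import torch
from torch import Tensor


def precision_recall_at_k(top_k_pred_mat: Tensor, edge_label_index: Tensor,
                          num_dst: int):
    """Hypothetical helper: mean precision@k and recall@k over sources."""
    num_src, k = top_k_pred_mat.shape
    # Dense ground truth: truth[i, j] is True if (i, j) is a target link.
    truth = torch.zeros(num_src, num_dst, dtype=torch.bool)
    truth[edge_label_index[0], edge_label_index[1]] = True
    # hits[i, r] is True if the r-th prediction of source i is a target.
    rows = torch.arange(num_src).unsqueeze(1)
    hits = truth[rows, top_k_pred_mat]  # [num_src, k]
    num_hits = hits.sum(dim=1).float()
    num_targets = truth.sum(dim=1).float()
    precision = num_hits / k
    recall = num_hits / num_targets.clamp(min=1)  # avoid division by zero
    has_targets = num_targets > 0  # assumption: skip sources without targets
    return precision[has_targets].mean(), recall[has_targets].mean()


top_k_pred_mat = torch.tensor([[0, 1], [2, 3]])        # k = 2
edge_label_index = torch.tensor([[0, 0, 1], [0, 2, 3]])
precision, recall = precision_recall_at_k(top_k_pred_mat, edge_label_index, 4)
```

In this toy input, source 0 has targets {0, 2} and one hit, and source 1 has target {3} and one hit, giving a mean precision@2 of 0.5 and a mean recall@2 of 0.75.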

Examples

With this, we can build one or more clear and descriptive examples of how to leverage PyG for recommendation.

  • Select and implement one or two datasets commonly used for recommendation
  • Add exclusion logic to MIPSKNNIndex
  • Build an example that implements this pipeline
  • Write a tutorial about recommendation in PyG
  • Advanced: Combine PyG's recommendation capabilities with its temporal GNN support (see [Roadmap] Temporal Graph Support 🚀 #3230)
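
The exclusion logic in the list above is typically needed so that links already seen during training are not returned as recommendations. A minimal sketch of the idea in plain PyTorch follows; it does not touch MIPSKNNIndex, and the function name `top_k_with_exclusion` and its `exclude_links` argument are assumptions of this example.

```python
import torch
from torch import Tensor


def top_k_with_exclusion(src_emb: Tensor, dst_emb: Tensor, k: int,
                         exclude_links: Tensor) -> Tensor:
    """Hypothetical helper: brute-force top-k that skips given links."""
    scores = src_emb @ dst_emb.t()  # [num_src, num_dst]
    # Push excluded (source, destination) pairs to the bottom of the ranking.
    scores[exclude_links[0], exclude_links[1]] = float('-inf')
    return scores.topk(k, dim=1).indices


src_emb = torch.tensor([[1.0, 0.0], [0.0, 1.0]])
dst_emb = torch.tensor([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
# Exclude the best match of each source: (0, 0) and (1, 2).
exclude_links = torch.tensor([[0, 1], [0, 2]])
pred = top_k_with_exclusion(src_emb, dst_emb, k=2, exclude_links=exclude_links)
```

Note that if k exceeds the number of non-excluded destinations for a source, excluded indices reappear at the tail of its ranking, so a real implementation would need to handle that case.
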
@benbates30

@rusty1s I have done some similar work in the past extending PyG for better recommendation support (at a company where I couldn't contribute it to open source). Would love to contribute to this!

rusty1s added a commit that referenced this issue Dec 8, 2023
Partly fixes #8452.

I implemented it with an optional epsilon parameter to avoid zero
division.

PS: Is it better for me to start a new issue for this specifically, as
this only solves part of the issue?

---------

Co-authored-by: rusty1s <matthias.fey@tu-dortmund.de>
rusty1s reopened this Dec 8, 2023

xnuohz commented Dec 20, 2023

As for recsys datasets, I think Taobao, IMDB, and AmazonBook would be good to use as examples. They are already supported in PyG datasets.

rusty1s added a commit that referenced this issue Dec 21, 2023
From #8452

---------

Co-authored-by: rusty1s <matthias.fey@tu-dortmund.de>
rusty1s closed this as completed Feb 15, 2024
3 participants