This roadmap aims to bring better support for recommendation tasks to PyG.
Currently, all/most of our link prediction models are trained and evaluated using binary classification metrics. However, this usually requires that we have a set of candidates in advance, from which we can then infer the existence of links. This is not necessarily practical, since in most cases, we want to find the top-k most likely links from the full set of O(N^2) pairs.
While training can still be done via negative sampling and binary classification, this roadmap revolves around bringing better support for link prediction evaluation into PyG, with the following end-to-end pipeline:
1. Embed all source and destination nodes.
2. Use "Maximum Inner Product Search" (MIPS) to find the top-k most likely links (via `MIPSKNNIndex`).
3. Evaluate using common metrics for recommendation, e.g., `map@k`, `precision@k`, `recall@k`, `f1@k`, `ndcg@k`.
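The first two steps can be sketched in plain PyTorch as an exact MIPS over a dense score matrix (a minimal illustration only; `src_emb` and `dst_emb` stand in for the output of some encoder, and `MIPSKNNIndex` would avoid materializing the full score matrix on large graphs):

```python
import torch

# Hypothetical encoder output: embeddings for all source and destination nodes.
num_src, num_dst, dim, k = 100, 500, 64, 10
src_emb = torch.randn(num_src, dim)
dst_emb = torch.randn(num_dst, dim)

# Exact maximum inner product search: score every (src, dst) pair and keep
# the top-k destinations per source node.
scores = src_emb @ dst_emb.t()                       # [num_src, num_dst]
top_k_score, top_k_pred_mat = scores.topk(k, dim=1)  # both [num_src, k]
```

The resulting `top_k_pred_mat` is exactly the kind of per-node candidate matrix that the recommendation metrics below consume.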
### Metrics
We need to support recommendation metrics, which can be updated and computed in a mini-batch fashion. A related issue can be found here. Its interface can/should follow the `torchmetrics.Metric` interface, e.g.:
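One possible shape for such a metric, following the `torchmetrics.Metric` `update()`/`compute()` pattern (a minimal `recall@k` sketch; the class name and exact signatures are illustrative, not the final PyG API, and a real implementation would subclass `torchmetrics.Metric`):

```python
import torch

class LinkPredRecall:
    """Mini-batch recall@k, mimicking the ``torchmetrics.Metric``
    ``update()``/``compute()`` pattern (illustrative sketch only)."""
    def __init__(self, k: int):
        self.k = k
        self.reset()

    def reset(self):
        self.accum = 0.0  # running sum of per-node recall values
        self.total = 0    # number of LHS entities seen so far

    def update(self, top_k_pred_mat: torch.Tensor,
               edge_label_index: torch.Tensor):
        # top_k_pred_mat: [num_lhs, k] predicted destination indices.
        # edge_label_index: [2, num_targets] ground-truth (lhs, rhs) pairs.
        for lhs in range(top_k_pred_mat.size(0)):
            truth = edge_label_index[1, edge_label_index[0] == lhs]
            if truth.numel() == 0:
                continue  # no ground-truth links for this LHS entity
            hits = torch.isin(top_k_pred_mat[lhs], truth).sum().item()
            self.accum += hits / truth.numel()
            self.total += 1

    def compute(self) -> float:
        return self.accum / max(self.total, 1)
```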
where `top_k_pred_mat` holds the top-k indices for each left-hand-side (LHS) entity, and `edge_label_index` holds the ground-truth information as a `[2, num_targets]` matrix.
@rusty1s I have done some similar work in the past extending PyG for better recommendation support (at a company where I couldn't contribute it to open source). Would love to contribute to this!
Partly fixes #8452.

I implemented it with an optional `epsilon` parameter to avoid zero division.

PS: Is it better for me to start a new issue for this specifically, as this only solves part of the issue?
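The epsilon guard mentioned above can be sketched as follows (a hypothetical standalone helper for illustration, not the merged PyG code):

```python
import torch

def f1_at_k(precision: torch.Tensor, recall: torch.Tensor,
            eps: float = 1e-12) -> torch.Tensor:
    # F1 is the harmonic mean of precision and recall. When both are zero
    # (no relevant items retrieved), the denominator vanishes; adding a
    # small epsilon keeps the result a well-defined 0 instead of NaN.
    return 2 * precision * recall / (precision + recall + eps)
```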
---------
Co-authored-by: rusty1s <matthias.fey@tu-dortmund.de>
### `LinkPredMetric` interface

- `map@k`
- `precision@k`
- `recall@k`
- `f1@k`
- `ndcg@k` (NDCG@k metric for link-prediction #8326)

### Examples

With this, we can build one or more clear and descriptive examples of how to leverage PyG for recommendation.

- `MIPSKNNIndex`