Use the whole graph adjacency matrix for the link prediction task? #72
Hi, the graph object only contains training edges, so there is no information leak.
Hi @weihua916, thanks for your reply. Do you mean that the graph object (PygLinkPropPredDataset()) only has training edges? PygLinkPropPredDataset() and data.adj_t have the same number of edges. Then how did you extract the test and valid edges? In the dataset code, https://github.com/snap-stanford/ogb/blob/master/ogb/linkproppred/dataset_pyg.py#L67
1179052 * 2 = 2358104.
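The factor of two is consistent with each undirected training edge being stored in both directions in the adjacency matrix. A minimal pure-Python sketch on a toy edge list (the helper `count_directed_entries` is hypothetical, not part of OGB); for distinct undirected edges without self-loops it returns exactly twice the edge count:

```python
def count_directed_entries(undirected_edges):
    """Count the nonzero entries of a symmetric adjacency matrix.

    Each edge (u, v) is stored as both (u, v) and (v, u). Repeated entries
    collapse into one, as they would in a coalesced sparse matrix, so the
    result equals 2 * len(edges) only for distinct edges with u != v.
    """
    entries = set()
    for u, v in undirected_edges:
        entries.add((u, v))
        entries.add((v, u))
    return len(entries)


train_edges = [(0, 1), (1, 2), (2, 3)]
print(count_directed_entries(train_edges))  # 6 == 2 * len(train_edges)
```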
Maybe there is something that I missed, so is
https://ogb.stanford.edu/docs/nodeprop/
I thought that the valid and test edges came from adj_t. Then could you tell me where the valid and test edges come from?
They are from
Oh, I mean, where... In the dataset code, https://github.com/snap-stanford/ogb/blob/master/ogb/linkproppred/dataset_pyg.py#L67, how did you create
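And I ran the following check for overlapping edges:

    import torch

    # Assumes `data` (with `data.adj_t`) and `split_edge` are already loaded,
    # e.g. split_edge = dataset.get_edge_split().
    row, col, _ = data.adj_t.coo()
    total_edge = torch.stack([row, col], dim=0).t()  # shape [num_edges, 2]
    for test_edge in split_edge['test']['edge']:
        # A test edge overlaps if both endpoints match some row of total_edge.
        overlap_validation = (test_edge == total_edge).sum(dim=1) == 2
        if overlap_validation.any():
            overlap_idx = overlap_validation.nonzero().squeeze()
            print(f'Test edge: {test_edge}')
            print(f'Train edge: {total_edge[overlap_idx, :]}')
            print('*' * 20)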
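The loop above compares every test edge against every stored edge, which becomes slow on large graphs. A hedged alternative sketch, assuming the edges are available as plain integer pairs (for example via `tensor.tolist()`), hashes the stored edges into a set; the helper name `find_overlapping_edges` is hypothetical:

```python
def find_overlapping_edges(stored_edges, query_edges):
    """Return the query edges that also appear in stored_edges.

    stored_edges: iterable of (u, v) pairs, e.g. the rows of total_edge.
    query_edges: iterable of (u, v) pairs, e.g. the test edges.
    Edges are treated as undirected, so (u, v) matches (v, u).
    """
    # Normalize each pair so that (u, v) and (v, u) map to the same key.
    stored = {(min(u, v), max(u, v)) for u, v in stored_edges}
    return [(u, v) for u, v in query_edges if (min(u, v), max(u, v)) in stored]


train = [(0, 1), (1, 2)]
test = [(2, 1), (3, 4)]
print(find_overlapping_edges(train, test))  # [(2, 1)]
```

With PyTorch tensors, `total_edge.tolist()` and `split_edge['test']['edge'].tolist()` would give lists of `[u, v]` pairs that this function accepts; the set lookup makes the check roughly linear in the number of edges instead of quadratic.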
The results:

    Test edge: tensor([105700, 201535])
    Test edge: tensor([159698, 220004])
    Test edge: tensor([ 10737, 220004])
    Test edge: tensor([106042, 220004])
    Test edge: tensor([131588, 161470])
    Test edge: tensor([117925, 112050])
When I set
The overlapping edges are expected in
Because it is split by time (year). That makes sense now. Thanks for your help!
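With a split by year, the same node pair can legitimately appear in more than one split: a pair connected in a training year and again in a test year contributes an edge to both. A minimal sketch, assuming each edge is a `(u, v, year)` triple and using made-up cutoff years (the function `split_by_year` is illustrative only, not the OGB implementation):

```python
def split_by_year(edges, valid_year, test_year):
    """Split (u, v, year) edges into train/valid/test by year.

    Edges before valid_year go to train, edges in [valid_year, test_year)
    go to valid, and edges from test_year onward go to test.
    """
    splits = {"train": [], "valid": [], "test": []}
    for u, v, year in edges:
        if year < valid_year:
            splits["train"].append((u, v))
        elif year < test_year:
            splits["valid"].append((u, v))
        else:
            splits["test"].append((u, v))
    return splits


# Nodes 0 and 1 are connected in two different years.
edges = [(0, 1, 2015), (1, 2, 2016), (2, 3, 2018), (0, 1, 2019)]
splits = split_by_year(edges, valid_year=2018, test_year=2019)
print(splits["train"])  # [(0, 1), (1, 2)]
print(splits["test"])   # [(0, 1)], the same pair is also in train
```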
https://github.com/snap-stanford/ogb/blob/master/examples/linkproppred/collab/gnn.py#L106
It seems that the GNN model takes the whole adjacency matrix (data.adj_t).
But as far as I know, in the standard setting, the GNN takes an incomplete set of edges (split_edge['train']['edge']) and predicts the rest (split_edge['valid'] and split_edge['test']).
Should I fix it? Or could you point me to a reference for this setting?
I really appreciate the great commitment from all of you.
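In the standard setting described above, message passing runs only over the training edges, and validation/test pairs are used only as targets to score. A minimal pure-Python sketch of one round of mean neighbor aggregation under that assumption; the helpers `build_adjacency` and `mean_aggregate` are hypothetical stand-ins for a GNN layer, and the dot-product scorer is an illustrative choice:

```python
def build_adjacency(train_edges, num_nodes):
    """Symmetric neighbor sets built from training edges only."""
    neighbors = [set() for _ in range(num_nodes)]
    for u, v in train_edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    return neighbors


def mean_aggregate(features, neighbors):
    """One round of mean aggregation; isolated nodes keep their own feature."""
    out = []
    for node, nbrs in enumerate(neighbors):
        if nbrs:
            out.append(sum(features[n] for n in nbrs) / len(nbrs))
        else:
            out.append(features[node])
    return out


train_edges = [(0, 1), (1, 2)]
test_edges = [(0, 2)]  # scored by the model, never used for message passing
features = [1.0, 2.0, 4.0]

neighbors = build_adjacency(train_edges, num_nodes=3)
h = mean_aggregate(features, neighbors)
print(h)  # [2.0, 2.5, 2.0]
for u, v in test_edges:
    print(h[u] * h[v])  # 4.0, dot-product score for the held-out pair
```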