
use the whole graph adjacency matrix for link prediction task? #72

Closed

LeeJunHyun opened this issue Oct 2, 2020 · 14 comments

Comments

@LeeJunHyun

LeeJunHyun commented Oct 2, 2020

https://github.com/snap-stanford/ogb/blob/master/examples/linkproppred/collab/gnn.py#L106

It seems that the GNN model takes the whole adjacency matrix (data.adj_t).

But as far as I know, in the standard setting, the GNN takes an incomplete set of edges (split_edge['train']['edge']) and predicts the rest (split_edge['valid'] and split_edge['test']).

Should I fix this, or could you please give me a reference for this setting?

I really appreciate the great work you have all put in.
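For reference, the standard protocol the question describes can be sketched as follows. This is a generic illustration, not OGB's actual code: the graph the model sees is built only from the training edges, and the held-out edges are used purely as prediction targets. The common-neighbor score here is a hypothetical stand-in for a trained GNN link scorer.

```python
# Minimal sketch of the standard link-prediction protocol:
# the input graph is built ONLY from training edges, while
# validation/test edges are used purely as prediction targets.
from collections import defaultdict

def build_adjacency(edges):
    """Undirected adjacency built from the training split only."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def common_neighbor_score(adj, u, v):
    """Stand-in for a GNN link scorer: number of shared neighbors."""
    return len(adj[u] & adj[v])

train_edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
test_edges = [(1, 3)]  # held out: never added to the adjacency

adj = build_adjacency(train_edges)
scores = {e: common_neighbor_score(adj, *e) for e in test_edges}
print(scores)  # nodes 1 and 3 share neighbor 2 -> {(1, 3): 1}
```

The key property is that `test_edges` never influence `adj`, which is exactly the "no leak" condition discussed below.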

@weihua916
Contributor

Hi, the graph object only contains training edges, so there is no information leak.

@LeeJunHyun
Author

LeeJunHyun commented Oct 3, 2020

Hi @weihua916 ,

Thanks for your reply.

Do you mean that the graph object (PygLinkPropPredDataset()) only contains training edges?

PygLinkPropPredDataset() and data.adj_t have the same number of edges.

Then how did you extract the test and valid edges?

In the dataset code, https://github.com/snap-stanford/ogb/blob/master/ogb/linkproppred/dataset_pyg.py#L67
they are just loaded from the split path.

>>> dataset
PygLinkPropPredDataset()

>>> dataset[0]
Data(edge_index=[2, 2358104], edge_weight=[2358104, 1], edge_year=[2358104, 1], x=[235868, 128])

>>> data.adj_t
SparseTensor(row=tensor([     0,      0,      0,  ..., 235867, 235867, 235867]),
             col=tensor([ 20649,  21913,  46512,  ..., 230251, 230251, 235583]),
             val=tensor([1., 1., 1.,  ..., 1., 3., 2.]),
             size=(235868, 235868), nnz=2358104, density=0.00%)

>>> split_edge['train']['edge'].shape
torch.Size([1179052, 2])

@LeeJunHyun
Author

LeeJunHyun commented Oct 3, 2020

This is what I know as the standard setting, so could you please point me to some other references that you have?
(The image below is from the GRL book.)
[image: figure from the GRL book illustrating the standard link-prediction setup]

@weihua916
Contributor

1179052 * 2 = 2358104.

@LeeJunHyun
Author

LeeJunHyun commented Oct 3, 2020

Maybe there is something I missed: is adj_t equal to split_edge['train']['edge']?

@weihua916
Contributor

https://ogb.stanford.edu/docs/nodeprop/
Note: For undirected graphs, the loaded graphs will have a doubled number of edges because we add the bidirectional edges automatically.
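The arithmetic above (1179052 * 2 = 2358104) follows directly from this doubling. A small sketch, using an illustrative helper (`to_bidirectional` is not an OGB function) to show why nnz(adj_t) is exactly twice the number of stored training edges:

```python
# OGB stores each undirected training edge once; the loaded graph
# contains both directions, so edge counts double on load.

def to_bidirectional(edge_list):
    """Add the reverse of every edge, mirroring what the loader does
    for undirected graphs (illustrative helper, not OGB's API)."""
    return edge_list + [(v, u) for u, v in edge_list]

# Numbers from the thread: train split size vs. nnz of data.adj_t.
n_train = 1179052
assert 2 * n_train == 2358104

directed = to_bidirectional([(0, 1), (1, 2)])
print(directed)  # [(0, 1), (1, 2), (1, 0), (2, 1)]
```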

@LeeJunHyun
Author

I thought that the valid and test edges come from adj_t.

Then could you tell me where the valid and test edges actually come from?

@weihua916
Contributor

They are from split_edge['valid']['edge'] and split_edge['test']['edge'].

@LeeJunHyun
Author

LeeJunHyun commented Oct 3, 2020

Oh, I mean: where do split_edge['valid']['edge'] and split_edge['test']['edge'] come from?

In the dataset code, https://github.com/snap-stanford/ogb/blob/master/ogb/linkproppred/dataset_pyg.py#L67
they are just loaded from the split path.
(There are train.pt, valid.pt, and test.pt in the split path.)

How did you create train.pt, valid.pt, and test.pt?

@LeeJunHyun
Author

I also ran a check for overlapping edges between data.adj_t and split_edge['test']['edge'], and there are edges that appear in both the train and test sets.
Please let me know if there is another point that I'm still missing.

# Collect all (row, col) pairs from the training adjacency matrix
row, col, _ = data.adj_t.coo()
total_edge = torch.stack([row, col], dim=0).t()

# For each test edge, look for an identical pair among the training edges
for test_edge in split_edge['test']['edge']:
    overlap_validation = (test_edge == total_edge).sum(dim=1) == 2
    if overlap_validation.max() > 0:
        overlap_idx = overlap_validation.nonzero().squeeze()
        print(f'Test edge: {test_edge}')
        print(f'Train edge: {total_edge[overlap_idx, :]}')
        print('*' * 20)
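As an aside, the same check can be done without scanning the full edge list once per test edge by hashing the pairs into a set. This is a hypothetical helper for illustration, not part of OGB, sketched here on plain Python tuples:

```python
# Set-based overlap check: O(n + m) instead of O(n * m) pair scans.

def overlapping_edges(train_edges, test_edges):
    """Return the test edges that also appear (same direction) among
    the train edges. Inputs are iterables of (u, v) pairs."""
    train_set = set(map(tuple, train_edges))
    return [e for e in map(tuple, test_edges) if e in train_set]

train = [(105700, 201535), (159698, 220004), (5, 6)]
test = [(105700, 201535), (7, 8)]
print(overlapping_edges(train, test))  # [(105700, 201535)]
```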

@LeeJunHyun
Author

The results:

Test edge: tensor([105700, 201535])
Train edge: tensor([[105700, 201535],
[105700, 201535],
[105700, 201535],
[105700, 201535],
[105700, 201535]])


Test edge: tensor([159698, 220004])
Train edge: tensor([159698, 220004])


Test edge: tensor([ 10737, 220004])
Train edge: tensor([[ 10737, 220004],
[ 10737, 220004],
[ 10737, 220004]])


Test edge: tensor([106042, 220004])
Train edge: tensor([[106042, 220004],
[106042, 220004]])


Test edge: tensor([131588, 161470])
Train edge: tensor([[131588, 161470],
[131588, 161470]])


Test edge: tensor([117925, 112050])
Train edge: tensor([[117925, 112050],
[117925, 112050],
[117925, 112050],
[117925, 112050]])
...

@LeeJunHyun
Author

When I set total_edge = split_edge['train']['edge'], there are also overlapping edges (between split_edge['train']['edge'] and split_edge['test']['edge']).

@weihua916
Contributor

The overlapping edges are expected in ogbl-collab. See https://ogb.stanford.edu/docs/linkprop/#ogbl-collab.

@LeeJunHyun
Author

Because the dataset is split by time (year), it makes sense now.

Thanks for your help!
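To make the resolution concrete: in a time-based split like ogbl-collab's, the same node pair can legitimately appear in both train and test, because the pair interacted in both periods. A minimal sketch, with a made-up toy dataset and cutoff years (not the dataset's real values):

```python
# Time-based link split: edges up to the validation cutoff go to train,
# later ones to valid/test. A pair that interacts in both periods shows
# up in both splits, which is the overlap observed in the thread.

def split_by_year(edges_with_year, valid_year, test_year):
    train = [(u, v) for u, v, y in edges_with_year if y < valid_year]
    valid = [(u, v) for u, v, y in edges_with_year
             if valid_year <= y < test_year]
    test = [(u, v) for u, v, y in edges_with_year if y >= test_year]
    return train, valid, test

# Pair (1, 2) collaborates in 2015 and again in 2019.
edges = [(1, 2, 2015), (1, 2, 2019), (3, 4, 2016), (3, 4, 2018)]
train, valid, test = split_by_year(edges, 2017, 2019)
print(train)  # [(1, 2), (3, 4)]
print(test)   # [(1, 2)] -- same pair as in train, from a later year
```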
