
⚓🔍 NodePiece: GPU-enabled BFS searcher #990

Merged: 19 commits from bfs-gpu into master on Jun 22, 2022
Conversation

@migalkin (Member)

Adds SparseBFSSearcher, a GPU-enabled BFS searcher based on torch_sparse with distance tracking.
It uses sparse-dense matrix multiplication between bool tensors.
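Roughly, one BFS expansion step is a sparse-dense product of the (boolean) adjacency matrix with the current node-to-anchor reachability matrix, recording the iteration at which each node first reaches each anchor. Below is a minimal sketch of that idea, not the actual SparseBFSSearcher; all names and the max_iter default are illustrative:

import torch
from torch_sparse import spmm

def bfs_anchor_distances(edge_list: torch.Tensor, num_entities: int, anchors: torch.Tensor, max_iter: int = 10) -> torch.Tensor:
    """Illustrative BFS via repeated sparse-dense multiplication, with distance tracking."""
    num_anchors = anchors.numel()
    values = torch.ones(edge_list.shape[1])
    # reachable[i, j] is True iff entity i reaches anchor j within the current number of hops
    reachable = torch.zeros(num_entities, num_anchors, dtype=torch.bool)
    reachable[anchors, torch.arange(num_anchors)] = True
    distances = torch.full((num_entities, num_anchors), -1, dtype=torch.long)
    distances[anchors, torch.arange(num_anchors)] = 0
    for it in range(1, max_iter + 1):
        # one BFS step: propagate reachability along all edges
        # (cast to float; the bool-tensor spmm issue is discussed later in this thread)
        expanded = spmm(
            index=edge_list,
            value=values,
            m=num_entities,
            n=num_entities,
            matrix=reachable.float(),
        ) > 0.0
        newly_reached = expanded & ~reachable
        if not newly_reached.any():
            break  # converged: no node reached a new anchor in this iteration
        distances[newly_reached] = it
        reachable |= newly_reached
    return distances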

@migalkin migalkin changed the title ⚓ 🔍 GPU-enabled BFS searcher ⚓ 🔍 NodePiece: GPU-enabled BFS searcher Jun 21, 2022
@mberr mberr changed the title ⚓ 🔍 NodePiece: GPU-enabled BFS searcher ⚓🔍 NodePiece: GPU-enabled BFS searcher Jun 21, 2022
@migalkin (Member Author) commented Jun 21, 2022

Okay, this takes an interesting turn: the same BFS procedure gives a different result on CPU and GPU 👀

On YAGO310, mining 500 anchors / 20 per node while debugging on a laptop, tokenization succeeds:

WARNING:pykeen.nn.node_piece.tokenization:32/123143 (0.03%) do not have any anchor.

But running the same code on a GPU, nothing is found:

WARNING:pykeen.nn.node_piece.anchor_search:Search converged after iteration 0 without all nodes being reachable.
WARNING:pykeen.nn.node_piece.tokenization:122643/123143 (99.59%) do not have any anchor.

@mberr (Member) commented Jun 21, 2022

> WARNING:pykeen.nn.node_piece.anchor_search:Search converged after iteration 0 without all nodes being reachable.

sounds like a (too) early termination issue 🤔

@mberr (Member) commented Jun 21, 2022

btw, I could only run my checks on CPU today, and I never encountered the warning message about missing anchors.

@migalkin (Member Author)

> btw, I could only run my checks on CPU today, and I never encountered the warning message about missing anchors.

Oh, maybe it's because I reduced max_iter to 5 to speed up debugging.

@migalkin (Member Author) commented Jun 21, 2022

Debugging on a GPU led me to a revelation: torch_sparse apparently has some bugs processing bool tensors 😢 The workaround that returns the same results as on CPU is to convert the dense tensors to float and then convert the result back to bool (but those tricks kinda remove all the memory savings from working with bool tensors :( ):

from torch_sparse import spmm

# Workaround: cast the bool tensors to float for the sparse-dense product,
# then threshold the result back to bool.
reachable = spmm(
    index=edge_list,           # [2, num_edges] COO indices of the adjacency matrix
    value=values.float(),      # edge values, cast from bool to float
    m=num_entities,
    n=num_entities,
    matrix=reachable.float(),  # dense reachability matrix, cast from bool to float
) > 0.0

I'll open an issue in the torch_sparse repo: rusty1s/pytorch_sparse#243

Or it could be a general issue with the spmm kernel on bool tensors, because it looks like we see the same behavior even with the ScipySparse searcher in the other PR.

@mberr (Member) commented Jun 22, 2022

> Debugging on a GPU led me to a revelation: torch_sparse apparently has some bugs processing bool tensors 😢 The workaround that returns the same results as on CPU is to convert the dense tensors to float and then convert the result back to bool […]

Okay, then we should just directly use float instead of converting from/to float in each iteration, right? In that case, we can also use the torch-builtin spmm without requiring an extra dependency.
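For reference, a minimal sketch of that float-only, torch-builtin variant (edge_list, num_entities, and reachable are illustrative names, not code from this PR):

import torch

# toy graph with 3 nodes and 3 directed edges, one "anchor" column per node
edge_list = torch.tensor([[0, 1, 2], [1, 2, 0]])
num_entities = 3
reachable = torch.eye(num_entities)  # kept as float throughout, no bool round-trips

adjacency = torch.sparse_coo_tensor(
    indices=edge_list,
    values=torch.ones(edge_list.shape[1]),
    size=(num_entities, num_entities),
).coalesce()

# one BFS propagation step using only built-in torch ops (no torch_sparse dependency)
reachable = (torch.sparse.mm(adjacency, reachable) > 0.0).float()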

@migalkin (Member Author)

> Okay, then we should just directly use float instead of converting from/to float in each iteration, right? In that case, we can also use the torch-builtin spmm without requiring an extra dependency.

In my experiments, torch_sparse (even its CPU version) is much faster than the vanilla torch sparse operators. I'd keep the dependency since it's a separate Searcher anyway, and the issue with bools might be solved soon-ish.
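A rough way to sanity-check that claim on a given machine might look like the following; sizes and names are made up, and timings will of course vary with hardware and sparsity:

import time
import torch
from torch_sparse import spmm

num_entities, num_edges, num_anchors = 10_000, 100_000, 500
edge_list = torch.randint(num_entities, (2, num_edges))
values = torch.ones(num_edges)
dense = torch.rand(num_entities, num_anchors)

# torch_sparse kernel
start = time.perf_counter()
out_ts = spmm(index=edge_list, value=values, m=num_entities, n=num_entities, matrix=dense)
print(f"torch_sparse.spmm: {time.perf_counter() - start:.3f}s")

# built-in torch sparse kernel
adjacency = torch.sparse_coo_tensor(edge_list, values, (num_entities, num_entities)).coalesce()
start = time.perf_counter()
out_builtin = torch.sparse.mm(adjacency, dense)
print(f"torch.sparse.mm:   {time.perf_counter() - start:.3f}s")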

@migalkin (Member Author)

@cthoyt we are ready here :shipit:

@cthoyt cthoyt requested review from cthoyt and mberr June 22, 2022 22:09
@cthoyt (Member) left a comment

Great-looking code - I left two minor comments about adding more context to two TODOs. One big question I was wondering about: why do we do these operations in numpy? Is it because numpy has data structures and functionality that PyTorch doesn't?

@cthoyt cthoyt self-requested a review June 22, 2022 22:30
@migalkin (Member Author)

> Great-looking code - I left two minor comments about adding more context to two TODOs. One big question I was wondering about: why do we do these operations in numpy? Is it because numpy has data structures and functionality that PyTorch doesn't?

The original AnchorTokenizer (where we send the results of the Searcher) has further numpy processing interfaces; it is also connected to other searchers that employ scipy, and scipy accepts numpy arrays as inputs.

@cthoyt cthoyt enabled auto-merge (squash) June 22, 2022 22:32
@cthoyt cthoyt self-requested a review June 22, 2022 22:53
@cthoyt cthoyt requested review from cthoyt and removed request for cthoyt June 22, 2022 23:26
@cthoyt cthoyt merged commit 00e1bf8 into master Jun 22, 2022
@cthoyt cthoyt deleted the bfs-gpu branch June 22, 2022 23:42