
⚓🔍 NodePiece: GPU-enabled BFS searcher #990

Merged: 19 commits from bfs-gpu into master on Jun 22, 2022
Conversation

@migalkin (Member)

Adds SparseBFSSearcher, a GPU-enabled BFS searcher based on torch_sparse with distance tracking.
It uses sparse-dense matrix multiplication between bool tensors.
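Roughly, one BFS expansion step is a sparse-dense product of the (boolean) adjacency matrix with the current node-to-anchor reachability matrix, recording the iteration at which each node first reaches each anchor. Below is a minimal sketch of that idea, not the actual SparseBFSSearcher; all names and the max_iter default are illustrative:

import torch
from torch_sparse import spmm

def bfs_anchor_distances(edge_list: torch.Tensor, num_entities: int, anchors: torch.Tensor, max_iter: int = 10) -> torch.Tensor:
    """Illustrative BFS via repeated sparse-dense multiplication, with distance tracking."""
    num_anchors = anchors.numel()
    values = torch.ones(edge_list.shape[1])
    # reachable[i, j] is True iff entity i reaches anchor j within the current number of hops
    reachable = torch.zeros(num_entities, num_anchors, dtype=torch.bool)
    reachable[anchors, torch.arange(num_anchors)] = True
    distances = torch.full((num_entities, num_anchors), -1, dtype=torch.long)
    distances[anchors, torch.arange(num_anchors)] = 0
    for it in range(1, max_iter + 1):
        # one BFS step: propagate reachability along all edges
        # (cast to float; the bool-tensor spmm issue is discussed later in this thread)
        expanded = spmm(
            index=edge_list,
            value=values,
            m=num_entities,
            n=num_entities,
            matrix=reachable.float(),
        ) > 0.0
        newly_reached = expanded & ~reachable
        if not newly_reached.any():
            break  # converged: no node reached a new anchor in this iteration
        distances[newly_reached] = it
        reachable |= newly_reached
    return distances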

@migalkin migalkin changed the title ⚓ 🔍 GPU-enabled BFS searcher ⚓ 🔍 NodePiece: GPU-enabled BFS searcher Jun 21, 2022
@mberr mberr changed the title ⚓ 🔍 NodePiece: GPU-enabled BFS searcher ⚓🔍 NodePiece: GPU-enabled BFS searcher Jun 21, 2022
@migalkin (Member Author) commented Jun 21, 2022

Okay, this takes an interesting turn: the same BFS procedure gives a different result on CPU and GPU 👀

On YAGO310, mining 500 anchors / 20 per node while debugging on a laptop, tokenization succeeds:

WARNING:pykeen.nn.node_piece.tokenization:32/123143 (0.03%) do not have any anchor.

But running the same code on a GPU, nothing is found:

WARNING:pykeen.nn.node_piece.anchor_search:Search converged after iteration 0 without all nodes being reachable.
WARNING:pykeen.nn.node_piece.tokenization:122643/123143 (99.59%) do not have any anchor.

@mberr (Member) commented Jun 21, 2022

> WARNING:pykeen.nn.node_piece.anchor_search:Search converged after iteration 0 without all nodes being reachable.

sounds like a (too) early termination issue 🤔

@mberr (Member) commented Jun 21, 2022

btw, I could only run my checks on CPU today, and I never encountered the warning message about missing anchors.

@migalkin (Member Author)

> btw, I could only run my checks on CPU today, and I never encountered the warning message about missing anchors.

Oh, maybe it's because I reduced max_iter to 5 to speed up debugging.

@migalkin (Member Author) commented Jun 21, 2022

Debugging on a GPU led me to a revelation: torch_sparse apparently has some bugs processing bool tensors 😢 The workaround that returns the same results as on CPU is to convert the dense tensors to float and then convert the result back to bool (but those tricks kinda remove all the memory savings from working with bool tensors :( ):

from torch_sparse import spmm

# Workaround: cast the bool tensors to float for the sparse-dense product,
# then threshold the result back to bool.
reachable = spmm(
    index=edge_list,           # [2, num_edges] COO indices of the adjacency matrix
    value=values.float(),      # edge values, cast from bool to float
    m=num_entities,
    n=num_entities,
    matrix=reachable.float(),  # dense reachability matrix, cast from bool to float
) > 0.0

I'll open an issue in the torch_sparse repo: rusty1s/pytorch_sparse#243

Or it could be a general issue with the spmm kernel on bool tensors, because it looks like we see the same behavior even with the ScipySparse searcher in the other PR.

@mberr (Member) commented Jun 22, 2022

> Debugging on a GPU led me to a revelation: torch_sparse apparently has some bugs processing bool tensors 😢 The workaround that returns the same results as on CPU is to convert the dense tensors to float and then convert the result back to bool […]

Okay, then we should just directly use float instead of converting from/to float in each iteration, right? In that case, we can also use the torch-builtin spmm without requiring an extra dependency.
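For reference, a minimal sketch of that float-only, torch-builtin variant (edge_list, num_entities, and reachable are illustrative names, not code from this PR):

import torch

# toy graph with 3 nodes and 3 directed edges, one "anchor" column per node
edge_list = torch.tensor([[0, 1, 2], [1, 2, 0]])
num_entities = 3
reachable = torch.eye(num_entities)  # kept as float throughout, no bool round-trips

adjacency = torch.sparse_coo_tensor(
    indices=edge_list,
    values=torch.ones(edge_list.shape[1]),
    size=(num_entities, num_entities),
).coalesce()

# one BFS propagation step using only built-in torch ops (no torch_sparse dependency)
reachable = (torch.sparse.mm(adjacency, reachable) > 0.0).float()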

@migalkin (Member Author)

> Okay, then we should just directly use float instead of converting from/to float in each iteration, right? In that case, we can also use the torch-builtin spmm without requiring an extra dependency.

In my experiments, torch_sparse (even its CPU version) is much faster than the vanilla torch sparse operators. I'd keep the dependency since it's a separate Searcher anyway, and the issue with bools might be solved soon-ish.
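A rough way to sanity-check that claim on a given machine might look like the following; sizes and names are made up, and timings will of course vary with hardware and sparsity:

import time
import torch
from torch_sparse import spmm

num_entities, num_edges, num_anchors = 10_000, 100_000, 500
edge_list = torch.randint(num_entities, (2, num_edges))
values = torch.ones(num_edges)
dense = torch.rand(num_entities, num_anchors)

# torch_sparse kernel
start = time.perf_counter()
out_ts = spmm(index=edge_list, value=values, m=num_entities, n=num_entities, matrix=dense)
print(f"torch_sparse.spmm: {time.perf_counter() - start:.3f}s")

# built-in torch sparse kernel
adjacency = torch.sparse_coo_tensor(edge_list, values, (num_entities, num_entities)).coalesce()
start = time.perf_counter()
out_builtin = torch.sparse.mm(adjacency, dense)
print(f"torch.sparse.mm:   {time.perf_counter() - start:.3f}s")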

@migalkin (Member Author)

@cthoyt we are ready here :shipit:

@cthoyt cthoyt requested review from cthoyt and mberr June 22, 2022 22:09
@cthoyt (Member) left a comment

Great-looking code - I left two minor comments about adding more context to two TODOs. One big question I was wondering about: why do we do these operations in numpy? Is it because numpy has data structures and functionality that PyTorch doesn't?

@cthoyt cthoyt self-requested a review June 22, 2022 22:30
@migalkin (Member Author)

> Great-looking code - I left two minor comments about adding more context to two TODOs. One big question I was wondering about: why do we do these operations in numpy? Is it because numpy has data structures and functionality that PyTorch doesn't?

The original AnchorTokenizer (where we send the results of the Searcher) has further numpy processing interfaces; it is also connected to other searchers that employ scipy, and scipy accepts numpy arrays as inputs.

@cthoyt cthoyt enabled auto-merge (squash) June 22, 2022 22:32
@cthoyt cthoyt self-requested a review June 22, 2022 22:53
@cthoyt cthoyt requested review from cthoyt and removed request for cthoyt June 22, 2022 23:26
@cthoyt cthoyt merged commit 00e1bf8 into master Jun 22, 2022
@cthoyt cthoyt deleted the bfs-gpu branch June 22, 2022 23:42