
RandomLinkSplit causes data leakage when using bipartite undirected graph #9425

Open
sadrahkm opened this issue Jun 15, 2024 · 1 comment


sadrahkm commented Jun 15, 2024

🐛 Describe the bug

I am working on a task with two node types where the edges only represent associations between them, so the graph is bipartite. I want the graph to be undirected so that message passing can happen in both directions. However, I recently noticed that the documentation says the is_undirected option of RandomLinkSplit does not work for bipartite graphs. Did I understand this correctly?
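To see why such a flag is tied to homogeneous graphs, here is a conceptual sketch (plain Python, not PyG's implementation; the function name `undirected_split` is hypothetical). It assumes an undirected homogeneous edge list that stores both directions, reduces each pair to one `(min, max)` representative, splits those pairs, and mirrors the training pairs so both directions always land in the same split.

```python
import random


def undirected_split(edges, num_val, seed=0):
    """Hypothetical sketch of an undirected-aware split for a homogeneous graph.

    Each undirected pair is kept once as (min, max), the pairs are shuffled and
    split, and the training pairs are mirrored so that (u, v) and (v, u) never
    end up in different splits.
    """
    pairs = sorted({(min(u, v), max(u, v)) for u, v in edges})
    rng = random.Random(seed)
    rng.shuffle(pairs)
    val_pairs = pairs[:num_val]
    train_pairs = pairs[num_val:]
    # Mirror the training pairs so message passing runs in both directions.
    train_edges = train_pairs + [(v, u) for u, v in train_pairs]
    return train_edges, val_pairs


edges = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)]
train_edges, val_pairs = undirected_split(edges, num_val=1)
```

This pairing only makes sense when both endpoints index the same node set. In a bipartite graph, user 0 and movie 0 are different nodes and the reverse direction is usually stored under a separate edge type, which is presumably why the transform needs to be told explicitly which edge type is the reverse one.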

If so, the example in this blog post would be wrong: it has exactly the same setup as mine (an undirected bipartite graph), where is_undirected=True cannot be used to avoid data leakage. If that is the case, is there any way to fix this?
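The leakage in question can be illustrated with a simplified sketch (plain Python, not PyG code; `naive_bipartite_split` and `leaked` are hypothetical names). It assumes the reverse `("movie", "rev_rates", "user")` edges were created from all ratings before splitting, for example by transposing the full edge list, and that only the forward `("user", "rates", "movie")` edges are split. Test edges and negative sampling are omitted for brevity.

```python
import random


def naive_bipartite_split(rates, num_val, seed=0):
    """Hypothetical naive split that ignores the reverse edge type.

    `rates` holds (user, movie) pairs. Only these forward edges are split;
    the reverse (movie, user) edges are built from ALL ratings, mimicking
    reverse edges added before the split.
    """
    rng = random.Random(seed)
    shuffled = list(rates)
    rng.shuffle(shuffled)
    val = shuffled[:num_val]
    train_rates = shuffled[num_val:]
    rev_rates = [(movie, user) for user, movie in rates]
    return train_rates, rev_rates, val


def leaked(val, rev_rates):
    """Return validation links whose reverse copy is still visible to message passing."""
    rev = set(rev_rates)
    return [(user, movie) for user, movie in val if (movie, user) in rev]


rates = [(0, 0), (0, 1), (1, 1), (2, 0)]
train_rates, rev_rates, val = naive_bipartite_split(rates, num_val=2)
```

Here `leaked(val, rev_rates)` returns every validation link: the forward edges were held out, but their reverse copies remain available during training.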

I would appreciate a clarification, since I believe this is an important problem.

Versions

PyTorch version: 2.2.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Debian GNU/Linux 12 (bookworm) (x86_64)
GCC version: (Debian 12.2.0-14) 12.2.0
Clang version: Could not collect
CMake version: version 3.25.1
Libc version: glibc-2.36

Python version: 3.11.2 (main, Mar 13 2023, 12:18:29) [GCC 12.2.0] (64-bit runtime)
Python platform: Linux-6.1.0-21-amd64-x86_64-with-glibc2.36
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA A16
GPU 1: NVIDIA A16
GPU 2: NVIDIA A16
GPU 3: NVIDIA A16

Nvidia driver version: 525.147.05
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
...

@sadrahkm sadrahkm added the bug label Jun 15, 2024
@sadrahkm sadrahkm changed the title RandomLinkSplit cause data leakage when using bipartite undirected graph RandomLinkSplit causes data leakage when using bipartite undirected graph Jun 15, 2024
rusty1s (Member) commented Jun 24, 2024

For heterogeneous graphs, data leakage is prevented by specifying the "reverse" edge types via rev_edge_types:

```python
from torch_geometric.transforms import RandomLinkSplit
transform = RandomLinkSplit(edge_types=("user", "rates", "movie"),
                            rev_edge_types=("movie", "rev_rates", "user"))
```

This ensures that links held out for validation and testing are also removed from the reverse edge type.
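Conceptually, declaring the reverse edge type means the reverse edges are derived from the training split rather than from the full graph. The sketch below illustrates that idea in plain Python; it is not PyG's implementation, `bipartite_split_with_reverse` is a hypothetical name, and test edges and negative sampling are again omitted.

```python
import random


def bipartite_split_with_reverse(rates, num_val, seed=0):
    """Hypothetical sketch of the effect of declaring a reverse edge type.

    `rates` holds (user, movie) pairs. The reverse (movie, user) edges are
    rebuilt from the training links only, so a held-out link disappears from
    both directions of the message-passing graph.
    """
    rng = random.Random(seed)
    shuffled = list(rates)
    rng.shuffle(shuffled)
    val = shuffled[:num_val]
    train_rates = shuffled[num_val:]
    train_rev_rates = [(movie, user) for user, movie in train_rates]
    return train_rates, train_rev_rates, val


rates = [(0, 0), (0, 1), (1, 1), (2, 0)]
train_rates, train_rev_rates, val = bipartite_split_with_reverse(rates, num_val=2)
```

Unlike a split that leaves the reverse edges untouched, no validation link can be reached through `train_rev_rates` here.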
