OnDiskDataset for a large knowledge graph #10661
Unanswered
Yehor-Mishchyriak asked this question in Q&A
Hello,
I am working on a project involving a large knowledge graph (1M+ nodes). Given the presence of node and edge features along with relatively high connectivity, keeping the graph in RAM/VRAM is not feasible, so I am exploring disk-backed alternatives.
As I understand it, the existing OnDiskDataset interface is designed for large collections of small-to-medium graphs rather than a single large KG. The appropriate approach therefore seems to be the FeatureStore + GraphStore abstraction with a custom on-disk backend (e.g., Zarr), which requires a substantial amount of custom implementation.
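To make the FeatureStore + GraphStore idea concrete, here is a rough sketch of the kind of backend I have in mind. It assumes plain NumPy memory-mapped `.npy` files instead of Zarr: node features are stored as one row-major matrix and the graph structure as CSR arrays (`rowptr`, `col`), so a sampler only touches the feature rows and neighbor ranges it actually needs. The names `write_graph` and `DiskGraph` and the file layout are hypothetical and are not PyG APIs; in PyG this logic would presumably sit behind custom FeatureStore/GraphStore subclasses. Note that `write_graph` sorts the edge list in RAM, so a genuinely large edge list would need a chunked or external sort.

```python
import os
import tempfile

import numpy as np


def write_graph(dir_path, x, edge_index):
    """Write node features and a CSR adjacency to .npy files (hypothetical layout)."""
    num_nodes = x.shape[0]
    # Node features: one row-major float32 matrix; .npy keeps dtype and shape in its header.
    np.save(os.path.join(dir_path, "x.npy"), np.asarray(x, dtype=np.float32))

    # Graph structure: sort edges by source node to build CSR arrays.
    # NOTE: this sort happens in RAM; a huge edge list would need a chunked/external sort.
    src = np.asarray(edge_index[0], dtype=np.int64)
    dst = np.asarray(edge_index[1], dtype=np.int64)
    order = np.argsort(src, kind="stable")
    col = dst[order]
    rowptr = np.zeros(num_nodes + 1, dtype=np.int64)
    rowptr[1:] = np.cumsum(np.bincount(src, minlength=num_nodes))
    np.save(os.path.join(dir_path, "rowptr.npy"), rowptr)
    np.save(os.path.join(dir_path, "col.npy"), col)


class DiskGraph:
    """Read-only view over the files written by write_graph, backed by memory maps."""

    def __init__(self, dir_path):
        # mmap_mode="r" maps the files instead of loading them into RAM.
        self.x = np.load(os.path.join(dir_path, "x.npy"), mmap_mode="r")
        self.rowptr = np.load(os.path.join(dir_path, "rowptr.npy"), mmap_mode="r")
        self.col = np.load(os.path.join(dir_path, "col.npy"), mmap_mode="r")

    def get_features(self, node_ids):
        # Fancy indexing copies only the requested rows into an in-RAM array.
        idx = np.asarray(node_ids, dtype=np.int64)
        return np.array(self.x[idx])

    def neighbors(self, node):
        # A node's out-neighbors are one contiguous slice of col.
        start, end = int(self.rowptr[node]), int(self.rowptr[node + 1])
        return np.array(self.col[start:end])

    def sample_neighbors(self, seeds, k, rng):
        # Uniformly sample up to k neighbors per seed (GraphSAGE-style fan-out).
        out = {}
        for s in seeds:
            nbrs = self.neighbors(s)
            if len(nbrs) > k:
                nbrs = rng.choice(nbrs, size=k, replace=False)
            out[int(s)] = nbrs
        return out


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        x = np.arange(15, dtype=np.float32).reshape(5, 3)
        edge_index = np.array([[0, 0, 1, 3, 3, 3], [1, 2, 2, 0, 1, 4]])
        write_graph(d, x, edge_index)
        g = DiskGraph(d)
        print(g.get_features([3, 0]))
        print(g.neighbors(3))  # [0 1 4]
        print(g.sample_neighbors([0, 3], k=2, rng=np.random.default_rng(0)))
        del g  # release the memory maps before the directory is removed
```

CSR is used here because a node's neighbors form one contiguous slice of `col`, so a lookup is two reads of `rowptr` plus one sequential read, which is friendlier to disk than scattered edge-list access.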
I have a few questions:
1. Are there any existing alternatives to FeatureStore + GraphStore for working with large-scale knowledge graphs?
2. If no out-of-the-box solutions currently exist, are there any plans to support this use case more directly?
3. If I were to propose a generalized solution via a PR, would it be expected to rely only on existing PyG dependencies (i.e., avoid introducing something like Zarr)?
P.S. If there are existing issues or PRs addressing this, I would appreciate being pointed to them.
Thanks!