OnDiskDataset for a large knowledge graph #10661

Yehor-Mishchyriak · 2026-04-06T07:02:26Z

Hello,

I am working on a project involving a large knowledge graph (1M+ nodes). Given the presence of node and edge features along with relatively high connectivity, keeping the graph in RAM/VRAM is not feasible, so I am exploring disk-backed alternatives.

As I understand it, the existing OnDiskDataset interface is intended for large collections of small-to-medium graphs rather than a single large KG. The appropriate approach therefore seems to be the FeatureStore + GraphStore abstraction with a custom on-disk backend (e.g., Zarr), which requires substantial custom implementation.
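To make the "custom on-disk backend" idea concrete, here is a minimal sketch of the read path such a backend needs: fetching feature rows for a batch of node indices without loading the whole table into memory. It uses only the Python standard library and a flat binary file as a stand-in for Zarr. The class name `DiskFeatureTable` and its methods are hypothetical and not part of PyG; in practice a reader like this would presumably sit behind a custom FeatureStore subclass.

```python
import os
import struct
import tempfile


class DiskFeatureTable:
    """Hypothetical fixed-width float32 feature table kept in a flat binary file."""

    def __init__(self, path, num_rows, dim):
        self.path = path
        self.num_rows = num_rows
        self.dim = dim
        self._row_fmt = f"<{dim}f"  # one row = `dim` little-endian float32 values
        self._row_bytes = struct.calcsize(self._row_fmt)

    @classmethod
    def create(cls, path, rows):
        """Stream rows to disk one at a time; all rows must have the same length."""
        num_rows, dim = 0, None
        with open(path, "wb") as f:
            for row in rows:
                if dim is None:
                    dim = len(row)
                elif len(row) != dim:
                    raise ValueError("all rows must have the same length")
                f.write(struct.pack(f"<{dim}f", *row))
                num_rows += 1
        return cls(path, num_rows, dim or 0)

    def get(self, indices):
        """Read only the requested rows by seeking to each row's byte offset."""
        out = []
        with open(self.path, "rb") as f:
            for i in indices:
                if not 0 <= i < self.num_rows:
                    raise IndexError(f"row {i} out of range")
                f.seek(i * self._row_bytes)
                raw = f.read(self._row_bytes)
                out.append(list(struct.unpack(self._row_fmt, raw)))
        return out


# Usage: write 1000 two-dimensional feature rows, then fetch a mini-batch of two.
path = os.path.join(tempfile.mkdtemp(), "node_features.bin")
table = DiskFeatureTable.create(path, ([float(i), float(2 * i)] for i in range(1000)))
batch = table.get([3, 998])
```

Only the rows named in `indices` are read, so memory use scales with the batch size rather than the graph size. A chunked, compressed format such as Zarr would replace the seek-and-read step but keep the same row-lookup interface.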

I have a few questions:
1. Are there any existing alternatives to FeatureStore + GraphStore for working with large-scale knowledge graphs?
2. If no out-of-the-box solutions currently exist, are there any plans to support this use case more directly?
3. If I were to propose a generalized solution via a PR, would it be expected to rely only on existing PyG dependencies (i.e., avoid introducing something like Zarr)?

P.S. If there are existing issues or PRs addressing this, I would appreciate being pointed to them.

Thanks!

Replies: 0 comments

Select a reply

Uh oh!

Uh oh!

OnDiskDataset for a large knowledge graph #10661

Uh oh!

Yehor-Mishchyriak Apr 6, 2026

Replies: 0 comments

Yehor-Mishchyriak
Apr 6, 2026