
Dev #311 (Draft)

zaz wants to merge 31 commits into geometric-intelligence:main from zaz:dev

Conversation

zaz (Collaborator) commented Apr 17, 2026

Work towards porting TunedGNN.

This PR is a draft, but builds upon some complete PRs already submitted. It will be rebased in the future.

zaz and others added 30 commits April 10, 2026 15:56
Run `pre-commit run --all-files`.
Fixes linter warning: numpydoc-validation flagged mismatched
underline lengths in docstring section headers.
Fixes linter warning: numpydoc-validation flagged GL08 (missing
docstring) on 4 modules.
Run `pre-commit run --all-files`.
After adding these, running `pre-commit run --all-files` indicates
no existing issues.
Run `codespell --write-changes`, then manually correct.
Fix grammar using Claude Haiku 4.5, then manually correct.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Use `return 1` instead of `exit 1` so that sourcing the script
with an invalid platform prints the error without killing the shell.

Fixes geometric-intelligence#305
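A minimal sketch of the `return`-vs-`exit` fix, with the validation wrapped in a function so it can `return` cleanly; the function name and the platform list are illustrative, not copied from `uv_env_setup.sh`:

```shell
# check_platform validates the requested platform and RETURNS a status
# rather than exiting, so a user who sources the script with a bad
# argument gets an error message instead of a dead shell.
check_platform() {
  case "$1" in
    cpu|cu118|cu121|cu128)
      return 0
      ;;
    *)
      echo "Error: unknown platform '$1' (expected cpu|cu118|cu121|cu128)" >&2
      return 1
      ;;
  esac
}

check_platform cpu && echo "cpu ok"
check_platform INVALID 2>/dev/null || echo "still alive after invalid platform"
```

With `exit 1` in place of `return 1`, the second call would terminate an interactive shell that sourced the file.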
Verify that `source uv_env_setup.sh INVALID` does not kill the
user's shell (uses return instead of exit).
Replace if/elif chain with case statement and string concatenation.
No functional changes.
torch >= 2.6 defaults to weights_only=True, which breaks OGB's
torch.load calls that serialize PyG data classes. Register the
needed classes as safe globals so datasets load without errors.
Guarded with hasattr for compatibility with torch < 2.6.
Prevents PyG's fs.torch_load from falling back to weights_only=False
when loading processed datasets that contain numpy scalars and dtypes.
Includes numpy.dtypes.*DType subclasses for numpy >= 1.25.
torch.tensor(existing_tensor) is deprecated; use .detach().clone().
nx.from_numpy_matrix is removed in NetworkX 3.0; use from_numpy_array.
Replace hardcoded TORCH_VER="2.3.0" with auto-detection so the setup
script works with any torch version resolved by uv. Loosen torch pin
from ==2.3.0 to >=2.3.0 to allow newer versions.
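The auto-detection amounts to asking the installed torch for its version and stripping the local `+cuXXX` suffix; a sketch (variable names illustrative, with a hypothetical fallback value so the snippet runs without torch):

```shell
# "2.6.0+cu128" -> "2.6.0": the wheel index URL wants the bare version.
TORCH_VER_RAW="${TORCH_VER_RAW:-$(python -c 'import torch; print(torch.__version__)' 2>/dev/null || echo '2.6.0+cu128')}"
TORCH_VER="${TORCH_VER_RAW%%+*}"   # strip everything from the first '+'
echo "torch version: ${TORCH_VER}"
```

The detected version can then be interpolated into the PyG find-links URL instead of a hardcoded `2.3.0`.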
Add no-build-package to pyproject.toml to prevent uv from building
these packages from PyPI sdists. This forces resolution from the PyG
find-links wheels, which are pre-built for the correct PyTorch + CUDA
version. Applies to all uv commands, not just the setup script.
Remove extra-build-dependencies section (no longer needed since we
never build from source).
Add pytorch-cu128 index to pyproject.toml and cu128 option to the
setup script. This is required for newer GPUs (e.g. Blackwell
architecture) that need CUDA 12.8+.
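A hedged sketch of the `pyproject.toml` pieces the two commits above describe; the package names and index URL are assumptions rather than copies of the PR's actual config:

```toml
[tool.uv]
# Never build these from PyPI sdists; force resolution from pre-built
# wheels that match the resolved PyTorch + CUDA version.
no-build-package = ["torch-scatter", "torch-sparse", "torch-cluster"]

[[tool.uv.index]]
name = "pytorch-cu128"
url = "https://download.pytorch.org/whl/cu128"
explicit = true
```

Because `no-build-package` lives in `pyproject.toml`, it applies to every uv invocation, not just the setup script.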
This only affects users doing a manual install; the setup script
installs them via --all-extras. Making them optional avoids install
failures for users who don't need NSD, ED-GNN, or point cloud lifting
backbones, as these packages require pre-built wheels matching the
exact PyTorch + CUDA version.

Move top-level imports of torch_sparse, torch_scatter, and
torch_cluster to lazy imports inside the functions that use them,
so that importing topobench doesn't crash without the [sparse] extra.
Add [sparse] to the [all] extra group.
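The lazy-import move can be sketched as below (the function name is illustrative, not the PR's actual code): the optional dependency is imported inside the function that needs it, so merely importing the package succeeds without the `[sparse]` extra.

```python
def knn_graph_edges(points, k):
    """Build a k-NN graph; requires the optional [sparse] extra."""
    try:
        import torch_cluster  # heavy optional dependency, imported lazily
    except ImportError as err:
        raise ImportError(
            "This backbone requires torch-cluster; install the [sparse] extra."
        ) from err
    return torch_cluster.knn_graph(points, k=k)
```

Module import (and hence backbone auto-discovery) never touches `torch_cluster`; only calling the function does.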
Verify that importing topobench and triggering backbone auto-discovery
works without the [sparse] extra installed.
Extract _generate_or_load_cached_splits, the shared
"check-or-generate-and-save 10 fold .npz files, then load fold N"
pattern from random_splitting and k_fold_split.

Both callers now gate regeneration on the requested fold's file not
existing (matching k_fold_split's prior behavior). Behavior changes vs
random_splitting only at the edge case of "fold .npz missing from an
existing split_dir": previously raised FileNotFoundError on load, now
regenerates all 10 folds.
Two changes to _generate_or_load_cached_splits:

1. os.makedirs(exist_ok=True) instead of "if not isdir then makedirs",
   eliminating the time-of-check / time-of-use window between two
   processes both seeing split_dir as missing.

2. Atomic per-file writes: each .npz is written to a pid-suffixed tmp
   path then os.replace'd to its final name, so a concurrent reader
   either sees the old file, the new file, or no file at all, never a
   half-written one. np.savez is given a file object rather than a path
   because np.savez(path_str, ...) silently appends ".npz" to any path
   that does not already end in ".npz", which would break the subsequent
   os.replace.

With deterministic seeding, two parallel writers produce byte-identical
.npz contents, so last-writer-wins is safe.

Fixes geometric-intelligence#310.
Adds test_split_dir_created_concurrently, which monkeypatches
os.path.isdir so that when a pre-fix helper checked whether split_dir
existed, the check quietly created the dir and then returned False,
mimicking another worker winning the race. The fix uses
os.makedirs(exist_ok=True) and handles this cleanly.
Adds test_npz_write_is_atomic, which monkeypatches np.savez to write a
few bytes to the target and then raise, mimicking a process killed
mid-serialization. The fix writes to a per-pid tmp path and os.replace's
into place, so the canonical fold path must not exist after the
simulated crash.
Adds fixed_splitting, which uses a dataset's built-in train_mask /
val_mask / test_mask attributes rather than generating new splits.

Supports both 1D masks (single split, e.g. Planetoid) and 2D masks
(multi-split datasets like Heterophilic, where columns are folds). For
2D masks, data_seed selects the column via modulo. val_mask_attr lets
WikiCS substitute stopping_mask for val_mask. .cpu() before .numpy() so
CUDA-tensor masks work transparently.
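The 1D/2D mask handling reduces to a small helper like the following sketch (names are assumptions): a 1D boolean mask yields one split, while for a 2D mask `data_seed` selects a fold column via modulo.

```python
import numpy as np

def mask_to_indices(mask, data_seed=0):
    # For torch masks, convert first: mask = mask.detach().cpu().numpy()
    mask = np.asarray(mask)
    if mask.ndim == 2:
        # Columns are folds (e.g. Heterophilic); wrap data_seed around.
        mask = mask[:, data_seed % mask.shape[1]]
    return np.flatnonzero(mask)
```

The same helper serves `train_mask`, `val_mask` (or WikiCS's `stopping_mask` via `val_mask_attr`), and `test_mask`.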
Adds an "ogb" branch in load_transductive_splits that pulls the
underlying dataset's split_idx (already provided by OGB dataset
wrappers) and converts each tensor/array to a numpy array of node
indices. Lets OGB datasets reuse the standard TopoBench transductive
pipeline without resampling.
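The tensor-to-numpy conversion step can be sketched as below (function name illustrative); OGB's `split_idx` dict maps split names to tensors or arrays of node indices:

```python
import numpy as np

def split_idx_to_numpy(split_idx):
    out = {}
    for name, idx in split_idx.items():
        # torch tensors expose .cpu()/.numpy(); plain arrays/lists
        # fall through to np.asarray unchanged.
        if hasattr(idx, "cpu"):
            idx = idx.cpu().numpy()
        out[name] = np.asarray(idx)
    return out
```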
Adds class_balanced_splitting, which samples a fixed number of nodes per
class for training (default 20/class, the standard Planetoid protocol)
and uses the remaining nodes for fixed-size validation and test sets
sampled uniformly.

Reuses _generate_or_load_cached_splits, so the 10 generated splits are
cached on disk and safe under parallel sweep workers.
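The sampling protocol described above can be sketched as follows (function and parameter names are assumptions; the defaults echo the standard Planetoid setup of 20 nodes per class):

```python
import numpy as np

def class_balanced_split(labels, n_per_class=20, n_val=500, n_test=1000, seed=0):
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    # Fixed number of training nodes per class.
    train = np.concatenate([
        rng.choice(np.flatnonzero(labels == c), n_per_class, replace=False)
        for c in np.unique(labels)
    ])
    # Remaining nodes: fixed-size val and test, sampled uniformly.
    rest = rng.permutation(np.setdiff1d(np.arange(len(labels)), train))
    return train, rest[:n_val], rest[n_val:n_val + n_test]
```

With a seed derived deterministically from the fold index, parallel workers regenerate byte-identical splits, which is what makes the on-disk caching safe.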
Add dataset loaders for Amazon (Computer, Photo), Coauthor (CS,
Physics), WikiCS, WikipediaNetwork (Chameleon, Squirrel with
configurable geom-gcn preprocessing), and filtered Wikipedia
(Chameleon, Squirrel). Each includes a Hydra YAML config.
Passthrough encoder that sets data.x_0 = data.x without modifying
features. Useful when the backbone handles its own input projection
(e.g. ConfigurableGNN with pre_linear=True).
Port TunedGNN (NeurIPS 2024) backbone as ConfigurableGNN, a composable
GCN/GAT/SAGE with independently toggleable residual connections,
layer/batch norm, JK aggregation, and pre-linear projection.
@zaz zaz marked this pull request as draft April 19, 2026 16:52