
Dev #311 (Draft)

zaz wants to merge 31 commits into geometric-intelligence:main from zaz:dev

Conversation

zaz (Collaborator) commented Apr 17, 2026

Work towards porting TunedGNN.

This PR is a draft, but builds upon some complete PRs already submitted. It will be rebased in the future.

zaz and others added 30 commits April 10, 2026 15:56
Run `pre-commit run --all-files`.
Fixes linter warning: numpydoc-validation flagged mismatched
underline lengths in docstring section headers.
Fixes linter warning: numpydoc-validation flagged GL08 (missing
docstring) on 4 modules.
Run `pre-commit run --all-files`.
After adding these, running `pre-commit run --all-files` indicates
no existing issues.
Run `codespell --write-changes`, then manually correct.
Fix grammar using Claude Haiku 4.5, then manually correct.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Use `return 1` instead of `exit 1` so that sourcing the script
with an invalid platform prints the error without killing the shell.

Fixes geometric-intelligence#305
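A minimal sketch of the `return`-vs-`exit` fix, with the validation wrapped in a function so it can `return` cleanly; the function name and the platform list are illustrative, not copied from `uv_env_setup.sh`:

```shell
# check_platform validates the requested platform and RETURNS a status
# rather than exiting, so a user who sources the script with a bad
# argument gets an error message instead of a dead shell.
check_platform() {
  case "$1" in
    cpu|cu118|cu121|cu128)
      return 0
      ;;
    *)
      echo "Error: unknown platform '$1' (expected cpu|cu118|cu121|cu128)" >&2
      return 1
      ;;
  esac
}

check_platform cpu && echo "cpu ok"
check_platform INVALID 2>/dev/null || echo "still alive after invalid platform"
```

With `exit 1` in place of `return 1`, the second call would terminate an interactive shell that sourced the file.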
Verify that `source uv_env_setup.sh INVALID` does not kill the
user's shell (uses return instead of exit).
Replace if/elif chain with case statement and string concatenation.
No functional changes.
torch >= 2.6 defaults to weights_only=True, which breaks OGB's
torch.load calls that serialize PyG data classes. Register the
needed classes as safe globals so datasets load without errors.
Guarded with hasattr for compatibility with torch < 2.6.
Prevents PyG's fs.torch_load from falling back to weights_only=False
when loading processed datasets that contain numpy scalars and dtypes.
Includes numpy.dtypes.*DType subclasses for numpy >= 1.25.
torch.tensor(existing_tensor) is deprecated; use .detach().clone().
nx.from_numpy_matrix is removed in NetworkX 3.0; use from_numpy_array.
Replace hardcoded TORCH_VER="2.3.0" with auto-detection so the setup
script works with any torch version resolved by uv. Loosen torch pin
from ==2.3.0 to >=2.3.0 to allow newer versions.
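The auto-detection amounts to asking the installed torch for its version and stripping the local `+cuXXX` suffix; a sketch (variable names illustrative, with a hypothetical fallback value so the snippet runs without torch):

```shell
# "2.6.0+cu128" -> "2.6.0": the wheel index URL wants the bare version.
TORCH_VER_RAW="${TORCH_VER_RAW:-$(python -c 'import torch; print(torch.__version__)' 2>/dev/null || echo '2.6.0+cu128')}"
TORCH_VER="${TORCH_VER_RAW%%+*}"   # strip everything from the first '+'
echo "torch version: ${TORCH_VER}"
```

The detected version can then be interpolated into the PyG find-links URL instead of a hardcoded `2.3.0`.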
Add no-build-package to pyproject.toml to prevent uv from building
these packages from PyPI sdists. This forces resolution from the PyG
find-links wheels, which are pre-built for the correct PyTorch + CUDA
version. Applies to all uv commands, not just the setup script.
Remove extra-build-dependencies section (no longer needed since we
never build from source).
Add pytorch-cu128 index to pyproject.toml and cu128 option to the
setup script. This is required for newer GPUs (e.g. Blackwell
architecture) that need CUDA 12.8+.
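A hedged sketch of the `pyproject.toml` pieces the two commits above describe; the package names and index URL are assumptions rather than copies of the PR's actual config:

```toml
[tool.uv]
# Never build these from PyPI sdists; force resolution from pre-built
# wheels that match the resolved PyTorch + CUDA version.
no-build-package = ["torch-scatter", "torch-sparse", "torch-cluster"]

[[tool.uv.index]]
name = "pytorch-cu128"
url = "https://download.pytorch.org/whl/cu128"
explicit = true
```

Because `no-build-package` lives in `pyproject.toml`, it applies to every uv invocation, not just the setup script.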
This only affects users doing a manual install; the setup script
installs them via --all-extras. Making them optional avoids install
failures for users who don't need NSD, ED-GNN, or point cloud lifting
backbones, as these packages require pre-built wheels matching the
exact PyTorch + CUDA version.

Move top-level imports of torch_sparse, torch_scatter, and
torch_cluster to lazy imports inside the functions that use them,
so that importing topobench doesn't crash without the [sparse] extra.
Add [sparse] to the [all] extra group.
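The lazy-import move can be sketched as below (the function name is illustrative, not the PR's actual code): the optional dependency is imported inside the function that needs it, so merely importing the package succeeds without the `[sparse]` extra.

```python
def knn_graph_edges(points, k):
    """Build a k-NN graph; requires the optional [sparse] extra."""
    try:
        import torch_cluster  # heavy optional dependency, imported lazily
    except ImportError as err:
        raise ImportError(
            "This backbone requires torch-cluster; install the [sparse] extra."
        ) from err
    return torch_cluster.knn_graph(points, k=k)
```

Module import (and hence backbone auto-discovery) never touches `torch_cluster`; only calling the function does.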
Verify that importing topobench and triggering backbone auto-discovery
works without the [sparse] extra installed.
Extract _generate_or_load_cached_splits, the shared
"check-or-generate-and-save 10 fold .npz files, then load fold N"
pattern from random_splitting and k_fold_split.

Both callers now gate regeneration on the requested fold's file not
existing (matching k_fold_split's prior behavior). Behavior changes vs
random_splitting only at the edge case of "fold .npz missing from an
existing split_dir": previously raised FileNotFoundError on load, now
regenerates all 10 folds.
Two changes to _generate_or_load_cached_splits:

1. os.makedirs(exist_ok=True) instead of "if not isdir then makedirs",
   eliminating the time-of-check / time-of-use window between two
   processes both seeing split_dir as missing.

2. Atomic per-file writes: each .npz is written to a pid-suffixed tmp
   path then os.replace'd to its final name, so a concurrent reader
   either sees the old file, the new file, or no file at all, never a
   half-written one. np.savez is given a file object rather than a path
   because np.savez(path_str, ...) silently appends ".npz" to any path
   that does not already end in ".npz", which would break the subsequent
   os.replace.

With deterministic seeding, two parallel writers produce byte-identical
.npz contents, so last-writer-wins is safe.

Fixes geometric-intelligence#310.
Adds test_split_dir_created_concurrently, which monkeypatches
os.path.isdir so that when a pre-fix helper checked whether split_dir
existed, the check quietly created the dir and then returned False,
mimicking another worker winning the race. The fix uses
os.makedirs(exist_ok=True) and handles this cleanly.
Adds test_npz_write_is_atomic, which monkeypatches np.savez to write a
few bytes to the target and then raise, mimicking a process killed
mid-serialization. The fix writes to a per-pid tmp path and os.replace's
into place, so the canonical fold path must not exist after the
simulated crash.
Adds fixed_splitting, which uses a dataset's built-in train_mask /
val_mask / test_mask attributes rather than generating new splits.

Supports both 1D masks (single split, e.g. Planetoid) and 2D masks
(multi-split datasets like Heterophilic, where columns are folds). For
2D masks, data_seed selects the column via modulo. val_mask_attr lets
WikiCS substitute stopping_mask for val_mask. .cpu() before .numpy() so
CUDA-tensor masks work transparently.
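The 1D/2D mask handling reduces to a small helper like the following sketch (names are assumptions): a 1D boolean mask yields one split, while for a 2D mask `data_seed` selects a fold column via modulo.

```python
import numpy as np

def mask_to_indices(mask, data_seed=0):
    # For torch masks, convert first: mask = mask.detach().cpu().numpy()
    mask = np.asarray(mask)
    if mask.ndim == 2:
        # Columns are folds (e.g. Heterophilic); wrap data_seed around.
        mask = mask[:, data_seed % mask.shape[1]]
    return np.flatnonzero(mask)
```

The same helper serves `train_mask`, `val_mask` (or WikiCS's `stopping_mask` via `val_mask_attr`), and `test_mask`.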
Adds an "ogb" branch in load_transductive_splits that pulls the
underlying dataset's split_idx (already provided by OGB dataset
wrappers) and converts each tensor/array to a numpy array of node
indices. Lets OGB datasets reuse the standard TopoBench transductive
pipeline without resampling.
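The tensor-to-numpy conversion step can be sketched as below (function name illustrative); OGB's `split_idx` dict maps split names to tensors or arrays of node indices:

```python
import numpy as np

def split_idx_to_numpy(split_idx):
    out = {}
    for name, idx in split_idx.items():
        # torch tensors expose .cpu()/.numpy(); plain arrays/lists
        # fall through to np.asarray unchanged.
        if hasattr(idx, "cpu"):
            idx = idx.cpu().numpy()
        out[name] = np.asarray(idx)
    return out
```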
Adds class_balanced_splitting, which samples a fixed number of nodes per
class for training (default 20/class, the standard Planetoid protocol)
and uses the remaining nodes for fixed-size validation and test sets
sampled uniformly.

Reuses _generate_or_load_cached_splits, so the 10 generated splits are
cached on disk and safe under parallel sweep workers.
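The sampling protocol described above can be sketched as follows (function and parameter names are assumptions; the defaults echo the standard Planetoid setup of 20 nodes per class):

```python
import numpy as np

def class_balanced_split(labels, n_per_class=20, n_val=500, n_test=1000, seed=0):
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    # Fixed number of training nodes per class.
    train = np.concatenate([
        rng.choice(np.flatnonzero(labels == c), n_per_class, replace=False)
        for c in np.unique(labels)
    ])
    # Remaining nodes: fixed-size val and test, sampled uniformly.
    rest = rng.permutation(np.setdiff1d(np.arange(len(labels)), train))
    return train, rest[:n_val], rest[n_val:n_val + n_test]
```

With a seed derived deterministically from the fold index, parallel workers regenerate byte-identical splits, which is what makes the on-disk caching safe.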
Add dataset loaders for Amazon (Computer, Photo), Coauthor (CS,
Physics), WikiCS, WikipediaNetwork (Chameleon, Squirrel with
configurable geom-gcn preprocessing), and filtered Wikipedia
(Chameleon, Squirrel). Each includes a Hydra YAML config.
Passthrough encoder that sets data.x_0 = data.x without modifying
features. Useful when the backbone handles its own input projection
(e.g. ConfigurableGNN with pre_linear=True).
Port TunedGNN (NeurIPS 2024) backbone as ConfigurableGNN, a composable
GCN/GAT/SAGE with independently toggleable residual connections,
layer/batch norm, JK aggregation, and pre-linear projection.
@zaz zaz marked this pull request as draft April 19, 2026 16:52