GpuStorageNode Phase 2: type support, incoming sets, bulk ops #3
Closed
plankatron wants to merge 1 commit into main from
Conversation
Previously, all nodes were stored as SCHEMA_NODE and all links as LIST_LINK. The atom Type was not preserved in GPU pools, causing same-name-different-type atoms to collide (e.g., ConceptNode "foo" and SchemaNode "foo" shared a single GPU slot).

Type storage:
- Added `uint16_t` type column to word pool (nodes)
- Repurposed `pair_flags` field for link type storage
- Mixed Type into hash key via golden-ratio hashing to prevent same-name collisions across different atom types
- Updated all store/fetch/load paths to use stored types

Incoming set scan (`fetchIncomingByType`):
- New `gpu-incoming.cl` OpenCL kernel: parallel scan of pair pool
- New `cuda_incoming_scan` CUDA kernel: same algorithm
- One thread per pair slot, atomic counter for matches
- Type filtering in `fetchIncomingByType` reconstructs only matching link types

Bulk operations (`loadAtomSpace`, `loadType`, `storeAtomSpace`):
- `loadAtomSpace` iterates all occupied word/pair pool slots
- `loadType` filters by stored type column
- `storeAtomSpace` delegates to per-atom `storeAtom`

New tests:
- BasicSaveUTest: adapted from RocksDB test, basic round-trip
- GpuIncomingUTest: incoming set scan, type filtering, `loadType`
- GpuBulkUTest: multi-type nodes/links, collision safety, bulk store/load, incoming at scale, throughput showcase

Known limitations:
- GPU pair pool keyed by `(min(a,b), max(a,b))`: one slot per node pair. Multiple link types between the same two nodes share a slot (last write wins).
- `storeAtom` is per-atom (individual host→GPU transfers), so bulk store at large scale (>1K atoms) is slow (~35K nodes/sec). GpuBulkUTest uses reduced scale to pass within the CI timeout. The load path is fast (~700K atoms/sec) since it reads GPU pools in bulk.

16/16 tests pass (excluding the slow compartment kernel benchmark).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Closing: this builds on the wrong foundation. The WordPool/PairPool/SectionPool design is specific to the language-learning domain, not a general GPU AtomSpace. Need to start from the actual atomspace/ and atoms/base/ source code per Linas' guidance.
## Summary
- Added `uint16_t` type column to word pool; repurposed `pair_flags` for link type. Golden-ratio type hashing prevents same-name-different-type collisions (ConceptNode "foo" vs SchemaNode "foo" now get separate GPU slots).
- New `gpu-incoming.cl` (OpenCL) and `cuda_incoming_scan` (CUDA) kernels: parallel scan of pair pool with atomic match counter. `fetchIncomingByType` filters by link type.
- `loadAtomSpace`, `loadType`, `storeAtomSpace` iterate GPU pool slots directly.
## Known limitations

- GPU pair pool keyed by `(min(a,b), max(a,b))`: one slot per node pair. Multiple link types between the same two nodes share a slot (last write wins). Tests use distinct node pairs per link type. See #2 ("Design exploration: Full GPU AtomSpace — unified atom table, N-ary links, performance envelope") for design exploration on fixing this.
- `storeAtom` does per-atom host→GPU transfers (~35K nodes/sec). GpuBulkUTest uses reduced scale to pass within the CI timeout. The load path is fast (~700K atoms/sec) since it reads GPU pools in bulk.
## Test results

16/16 pass (12 kernel tests + 4 StorageNode tests). The `test-compartment-kernel` benchmark is excluded from the default timeout; it runs a full GPU learning simulation (~30 min).
## Files changed (13)

- `opencog/gpu/CMakeLists.txt`
- `opencog/gpu/gpu-incoming.cl`
- `opencog/persist/gpu/GpuBackend.h`
- `opencog/persist/gpu/CudaBackend.h` + `.cu`
- `opencog/persist/gpu/OpenCLBackend.h` + `.cc`
- `opencog/persist/gpu/GpuStorageNode.h`
- `opencog/persist/gpu/GpuStorageNode.cc`
- `tests/persist/gpu/CMakeLists.txt`
- `tests/persist/gpu/BasicSaveUTest.cxxtest`
- `tests/persist/gpu/GpuIncomingUTest.cxxtest`
- `tests/persist/gpu/GpuBulkUTest.cxxtest`

## Test plan
🤖 Generated with Claude Code