GpuStorageNode Phase 2: type support, incoming sets, bulk ops#3

Closed
plankatron wants to merge 1 commit into main from gpu-type-support

Conversation

@plankatron
Collaborator

Summary

  • Type storage: Added uint16_t type column to word pool, repurposed pair_flags for link type. Golden-ratio type hashing prevents same-name-different-type collisions (ConceptNode "foo" vs SchemaNode "foo" now get separate GPU slots).
  • Incoming set scan: New gpu-incoming.cl (OpenCL) and cuda_incoming_scan (CUDA) kernels — parallel scan of pair pool with atomic match counter. fetchIncomingByType filters by link type.
  • Bulk operations: loadAtomSpace, loadType, storeAtomSpace iterate GPU pool slots directly.
  • 3 new test files: BasicSaveUTest (round-trip), GpuIncomingUTest (incoming + loadType), GpuBulkUTest (multi-type, collision safety, bulk, throughput showcase).

Known limitations

  • Pair pool key: (min(a,b), max(a,b)) — one slot per node pair. Multiple link types between the same two nodes share a slot (last write wins); tests work around this by using distinct node pairs per link type. See issue #2 (Design exploration: Full GPU AtomSpace — unified atom table, N-ary links, performance envelope) for a design exploration of fixing this.
  • Store throughput: storeAtom does per-atom host→GPU transfers (~35K nodes/sec). GpuBulkUTest uses reduced scale to pass within CI timeout. Load path is fast (~700K atoms/sec) since it reads GPU pools in bulk.

Test results

16/16 pass (12 kernel tests + 4 StorageNode tests). The test-compartment-kernel benchmark is excluded from the default run because it executes a full GPU learning simulation (~30 min).

Files changed (13)

  • opencog/gpu/CMakeLists.txt: Register gpu-incoming.cl
  • opencog/gpu/gpu-incoming.cl: New — OpenCL incoming scan kernel
  • opencog/persist/gpu/GpuBackend.h: 6 new type read/write methods
  • opencog/persist/gpu/CudaBackend.h + .cu: CUDA type column + incoming scan
  • opencog/persist/gpu/OpenCLBackend.h + .cc: OpenCL type column + incoming scan
  • opencog/persist/gpu/GpuStorageNode.h: TypedName maps, type-aware signatures
  • opencog/persist/gpu/GpuStorageNode.cc: Full rewrite — type-aware store/fetch/load
  • tests/persist/gpu/CMakeLists.txt: Register 3 new tests
  • tests/persist/gpu/BasicSaveUTest.cxxtest: New — basic round-trip test
  • tests/persist/gpu/GpuIncomingUTest.cxxtest: New — incoming set + loadType
  • tests/persist/gpu/GpuBulkUTest.cxxtest: New — 6 subtests: multi-type, collision, bulk, throughput

Test plan

  • 16/16 tests pass locally (RTX 2070, CUDA backend)
  • Verify OpenCL-only path on non-NVIDIA hardware
  • Stress test at larger scale (10K+ atoms) outside CI

🤖 Generated with Claude Code

Previously, all nodes were stored as SCHEMA_NODE and all links as
LIST_LINK. The atom Type was not preserved in GPU pools, causing
same-name-different-type atoms to collide (e.g., ConceptNode "foo"
and SchemaNode "foo" shared a single GPU slot).

Type storage:
- Added uint16_t type column to word pool (nodes)
- Repurposed pair_flags field for link type storage
- Mixed Type into hash key via golden-ratio hashing to prevent
  same-name collisions across different atom types
- Updated all store/fetch/load paths to use stored types

Incoming set scan (fetchIncomingByType):
- New gpu-incoming.cl OpenCL kernel: parallel scan of pair pool
- New cuda_incoming_scan CUDA kernel: same algorithm
- One thread per pair slot, atomic counter for matches
- Type filtering in fetchIncomingByType reconstructs only
  matching link types

Bulk operations (loadAtomSpace, loadType, storeAtomSpace):
- loadAtomSpace iterates all occupied word/pair pool slots
- loadType filters by stored type column
- storeAtomSpace delegates to per-atom storeAtom

New tests:
- BasicSaveUTest: adapted from RocksDB test, basic round-trip
- GpuIncomingUTest: incoming set scan, type filtering, loadType
- GpuBulkUTest: multi-type nodes/links, collision safety,
  bulk store/load, incoming at scale, throughput showcase

Known limitations:
- GPU pair pool keyed by (min(a,b), max(a,b)) — one slot per
  node pair. Multiple link types between the same two nodes
  share a slot (last write wins).
- storeAtom is per-atom (individual host→GPU transfers), so
  bulk store at large scale (>1K atoms) is slow (~35K nodes/sec).
  GpuBulkUTest uses reduced scale to pass within CI timeout.
  Load path is fast (~700K atoms/sec) since it reads GPU pools
  in bulk.

16/16 tests pass (excluding slow compartment kernel benchmark).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@plankatron
Collaborator Author

Closing — this builds on the wrong foundation. The WordPool/PairPool/SectionPool design is specific to the language-learning domain, not a general GPU AtomSpace. A restart from the actual atomspace/ and atoms/base/ source code is needed, per Linas' guidance.
