Feature request: cache/reuse cuGraph graph projections across repeated GPU algorithm calls #978

@lmeyerov

Summary

Separately from the RAPIDS graph-build regression, PyGraphistry should consider a local optimization: reuse an already-built cuGraph graph across repeated GPU algorithm calls when the topology has not changed.

This is a product/perf feature request, not a bug report.

Motivation

We currently pay the to_cugraph() / from_cudf_edgelist() graph-build cost every time a cuGraph algorithm runs, unless the caller manually threads G=... through compute_cugraph() in graphistry/plugins/cugraph.py.

Investigation on the current branch showed:

  • graph build is often the dominant cost
  • the main RAPIDS regression is upstream-facing, so it cannot be fixed locally
  • local reuse of an already-built G is nevertheless a valid PyGraphistry optimization for repeated algorithm calls

Scope

Intended scope:

  • repeated compute_cugraph() calls
  • repeated layout_cugraph() calls
  • GFQL CALL graphistry.cugraph.* reuse on unchanged topology

Explicitly out of scope:

  • generic GFQL MATCH / hop traversal
  • dataframe-based search stages
  • trying to fix the RAPIDS renumber regression by caching

Design direction

Potential design:

  • cache built cuGraph graph state on the Plottable
  • key it by:
    • edge table identity
    • source/destination bindings
    • edge-weight binding
    • directed
    • kind
    • relevant from_cudf_edgelist options
  • invalidate on topology or option changes
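
To make the key fields above concrete, here is a minimal sketch of a single-entry cache keyed on them. Everything in it is hypothetical: ProjectionKey, ProjectionCache, and fake_build are illustrative names, not PyGraphistry or cuGraph API, and a plain list stands in for the cuDF edge table. It only shows how an identity-based key can separate hits from misses.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple


@dataclass(frozen=True)
class ProjectionKey:
    """Hypothetical cache key mirroring the fields listed in the issue."""
    edges_id: int                      # identity of the edge table, not its contents
    source: str
    destination: str
    edge_weight: Optional[str]
    directed: bool
    kind: str
    options: Tuple[Tuple[str, Any], ...] = ()  # other build options, sorted by name


class ProjectionCache:
    """Holds at most one built graph and rebuilds when the key changes."""

    def __init__(self, build: Callable[..., Any]) -> None:
        self._build = build
        self._key: Optional[ProjectionKey] = None
        self._graph: Any = None
        self._edges_ref: Any = None
        self.builds = 0                # counter so hits and misses are observable

    def get(self, edges, source, destination, edge_weight=None,
            directed=True, kind="Graph", **options):
        key = ProjectionKey(id(edges), source, destination, edge_weight,
                            directed, kind, tuple(sorted(options.items())))
        if key != self._key:
            self._graph = self._build(edges, source, destination, edge_weight,
                                      directed, kind, **options)
            self._key = key
            # Keep the table alive so its id() cannot be recycled by a new object.
            self._edges_ref = edges
            self.builds += 1
        return self._graph


def fake_build(edges, source, destination, edge_weight, directed, kind, **options):
    """Stand-in for the expensive graph build; returns a plain dict."""
    return {"n_edges": len(edges), "directed": directed, "kind": kind}


edges = [("a", "b"), ("b", "c")]
cache = ProjectionCache(fake_build)
first = cache.get(edges, "src", "dst")
second = cache.get(edges, "src", "dst")                    # same key: reuse
undirected = cache.get(edges, "src", "dst", directed=False)  # key changed: rebuild
```

One trade-off of this design: holding a strong reference to the edge table keeps the identity check sound, but it also pins that table in memory for as long as the cache entry lives, which matters for GPU dataframes.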

Likely invalidators:

  • edges(...)
  • filtering
  • hop()
  • rebinding source/destination/weight
  • reset_caches()
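
The invalidators above could be sketched as follows, assuming copy-on-write semantics where topology-changing methods return a new object with an empty cache slot. MiniPlottable and its _cached_G attribute are a toy stand-in, not the PyGraphistry Plottable, and a list of dicts stands in for the edge table.

```python
import copy


class MiniPlottable:
    """Toy stand-in for a Plottable with one hypothetical graph cache slot."""

    def __init__(self, edges, source, destination):
        self._edges = edges
        self._source = source
        self._destination = destination
        self._cached_G = None

    def _derive(self, **changes):
        # Shallow-copy, apply the change, and drop the cached graph on the copy.
        out = copy.copy(self)
        for name, value in changes.items():
            setattr(out, name, value)
        out._cached_G = None
        return out

    def edges(self, edges):
        """Replacing the edge table changes topology, so the copy starts cold."""
        return self._derive(_edges=edges)

    def bind(self, source=None, destination=None):
        """Rebinding endpoints changes the projection, so the copy starts cold."""
        return self._derive(
            _source=source or self._source,
            _destination=destination or self._destination,
        )

    def reset_caches(self):
        """Explicit manual invalidation on this object."""
        self._cached_G = None
        return self

    def to_graph(self):
        # Stand-in for the expensive build: a list of (source, destination) pairs.
        if self._cached_G is None:
            self._cached_G = [
                (row[self._source], row[self._destination]) for row in self._edges
            ]
        return self._cached_G


rows = [{"s": "a", "d": "b"}]
g = MiniPlottable(rows, "s", "d")
G1 = g.to_graph()
g2 = g.edges(rows + [{"s": "b", "d": "c"}])   # new topology, empty cache
g3 = g.bind(source="d", destination="s")      # rebound endpoints, empty cache
```

Because the original object is left untouched, g keeps its cached graph while g2 and g3 rebuild on first use. Filtering and hop() would follow the same pattern if they are implemented as methods that return a new object.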

Existing hook

We already have a manual reuse hook:

  • compute_cugraph(..., G=...) in graphistry/plugins/cugraph.py

This request is about making that reuse practical and automatic when safe.
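
For reference, manual reuse through the existing hook might look like the sketch below. It assumes g is a PyGraphistry Plottable with GPU edges bound, that the default to_cugraph() settings match what both algorithms need, and that neither call changes the edge table. The helper name run_two_algorithms is hypothetical.

```python
def run_two_algorithms(g):
    """Build the cuGraph graph once and pass it to two compute_cugraph() calls.

    Assumes both algorithms only add node columns, so the topology that G was
    built from is still valid for the second call.
    """
    G = g.to_cugraph()                                   # one graph build
    g2 = g.compute_cugraph("pagerank", G=G)              # reuses G
    g3 = g2.compute_cugraph("betweenness_centrality", G=G)  # reuses G again
    return g3
```

The burden here is on the caller to know that G is still valid, which is the part an automatic cache would take over.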

Requested outcome

  1. Decide whether this should be an internal optimization first or a public feature immediately.
  2. If implemented, keep it separate from the RAPIDS regression work.
  3. Add explicit tests for cache hits, misses, and invalidation on topology changes.
