Two-Level Block/Tx Cache with Conflict Detection for OCC
Motivation
Cosmos SDK processes transactions sequentially by default. Every read from the
KV store requires protobuf unmarshalling, and every write requires marshalling —
operations that are both CPU-intensive and repeated across transactions within
the same block.
The goal of this proposal is twofold:
Phase 1 (current): Introduce a two-level block/tx cache implemented by
OptimisticStore (name kept for code compatibility) to eliminate redundant marshal/unmarshal within a block while remaining fully deterministic and
compatible with the existing sequential execution model.
Phase 2 (future): Reuse the same conflict-tracking infrastructure to
enable Optimistic Concurrency Control (OCC) — parallel execution of
transactions with automatic conflict detection and rollback.
Problem Statement
The issue is not a classical race condition; it is a source of
non-determinism in optimistic (parallel) execution. When two transactions
execute in parallel and touch overlapping store keys, the final state depends on
execution timing rather than block ordering. This breaks consensus.
The Cosmos SDK's optional optimistic execution mode is not a universal,
one-size-fits-all feature. Its implementation is highly case-specific to the
module's access patterns. Gonka therefore implements its own OCC scheme tailored
to inference-chain workloads.
Design: OCC for Gonka
Scheduling
A scheduler takes N messages from the mempool (where N scales with CPU
cores). Messages are sorted and grouped by similarity — similar messages tend to
access similar store keys and have comparable execution times.
The selected N messages run as a parallel batch. The scheduler waits until
all messages in the batch complete, then takes the next batch.
Conflict Detection
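During execution, each transaction's read-set and write-set are recorded by the
conflictTracker inside every OptimisticStore. After the batch completes, the sets are compared:
- If transaction A read a key that another transaction in the batch wrote → A must be rolled back and rescheduled.
- If several transactions wrote the same key → all but one (the one earlier in block order) must be rolled back and rescheduled.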
Only the losing transactions are rescheduled to the next batch. When ordering
and batch-fill rules are deterministic, the entire parallel execution remains
deterministic across all validators.
Conflict Resolution Ordering
When two transactions in the same batch both write the same key, the winner is
determined by block order — the transaction that appears earlier in the
ordered block survives; the later one is rolled back. This ensures every
validator makes the same choice deterministically.
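A minimal sketch of how block-order conflict resolution could work over recorded read/write sets is shown below. It is not the project's DetectConflicts() implementation; the TxAccess type and the exact rule (a transaction loses if it read or wrote a key already written by an earlier surviving transaction) are assumptions chosen to match the description above.

```go
package main

import "sort"

// TxAccess is a hypothetical record of one transaction's accesses in a batch.
type TxAccess struct {
	Order  int // position in block order; lower means earlier
	Reads  map[string]bool
	Writes map[string]bool
}

// touches reports whether any key of set is present in written.
func touches(set, written map[string]bool) bool {
	for k := range set {
		if written[k] {
			return true
		}
	}
	return false
}

// detectLosers returns the Order values of transactions to roll back,
// in ascending order. Transactions are examined in block order, so the
// result does not depend on the order in which they finished executing.
func detectLosers(batch []TxAccess) []int {
	txs := append([]TxAccess(nil), batch...)
	sort.Slice(txs, func(i, j int) bool { return txs[i].Order < txs[j].Order })

	written := map[string]bool{} // keys written by earlier surviving transactions
	var losers []int
	for _, tx := range txs {
		if touches(tx.Writes, written) || touches(tx.Reads, written) {
			losers = append(losers, tx.Order)
			continue // a loser's writes are rolled back, so they are not recorded
		}
		for k := range tx.Writes {
			written[k] = true
		}
	}
	return losers
}
```

Only the transactions returned by detectLosers would be rescheduled; the survivors' writes stand as if the batch had run sequentially in block order.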
Grouping by Similarity
The grouping criterion is the predicted access pattern of the message type.
Two MsgValidation messages for the same model and epoch are "similar": they access the same keys and take roughly the same time to execute.
Static analysis per message type provides the access prediction.
Most hot messages will disappear once shardchains start to work,
but this approach will continue to work and remain useful.
Gas for Rescheduled Transactions
When a transaction is rolled back due to a conflict, its gas meter resets.
The rescheduled execution is a fresh attempt with a fresh meter. Cosmos gas
accounting therefore remains correct — the user's gas limit applies to the
successful execution, not to failed speculative attempts.
High-Contention Liveness
By design, there should be no persistent hot keys. Shared data is mostly read
and then written in deterministic flow, so sustained conflicts should not appear
in normal operation.
As a safety fallback only, if repeated conflicts still happen due to unexpected
workload shape, a transaction can be forced into sequential execution after
N retries (configurable), guaranteeing forward progress.
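The fallback rule can be expressed as a small routing decision. The sketch below is illustrative only: the Tx type, the Route values, and the onConflict function are hypothetical names, and maxRetries stands for the configurable N mentioned above.

```go
package main

// Route says where a rolled-back transaction goes next.
type Route int

const (
	NextBatch  Route = iota // retry optimistically in the next parallel batch
	Sequential              // give up on parallelism and run alone
)

// Tx is a hypothetical transaction handle that remembers its retry count.
type Tx struct {
	ID      int
	Retries int // optimistic retries already granted
}

// onConflict is called each time tx loses a conflict. It grants up to
// maxRetries optimistic retries, then forces sequential execution, which
// cannot conflict and therefore always makes progress.
func onConflict(tx *Tx, maxRetries int) Route {
	if tx.Retries >= maxRetries {
		return Sequential
	}
	tx.Retries++
	return NextBatch
}
```

The retry counter has to be derived from deterministic inputs (here, the number of conflicts lost) so that all validators move a given transaction to the sequential path at the same point.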
Throughput Improvement
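Implementing this OCC scheme could yield N × k times better throughput for
the mainchain, where N is the parallel batch size (which scales with CPU cores)
and k (at most 1.0) is the share of that parallelism that survives conflicts and rescheduling.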
For workloads with low key overlap (e.g. inferences touching different models),
k approaches 1.0 and throughput scales nearly linearly with cores.
Phase 1: Two-Level Block/Tx Cache (Current Implementation)
Phase 1 is fully implemented and merged. There is no parallel scheduler yet.
All transactions still execute sequentially. This is not optimistic execution by
itself; it is a deterministic 2-level cache (tx draft + block cache) with
conflict detection that can be used later in optimistic concurrent mode.
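This cache provides these benefits:
- It avoids repeated marshal/unmarshal for values that are read multiple times within a block (e.g. Params, EpochGroupData).
- Read and write sets are already recorded per transaction, so conflict detection can be enabled with a single environment variable (COSMOS_OCC_ENABLED=1).
Shared System Data and Cache Scope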
We use this 2-level block/tx cache for frequently read shared system data:
- Params
- EpochGroupData
These are hot system reads, so caching removes repeated codec overhead on the
same values inside a block and transaction flow.
Protobuf Marshal/Unmarshal and Gas
Cosmos SDK marshals/unmarshals protobuf data on store reads/writes, including as
part of store gas accounting. When reads/writes are served from this cache,
repeated codec operations are skipped and no extra gas is spent for those cached
operations in these system paths.
This is acceptable in our design because we already use no-gas or fixed-gas
policies for system messages (to reduce DDoS spam risk while keeping deterministic
execution). For example,
EpochGroupData reads may be used by endpoint protection logic to identify participants slashed for missing inferences or
missing cPoCs; this is a system check where fast execution is preferred over
repeated marshalling/unmarshalling.
Architecture
- Tx draft: per-transaction state carried in context.Value. Created in AnteHandler, committed to block cache on tx success in PostHandler, discarded on failure.
- Block cache: populated from the persistent store on first read (cache-aside pattern). Flushed to the persistent store backend in EndBlock.
- StoreBackend: an interface (Load/Save/Delete/Clone) that wraps any persistent storage (collections map, raw KV, etc.).
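To make the layering concrete, here is a minimal, self-contained sketch of a tx draft over a block cache over a backend. It is not the OptimisticStore code: all type and method names are hypothetical, values are plain strings, and deletes and conflict tracking are left out for brevity.

```go
package main

// backend is a hypothetical stand-in for the persistent store.
type backend struct {
	data  map[string]string
	loads int // counts reads that actually reached the backend
}

// blockCache is the block-level cache, filled on first read (cache-aside).
type blockCache struct {
	be      *backend
	entries map[string]string
	dirty   map[string]bool // keys that must be written back on flush
}

func newBlockCache(be *backend) *blockCache {
	return &blockCache{be: be, entries: map[string]string{}, dirty: map[string]bool{}}
}

func (c *blockCache) get(key string) (string, bool) {
	if v, ok := c.entries[key]; ok {
		return v, true
	}
	c.be.loads++
	v, ok := c.be.data[key]
	if ok {
		c.entries[key] = v
	}
	return v, ok
}

// flush writes dirty entries to the backend, as EndBlock would.
func (c *blockCache) flush() {
	for k := range c.dirty {
		c.be.data[k] = c.entries[k]
	}
	c.dirty = map[string]bool{}
}

// draft holds one transaction's uncommitted writes.
type draft struct {
	cache  *blockCache
	writes map[string]string
}

func (c *blockCache) newDraft() *draft {
	return &draft{cache: c, writes: map[string]string{}}
}

func (d *draft) get(key string) (string, bool) {
	if v, ok := d.writes[key]; ok {
		return v, true
	}
	return d.cache.get(key)
}

func (d *draft) set(key, val string) { d.writes[key] = val }

// commit merges the draft into the block cache (tx success).
func (d *draft) commit() {
	for k, v := range d.writes {
		d.cache.entries[k] = v
		d.cache.dirty[k] = true
	}
	d.writes = map[string]string{}
}

// discard drops the draft's writes (tx failure).
func (d *draft) discard() { d.writes = map[string]string{} }
```

In this sketch a failed transaction leaves no trace in the block cache, while a value loaded once stays warm for every later transaction in the block, which is where the saved codec work would come from.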
Lifecycle: CheckTx vs DeliverTx
CheckTx (Mempool Validation)
During CheckTx, the SDK validates a transaction before it enters the mempool.
The optimistic store does not commit drafts during CheckTx.
This means CheckTx sees the persistent store state (or block cache if already
warm), but its writes are discarded. This is correct because CheckTx must be
side-effect-free.
DeliverTx (Block Execution)
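During DeliverTx, the full lifecycle executes: AnteHandler creates the tx draft, the transaction reads and writes through it, and PostHandler commits it to the block cache on success or discards it on failure. EndBlock later flushes the block cache to the persistent store backend.
Branch Drafts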
For operations that need speculative execution within a single tx (e.g.
CacheContext patterns), OptimisticStore supports branch drafts.
The StoreGroup.CacheContext method creates branch drafts for every registered
store and returns a merged commit function.
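The branch-draft idea can be sketched as a write buffer layered over its parent, plus a helper that branches several buffers and returns one merged commit function. The layer type and both function names below are hypothetical; the real OptimisticStore and StoreGroup.CacheContext signatures are not shown in this text.

```go
package main

// layer is a hypothetical write buffer that reads through to its parent.
type layer struct {
	parent *layer
	writes map[string]string
}

func newLayer(parent *layer) *layer {
	return &layer{parent: parent, writes: map[string]string{}}
}

// get looks in this layer first, then walks up the parent chain.
func (l *layer) get(k string) (string, bool) {
	for cur := l; cur != nil; cur = cur.parent {
		if v, ok := cur.writes[k]; ok {
			return v, true
		}
	}
	return "", false
}

func (l *layer) set(k, v string) { l.writes[k] = v }

// branch returns a child layer and a commit function that merges the
// child's writes into l. Not calling commit abandons the branch.
func (l *layer) branch() (*layer, func()) {
	child := newLayer(l)
	return child, func() {
		for k, v := range child.writes {
			l.writes[k] = v
		}
	}
}

// branchAll branches every given layer and returns a single commit
// function that commits all branches together.
func branchAll(parents []*layer) ([]*layer, func()) {
	children := make([]*layer, len(parents))
	commits := make([]func(), len(parents))
	for i, p := range parents {
		children[i], commits[i] = p.branch()
	}
	return children, func() {
		for _, commit := range commits {
			commit()
		}
	}
}
```

The parent draft is untouched until the commit function runs, so a speculative step that fails can simply drop its branch.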
How OptimisticStore Works Around Existing Storage
OptimisticStore is a decorator: it wraps existing storage without
replacing it. The StoreBackend interface makes this explicit.
Any existing collections.Map, collections.Item, or raw KV store can be
wrapped by providing its four functions (Load, Save, Delete, Clone).
Example: Wrapping a Collections Map
For EpochGroupData, which is stored in a collections.Map[Pair[uint64, string], EpochGroupData], NewOptimisticCollMap creates the collections.Map and wraps it with an OptimisticStore in one call. The Load/Save/Delete functions delegate to the collection; Clone uses proto.Clone.
Example: Wrapping a Singleton (Params)
For Params, stored as a single protobuf blob at a fixed KV key, NewOptimisticProtoItem handles marshal/unmarshal internally. GetParams and SetParams simply call paramsStore.Get(ctx) / paramsStore.Set(ctx, val).
Registering with the StoreGroup
Every optimistic store must be registered with the keeper's StoreGroup so
lifecycle methods (InvalidateAll, FlushAll, WithDraftAll, etc.) apply to
all stores uniformly.
Adding a New Optimistic Store to the Keeper
To wrap a new collection with optimistic caching:
1. Define a cache key type (must be comparable).
2. Add the optimistic store field to Keeper.
3. Initialize in NewKeeper using NewOptimisticCollMap (or NewOptimisticProtoItem for singletons).
4. Register with the store group.
5. Write getter/setter methods on Keeper that delegate to the store.
No changes are needed to AnteHandler, PostHandler, BeginBlock, or EndBlock: the StoreGroup handles all registered stores automatically.
Wiring: AnteHandler, PostHandler, BeginBlock, EndBlock
AnteHandler (ante.go)
PostHandler (ante.go)
BeginBlock (module.go)
EndBlock (module.go)
Phase 2 Roadmap: OCC Scheduler
Phase 2 will add a deterministic parallel scheduler. The current
conflictTracker inside OptimisticStore already tracks per-tx read/write sets
and can detect conflicts via DetectConflicts(). What remains:
- Sort and group messages by similarity, take a batch of N, and execute in parallel.
- After each batch, call DetectConflicts() on each store, roll back losers (by block order), reschedule to next batch.
- Replace the sequential DeliverTx loop with the batched scheduler; keep the same AnteHandler/PostHandler draft lifecycle.
The OptimisticStore API is already designed for this.
No changes to the store, keeper, or cache layer are expected for Phase 2, only
the addition of the scheduler and the DeliverTx loop replacement.