Devshard E2E Test Automation Proposal #1334
Replies: 1 comment
@aikuznetsov It is using multiple devshardd, 1 devshardctl, 1 dapi-mock and 1 mock-chain Docker containers and doesn't use a chain. It has even already been used for testing the new height-sync protocol for devshard: #1209 The difference is that actually Also, I had some thoughts on more high-level scripting over the test environment for creating test plans: https://github.com/a-kuprin/gonka/blob/devshard-testenv/devshard/docs/proposals/PROTOCOL_TESTING_PROPOSAL.md
Goal
Build a real integration test layer for devshard that runs from Go tests but
validates the system across Docker containers, real HTTP networking, real
process boundaries, and real storage.
This suite should complement the existing unit, package, and httptest tests.
Those tests remain the fast correctness layer. The E2E suite verifies that the
same protocol works when the pieces are started, wired, restarted, and failed
like real services.
Scope
The test runner is Go. The runtime is Docker.
The suite should not depend on a live Cosmos chain, Testermint, or
decentralized-api. Chain-facing metadata is served by a local mock service.
Inference and validation use deterministic stub engines unless a scenario
explicitly opts into a different backend.
Out of scope for the first version:
Those can be added later as separate profiles once the core protocol E2E layer
is stable.
Test Tools And Frameworks
The first E2E implementation should keep the toolchain small and Go-native.
Recommended tools:
- testing
- testcontainers-go
- stretchr/testify
- net/http plus devshard clients: devshardctl, host transport routes, mock-chain controls, and diagnostic endpoints.
- encoding/json or existing devshard JSON helpers
- make targets
Tools to avoid in the first version:
- [...] signing helpers, storage helpers, and existing assertions. Adding Python
  would create a second test runtime before the E2E contract is stable.
- [...] testcontainers-go gives each Go test direct control over networks,
  containers, ports, logs, restarts, and cleanup. Compose can still be useful
  later for manual reproduction.
- [...] protocol and transport behavior from chain startup, block production,
  governance, and unrelated node failures. mock-chain covers the bridge
  contract needed by devshard.
- [...] reproducible, and focused on protocol behavior rather than GPU/model
  availability or generation quality.
- Browser automation would add slow UI concerns that are not part of this
  proposal.
Test Environment Structure
Each test starts an isolated Docker network. The Go test process stays outside
the network and controls the environment through Docker APIs and mapped service
ports.
The default smoke environment should spin up:
- mock-chain container
- devshard-host-N containers
- devshardctl container
- postgres container
Storage and fault scenarios add containers or volumes as needed:
flowchart LR
    TestRunner["Go E2E test runner"]
    Docker["Docker / testcontainers-go"]
    Client["HTTP assertions"]
    subgraph Net["isolated Docker network"]
        MockChain["mock-chain\nchain metadata + control API"]
        DevshardCtl["devshardctl\nOpenAI-compatible API"]
        Host0["devshard-host-0\nslot 0"]
        Host1["devshard-host-1\nslot 1"]
        Host2["devshard-host-2\nslot 2"]
        Postgres["postgres\nsmoke storage backend"]
        Vol0[("host-0 SQLite volume")]
        Vol1[("host-1 SQLite volume")]
        Vol2[("host-2 SQLite volume")]
    end
    TestRunner --> Docker
    TestRunner --> Client
    Client --> DevshardCtl
    Client -.direct protocol checks.-> Host0
    Client -.direct protocol checks.-> Host1
    Client -.direct protocol checks.-> Host2
    DevshardCtl --> Host0
    DevshardCtl --> Host1
    DevshardCtl --> Host2
    Host0 <-->|gossip| Host1
    Host1 <-->|gossip| Host2
    Host2 <-->|gossip| Host0
    Host0 --> MockChain
    Host1 --> MockChain
    Host2 --> MockChain
    DevshardCtl --> MockChain
    Host0 --> Postgres
    Host1 --> Postgres
    Host2 --> Postgres
    Host0 -.sqlite profile.-> Vol0
    Host1 -.sqlite profile.-> Vol1
    Host2 -.sqlite profile.-> Vol2
Container inventory:
- mock-chain
- devshard-host-N
- devshardctl
- postgres
The first implementation should standardize on a three-host group because many
protocol behaviors need a majority-like shape: executor rotation, timeout
votes, signature accumulation, and gossip convergence. The harness can expose
Hosts: N later for stress or edge-case tests.
Runtime Services
Each E2E environment starts an isolated Docker network and a small set of
services.
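As a rough illustration of the inventory above, the harness could derive the container set from a small spec. This is only a sketch: EnvSpec, its fields, and containerNames are hypothetical names, and the defaults (three hosts, Postgres) simply mirror the smoke environment described in this proposal.

```go
package main

import "fmt"

// EnvSpec is a hypothetical description of one E2E environment.
type EnvSpec struct {
	Hosts   int    // number of devshard-host-N containers; 0 selects the default of 3
	Storage string // "postgres" (smoke default) or "sqlite"; "" selects postgres
}

// containerNames lists the containers the harness would start for spec.
func containerNames(spec EnvSpec) ([]string, error) {
	hosts := spec.Hosts
	if hosts == 0 {
		hosts = 3
	}
	if hosts < 0 {
		return nil, fmt.Errorf("invalid host count %d", hosts)
	}
	storage := spec.Storage
	if storage == "" {
		storage = "postgres"
	}

	names := []string{"mock-chain", "devshardctl"}
	for i := 0; i < hosts; i++ {
		names = append(names, fmt.Sprintf("devshard-host-%d", i))
	}

	switch storage {
	case "postgres":
		names = append(names, "postgres")
	case "sqlite":
		// SQLite lives on a per-host volume, so no extra container is needed.
	default:
		return nil, fmt.Errorf("unknown storage backend %q", storage)
	}
	return names, nil
}
```

Keeping the inventory in one function would let smoke, storage, and fault scenarios share a single definition of what an environment contains.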
mock-chain
mock-chain is a local metadata service that implements the subset of mainnet
bridge behavior needed by devshard.
The first implementation should match the current REST bridge shape exactly.
That keeps E2E focused on validating the bridge contract devshard already uses
instead of adding a second mock-only API. A cleaner internal control API can be
added alongside the REST-compatible endpoints later, but protocol setup and
recovery should continue to exercise the same paths as production code.
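A minimal sketch of that shape, using only the standard library, might look like the following. The two routes (/bridge/metadata and /control/fail) are placeholders invented for illustration; the real mock would mirror whatever REST bridge paths devshard already calls.

```go
package main

import (
	"encoding/json"
	"net/http"
	"net/http/httptest"
	"sync"
)

// mockChain serves fixed metadata and lets a test toggle a failure mode.
// Route names here are hypothetical, not the real bridge paths.
type mockChain struct {
	mu       sync.Mutex
	failing  bool
	metadata map[string]string
}

func newMockChain(metadata map[string]string) *mockChain {
	return &mockChain{metadata: metadata}
}

// Handler returns the bridge-like route plus a dev-only control route.
func (m *mockChain) Handler() http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/bridge/metadata", func(w http.ResponseWriter, r *http.Request) {
		m.mu.Lock()
		failing := m.failing
		m.mu.Unlock()
		if failing {
			http.Error(w, "injected bridge failure", http.StatusServiceUnavailable)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(m.metadata)
	})
	mux.HandleFunc("/control/fail", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "POST only", http.StatusMethodNotAllowed)
			return
		}
		m.mu.Lock()
		m.failing = r.URL.Query().Get("enabled") == "true"
		m.mu.Unlock()
		w.WriteHeader(http.StatusNoContent)
	})
	return mux
}

// statusOf issues one in-process request and returns the response status.
func statusOf(h http.Handler, method, target string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(method, target, nil))
	return rec.Code
}
```

In the Docker setup the same handler would be served from the mock-chain container; statusOf is only a convenience for exercising it in-process.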
It serves deterministic local config for:
It should also expose a dev-only control API for test scenarios:
devshard-host-N
Each host container runs one participant. The process should use the real
devshard host, transport, signing, storage, gossip, and state machine code.
Configurable inputs:
The host should expose the standard devshard transport routes, mounted under
either the legacy route prefix or a versioned prefix:
devshardctl
The suite should include scenarios that drive requests through the
OpenAI-compatible devshardctl surface. This validates the user-facing path:
Some lower-level scenarios can talk directly to host transport endpoints when
that makes the assertion clearer, but the smoke suite should use devshardctl.
postgres
Postgres is part of the smoke environment and should be the default storage
backend for CI smoke tests. SQLite remains useful for local restart tests and
single-host persistence edge cases.
Storage scenarios should cover:
Test Binaries
The E2E suite needs runnable commands that are small wrappers around existing
devshard packages.
Recommended commands:
devshardd
devshardd runs one host participant.
For the first E2E implementation, devshardd should be an E2E-only command.
It should not be treated as a production binary yet. This keeps the first
iteration focused on integration validation, while leaving room to harden and
promote the command later if it becomes the right production shape.
It should wire:
For E2E, devshardd can start with stub inference and validation engines.
The important point is that the protocol runtime itself is real.
mock-chain
mock-chain serves local metadata and deterministic control behavior. It
should start as a simple HTTP server matching the current REST bridge shape. If
devshard later moves to a different chain client protocol, the mock should
follow that boundary.
Fault Injection
Deterministic fault injection should be part of the test design from the
beginning. Without it, timeout and recovery tests become slow and flaky.
The first control surface should support:
Fault controls must be disabled unless the process is started in explicit test
mode.
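One way to enforce that gate is to make every fault mutation fail unless the process was started in test mode. The sketch below is an assumption about how this could look; FaultSet, its methods, and the fault name used in the usage note are all hypothetical.

```go
package main

import (
	"errors"
	"sync"
)

// errFaultsDisabled is returned when fault controls are used outside test mode.
var errFaultsDisabled = errors.New("fault controls are disabled outside test mode")

// FaultSet tracks named faults. It is a hypothetical helper, not devshard code.
type FaultSet struct {
	testMode bool
	mu       sync.Mutex
	active   map[string]bool
}

// NewFaultSet creates a fault set; testMode would come from an explicit
// startup flag or environment variable.
func NewFaultSet(testMode bool) *FaultSet {
	return &FaultSet{testMode: testMode, active: make(map[string]bool)}
}

// Enable activates a fault, refusing to do so outside test mode.
func (f *FaultSet) Enable(name string) error {
	if !f.testMode {
		return errFaultsDisabled
	}
	f.mu.Lock()
	defer f.mu.Unlock()
	f.active[name] = true
	return nil
}

// Active reports whether a fault is currently enabled.
func (f *FaultSet) Active(name string) bool {
	f.mu.Lock()
	defer f.mu.Unlock()
	return f.active[name]
}
```

A host could consult Active("executor-hang") at the relevant code path; because Enable always fails without test mode, a production process can never have an active fault.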
Scenario Set
Smoke Scenarios
Smoke scenarios should be reliable and fast enough for every CI run.
Happy path
Start three hosts and devshardctl. Send several non-streaming chat
completion requests. Finalize the session. Assert the settlement output is
present and all hosts agree on the final state.
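For illustration, a non-streaming request for this scenario could be built as follows. The sketch assumes devshardctl accepts the conventional OpenAI chat completions path and body (/v1/chat/completions with model, messages, and stream); the exact path and model name used by devshardctl are not stated in this proposal.

```go
package main

import (
	"bytes"
	"encoding/json"
	"net/http"
)

// chatMessage and chatRequest follow the common OpenAI chat completion shape.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
	Stream   bool          `json:"stream"`
}

// newChatRequest builds a POST request against baseURL.
// The /v1/chat/completions path is an assumption about devshardctl.
func newChatRequest(baseURL, model, prompt string, stream bool) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model:    model,
		Messages: []chatMessage{{Role: "user", Content: prompt}},
		Stream:   stream,
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}
```

The test would pass the mapped devshardctl port as baseURL and send the request with an http.Client that has a timeout set.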
Streaming path
Send a streaming chat completion request through devshardctl. Assert the
client receives content chunks and [DONE]. Assert devshard protocol
receipt/meta events are handled internally and do not corrupt the
OpenAI-compatible stream.
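The stream check could be sketched as a small server-sent-events reader. This assumes the usual OpenAI-style framing, where each event is a "data:" line and the stream ends with "data: [DONE]"; streamResult and both function names are hypothetical.

```go
package main

import (
	"bufio"
	"io"
	"strings"
)

// streamResult summarizes one streamed response body.
type streamResult struct {
	Chunks []string // data payloads received before [DONE]
	Done   bool     // true if the [DONE] sentinel was seen
}

// parseSSE reads "data:" lines until [DONE] or end of input.
// Blank lines and non-data fields are skipped.
func parseSSE(r io.Reader) (streamResult, error) {
	var res streamResult
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "data:") {
			continue
		}
		payload := strings.TrimSpace(strings.TrimPrefix(line, "data:"))
		if payload == "[DONE]" {
			res.Done = true
			break
		}
		res.Chunks = append(res.Chunks, payload)
	}
	return res, sc.Err()
}

// parseSSEString is a convenience wrapper for fixed inputs.
func parseSSEString(body string) (streamResult, error) {
	return parseSSE(strings.NewReader(body))
}
```

A test would call parseSSE on the response body, assert Done is true, and then decode each chunk as JSON to confirm that no protocol receipt or meta event leaked into the client-visible stream.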
Auth rejection
Send a protected host request signed by an unauthorized key. Assert the
request is rejected with an authorization error.
Protocol Scenarios
Gossip convergence
Submit work while all hosts are running. Assert nonce, mempool, and
signature data propagate between participants and converge.
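Convergence is best asserted by polling with a deadline rather than sleeping for a fixed time. A minimal sketch, assuming each host exposes some comparable value such as a state root through a diagnostic endpoint (the snapshot callback stands in for that lookup and is hypothetical):

```go
package main

import (
	"fmt"
	"time"
)

// pollInterval is the delay between convergence checks.
const pollInterval = 10 * time.Millisecond

// converged reports whether every host reports the same non-empty value.
func converged(values map[string]string) bool {
	first, seen := "", false
	for _, v := range values {
		if v == "" {
			return false
		}
		if !seen {
			first, seen = v, true
			continue
		}
		if v != first {
			return false
		}
	}
	return seen
}

// waitConverged polls snapshot until all hosts agree or timeout elapses.
func waitConverged(timeout time.Duration, snapshot func() map[string]string) error {
	deadline := time.Now().Add(timeout)
	for {
		last := snapshot()
		if converged(last) {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("hosts did not converge within %s: %v", timeout, last)
		}
		time.Sleep(pollInterval)
	}
}
```

Including the last snapshot in the error makes a failed run show which host diverged without having to dig through container logs.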
Host catch-up
Let one host miss earlier diffs, then send it a later request with catch-up
diffs. Assert it reaches the same state root as the rest of the group.
Executor failure and timeout
Configure the selected executor to fail or hang. Assert timeout votes are
collected, the timeout transaction is applied, and the session can continue
or finalize according to protocol rules.
Receipt challenge
Withhold or lose the executor response path, then challenge the executor for
a receipt. Assert the receipt is valid and the user session can process it.
Recovery Scenarios
SQLite host restart
Run several inferences, restart one host container with its SQLite volume
preserved, continue the session, and finalize. Assert there is no nonce
regression and the restarted host signs the final state.
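The nonce assertion can be expressed as a comparison of per-host nonces captured before and after the restart. This is a sketch under the assumption that the harness can read a numeric nonce per host; checkNoNonceRegression is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sort"
)

// checkNoNonceRegression verifies that every host seen before the restart
// is still present afterwards and that its nonce did not decrease.
func checkNoNonceRegression(before, after map[string]uint64) error {
	hosts := make([]string, 0, len(before))
	for h := range before {
		hosts = append(hosts, h)
	}
	sort.Strings(hosts) // deterministic order for stable error messages

	for _, h := range hosts {
		got, ok := after[h]
		if !ok {
			return fmt.Errorf("host %s missing after restart", h)
		}
		if got < before[h] {
			return fmt.Errorf("host %s nonce regressed from %d to %d", h, before[h], got)
		}
	}
	return nil
}
```

The same helper would apply unchanged to the Postgres and all-host restart scenarios.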
Postgres recovery
Run the happy path with Postgres storage enabled. Restart all hosts and
continue the session. Assert state recovery from Postgres works and
finalization succeeds.
All-host restart before finalization
Run several inferences, stop every host, restart them, then finalize.
Assert persisted diffs and signatures are sufficient to recover.
Version And Routing Scenarios
Legacy route prefix
Run a session through /v1/devshard/* and assert the stored session version
is v1.
Versioned route prefix
Run a session through /devshard/<version>/* and assert the stored session
version is the selected version.
Version conflict
Create or recover the same escrow under one version, then attempt to attach
the same escrow under a different version. Assert storage rejects the
conflict.
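The three routing scenarios rest on two rules: a request path determines the session version, and an escrow stays bound to its first version. A minimal sketch of both, where versionFromPath and versionIndex are hypothetical names and accepting a repeat attach with the same version is an assumption based on the "create or recover" wording:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

var errVersionConflict = errors.New("escrow already attached under a different version")

// versionFromPath maps a request path to a session version:
// /v1/devshard/...        -> "v1" (legacy prefix)
// /devshard/<version>/... -> <version>
func versionFromPath(path string) (string, error) {
	if strings.HasPrefix(path, "/v1/devshard/") {
		return "v1", nil
	}
	if !strings.HasPrefix(path, "/devshard/") {
		return "", fmt.Errorf("path %q is not a devshard route", path)
	}
	parts := strings.SplitN(strings.TrimPrefix(path, "/devshard/"), "/", 2)
	if len(parts) != 2 || parts[0] == "" {
		return "", fmt.Errorf("path %q has no version segment", path)
	}
	return parts[0], nil
}

// versionIndex remembers which version each escrow was attached under.
type versionIndex struct {
	byEscrow map[string]string
}

// attach binds an escrow to a version and rejects a conflicting rebind.
func (v *versionIndex) attach(escrowID, version string) error {
	if v.byEscrow == nil {
		v.byEscrow = make(map[string]string)
	}
	if existing, ok := v.byEscrow[escrowID]; ok && existing != version {
		return errVersionConflict
	}
	v.byEscrow[escrowID] = version
	return nil
}
```

In the E2E suite the conflict would be observed through the host's HTTP response rather than through an in-memory index; the sketch only states the rule the assertion checks.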
Chain Metadata Scenarios
Warm key authorization
Configure a warm key grant in mock-chain. Assert the warm key can
authenticate where allowed and is rejected after the grant is removed or
when used for the wrong participant.
Bridge metadata failure
Inject a bridge metadata error during session creation or recovery. Assert
the host fails ready or returns the expected service-unavailable response.
Assertions
E2E tests should avoid asserting only HTTP status codes. Useful protocol-level
assertions include:
Settlement Contract
Until the E2E suite submits settlement to a live chain, the stable settlement
contract should be the protocol commitment needed for chain-side verification.
Baseline settlement assertions should cover:
Economic fields such as token accounting, fees, remaining balance, host costs,
missed counts, and validation penalties should be asserted only in dedicated
accounting scenarios. They should not be part of the baseline smoke settlement
contract until the chain submission path is part of the E2E suite.
CI Tiers
Use focused go test runs rather than one large undifferentiated suite.
CI should build the required Docker images through explicit make targets
before running the E2E suite. The Go tests should select already-built images
rather than building images per test run.
Example targets:
devshard-e2e-images should be an explicit build target that produces the
images used by the tests, including mock-chain, devshard-host, and
devshardctl. The E2E tests should fail fast if those images are missing
instead of silently rebuilding them inside individual test cases.
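The fail-fast check could run once at suite start, for example from TestMain. The sketch below assumes the harness has already collected the set of locally available image names (for instance by listing images through the Docker API); checkImages and the untagged image names are illustrative.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// requiredImages are the images named in this proposal; tags are omitted
// because the tagging scheme is not specified here.
var requiredImages = []string{"mock-chain", "devshard-host", "devshardctl"}

// checkImages returns an error naming every required image that is absent.
func checkImages(available map[string]bool) error {
	var missing []string
	for _, img := range requiredImages {
		if !available[img] {
			missing = append(missing, img)
		}
	}
	if len(missing) == 0 {
		return nil
	}
	sort.Strings(missing)
	return fmt.Errorf("missing E2E images: %s (run the devshard-e2e-images make target first)",
		strings.Join(missing, ", "))
}
```

Reporting all missing images at once, together with the make target that builds them, keeps the failure actionable in CI logs.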
Recommended tiers: