
chore: hardening code paths #39

Merged
tac0turtle merged 5 commits into main from marko/cleanup
Mar 13, 2026

Conversation

@tac0turtle
Contributor

@tac0turtle tac0turtle commented Mar 8, 2026

Overview

Summary by CodeRabbit

  • New Features

    • Added apex-loadtest CLI for benchmarking JSON‑RPC throughput and WebSocket subscriber load.
  • Documentation

    • Added API Compatibility Policy and linked docs; noted gRPC may evolve independently.
  • Bug Fixes

    • Stronger conflict detection and preservation of buffered data during storage flush failures.
  • Chores / Refactor

    • Expanded and reorganized linter configuration; multiple internal refactors and tests updating protobuf accessors.

@coderabbitai

coderabbitai bot commented Mar 8, 2026

Warning

Rate limit exceeded

@tac0turtle has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 1 minute and 7 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 91e96bc4-3b57-41bb-aaf4-8328b87a71bb

📥 Commits

Reviewing files that changed from the base of the PR and between 659911e and 17b168d.

📒 Files selected for processing (4)
  • cmd/apex-loadtest/main.go
  • pkg/api/grpc/server_test.go
  • pkg/store/s3.go
  • pkg/sync/subscription.go
📝 Walkthrough

Walkthrough

Adds a load-testing CLI, refactors API layer to expose service helpers for blobs/headers, tightens store invariants with in-flight buffering and conflict checks, improves sqlite context usage, converts many error constructions, updates protobuf field accessors, expands linters, and adds API compatibility docs and tests.

Changes

Cohort / File(s): Summary

  • Tooling & Docs (.golangci.yml, README.md, docs/api-compat.md, docs/running.md): Reorganized/enabled linters and rules, added gofumpt and revive rules, extended exclusions; added API Compatibility Policy and linked it in the running guide; small README addition.
  • Load-testing CLI (cmd/apex-loadtest/main.go): New CLI to benchmark JSON‑RPC throughput and WebSocket subscriber limits with concurrent workers, latency percentiles, and subscriber batching.
  • CLI / Listener (cmd/apex/main.go, cmd/apex/client.go): Centralized dataSourceTypeApp constant; switched to context-aware ListenConfig for gRPC; added gosec nolint on HTTP calls.
  • Configuration (config/load.go, config/config.go): Introduced configValueAuto, allowed "auto" token in validations, replaced fmt.Errorf with errors.New, minor nolint comment on AuthToken.
  • API Service & Adapters (pkg/api/service.go, pkg/api/grpc/blob_service.go, pkg/api/grpc/header_service.go, pkg/api/jsonrpc/stubs.go): Added Service-level helpers (GetBlob, GetBlobByCommitment, GetAllBlobs, GetHeaderByHeight, GetLocalHead, GetNetworkHead); gRPC/JSON-RPC handlers now call service methods; switched to protobuf getters; stub errors use errors.New; removed Store()/Fetcher() accessors.
  • API Tests (pkg/api/grpc/server_test.go, pkg/api/jsonrpc/server_test.go): Updated tests to use proto getters; replaced net.Listen with ListenConfig in test servers; added tests for namespace limits and subscription filtering/compatibility.
  • Fetchers & Protobuf Access (pkg/fetch/... including blobtx.go, celestia_app.go, celestia_node.go, tests): Replaced direct protobuf field access with getters, standardized errors.New usage, adjusted height/hash formatting (strconv), small lint suppressions and test variable renames.
  • Backfill / DB Source (pkg/backfill/db/source.go): Introduced backend constants (backendAuto/pebble/leveldb), used configValueAuto, moved to errors.New, adjusted path normalization and layout detection; some iteration changes.
  • S3 Store & Tests (pkg/store/s3.go, pkg/store/s3_test.go): Added in-flight buffering (flushBuffers) and an inflight field; replaced deduplication with mergeUniqueBlobs; added invariant checks and restore/clear inflight flows; enhanced tests with per-key failure injection and flush-failure scenarios.
  • SQLite Store & Tests (pkg/store/sqlite.go, pkg/store/sqlite_test.go): Context-propagated initialization/migrations (configureSQLite, applyMigration), added ensureSQLiteBlobInvariant and context-aware queries, added tests enforcing conflicting-index and conflicting-commitment rejections.
  • Sync / Subscription (pkg/sync/subscription.go, pkg/sync/subscription_test.go): Refactored header processing into handleHeader and checkContiguous helpers; added a test validating network height updates from streaming headers.
  • Misc Tests & Small Edits (pkg/fetch/blobtx_test.go, pkg/fetch/celestia_app_test.go, other small files): Minor renames and protobuf getter updates in tests; changed some fmt.Errorf to errors.New.

Sequence Diagram(s)

sequenceDiagram
    participant CLI as apex-loadtest CLI
    participant RPC as JSON-RPC Endpoint
    participant WS as WebSocket Endpoint
    participant Indexer as Apex Indexer
    participant Store as Data Store

    rect rgba(100, 150, 200, 0.5)
    Note over CLI,Indexer: RPC throughput benchmark
    CLI->>+RPC: Spawn concurrent workers (POST requests)
    loop during rpc-duration
        RPC->>+Indexer: header.GetByHeight / header.LocalHead / blob.GetAll
        Indexer->>+Store: service methods (GetHeaderByHeight/GetAllBlobs)
        Store-->>-Indexer: header/blob data
        Indexer-->>-RPC: JSON-RPC response
        RPC->>RPC: record latencies (p50/p95/p99)
    end
    RPC-->>-CLI: report RPS & latency percentiles
    end

    rect rgba(150, 200, 100, 0.5)
    Note over CLI,Indexer: WebSocket subscriber stress test
    CLI->>+WS: open N connections (batched)
    loop during sub-duration
        WS->>+Indexer: header.Subscribe (ws)
        Indexer-->>WS: stream events
        WS->>WS: count events per connection
    end
    WS-->>-CLI: report total events & avg per subscriber
    end
sequenceDiagram
    participant Client as API Client
    participant Handler as gRPC/JSON-RPC Handler
    participant Service as Service Layer
    participant Store as Data Store
    participant Fetcher as Fetcher

    rect rgba(100, 150, 200, 0.5)
    Note over Client,Store: Old flow (direct access)
    Client->>+Handler: GetBlob request
    Handler->>+Store: Store().GetBlob(...)
    Store-->>-Handler: blob
    Handler-->>-Client: response
    end

    rect rgba(150, 100, 200, 0.5)
    Note over Client,Service: New flow (service abstraction)
    Client->>+Handler: GetBlob request
    Handler->>+Service: GetBlob(ctx, height, ns, commitment)
    Service->>+Store: internal access to store/fetcher
    Store-->>-Service: blob
    Service-->>-Handler: blob
    Handler-->>-Client: response
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

🐰 I hopped through code at break of dawn,

I buffered blobs and kept them drawn,
I tested websockets, timed each beat,
I taught the service layer to greet,
Linters tuned and docs aglow — our Apex leaps, now on we go!

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 19.10%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title Check: ❓ Inconclusive. The title 'chore: hardening code paths' is vague and does not accurately reflect the scope of changes. The PR involves significant additions across linters, new tooling, API compatibility documentation, service layer refactoring, and storage conflict handling, far broader than 'hardening code paths.' Consider a more specific title that reflects the main changes, such as 'refactor: add service layer APIs and strengthen conflict detection', if the intent is consolidation; otherwise, clarify what 'hardening code paths' specifically means.
✅ Passed checks (1 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.


@tac0turtle tac0turtle marked this pull request as ready for review March 13, 2026 10:08
@tac0turtle
Contributor Author

@codex review


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
cmd/apex/main.go (1)

171-181: ⚠️ Potential issue | 🟠 Major

Stop reading auth token from config in command runtime path.

Line 174 and Line 180 still pass cfg.DataSource.AuthToken. In cmd/apex/**/*.go, token sourcing must be env-only.

🔐 Proposed fix
 func openDataSource(ctx context.Context, cfg *config.Config) (fetch.DataFetcher, fetch.ProofForwarder, error) {
+	if cfg.DataSource.AuthToken != "" {
+		return nil, nil, errors.New("data_source.auth_token is not supported; use APEX_AUTH_TOKEN")
+	}
+	authToken := os.Getenv("APEX_AUTH_TOKEN")
+
 	switch cfg.DataSource.Type {
 	case dataSourceTypeApp:
-		appFetcher, err := fetch.NewCelestiaAppFetcher(cfg.DataSource.CelestiaAppGRPCAddr, cfg.DataSource.AuthToken, log.Logger)
+		appFetcher, err := fetch.NewCelestiaAppFetcher(cfg.DataSource.CelestiaAppGRPCAddr, authToken, log.Logger)
 		if err != nil {
 			return nil, nil, fmt.Errorf("create celestia-app fetcher: %w", err)
 		}
 		return appFetcher, nil, nil
 	case "node", "":
-		nodeFetcher, err := fetch.NewCelestiaNodeFetcher(ctx, cfg.DataSource.CelestiaNodeURL, cfg.DataSource.AuthToken, log.Logger)
+		nodeFetcher, err := fetch.NewCelestiaNodeFetcher(ctx, cfg.DataSource.CelestiaNodeURL, authToken, log.Logger)
 		if err != nil {
 			return nil, nil, fmt.Errorf("connect to celestia node: %w", err)
 		}
 		return nodeFetcher, nodeFetcher, nil

As per coding guidelines: cmd/apex/**/*.go: Auth token must be provided via APEX_AUTH_TOKEN environment variable only, not in config file.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/apex/main.go` around lines 171 - 181, In openDataSource, stop passing
cfg.DataSource.AuthToken into fetch.NewCelestiaAppFetcher and
fetch.NewCelestiaNodeFetcher; instead read the token from the APEX_AUTH_TOKEN
environment variable (fail fast with a clear error if it's empty) and pass that
env token into those constructors (refer to openDataSource,
cfg.DataSource.AuthToken, fetch.NewCelestiaAppFetcher, and
fetch.NewCelestiaNodeFetcher to locate the calls to change).
pkg/api/service.go (1)

116-121: ⚠️ Potential issue | 🟡 Minor

Deduplicate namespaces before aggregating.

If the request repeats a namespace, this appends the same blob set multiple times and shifts aggregate limit/offset behavior. Treat the namespace filter as a set before querying.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/api/service.go` around lines 116 - 121, The code iterates over the
incoming namespaces slice and calls s.store.GetBlobs for each entry, causing
duplicated work and duplicated blob append when the same namespace appears
multiple times; fix by deduplicating the namespaces before the loop (e.g., build
a map[string]struct{} seen and produce a uniqueNamespaces slice or iterate the
map keys) and then call s.store.GetBlobs only for each unique namespace,
appending results into allBlobs; reference the namespaces variable, the loop
that calls s.store.GetBlobs, and the allBlobs accumulation in service.go when
applying the change.
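A minimal sketch of the set-based deduplication the prompt describes, preserving first-seen order (uniqueNamespaces and the string-keyed namespaces are illustrative simplifications of the actual service.go types):

```go
package main

import "fmt"

// uniqueNamespaces filters duplicates out of the request's namespace list
// while keeping first-seen order, so each namespace is queried exactly once.
func uniqueNamespaces(namespaces []string) []string {
	seen := make(map[string]struct{}, len(namespaces))
	out := make([]string, 0, len(namespaces))
	for _, ns := range namespaces {
		if _, dup := seen[ns]; dup {
			continue // already queued; skip to avoid duplicate store queries
		}
		seen[ns] = struct{}{}
		out = append(out, ns)
	}
	return out
}

func main() {
	fmt.Println(uniqueNamespaces([]string{"a", "b", "a", "c", "b"})) // [a b c]
}
```

Deduplicating before the query loop also keeps aggregate limit/offset accounting stable, since each namespace contributes its blob set at most once.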
🧹 Nitpick comments (11)
pkg/sync/subscription_test.go (1)

13-70: Convert this to a table-driven test case.

The test is currently a single inline scenario; please wrap it in a table-driven structure so adding more stream/network-height cases is straightforward.

Refactor sketch
 func TestSubscriptionManagerUpdatesNetworkHeightFromStream(t *testing.T) {
-	st := newMockStore()
-	ns := types.Namespace{0: 1}
-	...
-	if err := <-done; err != nil {
-		t.Fatalf("Run: %v", err)
-	}
+	tests := []struct {
+		name          string
+		initialLatest uint64
+		initialNet     uint64
+		incomingHeight uint64
+		wantLatest     uint64
+		wantNet        uint64
+	}{
+		{
+			name:          "updates network height from streamed header",
+			initialLatest: 5,
+			initialNet:    5,
+			incomingHeight: 6,
+			wantLatest:    6,
+			wantNet:       6,
+		},
+	}
+
+	for _, tc := range tests {
+		t.Run(tc.name, func(t *testing.T) {
+			st := newMockStore()
+			ns := types.Namespace{0: 1}
+			_ = st.PutNamespace(context.Background(), ns)
+			_ = st.SetSyncState(context.Background(), types.SyncStatus{
+				State:         types.Streaming,
+				LatestHeight:  tc.initialLatest,
+				NetworkHeight: tc.initialNet,
+			})
+			// existing run/stream/assert logic...
+		})
+	}
 }
As per coding guidelines, "Use table-driven tests pattern for test cases".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sync/subscription_test.go` around lines 13 - 70, Replace the
single-scenario TestSubscriptionManagerUpdatesNetworkHeightFromStream with a
table-driven test: define a slice of test cases (fields: name, initial
LatestHeight, initial NetworkHeight, fetcherStartHeight, incomingHeaderHeight,
wantLatest, wantNetwork), then loop over cases with t.Run(tc.name, func(t
*testing.T){...}) and move the setup logic (newMockStore(), PutNamespace,
SetSyncState, newMockFetcher, subCh hookup, context/cancel, SubscriptionManager
instantiation, goroutine running sm.Run(ctx), sending subCh <-
makeHeader(tc.incomingHeaderHeight), waiting loop that checks st.GetSyncState
for tc.wantLatest and tc.wantNetwork, cancel and assert error from done) inside
the subtest body so each case is isolated; keep existing helpers (newMockStore,
newMockFetcher, makeHeader, SubscriptionManager.Run) and preserve timeouts and
error handling per-case.
cmd/apex-loadtest/main.go (3)

176-186: Error counting may be inflated by non-request errors.

Incrementing errCount for io.Copy and resp.Body.Close errors conflates infrastructure issues with actual request errors. Additionally, a single request could increment errCount up to 3 times (copy, close, status code), skewing metrics.

Consider separating infrastructure errors from request errors, or only counting request-level failures:

Simplified error tracking
-				if _, copyErr := io.Copy(io.Discard, resp.Body); copyErr != nil {
-					errCount.Add(1)
-				}
-				if closeErr := resp.Body.Close(); closeErr != nil {
-					errCount.Add(1)
-				}
+				_, _ = io.Copy(io.Discard, resp.Body)
+				_ = resp.Body.Close()
 
 				total.Add(1)
 				if resp.StatusCode != 200 {
 					errCount.Add(1)
 				}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/apex-loadtest/main.go` around lines 176 - 186, The current logic
increments errCount for io.Copy failures, resp.Body.Close failures, and non-200
resp.StatusCode, which can inflate error metrics and count the same request
multiple times; change this by distinguishing infrastructure errors from request
errors (e.g., introduce infraErrCount for read/close failures) or only increment
errCount once per request when the request is considered failed (for example set
a local failed bool per request and increment errCount once if failed or
increment infraErrCount for io.Copy/resp.Body.Close separately); update
references in this block (errCount, total, io.Copy, resp.Body.Close,
resp.StatusCode) so total still increments per request but errCount is only
incremented for request-level failures (status code) and read/close errors are
tracked in a separate counter or logged without double-counting.

269-271: Silent return on SetReadDeadline error loses diagnostic information.

If setting the read deadline fails, the goroutine exits silently without logging, making it harder to diagnose connection issues during load testing.

Add diagnostic output
 			if err := s.conn.SetReadDeadline(time.Now().Add(duration + 5*time.Second)); err != nil {
+				fmt.Fprintf(os.Stderr, "  sub %d: SetReadDeadline failed: %v\n", s.id, err)
 				return
 			}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/apex-loadtest/main.go` around lines 269 - 271, Silent return from
s.conn.SetReadDeadline hides failures; replace the bare return with error
reporting and proper cleanup: when calling
s.conn.SetReadDeadline(time.Now().Add(duration + 5*time.Second)) capture the
error, log it with context (e.g., include duration and remote address) using the
existing logger or log.Printf, then close or clean up s.conn and return;
reference s.conn.SetReadDeadline and the surrounding goroutine/handler so the
fix adds a log statement and any necessary conn.Close() before returning.

11-27: Consider using spf13/cobra for CLI implementation.

This tool uses the standard flag package, but the project convention is to use spf13/cobra for CLI command implementation in cmd/apex/**/*.go. Additionally, rs/zerolog is the standard logging library for this project.

For a standalone load-testing tool, the current approach may be acceptable if this is intentionally kept lightweight with minimal dependencies. Otherwise, consider aligning with project conventions.

Based on learnings: "Use spf13/cobra for CLI command implementation" applies to cmd/apex/**/*.go. As per coding guidelines: "Use rs/zerolog for logging".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/apex-loadtest/main.go` around lines 11 - 27, The file currently uses the
standard flag package and likely a plain main entrypoint; replace the ad-hoc CLI
with spf13/cobra by creating a root cobra.Command (e.g., rootCmd) and wiring
existing flag definitions as persistent/local flags on that command, move flag
parsing into rootCmd.Execute() and call it from main; also switch logging to
rs/zerolog (e.g., import github.com/rs/zerolog and github.com/rs/zerolog/log and
replace any fmt/print/log usages with log.*) and update imports to remove the
standard flag package and add github.com/spf13/cobra, ensuring any subcommands
or configuration parsing in main are ported to cobra handlers.
docs/api-compat.md (1)

25-28: Document the actual GetAll limit here.

The namespace cap is part of the external contract. If the limit is 16 today, spell that out so clients do not need to read the handler to know what will be rejected.

Suggested wording
-- `BlobService.GetAll` enforces a namespace cap; JSON-RPC `blob.GetAll` does not.
+- `BlobService.GetAll` enforces a 16-namespace cap; JSON-RPC `blob.GetAll` does not.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/api-compat.md` around lines 25 - 28, Update the API compatibility note
to state the exact namespace cap (e.g., "16 namespaces") instead of the vague
term "namespace cap"; explicitly note that BlobService.GetAll enforces a
16-namespace limit while JSON-RPC blob.GetAll does not, and ensure the text
references the enforcing handler (BlobService.GetAll) so clients know requests
above 16 will be rejected.
pkg/api/grpc/blob_service.go (1)

24-37: Mirror the empty-commitment guard in Get.

This handler validates namespace, but it still sends an empty commitment down to GetBlob. That makes a malformed request look like a lookup miss instead of InvalidArgument.

Proposed fix
 func (s *BlobServiceServer) Get(ctx context.Context, req *pb.GetRequest) (*pb.GetResponse, error) {
 	ns, err := bytesToNamespace(req.GetNamespace())
 	if err != nil {
 		return nil, status.Errorf(codes.InvalidArgument, "invalid namespace: %v", err)
 	}
+	if len(req.GetCommitment()) == 0 {
+		return nil, status.Error(codes.InvalidArgument, "commitment is required")
+	}
 
 	b, err := s.svc.GetBlob(ctx, req.GetHeight(), ns, req.GetCommitment())
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/api/grpc/blob_service.go` around lines 24 - 37, The Get handler currently
validates namespace but allows an empty commitment to be passed to
s.svc.GetBlob, causing malformed requests to be treated as lookup misses; update
BlobServiceServer.Get to check req.GetCommitment() (after bytesToNamespace) and
return status.Errorf(codes.InvalidArgument, "invalid commitment: empty") (or
similar) when the commitment is empty, so GetBlob is only called with a
non-empty commitment and true not-found errors still map to codes.NotFound.
pkg/api/jsonrpc/server_test.go (1)

279-310: Fold this into the existing blob.GetAll test table.

This is the same flow as TestJSONRPCBlobGetAll with a different namespace-list size. Moving both scenarios into a table-driven test keeps the compatibility matrix in one place.

As per coding guidelines, **/*_test.go: Use table-driven tests pattern for test cases.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/api/jsonrpc/server_test.go` around lines 279 - 310, The new test
TestJSONRPCBlobGetAllAllowsCompatibilitySizedNamespaceList duplicates the flow
in TestJSONRPCBlobGetAll with only a different namespace-list size; fold it into
a table-driven variant of TestJSONRPCBlobGetAll by adding a test case for the
17-namespace scenario (and any other sizes we want to cover), iterating the
cases with t.Run, and reusing the existing setup/verification logic
(newMockStore, building namespaces, api.NewNotifier/api.NewService/NewServer,
doRPC and result assertions) so there is a single TestJSONRPCBlobGetAll that
drives multiple namespace-size scenarios rather than a separate test function.
pkg/store/sqlite_test.go (1)

208-250: Collapse these conflict checks into a table-driven test.

These two cases only vary by which invariant is violated. One table-driven test will make the conflict matrix easier to extend without duplicating setup and assertions.

As per coding guidelines, **/*_test.go: Use table-driven tests pattern for test cases.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/store/sqlite_test.go` around lines 208 - 250, Collapse the two tests
TestPutBlobsRejectsConflictingIndex and TestPutBlobsRejectsConflictingCommitment
into one table-driven test that enumerates conflict cases (e.g., "conflicting
index" and "conflicting commitment"); keep the shared setup (openTestDB, ctx,
ns, creation of original blob) and for each case provide the conflicting
types.Blob (varying Index or Commitment) and expected failure, then iterate the
table with t.Run for each case and call s.PutBlobs to assert an error is
returned for the conflict; reference s.PutBlobs, types.Blob, and the
original/conflict variables so the code is consolidated and assertions remain
the same.
pkg/store/s3.go (2)

636-648: Minor pre-allocation inefficiency.

Line 636 allocates with capacity len(newBlobs), but this slice is immediately discarded at line 638 when existing != nil. Consider allocating after the existing check.

♻️ Suggested improvement
-	merged := make([]types.Blob, 0, len(newBlobs))
+	var merged []types.Blob
 	if existing != nil {
-		merged, err = decodeS3Blobs(existing)
+		var decodeErr error
+		merged, decodeErr = decodeS3Blobs(existing)
+		if decodeErr != nil {
+			return fmt.Errorf("decode existing blob chunk: %w", decodeErr)
+		}
+	} else {
+		merged = make([]types.Blob, 0, len(newBlobs))
+	}
-		if err != nil {
-			return fmt.Errorf("decode existing blob chunk: %w", err)
-		}
-	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/store/s3.go` around lines 636 - 648, The preallocation for merged
currently uses make([]types.Blob, 0, len(newBlobs)) before checking existing,
wasting that allocation when existing != nil; change the logic in the block
around merged, existing, decodeS3Blobs so you only allocate merged after the
existing check (i.e. if existing != nil call decodeS3Blobs to set merged, else
allocate merged with capacity len(newBlobs)), then append newBlobs and call
mergeUniqueBlobs as before; reference symbols: merged, existing, newBlobs,
decodeS3Blobs, mergeUniqueBlobs.

780-793: Silent error handling when parsing namespace.

At line 785, if types.NamespaceFromHex fails, the error is silently swallowed and the loop continues. While this falls through to S3 lookup safely, silently ignoring a parse error could mask data corruption.

Consider logging the error for observability (using rs/zerolog as per coding guidelines), even if the behavior remains unchanged.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/store/s3.go` around lines 780 - 793, In
S3Store.findCommitEntryBlobLocked, NamespaceFromHex errors are being ignored;
update the loop to log any parse errors using the project's zerolog logger
(e.g., s.logger or the store's logger) while keeping the existing fallback
behavior, including context such as commitHex and entry.pointer.Namespace and
the error returned by types.NamespaceFromHex so data corruption is observable;
do not change control flow or return values, only add an error-level log call
before continuing, and keep using s.findBlobInBufferLocked for successful
parses.
pkg/store/s3_test.go (1)

433-470: Consider consolidating conflict tests into table-driven format.

TestS3Store_RejectsConflictingBufferedBlob and TestS3Store_RejectsConflictingPersistedBlob test similar scenarios (buffered vs persisted conflicts). As per coding guidelines, table-driven tests are preferred.

♻️ Optional: Table-driven consolidation
func TestS3Store_RejectsConflictingBlob(t *testing.T) {
	tests := []struct {
		name        string
		flushFirst  bool
		wantErrAt   string // "put" or "flush"
	}{
		{"buffered_conflict", false, "put"},
		{"persisted_conflict", true, "flush"},
	}
	for _, tc := range tests {
		t.Run(tc.name, func(t *testing.T) {
			ctx := context.Background()
			s, _ := newTestS3Store(t)
			ns := testNamespace(1)

			original := types.Blob{Height: 3, Namespace: ns, Data: []byte("d1"), Commitment: []byte("c1"), Index: 0}
			conflict := types.Blob{Height: 3, Namespace: ns, Data: []byte("d2"), Commitment: []byte("c2"), Index: 0}

			if err := s.PutBlobs(ctx, []types.Blob{original}); err != nil {
				t.Fatalf("PutBlobs (original): %v", err)
			}
			if tc.flushFirst {
				if err := s.SetSyncState(ctx, types.SyncStatus{LatestHeight: 3}); err != nil {
					t.Fatalf("SetSyncState (original): %v", err)
				}
			}

			err := s.PutBlobs(ctx, []types.Blob{conflict})
			if tc.wantErrAt == "put" {
				if err == nil {
					t.Fatal("expected PutBlobs to fail")
				}
				return
			}
			if err != nil {
				t.Fatalf("PutBlobs (conflict): %v", err)
			}
			if err := s.SetSyncState(ctx, types.SyncStatus{LatestHeight: 3}); err == nil {
				t.Fatal("expected SetSyncState to fail")
			}
		})
	}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/store/s3_test.go` around lines 433 - 470, Two nearly identical tests
(TestS3Store_RejectsConflictingBufferedBlob and
TestS3Store_RejectsConflictingPersistedBlob) should be consolidated into a
table-driven test to reduce duplication. Replace them with a single
TestS3Store_RejectsConflictingBlob that iterates over cases (e.g.,
{"buffered_conflict", flushFirst:false, wantErrAt:"put"} and
{"persisted_conflict", flushFirst:true, wantErrAt:"flush"}), use newTestS3Store
and testNamespace to set up, call s.PutBlobs for the original blob, call
s.SetSyncState conditionally when flushFirst is true, then attempt the
conflicting s.PutBlobs and assert the error occurs at the expected point (either
the PutBlobs call or the subsequent SetSyncState flush); keep checks and error
messages referencing s.PutBlobs and s.SetSyncState so failures remain clear.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.golangci.yml:
- Around line 43-49: Replace the global gosec excludes under
linters.settings.gosec.excludes with scoped suppressions using
issues.exclude-rules (or inline `#nosec` annotations) so the suppressions apply
only to specific paths/messages; for each rule currently listed (e.g. G304,
G306, G704 — and optionally G104, G115, G117) add one or more
issues.exclude-rules entries that include a path regex (e.g. ^internal/config/),
linters: [gosec], and a text matcher that targets the rule or false-positive
message, or leave the rule enabled globally and add `#nosec G<rule>` at the
precise false-positive sites in the code (e.g. around config-loading in the
functions that handle file inclusion/permissions/HTTP clients).

In `@cmd/apex-loadtest/main.go`:
- Around line 344-351: The WebSocket dial in dialAndSubscribe currently uses
websocket.DefaultDialer (no timeout) and can hang indefinitely; replace it with
a new websocket.Dialer that sets HandshakeTimeout (e.g., 5-15s) and call Dial on
that dialer instead of websocket.DefaultDialer, preserving the existing
resp.Body close handling and returning the conn/error as before so dials fail
fast under network issues.
- Around line 125-130: The json.Marshal call currently discards errors when
building the rpcRequest (body, _ := json.Marshal(rpcRequest{...})), which can
hide future issues; change this to capture and handle the error from
json.Marshal (e.g., body, err := json.Marshal(rpcRequest{Jsonrpc:"2.0", Method:
method, Params: params, ID: 1}) and then check err), and return or log a clear
error including context (method/params/rpcRequest) instead of proceeding with a
nil/invalid body so callers of the code using the body variable get a proper
failure path.
- Around line 161-165: The call to http.NewRequestWithContext can return an
error and a nil req, so modify the request creation around NewRequestWithContext
in main.go to handle its error instead of ignoring it: capture the returned err
when calling http.NewRequestWithContext, if err != nil log/record the failure
(e.g., increment your failure counter or metrics used in the load test), and
continue to the next iteration rather than calling client.Do on a nil req;
update any surrounding variables (t0 timing, response handling in the loop)
accordingly so timing/metrics remain consistent when a request creation error
occurs.
- Around line 368-376: fetchCurrentHeight currently uses context.Background()
and http.DefaultClient which can block indefinitely; change fetchCurrentHeight
to create a cancellable context with a timeout (e.g. ctx, cancel :=
context.WithTimeout(context.Background(), 5*time.Second); defer cancel()) and
use that ctx when building the request (NewRequestWithContext) so the call times
out, and ensure you close resp.Body after a successful response; update
references inside fetchCurrentHeight accordingly.

In `@pkg/api/grpc/server_test.go`:
- Around line 243-249: The test currently only checks that client.GetAll returns
an error for too many namespaces; update the assertion to decode the gRPC error
with status.FromError and assert that the returned code equals
codes.InvalidArgument (the handler in blob_service.go returns this for oversized
namespace lists). Locate the GetAll call in the test and replace the loose
nil-check with extracting st, ok := status.FromError(err) and asserting ok is
true and st.Code() == codes.InvalidArgument so the test only passes for the
expected InvalidArgument gRPC error.

In `@pkg/fetch/celestia_app.go`:
- Around line 226-231: The full protobuf Header and BlockID are being
json.Marshal'd here which makes types.Header.RawHeader vary by fetch path;
extract the minimal-header envelope serialization used in
pkg/backfill/db/source.go into a shared helper (e.g.,
MarshalMinimalHeaderEnvelope(Header, BlockID) or BuildMinimalHeaderEnvelope)
that returns the same minimal envelope bytes/structure, then replace the current
inline envelope construction in celestia_app.go (the envelope using "header":
hdr and "commit": {"height":..., "block_id": blockID}) with a call to that
shared helper so types.Header.RawHeader is produced identically by both fetch
paths.

In `@pkg/store/s3.go`:
- Around line 211-217: The loop over blobs appends every blob into s.blobBuf
even when ensureBufferedBlobInvariant reports an identical blob already exists,
causing duplicates; change ensureBufferedBlobInvariant to return (exists bool,
err error) and update its callers so the caller in this loop (the code using
blobs, b, s.blobBuf, s.inflight.blobBuf, blobChunkKey and s.chunkStart) checks
the boolean and continues (skips append) when exists==true, only appending when
exists==false, and propagate errors normally; also update other call sites of
ensureBufferedBlobInvariant to handle the new return signature.

In `@pkg/sync/subscription.go`:
- Around line 66-72: The loop reads hdr.Height after calling sm.handleHeader
even when the subscription channel closed (ok==false) and handleHeader returned
nil, causing a panic because hdr is nil; fix by checking ok (or hdr != nil)
immediately after the call to handleHeader and before accessing hdr.Height or
incrementing processed: if ok is false (or hdr==nil) return/continue (respecting
ctx cancellation) so you don't dereference hdr; update the variables (lastHeight
= hdr.Height, processed++) only when hdr is non-nil. Ensure this check is
applied in the same block where nextNetworkHeight is assigned and uses
handleHeader.

---

Outside diff comments:
In `@cmd/apex/main.go`:
- Around line 171-181: In openDataSource, stop passing cfg.DataSource.AuthToken
into fetch.NewCelestiaAppFetcher and fetch.NewCelestiaNodeFetcher; instead read
the token from the APEX_AUTH_TOKEN environment variable (fail fast with a clear
error if it's empty) and pass that env token into those constructors (refer to
openDataSource, cfg.DataSource.AuthToken, fetch.NewCelestiaAppFetcher, and
fetch.NewCelestiaNodeFetcher to locate the calls to change).

In `@pkg/api/service.go`:
- Around line 116-121: The code iterates over the incoming namespaces slice and
calls s.store.GetBlobs for each entry, causing duplicated work and duplicated
blob append when the same namespace appears multiple times; fix by deduplicating
the namespaces before the loop (e.g., build a map[string]struct{} seen and
produce a uniqueNamespaces slice or iterate the map keys) and then call
s.store.GetBlobs only for each unique namespace, appending results into
allBlobs; reference the namespaces variable, the loop that calls
s.store.GetBlobs, and the allBlobs accumulation in service.go when applying the
change.

---

Nitpick comments:
In `@cmd/apex-loadtest/main.go`:
- Around line 176-186: The current logic increments errCount for io.Copy
failures, resp.Body.Close failures, and non-200 resp.StatusCode, which can
inflate error metrics and count the same request multiple times; change this by
distinguishing infrastructure errors from request errors (e.g., introduce
infraErrCount for read/close failures) or only increment errCount once per
request when the request is considered failed (for example set a local failed
bool per request and increment errCount once if failed or increment
infraErrCount for io.Copy/resp.Body.Close separately); update references in this
block (errCount, total, io.Copy, resp.Body.Close, resp.StatusCode) so total
still increments per request but errCount is only incremented for request-level
failures (status code) and read/close errors are tracked in a separate counter
or logged without double-counting.
- Around line 269-271: Silent return from s.conn.SetReadDeadline hides failures;
replace the bare return with error reporting and proper cleanup: when calling
s.conn.SetReadDeadline(time.Now().Add(duration + 5*time.Second)) capture the
error, log it with context (e.g., include duration and remote address) using the
existing logger or log.Printf, then close or clean up s.conn and return;
reference s.conn.SetReadDeadline and the surrounding goroutine/handler so the
fix adds a log statement and any necessary conn.Close() before returning.
- Around line 11-27: The file currently uses the standard flag package and
likely a plain main entrypoint; replace the ad-hoc CLI with spf13/cobra by
creating a root cobra.Command (e.g., rootCmd) and wiring existing flag
definitions as persistent/local flags on that command, move flag parsing into
rootCmd.Execute() and call it from main; also switch logging to rs/zerolog
(e.g., import github.com/rs/zerolog and github.com/rs/zerolog/log and replace
any fmt/print/log usages with log.*) and update imports to remove the standard
flag package and add github.com/spf13/cobra, ensuring any subcommands or
configuration parsing in main are ported to cobra handlers.

In `@docs/api-compat.md`:
- Around line 25-28: Update the API compatibility note to state the exact
namespace cap (e.g., "16 namespaces") instead of the vague term "namespace cap";
explicitly note that BlobService.GetAll enforces a 16-namespace limit while
JSON-RPC blob.GetAll does not, and ensure the text references the enforcing
handler (BlobService.GetAll) so clients know requests above 16 will be rejected.

In `@pkg/api/grpc/blob_service.go`:
- Around line 24-37: The Get handler currently validates namespace but allows an
empty commitment to be passed to s.svc.GetBlob, causing malformed requests to be
treated as lookup misses; update BlobServiceServer.Get to check
req.GetCommitment() (after bytesToNamespace) and return
status.Errorf(codes.InvalidArgument, "invalid commitment: empty") (or similar)
when the commitment is empty, so GetBlob is only called with a non-empty
commitment and true not-found errors still map to codes.NotFound.

In `@pkg/api/jsonrpc/server_test.go`:
- Around line 279-310: The new test
TestJSONRPCBlobGetAllAllowsCompatibilitySizedNamespaceList duplicates the flow
in TestJSONRPCBlobGetAll with only a different namespace-list size; fold it into
a table-driven variant of TestJSONRPCBlobGetAll by adding a test case for the
17-namespace scenario (and any other sizes we want to cover), iterating the
cases with t.Run, and reusing the existing setup/verification logic
(newMockStore, building namespaces, api.NewNotifier/api.NewService/NewServer,
doRPC and result assertions) so there is a single TestJSONRPCBlobGetAll that
drives multiple namespace-size scenarios rather than a separate test function.

In `@pkg/store/s3_test.go`:
- Around line 433-470: Two nearly identical tests
(TestS3Store_RejectsConflictingBufferedBlob and
TestS3Store_RejectsConflictingPersistedBlob) should be consolidated into a
table-driven test to reduce duplication. Replace them with a single
TestS3Store_RejectsConflictingBlob that iterates over cases (e.g.,
{"buffered_conflict", flushFirst:false, wantErrAt:"put"} and
{"persisted_conflict", flushFirst:true, wantErrAt:"flush"}), use newTestS3Store
and testNamespace to set up, call s.PutBlobs for the original blob, call
s.SetSyncState conditionally when flushFirst is true, then attempt the
conflicting s.PutBlobs and assert the error occurs at the expected point (either
the PutBlobs call or the subsequent SetSyncState flush); keep checks and error
messages referencing s.PutBlobs and s.SetSyncState so failures remain clear.

In `@pkg/store/s3.go`:
- Around line 636-648: The preallocation for merged currently uses
make([]types.Blob, 0, len(newBlobs)) before checking existing, wasting that
allocation when existing != nil; change the logic in the block around merged,
existing, decodeS3Blobs so you only allocate merged after the existing check
(i.e. if existing != nil call decodeS3Blobs to set merged, else allocate merged
with capacity len(newBlobs)), then append newBlobs and call mergeUniqueBlobs as
before; reference symbols: merged, existing, newBlobs, decodeS3Blobs,
mergeUniqueBlobs.
- Around line 780-793: In S3Store.findCommitEntryBlobLocked, NamespaceFromHex
errors are being ignored; update the loop to log any parse errors using the
project's zerolog logger (e.g., s.logger or the store's logger) while keeping
the existing fallback behavior, including context such as commitHex and
entry.pointer.Namespace and the error returned by types.NamespaceFromHex so data
corruption is observable; do not change control flow or return values, only add
an error-level log call before continuing, and keep using
s.findBlobInBufferLocked for successful parses.

In `@pkg/store/sqlite_test.go`:
- Around line 208-250: Collapse the two tests
TestPutBlobsRejectsConflictingIndex and TestPutBlobsRejectsConflictingCommitment
into one table-driven test that enumerates conflict cases (e.g., "conflicting
index" and "conflicting commitment"); keep the shared setup (openTestDB, ctx,
ns, creation of original blob) and for each case provide the conflicting
types.Blob (varying Index or Commitment) and expected failure, then iterate the
table with t.Run for each case and call s.PutBlobs to assert an error is
returned for the conflict; reference s.PutBlobs, types.Blob, and the
original/conflict variables so the code is consolidated and assertions remain
the same.

In `@pkg/sync/subscription_test.go`:
- Around line 13-70: Replace the single-scenario
TestSubscriptionManagerUpdatesNetworkHeightFromStream with a table-driven test:
define a slice of test cases (fields: name, initial LatestHeight, initial
NetworkHeight, fetcherStartHeight, incomingHeaderHeight, wantLatest,
wantNetwork), then loop over cases with t.Run(tc.name, func(t *testing.T){...})
and move the setup logic (newMockStore(), PutNamespace, SetSyncState,
newMockFetcher, subCh hookup, context/cancel, SubscriptionManager instantiation,
goroutine running sm.Run(ctx), sending subCh <-
makeHeader(tc.incomingHeaderHeight), waiting loop that checks st.GetSyncState
for tc.wantLatest and tc.wantNetwork, cancel and assert error from done) inside
the subtest body so each case is isolated; keep existing helpers (newMockStore,
newMockFetcher, makeHeader, SubscriptionManager.Run) and preserve timeouts and
error handling per-case.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ef86e3e4-1bcf-4f9f-ac33-b4557a46eb84

📥 Commits

Reviewing files that changed from the base of the PR and between a51d28b and 8469aa7.

📒 Files selected for processing (25)
  • .golangci.yml
  • README.md
  • cmd/apex-loadtest/main.go
  • cmd/apex/main.go
  • config/load.go
  • docs/api-compat.md
  • docs/running.md
  • pkg/api/grpc/blob_service.go
  • pkg/api/grpc/header_service.go
  • pkg/api/grpc/server_test.go
  • pkg/api/jsonrpc/server_test.go
  • pkg/api/jsonrpc/stubs.go
  • pkg/api/service.go
  • pkg/backfill/db/source.go
  • pkg/fetch/blobtx.go
  • pkg/fetch/blobtx_test.go
  • pkg/fetch/celestia_app.go
  • pkg/fetch/celestia_app_test.go
  • pkg/fetch/celestia_node.go
  • pkg/store/s3.go
  • pkg/store/s3_test.go
  • pkg/store/sqlite.go
  • pkg/store/sqlite_test.go
  • pkg/sync/subscription.go
  • pkg/sync/subscription_test.go

.golangci.yml Outdated
Comment on lines +43 to +49
excludes:
  - G104 # unhandled errors — covered by errcheck (default linter)
  - G115 # integer overflow conversion — too noisy for blockchain height math
  - G117 # struct field name matches secret pattern — false positive
  - G304 # file inclusion via variable — expected for config loading
  - G306 # file permissions > 0600 — config templates, not secrets
  - G704 # SSRF via taint — HTTP clients use configured URLs, not user input

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

For golangci-lint v2 + gosec, confirm (1) whether G704 is a valid supported gosec rule ID, and (2) the recommended config pattern to scope gosec suppressions by path/message instead of global linters.settings.gosec.excludes.

💡 Result:

  1. Yes — G704 is a valid/supported gosec rule ID.
    It’s listed in gosec’s rule documentation as “G704 — SSRF via taint analysis” [2]. golangci-lint’s embedded gosec has also added support for the new G70x rules including G704 (via the gosec bump noted in the golangci-lint changelog) [1].

  2. Recommended pattern in golangci-lint v2 to scope gosec suppressions by path/message (instead of global linters.settings.gosec.excludes): use issues.exclude-rules.
    issues.exclude-rules lets you match by path/path-except plus text (message regex) and limit it to linters: [gosec] [3]. Example:

version: "2"

issues:
  exclude-rules:
    - path: ^internal/legacy/
      linters: [gosec]
      text: "G704|SSRF via taint analysis"

For single-call / local suppressions, prefer inline #nosec annotations with the specific rule ID (e.g. #nosec G704) rather than config-wide excludes [4].


Scope gosec suppressions by path/message using issues.exclude-rules instead of global linters.settings.gosec.excludes.

Global excludes for rules like G304 (file inclusion), G306 (file permissions), and G704 (SSRF) weaken repo-wide security guarantees. Use issues.exclude-rules with path and text matchers, or prefer inline #nosec G<rule> annotations at specific false-positive sites.

Example: path-scoped gosec suppressions
issues:
  exclude-rules:
    - path: ^internal/config/
      linters: [gosec]
      text: "G304|G306"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.golangci.yml around lines 43 - 49, Replace the global gosec excludes under
linters.settings.gosec.excludes with scoped suppressions using
issues.exclude-rules (or inline `#nosec` annotations) so the suppressions apply
only to specific paths/messages; for each rule currently listed (e.g. G304,
G306, G704 — and optionally G104, G115, G117) add one or more
issues.exclude-rules entries that include a path regex (e.g. ^internal/config/),
linters: [gosec], and a text matcher that targets the rule or false-positive
message, or leave the rule enabled globally and add `#nosec G<rule>` at the
precise false-positive sites in the code (e.g. around config-loading in the
functions that handle file inclusion/permissions/HTTP clients).

Comment on lines +125 to +130
body, _ := json.Marshal(rpcRequest{
	Jsonrpc: "2.0",
	Method:  method,
	Params:  params,
	ID:      1,
})

⚠️ Potential issue | 🟡 Minor

Ignored json.Marshal error could mask issues.

While rpcRequest has safe types, ignoring the error removes diagnostic capability if the struct evolves. Consider handling it:

Proposed fix
-	body, _ := json.Marshal(rpcRequest{
+	body, err := json.Marshal(rpcRequest{
 		Jsonrpc: "2.0",
 		Method:  method,
 		Params:  params,
 		ID:      1,
 	})
+	if err != nil {
+		return benchStats{}
+	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
body, _ := json.Marshal(rpcRequest{
	Jsonrpc: "2.0",
	Method:  method,
	Params:  params,
	ID:      1,
})
body, err := json.Marshal(rpcRequest{
	Jsonrpc: "2.0",
	Method:  method,
	Params:  params,
	ID:      1,
})
if err != nil {
	return benchStats{}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/apex-loadtest/main.go` around lines 125 - 130, The json.Marshal call
currently discards errors when building the rpcRequest (body, _ :=
json.Marshal(rpcRequest{...})), which can hide future issues; change this to
capture and handle the error from json.Marshal (e.g., body, err :=
json.Marshal(rpcRequest{Jsonrpc:"2.0", Method: method, Params: params, ID: 1})
and then check err), and return or log a clear error including context
(method/params/rpcRequest) instead of proceeding with a nil/invalid body so
callers of the code using the body variable get a proper failure path.

Comment on lines +368 to +376
func fetchCurrentHeight(addr string) (uint64, error) {
	req, err := http.NewRequestWithContext(context.Background(), http.MethodGet, "http://"+addr+"/health", nil)
	if err != nil {
		return 0, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, err
	}

⚠️ Potential issue | 🟠 Major

Health check has no timeout, could hang indefinitely.

Using context.Background() and http.DefaultClient provides no timeout. If the server is unresponsive, fetchCurrentHeight will block forever at startup.

Proposed fix: add context timeout
 func fetchCurrentHeight(addr string) (uint64, error) {
-	req, err := http.NewRequestWithContext(context.Background(), http.MethodGet, "http://"+addr+"/health", nil)
+	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
+	defer cancel()
+	req, err := http.NewRequestWithContext(ctx, http.MethodGet, "http://"+addr+"/health", nil)
 	if err != nil {
 		return 0, err
 	}
-	resp, err := http.DefaultClient.Do(req)
+	client := &http.Client{Timeout: 10 * time.Second}
+	resp, err := client.Do(req)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func fetchCurrentHeight(addr string) (uint64, error) {
	req, err := http.NewRequestWithContext(context.Background(), http.MethodGet, "http://"+addr+"/health", nil)
	if err != nil {
		return 0, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, err
	}
func fetchCurrentHeight(addr string) (uint64, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, "http://"+addr+"/health", nil)
	if err != nil {
		return 0, err
	}
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		return 0, err
	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/apex-loadtest/main.go` around lines 368 - 376, fetchCurrentHeight
currently uses context.Background() and http.DefaultClient which can block
indefinitely; change fetchCurrentHeight to create a cancellable context with a
timeout (e.g. ctx, cancel := context.WithTimeout(context.Background(),
5*time.Second); defer cancel()) and use that ctx when building the request
(NewRequestWithContext) so the call times out, and ensure you close resp.Body
after a successful response; update references inside fetchCurrentHeight
accordingly.

Comment on lines 226 to 231
 envelope := map[string]any{
 	"header": hdr,
 	"commit": map[string]any{
-		"height":   fmt.Sprintf("%d", hdr.Height),
+		"height":   strconv.FormatInt(hdr.GetHeight(), 10),
 		"block_id": blockID,
 	},

⚠️ Potential issue | 🟠 Major

Use the same raw-header serializer in both fetch paths.

pkg/backfill/db/source.go now builds a minimal header envelope explicitly, but this path still passes the full protobuf Header and BlockID messages straight to json.Marshal. That once again makes types.Header.RawHeader depend on which fetch path produced it, undermining the API-contract work in this PR. Extract the minimal serializer into a shared helper and use it here as well.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/fetch/celestia_app.go` around lines 226 - 231, The full protobuf Header
and BlockID are being json.Marshal'd here which makes types.Header.RawHeader
vary by fetch path; extract the minimal-header envelope serialization used in
pkg/backfill/db/source.go into a shared helper (e.g.,
MarshalMinimalHeaderEnvelope(Header, BlockID) or BuildMinimalHeaderEnvelope)
that returns the same minimal envelope bytes/structure, then replace the current
inline envelope construction in celestia_app.go (the envelope using "header":
hdr and "commit": {"height":..., "block_id": blockID}) with a call to that
shared helper so types.Header.RawHeader is produced identically by both fetch
paths.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 659911e722

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

}

networkHeight = nextNetworkHeight
lastHeight = hdr.Height


P1 Badge Avoid dereferencing header after closed subscription

When the subscription channel closes during context cancellation, handleHeader returns (networkHeight, nil) for ok == false, but Run still executes lastHeight = hdr.Height and processed++. In that path hdr is nil (closed channel zero value), so this panics during a normal shutdown race where the channel close is selected before ctx.Done(). Return or skip state updates when ok is false before using hdr.

Useful? React with 👍 / 👎.

@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (1)
cmd/apex-loadtest/main.go (1)

176-186: ⚠️ Potential issue | 🟡 Minor

Error counting may inflate error count beyond total requests.

The error count is incremented separately for io.Copy failure (line 177), resp.Body.Close failure (line 180), and non-200 status (line 185). A single request could increment errCount multiple times while only incrementing total once, making errors potentially exceed total in the stats output.

Consider tracking only one error per request or clarifying what "errors" represents in the output.

Proposed fix to count at most one error per request
-			if _, copyErr := io.Copy(io.Discard, resp.Body); copyErr != nil {
-				errCount.Add(1)
-			}
-			if closeErr := resp.Body.Close(); closeErr != nil {
-				errCount.Add(1)
-			}
-
-			total.Add(1)
-			if resp.StatusCode != 200 {
-				errCount.Add(1)
-			}
+			_, copyErr := io.Copy(io.Discard, resp.Body)
+			closeErr := resp.Body.Close()
+
+			total.Add(1)
+			if resp.StatusCode != 200 || copyErr != nil || closeErr != nil {
+				errCount.Add(1)
+			}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/apex-loadtest/main.go` around lines 176 - 186, The request loop currently
increments errCount multiple times for a single request (on io.Copy error,
resp.Body.Close error, and non-200 status). Change this to track a
single-per-request error flag (e.g., hadErr) inside the loop: set hadErr=true on
any of the failures (io.Copy, resp.Body.Close, or non-200 resp.StatusCode) and
after handling the response increment errCount once if hadErr is true; keep
total.Add(1) unchanged. Update references to errCount, total, io.Copy(resp.Body,
...), resp.Body.Close(), and resp.StatusCode accordingly so no request can add
more than one error to errCount.
🧹 Nitpick comments (1)
cmd/apex-loadtest/main.go (1)

16-17: Consider using spf13/cobra and rs/zerolog for consistency.

This CLI uses the standard flag package and fmt for output. Based on learnings, CLI commands under cmd/apex/**/*.go should use spf13/cobra for command implementation. Additionally, per coding guidelines, rs/zerolog should be used for logging instead of fmt.Printf/fmt.Fprintf.

For a load-testing tool, this may be acceptable as a standalone utility, but consider aligning with the repo's conventions if this tool will be maintained long-term.

Also applies to: 29-30

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/apex-loadtest/main.go` around lines 16 - 17, The CLI currently uses the
standard flag and fmt in package main (main function) — replace that with a
spf13/cobra-based command (create a rootCmd and move flag parsing and run logic
into its RunE) and swap all fmt.Printf/fmt.Fprintf usage for an rs/zerolog
Logger (inject or create a zerolog.Logger in main and use logger.Info()/Error()
etc.); update the imports by removing "flag" and "fmt" and adding
"github.com/spf13/cobra" and "github.com/rs/zerolog" (and
"github.com/rs/zerolog/log" if you prefer the global logger), and ensure you
call rootCmd.Execute() from main to run the command.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@cmd/apex-loadtest/main.go`:
- Around line 176-186: The request loop currently increments errCount multiple
times for a single request (on io.Copy error, resp.Body.Close error, and non-200
status). Change this to track a single-per-request error flag (e.g., hadErr)
inside the loop: set hadErr=true on any of the failures (io.Copy,
resp.Body.Close, or non-200 resp.StatusCode) and after handling the response
increment errCount once if hadErr is true; keep total.Add(1) unchanged. Update
references to errCount, total, io.Copy(resp.Body, ...), resp.Body.Close(), and
resp.StatusCode accordingly so no request can add more than one error to
errCount.

---

Nitpick comments:
In `@cmd/apex-loadtest/main.go`:
- Around line 16-17: The CLI currently uses the standard flag and fmt in package
main (main function) — replace that with a spf13/cobra-based command (create a
rootCmd and move flag parsing and run logic into its RunE) and swap all
fmt.Printf/fmt.Fprintf usage for an rs/zerolog Logger (inject or create a
zerolog.Logger in main and use logger.Info()/Error() etc.); update the imports
by removing "flag" and "fmt" and adding "github.com/spf13/cobra" and
"github.com/rs/zerolog" (and "github.com/rs/zerolog/log" if you prefer the
global logger), and ensure you call rootCmd.Execute() from main to run the
command.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 91e9b522-6e1d-4b52-b95a-c34db3923adb

📥 Commits

Reviewing files that changed from the base of the PR and between 8469aa7 and 659911e.

📒 Files selected for processing (4)
  • .golangci.yml
  • cmd/apex-loadtest/main.go
  • cmd/apex/client.go
  • config/config.go
✅ Files skipped from review due to trivial changes (1)
  • config/config.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • .golangci.yml

@tac0turtle tac0turtle merged commit 925677d into main Mar 13, 2026
4 checks passed
@tac0turtle tac0turtle deleted the marko/cleanup branch March 13, 2026 10:40