Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings, or use the review commands and checkboxes below to manage reviews.
📝 Walkthrough

Threads conditional OpenTelemetry tracing through async task handlers, HTTP middleware, operator handlers, and the event reconciler; passes config into constructors where needed, updates tests to assert tracing behavior, and promotes OTEL modules to direct go.mod requirements.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant HTTP as "HTTP Server"
    participant Tracer as "OpenTelemetry"
    participant Handler as "App Handler"
    participant Operator as "TenantOperator"
    participant DB
    Client->>HTTP: HTTP request (e.g., GET /healthz)
    HTTP->>Tracer: start server span (TracingMiddleware)
    HTTP->>Handler: call handler with traced context
    Handler->>Operator: invoke operator handler
    Operator->>Tracer: start operator span
    Operator->>DB: query/update tenant data
    DB-->>Operator: DB result
    Operator->>Tracer: end operator span
    Handler-->>HTTP: response
    HTTP->>Tracer: end server span
    HTTP-->>Client: HTTP response
```

```mermaid
sequenceDiagram
    participant Asynq as "Asynq Worker"
    participant Tracer as "OpenTelemetry"
    participant TaskHandler as "Task Handler"
    participant DB
    Asynq->>Asynq: Receive task
    Asynq->>TaskHandler: call wrapped handler(ctx, task)
    TaskHandler->>Tracer: start task span
    TaskHandler->>DB: perform task DB operations
    DB-->>TaskHandler: DB result
    TaskHandler->>Tracer: end task span
    TaskHandler-->>Asynq: task completed
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (2 warnings)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.
🧹 Nitpick comments (4)
internal/operator/operator.go (1)
87-126: Consider adding nil validation for the `cfg` parameter.

The constructor validates `db`, `operatorTarget.Client`, and `clientsFactory` but does not validate `cfg`. If `cfg` is nil, the `trace()` method at line 499 will panic when accessing `o.cfg.Telemetry.Traces.Enabled`.

🛡️ Proposed fix to add cfg validation
```diff
 func NewTenantOperator(
 	db *multitenancy.DB,
 	cfg *config.Config,
 	operatorTarget orbital.TargetOperator,
 	clientsFactory clients.Factory,
 	tenantManager manager.Tenant,
 	groupManager *manager.GroupManager,
 ) (*TenantOperator, error) {
 	if db == nil {
 		return nil, oops.Errorf("db is nil")
 	}
+
+	if cfg == nil {
+		return nil, oops.Errorf("cfg is nil")
+	}

 	if operatorTarget.Client == nil {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/operator/operator.go` around lines 87 - 126: add a nil-check for the cfg parameter inside NewTenantOperator and return a descriptive error if nil; specifically, verify cfg != nil before constructing the TenantOperator so later calls like trace() (which accesses o.cfg.Telemetry.Traces.Enabled) cannot panic. Update NewTenantOperator to validate cfg and fail fast with an oops.Errorf("config is nil") (or similar) so TenantOperator and its trace() method are only created when cfg is present.

internal/middleware/tracing.go (1)
21-23: Consider span name normalization for better trace aggregation.

Using `r.URL.Path` directly as the span name may result in high-cardinality spans when paths contain dynamic segments (e.g., `/tenants/abc123/keys/xyz789`). This can make trace aggregation and analysis more difficult.

For a future enhancement, consider normalizing paths to route patterns (e.g., `/tenants/{tenantId}/keys/{keyId}`), potentially using the OpenAPI spec already loaded in this codebase.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/middleware/tracing.go` around lines 21 - 23: replace the raw request path used as the span name (currently passed as r.URL.Path into tracer.Start inside the anonymous http.HandlerFunc) with a normalized route pattern; implement a small helper (e.g., normalizeSpanName or use the existing OpenAPI route matcher) to map dynamic paths like /tenants/abc123/keys/xyz789 to templates such as /tenants/{tenantId}/keys/{keyId}, then call tracer.Start(r.Context(), normalizedName) and keep defer span.End() as-is so spans aggregate by route instead of high-cardinality IDs.

internal/async/middlewares.go (2)
14-14: Minor inconsistency: config passed by value instead of pointer.

This middleware accepts `config.Config` by value, while the HTTP tracing middleware at `internal/middleware/tracing.go:11` accepts `*config.Config` (pointer). For consistency and to avoid copying the entire config struct, consider using a pointer.

♻️ Proposed fix for consistency

```diff
-func TracingMiddleware(cfg config.Config) func(next asynq.HandlerFunc) asynq.HandlerFunc {
+func TracingMiddleware(cfg *config.Config) func(next asynq.HandlerFunc) asynq.HandlerFunc {
```

Note: This would also require updating the call site in `internal/async/async.go`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/async/middlewares.go` at line 14, The TracingMiddleware function currently takes config.Config by value; change its signature to accept *config.Config (i.e., func TracingMiddleware(cfg *config.Config) func(next asynq.HandlerFunc) asynq.HandlerFunc) to avoid copying and match the HTTP tracing middleware's pointer usage, then update the call site(s) (e.g., where TracingMiddleware is invoked in async.go) to pass a pointer to the shared config instance; ensure any nil checks or usages inside TracingMiddleware are updated accordingly.
21-29: Consider moving tracer initialization outside the handler for efficiency.

The tracer is currently created on every task invocation (line 23). While `otel.Tracer()` caches tracers internally so the impact is minimal, moving the tracer creation outside the innermost function (similar to the HTTP middleware pattern) would be more efficient and consistent.

♻️ Proposed optimization

```diff
+	tracer := otel.Tracer(cfg.Application.Name)
+
 	return func(next asynq.HandlerFunc) asynq.HandlerFunc {
 		return func(ctx context.Context, task *asynq.Task) error {
-			tracer := otel.Tracer(cfg.Application.Name)
 			ctx, span := tracer.Start(ctx, task.Type())
 			defer span.End()

 			return next.ProcessTask(ctx, task)
 		}
 	}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/async/middlewares.go` around lines 21 - 29, Move the otel.Tracer creation out of the per-task inner handler to avoid recreating it on every invocation: in the outer closure (the function that takes next asynq.HandlerFunc and returns asynq.HandlerFunc) call tracer := otel.Tracer(cfg.Application.Name) once, then in the returned inner function continue to call ctx, span := tracer.Start(ctx, task.Type()) and defer span.End(); keep the call to next.ProcessTask(ctx, task) unchanged. Ensure you reference the existing asynq.HandlerFunc closure, the tracer variable, tracer.Start(ctx, task.Type()), and next.ProcessTask so the change is localized and behavior is preserved.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5693f835-8bb0-4afb-9701-151f31d6f2a0
📒 Files selected for processing (8)
- cmd/tenant-manager/main.go
- internal/async/async.go
- internal/async/middlewares.go
- internal/daemon/server.go
- internal/event-processor/reconciler.go
- internal/middleware/tracing.go
- internal/operator/operator.go
- internal/operator/operator_test.go
```go
func getMiddlewares(cfg conf.Config) []Middleware {
	middlewares := []Middleware{
		TracingMiddleware(cfg),
	}

	return middlewares
}
```
I like this approach :) We should in the future also add a Logging MW and a Panic one
Force-pushed from e90d652 to 3ffc913 (Compare)
🧹 Nitpick comments (4)
internal/event-processor/reconciler.go (1)
276-290: Record errors on the span for better observability.

When the wrapped function returns an error, the span should capture it. Currently, failed operations will appear as successful spans in tracing backends, making it harder to identify and debug issues.
📊 Proposed fix to record errors on span
```diff
 import (
 	...
+	"go.opentelemetry.io/otel/codes"
 	"go.opentelemetry.io/otel/trace"
 	...
 )

 func (c *CryptoReconciler) trace(
 	ctx context.Context,
 	job orbital.Job,
 	actionName string,
 	f func(ctx context.Context, job orbital.Job) error,
 ) error {
 	if c.tracer == nil {
 		return f(ctx, job)
 	}

 	ctx, span := (*c.tracer).Start(ctx, actionName+":"+job.Type)
 	defer span.End()

-	return f(ctx, job)
+	err := f(ctx, job)
+	if err != nil {
+		span.RecordError(err)
+		span.SetStatus(codes.Error, err.Error())
+	}
+	return err
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/event-processor/reconciler.go` around lines 276 - 290: the trace method on CryptoReconciler should record and mark spans as errored when the wrapped function returns an error: after calling f(ctx, job) inside trace, check the returned error and, if non-nil, call the span's error recording and status-setting API (e.g., span.RecordError(err) and span.SetStatus(codes.Error, err.Error()) or the equivalent on your tracer), then return the error; keep the existing defer span.End(). Update the function named trace in CryptoReconciler to perform this error-check/recording using the tracer span obtained from (*c.tracer).Start.

internal/middleware/tracing.go (1)
23-23: Use route patterns instead of raw paths for span names to avoid high cardinality.

Routes like `/keys/{keyID}` and `/groups/{groupID}` will generate high-cardinality spans when using `r.URL.Path` directly (e.g., `/keys/abc-123` and `/keys/xyz-789` become separate spans). This overwhelms tracing backends and breaks aggregation.

Since you're on Go 1.25.6, use `r.Pattern` (the matched-pattern field the standard `http.ServeMux` populates on the request since Go 1.23; note that `http.RouteContext` does not exist in the standard library — it is a chi helper) to get the matched pattern:

```go
ctx, span := tracer.Start(r.Context(), r.Pattern, trace.WithSpanKind(trace.SpanKindServer))
```

This works with your custom `ServeMux` since it delegates to the standard `http.ServeMux`, which records the matched pattern on the request.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/middleware/tracing.go` at line 23: the span name currently uses the raw request path (r.URL.Path) in the tracer.Start call, which produces high-cardinality spans; replace it with the matched route pattern available on the request (r.Pattern, populated by the standard http.ServeMux) so span names use route templates (e.g., /keys/{keyID}) instead of concrete paths — update the tracer.Start invocation where ctx, span := tracer.Start(r.Context(), r.URL.Path, ...) to use r.Pattern and keep trace.WithSpanKind(trace.SpanKindServer) unchanged.

internal/async/middlewares_test.go (1)
21-29: Tighten this test around the actual middleware contract.

Right now it only proves that an enabled wrapper can finish a span. It would still pass if the middleware stopped calling `next`, and it does not lock down the disabled path from the feature flag.

✅ Suggested additions

```diff
-type MockTaskHandler struct{}
+type MockTaskHandler struct {
+	called int
+}

 func (h *MockTaskHandler) ProcessTask(ctx context.Context, _ *asynq.Task) error {
+	h.called++
 	log.Info(ctx, "Processing mock task")
 	return nil
 }
@@
 	err := wrappedProcessTask(t.Context(), asynq.NewTask(handler.TaskType(), nil))
 	assert.NoError(t, err)
+	assert.Equal(t, 1, handler.called)
```

Please also add a second test with `Telemetry.Traces.Enabled = false` and assert that no spans are recorded.

Also applies to: 32-70
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/async/middlewares_test.go` around lines 21 - 29: tighten the test by (1) asserting the middleware actually calls the next handler: use MockTaskHandler (ProcessTask / TaskType) with a side-effect (e.g., set a boolean or increment a counter) and assert that side-effect occurred after invoking the middleware so the test fails if next is not called, and (2) add a second test that sets Telemetry.Traces.Enabled = false and asserts no spans are recorded by your test tracer (verify span count is zero) while also asserting the MockTaskHandler still ran; reference MockTaskHandler.ProcessTask and Telemetry.Traces.Enabled to locate the middleware usage and tracer assertions.

internal/async/async.go (1)
139-140: Reverse this loop if `a.middlewares` is meant to express execution order.

As written, the last middleware in the slice becomes the outermost wrapper. That is invisible with only `TracingMiddleware`, but it will flip the order once logging/panic middleware are added.

♻️ Suggested change

```diff
-	for _, mw := range a.middlewares {
-		h = mw(h)
-	}
+	for i := len(a.middlewares) - 1; i >= 0; i-- {
+		h = a.middlewares[i](h)
+	}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/async/async.go` around lines 139 - 140, The middleware application loop currently iterates forward so the last middleware becomes the outermost wrapper; reverse the loop over a.middlewares so the slice order represents execution order. In the function that builds the handler (the loop using a.middlewares and variable h), iterate from len(a.middlewares)-1 down to 0 and apply each middleware as h = a.middlewares[i](h) so the first slice element is the outermost and middleware execute in the declared order.
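The wrap-order concern behind this suggestion is easy to see in a toy example (the `label` and `chain` helpers below are illustrative, not from the codebase): applying middlewares in reverse makes slice order equal execution order.

```go
package main

import "fmt"

type Handler func() string

type Middleware func(Handler) Handler

// label wraps a handler so the wrapper's name appears outside the inner
// result, making the wrapping order visible in the output.
func label(name string) Middleware {
	return func(next Handler) Handler {
		return func() string { return name + "(" + next() + ")" }
	}
}

// chain applies middlewares back-to-front, so the first slice element
// becomes the outermost wrapper: slice order == execution order.
func chain(h Handler, mws []Middleware) Handler {
	for i := len(mws) - 1; i >= 0; i-- {
		h = mws[i](h)
	}
	return h
}

func main() {
	h := chain(
		func() string { return "task" },
		[]Middleware{label("tracing"), label("logging")},
	)
	fmt.Println(h()) // tracing(logging(task))
}
```

With the original forward loop, the same slice would instead produce `logging(tracing(task))`, silently inverting the declared order.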
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b4363495-ddd1-4016-8a17-881fab62a84e
📒 Files selected for processing (12)
- cmd/tenant-manager/main.go
- go.mod
- internal/async/async.go
- internal/async/middlewares.go
- internal/async/middlewares_test.go
- internal/daemon/server.go
- internal/event-processor/reconciler.go
- internal/middleware/tracing.go
- internal/middleware/tracing_test.go
- internal/operator/export_test.go
- internal/operator/operator.go
- internal/operator/operator_test.go
✅ Files skipped from review due to trivial changes (1)
- go.mod
🚧 Files skipped from review as they are similar to previous changes (3)
- internal/daemon/server.go
- internal/async/middlewares.go
- internal/operator/operator.go
Force-pushed from 3710942 to d828b44 (Compare)
🧹 Nitpick comments (4)
internal/async/middlewares_test.go (1)
32-70: Add a disabled-tracing test case for the conditional branch

Lines 32-70 validate only `Telemetry.Traces.Enabled=true`. Since `internal/async/middlewares.go` has a disabled early-return branch, add one test for `Enabled=false` and assert no span is recorded.

Proposed test addition

```diff
 func TestTracingMiddleware(t *testing.T) {
 @@
 	assert.Equal(t, trace.SpanKindInternal, spans[0].SpanKind())
 }
+
+func TestTracingMiddleware_Disabled(t *testing.T) {
+	cfgWithoutTracing := &config.Config{
+		BaseConfig: commoncfg.BaseConfig{
+			Application: commoncfg.Application{Name: "test-app"},
+			Telemetry: commoncfg.Telemetry{
+				Traces: commoncfg.Trace{Enabled: false},
+			},
+		},
+	}
+
+	recorder := tracetest.NewSpanRecorder()
+	tp := sdktrace.NewTracerProvider()
+	tp.RegisterSpanProcessor(recorder)
+
+	original := otel.GetTracerProvider()
+	otel.SetTracerProvider(tp)
+	t.Cleanup(func() {
+		otel.SetTracerProvider(original)
+		_ = tp.Shutdown(t.Context())
+	})
+
+	mw := async.TracingMiddleware(*cfgWithoutTracing)
+	handler := &MockTaskHandler{}
+	err := mw(handler.ProcessTask)(t.Context(), asynq.NewTask(handler.TaskType(), nil))
+	assert.NoError(t, err)
+	assert.Len(t, recorder.Ended(), 0)
+}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/async/middlewares_test.go` around lines 32 - 70: add a new test that covers the disabled-tracing branch: create a config where Telemetry.Traces.Enabled is false, set up a tracetest.NewSpanRecorder and a TracerProvider (register the recorder), swap it into otel like in TestTracingMiddleware, call mw := async.TracingMiddleware(*cfgWithoutTracing) and wrap handler.ProcessTask, invoke the wrapped function, then assert no spans were recorded (recorder.Ended() length is 0) and clean up by restoring the original tracer provider and shutting down tp; reference TracingMiddleware, MockTaskHandler, and recorder to locate where to mirror the existing enabled test.

internal/event-processor/reconciler.go (1)
84-84: Consider storing the interface value directly instead of a pointer to interface.

Storing `tracer` as `*trace.Tracer` (pointer to interface) is functional but non-idiomatic in Go. Interface values already contain a pointer internally, so `trace.Tracer` (the interface itself) would suffice. You can use a sentinel or check `tracer != nil` after initialization to determine if tracing is enabled.

♻️ Suggested refactor

```diff
 type CryptoReconciler struct {
 	repo          repo.Repo
 	manager       *orbital.Manager
 	targets       map[string]struct{}
 	initiators    []orbital.Initiator
 	svcRegistry   *cmkpluginregistry.Registry
 	jobHandlerMap map[JobType]JobHandler
-	tracer        *trace.Tracer
+	tracer        trace.Tracer
 }
```

Then in `getTracer`:

```diff
-func getTracer(cfg *config.Config) *trace.Tracer {
+func getTracer(cfg *config.Config) trace.Tracer {
 	if !cfg.Telemetry.Traces.Enabled {
 		return nil
 	}
-
-	tracer := otel.Tracer(cfg.Application.Name)
-
-	return &tracer
+	return otel.Tracer(cfg.Application.Name)
 }
```

And in `trace`:

```diff
 func (c *CryptoReconciler) trace(...) error {
 	if c.tracer == nil {
 		return f(ctx, job)
 	}
-	ctx, span := (*c.tracer).Start(ctx, actionName+":"+job.Type)
+	ctx, span := c.tracer.Start(ctx, actionName+":"+job.Type)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/event-processor/reconciler.go` at line 84: the struct currently stores tracer as a pointer to an interface (tracer *trace.Tracer); change the field to the interface type (tracer trace.Tracer) and update all uses in getTracer and trace to treat it as an interface (check tracer != nil instead of dereferencing, assign the interface directly rather than taking its address, and remove any * or & operators around tracer). Ensure initialization, nil checks, and return types in getTracer and trace are adjusted accordingly so you no longer use a pointer-to-interface.

internal/event-processor/reconciler_test.go (2)
1013-1013: Minor: Test name spelling inconsistency.

The test name uses "JobCancelled" (double 'l') but the function and span name use "JobCanceled" (single 'l'). Consider aligning for consistency.

```diff
-	t.Run("should add trace with correct attributes for JobCancelled", func(t *testing.T) {
+	t.Run("should add trace with correct attributes for JobCanceled", func(t *testing.T) {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/event-processor/reconciler_test.go` at line 1013, The test name string "should add trace with correct attributes for JobCancelled" is inconsistent with the code that uses "JobCanceled" (single 'l') for the function/span; update the test name to use "JobCanceled" to match the existing function/span names (or alternatively rename the function/span to "JobCancelled" if you prefer British spelling) so the t.Run description, the handler under test (e.g., the JobCanceled handler/function), and the span name are all identical; ensure any assertions that check span.Name or related attributes use the same spelling.
657-660: Attribute index assertions may be fragile.

Asserting attributes by positional index (`span.Attributes()[0]`, `span.Attributes()[1]`) assumes a specific ordering that the OpenTelemetry SDK doesn't guarantee. If the implementation changes the order of `SetAttributes` calls, or if OTel changes internal ordering, these tests will fail unexpectedly.

Consider using a helper to find attributes by key:
♻️ Suggested approach
```go
// Helper function to find attribute by key
func findAttribute(attrs []attribute.KeyValue, key string) (attribute.KeyValue, bool) {
	for _, attr := range attrs {
		if string(attr.Key) == key {
			return attr, true
		}
	}
	return attribute.KeyValue{}, false
}

// Then in tests:
jobIDAttr, found := findAttribute(span.Attributes(), "job.id")
assert.True(t, found)
assert.Equal(t, j.ID.String(), jobIDAttr.Value.AsString())

jobTypeAttr, found := findAttribute(span.Attributes(), "job.type")
assert.True(t, found)
assert.Equal(t, jobType, jobTypeAttr.Value.AsString())
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/event-processor/reconciler_test.go` around lines 657 - 660, The test is brittle because it asserts span attributes by index; change the assertions to search attributes by key instead. Add a small helper like findAttribute(attrs []attribute.KeyValue, key string) (attribute.KeyValue, bool) and use it in reconciler_test.go to locate "job.id" and "job.type" in span.Attributes() (e.g., jobIDAttr, found := findAttribute(span.Attributes(), "job.id")), assert found is true, then compare j.ID.String() to jobIDAttr.Value.AsString(), and similarly for "job.type" and jobType; update the existing indexed assertions to use this key-based lookup.
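The pointer-to-interface nitpick above can be shown with a stripped-down analogue (the `Greeter` and `svc` types below are illustrative stand-ins for `trace.Tracer` and the reconciler, not code from this repository): an interface-typed field can itself serve as the nil sentinel, so no extra indirection is needed.

```go
package main

import "fmt"

// Greeter stands in for trace.Tracer: an interface value can be
// nil-checked directly, without wrapping it in a pointer.
type Greeter interface{ Greet() string }

type loud struct{}

func (loud) Greet() string { return "HI" }

type svc struct {
	g Greeter // interface value, not *Greeter
}

func (s svc) hello() string {
	if s.g == nil { // sentinel: feature disabled
		return "disabled"
	}
	return s.g.Greet()
}

func main() {
	fmt.Println(svc{}.hello())           // zero value: nil interface -> disabled
	fmt.Println(svc{g: loud{}}.hello())  // populated interface -> HI
}
```

This mirrors the suggested refactor: `tracer trace.Tracer` with a `c.tracer == nil` check replaces `*trace.Tracer` and the `(*c.tracer).Start` dereference.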
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 56ff7aa9-41ed-4167-a6c8-f0d879041c39
📒 Files selected for processing (14)
- cmd/tenant-manager/main.go
- go.mod
- internal/async/async.go
- internal/async/middlewares.go
- internal/async/middlewares_test.go
- internal/daemon/server.go
- internal/event-processor/export_test.go
- internal/event-processor/reconciler.go
- internal/event-processor/reconciler_test.go
- internal/middleware/tracing.go
- internal/middleware/tracing_test.go
- internal/operator/export_test.go
- internal/operator/operator.go
- internal/operator/operator_test.go
✅ Files skipped from review due to trivial changes (3)
- internal/async/async.go
- go.mod
- internal/middleware/tracing_test.go
🚧 Files skipped from review as they are similar to previous changes (5)
- internal/middleware/tracing.go
- internal/operator/export_test.go
- internal/async/middlewares.go
- internal/operator/operator.go
- internal/operator/operator_test.go
Force-pushed from d828b44 to 703c0aa (Compare)
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
internal/operator/operator.go (1)
87-125: ⚠️ Potential issue | 🟠 Major

Reject `nil` config in `NewTenantOperator`.

`trace()` dereferences `o.cfg` during handler registration, so the operator can currently be constructed successfully and then panic on startup if a caller passes `nil` here.

🐛 Proposed fix

```diff
 func NewTenantOperator(
 	db *multitenancy.DB,
 	cfg *config.Config,
 	operatorTarget orbital.TargetOperator,
 	clientsFactory clients.Factory,
 	tenantManager manager.Tenant,
 	groupManager *manager.GroupManager,
 ) (*TenantOperator, error) {
 	if db == nil {
 		return nil, oops.Errorf("db is nil")
 	}
+	if cfg == nil {
+		return nil, oops.Errorf("config is nil")
+	}

 	if operatorTarget.Client == nil {
 		return nil, oops.Errorf("operator target client is nil")
 	}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/operator/operator.go` around lines 87 - 125: NewTenantOperator currently allows cfg to be nil, which leads to a panic later when trace() dereferences o.cfg; update NewTenantOperator to validate cfg is non-nil and return an error if it is nil. Specifically, add a nil check for the cfg parameter at the start of NewTenantOperator (alongside existing db/operatorTarget/clientsFactory checks) and return an appropriate oops.Errorf message; this prevents constructing a TenantOperator with a nil cfg that would cause trace() or other methods to dereference o.cfg and panic.

internal/event-processor/reconciler.go (1)
214-228: ⚠️ Potential issue | 🟠 Major

Cancellation paths are traced as success.

`trace()` only knows about `error`, but these call sites can stop processing without one. `ConfirmJob` can return `orbital.CancelJobConfirmer(...)` with a `nil` error, and `resolveTasks` can convert an empty task list into `CancelTaskResolver(...)` after the span has already ended. Those cases are cancellations from Orbital's point of view, but the span stays successful.

Also applies to: 239-245, 278-302
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/async/async.go`:
- Around line 236-241: getMiddlewares enables TracingMiddleware(cfg) but the
middleware currently ends the span before calling next.ProcessTask so task
errors aren't reflected; update the TracingMiddleware implementation (the
middleware returned by TracingMiddleware) to call next.ProcessTask(ctx, task)
first, capture the returned error, then set the span status/attributes and
record the exception on non-nil error (e.g., set status to error, add
error.message / task metadata), and only then end the span so failures and
retries are visible in traces.
In `@internal/async/middlewares_test.go`:
- Around line 94-96: Replace the length-zero assertion with an emptiness check:
in the test where you call mw(handler.ProcessTask)(t.Context(),
asynq.NewTask(handler.TaskType(), nil)) and assert.NoError(t, err), change the
assert.Len(t, recorder.Ended(), 0) call to assert.Empty(t, recorder.Ended()) so
the testifylint empty checker is satisfied; keep the same t and recorder.Ended()
arguments and update only the assertion helper.
In `@internal/operator/operator_test.go`:
- Around line 1457-1495: The test constructs the operator with nil tenantManager
and groupManager but then exercises ACTION_ACTION_PROVISION_TENANT which
triggers o.tm.CreateTenant / o.gm.CreateDefaultGroups; fix by providing non-nil
test doubles or real managers to NewTenantOperator so
CreateTenant/CreateDefaultGroups are safe: replace the nil tenantManager and
groupManager arguments passed into operator.NewTenantOperator with appropriate
mocks or a seeded test tenantManager and groupManager (or a no-op
implementation) so RunOperator and the provisioning path invoked by
ACTION_ACTION_PROVISION_TENANT do not panic and behave deterministically.
In `@internal/operator/operator.go`:
- Around line 495-509: The trace wrapper currently ends spans regardless of
handler outcome; update TenantOperator.trace to inspect the
orbital.HandlerResponse after calling next and mark the span as error when the
handler signaled failure or scheduled a retry: after next(ctx, request,
response) check the response for failure (use whatever API the type exposes,
e.g., response.IsFailed()/response.Failed()/response.Status != success or
response.Error()/response.FailureReason()), and if failed call
span.RecordError(errOrNewErrorWithMessage) and span.SetStatus(codes.Error,
message) (import go.opentelemetry.io/otel/codes), optionally add
span.SetAttributes(semconv or a custom attribute like "handler.failed"=true)
before span.End() so failed outcomes produce error spans instead of successful
ones.
---
Outside diff comments:
In `@internal/operator/operator.go`:
- Around line 87-125: NewTenantOperator currently allows cfg to be nil which
leads to a panic later when trace() dereferences o.cfg; update NewTenantOperator
to validate cfg is non-nil and return an error if it is nil. Specifically, add a
nil check for the cfg parameter at the start of NewTenantOperator (alongside
existing db/operatorTarget/clientsFactory checks) and return an appropriate
oops.Errorf message; this prevents constructing a TenantOperator with a nil cfg
that would cause trace() or other methods to dereference o.cfg and panic.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 737e54f8-1555-4c3f-8101-3762fb7e58da
📒 Files selected for processing (14)
- cmd/tenant-manager/main.go
- go.mod
- internal/async/async.go
- internal/async/middlewares.go
- internal/async/middlewares_test.go
- internal/daemon/server.go
- internal/event-processor/export_test.go
- internal/event-processor/reconciler.go
- internal/event-processor/reconciler_test.go
- internal/middleware/tracing.go
- internal/middleware/tracing_test.go
- internal/operator/export_test.go
- internal/operator/operator.go
- internal/operator/operator_test.go
✅ Files skipped from review due to trivial changes (3)
- internal/daemon/server.go
- internal/operator/export_test.go
- internal/middleware/tracing_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
- cmd/tenant-manager/main.go
- internal/middleware/tracing.go
- internal/async/middlewares.go
```go
op, err := operator.NewTenantOperator(dbConn, cfg, target, clientFactory, nil, nil)
require.NoError(t, err)

validData, err := createValidTenantData(tenantID, RegionUSWest1, tenants[0])
assert.NoError(t, err)

go func() {
	err = op.RunOperator(t.Context())
	assert.NoError(t, err)
}()

taskReq := orbital.TaskRequest{
	TaskID: uuid.New(),
	Type:   tenantgrpc.ACTION_ACTION_PROVISION_TENANT.String(),
	Data:   validData,
}

recorder := tracetest.NewSpanRecorder()
tp := sdktrace.NewTracerProvider()
tp.RegisterSpanProcessor(recorder)

original := otel.GetTracerProvider()
otel.SetTracerProvider(tp)

t.Cleanup(func() {
	otel.SetTracerProvider(original)
	_ = tp.Shutdown(t.Context())
})

responder.NewRequest(taskReq)
taskResp := responder.NewResponse()
assert.Equal(t, taskReq.TaskID, taskResp.TaskID)
assert.Equal(t, string(orbital.TaskStatusProcessing), taskResp.Status)

spans := recorder.Ended()
assert.Len(t, spans, 1)
span := spans[0]
assert.Equal(t, tenantgrpc.ACTION_ACTION_PROVISION_TENANT.String(), span.Name())
}
```
TestTenantOperatorTracing is provisioning with nil managers.
This test drives `ACTION_ACTION_PROVISION_TENANT`, but the operator is constructed with `nil` `tenantManager` and `groupManager`. The expected `PROCESSING` response suggests it is hitting the non-complete provisioning path, which is the same path that calls `o.tm.CreateTenant(...)` / `o.gm.CreateDefaultGroups(...)`. That makes this test either panic-prone or dependent on very specific seeded DB state.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/operator/operator_test.go` around lines 1457 - 1495, The test
constructs the operator with nil tenantManager and groupManager but then
exercises ACTION_ACTION_PROVISION_TENANT which triggers o.tm.CreateTenant /
o.gm.CreateDefaultGroups; fix by providing non-nil test doubles or real managers
to NewTenantOperator so CreateTenant/CreateDefaultGroups are safe: replace the
nil tenantManager and groupManager arguments passed into
operator.NewTenantOperator with appropriate mocks or a seeded test tenantManager
and groupManager (or a no-op implementation) so RunOperator and the provisioning
path invoked by ACTION_ACTION_PROVISION_TENANT do not panic and behave
deterministically.
```go
func (o *TenantOperator) trace(
	next orbital.HandlerFunc,
	name string,
) orbital.HandlerFunc {
	if !o.cfg.Telemetry.Traces.Enabled {
		return next
	}

	return func(ctx context.Context, request orbital.HandlerRequest, response *orbital.HandlerResponse) {
		tracer := otel.Tracer(o.cfg.Application.Name)
		ctx, span := tracer.Start(ctx, name)
		defer span.End()

		next(ctx, request, response)
	}
}
```
Failed task outcomes still produce successful spans.
`orbital.HandlerFunc` does not return an error, and this wrapper never inspects `response` after `next` runs. Any path that calls `response.Fail(...)` or schedules a retry will therefore end the span as success, which makes the new traces misleading on the exact paths we need for debugging.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/operator/operator.go` around lines 495 - 509, The trace wrapper
currently ends spans regardless of handler outcome; update TenantOperator.trace
to inspect the orbital.HandlerResponse after calling next and mark the span as
error when the handler signaled failure or scheduled a retry: after next(ctx,
request, response) check the response for failure (use whatever API the type
exposes, e.g., response.IsFailed()/response.Failed()/response.Status != success
or response.Error()/response.FailureReason()), and if failed call
span.RecordError(errOrNewErrorWithMessage) and span.SetStatus(codes.Error,
message) (import go.opentelemetry.io/otel/codes), optionally add
span.SetAttributes(semconv or a custom attribute like "handler.failed"=true)
before span.End() so failed outcomes produce error spans instead of successful
ones.
🧹 Nitpick comments (2)
internal/operator/operator.go (1)
87-126: Consider adding nil check for `cfg` parameter.
The constructor validates `db`, `operatorTarget.Client`, and `clientsFactory`, but not `cfg`. While the call site in `main.go` ensures `cfg` is non-nil, adding a defensive check would be consistent with the other parameter validations and prevent potential nil-pointer panics if called from other locations.
🛡️ Proposed fix

```diff
 func NewTenantOperator(
 	db *multitenancy.DB,
 	cfg *config.Config,
 	operatorTarget orbital.TargetOperator,
 	clientsFactory clients.Factory,
 	tenantManager manager.Tenant,
 	groupManager *manager.GroupManager,
 ) (*TenantOperator, error) {
 	if db == nil {
 		return nil, oops.Errorf("db is nil")
 	}
+
+	if cfg == nil {
+		return nil, oops.Errorf("cfg is nil")
+	}
 	if operatorTarget.Client == nil {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/operator/operator.go` around lines 87 - 126, NewTenantOperator lacks a nil check for the cfg parameter which can cause a panic; add a check like `if cfg == nil { return nil, oops.Errorf("config is nil") }` (use the same error style) near the other parameter validations (e.g., after the db nil check) so NewTenantOperator returns an error when cfg is nil instead of dereferencing it later.

internal/async/middlewares_test.go (1)
35-37: Consider using `WithSpanProcessor` option for cleaner initialization.
Registering the span processor during provider creation is more idiomatic with the OTel SDK.
♻️ Proposed refactor

```diff
 recorder := tracetest.NewSpanRecorder()
-tp := sdktrace.NewTracerProvider()
-tp.RegisterSpanProcessor(recorder)
+tp := sdktrace.NewTracerProvider(sdktrace.WithSpanProcessor(recorder))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/async/middlewares_test.go` around lines 35 - 37, Replace the two-step tracer provider setup that calls sdktrace.NewTracerProvider() followed by tp.RegisterSpanProcessor(recorder) with the idiomatic single-call initialization using the WithSpanProcessor option: create the recorder via tracetest.NewSpanRecorder(), then pass it into sdktrace.NewTracerProvider(sdktrace.WithSpanProcessor(recorder)) so the span processor is registered at provider creation (references: tracetest.NewSpanRecorder, sdktrace.NewTracerProvider, RegisterSpanProcessor, sdktrace.WithSpanProcessor).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@internal/async/middlewares_test.go`:
- Around line 35-37: Replace the two-step tracer provider setup that calls
sdktrace.NewTracerProvider() followed by tp.RegisterSpanProcessor(recorder) with
the idiomatic single-call initialization using the WithSpanProcessor option:
create the recorder via tracetest.NewSpanRecorder(), then pass it into
sdktrace.NewTracerProvider(sdktrace.WithSpanProcessor(recorder)) so the span
processor is registered at provider creation (references:
tracetest.NewSpanRecorder, sdktrace.NewTracerProvider, RegisterSpanProcessor,
sdktrace.WithSpanProcessor).
In `@internal/operator/operator.go`:
- Around line 87-126: NewTenantOperator lacks a nil check for the cfg parameter
which can cause a panic; add a check like `if cfg == nil { return nil,
oops.Errorf("config is nil") }` (use the same error style) near the other
parameter validations (e.g., after the db nil check) so NewTenantOperator
returns an error when cfg is nil instead of dereferencing it later.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9278b76b-8275-4558-a00e-4e5eddaf99b1
📒 Files selected for processing (14)
- cmd/tenant-manager/main.go
- go.mod
- internal/async/async.go
- internal/async/middlewares.go
- internal/async/middlewares_test.go
- internal/daemon/server.go
- internal/event-processor/export_test.go
- internal/event-processor/reconciler.go
- internal/event-processor/reconciler_test.go
- internal/middleware/tracing.go
- internal/middleware/tracing_test.go
- internal/operator/export_test.go
- internal/operator/operator.go
- internal/operator/operator_test.go
✅ Files skipped from review due to trivial changes (5)
- internal/daemon/server.go
- internal/middleware/tracing.go
- go.mod
- internal/middleware/tracing_test.go
- internal/event-processor/reconciler_test.go
🚧 Files skipped from review as they are similar to previous changes (5)
- internal/event-processor/export_test.go
- internal/operator/export_test.go
- internal/async/middlewares.go
- internal/async/async.go
- internal/operator/operator_test.go
Summary by CodeRabbit
New Features
Tests