
refactor: migrate logger calls to scoped context patterns #3067

Merged
amikofalvy merged 11 commits into main from feat/logger-migration
Apr 8, 2026

Conversation

@amikofalvy
Collaborator

Summary

  • Broad migration of logger calls across the codebase to use the scoped logger context introduced in #3054 (feat: add scoped logger context via AsyncLocalStorage)
  • Remove repeated ambient fields (tenantId, projectId, agentId) that are now automatically provided by ALS middleware
  • Add runWithLogContext wrappers for operation-level context (triggerId, invocationId, sessionId, etc.)
  • Convert logger.info({}, 'msg') to logger.info('msg') string-only form
  • Apply class member .with() pattern where classes repeat constructor fields

Migration plan

See specs/2026-04-06-logger-scoped-context/MIGRATION-PLAN.md for the full phased plan.

Phases:

  1. Run domain — workflow steps, execution handler, agents, tools, streams
  2. Manage domain — route handlers
  3. Evals domain — services, workflow functions
  4. Middleware/infra
  5. agents-core — data access layer, dolt, utils
  6. agents-sdk — clients, components
  7. agents-work-apps — Slack, GitHub

Test plan

  • All test mocks updated with runWithLogContext + with (33 files)
  • pnpm check passes after each batch
  • No behavioral changes — only logger call patterns

🤖 Generated with Claude Code


vercel bot commented Apr 8, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project           Status   Actions           Updated (UTC)
agents-api        Ready    Preview, Comment  Apr 8, 2026 7:03am
agents-manage-ui  Ready    Preview, Comment  Apr 8, 2026 7:03am

1 Skipped Deployment

Project      Status   Updated (UTC)
agents-docs  Skipped  Apr 8, 2026 7:03am



changeset-bot bot commented Apr 8, 2026

🦋 Changeset detected

Latest commit: 511a805

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 10 packages
Name Type
@inkeep/agents-api Patch
@inkeep/agents-core Patch
@inkeep/agents-sdk Patch
@inkeep/agents-manage-ui Patch
@inkeep/agents-cli Patch
@inkeep/agents-work-apps Patch
@inkeep/ai-sdk-provider Patch
@inkeep/create-agents Patch
@inkeep/agents-email Patch
@inkeep/agents-mcp Patch



pullfrog bot commented Apr 8, 2026

TL;DR — Migrates logger calls across the entire monorepo (agents-api, agents-core, agents-sdk, agents-work-apps) to use runWithLogContext scoped bindings, so that contextual identifiers like requestId, conversationId, agentId, and tenantId are automatically attached to every log line within a scope instead of being manually threaded into each call. Also inlines 4 constant modules from agents-core that were only used as string literals.

Key changes

  • Wrap execution entry points in runWithLogContext — executionHandler, callLlmStep, executeToolStep, and executeScheduledTriggerStep each push shared identifiers into an async-local-storage scope at entry, eliminating redundant bindings from ~50 individual logger calls.
  • Strip ambient fields from 59 agents-api source files — Agent generation, tools, context resolution, run routes, manage routes, eval routes/services/workflows, and middleware all have tenantId/projectId/agentId removed from individual logger calls since the scoped context already carries them.
  • Migrate agents-core to string-only logger calls — 5 files (agentFull.ts, ref-middleware.ts, jwt-helpers.ts, composio-client.ts, tracer-factory.ts) get empty-object cleanup without ambient field removal, since these library functions may be called outside ALS scope.
  • Refactor EvaluationClient to use a class member logger — evaluationClient.ts replaces a module-level logger with this.logger = getLogger('EvalClient').with({ tenantId, projectId }), removing ~126 repeated tenantId/projectId pairs across every method.
  • Migrate agents-work-apps to string-only logger calls — 13 Slack and GitHub files get empty-object cleanup; conservative approach since work-apps have their own middleware stack.
  • Inline and delete 4 agents-core constant modules — relation-types.ts, session-events.ts, tool-names.ts, and workflow.ts are deleted; their string literals are inlined at ~30 call sites in agents-api. The barrel export in constants/index.ts is also removed.
  • Fix global test logger mock — setup.ts corrects with()/child() to return the full mock logger object instead of the mock function, enabling chained calls like logger.with(bindings).info(...) to work in tests.
  • Normalize logger call patterns — Fixes pino anti-patterns (e.g. logger.info('msg:', data) → logger.info({ tools }, 'msg')) and downgrades one noisy info to debug in context validation.
  • Fix userId incorrectly removed from AgentMcpManager — Restores userId to a logger call where it is not ambient (user-scoped credential context).
  • Remove unused destructured variables — Cleans up agentId, tenantId, projectId from destructuring in context.ts and validation.ts where they were only used in now-simplified logger data objects.
  • Update 38 test logger mocks — All vi.mock blocks gain with: vi.fn().mockReturnThis() and a runWithLogContext passthrough so tests pass under the new pattern.
  • Add changeset covering agents-api, agents-core, agents-sdk — Patch bump for all three packages.

Summary | 120 files | 11 commits | base: main ← feat/logger-migration


Scoped log context at execution entry points

Before: Every logger.info / logger.error call manually included { requestId, conversationId, agentId, ... } as the first argument — up to 6 fields repeated per call.
After: A single runWithLogContext({ requestId, conversationId, ... }, async () => { ... }) wraps the method body; interior calls only pass fields unique to that specific log line.

Four entry points are wrapped: executionHandler.handle, callLlmStep, executeToolStep, and executeScheduledTriggerStep. The large diffs in these files are almost entirely re-indentation from the wrapping — no behavioral changes.

How does runWithLogContext work?

It binds key-value pairs into an async-local-storage context. The logger's with method returns a child logger that inherits these bindings. Any logger.info(...) call within the async scope automatically includes the bound fields without the caller needing to repeat them.

executionHandler.ts · agentExecutionSteps.ts · scheduledTriggerSteps.ts
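The mechanism described above can be sketched with Node's AsyncLocalStorage (a minimal illustration; the helper names and the logger shape are stand-ins, not the implementation from #3054):

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

// Minimal sketch of a runWithLogContext-style helper.
const als = new AsyncLocalStorage<Record<string, unknown>>();

function runWithLogContext<T>(bindings: Record<string, unknown>, fn: () => T): T {
  // Merge with any enclosing scope so nested wrappers compose.
  return als.run({ ...als.getStore(), ...bindings }, fn);
}

// Pino-style call shapes: logger.info({ data }, 'msg') or logger.info('msg').
const logger = {
  info(dataOrMsg: Record<string, unknown> | string, msg?: string): string {
    const data = typeof dataOrMsg === 'string' ? {} : dataOrMsg;
    const message = typeof dataOrMsg === 'string' ? dataOrMsg : msg;
    // Bound fields are picked up from the async scope automatically.
    return JSON.stringify({ ...als.getStore(), ...data, msg: message });
  },
};

const line = runWithLogContext({ requestId: 'req-1', conversationId: 'c-1' }, () =>
  // The caller no longer threads requestId/conversationId by hand.
  logger.info({ model: 'gpt-4o' }, 'llm step started')
);
```

Interior calls stay string-only unless a field is unique to that log line; the same call outside a scope simply omits the bound fields.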


Ambient field removal across run domain agents, tools, context, and routes

Before: Each logger call site manually passed tenantId, projectId, agentId, or userId as structured data — often the only fields in the object.
After: These ambient fields are removed. Calls that had only ambient fields become string-only (logger.info('message')); calls with additional data keep only the non-ambient fields.

18 run-domain source files are updated. The three sub-patterns:

Pattern                    Example
Strip ambient, keep rest   { tenantId, projectId, err } → { err }
Strip all → string-only    { agentId } → no data object
Empty object cleanup       {} → string-only

generate.ts · default-tools.ts · validation.ts


Ambient field removal across manage routes, evals, and middleware

Before: All 28 manage route files and 6 eval service/route files manually passed tenantId and projectId in every logger call — hundreds of redundant field inclusions.
After: Ambient fields are stripped from all manage and eval logger calls. Middleware files get empty-object-to-string conversion only, since they run before ALS context is established.

One deliberate exception: userId was restored to AgentMcpManager.ts where it provides user-scoped credential context that is not ambient.

Domain Files changed Primary change
Manage routes 28 Strip tenantId/projectId from all CRUD logger calls
Eval routes/services/workflows 7 Strip tenantId/projectId/agentId
Middleware 6 Convert logger.info({}, 'msg') → logger.info('msg')
Run (fix) 1 Restore userId to AgentMcpManager

scheduledTriggers.ts · conversationEvaluation.ts · branchScopedDb.ts · AgentMcpManager.ts


agents-core conservative migration and constant inlining

Before: agents-core exported 4 constant modules (relation-types.ts, session-events.ts, tool-names.ts, workflow.ts) with string constants like DELEGATE_TOOL_PREFIX, SESSION_EVENT_TRANSFER, DURABLE_APPROVAL_ARTIFACT_TYPE. Logger calls in 5 data-access/utility files used logger.info({}, 'msg').
After: The 4 constant modules are deleted and their ~20 string literals are inlined at call sites in agents-api. The 5 utility files get empty-object-to-string logger cleanup only — no ambient field removal since library functions may be called outside ALS scope.

The constant inlining is intentional: these values were simple string literals with no variation or branching logic, and the indirection added import noise without providing abstraction value.

tool-wrapper.ts · relationTools.ts · BaseCompressor.ts · ToolApprovalUiBus.ts


EvaluationClient class member logger

Before: evaluationClient.ts used a module-level logger and passed { tenantId: this.tenantId, projectId: this.projectId } in every method — ~126 repeated field pairs across 63 logger calls.
After: The constructor creates this.logger = getLogger('EvalClient').with({ tenantId, projectId }). All method-level calls use this.logger and only include operation-specific fields.

This is the most impactful single-file cleanup in the PR: the diff shrinks from ~527 added lines to ~350 by eliminating the repetitive field threading.

evaluationClient.ts
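The class-member pattern can be sketched as follows (illustrative; getLogger and the method names stand in for the real implementations):

```typescript
type Fields = Record<string, unknown>;

interface Logger {
  with(bindings: Fields): Logger;
  info(dataOrMsg: Fields | string, msg?: string): string;
}

// Stand-in factory; the real getLogger comes from the repo's logging package.
function getLogger(name: string, bindings: Fields = {}): Logger {
  return {
    with: (extra) => getLogger(name, { ...bindings, ...extra }),
    info: (dataOrMsg, msg) => {
      const data = typeof dataOrMsg === 'string' ? {} : dataOrMsg;
      const message = typeof dataOrMsg === 'string' ? dataOrMsg : msg;
      return JSON.stringify({ name, ...bindings, ...data, msg: message });
    },
  };
}

class EvaluationClient {
  private logger: Logger;

  constructor(tenantId: string, projectId: string) {
    // Bind tenantId/projectId once in the constructor...
    this.logger = getLogger('EvalClient').with({ tenantId, projectId });
  }

  runEvaluation(evalId: string): string {
    // ...so method-level calls carry only operation-specific fields.
    return this.logger.info({ evalId }, 'starting evaluation');
  }
}
```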


agents-work-apps empty-object cleanup

Before: 13 Slack and GitHub files used logger.info({}, 'msg') throughout.
After: All converted to logger.info('msg'). No ambient field removal — work-apps have their own middleware and may not run within ALS scope.

Covers GitHub config, JWKS, OIDC, token exchange, webhooks; and Slack events, OAuth, workspaces, client, Nango, security, and socket-mode. Test setup also updated with the with mock.

events.ts · socket-mode.ts


Unused variable cleanup after migration

Before: context.ts destructured agentId and validation.ts destructured tenantId/projectId from executionContext — variables that were only consumed by logger data objects now simplified away.
After: The unused destructured variables are removed, eliminating lint warnings.

context.ts · validation.ts


Logger call normalization and log level correction

Before: Some files used the pino anti-pattern logger.info('message:', data) (concatenation) and one info-level call logged verbose context config on every request.
After: default-tools.ts corrects to logger.info({ tools }, 'message') (structured data). validation.ts downgrades one noisy call from info to debug.

default-tools.ts · validation.ts


Global test mock fix and 38 test file updates

Before: setup.ts defined with: vi.fn().mockReturnThis() which returned the mock function, not the mock logger object — breaking chained calls. Per-test mocks only exposed info, warn, error, debug.
After: setup.ts correctly returns the full mockLogger via mockLogger.with.mockReturnValue(mockLogger). All 38 test files add with: vi.fn().mockReturnThis() and export runWithLogContext: vi.fn((_bindings, fn) => fn()).

The setup.ts fix is the only behavioral test change — it resolves a latent bug where logger.with(bindings).info(...) would have thrown in tests. The per-test mock updates are mechanical.

setup.ts · Agent.test.ts
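The shape of the fix can be illustrated with plain stubs (vitest's vi.fn()/mockReturnValue semantics are paraphrased in comments; the real setup.ts uses vitest APIs):

```typescript
interface MockLogger {
  info: (...args: unknown[]) => void;
  warn: (...args: unknown[]) => void;
  error: (...args: unknown[]) => void;
  debug: (...args: unknown[]) => void;
  with: (bindings?: Record<string, unknown>) => MockLogger;
}

function makeMockLogger(): MockLogger {
  const noop = () => undefined;
  const mockLogger: MockLogger = {
    info: noop,
    warn: noop,
    error: noop,
    debug: noop,
    // Before the fix, `with` resolved to the bare mock function, so
    // logger.with(bindings).info(...) threw. Returning the full mock
    // object is what mockLogger.with.mockReturnValue(mockLogger) does.
    with: () => mockLogger,
  };
  return mockLogger;
}

// runWithLogContext passthrough used by the per-test mocks: drop the
// bindings and just invoke the wrapped function.
const runWithLogContext = <T>(_bindings: unknown, fn: () => T): T => fn();
```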



@pullfrog pullfrog bot left a comment


Well-structured refactor that correctly wraps production code in runWithLogContext and updates 33 test mocks. The structural changes in all three production files are sound — no dropped code paths, all cleanup branches preserved. One actionable finding: conversationId is missing from the log context in two key functions.


Comment thread agents-api/src/domains/run/workflow/steps/agentExecutionSteps.ts Outdated

itoqa bot commented Apr 8, 2026

Ito Test Report ✅

14 test cases ran. 7 additional findings, 7 passed.

Overall, the unified run executed 14 test cases with mixed results (7 passed, 7 findings). Core CRUD, security, and responsiveness paths were generally stable, but multiple medium- and high-impact reliability defects were confirmed in production code. The most important issues were trigger-related correctness gaps (SSR/client timezone hydration mismatches, unbounded Run Now duplicates, and repeatable reruns), auth/contract failures (valid playground JWTs rejected; stale durable approvals incorrectly returning success), a required bulk dataset-item creation UI missing despite backend support, and stale dataset-run deep links producing an internal 500 transport path instead of a clean not-found flow.

✅ Passed (7)
Category Summary Screenshot
Adversarial Injection payloads remained inert text with no script execution. ADV-3
Edge Slow-3G double-submit created one dataset and one evaluator without duplicates. EDGE-2
Edge Mobile viewport checks for Evaluations and Triggers remained usable with key create flows accessible and no crash/blank state. EDGE-3
Logic Slack MCP access settings persisted across save and reload for selected and all modes; GitHub selected mode remained unavailable when no repositories were available. LOGIC-2
Happy-path Dataset create/rename/delete completed and removal persisted after refresh. ROUTE-2
Happy-path Scheduled trigger create route and single Run Now flow were verified in the UI; the observed invocation reached a terminal state. ROUTE-5
Happy-path Webhook trigger accepted a POST and invocation list/detail retrieval returned expected metadata and execution fields. ROUTE-6
ℹ️ Additional Findings (7)

These findings are unrelated to the current changes but were observed during testing.

Category Summary Screenshot
Adversarial ⚠️ Valid playground JWT is rejected by /manage/available-agents (401), matching tampered-token behavior. ADV-1
Adversarial ⚠️ Re-running the same original invocation repeatedly is accepted, creating multiple rerun invocations from one source invocation. ADV-5
Adversarial 🟠 Stale approval submissions return success instead of rejection for durable execution approvals. ADV-6
Edge 🟠 Rapid Run Now requests create one new invocation per request with no server-side deduplication or rate bound. EDGE-1
Edge 🟠 Invalid dataset run deep-link causes an unexpected internal 500 transport request instead of a clean not-found handling path. EDGE-5
Happy-path 🟠 Triggers page can hydrate with mismatched timezone/date text between SSR and client render. ROUTE-1
Happy-path 🟠 Dataset items UI lacks a bulk creation path required by the tested flow. ROUTE-3
⚠️ Valid playground token is rejected by available-agents
  • What failed: A newly issued playground token is rejected as 401 Invalid or expired token, instead of being accepted while a tampered token is rejected.
  • Impact: Playground-based agent discovery and related execution flows fail for valid sessions because auth cannot be established. This blocks normal user workflows and masks downstream behavior behind unauthorized errors.
  • Steps to reproduce:
    1. Sign in to the Manage UI in a local non-production environment.
    2. Generate a playground token for an existing project and agent from the Try it flow or playground token endpoint.
    3. Call GET /manage/available-agents with the returned token in the Authorization Bearer header.
    4. Tamper one character in the JWT and repeat the call to compare behavior.
  • Stub / mock context: The run used temporary local QA accommodations: frontend dev overlays/indicators were disabled, and playground token handling in agents-api/src/domains/manage/routes/playgroundToken.ts included localhost-only bypass logic so token-generation flows could proceed in the local environment.
  • Code analysis: I inspected manage token issuance and verification paths in the repo. The issuer signs tokens with tid/pid claims and no explicit issuer/audience/type, while the verifier enforces issuer inkeep-manage-api, audience inkeep-run-api, and type: temporary, then expects tenantId/projectId in payload.
  • Why this is likely a bug: The issuing and verifying code paths require incompatible JWT structure/claims, so valid first-party tokens fail deterministically even without test harness assumptions.

Relevant code:

agents-api/src/domains/manage/routes/playgroundToken.ts (lines 146-155)

const token = await new SignJWT({
  tid: tenantId,
  pid: projectId,
  agentId,
})
  .setProtectedHeader({ alg: 'RS256', kid })
  .setSubject(userId)
  .setIssuedAt()
  .setExpirationTime('1h')
  .sign(privateKey);

packages/agents-core/src/utils/temp-jwt.ts (lines 21-27)

const { payload } = await jose.jwtVerify(token, publicKey, {
  issuer: 'inkeep-manage-api',
  audience: 'inkeep-run-api',
});

if (payload.type !== 'temporary') {
  throw new Error('Invalid token type');
}

agents-api/src/domains/manage/routes/availableAgents.ts (lines 39-44)

const payload = await verifyTempToken(publicKeyPem, token);

return {
  userId: payload.sub,
  tenantId: payload.tenantId,
  tokenType: 'temp-jwt',
};
⚠️ Repeated rerun requests are accepted for the same original invocation
  • What failed: The same original invocation can be rerun repeatedly, producing multiple rerun invocations instead of rejecting replay-style duplicate transitions.
  • Impact: Duplicate reruns from one source invocation can trigger repeated downstream executions and side effects that operators likely intended to run once. This weakens state-transition integrity for invocation lifecycle controls.
  • Steps to reproduce:
    1. Open a scheduled trigger invocation that is completed, failed, or cancelled.
    2. Call rerun on that invocation once.
    3. Call rerun again on the same original invocation.
    4. Inspect invocation history and confirm multiple rerun invocations were created from the same source.
  • Stub / mock context: This adversarial rerun check ran under the same localhost auth bypass setup used in the session, allowing repeated rerun requests to hit local application routes directly.
  • Code analysis: I inspected rerun validation and invocation creation logic in agents-api/src/domains/manage/routes/scheduledTriggers.ts. The route checks only source invocation status and then always creates a new invocation with a unique time-based key; there is no persisted single-rerun guard or lookup for prior reruns of the same source invocation.
  • Why this is likely a bug: The rerun route permits replaying the same source invocation indefinitely because it never records or checks rerun provenance before creating another invocation.

Relevant code:

agents-api/src/domains/manage/routes/scheduledTriggers.ts (lines 1326-1332)

// Only allow rerun of completed, failed, or cancelled invocations
if (originalInvocation.status === 'pending' || originalInvocation.status === 'running') {
  throw createApiError({
    code: 'bad_request',
    message: `Cannot rerun invocation with status: ${originalInvocation.status}. Wait for it to complete or cancel it first.`,
  });
}

agents-api/src/domains/manage/routes/scheduledTriggers.ts (lines 1373-1386)

const newInvocationId = generateId();

await createScheduledTriggerInvocation(runDbClient)({
  id: newInvocationId,
  tenantId,
  projectId,
  agentId,
  scheduledTriggerId,
  ref: resolvedRef,
  status: 'pending',
  scheduledFor: new Date().toISOString(),
  idempotencyKey: `manual-rerun-${invocationId}-${Date.now()}`,
  attemptNumber: 1,
  runAsUserId: rerunRunAsUserId,
});
🟠 Stale approval submission against durable execution
  • What failed: The stale approval request returns 200 with {"success":true} instead of rejecting the stale token/reference (expected 4xx-style rejection).
  • Impact: Clients receive a false success signal for stale approvals, which can mislead operators and automations into believing an approval was applied when it was not. This weakens execution-state integrity and can mask real approval handling issues.
  • Steps to reproduce:
    1. Start an approval-required execution and capture executionId and toolCallId.
    2. Start a newer execution for the same conversation.
    3. Submit approval to the old execution/tool call endpoint.
    4. Observe that the endpoint responds with success instead of rejecting the stale approval.
  • Stub / mock context: Approval behavior was exercised through real local run API calls using bypass authorization headers rather than browser-intercepted responses. The server was restarted with environment variables loaded so temporary token issuance and execution setup could proceed consistently.
  • Code analysis: I inspected the durable approval route and hook token generation path. The route catches toolApprovalHook.resume errors that include not found/already and intentionally suppresses them, then always returns success; this creates false-positive acknowledgement for stale approvals. PR metadata shows this file changed only in an unrelated logging line, so this defect appears pre-existing.
  • Why this is likely a bug: Returning a success response after explicitly detecting stale/missing approval hooks creates an incorrect API contract for stale submissions and contradicts the expected rejection semantics for outdated execution approvals.

Relevant code:

agents-api/src/domains/run/routes/executions.ts (lines 375-391)

try {
  await toolApprovalHook.resume(token, {
    approved,
    reason: approved ? undefined : reason,
  });
} catch (error) {
  const message = error instanceof Error ? error.message : String(error);
  if (!message.includes('not found') && !message.includes('already')) {
    throw error;
  }
  logger.warn(
    { executionId, toolCallId, message },
    'Tool approval already processed (idempotent)'
  );
}

return c.json({ success: true as boolean }, 200);

agents-api/src/domains/run/workflow/functions/agentExecution.ts (lines 88-89)

const token = `tool-approval:${payload.conversationId}:${workflowRunId}:${hookToolCallId}`;
🟠 Rapid interaction run-now requests create unbounded duplicate invocations
  • What failed: Ten rapid manual-run requests created ten additional invocations; expected behavior for this abuse case is idempotent or bounded duplicate creation.
  • Impact: Repeated clicks or scripted requests can flood execution with duplicate runs and inflate queue load/cost. This also increases accidental duplicate processing risk for users.
  • Steps to reproduce:
    1. Open the project Scheduled Triggers page.
    2. Trigger Run Now rapidly for the same scheduled trigger, or send repeated POST requests to the run endpoint.
    3. Open the scheduled trigger invocations list.
    4. Confirm the invocation count increases by one for each rapid request.
  • Stub / mock context: Localhost-only auth bypass behavior was active for this scenario, so the rapid run-now abuse sequence exercised local production code paths without real external sign-in dependencies.
  • Code analysis: I reviewed scheduled trigger manual run creation logic in agents-api/src/domains/manage/routes/scheduledTriggers.ts and idempotent scheduled-run workflow logic in agents-api/src/domains/run/workflow/steps/scheduledTriggerSteps.ts. The manual route always creates fresh invocations with unique timestamp-based idempotency keys and does not perform an existing-invocation check, while another production path explicitly performs idempotency lookup.
  • Why this is likely a bug: The manual run path lacks the deduplication guard already used in scheduled execution flow, so repeated requests create unbounded duplicates by design rather than rejecting or coalescing replayed actions.

Relevant code:

agents-api/src/domains/manage/routes/scheduledTriggers.ts (lines 1731-1747)

for (let userIndex = 0; userIndex < runAsUserIds.length; userIndex++) {
  const runAsUserId = runAsUserIds[userIndex];
  const delayBeforeExecutionMs = userIndex * dispatchDelayMs;
  const invocationId = generateId();
  invocationIds.push(invocationId);

  await createScheduledTriggerInvocation(runDbClient)({
    id: invocationId,
    tenantId,
    projectId,
    agentId,
    scheduledTriggerId,
    ref: resolvedRef,
    status: 'pending',
    scheduledFor,
    idempotencyKey: `manual-run-${scheduledTriggerId}-${runAsUserId ?? 'none'}-${timestamp}`,

agents-api/src/domains/run/workflow/steps/scheduledTriggerSteps.ts (lines 100-110)

// Check if invocation already exists
const existing = await getScheduledTriggerInvocationByIdempotencyKey(runDbClient)({
  idempotencyKey: params.idempotencyKey,
});

if (existing) {
  logger.info(
    { scheduledTriggerId: params.scheduledTriggerId, idempotencyKey: params.idempotencyKey },
    'Invocation already exists, skipping creation'
  );
  return { invocation: existing, alreadyExists: true };
}
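The guard the manual path lacks can be sketched generically (an in-memory illustration; the repo's version goes through getScheduledTriggerInvocationByIdempotencyKey and a database):

```typescript
interface Invocation {
  id: string;
  idempotencyKey: string;
}

// In-memory stand-in for the invocation store.
const invocations = new Map<string, Invocation>();
let counter = 0;

function createInvocationIdempotent(idempotencyKey: string): {
  invocation: Invocation;
  alreadyExists: boolean;
} {
  // Look up an existing invocation before inserting a new one.
  const existing = invocations.get(idempotencyKey);
  if (existing) {
    return { invocation: existing, alreadyExists: true };
  }
  const invocation = { id: `inv-${++counter}`, idempotencyKey };
  invocations.set(idempotencyKey, invocation);
  return { invocation, alreadyExists: false };
}

// A stable key (no Date.now() suffix) makes rapid repeats coalesce:
const key = 'manual-run-trigger-1-user-1';
const first = createInvocationIdempotent(key);
const second = createInvocationIdempotent(key);
```

With a timestamp baked into the key, as in the manual route above, every request maps to a fresh key and the lookup can never dedupe.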
🟠 Deep-link stale IDs for dataset run and trigger invocation pages
  • What failed: The backend route returns not-found for a missing run, but the page also triggers a server-action POST that fails with an internal 500 transport error instead of staying on a fully controlled not-found flow.
  • Impact: Users opening stale dataset run links can hit noisy internal error behavior and inconsistent failure handling. This weakens reliability and can complicate support/debugging for broken links.
  • Steps to reproduce:
    1. Open a project in Manage UI.
    2. Navigate directly to /datasets/{datasetId}/runs/{runId} using stale dataset and run IDs.
    3. Reload the same stale deep-link once.
    4. Observe that a page-level not-found message is accompanied by an internal 500 transport failure path.
  • Stub / mock context: A local dev-session authentication bypass was used to mint a session cookie and keep the browser authenticated while deep-link behavior was validated; no feature-specific response mocking was applied for the dataset run or trigger invocation endpoints.
  • Code analysis: I reviewed the dataset run page component, the dataset run API helper, and shared API request error handling. The page is a client component that calls helpers from a server-only module; those helpers throw ApiError on non-2xx responses (including 404), which aligns with the observed extra 500 POST transport failure during stale deep-link loads.
  • Why this is likely a bug: The production code path combines a client page with server-only API helpers that throw on expected 404 deep-link misses, creating an avoidable internal 500 transport failure instead of a clean not-found UX.

Relevant code:

agents-manage-ui/src/app/[tenantId]/projects/[projectId]/datasets/[datasetId]/runs/[runId]/page.tsx (lines 1-53)

'use client';

import type { DatasetRunInvocation, DatasetRunWithConversations } from '@/lib/api/dataset-runs';
import { fetchDatasetRun, fetchDatasetRunItems } from '@/lib/api/dataset-runs';

async function loadRun(showLoading = true) {
  try {
    if (showLoading) {
      setLoading(true);
    }
    setError(null);
    const [response, itemsResponse] = await Promise.all([
      fetchDatasetRun(tenantId, projectId, runId),
      fetchDatasetRunItems(tenantId, projectId, runId),
    ]);

agents-manage-ui/src/lib/api/dataset-runs.ts (lines 1-68)

'use server';

import { cache } from 'react';
import type { ListResponse, SingleResponse } from '../types/response';
import { makeManagementApiRequest } from './api-config';

async function $fetchDatasetRun(
  tenantId: string,
  projectId: string,
  runId: string
): Promise<SingleResponse<DatasetRunWithConversations>> {
  return makeManagementApiRequest<SingleResponse<DatasetRunWithConversations>>(
    `tenants/${tenantId}/projects/${projectId}/evals/dataset-runs/${runId}`
  );
}

export const fetchDatasetRun = cache($fetchDatasetRun);

agents-manage-ui/src/lib/api/api-config.ts (lines 81-138)

if (!response.ok) {
  let errorData: any;
  try {
    const text = await response.text();
    errorData = text ? JSON.parse(text) : null;
  } catch {
    errorData = null;
  }

  const errorMessage =
    errorData?.error?.message ||
    errorData?.message ||
    errorData?.detail ||
    `HTTP ${response.status}: ${response.statusText}` ||
    'Unknown error occurred';

  const errorCode = errorData?.error?.code || errorData?.code || 'unknown';

  throw new ApiError(
    {
      code: errorCode,
      message: errorMessage,
    },
    response.status
  );
}
🟠 Triggers page hydration mismatch from local-timezone date rendering
  • What failed: The table header and date cells are rendered with local-timezone formatting that can differ between server render and browser render, causing hydration text mismatch errors instead of stable hydration.
  • Impact: Users can hit hydration instability and console errors on a core navigation path. In mismatch cases, React may discard server markup and re-render client-side, which risks transient UI inconsistency and reduced reliability.
  • Steps to reproduce:
    1. Open Manage UI and sign in with an admin account.
    2. Create or open a project, then navigate to the Triggers page.
    3. Load or reload the page while scheduled trigger rows are rendered.
    4. Observe hydration mismatch errors related to timezone/date text rendering in the table.
  • Stub / mock context: Development error overlays were disabled so navigation and hydration issues stayed visible in the console during this run; no API response stubs, route interception mocks, or auth bypass behavior were applied to this test flow.
  • Code analysis: agents-manage-ui/src/components/project-triggers/scheduled-tab-content.tsx server-fetches trigger data and renders a client table component. That table computes local timezone abbreviations and local-formatted datetimes during render in agents-manage-ui/src/components/project-triggers/project-scheduled-triggers-table.tsx, and those helpers in agents-manage-ui/src/lib/utils/format-date.ts depend on runtime locale/timezone via Intl.DateTimeFormat, making SSR/client output non-deterministic.
  • Why this is likely a bug: Production render code uses environment-dependent local timezone formatting during hydration-sensitive output, so identical data can produce different server/client text and trigger hydration mismatches.

Relevant code:

agents-manage-ui/src/components/project-triggers/scheduled-tab-content.tsx (lines 1-20)

import { fetchProjectScheduledTriggers } from '@/lib/api/project-triggers';
import { ScheduledTabPanel } from './scheduled-tab-panel';

export async function ScheduledTabContent({ tenantId, projectId }: { tenantId: string; projectId: string }) {
  const { triggers, agents } = await fetchProjectScheduledTriggers(tenantId, projectId);

  return (
    <ScheduledTabPanel
      tenantId={tenantId}
      projectId={projectId}
      initialTriggers={triggers}
      agents={agents}
    />
  );
}

agents-manage-ui/src/components/project-triggers/project-scheduled-triggers-table.tsx (lines 66-210)

function formatLastRun(trigger: ScheduledTriggerWithAgent): string {
  if (trigger.lastRunAt) {
    return formatDateTimeLocal(trigger.lastRunAt);
  }
  return '—';
}

function formatNextRun(trigger: ScheduledTriggerWithAgent): string {
  if (!trigger.enabled) {
    return '—';
  }
  if (trigger.nextRunAt) {
    return formatDateTimeLocal(trigger.nextRunAt);
  }
  return '—';
}

const localTz = getLocalTimezoneAbbreviation();
<TableHead>Last Run{localTz ? ` (${localTz})` : ''}</TableHead>
<TableHead>Next Run{localTz ? ` (${localTz})` : ''}</TableHead>

agents-manage-ui/src/lib/utils/format-date.ts (lines 161-200)

export function formatDateTimeLocal(dateString: string): string {
  const normalized = normalizeDateString(dateString);
  const date = new Date(normalized);
  if (Number.isNaN(date.getTime())) return 'Invalid date';

  return new Intl.DateTimeFormat('en-US', {
    month: 'numeric',
    day: 'numeric',
    year: 'numeric',
    hour: 'numeric',
    minute: '2-digit',
    second: '2-digit',
    hour12: true,
  }).format(date);
}

export function getLocalTimezoneAbbreviation(): string {
  const parts = new Intl.DateTimeFormat('en-US', { timeZoneName: 'short' }).formatToParts(new Date());
  return parts.find((p) => p.type === 'timeZoneName')?.value ?? '';
}
🟠 Dataset item bulk creation is missing in Manage UI
  • What failed: The UI exposes only single-item create/edit/delete flows; there is no bulk item creation entry point despite bulk creation being part of the required dataset-item workflow.
  • Impact: Teams cannot execute bulk dataset-item authoring from the Manage UI and must fall back to slower manual single-item entry or direct API usage. This blocks the intended end-to-end workflow for larger evaluation datasets.
  • Steps to reproduce:
    1. Open the project Test Suites page and enter a dataset detail view.
    2. Go to the dataset items table and inspect available create actions.
    3. Try to find a bulk item add/import flow after creating at least one item.
    4. Observe that only single-item create/edit/delete interactions are exposed.
  • Stub / mock context: Local development UX toggles that suppress dev overlays/indicators were active during this run, but no mock routes, synthetic backend responses, or auth bypasses altered dataset-item behavior during this check.
  • Code analysis: I reviewed the dataset-items UI components and server actions in agents-manage-ui, then compared them with manage API routes in agents-api. The UI wires only single-item create/update/delete handlers, while the backend exposes a dedicated bulk-create endpoint, so the missing UI path is a production-code gap rather than test setup noise.
  • Why this is likely a bug: Production UI code omits any bulk-create control or action despite a supported backend bulk endpoint and a required workflow that depends on it.

Relevant code:

agents-manage-ui/src/components/dataset-items/dataset-items-table.tsx (lines 129-145)

emptyState={
  <div className="flex flex-col items-center gap-4">
    <span>No items yet</span>
    <DatasetItemFormDialog
      tenantId={tenantId}
      projectId={projectId}
      datasetId={datasetId}
      isOpen={isCreateDialogOpen}
      onOpenChange={setIsCreateDialogOpen}
      trigger={
        <Button variant="outline" size="sm">
          <Plus className="h-4 w-4" />
          Add first item
        </Button>
      }
    />
  </div>
}

agents-manage-ui/src/lib/actions/dataset-items.ts (lines 27-40)

export async function createDatasetItemAction(
  tenantId: string,
  projectId: string,
  datasetId: string,
  item: DatasetItemInsert
): Promise<ActionResult> {
  try {
    await createDatasetItemApi(tenantId, projectId, datasetId, item);
    revalidatePath(`/${tenantId}/projects/${projectId}/datasets/${datasetId}`);
    return { success: true };
  } catch (error) {
    const errorMessage = error instanceof Error ? error.message : 'Failed to create dataset item';
    return { success: false, error: errorMessage };
  }
}

agents-api/src/domains/manage/routes/evals/datasetItems.ts (lines 193-214)

app.openapi(
  createProtectedRoute({
    method: 'post',
    path: '/{datasetId}/items/bulk',
    summary: 'Create Multiple Dataset Items',
    operationId: 'create-dataset-items-bulk',
    tags: ['Evaluations'],
    permission: requireProjectPermission('edit'),
    request: {
      params: TenantProjectParamsSchema.extend({ datasetId: z.string() }),
      body: {
        content: {
          'application/json': {
            schema: z.array(DatasetItemApiInsertSchema),
          },
        },
      },
    },
    responses: {
      201: {
        description: 'Dataset items created',

Commit: 6aaf578


amikofalvy and others added 9 commits April 7, 2026 23:22
…scoped logger context

Remove repeated ambient fields (tenantId, projectId, agentId, requestId,
conversationId) from logger calls in agentExecutionSteps, scheduledTriggerSteps,
and executionHandler. Add runWithLogContext wrappers for operation-level context.
Convert empty data objects to string-only logger calls.

Update 33 test files to include runWithLogContext in their logger mocks
so tests work with the new imports.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… logger context

Remove ambient fields (tenantId, projectId, agentId) from logger calls across
22 files in agent generation, tools, A2A, context resolution, routes, blob
storage, and stream handling. Convert empty data objects to string-only calls.
Fix global test setup mock to properly return mockLogger from with()/child().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: pullfrog[bot] <226033991+pullfrog[bot]@users.noreply.github.com>
Co-authored-by: pullfrog[bot] <226033991+pullfrog[bot]@users.noreply.github.com>
…er context

Remove ambient fields (tenantId, projectId, agentId) from logger calls across
41 files in manage routes, eval routes/services/workflows, and middleware.
Convert empty data objects to string-only calls. Middleware files only get
empty-object conversion (ambient fields kept since they run before ALS).

Fix userId incorrectly removed from AgentMcpManager (not ambient).
Update conversations-media test assertion for removed ambient fields.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Convert empty data objects to string-only logger calls across 5 files
in data-access, dolt, utils. Conservative approach for library package:
only empty-object cleanup, no ambient field removal since agents-core
functions may be called outside ALS scope.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ntext

agents-sdk: Convert evaluationClient to use class member logger with
.with({ tenantId, projectId }) — removes 126 repeated field pairs.

agents-work-apps: Convert empty data objects to string-only logger calls
across 13 Slack and GitHub files. Conservative approach — no ambient
field removal since work-apps have their own middleware.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reset to main's version (which includes durable approval code) and
re-applied runWithLogContext wrapper and ambient field removal.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@amikofalvy amikofalvy force-pushed the feat/logger-migration branch from 6aaf578 to 509cd92 Compare April 8, 2026 06:30
…nflict

Re-apply runWithLogContext wrappers and ambient field removal that was
lost during the rebase conflict resolution.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 8, 2026

Preview URLs

Use these stable preview aliases for testing this PR:

These point to the same Vercel preview deployment as the bot comment, but they stay stable and easier to find.

Raw Vercel deployment URLs

Remove agentId/tenantId/projectId from destructuring where they were only
used in logger data objects that have been simplified.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@vercel vercel bot temporarily deployed to Preview – agents-docs April 8, 2026 06:59 Inactive
@github-actions github-actions bot deleted a comment from claude bot Apr 8, 2026
@amikofalvy amikofalvy added this pull request to the merge queue Apr 8, 2026
Merged via the queue into main with commit 63a1358 Apr 8, 2026
24 checks passed
@amikofalvy amikofalvy deleted the feat/logger-migration branch April 8, 2026 07:39
@itoqa
Copy link
Copy Markdown

itoqa bot commented Apr 8, 2026

Ito Test Report ✅

20 test cases ran. 2 additional findings, 18 passed.

Across 20 end-to-end, edge-case, and security scenarios, 18 passed and 2 failed, indicating strong overall regression stability across evaluations/datasets, trigger/webhook flows, GitHub/MCP settings, and auth/media/OAuth/Slack abuse defenses. The two confirmed production defects were a medium-severity scheduled-trigger creation failure when the default browser timezone is unsupported (e.g., Etc/Unknown), and a high-severity playground issue where refresh plus back/forward navigation during active streaming leaves the chat pane/input unrecoverable. Duplicate-submit handling and the other security checks remained fail-closed as expected.

✅ Passed (18)
Category Summary Screenshot
Adversarial Invalid/missing bearer token variants against /manage/available-agents were rejected with controlled 401 responses and no sensitive leakage. ADV-1
Adversarial Malicious webhook payload content rendered as inert text in invocation details with no script execution or crash. ADV-2
Adversarial Valid scoped media key returned payload (200), while same key in different conversation/project and traversal-like variants were denied (404/400) with no media leakage or crash. ADV-3
Adversarial Forged Slack event requests were rejected (non-2xx), confirming fail-closed behavior for signature/replay abuse attempts. ADV-4
Adversarial Tampered OAuth callback URLs failed closed with 404 behavior and did not create a linkage path. ADV-5
Edge Created an initial test suite, then rapidly double-submitted a second create attempt; the resulting datasets list contained exactly one 'ts-edge1-dup' entry, so duplicate-write prevention held. EDGE-1
Edge At 390x844 mobile viewport, trigger workflow actions remained usable and a new webhook trigger appeared in the list. EDGE-2
Edge Stale execution endpoints returned deterministic not_found envelopes without crashing the UI. EDGE-4
Edge Approval replay and out-of-order handling remained idempotent with no duplicate side effects found in code path review. EDGE-5
Happy-path Evaluations page loaded, tab switching worked, and reload remained stable without duplicate error behavior. ROUTE-1
Happy-path GitHub MCP access mode persistence worked across save/reload; Slack MCP mode toggling was not applicable because Slack was not connected in this environment. ROUTE-10
Happy-path Dataset create/update/delete flow worked, rapid double-submit produced at most one duplicate-attempt record, and persistence held after refresh. ROUTE-2
Happy-path Evaluator relation lifecycle was stable: agent scope persisted after explicit save+refresh and cleanly returned to project-wide scope after removal. ROUTE-3
Happy-path Dataset run config creation and trigger path reached run detail successfully, and refreshed run status stayed coherent. ROUTE-4
Happy-path Continuous evaluation results page handled empty-state results correctly, filter controls applied/cleared, and reload stayed stable. ROUTE-5
Happy-path Webhook trigger creation and valid POST dispatch produced an invocation, and invocation payload metadata rendered without UI crash. ROUTE-6
Happy-path Playground chat rendered and completed two prompts with one assistant response per prompt in the same conversation. ROUTE-8
Happy-path GitHub project access mode changes (selected↔all) persisted correctly after save/reload once a valid GitHub installation fixture was present. ROUTE-9
ℹ️ Additional Findings (2)

These findings are unrelated to the current changes but were observed during testing.

Category Summary Screenshot
Edge ⚠️ After refresh and back/forward during streaming, Try it stays active but the chat pane/input becomes unrecoverable. EDGE-3
Happy-path 🟠 Scheduled trigger creation fails on default form timezone in this environment; creation succeeds after selecting a valid timezone manually. ROUTE-7
⚠️ Playground chat loses recovery after refresh and history navigation
  • What failed: The UI keeps Try it active but does not recover a usable chat pane/input for the in-flight conversation; expected behavior is to restore a usable conversation session after navigation.
  • Impact: Users can lose an active conversation after common navigation actions and may be unable to continue without clearing context and restarting. This breaks reliability for long-running playground testing workflows.
  • Steps to reproduce:
    1. Open an agent page and click Try it.
    2. Submit a long-running prompt and refresh the page while streaming is active.
    3. Use browser back then forward, reopen Try it, and attempt to continue the conversation.
    4. Observe that the chat pane/input does not reliably recover for continued use.
  • Stub / mock context: A local development fallback for playground completions remained active in agents-api/src/domains/run/handlers/executionHandler.ts during this scenario, but the confirmed defect is in manage UI conversation state reset/persistence behavior rather than completion provider output.
  • Code analysis: I reviewed the playground mount/unmount flow and agent state store. The playground unmount path always resets conversation identity, and the store also regenerates IDs on reset while not persisting playgroundConversationId, so refresh/remount cannot recover the previous in-flight conversation context.
  • Why this is likely a bug: The production UI explicitly discards and does not persist the active playground conversation identity during remount/reset, which directly explains unrecoverable post-refresh chat state.

Relevant code:

agents-manage-ui/src/components/agent/playground/playground.tsx (lines 52-55)

useEffect(() => {
  // when the playground is closed the chat widget is unmounted so we need to reset the conversation id
  return () => resetPlaygroundConversationId();
}, []);

agents-manage-ui/src/features/agent/state/use-agent-store.ts (lines 138-139)

const { isSidebarSessionOpen: _, ...state } = initialAgentState;
set({ ...state, playgroundConversationId: generateId() });

agents-manage-ui/src/features/agent/state/use-agent-store.ts (lines 253-259)

partialize(state) {
  return {
    jsonSchemaMode: state.jsonSchemaMode,
    isSidebarPinnedOpen: state.isSidebarPinnedOpen,
    hasTextWrap: state.hasTextWrap,
  };
},
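A plausible direction (a sketch, not a validated fix) is to persist the conversation id alongside the other partialized keys, so a remount can rehydrate the in-flight conversation instead of always minting a new id:

```typescript
// Assumed minimal slice of the agent store's state — the real state type
// in use-agent-store.ts has more fields.
type AgentState = {
  jsonSchemaMode: boolean;
  isSidebarPinnedOpen: boolean;
  hasTextWrap: boolean;
  playgroundConversationId: string;
  isSidebarSessionOpen: boolean;
};

// partialize variant that also persists the conversation identity.
// (Whether this is safe depends on how stale conversations are expired —
// that trade-off is why this stays a sketch.)
export function partialize(state: AgentState) {
  return {
    jsonSchemaMode: state.jsonSchemaMode,
    isSidebarPinnedOpen: state.isSidebarPinnedOpen,
    hasTextWrap: state.hasTextWrap,
    playgroundConversationId: state.playgroundConversationId,
  };
}
```

The unconditional `resetPlaygroundConversationId()` in the unmount effect would also need to distinguish "panel closed" from "page refreshed" for the persisted id to be useful.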
🟠 Scheduled trigger create/edit/invocation listing
  • What failed: The trigger is not persisted when the form is submitted with the default browser timezone (Etc/Unknown in this environment); the expected behavior is either successful creation with a valid default or clear validation before submit.
  • Impact: Scheduled automations can silently fail to be created for users whose browser timezone resolves to an unsupported value. This causes missed scheduled executions until users manually change timezone settings.
  • Steps to reproduce:
    1. Navigate to /{tenantId}/projects/{projectId}/triggers?tab=scheduled.
    2. Click New scheduled trigger, select an agent, and continue to the form.
    3. Fill required fields and submit while keeping the default timezone value.
    4. Return to the scheduled trigger list and observe the trigger was not persisted.
    5. Change timezone to a valid IANA value and resubmit to observe successful creation.
  • Stub / mock context: Real identity-provider login was bypassed in agents-manage-ui/src/proxy.ts so a deterministic local admin session could be established; no route-level API stubbing was used for scheduled trigger create/list behavior, and the failure followed production create and scheduling logic in the local stack.
  • Code analysis: I inspected the scheduled trigger form defaults, create route handling, and next-run computation path. The UI defaults cronTimezone directly from browser timezone without validation, and the backend passes that value into cron parsing for next-run computation during create.
  • Why this is likely a bug: Production code accepts an unvalidated browser-derived timezone and then uses it directly in cron parsing, so unsupported timezone values can break create persistence instead of being safely normalized or rejected with clear validation.

Relevant code:

agents-manage-ui/src/components/scheduled-triggers/scheduled-trigger-form.tsx (lines 132-145)

const getDefaultValues = (): ScheduledTriggerFormData => {
  // Get browser's timezone for new triggers
  const browserTimezone = Intl.DateTimeFormat().resolvedOptions().timeZone || 'UTC';

  if (!trigger) {
    const p = defaultsFromParams;
    return {
      enabled: true,
      name: '',
      description: '',
      scheduleType: (p?.scheduleType as 'cron' | 'one-time') || 'cron',
      cronExpression: p?.cronExpression || '',
      cronTimezone: p?.cronTimezone || browserTimezone,

agents-api/src/domains/manage/routes/scheduledTriggers.ts (lines 401-407)

const enabled = body.enabled ?? true;
const nextRunAt = enabled
  ? computeNextRunAt({
      cronExpression: body.cronExpression,
      cronTimezone: body.cronTimezone,
      runAt: body.runAt,
    })
  : null;

packages/agents-core/src/utils/compute-next-run-at.ts (lines 15-21)

if (cronExpression) {
  const baseDate = lastScheduledFor ? new Date(lastScheduledFor) : new Date();
  const interval = CronExpressionParser.parse(cronExpression, {
    currentDate: baseDate,
    tz: cronTimezone || 'UTC',
  });
  return interval.next().toISOString();
}
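Either layer could guard the value before it reaches cron parsing. A small validator using Intl itself as the source of truth — a sketch, with the helper name and placement as assumptions — might look like:

```typescript
// Returns true iff the runtime's ICU data recognizes the IANA timezone.
// Values like 'Etc/Unknown' make Intl.DateTimeFormat throw a RangeError,
// which is the failure mode the trigger create path hit.
export function isValidIanaTimezone(tz: string): boolean {
  try {
    new Intl.DateTimeFormat('en-US', { timeZone: tz });
    return true;
  } catch {
    return false;
  }
}

// Example guard on the UI default: fall back to UTC instead of submitting
// an unsupported browser-derived value.
const browserTimezone = Intl.DateTimeFormat().resolvedOptions().timeZone;
export const defaultCronTimezone = isValidIanaTimezone(browserTimezone)
  ? browserTimezone
  : 'UTC';
```

The same check on the create route would turn a silent persistence failure into a 400 with a clear validation message.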

Commit: 511a805


amikofalvy added a commit that referenced this pull request Apr 8, 2026
These files were modified by PRs #3062 and #3064 which merged after the
logger migration in #3067 but were not updated to use the scoped context
pattern. Wraps tool-approval.ts and tool-wrapper.ts execute paths in
runWithLogContext({ toolCallId, toolName }), strips repeated ambient
fields from individual logger calls, and adds logger scoped context
guidance to AGENTS.md and the api-logging-guidelines skill.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>