Skip to content

feat: add local telemetry ledger and reporting command#39

Closed
ndycode wants to merge 3 commits intodevfrom
feat/product-visibility-telemetry
Closed

feat: add local telemetry ledger and reporting command#39
ndycode wants to merge 3 commits intodevfrom
feat/product-visibility-telemetry

Conversation

@ndycode
Copy link
Owner

@ndycode ndycode commented Mar 4, 2026

Summary

  • add a local-only structured telemetry subsystem (lib/telemetry.ts) with redaction, bounded JSONL storage, log rotation, query, and summary helpers
  • add telemetry configuration support via elemetryEnabled and env override CODEX_AUTH_TELEMETRY_ENABLED
  • add codex auth telemetry command with --json, --since-hours, and --limit
  • instrument high-signal outcomes for CLI and plugin request paths (failures and recoveries)
  • document telemetry behavior and toggles in commands/settings/privacy docs

Why this increases leverage

This introduces a reusable visibility primitive for product iteration without adding any remote analytics dependency. Teams can quickly answer "what is failing/recovering" and layer future experimentation/usage systems on top of the same local event stream.

Minimal implementation

  • local append-only telemetry ledger under ~/.codex/multi-auth/logs/product-telemetry.jsonl
  • config/env gate to disable writes
  • one reporting command for human and JSON output
  • targeted instrumentation points (not noisy tracing)

Files changed

  • lib/telemetry.ts (new)
  • lib/config.ts, lib/schemas.ts, lib/index.ts
  • lib/codex-manager.ts
  • index.ts
  • docs: docs/reference/commands.md, docs/reference/settings.md, docs/privacy.md
  • tests: est/telemetry.test.ts (new), plus updates to CLI/config/schema/index tests

Validation

pm run typecheck

pm run lint

pm test

pm run build

All passed in the isolated worktree branch.

Post-review updates (2026-03-05)

What changed

  • Hardened telemetry parsing by validating source and outcome enum values before events are admitted into query/summary flows.
  • Sorted telemetry report events by timestamp before summary and recent-event rendering to guarantee deterministic ordering.
  • Made CLI telemetry lifecycle emission awaitable (start/finish/exception) to avoid dropped telemetry on fast exits.
  • Expanded regression coverage across telemetry and CLI paths (Windows path output, deterministic since-hours, disabled/exception lifecycle branches, concurrent append serialization, transient fs cleanup retry, index telemetry assertions).
  • Added telemetry rollout/automation note in command reference.

How to test

  • npm run test -- test/telemetry.test.ts test/codex-manager-cli.test.ts test/index.test.ts
  • npm run typecheck
  • npm run lint:ts
  • npm run lint:scripts

Risk / rollout notes

  • Risk level: low-to-medium.
  • Behavior changes are limited to telemetry reporting order, stricter malformed-event filtering, and telemetry write synchronization in CLI command lifecycle.
  • Rollout is standard merge after CI; consumers relying on malformed telemetry rows should now expect those rows to be ignored.

note: greptile review for oc-chatgpt-multi-auth. cite files like lib/foo.ts:123. confirm regression tests + windows concurrency/token redaction coverage.

Greptile Summary

this pr introduces a local-only structured telemetry subsystem (lib/telemetry.ts) backed by a bounded, rotating jsonl log at ~/.codex/multi-auth/logs/product-telemetry.jsonl, instruments high-signal cli and plugin outcomes, and adds a codex auth telemetry reporting command. the overall design is solid — serial append queue, pii redaction, config/env gate, and async+retry for windows filesystem concurrency are all present.

key changes

  • lib/telemetry.ts — new module: serial queueAppend, sanitizeValue with email/token masking, async rotateLogsIfNeeded with runFileOperationWithRetry, queryTelemetryEvents / summarizeTelemetryEvents helpers
  • lib/codex-manager.ts — adds runTelemetry command (--json, --since-hours, --limit) and wraps the entire cli dispatch in a try/catch that emits cli.command.start / cli.command.finish / cli.command.exception events
  • index.ts — instruments plugin request paths: request.auth_refresh_failed, request.network_error, request.server_error, request.stream_failover_recovered, request.accounts_exhausted
  • lib/config.ts / lib/schemas.tstelemetryEnabled config field + CODEX_AUTH_TELEMETRY_ENABLED env override
  • docs and tests are thorough

issues found

  • listTelemetryFiles() in lib/telemetry.ts still uses readdirSync — a synchronous call with no windows ebusy/eperm retry, inconsistent with the async+retry pattern applied to all other fs operations in this pr. an antivirus lock silently returns [] (no events shown, no error). needs fs.readdir wrapped in runFileOperationWithRetry, plus a corresponding vitest coverage case
  • isErrnoException type guard is too broad (error instanceof Error) — it should also assert typeof (error as NodeJS.ErrnoException).code === "string" to be a safe NodeJS.ErrnoException guard

Confidence Score: 3/5

  • mostly safe but one unaddressed sync-fs concurrency path in listTelemetryFiles needs fixing before merge on windows targets
  • the core telemetry machinery is well-built — async+retry for rotation, redaction, serial queue, proper opt-out gate. the instrumentation points are targeted and the tests are thorough. score is held at 3 because readdirSync in listTelemetryFiles is the same class of windows concurrency bug explicitly called out in project convention (AGENTS.md) and already fixed for other ops in this same pr — it should be consistent before merge
  • lib/telemetry.ts — listTelemetryFiles (readdirSync, no retry) and isErrnoException type guard

Important Files Changed

Filename Overview
lib/telemetry.ts new local telemetry module — well-structured with redaction, serial append queue, and async retry for rotation ops, but listTelemetryFiles still uses sync readdirSync with no retry (Windows EBUSY risk), and isErrnoException type guard is too broad
lib/codex-manager.ts adds runTelemetry command and wraps CLI dispatch in a try/catch with lifecycle telemetry events; arg parsing and limit clamping look correct
index.ts instruments plugin request paths (auth_refresh_failed, network_error, server_error, stream_failover_recovered, accounts_exhausted) with void-fired telemetry; gate check and error redaction look correct
lib/config.ts adds getTelemetryEnabled resolving env var CODEX_AUTH_TELEMETRY_ENABLED with safe boolean default of true; consistent with existing pattern
test/telemetry.test.ts good coverage: redaction, malformed entry rejection, since-window filtering, summary, rotation, concurrent appends, and Windows rename retry — missing a test for readdirSync EBUSY path in listTelemetryFiles

Sequence Diagram

sequenceDiagram
    participant CLI as codex auth CLI
    participant PM as codex-manager.ts
    participant TM as lib/telemetry.ts
    participant FS as ~/.codex/multi-auth/logs/

    CLI->>PM: runCodexMultiAuthCli(args)
    PM->>TM: recordTelemetryEvent(cli.command.start)
    TM->>TM: sanitizeValue(details)
    TM->>TM: queueAppend(task)
    TM->>FS: ensureLogDir()
    TM->>FS: rotateLogsIfNeeded() [async+retry]
    TM->>FS: appendFile(product-telemetry.jsonl)

    PM->>PM: dispatch command (login/report/telemetry/…)

    alt command == "telemetry"
        PM->>TM: queryTelemetryEvents({sinceMs, limit})
        TM->>FS: readdirSync(logDir) ⚠️ sync
        TM->>FS: readFile(*.jsonl) [async]
        TM-->>PM: TelemetryEvent[]
        PM->>TM: summarizeTelemetryEvents(events)
        TM-->>PM: TelemetrySummary
        PM-->>CLI: JSON or human output
    end

    PM->>TM: recordTelemetryEvent(cli.command.finish / cli.command.exception)
    TM->>FS: appendFile(product-telemetry.jsonl)

    note over TM,FS: plugin path (index.ts) fires void recordTelemetryEvent<br/>for: auth_refresh_failed, network_error, server_error,<br/>stream_failover_recovered, accounts_exhausted
Loading

Fix All in Codex

Last reviewed commit: 2ab197f

Context used:

  • Rule from dashboard - What: Every code change must explain how it defends against Windows filesystem concurrency bugs and ... (source)

Add a production-safe Phase 1 visibility capability with local-only telemetry.

- Add redacted JSONL telemetry ledger with rotation and query/summarize helpers
- Add telemetry toggle in config/schema with CODEX_AUTH_TELEMETRY_ENABLED override
- Add codex auth telemetry command with json/text output and filters
- Instrument CLI lifecycle and plugin high-signal failure/recovery branches
- Update docs and tests for telemetry behavior and configuration

Co-authored-by: Codex <noreply@openai.com>
@coderabbitai
Copy link

coderabbitai bot commented Mar 4, 2026

📝 Walkthrough

Walkthrough

adds a local telemetry system that writes product events to ~/.codex/multi-auth/logs/product-telemetry.jsonl, controlled via the CODEX_AUTH_TELEMETRY_ENABLED env var or telemetryEnabled config flag. instruments the plugin and cli to emit lifecycle events, includes a new codex auth telemetry command to query and summarize events, and sanitizes sensitive data before persisting.

Changes

Cohort / File(s) Summary
Documentation
docs/privacy.md, docs/reference/commands.md, docs/reference/settings.md
documents new telemetry feature including product-telemetry.jsonl in canonical local files, new codex auth telemetry command with --json, --since-hours, and --limit flags, plus telemetryEnabled setting and CODEX_AUTH_TELEMETRY_ENABLED env override.
Core telemetry subsystem
lib/telemetry.ts
comprehensive new module (316 lines) defining types (TelemetryEvent, TelemetrySource, TelemetryOutcome), config model with defaults, sanitization for sensitive keys/emails/tokens, queued append writes to .jsonl files, log rotation up to maxFiles, query/filter/summarize operations, and utilities for file management and parsing. masks tokens and redacts fields like accessToken, email before persistence.
Configuration system
lib/config.ts, lib/schemas.ts
adds telemetryEnabled: boolean field to DEFAULT_PLUGIN_CONFIG (defaults true), exports new getter getTelemetryEnabled() that resolves env override, adds schema validation for telemetryEnabled in PluginConfigSchema.
Plugin instrumentation
index.ts
integrates telemetry into plugin (76 lines added): imports getTelemetryEnabled and recordTelemetryEvent, emits events on auth_refresh_failed, network_error, server_error, http_error, stream_failover_recovered/attempt_failed, recovered_after_retry, accounts_exhausted outcomes, gated behind telemetry enabled flag.
CLI instrumentation
lib/codex-manager.ts
extends cli with telemetry support (249 lines net): adds telemetry subcommand with --json, --since-hours, --limit flags, implements runTelemetry to fetch/summarize/display events, wraps command execution in try/catch to emit lifecycle events (start, finish, exceptions), includes emitTelemetry helper and option validation via parsePositiveIntegerOption.
Library exports
lib/index.ts
exports telemetry module types and functions via export * from "./telemetry.js".
Plugin and CLI tests
test/index.test.ts, test/codex-manager-cli.test.ts, test/plugin-config.test.ts, test/schemas.test.ts
adds mocks for recordTelemetryEvent, queryTelemetryEvents, summarizeTelemetryEvents, getTelemetryLogPath; tests telemetry command output in json mode, validates telemetry flags, confirms lifecycle events emitted, adds getTelemetryEnabled to config mock and telemetryEnabled to default config test cases.
Telemetry module tests
test/telemetry.test.ts
new 178-line test suite: validates sanitization of sensitive fields (email, accessToken, nested auth), tests filtering by sinceMs and limit with synthetic log writes, verifies summarization by source/outcome/event with timestamps, confirms log rotation creates numbered archives (.1, .2) when file size exceeds max, includes removeWithRetry helper for cleanup with exponential backoff.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

key review areas requiring attention:

  • concurrency & file i/o: lib/telemetry.ts manages an append queue and serialize writes. verify queueAppend semantics under concurrent calls—does it guarantee order? check for potential race conditions on log rotation and reads during rotation.
  • log rotation edge case: rotation logic in lib/telemetry.ts creates numbered files (.1, .2, etc.) up to maxFiles. confirm boundary behavior when rolling over (e.g., .5 -> .6 when maxFiles=5) and that old archives are pruned correctly. test with maxFiles=1 to catch off-by-one errors.
  • windows path handling: telemetry writes to ~/.codex/multi-auth/logs/product-telemetry.jsonl. verify path resolution on windows (docs mention Windows variations in cleanup). check ensureLogDir uses platform-aware path separators and handles UNC paths.
  • sanitization completeness: lib/telemetry.ts redacts EMAIL_PATTERN, TOKEN_PATTERNS, and SENSITIVE_KEYS. verify patterns catch all token formats your plugin uses (oauth, bearer, api keys, etc.). test recursive sanitization depth guards don't leave nested credentials exposed.
  • missing regression tests: test/telemetry.test.ts does not test queryTelemetryEvents with invalid sinceMs (negative, future dates) or malformed .jsonl entries (incomplete json lines). test/codex-manager-cli.test.ts lacks coverage for --since-hours with edge values (0, negative, very large). add negative test cases.
  • telemetry disabled flag: verify that when telemetryEnabled=false, no writes occur and no side effects leak (especially in index.ts plugin hooks). confirm lib/codex-manager.ts respects disabled state in runTelemetry (should gracefully handle missing log dir).
  • PluginConfig backward compatibility: telemetryEnabled defaults to true—ensure old config blobs without this field parse safely in safeParsePluginConfig and getTelemetryEnabled doesn't error on undefined.
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed title follows conventional commits format (feat: ...) with lowercase imperative summary under 72 chars; clearly describes the main change.
Description check ✅ Passed PR description is comprehensive and well-structured, covering summary, rationale, implementation details, file changes, and validation steps. Post-review updates section documents hardening work with clear testing and risk guidance.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/product-visibility-telemetry

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 11

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/reference/commands.md`:
- Around line 41-57: Add an upgrade note and npm-script rollout pointer for the
changed telemetry behavior by updating the commands reference next to the `codex
auth telemetry` entry (and the Common Flags entries for `--since-hours` and
`--limit`): insert a short “Upgrade / Rollout” note linking to the
release/upgrade doc or CHANGELOG and mention the new npm script operators should
run (for example, add a line like “See upgrade notes: <link> and run npm run
<rollout-script> to migrate telemetry automation”), ensuring the note appears
immediately after the telemetry command description so operators see
migration/automation impact.

In `@lib/codex-manager.ts`:
- Around line 4253-4268: The emitTelemetry helper currently calls
recordTelemetryEvent with a void cast which creates a race when the process
returns/throws (e.g., around cli.command.finish / cli.command.exception); make
emitTelemetry return a Promise by removing the void cast and marking it async so
callers can await it, then update call sites that immediately return or throw
(those near the cli.command.finish / cli.command.exception usage) to await
emitTelemetry(...) before returning/throwing to ensure recordTelemetryEvent
completes; reference functions: emitTelemetry and recordTelemetryEvent and the
cli.command.finish / cli.command.exception call sites.
- Around line 2325-2330: The returned events from queryTelemetryEvents are not
guaranteed time-ordered, so before calling summarizeTelemetryEvents and before
building the "recent events" block you must sort the events array by the event
timestamp field (e.g. event.timestamp or event.ts) in ascending order; update
the code around the variable events (result of queryTelemetryEvents) so it
creates a sortedEvents = events.slice().sort((a,b) => a.timestamp - b.timestamp
|| a.ts - b.ts) and then pass sortedEvents into summarizeTelemetryEvents and
into any recent-events slicing logic (referencing the events variable,
summarizeTelemetryEvents function, and the code that constructs the recent
output) to ensure summary.firstTimestamp/lastTimestamp and recent lists are
correct.

In `@lib/telemetry.ts`:
- Around line 172-180: The runtime type guard isTelemetryEvent currently only
checks that timestamp/source/event/outcome are strings, allowing invalid enum
values to pass; update isTelemetryEvent to also validate that record.source and
record.outcome are one of the allowed enum members (use the same enum or allowed
value arrays used in aggregation logic) so malformed source/outcome strings are
rejected, and adjust downstream parsing to skip invalid lines; then add a vitest
regression in test/telemetry.test.ts that feeds malformed telemetry lines
(invalid source/outcome) and asserts they are ignored and do not affect the
summary counters (the aggregation code referenced around the summary logic
should continue to rely on the stricter guard).

In `@test/codex-manager-cli.test.ts`:
- Around line 338-380: Replace the runtime Date.now() usage with Vitest fake
timers to make the since-hours assertion deterministic: call vi.useFakeTimers()
and vi.setSystemTime(fixedNow) before creating the now variable and the events,
use that fixedNow for calculation of expected sinceMs and for creating event
timestamps, run the test (invoking runCodexMultiAuthCli) and then restore timers
with vi.useRealTimers(); reference the now variable, queryTelemetryEventsMock
mock calls/sinceMs, and runCodexMultiAuthCli to locate where to apply the
fake-timer setup and teardown.
- Around line 403-427: Add two regression test cases to cover the missing
telemetry branches: (1) a test that sets telemetryEnabled=false (or mocks the
telemetry gate used by runCodexMultiAuthCli) and asserts
recordTelemetryEventMock is not called for "cli.command.start" /
"cli.command.finish", and (2) a test that forces the command handler invoked by
runCodexMultiAuthCli(["auth","features"]) to throw (or mock a thrown error) and
asserts recordTelemetryEventMock is called with an objectContaining event
"cli.command.exception" and outcome "error" (including details.command
"features" and the error info); use the same helpers and spies as the existing
test (runCodexMultiAuthCli, recordTelemetryEventMock, logSpy) so the tests
exercise the async emission paths in lib/codex-manager.ts (around
telemetryEnabled and cli.command.exception).
- Around line 24-26: Tests only cover unix-style telemetry log paths via the
getTelemetryLogPathMock; add a Windows-path regression case by making
getTelemetryLogPathMock (or an additional mock used in the same test suite)
return a Windows-style path string (e.g.
"C:\\Users\\you\\logs\\product-telemetry.jsonl") and duplicate the existing
telemetry assertions against that mock to ensure formatting/parsing works on
Windows paths; apply the same addition to the other telemetry test block that
mirrors the assertions so both places use a Windows-path case in addition to the
Unix-like case.

In `@test/index.test.ts`:
- Around line 111-122: The tests enable telemetry and mock recordTelemetryEvent
(recordTelemetryEventMock) but never assert it; add deterministic assertions
that recordTelemetryEventMock was called (and with expected event names/payload
shapes) in the concurrency flow tests and token refresh/error flows referenced
(the tests around the concurrency block and the refresh/error block).
Specifically, inside the test cases that simulate the failure/recovery path
(around the refresh/error flow) and the token refresh race/concurrency tests,
add expectations like
expect(recordTelemetryEventMock).toHaveBeenCalledTimes(...) and
expect(recordTelemetryEventMock).toHaveBeenCalledWith(expect.objectContaining({
event: "<expected-event-name>" })) to lock the telemetry emission behavior; use
the existing mocked recordTelemetryEventMock symbol and keep assertions
deterministic (use toHaveBeenCalledTimes and
toHaveBeenCalledWith/expect.objectContaining) so regression failures surface.

In `@test/telemetry.test.ts`:
- Around line 63-65: The fixture value for the accessToken property in the test
(the object containing email: "user@example.com", accessToken: "...", nested:
{...}) looks like a real API key and triggers secret scanners; update the
accessToken string to a clearly non-secret sentinel (e.g.
"TEST_TOKEN_PLACEHOLDER" or "placeholder-token-do-not-use") in the telemetry
test fixture so the property name accessToken in the test/telemetry.test.ts no
longer resembles a real secret.
- Around line 160-177: Add two deterministic vitest regression tests in
test/telemetry.test.ts: (1) a concurrency test that targets queueAppend
(lib/telemetry.ts:237-262) by invoking recordTelemetryEvent concurrently (e.g.,
Promise.all over many calls) and asserting no data loss/corruption in the main
log file (use getTelemetryLogPath to read contents) to reproduce and prevent
race conditions; (2) a Windows rm-retry test that mocks fs.rm (or
fs.promises.rm) to first throw EPERM/ENOENT errors and then succeed, asserting
the retry branch is exercised and no exception escapes—use vitest mocks/fake
timers to keep the test deterministic and restore mocks after the test.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9eb7ffc1-3af4-4cb7-9374-4a4fa0aafd35

📥 Commits

Reviewing files that changed from the base of the PR and between d36b04f and ea8f5a1.

📒 Files selected for processing (14)
  • docs/privacy.md
  • docs/reference/commands.md
  • docs/reference/settings.md
  • index.ts
  • lib/codex-manager.ts
  • lib/config.ts
  • lib/index.ts
  • lib/schemas.ts
  • lib/telemetry.ts
  • test/codex-manager-cli.test.ts
  • test/index.test.ts
  • test/plugin-config.test.ts
  • test/schemas.test.ts
  • test/telemetry.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (3)
lib/**

⚙️ CodeRabbit configuration file

focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.

Files:

  • lib/config.ts
  • lib/schemas.ts
  • lib/index.ts
  • lib/codex-manager.ts
  • lib/telemetry.ts
docs/**

⚙️ CodeRabbit configuration file

keep README, SECURITY, and docs consistent with actual CLI flags and workflows. whenever behavior changes, require updated upgrade notes and mention new npm scripts.

Files:

  • docs/privacy.md
  • docs/reference/commands.md
  • docs/reference/settings.md
test/**

⚙️ CodeRabbit configuration file

tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

Files:

  • test/plugin-config.test.ts
  • test/telemetry.test.ts
  • test/index.test.ts
  • test/codex-manager-cli.test.ts
  • test/schemas.test.ts
🧬 Code graph analysis (6)
test/plugin-config.test.ts (1)
lib/config.ts (1)
  • getTelemetryEnabled (725-731)
test/telemetry.test.ts (1)
lib/telemetry.ts (6)
  • getTelemetryConfig (230-232)
  • configureTelemetry (226-228)
  • recordTelemetryEvent (238-263)
  • queryTelemetryEvents (265-285)
  • getTelemetryLogPath (234-236)
  • summarizeTelemetryEvents (287-316)
test/codex-manager-cli.test.ts (1)
lib/codex-manager.ts (1)
  • runCodexMultiAuthCli (4230-4317)
lib/codex-manager.ts (2)
lib/telemetry.ts (5)
  • queryTelemetryEvents (265-285)
  • summarizeTelemetryEvents (287-316)
  • getTelemetryLogPath (234-236)
  • TelemetryOutcome (14-14)
  • recordTelemetryEvent (238-263)
lib/config.ts (1)
  • getTelemetryEnabled (725-731)
index.ts (2)
lib/config.ts (1)
  • getTelemetryEnabled (725-731)
lib/telemetry.ts (1)
  • recordTelemetryEvent (238-263)
lib/telemetry.ts (2)
lib/runtime-paths.ts (1)
  • getCodexLogDir (226-228)
lib/logger.ts (2)
  • maskEmail (430-430)
  • getCorrelationId (137-139)
🪛 Gitleaks (8.30.0)
test/telemetry.test.ts

[high] 64-64: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

🔇 Additional comments (8)
lib/schemas.ts (1)

35-35: schema change is backward-compatible

lib/schemas.ts:35 adds telemetryEnabled as an optional boolean, so existing config documents continue to parse safely.

lib/config.ts (1)

136-136: telemetry setting resolution is consistent

lib/config.ts:136 and lib/config.ts:725-731 correctly implement default-enabled behavior with env override precedence through the existing boolean resolver path.

Also applies to: 725-731

lib/telemetry.ts (1)

106-135: structured redaction flow is solid

lib/telemetry.ts:106-135 applies recursive sanitization and key-based masking before persistence, aligned with lib/logger.ts:429 masking usage for email-safe output.

index.ts (2)

57-58: telemetry helper wiring is clean and correctly gated

the helper uses lib/config.ts:725-731 for enablement and delegates to lib/telemetry.ts:238-263 in a non-blocking way, which keeps request handling isolated from telemetry failure paths.

Also applies to: 103-103, 1164-1177


1522-1527: event points are high-signal and well-scoped

these emit sites focus on failure/recovery/exhaustion boundaries and align with the local-ledger model in lib/telemetry.ts:238-285 without adding noisy per-request tracing.

Also applies to: 1718-1723, 1992-1997, 2117-2122, 2286-2293, 2312-2322, 2413-2420, 2462-2468

lib/index.ts (1)

32-32: public export wiring is clean

lib/index.ts:32 re-exports telemetry APIs cleanly and keeps the module barrel consistent.

test/schemas.test.ts (1)

43-43: good schema coverage update

test/schemas.test.ts:43 correctly extends the full-config happy path for the new telemetryEnabled field.

test/plugin-config.test.ts (1)

751-768: good coverage for telemetry toggle precedence.

this block cleanly validates default/config/env precedence for telemetry enablement and stays deterministic (test/plugin-config.test.ts:751-768, lib/config.ts:724-730).

ndycode and others added 2 commits March 4, 2026 16:48
Replace sync rotation fs calls with async operations wrapped in bounded retry\nfor EBUSY/EPERM/ENOTEMPTY to handle Windows lock contention safely.\nAlso adds a telemetry test that simulates a transient rename lock and\nverifies rotation still succeeds.\n\nCo-authored-by: Codex <noreply@openai.com>
Address outstanding review feedback for telemetry reliability and coverage:
- enforce telemetry source/outcome enum validation while parsing logs
- sort telemetry report events deterministically before summary/recent output
- await CLI telemetry lifecycle writes before command return/throw
- expand CLI/index/telemetry regressions (windows paths, deterministic time, exception paths, concurrent appends)
- document telemetry rollout/automation pointer in command reference

Co-authored-by: Codex <noreply@openai.com>
@greptile-apps
Copy link

greptile-apps bot commented Mar 5, 2026

Additional Comments (1)

lib/telemetry.ts, line 954
readdirSync in listTelemetryFiles — sync, no Windows retry

the pr upgraded rotateLogsIfNeeded to use runFileOperationWithRetry for fs.stat, fs.unlink, and fs.rename, but listTelemetryFiles still calls readdirSync — a sync operation with no retry. on windows desktops, antivirus scanners briefly lock the log directory and readdirSync will throw EBUSY or EPERM. the outer try/catch silently swallows the error and returns [], so queryTelemetryEvents returns no events and the telemetry command prints "No telemetry events found" — a silent data loss with no user feedback.

this is the same class of problem the pr explicitly addressed elsewhere. replace readdirSync with an async fs.readdir wrapped in runFileOperationWithRetry:

async function listTelemetryFiles(): Promise<string[]> {
  try {
    const entries = await runFileOperationWithRetry(() =>
      fs.readdir(telemetryConfig.logDir)
    );
    const sortable = entries
      .map((name) => {
        const archiveIndex = parseArchiveSuffix(name);
        if (archiveIndex === null) return null;
        return { name, archiveIndex };
      })
      .filter((value): value is { name: string; archiveIndex: number } => value !== null)
      .sort((left, right) => right.archiveIndex - left.archiveIndex);
    return sortable.map((entry) => join(telemetryConfig.logDir, entry.name));
  } catch {
    return [];
  }
}

queryTelemetryEvents would then need await listTelemetryFiles(). a vitest coverage case simulating an EBUSY on fs.readdir (similar to the existing rename retry test) should accompany this fix.

@ndycode ndycode changed the base branch from main to dev March 6, 2026 01:10
ndycode added a commit that referenced this pull request Mar 6, 2026
Co-authored-by: Codex <noreply@openai.com>
@ndycode
Copy link
Owner Author

ndycode commented Mar 6, 2026

Merged via integration PR to safely combine multiple approved changes.

@ndycode ndycode closed this Mar 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant