Add remaining Network SDKStats short-interval signals#156
Merged
JacksonWeber merged 4 commits intoJun 2, 2026
Conversation
…eption, duration Extends PR microsoft#145 with the remaining Network SDKStats signals per the Application Insights SDKStats spec: - Request_Failure_Count (keyed by endpoint/host/statusCode) - Retry_Count (keyed by endpoint/host/statusCode) - Throttle_Count (keyed by endpoint/host/statusCode) - Exception_Count (keyed by endpoint/host/exceptionType) - Request_Duration (avg per endpoint/host) A365 exporter classifies HTTP responses (success/retry/throttle/failure) per spec and records duration on every attempt; thrown fetch errors are bucketed into the Exception_Count metric. OTLP wrappers record duration on every export and emit an Exception_Count for FAILED results. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor
There was a problem hiding this comment.
Pull request overview
This PR completes the SDKStats “Network” short-interval metric set by adding failure/retry/throttle/exception counters plus a per-interval average request duration metric, and wiring them into both the A365 exporter and the OTLP exporter wrappers.
Changes:
- Expanded
sdkstats/networkStatswith five new counters, a duration accumulator (sum/count → per-interval average on drain), and a sharedclassifyStatusCodehelper. - Updated A365 exporter and OTLP wrapper decorators to record the new signals (including per-attempt duration for A365 retries).
- Added/extended unit tests covering new recorders, duration averaging, status bucketing, and exporter wiring.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
src/sdkstats/networkStats.ts |
Adds new network metric constants, counters, duration averaging, and HTTP status classification. |
src/sdkstats/metrics.ts |
Registers the additional observable gauges and maps variable-length keys to attributes via keyAttributes. |
src/sdkstats/otlpWrapper.ts |
Records duration for all exports and emits a stable Exception_Count label on non-success results. |
src/sdkstats/index.ts |
Re-exports the new metric names, recorders, and classifyStatusCode. |
src/a365/exporter/Agent365Exporter.ts |
Records duration + status/exception counters per request attempt; adds exception-type bucketing helper. |
test/internal/unit/sdkstats/networkStats.test.ts |
Adds unit tests for new recorders, duration averaging, and status bucketing. |
test/internal/unit/sdkstats/metrics.test.ts |
Validates OTel metric emission (attributes + avg duration) for the new signals. |
test/internal/unit/sdkstats/otlpWrapper.test.ts |
Verifies OTLP wrapper emits exception count + duration on FAILED and duration on SUCCESS. |
test/internal/unit/a365/agent365NetworkStats.test.ts |
Adds A365 exporter wiring tests for failure/retry/throttle/exception/duration metrics. |
Comments suppressed due to low confidence (1)
src/sdkstats/otlpWrapper.ts:89
wrapExportdoesn't handle the case where the wrapped exporter throws synchronously (before invoking the callback). In that scenario the wrapper won't recordRequest_Duration/Exception_Countand the exception will escape to callers. Consider wrappinginner(settle)in a try/catch, recording duration/exception, and invokingresultCallbackwith a FAILEDExportResult(including the caught error) to keep wrapper behavior consistent with its docstring.
function wrapExport<T>(
host: string,
inner: (resultCallback: (result: ExportResult) => void) => void,
resultCallback: (result: ExportResult) => void,
_items: T,
): void {
const start = Date.now();
const settle = (result: ExportResult): void => {
recordDuration(OTLP_ENDPOINT_CATEGORY, host, Date.now() - start);
if (result.code === ExportResultCode.SUCCESS) {
recordSuccess(OTLP_ENDPOINT_CATEGORY, host);
} else {
recordException(OTLP_ENDPOINT_CATEGORY, host, OTLP_FAILED_EXCEPTION_TYPE);
}
resultCallback(result);
};
inner(settle);
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- otlpWrapper.wrapExport: catch synchronous throws from inner exporter; record Request_Duration + Exception_Count and invoke resultCallback with FAILED ExportResult so wrapper behavior matches its docstring. - Add unit test covering the synchronous-throw path. - a365NetworkStats test: rename misleading '429 throttle' test and remove inaccurate inline comment; the test now exercises only 439 (a pure throttle status) and documents the actual classifyStatusCode ordering (THROTTLE_STATUSES checked first; 429 is retry-only). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor
Author
|
Also addressed the suppressed-confidence finding about
Full unit suite: 844 passed / 5 todo / 0 failed. |
lzchen
reviewed
May 28, 2026
lzchen
reviewed
May 28, 2026
The aep-health-and-standards/Telemetry-Collection-Spec sdkstats spec was updated (#938, #942) to require non-Breeze exporters report network signals per the OTLP/HTTP response specification: success/failure/retry/throttle counts with the HTTP status code dimension, falling back to Exception_Count only when no response code is available. - Classify FAILED OTLP exports using the error surfaced by the upstream delegate: numeric HTTP status (100-599) on OTLPExporterError -> Request_Failure_Count (or Retry_Count for 429/502/503/504); the delegate's synthetic 'Export failed with retryable status' message -> Retry_Count with statusCode='unknown' (status is lost upstream); network/timeout errors -> Exception_Count with a bounded type (Timeout/Network/Client exception). - Add forceFlush forwarder to NetworkStatsLogExporter to satisfy the LogRecordExporter interface (pre-existing tsc error on the branch). - Extend unit tests for the new classification paths and the bounded exceptionType labels. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Resolve conflicts in Agent365Exporter.ts (combine SDKStats imports with OpenTelemetryConstants import) and otlpWrapper.ts (keep extended doc comment, drop duplicate forceFlush). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
lzchen
approved these changes
Jun 2, 2026
rads-1996
reviewed
Jun 2, 2026
rads-1996
reviewed
Jun 2, 2026
JacksonWeber
added a commit
that referenced
this pull request
Jun 2, 2026
#165) * refactor(sdkstats): consolidate shared constants under sdkstats/constants.ts Addresses Radhika's follow-up suggestions on PR #156: - Move OTLP wrapper magic constants (endpoint category, unknown-status sentinel, retryable-network error codes, exception-type labels, OTLP retryable HTTP statuses) into a new src/sdkstats/constants.ts. - Move the wire-format metric-name constants (REQUEST_SUCCESS_NAME / REQUEST_FAILURE_NAME / REQUEST_DURATION_NAME / RETRY_COUNT_NAME / THROTTLE_COUNT_NAME / EXCEPTION_COUNT_NAME), the NETWORK_METRIC_NAMES tuple, and the HTTP status-code buckets out of networkStats.ts into the same constants module. networkStats.ts re-exports them for backwards compatibility. - Have the A365 exporter use the shared A365_ENDPOINT_CATEGORY label and the EXC_TIMEOUT/EXC_NETWORK/EXC_CLIENT exceptionType constants instead of duplicating the literal strings. - Document why we cannot just import StatsbeatCounter from @azure/monitor-opentelemetry-exporter (the enum lives at dist/{esm,commonjs}/export/statsbeat/types and the package's exports field only publishes '.' and './package.json', so the enum is not part of the public surface; we mirror its values and keep them in lockstep). All 923 tests pass, lint clean, build green. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * docs(sdkstats/constants): fix broken @link paths and clarify why StatsbeatCounter cannot be imported - {@link} paths were written as if from a parent of src/sdkstats/; corrected to be relative to src/sdkstats/constants.ts itself (./networkStats, ./otlpWrapper, ../a365/exporter/Agent365Exporter). - Expanded the StatsbeatCounter rationale with the exact TS2307 error produced under our moduleResolution=NodeNext config and pointed at the upstream Azure SDK repo for the public-export ask. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * test(sdkstats): consume new shared constants in tests instead of duplicating string literals Updates the four SDKStats test files (networkStats, otlpWrapper, metrics, agent365NetworkStats) to import and use the constants now exposed from sdkstats/constants.ts (A365_ENDPOINT_CATEGORY, OTLP_ENDPOINT_CATEGORY, OTLP_UNKNOWN_STATUS, OTLP_HTTP_RETRYABLE_STATUSES, EXC_TIMEOUT/NETWORK/CLIENT) instead of duplicating the literal strings inline. The single test that intentionally pins the wire-format metric-name strings to their literal values (networkStats.test.ts \�xposes all six SDKStats network metric names\) is preserved so a constant rename still fails loudly. Also rewrites the OTLP_HTTP_RETRYABLE_STATUSES iteration to drive the loop directly from the shared set so adding/removing a status updates both the wrapper and the test in lockstep. 923 tests pass, lint clean, build green. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Follows up #145 by adding the remaining Network SDKStats short-interval metrics defined in the AppInsights SDKStats spec, aligned with the updated sdkstats.md (PRs #938 and #942).
New metrics
Existing Request_Success_Count is unchanged.
A365 status code classification (Breeze HTTP semantics)
Per the spec,
endpoint=�365follows the same HTTP code set asendpoint=�reeze:OTLP status code classification (OTLP/HTTP semantics)
The updated spec requires OTLP exporters report HTTP status codes per the OTLP/HTTP response section. The upstream
@opentelemetry/otlp-exporter-basedelegate exposes a status code onOTLPExporterError.codeonly for non-retryable failures; retryable HTTP responses (429/502/503/504) are collapsed into a synthetic message with the original code discarded. The wrapper classifies as follows:Request_Failure_Count(orRetry_Countfor 429/502/503/504 if the failure path still carries it).Retry_CountwithstatusCode="unknown"(status is lost upstream; spec dimension preserved).AbortError/TimeoutError/ "Request timed out" ->Exception_Count(exceptionType="Timeout exception").TypeErroror Node socket errors (ECONNREFUSED/ENOTFOUND/...) ->Exception_Count(exceptionType="Network exception").Exception_Count(exceptionType="Client exception").Throttle (
429 + Retry-After) is intentionally bucketed as retry rather than throttle because the upstream delegate does not surface response headers; over-counting throttle would distort customer panic signals.Wiring
classifyStatusCodedispatches to the right recorder. Caught exceptions are bucketed (Timeout / Network / Client) mirroring the AzMonExceptionTypeenum.Request_Durationis recorded; failures are routed through the OTLP-aware classifier described above. Also addsforceFlushtoNetworkStatsLogExporterso it conforms to the currentLogRecordExporterinterface.Tests
classifyStatusCodebuckets, A365 failure/retry/throttle/exception/duration paths, and the OTLP wrapper's failure / retry / retryable-loss / network / timeout / non-HTTP-code branches.