
feat: add operational counters for sampling, requirements, and tools (#467)#883

Open
ajbozarth wants to merge 2 commits into generative-computing:main from ajbozarth:feat/operational-counters-467

Conversation


@ajbozarth ajbozarth commented Apr 17, 2026

Misc PR

Type of PR

  • Bug Fix
  • New Feature
  • Documentation
  • Other

Description

Adds six new OpenTelemetry counters as part of the telemetry epic (#443),
giving operators visibility into retry behaviour, validation failure rates,
and tool call health alongside the existing LLM-level metrics:

  • mellea.sampling.attempts / mellea.sampling.successes / mellea.sampling.failures — tagged by strategy class name
  • mellea.requirement.checks / mellea.requirement.failures — tagged by requirement class name and failure reason
  • mellea.tool.calls — tagged by tool name and status ("success" / "failure")

Follows the established plugin pattern: lazy-init counter globals + record_*
helpers in metrics.py, consumed by three new Plugin classes in
metrics_plugins.py (SamplingMetricsPlugin, RequirementMetricsPlugin,
ToolMetricsPlugin) that hook into existing hook events in FIRE_AND_FORGET
mode. No metric calls in business-logic files.

Also extends SamplingIterationPayload and SamplingLoopEndPayload with a
strategy_name field (backward-compatible, defaults to "") so plugins can
tag counters with the strategy class name.
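The backward-compatible payload extension can be sketched with a dataclass default. Field names other than `strategy_name` are hypothetical here; only the defaulted new field reflects the PR:

```python
from dataclasses import dataclass


@dataclass
class SamplingIterationPayload:
    # `attempt` is an illustrative placeholder field, not from the PR.
    attempt: int = 0
    # New field; the "" default keeps existing constructors working unchanged.
    strategy_name: str = ""


old_style = SamplingIterationPayload(attempt=1)  # pre-existing call sites still valid
new_style = SamplingIterationPayload(attempt=1, strategy_name="RejectionSamplingStrategy")
print(old_style.strategy_name)  # ""
```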

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and GitHub automation pass (a maintainer will kick off the GitHub automation when the rest of the PR is populated)

Attribution

  • AI coding assistants used

…enerative-computing#467)

Adds six new OpenTelemetry counters giving operators visibility into
retry behaviour, validation failure rates, and tool call health:
mellea.sampling.attempts/successes/failures, mellea.requirement.checks/failures,
and mellea.tool.calls.

Follows the established lazy-init globals + record_* helpers + Plugin hooks
pattern. Extends SamplingIterationPayload and SamplingLoopEndPayload with a
strategy_name field so plugins can tag counters by strategy class.

Assisted-by: Claude Code
Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
@ajbozarth ajbozarth self-assigned this Apr 17, 2026
@github-actions github-actions bot added the enhancement New feature or request label Apr 17, 2026
@github-actions

The PR description has been updated. Please fill out the template for your PR to be reviewed.

@ajbozarth ajbozarth changed the title from "feat: add operational counters for sampling, requirements, and tools …" to "feat: add operational counters for sampling, requirements, and tools (#467)" Apr 17, 2026
@ajbozarth ajbozarth marked this pull request as ready for review April 17, 2026 23:03
@ajbozarth ajbozarth requested a review from a team as a code owner April 17, 2026 23:03
PeriodicExportingMetricReader background threads (60 s default) would
fire after pytest closed stdout once the suite crossed the 60 s mark,
causing "I/O operation on closed file" and OTLP UNAVAILABLE errors.
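The hazard described above is generic: a periodic background thread keeps writing after the test harness closes its stream. A minimal sketch of the failure mode and fix, using a stub reader in place of the real `PeriodicExportingMetricReader` (the actual fix would shut the SDK reader down in test teardown):

```python
import io
import threading
import time


class PeriodicReader:
    """Minimal stand-in for a periodic metric reader's background thread."""

    def __init__(self, out, interval=0.05):
        self._out = out
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # Fires periodically until shutdown; writing to an already-closed
        # stream here is exactly the "I/O operation on closed file" failure.
        while not self._stop.wait(self._interval):
            self._out.write("export\n")

    def shutdown(self):
        self._stop.set()
        self._thread.join()


out = io.StringIO()
reader = PeriodicReader(out)
time.sleep(0.12)
reader.shutdown()  # stop the exporter thread *before* the stream goes away
out.close()        # now safe: no background write can race the close
print("clean shutdown")
```

Skipping the `shutdown()` call (or ordering it after the close) reproduces the race once the suite outlives the export interval.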

Assisted-by: Claude Code
Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
@ajbozarth

This is parallel work to #882 which adds the other remaining metrics telemetry and whichever merges second will need a manual merge to deal with the overlapping git diffs.

@akihikokuroda

This looks good to me as an implementation of issue #467. I do wonder how useful these counters are, though: they are grouped only by strategy and requirement, and may not give much insight without finer-grained groupings.


@akihikokuroda akihikokuroda left a comment


LGTM. I have a general comment about the issue that this PR implements.

@ajbozarth

> This looks good to me as an implementation of issue #467. I do wonder how useful these counters are, though: they are grouped only by strategy and requirement, and may not give much insight without finer-grained groupings.

@jakelorocco @psschwei I actually wondered about this while implementing. I thought the idea was interesting and valuable, but my work did not create an example of how it could be used (which maybe I should add?)


Labels

enhancement New feature or request


Development

Successfully merging this pull request may close these issues.

Add counters for operational metrics specific to Mellea's sampling and validation loops

2 participants