Skip to content

feat: add metric name validation#2911

Merged
markphelps merged 3 commits intomainfrom
md/metric-name-validation
Apr 8, 2026
Merged

feat: add metric name validation#2911
markphelps merged 3 commits intomainfrom
md/metric-name-validation

Conversation

@michaeldwan
Copy link
Copy Markdown
Member

Summary

Add validation for metric names in record_metric() and delete operations to ensure metric names are valid for downstream systems (OTel, Prometheus, JSON).

Validation Rules

  • Each segment must start with a letter (a-z, A-Z) and end with a letter or digit
  • Segments can contain letters, digits, and underscores (_)
  • Segments cannot start or end with underscores
  • Segments cannot contain consecutive underscores (__)
  • Use dots (.) to create nested objects (e.g., timing.inference)
  • Maximum 128 characters total
  • Maximum 4 dot-separated segments
  • Cannot be predict_time (reserved by runtime)
  • Cannot start with cog. (reserved for system metrics)

Changes

  • crates/coglet-python/src/metric_scope.rs: Added validate_metric_name() and validate_metric_name_for_delete() with 18 unit tests
  • docs/python.md: Added "Naming rules" section under Metrics
  • python/cog/predictor.py: Updated docstring for record_metric()
  • integration-tests/tests/coglet_metrics.txtar: Removed predict_time user-override test
  • integration-tests/tests/coglet_metrics_validation.txtar: New integration test for validation errors
  • crates/coglet-python/coglet/_sdk/__init__.pyi: Updated docstrings

Breaking Change

Users can no longer set predict_time or use the cog. prefix in metric names. Previously, predict_time was silently overridden by the runtime. Now it raises a ValueError with a clear message.

Copy link
Copy Markdown

@ask-bonk ask-bonk bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The metric name validation logic looks solid with good test coverage. However, there's a significant code duplication issue between validate_metric_name() and validate_metric_name_for_delete() - they share ~90 lines of identical validation logic.

I recommend refactoring to extract the common validation logic into a shared helper function, with only the reserved name checks varying between the two public functions.

The duplication spans lines 407-498 (validate_metric_name_for_delete) which is nearly identical to lines 293-396 (validate_metric_name). The only meaningful difference is that validate_metric_name checks reserved names (predict_time and cog.* prefix) while validate_metric_name_for_delete skips those checks.

@ask-bonk
Copy link
Copy Markdown

ask-bonk bot commented Apr 7, 2026

Posted review identifying the ~90 line code duplication between validate_metric_name() and validate_metric_name_for_delete() in metric_scope.rs. The PR adds solid validation logic with good test coverage, but should refactor to extract common validation into a shared helper function.

github run

Add validation for metric names in record_metric() and delete operations.

Rules:
- Each segment must start with letter, end with letter/digit
- Only letters, digits, underscores allowed
- No leading/trailing/consecutive underscores
- No empty segments (leading/trailing/consecutive dots)
- Max 128 characters, max 4 segments
- Reserved: predict_time and cog.* prefix

Includes:
- validate_metric_name() and validate_metric_name_for_delete()
- 18 unit tests covering all rules
- Updated documentation in docs/python.md
- Integration test for validation errors
@michaeldwan michaeldwan force-pushed the md/metric-name-validation branch from 2ef5c1d to a641ec7 Compare April 7, 2026 17:47
@ask-bonk
Copy link
Copy Markdown

ask-bonk bot commented Apr 7, 2026

The code duplication issue has been addressed. The author extracted the common validation logic into a shared helper function validate_metric_name_structure() (lines 293-385), which is now called by both validate_metric_name() and validate_metric_name_for_delete(). This eliminates the ~90 line duplication that was previously identified.

LGTM

github run

@ask-bonk
Copy link
Copy Markdown

ask-bonk bot commented Apr 7, 2026

LGTM

github run

@ask-bonk
Copy link
Copy Markdown

ask-bonk bot commented Apr 7, 2026

LGTM

github run

@michaeldwan michaeldwan marked this pull request as ready for review April 7, 2026 20:41
@michaeldwan michaeldwan requested a review from a team as a code owner April 7, 2026 20:41
@michaeldwan michaeldwan enabled auto-merge April 7, 2026 20:49
@michaeldwan michaeldwan added this pull request to the merge queue Apr 8, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 8, 2026
@markphelps markphelps added this pull request to the merge queue Apr 8, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 8, 2026
@markphelps markphelps merged commit cec2644 into main Apr 8, 2026
40 checks passed
@markphelps markphelps deleted the md/metric-name-validation branch April 8, 2026 18:35
markphelps added a commit that referenced this pull request Apr 8, 2026
…icate/cog into mphelps/push-phase-progress

* 'mphelps/push-phase-progress' of https://github.com/replicate/cog: (95 commits)
  feat: add metric name validation (#2911)
  Rename `cog run` to `cog exec` (#2916)
  chore(deps): bump github.com/google/go-containerregistry (#2884)
  fix: replace removed libgl1-mesa-glx in tensorflow integration test (#2914)
  ci: enforce stub freshness in CI, fix existing stub drift (#2912)
  feat: add schema-compare command to test harness (#2891)
  chore(deps): bump uuid from 1.22.0 to 1.23.0 in /crates (#2887)
  chore(deps): bump github.com/hashicorp/go-version from 1.7.0 to 1.9.0 (#2909)
  chore(deps): bump insta from 1.46.3 to 1.47.2 in /crates (#2908)
  fix: support list[X] | None inputs + integration tests for PEP 604 union File/Path coercion (#2882)
  ci: exclude Dependabot PRs from auto-code review (#2910)
  chore(deps): bump actions/checkout from 4 to 6 (#2904)
  chore(deps): bump github.com/testcontainers/testcontainers-go/modules/registry (#2886)
  fix: metrics bugs in coglet prediction server (#2896)
  Bump version to 0.17.2 (#2903)
  fix(coglet): propagate metric scope to async event loop thread (#2902)
  chore: remove unnecessary nolint directive in test (#2803)
  feat(coglet): add Sentry error reporting for infrastructure errors (#2865)
  fix: homebrew cask postflight xattr references wrong binary name (#2899)
  fix: include custom metrics in cog predict --json output (#2897)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants