Conversation
There was a problem hiding this comment.
The metric name validation logic looks solid with good test coverage. However, there's a significant code duplication issue between validate_metric_name() and validate_metric_name_for_delete() - they share ~90 lines of identical validation logic.
I recommend refactoring to extract the common validation logic into a shared helper function, with only the reserved name checks varying between the two public functions.
The duplication spans lines 407-498 (validate_metric_name_for_delete) which is nearly identical to lines 293-396 (validate_metric_name). The only meaningful difference is that validate_metric_name checks reserved names (predict_time and cog.* prefix) while validate_metric_name_for_delete skips those checks.
|
Posted review identifying the ~90 line code duplication between |
Add validation for metric names in record_metric() and delete operations. Rules: - Each segment must start with letter, end with letter/digit - Only letters, digits, underscores allowed - No leading/trailing/consecutive underscores - No empty segments (leading/trailing/consecutive dots) - Max 128 characters, max 4 segments - Reserved: predict_time and cog.* prefix Includes: - validate_metric_name() and validate_metric_name_for_delete() - 18 unit tests covering all rules - Updated documentation in docs/python.md - Integration test for validation errors
2ef5c1d to
a641ec7
Compare
|
The code duplication issue has been addressed. The author extracted the common validation logic into a shared helper function LGTM |
|
LGTM |
|
LGTM |
…icate/cog into mphelps/push-phase-progress * 'mphelps/push-phase-progress' of https://github.com/replicate/cog: (95 commits) feat: add metric name validation (#2911) Rename `cog run` to `cog exec` (#2916) chore(deps): bump github.com/google/go-containerregistry (#2884) fix: replace removed libgl1-mesa-glx in tensorflow integration test (#2914) ci: enforce stub freshness in CI, fix existing stub drift (#2912) feat: add schema-compare command to test harness (#2891) chore(deps): bump uuid from 1.22.0 to 1.23.0 in /crates (#2887) chore(deps): bump github.com/hashicorp/go-version from 1.7.0 to 1.9.0 (#2909) chore(deps): bump insta from 1.46.3 to 1.47.2 in /crates (#2908) fix: support list[X] | None inputs + integration tests for PEP 604 union File/Path coercion (#2882) ci: exclude Dependabot PRs from auto-code review (#2910) chore(deps): bump actions/checkout from 4 to 6 (#2904) chore(deps): bump github.com/testcontainers/testcontainers-go/modules/registry (#2886) fix: metrics bugs in coglet prediction server (#2896) Bump version to 0.17.2 (#2903) fix(coglet): propagate metric scope to async event loop thread (#2902) chore: remove unnecessary nolint directive in test (#2803) feat(coglet): add Sentry error reporting for infrastructure errors (#2865) fix: homebrew cask postflight xattr references wrong binary name (#2899) fix: include custom metrics in cog predict --json output (#2897) ...
Summary
Add validation for metric names in
record_metric()and delete operations to ensure metric names are valid for downstream systems (OTel, Prometheus, JSON).Validation Rules
a-z,A-Z) and end with a letter or digit_)__).) to create nested objects (e.g.,timing.inference)predict_time(reserved by runtime)cog.(reserved for system metrics)Changes
crates/coglet-python/src/metric_scope.rs: Addedvalidate_metric_name()andvalidate_metric_name_for_delete()with 18 unit testsdocs/python.md: Added "Naming rules" section under Metricspython/cog/predictor.py: Updated docstring forrecord_metric()integration-tests/tests/coglet_metrics.txtar: Removed predict_time user-override testintegration-tests/tests/coglet_metrics_validation.txtar: New integration test for validation errorscrates/coglet-python/coglet/_sdk/__init__.pyi: Updated docstringsBreaking Change
Users can no longer set
predict_timeor use thecog.prefix in metric names. Previously,predict_timewas silently overridden by the runtime. Now it raises aValueErrorwith a clear message.