
Reworked quality gates and testing #23

Merged
ossirytk merged 3 commits into main from vibing-quality on Mar 28, 2026
Conversation

Owner

@ossirytk ossirytk commented Mar 28, 2026

  • Fix pytest exit code handling in scripts/quality/capture_code_metrics.py (fail on any non-zero code, distinct messages for coverage vs test/runner failures)
  • Fix lazy evaluation in scripts/rag/analyze_rag_text_enrichment.py (use lambdas for short-circuit classifier evaluation)
  • Lint passes (ruff)
  • Tests pass (15/15 for capture_code_metrics)
  • Code review completed
  • Security scan: 0 alerts


Previously both triggers used branches ['**'], running the workflow twice on every PR push. Now push is scoped to main only.
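The trigger scoping described above presumably looks something like the following in `.github/workflows/quality_gate.yml` (a sketch reconstructed from the description; the actual trigger keys in the repo may differ):

```yaml
# Before: both triggers matched every branch, so each push to a PR branch
# ran the workflow twice (once via push, once via pull_request).
#
# on:
#   push:
#     branches: ['**']
#   pull_request:
#     branches: ['**']

# After: push is scoped to main only; pushes to PR branches now run
# the workflow once, via the pull_request trigger.
on:
  push:
    branches: [main]
  pull_request:
    branches: ['**']
```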

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 28, 2026 08:02

Copilot AI left a comment


Pull request overview

Reworks CI quality gates to avoid duplicate workflow runs on PR pushes and introduces a new “code quality metrics” gate (coverage + complexity) along with additional unit tests to improve coverage across utility/script components.

Changes:

  • Scope quality_gate.yml push trigger to main only, and add a new “capture code quality metrics” CI step.
  • Add pytest-cov, radon, and xenon plus a new scripts/quality/capture_code_metrics.py snapshot+gate command and coverage config in pyproject.toml.
  • Add multiple new unit-test modules covering utility functions and script entrypoints; refactor some script internals for testability.

Reviewed changes

Copilot reviewed 21 out of 24 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| uv.lock | Adds dev dependencies for coverage/complexity tooling (pytest-cov, radon, xenon, coverage, etc.). |
| pyproject.toml | Registers capture-code-metrics, adds dev deps, adds coverage config, and adds Ruff per-file ignores for tests. |
| .github/workflows/quality_gate.yml | Avoids duplicate runs by scoping push to main; runs the new code-metrics capture/gate step. |
| scripts/quality/capture_code_metrics.py | New command to run pytest-cov + xenon/radon, enforce gates, and write snapshots to logs/code_quality/. |
| scripts/rag/push_rag_data.py | Refactors fingerprint mismatch detection into a helper for readability/testability. |
| scripts/rag/migrate_collection_embedding.py | Extracts document row cleaning into _clean_document_rows. |
| scripts/rag/manage_collections_core_evaluation.py | Extracts embedder/db-cache setup into _setup_embedder_and_db_cache. |
| scripts/rag/manage_collections_commands_lint.py | Extracts lint-file processing into a helper (but currently introduces a Click command wiring issue). |
| scripts/rag/manage_collections_commands_eval.py | Extracts report-writing and metric-gate checks into helpers. |
| scripts/rag/manage_collections_commands_collections.py | Extracts bulk deletion confirmation into _confirm_bulk_deletion. |
| scripts/rag/analyze_rag_text_validation.py | Extracts validation and duplicate-UUID checks into helpers. |
| scripts/rag/analyze_rag_text_enrichment.py | Refactors entity category inference and metadata enrichment into smaller helpers. |
| scripts/rag/analyze_rag_coverage.py | Refactors report formatting into section helpers. |
| tests/test_web_app_utils.py | New unit tests for web_app pure utility functions. |
| tests/test_script_entrypoints.py | New unit tests validating wrapper-script __main__ behavior via runpy. |
| tests/test_retrieval_keys.py | New unit tests for pure functions in core/retrieval_keys.py. |
| tests/test_rag_validation.py | New unit tests for RAG metadata validation and error paths. |
| tests/test_quality_gate.py | New unit tests for lightweight helper functions in scripts/quality_gate.py. |
| tests/test_capture_code_metrics.py | New unit tests for parsing helpers in capture_code_metrics. |
| tests/test_persona_drift.py | Adds additional edge-case tests and deduplicates anchor setup. |
| logs/code_quality/latest.json | Adds a committed "latest" code quality snapshot. |
| logs/code_quality/history.csv | Adds a committed history of code quality snapshots. |
| .gitignore | Ignores coverage artifacts and ignores logs/* except logs/code_quality/. |

Comment thread on scripts/quality/capture_code_metrics.py (outdated), lines +192 to +195:
```python
# pytest exits 2 when tests fail; exit 1 means only the coverage threshold was missed
if cov_returncode not in (0, 2):
    click.echo(f"FAIL: coverage threshold not met ({coverage['total_pct']}% < {_COVERAGE_THRESHOLD}%)", err=True)
    gate_failed = True
```

Copilot AI Mar 28, 2026


capture_code_metrics treats pytest return codes inconsistently (comment says tests fail with exit code 2, and the gate only fails when return code is not in (0, 2)). Pytest’s exit codes differ (e.g., tests failing is typically 1; 2 is interrupted), and pytest-cov may also use a non-zero code for --cov-fail-under. As written, some failing states can be misclassified and potentially not fail the gate. Consider: (1) failing the gate on any non-zero return code, and (2) emitting distinct error messages for “tests failed” vs “coverage below threshold” based on the actual exit code or parsed output.

Suggested change (replacing the snippet above):

```python
coverage_failed = coverage["total_pct"] < coverage["threshold"]
if cov_returncode != 0:
    if coverage_failed:
        click.echo(
            f"FAIL: coverage threshold not met ({coverage['total_pct']}% < {coverage['threshold']}%)",
            err=True,
        )
    else:
        click.echo(
            f"FAIL: pytest exited with code {cov_returncode}; tests or runner failed (see output above).",
            err=True,
        )
    gate_failed = True
elif coverage_failed:
    click.echo(
        f"FAIL: coverage threshold not met ({coverage['total_pct']}% < {coverage['threshold']}%)",
        err=True,
    )
    gate_failed = True
```

Comment on scripts/rag/analyze_rag_text_enrichment.py, lines +155 to +162:

```python
    _classify_technology_primary(entity_lower),
    _classify_faction(entity_lower, context_text, has_location_hint_in_entity),
    _classify_location(context_text, has_location_hint_in_entity),
    _classify_event(entity_lower, context_text),
    _classify_technology_secondary(entity_stripped, context_text),
    _classify_character(entity_stripped, context_text),
)
for result in classifiers:
```

Copilot AI Mar 28, 2026


infer_category_with_confidence builds a tuple of classifier results (classifiers = (...)), which eagerly calls every classifier even though only the first match is used. This removes the short-circuit behavior the previous if/elif chain had and adds unnecessary regex/any() work on every call. Consider iterating through callables (or calling each classifier inside the loop) so evaluation stops at the first match.

Suggested change (replacing the snippet above):

```python
    lambda: _classify_technology_primary(entity_lower),
    lambda: _classify_faction(
        entity_lower,
        context_text,
        has_location_hint_in_entity,
    ),
    lambda: _classify_location(context_text, has_location_hint_in_entity),
    lambda: _classify_event(entity_lower, context_text),
    lambda: _classify_technology_secondary(entity_stripped, context_text),
    lambda: _classify_character(entity_stripped, context_text),
)
for classifier in classifiers:
    result = classifier()
```
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@ossirytk
Owner Author

@copilot apply changes based on the comments in this thread

…sifier evaluation

Agent-Logs-Url: https://github.com/ossirytk/light-chat/sessions/fd06894f-a153-436d-8653-dfae864a9e18

Co-authored-by: ossirytk <136164501+ossirytk@users.noreply.github.com>
Contributor

Copilot AI commented Mar 28, 2026

@copilot apply changes based on the comments in this thread

Applied both changes in commit 89a376f:

  1. capture_code_metrics.py: Replaced the `not in (0, 2)` check with a strict `!= 0` gate, so any non-zero pytest exit now fails the gate. When `cov_returncode != 0`, it emits a distinct error: "coverage threshold not met" if coverage is also below the threshold (the typical cause), or "pytest exited with code X; tests or runner failed" otherwise. The case where coverage is below the threshold but pytest still returns 0 also fails the gate.

  2. analyze_rag_text_enrichment.py: Wrapped each classifier call in a lambda so they are evaluated lazily. The loop now calls classifier() and stops at the first non-None result, restoring the original short-circuit behaviour.
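The lazy-evaluation pattern described in point 2 can be sketched in isolation. The classifier names and match rules below are hypothetical stand-ins; the real helpers live in scripts/rag/analyze_rag_text_enrichment.py:

```python
# Records which classifiers actually ran, to demonstrate short-circuiting.
calls: list[str] = []

def classify_technology(entity: str):
    calls.append("technology")
    return ("technology", 0.9) if "drive" in entity else None

def classify_character(entity: str):
    calls.append("character")
    return ("character", 0.5)

def infer_category(entity: str):
    # Wrapping each call in a lambda defers evaluation: a classifier only
    # runs if every earlier one returned None, restoring the short-circuit
    # behavior of the original if/elif chain.
    classifiers = (
        lambda: classify_technology(entity),
        lambda: classify_character(entity),
    )
    for classifier in classifiers:
        result = classifier()
        if result is not None:
            return result
    return ("unknown", 0.0)

print(infer_category("warp drive"))  # → ('technology', 0.9); the second classifier never runs
```

With the eager tuple version, `calls` would always contain every classifier name; here a match on the first entry leaves the rest unevaluated.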

@ossirytk ossirytk merged commit 2d84cd9 into main Mar 28, 2026
@ossirytk ossirytk deleted the vibing-quality branch March 28, 2026 08:25