
Reworked quality gates and testing #23

Merged
ossirytk merged 3 commits into main from vibing-quality on Mar 28, 2026
Conversation

Owner

@ossirytk ossirytk commented Mar 28, 2026

  • Fix pytest exit code handling in scripts/quality/capture_code_metrics.py (fail on any non-zero code, distinct messages for coverage vs test/runner failures)
  • Fix lazy evaluation in scripts/rag/analyze_rag_text_enrichment.py (use lambdas for short-circuit classifier evaluation)
  • Lint passes (ruff)
  • Tests pass (15/15 for capture_code_metrics)
  • Code review completed
  • Security scan: 0 alerts


Previously both triggers used branches ['**'], running the workflow twice on every PR push. Now push is scoped to main only.
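The trigger scoping described above presumably looks something like the following in `.github/workflows/quality_gate.yml` (a sketch reconstructed from the description; the actual trigger keys in the repo may differ):

```yaml
# Before: both triggers matched every branch, so each push to a PR branch
# ran the workflow twice (once via push, once via pull_request).
#
# on:
#   push:
#     branches: ['**']
#   pull_request:
#     branches: ['**']

# After: push is scoped to main only; pushes to PR branches now run
# the workflow once, via the pull_request trigger.
on:
  push:
    branches: [main]
  pull_request:
    branches: ['**']
```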

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 28, 2026 08:02

Copilot AI left a comment


Pull request overview

Reworks CI quality gates to avoid duplicate workflow runs on PR pushes and introduces a new “code quality metrics” gate (coverage + complexity) along with additional unit tests to improve coverage across utility/script components.

Changes:

  • Scope quality_gate.yml push trigger to main only, and add a new “capture code quality metrics” CI step.
  • Add pytest-cov, radon, and xenon plus a new scripts/quality/capture_code_metrics.py snapshot+gate command and coverage config in pyproject.toml.
  • Add multiple new unit-test modules covering utility functions and script entrypoints; refactor some script internals for testability.

Reviewed changes

Copilot reviewed 21 out of 24 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| uv.lock | Adds dev dependencies for coverage/complexity tooling (pytest-cov, radon, xenon, coverage, etc.). |
| pyproject.toml | Registers capture-code-metrics, adds dev deps, adds coverage config, and adds Ruff per-file ignores for tests. |
| .github/workflows/quality_gate.yml | Avoids duplicate runs by scoping push to main; runs the new code-metrics capture/gate step. |
| scripts/quality/capture_code_metrics.py | New command to run pytest-cov + xenon/radon, enforce gates, and write snapshots to logs/code_quality/. |
| scripts/rag/push_rag_data.py | Refactors fingerprint mismatch detection into a helper for readability/testability. |
| scripts/rag/migrate_collection_embedding.py | Extracts document row cleaning into _clean_document_rows. |
| scripts/rag/manage_collections_core_evaluation.py | Extracts embedder/db-cache setup into _setup_embedder_and_db_cache. |
| scripts/rag/manage_collections_commands_lint.py | Extracts lint-file processing into a helper (but currently introduces a Click command wiring issue). |
| scripts/rag/manage_collections_commands_eval.py | Extracts report-writing and metric-gate checks into helpers. |
| scripts/rag/manage_collections_commands_collections.py | Extracts bulk deletion confirmation into _confirm_bulk_deletion. |
| scripts/rag/analyze_rag_text_validation.py | Extracts validation and duplicate-UUID checks into helpers. |
| scripts/rag/analyze_rag_text_enrichment.py | Refactors entity category inference and metadata enrichment into smaller helpers. |
| scripts/rag/analyze_rag_coverage.py | Refactors report formatting into section helpers. |
| tests/test_web_app_utils.py | New unit tests for web_app pure utility functions. |
| tests/test_script_entrypoints.py | New unit tests validating wrapper-script __main__ behavior via runpy. |
| tests/test_retrieval_keys.py | New unit tests for pure functions in core/retrieval_keys.py. |
| tests/test_rag_validation.py | New unit tests for RAG metadata validation and error paths. |
| tests/test_quality_gate.py | New unit tests for lightweight helper functions in scripts/quality_gate.py. |
| tests/test_capture_code_metrics.py | New unit tests for parsing helpers in capture_code_metrics. |
| tests/test_persona_drift.py | Adds additional edge-case tests and deduplicates anchor setup. |
| logs/code_quality/latest.json | Adds a committed "latest" code quality snapshot. |
| logs/code_quality/history.csv | Adds a committed history of code quality snapshots. |
| .gitignore | Ignores coverage artifacts and ignores logs/* except logs/code_quality/. |

Comment thread on scripts/quality/capture_code_metrics.py (outdated), lines +192 to +195:
```python
# pytest exits 2 when tests fail; exit 1 means only the coverage threshold was missed
if cov_returncode not in (0, 2):
    click.echo(f"FAIL: coverage threshold not met ({coverage['total_pct']}% < {_COVERAGE_THRESHOLD}%)", err=True)
    gate_failed = True
```

Copilot AI Mar 28, 2026


capture_code_metrics treats pytest return codes inconsistently (comment says tests fail with exit code 2, and the gate only fails when return code is not in (0, 2)). Pytest’s exit codes differ (e.g., tests failing is typically 1; 2 is interrupted), and pytest-cov may also use a non-zero code for --cov-fail-under. As written, some failing states can be misclassified and potentially not fail the gate. Consider: (1) failing the gate on any non-zero return code, and (2) emitting distinct error messages for “tests failed” vs “coverage below threshold” based on the actual exit code or parsed output.

Suggested change (replacing the snippet above):

```python
coverage_failed = coverage["total_pct"] < coverage["threshold"]
if cov_returncode != 0:
    if coverage_failed:
        click.echo(
            f"FAIL: coverage threshold not met ({coverage['total_pct']}% < {coverage['threshold']}%)",
            err=True,
        )
    else:
        click.echo(
            f"FAIL: pytest exited with code {cov_returncode}; tests or runner failed (see output above).",
            err=True,
        )
    gate_failed = True
elif coverage_failed:
    click.echo(
        f"FAIL: coverage threshold not met ({coverage['total_pct']}% < {coverage['threshold']}%)",
        err=True,
    )
    gate_failed = True
```

Comment on scripts/rag/analyze_rag_text_enrichment.py, lines +155 to +162:

```python
    _classify_technology_primary(entity_lower),
    _classify_faction(entity_lower, context_text, has_location_hint_in_entity),
    _classify_location(context_text, has_location_hint_in_entity),
    _classify_event(entity_lower, context_text),
    _classify_technology_secondary(entity_stripped, context_text),
    _classify_character(entity_stripped, context_text),
)
for result in classifiers:
```

Copilot AI Mar 28, 2026


infer_category_with_confidence builds a tuple of classifier results (classifiers = (...)), which eagerly calls every classifier even though only the first match is used. This removes the short-circuit behavior the previous if/elif chain had and adds unnecessary regex/any() work on every call. Consider iterating through callables (or calling each classifier inside the loop) so evaluation stops at the first match.

Suggested change (replacing the snippet above):

```python
    lambda: _classify_technology_primary(entity_lower),
    lambda: _classify_faction(
        entity_lower,
        context_text,
        has_location_hint_in_entity,
    ),
    lambda: _classify_location(context_text, has_location_hint_in_entity),
    lambda: _classify_event(entity_lower, context_text),
    lambda: _classify_technology_secondary(entity_stripped, context_text),
    lambda: _classify_character(entity_stripped, context_text),
)
for classifier in classifiers:
    result = classifier()
```
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@ossirytk
Owner Author

@copilot apply changes based on the comments in this thread

…sifier evaluation

Agent-Logs-Url: https://github.com/ossirytk/light-chat/sessions/fd06894f-a153-436d-8653-dfae864a9e18

Co-authored-by: ossirytk <136164501+ossirytk@users.noreply.github.com>
Contributor

Copilot AI commented Mar 28, 2026

@copilot apply changes based on the comments in this thread

Applied both changes in commit 89a376f:

  1. capture_code_metrics.py: Replaced the `not in (0, 2)` check with a strict `!= 0` gate, so any non-zero pytest exit now fails the gate. When `cov_returncode != 0`, it emits a distinct error: "coverage threshold not met" if coverage is also below the threshold (the typical cause), or "pytest exited with code X; tests or runner failed" otherwise. The case where coverage is below the threshold but pytest still returns 0 also fails the gate.

  2. analyze_rag_text_enrichment.py: Wrapped each classifier call in a lambda so they are evaluated lazily. The loop now calls classifier() and stops at the first non-None result, restoring the original short-circuit behaviour.
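The lazy-evaluation pattern described in point 2 can be sketched in isolation. The classifier names and match rules below are hypothetical stand-ins; the real helpers live in scripts/rag/analyze_rag_text_enrichment.py:

```python
# Records which classifiers actually ran, to demonstrate short-circuiting.
calls: list[str] = []

def classify_technology(entity: str):
    calls.append("technology")
    return ("technology", 0.9) if "drive" in entity else None

def classify_character(entity: str):
    calls.append("character")
    return ("character", 0.5)

def infer_category(entity: str):
    # Wrapping each call in a lambda defers evaluation: a classifier only
    # runs if every earlier one returned None, restoring the short-circuit
    # behavior of the original if/elif chain.
    classifiers = (
        lambda: classify_technology(entity),
        lambda: classify_character(entity),
    )
    for classifier in classifiers:
        result = classifier()
        if result is not None:
            return result
    return ("unknown", 0.0)

print(infer_category("warp drive"))  # → ('technology', 0.9); the second classifier never runs
```

With the eager tuple version, `calls` would always contain every classifier name; here a match on the first entry leaves the rest unevaluated.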

@ossirytk ossirytk merged commit 2d84cd9 into main Mar 28, 2026
@ossirytk ossirytk deleted the vibing-quality branch March 28, 2026 08:25