Conversation
Previously both triggers used branches ['**'], running the workflow twice on every PR push. Now push is scoped to main only. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Reworks CI quality gates to avoid duplicate workflow runs on PR pushes and introduces a new “code quality metrics” gate (coverage + complexity) along with additional unit tests to improve coverage across utility/script components.
Changes:
- Scope
quality_gate.ymlpush trigger tomainonly, and add a new “capture code quality metrics” CI step. - Add
pytest-cov,radon, andxenonplus a newscripts/quality/capture_code_metrics.pysnapshot+gate command and coverage config inpyproject.toml. - Add multiple new unit-test modules covering utility functions and script entrypoints; refactor some script internals for testability.
Reviewed changes
Copilot reviewed 21 out of 24 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
uv.lock |
Adds dev dependencies for coverage/complexity tooling (pytest-cov, radon, xenon, coverage, etc.). |
pyproject.toml |
Registers capture-code-metrics, adds dev deps, adds coverage config, and adds Ruff per-file ignores for tests. |
.github/workflows/quality_gate.yml |
Avoids duplicate runs by scoping push to main; runs the new code-metrics capture/gate step. |
scripts/quality/capture_code_metrics.py |
New command to run pytest-cov + xenon/radon, enforce gates, and write snapshots to logs/code_quality/. |
scripts/rag/push_rag_data.py |
Refactors fingerprint mismatch detection into a helper for readability/testability. |
scripts/rag/migrate_collection_embedding.py |
Extracts document row cleaning into _clean_document_rows. |
scripts/rag/manage_collections_core_evaluation.py |
Extracts embedder/db-cache setup into _setup_embedder_and_db_cache. |
scripts/rag/manage_collections_commands_lint.py |
Extracts lint-file processing into a helper (but currently introduces a Click command wiring issue). |
scripts/rag/manage_collections_commands_eval.py |
Extracts report-writing and metric-gate checks into helpers. |
scripts/rag/manage_collections_commands_collections.py |
Extracts bulk deletion confirmation into _confirm_bulk_deletion. |
scripts/rag/analyze_rag_text_validation.py |
Extracts validation and duplicate-UUID checks into helpers. |
scripts/rag/analyze_rag_text_enrichment.py |
Refactors entity category inference and metadata enrichment into smaller helpers. |
scripts/rag/analyze_rag_coverage.py |
Refactors report formatting into section helpers. |
tests/test_web_app_utils.py |
New unit tests for web_app pure utility functions. |
tests/test_script_entrypoints.py |
New unit tests validating wrapper-script __main__ behavior via runpy. |
tests/test_retrieval_keys.py |
New unit tests for pure functions in core/retrieval_keys.py. |
tests/test_rag_validation.py |
New unit tests for RAG metadata validation and error paths. |
tests/test_quality_gate.py |
New unit tests for lightweight helper functions in scripts/quality_gate.py. |
tests/test_capture_code_metrics.py |
New unit tests for parsing helpers in capture_code_metrics. |
tests/test_persona_drift.py |
Adds additional edge-case tests and deduplicates anchor setup. |
logs/code_quality/latest.json |
Adds a committed “latest” code quality snapshot. |
logs/code_quality/history.csv |
Adds a committed history of code quality snapshots. |
.gitignore |
Ignores coverage artifacts and ignores logs/* except logs/code_quality/. |
| # pytest exits 2 when tests fail; exit 1 means only the coverage threshold was missed | ||
| if cov_returncode not in (0, 2): | ||
| click.echo(f"FAIL: coverage threshold not met ({coverage['total_pct']}% < {_COVERAGE_THRESHOLD}%)", err=True) | ||
| gate_failed = True |
There was a problem hiding this comment.
capture_code_metrics treats pytest return codes inconsistently (comment says tests fail with exit code 2, and the gate only fails when return code is not in (0, 2)). Pytest’s exit codes differ (e.g., tests failing is typically 1; 2 is interrupted), and pytest-cov may also use a non-zero code for --cov-fail-under. As written, some failing states can be misclassified and potentially not fail the gate. Consider: (1) failing the gate on any non-zero return code, and (2) emitting distinct error messages for “tests failed” vs “coverage below threshold” based on the actual exit code or parsed output.
| # pytest exits 2 when tests fail; exit 1 means only the coverage threshold was missed | |
| if cov_returncode not in (0, 2): | |
| click.echo(f"FAIL: coverage threshold not met ({coverage['total_pct']}% < {_COVERAGE_THRESHOLD}%)", err=True) | |
| gate_failed = True | |
| coverage_failed = coverage["total_pct"] < coverage["threshold"] | |
| if cov_returncode != 0: | |
| if coverage_failed: | |
| click.echo( | |
| f"FAIL: coverage threshold not met ({coverage['total_pct']}% < {coverage['threshold']}%)", | |
| err=True, | |
| ) | |
| else: | |
| click.echo( | |
| f"FAIL: pytest exited with code {cov_returncode}; tests or runner failed (see output above).", | |
| err=True, | |
| ) | |
| gate_failed = True | |
| elif coverage_failed: | |
| click.echo( | |
| f"FAIL: coverage threshold not met ({coverage['total_pct']}% < {coverage['threshold']}%)", | |
| err=True, | |
| ) | |
| gate_failed = True |
| _classify_technology_primary(entity_lower), | ||
| _classify_faction(entity_lower, context_text, has_location_hint_in_entity), | ||
| _classify_location(context_text, has_location_hint_in_entity), | ||
| _classify_event(entity_lower, context_text), | ||
| _classify_technology_secondary(entity_stripped, context_text), | ||
| _classify_character(entity_stripped, context_text), | ||
| ) | ||
| for result in classifiers: |
There was a problem hiding this comment.
infer_category_with_confidence builds a tuple of classifier results (classifiers = (...)), which eagerly calls every classifier even though only the first match is used. This removes the short-circuit behavior the previous if/elif chain had and adds unnecessary regex/any() work on every call. Consider iterating through callables (or calling each classifier inside the loop) so evaluation stops at the first match.
| _classify_technology_primary(entity_lower), | |
| _classify_faction(entity_lower, context_text, has_location_hint_in_entity), | |
| _classify_location(context_text, has_location_hint_in_entity), | |
| _classify_event(entity_lower, context_text), | |
| _classify_technology_secondary(entity_stripped, context_text), | |
| _classify_character(entity_stripped, context_text), | |
| ) | |
| for result in classifiers: | |
| lambda: _classify_technology_primary(entity_lower), | |
| lambda: _classify_faction( | |
| entity_lower, | |
| context_text, | |
| has_location_hint_in_entity, | |
| ), | |
| lambda: _classify_location(context_text, has_location_hint_in_entity), | |
| lambda: _classify_event(entity_lower, context_text), | |
| lambda: _classify_technology_secondary(entity_stripped, context_text), | |
| lambda: _classify_character(entity_stripped, context_text), | |
| ) | |
| for classifier in classifiers: | |
| result = classifier() |
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
@copilot apply changes based on the comments in this thread |
…sifier evaluation Agent-Logs-Url: https://github.com/ossirytk/light-chat/sessions/fd06894f-a153-436d-8653-dfae864a9e18 Co-authored-by: ossirytk <136164501+ossirytk@users.noreply.github.com>
Applied both changes in commit
|
scripts/quality/capture_code_metrics.py(fail on any non-zero code, distinct messages for coverage vs test/runner failures)scripts/rag/analyze_rag_text_enrichment.py(use lambdas for short-circuit classifier evaluation)✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.