17 commits
4bbf3bc
feat(report): add canonical analysis profiles and bump report schema …
orenlab Apr 6, 2026
6953347
feat(mcp,vscode): clarify repository health and triage focus semantics
orenlab Apr 6, 2026
333ccc8
feat(mcp,vscode): clarify repository health and triage focus semantics
orenlab Apr 7, 2026
ebcd474
feat(claude): prefer workspace runtimes and poetry envs before global…
orenlab Apr 8, 2026
f9be591
fix(core,ci): harden git diff validation, make segment digests canoni…
orenlab Apr 8, 2026
7ef49d0
feat(metrics): add adoption and public API baselines with compact sch…
orenlab Apr 9, 2026
b88cc11
feat: add coverage join and golden-fixture clone exclusions
orenlab Apr 13, 2026
8ddd743
feat: align client surfaces with coverage join and always-on adoption…
orenlab Apr 13, 2026
4067c58
chore(docs): align AGENTS and contract docs with current code
orenlab Apr 14, 2026
d2cfc51
fix(vscode,claude): harden client settings, logging, and bundle valid…
orenlab Apr 14, 2026
49b3d7c
feat(html): refine provenance, empty states, and filter controls
orenlab Apr 15, 2026
9acac52
fix(cli,bench): stabilize metrics mode and baseline path handling
orenlab Apr 15, 2026
0c70546
fix(cache,html): invalidate stale api-surface cache and unify report …
orenlab Apr 15, 2026
38ec861
chore(docs): refresh README.md
orenlab Apr 16, 2026
6840326
feat(mcp): add compact threshold context for empty design checks
orenlab Apr 16, 2026
5f39b3f
chore(deps): update project deps and pin actual version
orenlab Apr 16, 2026
5ffac60
chore(release): finalize changelog for 2.0.0b5
orenlab Apr 16, 2026
2 changes: 1 addition & 1 deletion .github/workflows/codeclone.yml
@@ -19,7 +19,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout
- uses: actions/checkout@v4
+ uses: actions/checkout@v6.0.2
with:
fetch-depth: 0

2 changes: 1 addition & 1 deletion .github/workflows/tests.yml
@@ -40,7 +40,7 @@ jobs:
- name: Run tests
# Smoke CLI tests intentionally disable subprocess coverage collection
# to avoid runner-specific flakiness while keeping parent-process coverage strict.
- run: uv run pytest --cov=codeclone --cov-report=term-missing --cov-fail-under=98
+ run: uv run pytest --cov=codeclone --cov-report=term-missing --cov-fail-under=99

- name: Verify baseline exists
if: ${{ matrix.python-version == '3.13' }}
1 change: 1 addition & 0 deletions .gitignore
@@ -39,3 +39,4 @@ site/
/.uv-cache/
/package-lock.json
extensions/vscode-codeclone/node_modules
/coverage.xml
98 changes: 90 additions & 8 deletions AGENTS.md
@@ -135,6 +135,22 @@ uv run pytest -q tests/test_codex_plugin.py

## 4) Baseline contract (v2, stable)

### Versioned constants (single source of truth)

All schema/version constants live in `codeclone/contracts.py`. **Always read them from code, never copy
from another doc.** Current values (verified at write time):

| Constant | Source | Current value |
|-----------------------------------|------------------------------|---------------|
| `BASELINE_SCHEMA_VERSION` | `codeclone/contracts.py` | `2.1` |
| `BASELINE_FINGERPRINT_VERSION` | `codeclone/contracts.py` | `1` |
| `CACHE_VERSION` | `codeclone/contracts.py` | `2.5` |
| `REPORT_SCHEMA_VERSION` | `codeclone/contracts.py` | `2.8` |
| `METRICS_BASELINE_SCHEMA_VERSION` | `codeclone/contracts.py` | `1.2` |

When updating any doc that mentions a version, re-read `codeclone/contracts.py` first. Do not derive
versions from another document.
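
The "re-read `codeclone/contracts.py` first" rule can be partly automated. The following is a minimal sketch, not part of CodeClone: `find_stale_rows` and the row pattern are hypothetical helpers, and the constants are passed in as a plain mapping here (in practice they would presumably be read from `codeclone/contracts.py`).

```python
import re

# Matches table rows shaped like: | `NAME` | `source` | `value` |
# (hypothetical pattern, modelled on the table above)
ROW = re.compile(
    r"^\|\s*`(?P<name>[A-Z_]+)`\s*\|[^|]*\|\s*`(?P<value>[^`]+)`\s*\|\s*$"
)


def find_stale_rows(markdown: str, constants: dict[str, str]) -> list[tuple[str, str, str]]:
    """Return (name, documented, actual) for every row that disagrees with the code."""
    stale = []
    for line in markdown.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue  # header, separator, or non-table line
        name, documented = match["name"], match["value"]
        actual = constants.get(name)
        if actual is not None and actual != documented:
            stale.append((name, documented, actual))
    return stale
```

An empty result means every documented constant known to the mapping matches; rows for names missing from the mapping are ignored rather than reported.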

### Baseline file structure (canonical)

```json
@@ -144,7 +160,7 @@ uv run pytest -q tests/test_codex_plugin.py
"name": "codeclone",
"version": "X.Y.Z"
},
- "schema_version": "2.0",
+ "schema_version": "2.1",
"fingerprint_version": "1",
"python_tag": "cp313",
"created_at": "2026-02-08T14:20:15Z",
```
@@ -163,8 +179,9 @@ uv run pytest -q tests/test_codex_plugin.py
### Rules

- `schema_version` is **baseline schema**, not package version.
- - Runtime writes baseline schema `2.0`.
- - Runtime accepts baseline schema `1.x` and `2.x` for compatibility checks.
+ - Runtime writes baseline schema `2.1`.
+ - Runtime accepts baseline schema `1.0` and `2.0`–`2.1` (governed by
+ `_BASELINE_SCHEMA_MAX_MINOR_BY_MAJOR` in `codeclone/baseline.py`).
- Compatibility is tied to:
- `fingerprint_version`
- `python_tag`
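
The acceptance rule in this hunk (baseline schema `1.0` and `2.0`–`2.1`) can be read as a "maximum minor per major" lookup. A minimal sketch under that assumption follows; the mapping values are copied from the rule text, while the function name and parsing details are hypothetical and not taken from `codeclone/baseline.py`.

```python
# Hypothetical mirror of the mapping named in the rule; the real one lives in
# codeclone/baseline.py as _BASELINE_SCHEMA_MAX_MINOR_BY_MAJOR.
MAX_MINOR_BY_MAJOR = {1: 0, 2: 1}


def is_supported_schema(version: str) -> bool:
    """Accept 'MAJOR.MINOR' only if MINOR is within the known range for MAJOR."""
    try:
        major_text, minor_text = version.split(".")
        major, minor = int(major_text), int(minor_text)
    except ValueError:
        return False  # wrong number of parts or non-numeric parts
    max_minor = MAX_MINOR_BY_MAJOR.get(major)
    return max_minor is not None and 0 <= minor <= max_minor
```

Keying the limit by major version lets a reader reject an unknown newer minor (for example `2.2`) without also rejecting older, still-readable files.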
@@ -358,8 +375,8 @@ Architecture is layered, but grounded in current code (not aspirational diagrams
`codeclone/grouping.py`, `codeclone/scanner.py`) produces normalized structural facts and clone candidates.
- **Domain/contracts layer** (`codeclone/models.py`, `codeclone/contracts.py`, `codeclone/errors.py`,
`codeclone/domain/*.py`) defines typed entities and stable enums/constants used across layers.
- - **Persistence contracts** (`codeclone/baseline.py`, `codeclone/cache.py`, `codeclone/metrics_baseline.py`) store
- trusted comparison state and optimization state.
+ - **Persistence contracts** (`codeclone/baseline.py`, `codeclone/cache.py`, `codeclone/cache_io.py`,
+ `codeclone/metrics_baseline.py`) store trusted comparison state and optimization state.
- **Canonical report + projections** (`codeclone/report/json_contract.py`, `codeclone/report/*.py`) converts analysis
facts to deterministic, contract-shaped outputs.
- **HTML/UI rendering** (`codeclone/html_report.py`, `codeclone/_html_report/*`, `codeclone/_html_*.py`,
@@ -411,8 +428,12 @@ Use this map to route changes to the right owner module.
deterministic.
- `codeclone/baseline.py` — baseline schema/trust/integrity/compatibility contract; all baseline format changes go here
with explicit contract process.
- - `codeclone/cache.py` — cache schema/integrity/profile compatibility and serialization; cache remains
+ - `codeclone/cache.py` — cache schema/status/profile compatibility and high-level serialization policy; cache remains
optimization-only.
+ - `codeclone/cache_io.py` — IO-layer helpers for the cache: atomic JSON read/write
+ (`read_json_document`, `write_json_document_atomically`), canonical JSON (`canonical_json`), and
+ HMAC signing/verification (`sign_cache_payload`, `verify_cache_payload_signature`); attribute these
+ functions to `cache_io.py`, not `cache.py`.
- `codeclone/report/json_contract.py` — canonical report schema builder/integrity payload; any JSON contract shape
change belongs here.
- `codeclone/report/*.py` (other modules) — deterministic projections/format transforms (
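
For readers unfamiliar with the three techniques attributed to `cache_io.py` in this hunk (canonical JSON, HMAC signing, atomic writes), here is a generic standard-library sketch. It is not the CodeClone implementation: the `sketch_*` names, the SHA-256 choice, and all signatures are assumptions made for illustration.

```python
import hashlib
import hmac
import json
import os
import tempfile
from pathlib import Path
from typing import Any


def sketch_canonical_json(payload: Any) -> bytes:
    """Serialize with sorted keys and fixed separators so equal data gives equal bytes."""
    text = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def sketch_sign(payload: Any, key: bytes) -> str:
    """Return a hex HMAC-SHA256 over the canonical bytes (hash choice is an assumption)."""
    return hmac.new(key, sketch_canonical_json(payload), hashlib.sha256).hexdigest()


def sketch_verify(payload: Any, signature: str, key: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = sketch_sign(payload, key)
    return hmac.compare_digest(expected.encode("ascii"), signature.encode("utf-8"))


def sketch_write_atomically(path: Path, payload: Any) -> None:
    """Write to a temp file in the same directory, then rename it over the target."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(sketch_canonical_json(payload))
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, path)  # atomic when both paths share a filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

Signing the canonical bytes rather than an arbitrary serialization is what makes the signature independent of dictionary insertion order.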
@@ -529,7 +550,7 @@ Policy:
### Public / contract-sensitive surfaces

- CLI flags, defaults, exit codes, and stable script-facing messages.
- - Baseline schema/trust semantics/integrity compatibility (`2.0` baseline contract family).
+ - Baseline schema/trust semantics/integrity compatibility (`BASELINE_SCHEMA_VERSION` contract family).
- Cache schema/status/profile compatibility/integrity (`CACHE_VERSION` contract family).
- Canonical report JSON schema/payload semantics (`REPORT_SCHEMA_VERSION` contract family).
- Documented report projections and their machine/user-facing semantics (HTML/Markdown/SARIF/Text).
@@ -621,7 +642,68 @@ Avoid deep package hierarchies unless they clearly reduce coupling.

---

- ## 20) Minimal checklist for PRs (agents)
+ ## 20) Agent safety rules

These rules exist because of real incidents in this repo. They are non-negotiable.

### Scope discipline

- Touch only files directly related to your current task.
- Do not "clean up", reformat, or refactor code in files outside your task scope.
- Do not delete functions, classes, blocks, or whole files written by other contributors unless
deletion is the explicit goal of your task.
- If you discover unrelated issues, report them in your final message — do not fix them silently.
- Before starting work, run `git status` and review uncommitted/untracked changes. They may belong
to a parallel agent or to the maintainer; do not delete or overwrite them without explicit approval.

### Documentation hygiene

- Every doc claim about code (schema version, module path, function name, MCP tool count, exit code,
CLI flag) must be verified against the **current** code before writing or editing.
- Always read version constants from `codeclone/contracts.py` (see Section 4 table), never from
another doc.
- When updating a file that mentions schema versions, verify **every** version reference in that
file — not only the one you came to change.
- Do not remove narrative content from docs you did not author. Add or correct only.
- Do not replace a multi-section doc with a "pointer" stub unless the maintainer explicitly asks for it.
- Do not create new `*.md` design specs ("PROPOSED", "FUTURE", "RFC") inside `docs/`. Use the
maintainer's planning channel instead — orphaned specs become stale and misleading.

### Audit completeness

- When the maintainer asks to audit "all" of something, list every file you actually opened in your
final report. Selective audits silently skip the most error-prone files.
- Prefer parallel `Explore` agents partitioned by file group over a single sequential pass —
coverage is the contract, not effort.

### Shared helpers

- HTML/UI helpers (`_html_badges.py`, `_html_css.py`, `_html_js.py`, `_html_escape.py`,
`_html_report/_glossary.py`) are imported, not duplicated locally inside `_html_report/_sections/*`.
If you need a helper that doesn't exist, add it to the shared module.
- Glossary terms used in stat-card labels live in `codeclone/_html_report/_glossary.py`. Adding a
new label without a glossary entry is a contract gap.

### Conflict avoidance

- Do not force-push, `git reset --hard`, or `git checkout --` over uncommitted work without
explicit maintainer approval.
- If your changes conflict with recent commits or other agents' work, rebase or merge cleanly —
never silently drop the other side.
- Never use `--no-verify` to bypass pre-commit hooks; fix the underlying issue.

### Verification before "done"

- A task that touches HTML rendering is not complete until
`pytest tests/test_html_report.py -x -q` is green.
- A task that touches MCP is not complete until
`pytest tests/test_mcp_service.py tests/test_mcp_server.py -x -q` is green.
- A task that touches docs schema/version claims is not complete until you have grep'd the whole
file for *all* version-shaped strings and verified each against `codeclone/contracts.py`.
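
The "grep the whole file for version-shaped strings" step could look roughly like the sketch below. The pattern and the name `version_mentions` are hypothetical; it only catches backticked strings such as `2.1` or `2.0.0b5`, so plain-text mentions would still need a manual pass.

```python
import re

# Backticked MAJOR.MINOR[.PATCH] with an optional a/b pre-release suffix (assumed shape).
VERSION_SHAPED = re.compile(r"`(\d+\.\d+(?:\.\d+)?(?:[ab]\d+)?)`")


def version_mentions(text: str) -> list[tuple[int, str]]:
    """Return (1-based line number, version string) for every backticked version."""
    found = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in VERSION_SHAPED.finditer(line):
            found.append((line_no, match.group(1)))
    return found
```

Each reported pair can then be checked by hand, or by a script, against `codeclone/contracts.py`.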

---

## 21) Minimal checklist for PRs (agents)

- [ ] Change is deterministic.
- [ ] Contracts preserved or versioned.
42 changes: 41 additions & 1 deletion CHANGELOG.md
@@ -1,6 +1,46 @@
# Changelog

- ## [2.0.0b4]
+ ## [2.0.0b5] - 2026-04-16

Expands the canonical contract with adoption, API-surface, and coverage-join layers; clarifies run interpretation
across MCP/HTML/clients; tightens MCP launcher/runtime behavior.

### Contracts, metrics, and review surfaces

- Report schema `2.8`: add `coverage_adoption`, `api_surface`, `coverage_join`, and optional
`clones.suppressed.*` (for `golden_fixture_paths`); separate coverage hotspots vs scope gaps.
- Baselines: clone `2.1`, metrics `1.2`; compact `api_surface` payload (`local_name` on disk, qualnames at runtime);
read-compatible with `2.0` / `1.1`.
- Add public/private visibility classification for public-symbol metrics (no clone/fingerprint changes).
- Add annotation/docstring adoption coverage: parameter, return, public docstrings, explicit `Any`.
- Add opt-in API surface inventory + baseline diff (snapshots, additions, breaking changes).
- Add coverage join (`--coverage`): per-function facts + findings for below-threshold or missing-in-scope functions;
current-run only (not baseline truth, no fingerprint impact).
- Add `golden_fixture_paths`: exclude matching clone groups from health/gates while keeping suppressed facts.
- Add gates: `--min-typing-coverage`, `--min-docstring-coverage`, `--fail-on-typing-regression`,
`--fail-on-docstring-regression`, `--fail-on-api-break`, `--fail-on-untested-hotspots`, `--coverage-min`.
- Surface adoption/API/coverage-join in MCP, CLI Metrics, report payloads, and HTML (Overview + Quality subtab).
- Preserve embedded metrics and optional `api_surface` in unified baselines.
- Cache `2.5`: make analysis-profile compatibility API-surface-aware; invalidate stale non-API warm caches; preserve parameter order; align warm/cold API diffs.

### MCP, HTML, and client interpretation

- Surface effective analysis profile in report meta, MCP summary/triage, and HTML subtitle.
- Add `health_scope`, `focus`, `new_by_source_kind` to MCP summary/triage.
- Make baseline mismatch explicit (python tags + no-valid-baseline signal).
- Surface `Coverage Join` facts and the optional `coverage` MCP help topic in
the VS Code extension when the connected server supports them.
- Prefer workspace-local launchers over `PATH` (Poetry fallback).
- Add `workspace_root` to force project `.venv` selection.

### Safety and maintenance

- Validate `git_diff_ref` as safe single-revision expressions.
- Replace segment digest `repr()` with canonical JSON bytes (determinism).
- Align CI coverage gate (`fail_under = 99`) and refresh `actions/checkout` pin.
- Refresh branch metadata/docs for `2.0.0b5`; update README badge to `89 (B)`.
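
The changelog describes the `git_diff_ref` hardening in a single line. As a rough illustration of what validating a "safe single-revision expression" could involve, here is a sketch; the allowed character set, the rejected forms, and the name `is_safe_single_revision` are assumptions for this example, not CodeClone's actual rules.

```python
import re

# Assumed allow-list: must start with an alphanumeric character, so option-like
# values such as "--output=x" are rejected; no whitespace or shell metacharacters.
_SAFE_REF = re.compile(r"[A-Za-z0-9][A-Za-z0-9._/~^@{}-]*")


def is_safe_single_revision(ref: str) -> bool:
    """Accept one revision (e.g. 'HEAD~1', 'origin/main'); reject ranges and options."""
    if _SAFE_REF.fullmatch(ref) is None:
        return False
    if ".." in ref:
        return False  # 'a..b' and 'a...b' are ranges, not single revisions
    return True
```

An allow-list of this kind fails closed: anything the pattern does not recognise is refused before it reaches a `git` subprocess.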

## [2.0.0b4] - 2026-04-05

### MCP server

29 changes: 25 additions & 4 deletions CONTRIBUTING.md
@@ -138,10 +138,10 @@ CodeClone maintains several versioned schema contracts:

| Schema | Current version | Owner |
|------------------|-----------------|-------------------------------------|
- | Baseline         | `2.0`           | `codeclone/baseline.py`             |
- | Report           | `2.1`           | `codeclone/report/json_contract.py` |
- | Cache            | `2.2`           | `codeclone/cache.py`                |
- | Metrics baseline | `1.0`           | `codeclone/metrics_baseline.py`     |
+ | Baseline         | `2.1`           | `codeclone/baseline.py`             |
+ | Report           | `2.8`           | `codeclone/report/json_contract.py` |
+ | Cache            | `2.4`           | `codeclone/cache_io.py`             |
+ | Metrics baseline | `1.2`           | `codeclone/metrics_baseline.py`     |

Any change to schema shape or semantics requires version review, documentation, and tests.

@@ -209,6 +209,27 @@ uv run pytest -q tests/test_mcp_service.py tests/test_mcp_server.py

---

## Commit Messages

Use the repository's existing **Conventional Commits** style:

- format: `type(scope): imperative summary`
- keep `type` lowercase (`feat`, `fix`, `docs`, `chore`, ...)
- keep the summary short, imperative, and specific to the user-visible change
- use a narrow scope when it helps (`metrics`, `mcp,vscode`, `core,ci`, ...)
- split unrelated changes into separate commits instead of writing one broad summary

Examples from the current history:

- `fix(core,ci): harden git diff validation, make segment digests canonical, and align CI policy`
- `feat(metrics): add adoption and public API baselines with compact schema-aware storage`
- `chore(docs): align AGENTS and contract docs with current code`

If a commit needs extra context, keep the subject line concise and explain the
rest in the commit body.
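
The `type(scope): imperative summary` format above can be checked mechanically. A minimal sketch follows, assuming a lowercase type, an optional comma-separated scope, and a single space after the colon; the pattern and the name `is_conventional_subject` are illustrative, and the sketch deliberately does not restrict the set of types, since the list above is open-ended.

```python
import re

# type, optional (scope[,scope...]), then ": " and a non-empty summary
SUBJECT = re.compile(r"[a-z]+(?:\([a-z0-9_,-]+\))?: \S.*")


def is_conventional_subject(subject: str) -> bool:
    """Return True when the subject line matches the assumed type(scope): summary shape."""
    return SUBJECT.fullmatch(subject) is not None
```

A check like this covers only the shape of the subject line; whether the summary is imperative and specific still needs a human reviewer.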

---

## Code Style

- Python **3.10 – 3.14**