
cochange: files that change together #5

@simon-marcus


Implement `commitlens cochange [--min N] [--top N]` — pairs of files that frequently appear in the same commit.

Deliverable

Module: `src/commitlens/cochange.py`. Register in `_register_subcommands`.

CLI args: `--min N` (minimum co-occurrence count to include; default 3), `--top N` (cap output rows; default 30).
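This is a minimal sketch of the argument wiring. It assumes `commitlens` builds its CLI with `argparse` subparsers, which the issue does not state. `register_cochange` is a hypothetical helper name. The real hook is `_register_subcommands`, and its signature isn't shown here.

```python
import argparse


def register_cochange(subparsers) -> argparse.ArgumentParser:
    """Hypothetical helper: attach `cochange` to an argparse subparsers object."""
    p = subparsers.add_parser("cochange", help="pairs of files that change together")
    # dest avoids shadowing the builtin `min` when reading args in code
    p.add_argument("--min", type=int, default=3, dest="min_together",
                   help="minimum co-occurrence count to include (default 3)")
    p.add_argument("--top", type=int, default=30,
                   help="cap on output rows (default 30)")
    return p


if __name__ == "__main__":
    parser = argparse.ArgumentParser(prog="commitlens")
    subs = parser.add_subparsers(dest="command")
    register_cochange(subs)
    args = parser.parse_args(["cochange", "--min", "5"])
    print(args.command, args.min_together, args.top)  # cochange 5 30
```

On the command line the flag is still spelled `--min`. Only the attribute name on the parsed namespace changes.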

Output schema

```json
{
  "_render": "",
  "pairs": [
    {"a": "src/auth.py", "b": "tests/test_auth.py", "together": 18, "a_total": 22, "b_total": 19, "jaccard": 0.78},
    ...
  ]
}
```

`together` is the count of commits that touched BOTH files. `a_total` / `b_total` are the total commits touching each. `jaccard` is `together / (a_total + b_total - together)`, rounded to 2 decimals. Sort by `jaccard` desc, then `together` desc.
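Here is a sketch of one output row and the sort order, checked against the numbers in the schema example: 18 / (22 + 19 - 18) = 18 / 23 ≈ 0.78. `PairRow`, `make_row` and `sort_rows` are hypothetical names.

```python
from typing import List, TypedDict


class PairRow(TypedDict):
    a: str
    b: str
    together: int
    a_total: int
    b_total: int
    jaccard: float


def make_row(a: str, b: str, together: int, a_total: int, b_total: int) -> PairRow:
    """Build one output row; assumes a < b and together >= 1."""
    union = a_total + b_total - together  # commits touching a OR b
    return PairRow(a=a, b=b, together=together, a_total=a_total,
                   b_total=b_total, jaccard=round(together / union, 2))


def sort_rows(rows: List[PairRow]) -> List[PairRow]:
    """Jaccard descending, then together descending."""
    return sorted(rows, key=lambda r: (-r["jaccard"], -r["together"]))


row = make_row("src/auth.py", "tests/test_auth.py", 18, 22, 19)
print(row["jaccard"])  # 0.78
```

This sketch sorts on the already-rounded jaccard. Pairs whose scores only differ past the second decimal therefore fall through to the `together` tie-break, which matches what a reader sees in the output.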

Filter out:

  • pairs where together < `--min`
  • pairs where a and b are the same path (self-pairs)
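The following sketches the two filters. It assumes pair counts live in a `Counter` keyed on `(a, b)` tuples, as the Algorithm section suggests. `select_pairs` is a hypothetical name. Because pairs are generated with a < b, the `a != b` check acts only as a safeguard.

```python
from collections import Counter
from typing import Dict, Tuple


def select_pairs(pair_counts: Counter, min_together: int = 3) -> Dict[Tuple[str, str], int]:
    """Drop pairs seen fewer than `min_together` times and any self-pair."""
    return {
        (a, b): n
        for (a, b), n in pair_counts.items()
        if n >= min_together and a != b
    }


counts = Counter({("src/a.py", "src/b.py"): 4, ("src/a.py", "src/c.py"): 1, ("x", "x"): 5})
print(select_pairs(counts))  # {('src/a.py', 'src/b.py'): 4}
```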

Human render

Fixed-width table. Columns: a (truncated to 35 chars), b (truncated to 35 chars), together, jaccard (two decimals, e.g. 0.78).
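This sketches the fixed-width render. Two details are assumptions because the issue does not specify them: truncation is plain slicing with no ellipsis, and columns are separated by two spaces. `render_table` is a hypothetical name.

```python
from typing import Any, Dict, List

WIDTH = 35  # truncation width for the a/b columns, per the issue


def render_table(rows: List[Dict[str, Any]]) -> str:
    """Render rows as a fixed-width text table with a header and rule line."""
    header = f"{'a':<{WIDTH}}  {'b':<{WIDTH}}  {'together':>8}  {'jaccard':>7}"
    lines = [header, "-" * len(header)]
    for r in rows:
        # slice first so long paths cannot push later columns out of line
        lines.append(
            f"{r['a'][:WIDTH]:<{WIDTH}}  {r['b'][:WIDTH]:<{WIDTH}}  "
            f"{r['together']:>8}  {r['jaccard']:>7.2f}"
        )
    return "\n".join(lines)


print(render_table([{"a": "src/auth.py", "b": "tests/test_auth.py", "together": 18, "jaccard": 0.78}]))
```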

Algorithm

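For each commit (`git log --name-only --pretty=format:%H`), collect the set of paths it touched. Increment a counter for each unique pair (a, b) with a < b, then aggregate at the end. Commits that touched only one file produce no pairs and can be skipped for pair counting, but they still count toward that file's `a_total` / `b_total`.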
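The sketch below covers parsing and counting. It assumes that, in the `--pretty=format:%H` output, a line of exactly 40 (SHA-1) or 64 (SHA-256) lowercase hex characters is a commit hash. A file literally named like a hash would therefore be misread. The sketch also ignores git's quoting of unusual paths. `read_git_log`, `parse_name_only_log` and `count_cochanges` are hypothetical names.

```python
from __future__ import annotations

import re
import subprocess
from collections import Counter
from itertools import combinations

# Assumption: a line consisting solely of 40 or 64 lowercase hex chars is a commit hash.
HASH_RE = re.compile(r"[0-9a-f]{40}(?:[0-9a-f]{24})?")


def read_git_log(repo: str) -> str:
    """Run the git command from the issue and return its raw stdout."""
    return subprocess.run(
        ["git", "log", "--name-only", "--pretty=format:%H"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout


def parse_name_only_log(text: str) -> list[set[str]]:
    """Turn the log output into one set of paths per commit."""
    commits: list[set[str]] = []
    current: set[str] | None = None
    for line in text.splitlines():
        if not line:
            continue  # blank lines separate commits
        if HASH_RE.fullmatch(line):
            current = set()
            commits.append(current)
        elif current is not None:
            current.add(line)
    return commits


def count_cochanges(commits: list[set[str]]) -> tuple[Counter, Counter]:
    """Return (per-file commit totals, per-pair co-occurrence counts)."""
    file_totals: Counter = Counter()
    pair_counts: Counter = Counter()
    for paths in commits:
        file_totals.update(paths)  # every commit counts toward a file's total
        if len(paths) < 2:
            continue  # single-file commits produce no pairs
        # sorting first guarantees each key is (a, b) with a < b
        for a, b in combinations(sorted(paths), 2):
            pair_counts[(a, b)] += 1
    return file_totals, pair_counts
```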

Memory: a 1000-commit repo with ~10 files/commit yields C(10, 2) = 45 pairs per commit, so at most ~45,000 pair entries. A `collections.Counter` keyed on `(a, b)` tuples handles this fine. Don't worry about scaling beyond that.
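The memory estimate works out as follows. The 10 files per commit figure is the issue's assumed average, not a measurement.

```python
from math import comb

commits = 1000
files_per_commit = 10
pairs_per_commit = comb(files_per_commit, 2)  # 10 * 9 / 2 = 45 unordered pairs
pair_increments = commits * pairs_per_commit  # 45,000 increments in total
# Distinct Counter keys can never exceed the number of increments.
print(pairs_per_commit, pair_increments)  # 45 45000
```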

Tests

`tests/test_cochange.py` using the `git_repo` fixture:

  • Two files committed together 4 times → appears in output (above default min=3).
  • Two files committed together 1 time → does NOT appear.
  • A pair `(foo, bar)` and a pair `(foo, baz)` with different jaccard values: assert they are ordered by jaccard, highest first.
  • A commit touching only one file doesn't blow up the algorithm.
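The same scenarios can also be checked against the pure counting logic without touching git. This sketch assumes a hypothetical `cochange_pairs` function that takes one set of paths per commit. The real tests should still go through the `git_repo` fixture as specified.

```python
from collections import Counter
from itertools import combinations


def cochange_pairs(commits, min_together=3):
    """Hypothetical pure core: list of per-commit path sets -> sorted rows."""
    totals, pairs = Counter(), Counter()
    for paths in commits:
        totals.update(paths)
        # combinations() of a single path yields nothing, so one-file commits add no pairs
        for a, b in combinations(sorted(paths), 2):
            pairs[(a, b)] += 1
    rows = []
    for (a, b), n in pairs.items():
        if n < min_together:
            continue
        union = totals[a] + totals[b] - n
        rows.append({"a": a, "b": b, "together": n, "a_total": totals[a],
                     "b_total": totals[b], "jaccard": round(n / union, 2)})
    rows.sort(key=lambda r: (-r["jaccard"], -r["together"]))
    return rows


def test_pair_above_min_is_reported():
    rows = cochange_pairs([{"a.py", "b.py"}] * 4)
    assert [(r["a"], r["b"], r["together"]) for r in rows] == [("a.py", "b.py", 4)]


def test_pair_below_min_is_dropped():
    assert cochange_pairs([{"a.py", "b.py"}]) == []


def test_ordering_by_jaccard():
    # foo/bar: 4 / (7 + 4 - 4) = 0.57; foo/baz: 3 / (7 + 6 - 3) = 0.3
    commits = [{"foo", "bar"}] * 4 + [{"foo", "baz"}] * 3 + [{"baz"}] * 3
    rows = cochange_pairs(commits)
    assert [(r["a"], r["b"]) for r in rows] == [("bar", "foo"), ("baz", "foo")]


def test_single_file_commit_is_harmless():
    assert cochange_pairs([{"only.py"}]) == []
    rows = cochange_pairs([{"only.py"}] + [{"a.py", "b.py"}] * 3)
    assert [(r["a"], r["b"]) for r in rows] == [("a.py", "b.py")]
```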

Labels: symphony (Created by Symphony)
