[enhancement] Deduplicate monorepo gems sharing identical CHANGELOG #12

@justi

Summary

When two gems are published from the same monorepo with a unified CHANGELOG (e.g. sentry-rails and sentry-ruby from getsentry/sentry-ruby), postcut emits identical bullet content for both gems. This wastes tokens in the generated LLM context document without adding information.

Observed

In a generated .postcut/<model>.md for a project depending on both sentry-rails and sentry-ruby, sections ### sentry-rails and ### sentry-ruby contain bit-identical bullet lists. Verified with:

diff <(awk 'BEGIN{p=0} /^### sentry-rails/{p=1;next} /^### /{if(p)exit} p' doc.md | grep "^- ") \
     <(awk 'BEGIN{p=0} /^### sentry-ruby/{p=1;next} /^### /{if(p)exit} p' doc.md | grep "^- ")
# → empty diff, 17 identical bullets each
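The same check can be generalized to every section of the document rather than one hand-picked pair. The following is a minimal sketch, not part of postcut: the helper names `bullets_by_section` and `duplicate_sections` are hypothetical, and it assumes that gem sections start with `### ` and that bullets start with `- `, as in the awk command above.

```ruby
require "digest"

# Hypothetical helper: split a generated document into
# { section name => [bullet lines] }, mirroring the awk/grep pipeline.
def bullets_by_section(markdown)
  sections = Hash.new { |hash, key| hash[key] = [] }
  current = nil
  markdown.each_line(chomp: true) do |line|
    if line.start_with?("### ")
      current = line.delete_prefix("### ").strip
      sections[current] # touch the key so empty sections are recorded too
    elsif current && line.start_with?("- ")
      sections[current] << line
    end
  end
  sections
end

# Hypothetical helper: return groups of section names whose bullet
# lists are byte-for-byte identical.
def duplicate_sections(markdown)
  bullets_by_section(markdown)
    .reject { |_, bullets| bullets.empty? }
    .group_by { |_, bullets| Digest::SHA256.hexdigest(bullets.join("\n")) }
    .values
    .map { |pairs| pairs.map(&:first).sort }
    .select { |names| names.size > 1 }
end
```

Calling `duplicate_sections(File.read("doc.md"))` on the document described above would be expected to report the sentry-rails and sentry-ruby sections as one group.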

Root cause

Confirmed: the upstream getsentry/sentry-ruby monorepo publishes a single CHANGELOG.md for the entire SDK ecosystem, with no per-gem sections within each version. Postcut faithfully reflects this, but for a project pulling both gems the doc duplicates ~17 bullets × ~200 chars = ~3.4K characters, which is roughly 850 tokens assuming ~4 characters per token.

Comparison: rails meta-gem already handled

README.md notes that the rails meta-gem expands into per-component prefixes ([activerecord], [actionview], ...). This works well in current output. Sentry-style monorepos need similar treatment but inverted: instead of expanding one gem into many sub-sections, group multiple gems sharing a CHANGELOG under one section.

Suggested approaches

  1. Group section header: ### sentry-rails, sentry-ruby with one bullet list.
  2. Reference pointer: Emit full content for first gem alphabetically, then for siblings emit ### sentry-ruby\n\n*See [sentry-rails](#sentry-rails) — shared CHANGELOG from getsentry/sentry-ruby monorepo.*
  3. Detect via shared upstream URL: when two gems' homepage_uri (as reported by gem specification) points to the same repo AND their CHANGELOGs hash-match, treat them as duplicates.
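Approaches 1 and 3 combine naturally: detect siblings by repo URL plus CHANGELOG digest, then render each group under one header. The following is a minimal sketch under stated assumptions. `GemChangelog`, `normalize_repo`, `group_shared_changelogs`, and `render` are hypothetical names, not postcut's actual internals, and the sketch assumes the repo URL and the extracted changelog text are already available for each gem.

```ruby
require "digest"

# Hypothetical record; postcut's real internal representation may differ.
GemChangelog = Struct.new(:name, :repo_url, :changelog, keyword_init: true)

# Normalize a repo URL so trivially different spellings compare equal.
def normalize_repo(url)
  url.to_s.strip.downcase.sub(%r{/+\z}, "").sub(/\.git\z/, "")
end

# Group gems that share both a repo URL and an identical changelog.
# Gems without a repo URL are never merged with anything else.
def group_shared_changelogs(gems)
  gems
    .group_by do |g|
      repo = normalize_repo(g.repo_url)
      if repo.empty?
        [:ungrouped, g.name]
      else
        [repo, Digest::SHA256.hexdigest(g.changelog)]
      end
    end
    .values
    .map { |group| group.sort_by(&:name) }
    .sort_by { |group| group.first.name }
end

# Render one "### a, b" section per group, with a single bullet list.
def render(groups)
  groups.map do |group|
    "### #{group.map(&:name).join(', ')}\n\n#{group.first.changelog.strip}\n"
  end.join("\n")
end
```

Requiring both conditions keeps the merge conservative: gems from the same repo with per-gem changelogs stay separate, and unrelated gems whose changelogs happen to match are not merged.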

Impact

Token efficiency for in-flight LLM context. A typical Rails app may pull 3-5 monorepo siblings (e.g. sentry-*, pay-*, view_component-*) — savings compound.

🤖 Reported via Claude Code review
