v1.5.0 — Analyze a git ref instead of the working tree

@ericodx ericodx released this 08 Jun 17:55
b88a401

A new --source-ref flag lets analysis read source files from a git ref (branch, sha, HEAD, or :0 for the index) instead of the working tree. Primary motivation: pre-commit integration under git commit --only, where the stash + keep-index dance produces a Frankenstein working tree that triggers false-positive duplications.
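
As an illustration of what "reading from a git ref" can look like, git addresses a blob as `<ref>:<path>`, and the index is reachable as `:0:<path>`. The sketch below only builds the argument list for `git show`. The helper name `gitShowArguments` is hypothetical, and nothing here claims this is how swift-cpd invokes git internally.

```swift
// Hypothetical helper (not part of swift-cpd): builds the arguments for
// `git show <ref>:<path>`, which prints a blob's contents at that ref.
// The path is interpreted relative to the repository root.
func gitShowArguments(ref: String, path: String) -> [String] {
    // For the index the ref is ":0", so the object spec becomes ":0:<path>",
    // which git reads as "stage 0 entry for <path>".
    return ["show", "\(ref):\(path)"]
}
```

With `ref` set to `HEAD` this addresses the last committed blob, and with `:0` it addresses the staged blob, matching the two pre-commit cases named above.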


What's new

Feature

  • --source-ref <ref> CLI flag and sourceRef: YAML key (#33): When set, file listing, hashing, tokenization, and suppression scanning all operate on the blob contents at that ref. Working-tree state becomes irrelevant.

    swift-cpd --source-ref HEAD Sources/    # last committed state
    swift-cpd --source-ref :0   Sources/    # the index (staged blobs)
    swift-cpd --source-ref main Sources/    # a branch tip

    # .swift-cpd.yml
    sourceRef: HEAD

    CLI takes precedence over YAML. Empty string is treated as unset (reads the working tree). See the Reading from a git ref section for pre-commit recipes and caveats.
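
The precedence rule can be sketched in a few lines. The function name and parameters below are hypothetical. The sketch also assumes that an explicitly empty CLI value overrides the YAML key and falls back to the working tree, which is one possible reading that these notes do not spell out.

```swift
// Hypothetical sketch of the precedence rule, not the tool's actual code.
// Returns nil when analysis should read the working tree.
func effectiveSourceRef(cliValue: String?, yamlValue: String?) -> String? {
    // CLI wins whenever the flag was passed; otherwise fall back to YAML.
    let chosen = cliValue ?? yamlValue
    // An empty string is treated as unset.
    // Assumption: this also applies to an empty CLI value that shadows YAML.
    guard let ref = chosen, !ref.isEmpty else { return nil }
    return ref
}
```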

Bug fix

  • Path normalization across /private/var and /var symlinks fixed: GitRefSourceFileLister and GitRefSourceReader previously rejected legitimate paths inside macOS-style standardized roots when one side of the comparison was already collapsed. Both sides of the prefix check are now standardized via NSString.standardizingPath, exposed via a shared free function repositoryRelativePath(for:in:). Surfaced by the linked-worktree (G13) test.
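
A minimal sketch of the idea behind the fix: standardize both the file path and the repository root before the prefix check, so the same form is compared on both sides. The function name `relativePathSketch` is hypothetical and its signature is an assumption. The real `repositoryRelativePath(for:in:)` may differ in parameter types and error handling.

```swift
import Foundation

// Hypothetical sketch, not the shipped implementation.
// Returns the path relative to the repository root, or nil if the file
// lies outside the root.
func relativePathSketch(for filePath: String, in repositoryRoot: String) -> String? {
    // Standardize BOTH sides so a collapsed /var and an expanded
    // /private/var form of the same location end up comparable.
    let file = NSString(string: filePath).standardizingPath
    let root = NSString(string: repositoryRoot).standardizingPath
    // Compare against "root/" so "/repo-other" does not match "/repo".
    let prefix = root.hasSuffix("/") ? root : root + "/"
    guard file.hasPrefix(prefix) else { return nil }
    return String(file.dropFirst(prefix.count))
}
```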

Cache control

  • Cache schema bumped to v2 with composite keys: On-disk format is now a versioned envelope ({ "schemaVersion": 2, "entries": { ... } }). Entry keys are composite (<resolvedSha>|<path> for git-ref runs, <path> for working-tree runs) so concurrent runs against different refs do not collide.

    Note: Existing v1 caches are silently invalidated on first load — one full tokenization pass after upgrading, then steady state. No manual rm -rf .swift-cpd-cache required.
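
The envelope and key scheme can be sketched as follows. The type and function names are hypothetical, and the entry payload (a plain string per key) is a stand-in, since these notes do not describe what each cache entry stores. Only the `schemaVersion`/`entries` shape and the `<resolvedSha>|<path>` key format come from the notes above.

```swift
import Foundation

// Hypothetical sketch of the versioned envelope, not the shipped types.
struct CacheEnvelopeSketch: Codable {
    static let currentSchemaVersion = 2
    var schemaVersion: Int
    var entries: [String: String]  // assumption: placeholder payload per key
}

// Composite key: "<resolvedSha>|<path>" for git-ref runs, "<path>" otherwise.
func cacheKeySketch(path: String, resolvedSha: String?) -> String {
    guard let sha = resolvedSha else { return path }
    return "\(sha)|\(path)"
}

// Anything that is not a valid v2 envelope is treated as an empty cache,
// which forces one full re-tokenization instead of raising an error.
func loadEntriesSketch(from data: Data) -> [String: String] {
    guard let envelope = try? JSONDecoder().decode(CacheEnvelopeSketch.self, from: data),
          envelope.schemaVersion == CacheEnvelopeSketch.currentSchemaVersion
    else { return [:] }
    return envelope.entries
}
```

Because the resolved sha is part of the key, two runs against different refs write to disjoint keys even when they touch the same path.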

Documentation

  • New chapter Docs/CodeBase/12-source-io.md covering the git-backed IO module (GitProcessRunner, GitRefResolver, GitRefSourceFileLister, GitRefSourceReader, SourceRefError, repositoryRelativePath).
  • Updated chapters:
    • Docs/USAGE.md — new "Reading from a git ref" section with pre-commit recipes, CRLF/submodule notes, git requirement, and xcode format caveat.
    • Docs/Architecture/01-overview.md — IO module in the map; entry-point flow shows the source-ref branch.
    • Docs/Architecture/02-pipeline.md — IO stage in the mermaid; thresholds table reorganized by DetectionOptions/CacheOptions/SourceOptions.
    • Docs/Architecture/05-supporting-systems.md — cache schema v2 envelope and one-shot invalidation documented.
    • Docs/CodeBase/02-file-discovery.md — renamed to "File Discovery & Source IO"; covers SourceFileLister/SourceReader protocols and both implementations.
    • Docs/CodeBase/04-pipeline.md — new AnalysisPipeline.init signature with three options structs.
    • Docs/CodeBase/09-reporting.md — sourceRef/resolvedSha in AnalysisResult and across all reporters.
    • Docs/CodeBase/10-cache-baseline.md — CacheKey, Envelope, FileHasher.hash(data:) overload.
  • README gained a short "Analyze a git ref instead of the working tree" section linking back to USAGE.

Reporting

  • Text and HTML reporters include the ref in their header when set: Found 4 clone(s) in 96 files (at HEAD, 0.42s). Output is byte-identical to v1.4.0 when sourceRef is absent.
  • JSON reporter gains two optional top-level fields, sourceRef and resolvedSha. Both are encoded via encodeIfPresent, so they are omitted entirely when not set and existing JSON consumers are unaffected.
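
To illustrate why optional fields do not disturb existing JSON consumers, here is a small sketch using `encodeIfPresent`. The struct `ReportSketch` and its `cloneCount` field are hypothetical stand-ins, and the real JsonReport type has its own fields and coding keys.

```swift
import Foundation

// Hypothetical report type; only sourceRef and resolvedSha are named
// in the release notes.
struct ReportSketch: Encodable {
    var cloneCount: Int        // assumption: placeholder for existing fields
    var sourceRef: String?
    var resolvedSha: String?

    enum CodingKeys: String, CodingKey {
        case cloneCount, sourceRef, resolvedSha
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encode(cloneCount, forKey: .cloneCount)
        // encodeIfPresent writes nothing for nil, so the key is absent
        // from the output rather than emitted as null.
        try container.encodeIfPresent(sourceRef, forKey: .sourceRef)
        try container.encodeIfPresent(resolvedSha, forKey: .resolvedSha)
    }
}

// Convenience for inspecting the encoded output as a string.
func encodeSketch(_ report: ReportSketch) throws -> String {
    let data = try JSONEncoder().encode(report)
    return String(decoding: data, as: UTF8.self)
}
```

A working-tree run leaves both optionals nil, so neither key appears in the encoded output.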

Refactor

  • AnalysisPipeline.init reduced from 9 parameters to 3 grouped option structs (DetectionOptions, CacheOptions, SourceOptions), each in its own file.
  • One type per file: 8 nested types extracted via <Parent>+<Nested>.swift extension files (AnalysisPipeline+DetectionOptions, JsonReport+CodingKeys, FileCache+Envelope, etc.).
  • Test files decomposed: nested @Suites in DetectionMutationTests, DetectorMutationTests, and ReportingMutationTests promoted to top-level — one file per suite (13 new files).
  • Code style sweep: multi-line guards, if-else-if chains rewritten as switch where appropriate (and reverted to guard where Sonar flagged single-case switches), helpers/factories/fixtures moved to TestSupport/.
  • Dead code removed: FileHasher.hash(contentsOf:) overload (production migrated to hash(data:) in Phase 1) and SourceRefError.noMatchingFiles case (never thrown).

Build & CI

  • New Makefile with make test, make coverage, and make sonar targets mirroring the CI workflow so local results match CI byte-for-byte.
  • Scripts/lcov-to-sonar.awk isolates the LCOV → SonarCloud generic XML conversion.

Requirements

  • macOS 15+
  • Swift 6.2+
  • git on PATH when using --source-ref

Installation

See the Installation Guide for Homebrew, pre-built binary, SPM plugin, and pre-commit hook setup.


Quality bar

  • 944 tests in 91 suites
  • 100% line coverage restored
  • 0 dead code reported by Periphery
  • All swiftlint --strict, swift-format, swift-marshal, codespell, and gitleaks checks green

What's Changed

  • docs: rename product to Swift Copy-Paste Detector to match CLI acronym by @ericodx in #32
  • feat: --source-ref reads source files from a git ref instead of the working tree by @ericodx in #34

Full Changelog: v1.4.0...v1.5.0