A new --source-ref flag lets analysis read source files from a git ref (branch, sha, HEAD, or :0 for the index) instead of the working tree. Primary motivation: pre-commit integration under git commit --only, where the stash + keep-index dance produces a Frankenstein working tree that triggers false-positive duplications.
What's new
Feature
- --source-ref <ref> CLI flag and sourceRef: YAML key (#33): When set, file listing, hashing, tokenization, and suppression scanning all operate on the blob contents at that ref. Working-tree state becomes irrelevant.

      swift-cpd --source-ref HEAD Sources/   # last committed state
      swift-cpd --source-ref :0 Sources/     # the index (staged blobs)
      swift-cpd --source-ref main Sources/   # a branch tip

      # .swift-cpd.yml
      sourceRef: HEAD

  CLI takes precedence over YAML. Empty string is treated as unset (reads the working tree). See the Reading from a git ref section for pre-commit recipes and caveats.
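The precedence rule can be pictured with a small sketch. The function name `effectiveSourceRef` is hypothetical and not part of swift-cpd, and it assumes that an empty CLI value falls through to the YAML value before defaulting to the working tree; the release notes only state that an empty string is treated as unset.

```swift
// Hypothetical helper for illustration; not swift-cpd's actual API.
// Returns the ref to read from, or nil to read the working tree.
func effectiveSourceRef(cli: String?, yaml: String?) -> String? {
    // Candidates in precedence order: CLI first, then YAML.
    for candidate in [cli, yaml] {
        // Assumption: an empty value counts as "not provided" and falls through.
        if let ref = candidate, !ref.isEmpty {
            return ref
        }
    }
    return nil
}
```

Under these assumptions, passing `--source-ref HEAD` wins over `sourceRef: main` in the YAML file, and leaving both unset or empty keeps the working-tree behavior.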
Bug fix
- Path normalization across /private/var and /var symlinks fixed: GitRefSourceFileLister and GitRefSourceReader previously rejected legitimate paths inside macOS-style standardized roots when one side of the comparison was already collapsed. Both sides of the prefix check are now standardized via NSString.standardizingPath, exposed via a shared free function repositoryRelativePath(for:in:). Surfaced by the linked-worktree (G13) test.
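A minimal sketch of the idea behind the fix, assuming a string-based prefix check. The function name `relativePathSketch` is hypothetical, and the real `repositoryRelativePath(for:in:)` may have a different signature and handle more cases.

```swift
import Foundation

// Illustrative sketch only; the real repositoryRelativePath(for:in:) may differ.
func relativePathSketch(for file: String, in root: String) -> String? {
    // Standardize BOTH sides so a collapsed and an uncollapsed form of the
    // same location (for example /private/var vs /var on macOS) compare equal.
    let standardFile = NSString(string: file).standardizingPath
    let standardRoot = NSString(string: root).standardizingPath

    // Compare against "root/" so "/repo-other" is not mistaken for a child of "/repo".
    let prefix = standardRoot.hasSuffix("/") ? standardRoot : standardRoot + "/"
    guard standardFile.hasPrefix(prefix) else { return nil }
    return String(standardFile.dropFirst(prefix.count))
}
```

Standardizing only one side is what lets a legitimate path look like it sits outside the root, so the sketch normalizes both before comparing.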
Cache control
- Cache schema bumped to v2 with composite keys: On-disk format is now a versioned envelope ({ "schemaVersion": 2, "entries": { ... } }). Entry keys are composite (<resolvedSha>|<path> for git-ref runs, <path> for working-tree runs) so concurrent runs against different refs do not collide.

  Note: Existing v1 caches are silently invalidated on first load. Expect one full tokenization pass after upgrading, then steady state. No manual rm -rf .swift-cpd-cache required.
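The envelope and key scheme could look roughly like the sketch below. Only `schemaVersion`, `entries`, and the key format come from these notes; the type names, the `contentHash` field, and the loading strategy are assumptions made for illustration.

```swift
import Foundation

// Hypothetical shapes for illustration; field names other than
// "schemaVersion" and "entries" are assumptions.
struct CacheEntrySketch: Codable, Equatable {
    var contentHash: String
}

struct CacheEnvelopeSketch: Codable {
    static let currentVersion = 2
    var schemaVersion: Int
    var entries: [String: CacheEntrySketch]
}

// Composite key: "<resolvedSha>|<path>" for git-ref runs, "<path>" otherwise.
func cacheKey(path: String, resolvedSha: String?) -> String {
    guard let sha = resolvedSha else { return path }
    return "\(sha)|\(path)"
}

// Any payload that is not a valid v2 envelope yields an empty cache,
// which costs one full tokenization pass and nothing more.
func loadEntries(from data: Data) -> [String: CacheEntrySketch] {
    guard let envelope = try? JSONDecoder().decode(CacheEnvelopeSketch.self, from: data),
          envelope.schemaVersion == CacheEnvelopeSketch.currentVersion
    else { return [:] }
    return envelope.entries
}
```

Treating an unreadable or older payload as an empty cache is one simple way to get the silent, one-shot invalidation described above without a migration step.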
Documentation
- New chapter Docs/CodeBase/12-source-io.md covering the git-backed IO module (GitProcessRunner, GitRefResolver, GitRefSourceFileLister, GitRefSourceReader, SourceRefError, repositoryRelativePath).
- Updated chapters:
  - Docs/USAGE.md: new "Reading from a git ref" section with pre-commit recipes, CRLF/submodule notes, git requirement, and xcode format caveat.
  - Docs/Architecture/01-overview.md: IO module in the map; entry-point flow shows the source-ref branch.
  - Docs/Architecture/02-pipeline.md: IO stage in the mermaid; thresholds table reorganized by DetectionOptions / CacheOptions / SourceOptions.
  - Docs/Architecture/05-supporting-systems.md: cache schema v2 envelope and one-shot invalidation documented.
  - Docs/CodeBase/02-file-discovery.md: renamed to "File Discovery & Source IO"; covers SourceFileLister / SourceReader protocols and both implementations.
  - Docs/CodeBase/04-pipeline.md: new AnalysisPipeline.init signature with three options structs.
  - Docs/CodeBase/09-reporting.md: sourceRef / resolvedSha in AnalysisResult and across all reporters.
  - Docs/CodeBase/10-cache-baseline.md: CacheKey, Envelope, FileHasher.hash(data:) overload.
- README gained a short "Analyze a git ref instead of the working tree" section linking back to USAGE.
Reporting
- Text and HTML reporters include the ref in their header when set: Found 4 clone(s) in 96 files (at HEAD, 0.42s). Output is byte-identical to v1.4.0 when sourceRef is absent.
- JSON reporter gains two optional top-level fields, sourceRef and resolvedSha. Both are encoded via encodeIfPresent, so they are omitted entirely when not set and existing JSON consumers are unaffected.
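The JSON behavior can be illustrated with a small Encodable sketch. The type `ReportSketch`, the `cloneCount` field, and `renderJSON` are hypothetical; only the `sourceRef` and `resolvedSha` field names and the use of `encodeIfPresent` come from these notes.

```swift
import Foundation

// Hypothetical report shape; only sourceRef and resolvedSha come from the release notes.
struct ReportSketch: Encodable {
    var cloneCount: Int
    var sourceRef: String?
    var resolvedSha: String?

    enum CodingKeys: String, CodingKey {
        case cloneCount, sourceRef, resolvedSha
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encode(cloneCount, forKey: .cloneCount)
        // encodeIfPresent writes nothing for nil, so the key is absent rather than null.
        try container.encodeIfPresent(sourceRef, forKey: .sourceRef)
        try container.encodeIfPresent(resolvedSha, forKey: .resolvedSha)
    }
}

// Sorted keys keep the output stable for comparison.
func renderJSON(_ report: ReportSketch) throws -> String {
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.sortedKeys]
    return String(decoding: try encoder.encode(report), as: UTF8.self)
}
```

With both optionals left as nil, the encoded object carries no trace of the new fields, which is why consumers written against the older output should keep working.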
Refactor
- AnalysisPipeline.init reduced from 9 parameters to 3 grouped option structs (DetectionOptions, CacheOptions, SourceOptions), each in its own file.
- One type per file: 8 nested types extracted via <Parent>+<Nested>.swift extension files (AnalysisPipeline+DetectionOptions, JsonReport+CodingKeys, FileCache+Envelope, etc.).
- Test files decomposed: nested @Suite types in DetectionMutationTests, DetectorMutationTests, and ReportingMutationTests promoted to top-level, one file per suite (13 new files).
- Code style sweep: multi-line guards, if-else-if chains rewritten as switch where appropriate (and reverted to guard where Sonar flagged single-case switches), helpers/factories/fixtures moved to TestSupport/.
- Dead code removed: FileHasher.hash(contentsOf:) overload (production migrated to hash(data:) in Phase 1) and SourceRefError.noMatchingFiles case (never thrown).
Build & CI
- New Makefile with make test, make coverage, and make sonar targets mirroring the CI workflow so local results match CI byte-for-byte.
- Scripts/lcov-to-sonar.awk isolates the LCOV → SonarCloud generic XML conversion.
Requirements
- macOS 15+
- Swift 6.2+
- git on PATH when using --source-ref
Installation
See the Installation Guide for Homebrew, pre-built binary, SPM plugin, and pre-commit hook setup.
Quality bar
- 944 tests in 91 suites
- 100% line coverage restored
- 0 dead code reported by Periphery
- All swiftlint --strict, swift-format, swift-marshal, codespell, and gitleaks checks green
What's Changed
- docs: rename product to Swift Copy-Paste Detector to match CLI acronym by @ericodx in #32
- feat: --source-ref reads source files from a git ref instead of the working tree by @ericodx in #34
Full Changelog: v1.4.0...v1.5.0