fix(detection): improve apache camel compare parity#795

Merged
mstykow merged 5 commits into main from fix/camel-compare-followups
Apr 26, 2026
Conversation

@mstykow (Owner) commented Apr 26, 2026

Summary

  • improve shared UTF-16 decoding so BOM-less or corrupted-BOM inputs stay on the decoded-text path, including the Camel template files that previously missed Apache-2.0
  • recover Apache, Spring, and OpenShift notice attributions and trim written-by prose spillover in shared author heuristics, including the Camel SBOM XML false-positive sentence
  • filter version-shaped SBOM pseudo-emails before truncation and record the explicit Apache Camel benchmark checkpoint in docs/BENCHMARKS.md
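The BOM-less UTF-16 case in the first bullet can be illustrated with a minimal sketch. This is not the code in src/utils/file.rs; the names `Utf16Order`, `guess_utf16_order`, and `decode_bomless_utf16`, as well as the 80%/10% NUL-byte thresholds, are assumptions made for illustration. The idea is that mostly-ASCII text encoded as UTF-16 has NUL bytes on nearly every odd offset (little-endian) or every even offset (big-endian), so the byte order can be guessed without a BOM.

```rust
/// Byte order guessed for a BOM-less UTF-16 buffer (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Utf16Order {
    Little,
    Big,
}

/// Guess the byte order by counting NUL bytes at even and odd offsets.
/// The thresholds are illustrative assumptions, not values from the PR.
pub fn guess_utf16_order(bytes: &[u8]) -> Option<Utf16Order> {
    if bytes.len() < 4 {
        return None;
    }
    let pairs = bytes.len() / 2;
    let even_nuls = bytes.iter().step_by(2).take(pairs).filter(|b| **b == 0).count();
    let odd_nuls = bytes.iter().skip(1).step_by(2).filter(|b| **b == 0).count();

    // "Mostly" means at least 80% of pairs; "rarely" means at most 10%.
    let mostly = |n: usize| n * 10 >= pairs * 8;
    let rarely = |n: usize| n * 10 <= pairs;

    if mostly(odd_nuls) && rarely(even_nuls) {
        Some(Utf16Order::Little)
    } else if mostly(even_nuls) && rarely(odd_nuls) {
        Some(Utf16Order::Big)
    } else {
        None
    }
}

/// Decode the buffer as UTF-16 if it has a recognizable BOM-less shape.
/// Returns `None` so the caller can fall back to another decoding path.
pub fn decode_bomless_utf16(bytes: &[u8]) -> Option<String> {
    let order = guess_utf16_order(bytes)?;
    let units: Vec<u16> = bytes
        .chunks_exact(2) // a trailing odd byte is ignored
        .map(|pair| match order {
            Utf16Order::Little => u16::from_le_bytes([pair[0], pair[1]]),
            Utf16Order::Big => u16::from_be_bytes([pair[0], pair[1]]),
        })
        .collect();
    String::from_utf16(&units).ok()
}
```

In this sketch a `None` result is the signal to continue with the Latin-1 or binary-string fallback, so plain ASCII or genuinely binary input is not misread as UTF-16.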

Issues

  • Covers: Apache Camel compare target from docs/implementation-plans/package-detection/PARSER_VERIFICATION_SCORECARD.md
  • Closes: none

Scope and exclusions

  • Included: shared text decoding in src/utils/file.rs, shared copyright/author heuristics, shared email host filtering, Camel benchmark/chart refresh
  • Explicit exclusions: unrelated remaining Camel compare deltas outside the three targeted gaps, such as the longstanding top-level package-count mismatch

Intentional differences from Python

  • keep the UTF-16 improvement generic by recognizing BOM-less and corrupted-BOM UTF-16 shapes before falling back to Latin-1 or binary-string extraction
  • prefer unique SBOM emails before the cap only after filtering version-shaped pseudo-hosts, instead of keeping obviously fake package-version addresses in the truncated result set
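The ordering described in the second bullet (filter version-shaped hosts first, then deduplicate, then apply the cap) can be sketched as follows. The function names, the case-insensitive deduplication, and the rule that a host is "version-shaped" when every dot-separated label is numeric (with an optional leading `v`) are assumptions for illustration; the actual heuristic in the shared email host filter may differ.

```rust
use std::collections::HashSet;

/// Hypothetical check: treat hosts such as `4.8.0` or `v1.2` as versions.
/// Note that this simple rule would also reject IPv4-literal hosts.
pub fn is_version_shaped_host(host: &str) -> bool {
    let host = host.strip_prefix('v').unwrap_or(host);
    let mut labels = 0;
    for label in host.split('.') {
        if label.is_empty() || !label.chars().all(|c| c.is_ascii_digit()) {
            return false;
        }
        labels += 1;
    }
    labels >= 2
}

/// Drop version-shaped pseudo-emails, deduplicate case-insensitively,
/// and only then truncate to `cap` entries, preserving input order.
pub fn select_emails<'a>(candidates: &[&'a str], cap: usize) -> Vec<&'a str> {
    let mut seen = HashSet::new();
    candidates
        .iter()
        .copied()
        // Step 1: remove candidates whose host looks like a package version.
        .filter(|email| match email.rsplit_once('@') {
            Some((_, host)) => !is_version_shaped_host(host),
            None => false,
        })
        // Step 2: keep only the first occurrence of each address.
        .filter(|email| seen.insert(email.to_ascii_lowercase()))
        // Step 3: apply the cap last, so fake addresses never consume slots.
        .take(cap)
        .collect()
}
```

Applying the cap last is the point of the design: if truncation ran first, package coordinates such as `camel-core@4.8.0` could fill the result set and push real addresses out.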

Follow-up work

  • Created or intentionally deferred:
    • recorded compare artifacts:
      • baseline before these fixes: .provenant/compare-runs/20260426T074449Z-camel-39047
      • first fully-fixed benchmark checkpoint: .provenant/compare-runs/20260426T202057Z-camel-80585
      • immediate repeat for timing variance: .provenant/compare-runs/20260426T203456Z-camel-82481
    • the benchmark row uses camel-80585. The repeat run camel-82481 was retained only to check run-to-run fluctuation; it came out 7.84s slower (1.67%), which suggests the earlier 473s vs 514s spread was mostly due to different code state rather than pure runtime noise
    • remaining non-targeted Camel compare deltas are left for later triage

Expected-output fixture changes

  • Files changed: none
  • Why the new expected output is correct: this branch changes shared scanner behavior and validates it with targeted Rust tests plus repeated compare-output artifacts; no checked-in golden or expected-output fixtures required updates

mstykow and others added 5 commits April 26, 2026 22:46
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Signed-off-by: Maxim Stykow <maxim.stykow@gmail.com>
@mstykow mstykow merged commit 0c9d9fc into main Apr 26, 2026
15 checks passed
@mstykow mstykow deleted the fix/camel-compare-followups branch April 26, 2026 21:20