v2.0.0: 93% Recall at 8% Noise Rate (SecretBench), "Dark Matter" Coverage and Performance
DeepSecrets 2.0 is here! 🚀
Now tested against the SecretBench Benchmark with:
- 93% Recall
- 8% Noise Rate
- ~9K Extra Findings outside the benchmark scope
Improvements
Coverage
- Nested Formats Parsing: LexerTokenizer is now able to detect code nesting and correctly parse situations like "inline YAML inside YAML inside Markdown".
- CheapVarDetector: Detects potential variable declarations even in "unlexable" code.
- Edge-cases: Better variable extraction for edge cases in Shell, JavaScript, Markdown, PHP, and C#.
- Deeper Language Support: Expanded native tracking surface for R(d), Ruby, and Nix configurations.
- Tighter Regexes: Improved "classic" backup regexes for secrets from AWS, Stripe, MailChimp, and our favorite -----BEGIN constructions.
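To illustrate the general idea behind detecting variable declarations in text that cannot be lexed, here is a minimal regex-based sketch. It is not the CheapVarDetector implementation. The pattern, the function name `cheap_var_candidates`, and the sample input are assumptions made for illustration only.

```python
import re

# Hypothetical pattern: NAME = "value" or NAME: 'value' on a single line.
# This is an illustrative heuristic, not the pattern DeepSecrets uses.
ASSIGNMENT = re.compile(
    r"""(?P<name>[A-Za-z_][A-Za-z0-9_.-]*)\s*[:=]\s*(?P<quote>["'])(?P<value>.+?)(?P=quote)"""
)


def cheap_var_candidates(text):
    """Return (name, value) pairs that look like quoted variable assignments."""
    return [(m.group("name"), m.group("value")) for m in ASSIGNMENT.finditer(text)]


if __name__ == "__main__":
    # Deliberately malformed input that a real lexer would likely reject.
    sample = 'garbage <<< api_key = "example-value" >>> more; password: \'hunter2\''
    for name, value in cheap_var_candidates(sample):
        print(name, "->", value)
```

A heuristic like this trades precision for coverage, which is why it pairs naturally with a later scoring step that filters out unlikely candidates.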
Precision
- Confidence Scoring: Every finding candidate is dynamically scored by a system that evaluates naming layouts, value entropy, and semantic "naturalness" to reduce the false positive rate.
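As a rough illustration of how value entropy can feed into a score, the sketch below computes Shannon entropy and combines it with a simple name hint. The weights, the keyword list, the 4 bits/char cap, and the function `toy_confidence` are all assumptions. The actual DeepSecrets scoring system is not described in these notes, and the "naturalness" signal is left out here.

```python
import math
from collections import Counter


def shannon_entropy(value: str) -> float:
    """Shannon entropy of the string, in bits per character."""
    if not value:
        return 0.0
    total = len(value)
    counts = Counter(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def toy_confidence(name: str, value: str) -> float:
    """Hypothetical score in [0, 1]: half from a name hint, half from entropy."""
    keywords = ("secret", "token", "password", "key")  # assumed keyword list
    name_hint = 1.0 if any(k in name.lower() for k in keywords) else 0.0
    # Cap the entropy contribution at 4 bits/char (an arbitrary choice).
    entropy_part = min(shannon_entropy(value) / 4.0, 1.0)
    return 0.5 * name_hint + 0.5 * entropy_part


if __name__ == "__main__":
    print(toy_confidence("api_token", "abcd"))
    print(toy_confidence("greeting", "aaaa"))
```

Repetitive values such as "aaaa" carry no entropy, so they only score through the name hint, while varied strings under a suspicious name score higher.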
Performance and Stability
- ~30% faster and more reliable for large files (up to 200 MB) with rich semantics.
- The UI now shows the progress and estimates for each file, as well as the overall progress.
Important Changes
Switching to SARIF reports
Warning
The legacy JSON report format is now deprecated and will be removed in the next major release. For now, you can still select it via --outformat json, which will trigger a deprecation warning.
We are switching to the industry-standard SARIF (v2.1.0) format to provide seamless integration with orchestration, CI/CD pipelines, and ASPM systems like GitHub Security and DefectDojo:
- Virtual Subrules (e.g., S105-LOW): Enforce proper precision and security-severity inside third-party dashboards.
- Smart Tracking (partialFingerprints): Populates fingerprints of findings. Even with automatic masking enabled, downstream systems can track moving secrets without duplicating them.
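The two SARIF features above can be sketched with a minimal SARIF 2.1.0 log built in Python. The `precision` and `security-severity` rule properties and the `partialFingerprints` result field are standard SARIF/GitHub code scanning names. The fingerprint key `secretHash/v1`, the hashing scheme, the tool name, and the input tuple layout are assumptions, not what DeepSecrets actually emits.

```python
import hashlib
import json


def make_sarif(findings):
    """Build a minimal SARIF 2.1.0 log.

    findings: iterable of (rule_id, precision, severity, uri, line, secret).
    """
    rules, results = {}, []
    for rule_id, precision, severity, uri, line, secret in findings:
        # One rule entry per subrule ID, carrying dashboard-facing metadata.
        rules[rule_id] = {
            "id": rule_id,
            "properties": {"precision": precision, "security-severity": severity},
        }
        results.append({
            "ruleId": rule_id,
            "message": {"text": "Potential secret detected"},
            "locations": [{"physicalLocation": {
                "artifactLocation": {"uri": uri},
                "region": {"startLine": line},
            }}],
            # Hypothetical fingerprint: a hash of the value alone, so it stays
            # stable when the secret moves and never exposes the raw value.
            "partialFingerprints": {
                "secretHash/v1": hashlib.sha256(secret.encode()).hexdigest()
            },
        })
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {"name": "example-scanner",
                                "rules": list(rules.values())}},
            "results": results,
        }],
    }


if __name__ == "__main__":
    log = make_sarif([
        ("S105-LOW", "low", "3.0", "config/a.yml", 4, "example-value"),
        ("S105-HIGH", "high", "8.0", "src/b.py", 10, "another-value"),
    ])
    print(json.dumps(log, indent=2))
```

Because the fingerprint in this sketch depends only on the value, the same secret found in two files yields the same fingerprint, which is what lets a downstream system recognize it after a move.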
Warning
Breaking Change for Existing Alerts:
If you have used DeepSecrets before and already have a set of deduplicated findings in GitHub or DefectDojo, switching to Virtual Subrules will dynamically alter rule IDs (e.g., changing them to S105-LOW or S105-HIGH). This will likely cause your platforms to treat them as new issues, creating a one-time wave of duplicate alerts. I am truly sorry for this temporary inconvenience, but this change is vital for proper semantic precision mapping going forward.
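If you want to estimate which of the new alerts correspond to findings you already triaged, one possible approach is to strip the subrule suffix and compare on the base rule ID plus file path. This is a sketch under assumptions: only the -LOW and -HIGH suffixes appear in these notes, and the flat result shape (`ruleId`, `uri`) and both function names are hypothetical.

```python
import re

# Only -LOW and -HIGH are mentioned in these notes; other suffixes are not assumed.
SUBRULE_SUFFIX = re.compile(r"-(?:LOW|HIGH)$")


def base_rule_id(rule_id: str) -> str:
    """Strip a virtual-subrule suffix, e.g. 'S105-HIGH' becomes 'S105'."""
    return SUBRULE_SUFFIX.sub("", rule_id)


def already_triaged(new_results, old_keys):
    """Return new results whose (base rule, file) pair existed before the upgrade.

    new_results: list of dicts with hypothetical keys 'ruleId' and 'uri'.
    old_keys: set of (rule_id, uri) pairs exported from the old alert set.
    """
    return [
        r for r in new_results
        if (base_rule_id(r["ruleId"]), r["uri"]) in old_keys
    ]


if __name__ == "__main__":
    old = {("S105", "config/a.yml")}
    new = [
        {"ruleId": "S105-LOW", "uri": "config/a.yml"},
        {"ruleId": "S105-HIGH", "uri": "src/new.py"},
    ]
    print(already_triaged(new, old))
```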
📦 Quick Upgrade
pip install --upgrade deepsecrets
Full Changelog: v1.4.0...v2.0.0