Skip to content

Conversation

@oxnick
Copy link
Collaborator

@oxnick oxnick commented Nov 12, 2025

Description

Fixed a critical issue where only the first occurrence of a secret pattern was being detected and masked in log messages. The MaskerLogger now properly detects and masks all instances of secrets within the same string, significantly improving security coverage.

Related Issue

Fixes # (if there's an associated issue number)

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Code refactoring
  • Performance improvement
  • Test improvements

Changes Made

  • Fixed regex matching in ahocorasick_regex_match.py: Changed from regex.search() to regex.finditer() to find all occurrences of each pattern in a line, not just the first one
  • Completely rewrote _mask_secret() method in masker_formatter.py: Replaced the problematic string replacement approach with a position-based masking system that:
    • Handles multiple occurrences of the same secret correctly
    • Processes both grouped and non-grouped regex matches
    • Applies replacements from right to left to preserve match positions
    • Avoids issues with overlapping or similar matches
  • Added comprehensive tests: Created two new test cases to verify multiple leak detection works for both identical and different secret types

Testing

  • All existing tests pass
  • Added new tests for the changes
  • Tested manually (describe below)

Manual Testing Steps

  1. Created log messages with multiple instances of the same password: "First password=secretpassword and second password=anothersecret and third password=secretpassword"
  2. Verified that all three password instances are now properly masked
  3. Tested with multiple different secret types in the same string to ensure comprehensive coverage
  4. Confirmed that existing single-secret masking functionality remains unaffected

Checklist

  • My code follows the project's style guidelines
  • I have performed a self-review of my code
  • I have commented my code where necessary
  • I have added/updated docstrings for all functions and classes
  • I have added type annotations to all functions and classes
  • My changes generate no new linting errors
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Screenshots (if applicable)

Before Fix:

First password=secretpassword and second password=anothersecret and third password=secretpassword
# Only first instance would be masked

After Fix:

First password=************** and second password=************* and third password=**************
# All instances are now properly masked

Additional Context

This fix addresses a significant security vulnerability where sensitive data could leak through logs if the same secret appeared multiple times in a single log message. The previous implementation used msg.replace(group, masked_part, 1) which explicitly limited replacement to only the first occurrence.

The new implementation:

  1. Uses finditer() to find all regex matches in the text
  2. Collects all match positions and their replacements
  3. Applies masking from right to left to avoid position shifting
  4. Properly handles both regex groups and full matches

This ensures comprehensive secret detection and masking while maintaining backward compatibility with existing functionality.


Note

Switch to find-all regex matching and a position-based masking algorithm to mask multiple/overlapping secrets, with extensive tests; bump version to 1.1.0b2.

  • Masking engine:
    • Use regex.finditer() in maskerlogger/ahocorasick_regex_match.py to collect all matches per pattern.
    • Rewrite _mask_secret in maskerlogger/masker_formatter.py to a position-based masker that:
      • Masks multiple occurrences and overlapping spans.
      • Prefers capture groups; falls back to full match when groups are empty/None or absent.
      • Honors configurable redact percentage across groups and full matches.
  • Tests (tests/test_masked_logger.py):
    • Add cases for multiple identical secrets, multiple different secrets, overlapping matches, empty/None capture groups, no-capture fallback, mixed paths, and varying redact percentages.
  • Version: bump pyproject.toml to 1.1.0b2.

Written by Cursor Bugbot for commit 5d73052. This will update automatically on new commits. Configure here.

@oxnick oxnick requested a review from aviadlevy November 12, 2025 14:51
@oxnick oxnick self-assigned this Nov 12, 2025
@codecov-commenter
Copy link

codecov-commenter commented Nov 12, 2025

Codecov Report

❌ Patch coverage is 96.55172% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 92.65%. Comparing base (cb158aa) to head (5d73052).

Files with missing lines Patch % Lines
maskerlogger/masker_formatter.py 96.42% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #22      +/-   ##
==========================================
+ Coverage   91.82%   92.65%   +0.83%     
==========================================
  Files           4        4              
  Lines         159      177      +18     
==========================================
+ Hits          146      164      +18     
  Misses         13       13              
Flag Coverage Δ
unittests 92.65% <96.55%> (+0.83%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@oxnick oxnick force-pushed the feat/catch-multiple-secrets-in-str branch from d36b5f2 to 4225008 Compare November 12, 2025 15:08
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is being reviewed by Cursor Bugbot

Details

Your team is on the Bugbot Free tier. On this plan, Bugbot will review limited PRs each billing cycle for each member of your team.

To receive Bugbot reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.

Copy link
Collaborator

@aviadlevy aviadlevy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

once missing lines are covered in tests, we're good

@aviadlevy aviadlevy merged commit 7e8a016 into main Nov 13, 2025
8 checks passed
@aviadlevy aviadlevy deleted the feat/catch-multiple-secrets-in-str branch November 13, 2025 09:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants