Add Custom Redaction Rules (Block Words, Dates, Regex Patterns) #16

Merged
karant-dev merged 5 commits into main from copilot/add-custom-redaction-rules
Dec 12, 2025

Copilot AI commented Dec 12, 2025

  • Update types: Add blockWords (string[]), customRegex (CustomRegexRule[]), and customDates (string[]) to DetectionSettings
  • Add helper utils: Create date parsing utility (datePatterns.ts) to generate multiple date format regex patterns from a date string
  • Update useDetectionSettings hook: Add functions to manage block words, custom regex patterns, and custom dates with validation
  • Update detection logic: Modify detection flow in ocr.ts and useOCR.ts to include block words, dates, and custom regex
  • Update Settings UI: Add "Advanced" section to SettingsDropdown.tsx with Block Words, Custom Dates, and Custom Regex fields with validation
  • Build and lint passes
  • Run code review and address feedback
  • Run CodeQL security check (no issues found)
  • Fix over-redaction issue: Updated overlap detection to require text-based validation (word must contain/match the sensitive text)
  • Refactor: Extract hasValidOverlap helper function to reduce code duplication
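The settings changes in the checklist above could look roughly like the sketch below. Only the three field names (`blockWords`, `customRegex`, `customDates`) and the `CustomRegexRule` type name come from this PR; the fields inside `CustomRegexRule`, the `addBlockWord` helper, and its validation rules are assumptions for illustration, and the real `DetectionSettings` presumably carries other fields as well.

```typescript
// Hypothetical shape of a custom regex rule; the PR names the type but not its fields.
interface CustomRegexRule {
  pattern: string;        // user-supplied regex source
  caseSensitive: boolean; // assumed option, based on the issue's case-sensitivity request
}

// Only the three fields added by this PR are shown here.
interface DetectionSettings {
  blockWords: string[];
  customRegex: CustomRegexRule[];
  customDates: string[];
}

// Hypothetical helper: returns new settings with the word added,
// ignoring blank input and case-insensitive duplicates.
function addBlockWord(settings: DetectionSettings, word: string): DetectionSettings {
  const trimmed = word.trim();
  if (trimmed.length === 0) return settings;
  const exists = settings.blockWords.some(
    (w) => w.toLowerCase() === trimmed.toLowerCase()
  );
  if (exists) return settings;
  return { ...settings, blockWords: [...settings.blockWords, trimmed] };
}
```

Returning a new object rather than mutating in place keeps the helper easy to use from a React hook such as `useDetectionSettings`, where state updates are expected to be immutable.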

Bug Fix

Fixed issue where custom rules (block words, dates, regex) would redact entire lines instead of just the matching text. The fix adds text-based validation on top of positional overlap detection - now a word is only redacted if it actually contains the matched text or vice versa.

Before: DATE:12-12-25*TIME:06:22* would be entirely redacted when matching date 12-12-25
After: Only the specific date portion is redacted
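A minimal sketch of the check described above, assuming OCR words and matches each carry their text and a bounding box. The name `hasValidOverlap` comes from this PR; the `Box`, `OcrWord`, and `Match` types, the `boxesOverlap` helper, and the case-insensitive comparison are assumptions, not the merged implementation.

```typescript
// Hypothetical geometry and record types for illustration.
interface Box { x0: number; y0: number; x1: number; y1: number; }
interface OcrWord { text: string; bbox: Box; }
interface Match { text: string; bbox: Box; }

// True when the two axis-aligned rectangles share any area.
function boxesOverlap(a: Box, b: Box): boolean {
  return a.x0 < b.x1 && b.x0 < a.x1 && a.y0 < b.y1 && b.y0 < a.y1;
}

// A word qualifies for redaction only if it overlaps the match positionally
// AND its text contains the matched text (or the matched text contains the
// word, which covers matches that span several words).
function hasValidOverlap(word: OcrWord, match: Match): boolean {
  if (!boxesOverlap(word.bbox, match.bbox)) return false;
  const w = word.text.trim().toLowerCase();
  const m = match.text.trim().toLowerCase();
  if (w.length === 0 || m.length === 0) return false;
  return w.includes(m) || m.includes(w);
}
```

With a match whose box spans a whole line, a neighbouring word such as `TIME:` overlaps positionally but fails the text check, so it is left unredacted.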

Screenshots

Settings dropdown

Original prompt

This section details the original issue you should resolve

<issue_title>[Feat]: Custom Redaction Rules (Regex & Blacklist)</issue_title>
<issue_description>### Is your feature request related to a problem?
The default detectors don't catch my proprietary Project Names (e.g. 'Project Titan') or specific Invoice IDs.

Describe the solution you'd like

Add an 'Advanced' section to Settings with two fields:

  1. Block Words: List of words to always redact (e.g. names, birth dates, etc).
  2. Custom Regex: User-defined regex patterns (e.g. 'INV-\d{4}').

Describe alternatives you've considered

Drawing manual boxes over them.

Additional context

Important: Validate regex to prevent crashes. Handle case-sensitivity options.</issue_description>
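One way to meet the "validate regex to prevent crashes" requirement is to compile the user's pattern inside a `try`/`catch` and treat a `SyntaxError` as a validation failure. The sketch below is an illustration only; `compileUserRegex` and `findMatches` are hypothetical names, not functions from this PR.

```typescript
// Hypothetical helper: compile a user-supplied pattern, returning null
// instead of throwing when the pattern is empty or not valid regex syntax.
function compileUserRegex(pattern: string, caseSensitive: boolean): RegExp | null {
  if (pattern.trim().length === 0) return null;
  try {
    // "g" finds every occurrence; "i" is added for case-insensitive rules.
    return new RegExp(pattern, caseSensitive ? "g" : "gi");
  } catch (err) {
    // new RegExp throws SyntaxError for malformed patterns such as "(".
    return null;
  }
}

// Hypothetical helper: all substrings of `text` matched by the pattern,
// or an empty list when the pattern is invalid.
function findMatches(text: string, pattern: string, caseSensitive: boolean): string[] {
  const re = compileUserRegex(pattern, caseSensitive);
  if (re === null) return [];
  return text.match(re) || [];
}
```

For example, `findMatches("Paid inv-1234 and INV-9876", "INV-\\d{4}", false)` returns both invoice IDs, while the same call with `caseSensitive` set to `true` returns only `INV-9876`. This guards against syntax errors only; it does not protect against patterns that are valid but pathologically slow.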

<agent_instructions>Be smart about this, instead of just using plain string, for something like birth dates, parse the date, and redact that date in any format dates can show up in</agent_instructions>
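The instruction above amounts to parsing the date once and generating a pattern that covers its common written forms. The sketch below is a rough illustration under stated assumptions: input is an ISO `YYYY-MM-DD` string, and only numeric forms with `-`, `/`, or `.` separators are covered. `parseIsoDate` and `buildDateRegex` are hypothetical names, and the actual `datePatterns.ts` utility may accept other inputs and cover more formats (month names, for instance).

```typescript
interface DateParts { day: number; month: number; year: number; }

// Parse "YYYY-MM-DD" and reject impossible dates such as 2025-02-30.
function parseIsoDate(input: string): DateParts | null {
  const m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(input.trim());
  if (!m) return null;
  const year = Number(m[1]);
  const month = Number(m[2]);
  const day = Number(m[3]);
  const d = new Date(Date.UTC(year, month - 1, day));
  const valid =
    d.getUTCFullYear() === year &&
    d.getUTCMonth() === month - 1 &&
    d.getUTCDate() === day;
  return valid ? { day, month, year } : null;
}

// Build one regex matching the date in D-M-Y, M-D-Y, and Y-M-D order,
// with optional leading zeros and a 2- or 4-digit year.
function buildDateRegex(parts: DateParts): RegExp {
  const num = (n: number): string => (n < 10 ? `0?${n}` : `${n}`);
  const d = num(parts.day);
  const m = num(parts.month);
  const yy = ("0" + (parts.year % 100)).slice(-2);
  const y = `(?:${parts.year}|${yy})`;
  const sep = "[-/.]";
  const orders = [
    `${d}${sep}${m}${sep}${y}`,
    `${m}${sep}${d}${sep}${y}`,
    `${parts.year}${sep}${m}${sep}${d}`,
  ];
  // Lookarounds keep the match from starting or ending inside a longer number.
  return new RegExp(`(?<!\\d)(?:${orders.join("|")})(?!\\d)`, "g");
}
```

Applied to the OCR text from this PR, `"DATE:12-12-25*TIME:06:22*".match(buildDateRegex({ day: 12, month: 12, year: 2025 }))` yields only `12-12-25`, leaving the time untouched.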

Comments on the Issue (you are @copilot in this section)



Co-authored-by: karant-dev <22682299+karant-dev@users.noreply.github.com>

cloudflare-workers-and-pages Bot commented Dec 12, 2025

Deploying autoredact with Cloudflare Pages

Latest commit: dca4867
Status: ✅  Deploy successful!
Preview URL: https://7aee0d47.autoredact.pages.dev
Branch Preview URL: https://copilot-add-custom-redaction.autoredact.pages.dev

…ate validation

Co-authored-by: karant-dev <22682299+karant-dev@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add custom redaction rules for project names and invoice IDs" to "Add Custom Redaction Rules (Block Words, Dates, Regex Patterns)" Dec 12, 2025
Copilot AI requested a review from karant-dev December 12, 2025 15:56

karant-dev (Owner) left a comment

I tested a custom string (my DL number) and a date, and it worked decently (it caught dates even with a different separator), but it redacts the whole line where it found them instead of just the date or custom string.

Source image. I added the red redaction manually because that is my DL number.

Image after redaction. Note the full lines redacted wherever the string and date matched.

…tive text

Co-authored-by: karant-dev <22682299+karant-dev@users.noreply.github.com>

Copilot AI commented Dec 12, 2025

I tested custom string (using my DL number) and date and it worked decently (caught dates even with different separator) but it redact the whole line it found those in instead of just the date or cust...

Fixed in f7d639e. The overlap detection now requires text-based validation - a word is only redacted if it actually contains the matched text (or vice versa for multi-word matches). This prevents over-redaction when OCR groups text together like DATE:12-12-25*TIME:06:22*.

Co-authored-by: karant-dev <22682299+karant-dev@users.noreply.github.com>

karant-dev (Owner) left a comment

Works

karant-dev marked this pull request as ready for review December 12, 2025 16:15
karant-dev merged commit d80943a into main Dec 12, 2025
5 checks passed
karant-dev deleted the copilot/add-custom-redaction-rules branch December 12, 2025 16:17

Development

Successfully merging this pull request may close these issues.

[Feat]: Custom Redaction Rules (Regex & Blacklist)

2 participants