Patterns and Regular Expressions

Dimitris Katsiros edited this page Aug 25, 2019 · 1 revision

Patterns.json

The anonymizer service uses regular expressions to identify sensitive information in the text.

Each regular expression is considered a pattern.

All patterns are stored in the patterns.json file.

Existing Patterns

The anonymizer already ships with tested patterns that identify the following entities:

  • Phone Numbers
  • Vehicle Numbers
  • Identity Cards
  • IBANs
  • AFM
  • AMKA
  • Brands
  • Addresses
  • Known Addresses
  • Names and Surnames
  • Places
  • Decision Numbers

You can turn off any pattern by setting its "active" field to "False".
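As a minimal sketch of that switch, the snippet below flips the "active" flag on an in-memory copy of the configuration. The entry name phone_numbers, the overall layout of the excerpt, and the helper set_active are assumptions made for illustration; only the "active" field and its "True"/"False" string values come from this page, so check the key names against your own patterns.json.

```python
import json

# Hypothetical excerpt of patterns.json: the entry names and empty "pattern"
# objects are placeholders, only the "active" field is taken from this page.
PATTERNS_JSON = """
{
    "phone_numbers": {"active": "True", "pattern": {}},
    "custom_regex": {"active": "False", "pattern": {}}
}
"""


def set_active(config, name, enabled):
    """Return a copy of config with the named entry switched on or off."""
    if name not in config:
        raise KeyError(f"unknown pattern entry: {name}")
    updated = {key: dict(value) for key, value in config.items()}
    # The flag is stored as the string "True"/"False", not as a JSON boolean.
    updated[name]["active"] = "True" if enabled else "False"
    return updated


config = json.loads(PATTERNS_JSON)
config = set_active(config, "phone_numbers", False)
print(json.dumps(config, indent=4))
```

Editing the file by hand has the same effect; the helper only makes the string-valued flag explicit.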

Adding Your Own Regular Expressions

You can always add your own regular expressions by turning on custom_regex.

Once "active" is set to "True", the parser searches the text for every pattern listed inside the "pattern" object.

For example, if you want to anonymize emails, you can add:

        },
        "custom_regex": {
            "active": "True",
            "pattern": {
                "my_custom_email_pattern": "(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+$)"
            }
        }

Every entity matched by this pattern is anonymized like the built-in ones. If the service runs in verbose mode, each match is also reported under the name my_custom_email_pattern.
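Before adding a custom pattern, it can help to check that the JSON parses and that the regular expression matches what you expect. The sketch below wraps the email fragment in braces so it parses on its own; the helper compile_custom_patterns is hypothetical and does not reflect how the service itself reads the file. Two details are worth noting: a backslash inside a JSON string has to be written as `\\`, and the `^` and `$` anchors restrict the pattern to strings that consist of an email address and nothing else.

```python
import json
import re

# The email fragment wrapped in braces so that it is a complete JSON document.
# In JSON source a literal backslash is written as "\\".
FRAGMENT = r'''
{
    "custom_regex": {
        "active": "True",
        "pattern": {
            "my_custom_email_pattern": "(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+$)"
        }
    }
}
'''


def compile_custom_patterns(config):
    """Compile every custom pattern, or return {} when custom_regex is off."""
    entry = config.get("custom_regex", {})
    if entry.get("active") != "True":
        return {}
    return {name: re.compile(rx) for name, rx in entry.get("pattern", {}).items()}


patterns = compile_custom_patterns(json.loads(FRAGMENT))
email = patterns["my_custom_email_pattern"]

# A string that is only an email address matches.
print(bool(email.search("user@example.com")))
# An address embedded in a longer string does not, because of ^ and $.
print(bool(email.search("mail user@example.com today")))
```

If the addresses you want to catch sit inside longer strings, consider dropping the anchors; whether that is needed depends on how the service feeds text to the pattern, which this page does not specify.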