Patterns for PII scan command #5415

Closed
sanderstad opened this issue Apr 26, 2019 · 7 comments

@sanderstad
Contributor

commented Apr 26, 2019

This issue is meant for anyone to put in their patterns during the development of the PII scan command.

@sanderstad sanderstad added the Feature label Apr 26, 2019

@sanderstad sanderstad self-assigned this Apr 26, 2019

@Alex-Yates

commented Apr 30, 2019

Here are a few for you. Mostly UK specific. :-)

UK phone numbers:
Usually 11 digits, starting with a 0. The second digit determines the type of number (1 or 2 is a geographic landline, 7 is mobile; there are others):
https://en.wikipedia.org/wiki/Telephone_numbers_in_the_United_Kingdom
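
A minimal sketch of the rule above in Python, assuming numbers are stored in national format with no spaces and no +44 prefix. The names `UK_PHONE` and `find_uk_phones` are made up for illustration and are not the pattern that ships with the command.

```python
import re

# Assumption: a leading 0 followed by exactly 10 more digits, with no
# separators. The word boundaries stop it matching inside longer digit runs.
UK_PHONE = re.compile(r"\b0[0-9]{10}\b")


def find_uk_phones(text: str) -> list:
    """Return every 11-digit, 0-prefixed run of digits found in text."""
    return UK_PHONE.findall(text)
```

A pattern this loose will also hit other 11-digit values that happen to start with 0, so it is probably best treated as a hint rather than proof of PII.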

UK post codes:
https://stackoverflow.com/questions/164979/uk-postcode-regex-comprehensive
Example Regex provided by UK government:
([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9][A-Za-z]?))))\s?[0-9][A-Za-z]{2})
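
For anyone who wants to try the regex above before adding it, here is a small Python sketch that wraps it unchanged. The helper name `is_uk_postcode` is hypothetical, and anchoring with `fullmatch` is a choice of this sketch (a column scan would more likely search within the value).

```python
import re

# The pattern is copied verbatim from the comment above.
UK_POSTCODE = re.compile(
    r"([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})"
    r"|(([A-Za-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9][A-Za-z]?))))\s?[0-9][A-Za-z]{2})"
)


def is_uk_postcode(value: str) -> bool:
    """True when the whole value is a postcode according to the pattern."""
    return UK_POSTCODE.fullmatch(value.strip()) is not None
```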

UK NHS Numbers:
https://www.nhs.uk/using-the-nhs/about-the-nhs/what-is-an-nhs-number/
These all contain a checksum: each is a 10-digit number whose last digit is a Modulus 11 check digit. Not sure if you want to get into that level of detail or not?
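
If the checksum is worth having, the Modulus 11 check could look roughly like this in Python. This is a sketch, not what the command does: `is_valid_nhs_number` is a made-up name, and stripping spaces first is an assumption about how the values are stored.

```python
import re


def is_valid_nhs_number(value: str) -> bool:
    """Check a 10-digit NHS number against its Modulus 11 check digit."""
    candidate = value.replace(" ", "")
    if re.fullmatch(r"[0-9]{10}", candidate) is None:
        return False
    # Weight the first nine digits 10, 9, ..., 2 and sum the products.
    total = sum(int(d) * w for d, w in zip(candidate[:9], range(10, 1, -1)))
    check = 11 - (total % 11)
    if check == 11:
        check = 0
    # A computed check digit of 10 marks a number that is never valid.
    return check != 10 and check == int(candidate[9])
```

A plain 10-digit regex would flag any 10-digit value, so a check like this could cut false positives considerably, at the cost of needing code rather than a pattern.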

IPv4 addresses:
https://www.regular-expressions.info/ip.html

IPv6 addresses:
https://stackoverflow.com/questions/53497/regular-expression-that-matches-valid-ipv6-addresses

@robertgadach

commented Apr 30, 2019

Visa is starting to issue 19-digit codes: The pattern is 4-digits, 4-digits, 4-digits, 7-digits

  • Example: 4123-1234-1234-1234567

  • Regex: (4\d{3}[- ]\d{4}[- ]\d{4}[- ]\d{7})|(4\d{18})
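
A quick Python sketch of the 19-digit layout described above, for testing. It reads the description as "a 4 plus 18 more digits" for the unseparated form and allows only "-" or a space between groups. Both are this sketch's interpretation, and `VISA_19` and `find_visa_19` are hypothetical names.

```python
import re

# Grouped form: 4-4-4-7 digits separated by "-" or " ".
# Plain form: 19 digits in a row, starting with 4.
VISA_19 = re.compile(
    r"\b(?:4[0-9]{3}[- ][0-9]{4}[- ][0-9]{4}[- ][0-9]{7}|4[0-9]{18})\b"
)


def find_visa_19(text: str) -> list:
    """Return the 19-digit Visa-style numbers found in text."""
    return VISA_19.findall(text)
```

Note that inside a character class `|` is a literal pipe, so `[-| ]` would also accept `4123|1234|...`, and `4\d{19}` would require 20 digits in total.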

MasterCard is issuing cards in the 222100‑272099 range

  • Example: 2720-9912-3456-7899
  • Regex: ^(?:5[1-5][0-9]{2}|222[1-9]|22[3-9][0-9]|2[3-6][0-9]{2}|27[01][0-9]|2720)[0-9]{12}$
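
To sanity-check that the regex above really covers 222100-272099, here is a Python sketch that compares it against a plain numeric range test. The helper names are made up. Stripping "-" and spaces first is an assumption, needed because the regex as written only accepts 16 digits in a row, so the dashed example would not match directly.

```python
import re

# Pattern copied from the comment above.
MASTERCARD = re.compile(
    r"^(?:5[1-5][0-9]{2}|222[1-9]|22[3-9][0-9]|2[3-6][0-9]{2}|27[01][0-9]|2720)[0-9]{12}$"
)


def _digits(number: str) -> str:
    """Drop the separators used in the examples ("-" and space)."""
    return re.sub(r"[- ]", "", number)


def is_mastercard(number: str) -> bool:
    """Regex check on the separator-free number."""
    return MASTERCARD.fullmatch(_digits(number)) is not None


def in_mastercard_range(number: str) -> bool:
    """Same rule expressed as numeric ranges on the first six digits."""
    digits = _digits(number)
    if re.fullmatch(r"[0-9]{16}", digits) is None:
        return False
    prefix = int(digits[:6])
    return 222100 <= prefix <= 272099 or 510000 <= prefix <= 559999
```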
@shaneis

Contributor

commented May 2, 2019

Personal Public Service (PPS) Numbers in Ireland (equivalent to Social Security Numbers in the States) are 7 digits followed by 1 or 2 uppercase letters.

  • 1234567A
  • 9876543ZY
$PPS -cmatch '\d{7}([A-Z]){1,2}'
# or
$PPS -cmatch '\b\d{7}([A-Z]){1,2}\b'
@sanderstad

Contributor Author

commented May 3, 2019

Thank you guys. I've added all of the patterns that were not already in there or had to be changed.

@Alex-Yates

  • Added the UK phone number
  • Added UK zip codes
  • Added the NHS number
  • The IP addresses were already in the file

@robertgadach

  • Added the new patterns for MasterCard
  • Added the new pattern for VISA

@shaneis
There already was a regex for the PPS number that also matched those numbers

Thanks again for your help

@potatoqualitee

Member

commented May 3, 2019

frikken 💣 , sander!

sanderstad pushed a commit that referenced this issue May 3, 2019

sanderstad
Added Spanish translations
Removed redundant patterns
Fixes #5415

sanderstad pushed a commit that referenced this issue May 3, 2019

sanderstad
@ClaudioESSilva

Collaborator

commented May 25, 2019

Thanks for this @sanderstad

Here are some patterns from Portugal:

Phone numbers
Country code: +351
9 digits, starting with:

  • 2 - landline
  • 9 - mobile
  • 707 - premium rate
  • 800 xxx xxx Freephone
  • 808 xxx xxx Shared cost
  • 809 xxx xxx Shared cost
  • 607 xxx xxx Premium rate audio text (never saw this one)
  • 30x xxx xxx VoIP carriers
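
The prefix list above could be turned into a small classifier along these lines. This is a Python sketch under stated assumptions: the optional "+351" prefix and any whitespace are stripped first, and `PT_PREFIXES` and `classify_pt_phone` are hypothetical names, not anything from the command.

```python
import re
from typing import Optional

# Prefix -> label, taken from the list above. Longer prefixes come first
# so that "30x" is never shadowed by a shorter entry.
PT_PREFIXES = [
    ("707", "premium rate"),
    ("800", "freephone"),
    ("808", "shared cost"),
    ("809", "shared cost"),
    ("607", "premium rate audio text"),
    ("30", "VoIP"),
    ("2", "landline"),
    ("9", "mobile"),
]


def classify_pt_phone(number: str) -> Optional[str]:
    """Return the number type, or None if it does not fit the list."""
    digits = re.sub(r"^\+351|\s", "", number)
    if re.fullmatch(r"[0-9]{9}", digits) is None:
        return None
    for prefix, label in PT_PREFIXES:
        if digits.startswith(prefix):
            return label
    return None
```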

Tax (fiscal) number
9 digits, starting with:

  • 1 to 3 - personal (single person)
  • 5 - collective person (companies)
  • 6 - public entities
  • 90 or 91 - condominiums, irregular companies, undivided inheritances whose successor was an individual entrepreneur
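
As a rough Python sketch, the prefixes above translate into a pattern like the following. It checks the leading digits and the length only, and `PT_TAX_NUMBER` and `looks_like_pt_tax_number` are made-up names.

```python
import re

# 9 digits in total: first digit 1-3, 5 or 6, or the two-digit prefix 90/91.
PT_TAX_NUMBER = re.compile(r"\b(?:[1-356][0-9]{8}|9[01][0-9]{7})\b")


def looks_like_pt_tax_number(value: str) -> bool:
    """True when the whole value fits the prefix and length rules."""
    return PT_TAX_NUMBER.fullmatch(value.strip()) is not None
```

One thing to watch: a 9-digit value starting with 90 or 91 also fits the mobile phone rule (9 digits starting with 9), so the two patterns will overlap on the same data.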

Postal code
4 digits-3 digits
Example: 1000-103
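
In regex form the postal code could be sketched like this in Python. The names are hypothetical, and the word boundaries are an assumption to avoid matching inside longer numbers.

```python
import re

# Four digits, a hyphen, three digits.
PT_POSTCODE = re.compile(r"\b[0-9]{4}-[0-9]{3}\b")


def find_pt_postcodes(text: str) -> list:
    """Return every 4-3 digit postal code found in text."""
    return PT_POSTCODE.findall(text)
```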

@sanderstad

Contributor Author

commented Jun 7, 2019

Closing this issue. New patterns can be added later on using the JSON file.

@sanderstad sanderstad closed this Jun 7, 2019

potatoqualitee added a commit that referenced this issue Jun 17, 2019

Improve patterns for PII recognition (#5740)
* Added description property
Renamed social security numbers
Changed IPv4 and IPv6
Added UK Zipcode

* Renamed file

* Initial commit

* Initial commit

* Added new mastercard pattern
Added new VISA pattern

* Moved patterns around
Added UK phone number

* Added germany zip code

* Added Canadian zip code

* Added German translations

* Added French translations

* Added Spanish translations
Removed redundant patterns
Fixes #5415

* Added Dutch translations

* Added Dutch translations
Fixes #5415

* Moved pattern