### Using Regex for PII Detection
##### Regular expressions are a computationally inexpensive way to screen a large corpus for the most dangerous kinds of PII: social security numbers (SSNs), social insurance numbers (SINs), credit card numbers, bank account numbers, and passport numbers/codes. The example below also includes patterns for names, email addresses, phone numbers, and street addresses. These regular expressions are not foolproof, and you may need to adapt them to the specific formats you want to detect.
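To keep screening cheap on a large corpus, you can compile the patterns once and check each document with `re.search`. It stops at the first hit instead of collecting every match. The sketch below assumes the corpus is an iterable of `(doc_id, text)` pairs. `HIGH_RISK_PATTERNS`, `screen_corpus`, and the sample `corpus` are hypothetical names for illustration. The patterns are simplified versions of the ones used in this notebook.

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical set of high-risk patterns, compiled once up front.
HIGH_RISK_PATTERNS: Dict[str, re.Pattern] = {
    "ssn": re.compile(r"\b\d{3}[-\s]?\d{2}[-\s]?\d{4}\b"),
    "sin": re.compile(r"\b\d{3}[-\s]?\d{3}[-\s]?\d{3}\b"),
    "credit_card": re.compile(r"\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b"),
    "bank_account": re.compile(r"\b\d{12}\b"),
    "passport": re.compile(r"\b[A-Z]{2}\d{7}\b"),
}


def screen_corpus(docs: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (doc_id, [PII types found]) for every document with at least one hit."""
    for doc_id, text in docs:
        # re.search stops at the first match, so each check is a cheap yes/no test.
        hits = [name for name, pattern in HIGH_RISK_PATTERNS.items() if pattern.search(text)]
        if hits:
            yield doc_id, hits


# Hypothetical sample corpus
corpus = [
    ("doc1", "Meeting notes: nothing sensitive here."),
    ("doc2", "SSN on file: 123-45-6789."),
    ("doc3", "Passport AB1234567, card 1234 5678 9012 3456."),
]
for doc_id, hits in screen_corpus(corpus):
    print(doc_id, hits)
# doc2 ['ssn']
# doc3 ['credit_card', 'passport']
```

Only flagged documents need the more detailed, and more expensive, per-match extraction shown later in this notebook.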
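##### Regular expressions are less reliable when PII is embedded in more complex structures or when its format is highly variable. Names are a typical case. The name pattern in the example below matches any two or three consecutive capitalized words, so it also reports "Elm Street" as a name.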
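For card numbers, one way to tolerate variable formats without flooding the results with false positives is to match loosely and then validate. The sketch below assumes card numbers are 13 to 19 digits long, with optional single spaces or hyphens between digits. It keeps only candidates that pass the Luhn checksum, which valid payment card numbers satisfy. `CARD_CANDIDATE`, `luhn_valid`, and `find_card_numbers` are hypothetical names. The notebook's sample number 1234-5678-9012-3456 fails the check. The widely used test number 4111 1111 1111 1111 passes.

```python
import re
from typing import List

# Loose candidate pattern: 13 to 19 ASCII digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b", re.ASCII)


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch in "0123456789"]
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the doubled value
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> List[str]:
    """Return loose card-number candidates that also pass the Luhn check."""
    return [m.group() for m in CARD_CANDIDATE.finditer(text) if luhn_valid(m.group())]


sample = "Card A: 1234-5678-9012-3456. Card B: 4111 1111 1111 1111."
print(find_card_numbers(sample))  # ['4111 1111 1111 1111']
```

A checksum does not prove a number belongs to a real account. It only discards digit strings that cannot be valid card numbers.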

```python
import re

# Sample text containing PII
text = """John Doe lives at 1234 Elm Street, Springfield, and his phone number is (555) 123-4567.
          His email address is john.doe@example.com. His SSN is 123-45-6789, and his credit card number
          is 1234-5678-9012-3456. His bank account number is 123456789012. His passport number is AB1234567."""

# Regular expressions for different types of PII.
# All groups are non-capturing (?:...), so re.findall returns whole matches rather than tuples.
name_pattern = r'\b[A-Z][a-z]+\s[A-Z][a-z]+(?:\s[A-Z][a-z]+)?\b'  # two or three capitalized words
email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'  # top-level domain of 2+ letters
# Either "(555)" or a bare 3-digit area code, then 3 + 4 digits with optional separators
phone_pattern = r'(?:\(\d{3}\)|\b\d{3})[-.\s]?\d{3}[-.\s]?\d{4}\b'
address_pattern = r'\b\d{1,5}\s\w+\s(?:Street|St|Avenue|Ave|Road|Rd|Drive|Dr|Lane|Ln|Boulevard|Blvd)\b'
# US SSN (123-45-6789) or Canadian SIN (123-456-789), separators optional
ssn_sin_pattern = r'\b(?:\d{3}[-\s]?\d{2}[-\s]?\d{4}|\d{3}[-\s]?\d{3}[-\s]?\d{3})\b'
# 16-digit cards only; 15-digit American Express numbers are not covered
credit_card_pattern = r'\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b'
# Assumes 12-digit account numbers; real lengths vary by bank and country
bank_account_pattern = r'\b\d{12}\b'
# Two uppercase letters followed by seven digits
passport_pattern = r'\b[A-Z]{2}\d{7}\b'

# Apply regex patterns to the sample text
names = re.findall(name_pattern, text)
emails = re.findall(email_pattern, text)
phone_numbers = re.findall(phone_pattern, text)
addresses = re.findall(address_pattern, text)
ssn_sin = re.findall(ssn_sin_pattern, text)
credit_cards = re.findall(credit_card_pattern, text)
bank_accounts = re.findall(bank_account_pattern, text)
passports = re.findall(passport_pattern, text)

# Print the detected PII
print("Names:", names)
print("Emails:", emails)
print("Phone Numbers:", phone_numbers)
print("Addresses:", addresses)
print("SSN/SIN:", ssn_sin)
print("Credit Card Numbers:", credit_cards)
print("Bank Account Numbers:", bank_accounts)
print("Passport Numbers/Codes:", passports)
```

```
Names: ['John Doe', 'Elm Street']
Emails: ['john.doe@example.com']
Phone Numbers: ['(555) 123-4567']
Addresses: ['1234 Elm Street']
SSN/SIN: ['123-45-6789']
Credit Card Numbers: ['1234-5678-9012-3456']
Bank Account Numbers: ['123456789012']
Passport Numbers/Codes: ['AB1234567']
```
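The name results include "Elm Street" because it is two capitalized words, and that string is also part of the detected address. One inexpensive refinement is to compare match positions from `re.finditer` and discard name matches that overlap an address match. This rests on the assumption that text inside a detected street address is not a person's name. `drop_overlapping` is a hypothetical helper. The patterns are copied from the cell above, and the text is shortened to its first sentence.

```python
import re
from typing import List

text = "John Doe lives at 1234 Elm Street, Springfield, and his phone number is (555) 123-4567."

name_pattern = re.compile(r"\b[A-Z][a-z]+\s[A-Z][a-z]+(?:\s[A-Z][a-z]+)?\b")
address_pattern = re.compile(
    r"\b\d{1,5}\s\w+\s(?:Street|St|Avenue|Ave|Road|Rd|Drive|Dr|Lane|Ln|Boulevard|Blvd)\b"
)


def drop_overlapping(candidates: List[re.Match], blockers: List[re.Match]) -> List[str]:
    """Return the text of candidate matches whose span overlaps no blocker span."""
    kept = []
    for cand in candidates:
        start, end = cand.span()
        # Two half-open spans [start, end) and [b.start(), b.end()) overlap if each starts before the other ends.
        if not any(start < b.end() and b.start() < end for b in blockers):
            kept.append(cand.group())
    return kept


addresses = list(address_pattern.finditer(text))
names = drop_overlapping(list(name_pattern.finditer(text)), addresses)
print(names)  # ['John Doe']
```

This only removes the false positives that happen to fall inside another detected entity. A capitalized phrase elsewhere in the text would still be reported as a name.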
