# Check whether an LLM response contains PII (Personally Identifiable Information)

**Using the `PIIFilter` validator**

This is a simple check that looks for a few common PII patterns.
It is not intended to be comprehensive; rather, it is a quick way to filter out responses that are likely to contain PII. It uses the Microsoft Presidio library to detect PII.


In [1]:
# Install the necessary packages
! pip install presidio-analyzer presidio-anonymizer -q
! python -m spacy download en_core_web_lg -q


✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_lg')


In [2]:
# Import the guardrails package
import guardrails as gd
from guardrails.validators import PIIFilter
from rich import print

In [3]:
# Create Guard object with this validator
# Select a predefined set of PII or SPI (Sensitive Personal Information) entities by passing "pii" or "spi" to the `pii_entities` argument.
# The entities can be set either during initialization or later through the `metadata` argument of the `parse` method.

# One can also pass in a list of entities supported by Presidio to the `pii_entities` argument.
guard = gd.Guard.from_string(
    validators=[PIIFilter(pii_entities="pii", on_fail="fix")],
    description="testmeout",
)



In [4]:
# Parse the text
text = "My email address is demo@lol.com, and my phone number is 1234567890"
output = guard.parse(
    llm_output=text,
)

# Print the output
print(output)

Here, both EMAIL_ADDRESS and PHONE_NUMBER are detected as PII.
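
To make the idea of detecting and masking entities concrete, here is a minimal standard-library sketch. It is not how `PIIFilter` or Presidio work internally: the `PATTERNS` table, the `mask_pii` helper, and the `<ENTITY_TYPE>` placeholder format are all assumptions made for illustration, and the two regular expressions are far cruder than Presidio's recognizers.

```python
import re

# Hypothetical, simplified patterns for illustration only. Presidio's real
# recognizers are much more thorough (context words, checksums, NER models).
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE_NUMBER": re.compile(r"\b\d{10}\b"),
}


def mask_pii(text: str, entities: list[str]) -> str:
    """Replace each match of the requested entity types with <ENTITY_TYPE>."""
    for name in entities:
        text = PATTERNS[name].sub(f"<{name}>", text)
    return text


masked = mask_pii(
    "My email address is demo@lol.com, and my phone number is 1234567890",
    ["EMAIL_ADDRESS", "PHONE_NUMBER"],
)
print(masked)
# My email address is <EMAIL_ADDRESS>, and my phone number is <PHONE_NUMBER>
```

Only the entity types passed in are masked, which mirrors the way the validator limits itself to the entities it was asked to look for.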


In [5]:
# Now pass the entities through metadata, reusing the same guard object
# These take precedence over the entities passed in during initialization
output = guard.parse(
    llm_output=text,
    metadata={"pii_entities": ["EMAIL_ADDRESS"]},
)

# Print the output
print(output)

As you can see, only EMAIL_ADDRESS is detected as PII this time; PHONE_NUMBER is left untouched because it is not in the list passed through the metadata.
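
The precedence rule can be pictured with a small sketch. The `resolve_entities` helper below is hypothetical and is not part of Guardrails; it only assumes, as this notebook shows, that a `pii_entities` key in the per-call metadata wins over the value given at initialization.

```python
from typing import Optional, Union

# An entity spec is either a preset name such as "pii" or a list of entity names.
EntitySpec = Union[str, list[str]]


def resolve_entities(
    init_entities: Optional[EntitySpec],
    metadata: Optional[dict] = None,
) -> EntitySpec:
    """Return the entity spec to use for one call (hypothetical helper)."""
    # Per-call metadata overrides whatever was set at initialization.
    if metadata and "pii_entities" in metadata:
        return metadata["pii_entities"]
    if init_entities is None:
        raise ValueError("No entities given at initialization or in metadata")
    return init_entities


print(resolve_entities("pii", {"pii_entities": ["EMAIL_ADDRESS"]}))  # ['EMAIL_ADDRESS']
print(resolve_entities("pii"))  # pii
```

Resolving the value per call means one guard object can be reused with different entity lists, without being rebuilt.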


In [6]:
# Let's try with SPI entities
# Create a new guard object
guard = gd.Guard.from_string(
    validators=[PIIFilter(pii_entities="spi", on_fail="fix")],
    description="testmeout",
)

In [7]:
# Parse text
text = "My email address is demo@xyz.com, and my account number is 1234789012367654."

output = guard.parse(
    llm_output=text,
)

# Print the output
print(output)

Here, only US_BANK_NUMBER is detected, because the "spi" set does not include EMAIL_ADDRESS. Refer to the documentation for more information on the "pii" and "spi" entity sets. You can also pass any [Presidio-supported entities](https://microsoft.github.io/presidio/supported_entities/) through the metadata.
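
One way to think about the "pii" and "spi" shorthands is as named presets that expand to lists of entity names. The sketch below is an assumption-laden illustration: `PRESETS` and `expand_entities` are hypothetical names, and the preset contents are deliberately partial, listing only the entities this notebook shows being detected rather than the library's full sets.

```python
from typing import Union

# Partial, illustrative presets based only on the behavior shown in this
# notebook. The real "pii" and "spi" sets contain more entities.
PRESETS = {
    "pii": ["EMAIL_ADDRESS", "PHONE_NUMBER"],
    "spi": ["US_BANK_NUMBER"],
}


def expand_entities(spec: Union[str, list[str]]) -> list[str]:
    """Turn a preset name or an explicit list into a list of entity names."""
    if isinstance(spec, str):
        if spec not in PRESETS:
            raise ValueError(f"Unknown preset: {spec!r}")
        return list(PRESETS[spec])
    # An explicit list is used as given (copied so the caller's list is not shared).
    return list(spec)


print(expand_entities("spi"))  # ['US_BANK_NUMBER']
print(expand_entities(["US_ITIN", "US_DRIVER_LICENSE"]))  # ['US_ITIN', 'US_DRIVER_LICENSE']
```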


In [8]:
# Another example
text = "My ITIN is 923756789 and my driver's license number is 87651239"

output = guard.parse(
    llm_output=text,
    metadata={"pii_entities": ["US_ITIN", "US_DRIVER_LICENSE"]},
)

# Print the output
print(output)

#### In this way, any PII entity that you want to check for can be passed in through the metadata and masked by Guardrails in your LLM outputs. Of course, as with all other examples, you can integrate this into your own code and workflows through the complete Guard execution.
