# **Detecting PII in LLM-Powered Chat Applications**

by [Grayson Adkins](https://twitter.com/GraysonAdkins), updated April 17, 2024  

This notebook demonstrates how to evaluate user prompts and LLM responses for personally identifiable information (PII) such as contact information, financial or banking information, digital identifiers, job-related data, or other sensitive personal information.  

We use the [`bigcode/starpii`](https://huggingface.co/bigcode/starpii) PII detection model available on Hugging Face plus LangChain for crafting prompt templates and TruLens for running evaluation and visualizing results.

<a href="https://colab.research.google.com/drive/1hDIIgKUJVoxm_ymglD3w_Z7IQasRwyqA?usp=sharing" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Attribution

This notebook builds on [examples provided by TruLens](https://github.com/truera/trulens/tree/main/trulens_eval/examples).  


## Disclaimer

Both TruLens and LangChain are new frameworks with rapidly changing interfaces. I found several deprecated or broken features that I had to resolve while working on this notebook. Be advised that you may similarly find issues with the code here, due to those dependencies.

## Install dependencies

In [13]:
!pip install -qU trulens_eval langchain


In [21]:
!pip install -qU langchain_openai


In [15]:
import os
from dotenv import load_dotenv, find_dotenv

# Option 1: load OPENAI_API_KEY from a local .env file
# load_dotenv(find_dotenv())

# Option 2: set it directly
os.environ["OPENAI_API_KEY"] = "sk-..."

# Print key to check
# print(os.environ["OPENAI_API_KEY"])
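Printing the full key is risky in a notebook that may be shared. As a minimal sketch, the check could show a masked form of the key instead. The helper name `masked_key` is hypothetical and is not part of TruLens, LangChain, or the OpenAI client.

```python
import os


def masked_key(name: str = "OPENAI_API_KEY") -> str:
    """Return a masked form of an environment variable, or a notice if unset.

    Hypothetical helper for illustration; not part of any library used here.
    """
    value = os.environ.get(name, "")
    if not value:
        return f"{name} is not set"
    # Very short values are hidden completely.
    if len(value) <= 8:
        return "*" * len(value)
    # Otherwise show only the first 3 and last 4 characters.
    return f"{value[:3]}...{value[-4:]}"


print(masked_key())
```

This confirms that the variable is populated without leaving the secret in the cell output.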

## Set up TruLens

In [None]:
from trulens_eval import Feedback
from trulens_eval import OpenAI as trulens_provider_openai
from trulens_eval import Tru

tru = Tru()
tru.reset_database()

In [22]:
# Imports from langchain to build app. You may need to install langchain first
# with the following:
# ! pip install langchain>=0.0.170
from langchain.chains import LLMChain
from langchain_openai import OpenAI
from langchain.prompts import PromptTemplate
from langchain.prompts.chat import HumanMessagePromptTemplate, ChatPromptTemplate

In [23]:
full_prompt = HumanMessagePromptTemplate(
    prompt=PromptTemplate(
        template=
        "Provide a helpful response with relevant background information for the following: {prompt}",
        input_variables=["prompt"],
    )
)

chat_prompt_template = ChatPromptTemplate.from_messages([full_prompt])

llm = OpenAI(temperature=0.9, max_tokens=128)

chain = LLMChain(llm=llm, prompt=chat_prompt_template, verbose=True)

In [24]:
prompt_input = 'Sam Altman is the CEO at OpenAI, and uses the password: password1234 .'
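To see why PII in the user input matters, it can help to look at the text the model receives. The sketch below fills the same template string with plain `str.format` rather than LangChain. This is an illustrative shortcut, and the `TEMPLATE` and `rendered` names are assumptions, not how the chain runs internally.

```python
# Same template text as the PromptTemplate above, filled with str.format
# instead of LangChain (an illustrative shortcut only).
TEMPLATE = (
    "Provide a helpful response with relevant background information "
    "for the following: {prompt}"
)

prompt_input = "Sam Altman is the CEO at OpenAI, and uses the password: password1234 ."

# The user's text, including the password, is embedded verbatim in the prompt.
rendered = TEMPLATE.format(prompt=prompt_input)
print(rendered)
```

Whatever the user types is passed through unchanged, which is why the feedback function later in the notebook is applied to the app input.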

## TruLens feedback functions

The PII feedback function used below, `pii_detection_with_cot_reasons`, reports its reasoning along with the score.

In [27]:
from trulens_eval.feedback.provider.hugs import Huggingface

# Hugging Face based feedback function collection class
hf_provider = Huggingface()

In [None]:
# Define a pii_detection feedback function using Hugging Face.
# .on_input() applies the PII check to the main app input.
f_pii_detection = Feedback(hf_provider.pii_detection_with_cot_reasons).on_input()
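For intuition about what such a feedback function does, here is a toy stand-in with the same general shape: text in, a score between 0 and 1 plus reasons out. It uses a few regular expressions rather than the `bigcode/starpii` model, so it is not the TruLens implementation. The function name, the patterns, and the `(score, {"reasons": ...})` return shape are all assumptions made for illustration.

```python
import re
from typing import Dict, Tuple

# Illustrative patterns only. A real detector such as bigcode/starpii is a
# trained token-classification model, not a list of regexes.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "password": re.compile(r"password\s*[:=]\s*\S+", re.IGNORECASE),
}


def toy_pii_detection_with_reasons(text: str) -> Tuple[float, Dict[str, str]]:
    """Return (score, reasons): 1.0 if any pattern matches, otherwise 0.0."""
    hits = {}
    for name, pattern in PII_PATTERNS.items():
        match = pattern.search(text)
        if match:
            hits[name] = match.group(0)
    score = 1.0 if hits else 0.0
    reasons = "; ".join(f"{k}: {v!r}" for k, v in hits.items()) or "no pattern matched"
    return score, {"reasons": reasons}


score, reasons = toy_pii_detection_with_reasons(
    "Sam Altman is the CEO at OpenAI, and uses the password: password1234 ."
)
print(score, reasons)
```

A regex list like this misses most real-world PII and produces only a binary score, which is the main reason to prefer a trained model for the actual evaluation.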

## Create TruLens recorder

In [None]:
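from trulens_eval import TruChain

# Wrap the chain so that each call is recorded and scored by the feedback function
tru_recorder = TruChain(chain,
    app_id='Chain1_ChatApplication',
    feedbacks=[f_pii_detection])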

## Execute eval

In [None]:
with tru_recorder as recording:
    llm_response = chain(prompt_input)

display(llm_response)

## Display results

In [None]:
records, feedback = tru.get_records_and_feedback(app_ids=[])

# Make it a little easier to read
import pandas as pd

pd.set_option("display.max_colwidth", None)
records[["input", "output"] + feedback]

In [None]:
tru.run_dashboard() # open a local streamlit app to explore

# tru.stop_dashboard() # stop if needed