## PROTECTION USING LLM-GUARD

LLM-Guard is a guardrail library that runs a range of security checks on prompts sent to an AI model and on the responses it returns.  In this notebook we'll use it to stop sensitive information being sent to Gemini.
The guardrail does this by redacting sensitive values in the prompt and then restoring them once we get the response.
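
To make the redact-then-restore idea concrete, here is a minimal sketch in plain Python. It is not how llm-guard is implemented: the email pattern, the `toy_vault` list, the placeholder text and both helper functions are assumptions made up for illustration, and the real library uses trained models and its own `Vault` class.

```python
import re

# Hypothetical, deliberately simple email matcher (illustration only).
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def toy_anonymize(text, toy_vault):
    """Replace each email with a numbered placeholder and record the mapping."""
    def _swap(match):
        placeholder = f"[REDACTED_EMAIL_ADDRESS_{len(toy_vault) + 1}]"
        toy_vault.append((placeholder, match.group(0)))
        return placeholder

    return EMAIL_PATTERN.sub(_swap, text)


def toy_deanonymize(text, toy_vault):
    """Put the recorded original values back in place of their placeholders."""
    for placeholder, original in toy_vault:
        text = text.replace(placeholder, original)
    return text


toy_vault = []
redacted = toy_anonymize("Contact test@test.com for access.", toy_vault)
print(redacted)

# Stand-in for a model reply that echoes the placeholder back.
fake_reply = "Insert done for [REDACTED_EMAIL_ADDRESS_1]."
print(toy_deanonymize(fake_reply, toy_vault))
```

The model only ever sees the placeholder; the original value stays in the local mapping and is swapped back in afterwards.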



We'll start by suppressing warning messages.  

In [1]:
import warnings
warnings.filterwarnings('ignore')

Next, we'll install the llm-guard library. It depends on a specific version of numpy, so we'll pin that version in the same install command. We'll still see an error, but we can ignore it for what we're doing in this notebook.

In [None]:
%pip -q install numpy==1.25.1 llm-guard

We'll now look at an example of how we use llm-guard to scan our input.  Note that our Google API key is stored in Colab's secrets, which we read through `userdata`.  

We won't use every input and output guardrail for our demonstration. We'll use the Anonymize/Deanonymize paired guardrails, and as examples we'll select three additional input and three additional output guardrails.  We'll also be using the llm-guard vault to store the sensitive data that we anonymize in order to recover it for deanonymization.

In [3]:
import os
import google.generativeai as genai
from google.colab import userdata
from llm_guard import scan_output, scan_prompt
from llm_guard.input_scanners import Anonymize, PromptInjection, TokenLimit, Toxicity
from llm_guard.output_scanners import Deanonymize, NoRefusal, Relevance, Sensitive
from llm_guard.vault import Vault

We can now configure the Google AI client with our API key so that we can run prompts through the API.

In [4]:
genai.configure(api_key=userdata.get("GOOGLE_API_KEY"))

We'll set up a vault to manage the data we need to anonymize.  We'll load four input scanners and four output scanners.  This takes some time to run the first time, as it has to download the scanning models.

In [None]:
vault = Vault()
input_scanners = [Anonymize(vault), Toxicity(), TokenLimit(), PromptInjection()]
output_scanners = [Deanonymize(vault), NoRefusal(), Relevance(), Sensitive()]

We'll now hard-code a sensitive prompt and use the llm-guard scanners to check it.  We'll check whether we can run the query, and we'll display what the sanitized query looks like.

In [None]:
prompt = """
Make an SQL insert statement to add a new user to our database. His name is John Doe and his email is test@test.com. His phone number is 555-123-4567.
His credit card number is 4567-8901-2345-6789 and he works in Test LLC.
"""

sanitized_prompt, results_valid, results_score = scan_prompt(input_scanners, prompt)
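
The cell above captures `results_valid` and `results_score` but does not inspect them. A small helper along the following lines could do that check, assuming both are dictionaries keyed by scanner name (booleans and scores respectively). The helper name `failed_scanners` and the example values are hypothetical, not output from a real run.

```python
def failed_scanners(results_valid):
    """Return the names of scanners whose validity flag is False."""
    return [name for name, is_valid in results_valid.items() if not is_valid]


# Made-up values for illustration; not taken from a real run.
example_valid = {"Toxicity": True, "TokenLimit": True, "PromptInjection": False}
example_score = {"Toxicity": 0.0, "TokenLimit": 0.0, "PromptInjection": 0.9}

for name in failed_scanners(example_valid):
    print(f"{name} flagged the prompt (score {example_score[name]})")
```

In the notebook this would be called as `failed_scanners(results_valid)`, with the decision on whether to send the prompt based on which scanners, if any, are listed.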

Let's display the sanitized prompt that we've generated.

In [None]:
print(sanitized_prompt)

OK, we found an email address and redacted it. We also found a person element, which is flagged as sensitive, and redacted that.  A toxicity scan was run but did not find any toxic content. Note that the phone numbers don't appear in the redacted prompt.

We can now send the redacted prompt to Gemini and then display the response.

In [None]:
model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content(sanitized_prompt)
print(response.text)

So Gemini has removed the credit card data that was in the original query, which suggests it has its own built-in guardrail for this.  

We can now run the output scanner to recover the redacted text and check the output.

In [None]:
sanitized_response_text, results_valid, results_score = scan_output(
    output_scanners, sanitized_prompt, response.text
)

Let's now see what the final result is from running our response through the output guardrails.

In [None]:
print(sanitized_response_text)

OK, we've got our result, and the originally redacted text has been restored in place of the redaction placeholders, other than the credit card data, which was removed by Gemini. The warning from the guardrails is carried through, together with the SQL statement, with the remaining sensitive data reconstituted in the output.
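
As a final sanity check, we could confirm that no placeholder survived deanonymization. The sketch below assumes placeholders have the shape `[REDACTED_<TYPE>_<N>]`, for example `[REDACTED_PERSON_1]`; that pattern and the helper name `remaining_placeholders` are assumptions, so adjust the regular expression if the sanitized prompt printed earlier looks different.

```python
import re

# Assumed placeholder shape, e.g. [REDACTED_PERSON_1]; adjust if yours differs.
PLACEHOLDER_PATTERN = re.compile(r"\[REDACTED_[A-Z_]+_\d+\]")


def remaining_placeholders(text):
    """Return every placeholder that is still present in the text."""
    return PLACEHOLDER_PATTERN.findall(text)


# Made-up string standing in for a partially restored response.
example_text = "INSERT INTO users VALUES ('[REDACTED_PERSON_1]', 'test@test.com');"
print(remaining_placeholders(example_text))
```

Calling `remaining_placeholders(sanitized_response_text)` in the notebook would return an empty list if every placeholder had been replaced.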
