# Overview
This notebook uses a generative AI model to redact PII and other sensitive data from a given text.
It uses the Azure OpenAI Chat Completions API with a system message and a few-shot example to do so.

## Getting started
- Follow the Dev Setup and Usage instructions in the [README](README.MD)
- Then execute the code blocks below

## Load the configuration from environment variables

In [4]:
from dotenv import load_dotenv
import os

load_dotenv()  # This loads the variables from .env

azure_oai_endpoint = os.getenv("AZURE_OAI_ENDPOINT")
azure_oai_key = os.getenv("AZURE_OAI_KEY")
azure_oai_model = os.getenv("AZURE_OAI_MODEL")

if not azure_oai_endpoint or not azure_oai_key or not azure_oai_model:
    print("One or more required configuration values are not defined!")
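
The check above reports that something is missing but not which value. A minimal sketch of a more specific check follows. It assumes the same three variable names used in this notebook; the helper name `missing_settings` is hypothetical and not part of the original code.

```python
import os

# Variable names taken from the cell above.
REQUIRED_VARS = ("AZURE_OAI_ENDPOINT", "AZURE_OAI_KEY", "AZURE_OAI_MODEL")


def missing_settings(env, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty in env."""
    return [name for name in required if not env.get(name)]


# os.environ behaves like a dict, so it can be passed in directly.
missing = missing_settings(os.environ)
if missing:
    print("Missing configuration values: " + ", ".join(missing))
```

Passing the environment in as a mapping keeps the helper easy to exercise with a plain dictionary.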

## Prompt setup
- Set the system message and the few-shot example messages for the prompt
- Set the input message to redact
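
The message order matters: the system message comes first, each few-shot example is a user/assistant pair, and the text to redact is the final user message. The sketch below shows that layout with a hypothetical helper, `build_messages`, and short made-up example strings; the notebook itself builds the list by hand.

```python
def build_messages(system_text, few_shot_pairs, user_text):
    """Assemble a chat message list: system, few-shot pairs, then the input."""
    messages = [{"role": "system", "content": system_text}]
    for example_input, example_output in few_shot_pairs:
        # Each few-shot example is a user turn followed by the desired reply.
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": example_output})
    messages.append({"role": "user", "content": user_text})
    return messages


# Illustrative strings only; they are not the prompts used in this notebook.
demo = build_messages(
    "Redact PII with [Redacted <type of data>].",
    [("From: jane@example.com", "From: [Redacted Email]")],
    "Please reply to sam@example.com",
)
print([m["role"] for m in demo])  # ['system', 'user', 'assistant', 'user']
```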

In [5]:
# System message for the prompt
systemMessage = {"role":"system",
                 "content":"""You are to help redact PII and Sensitive data from the content the user provides.
                  The nature of the content is mostly an email or multiple emails in an email thread. 
                 Sensitive data is any PII or any data that can help in identifying an entity. 
                 It includes, but is not limited to, emails, names of individuals, companies, phone numbers, IP addresses, physical addresses, website URLs, server names, and zip codes.
                 To redact, replace PII/sensitive words with [Redacted <type of data>]. When you redact an email, it should be replaced with [Redacted Email], and so on for each type of data redacted.
                 Redact even if there is at least a 0.25 probability of something being sensitive.
                 You already know how to recognize an email, name of an individual, name of a company, phone number, IP address, physical address, zip code, and website URL.
                 If you don't know how to recognize a server name, an example is \"MA1PR01MB4163.INDPRD01.PROD.OUTLOOK.COM\". """}

# Test messages
# fewshotTrainingMessage1prompt = {"role":"user","content":"From:ravi@zelarsoft.com"}
# fewshotTrainingMessage1completion = {"role":"assistant","content":"From:[Redacted Email]"}
# input_text = """Hi Suresh, 
# please note alternate email ravi@hotmail.com 
# """

fewshotTrainingMessage1Prompt = {"role":"user",
                                 "content":
                                 """From: John Smith <john.smith@contoso.com>
Sent: Tuesday, October 17, 2023 9:32:19 AM
To: Alex Weber <alex.weber@contoso.com>
Subject: FW: Your mailbox is almost full.
 
Fyi  
 
Thanks & Regards
John Smith
<image003.png>
Valley Springs LLC.
1582 Maple Avenue, Springfield, IL 62704, USA.
Tel: +1- (555) 123-4567,
Email: john.smith@contoso.com, Website: www.contoso.com
 
Please consider the environment before printing this E-mail
 
THIS E-MAIL IS INTENDED ONLY FOR THE ADDRESSEE(S) AND MAY CONTAIN CONFIDENTIAL INFORMATION.   IF YOU ARE NOT THE INTENDED RECIPIENT, YOU ARE HEREBY NOTIFIED THAT ANY USE OF THIS INFORMATION OR DISSEMINATION, DISTRIBUTION OR COPYING OF THIS E-MAIL IS STRICTLY PROHIBITED.  IF YOU HAVE RECEIVED THIS E-MAIL IN ERROR, PLEASE NOTIFY THE SENDER IMMEDIATELY BY RETURN E-MAIL AND DELETE THE ORIGINAL MESSAGE
 
From: MicrosoftExchange329e71ec88ae4615bbc36ab6ce41109e@contoso.com<MicrosoftExchange329e71ec88ae4615bbc36ab6ce41109e@contoso.com>
Sent: Monday, October 16, 2023 8:17 PM
To: John Smith <john.smith@contoso.com>
Subject: Your mailbox is almost full.
Importance: High
 
Your mailbox is almost full.
	
49.18 GB	49.5 GB
To make room in your mailbox, delete any items you don't need and empty your Deleted Items folder.
Learn more about storage limits.
Mailbox address:
john.smith@contoso.com"""}

fewshotTrainingMessage1Completion = {"role":"assistant",
                                     "content":"""
From: [Redacted Name] <[Redacted Email]>
Sent: Tuesday, October 17, 2023 9:32:19 AM
To: [Redacted Name] <[Redacted Email]>
Subject: FW: Your mailbox is almost full.
 
Fyi  
 
Thanks & Regards
[Redacted Name]
<image003.png>
[Redacted Company Name]
[Redacted Address]
Tel: [Redacted Phone number],
Email: [Redacted Email], Website: [Redacted Website]
 
Please consider the environment before printing this E-mail
 
THIS E-MAIL IS INTENDED ONLY FOR THE ADDRESSEE(S) AND MAY CONTAIN CONFIDENTIAL INFORMATION.   IF YOU ARE NOT THE INTENDED RECIPIENT, YOU ARE HEREBY NOTIFIED THAT ANY USE OF THIS INFORMATION OR DISSEMINATION, DISTRIBUTION OR COPYING OF THIS E-MAIL IS STRICTLY PROHIBITED.  IF YOU HAVE RECEIVED THIS E-MAIL IN ERROR, PLEASE NOTIFY THE SENDER IMMEDIATELY BY RETURN E-MAIL AND DELETE THE ORIGINAL MESSAGE
 
From: [Redacted Email]<[Redacted Email]>
Sent: Monday, October 16, 2023 8:17 PM
To: [Redacted Name] <[Redacted Email]>
Subject: Your mailbox is almost full.
Importance: High
 
Your mailbox is almost full.
	
49.18 GB	49.5 GB
To make room in your mailbox, delete any items you don't need and empty your Deleted Items folder.
Learn more about storage limits.
Mailbox address:
[Redacted Email]
"""}

input_text = """From: Adele Vance <adele.vance@fabrikam.com>
Sent: Wednesday, August 16, 2023 7:27:40 PM
To: Grady Archie <grady.archie@m365supportgroup.com>
Cc: Joni Sherman <joni.sherman@m365supportgroup.com>
Subject: Re: Microsoft 365 Deployement for fabrikam.com
 
Hi Grady,

I have shared our godaddy details in a separate email. Please check.

Thank you,
Adele
"""

prompt_message = {"role":"user", "content": input_text}
messages = [systemMessage, fewshotTrainingMessage1Prompt, fewshotTrainingMessage1Completion, prompt_message]
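
Because the model redacts on a best-effort basis, a cheap deterministic check can help catch obvious leaks. The sketch below is one possible approach, not part of the original notebook: it assumes a simple email regular expression (which will not match every valid address), and the helper name `find_leaked_emails` and the sample redacted string are hypothetical.

```python
import re

# Simplified email pattern; an assumption for illustration, not a full validator.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_leaked_emails(original, redacted):
    """Return emails found in the original text that still appear in the redacted text."""
    emails = set(EMAIL_PATTERN.findall(original))
    redacted_lower = redacted.lower()
    return sorted(e for e in emails if e.lower() in redacted_lower)


sample_original = (
    "From: Adele Vance <adele.vance@fabrikam.com>\n"
    "To: Grady Archie <grady.archie@m365supportgroup.com>"
)
# Hypothetical model output in which one address was missed.
sample_redacted = (
    "From: [Redacted Name] <[Redacted Email]>\n"
    "To: [Redacted Name] <grady.archie@m365supportgroup.com>"
)
print(find_leaked_emails(sample_original, sample_redacted))
# ['grady.archie@m365supportgroup.com']
```

The same idea could be extended to phone numbers or IP addresses with additional patterns; names and company names are harder to check this way.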

## Client code
Set up the Azure OpenAI client and make the call. Note: this code requires OpenAI SDK version 1.0 or later.
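
The call in the next cell prints the full response as JSON. When only the redacted text is needed, it can be read from `completion.choices[0].message.content`. The sketch below wraps the call in a hypothetical helper, `redact_text`, and runs it against a stand-in client built from `SimpleNamespace` so that it works without network access; the stand-in and its canned reply are assumptions for illustration only.

```python
from types import SimpleNamespace


def redact_text(client, model, base_messages, text, temperature=0.5, max_tokens=800):
    """Send text for redaction and return only the assistant's reply."""
    messages = list(base_messages) + [{"role": "user", "content": text}]
    completion = client.chat.completions.create(
        model=model,
        temperature=temperature,
        max_tokens=max_tokens,
        messages=messages,
    )
    return completion.choices[0].message.content


def _fake_create(**kwargs):
    # Stand-in for the real API call; returns a fixed reply in the same shape.
    reply = SimpleNamespace(content="From: [Redacted Email]")
    return SimpleNamespace(choices=[SimpleNamespace(message=reply)])


# Mimics the attribute path client.chat.completions.create of the real client.
fake_client = SimpleNamespace(
    chat=SimpleNamespace(completions=SimpleNamespace(create=_fake_create))
)
print(redact_text(fake_client, "hypothetical-deployment", [], "From: jane@example.com"))
# From: [Redacted Email]
```

With the real `AzureOpenAI` client, `base_messages` would be the system message plus the few-shot pair defined earlier.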

In [8]:

from openai import AzureOpenAI

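# Create the client from the values loaded from the environment
client = AzureOpenAI(
    azure_endpoint=azure_oai_endpoint,
    api_key=azure_oai_key,
    api_version="2023-07-01-preview",
)

# Send the system message, few-shot example, and input text to the model
completion = client.chat.completions.create(
    model=azure_oai_model,
    temperature=0.5,
    max_tokens=800,
    messages=messages,
)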

print(completion.model_dump_json(indent=2))

{
  "id": "chatcmpl-8L3rqgcar3tGUtbqnHaHuQByaU0uY",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "From: [Redacted Name] <[Redacted Email]>\nSent: Wednesday, August 16, 2023 7:27:40 PM\nTo: [Redacted Name] <[Redacted Email]>\nCc: [Redacted Name] <[Redacted Email]>\nSubject: Re: Microsoft 365 Deployement for [Redacted Company Name]\n \nHi [Redacted Name],\n\nI have shared our [Redacted Company Name] details in a separate email. Please check.\n\nThank you,\n[Redacted Name]",
        "role": "assistant",
        "function_call": null,
        "tool_calls": null
      },
      "content_filter_results": {
        "hate": {
          "filtered": false,
          "severity": "safe"
        },
        "self_harm": {
          "filtered": false,
          "severity": "safe"
        },
        "sexual": {
          "filtered": false,
          "severity": "safe"
        },
        "violence": {
          "filtered": false,
          "