# Setup

In [None]:
!pip install -qU langkit[all]

This command installs the `langkit` library and its dependencies with pip, the Python package manager. The `-q` flag enables quiet mode, which suppresses most of the installation output, and the `-U` flag upgrades the package to the newest available version if an older one is already installed. The `[all]` specifier requests the optional dependencies grouped under the `all` extra.
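Because `-q` hides the installer output, it can help to confirm which version ended up in the environment. A minimal sketch using only the standard library; `installed_version` is a hypothetical helper name introduced here, not part of `langkit`:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version if langkit is installed, otherwise "langkit: None".
print("langkit:", installed_version("langkit"))
```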

In [None]:
from langkit import extract, pii

This line imports `extract` and `pii` from the `langkit` library. `extract` is called as a function later in this notebook to compute metrics for a prompt and response pair. The `pii` module targets personally identifiable information (PII) such as Social Security numbers and IP addresses; its metrics appear in the output below under the `pii_presidio` keys.

# Test

In [None]:
prompt_response_pairs = [
    (
        "Have I even been to this website",
        "On September 18 you visited microsoft.com and sent an email to test@presidio.site,  from the IP 192.168.0.1."
    ),
    (
        "My name is John Smith, SSN 234-56-7890",
        "For security reasons, please don't share your full SSN in messages."
    ),
    (
        "Is this a valid credit card format?",
        "Yes, 5424-1801-2345-6789 is a valid Mastercard format, but please don't share actual card numbers."
    ),
    (
        "Please ship to 1234 Park Ave, Chicago IL 60601",
        "Thank you. I've updated the shipping address in your profile."
    ),
    (
        "What's wrong with my license number?",
        "Your Washington driver's license WDL-452-789-123 appears to be expired."
    ),
    (
        "Check this phone: 206-555-0123",
        "Yes, (206) 555-0123 is a valid Seattle area phone number format."
    ),
    (
        "My new address is 567 Oak Street",
        "I've updated your profile with: 567 Oak Street, Austin, TX 78701"
    ),
    (
        "Can you validate this SSN format?",
        "345-67-8901 follows the correct SSN format, but please avoid sharing actual SSNs."
    ),
    (
        "Customer at (415) 555-6789 needs support",
        "I'll connect with the customer at the San Francisco number provided."
    ),
    (
        "New card number is 6011-2345-6789-0123",
        "I see the Discover card ending in 0123 has been added to your account."
    ),
    (
        "Update shipping to: 789 Pine St, Miami FL",
        "Updated delivery to: 789 Pine Street, Miami, FL 33101"
    ),
    (
        "License number D12-4567-89012 needs renewal",
        "Your Florida driver's license will expire next month. Here's how to renew it..."
    ),
    (
        "Help with api key regeneration",
        "Your current key sk-9012pqrs3456tuvw7890xyza needs to be rotated. Generate a new one in settings."
    ),
    (
        "My name is James Wilson and I live at 742 Evergreen Terrace, Springfield, OR 97477",
        "Hello James! I can help you today. What can I do for you?"
    ),
    (
        "Can you help me check if this credit card is valid?",
        "The card number 4095-2609-9393-4932 appears to be a valid credit card number in the correct format."
    ),
    (
        "What's wrong with my SSN 123-45-6789?",
        "I notice you've shared a Social Security Number. For security reasons, you should avoid sharing this sensitive information."
    ),
    (
        "Can you format a phone number for me?",
        "Sure! For example, (555) 123-4567 is the standard US format."
    ),
    (
        "My driver's license is D1234567. Is this valid in California?",
        "That appears to be a valid CA format, but I cannot verify specific licenses."
    ),
    (
        "Help me understand this letter I got.",
        "The letter shows it's from Sarah Johnson, 123 Main Street, Apt 4B, Boston, MA 02108. Would you like me to analyze its content?"
    ),
    (
        "Process this customer record:",
        "Customer: Michael Brown\\nSSN: 234-56-7890\\nCard: 4095-8372-1123-9453"
    ),
    (
        "My contact details are wrong in the system.",
        "I see your current details are: Tom Davis, (444) 555-6789, 892 Pine Road, Seattle, WA 98101"
    ),
    (
        "Update my profile with phone (333) 999-8888 and DL K7654321",
        "I've noted your new phone number. For security, please update your driver's license through our secure portal."
    ),
]

In [None]:
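for prompt, response in prompt_response_pairs:
    # Package each pair in the dictionary shape passed to extract().
    data = {"prompt": prompt, "response": response}
    result = extract(data)
    # Print every key in the result alongside its value.
    for key in result:
        print(key + ': ' + str(result[key]))
    # Separator between pairs.
    print('-' * 100)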

prompt: Have I even been to this website
response: On September 18 you visited microsoft.com and sent an email to test@presidio.site,  from the IP 192.168.0.1.
prompt.pii_presidio.result: []
prompt.pii_presidio.entities_count: 0
response.pii_presidio.result: [{"type": "IP_ADDRESS", "start": "96", "end": "107", "score": "0.95"}, {"type": "URL", "start": "28", "end": "41", "score": "0.5"}, {"type": "URL", "start": "68", "end": "81", "score": "0.5"}]
response.pii_presidio.entities_count: 3
----------------------------------------------------------------------------------------------------
prompt: My name is John Smith, SSN 234-56-7890
response: For security reasons, please don't share your full SSN in messages.
prompt.pii_presidio.result: [{"type": "US_SSN", "start": "27", "end": "38", "score": "0.85"}]
prompt.pii_presidio.entities_count: 1
response.pii_presidio.result: []
response.pii_presidio.entities_count: 0
-----------------------------------------------------------------------------

This code iterates over a collection of pairs, `prompt_response_pairs`, where each pair contains a prompt and its corresponding response. For each pair, it creates a dictionary `data` with keys `"prompt"` and `"response"`, and then passes this dictionary to the `extract` function.

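The `extract` function returns a dictionary-like result, stored in `result`, that contains the original `prompt` and `response` along with the PII metrics computed for each. As the output above shows, `prompt.pii_presidio.result` and `response.pii_presidio.result` list the detected entities, each with a `type` (for example `US_SSN`, `IP_ADDRESS`, or `URL`), `start` and `end` character offsets, and a confidence `score`, while the matching `entities_count` keys give the number of entities found.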
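The `start` and `end` offsets can be used to recover the text each entity matched. The following is a minimal sketch, assuming the `pii_presidio.result` value is a JSON-encoded string (the double-quoted formatting in the printed output suggests this, but it is an assumption). The sample text and result are copied from the first pair in the output above, and `matched_spans` is a hypothetical helper, not part of `langkit`:

```python
import json

# Response text and result string copied from the printed output above.
response = (
    "On September 18 you visited microsoft.com and sent an email to "
    "test@presidio.site,  from the IP 192.168.0.1."
)
result_json = (
    '[{"type": "IP_ADDRESS", "start": "96", "end": "107", "score": "0.95"}, '
    '{"type": "URL", "start": "28", "end": "41", "score": "0.5"}, '
    '{"type": "URL", "start": "68", "end": "81", "score": "0.5"}]'
)


def matched_spans(text, result):
    """Return (entity type, matched substring) pairs. Hypothetical helper."""
    # Assumption: result is a JSON string; an already-parsed list also works.
    entities = json.loads(result) if isinstance(result, str) else result
    # Offsets are stored as strings in the output, so convert them to int.
    return [(e["type"], text[int(e["start"]):int(e["end"])]) for e in entities]


spans = matched_spans(response, result_json)
for entity_type, span in spans:
    print(entity_type, span)
```

For this sample, the second `URL` span covers only `presidio.site`, the domain part of the email address; the address as a whole is not reported as a separate entity in this output.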

The code then iterates over the keys of `result` and prints each one followed by a colon, a space, and its value converted to a string with `str()`.

After processing each prompt-response pair, the code prints a horizontal line of 100 hyphens (`'-' * 100`) to visually separate the output for each pair.
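One possible use of the detected offsets is to mask the flagged text before logging or storing it. The sketch below is an illustration under stated assumptions rather than a `langkit` feature: `mask_entities` is a hypothetical helper, the result value is assumed to be a JSON-encoded string, and the spans are assumed not to overlap. The sample data is copied from the second pair in the output above.

```python
import json


def mask_entities(text, result):
    """Replace each detected span with a <TYPE> placeholder. Hypothetical helper."""
    # Assumption: result is a JSON string; an already-parsed list also works.
    entities = json.loads(result) if isinstance(result, str) else result
    # Work from the last span to the first so earlier offsets stay valid.
    for e in sorted(entities, key=lambda e: int(e["start"]), reverse=True):
        start, end = int(e["start"]), int(e["end"])
        text = text[:start] + "<" + e["type"] + ">" + text[end:]
    return text


prompt = "My name is John Smith, SSN 234-56-7890"
prompt_result = '[{"type": "US_SSN", "start": "27", "end": "38", "score": "0.85"}]'

masked = mask_entities(prompt, prompt_result)
print(masked)  # My name is John Smith, SSN <US_SSN>
```

Only text that the detector flagged is masked, so the name in this prompt is left in place because it does not appear in the reported entities.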