# Input Data

##### The input emails were generated with Google's Gemini API. To keep the use case realistic, they span several categories:
- Refund & Return Requests
- Account, Billing & Subscription Inquiries
- Shipping & Order Status
- Mixed / Complex Scenarios

Number of sample emails: 50

In [1]:
import pandas as pd 
df = pd.read_csv("data_inputs/emails.csv")
display(df.head())

Unnamed: 0,id,subject,body
0,1,Trouble connecting my new headphones,"Hi Support,\nI bought the SonicBoom Wireless H..."
1,2,Error code 404 during setup,"Hello,\nI am trying to install the Vector Grap..."
2,3,Watch battery draining too fast,"To whom it may concern,\nI’ve noticed that my ..."
3,4,Coffee machine making strange noises,"Hi there,\nMy BaristaPro Espresso Maker starte..."
4,5,Hub not finding devices,"Good morning,\nI recently upgraded my router, ..."


Sample Email

In [2]:
print(f"**Subject**:\n{df.iloc[0].subject}\n\n**Body**:\n{df.iloc[0].body}")

**Subject**:
Trouble connecting my new headphones

**Body**:
Hi Support,
I bought the SonicBoom Wireless Headphones last Tuesday, and I’m having a really hard time keeping them connected to my phone. They pair fine initially, but after about 10 minutes, the audio cuts out completely. I’ve tried resetting them and forgetting the device on my iPhone, but the issue persists. Can someone help me troubleshoot this? I need them for my commute.
Best regards,
Jason Miller


### Extracting structured information with LLM

- We will use an LLM-based approach to extract structured information from the sample emails.
- To do so, we first define the structured output schema that the LLM is expected to fill in.
 

A few questions:
1. What information can be extracted without using an LLM in a realistic scenario?
- Metadata such as the sender's name, the sender's email address, and the time the email was sent can be read directly from the email headers.
2. Is intent a fixed category (is there a fixed set of possible intents)?
- For simplicity, we assume intent is free-form, so the LLM may return any intent label. In a realistic scenario, there should be a limited set of intents to which every request maps, with anything unmappable placed in an "others" category.
3. If a piece of information cannot be extracted from the email, what should we do?
- To avoid hallucinated values, we let the LLM return `None` for any field it cannot determine from the email.
4. Which fields are mandatory?
- Intent and product are required to understand the request.
- Some customers do not mention their name in the email, so customer name is optional.
- Requested action can also be missing, since the customer may need help working out the resolution.

### RequestAttributes Fields

| Field Name         | Description                                                                                       | Mandatory       |
|--------------------|---------------------------------------------------------------------------------------------------|-----------------|
| **intent**          | Nature of the request (e.g., refund, replacement).                                                | Yes             |
| **customer_name**   | Name of the customer. Use `None` if not mentioned.                                                | No              |
| **product**         | Name of the product (e.g., Bluetooth speaker, mixer grinder).                                     | Yes             |
| **requested_action**| Customer's desired action (e.g., refund, replacement). Use `None` if there isn’t enough information.| No              |
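
The table can be mirrored in code roughly as follows. The actual `RequestAttributes` class lives in `groq_utils` and is not shown here; since the notebook later reads `RequestAttributes.model_fields`, it is presumably a Pydantic model. The sketch below deliberately uses a standard-library dataclass instead, so the names `RequestAttributesSketch` and `parse_llm_output` are illustrative assumptions, not the notebook's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestAttributesSketch:
    """Illustrative stand-in for RequestAttributes (same four fields as the table)."""
    intent: str                              # mandatory
    product: str                             # mandatory
    customer_name: Optional[str] = None      # optional, None if not mentioned
    requested_action: Optional[str] = None   # optional, None if unclear

    def __post_init__(self):
        # Enforce the "Mandatory" column of the table
        for name in ("intent", "product"):
            value = getattr(self, name)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"'{name}' is mandatory and must be a non-empty string")


def parse_llm_output(payload: dict) -> RequestAttributesSketch:
    """Build the record from a dict, treating absent keys as None."""
    return RequestAttributesSketch(
        intent=payload.get("intent"),
        product=payload.get("product"),
        customer_name=payload.get("customer_name"),
        requested_action=payload.get("requested_action"),
    )


example = parse_llm_output({
    "intent": "troubleshoot connectivity",
    "product": "SonicBoom Wireless Headphones",
    "customer_name": "Jason Miller",
})
print(example)
```

Optional fields default to `None` rather than an empty string, so a missing value stays distinguishable from an extracted but empty one.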


In [13]:
from groq_utils import RequestAttributes,get_email_attributes, email_structure

In [5]:
test_data = df.iloc[0]
test_input = email_structure.format(subject=test_data.subject,
                       body=test_data.body)


In [6]:
test_output = await get_email_attributes(test_input)

# Now let's run this for all the sample emails in the dataframe

To run this over the data we have:
- we make asynchronous calls to the LLM, since each request is network-bound and looping over the emails synchronously would wait for every response in turn
- once all emails have been processed, we add the extracted attributes as additional columns to the dataframe
- finally, we save the dataframe as output
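
Attaching the results back to the dataframe as columns relies on one property of `asyncio.gather`: it returns results in the order the awaitables were passed in, not in the order they finish. The small sketch below demonstrates this with a stub coroutine; `fake_llm_call` and its delays are made up for illustration and stand in for the real LLM request.

```python
import asyncio

completion_order = []  # records which call finished first


async def fake_llm_call(index: int, delay: float) -> int:
    """Stub for an LLM request: waits `delay` seconds, then returns its index."""
    await asyncio.sleep(delay)
    completion_order.append(index)
    return index


async def demo() -> list:
    delays = [0.3, 0.1, 0.2]  # the first call is the slowest
    return await asyncio.gather(
        *(fake_llm_call(i, d) for i, d in enumerate(delays))
    )


# In a notebook cell, use `ordered = await demo()` instead of asyncio.run
ordered = asyncio.run(demo())
print("returned order:  ", ordered)           # [0, 1, 2]
print("completion order:", completion_order)
```

Because the returned list lines up with the input list, row `i` of the dataframe can safely receive element `i` of the results.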

In [8]:
email_dataset = df.apply(lambda x: email_structure.format(subject=x.subject,body = x.body),axis=1).to_list()

In [9]:
from typing import List
import asyncio

async def run_for_all_examples(list_of_email: List[str]):
    # Send one request per email concurrently (iterate over the argument, not the global list)
    results = await asyncio.gather(*(get_email_attributes(email_item) for email_item in list_of_email))
    return results

In [64]:
results = await run_for_all_examples(email_dataset)

Request failed for:  etro Red" colorway, size 11. Your website says "Out of Stock," but I was wondering if you have any inventory in your physical warehouse that isn't listed online? Or do you know when you will restock?
Michael Jordan
Error code: 429 - {'error': {'message': 'Rate limit reached for model `openai/gpt-oss-120b` in organization `org_01jafq4jnfep6r02ahnt3y1m95` service tier `on_demand` on tokens per minute (TPM): Limit 8000, Used 7843, Requested 390. Please try again in 1.7475s. Need more tokens? Upgrade to Dev Tier today at https://console.groq.com/settings/billing', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}
Request failed for:  ing Gown. I selected standard shipping, but I just found out I need it for an event this Friday. Is it possible to pay the difference to upgrade to Overnight or 2-Day shipping? Please let me know ASAP so it doesn't ship standard.
Rachel Green
Error code: 429 - {'error': {'message': 'Rate limit reached for model `openai/gpt-oss-120b` in org

### The run above hits a rate-limit error (HTTP 429), so we need to handle rate limits gracefully


To handle this, we do the following:
- Limit execution to only a few concurrent workers (to avoid too many simultaneous requests)
- Add a short sleep before each request (to stay under the per-minute limit; the 429 above reports a tokens-per-minute limit)

In [11]:
from typing import List
import asyncio

async def worker(item, sem):
    async with sem:               # wait for a free slot (released automatically on exit)
        await asyncio.sleep(0.2)  # short pause before each request to spread out calls
        return await get_email_attributes(item)  # actual work

async def run_for_all_examples(list_of_email: List[str]):
    sem = asyncio.Semaphore(2)    # allow at most 2 requests in flight at a time
    tasks = [worker(email_item, sem) for email_item in list_of_email]
    results = await asyncio.gather(*tasks)
    return results

In [14]:
results = await run_for_all_examples(email_dataset)
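
Throttling lowers the chance of hitting the limit, but it does not recover a request that is rejected anyway. One common complement is to retry with exponential backoff. The sketch below is a minimal, self-contained illustration under stated assumptions: `RateLimitedError`, `call_with_retry`, and `flaky_call` are hypothetical names, and where a retry would hook in depends on how `get_email_attributes` surfaces the 429 error, which is not shown in this notebook.

```python
import asyncio
import random


class RateLimitedError(Exception):
    """Hypothetical stand-in for the client's 429 rate-limit error."""


async def call_with_retry(make_call, max_attempts: int = 4, base_delay: float = 1.0):
    """Await make_call(), retrying on RateLimitedError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await make_call()
        except RateLimitedError:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            # Delay doubles each attempt; random jitter avoids synchronized retries
            delay = base_delay * 2 ** (attempt - 1)
            await asyncio.sleep(delay + random.uniform(0, delay))


# Demo with a stub that is rate limited twice and then succeeds
attempts = {"count": 0}


async def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitedError("429: rate limit reached")
    return {"intent": "refund"}


# In a notebook cell, use `await call_with_retry(...)` instead of asyncio.run
result = asyncio.run(call_with_retry(flaky_call, base_delay=0.01))
print(result, "after", attempts["count"], "attempts")
```

The demo uses a very small `base_delay` so it finishes quickly. The 429 message above suggests waiting on the order of a second or two, so a base delay around one second would be a more plausible starting point in practice.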

In [15]:
# Finally, add one dataframe column per extracted attribute
for attribute in RequestAttributes.model_fields.keys():
    df[attribute] = [result.get(attribute, "") for result in results]

In [None]:
# saving the dataframe
df.to_csv("data_output/results.csv",index=False)

In [107]:
df

Unnamed: 0,id,subject,body,email_content,intent,customer_name,product,requested_action
0,1,Trouble connecting my new headphones,"Hi Support,\nI bought the SonicBoom Wireless H...",Subject: Error code 404 during setup \n Body:H...,troubleshoot connectivity,Jason Miller,SonicBoom Wireless Headphones,troubleshoot
1,2,Error code 404 during setup,"Hello,\nI am trying to install the Vector Grap...",Subject: Error code 404 during setup \n Body:H...,installation_error,Sarah Jenkins,Vector Graphics Suite 2024,troubleshoot installation
2,3,Watch battery draining too fast,"To whom it may concern,\nI’ve noticed that my ...",Subject: Error code 404 during setup \n Body:H...,battery performance issue,Amit Patel,Apex Smartwatch Gen 5,investigate battery issue
3,4,Coffee machine making strange noises,"Hi there,\nMy BaristaPro Espresso Maker starte...",Subject: Error code 404 during setup \n Body:H...,technical support,Elena Rodriguez,BaristaPro Espresso Maker,technician
4,5,Hub not finding devices,"Good morning,\nI recently upgraded my router, ...",Subject: Error code 404 during setup \n Body:H...,connectivity issue,Michael Chang,HomeGuard Security Hub,assist
5,6,Display issues on new laptop,"Hi Team,\nI received my Titan Gaming Laptop X1...",Subject: Error code 404 during setup \n Body:H...,display issue,Jessica Bloom,Titan Gaming Laptop X15,repair
6,7,Broken screen in shipment,"Hello,\nI am extremely disappointed. I just op...",Subject: Error code 404 during setup \n Body:H...,damage,David Thorne,UltraView 4K Monitor,replacement or full refund
7,8,You sent the wrong color,"Hi,\nI ordered the Velvet Armchair in Navy Blu...",Subject: Error code 404 during setup \n Body:H...,wrong color,Priya Singh,Velvet Armchair,return and send correct color
8,9,Returning the vacuum,"Hi Customer Service,\nI would like to request ...",Subject: Error code 404 during setup \n Body:H...,refund,Tom Hansen,Cyclone Stick Vacuum,refund
9,10,Cancel and refund please,"Hello,\nMy son accidentally purchased the Gala...",Subject: Error code 404 during setup \n Body:H...,refund,Linda Wei,Galactic Raiders Game,refund
