# 🧠 Inferring Insights from Text

In this lesson, we’ll use LLMs to infer structured insights from unstructured text.

We'll perform tasks like:
- Sentiment classification
- Emotion recognition
- Topic detection
- Entity extraction

🔍 This is particularly useful for:
- Customer complaint triage
- Risk or fraud alert pipelines
- Document classification
- Compliance analysis


In [None]:
import os
from openai import OpenAI
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Retrieve the API key
api_key = os.getenv("OPENAI_API_KEY")

# Sanity check: confirm the key loaded without printing the key itself
if api_key:
    print("✅ API key loaded successfully.")
else:
    print("❌ API key not found. Please check your .env file.")

In [None]:
# Initialize OpenAI client
client = OpenAI(api_key=api_key)


In [None]:
def get_completion(prompt, model="gpt-4"):
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content

## 📨 Sample Complaint: Financial Services Context

We’ll work with this customer complaint email throughout the notebook.

It represents a frustrated customer who may be experiencing a serious issue. Our job is to infer insights such as sentiment, topic, and urgency.


In [None]:
customer_email = """
Dear Team,

I’ve noticed unauthorized transactions on my account over the past week. 
Despite raising a ticket 3 days ago, I haven’t received any updates. 
This is extremely frustrating and concerning given the nature of the issue.

Please escalate this matter urgently - my confidence in your service is quickly eroding.

Regards,  
Sanjay Nair
"""


## 🔍 Step 1: Sentiment Classification

We’ll ask the model to infer the **overall sentiment** of the customer complaint.

This is useful for:
- Triage prioritization
- Agent tone matching
- Customer satisfaction analysis


In [None]:
prompt = f"""
Classify the sentiment of the following customer message as one of: 
"positive", "neutral", or "negative".

Message:
\"\"\"{customer_email}\"\"\"
"""

sentiment = get_completion(prompt)
print("📊 Sentiment:", sentiment)
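The model's reply is free text, so it may come back as `Negative`, `"negative"`, or `negative.` rather than the exact label we asked for. Here is a minimal sketch of one way to normalize the reply before using it downstream. The helper name `normalize_sentiment` and the choice to return `None` for unexpected replies are assumptions for illustration, not part of the OpenAI API.

```python
# Hypothetical helper: map a raw model reply onto one of the labels from the prompt.
ALLOWED_SENTIMENTS = {"positive", "neutral", "negative"}

def normalize_sentiment(raw):
    """Return the cleaned label, or None if the reply is not an allowed label."""
    # Drop surrounding whitespace, then quotes and trailing periods, then lowercase.
    cleaned = raw.strip().strip("\"'.").lower()
    return cleaned if cleaned in ALLOWED_SENTIMENTS else None

# Example replies a model might plausibly produce (illustrative, not real output)
print(normalize_sentiment(' "Negative". '))
print(normalize_sentiment("The sentiment is negative"))
```

Returning `None` instead of guessing lets a triage pipeline send unexpected replies to manual review.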


## 😡 Step 2: Emotion Detection

We’ll now extract specific emotions the customer might be expressing, such as:
- frustration
- confusion
- urgency
- anger
- disappointment
- relief

Understanding emotion is crucial in regulated domains, where tone can indicate risk, dissatisfaction, or escalation triggers.


In [None]:
prompt = f"""
Identify the emotions expressed in the following customer email. 
List them as comma-separated values.

Message:
\"\"\"{customer_email}\"\"\"
"""

emotions = get_completion(prompt)
print("🧠 Emotions Detected:", emotions)
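Since we asked for comma-separated values, the reply can be turned into a Python list with a little cleanup. The sketch below assumes the model followed the requested format. `parse_emotions` is a hypothetical helper name, and the sample string is illustrative rather than real model output.

```python
# Hypothetical helper: turn a comma-separated reply into a clean list of emotions.
def parse_emotions(raw):
    """Split on commas, normalize case, and drop blanks and duplicates (order kept)."""
    emotions = []
    for part in raw.split(","):
        label = part.strip().strip(".").lower()
        if label and label not in emotions:
            emotions.append(label)
    return emotions

# Illustrative reply, not real model output
print(parse_emotions("Frustration, concern, Urgency, frustration."))
```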


## 🏷️ Step 3: Topic Classification

We’ll ask the model to identify the **main topic** of the customer complaint.

This is useful for:
- Routing issues to the right team
- Monitoring common themes (e.g., fraud, login issues, billing)
- Structured reporting and triage


In [None]:
prompt = f"""
Classify the main topic of the following customer complaint. 
Use a short label such as: "fraud", "technical issue", "billing", "login/access", or "other".

Message:
\"\"\"{customer_email}\"\"\"
"""

topic = get_completion(prompt)
print("🏷️ Topic:", topic)
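To illustrate the routing use case, a topic label can be looked up in a simple mapping. This is a sketch only: the team names below are invented for the example, and unrecognized labels fall back to a default queue.

```python
# Hypothetical routing table: topic labels from the prompt mapped to made-up team names.
TEAM_BY_TOPIC = {
    "fraud": "fraud-investigations",
    "technical issue": "tech-support",
    "billing": "billing-team",
    "login/access": "identity-support",
}
DEFAULT_TEAM = "general-support"  # assumed fallback for "other" or unexpected labels

def route(topic_label):
    """Return the team for a topic label, tolerating quotes, case, and whitespace."""
    key = topic_label.strip().strip("\"'.").lower()
    return TEAM_BY_TOPIC.get(key, DEFAULT_TEAM)

print(route(' "Fraud" '))
print(route("other"))
```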


## 📦 Step 4: Multi-Attribute Inference (Structured Output)

Let’s extract several structured insights from the same message:
- sentiment
- emotions (as a list)
- topic
- escalation_needed (boolean)
- urgency_level (low, medium, high)

We'll ask the LLM to return the result as a **valid JSON object**.


In [None]:
prompt = f"""
Extract structured information from the following customer email.

Return only a JSON object (no extra text) with the following keys:
- sentiment: "positive", "neutral", or "negative"
- emotions: list of strings
- topic: short label like "fraud", "billing", "access", etc.
- escalation_needed: true or false
- urgency_level: "low", "medium", or "high"

Email:
\"\"\"{customer_email}\"\"\"
"""

structured_output = get_completion(prompt)
print(structured_output)
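Chat models sometimes wrap the JSON in a Markdown code fence or add a sentence before or after it, which makes a direct `json.loads` call fail. One possible workaround, sketched below, is to parse only the text between the first `{` and the last `}`. The helper name `extract_json_object` is an assumption, and this simple approach expects the reply to contain exactly one top-level JSON object.

```python
import json

# Hypothetical helper: pull a single JSON object out of a reply that may contain extra text.
def extract_json_object(raw):
    """Parse the text between the first '{' and the last '}' as JSON."""
    start = raw.find("{")
    end = raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model reply")
    return json.loads(raw[start:end + 1])

# Illustrative reply with surrounding text, not real model output
reply = 'Here is the result:\n{"sentiment": "negative", "escalation_needed": true}\nHope that helps.'
print(extract_json_object(reply))
```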


## 🧪 Step 5: Parse and Inspect Structured Output

Once we receive a JSON object from the LLM, we can:
- Load it into Python
- Access each field
- Use the values in alerts, routing rules, or databases


In [None]:
import json

# Convert raw string to dictionary
try:
    parsed = json.loads(structured_output)

    print("✅ Parsed JSON:")
    for k, v in parsed.items():
        print(f"{k}: {v}")
except json.JSONDecodeError as e:
    print("❌ JSON Parsing Failed:", e)
    print("\nRaw output:\n", structured_output)
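Valid JSON is not necessarily the JSON we asked for: a key may be missing, or `escalation_needed` may come back as the string `"yes"`. Before using the values in alerts or routing rules, it can help to check each field. The sketch below validates the five keys from the Step 4 prompt. The function name `validate_insights` and the wording of the messages are assumptions for illustration.

```python
# Allowed values taken from the Step 4 prompt.
ALLOWED_SENTIMENTS = {"positive", "neutral", "negative"}
ALLOWED_URGENCY = {"low", "medium", "high"}

def validate_insights(data):
    """Return a list of problems found in the parsed output (empty if it looks fine)."""
    problems = []
    sentiment = data.get("sentiment")
    if not (isinstance(sentiment, str) and sentiment in ALLOWED_SENTIMENTS):
        problems.append("sentiment must be positive, neutral, or negative")
    emotions = data.get("emotions")
    if not isinstance(emotions, list) or not all(isinstance(e, str) for e in emotions):
        problems.append("emotions must be a list of strings")
    topic = data.get("topic")
    if not isinstance(topic, str) or not topic.strip():
        problems.append("topic must be a non-empty string")
    if not isinstance(data.get("escalation_needed"), bool):
        problems.append("escalation_needed must be true or false")
    urgency = data.get("urgency_level")
    if not (isinstance(urgency, str) and urgency in ALLOWED_URGENCY):
        problems.append("urgency_level must be low, medium, or high")
    return problems

# Illustrative parsed output with two deliberate mistakes
sample = {
    "sentiment": "negative",
    "emotions": ["frustration", "concern"],
    "topic": "fraud",
    "escalation_needed": "yes",
    "urgency_level": "urgent",
}
for problem in validate_insights(sample):
    print("-", problem)
```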


## 🧩 Real-World Use Case: Automated Triage & Risk Detection

Teams often deal with large volumes of customer emails, audit notes, compliance feedback, and internal reports, all as unstructured text.

### ⚠️ The Challenge:
- Manually reviewing and tagging messages is time-consuming and inconsistent.
- Urgent issues (e.g., fraud, data breaches) may not be detected fast enough.
- Reporting dashboards lack structured fields like topic, urgency, or sentiment.

### 💡 The LLM Solution:
By using large language models (LLMs), we can infer key insights from each message:
- **Topic**: What is the message about? (e.g., fraud, billing, login issue)
- **Sentiment**: Is the tone positive, neutral, or negative?
- **Emotions**: Is the user frustrated, confused, or angry?
- **Urgency & Escalation**: Should this be prioritized or escalated?

When prompted for it, LLMs can return these insights as **structured JSON**, enabling:
- 📨 **Auto-routing** to the correct team
- ⏱️ **SLA prioritization** for urgent issues
- 🧠 **Trend analysis** across thousands of messages
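
Once each message has been reduced to a dictionary like the one from Step 4, prioritization and trend analysis become ordinary data processing. The sketch below uses a handful of made-up records (the `id` field and all values are hypothetical) to show one way to order a queue and count topics.

```python
from collections import Counter

# Lower rank sorts first; unknown urgency values sort last.
URGENCY_RANK = {"high": 0, "medium": 1, "low": 2}

def prioritize(records):
    """Order records so escalations come first, then by urgency level."""
    return sorted(
        records,
        key=lambda r: (
            not r.get("escalation_needed", False),
            URGENCY_RANK.get(r.get("urgency_level"), len(URGENCY_RANK)),
        ),
    )

def topic_counts(records):
    """Count how many records fall under each topic."""
    return Counter(r.get("topic", "other") for r in records)

# Made-up records standing in for parsed model output
records = [
    {"id": 1, "topic": "billing", "urgency_level": "low", "escalation_needed": False},
    {"id": 2, "topic": "fraud", "urgency_level": "high", "escalation_needed": True},
    {"id": 3, "topic": "fraud", "urgency_level": "medium", "escalation_needed": False},
    {"id": 4, "topic": "login/access", "urgency_level": "high", "escalation_needed": False},
]

print([r["id"] for r in prioritize(records)])
print(topic_counts(records).most_common(1))
```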

### ✅ Outcomes:
- Faster resolution times
- Improved customer satisfaction
- Reduced compliance risk
- Better visibility into operational pain points

This approach applies across domains:
- 📄 Regulatory reports
- 🛠️ Helpdesk tickets
- 💬 Feedback surveys
- 🧾 Financial audit logs

