# Conversation Management & JSON Extraction (Groq/OpenAI-compatible)

This notebook implements:

Task 1: a conversation history manager with truncation options and periodic summarization (on every k-th turn) using Groq's OpenAI-compatible API.

Task 2: structured extraction using a JSON schema with function calling, demonstrating parsing and validation on sample chats.

In [1]:
# Install dependencies (openai for the API client, pandas for the extraction table)
!pip install openai pandas



In [2]:
import os
from openai import OpenAI

# 🔑 Set Groq API key (never commit a real key to a shared notebook)
os.environ["GROQ_API_KEY"] = "YOUR_GROQ_API_KEY"   # replace with your key
GROQ_API_KEY = os.environ["GROQ_API_KEY"]

# ✅ Initialize Groq client (OpenAI-compatible)
client = OpenAI(
    api_key=GROQ_API_KEY,
    base_url="https://api.groq.com/openai/v1"
)

# Pick a Groq-supported model
MODEL_NAME = "llama-3.1-8b-instant"
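A key written directly into a cell is easy to leak when the notebook is shared. A minimal sketch of an alternative, assuming the key was exported as `GROQ_API_KEY` before the notebook was started; the helper name `load_api_key` is hypothetical and not part of this notebook:

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None, var: str = "GROQ_API_KEY") -> str:
    """Return the API key from the environment, failing loudly if it is missing."""
    env = os.environ if env is None else env
    key = env.get(var, "").strip()
    if not key:
        raise RuntimeError(f"{var} is not set; export it before starting the notebook.")
    return key


# Usage (assumes the variable was set outside the notebook):
# client = OpenAI(api_key=load_api_key(), base_url="https://api.groq.com/openai/v1")
```

Failing early with a clear message avoids a less obvious authentication error on the first API call.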

In [None]:
from typing import List, Dict, Optional

class ConversationManager:
    def __init__(self, summarize_every_k:int=3, summarization_model:Optional[str]=MODEL_NAME):
        self.history: List[Dict[str,str]] = []
        self.summary: Optional[str] = None
        self.turn_count: int = 0
        self.k = summarize_every_k
        self.summarization_model = summarization_model

    def add_message(self, role: str, content: str, do_summarize:bool=True):
        assert role in ("user", "assistant", "system")
        self.history.append({"role": role, "content": content})
        self.turn_count += 1

        if do_summarize and self.k > 0 and (self.turn_count % self.k == 0):
            self.summarize_history()

    def summarize_history(self):
        if not self.history:
            return None

        convo_text = "\n".join([f"{m['role']}: {m['content']}" for m in self.history])

        try:
            resp = client.chat.completions.create(
                model=self.summarization_model,
                messages=[
                    {"role": "system", "content": "You are a concise summarizer. Focus on user intents, key facts, and decisions."},
                    {"role": "user", "content": convo_text}
                ],
                temperature=0.2,
                max_tokens=200,
            )
            summary_text = resp.choices[0].message.content.strip()
        except Exception as e:
            print("API summarization failed, falling back. Error:", e)
            summary_text = self._local_summary(convo_text)

        # ✅ Reset history with only the summary
        self.summary = summary_text
        self.history = [{"role": "system", "content": f"SUMMARY: {summary_text}"}]
        return summary_text

    def _local_summary(self, text:str, max_sentences:int=4) -> str:
        import re
        sentences = re.split(r'(?<=[.!?])\s+', text.strip())
        return " ".join(sentences[:max_sentences]) or text[:200]

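    def truncate_by_turns(self, n:int):
        """Keep only the last n messages; n <= 0 clears the history."""
        self.history = self.history[-n:] if n > 0 else []

    def truncate_by_chars(self, max_chars:int):
        """Keep the most recent messages whose combined content fits in max_chars."""
        kept, total = [], 0
        # Walk from newest to oldest and stop at the first message that does not fit
        for msg in reversed(self.history):
            length = len(msg['content'])
            if total + length > max_chars:
                break
            kept.insert(0, msg)
            total += length
        self.history = kept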

    def show_history(self):
        print("---- HISTORY ----")
        for i, m in enumerate(self.history):
            print(f"[{i}] {m['role']}: {m['content'][:400]}")
        print("---- SUMMARY (cached) ----")
        print(self.summary)
        print("---- turn_count:", self.turn_count, "----")


In [None]:
cm = ConversationManager(summarize_every_k=3)

samples = [
    ("user", "Hello, I'm planning a trip to Goa next month. Can you help?"),
    ("assistant", "Sure — when are you traveling and how many people?"),
    ("user", "I'm going from 12 Oct to 16 Oct, it's just me."),  # triggers summary
    ("assistant", "Great. Do you have a budget and preferred activities?"),
    ("user", "I like beaches and local food. budget around 30k INR."),
    ("assistant", "Noted. Would you like hotel suggestions?"),   # triggers summary
    ("user", "Yes, please. Also I have a dietary restriction: no shellfish."),
]

print("Adding conversation messages and showing periodic summarization...\n")
for role, txt in samples:
    cm.add_message(role, txt)
    cm.show_history()
    print("\n---\n")


Adding conversation messages and showing periodic summarization...

---- HISTORY ----
[0] user: Hello, I'm planning a trip to Goa next month. Can you help?
---- SUMMARY (cached) ----
None
---- turn_count: 1 ----

---

---- HISTORY ----
[0] user: Hello, I'm planning a trip to Goa next month. Can you help?
[1] assistant: Sure — when are you traveling and how many people?
---- SUMMARY (cached) ----
None
---- turn_count: 2 ----

---

---- HISTORY ----
[0] system: SUMMARY: Key facts: 
- Travel dates: 12 Oct to 16 Oct
- Number of travelers: 1 (Solo)
- Destination: Goa
---- SUMMARY (cached) ----
Key facts: 
- Travel dates: 12 Oct to 16 Oct
- Number of travelers: 1 (Solo)
- Destination: Goa
---- turn_count: 3 ----

---

---- HISTORY ----
[0] system: SUMMARY: Key facts: 
- Travel dates: 12 Oct to 16 Oct
- Number of travelers: 1 (Solo)
- Destination: Goa
[1] assistant: Great. Do you have a budget and preferred activities?
---- SUMMARY (cached) ----
Key facts: 
- Travel dates: 12 Oct to 16 Oct
- 

In [None]:
# Demonstrate truncation options
print("Before truncation:")
cm.show_history()

print("\nTruncate last 2 turns:")
cm.truncate_by_turns(2)
cm.show_history()

print("\nNow truncate by chars (max 80 chars):")
cm.truncate_by_chars(80)
cm.show_history()


Before truncation:
---- HISTORY ----
[0] system: SUMMARY: SUMMARY: 
- Travel dates: 12 Oct to 16 Oct
- Number of travelers: 1 (Solo)
- Destination: Goa
- Budget: 30k INR
- Interests: Beaches, local food

Next step: Hotel suggestions or activity planning?
[1] user: Yes, please. Also I have a dietary restriction: no shellfish.
---- SUMMARY (cached) ----
SUMMARY: 
- Travel dates: 12 Oct to 16 Oct
- Number of travelers: 1 (Solo)
- Destination: Goa
- Budget: 30k INR
- Interests: Beaches, local food

Next step: Hotel suggestions or activity planning?
---- turn_count: 7 ----

Truncate last 2 turns:
---- HISTORY ----
[0] system: SUMMARY: SUMMARY: 
- Travel dates: 12 Oct to 16 Oct
- Number of travelers: 1 (Solo)
- Destination: Goa
- Budget: 30k INR
- Interests: Beaches, local food

Next step: Hotel suggestions or activity planning?
[1] user: Yes, please. Also I have a dietary restriction: no shellfish.
---- SUMMARY (cached) ----
SUMMARY: 
- Travel dates: 12 Oct to 16 Oct
- Number of travelers: 
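Both truncation methods treat the `SUMMARY:` system message like any other entry. In `truncate_by_chars` the summary at index 0 is the last message considered, so it is dropped as soon as the character budget runs out. A minimal sketch of a variant that always keeps a leading summary message and applies the budget only to the remaining messages; the function name `truncate_keep_summary` is hypothetical:

```python
from typing import Dict, List


def truncate_keep_summary(history: List[Dict[str, str]], max_chars: int) -> List[Dict[str, str]]:
    """Pin a leading SUMMARY message, then keep the newest messages that fit in max_chars."""
    pinned: List[Dict[str, str]] = []
    rest = history
    if history and history[0]["role"] == "system" and history[0]["content"].startswith("SUMMARY:"):
        pinned, rest = history[:1], history[1:]

    kept: List[Dict[str, str]] = []
    total = 0
    # Walk from newest to oldest and stop at the first message that does not fit
    for msg in reversed(rest):
        size = len(msg["content"])
        if total + size > max_chars:
            break
        kept.append(msg)
        total += size
    kept.reverse()  # restore chronological order
    return pinned + kept


history = [
    {"role": "system", "content": "SUMMARY: solo trip to Goa, 12 Oct to 16 Oct, budget 30k INR"},
    {"role": "assistant", "content": "Would you like hotel suggestions?"},
    {"role": "user", "content": "Yes, please. No shellfish."},
]
trimmed = truncate_keep_summary(history, max_chars=30)
```

Here the summary and the newest user message survive, while the assistant message is dropped because it would exceed the 30-character budget. Whether the summary should count against the budget is a design choice; this sketch excludes it.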

# -----------------------------
# **TASK 2: JSON Schema Extraction**
# -----------------------------

In [None]:
import json
import re
json_schema = {
    "name": "extract_user_info",
    "description": "Extract name, email, phone, location, age from user messages",
    "parameters": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "email": {"type": "string"},
            "phone": {"type": "string"},
            "location": {"type": "string"},
            "age": {"type": "integer"}
        },
        "required": ["name", "email", "phone", "location", "age"]
    }
}

def extract_info_fixed(chat_text: str) -> dict:
    try:
        response = client.chat.completions.create(
            model=MODEL_NAME,
            messages=[{"role": "user", "content": chat_text}],
            functions=[json_schema],
            function_call={"name": "extract_user_info"},
            temperature=0,
        )
        msg = response.choices[0].message
        func_call = getattr(msg, "function_call", None)
        if func_call is None:
            return {}
        func_args_str = getattr(func_call, "arguments", "{}")
        return json.loads(func_args_str)
    except Exception as e:
        # Fallback: best-effort regex extraction when the API call or JSON parsing fails
        print("API extraction failed, falling back to regex. Error:", e)
        # Accept both "My name is" and "my name is"
        name = re.search(r"[Mm]y name is (\w+ \w+)|This is (\w+ \w+)|I'm (\w+ \w+)", chat_text)
        phone = re.search(r"\b\d{10}\b", chat_text)
        email = re.search(r"[\w\.-]+@[\w\.-]+", chat_text)
        # \b keeps "in" from matching inside longer words
        location = re.search(r"\b(?:based in|in|from) (\w+)", chat_text)
        age = re.search(r"(\d{2}) years old|Age (\d{2})", chat_text)
        return {
            "name": next((g for g in name.groups() if g), "") if name else "",
            "email": email.group(0) if email else "",
            "phone": phone.group(0) if phone else "",
            "location": location.group(1) if location else "",
            "age": int(next((g for g in age.groups() if g), 0)) if age else None
        }

# Demo standalone extraction
print("\n### Standalone JSON Schema Extraction Demo ###\n")
sample_chats = [
    "Hi, my name is Rina Sharma. I live in Mumbai. Contact: rina.sharma@example.com, 9876543210. I am 28 years old.",
    "Hello, I'm John Doe from Delhi. Email: john.doe@example.com, phone: 9123456780. Age 35.",
]
for chat in sample_chats:
    info = extract_info_fixed(chat)
    print("Input:", chat)
    print("Extracted:", info, "\n")



### Standalone JSON Schema Extraction Demo ###

Input: Hi, my name is Rina Sharma. I live in Mumbai. Contact: rina.sharma@example.com, 9876543210. I am 28 years old.
Extracted: {'age': 28, 'email': 'rina.sharma@example.com', 'location': 'Mumbai', 'name': 'Rina Sharma', 'phone': '9876543210'} 

Input: Hello, I'm John Doe from Delhi. Email: john.doe@example.com, phone: 9123456780. Age 35.
Extracted: {'age': 35, 'email': 'john.doe@example.com', 'location': 'Delhi', 'name': 'John Doe', 'phone': '9123456780'} 
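The cell above uses the `functions` and `function_call` parameters, which the OpenAI Chat Completions API has deprecated in favor of `tools` and `tool_choice`. A hedged sketch of the equivalent request and response handling follows. The helper names `build_tool_request` and `parse_tool_arguments` are hypothetical, and whether a given Groq model honors a forced `tool_choice` is an assumption to check against Groq's documentation.

```python
import json
from types import SimpleNamespace
from typing import Any, Dict


def build_tool_request(schema: Dict[str, Any], chat_text: str, model: str) -> Dict[str, Any]:
    """Build keyword arguments for chat.completions.create using tools/tool_choice."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": chat_text}],
        "tools": [{"type": "function", "function": schema}],
        # Force the model to call this specific function
        "tool_choice": {"type": "function", "function": {"name": schema["name"]}},
        "temperature": 0,
    }


def parse_tool_arguments(message: Any) -> Dict[str, Any]:
    """Return the first tool call's arguments as a dict, or {} if absent or malformed."""
    tool_calls = getattr(message, "tool_calls", None) or []
    if not tool_calls:
        return {}
    try:
        return json.loads(tool_calls[0].function.arguments)
    except (json.JSONDecodeError, TypeError):
        return {}


# Offline check with a stand-in for response.choices[0].message
fake_message = SimpleNamespace(
    tool_calls=[SimpleNamespace(function=SimpleNamespace(arguments='{"name": "Rina Sharma", "age": 28}'))]
)
parsed = parse_tool_arguments(fake_message)
request = build_tool_request({"name": "extract_user_info", "parameters": {}}, "hi", "some-model")
```

With a live client, the call would look like `client.chat.completions.create(**build_tool_request(json_schema, chat, MODEL_NAME))`, followed by `parse_tool_arguments(response.choices[0].message)`.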



# -----------------------------
# **3. Integrated Version: Conversation + Extraction**
# -----------------------------

In [4]:
import pandas as pd
from typing import List, Dict, Optional
import re
from IPython.display import display  # made explicit; notebooks expose display() implicitly

# -----------------------------
# Conversation Manager (Integrated with Groq API)
# -----------------------------
class ConversationManagerEnhanced:
    def __init__(self, summarize_every_k:int=3, summarization_model:Optional[str]="llama-3.1-8b-instant"):
        self.history: List[Dict[str,str]] = []
        self.summary: Optional[str] = None
        self.turn_count: int = 0
        self.k = summarize_every_k
        self.summarization_model = summarization_model
        self.extracted_info: List[dict] = []

    def add_message(self, role: str, content: str, do_summarize:bool=True):
        assert role in ("user", "assistant", "system")
        self.history.append({"role": role, "content": content})
        self.turn_count += 1

        # Extract info from user messages
        if role == "user":
            info = extract_info_fixed(content)
            if info:
                self.extracted_info.append(info)

        # Summarize every k-th turn
        if do_summarize and self.k > 0 and (self.turn_count % self.k == 0):
            self.summarize_history()

    def summarize_history(self):
        if not self.history:
            return None

        convo_text = "\n".join([f"{m['role']}: {m['content']}" for m in self.history])
        summary_text = ""

        # Try Groq API summarization
        try:
            resp = client.chat.completions.create(
                model=self.summarization_model,
                messages=[
                    {"role": "system", "content": (
                        "You are a strict conversation summarizer.\n"
                        "- Write a single natural paragraph.\n"
                        "- Include only facts explicitly mentioned.\n"
                        "- Merge user details (name, contact, age, location, requests) "
                        "with assistant responses into one flowing summary.\n"
                        "- Do not assume or invent details."
                    )},
                    {"role": "user", "content": convo_text}
                ],
                temperature=0,
                max_tokens=250,
            )
            summary_text = resp.choices[0].message.content.strip()
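        except Exception as e:
            # Report the failure instead of hiding it; the fallback below still runs
            print("API summarization failed, falling back. Error:", e)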

        # Fallback summarizer (no API)
        if not summary_text:
            user_msgs = " ".join([m['content'] for m in self.history if m['role']=='user'])
            assistant_msgs = " ".join([m['content'] for m in self.history if m['role']=='assistant'])
            summary_text = (
                f"The user shared: {user_msgs}. "
                f"The assistant responded with: {assistant_msgs}."
            )

        self.summary = summary_text
        self.history = [{"role": "system", "content": f"SUMMARY: {summary_text}"}]
        return summary_text

    def show_history(self):
        print("---- HISTORY ----")
        for i, m in enumerate(self.history):
            print(f"[{i}] {m['role']}: {m['content'][:400]}")
        print("---- SUMMARY ----")
        print(self.summary)
        print("---- turn_count:", self.turn_count, "----")

    def show_extracted_info(self):
        print("---- EXTRACTED USER INFO ----")
        if self.extracted_info:
            df = pd.DataFrame(self.extracted_info)
            display(df)
        else:
            print("No info extracted yet.")

# -----------------------------
# Regex-only extraction function
# Note: this redefines extract_info_fixed and replaces the API-based version from Task 2
# -----------------------------
def extract_info_fixed(text: str) -> Optional[dict]:
    """Best-effort regex extraction; returns None when nothing matches."""
    info = {}
    name_match = re.search(r"(?:my name is|I am)\s+([A-Z][a-z]+\s[A-Z][a-z]+)", text)
    age_match = re.search(r"(\d{1,2})\s*(?:years old|yo|y/o)", text)
    email_match = re.search(r"[\w\.-]+@[\w\.-]+", text)
    phone_match = re.search(r"\b\d{10}\b", text)
    location_match = re.search(r"based in ([A-Za-z ]+)", text)

    if name_match:
        info["name"] = name_match.group(1)
    if age_match:
        info["age"] = int(age_match.group(1))  # the Task 2 schema declares age as an integer
    if email_match:
        info["email"] = email_match.group(0)
    if phone_match:
        info["phone"] = phone_match.group(0)
    if location_match:
        info["location"] = location_match.group(1)

    return info if info else None

# -----------------------------
# Demo: Integrated Conversation + Extraction
# -----------------------------
cm2 = ConversationManagerEnhanced(summarize_every_k=3)
samples2 = [
    ("user", "Hi, my name is Priya Mehta. I am 22 years old, based in Bangalore. Contact: priya.mehta@example.com, 9988776655."),
    ("assistant", "Hello Priya! Nice to meet you."),
    ("user", "I want to plan a trip to Goa from 10 Nov to 15 Nov."),
    ("assistant", "Got it! Any preferences for hotels or activities?"),
    ("user", "I like beaches and local food."),
]

for role, txt in samples2:
    cm2.add_message(role, txt)
    cm2.show_history()
    print("\n---\n")

cm2.show_extracted_info()





---- HISTORY ----
[0] user: Hi, my name is Priya Mehta. I am 22 years old, based in Bangalore. Contact: priya.mehta@example.com, 9988776655.
---- SUMMARY ----
None
---- turn_count: 1 ----

---

---- HISTORY ----
[0] user: Hi, my name is Priya Mehta. I am 22 years old, based in Bangalore. Contact: priya.mehta@example.com, 9988776655.
[1] assistant: Hello Priya! Nice to meet you.
---- SUMMARY ----
None
---- turn_count: 2 ----

---

---- HISTORY ----
[0] system: SUMMARY: Priya Mehta, a 22-year-old resident of Bangalore, contacted via priya.mehta@example.com and 9988776655, initiated planning a trip to Goa from November 10th to November 15th.
---- SUMMARY ----
Priya Mehta, a 22-year-old resident of Bangalore, contacted via priya.mehta@example.com and 9988776655, initiated planning a trip to Goa from November 10th to November 15th.
---- turn_count: 3 ----

---

---- HISTORY ----
[0] system: SUMMARY: Priya Mehta, a 22-year-old resident of Bangalore, contacted via priya.mehta@example.com and 

Unnamed: 0,name,age,email,phone,location
0,Priya Mehta,22,priya.mehta@example.com,9988776655,Bangalore
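The extracted rows are stored without being checked against the schema from Task 2. A minimal, dependency-free sketch of such a check follows. It covers only the `required` list and the `string` and `integer` types that this schema uses; the names `SCHEMA_PARAMETERS` and `validate_extraction` are hypothetical, and a full JSON Schema validator would handle far more.

```python
from typing import Any, Dict, List

# Copy of the "parameters" block from the Task 2 schema
SCHEMA_PARAMETERS: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "email": {"type": "string"},
        "phone": {"type": "string"},
        "location": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "email", "phone", "location", "age"],
}

_TYPE_MAP = {"string": str, "integer": int}


def validate_extraction(data: Dict[str, Any], parameters: Dict[str, Any] = SCHEMA_PARAMETERS) -> List[str]:
    """Return a list of human-readable problems; an empty list means the record passed."""
    errors: List[str] = []
    for field in parameters.get("required", []):
        if data.get(field) in ("", None):
            errors.append(f"missing required field: {field}")
    for field, value in data.items():
        if value in ("", None):
            continue  # already reported as missing if the field is required
        spec = parameters["properties"].get(field)
        if spec is None:
            errors.append(f"unexpected field: {field}")
            continue
        expected = _TYPE_MAP.get(spec.get("type"))
        if expected is None:
            continue  # type not covered by this sketch
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"{field}: expected {spec['type']}, got {type(value).__name__}")
    return errors


complete = {"name": "Priya Mehta", "email": "priya.mehta@example.com",
            "phone": "9988776655", "location": "Bangalore", "age": 22}
partial = {"name": "Priya Mehta", "age": "22"}  # age given as a string on purpose
complete_errors = validate_extraction(complete)
partial_errors = validate_extraction(partial)
```

Returning a list of problems rather than raising lets the caller decide whether a partial record is still worth keeping.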


In [5]:
chat_scenarios = {
    "Job Application": [
        ("user", "Hello, my name is Rahul Sharma. I am 25 years old and based in Mumbai."),
        ("assistant", "Hi Rahul! How can I assist you today?"),
        ("user", "I want help writing a job application email. My email is rahul.sharma99@gmail.com and phone number is 9876543210."),
        ("assistant", "Sure! Do you want it to be formal or casual?"),
        ("user", "Formal would be better, please."),
        ("assistant", "Alright. Which company are you applying to?"),
        ("user", "It’s for a Python Developer role at Zapare Technologies."),
        ("assistant", "Nice! Do you want me to highlight your skills or projects?"),
        ("user", "Yes, please mention Python, Django, and PostgreSQL."),
        ("assistant", "Perfect, I’ll prepare a professional draft for you."),
    ],

    "Travel Booking": [
        ("user", "Hey, I’m Anjali Verma, 30 years old, from Delhi."),
        ("assistant", "Hello Anjali! How can I help today?"),
        ("user", "I want to book a flight to Singapore from 2nd Oct to 10th Oct."),
        ("assistant", "Got it. Do you have a budget in mind?"),
        ("user", "Yes, around 40,000 INR."),
        ("assistant", "Would you prefer morning or evening flights?"),
        ("user", "Morning flights would be better."),
        ("assistant", "Noted. Do you want me to check hotels too?"),
        ("user", "Yes, please, preferably near Marina Bay."),
        ("assistant", "Understood. I’ll find options within your budget."),
    ],

    "Doctor Appointment": [
        ("user", "Hi, my name is Sameer Khan. I am 28 years old and live in Hyderabad."),
        ("assistant", "Hello Sameer! How can I help you?"),
        ("user", "I want to book a doctor appointment for next Monday."),
        ("assistant", "Okay, do you have a preferred time slot?"),
        ("user", "Yes, between 10 am and 12 pm."),
        ("assistant", "Got it. Is this for general consultation or a specialist?"),
        ("user", "A general physician would be fine."),
        ("assistant", "Do you have insurance coverage?"),
        ("user", "Yes, I do."),
        ("assistant", "Great, I’ll confirm the booking and share details."),
    ],

    "Online Shopping": [
        ("user", "Good morning, I’m Neha Gupta from Pune."),
        ("assistant", "Hi Neha! How can I assist you today?"),
        ("user", "I’m looking for recommendations for a new phone under 20,000 INR."),
        ("assistant", "Sure! Do you prefer Android or iOS?"),
        ("user", "Android, preferably with a good camera."),
        ("assistant", "Got it. Do you care more about battery life or performance?"),
        ("user", "Battery life is more important."),
        ("assistant", "Would you like 5G support?"),
        ("user", "Yes, 5G would be great."),
        ("assistant", "Okay, I’ll suggest models that meet your needs."),
    ],

    "Banking Query": [
        ("user", "Hello, I’m Kiran Kumar, 35 years old, from Chennai."),
        ("assistant", "Hello Kiran! How can I help you today?"),
        ("user", "I forgot my internet banking password."),
        ("assistant", "I can guide you through the reset process. Do you have your registered mobile number?"),
        ("user", "Yes, it’s 9123456789."),
        ("assistant", "Good. Do you also have access to your registered email?"),
        ("user", "Yes, my email is kiran.kumar35@example.com."),
        ("assistant", "Perfect. I’ll send you a reset link."),
        ("user", "Will it expire quickly?"),
        ("assistant", "Yes, the link is valid for 30 minutes only."),
    ]
}

# -----------------------------
# Run scenarios through ConversationManagerEnhanced
# -----------------------------
for scenario_name, messages in chat_scenarios.items():
    print("="*70)
    print(f"SCENARIO: {scenario_name}")
    print("="*70)

    cm = ConversationManagerEnhanced(summarize_every_k=3)

    for role, txt in messages:
        cm.add_message(role, txt)
        cm.show_history()
        print("\n---\n")

    cm.show_extracted_info()
    print("\n\n")


SCENARIO: Job Application
---- HISTORY ----
[0] user: Hello, my name is Rahul Sharma. I am 25 years old and based in Mumbai.
---- SUMMARY ----
None
---- turn_count: 1 ----

---

---- HISTORY ----
[0] user: Hello, my name is Rahul Sharma. I am 25 years old and based in Mumbai.
[1] assistant: Hi Rahul! How can I assist you today?
---- SUMMARY ----
None
---- turn_count: 2 ----

---

---- HISTORY ----
[0] system: SUMMARY: Rahul Sharma, a 25-year-old individual based in Mumbai, sought assistance with writing a job application email. His contact details include the email rahul.sharma99@gmail.com and a phone number 9876543210.
---- SUMMARY ----
Rahul Sharma, a 25-year-old individual based in Mumbai, sought assistance with writing a job application email. His contact details include the email rahul.sharma99@gmail.com and a phone number 9876543210.
---- turn_count: 3 ----

---

---- HISTORY ----
[0] system: SUMMARY: Rahul Sharma, a 25-year-old individual based in Mumbai, sought assistance with 

Unnamed: 0,name,age,location,email,phone
0,Rahul Sharma,25.0,Mumbai,,
1,,,,rahul.sharma99@gmail.com,9876543210.0
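The table shows Rahul's details split across two rows because each user message is extracted independently and appended as its own record. A minimal sketch of folding the partial records into a single profile, where later non-empty values overwrite earlier ones; `merge_profiles` is a hypothetical helper, not part of the notebook's class:

```python
from typing import Any, Dict, Iterable, Optional


def merge_profiles(records: Iterable[Optional[Dict[str, Any]]]) -> Dict[str, Any]:
    """Combine partial extraction results into one dict, skipping empty values."""
    profile: Dict[str, Any] = {}
    for record in records:
        if not record:
            continue  # tolerate None or {} from messages with nothing to extract
        for key, value in record.items():
            if value not in ("", None):
                profile[key] = value
    return profile


partial_records = [
    {"name": "Rahul Sharma", "age": 25, "location": "Mumbai"},
    None,
    {"email": "rahul.sharma99@gmail.com", "phone": "9876543210"},
]
profile = merge_profiles(partial_records)
```

Letting later values win suits corrections such as an updated phone number, but it also means a wrong later match can overwrite a correct earlier one. Keeping the first value instead is an equally defensible choice.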





SCENARIO: Travel Booking
---- HISTORY ----
[0] user: Hey, I’m Anjali Verma, 30 years old, from Delhi.
---- SUMMARY ----
None
---- turn_count: 1 ----

---

---- HISTORY ----
[0] user: Hey, I’m Anjali Verma, 30 years old, from Delhi.
[1] assistant: Hello Anjali! How can I help today?
---- SUMMARY ----
None
---- turn_count: 2 ----

---

---- HISTORY ----
[0] system: SUMMARY: Anjali Verma, a 30-year-old from Delhi, initiated a conversation to book a flight. She specified her travel dates as between 2nd October and 10th October, with origin and destination being Delhi and Singapore respectively.
---- SUMMARY ----
Anjali Verma, a 30-year-old from Delhi, initiated a conversation to book a flight. She specified her travel dates as between 2nd October and 10th October, with origin and destination being Delhi and Singapore respectively.
---- turn_count: 3 ----

---

---- HISTORY ----
[0] system: SUMMARY: Anjali Verma, a 30-year-old from Delhi, initiated a conversation to book a flight. She sp

Unnamed: 0,age
0,30
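Only the age was captured in this scenario. The name pattern expects "my name is" or "I am" and the location pattern expects "based in", while the message says "I’m Anjali Verma" (with a typographic apostrophe) and "from Delhi". A sketch of more tolerant patterns follows, assuming two-word capitalized names and single-word capitalized locations. The pattern set and the function name `extract_basic_info` are illustrative and will still miss many phrasings.

```python
import re
from typing import Any, Dict

# Accept "I'm" with either a straight or a typographic apostrophe
NAME_RE = re.compile(r"(?:my name is|I am|I['’]m)\s+([A-Z][a-z]+\s[A-Z][a-z]+)")
# Requiring a capitalized word keeps "from 2nd Oct" from matching as a location
LOCATION_RE = re.compile(r"\b(?:based in|live in|from)\s+([A-Z][a-z]+)")
AGE_RE = re.compile(r"(\d{1,2})\s*(?:years old|yo|y/o)")


def extract_basic_info(text: str) -> Dict[str, Any]:
    """Return whichever of name, age, and location the patterns can find."""
    info: Dict[str, Any] = {}
    name_match = NAME_RE.search(text)
    age_match = AGE_RE.search(text)
    location_match = LOCATION_RE.search(text)
    if name_match:
        info["name"] = name_match.group(1)
    if age_match:
        info["age"] = int(age_match.group(1))
    if location_match:
        info["location"] = location_match.group(1)
    return info


anjali = extract_basic_info("Hey, I’m Anjali Verma, 30 years old, from Delhi.")
flight = extract_basic_info("I want to book a flight to Singapore from 2nd Oct to 10th Oct.")
```

Regex coverage grows one phrasing at a time, which is the main argument for routing extraction through the schema-based model call and keeping patterns like these as a fallback.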





SCENARIO: Doctor Appointment
---- HISTORY ----
[0] user: Hi, my name is Sameer Khan. I am 28 years old and live in Hyderabad.
---- SUMMARY ----
None
---- turn_count: 1 ----

---

---- HISTORY ----
[0] user: Hi, my name is Sameer Khan. I am 28 years old and live in Hyderabad.
[1] assistant: Hello Sameer! How can I help you?
---- SUMMARY ----
None
---- turn_count: 2 ----

---

---- HISTORY ----
[0] system: SUMMARY: Hello Sameer Khan, a 28-year-old residing in Hyderabad, you requested assistance with booking a doctor's appointment for next Monday.
---- SUMMARY ----
Hello Sameer Khan, a 28-year-old residing in Hyderabad, you requested assistance with booking a doctor's appointment for next Monday.
---- turn_count: 3 ----

---

---- HISTORY ----
[0] system: SUMMARY: Hello Sameer Khan, a 28-year-old residing in Hyderabad, you requested assistance with booking a doctor's appointment for next Monday.
[1] assistant: Okay, do you have a preferred time slot?
---- SUMMARY ----
Hello Sameer Khan

Unnamed: 0,name,age
0,Sameer Khan,28





SCENARIO: Online Shopping
---- HISTORY ----
[0] user: Good morning, I’m Neha Gupta from Pune.
---- SUMMARY ----
None
---- turn_count: 1 ----

---

---- HISTORY ----
[0] user: Good morning, I’m Neha Gupta from Pune.
[1] assistant: Hi Neha! How can I assist you today?
---- SUMMARY ----
None
---- turn_count: 2 ----

---

---- HISTORY ----
[0] system: SUMMARY: Good morning Neha Gupta from Pune. To address your query, a new phone under 20,000 INR was requested as a recommendation.
---- SUMMARY ----
Good morning Neha Gupta from Pune. To address your query, a new phone under 20,000 INR was requested as a recommendation.
---- turn_count: 3 ----

---

---- HISTORY ----
[0] system: SUMMARY: Good morning Neha Gupta from Pune. To address your query, a new phone under 20,000 INR was requested as a recommendation.
[1] assistant: Sure! Do you prefer Android or iOS?
---- SUMMARY ----
Good morning Neha Gupta from Pune. To address your query, a new phone under 20,000 INR was requested as a recommenda

Unnamed: 0,age,phone,email
0,35.0,,
1,,9123456789.0,
2,,,kiran.kumar35@example.com.
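The last row shows the address captured as `kiran.kumar35@example.com.` with the sentence's closing period attached, because the character class `[\w\.-]+` also matches a trailing dot. One possible fix, sketched below, requires every dot in the domain to be followed by at least one letter, digit, or hyphen. The pattern is a pragmatic approximation rather than a full email address parser, and `find_email` is a hypothetical helper name.

```python
import re
from typing import Optional

# Domain = one label, then one or more ".label" groups; a bare trailing "." cannot match
EMAIL_RE = re.compile(r"[\w.+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def find_email(text: str) -> Optional[str]:
    """Return the first email-like substring, without trailing sentence punctuation."""
    match = EMAIL_RE.search(text)
    return match.group(0) if match else None


kiran_email = find_email("Yes, my email is kiran.kumar35@example.com.")
rina_email = find_email("Contact: rina.sharma@example.com, 9876543210.")
```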





