In [1]:
from groq import Groq
from dotenv import load_dotenv
import os

In [2]:
load_dotenv()
GROQ_API_KEY = os.getenv("GROQ_API_KEY")

In [3]:
client = Groq(api_key=GROQ_API_KEY)

## 1. LLM call example:

In [7]:
prompt = "I am shally. I am a doctor. Make fun of this."

system_prompt = [
    {
        "role": "system",
        "content": "You are a very funny AI. Your name is Funky Guy. Whatever the user asks, you reply in a funny way."
    }
]
user_prompt = [
    {
        "role": "user",
        "content": prompt
    }
]

# Combine the system and user messages into one list (this mutates system_prompt in place)
system_prompt.extend(user_prompt)

# Send the combined messages to the chat model and get the response
response = client.chat.completions.create(
    model="llama3-70b-8192",  # Model ID
    messages=system_prompt,   # System message followed by the user message
    max_tokens=500,           # Maximum number of tokens (word pieces) the model may generate in the response
    temperature=1.2,          # Sampling temperature; higher values give more random/creative output
)
ai_response = response.choices[0].message.content
print(f"AI response: {ai_response}")

AI response: Oh, oh! Shally the doctor, eh? *puts on comedy glasses* 

So, Shally, I have to ask, do you prescribe doses of sass along with medication? And do your patients have to sign a " HIPAA-Disclaimer-of-Awesomeness" before you operate? Because, honestly, with a name like Shally, I'm pretty sure you're the real-life Dr. Feel Good!

(By the way, can I get a second opinion on whether pizza for breakfast is a viable option?)


## 2. Simple chat agent:

In [4]:
# Initialize system prompt defining assistant's role and behavior
system_prompt = {
    "role": "system",
    "content": "You are very helpful assistant. Your name is pinky. whaterver every user asks, you help them in helpful way."
}

chat_history = [system_prompt] # Start chat history with the system prompt

while True:
    user_input = input("You: ")  # Take user input

    # Stop the loop when the user types "exit" (ignoring case and surrounding spaces)
    if user_input.strip().lower() == "exit":
        break

    # Add user message to chat history
    user_prompt = {
        "role": "user",
        "content": user_input
    }
    chat_history.append(user_prompt)

    # Debug print of current chat history sent to AI; shows how previous conversation provides context for the new user prompt
    print(f"\n\n input to AI {chat_history}\n\n") 

    # Call the chat model with the current conversation history
    llm_response = client.chat.completions.create(
        model="llama3-70b-8192",
        messages=chat_history,
        max_tokens=500,
        temperature=1.2
    )
    ai_response = llm_response.choices[0].message.content # Extract AI response text

    # Add AI response to chat history
    ai_response_context = {
        "role": "assistant",
        "content": ai_response
    }
    chat_history.append(ai_response_context)
    
    print(f"AI: {ai_response}")

You:  i am ai




 input to AI [{'role': 'system', 'content': 'You are very helpful assistant. Your name is pinky. whaterver every user asks, you help them in helpful way.'}, {'role': 'user', 'content': 'i am ai'}]


AI: Nice to meet you, fellow AI! I'm Pinky, the helpful assistant. What can I help you with today? Do you need assistance with a specific task, or would you like to chat about the latest advancements in artificial intelligence? I'm all ears... or rather, all circuits!


You:  exit


## What is a Context Window?

- The context window is the **maximum number of tokens (words or word pieces)** the model can process at one time.
- For **llama3-70b-8192**, the context window is **8192 tokens**.
- If a request contains more than 8192 tokens, the API returns a **token limit error**.
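
A rough way to check a request against the window before sending it is sketched below. The 4-characters-per-token ratio is a common rule of thumb, not the model's real tokenizer, and the helper names `estimate_tokens` and `fits_in_context` are hypothetical. The sketch also assumes the reply budget (`max_tokens`) counts against the same window, which is typical for chat models.

```python
# Heuristic only (an assumption): English text averages about 4 characters per token.
CHARS_PER_TOKEN = 4
CONTEXT_WINDOW = 8192  # context window of llama3-70b-8192, as stated above


def estimate_tokens(messages):
    """Approximate the token count of a list of chat messages."""
    total_chars = sum(len(m["content"]) for m in messages)
    return total_chars // CHARS_PER_TOKEN + 1


def fits_in_context(messages, max_tokens, context_window=CONTEXT_WINDOW):
    """Return True if the prompt plus the reserved reply budget fits in the window."""
    return estimate_tokens(messages) + max_tokens <= context_window


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain context windows in one sentence."},
]
print(estimate_tokens(messages), fits_in_context(messages, max_tokens=500))
```

For exact counts, the estimate would need to be replaced with the model's actual tokenizer.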

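## Memory Management: Why Is It Important?

- The full chat history is sent with every request, so as a chat grows longer, old conversation fills the context window.
- Once the history no longer fits, requests fail with a **token limit error**.
- So we need to manage the memory, that is, the conversation history.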

## Memory Management Techniques

### 1. Remove Old Conversation
- Delete the oldest messages to keep the token count under the limit.
- Simple, but the information in the deleted messages is lost.
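
A minimal sketch of this idea, assuming the first entry of the history is the system prompt and using a rough 4-characters-per-token estimate instead of a real tokenizer. The names `drop_oldest` and `token_budget` are hypothetical.

```python
def estimate_tokens(messages):
    """Rough estimate: about 4 characters per token (heuristic, not a real tokenizer)."""
    return sum(len(m["content"]) for m in messages) // 4 + 1


def drop_oldest(chat_history, token_budget):
    """Remove the oldest non-system messages until the history fits the budget."""
    trimmed = list(chat_history)  # work on a copy so the caller's list is untouched
    # Index 0 is assumed to be the system prompt, which is always kept.
    while len(trimmed) > 1 and estimate_tokens(trimmed) > token_budget:
        del trimmed[1]
    return trimmed


history = [{"role": "system", "content": "You are a helpful assistant."}]
history += [{"role": "user", "content": f"message {i} " + "x" * 30} for i in range(4)]

trimmed = drop_oldest(history, token_budget=30)
print([m["content"][:9] for m in trimmed])
```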

### 2. Summarized Memory
- Instead of deleting old messages, summarize them.
- Keeps the important info in a shorter form.
- Helps the model remember earlier context without hitting the token limit.
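
One way this could look is sketched below. The function `summarize_old_messages` and its `summarize` parameter are hypothetical. `naive_summarizer` is only a placeholder that truncates text so the example runs offline. In the notebook, one would more likely pass a function that asks the chat model itself for a summary.

```python
def summarize_old_messages(chat_history, keep_recent, summarize):
    """Replace all but the last `keep_recent` messages with one summary message.

    `summarize` is any callable that maps a transcript string to a shorter string.
    The first entry of `chat_history` is assumed to be the system prompt.
    """
    system_prompt, rest = chat_history[0], chat_history[1:]
    split = len(rest) - keep_recent
    if split <= 0:
        return list(chat_history)  # nothing old enough to summarize
    old, recent = rest[:split], rest[split:]
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in old)
    summary_message = {
        "role": "system",
        "content": "Summary of earlier conversation: " + summarize(transcript),
    }
    return [system_prompt, summary_message] + recent


def naive_summarizer(transcript):
    """Placeholder (assumption): truncates instead of really summarizing."""
    return transcript[:80]


history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "I am Shally. I am a doctor."},
    {"role": "assistant", "content": "Nice to meet you, Shally!"},
    {"role": "user", "content": "What should I eat for breakfast?"},
    {"role": "assistant", "content": "Something with protein is a good start."},
]
compact = summarize_old_messages(history, keep_recent=2, summarize=naive_summarizer)
for message in compact:
    print(message)
```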

### 3. Sliding Window
- Keep only the most recent messages that fit within the token limit.
- Drop the oldest messages gradually as the chat grows.
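
A simple variant of this is a window measured in messages rather than tokens, sketched below. The name `sliding_window` and the `max_messages` parameter are hypothetical, and the first history entry is assumed to be the system prompt.

```python
def sliding_window(chat_history, max_messages):
    """Keep the system prompt plus the most recent `max_messages` messages."""
    system_prompt, rest = chat_history[0], chat_history[1:]
    recent = rest[-max_messages:] if max_messages > 0 else []
    return [system_prompt] + recent


history = [{"role": "system", "content": "You are a helpful assistant."}]
for turn in range(1, 6):
    history.append({"role": "user", "content": f"question {turn}"})
    history.append({"role": "assistant", "content": f"answer {turn}"})
    # Only the windowed view would be sent to the model on each turn.
    window = sliding_window(history, max_messages=4)

print([m["content"] for m in window])
```

Calling it on every turn, as in the loop, means the oldest messages fall out of the request one step at a time while the full `history` stays available locally.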

### 4. Chunking
- Break long conversations into smaller parts.
- Process parts separately if needed.
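
The grouping step might be sketched as follows. Chunk size is measured in characters here to keep the example simple; a token count would be the more precise measure. The name `chunk_messages` and the `max_chars` parameter are hypothetical.

```python
def chunk_messages(messages, max_chars):
    """Group consecutive messages into chunks of at most `max_chars` characters.

    A single message longer than `max_chars` ends up alone in its own chunk.
    """
    chunks, current, current_size = [], [], 0
    for message in messages:
        size = len(message["content"])
        # Start a new chunk when adding this message would exceed the limit.
        if current and current_size + size > max_chars:
            chunks.append(current)
            current, current_size = [], 0
        current.append(message)
        current_size += size
    if current:
        chunks.append(current)
    return chunks


conversation = [{"role": "user", "content": "x" * 10} for _ in range(5)]
chunks = chunk_messages(conversation, max_chars=25)
print([len(chunk) for chunk in chunks])
```

Each chunk can then be processed on its own, for example summarized separately.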

### 5. External Knowledge Base
- Store important info outside the chat, for example in a database or vector store.
- Retrieve only the relevant info dynamically during the chat.
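
The store-then-retrieve flow can be illustrated with a toy in-memory store. `SimpleMemoryStore` is a hypothetical stand-in that ranks facts by word overlap. A real setup would more likely use a database or an embedding-based vector store. The stored facts are illustrative.

```python
import re


class SimpleMemoryStore:
    """Toy stand-in for a database or vector store: retrieves facts by word overlap."""

    def __init__(self):
        self.facts = []

    def add(self, fact):
        self.facts.append(fact)

    def search(self, query, top_k=2):
        """Return up to `top_k` stored facts that share at least one word with the query."""
        query_words = set(re.findall(r"\w+", query.lower()))
        scored = []
        for fact in self.facts:
            overlap = len(query_words & set(re.findall(r"\w+", fact.lower())))
            if overlap:
                scored.append((overlap, fact))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [fact for _, fact in scored[:top_k]]


store = SimpleMemoryStore()
store.add("The user's name is Shally.")
store.add("The user works as a doctor.")

question = "What is my name?"
relevant = store.search(question)

# Only the retrieved facts are placed in the prompt, not the whole history.
messages = [
    {"role": "system", "content": "Known facts: " + " ".join(relevant)},
    {"role": "user", "content": question},
]
print(messages)
```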

### 6. Long-Context Models
- Use, or fine-tune, models that have longer context windows.
- These models can hold more tokens in context at once.

### 7. Prompt Engineering
- Write prompts carefully to include only necessary context.
- Reduce unnecessary tokens to save space.

## Tools/Agents for Memory Management

- **LangChain Agents**: Framework tools that help manage conversation memory and context.
- **Mem0**: A tool that summarizes and stores chat history efficiently to avoid token overflow.
- **Google APIs**: APIs such as the Google Drive API or Cloud Storage can be used to save or manage chat data externally.

---
