<h1>Chapter 7 - Hard Tasks</h1>
<i>Memory and agent-like behavior.</i>


## Using OpenAI API
- This notebook uses OpenAI via LangChain (`ChatOpenAI`).
- Set `OPENAI_API_KEY` in your environment before running.
- All tasks use `llm`, which is the OpenAI chat model.

In [39]:
#%pip install "langchain>=0.1.17" "openai>=1.13.3" "langchain_openai>=0.1.6" "transformers>=4.40.1" "datasets>=2.18.0" "accelerate>=0.27.2" "sentence-transformers>=2.5.1" "duckduckgo-search>=5.2.2" langchain_community langchain-classic
#%pip install llama-cpp-python==0.2.69

In [None]:
# OpenAI API setup
import os
from langchain_openai import ChatOpenAI

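# The key is read from the environment; uncomment the next line to set it here instead
# os.environ["OPENAI_API_KEY"] = "api-key"

# Confirm the key is actually present before reporting success
if not os.environ.get("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; set it before running this notebook.")
print("✓ API key set")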

# Fast model
openai_llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.2)

# Default LLM for the tasks below
llm = openai_llm

✓ API key set


<h2>Hard Tasks: Memory</h2>
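<p>Each LLM call is stateless: the model sees only the prompt you send, so earlier turns are forgotten unless you pass them back in. Let's attach memory and watch what changes.</p>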

<h3>Hard Task 1 - Conversation Buffer (full history)</h3>
<p>Goal: Keep the entire conversation history so the model remembers everything.</p>
<ol>
  <li>Create a prompt that includes chat history.</li>
  <li>Attach a ConversationBufferMemory.</li>
  <li>Tell the model your name, ask a question, then ask it to recall your name.</li>
</ol>

In [41]:
from langchain_core.prompts import PromptTemplate

# Prompt that includes chat history
buffer_template = "Current conversation:\n{chat_history}\n{input_prompt}"
mem_prompt = PromptTemplate(template=buffer_template, input_variables=["input_prompt", "chat_history"])

In [42]:
from langchain_classic.memory import ConversationBufferMemory

memory_full = ConversationBufferMemory(memory_key="chat_history")

In [43]:
from langchain_classic.chains import LLMChain

mem_chain_full = LLMChain(llm=llm, prompt=mem_prompt, memory=memory_full)

In [44]:
# First interaction: introduce yourself
_ = mem_chain_full.invoke({"input_prompt": "Hi! My name is Alex. What is 1 + 1?"})

In [45]:
# Second interaction: ask if it remembers your name
recall = mem_chain_full.invoke({"input_prompt": "What is my name?"})
print(recall)

{'input_prompt': 'What is my name?', 'chat_history': 'Human: Hi! My name is Alex. What is 1 + 1?\nAI: Hi Alex! 1 + 1 equals 2. How can I assist you further?', 'text': 'Your name is Alex. How can I help you today?'}


In [46]:
# Peek at what's stored in memory
print("[Loaded Memory]", memory_full.load_memory_variables({}))

[Loaded Memory] {'chat_history': 'Human: Hi! My name is Alex. What is 1 + 1?\nAI: Hi Alex! 1 + 1 equals 2. How can I assist you further?\nHuman: What is my name?\nAI: Your name is Alex. How can I help you today?'}
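
<p>To make the mechanism concrete, here is a minimal plain-Python sketch of what a full-history buffer does. It is an illustration only, not LangChain's implementation: <code>SimpleBuffer</code> and <code>build_prompt</code> are hypothetical names, and the AI reply is hard-coded instead of coming from a model.</p>

```python
# Hypothetical stand-in for a full-history buffer (not LangChain's code).
class SimpleBuffer:
    def __init__(self):
        self.turns = []  # list of (human, ai) pairs; nothing is ever trimmed

    def save(self, human, ai):
        self.turns.append((human, ai))

    def render(self):
        # Same "Human: ...\nAI: ..." layout as the printed chat_history
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)


def build_prompt(buffer, user_input):
    # Mirrors buffer_template: the history first, then the new input
    return f"Current conversation:\n{buffer.render()}\n{user_input}"


buf = SimpleBuffer()
buf.save("Hi! My name is Alex. What is 1 + 1?", "Hi Alex! 1 + 1 equals 2.")
prompt = build_prompt(buf, "What is my name?")
print(prompt)
```

<p>The second prompt carries the whole first exchange, which is why the model can answer with the name. It is also why the prompt grows with every turn.</p>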


<h3>Hard Task 2 - Windowed Buffer (last k turns)</h3>
<p>Goal: Keep only the last few interactions to save tokens.</p>
<ol>
  <li>Keep only the last 2 interactions.</li>
  <li>Share two facts (name and age) in the first turn, then ask for each one later and see which is still inside the window.</li>
</ol>

In [47]:
from langchain_classic.memory import ConversationBufferWindowMemory

# Only keep last 2 interactions
memory_k2 = ConversationBufferWindowMemory(k=2, memory_key="chat_history")

In [48]:
# Create chain with windowed memory
mem_chain_k2 = LLMChain(llm=llm, prompt=mem_prompt, memory=memory_k2)

In [49]:
# Turn 1: share name and age
_ = mem_chain_k2.invoke({"input_prompt": "Hi! My name is Casey and I am 33 years old. What is 2 + 2?"})

In [50]:
# Turn 2: ask another math question
_ = mem_chain_k2.invoke({"input_prompt": "What is 3 + 3?"})

In [51]:
# Turn 3: ask for the name (turn 1 is still inside the k=2 window)
recall_name = mem_chain_k2.invoke({"input_prompt": "What is my name?"})
print("Recall name:", recall_name)

Recall name: {'input_prompt': 'What is my name?', 'chat_history': "Human: Hi! My name is Casey and I am 33 years old. What is 2 + 2?\nAI: Hi Casey! 2 + 2 equals 4. How can I assist you today?\nHuman: What is 3 + 3?\nAI: AI: 3 + 3 equals 6. Is there anything else you'd like to know?", 'text': 'Your name is Casey. How can I help you further?'}


In [52]:
# Turn 4: ask for the age (turn 1 has now dropped out of the window)
recall_age = mem_chain_k2.invoke({"input_prompt": "What is my age?"})
print("Recall age:", recall_age)

Recall age: {'input_prompt': 'What is my age?', 'chat_history': "Human: What is 3 + 3?\nAI: AI: 3 + 3 equals 6. Is there anything else you'd like to know?\nHuman: What is my name?\nAI: Your name is Casey. How can I help you further?", 'text': "AI: I don't have that information. If you'd like to share your age or any other details, feel free to let me know!"}
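
<p>The two outputs above can be explained by replaying the window by hand. The sketch below is a simplified stand-in built on <code>collections.deque</code>, not LangChain's implementation. <code>WindowBuffer</code> is a hypothetical name and the AI replies are shortened, hard-coded strings.</p>

```python
from collections import deque


class WindowBuffer:
    """Keeps only the most recent k (human, ai) turns."""

    def __init__(self, k):
        self.turns = deque(maxlen=k)  # the oldest turn is dropped automatically

    def save(self, human, ai):
        self.turns.append((human, ai))

    def render(self):
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)


window = WindowBuffer(k=2)
window.save("Hi! My name is Casey and I am 33 years old. What is 2 + 2?",
            "Hi Casey! 2 + 2 equals 4.")
window.save("What is 3 + 3?", "3 + 3 equals 6.")
history_before_turn_3 = window.render()  # holds turns 1 and 2

window.save("What is my name?", "Your name is Casey.")
history_before_turn_4 = window.render()  # holds turns 2 and 3

print("33 years old" in history_before_turn_3)  # True
print("33 years old" in history_before_turn_4)  # False
print("Casey" in history_before_turn_4)         # True
```

<p>By turn 4 the age has left the window along with turn 1. The name survives only because the turn 3 answer repeated it.</p>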


<h3>Hard Task 3 - Conversation Summary (compress history)</h3>
<p>Goal: Summarize the growing chat so it fits into fewer tokens.</p>
<ol>
  <li>Create a summary prompt.</li>
  <li>Attach ConversationSummaryMemory that uses the LLM to summarize.</li>
  <li>Observe how the model answers recall questions based on a summary.</li>
</ol>

In [53]:
from langchain_core.prompts import PromptTemplate

summary_template = (
    "Summarize the conversations and update with the new lines.\n\n"
    "Current summary:\n{summary}\n\n"
    "new lines of conversation:\n{new_lines}\n\n"
    "New summary:"
)
sum_prompt = PromptTemplate(input_variables=["new_lines", "summary"], template=summary_template)

In [54]:
from langchain_classic.memory import ConversationSummaryMemory

memory_sum = ConversationSummaryMemory(llm=llm, memory_key="chat_history", prompt=sum_prompt)

In [55]:
from langchain_classic.chains import LLMChain

mem_chain_sum = LLMChain(llm=llm, prompt=mem_prompt, memory=memory_sum)

In [56]:
# First interaction
_ = mem_chain_sum.invoke({"input_prompt": "Hi! My name is Morgan. What is 4 + 4?"})

In [57]:
# Ask for recall
recall = mem_chain_sum.invoke({"input_prompt": "What is my name?"})
print(recall)

{'input_prompt': 'What is my name?', 'chat_history': 'Current summary:\n\nHuman: Hi! My name is Morgan. What is 4 + 4?  \nAI: Hi Morgan! 4 + 4 equals 8.', 'text': 'Your name is Morgan.'}


In [58]:
# View the current summary (not the full history)
print("[Current Summary]", memory_sum.load_memory_variables({}))

[Current Summary] {'chat_history': 'Current summary:\n\nHuman: Hi! My name is Morgan. What is 4 + 4?  \nAI: Hi Morgan! 4 + 4 equals 8.  \nHuman: What is my name?  \nAI: Your name is Morgan.'}
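
<p>The update loop behind a summary memory can be sketched in plain Python. This is an illustration under assumptions, not LangChain's implementation: <code>RollingSummary</code> and <code>stub_summarize</code> are hypothetical, and the stub returns a canned string where a real setup would call the LLM. Note that the stored value printed above still reads much like a transcript, which suggests the wording of the summary prompt affects how much compression you actually get.</p>

```python
SUMMARY_TEMPLATE = (
    "Summarize the conversations and update with the new lines.\n\n"
    "Current summary:\n{summary}\n\n"
    "new lines of conversation:\n{new_lines}\n\n"
    "New summary:"
)


class RollingSummary:
    """Stores one running summary instead of the full transcript."""

    def __init__(self, summarize):
        self.summarize = summarize  # callable: prompt text -> new summary text
        self.summary = ""

    def save(self, human, ai):
        new_lines = f"Human: {human}\nAI: {ai}"
        prompt = SUMMARY_TEMPLATE.format(summary=self.summary, new_lines=new_lines)
        self.summary = self.summarize(prompt)  # the old summary is replaced


seen_prompts = []


def stub_summarize(prompt):
    # Hypothetical stand-in for an LLM call: records the prompt, returns canned text
    seen_prompts.append(prompt)
    return f"Summary after {len(seen_prompts)} turn(s): the user is named Morgan."


memory = RollingSummary(stub_summarize)
memory.save("Hi! My name is Morgan. What is 4 + 4?", "Hi Morgan! 4 + 4 equals 8.")
memory.save("What is my name?", "Your name is Morgan.")
print(memory.summary)
```

<p>Each update costs one extra model call, and the second summarization prompt contains the first summary, not the first exchange. Any detail the summarizer drops is gone for good.</p>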


<h2>Hard Task 4 - Simulated Agent (ReAct-style thinking)</h2>
<p>Let's simulate the <b>Thought → Action → Observation</b> pattern and let the model reason step-by-step.</p>
<ol>
  <li>Show a small list of pretend tools.</li>
  <li>Ask a question the model can solve without external data.</li>
  <li>Prompt the model to display its thinking as Thought/Action/Observation and then give a Final Answer.</li>
</ol>

In [59]:
# Simulated ReAct prompt (no external APIs, no real tools)
react_prompt = (
    "Answer the following question as best you can. You have access to the following tools:\n\n"
    "- Calculator: good at basic arithmetic when you provide the numbers.\n"
    "- Notes: a scratchpad where you can keep short facts you derive.\n\n"
    "Use the following format:\n"
    "Question: the input question you must answer\n"
    "Thought: you should always think about what to do\n"
    "Action: the action to take, e.g., Calculator or Notes\n"
    "Action Input: the input to the action\n"
    "Observation: the result of the action\n"
    "... (this Thought/Action/Action Input/Observation can repeat N times)\n"
    "Thought: I now know the final answer\n"
    "Final Answer: the final answer to the original input question\n\n"
    "Begin!\n\n"
    "Question: You have two dogs (each has 4 legs) and one bird (2 legs). How many legs in total?\n"
    "Thought:"
)

In [60]:
# Invoke the simulated agent-style prompt
react_output = llm.invoke(react_prompt)
print(react_output)

content='To find the total number of legs, I need to calculate the legs of the dogs and the bird separately and then sum them up. Each dog has 4 legs, and there are 2 dogs, so I will calculate the total legs from the dogs first. Then I will add the legs from the bird.\n\nAction: Calculator  \nAction Input: (2 dogs * 4 legs) + (1 bird * 2 legs)  \nObservation: 8 + 2 = 10  \n\nThought: I now know the final answer.  \nFinal Answer: 10 legs in total.' additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 118, 'prompt_tokens': 175, 'total_tokens': 293, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_provider': 'openai', 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_560af6e559', 'id': 'chatcmpl-CV3oK5KI6XnuJRa7uaIHRyOrjEN9k', 'service_tier': 'default', 'finish_reason': 'st
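
<p>Because the Calculator here is simulated, the Observation line is written by the model, not computed by a tool. A small parser makes it easy to pull the fields out and check the arithmetic independently. The sketch below is one possible approach: <code>parse_react</code> is a hypothetical helper, and <code>sample</code> is the message text copied from the output above.</p>

```python
import re

sample = (
    "To find the total number of legs, I need to calculate the legs of the dogs "
    "and the bird separately and then sum them up.\n\n"
    "Action: Calculator  \n"
    "Action Input: (2 dogs * 4 legs) + (1 bird * 2 legs)  \n"
    "Observation: 8 + 2 = 10  \n\n"
    "Thought: I now know the final answer.  \n"
    "Final Answer: 10 legs in total."
)


def parse_react(text):
    """Return the first value found for each ReAct label, or None if missing."""
    fields = {}
    for label in ("Action", "Action Input", "Observation", "Final Answer"):
        pattern = rf"^{re.escape(label)}:[ \t]*(.*?)[ \t]*$"
        match = re.search(pattern, text, flags=re.MULTILINE)
        fields[label] = match.group(1) if match else None
    return fields


parsed = parse_react(sample)
expected_legs = 2 * 4 + 1 * 2  # two dogs and one bird, computed independently
print(parsed["Action"])
print(parsed["Final Answer"])
print(str(expected_legs) in parsed["Final Answer"])
```

<p>In the notebook you would pass <code>react_output.content</code> to the parser, since <code>llm.invoke</code> returns a message object whose text lives in <code>.content</code>.</p>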

<h3>Reflect</h3>
<ul>
  <li>What did you notice about buffer vs. windowed vs. summary memory?</li>
  <li>Which memory type would be best for a long conversation? Why?</li>
  <li>In the simulated agent step, did the model clearly separate <i>Thought</i>, <i>Action</i>, and <i>Observation</i> before the final answer?</li>
  <li>What happens if you change the ReAct question to something more complex?</li>
</ul>
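<p>If any cell feels slow or costly, shorten the prompt, cap the response length (for example with <code>max_tokens</code> on <code>ChatOpenAI</code>), or switch to a smaller model. Lowering the temperature makes the output more deterministic but does not reduce latency or cost.</p>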