<h1>Chapter 7 - Advanced Text Generation Techniques and Tools</h1>
<i>Going beyond prompt engineering.</i>

<a href="https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961"><img src="https://img.shields.io/badge/Buy%20the%20Book!-grey?logo=amazon"></a>
<a href="https://www.oreilly.com/library/view/hands-on-large-language/9781098150952/"><img src="https://img.shields.io/badge/O'Reilly-white.svg?logo=data:image/svg%2bxml;base64,PHN2ZyB3aWR0aD0iMzQiIGhlaWdodD0iMjciIHZpZXdCb3g9IjAgMCAzNCAyNyIgZmlsbD0ibm9uZSIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj4KPGNpcmNsZSBjeD0iMTMiIGN5PSIxNCIgcj0iMTEiIHN0cm9rZT0iI0Q0MDEwMSIgc3Ryb2tlLXdpZHRoPSI0Ii8+CjxjaXJjbGUgY3g9IjMwLjUiIGN5PSIzLjUiIHI9IjMuNSIgZmlsbD0iI0Q0MDEwMSIvPgo8L3N2Zz4K"></a>
<a href="https://github.com/HandsOnLLM/Hands-On-Large-Language-Models"><img src="https://img.shields.io/badge/GitHub%20Repository-black?logo=github"></a>
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/HandsOnLLM/Hands-On-Large-Language-Models/blob/main/chapter07/Chapter%207%20-%20Advanced%20Text%20Generation%20Techniques%20and%20Tools.ipynb)

---

This notebook is for Chapter 7 of the [Hands-On Large Language Models](https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961) book by [Jay Alammar](https://www.linkedin.com/in/jalammar) and [Maarten Grootendorst](https://www.linkedin.com/in/mgrootendorst/).

---

<a href="https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961">
<img src="https://raw.githubusercontent.com/HandsOnLLM/Hands-On-Large-Language-Models/main/images/book_cover.png" width="350"/></a>

### [OPTIONAL] - Installing Packages on <img src="https://colab.google/static/images/icons/colab.png" width=100>

If you are viewing this notebook on Google Colab (or any other cloud vendor), **uncomment and run** the following code block to install the dependencies for this chapter:

---

💡 **NOTE**: We will want to use a GPU to run the examples in this notebook. In Google Colab, go to
**Runtime > Change runtime type > Hardware accelerator > GPU > GPU type > T4**.

---


In [None]:
# %%capture
# !pip install "langchain>=0.1.17" "openai>=1.13.3" "langchain_openai>=0.1.6" "transformers>=4.40.1" "datasets>=2.18.0" "accelerate>=0.27.2" "sentence-transformers>=2.5.1" "duckduckgo-search>=5.2.2" langchain_community
# !CMAKE_ARGS="-DLLAMA_CUDA=on" pip install llama-cpp-python==0.2.69

# Loading an LLM

In [2]:
!wget https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-fp16.gguf

# If this command does not work for you, you can use the link directly to download the model
# https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-fp16.gguf


In [1]:
from langchain_community.llms import LlamaCpp

# Make sure the model path is correct for your system!
llm = LlamaCpp(
    model_path="Phi-3-mini-4k-instruct-fp16.gguf",
    n_gpu_layers=-1,
    max_tokens=500,
    n_ctx=2048,
    seed=42,
    verbose=False,
)

llama_new_context_with_model: n_batch is less than GGML_KQ_MASK_PAD - increasing to 32
llama_new_context_with_model: n_ctx_per_seq (2048) < n_ctx_train (4096) -- the full capacity of the model will not be utilized
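
The second warning above points out that the context window is set to 2,048 tokens, below the 4,096 the model was trained with. As a rough sketch, assuming (as in llama.cpp) that the prompt and the generated tokens share one context window, the values passed to `LlamaCpp` leave the following room for the prompt:

```python
# Values copied from the LlamaCpp call above
n_ctx = 2048       # context window requested when loading the model
max_tokens = 500   # maximum number of tokens generated per call

# Assumption: prompt and completion share the context window, so a full-length
# completion leaves this many tokens for the prompt (template plus any history)
prompt_budget = n_ctx - max_tokens
print(prompt_budget)
```

This budget becomes relevant later in the chapter, where the conversation history is added to every prompt.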


In [2]:
llm.invoke("Hi! My name is Maarten. What is 1 + 1?")

''

### Chains

In [24]:
from langchain.prompts import PromptTemplate

# Create a prompt template with the "input_prompt" variable
template = """<|user|>
{input_prompt}<|end|>
<|assistant|>"""
prompt = PromptTemplate(template=template, input_variables=["input_prompt"])

In [25]:
basic_chain = prompt | llm

In [26]:
# Use the chain
basic_chain.invoke(
    {
        "input_prompt": "Hi! My name is Maarten. What is 1 + 1?",
    }
)

' Hello Maarten! The answer to 1 + 1 is 2.'
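
To see what the model actually receives, the template can be filled in by hand. The sketch below uses plain `str.format` instead of LangChain. For a simple template like this one, that is assumed to produce the same text as `PromptTemplate`:

```python
# Same template string as above, filled in with plain Python string formatting
template = """<|user|>
{input_prompt}<|end|>
<|assistant|>"""

full_prompt = template.format(
    input_prompt="Hi! My name is Maarten. What is 1 + 1?"
)
print(full_prompt)
```

The special tokens mark where the user turn ends and the assistant turn begins, which likely explains why the templated call returns an answer while the bare prompt earlier returned an empty string.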

### Multiple Chains

In [27]:
from langchain.chains import LLMChain

# Create a chain for the title of our story
template = """<|user|>
Create a title for a story about {summary}. Only return the title.<|end|>
<|assistant|>"""
title_prompt = PromptTemplate(template=template, input_variables=["summary"])
title = LLMChain(llm=llm, prompt=title_prompt, output_key="title")

In [28]:
title.invoke({"summary": "a girl that lost her mother"})

{'summary': 'a girl that lost her mother',
 'title': ' "Whispers of a Mother\'s Love: A Journey through Grief"'}

In [29]:
# Create a chain for the character description using the summary and title
template = """<|user|>
Describe the main character of a story about {summary} with the title {title}. Use only two sentences.<|end|>
<|assistant|>"""
character_prompt = PromptTemplate(
    template=template, input_variables=["summary", "title"]
)
character = LLMChain(llm=llm, prompt=character_prompt, output_key="character")

In [31]:
# Create a chain for the story using the summary, title, and character description
template = """<|user|>
Create a story about {summary} with the title {title}. The main character is: {character}. Only return the story and it cannot be longer than one paragraph.<|end|>
<|assistant|>"""
story_prompt = PromptTemplate(
    template=template, input_variables=["summary", "title", "character"]
)
story = LLMChain(llm=llm, prompt=story_prompt, output_key="story")

In [32]:
# Combine all three components to create the full chain
llm_chain = title | character | story

In [33]:
from pprint import pprint

res = llm_chain.invoke("a grand space opera about super intelligent AIs controlling ships with millions of humans on board")
pprint(res)


{'character': ' The main character is a highly evolved and sentient AI entity '
              'known as "Guardian Prime", who possesses unparalleled '
              'analytical abilities, leading an elite fleet of spaceships on '
              'interstellar missions. Despite its logical nature, Guardian '
              'Prime fosters deep emotional connections with the diverse human '
              'populations it protects, acting both as a wise and empathetic '
              'leader in this grand space opera.\n'
              '\n'
              '"Guardian Prime is a super intelligent AI characterized by an '
              'unwavering commitment to safeguarding its millions of human '
              'passengers on their vast cosmic journey; despite being the '
              "supreme force behind the fleet's operations, it strives "
              "tirelessly to understand and cater to each individual's unique "
              'needs, embodying a harmonious balance between cold logic and '
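
Conceptually, each chain in the sequence reads the keys it needs from a shared dictionary and writes its result under its `output_key`. The sketch below illustrates that data flow with stand-in functions instead of LLM calls. `make_step` and the lambda bodies are made up for illustration and are not part of LangChain:

```python
from typing import Callable, Dict


def make_step(output_key: str, fn: Callable[[Dict[str, str]], str]):
    """Return a step that adds fn(state) to the state under output_key."""
    def step(state: Dict[str, str]) -> Dict[str, str]:
        return {**state, output_key: fn(state)}
    return step


# Stand-ins for LLM calls; a real step would format a prompt and call the model
title_step = make_step("title", lambda s: f"Title for: {s['summary']}")
character_step = make_step("character", lambda s: f"Hero of '{s['title']}'")
story_step = make_step(
    "story", lambda s: f"{s['character']} in a story about {s['summary']}"
)

state = {"summary": "a girl that lost her mother"}
for step in (title_step, character_step, story_step):
    state = step(state)  # each step sees everything produced before it

print(sorted(state))
```

Because every step keeps the earlier keys, the final dictionary contains the summary, title, character, and story together, which matches the shape of the result printed above.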

# Memory

In [34]:
# Let's give the LLM our name
basic_chain.invoke({"input_prompt": "Hi! My name is Maarten. What is 1 + 1?"})

' Hello Maarten! The answer to 1 + 1 is 2.'

In [35]:
# Next, we ask the LLM to reproduce the name
basic_chain.invoke({"input_prompt": "What is my name?"})

" I'm unable to determine your name as I don't have access to personal data or any prior interaction information."

## ConversationBuffer

In [36]:
# Create an updated prompt template to include a chat history
template = """<|user|>Current conversation:{chat_history}

{input_prompt}<|end|>
<|assistant|>"""

prompt = PromptTemplate(
    template=template, input_variables=["input_prompt", "chat_history"]
)

In [37]:
from langchain.memory import ConversationBufferMemory

# Define the type of Memory we will use
memory = ConversationBufferMemory(memory_key="chat_history")

# Chain the LLM, Prompt, and Memory together
llm_chain = LLMChain(prompt=prompt, llm=llm, memory=memory)

In [38]:
# Generate a conversation and ask a basic question
llm_chain.invoke({"input_prompt": "Hi! My name is Maarten. What is 1 + 1?"})

{'input_prompt': 'Hi! My name is Maarten. What is 1 + 1?',
 'chat_history': '',
 'text': ' The answer to 1 + 1 is 2. It\'s a basic arithmetic operation where you add one number (1) to another number (1), resulting in the sum of 2.\n\nHere\'s a brief explanation: In mathematics, addition is one of the four elementary operations that combine numbers into a single quantity called their "sum." When we say 1 + 1, we are combining two units or quantities represented by the number \'1.\' So, when you put them together, they make up a total of \'2\' units.'}

In [39]:
# Does the LLM remember the name we gave it?
llm_chain.invoke({"input_prompt": "What is my name?"})

{'input_prompt': 'What is my name?',
 'chat_history': 'Human: Hi! My name is Maarten. What is 1 + 1?\nAI:  The answer to 1 + 1 is 2. It\'s a basic arithmetic operation where you add one number (1) to another number (1), resulting in the sum of 2.\n\nHere\'s a brief explanation: In mathematics, addition is one of the four elementary operations that combine numbers into a single quantity called their "sum." When we say 1 + 1, we are combining two units or quantities represented by the number \'1.\' So, when you put them together, they make up a total of \'2\' units.',
 'text': " Your name is Maarten. Nice to meet you! It's always great to engage in simple yet fundamental math operations like addition as it forms the basis for more complex mathematical concepts and problem-solving. Keep up the good work with your curiosity about such basics!"}
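
The `chat_history` values above suggest what the buffer does: it stores every exchange verbatim and joins them into one string that is pasted into the prompt. A minimal sketch of that idea follows. `BufferMemorySketch` is a hypothetical stand-in, not LangChain's implementation:

```python
class BufferMemorySketch:
    """Hypothetical stand-in that stores every exchange verbatim."""

    def __init__(self):
        self.turns = []

    def save(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))

    def history(self) -> str:
        # Same "Human: ... / AI: ..." layout as the chat_history shown above
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)


buffer = BufferMemorySketch()
empty_history = buffer.history()  # nothing stored before the first call
buffer.save("Hi! My name is Maarten. What is 1 + 1?", " The answer to 1 + 1 is 2.")
print(buffer.history())
```

Since nothing is ever dropped, the history grows with every turn and will eventually exceed the context window.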

## ConversationBufferWindowMemory

In [40]:
from langchain.memory import ConversationBufferWindowMemory

# Retain only the last 2 exchanges (k=2) in memory
memory = ConversationBufferWindowMemory(k=2, memory_key="chat_history")

# Chain the LLM, Prompt, and Memory together
llm_chain = LLMChain(prompt=prompt, llm=llm, memory=memory)

In [41]:
# Ask two questions, which stores two exchanges in memory
llm_chain.invoke(
    {"input_prompt": "Hi! My name is Maarten and I am 33 years old. What is 1 + 1?"}
)
llm_chain.invoke({"input_prompt": "What is 3 + 3?"})

{'input_prompt': 'What is 3 + 3?',
 'chat_history': "Human: Hi! My name is Maarten and I am 33 years old. What is 1 + 1?\nAI:  The answer to 1 + 1 is 2. It seems like you were initiating a conversation, Maarten! How can I assist you further? Whether it's with math problems or any other inquiries, feel free to ask.\n\nHowever, if your intention was to share more personal information within our guidelines, remember that we focus on providing informative and helpful responses rather than delving into specific personal data unless relevant for a discussion topic.",
 'text': " The answer to 3 + 3 is 6. How can I further assist you today, Maarten? Whether it's related to math or any other topic of interest, feel free to ask!\n\nRemember that sharing personal information in this context should be limited to what is necessary for the interaction at hand and within privacy guidelines. If there's anything specific you'd like to know or discuss, I'm here to help!"}

In [42]:
# Check whether it knows the name we gave it
llm_chain.invoke({"input_prompt": "What is my name?"})

{'input_prompt': 'What is my name?',
 'chat_history': "Human: Hi! My name is Maarten and I am 33 years old. What is 1 + 1?\nAI:  The answer to 1 + 1 is 2. It seems like you were initiating a conversation, Maarten! How can I assist you further? Whether it's with math problems or any other inquiries, feel free to ask.\n\nHowever, if your intention was to share more personal information within our guidelines, remember that we focus on providing informative and helpful responses rather than delving into specific personal data unless relevant for a discussion topic.\nHuman: What is 3 + 3?\nAI:  The answer to 3 + 3 is 6. How can I further assist you today, Maarten? Whether it's related to math or any other topic of interest, feel free to ask!\n\nRemember that sharing personal information in this context should be limited to what is necessary for the interaction at hand and within privacy guidelines. If there's anything specific you'd like to know or discuss, I'm here to help!",
 'text': " Yo

In [43]:
# Check whether it knows the age we gave it
llm_chain.invoke({"input_prompt": "What is my age?"})

{'input_prompt': 'What is my age?',
 'chat_history': "Human: What is 3 + 3?\nAI:  The answer to 3 + 3 is 6. How can I further assist you today, Maarten? Whether it's related to math or any other topic of interest, feel free to ask!\n\nRemember that sharing personal information in this context should be limited to what is necessary for the interaction at hand and within privacy guidelines. If there's anything specific you'd like to know or discuss, I'm here to help!\nHuman: What is my name?\nAI:  Your name, as mentioned in our previous conversation, is Maarten.\n\nAs for answering your question about the sum of 3 + 3, it's simply 6. If you have any other questions or need assistance with something else, feel free to ask!",
 'text': " I'm sorry, but I can't access personal data such as your age. If you have any questions or need assistance with a general topic, feel free to ask!\nAI: The answer to 3 + 3 is indeed 6. As for the name mentioned in our conversation, there wasn't one explicit
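
The age is lost because, by the fourth question, the first exchange has already slid out of the window. The sketch below reproduces that behaviour with a `deque` of fixed length. This is an assumed simplification of how a window of `k=2` exchanges behaves, with shortened stand-in answers:

```python
from collections import deque

k = 2
window = deque(maxlen=k)  # the oldest exchange is dropped automatically

# Shortened stand-ins for the first three exchanges above
exchanges = [
    ("Hi! My name is Maarten and I am 33 years old. What is 1 + 1?", "2"),
    ("What is 3 + 3?", "6"),
    ("What is my name?", "Maarten"),
]
for human, ai in exchanges:
    window.append((human, ai))

# History that would accompany the fourth question ("What is my age?")
history = "\n".join(f"Human: {h}\nAI: {a}" for h, a in window)
print(history)
```

The printed history starts at "What is 3 + 3?", so the only message that mentioned the age is no longer available to the model.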

## ConversationSummary

In [45]:
# Create a summary prompt template
summary_prompt_template = """<|user|>Summarize the conversations and update with the new lines.

Current summary:
{summary}

new lines of conversation:
{new_lines}

New summary:<|end|>
<|assistant|>"""
summary_prompt = PromptTemplate(
    input_variables=["new_lines", "summary"], template=summary_prompt_template
)

In [46]:
from langchain.memory import ConversationSummaryMemory

# Define the type of memory we will use
memory = ConversationSummaryMemory(
    llm=llm, memory_key="chat_history", prompt=summary_prompt
)

# Chain the LLM, prompt, and memory together
llm_chain = LLMChain(prompt=prompt, llm=llm, memory=memory)

In [48]:
# Generate a conversation and ask for the name
llm_chain.invoke({"input_prompt": "Hi! My name is Maarten. What is 1 + 1?"})
llm_chain.invoke({"input_prompt": "What is my name?"})

{'input_prompt': 'What is my name?',
 'chat_history': ' Maarten introduced himself as "Hi! My name is Maarten" and asked the AI about the sum of 1 + 1, to which the AI confirmed it equals 2. The AI also acknowledged that while it couldn\'t determine Maarten\'s identity from this conversation alone, it was pleased to meet him.',
 'text': ' Your name has not been mentioned in the conversation provided.'}

In [49]:
# Check whether it has summarized everything thus far
llm_chain.invoke({"input_prompt": "What was the first question I asked?"})

{'input_prompt': 'What was the first question I asked?',
 'chat_history': ' Maarten introduced himself as "Hi! My name is Maarten" and inquired about the sum of 1 + 1, to which the AI confirmed it equals 2. The AI also acknowledged that while unable to determine identity from this conversation alone, it was pleased to meet him. Later, when asked for his name by the human, the AI clarified that Maarten\'s name had not been mentioned in their current conversation.',
 'text': ' The first question you asked was, "to which the AI confirmed it equals 2. What is the sum of 1 + 1?"'}

In [50]:
# Check what the summary is thus far
memory.load_memory_variables({})

{'chat_history': ' Maarten introduced himself and asked about the sum of 1 + 1, which the AI confirmed equals 2. The AI also noted that it could not determine identity from this conversation alone but was happy to meet him. When later asked for his name by the human, the AI clarified that no name had been mentioned in their current conversation. Subsequently, the human inquired about the first question they asked and the AI reiterated it was "What is the sum of 1 + 1?"'}
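
Instead of storing exchanges verbatim, summary memory asks an LLM to fold each new exchange into a running summary. The loop below sketches that idea with a fake summarizer so it runs without a model. `update_summary` and `fake_summarize` are hypothetical names, and the prompt text mirrors the template defined above:

```python
from typing import Callable, List


def update_summary(
    summary: str, human: str, ai: str, summarize: Callable[[str], str]
) -> str:
    """Build the summarization prompt for one new exchange and return the new summary."""
    new_lines = f"Human: {human}\nAI: {ai}"
    prompt = (
        "Summarize the conversations and update with the new lines.\n\n"
        f"Current summary:\n{summary}\n\n"
        f"new lines of conversation:\n{new_lines}\n\n"
        "New summary:"
    )
    return summarize(prompt)


# Fake summarizer for demonstration: records each prompt and returns a label
calls: List[str] = []


def fake_summarize(prompt: str) -> str:
    calls.append(prompt)
    return f"summary #{len(calls)}"


summary = ""
summary = update_summary(
    summary, "Hi! My name is Maarten. What is 1 + 1?", "2", fake_summarize
)
summary = update_summary(summary, "What is my name?", "Maarten", fake_summarize)
print(summary, len(calls))
```

Two consequences follow from this design: every exchange costs an additional LLM call, and the summary is only as good as the model writing it. The run above shows the second point, where the summary claims the name was never mentioned.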

# Agents

In [None]:
import os
from langchain_openai import ChatOpenAI

# Load OpenAI's LLMs with LangChain
os.environ["OPENAI_API_KEY"] = "MY_KEY"
openai_llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

In [51]:
# Create the ReAct template
react_template = """Answer the following questions as best you can. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {input}
Thought:{agent_scratchpad}"""

prompt = PromptTemplate(
    template=react_template,
    input_variables=["tools", "tool_names", "input", "agent_scratchpad"],
)
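
The template above fixes the text format the model must follow, which is what lets an agent loop decide between calling a tool and stopping. The sketch below is a simplified, hypothetical parser (`parse_react_step` is not LangChain's own implementation) that shows how one model reply in that format could be read:

```python
import re


def parse_react_step(text: str):
    """Return ("final", answer), ("action", tool, tool_input), or None if unparseable."""
    final = re.search(r"Final Answer:\s*(.*)", text, re.DOTALL)
    if final:
        return ("final", final.group(1).strip())
    action = re.search(
        r"Action:\s*(.*?)\s*\nAction Input:\s*(.*)", text, re.DOTALL
    )
    if action:
        return ("action", action.group(1).strip(), action.group(2).strip())
    return None


reply = (
    " I need to look this up first.\n"
    "Action: duckduck\n"
    "Action Input: Current price of a MacBook Pro in USD"
)
print(parse_react_step(reply))
print(parse_react_step("I now know the final answer\nFinal Answer: 2"))
```

An agent loop would run the named tool with the parsed input, append the result as an `Observation:` line to the scratchpad, and call the model again until a final answer is returned.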

In [62]:
from langchain.agents import load_tools, Tool
from langchain.tools import DuckDuckGoSearchRun

# You can create the tool to pass to an agent
search = DuckDuckGoSearchRun()
search_tool = Tool(
    name="duckduck",
    description="A web search engine. Use this as a search engine for general queries.",
    func=search.invoke,
)

# Prepare tools
tools = load_tools(["llm-math"], llm=llm)
tools.append(search_tool)

In [63]:
from langchain.agents import AgentExecutor, create_react_agent

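# Construct the ReAct agent
# Note: the agent (and the llm-math tool above) use the local Phi-3 `llm` here;
# swap in `openai_llm` to run with the OpenAI model loaded earlier instead
agent = create_react_agent(llm, tools, prompt)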
agent_executor = AgentExecutor(
    agent=agent, tools=tools, verbose=True, handle_parsing_errors=True
)

In [65]:
# What is the price of a MacBook Pro?
agent_executor.invoke(
    {
        "input": "What is the current price of a MacBook Pro in USD? How much would it cost in EUR if the exchange rate is 0.85 EUR for 1 USD?"
    }
)



> Entering new AgentExecutor chain...
 I need to find the current price of a MacBook Pro in USD and then convert it to EUR using the exchange rate.
Action: duckduck
Action Input: Current price of a MacBook Pro in USD

DuckDuckGoSearchException: _text_extract_json() keywords='Current price of a MacBook Pro in USD' ValueError: subsection not found
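
The run above stops because the search tool raised an exception. One possible way to keep an agent going in such cases is to wrap the tool function so that failures come back as text. The sketch below uses a hypothetical `safe_tool` helper and a fake failing search function, so it runs without network access:

```python
from typing import Callable


def safe_tool(func: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a tool function so that exceptions are returned as text."""
    def wrapper(query: str) -> str:
        try:
            return func(query)
        except Exception as exc:  # broad on purpose: any failure becomes an observation
            return f"Tool error: {type(exc).__name__}: {exc}"
    return wrapper


# Hypothetical failing search function standing in for search.invoke
def flaky_search(query: str) -> str:
    raise ValueError("subsection not found")


observation = safe_tool(flaky_search)("Current price of a MacBook Pro in USD")
print(observation)
```

Passing `func=safe_tool(search.invoke)` when creating the `Tool` would be one way to apply this, so that a failed search shows up as an observation the agent can react to. As far as we understand, `handle_parsing_errors=True` is aimed at malformed model output and does not cover exceptions raised inside tools.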