# LangChain Chains

In this notebook we take a look at several types of LangChain chains.

In [1]:
# install the packages
!pip install -Uqqq pip --progress-bar off
!pip install -qqq langchain==0.0.149 --progress-bar off
!pip install -qqq openai==0.27.4 --progress-bar off
!pip install -qqq tiktoken==0.3.3 --progress-bar off
!pip install -qqq watermark==2.3.1 --progress-bar off
!pip install -qqq chromadb==0.3.21 --progress-bar off

In [2]:
# import the packages
import os
import textwrap
from getpass import getpass
import chromadb
import langchain
import openai
from langchain.chains import LLMBashChain, LLMChain, RetrievalQA, SimpleSequentialChain
from langchain.chains.summarize import load_summarize_chain
from langchain.chat_models import ChatOpenAI
from langchain.docstore.document import Document
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
import credentials

The next cell loads the "watermark" extension for Jupyter Notebook or IPython and prints version information for Python, the machine, and the imported packages.

In [3]:
%load_ext watermark
%watermark --iversions -v -m

Python implementation: CPython
Python version       : 3.10.10
IPython version      : 8.12.0

Compiler    : Clang 14.0.6 
OS          : Darwin
Release     : 22.2.0
Machine     : x86_64
Processor   : i386
CPU cores   : 10
Architecture: 64bit

langchain: 0.0.149
openai   : 0.27.4
sys      : 3.10.10 (main, Mar 21 2023, 13:41:39) [Clang 14.0.6 ]
chromadb : 0.3.21



# Add a print_response function to wrap the response of the models

In [4]:
def print_response(response: str):
    print("\n".join(textwrap.wrap(response, width=100)))

# Set the OpenAI API key


In [5]:
# Prompt for the key so it never appears in the notebook, then expose it to LangChain
OPENAI_API_KEY = getpass()
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

# Create a new chat model

In [6]:
model = ChatOpenAI(temperature=0.3, model_name="gpt-3.5-turbo")

# Create a template for the prompt

In [7]:
template = """
You have to come up with location to shoot (along with a 20-50 word description)
for a new episode of the TV show "The Office" based on the theme.

{theme_suggestion}

ANSWER:
"""
prompt = PromptTemplate(input_variables=["theme_suggestion"], template=template)

location_chain = LLMChain(llm=model, prompt=prompt, verbose=True)

In [8]:
response = location_chain("Visiting Europe")
response



> Entering new LLMChain chain...
Prompt after formatting:

You have to come up with location to shoot (along with a 20-50 word description)
for a new episode of the TV show "The Office" based on the theme.

Visiting Europe

ANSWER:


> Finished chain.


{'theme_suggestion': 'Visiting Europe',
 'text': 'The Office crew heads to Paris, France for a company retreat. While there, they struggle with language barriers, cultural differences, and a surprise visit from a former employee. Will they be able to bond and come together as a team in the City of Love?'}

In [9]:
print_response(response["text"])

The Office crew heads to Paris, France for a company retreat. While there, they struggle with
language barriers, cultural differences, and a surprise visit from a former employee. Will they be
able to bond and come together as a team in the City of Love?


# Sequential chains


[Documentation](https://python.langchain.com/en/latest/modules/chains/generic/sequential_chains.html)
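
Conceptually, a SimpleSequentialChain feeds the single output of each step into the single input of the next one. The sketch below mimics that flow with plain Python functions. `run_sequentially`, `fake_location_step` and `fake_dialogue_step` are hypothetical names used only for illustration; they stand in for the LLM-backed chains and are not LangChain APIs.

```python
from typing import Callable, List


def run_sequentially(steps: List[Callable[[str], str]], first_input: str) -> str:
    """Pass the output of each step to the next one, in order."""
    value = first_input
    for step in steps:
        value = step(value)
    return value


# Hypothetical stand-ins for LLM calls, so the example runs offline.
def fake_location_step(theme: str) -> str:
    return f"Location for theme '{theme}': Paris"


def fake_dialogue_step(location: str) -> str:
    return f"Dialogue set in -> {location}"


result = run_sequentially([fake_location_step, fake_dialogue_step], "Visiting Europe")
print(result)
```

Only the last step's output is returned, which is why the real chain needs each step to have exactly one input and one output.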


In [10]:
template = """
Generate a short dialogue between Jim and Pam
from the TV show "The Office" for a new episode based on the location

{location}

ANSWER:
"""
prompt = PromptTemplate(input_variables=["location"], template=template)

conversation_chain = LLMChain(llm=model, prompt=prompt, verbose=True)

### SimpleSequentialChain

In [11]:
sequential_chain = SimpleSequentialChain(
    chains=[location_chain, conversation_chain], verbose=True
)
response = sequential_chain.run("Visiting Europe")




> Entering new SimpleSequentialChain chain...


> Entering new LLMChain chain...
Prompt after formatting:

You have to come up with location to shoot (along with a 20-50 word description)
for a new episode of the TV show "The Office" based on the theme.

Visiting Europe

ANSWER:


> Finished chain.
The Office crew heads to Paris for a business trip, but things take a romantic turn when Jim and Pam get lost in the city of love. Meanwhile, Dwight tries to navigate the French language and culture with hilarious results.


> Entering new LLMChain chain...
Prompt after formatting:

Generate a short dialogue between Jim and Pam
from the TV show "The Office" for a new episode based on the location

The Office crew heads to Paris for a business trip, but things take a romantic turn when Jim and Pam get lost in the city of love. Meanwhile, Dwight tries to navigate the French language and culture with hilarious resu

In [12]:
conversation = response
print(conversation)

Jim: "I can't believe we got lost in Paris, of all places."

Pam: "I know, right? But I have to admit, it's kind of romantic wandering around these streets with you."

Jim: "Yeah, it's like we're in our own little movie."

Pam: "Speaking of movies, we should find a cute little cafe and have some croissants and coffee."

Jim: "Sounds perfect. But first, let's take a selfie in front of the Eiffel Tower."

Pam: "Yes! And then we can send it to Dwight and Michael to make them jealous."

Jim: "Ha! They're probably struggling to order food in French right now."

Pam: "Well, at least we have each other to navigate this city with."

Jim: "Always, Pam. Always."


# Summarization

LangChain offers four chain types for combining documents: stuff, map_reduce, refine and map_rerank. Each is described in more detail below; map_rerank is not implemented for summarization.


### Stuffing
Stuffing is the simplest method, whereby you simply stuff all the related data into the prompt as context to pass to the language model. This is implemented in LangChain as the StuffDocumentsChain.

Pros: Only makes a single call to the LLM. When generating text, the LLM has access to all the data at once.

Cons: Most LLMs have a limited context length, so for large documents (or many documents) this will not work, because the resulting prompt exceeds that limit.

The main downside of this method is that it only works on smaller pieces of data. Once you are working with many pieces of data, this approach is no longer feasible. The next two approaches are designed to help deal with that.
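
As a rough illustration of stuffing, the sketch below joins all documents into one prompt and checks its size. `build_stuff_prompt` and `fits_context` are hypothetical helpers, not LangChain functions, and the character-based limit is only a crude proxy for a real token limit.

```python
from typing import List


def build_stuff_prompt(docs: List[str], instruction: str) -> str:
    """Join every document into a single prompt (the "stuff" strategy)."""
    context = "\n\n".join(docs)
    return f"{instruction}\n\n{context}"


def fits_context(prompt: str, max_chars: int) -> bool:
    """Crude size check; real limits are counted in tokens, not characters."""
    return len(prompt) <= max_chars


docs = ["Jim: Hi.", "Pam: Hello."]
stuffed_prompt = build_stuff_prompt(docs, "Summarize:")
print(stuffed_prompt)
print(fits_context(stuffed_prompt, max_chars=100))
```

Once `fits_context` starts returning False for your data, one of the chunk-based strategies is needed.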


### Map Reduce
This method involves running an initial prompt on each chunk of data (for summarization tasks, this could be a summary of that chunk; for question-answering tasks, it could be an answer based solely on that chunk). Then a different prompt is run to combine all the initial outputs. This is implemented in LangChain as the MapReduceDocumentsChain.

Pros: Can scale to larger documents (and more documents) than StuffDocumentsChain. The calls to the LLM on individual documents are independent and can therefore be parallelized.

Cons: Requires many more calls to the LLM than StuffDocumentsChain. Loses some information during the final combined call.
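
The map and reduce steps can be sketched in a few lines of plain Python. `fake_summarize` and `fake_combine` are hypothetical stand-ins for the two LLM prompts (here they just keep the first sentence and join the results), so this shows the control flow only, not the quality of a real summary.

```python
from typing import Callable, List


def map_reduce(
    chunks: List[str],
    map_fn: Callable[[str], str],
    reduce_fn: Callable[[List[str]], str],
) -> str:
    """Run map_fn on every chunk independently, then combine the partial results."""
    partials = [map_fn(chunk) for chunk in chunks]  # independent calls; could run in parallel
    return reduce_fn(partials)


# Hypothetical stand-ins for the per-chunk prompt and the combine prompt.
def fake_summarize(chunk: str) -> str:
    return chunk.split(".")[0] + "."  # keep only the first sentence


def fake_combine(partials: List[str]) -> str:
    return " ".join(partials)


chunks = ["Jim gets lost. He laughs.", "Pam finds a cafe. She orders coffee."]
summary = map_reduce(chunks, fake_summarize, fake_combine)
print(summary)
```

The second sentence of each chunk never reaches the combine step, which is the information loss mentioned in the cons above.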


### Refine
This method involves running an initial prompt on the first chunk of data, generating some output. For the remaining documents, that output is passed in, along with the next document, asking the LLM to refine the output based on the new document.

Pros: Can pull in more relevant context, and may be less lossy than MapReduceDocumentsChain.

Cons: Requires many more calls to the LLM than StuffDocumentsChain. The calls are also NOT independent, meaning they cannot be parallelized like MapReduceDocumentsChain. The result may also depend on the ordering of the documents.
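
A minimal sketch of the refine loop, assuming two hypothetical stand-in functions (`fake_initial` and `fake_refine`) in place of the LLM prompts. It makes the sequential dependency and the order sensitivity visible.

```python
from typing import Callable, List


def refine(
    chunks: List[str],
    initial_fn: Callable[[str], str],
    refine_fn: Callable[[str, str], str],
) -> str:
    """Build an answer from the first chunk, then update it with each later chunk."""
    if not chunks:
        raise ValueError("refine needs at least one chunk")
    output = initial_fn(chunks[0])
    for chunk in chunks[1:]:
        # Each call needs the previous output, so the calls cannot run in parallel.
        output = refine_fn(output, chunk)
    return output


# Hypothetical stand-ins for the initial prompt and the refine prompt.
def fake_initial(chunk: str) -> str:
    return f"[{chunk}]"


def fake_refine(existing: str, chunk: str) -> str:
    return f"{existing} + [{chunk}]"


forward = refine(["a", "b", "c"], fake_initial, fake_refine)
backward = refine(["c", "b", "a"], fake_initial, fake_refine)
print(forward)
print(backward)
```

The two results differ even though the chunks are the same, which illustrates the ordering dependency.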


### Map-Rerank (not implemented for summarization)
This method involves running an initial prompt on each chunk of data that not only tries to complete a task but also gives a score for how certain it is of its answer. The responses are then ranked according to this score, and the highest-scoring response is returned.

Pros: Similar pros as MapReduceDocumentsChain. Requires fewer calls, compared to MapReduceDocumentsChain.

Cons: Cannot combine information between documents. This means it is most useful when you expect there to be a single simple answer in a single document.
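
The ranking step can be sketched as follows. `fake_answer_with_score` is a hypothetical stand-in for the LLM call that returns an answer together with a confidence score; here the score comes from a simple keyword check.

```python
from typing import Callable, List, Tuple


def map_rerank(chunks: List[str], answer_fn: Callable[[str], Tuple[str, int]]) -> str:
    """Answer from each chunk separately and return the answer with the top score."""
    scored = [answer_fn(chunk) for chunk in chunks]
    best_answer, _best_score = max(scored, key=lambda pair: pair[1])
    return best_answer


# Hypothetical stand-in: a real chain would ask the LLM for an answer and a score.
def fake_answer_with_score(chunk: str) -> Tuple[str, int]:
    score = 100 if "moon" in chunk.lower() else 10
    return chunk, score


chunks = ["Dwight grows beets.", "The team visits the moon."]
best = map_rerank(chunks, fake_answer_with_score)
print(best)
```

Only one chunk's answer survives, so nothing from the other chunks is combined into the result.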

[Documentation](https://docs.langchain.com/docs/components/chains/index_related_chains)
[Example](https://python.langchain.com/en/latest/modules/chains/index_examples/summarize.html)

In [13]:
template = """
Write a concise bullet list summary of the conversation between Jim and Pam from the TV show "The Office":

{text}

Concise summary using markdown:"""

prompt = PromptTemplate(template=template, input_variables=["text"])
summary_chain = load_summarize_chain(
    model, chain_type="stuff", verbose=True, prompt=prompt
)


In [14]:
docs = [Document(page_content=conversation)]
docs

[Document(page_content='Jim: "I can\'t believe we got lost in Paris, of all places."\n\nPam: "I know, right? But I have to admit, it\'s kind of romantic wandering around these streets with you."\n\nJim: "Yeah, it\'s like we\'re in our own little movie."\n\nPam: "Speaking of movies, we should find a cute little cafe and have some croissants and coffee."\n\nJim: "Sounds perfect. But first, let\'s take a selfie in front of the Eiffel Tower."\n\nPam: "Yes! And then we can send it to Dwight and Michael to make them jealous."\n\nJim: "Ha! They\'re probably struggling to order food in French right now."\n\nPam: "Well, at least we have each other to navigate this city with."\n\nJim: "Always, Pam. Always."', metadata={})]

In [15]:
summary_result = summary_chain.run(docs)



> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:

Write a concise bullet list summary of the conversation between Jim and Pam from the TV show "The Office":

Jim: "I can't believe we got lost in Paris, of all places."

Pam: "I know, right? But I have to admit, it's kind of romantic wandering around these streets with you."

Jim: "Yeah, it's like we're in our own little movie."

Pam: "Speaking of movies, we should find a cute little cafe and have some croissants and coffee."

Jim: "Sounds perfect. But first, let's take a selfie in front of the Eiffel Tower."

Pam: "Yes! And then we can send it to Dwight and Michael to make them jealous."

Jim: "Ha! They're probably struggling to order food in French right now."

Pam: "Well, at least we have each other to navigate this city with."

Jim: "Always, Pam. Always."

Concise summary using markdown:

> Finished chain.

> Finished chain.


In [16]:
print(summary_result)

- Jim and Pam got lost in Paris.
- They find it romantic to wander around the streets together.
- They plan to find a cute cafe and have croissants and coffee.
- They want to take a selfie in front of the Eiffel Tower and send it to Dwight and Michael to make them jealous.
- They are grateful to have each other to navigate the city with.


# Question answering

Here we look at how to use LangChain for question answering over a list of documents. Four chain types are available: stuff, map_reduce, refine and map_rerank. This notebook uses the stuff chain type.

[Example](https://python.langchain.com/en/latest/modules/chains/index_examples/question_answering.html)
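
Before the answer is generated, the chunks most similar to the question are retrieved from a vector store. The sketch below shows that retrieval step with a toy bag-of-words "embedding" and cosine similarity. `toy_embed`, `cosine` and `retrieve` are hypothetical helpers for illustration only; the notebook itself relies on OpenAIEmbeddings and Chroma for this.

```python
import math
from collections import Counter
from typing import List


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: List[str], k: int) -> List[str]:
    """Return the k chunks most similar to the query."""
    query_vec = toy_embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, toy_embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "michael wants to fly to the moon",
    "dwight sells paper to a client",
    "pam paints the office",
]
top = retrieve("who wants to go to the moon", chunks, k=1)
print(top)
```

With the stuff chain type, the retrieved chunks are then placed into a single prompt together with the question.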

In [17]:
template = """
You have to come up with a 200-300 word script for a new episode
of the TV show "The Office" based on the theme

{theme_suggestion}

ANSWER:
"""
prompt = PromptTemplate(input_variables=["theme_suggestion"], template=template)

script_chain = LLMChain(llm=model, prompt=prompt, verbose=True)


In [21]:
script_response = script_chain("Going to the moon")



> Entering new LLMChain chain...
Prompt after formatting:

You have to come up with a 200-300 word script for a new episode
of the TV show "The Office" based on the theme

Going to the moon

ANSWER:


> Finished chain.


In [22]:
script = script_response["text"]
print(script)

FADE IN:

INT. DUNDER MIFFLIN SCRANTON - DAY

The employees of Dunder Mifflin are gathered in the conference room for a meeting. Michael Scott is standing at the front of the room, holding a toy rocket ship.

MICHAEL: Good morning, everyone! Today, we’re going to talk about the moon.

JIM: (whispering to Pam) Is he serious?

PAM: (whispering back) I don’t know, but I’m afraid to ask.

MICHAEL: As you all know, NASA is planning a mission to the moon in a few years. And I’ve been thinking, why should they have all the fun?

DWIGHT: (excitedly) Are you suggesting we go to the moon, Michael?

MICHAEL: (nodding) Yes, Dwight. I am.

JIM: (sarcastically) Oh, great. Another one of Michael’s brilliant ideas.

MICHAEL: (ignoring Jim) I’ve already contacted a space travel agency and they’ve agreed to take us to the moon.

PAM: (concerned) Michael, I don’t think that’s a good idea. Going to the moon is dangerous.

MICHAEL: (defensive) Pam, I’m not going to let a little thing like danger stop us fr

### Split the script into chunks

In [23]:
script_docs = [Document(page_content=script)]

text_splitter = CharacterTextSplitter(chunk_size=2048, chunk_overlap=32)
texts = text_splitter.split_documents(script_docs)
len(texts)

2

In [24]:
texts[0]

Document(page_content='FADE IN:\n\nINT. DUNDER MIFFLIN SCRANTON - DAY\n\nThe employees of Dunder Mifflin are gathered in the conference room for a meeting. Michael Scott is standing at the front of the room, holding a toy rocket ship.\n\nMICHAEL: Good morning, everyone! Today, we’re going to talk about the moon.\n\nJIM: (whispering to Pam) Is he serious?\n\nPAM: (whispering back) I don’t know, but I’m afraid to ask.\n\nMICHAEL: As you all know, NASA is planning a mission to the moon in a few years. And I’ve been thinking, why should they have all the fun?\n\nDWIGHT: (excitedly) Are you suggesting we go to the moon, Michael?\n\nMICHAEL: (nodding) Yes, Dwight. I am.\n\nJIM: (sarcastically) Oh, great. Another one of Michael’s brilliant ideas.\n\nMICHAEL: (ignoring Jim) I’ve already contacted a space travel agency and they’ve agreed to take us to the moon.\n\nPAM: (concerned) Michael, I don’t think that’s a good idea. Going to the moon is dangerous.\n\nMICHAEL: (defensive) Pam, I’m not goi

### Create a database of embeddings

In [25]:
embeddings = OpenAIEmbeddings()

In [26]:
db = Chroma.from_documents(texts, embeddings)

Using embedded DuckDB without persistence: data will be transient


### Create a question answering chain

In [27]:
qa_chain = RetrievalQA.from_chain_type(
    llm=model,
    chain_type="stuff",
    # Retrieve the 2 most similar chunks for each query
    retriever=db.as_retriever(search_kwargs={"k": 2}),
)

In [28]:
response = qa_chain("What is the place that The Office team is visiting?")
response

{'query': 'What is the place that The Office team is visiting?',
 'result': 'The Office team is visiting the moon.'}

In [29]:
print_response(response["result"])

The Office team is visiting the moon.


#### Let's test whether it knows when the answer is not in the text

In [30]:
response = qa_chain("Why the client doesn't want to deal with Michael?")
print_response(response["result"])

There is no information provided in the context that suggests the client doesn't want to deal with
Michael.


# Bash Chain

LLMBashChain is a special chain that asks the LLM to translate a natural-language request into bash commands, runs them, and returns the output. It is useful for combining bash commands with other chains.
[Documentation](https://python.langchain.com/en/latest/modules/chains/examples/llm_bash.html?highlight=bash%20chain)
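
The idea behind the chain can be sketched without an LLM: take a reply that contains a bash-fenced block, extract the commands, and run them. `extract_bash`, `run_bash` and `fake_llm_reply` are hypothetical names for this illustration and do not reflect how LangChain implements the chain internally. The sketch assumes `bash` is available on the machine.

```python
import subprocess

FENCE = "`" * 3  # three backticks, built this way so the example's own fence stays intact


def extract_bash(llm_text: str) -> str:
    """Return the commands inside the first bash-fenced block of llm_text."""
    lines = llm_text.strip().splitlines()
    start = lines.index(FENCE + "bash")
    end = lines.index(FENCE, start + 1)
    return "\n".join(lines[start + 1 : end])


def run_bash(command: str) -> str:
    """Run the command with bash and return its standard output."""
    completed = subprocess.run(
        ["bash", "-c", command], capture_output=True, text=True, check=True
    )
    return completed.stdout


# Hypothetical LLM reply, hard-coded so the sketch runs without an API call.
fake_llm_reply = "\n".join([FENCE + "bash", "echo \"That's what she said!\"", FENCE])
command = extract_bash(fake_llm_reply)
output = run_bash(command)
print(output)
```

Because the commands are executed on the local machine, it is worth reading the generated script (for example with verbose=True) before trusting it with anything beyond harmless commands.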

In [31]:
text = """
Please write a bash script that prints a single line that Michael G. Scott from "The Office" might say
"""

bash_chain = LLMBashChain(llm=OpenAI(temperature=0), verbose=True)

bash_chain.run(text)



> Entering new LLMBashChain chain...

Please write a bash script that prints a single line that Michael G. Scott from "The Office" might say

```bash
echo "That's what she said!"
```
['```bash', 'echo "That\'s what she said!"', '```']

Answer: That's what she said!

> Finished chain.


"That's what she said!\n"