# LangChain Chains

In this notebook we will take a look at different LangChain chains.

In [19]:
# install the packages
!pip install -Uqqq pip --progress-bar off
!pip install -qqq langchain==0.0.149 --progress-bar off
!pip install -qqq openai==0.27.4 --progress-bar off
!pip install -qqq tiktoken==0.3.3 --progress-bar off
!pip install -qqq watermark==2.3.1 --progress-bar off
!pip install -qqq chromadb==0.3.21 --progress-bar off

In [1]:
# import the packages
import os
import textwrap
from getpass import getpass
import chromadb
import langchain
import openai
from langchain.chains import LLMBashChain, LLMChain, RetrievalQA, SimpleSequentialChain
from langchain.chains.summarize import load_summarize_chain
from langchain.chat_models import ChatOpenAI
from langchain.docstore.document import Document
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
import credentials

The next cell uses IPython magic commands to load the "watermark" extension and print the Python version, machine details, and the versions of the packages imported in this notebook.

In [2]:
%load_ext watermark
%watermark --iversions -v -m

Python implementation: CPython
Python version       : 3.10.10
IPython version      : 8.12.0

Compiler    : Clang 14.0.6 
OS          : Darwin
Release     : 22.2.0
Machine     : x86_64
Processor   : i386
CPU cores   : 10
Architecture: 64bit

langchain: 0.0.149
chromadb : 0.3.21
openai   : 0.27.4
sys      : 3.10.10 (main, Mar 21 2023, 13:41:39) [Clang 14.0.6 ]



# Add a print_response function to wrap the response of the models

In [3]:
def print_response(response: str):
    print("\n".join(textwrap.wrap(response, width=100)))

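# Set the OpenAI API key


In [8]:
# Prompt for the key; if nothing is entered, fall back to the local credentials module
OPENAI_API_KEY = getpass() or credentials.OPENAI_API_KEY
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY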

# Create a new chat model

In [9]:
chat_gpt = ChatOpenAI(temperature=0.2, model_name="gpt-3.5-turbo")

# Create a template for the prompt

In [84]:
template = """
You have to come up with location to shoot (along with a 20-50 word description)
for a new episode of the TV show "The Office" based on the theme.

{theme_suggestion}

ANSWER:
"""
prompt = PromptTemplate(input_variables=["theme_suggestion"], template=template)

location_chain = LLMChain(llm=chat_gpt, prompt=prompt, verbose=True)
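
To make the role of the template concrete, here is a minimal sketch of the substitution step that happens before the model is called. It uses Python's built-in `str.format` as a stand-in for `PromptTemplate`, which is an assumption that holds only for simple named placeholders like the one above. The variable name `formatted_prompt` is introduced purely for illustration.

```python
# Same template text as in the cell above
template = """
You have to come up with location to shoot (along with a 20-50 word description)
for a new episode of the TV show "The Office" based on the theme.

{theme_suggestion}

ANSWER:
"""

# Stand-in for the formatting step: fill the single placeholder with the theme
formatted_prompt = template.format(theme_suggestion="Visiting Europe")
print(formatted_prompt)
```

The printed text should match the "Prompt after formatting" section that the chain logs when `verbose=True`.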

In [51]:
response = location_chain("Visiting Europe")
response



> Entering new LLMChain chain...
Prompt after formatting:
You have to come up with location to shoot (along with a 20-50 word description)
for a new episode of the TV show "The Office" based on the theme.

Visiting Europe

ANSWER:

> Finished chain.


{'theme_suggestion': 'Visiting Europe',
 'text': 'The Office crew heads to Paris, France for a company-wide conference. But when Michael gets lost in the city of love, chaos ensues as the rest of the team tries to find him before the conference begins. Will they make it in time? Tune in to find out.'}

In [52]:
print_response(response["text"])

The Office crew heads to Paris, France for a company-wide conference. But when Michael gets lost in
the city of love, chaos ensues as the rest of the team tries to find him before the conference
begins. Will they make it in time? Tune in to find out.


# Sequential chains
The next step after calling a language model once is to make a series of calls to it. This is particularly useful when you want to take the output from one call and use it as the input to another.

In this notebook we will walk through some examples for how to do this, using sequential chains. Sequential chains are defined as a series of chains, called in deterministic order. There are two types of sequential chains:

- SimpleSequentialChain: The simplest form of sequential chain, where each step has a single input and a single output, and the output of one step is the input to the next.
- SequentialChain: A more general form of sequential chain, allowing for multiple inputs and outputs.

[Documentation](https://python.langchain.com/en/latest/modules/chains/generic/sequential_chains.html)
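
The idea behind a simple sequential chain can be sketched without any LLM at all: each step is a function from one string to one string, and the steps run in a fixed order. The two `fake_*` functions below are hypothetical stand-ins for real chains, and `run_simple_sequence` is an illustrative helper, not part of LangChain.

```python
from typing import Callable, List


def fake_location_step(theme: str) -> str:
    # Stand-in for a chain that turns a theme into a location description
    return f"Location for theme '{theme}': a paper conference in Paris"


def fake_dialogue_step(location: str) -> str:
    # Stand-in for a chain that turns a location into a dialogue
    return f"Jim and Pam talk about: {location}"


def run_simple_sequence(steps: List[Callable[[str], str]], first_input: str) -> str:
    """Feed the output of each step into the next one, in order."""
    value = first_input
    for step in steps:
        value = step(value)
    return value


result = run_simple_sequence([fake_location_step, fake_dialogue_step], "Visiting Europe")
print(result)
```

Because every step takes exactly one string and returns exactly one string, the steps compose without any extra wiring. Multiple named inputs and outputs are what the more general SequentialChain is for.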


In [53]:
template = """
Generate a short dialogue between Jim and Pam
from the TV show "The Office" for a new episode based on the location

{location}

ANSWER:
"""
prompt = PromptTemplate(input_variables=["location"], template=template)

conversation_chain = LLMChain(llm=chat_gpt, prompt=prompt, verbose=True)

### SimpleSequentialChain

In [54]:
sequential_chain = SimpleSequentialChain(
    chains=[location_chain, conversation_chain], verbose=True
)
response = sequential_chain.run("Visiting Europe")




> Entering new SimpleSequentialChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
You have to come up with location to shoot (along with a 20-50 word description)
for a new episode of the TV show "The Office" based on the theme.

Visiting Europe

ANSWER:

> Finished chain.
The Office crew heads to Paris, France for a business trip. While there, they get lost in the city of love, indulge in French cuisine, and experience cultural differences that lead to hilarious misunderstandings.

> Entering new LLMChain chain...
Prompt after formatting:
Generate a short dialogue between Jim and Pam
from the TV show "The Office" for a new episode based on the location

The Office crew heads to Paris, France for a business trip. While there, they get lost in the city of love, indulge in French cuisine, and experience cultural differences that lead to hilarious misunderstandings.

ANSWER:



In [57]:
conversation = response
print(conversation)

Jim: "I can't believe we're in Paris, Pam. This is amazing!"

Pam: "I know, right? I can't wait to try all the French food and see the Eiffel Tower."

Jim: "Speaking of food, do you know what escargot is?"

Pam: "Yeah, it's snails. Why?"

Jim: "I accidentally ordered it for lunch. I thought it was some kind of fancy pasta dish."

Pam: "Oh no, Jim. Did you eat it?"

Jim: "I tried one, but it was too slimy and gross. I don't think I can eat anything else now."

Pam: "Well, at least you tried it. And we still have plenty of time to explore the city."

Jim: "Yeah, but I think we should stick together. I don't want to get lost in this maze of streets."

Pam: "Agreed. Let's make sure we always have each other's backs."

Jim: "Always, Pam. Always."


# Summarization

Summarization covers three different chain types: stuff, map_reduce, and refine

### Stuffing
Stuffing is the simplest method, whereby you simply stuff all the related data into the prompt as context to pass to the language model. This is implemented in LangChain as the StuffDocumentsChain.

Pros: Only makes a single call to the LLM. When generating text, the LLM has access to all the data at once.

Cons: Most LLMs have a maximum context length, so for large documents (or many documents) this will not work, because the resulting prompt is longer than the model can accept.

The main downside of this method is that it only works on smaller pieces of data. Once you are working with many pieces of data, this approach is no longer feasible. The next two approaches are designed to help deal with that.
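
As a rough illustration of stuffing, the sketch below joins all documents into one context string and makes a single call. `stuff_documents` and `fake_llm` are hypothetical names; `fake_llm` only records the prompts it receives and stands in for a real model.

```python
from typing import Callable, List


def stuff_documents(docs: List[str], template: str, llm: Callable[[str], str]) -> str:
    """Join every document into one context string and make a single LLM call."""
    context = "\n\n".join(docs)
    return llm(template.format(text=context))


# Hypothetical stand-in for a real model; it only records the prompts it receives
calls: List[str] = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"summary of a {len(prompt)}-character prompt"


docs = ["Jim orders escargot by accident.", "Pam explains that escargot is snails."]
summary = stuff_documents(docs, "Summarize:\n\n{text}\n\nSummary:", fake_llm)
print(len(calls))  # one call, regardless of how many documents there are
```

The prompt grows with the total size of the documents, which is exactly why this approach runs into the context-length limit.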

### Map Reduce
This method involves running an initial prompt on each chunk of data (for summarization tasks, this could be a summary of that chunk; for question-answering tasks, it could be an answer based solely on that chunk). Then a different prompt is run to combine all the initial outputs. This is implemented in LangChain as the MapReduceDocumentsChain.

Pros: Can scale to larger documents (and more documents) than StuffDocumentsChain. The calls to the LLM on individual documents are independent and can therefore be parallelized.

Cons: Requires many more calls to the LLM than StuffDocumentsChain. Loses some information during the final combined call.
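
A minimal sketch of the map-reduce pattern, assuming a hypothetical `fake_llm` that returns numbered placeholder summaries instead of calling a model. `map_reduce` is an illustrative helper, not the library's implementation.

```python
from typing import Callable, List


def map_reduce(docs: List[str], map_template: str, combine_template: str,
               llm: Callable[[str], str]) -> str:
    """Summarize each document independently, then combine the partial results."""
    # Map step: one independent call per document (these could run in parallel)
    partials = [llm(map_template.format(text=doc)) for doc in docs]
    # Reduce step: one final call over the joined partial outputs
    return llm(combine_template.format(text="\n".join(partials)))


# Hypothetical stand-in for a real model; returns a numbered placeholder
calls: List[str] = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"summary#{len(calls)}"


docs = ["Scene in Scranton.", "Scene at the airport.", "Scene in London."]
result = map_reduce(docs, "Summarize: {text}", "Combine these summaries:\n{text}", fake_llm)
print(len(calls))  # len(docs) map calls plus one combine call
```

The final call only sees the partial summaries, not the original documents, which is where the information loss mentioned above comes from.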

### Refine
This method involves running an initial prompt on the first chunk of data, generating some output. For the remaining documents, that output is passed in, along with the next document, asking the LLM to refine the output based on the new document.

Pros: Can pull in more relevant context, and may be less lossy than MapReduceDocumentsChain.

Cons: Requires many more calls to the LLM than StuffDocumentsChain. The calls are also NOT independent, meaning they cannot be parallelized like those of MapReduceDocumentsChain. The result may also depend on the ordering of the documents.
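
The sequential nature of refine can be sketched as a loop in which each call receives the previous output. As before, `fake_llm` is a hypothetical stand-in that returns numbered placeholders, and `refine` is an illustrative helper rather than the library's code.

```python
from typing import Callable, List


def refine(docs: List[str], initial_template: str, refine_template: str,
           llm: Callable[[str], str]) -> str:
    """Build an answer from the first document, then refine it with each later one."""
    answer = llm(initial_template.format(text=docs[0]))
    for doc in docs[1:]:
        # Each call depends on the previous answer, so the calls cannot run in parallel
        answer = llm(refine_template.format(existing=answer, text=doc))
    return answer


# Hypothetical stand-in for a real model; returns a numbered placeholder
calls: List[str] = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"draft#{len(calls)}"


docs = ["Scene in Scranton.", "Scene at the airport.", "Scene in London."]
result = refine(
    docs,
    "Summarize: {text}",
    "Existing summary: {existing}\nRefine it using: {text}",
    fake_llm,
)
print(result)
```

Since every prompt after the first contains the previous draft, reordering `docs` changes what the model sees at each step.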

### Map-Rerank (not implemented for summarization)
This method involves running an initial prompt on each chunk of data that not only tries to complete the task but also gives a score for how certain it is of its answer. The responses are then ranked according to this score, and the one with the highest score is returned.

Pros: Similar pros to MapReduceDocumentsChain. Requires fewer calls than MapReduceDocumentsChain.

Cons: Cannot combine information between documents. This means it is most useful when you expect there to be a single simple answer in a single document.
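
A small sketch of the rerank idea: each document produces an (answer, score) pair, and only the best pair is kept. The scoring function here is a hypothetical word-overlap heuristic standing in for the model's self-reported confidence, and `map_rerank` is an illustrative helper.

```python
from typing import Callable, List, Tuple


def map_rerank(docs: List[str], question: str,
               answer_with_score: Callable[[str, str], Tuple[str, int]]) -> Tuple[str, int]:
    """Answer the question once per document and keep the highest-scoring answer."""
    candidates = [answer_with_score(question, doc) for doc in docs]
    return max(candidates, key=lambda pair: pair[1])


def fake_answer_with_score(question: str, doc: str) -> Tuple[str, int]:
    # Stand-in for an LLM call: the "answer" is the document itself and the
    # "confidence" is the number of document words that also occur in the question
    question_words = set(question.lower().split())
    score = sum(1 for word in doc.lower().split() if word in question_words)
    return doc, score


docs = [
    "Dwight packed a Swiss Army knife.",
    "The staff will visit London, Paris, and Berlin.",
    "Michael scrolls through social media.",
]
best_answer, best_score = map_rerank(docs, "which cities will the staff visit", fake_answer_with_score)
print(best_answer, best_score)
```

Only one candidate survives, so information spread across several documents is never merged.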

[Documentation](https://docs.langchain.com/docs/components/chains/index_related_chains)
[Example](https://python.langchain.com/en/latest/modules/chains/index_examples/summarize.html)

In [58]:
template = """
Write a concise bullet list summary of the conversation between Jim and Pam from the TV show "The Office":

{text}

Concise summary using markdown:"""

prompt = PromptTemplate(template=template, input_variables=["text"])
summary_chain = load_summarize_chain(
    chat_gpt, chain_type="stuff", verbose=True, prompt=prompt
)


In [59]:
docs = [Document(page_content=conversation)]
docs

[Document(page_content='Jim: "I can\'t believe we\'re in Paris, Pam. This is amazing!"\n\nPam: "I know, right? I can\'t wait to try all the French food and see the Eiffel Tower."\n\nJim: "Speaking of food, do you know what escargot is?"\n\nPam: "Yeah, it\'s snails. Why?"\n\nJim: "I accidentally ordered it for lunch. I thought it was some kind of fancy pasta dish."\n\nPam: "Oh no, Jim. Did you eat it?"\n\nJim: "I tried one, but it was too slimy and gross. I don\'t think I can eat anything else now."\n\nPam: "Well, at least you tried it. And we still have plenty of time to explore the city."\n\nJim: "Yeah, but I think we should stick together. I don\'t want to get lost in this maze of streets."\n\nPam: "Agreed. Let\'s make sure we always have each other\'s backs."\n\nJim: "Always, Pam. Always."', metadata={})]

In [60]:
summary_result = summary_chain.run(docs)



> Entering new StuffDocumentsChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
Write a concise bullet list summary of the conversation between Jim and Pam from the TV show "The Office":

Jim: "I can't believe we're in Paris, Pam. This is amazing!"

Pam: "I know, right? I can't wait to try all the French food and see the Eiffel Tower."

Jim: "Speaking of food, do you know what escargot is?"

Pam: "Yeah, it's snails. Why?"

Jim: "I accidentally ordered it for lunch. I thought it was some kind of fancy pasta dish."

Pam: "Oh no, Jim. Did you eat it?"

Jim: "I tried one, but it was too slimy and gross. I don't think I can eat anything else now."

Pam: "Well, at least you tried it. And we still have plenty of time to explore the city."

Jim: "Yeah, but I think we should stick together. I don't want to get lost in this maze of streets."

Pam: "Agreed. Let's make sure we always have each other's backs."

Jim: "Always, Pam. Always."

Con

In [61]:
print(summary_result)

- Jim and Pam are in Paris and excited to explore the city
- Jim accidentally ordered escargot for lunch, thinking it was a pasta dish
- He tried one but found it slimy and gross
- They plan to stick together to avoid getting lost in the city
- They agree to always have each other's backs.


# Question answering

Here we look at how to use LangChain for question answering over a list of documents. It covers four chain types: stuff, map_reduce, refine, and map_rerank.

[Example](https://python.langchain.com/en/latest/modules/chains/index_examples/question_answering.html)

In [65]:
template = """
You have to come up with a 200-300 word script for a new episode
of the TV show "The Office" based on the theme

{theme_suggestion}

ANSWER:
"""
prompt = PromptTemplate(input_variables=["theme_suggestion"], template=template)

script_chain = LLMChain(llm=chat_gpt, prompt=prompt, verbose=True)


In [66]:
script_response = script_chain("Visiting Europe")



> Entering new LLMChain chain...
Prompt after formatting:
You have to come up with a 200-300 word script for a new episode
of the TV show "The Office" based on the theme

Visiting Europe

ANSWER:

> Finished chain.


In [67]:
script = script_response["text"]
print(script)

Title: The Office Goes to Europe

INT. DUNDER MIFFLIN SCRANTON OFFICE - DAY

Michael Scott is sitting at his desk, scrolling through his social media feed. He stops at a picture of his ex-girlfriend, Jan, who is currently on a European vacation.

MICHAEL: (to the camera) Jan is in Europe? That's it. We're going to Europe.

CUT TO:

INT. CONFERENCE ROOM - DAY

Michael is addressing the staff.

MICHAEL: Attention, everyone. We are going to Europe! We'll be visiting our sister branches in London, Paris, and Berlin.

JIM: (to the camera) I'm excited to see Europe, but I'm also a little worried about Michael's planning skills.

CUT TO:

INT. AIRPORT - DAY

The staff is gathered at the airport, waiting for their flight.

DWIGHT: (to the camera) I've packed everything I need for survival in Europe. Including a Swiss Army knife, a map, and a book on European history.

PAM: (to Jim) Do you think Michael knows we need passports to get into Europe?

JIM: (shrugs) Who knows?

CUT TO:

INT. LONDON 

### Split the script into chunks

In [85]:
script_docs = [Document(page_content=script)]

text_splitter = CharacterTextSplitter(chunk_size=2048, chunk_overlap=32)
texts = text_splitter.split_documents(script_docs)
len(texts)

2
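
To give an intuition for what a character-based splitter does, here is a simplified sketch that splits on blank lines and greedily merges pieces until a chunk would exceed `chunk_size`. It deliberately ignores `chunk_overlap`, and `split_text` is a hypothetical helper, not the implementation used by `CharacterTextSplitter`.

```python
from typing import List


def split_text(text: str, chunk_size: int, separator: str = "\n\n") -> List[str]:
    """Greedily merge separator-delimited pieces into chunks of at most chunk_size."""
    chunks: List[str] = []
    current = ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # Keep growing the chunk (a single oversized piece is kept as-is)
            current = candidate
        else:
            # The chunk is full: store it and start a new one with this piece
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


# Six 39-character "scenes" separated by blank lines
sample = "\n\n".join(f"Scene {i}: " + "x" * 30 for i in range(6))
chunks = split_text(sample, chunk_size=100)
print(len(chunks))
```

Two scenes plus a separator take 80 characters and a third would push the chunk to 121, so each chunk ends up holding two scenes.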

In [72]:
texts[0]

Document(page_content="Title: The Office Goes to Europe\n\nINT. DUNDER MIFFLIN SCRANTON OFFICE - DAY\n\nMichael Scott is sitting at his desk, scrolling through his social media feed. He stops at a picture of his ex-girlfriend, Jan, who is currently on a European vacation.\n\nMICHAEL: (to the camera) Jan is in Europe? That's it. We're going to Europe.\n\nCUT TO:\n\nINT. CONFERENCE ROOM - DAY\n\nMichael is addressing the staff.\n\nMICHAEL: Attention, everyone. We are going to Europe! We'll be visiting our sister branches in London, Paris, and Berlin.\n\nJIM: (to the camera) I'm excited to see Europe, but I'm also a little worried about Michael's planning skills.\n\nCUT TO:\n\nINT. AIRPORT - DAY\n\nThe staff is gathered at the airport, waiting for their flight.\n\nDWIGHT: (to the camera) I've packed everything I need for survival in Europe. Including a Swiss Army knife, a map, and a book on European history.\n\nPAM: (to Jim) Do you think Michael knows we need passports to get into Europe?

### Create a database of embeddings

In [73]:
embeddings = OpenAIEmbeddings()

In [74]:
db = Chroma.from_documents(texts, embeddings)

Using embedded DuckDB without persistence: data will be transient
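
The vector store embeds each chunk so that the chunks most similar to a query can be looked up later. The sketch below illustrates that lookup with a toy bag-of-words "embedding" and cosine similarity. It is only an analogy: real OpenAI embeddings are dense vectors, and `embed`, `cosine`, and `top_k` are hypothetical helpers that do not reflect how Chroma works internally.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    # Toy "embedding": a word-count vector
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, texts: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Return the k texts most similar to the query, best first."""
    query_vector = embed(query)
    scored = sorted(((cosine(query_vector, embed(t)), t) for t in texts), reverse=True)
    return scored[:k]


texts = [
    "the staff visits branches in london paris and berlin",
    "dwight packed a swiss army knife",
    "michael scrolls through social media",
]
results = top_k("which branches does the staff visit in london", texts, k=2)
print(results[0][1])
```

The `k` argument plays the same role as the `"k"` entry in `search_kwargs` when a retriever is created from the store.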


### Create a question answering chain

In [77]:
qa_chain = RetrievalQA.from_chain_type(
    llm=chat_gpt, chain_type="stuff", retriever=db.as_retriever(search_kwargs={"k": 2}, verbose=True)
)

In [78]:
response = qa_chain("What is the location that The Office team is visiting?")
response

{'query': 'What is the location that The Office team is visiting?',
 'result': 'The Office team is visiting their sister branches in London, Paris, and Berlin.'}

In [79]:
print_response(response["result"])

The Office team is visiting their sister branches in London, Paris, and Berlin.


#### Let's test whether it recognizes that the answer is not in the text

In [80]:
response = qa_chain("Why the client doesn't want to deal with Michael?")
print_response(response["result"])

There is no information provided in the context that suggests the client doesn't want to deal with
Michael.


# Bash Chain

The LLMBashChain is a special chain that asks an LLM to turn a natural-language request into bash commands and then runs them. It is useful for combining shell commands with other chains.
[Documentation](https://python.langchain.com/en/latest/modules/chains/examples/llm_bash.html?highlight=bash%20chain)
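
The chain's logged output suggests a parse-then-run flow: the model replies with a fenced bash block, the commands are extracted from it, and they are then executed. A rough sketch of that flow follows, under the assumption that the reply always contains exactly one such block. `extract_bash_commands` and `run_commands` are hypothetical helpers, not LangChain functions.

```python
import subprocess
from typing import List

FENCE = "`" * 3  # built programmatically so this example contains no literal fence


def extract_bash_commands(llm_output: str) -> List[str]:
    """Return the non-empty lines inside the first fenced bash block."""
    lines = llm_output.strip().splitlines()
    try:
        start = lines.index(FENCE + "bash")
        end = lines.index(FENCE, start + 1)
    except ValueError:
        raise ValueError("no fenced bash block found in the model output") from None
    return [line for line in lines[start + 1:end] if line.strip()]


def run_commands(commands: List[str]) -> str:
    """Run the commands in a bash subprocess and return what they print."""
    completed = subprocess.run(
        ["bash", "-c", "\n".join(commands)],
        capture_output=True, text=True, check=True,
    )
    return completed.stdout


# Simulated model reply, mirroring the shape of the logged output
llm_output = "\n".join([FENCE + "bash", 'echo "That\'s what she said!"', FENCE])
commands = extract_bash_commands(llm_output)
print(commands)
```

`run_commands(commands)` is intentionally not called here; in this sketch it would execute whatever the model produced, so it is worth inspecting `commands` first.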

In [86]:
text = """
Please write a bash script that prints a single line that Michael G. Scott from "The Office" might say
"""

bash_chain = LLMBashChain(llm=OpenAI(temperature=0), verbose=True)

bash_chain.run(text)



> Entering new LLMBashChain chain...

Please write a bash script that prints a single line that Michael G. Scott from "The Office" might say

```bash
echo "That's what she said!"
```
['```bash', 'echo "That\'s what she said!"', '```']

Answer: That's what she said!

> Finished chain.


"That's what she said!\n"