# Applying LangChain in Business

### Environment Setup

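```python
# load API keys from .env
import os
from dotenv import load_dotenv
load_dotenv()
openai_api_key = os.environ.get('OPENAI_API_KEY')

# load language model; temperature=0 makes the output (nearly) deterministic
# note: OpenAI has since retired text-davinci-003; on a current account use a
# completion model such as gpt-3.5-turbo-instruct instead
from langchain.llms import OpenAI
llm = OpenAI(model_name="text-davinci-003", openai_api_key=openai_api_key, temperature=0)

# import langchain
# langchain.debug = False
```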

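### Prompt Engineering
#### Problems to solve before LLMs can be used in production
1. Making things up with a straight face, i.e. hallucinations
2. Nondeterministic results
3. No way to interact with the external environment
4. A limited number of tokens (context window); a pretrained model only has long-term memory, i.e. what it learned during pretraining

That said, **generally speaking, emergent abilities may appear once a model reaches somewhere in the 10B to 100B parameter range**.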

#### A few typical patterns
1. Zero shot: no reference examples
2. Few shot: a few reference examples
3. CoT (Chain of Thought): reference examples that also spell out the reasoning steps
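To make the difference between the three patterns concrete, here is a minimal sketch that only builds the prompt strings and never calls a model. The helpers `zero_shot`, `few_shot` and `chain_of_thought` are hypothetical, written for illustration, not LangChain APIs; the `Q:`/`A:` layout mirrors the examples used below.

```python
from typing import List, Tuple

# Each example is a (question, answer) pair; for CoT the answer also
# contains the reasoning steps that lead to the final result.
Example = Tuple[str, str]


def zero_shot(question: str) -> str:
    """Zero shot: the bare question, no reference examples."""
    return f"Q: {question}\nA:"


def few_shot(examples: List[Example], question: str) -> str:
    """Few shot: worked examples first, then the real question."""
    shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return f"{shots}\n{zero_shot(question)}"


def chain_of_thought(examples: List[Example], question: str) -> str:
    """CoT: same layout as few shot; only the example answers change (they show their steps)."""
    return few_shot(examples, question)


apples = ("The cafeteria had 23 apples. If they used 20 to make lunch "
          "and bought 6 more, how many apples do they have?")
tennis = ("Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
          "Each can has 3 tennis balls. How many tennis balls does he have now?")

print(zero_shot(apples))
print(few_shot([(tennis, "The answer is 11.")], apples))
print(chain_of_thought(
    [(tennis, "Roger started with 5 balls. 2 cans of 3 balls each is 6 balls. "
              "5 + 6 = 11. The answer is 11.")],
    apples,
))
```

The sketch makes one design point visible: CoT does not change the prompt structure, it changes what the examples contain.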

##### Zero shot
Zero shot: the prompt contains no reference examples.

```python
# Reasoning
llm("What day comes after Friday?")
```

'\n\nSaturday'

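```python
# Classification
# (on vacation, and the boss still wants us to make slides)
# The prompt asks: "Classify the sentiment of the given sentence as neutral,
# negative, or positive. Text: I think this holiday was OK. Sentiment:"
llm("""
对给定的句子进行情感分类，分为中性、负面、正面三类。
文本：我觉得这个假期还行。
情感分类：
""")
```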

'中性'  ("neutral")

```python
# Simple question
# The prompt asks: "How would I get from Shanghai to Phuket?"
llm("我可以如何从上海到普吉岛")
```

'？\n\n从上海到普吉岛最常见的交通方式是乘坐飞机。您可以从上海浦东国际机场乘坐国内航班或国际航班前往普吉岛。也可以从上海乘坐高铁前往曼谷，然后再乘坐船前往普吉岛。'

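It does return an answer (fly from Shanghai Pudong International Airport, or take high-speed rail to Bangkok and then a boat to Phuket), but probably not the kind of answer most of us were hoping for.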

```python
# More complex reasoning
llm("The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?")
```

'\n\nThey have 29 apples.'

That answer is simply wrong: 23 - 20 + 6 = 9, not 29.
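For arithmetic prompts like this one, a cheap way to catch such errors is to compute the expected value in code and compare it with the number the model returns. A minimal sketch; `last_integer` is a hypothetical helper, and taking the last integer in the text is a heuristic that can misfire on verbose answers.

```python
import re
from typing import Optional


def last_integer(text: str) -> Optional[int]:
    """Pull the last integer out of a free-form model answer (heuristic)."""
    matches = re.findall(r"-?\d+", text)
    return int(matches[-1]) if matches else None


expected = 23 - 20 + 6                          # 9
zero_shot_output = "\n\nThey have 29 apples."  # zero-shot output recorded above
few_shot_output = "A: The answer is 9."        # few-shot output recorded further below

print(last_integer(zero_shot_output) == expected)  # False
print(last_integer(few_shot_output) == expected)   # True
```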

```python
# More complex calculation
llm("""
The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1. 
The answer is:
""")
```

'Yes, the odd numbers in this group add up to an even number. The sum of the odd numbers is 129, which is an even number.'

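This one is even further off. The odd numbers are 15, 5, 13, 7 and 1, which sum to 41, an odd number, so the statement is false; the model gets the sum wrong and even misjudges the parity of its own (odd) total of 129.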

##### Few shot
Few shot: the prompt includes reference examples.

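```python
# Few shot
# Note: the labels on the first two examples are flipped ("awesome" -> Negative,
# "bad" -> Positive), yet the model still labels the last line correctly.
# The third line is Chinese for "Wow, that movie was great!"
llm("""
This is awesome! // Negative
This is bad! // Positive
哇那部电影太棒了！ // Positive
What a horrible show! //
""")
```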

'Negative'

```python
# Few shot: one worked example, then the real question
llm("""
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
A: The answer is 11.
Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?
""")
```

'A: The answer is 9.'

```python
# Few shot
llm("""
The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
A: The answer is False.
The odd numbers in this group add up to an even number: 17,  10, 19, 4, 8, 12, 24.
A: The answer is True.
The odd numbers in this group add up to an even number: 16,  11, 14, 4, 8, 13, 24.
A: The answer is True.
The odd numbers in this group add up to an even number: 17,  9, 10, 12, 13, 4, 2.
A: The answer is False.
The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1. 
A:
""")
```

'The answer is True.'

Still wrong! The odd numbers sum to 41, so the correct answer is False.
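Hand-written few-shot labels are easy to get wrong, so one option is to generate the examples from code that knows the right answer. A minimal sketch with hypothetical helpers `label` and `build_few_shot`; it builds the same kind of prompt as the previous cell (spacing aside) and confirms that all four example labels there are correct, so the mistake comes from the model, not from mislabeled examples.

```python
from typing import List

QUESTION = "The odd numbers in this group add up to an even number: {}."


def label(numbers: List[int]) -> str:
    """'True' if the odd numbers sum to an even number, else 'False'."""
    return str(sum(n for n in numbers if n % 2 == 1) % 2 == 0)


def build_few_shot(examples: List[List[int]], query: List[int]) -> str:
    """Labelled examples first, then the unanswered query."""
    lines = []
    for nums in examples:
        lines.append(QUESTION.format(", ".join(map(str, nums))))
        lines.append(f"A: The answer is {label(nums)}.")
    lines.append(QUESTION.format(", ".join(map(str, query))))
    lines.append("A:")
    return "\n".join(lines)


examples = [
    [4, 8, 9, 15, 12, 2, 1],
    [17, 10, 19, 4, 8, 12, 24],
    [16, 11, 14, 4, 8, 13, 24],
    [17, 9, 10, 12, 13, 4, 2],
]
prompt = build_few_shot(examples, [15, 32, 5, 13, 82, 7, 1])
print(prompt)
print("expected:", label([15, 32, 5, 13, 82, 7, 1]))  # expected: False
```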

##### Chain of Thought (CoT)

```python
# CoT
llm("""
The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
A: Adding all the odd numbers (9, 15, 1) gives 25. The answer is False.
The odd numbers in this group add up to an even number: 17,  10, 19, 4, 8, 12, 24.
A: Adding all the odd numbers (17, 19) gives 36. The answer is True.
The odd numbers in this group add up to an even number: 16,  11, 14, 4, 8, 13, 24.
A: Adding all the odd numbers (11, 13) gives 24. The answer is True.
The odd numbers in this group add up to an even number: 17,  9, 10, 12, 13, 4, 2.
A: Adding all the odd numbers (17, 9, 13) gives 39. The answer is False.
The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1. 
A:
""")
```

'Adding all the odd numbers (15, 5, 13, 7, 1) gives 41. The answer is False.'
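The CoT examples differ from the few-shot ones only in that each answer first lists the odd numbers and their sum, then gives the verdict. A sketch that generates such an answer line in exactly the format used above; `cot_answer` is a hypothetical helper.

```python
from typing import List


def cot_answer(numbers: List[int]) -> str:
    """Build a CoT-style answer: show the intermediate step, then the verdict."""
    odds = [n for n in numbers if n % 2 == 1]
    total = sum(odds)
    verdict = total % 2 == 0
    return (f"A: Adding all the odd numbers ({', '.join(map(str, odds))}) "
            f"gives {total}. The answer is {verdict}.")


print(cot_answer([4, 8, 9, 15, 12, 2, 1]))
# A: Adding all the odd numbers (9, 15, 1) gives 25. The answer is False.
```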

```python
# Zero shot with CoT
llm("""
The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1. 
please think step by step then tell me the answer.
""")
```

'\nStep 1: Add the odd numbers: 15 + 5 + 13 + 7 + 1 = 41\n\nStep 2: 41 is an odd number.'

Now back to the question of how to get to Phuket:

```python
# Zero shot with CoT
# How would I get from Shanghai to Phuket?
steps = llm("Explain step by step. How would I get from Shanghai to Phuket?")
print(steps)
```



1. Book a flight from Shanghai to Phuket. You can do this online or through a travel agent.

2. Check the visa requirements for entering Thailand. Depending on your nationality, you may need to apply for a visa in advance.

3. Pack your bags and make sure you have all the necessary documents for your trip.

4. Arrive at the airport in Shanghai at least two hours before your flight.

5. Check in for your flight and go through security.

6. Board your flight and enjoy the journey to Phuket.

7. Upon arrival in Phuket, go through immigration and customs.

8. Collect your luggage and make your way to your accommodation.


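#### `ReAct` is covered later, at the end of the LangChain architecture walkthrough
A whole set of such patterns has been distilled for prompts, much like design patterns in software engineering.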

### LangChain Core Concepts

#### Prompt Template

```python
from langchain import PromptTemplate

# Notice {location} below: it is a placeholder filled in later by format()
template = """
I really want to travel from Shanghai to {location}. What should I do there?

Respond in one short sentence
"""

prompt = PromptTemplate(
    input_variables=["location"],
    template=template,
)

final_prompt = prompt.format(location='Phuket')

print(f"Final Prompt: {final_prompt}")
print("-----------")
print(f"LLM Output: {llm(final_prompt)}")
```

Final Prompt: 
I really want to travel from Shanghai to Phuket. What should I do there?

Respond in one short sentence

-----------
LLM Output: 
Explore the beaches, visit the temples, and enjoy the local cuisine.
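Conceptually, a `PromptTemplate` is a Python format string plus a declared list of input variables. The stdlib sketch below illustrates that idea, using `string.Formatter().parse` to find the `{...}` placeholders and check them against `input_variables`. The `SimplePromptTemplate` class is hypothetical and omits LangChain features such as partial variables and other template formats.

```python
from string import Formatter
from typing import List


class SimplePromptTemplate:
    """Hypothetical, minimal stand-in for the idea behind PromptTemplate."""

    def __init__(self, input_variables: List[str], template: str) -> None:
        # Collect every {placeholder} name that appears in the template.
        found = {name for _, name, _, _ in Formatter().parse(template) if name}
        if found != set(input_variables):
            raise ValueError(f"template uses {sorted(found)}, declared {sorted(input_variables)}")
        self.input_variables = input_variables
        self.template = template

    def format(self, **kwargs: str) -> str:
        missing = set(self.input_variables) - set(kwargs)
        if missing:
            raise KeyError(f"missing values for {sorted(missing)}")
        return self.template.format(**kwargs)


template = """
I really want to travel from Shanghai to {location}. What should I do there?

Respond in one short sentence
"""

prompt = SimplePromptTemplate(input_variables=["location"], template=template)
print(prompt.format(location="Phuket"))
```

Validating the placeholders up front means a typo such as `{locaton}` fails when the template is built, not later when the prompt reaches the model.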


#### Output Parsers

```python
prompt = """
Generate a list of three made-up book titles along \
with their authors and genres.
Provide them in JSON format with the following keys:
book_id, title, author, genre.
"""

llm_output = llm(prompt)
print(llm_output)
```


```json
[
    {
        "book_id": 1,
        "title": "The Lost City of Atlantis",
        "author": "John Smith",
        "genre": "Fantasy"
    },
    {
        "book_id": 2,
        "title": "The Secret of the Golden Pyramid",
        "author": "Jane Doe",
        "genre": "Mystery"
    },
    {
        "book_id": 3,
        "title": "The Curse of the Mummy's Tomb",
        "author": "James Johnson",
        "genre": "Horror"
    }
]
```


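This is handy for generating test data, but note that it relies purely on the model following the instructions; it is not a capability provided by the LangChain framework.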
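Because nothing enforces the format, it is worth validating such output before using it as test data. A minimal stdlib sketch, assuming the model returned bare JSON as in the run above (it may not always do so); `parse_books` is a hypothetical helper.

```python
import json
from typing import Any, Dict, List

REQUIRED_KEYS = {"book_id", "title", "author", "genre"}


def parse_books(llm_output: str) -> List[Dict[str, Any]]:
    """Parse the model's JSON and check that every record has the requested keys."""
    books = json.loads(llm_output)  # raises json.JSONDecodeError on malformed output
    if not isinstance(books, list):
        raise ValueError("expected a JSON array")
    for book in books:
        if not isinstance(book, dict):
            raise ValueError(f"expected an object, got {book!r}")
        missing = REQUIRED_KEYS - set(book)
        if missing:
            raise ValueError(f"record {book!r} is missing {sorted(missing)}")
    return books


# Abbreviated copy of the model output shown above (two of the three records).
llm_output = """
[
    {"book_id": 1, "title": "The Lost City of Atlantis", "author": "John Smith", "genre": "Fantasy"},
    {"book_id": 2, "title": "The Secret of the Golden Pyramid", "author": "Jane Doe", "genre": "Mystery"}
]
"""

for book in parse_books(llm_output):
    print(book["book_id"], book["title"], "-", book["genre"])
```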

```python
from langchain.output_parsers import StructuredOutputParser, ResponseSchema

# How you would like your response structured. This is basically a fancy prompt template
response_schemas = [
    ResponseSchema(name="bad_string", description="This a poorly formatted user input string"),
    ResponseSchema(name="good_string", description="This is your response, a reformatted response")
]

# How you would like to parse your output
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

# See the prompt template you created for formatting
format_instructions = output_parser.get_format_instructions()
print(format_instructions)
```

The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "\`\`\`json" and "\`\`\`":

```json
{
	"bad_string": string  // This a poorly formatted user input string
	"good_string": string  // This is your response, a reformatted response
}
```


```python
template = """
You will be given a poorly formatted string from a user.
Reformat it and make sure all the words are spelled correctly

{format_instructions}

% USER INPUT:
{user_input}

YOUR RESPONSE:
"""

prompt = PromptTemplate(
    input_variables=["user_input"],
    partial_variables={"format_instructions": format_instructions},
    template=template
)

prompt_value = prompt.format(user_input="welcom to Shanhai!")

print(prompt_value)

llm_output = llm(prompt_value)
print(llm_output)
```


You will be given a poorly formatted string from a user.
Reformat it and make sure all the words are spelled correctly

The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "\`\`\`json" and "\`\`\`":

```json
{
	"bad_string": string  // This a poorly formatted user input string
	"good_string": string  // This is your response, a reformatted response
}
```

% USER INPUT:
welcom to Shanhai!

YOUR RESPONSE:

```json
{
	"bad_string": "welcom to Shanhai!",
	"good_string": "Welcome to Shanghai!"
}
```
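LangChain's `output_parser.parse(llm_output)` turns a reply like the one above into a Python dict. The stdlib sketch below illustrates roughly what such parsing involves: find the fenced json block, `json.loads` it, and check the schema keys. `parse_fenced_json` is a hypothetical helper written for illustration, not the library's implementation.

```python
import json
import re
from typing import Dict, List

FENCE = "`" * 3  # built at runtime so this example stays a single markdown block


def parse_fenced_json(text: str, required: List[str]) -> Dict[str, str]:
    """Extract the first json-fenced block (or the raw text) and check its keys."""
    pattern = re.escape(FENCE) + r"json\s*(.*?)" + re.escape(FENCE)
    match = re.search(pattern, text, re.DOTALL)
    payload = match.group(1) if match else text  # fall back to the raw text
    data = json.loads(payload)
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data


llm_output = "\n".join([
    FENCE + "json",
    "{",
    '\t"bad_string": "welcom to Shanhai!",',
    '\t"good_string": "Welcome to Shanghai!"',
    "}",
    FENCE,
])
result = parse_fenced_json(llm_output, ["bad_string", "good_string"])
print(result["good_string"])  # Welcome to Shanghai!
```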


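#### Accessing a Database
This uses a Chain: `SQLDatabaseChain` has the LLM translate a question into SQL, runs the query, and has the LLM answer from the result.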

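```python
# In later LangChain releases SQLDatabaseChain moved to the separate
# langchain_experimental package: from langchain_experimental.sql import SQLDatabaseChain
from langchain import SQLDatabase, SQLDatabaseChain

sqlite_db_path = 'San_Francisco_Trees.db'
db = SQLDatabase.from_uri(f"sqlite:///{sqlite_db_path}")

db_chain = SQLDatabaseChain(llm=llm, database=db, verbose=True)

db_chain.run("How many Species of trees are there in San Francisco?")
```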



```text
> Entering new SQLDatabaseChain chain...
How many Species of trees are there in San Francisco?
SQLQuery:SELECT COUNT(DISTINCT "qSpecies") FROM "SFTrees";
SQLResult: [(578,)]
Answer:There are 578 Species of trees in San Francisco.
> Finished chain.
```


'There are 578 Species of trees in San Francisco.'
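The verbose log shows the chain's three steps: the LLM writes a SQL query from the question and the table schema, the query is executed, and the LLM phrases the result as an answer. The middle step is plain SQL, so the stdlib sketch below runs the generated query against a tiny in-memory table. The rows are made-up toy data (the real `San_Francisco_Trees.db` is not reproduced here); only the table and column names come from the log above.

```python
import sqlite3

# Toy stand-in for San_Francisco_Trees.db: same table/column names as in the
# log above, but three made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "SFTrees" ("qSpecies" TEXT)')
conn.executemany(
    'INSERT INTO "SFTrees" VALUES (?)',
    [("Species A",), ("Species B",), ("Species A",)],
)

# The query the LLM generated in the run above.
generated_sql = 'SELECT COUNT(DISTINCT "qSpecies") FROM "SFTrees";'
result = conn.execute(generated_sql).fetchall()
print(result)  # [(2,)], the same shape as the SQLResult line in the log
```

Since the SQL is model-generated, it is prudent to execute it on a read-only connection, e.g. `sqlite3.connect("file:San_Francisco_Trees.db?mode=ro", uri=True)`.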

#### Loading Documents and Building an Index

##### Text Splitters

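```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

# This is a long document we can split up
# (koo.txt is a Chinese conversation transcript; assumed to be UTF-8 encoded)
with open('koo.txt', encoding='utf-8') as f:
    pg_work = f.read()

# the whole file is still a single document at this point
print(f"You have {len([pg_work])} document")
```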

You have 1 document


```python
text_splitter = RecursiveCharacterTextSplitter(
    # Set a really small chunk size, just to show.
    chunk_size=150,
    chunk_overlap=20,
)

texts = text_splitter.create_documents([pg_work])

print(f"You have {len(texts)} documents")
```

You have 13 documents
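To see what `chunk_size` and `chunk_overlap` mean, here is a deliberately simplified fixed-window splitter: each chunk holds at most `chunk_size` characters and repeats the last `chunk_overlap` characters of the previous chunk. This is a sketch, not the library's algorithm: `RecursiveCharacterTextSplitter` additionally tries to cut at separators (by default paragraph breaks, then line breaks, then spaces) before falling back to raw characters, so its chunk count differs.

```python
from typing import List


def split_fixed(text: str, chunk_size: int = 150, chunk_overlap: int = 20) -> List[str]:
    """Naive fixed-window split with overlap (no separator handling)."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


chunks = split_fixed("0123456789" * 40)  # 400 characters
print([len(c) for c in chunks])  # [150, 150, 140]
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of some duplicated tokens.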


```python
print("Preview:")
print(texts[0].page_content, "\n")
print(texts[1].page_content)
```

```text
Preview:
就是看到没有?
今天上午刚办的。
"2020年7月11号69000的额度,我跟这个姐姐办的,他是开美容店的。"
"69000的额度,他也在用戒备。"
"对,然后觉得我们这边利息低,直接把我们的取来把借呗还了就不用借给了。"
嗯还一下的话就是没有借呗。
"现在没有,以前你们做网上贷的。" 

"现在没有,以前你们做网上贷的。"
"网上贷的话利率是大概多少?都帮您看的,也没有"
我有网上的。
"但是你的利息要3块5也比我们这个贵,那像你现在还是信用卡用的多一点,嘛"
信用卡的话利息也。
"利息也高,还有个手续费,不管。"
我帮你看涨分能不能涨。
你信用卡最高一个额度是多少?
高了。
```


##### Summarization Chain

```python
from langchain.chains.summarize import load_summarize_chain

# map_reduce: summarize each chunk separately, then summarize the partial summaries
chain = load_summarize_chain(llm, chain_type="map_reduce", verbose=False)
chain.run(texts)
```

'\n\nThe speaker is discussing the interest rates of online loans and credit cards, and offering a credit card with a maximum limit of 730 points and no fees. They then discuss the security of the investment, the speed of uploading files, and the sales situation, and suggest that there is no need to worry. Finally, they state that a minimum income of 15000 is required to ensure a stable income. They also suggest that two emergency contacts should be chosen.'
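`chain_type="map_reduce"` means: summarize every chunk independently (map), then summarize the joined partial summaries (reduce), so no single LLM call needs the whole document in its context window. A stdlib sketch of that control flow; `fake_summarize` is a hypothetical stub that keeps the first sentence, whereas in the real chain each call is an LLM prompt, and the two chunks are made-up English text.

```python
from typing import Callable, List

calls: List[str] = []  # records every input sent to the stub


def fake_summarize(text: str) -> str:
    """Stub 'LLM' for illustration only: keep the first sentence of the input."""
    calls.append(text)
    return text.strip().split(".")[0] + "."


def map_reduce_summarize(chunks: List[str], summarize: Callable[[str], str]) -> str:
    """Map: summarize each chunk on its own. Reduce: summarize the joined results."""
    partial = [summarize(chunk) for chunk in chunks]  # map step, one call per chunk
    return summarize("\n".join(partial))              # reduce step, one final call


chunks = [
    "Alice asks about loan rates. She compares them with her credit card.",
    "Bob explains the fees. He also checks her credit limit.",
]
print(map_reduce_summarize(chunks, fake_summarize))  # Alice asks about loan rates.
print(len(calls))  # 3: two map calls plus one reduce call
```

The map calls are independent, so they can run in parallel; the price is N + 1 LLM calls instead of one, compared with the `"stuff"` chain type that puts everything into a single prompt.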

##### Retrievers

```python
from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings

# The LangChain component we'll use to get the documents
from langchain.chains import RetrievalQA

# Get embedding engine ready
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

# Embed your texts and build an in-memory FAISS index
docsearch = FAISS.from_documents(texts, embeddings)

# "stuff": put the retrieved chunks directly into the prompt
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=docsearch.as_retriever())
```

```python
# "What is the user's income?"
query = "用户的收入是多少?"
qa.run(query)
```

' 月收入的话是1万。'  ("Monthly income is 10,000.")

```python
# "Which roles appear in this conversation?"
query = "这段对话里有哪几个角色?"
qa.run(query)
```

' 这段对话里有两个角色：一个是提问者，另一个是回答者。'  ("There are two roles in this conversation: one asks the questions, the other answers them.")
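`RetrievalQA` with `chain_type="stuff"` first retrieves the chunks whose embeddings are closest to the question's embedding, then stuffs them into a single prompt for the LLM. The sketch below mimics that flow with a toy bag-of-words "embedding" and cosine similarity in place of `OpenAIEmbeddings` and FAISS; the documents, the `embed`/`cosine`/`retrieve` helpers and the prompt wording are all hypothetical.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy embedding: word counts (a real system uses a learned dense vector)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: List[str], k: int = 2) -> List[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


docs = [
    "my monthly income is about ten thousand",
    "the credit card limit is high",
    "the loan interest rate is lower than the credit card",
]
question = "what is the monthly income"
context = retrieve(question, docs, k=1)
# "stuff": put the retrieved chunks directly into one prompt
prompt = ("Use the context to answer.\n\nContext:\n" + "\n".join(context)
          + f"\n\nQuestion: {question}\nAnswer:")
print(context)  # ['my monthly income is about ten thousand']
```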

### Agent

#### Manual ReAct

![diagram](https://raw.githubusercontent.com/quboqin/images/main/blogs/picturesdiagram.png)

```python
question = "How old is the president of the United States?"

# Few-shot ReAct prompt: five worked Thought/Action/Observation traces, then the new question
manual_react = f"""Question: What is the elevation range for the area that the eastern sector of the Colorado orogeny extends into?
Thought: I need to search Colorado orogeny, find the area that the eastern sector of the Colorado orogeny extends into, then find the elevation range of the area.
Action: Search[Colorado orogeny]
Observation: The Colorado orogeny was an episode of mountain building (an orogeny) in Colorado and surrounding areas.
Thought: It does not mention the eastern sector. So I need to look up eastern sector.
Action: Lookup[eastern sector]
Observation: (Result 1 / 1) The eastern sector extends into the High Plains and is called the Central Plains orogeny.
Thought: The eastern sector of Colorado orogeny extends into the High Plains. So I need to search High Plains and find its elevation range.
Action: Search[High Plains]
Observation: High Plains refers to one of two distinct land regions
Thought: I need to instead search High Plains (United States).
Action: Search[High Plains (United States)]
Observation: The High Plains are a subregion of the Great Plains. From east to west, the High Plains rise in elevation from around 1,800 to 7,000 ft (550 to 2,130 m).[3]
Thought: High Plains rise in elevation from around 1,800 to 7,000 ft, so the answer is 1,800 to 7,000 ft.
Action: Finish[1,800 to 7,000 ft]

Question: Musician and satirist Allie Goertz wrote a song about the "The Simpsons" character Milhouse, who Matt Groening named after who?
Thought: The question simplifies to "The Simpsons" character Milhouse is named after who. I only need to search Milhouse and find who it is named after.
Action: Search[Milhouse]
Observation: Milhouse Mussolini Van Houten is a recurring character in the Fox animated television series The Simpsons voiced by Pamela Hayden and created by Matt Groening.
Thought: The paragraph does not tell who Milhouse is named after, maybe I can look up "named after".
Action: Lookup[named after]
Observation: (Result 1 / 1) Milhouse was named after U.S. president Richard Nixon, whose middle name was Milhous.
Thought: Milhouse was named after U.S. president Richard Nixon, so the answer is Richard Nixon.
Action: Finish[Richard Nixon]

Question: Which documentary is about Finnish rock groups, Adam Clayton Powell or The Saimaa Gesture?
Thought: I need to search Adam Clayton Powell and The Saimaa Gesture, and find which documentary is about Finnish rock groups.
Action: Search[Adam Clayton Powell]
Observation: Could not find [Adam Clayton Powell]. Similar: [’Adam Clayton Powell III’, ’Seventh Avenue (Manhattan)’, ’Adam Clayton Powell Jr. State Office Building’, ’Isabel Washington Powell’, ’Adam Powell’, ’Adam Clayton Powell (film)’, ’Giancarlo Esposito’].
Thought: To find the documentary, I can search Adam Clayton Powell (film).
Action: Search[Adam Clayton Powell (film)]
Observation: Adam Clayton Powell is a 1989 American documentary film directed by Richard Kilberg. The film is about the rise and fall of influential African-American politician Adam Clayton Powell Jr.[3][4] It was later aired as part of the PBS series The American Experience.
Thought: Adam Clayton Powell (film) is a documentary about an African-American politician, not Finnish rock groups. So the documentary about Finnish rock groups must instead be The Saimaa Gesture.
Action: Finish[The Saimaa Gesture]

Question: What profession does Nicholas Ray and Elia Kazan have in common?
Thought: I need to search Nicholas Ray and Elia Kazan, find their professions, then find the profession they have in common.
Action: Search[Nicholas Ray]
Observation: Nicholas Ray (born Raymond Nicholas Kienzle Jr., August 7, 1911 - June 16, 1979) was an American film director, screenwriter, and actor best known for the 1955 film Rebel Without a Cause.
Thought: Professions of Nicholas Ray are director, screenwriter, and actor. I need to search Elia Kazan next and find his professions.
Action: Search[Elia Kazan]
Observation: Elia Kazan was an American film and theatre director, producer, screenwriter and actor.
Thought: Professions of Elia Kazan are director, producer, screenwriter, and actor. So profession Nicholas Ray and Elia Kazan have in common is director, screenwriter, and actor.
Action: Finish[director, screenwriter, actor]

Question: Which magazine was started first Arthur’s Magazine or First for Women?
Thought: I need to search Arthur’s Magazine and First for Women, and find which was started first.
Action: Search[Arthur’s Magazine]
Observation: Arthur’s Magazine (1844-1846) was an American literary periodical published in Philadelphia in the 19th century.
Thought: Arthur’s Magazine was started in 1844. I need to search First for Women next.
Action: Search[First for Women]
Observation: First for Women is a woman’s magazine published by Bauer Media Group in the USA.[1] The magazine was started in 1989.
Thought: First for Women was started in 1989. 1844 (Arthur’s Magazine) < 1989 (First for Women), so Arthur’s Magazine was started first.
Action: Finish[Arthur’s Magazine]

Question:{question}"""

print(llm(manual_react))
```


```text
Thought: I need to search the president of the United States, find their age, then answer the question.
Action: Search[president of the United States]
Observation: Joe Biden is the 46th and current president of the United States.
Thought: Joe Biden is the president of the United States. I need to search Joe Biden and find his age.
Action: Search[Joe Biden]
Observation: Joseph Robinette Biden Jr. (born November 20, 1942) is an American politician who is the 46th and current president of the United States.
Thought: Joe Biden was born in 1942, so he is 78 years old.
Action: Finish[78 years old]
```
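In the run above the model wrote the `Observation:` lines itself: nothing was actually searched, which is also why its final answer is stale (someone born on November 20, 1942 turned 80 in November 2022). A ReAct runtime closes this gap with a loop: stop generation before `Observation:`, parse the `Action: Tool[input]` line, run the real tool, append the real observation and call the model again. A minimal sketch of that loop; `fake_llm` is a scripted stand-in for the model and `SEARCH_RESULTS` is a hypothetical stand-in for a search tool, so the only real logic here is the parsing and the loop.

```python
import re
from typing import Callable, Dict, Tuple

ACTION_RE = re.compile(r"Action: (\w+)\[(.*)\]")


def parse_action(text: str) -> Tuple[str, str]:
    """Return (tool, argument) from the last 'Action: Tool[arg]' line."""
    matches = ACTION_RE.findall(text)
    if not matches:
        raise ValueError("no Action found")
    return matches[-1]


def react_loop(question: str, llm: Callable[[str], str],
               tools: Dict[str, Callable[[str], str]], max_steps: int = 5) -> str:
    scratchpad = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(scratchpad)  # model writes Thought + Action, stops before Observation
        scratchpad += step + "\n"
        tool, arg = parse_action(step)
        if tool == "Finish":
            return arg
        observation = tools[tool](arg)  # the tool, not the model, supplies this
        scratchpad += f"Observation: {observation}\n"
    raise RuntimeError("no answer within max_steps")


# Scripted stand-ins for illustration only.
SEARCH_RESULTS = {"Apple Inc.": "Apple Inc. was founded on April 1, 1976."}
scripted = iter([
    "Thought: I need to find when Apple was founded.\nAction: Search[Apple Inc.]",
    "Thought: 2023 - 1976 = 47.\nAction: Finish[47 years old]",
])


def fake_llm(prompt: str) -> str:
    return next(scripted)


print(react_loop("How old is the company that makes the iPhone? The year is 2023",
                 fake_llm, {"Search": lambda q: SEARCH_RESULTS.get(q, "No result")}))
# 47 years old
```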


#### ReAct
![image-20230406213322739](https://raw.githubusercontent.com/quboqin/images/main/blogs/picturesimage-20230406213322739.png)

```python
import langchain

# print every chain/LLM call with its inputs (produces the log below)
langchain.debug = True

from langchain import OpenAI, Wikipedia
from langchain.agents import initialize_agent, Tool
from langchain.agents import AgentType
from langchain.agents.react.base import DocstoreExplorer
```

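```python
# the Wikipedia docstore needs the `wikipedia` package (pip install wikipedia)
docstore = DocstoreExplorer(Wikipedia())

# REACT_DOCSTORE expects exactly two tools, named Search and Lookup
tools = [
    Tool(
        name="Search",
        func=docstore.search,
        description="useful for when you need to ask with search"
    ),
    Tool(
        name="Lookup",
        func=docstore.lookup,
        description="useful for when you need to ask with lookup"
    )
]

# same settings as the llm created at the top; the API key is read from OPENAI_API_KEY
llm = OpenAI(temperature=0, model_name="text-davinci-003")

react = initialize_agent(tools, llm, agent=AgentType.REACT_DOCSTORE, verbose=True)
```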

```python
question = "How old is the company that makes the iPhone? The year is 2023"
react.run(question)
```

[32;1m[1;3m[chain/start][0m [1m[1:RunTypeEnum.chain:AgentExecutor] Entering Chain run with input:
[0m{
  "input": "How old is the company that makes the iPhone? The year is 2023"
}
[32;1m[1;3m[chain/start][0m [1m[1:RunTypeEnum.chain:AgentExecutor > 2:RunTypeEnum.chain:LLMChain] Entering Chain run with input:
[0m{
  "input": "How old is the company that makes the iPhone? The year is 2023",
  "agent_scratchpad": "",
  "stop": [
    "\nObservation:"
  ]
}
[32;1m[1;3m[llm/start][0m [1m[1:RunTypeEnum.chain:AgentExecutor > 2:RunTypeEnum.chain:LLMChain > 3:RunTypeEnum.llm:OpenAI] Entering LLM run with input:
[0m{
  "prompts": [
    "Question: What is the elevation range for the area that the eastern sector of the Colorado orogeny extends into?\nThought: I need to search Colorado orogeny, find the area that the eastern sector of the Colorado orogeny extends into, then find the elevation range of the area.\nAction: Search[Colorado orogeny]\nObservation: The Colorado orogeny was a

'47 years old'
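The debug log shows the executor passing `"stop": ["\nObservation:"]` to the LLM along with an `agent_scratchpad` that grows with each step. The stop sequence is what keeps the model from inventing its own observations, as it did in the manual ReAct run. The OpenAI completion API applies `stop` on its side; the sketch below shows the equivalent client-side truncation, as one might use with a model that lacks stop-sequence support. `apply_stop` is a hypothetical helper and the `raw` completion is made up.

```python
from typing import List


def apply_stop(text: str, stop: List[str]) -> str:
    """Cut the completion at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for s in stop:
        idx = text.find(s)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


# Made-up completion in which the model starts inventing its own observation.
raw = ("Thought: I need to search Apple Inc. and find when it was founded.\n"
       "Action: Search[Apple Inc.]\n"
       "Observation: (invented by the model) Apple was founded in 1980.")
print(apply_stop(raw, ["\nObservation:"]))
```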

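### Wrap up
Open problems at the moment:
1. ReAct consumes a lot of tokens and calls the LLM multiple times.
   For one proposed remedy, see [The new ReWOO framework targets the redundancy of Auto-GPT and LangChain agents and proposes a lightweight LLM-tool interaction paradigm (in Chinese)](https://mp.weixin.qq.com/s/8cEBOwUyG0zGlC74IuFNeg)
2. When building a knowledge base, our PRD documents contain many valuable diagrams, so the model needs multimodal capabilities.
3. Given intellectual-property concerns, can a privately deployed LLaMA match OpenAI's capabilities?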

### References
1. [ReAct: Synergizing Reasoning and Acting in Language Models](https://react-lm.github.io/)
2. [Prompt Engineering Guide](https://www.promptingguide.ai/)
3. [ChatGPT Study Notes (in Chinese)](https://wqw547243068.github.io/chatgpt#%E6%95%B0%E5%AD%A6%E5%8E%9F%E7%90%86)
4. [LangChain Getting Started Guide (in Chinese)](https://github.com/liaokongVFX/LangChain-Chinese-Getting-Started-Guide)
5. [LangChain Official Documentation](https://python.langchain.com/docs/get_started/introduction.html)