
docs: query analysis use case #17766

Merged 56 commits on Feb 29, 2024
Commits: 56 commits by baskaryan and hwchase17, Feb 19–29, 2024 (mostly formatting passes and merges of master into bagatur/query_analysis_docs).
385 changes: 385 additions & 0 deletions docs/docs/use_cases/query_analysis/few_shot.ipynb
@@ -0,0 +1,385 @@
{
"cells": [
{
"cell_type": "raw",
"id": "df7d42b9-58a6-434c-a2d7-0b61142f6d3e",
"metadata": {},
"source": [
"---\n",
"sidebar_position: 2\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "f2195672-0cab-4967-ba8a-c6544635547d",
"metadata": {},
"source": [
"# Adding examples to the prompt\n",
"\n",
"As our query analysis becomes more complex, adding examples to the prompt can meaningfully improve performance.\n",
"\n",
"Let's take a look at how we can add examples for the LangChain YouTube video query analyzer we built in the [Quickstart](/docs/use_cases/query_analysis/quickstart)."
]
},
{
"cell_type": "markdown",
"id": "a4079b57-4369-49c9-b2ad-c809b5408d7e",
"metadata": {},
"source": [
"## Setup\n",
"#### Install dependencies"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "e168ef5c-e54e-49a6-8552-5502854a6f01",
"metadata": {},
"outputs": [],
"source": [
"# %pip install -qU langchain-core langchain-openai"
]
},
{
"cell_type": "markdown",
"id": "79d66a45-a05c-4d22-b011-b1cdbdfc8f9c",
"metadata": {},
"source": [
"#### Set environment variables\n",
"\n",
"We'll use OpenAI in this example:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "40e2979e-a818-4b96-ac25-039336f94319",
"metadata": {},
"outputs": [],
"source": [
"import getpass\n",
"import os\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass()\n",
"\n",
"# Optional, uncomment to trace runs with LangSmith. Sign up here: https://smith.langchain.com.\n",
"# os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n",
"# os.environ[\"LANGCHAIN_API_KEY\"] = getpass.getpass()"
]
},
{
"cell_type": "markdown",
"id": "57396e23-c192-4d97-846b-5eacea4d6b8d",
"metadata": {},
"source": [
"## Query schema\n",
"\n",
"We'll define the query schema we want our model to output. To make our query analysis a bit more interesting, we'll add a `sub_queries` field that contains narrower questions derived from the top-level question."
]
},
{
"cell_type": "code",
"execution_count": 37,
"id": "0b51dd76-820d-41a4-98c8-893f6fe0d1ea",
"metadata": {},
"outputs": [],
"source": [
"from typing import List, Optional\n",
"\n",
"from langchain_core.pydantic_v1 import BaseModel, Field\n",
"\n",
"sub_queries_description = \"\"\"\\\n",
"If the original question contains multiple distinct sub-questions, \\\n",
"or if there are more generic questions that would be helpful to answer in \\\n",
"order to answer the original question, write a list of all relevant sub-questions. \\\n",
"Make sure this list is comprehensive and covers all parts of the original question. \\\n",
"It's ok if there's redundancy in the sub-questions. \\\n",
"Make sure the sub-questions are as narrowly focused as possible.\"\"\"\n",
"\n",
"\n",
"class Search(BaseModel):\n",
" \"\"\"Search over a database of tutorial videos about a software library.\"\"\"\n",
"\n",
" query: str = Field(\n",
" ...,\n",
" description=\"Primary similarity search query applied to video transcripts.\",\n",
" )\n",
" sub_queries: List[str] = Field(\n",
" default_factory=list, description=sub_queries_description\n",
" )\n",
" publish_year: Optional[int] = Field(None, description=\"Year video was published\")"
]
},
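{
"cell_type": "markdown",
"id": "3f1d2a8e-5b2c-4c1d-9e0a-7d6f1b2c3d4e",
"metadata": {},
"source": [
"As a quick sanity check (an editorial aside, not part of the original walkthrough), we can construct a `Search` object by hand with some made-up values and inspect its fields to confirm the schema behaves as expected:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8c4b9e2f-1a3d-4f5e-b6c7-0d1e2f3a4b5c",
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical example values, used only to exercise the schema.\n",
"s = Search(query=\"How to use LCEL\", sub_queries=[\"What is LCEL\"])\n",
"s.dict()"
]
},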
{
"cell_type": "markdown",
"id": "f8b08c52-1ce9-4d8b-a779-cbe8efde51d1",
"metadata": {},
"source": [
"## Query generation"
]
},
{
"cell_type": "code",
"execution_count": 64,
"id": "783c03c3-8c72-4f88-9cf4-5829ce6745d6",
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\n",
"from langchain_core.runnables import RunnablePassthrough\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"system = \"\"\"You are an expert at converting user questions into database queries. \\\n",
"You have access to a database of tutorial videos about a software library for building LLM-powered applications. \\\n",
"Given a question, return a list of database queries optimized to retrieve the most relevant results.\n",
"\n",
"If there are acronyms or words you are not familiar with, do not try to rephrase them.\"\"\"\n",
"\n",
"prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\"system\", system),\n",
" MessagesPlaceholder(\"examples\", optional=True),\n",
" (\"human\", \"{question}\"),\n",
" ]\n",
")\n",
"llm = ChatOpenAI(model=\"gpt-3.5-turbo-0125\", temperature=0)\n",
"structured_llm = llm.with_structured_output(Search)\n",
"query_analyzer = {\"question\": RunnablePassthrough()} | prompt | structured_llm"
]
},
{
"cell_type": "markdown",
"id": "f403517a-b8e3-44ac-b0a6-02f8305635a2",
"metadata": {},
"source": [
"Let's try out our query analyzer without any examples in the prompt:"
]
},
{
"cell_type": "code",
"execution_count": 65,
"id": "0bcfce06-6f0c-4f9d-a1fc-dc29342d2aae",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Search(query='web voyager vs reflection agents', sub_queries=['difference between web voyager and reflection agents', 'do web voyager and reflection agents use langgraph'], publish_year=None)"
]
},
"execution_count": 65,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"query_analyzer.invoke(\n",
" \"what's the difference between web voyager and reflection agents? do both use langgraph?\"\n",
")"
]
},
{
"cell_type": "markdown",
"id": "00962b08-899c-465c-9a41-6459b207e0f2",
"metadata": {},
"source": [
"## Adding examples and tuning the prompt\n",
"\n",
"This works pretty well, but we probably want it to decompose the question even further to separate the queries about Web Voyager and Reflection Agents.\n",
"\n",
"To tune our query generation results, we can add some examples of input questions and gold-standard output queries to our prompt."
]
},
{
"cell_type": "code",
"execution_count": 53,
"id": "15b4923d-a08e-452d-8889-9a09a57d1095",
"metadata": {},
"outputs": [],
"source": [
"examples = []"
]
},
{
"cell_type": "code",
"execution_count": 54,
"id": "da5330e6-827a-40e5-982b-b23b6286b758",
"metadata": {},
"outputs": [],
"source": [
"question = \"What's chat langchain, is it a langchain template?\"\n",
"query = Search(\n",
" query=\"What is chat langchain and is it a langchain template?\",\n",
" sub_queries=[\"What is chat langchain\", \"What is a langchain template\"],\n",
")\n",
"examples.append({\"input\": question, \"tool_calls\": [query]})"
]
},
{
"cell_type": "code",
"execution_count": 55,
"id": "580e857a-27df-4ecf-a19c-458dc9244ec8",
"metadata": {},
"outputs": [],
"source": [
"question = \"How to build multi-agent system and stream intermediate steps from it\"\n",
"query = Search(\n",
" query=\"How to build multi-agent system and stream intermediate steps from it\",\n",
" sub_queries=[\n",
" \"How to build multi-agent system\",\n",
" \"How to stream intermediate steps from multi-agent system\",\n",
" \"How to stream intermediate steps\",\n",
" ],\n",
")\n",
"\n",
"examples.append({\"input\": question, \"tool_calls\": [query]})"
]
},
{
"cell_type": "code",
"execution_count": 56,
"id": "fa63310d-69e3-4701-825c-fbb01f8a5a16",
"metadata": {},
"outputs": [],
"source": [
"question = \"LangChain agents vs LangGraph?\"\n",
"query = Search(\n",
" query=\"What's the difference between LangChain agents and LangGraph? How do you deploy them?\",\n",
" sub_queries=[\n",
" \"What are LangChain agents\",\n",
" \"What is LangGraph\",\n",
" \"How do you deploy LangChain agents\",\n",
" \"How do you deploy LangGraph\",\n",
" ],\n",
")\n",
"examples.append({\"input\": question, \"tool_calls\": [query]})"
]
},
{
"cell_type": "markdown",
"id": "bd21389c-f862-44e6-9d51-92db10979525",
"metadata": {},
"source": [
"Now we need to update our prompt template and chain so that the examples are included in each prompt. Since we're working with OpenAI function-calling, we'll need to do a bit of extra structuring to send example inputs and outputs to the model. We'll create a `tool_example_to_messages` helper function to handle this for us:"
]
},
{
"cell_type": "code",
"execution_count": 57,
"id": "68b03709-9a60-4acf-b96c-cafe1056c6f3",
"metadata": {},
"outputs": [],
"source": [
"import uuid\n",
"from typing import Dict\n",
"\n",
"from langchain_core.messages import (\n",
" AIMessage,\n",
" BaseMessage,\n",
" HumanMessage,\n",
" SystemMessage,\n",
" ToolMessage,\n",
")\n",
"\n",
"\n",
"def tool_example_to_messages(example: Dict) -> List[BaseMessage]:\n",
" messages: List[BaseMessage] = [HumanMessage(content=example[\"input\"])]\n",
" openai_tool_calls = []\n",
" for tool_call in example[\"tool_calls\"]:\n",
" openai_tool_calls.append(\n",
" {\n",
" \"id\": str(uuid.uuid4()),\n",
" \"type\": \"function\",\n",
" \"function\": {\n",
" \"name\": tool_call.__class__.__name__,\n",
" \"arguments\": tool_call.json(),\n",
" },\n",
" }\n",
" )\n",
" messages.append(\n",
" AIMessage(content=\"\", additional_kwargs={\"tool_calls\": openai_tool_calls})\n",
" )\n",
" tool_outputs = example.get(\"tool_outputs\") or [\n",
" \"You have correctly called this tool.\"\n",
" ] * len(openai_tool_calls)\n",
" for output, tool_call in zip(tool_outputs, openai_tool_calls):\n",
" messages.append(ToolMessage(content=output, tool_call_id=tool_call[\"id\"]))\n",
" return messages\n",
"\n",
"\n",
"example_msgs = [msg for ex in examples for msg in tool_example_to_messages(ex)]"
]
},
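{
"cell_type": "markdown",
"id": "a7e5c1d9-2b4f-4a6e-8c0d-1f2a3b4c5d6e",
"metadata": {},
"source": [
"Before wiring the examples into the prompt, it can help to inspect the generated message sequence (an editorial aside, assuming `example_msgs` is populated as above). Each example should expand to a `HumanMessage`, an `AIMessage` carrying the tool calls, and one `ToolMessage` per call:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b8f6d2e0-3c5a-4b7f-9d1e-2a3b4c5d6e7f",
"metadata": {},
"outputs": [],
"source": [
"# With three examples of one tool call each, we expect nine messages in a\n",
"# repeating Human / AI / Tool pattern.\n",
"[msg.__class__.__name__ for msg in example_msgs]"
]
},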
{
"cell_type": "code",
"execution_count": 58,
"id": "d9bf9f87-3e6b-4fc2-957b-949b077fab54",
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.prompts import MessagesPlaceholder\n",
"\n",
"query_analyzer_with_examples = (\n",
" {\"question\": RunnablePassthrough()}\n",
" | prompt.partial(examples=example_msgs)\n",
" | structured_llm\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 62,
"id": "e565ccb0-3530-4782-b56b-d1f6d0a8e559",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Search(query='Difference between web voyager and reflection agents, do they both use LangGraph?', sub_queries=['What is Web Voyager', 'What are Reflection agents', 'Do Web Voyager and Reflection agents use LangGraph'], publish_year=None)"
]
},
"execution_count": 62,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"query_analyzer_with_examples.invoke(\n",
" \"what's the difference between web voyager and reflection agents? do both use langgraph?\"\n",
")"
]
},
{
"cell_type": "markdown",
"id": "e5ea49ff-be53-4072-8c25-08682bb31a19",
"metadata": {},
"source": [
"Thanks to our examples, we get a slightly more decomposed search query. With some more prompt engineering and tuning of our examples, we could improve query generation even more.\n",
"\n",
"You can see that the examples are passed to the model as messages in the [LangSmith trace](https://smith.langchain.com/public/aeaaafce-d2b1-4943-9a61-bc954e8fc6f2/r)."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}