# Retrieval chains

Now, we'll combine the basic Expression Language building blocks, document loading and splitting, and vectorstores to create a retrieval chain that can
perform the last two steps of the RAG process:

![](./static/images/rag_diagram.png)

This chain will retrieve the chunks most similar to the input query via vector similarity search, then present them to the LLM as context to ground its generation of a final answer.
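To make "vector similarity search" concrete, here is a minimal, dependency-free sketch. It assumes cosine similarity as the metric and uses made-up three-dimensional vectors in place of real embeddings. The names `EmbeddedChunk`, `cosineSimilarity`, and `topK` are illustrative only and are not part of LangChain or of the vectorstore used in this lesson.

```typescript
// Toy illustration of vector similarity search; not the vectorstore's actual implementation.
type EmbeddedChunk = { text: string; embedding: number[] };

// Cosine similarity: dot product divided by the product of the vector lengths.
const cosineSimilarity = (a: number[], b: number[]): number => {
  const dot = a.reduce((sum, value, i) => sum + value * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((sum, value) => sum + value * value, 0));
  return dot / (norm(a) * norm(b));
};

// Return the k chunks whose embeddings are most similar to the query embedding.
const topK = (query: number[], chunks: EmbeddedChunk[], k: number): EmbeddedChunk[] =>
  [...chunks]
    .sort((x, y) => cosineSimilarity(query, y.embedding) - cosineSimilarity(query, x.embedding))
    .slice(0, k);

// Made-up 3-dimensional "embeddings"; real ones have hundreds or thousands of dimensions.
const toyChunks: EmbeddedChunk[] = [
  { text: "prerequisites: probability and linear algebra", embedding: [0.9, 0.1, 0.0] },
  { text: "housing prices in Portland", embedding: [0.0, 0.2, 0.9] },
  { text: "course handout and online resources", embedding: [0.6, 0.6, 0.1] },
];

const toyQuery = [1.0, 0.2, 0.0];
console.log(topK(toyQuery, toyChunks, 2).map((chunk) => chunk.text));
```

With these toy vectors, the prerequisites chunk ranks first and the handout chunk second, while the housing chunk is dropped. A real retriever does the same kind of ranking over embeddings produced by an embedding model.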

To start, let's load and split the CS229 lesson PDF transcript from earlier. For brevity, I've factored out the loading and splitting code into a helper function that takes arguments for chunk size and chunk overlap. We'll use bigger chunks this time:

In [1]:
import "dotenv/config";

[Module: null prototype] { default: {} }

In [2]:
import { loadAndSplitChunks } from "./lib/helpers.ts";

const splitDocs = await loadAndSplitChunks({
  chunkSize: 1536,
  chunkOverlap: 128,
});

Now, let's load those docs into a vectorstore the same way we did in the previous lesson using OpenAI embeddings:

In [3]:
import { initializeVectorstoreWithDocuments } from "./lib/helpers.ts";

const vectorstore = await initializeVectorstoreWithDocuments({
  documents: splitDocs,
});

Then, we'll create a `retriever` as before from that vectorstore that will fetch the documents for a given natural language query.

In [4]:
const retriever = vectorstore.asRetriever();

Now we're ready to start constructing our retrieval chain!

## Document retrieval in a chain

Retrievers take a string as direct input, but we often find it convenient to have chains take an object parameter for flexibility.
So let's start by creating a simple sequence that takes an object with a field called `question` as input, passes that question to the retriever, and formats the resulting documents' page content as a string. We'll use XML-esque `<doc></doc>` tags to separate the contents of each document for clarity:

In [5]:
import { RunnableSequence } from "langchain/schema/runnable";
import { Document } from "langchain/document";

const convertDocsToString = (documents: Document[]): string => {
  return documents.map((document) => {
    return `<doc>\n${document.pageContent}\n</doc>`
  }).join("\n");
};

const documentRetrievalChain = RunnableSequence.from([
  (input) => input.question,
  retriever,
  convertDocsToString,
]);

const results = await documentRetrievalChain.invoke({
  question: "What are the prerequisites for this course?"
});

console.log(results);

<doc>
course information handout. So let me just say a few words about parts of these. On the 
third page, there's a section that says Online Resources.  
Oh, okay. Louder? Actually, could you turn up the volume? Testing. Is this better? 
Testing, testing. Okay, cool. Thanks.
</doc>
<doc>
of this class will not be very programming intensive, although we will do some 
programming, mostly in either MATLAB or Octave. I'll say a bit more about that later.  
I also assume familiarity with basic probability and statistics. So most undergraduate 
statistics class, like Stat 116 taught here at Stanford, will be more than enough. I'm gonna 
assume all of you know what random variables are, that all of you know what expectation 
is, what a variance or a random variable is. And in case of some of you, it's been a while 
since you've seen some of this material. At some of the discussion sections, we'll actually 
go over some of the prerequisites, sort of as a refresher course under prerequisite cl

Looks like that contains the raw information! Now, let's construct a chain that synthesizes that information into a human-legible response. We'll start with a prompt:

In [6]:
import { ChatPromptTemplate } from "langchain/prompts";

const TEMPLATE_STRING = `You are an experienced researcher, 
expert at interpreting and answering questions based on provided sources.
Using the provided context, answer the user's question 
to the best of your ability using only the resources provided. 
Be verbose!

<context>

{context}

</context>

Now, answer this question using the above context:

{question}`;

const answerGenerationPrompt = ChatPromptTemplate.fromTemplate(TEMPLATE_STRING);

One thing to note here is that our prompt requires an object with `context` and `question` properties as its argument, while our previously defined `documentRetrievalChain` outputs a string. To make the arguments match, we use a `RunnableMap`. When a `RunnableMap` is invoked, it calls all runnables or runnable-like objects that it has as properties in parallel, invoking each with the input to the map. It then outputs an object whose properties are the results of those calls:

In [7]:
import { RunnableMap } from "langchain/schema/runnable";

const runnableMap = RunnableMap.from({
  context: documentRetrievalChain,
  question: (input) => input.question,
});

await runnableMap.invoke({ question: "What are the prerequisites for this course?" });

{
  question: "What are the prerequisites for this course?",
  context: "<doc>\n" +
    "course information handout. So let me just say a few words about parts of these. On the \n" +
    "third"... 3063 more characters
}

Above, the map invokes `documentRetrievalChain` with the object `{ question: "What are the prerequisites for this course?" }` and stores that chain's output under a property named `context`. The `question` function receives the same object and returns just the question string.
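As a rough mental model of the behavior described above, the following dependency-free sketch runs every property of a map against the same input in parallel and collects the results. It is not LangChain's implementation: `invokeMap`, `Step`, and `fakeRetrievalChain` are hypothetical names introduced purely for illustration.

```typescript
// Conceptual stand-in for a RunnableMap; not LangChain's implementation.
type Step<I, O> = (input: I) => O | Promise<O>;

const invokeMap = async <I>(
  steps: Record<string, Step<I, unknown>>,
  input: I,
): Promise<Record<string, unknown>> => {
  // Every step receives the same input; Promise.all lets them run concurrently.
  const entries = await Promise.all(
    Object.entries(steps).map(async ([key, step]) => [key, await step(input)] as const),
  );
  return Object.fromEntries(entries);
};

type QuestionInput = { question: string };

// Hypothetical stand-in for documentRetrievalChain.
const fakeRetrievalChain = async (input: QuestionInput): Promise<string> =>
  `<doc>\nplaceholder content for "${input.question}"\n</doc>`;

const mapResult = invokeMap<QuestionInput>(
  { context: fakeRetrievalChain, question: (input) => input.question },
  { question: "What are the prerequisites for this course?" },
);
mapResult.then((output) => console.log(output));
```

The resulting object has a `context` string and a `question` string, which is the shape the answer generation prompt expects.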

## Augmented generation

And that's the format we need to pass to our prompt! Let's see what this looks like with our document retrieval sequence!

In [8]:
import { ChatOpenAI } from "langchain/chat_models/openai";
import { StringOutputParser } from "langchain/schema/output_parser";

const model = new ChatOpenAI({
  modelName: "gpt-3.5-turbo-1106",
});

const retrievalChain = RunnableSequence.from([
  {
    context: documentRetrievalChain,
    question: (input) => input.question,
  },
  answerGenerationPrompt,
  model,
  new StringOutputParser(),
]);

A few things to note: 

- In the `RunnableSequence.from` method, objects are automatically coerced into `RunnableMap`s, so there is no need to use the initializer method.
- Because we want to pass `question` into both the `documentRetrievalChain` to fetch relevant documents as well as the `answerGenerationPrompt`, we add a second property to the `RunnableMap` called `question` that extracts just the `question` field from the original input to the map. This means that the answer generation prompt gets an object with both properties.

Because this pattern is so common, there is a helper called `RunnablePassthrough.assign()` that adds new properties to the input object while passing through all existing properties. You could thus rewrite the above like this:

In [9]:
import { RunnablePassthrough } from "langchain/runnables";

const retrievalChain = RunnableSequence.from([
  RunnablePassthrough.assign({
    context: documentRetrievalChain,
  }),
  answerGenerationPrompt,
  model,
  new StringOutputParser(),
]);
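Conceptually, the passthrough-and-assign pattern amounts to computing new properties from the input and spreading them on top of the original object. The sketch below illustrates that idea in plain TypeScript. It is not LangChain's implementation, the helper name `assign` and the types `Fields` and `FieldStep` are hypothetical, and for simplicity it computes new fields one at a time.

```typescript
// Conceptual stand-in for RunnablePassthrough.assign(); not LangChain's implementation.
type Fields = Record<string, unknown>;
type FieldStep = (input: Fields) => unknown | Promise<unknown>;

const assign = (newFields: Record<string, FieldStep>) =>
  async (input: Fields): Promise<Fields> => {
    const computed: Fields = {};
    for (const [key, step] of Object.entries(newFields)) {
      // Each new field is computed from the full original input.
      computed[key] = await step(input);
    }
    // Existing properties pass through untouched; new ones are added alongside.
    return { ...input, ...computed };
  };

// Hypothetical step standing in for documentRetrievalChain.
const addContext = assign({
  context: (input) => `<doc>\nplaceholder for "${String(input.question)}"\n</doc>`,
});

const assigned = addContext({ question: "What are the prerequisites for this course?" });
assigned.then((output) => console.log(Object.keys(output)));
```

The output keeps the original `question` property and gains a `context` property, so the next step in the sequence receives both.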

Finally, let's try it end-to-end!

In [10]:
const answer = await retrievalChain.invoke({
  question: "What are the prerequisites for this course?"
});

console.log(answer);

Based on the course information provided in the context, the instructor assumes that students are familiar with basic probability and statistics, as well as basic linear algebra. For probability and statistics, it is mentioned that undergraduate statistics classes, like Stat 116 at Stanford, would provide sufficient preparation. Additionally, for linear algebra, undergraduate courses such as Math 51, Math 103, Math 113, or CS205 at Stanford are considered adequate prerequisites. The instructor mentions that students should be familiar with concepts such as random variables, expectation, variance, matrices, vectors, matrix multiplication, matrix inversion, and eigenvectors. However, the instructor also acknowledges that some students may need a refresher on these topics, and review sessions will be held to cover the prerequisites as needed. Overall, the prerequisites for this course include a basic understanding of probability, statistics, and linear algebra concepts.


Sweet! That looks pretty good.

But what if we want to ask a followup question?

In [11]:
const followupAnswer = await retrievalChain.invoke({
  question: "Can you list them in bullet point form?"
});

console.log(followupAnswer);

Based on the provided context, I am unable to identify a specific question or request for a list that can be answered in bullet point form. The context primarily consists of a course information handout and a lecture on machine learning, but there is no explicit question or request present. If there is a specific question or request you would like me to address, please provide it and I will do my best to assist you.


It didn't do so well there!

This occurs because LLMs have no built-in memory, and since we're not passing in any chat history as context, the LLM doesn't know what "them" refers to. We can update our prompt to take chat history into account as well, but we have a more fundamental problem: our vectorstore needs to return relevant documents too. Here's what happens if we try to query our vectorstore with the current followup question:

In [12]:
const docs = await documentRetrievalChain.invoke({
  question: "Can you list them in bullet point form?"
});

console.log(docs);

<doc>
course information handout. So let me just say a few words about parts of these. On the 
third page, there's a section that says Online Resources.  
Oh, okay. Louder? Actually, could you turn up the volume? Testing. Is this better? 
Testing, testing. Okay, cool. Thanks.
</doc>
<doc>
into four major sections. We're gonna talk about four major topics in this class, the first 
of which is supervised learning. So let me give you an example of that.  
So suppose you collect a data set of housing prices. And one of the TAs, Dan Ramage, 
actually collected a data set for me last week to use in the example later. But suppose that 
you go to collect statistics about how much houses cost in a certain geographic area. And 
Dan, the TA, collected data from housing prices in Portland, Oregon. So what you can do 
is let's say plot the square footage of the house against the list price of the house, right, so 
you collect data on a bunch of houses. And let's say you get a data set like this wit

You can see that we don't get anything relevant to the prerequisites of CS229.

## Adding history

The solution is to dereference the user's question into a rephrased standalone question. How? With an LLM of course!

First, we'll construct a new prompt with a `MessagesPlaceholder` where we can later inject chat history messages:

In [13]:
import { MessagesPlaceholder } from "langchain/prompts";

const REPHRASE_QUESTION_SYSTEM_TEMPLATE = 
  `Given the following conversation and a follow up question, 
rephrase the follow up question to be a standalone question.`;

const rephraseQuestionChainPrompt = ChatPromptTemplate.fromMessages([
  ["system", REPHRASE_QUESTION_SYSTEM_TEMPLATE],
  new MessagesPlaceholder("history"),
  [
    "human", 
    "Rephrase the following question as a standalone question:\n{question}"
  ],
]);

And then we'll create a simple chain that uses this prompt:

In [14]:
const rephraseQuestionChain = RunnableSequence.from([
  rephraseQuestionChainPrompt,
  new ChatOpenAI({ temperature: 0.1, modelName: "gpt-3.5-turbo-1106" }),
  new StringOutputParser(),
]);

Let's try running this chain on our followup question. Note that `MessagesPlaceholder` is itself a parameter in our prompt that accepts a list of chat messages:

In [15]:
import { HumanMessage, AIMessage } from "langchain/schema";

const originalQuestion = "What are the prerequisites for this course?";

const originalAnswer = await retrievalChain.invoke({
  question: originalQuestion
});

console.log(originalAnswer);

const chatHistory = [
  new HumanMessage(originalQuestion),
  new AIMessage(originalAnswer),
];

await rephraseQuestionChain.invoke({
  question: "Can you list them in bullet point form?",
  history: chatHistory,
});

The prerequisites for this course include familiarity with basic probability and statistics, as well as basic linear algebra. The instructor assumes that students are already familiar with concepts such as random variables, expectation, variance, matrices, vectors, matrix multiplication, and matrix inverse. It is also mentioned that undergraduate statistics classes such as Stat 116 and undergraduate linear algebra courses like Math 51, 103, Math 113, or CS205 taught at Stanford would provide sufficient background for this course. Additionally, the ability to understand big O notation and knowledge of data structures such as linked lists, queues, and binary treatments is considered more important than specific programming language knowledge such as C or Java.


"Could you please list them in bullet point form?"

Great! That makes sense on its own. Now, let's put it all together with a new chain!

First, here's our document retrieval and formatting chain again:

In [16]:
const convertDocsToString = (documents: Document[]): string => {
  return documents.map((document) => `<doc>\n${document.pageContent}\n</doc>`).join("\n");
};

const documentRetrievalChain = RunnableSequence.from([
  (input) => input.standalone_question,
  retriever,
  convertDocsToString,
]);

Then, let's redefine our answer generation prompt to also have a `MessagesPlaceholder` for history messages:

In [17]:
const ANSWER_CHAIN_SYSTEM_TEMPLATE = `You are an experienced researcher, 
expert at interpreting and answering questions based on provided sources.
Using the below provided context and chat history, 
answer the user's question to the best of 
your ability 
using only the resources provided. Be verbose!

<context>
{context}
</context>`;

const answerGenerationChainPrompt = ChatPromptTemplate.fromMessages([
  ["system", ANSWER_CHAIN_SYSTEM_TEMPLATE],
  new MessagesPlaceholder("history"),
  [
    "human", 
    "Now, answer this question using the previous context and chat history:\n{standalone_question}"
  ]
]);

This prompt requires three inputs: `context` from the retriever, a `standalone_question` from the rephrasing chain, and an array of chat messages as `history`. We can invoke it with dummy values to get a sense of what's needed:

In [20]:
import { HumanMessage, AIMessage } from "langchain/schema";
await answerGenerationChainPrompt.formatMessages({
  context: "fake retrieved content",
  standalone_question: "Why is the sky blue?",
  history: [
    new HumanMessage("How are you?"),
    new AIMessage("Fine, thank you!")
  ]
});

InputFormatError: Error: Field "history" in prompt uses a MessagesPlaceholder, which expects an array of BaseMessages as an input value. Received: [
  [
    "human",
    "How are you?"
  ],
  [
    "ai",
    "Fine, thank you!"
  ]
]

Next, let's assemble our conversation-capable retrieval chain by passing history cleanly through until the final generation prompt:

In [None]:
const conversationalRetrievalChain = RunnableSequence.from([
  RunnablePassthrough.assign({
    standalone_question: rephraseQuestionChain,
  }),
  RunnablePassthrough.assign({
    context: documentRetrievalChain,
  }),
  answerGenerationChainPrompt,
  new ChatOpenAI({ modelName: "gpt-3.5-turbo" }),
  new StringOutputParser(),
]);

We could pass history back and forth manually here. Instead, we can streamline chat history tracking and sessions by storing messages in a `ChatMessageHistory` object and wrapping our chain in `RunnableWithMessageHistory`, a manager that automatically updates the history:

In [18]:
import { RunnableWithMessageHistory } from "langchain/runnables";
import { ChatMessageHistory } from "langchain/stores/message/in_memory";

const messageHistory = new ChatMessageHistory();

const finalRetrievalChain = new RunnableWithMessageHistory({
  runnable: conversationalRetrievalChain,
  getMessageHistory: (_sessionId) => messageHistory,
  historyMessagesKey: "history",
  inputMessagesKey: "question",
});

The `RunnableWithMessageHistory` class wraps a runnable and automatically adds the stored chat history to the runnable's input under the property named by `historyMessagesKey`. After each invocation, it also updates the chat history with the value passed under `inputMessagesKey` (in this case, `question`) along with the chain's output.
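The wrapping behavior can be pictured with a small dependency-free sketch: inject the stored history into the input, run the chain, then record the new turn. This is only a mental model under simplified assumptions, not LangChain's implementation. `withHistory`, `ChatTurn`, and `echoChain` are hypothetical names, and the history here is a plain array rather than a chat history object.

```typescript
// Conceptual stand-in for RunnableWithMessageHistory; not LangChain's implementation.
type ChatTurn = { role: "human" | "ai"; content: string };
type ChainInput = { question: string; history: ChatTurn[] };

const withHistory = (
  chain: (input: ChainInput) => Promise<string>,
  history: ChatTurn[],
) =>
  async (input: { question: string }): Promise<string> => {
    // Inject a copy of the stored history under the "history" key.
    const answer = await chain({ ...input, history: [...history] });
    // Record both the new question and the answer for later turns.
    history.push({ role: "human", content: input.question });
    history.push({ role: "ai", content: answer });
    return answer;
  };

// Hypothetical chain that just reports how many earlier messages it received.
const echoChain = async (input: ChainInput): Promise<string> =>
  `Saw ${input.history.length} earlier messages before "${input.question}"`;

const storedHistory: ChatTurn[] = [];
const chatChain = withHistory(echoChain, storedHistory);

const secondAnswer = chatChain({ question: "What are the prerequisites?" })
  .then(() => chatChain({ question: "Can you list them in bullet point form?" }));
secondAnswer.then((answer) => console.log(answer));
```

On the second call, the wrapped chain sees two earlier messages (the first question and its answer) without the caller having passed any history explicitly.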

`getMessageHistory` is a function that returns the chat history object for the passed session ID. In the demo above, we reuse the same history object for all calls, but in production environments you'll want a separate object for each session to avoid mixing up conversation histories.
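One simple way to keep sessions separate is an in-memory map keyed by session ID. The sketch below shows the lookup-or-create pattern with a placeholder class. `SessionHistory`, `sessionHistories`, and `getHistoryForSession` are hypothetical names, and `SessionHistory` merely stands in for a real chat history class such as the `ChatMessageHistory` used above. A production setup would more likely use a persistent store.

```typescript
// Hypothetical per-session store; SessionHistory stands in for a chat history class.
class SessionHistory {
  messages: string[] = [];
}

const sessionHistories = new Map<string, SessionHistory>();

// Return the existing history for a session, creating one on first use.
const getHistoryForSession = (sessionId: string): SessionHistory => {
  let history = sessionHistories.get(sessionId);
  if (history === undefined) {
    history = new SessionHistory();
    sessionHistories.set(sessionId, history);
  }
  return history;
};

getHistoryForSession("alice").messages.push("What are the prerequisites?");
console.log(getHistoryForSession("alice").messages.length); // 1
console.log(getHistoryForSession("bob").messages.length); // 0
```

A function with this shape could be passed as `getMessageHistory`, assuming it returns the real history class rather than this placeholder.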

Let's try out the finished version!

In [19]:
const originalQuestion = "What are the prerequisites for this course?";

const originalAnswer = await finalRetrievalChain.invoke({
  question: originalQuestion,
}, {
  configurable: { sessionId: "test" }
});

const finalResult = await finalRetrievalChain.invoke({
  question: "Can you list them in bullet point form?",
}, {
  configurable: { sessionId: "test" }
});

console.log(finalResult);

- Familiarity with basic probability and statistics
- Knowledge of random variables, expectation, variance, and probability
- Completion of an undergraduate statistics class, such as Stat 116 at Stanford
- Familiarity with basic linear algebra
- Understanding of matrices, vectors, matrix multiplication, matrix inverse, and eigenvectors
- Completion of an undergraduate linear algebra course, such as Math 51, 103, Math 113, or CS205 at Stanford
- Basic programming skills, preferably in MATLAB or Octave
- Familiarity with big-O notation and basic computer skills


You can peruse this trace for an interactive example of the internals of the chain: https://smith.langchain.com/public/601c9879-54f3-4b5e-a09b-7b51d3c96757/r

Retrieval is a very deep topic, and there's no one-size-fits-all approach for loading, splitting, and querying your data. We encourage you to modify the above prompts and parameters for different models and data types.

In the final section, we'll show how to put this retrieval chain into production, including some interactions with web APIs and streaming intermediate steps.