# Retrieval chains

Now, we'll combine the basic Expression Language building blocks, document loading and splitting, and vectorstores to create a retrieval chain that can
perform the last two steps of the RAG process:

![](./static/images/rag_diagram.png)

This chain will retrieve the chunks that are most similar to the input query, then present them to the LLM as context to ground its generation of a final answer.


To start, let's load and split the CS229 lesson PDF transcript from earlier. We'll use bigger chunks this time:

In [1]:
import "npm:dotenv/config";

[Module: null prototype] { default: {} }

In [2]:
// Peer dependency
import * as parse from "npm:pdf-parse";
import { PDFLoader } from "npm:langchain@0.0.201/document_loaders/fs/pdf";
import { RecursiveCharacterTextSplitter } from "npm:langchain@0.0.201/text_splitter";

const loader = new PDFLoader("./static/docs/MachineLearning-Lecture01.pdf");

const rawCS229Docs = await loader.load();

const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1536,
  chunkOverlap: 128,
});

const splitDocs = await splitter.splitDocuments(rawCS229Docs);

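`chunkOverlap` sets how many characters (up to 128 here) adjacent chunks may share, so text near a chunk boundary appears in both neighboring chunks instead of being cut off from its surrounding context.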
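To make the effect of overlap concrete, here is a minimal sketch of a fixed-size chunker with overlap. It is a simplification for illustration only: `chunkWithOverlap` is a hypothetical helper, not the algorithm `RecursiveCharacterTextSplitter` uses (which also tries to split on separators such as paragraphs and sentences).

```typescript
// Simplified fixed-size chunker (hypothetical helper, NOT the library's algorithm).
// Each chunk starts (chunkSize - chunkOverlap) characters after the previous one,
// so the last `chunkOverlap` characters of one chunk also open the next chunk.
function chunkWithOverlap(text: string, chunkSize: number, chunkOverlap: number): string[] {
  if (chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be smaller than chunkSize");
  }
  const step = chunkSize - chunkOverlap;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this chunk already reached the end
  }
  return chunks;
}

const toyChunks = chunkWithOverlap("abcdefghijklmnopqrstuvwxyz", 10, 3);
console.log(toyChunks);
// [ "abcdefghij", "hijklmnopq", "opqrstuvwx", "vwxyz" ]
```

Notice how `hij`, `opq`, and `vwx` each appear at the end of one chunk and the start of the next.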

Now, let's load those docs into a vectorstore the same way we did in the previous lesson using OpenAI embeddings:

In [3]:
import { MemoryVectorStore } from "npm:langchain@0.0.201/vectorstores/memory";
import { OpenAIEmbeddings } from "npm:langchain@0.0.201/embeddings/openai";

const embeddings = new OpenAIEmbeddings();

const vectorstore = new MemoryVectorStore(embeddings);

await vectorstore.addDocuments(splitDocs);

Then, we'll create a `retriever` from that vectorstore that will fetch documents for a given natural language query. Vectorstores are not the only way to fetch documents, which is why it's useful to have a more abstract interface. A retriever is also a runnable, so it can be invoked directly or used as a step in a chain:

In [4]:
const retriever = vectorstore.asRetriever();
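As a rough mental model of what "most similar" means, the sketch below ranks a few hand-written vectors against a query vector by cosine similarity and keeps the top results. The vectors, the `ToyEntry` type, and the `topK` helper are all made up for illustration; real embeddings have many more dimensions, and this is not the vectorstore's actual implementation.

```typescript
// Toy similarity search over hand-written vectors (illustrative assumption only).
type ToyEntry = { text: string; vector: number[] };

function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
  const normA = Math.sqrt(a.reduce((sum, x) => sum + x * x, 0));
  const normB = Math.sqrt(b.reduce((sum, x) => sum + x * x, 0));
  return dot / (normA * normB);
}

// Return the k entries whose vectors are most similar to the query vector.
function topK(query: number[], entries: ToyEntry[], k: number): ToyEntry[] {
  return [...entries]
    .sort((x, y) => cosineSimilarity(query, y.vector) - cosineSimilarity(query, x.vector))
    .slice(0, k);
}

const toyStore: ToyEntry[] = [
  { text: "course prerequisites", vector: [0.9, 0.1, 0.0] },
  { text: "reinforcement learning", vector: [0.0, 0.2, 0.9] },
  { text: "linear algebra review", vector: [0.7, 0.6, 0.1] },
];

const nearest = topK([1, 0, 0], toyStore, 2).map((entry) => entry.text);
console.log(nearest);
// [ "course prerequisites", "linear algebra review" ]
```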

Now we're ready to start constructing our retrieval chain!

Retrievers take a string as direct input, but we often find it convenient to have chains take an object parameter for flexibility.
So let's start by creating a simple sequence that takes an object with a field called `question`, passes that question to the retriever, and formats the resulting documents' page content as a string. We'll use `<doc></doc>` XML-esque tags to separate the contents of each document for clarity:

In [5]:
import { RunnableSequence } from "npm:langchain@0.0.201/schema/runnable";
import { Document } from "npm:langchain@0.0.201/document";

const convertDocsToString = (documents: Document[]): string => {
  return documents.map((document) => `<doc>\n${document.pageContent}\n</doc>`).join("\n");
};

const documentRetrievalChain = RunnableSequence.from([
  (input) => input.question,
  retriever,
  convertDocsToString,
]);

const results = await documentRetrievalChain.invoke({
  question: "What are the prerequisites for this course?"
});

console.log(results);

<doc>
of this class will not be very program ming intensive, although we will do some  
programming, mostly in either MATLAB or Octa ve. I'll say a bit more about that later.   
I also assume familiarity with basic proba bility and statistics. So most undergraduate  
statistics class, like Stat 116 taught here at  Stanford, will be more than enough. I'm gonna  
assume all of you know what ra ndom variables are,  that all of you know  what expectation  
is, what a variance or a random variable is.  And in case of some of you, it's been a while  
since you've seen some of this material. At  some of the discussion sections, we'll actually  
go over some of the prerequisites, sort of as  a refresher course under prerequisite class.  
I'll say a bit more about  that later as well.   
Lastly, I also assume familiarity with basi c linear algebra. And again, most undergraduate  
linear algebra courses are more than enough.  So if you've taken courses like Math 51,  
103, Math 113 or CS205 at S

Looks like that contains the raw information! Now, let's construct a chain that synthesizes that information into a concise response. We'll start with a prompt:

In [6]:
import { ChatPromptTemplate } from "npm:langchain@0.0.201/prompts";

const TEMPLATE_STRING = `You are an experienced researcher, expert at interpreting and answering questions based on provided sources.
Using the provided context, answer the user's question to the best of your ability using only the resources provided. Be concise!

<context>

{context}

</context>

Now, answer this question using the above context:

{question}`;

const answerGenerationPrompt = ChatPromptTemplate.fromTemplate(TEMPLATE_STRING);
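Conceptually, the prompt template fills each `{variable}` with the matching property of the object it receives. The sketch below shows that idea with plain string substitution. `formatTemplate` is a hypothetical helper written for this illustration; the real `ChatPromptTemplate` produces chat messages rather than a single string and handles more cases.

```typescript
// Plain-string stand-in for template formatting (hypothetical helper, not the library API).
function formatTemplate(template: string, values: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (_match: string, name: string) => {
    const value = values[name];
    if (value === undefined) {
      throw new Error(`Missing value for input variable: ${name}`);
    }
    return value;
  });
}

const toyTemplate = "<context>\n{context}\n</context>\n\nQuestion: {question}";

const formattedPrompt = formatTemplate(toyTemplate, {
  context: "Basic linear algebra is assumed.",
  question: "What are the prerequisites?",
});

console.log(formattedPrompt);
```

Leaving out either `context` or `question` throws, which mirrors why the chain must supply both properties to the prompt.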

Now let's put that together with our document retrieval sequence!

One thing to note here is that our prompt requires an object with a `context` property as its argument, while our previously defined `documentRetrievalChain` outputs a string. To make the arguments match, we use a `RunnableMap`. When a `RunnableMap` is invoked, it calls all runnables or runnable-like objects that it has as properties in parallel, invoking each with the input to the map. It then outputs an object whose properties are the results of those calls:


In [7]:
import { RunnableMap } from "npm:langchain@0.0.201/schema/runnable";

const runnableMap = RunnableMap.from({
  context: documentRetrievalChain,
  question: (input) => input.question,
});

await runnableMap.invoke({ question: "What are the prerequisites for this course?" });

{
  question: "What are the prerequisites for this course?",
  context: "<doc>\n" +
    "of this class will not be very program ming intensive, although we will do some  \n" +
    "programming,"... 4728 more characters
}
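The behavior described above (every property is invoked with the same input, in parallel, and the results are collected under the same keys) can be sketched with plain functions and `Promise.all`. This is a simplified stand-in, not the library's implementation: `invokeMapSketch`, `QuestionInput`, and `stubRetrievalChain` are hypothetical names, and the stub returns canned text instead of querying a vectorstore.

```typescript
// Simplified stand-in for the map behavior (hypothetical names, no LangChain involved).
type QuestionInput = { question: string };
type MapStep = (input: QuestionInput) => unknown;

async function invokeMapSketch(
  steps: Record<string, MapStep>,
  input: QuestionInput,
): Promise<Record<string, unknown>> {
  // Start every step with the same input, then wait for all of them together.
  const entries = await Promise.all(
    Object.entries(steps).map(
      async ([key, step]): Promise<[string, unknown]> => [key, await step(input)],
    ),
  );
  const output: Record<string, unknown> = {};
  for (const [key, value] of entries) {
    output[key] = value;
  }
  return output;
}

// Stub standing in for documentRetrievalChain.
const stubRetrievalChain = async (input: QuestionInput): Promise<string> =>
  `<doc>\nstub context for: ${input.question}\n</doc>`;

const mapOutput = invokeMapSketch(
  { context: stubRetrievalChain, question: (input) => input.question },
  { question: "What are the prerequisites for this course?" },
);

mapOutput.then((output) => console.log(output));
```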

Above, `documentRetrievalChain` is invoked with the object `{ question: "What are the prerequisites for this course?" }`, and its output is stored under a property named `context`. And that's the format we need to pass to our prompt! Let's see what this looks like:

In [8]:
import { ChatOpenAI } from "npm:langchain@0.0.201/chat_models/openai";
import { StringOutputParser } from "npm:langchain@0.0.201/schema/output_parser";

const model = new ChatOpenAI({
  modelName: "gpt-3.5-turbo-1106",
});

const retrievalChain = RunnableSequence.from([
  {
    context: documentRetrievalChain,
    question: (input) => input.question,
  },
  answerGenerationPrompt,
  model,
  new StringOutputParser(),
]);

A few things to note: 

- In the `RunnableSequence.from` method, objects are automatically coerced into `RunnableMap`s, so there is no need to use the initializer method.
- Because we want to pass `question` into both the `documentRetrievalChain` to fetch relevant documents as well as the `answerGenerationPrompt`, we add a second property to the `RunnableMap` called `question` that extracts just the `question` field from the original input to the map. This means that the answer generation prompt gets an object with both properties.

Because this pattern is so common, there is a helper called `RunnablePassthrough.assign()` that adds new properties to the input object while passing through all of its existing properties. You could thus rewrite the above like this:

In [9]:
import { RunnablePassthrough } from "npm:langchain@0.0.201/runnables";

const retrievalChain = RunnableSequence.from([
  RunnablePassthrough.assign({
    context: documentRetrievalChain,
  }),
  answerGenerationPrompt,
  model,
  new StringOutputParser(),
]);
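The "pass through and add" behavior can be pictured as copying the input object and then attaching each computed property. The sketch below is a synchronous simplification under that assumption; `assignFields` is a hypothetical helper, not the library's implementation, and the `context` function is a stub rather than a real retrieval chain.

```typescript
// Simplified, synchronous picture of "keep existing properties, add computed ones".
// `assignFields` is a hypothetical helper for illustration only.
type Fields = Record<string, unknown>;

function assignFields(
  input: Fields,
  computed: Record<string, (input: Fields) => unknown>,
): Fields {
  const result: Fields = { ...input }; // existing properties pass through unchanged
  for (const [key, compute] of Object.entries(computed)) {
    result[key] = compute(input); // each new property is computed from the original input
  }
  return result;
}

const assigned = assignFields(
  { question: "What are the prerequisites for this course?" },
  {
    // Stub standing in for documentRetrievalChain.
    context: (input) => `<doc>\nstub context for: ${String(input["question"])}\n</doc>`,
  },
);

console.log(assigned);
```

The result carries both `question` and `context`, which is the shape the answer generation prompt expects.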

Finally, let's try it end-to-end!


In [10]:
const answer = await retrievalChain.invoke({
  question: "What are the prerequisites for this course?"
});

console.log(answer);

The prerequisites for this course are familiarity with basic probability and statistics, basic linear algebra, and some programming knowledge, mostly in MATLAB or Octave. Familiarity with random variables, expectation, variance, matrix operations, and vectors is also assumed. Additional review of prerequisites will be provided during the course.


Sweet! That looks pretty good.

This is great, but what if we want to ask a followup question?

In [11]:
const followupAnswer = await retrievalChain.invoke({
  question: "Can you list them in bullet point form?"
});

console.log(followupAnswer);

- Students from various backgrounds including statistics, iCME, synthesis, aero/astro, and MSNE
- Goal of the project is to produce a publishable piece of research in machine learning
- Previous student projects included applications of learning algorithms to control a snake robot, improving learning algorithms, flying autonomous aircraft, computer vision algorithms, Netflix rankings, medical robots, neuroscience, fMRI data analysis, market makings, and more


It listed items in bullet point form, but none of them were prerequisites to the course! 

This occurs because LLMs do not have an innate sense of memory, and since we're not passing in any chat history as context, the LLM doesn't know what "them" refers to. We can update our prompt to take chat history into account as well, but we have a more fundamental problem: our vectorstore needs to return relevant documents too. Here's what happens if we try to query our vectorstore with the current followup question:

In [12]:
const docs = await documentRetrievalChain.invoke({
  question: "Can you list them in bullet point form?"
});

console.log(docs);

<doc>
at me, and that won't show, okay?   
Let's see. I also handed out this — ther e were two handouts I hope most of you have,  
course information handout. So let me just sa y a few words about parts of these. On the  
third page, there's a section that says Online Resources.   
Oh, okay. Louder? Actually, could you turn  up the volume? Testing. Is this better?  
Testing, testing. Okay, cool. Thanks.
</doc>
<doc>
and write out, was learned using one of  these reinforcement learning algorithms.   
Just a word about that: The basic idea behi nd a reinforcement learning algorithm is this  
idea of what's called a reward  function. What we have to  think about is imagine you're  
trying to train a dog. So every time y our dog does something good, you say, "Good dog,"  
and you reward the dog. Every time your dog does something bad, you go, "Bad dog,"  
right? And hopefully, over time, your dog will lear n to do the right things to get more of  
the positive rewards, to get mo re of the 

You can see that we don't get anything relevant to the prerequisites of CS229.

The solution is to dereference the user's question into a rephrased standalone question. How? With an LLM of course!

First, we'll construct a new prompt with a `MessagesPlaceholder` where we can later inject or pass chat history messages:

In [13]:
import { MessagesPlaceholder } from "npm:langchain@0.0.201/prompts";

const REPHRASE_QUESTION_SYSTEM_TEMPLATE = `Using the provided chat history as context, rephrase the following question to be a standalone question that has no external references.

Do not respond with anything other than a rephrased standalone question.`;

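const rephraseQuestionChainPrompt = ChatPromptTemplate.fromMessages([
  ["system", REPHRASE_QUESTION_SYSTEM_TEMPLATE],
  new MessagesPlaceholder("history"),
  // Ask for a rephrasing here; asking the model to "answer" would contradict the system prompt.
  ["human", "Rephrase the following question as a standalone question:\n{question}"],
]);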
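One way to picture the placeholder: when the prompt is formatted, the fixed messages stay where they are and the placeholder is replaced by the whole list of history messages, in order. The sketch below models that with plain objects. The `ChatMessage` and `TemplatePart` types and the `expandPrompt` helper are hypothetical and much simpler than the library's classes; in particular, a missing placeholder value is treated here as an empty list, which is an assumption of this sketch.

```typescript
// Plain-object model of a prompt with a messages placeholder (hypothetical types).
type ChatMessage = { role: "system" | "human" | "ai"; content: string };
type TemplatePart =
  | { kind: "message"; role: ChatMessage["role"]; template: string }
  | { kind: "placeholder"; variableName: string };

function expandPrompt(
  parts: TemplatePart[],
  textValues: Record<string, string>,
  messageValues: Record<string, ChatMessage[]>,
): ChatMessage[] {
  const messages: ChatMessage[] = [];
  for (const part of parts) {
    if (part.kind === "placeholder") {
      // Splice in the whole message list (assumption: missing value means no messages).
      messages.push(...(messageValues[part.variableName] ?? []));
    } else {
      const content = part.template.replace(
        /\{(\w+)\}/g,
        (_match: string, name: string) => textValues[name] ?? "",
      );
      messages.push({ role: part.role, content });
    }
  }
  return messages;
}

const toyParts: TemplatePart[] = [
  { kind: "message", role: "system", template: "Rephrase the question to be standalone." },
  { kind: "placeholder", variableName: "history" },
  { kind: "message", role: "human", template: "Question:\n{question}" },
];

const expanded = expandPrompt(
  toyParts,
  { question: "Can you list them in bullet point form?" },
  {
    history: [
      { role: "human", content: "What are the prerequisites for this course?" },
      { role: "ai", content: "Probability, statistics, and linear algebra." },
    ],
  },
);

console.log(expanded.map((message) => message.role));
// [ "system", "human", "ai", "human" ]
```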

And then we'll create a simple chain:

In [14]:
const rephraseQuestionChain = RunnableSequence.from([
  rephraseQuestionChainPrompt,
  new ChatOpenAI({ temperature: 0, modelName: "gpt-3.5-turbo-1106" }),
  new StringOutputParser(),
]);

Let's try running this chain on our followup question. Note that the `MessagesPlaceholder` adds an input variable to our prompt, here named `history`, that accepts a list of chat messages:

In [15]:
import { HumanMessage, AIMessage } from "npm:langchain@0.0.201/schema";

const originalQuestion = "What are the prerequisites for this course?";

const originalAnswer = await retrievalChain.invoke({
  question: originalQuestion
});

const chatHistory = [
  new HumanMessage(originalQuestion),
  new AIMessage(originalAnswer),
];

await rephraseQuestionChain.invoke({
  question: "Can you list them in bullet point form?",
  history: chatHistory,
});

"What are the prerequisites for this course?"

Great! That question makes sense on its own. Now, let's put it all together with a new chain that takes chat history as input!

First, here's our document retrieval and formatting chain again:

In [16]:
const convertDocsToString = (documents: Document[]): string => {
  return documents.map((document) => `<doc>\n${document.pageContent}\n</doc>`).join("\n");
};

const documentRetrievalChain = RunnableSequence.from([
  (input) => input.standalone_question,
  retriever,
  convertDocsToString,
]);

To properly track and inject chat history, we're going to use a `ChatMessageHistory` object, then wrap our chain in a manager called `RunnableWithMessageHistory` that will automatically update the history:

In [17]:
import { RunnableWithMessageHistory } from "npm:langchain@0.0.201/runnables";
import { ChatMessageHistory } from "npm:langchain@0.0.201/memory";

const ANSWER_CHAIN_SYSTEM_TEMPLATE = `You are an experienced researcher, expert at interpreting and answering questions based on provided sources.
Using the below provided context and chat history, answer the user's question to the best of your ability using only the resources provided. Be concise!

<context>
{context}
</context>`;

const answerGenerationChainPrompt = ChatPromptTemplate.fromMessages([
  ["system", ANSWER_CHAIN_SYSTEM_TEMPLATE],
  new MessagesPlaceholder("history"),
  ["human", "Now, answer this question using the previous context and chat history:\n{standalone_question}"]
]);

const messageHistory = new ChatMessageHistory();

const conversationalRetrievalChain = RunnableSequence.from([
  RunnablePassthrough.assign({
    standalone_question: rephraseQuestionChain,
  }),
  RunnablePassthrough.assign({
    context: documentRetrievalChain,
  }),
  answerGenerationChainPrompt,
  new ChatOpenAI({ modelName: "gpt-3.5-turbo" }),
  new StringOutputParser(),
]);

// Wrap the chain so stored history is injected under "history" on each call
// and the new question and answer are saved afterwards.
const conversationalRetrievalChainWithHistory = new RunnableWithMessageHistory({
  runnable: conversationalRetrievalChain,
  getMessageHistory: (_sessionId) => messageHistory,
  inputMessagesKey: "question",
  historyMessagesKey: "history",
});
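The wrapper's job can be summarized in three steps: look up the stored messages, invoke the inner chain with them, then record the new question and answer. Here is a minimal sketch of that loop using plain functions. `withHistory`, `Turn`, and `stubChain` are hypothetical names, the stub does not call an LLM, and this sketch keys history by session id, whereas the cell above returns a single shared `messageHistory` for every session.

```typescript
// Minimal sketch of a history-managing wrapper (hypothetical names, no LLM calls).
type Turn = { role: "human" | "ai"; content: string };
type ChainInput = { question: string; history: Turn[] };

function withHistory(
  chain: (input: ChainInput) => Promise<string>,
  getHistory: (sessionId: string) => Turn[],
) {
  return async (question: string, sessionId: string): Promise<string> => {
    const history = getHistory(sessionId);
    // 1. Inject a copy of the stored messages, 2. run the inner chain...
    const answer = await chain({ question, history: [...history] });
    // 3. ...then record the new exchange for the next call.
    history.push({ role: "human", content: question }, { role: "ai", content: answer });
    return answer;
  };
}

const sessions = new Map<string, Turn[]>();
const getToyHistory = (sessionId: string): Turn[] => {
  let turns = sessions.get(sessionId);
  if (turns === undefined) {
    turns = [];
    sessions.set(sessionId, turns);
  }
  return turns;
};

// Stub chain: reports which exchange this is, based on how much history it received.
const stubChain = async (input: ChainInput): Promise<string> =>
  `answer #${input.history.length / 2 + 1} to: ${input.question}`;

const chat = withHistory(stubChain, getToyHistory);

const demo = (async () => {
  const first = await chat("What are the prerequisites for this course?", "s1");
  const second = await chat("Can you list them in bullet point form?", "s1");
  return { first, second, stored: getToyHistory("s1").length };
})();

demo.then((result) => console.log(result));
```

The second call sees the first exchange in its `history`, which is what lets a rephrasing step resolve a reference like "them".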

And let's try it out!

In [18]:
const originalQuestion = "What are the prerequisites for this course?";

const originalAnswer = await conversationalRetrievalChainWithHistory.invoke({
  question: originalQuestion,
}, {
  configurable: { sessionId: "unused" }
});

const finalResult = await conversationalRetrievalChainWithHistory.invoke({
  question: "Can you list them in bullet point form?",
}, {
  configurable: { sessionId: "unused" }
});

console.log(finalResult);

The prerequisites for this course include basic probability and statistics knowledge, as well as familiarity with linear algebra concepts. Students should be familiar with random variables, expectation, variance, matrices, vectors, matrix multiplication, and matrix inverse. Previous undergraduate courses in statistics and linear algebra should be sufficient preparation.


Retrieval is a very deep topic, and there's no one-size-fits-all approach for loading, splitting, and querying your data. We encourage you to modify the above prompts and parameters for different models and data types.

In the final section, we'll show how to put this retrieval chain into production, including some interactions with web APIs and streaming intermediate steps.