# Lesson 4: Question answering

![RAG diagram](./images/rag_diagram.png)

In [1]:
import "dotenv/config";

[Module: null prototype] { default: {} }

In [2]:
import { loadAndSplitChunks } from "./lib/helpers.ts";

const splitDocs = await loadAndSplitChunks({
    chunkSize: 1536,
    chunkOverlap: 128
});

In [3]:
import { initializeVectorstoreWithDocuments } from "./lib/helpers.ts";

const vectorstore = await initializeVectorstoreWithDocuments({
  documents: splitDocs,
});

In [4]:
const retriever = vectorstore.asRetriever();

# Document retrieval in a chain

In [5]:
import { RunnableSequence } from "langchain/schema/runnable";
import { Document } from "langchain/document";

const convertDocsToString = (documents: Document[]): string => {
  return documents.map((document) => {
    return `<doc>\n${document.pageContent}\n</doc>`
  }).join("\n");
};

/*
Expected input shape:
{
  question: "What is deep learning?"
}
*/

const documentRetrievalChain = RunnableSequence.from([
    (input) => input.question,
    retriever,
    convertDocsToString
]);

In [6]:
const results = await documentRetrievalChain.invoke({
  question: "What are the prerequisites for this course?"
});
console.log(results);

<doc>
course information handout. So let me just say a few words about parts of these. On the 
third page, there's a section that says Online Resources.  
Oh, okay. Louder? Actually, could you turn up the volume? Testing. Is this better? 
Testing, testing. Okay, cool. Thanks.
</doc>
<doc>
of this class will not be very programming intensive, although we will do some 
programming, mostly in either MATLAB or Octave. I'll say a bit more about that later.  
I also assume familiarity with basic probability and statistics. So most undergraduate 
statistics class, like Stat 116 taught here at Stanford, will be more than enough. I'm gonna 
assume all of you know what random variables are, that all of you know what expectation 
is, what a variance or a random variable is. And in case of some of you, it's been a while 
since you've seen some of this material. At some of the discussion sections, we'll actually 
go over some of the prerequisites, sort of as a refresher course under prerequisite cl
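To see the formatting step on its own, here is a minimal sketch that runs without LangChain. The `Doc` type and the `wrapDocs` name are stand-ins introduced for illustration (assumptions, not part of the lesson's helpers); the logic mirrors `convertDocsToString` above.

```typescript
// Stand-in for LangChain's Document type (assumption: only pageContent matters here).
type Doc = { pageContent: string };

// Wrap each chunk in <doc> tags and join the chunks with newlines,
// mirroring convertDocsToString from the cell above.
const wrapDocs = (documents: Doc[]): string =>
  documents.map((d) => `<doc>\n${d.pageContent}\n</doc>`).join("\n");

const sampleDocs: Doc[] = [
  { pageContent: "first chunk" },
  { pageContent: "second chunk" },
];

const wrapped = wrapDocs(sampleDocs);
console.log(wrapped);
```

The tags give the model an unambiguous boundary between retrieved chunks once they are pasted into a single prompt string.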

# Synthesizing a response

In [7]:
import { ChatPromptTemplate } from "langchain/prompts";

const TEMPLATE_STRING = `You are an experienced researcher, 
expert at interpreting and answering questions based on provided sources.
Using the provided context, answer the user's question 
to the best of your ability using only the resources provided. 
Be verbose!

<context>

{context}

</context>

Now, answer this question using the above context:

{question}`;

const answerGenerationPrompt = ChatPromptTemplate.fromTemplate(
    TEMPLATE_STRING
);

In [8]:
import { RunnableMap } from "langchain/schema/runnable";

const runnableMap = RunnableMap.from({
  context: documentRetrievalChain,
  question: (input) => input.question,
});

await runnableMap.invoke({
  question: "What are the prerequisites for this course?"
});

{
  question: "What are the prerequisites for this course?",
  context: "<doc>\n" +
    "course information handout. So let me just say a few words about parts of these. On the \n" +
    "third"... 3063 more characters
}
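Conceptually, a runnable map passes the same input to every entry and collects the results under the matching keys. The sketch below imitates that behavior in plain TypeScript; `runMap` and the stubbed `context` step are hypothetical names introduced for illustration, not LangChain APIs.

```typescript
type Step<I> = (input: I) => unknown | Promise<unknown>;

// Hypothetical helper (not LangChain's implementation): call every step
// with the same input concurrently and gather the results by key.
async function runMap<I>(
  steps: Record<string, Step<I>>,
  input: I
): Promise<Record<string, unknown>> {
  const keys = Object.keys(steps);
  const values = await Promise.all(keys.map((key) => steps[key](input)));
  const output: Record<string, unknown> = {};
  keys.forEach((key, i) => {
    output[key] = values[i];
  });
  return output;
}

const mapDemo = runMap(
  {
    // Stub standing in for documentRetrievalChain (assumption for the demo).
    context: async (input: { question: string }) =>
      `<doc>\nstub context for: ${input.question}\n</doc>`,
    question: (input: { question: string }) => input.question,
  },
  { question: "What is deep learning?" }
);

mapDemo.then((out) => console.log(out));
```

The output object has exactly the keys the prompt template expects, which is why the map can be piped straight into the prompt.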

# Augmented generation

In [9]:
import { ChatOpenAI } from "langchain/chat_models/openai";
import { StringOutputParser } from "langchain/schema/output_parser";

const model = new ChatOpenAI({
    modelName: "gpt-3.5-turbo-1106"
});

In [10]:
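const retrievalChain = RunnableSequence.from([
  // Build the prompt inputs: retrieved context plus the original question.
  {
    context: documentRetrievalChain,
    question: (input) => input.question,
  },
  answerGenerationPrompt,
  model,
  // Convert the chat model's message output into a plain string.
  new StringOutputParser(),
]);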

In [11]:
const answer = await retrievalChain.invoke({
  question: "What are the prerequisites for this course?"
});

console.log(answer);

Based on the provided context, the prerequisites for this course include familiarity with basic probability and statistics, as well as basic linear algebra. The instructor mentions that most undergraduate statistics classes and linear algebra courses at Stanford are more than enough to meet these prerequisites. Additionally, students are expected to know about random variables, expectations, variances, matrices, vectors, matrix multiplication, and matrix inverses. The instructor also mentions that knowledge of big-O notation and understanding of data structures like linked lists, queues, and binary trees is important for the course, rather than a specific programming language like C or Java. The instructor also alludes to the fact that some material will be reviewed in the discussion sections for those who may need a refresher.


In [12]:
const followupAnswer = await retrievalChain.invoke({
  question: "Can you list them in bullet point form?"
});

console.log(followupAnswer);

Based on the provided context, the question seems to be asking for a list of items. Unfortunately, the provided context does not include any specific items or topics to list in bullet point form. Therefore, it is not possible to create a bullet point list based solely on the provided context. If there are specific items or topics you would like listed in bullet point form, please provide that information and I can certainly assist with creating the list.


In [13]:
const docs = await documentRetrievalChain.invoke({
  question: "Can you list them in bullet point form?"
});

console.log(docs);

<doc>
course information handout. So let me just say a few words about parts of these. On the 
third page, there's a section that says Online Resources.  
Oh, okay. Louder? Actually, could you turn up the volume? Testing. Is this better? 
Testing, testing. Okay, cool. Thanks.
</doc>
<doc>
into four major sections. We're gonna talk about four major topics in this class, the first 
of which is supervised learning. So let me give you an example of that.  
So suppose you collect a data set of housing prices. And one of the TAs, Dan Ramage, 
actually collected a data set for me last week to use in the example later. But suppose that 
you go to collect statistics about how much houses cost in a certain geographic area. And 
Dan, the TA, collected data from housing prices in Portland, Oregon. So what you can do 
is let's say plot the square footage of the house against the list price of the house, right, so 
you collect data on a bunch of houses. And let's say you get a data set like this wit

# Adding history

In [14]:
import { MessagesPlaceholder } from "langchain/prompts";

const REPHRASE_QUESTION_SYSTEM_TEMPLATE = 
  `Given the following conversation and a follow up question, 
rephrase the follow up question to be a standalone question.`;

const rephraseQuestionChainPrompt = ChatPromptTemplate.fromMessages([
  ["system", REPHRASE_QUESTION_SYSTEM_TEMPLATE],
  new MessagesPlaceholder("history"),
  [
    "human", 
    "Rephrase the following question as a standalone question:\n{question}"
  ],
]);
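The placeholder in the prompt above is filled with the prior messages, in order, between the system instruction and the final human message. A rough sketch of the resulting message list, using a hypothetical `ChatMsg` type and `buildRephraseMessages` helper rather than LangChain's own classes:

```typescript
type ChatMsg = { role: "system" | "human" | "ai"; content: string };

// Hypothetical helper: splice the history between the system instruction
// and the final human message, as the placeholder does in the prompt above.
function buildRephraseMessages(history: ChatMsg[], question: string): ChatMsg[] {
  return [
    {
      role: "system",
      content:
        "Given the following conversation and a follow up question, \n" +
        "rephrase the follow up question to be a standalone question.",
    },
    ...history,
    {
      role: "human",
      content: `Rephrase the following question as a standalone question:\n${question}`,
    },
  ];
}

const pastTurns: ChatMsg[] = [
  { role: "human", content: "What are the prerequisites for this course?" },
  { role: "ai", content: "(previous answer text)" },
];

const rephraseMessages = buildRephraseMessages(
  pastTurns,
  "Can you list them in bullet point form?"
);
console.log(rephraseMessages.map((m) => m.role));
```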

In [15]:
const rephraseQuestionChain = RunnableSequence.from([
  rephraseQuestionChainPrompt,
  new ChatOpenAI({ temperature: 0.1, modelName: "gpt-3.5-turbo-1106" }),
  new StringOutputParser(),
]);

In [16]:
import { HumanMessage, AIMessage } from "langchain/schema";

const originalQuestion = "What are the prerequisites for this course?";

const originalAnswer = await retrievalChain.invoke({
  question: originalQuestion
});

console.log(originalAnswer);

The prerequisites for this course, as mentioned by the instructor, include familiarity with basic probability and statistics, as well as basic linear algebra. It is assumed that students have knowledge of random variables, expectation, variance, matrices, vectors, matrix multiplication, matrix inverse, and possibly eigenvectors of a matrix. The instructor also mentions that most undergraduate statistics and linear algebra courses, such as Stat 116, Math 51, 103, 113, or CS205 at Stanford, will provide the necessary background for the course. Additionally, the ability to understand big-O notation and knowledge of data structures like linked lists, queues, or binary treatments is more important than specific programming language knowledge. The instructor also emphasizes that for those who may need a refresher on these topics, there will be discussion sections to review the prerequisites.


In [17]:
const chatHistory = [
  new HumanMessage(originalQuestion),
  new AIMessage(originalAnswer),
];

await rephraseQuestionChain.invoke({
  question: "Can you list them in bullet point form?",
  history: chatHistory,
});

"Could you please list the prerequisites for this course in bullet point form?"

# Putting it all together

In [18]:
const convertDocsToString = (documents: Document[]): string => {
  return documents.map((document) => `<doc>\n${document.pageContent}\n</doc>`).join("\n");
};

const documentRetrievalChain = RunnableSequence.from([
  (input) => input.standalone_question,
  retriever,
  convertDocsToString,
]);

In [19]:
const ANSWER_CHAIN_SYSTEM_TEMPLATE = `You are an experienced researcher, 
expert at interpreting and answering questions based on provided sources.
Using the below provided context and chat history, 
answer the user's question to the best of 
your ability 
using only the resources provided. Be verbose!

<context>
{context}
</context>`;

const answerGenerationChainPrompt = ChatPromptTemplate.fromMessages([
  ["system", ANSWER_CHAIN_SYSTEM_TEMPLATE],
  new MessagesPlaceholder("history"),
  [
    "human", 
    "Now, answer this question using the previous context and chat history:\n{standalone_question}"
  ]
]);

In [20]:
import { HumanMessage, AIMessage } from "langchain/schema";
await answerGenerationChainPrompt.formatMessages({
  context: "fake retrieved content",
  standalone_question: "Why is the sky blue?",
  history: [
    new HumanMessage("How are you?"),
    new AIMessage("Fine, thank you!")
  ]
});

[
  SystemMessage {
    lc_serializable: true,
    lc_kwargs: {
      content: "You are an experienced researcher, \n" +
        "expert at interpreting and answering questions based on provided"... 210 more characters,
      additional_kwargs: {}
    },
    lc_namespace: [ "langchain_core", "messages" ],
    content: "You are an experienced researcher, \n" +
      "expert at interpreting and answering questions based on provided"... 210 more characters,
    name: undefined,
    additional_kwargs: {}
  },
  HumanMessage {
    lc_serializable: true,
    lc_kwargs: { content: "How are you?", additional_kwargs: {} },
    lc_namespace: [ "langchain_core", "messages" ],
    content: "How are you?",
    name: undefined,
    additional_kwargs: {}
  },
  AIMessage {
    lc_serializable: true,
    lc_kwargs: { content: "Fine, thank you!"

In [21]:
import { RunnablePassthrough } from "langchain/runnables";

const conversationalRetrievalChain = RunnableSequence.from([
  // Rewrite the follow-up into a standalone question using the chat history.
  RunnablePassthrough.assign({
    standalone_question: rephraseQuestionChain,
  }),
  // Retrieve context for the standalone question, keeping the earlier keys.
  RunnablePassthrough.assign({
    context: documentRetrievalChain,
  }),
  answerGenerationChainPrompt,
  new ChatOpenAI({ modelName: "gpt-3.5-turbo" }),
  new StringOutputParser(),
]);
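Each `assign` step keeps the incoming keys and adds the newly computed ones, so later steps can read everything produced earlier. The following is a plain TypeScript sketch of that accumulation; the `assign` helper and both stub steps are hypothetical stand-ins, not LangChain code.

```typescript
type State = Record<string, unknown>;

// Hypothetical helper: return a step that copies its input state
// and adds the computed keys on top of it.
function assign(
  computed: Record<string, (state: State) => unknown | Promise<unknown>>
) {
  return async (state: State): Promise<State> => {
    const keys = Object.keys(computed);
    const values = await Promise.all(keys.map((k) => computed[k](state)));
    const next: State = { ...state };
    keys.forEach((k, i) => {
      next[k] = values[i];
    });
    return next;
  };
}

// Stubs standing in for rephraseQuestionChain and documentRetrievalChain.
const addStandalone = assign({
  standalone_question: (s) => `standalone(${String(s.question)})`,
});
const addContext = assign({
  context: (s) => `docs for ${String(s.standalone_question)}`,
});

const pipeline = async (input: State): Promise<State> =>
  addContext(await addStandalone(input));

const piped = pipeline({
  question: "Can you list them in bullet point form?",
  history: [],
});
piped.then((state) => console.log(state));
```

After both steps the state holds `question`, `history`, `standalone_question`, and `context`, which covers every variable the answer prompt refers to.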

In [22]:
import { RunnableWithMessageHistory } from "langchain/runnables";
import { ChatMessageHistory } from "langchain/stores/message/in_memory";

In [23]:
const messageHistory = new ChatMessageHistory();

const finalRetrievalChain = new RunnableWithMessageHistory({
  runnable: conversationalRetrievalChain,
  // The session ID is ignored here, so every session shares one history.
  getMessageHistory: (_sessionId) => messageHistory,
  historyMessagesKey: "history",
  inputMessagesKey: "question",
});
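Roughly speaking, the history wrapper loads the stored messages for a session, passes them to the inner chain under the history key, and then records the new question and answer. The sketch below illustrates that loop in plain TypeScript under stated assumptions: `withHistory`, `sessions`, and `stubChain` are hypothetical, and unlike the cell above this version keeps a separate history per session ID.

```typescript
type Turn = { role: "human" | "ai"; content: string };

// In-memory store keyed by session ID (assumption for this sketch).
const sessions = new Map<string, Turn[]>();

// Hypothetical wrapper: inject stored history, call the inner chain,
// then append the new human and AI turns to the session's history.
function withHistory(
  inner: (input: { question: string; history: Turn[] }) => Promise<string>
) {
  return async (input: { question: string }, sessionId: string): Promise<string> => {
    const history = sessions.get(sessionId) ?? [];
    const answer = await inner({ question: input.question, history: [...history] });
    sessions.set(sessionId, [
      ...history,
      { role: "human", content: input.question },
      { role: "ai", content: answer },
    ]);
    return answer;
  };
}

// Stub chain that reports which turn it is answering.
const stubChain = async (input: { question: string; history: Turn[] }) =>
  `answer ${input.history.length / 2 + 1} to: ${input.question}`;

const chatWithMemory = withHistory(stubChain);

const twoTurns = chatWithMemory(
  { question: "What are the prerequisites for this course?" },
  "test"
).then(() =>
  chatWithMemory({ question: "Can you list them in bullet point form?" }, "test")
);

twoTurns.then((second) => console.log(second));
```

Because the second call sees the first exchange in its history, a rephrasing step inside the chain has what it needs to resolve "them".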

In [24]:
const originalQuestion = "What are the prerequisites for this course?";

const originalAnswer = await finalRetrievalChain.invoke({
  question: originalQuestion,
}, {
  configurable: { sessionId: "test" }
});

const finalResult = await finalRetrievalChain.invoke({
  question: "Can you list them in bullet point form?",
}, {
  configurable: { sessionId: "test" }
});

console.log(finalResult);

Sure, here are the prerequisites for the course listed in bullet point form:

- Familiarity with basic probability and statistics
- Familiarity with basic linear algebra
- Some programming experience

It is important to refer to the course information handout for more detailed information on these prerequisites.


LangSmith trace: https://smith.langchain.com/public/fca11abd-c0ec-456f-800e-6edfaf6bcf68/r