# Lesson 4: Question answering

![](./images/rag_diagram.png)

In [1]:
import "dotenv/config";

[Module: null prototype] { default: {} }

In [2]:
import { loadAndSplitChunks } from "./lib/helpers.ts";

const splitDocs = await loadAndSplitChunks({
    chunkSize: 1536,
    chunkOverlap: 128
});

In [3]:
import { initializeVectorstoreWithDocuments } from "./lib/helpers.ts";

const vectorstore = await initializeVectorstoreWithDocuments({
  documents: splitDocs,
});

In [4]:
const retriever = vectorstore.asRetriever();

# Document retrieval in a chain

In [5]:
import { RunnableSequence } from "@langchain/core/runnables";
import { Document } from "@langchain/core/documents";

const convertDocsToString = (documents: Document[]): string => {
  return documents.map((document) => {
    return `<doc>\n${document.pageContent}\n</doc>`
  }).join("\n");
};

/*
  Expected input shape for the chain below:
  {
    question: "What is deep learning?"
  }
*/

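const documentRetrievalChain = RunnableSequence.from([
  // Pull the question string out of the input object
  (input) => input.question,
  // Fetch the documents most relevant to that question
  retriever,
  // Join the retrieved documents into one <doc>-delimited string
  convertDocsToString
]);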
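
As a rough mental model, a sequence feeds the output of each step into the next one. The dependency-free sketch below mimics that flow with plain synchronous functions. `SketchDoc`, `fakeRetriever`, `pipeSteps`, and the sample chunks are hypothetical stand-ins rather than LangChain APIs, and the real runnables are asynchronous.

```typescript
// Hypothetical stand-in for LangChain's Document: only the field used here.
type SketchDoc = { pageContent: string };

// Same formatting logic as convertDocsToString above.
const formatSketchDocs = (documents: SketchDoc[]): string =>
  documents.map((d) => `<doc>\n${d.pageContent}\n</doc>`).join("\n");

// Hypothetical retriever stub: returns canned documents for any query.
const fakeRetriever = (_query: string): SketchDoc[] => [
  { pageContent: "First chunk" },
  { pageContent: "Second chunk" },
];

// Minimal three-step pipe: each step receives the previous step's output.
const pipeSteps =
  <A, B, C, D>(f: (a: A) => B, g: (b: B) => C, h: (c: C) => D) =>
  (input: A): D =>
    h(g(f(input)));

const sketchChain = pipeSteps(
  (input: { question: string }) => input.question,
  fakeRetriever,
  formatSketchDocs,
);

const sketchOutput = sketchChain({ question: "What is deep learning?" });
console.log(sketchOutput);
```

The shape matches the real chain: an object with a `question` field goes in, and a single string of `<doc>`-wrapped chunks comes out.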

In [6]:
const results = await documentRetrievalChain.invoke({
  question: "What are the prerequisites for this course?"
});
console.log(results);

<doc>
course information handout. So let me just say a few words about parts of these. On the 
third page, there's a section that says Online Resources.  
Oh, okay. Louder? Actually, could you turn up the volume? Testing. Is this better? 
Testing, testing. Okay, cool. Thanks.
</doc>
<doc>
of this class will not be very programming intensive, although we will do some 
programming, mostly in either MATLAB or Octave. I'll say a bit more about that later.  
I also assume familiarity with basic probability and statistics. So most undergraduate 
statistics class, like Stat 116 taught here at Stanford, will be more than enough. I'm gonna 
assume all of you know what random variables are, that all of you know what expectation 
is, what a variance or a random variable is. And in case of some of you, it's been a while 
since you've seen some of this material. At some of the discussion sections, we'll actually 
go over some of the prerequisites, sort of as a refresher course under prerequisite cl

# Synthesizing a response

In [7]:
import { ChatPromptTemplate } from "@langchain/core/prompts";

const TEMPLATE_STRING = `You are an experienced researcher, 
expert at interpreting and answering questions based on provided sources.
Using the provided context, answer the user's question 
to the best of your ability using only the resources provided. 
Be verbose!

<context>

{context}

</context>

Now, answer this question using the above context:

{question}`;

const answerGenerationPrompt = ChatPromptTemplate.fromTemplate(
    TEMPLATE_STRING
);
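
Conceptually, formatting the prompt substitutes values for the `{context}` and `{question}` placeholders. The sketch below illustrates only that substitution step with a hypothetical `fillTemplate` helper and a shortened template. It is not how `ChatPromptTemplate` is implemented, and it skips the input validation and message wrapping that the real class performs.

```typescript
// Hypothetical helper: replaces each {name} placeholder with its value.
// Placeholders without a matching value are left untouched.
const fillTemplate = (
  template: string,
  values: Record<string, string>,
): string =>
  template.replace(
    /\{(\w+)\}/g,
    (match: string, key: string) => values[key] ?? match,
  );

// Shortened, hypothetical template with the same two placeholders.
const miniTemplate = "<context>\n{context}\n</context>\n\nQuestion: {question}";

const filled = fillTemplate(miniTemplate, {
  context: "<doc>\nFirst chunk\n</doc>",
  question: "What is deep learning?",
});
console.log(filled);
```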

In [8]:
import { RunnableMap } from "@langchain/core/runnables";

const runnableMap = RunnableMap.from({
  context: documentRetrievalChain,
  question: (input) => input.question,
});

await runnableMap.invoke({
    question: "What are the prerequisites for this course?"
})

{
  question: "What are the prerequisites for this course?",
  context: "<doc>\n" +
    "course information handout. So let me just say a few words about parts of these. On the \n" +
    "third page, there's a section that says Online Resources.  \n" +
    "Oh, okay. Louder? Actually, could you turn up the volume? Testing. Is this better? \n" +
    "Testing, testing. Okay, cool. Thanks.\n" +
    "</doc>\n" +
    "<doc>\n" +
    "of this class will not be very programming intensive, although we will do some \n" +
    "programming, mostly in either MATLAB or Octave. I'll say a bit more about that later.  \n" +
    "I also assume familiarity with basic probability and statistics. So most undergraduate \n" +
    "statistics class, like Stat 116 taught here at Stanford, will be more than enough. I'm gonna \n" +
    "assume all of you know what random variables are, th

# Augmented generation

In [9]:
import { ChatOpenAI } from "@langchain/openai";
import { StringOutputParser } from "@langchain/core/output_parsers";

const model = new ChatOpenAI({
    modelName: "gpt-3.5-turbo-1106"
});

In [10]:
const retrievalChain = RunnableSequence.from([
  {
    context: documentRetrievalChain,
    question: (input) => input.question,
  },
  answerGenerationPrompt,
  model,
  new StringOutputParser(),
]);
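
The object literal in the first step is coerced into a runnable map, so it behaves like the `RunnableMap` shown earlier: every branch receives the same input, and the results are collected under the branch keys. The sketch below mimics that with a hypothetical synchronous `runBranches` helper and a stubbed context branch. The real map is asynchronous and runs its branches concurrently, which this simplification does not model.

```typescript
// Hypothetical helper mimicking a runnable map: each branch is called with
// the same input, and its result is stored under the branch's key.
const runBranches = <I>(
  branches: Record<string, (input: I) => string>,
  input: I,
): Record<string, string> => {
  const result: Record<string, string> = {};
  for (const [key, branch] of Object.entries(branches)) {
    result[key] = branch(input);
  }
  return result;
};

type QuestionInput = { question: string };

const mapped = runBranches<QuestionInput>(
  {
    // Stub standing in for documentRetrievalChain.
    context: (input) => `<doc>\nstub context for: ${input.question}\n</doc>`,
    // Pass the original question through unchanged.
    question: (input) => input.question,
  },
  { question: "What are the prerequisites for this course?" },
);
console.log(mapped);
```

The resulting object has exactly the `context` and `question` keys that the prompt template expects as its input variables.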

In [11]:
const answer = await retrievalChain.invoke({
  question: "What are the prerequisites for this course?"
});

console.log(answer);

The prerequisites for this course include familiarity with basic probability and statistics, as well as basic linear algebra. The instructor assumes that students are familiar with random variables, expectation, variance, matrices, vectors, matrix multiplication, matrices inverses, and possibly eigenvectors. The instructor also mentioned that most undergraduate statistics classes and linear algebra courses would be sufficient preparation for the course. Additionally, there may be refresher courses offered during discussion sections for those who need to brush up on these topics.


In [12]:
const followupAnswer = await retrievalChain.invoke({
  question: "Can you list them in bullet point form?"
});

console.log(followupAnswer);

Based on the provided context, the information does not give a specific list or details to be outlined in bullet point form. It primarily consists of a course lecture discussing supervised learning, machine learning applications, and logistics. If you have a specific topic or information you'd like to list in bullet points, please provide additional context or details for me to work with.


In [13]:
const docs = await documentRetrievalChain.invoke({
  question: "Can you list them in bullet point form?"
});

console.log(docs);

<doc>
course information handout. So let me just say a few words about parts of these. On the 
third page, there's a section that says Online Resources.  
Oh, okay. Louder? Actually, could you turn up the volume? Testing. Is this better? 
Testing, testing. Okay, cool. Thanks.
</doc>
<doc>
into four major sections. We're gonna talk about four major topics in this class, the first 
of which is supervised learning. So let me give you an example of that.  
So suppose you collect a data set of housing prices. And one of the TAs, Dan Ramage, 
actually collected a data set for me last week to use in the example later. But suppose that 
you go to collect statistics about how much houses cost in a certain geographic area. And 
Dan, the TA, collected data from housing prices in Portland, Oregon. So what you can do 
is let's say plot the square footage of the house against the list price of the house, right, so 
you collect data on a bunch of houses. And let's say you get a data set like this wit