# Deployment

We've designed a nice conversational retrieval chain over our docs. Now let's put it into production! We'll go over how our newly constructed chain interfaces with a core native JavaScript web primitive used in popular frameworks like Express and Next.js: the global `Response` object.

In [1]:
import "npm:dotenv/config";

[Module: null prototype] { default: {} }

Let's go back through our document preparation.

In [2]:
// Peer dependency
import * as parse from "npm:pdf-parse";
import { PDFLoader } from "npm:langchain@0.0.202/document_loaders/fs/pdf";
import { RecursiveCharacterTextSplitter } from "npm:langchain@0.0.202/text_splitter";
import { MemoryVectorStore } from "npm:langchain@0.0.202/vectorstores/memory";
import { OpenAIEmbeddings } from "npm:langchain@0.0.202/embeddings/openai";

const loader = new PDFLoader("./static/docs/MachineLearning-Lecture01.pdf");

const rawCS229Docs = await loader.load();

const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1536,
  chunkOverlap: 128,
});

const splitDocs = await splitter.splitDocuments(rawCS229Docs);

const embeddings = new OpenAIEmbeddings();

const vectorstore = new MemoryVectorStore(embeddings);

await vectorstore.addDocuments(splitDocs);

In [3]:
const retriever = vectorstore.asRetriever();

Let's recreate the chain, starting with our familiar document retrieval component:

In [4]:
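import { Document } from "npm:langchain@0.0.202/document";
import { RunnableSequence } from "npm:langchain@0.0.202/runnables";

// Wrap each retrieved document in <doc> tags and join them into one context string
const convertDocsToString = (documents: Document[]): string => {
  return documents.map((document) => `<doc>\n${document.pageContent}\n</doc>`).join("\n");
};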

const documentRetrievalChain = RunnableSequence.from([
  (input) => input.standalone_question,
  retriever,
  convertDocsToString,
]);

Next, the rephrase question step to handle followup questions:

In [5]:
import { ChatOpenAI } from "npm:langchain@0.0.202/chat_models/openai";
import { StringOutputParser } from "npm:langchain@0.0.202/schema/output_parser";
import { ChatPromptTemplate, MessagesPlaceholder } from "npm:langchain@0.0.202/prompts";

const REPHRASE_QUESTION_SYSTEM_TEMPLATE = `Using the provided chat history as context, rephrase the following question to be a standalone question that has no external references.

Do not respond with anything other than a rephrased standalone question.`;

const rephraseQuestionChainPrompt = ChatPromptTemplate.fromMessages([
  ["system", REPHRASE_QUESTION_SYSTEM_TEMPLATE],
  new MessagesPlaceholder("history"),
  ["human", "Rephrase the following question as a standalone question:\n{question}"],
]);

const rephraseQuestionChain = RunnableSequence.from([
  rephraseQuestionChainPrompt,
  new ChatOpenAI({ temperature: 0, modelName: "gpt-3.5-turbo-1106" }),
  new StringOutputParser(),
]);

And then, the final answer generation chain:

In [6]:
const ANSWER_CHAIN_SYSTEM_TEMPLATE = `You are an experienced researcher, expert at interpreting and answering questions based on provided sources.
Using the below provided context and chat history, answer the user's question to the best of your ability using only the resources provided. Be concise!

<context>
{context}
</context>`;

const answerGenerationChainPrompt = ChatPromptTemplate.fromMessages([
  ["system", ANSWER_CHAIN_SYSTEM_TEMPLATE],
  new MessagesPlaceholder("history"),
  ["human", "Now, answer this question using the previous context and chat history:\n{standalone_question}"]
]);

Now, before the final construction of our chain, we need to make a slight change. The global `Response` object from the JavaScript web APIs lets you pass a stream as its body, but it must be a stream of bytes rather than a stream of string chunks.

In [7]:
// const responseObject = new Response(stream, {
//   headers: {}
// });
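
To make this constraint concrete, here is a minimal sketch that uses only standard web APIs (no LangChain). The helper name `stringChunksToByteStream` and the sample chunks are made up for illustration; the idea is simply to encode each string chunk into a `Uint8Array` with `TextEncoder` before handing the stream to `Response`:

```typescript
// Hypothetical helper (not part of LangChain): turns string chunks into a
// stream of UTF-8 encoded bytes.
function stringChunksToByteStream(chunks: string[]): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream<Uint8Array>({
    start(controller) {
      for (const chunk of chunks) {
        // Each string chunk becomes a Uint8Array before it enters the stream
        controller.enqueue(encoder.encode(chunk));
      }
      controller.close();
    },
  });
}

const exampleByteStream = stringChunksToByteStream(["Hello, ", "world!"]);

// The byte stream can be used directly as the response body
const exampleResponse = new Response(exampleByteStream, {
  headers: { "Content-Type": "text/plain" },
});
```

Reading the body back, for example with `await exampleResponse.text()`, should give the chunks joined together as one string.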

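We'll leave the output parser off the end of the chain for now so that it returns the model's message, and afterwards pipe the result into a different output parser that formats the response the way this native object expects: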

In [8]:
import { RunnablePassthrough } from "npm:langchain@0.0.202/runnables";

const conversationalRetrievalChain = RunnableSequence.from([
  RunnablePassthrough.assign({
    standalone_question: rephraseQuestionChain,
  }),
  RunnablePassthrough.assign({
    context: documentRetrievalChain,
  }),
  answerGenerationChainPrompt,
  new ChatOpenAI({ modelName: "gpt-3.5-turbo" }),
]);

await conversationalRetrievalChain.invoke({
  question: "What are the prerequisites for this course?",
  history: []
});

AIMessage {
  lc_serializable: true,
  lc_kwargs: {
    content: "The requirements for this course include familiarity with basic probability and statistics, as well "... 365 more characters,
    additional_kwargs: { function_call: undefined, tool_calls: undefined }
  },
  lc_namespace: [ "langchain_core", "messages" ],
  content: "The requirements for this course include familiarity with basic probability and statistics, as well "... 365 more characters,
  name: undefined,
  additional_kwargs: { function_call: undefined, tool_calls: undefined }
}

In [9]:
import { HttpResponseOutputParser } from "npm:langchain@0.0.202/output_parsers";
import { RunnableWithMessageHistory } from "npm:langchain@0.0.202/runnables"; 
import { ChatMessageHistory } from "npm:langchain@0.0.202/memory";

const httpResponseOutputParser = new HttpResponseOutputParser({
  contentType: "text/plain"
});

// A single in-memory history, shared by every session in this demo
const messageHistory = new ChatMessageHistory();

const conversationalRetrievalChainWithHistory = new RunnableWithMessageHistory({
  runnable: conversationalRetrievalChain,
  getMessageHistory: (_sessionId) => messageHistory,
  inputMessagesKey: "question",
  historyMessagesKey: "history"
}).pipe(httpResponseOutputParser);
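
Note that `getMessageHistory` above returns the same `messageHistory` for every session ID, which is fine for a single-user demo. If separate conversations should not share context, one option is to keep one history per session. The sketch below shows that lookup pattern in isolation; `createSessionStore` is a hypothetical helper rather than a LangChain API, and a plain string array stands in for `ChatMessageHistory`:

```typescript
// Hypothetical helper (not a LangChain API): lazily creates one value per
// session ID and returns the same value on later lookups.
function createSessionStore<T>(create: () => T): (sessionId: string) => T {
  const store = new Map<string, T>();
  return (sessionId: string): T => {
    if (!store.has(sessionId)) {
      store.set(sessionId, create());
    }
    return store.get(sessionId) as T;
  };
}

// A string array stands in for a chat history object here.
const getHistoryForSession = createSessionStore<string[]>(() => []);

getHistoryForSession("1").push("What are the prerequisites for this course?");
console.log(getHistoryForSession("1").length); // 1
console.log(getHistoryForSession("2").length); // 0
```

With the real classes, the factory would presumably be `() => new ChatMessageHistory()`, and the returned function could be passed as `getMessageHistory`.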

In [10]:
const port = 8080;

const handler = async (request: Request): Promise<Response> => {
  const body = await request.json();
  const question = body.question;
  const sessionId = body.session_id;
  const stream = await conversationalRetrievalChainWithHistory.stream({
    question
  }, { configurable: { sessionId } });
  
  return new Response(stream, {
    status: 200,
    headers: {
      "Content-Type": "text/plain"
    }
  });
};

Deno.serve({ port }, handler);

Listening on http://localhost:8080/


{
  finished: Promise { <pending> },
  shutdown: [AsyncFunction: shutdown],
  ref: [Function: ref],
  unref: [Function: unref]
}
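
The handler above trusts that every request carries a JSON body with `question` and `session_id` fields. A production endpoint would likely validate the payload first. Here is one possible sketch; `parseChatRequestBody` and `badRequestResponse` are hypothetical helpers, and the 400 status and error shape are illustrative choices rather than anything the chain requires:

```typescript
interface ChatRequestBody {
  question: string;
  session_id: string;
}

// Hypothetical helper: returns the typed body, or null if a field is missing
// or has the wrong type.
function parseChatRequestBody(body: unknown): ChatRequestBody | null {
  if (typeof body !== "object" || body === null) {
    return null;
  }
  const { question, session_id } = body as Record<string, unknown>;
  if (typeof question !== "string" || question.trim() === "") {
    return null;
  }
  if (typeof session_id !== "string" || session_id === "") {
    return null;
  }
  return { question, session_id };
}

// Hypothetical helper: a JSON error response with a 400 status.
function badRequestResponse(message: string): Response {
  return new Response(JSON.stringify({ error: message }), {
    status: 400,
    headers: { "Content-Type": "application/json" },
  });
}
```

Inside the handler, this check could run right after `request.json()`, returning `badRequestResponse(...)` whenever the parsed result is `null`.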

Now that our server is live, let's try calling it!

In [11]:
const decoder = new TextDecoder();

// readChunks() reads from the provided reader and yields the results into an async iterable
function readChunks(reader) {
  return {
    async* [Symbol.asyncIterator]() {
      let readResult = await reader.read();
      while (!readResult.done) {
        // { stream: true } keeps multi-byte characters intact across chunk boundaries
        yield decoder.decode(readResult.value, { stream: true });
        readResult = await reader.read();
      }
    },
  };
}
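
One subtlety when decoding a byte stream chunk by chunk: a multi-byte UTF-8 character can be split across two chunks. `TextDecoder.decode` accepts a `{ stream: true }` option that buffers an incomplete sequence until the next call. The small sketch below, using a made-up two-byte example, shows the difference:

```typescript
// "\u00e9" (e with an acute accent) encodes to two bytes in UTF-8 (0xC3 0xA9).
// Split those bytes across two "chunks" to simulate an unlucky chunk boundary.
const encodedBytes = new TextEncoder().encode("\u00e9");
const firstChunk = encodedBytes.slice(0, 1);
const secondChunk = encodedBytes.slice(1);

// Without { stream: true }, each partial chunk decodes to a replacement character.
const plainDecoder = new TextDecoder();
const withoutStreaming =
  plainDecoder.decode(firstChunk) + plainDecoder.decode(secondChunk);

// With { stream: true }, the decoder holds the first byte until the second arrives.
const streamingDecoder = new TextDecoder();
const withStreaming =
  streamingDecoder.decode(firstChunk, { stream: true }) +
  streamingDecoder.decode(secondChunk, { stream: true });

console.log(withoutStreaming === "\u00e9"); // false
console.log(withStreaming === "\u00e9"); // true
```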

In [12]:
const response = await fetch("http://localhost:8080", {
  method: "POST",
  body: JSON.stringify({
    question: "What are the prerequisites for this course?",
    session_id: "1",
  }),
  headers: {
    "content-type": "application/json"
  }
});

const reader = response.body?.getReader();

for await (const chunk of readChunks(reader)) {
  console.log("CHUNK:", chunk);
}