First, run the following cell to load the relevant environment variables, such as the OpenAI API key, from a local .env file.

In [None]:
import os
import dotenv
dotenv.load_dotenv()
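
A quick sanity check can confirm that the variables were actually loaded before any API call is made. The sketch below assumes the .env file defines OPENAI_API_KEY, the variable the OpenAI client reads by default; the helper name missing_env_vars is hypothetical and not part of dotenv.

```python
import os


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


# Assumption: the .env file is expected to provide OPENAI_API_KEY.
missing = missing_env_vars(["OPENAI_API_KEY"])
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

If the check reports a missing variable, the later embedding and chat model cells would most likely fail with an authentication error.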

The first step is to collect and load your data. For this example, we will use President Biden’s State of the Union Address from 2022 as additional context. The raw text document is available in LangChain’s GitHub repository. To load the data, we can use one of LangChain’s many built-in DocumentLoaders. Each loader returns a list of Document objects, where a Document holds the text in page_content and a metadata dictionary. To load plain text, we will use LangChain’s TextLoader.

In [None]:
import requests
from langchain.document_loaders import TextLoader

url = "https://raw.githubusercontent.com/langchain-ai/langchain/master/docs/docs/modules/state_of_the_union.txt"
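res = requests.get(url)
res.raise_for_status()  # fail early if the download did not succeed
with open("state_of_the_union.txt", "w", encoding="utf-8") as f:
    f.write(res.text)

loader = TextLoader("./state_of_the_union.txt", encoding="utf-8")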
documents = loader.load()

Next, chunk the documents. Because the Document, in its original state, is too long to fit into the LLM’s context window, we need to split it into smaller pieces. LangChain comes with many built-in text splitters for this purpose. For this simple example, we can use the CharacterTextSplitter with a chunk_size of about 500 characters and a chunk_overlap of 50 characters to preserve text continuity between the chunks. Note that CharacterTextSplitter splits on a separator (a blank line by default), so individual chunks can end up longer than chunk_size.

In [None]:
from langchain.text_splitter import CharacterTextSplitter
text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = text_splitter.split_documents(documents)
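
To make the roles of chunk_size and chunk_overlap concrete, here is a minimal sketch of fixed-size windowing in plain Python. It is a simplification for illustration, not LangChain's algorithm (which splits on a separator first and then merges pieces); the function name chunk_text is hypothetical.

```python
def chunk_text(text, chunk_size=500, chunk_overlap=50):
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = chunk_text(sample, chunk_size=10, chunk_overlap=3)
print(pieces)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each window starts 7 characters after the previous one, so the last 3 characters of one chunk reappear at the start of the next. That repetition is what keeps a sentence cut at a boundary recoverable from at least one chunk.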

Lastly, let's embed and store the chunks. To enable semantic search across the text chunks, we need to generate a vector embedding for each chunk and then store the chunks together with their embeddings. To generate the vector embeddings, we can use the OpenAI embedding model, and to store them, we can use the Weaviate vector database. Calling .from_documents() automatically populates the vector database with the chunks.

In [None]:
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Weaviate
import weaviate
from weaviate.embedded import EmbeddedOptions

client = weaviate.Client(
    embedded_options=EmbeddedOptions()
)

vectorstore = Weaviate.from_documents(
    client=client,
    documents=chunks,
    embedding=OpenAIEmbeddings(),
    by_text=False
)

Once the vector database is populated, we can define it as the retriever component, which fetches the additional context based on the semantic similarity between the user query and the embedded chunks.

In [None]:
retriever = vectorstore.as_retriever()
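
Conceptually, the retriever embeds the query and returns the chunks whose vectors lie closest to it. The sketch below illustrates that idea with hand-written toy vectors and cosine similarity. Both are assumptions made for illustration: real embeddings have many more dimensions, the actual metric depends on how the vector store is configured, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed_chunks, k=2):
    """Return the texts of the k chunks most similar to the query vector."""
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy (text, vector) pairs; the numbers are made up, not real embeddings.
toy_index = [
    ("chunk about the Supreme Court", [0.9, 0.1, 0.0]),
    ("chunk about the economy", [0.1, 0.9, 0.1]),
    ("chunk about judges", [0.8, 0.2, 0.1]),
]
toy_query = [1.0, 0.0, 0.0]
print(top_k(toy_query, toy_index, k=2))
# ['chunk about the Supreme Court', 'chunk about judges']
```

A vector database performs the same ranking, typically with an approximate nearest-neighbor index rather than a full sort, so that it scales beyond a handful of chunks.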

Next, to augment the prompt with the additional context, we need to prepare a prompt template. The template contains placeholders for the question and the retrieved context, which are filled in at query time, as shown below.

In [None]:
from langchain.prompts import ChatPromptTemplate

template = """You are an assistant for question-answering tasks. 
Use the following pieces of retrieved context to answer the question. 
If you don't know the answer, just say that you don't know. 
Use three sentences maximum and keep the answer concise.
Question: {question} 
Context: {context} 
Answer:
"""
prompt = ChatPromptTemplate.from_template(template)

print(prompt)
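
To see what the augmentation step amounts to, the sketch below fills the two placeholders using Python's built-in str.format. This is a stand-in for illustration rather than LangChain's implementation; the template is shortened, and the context strings are placeholders, not real retrieval output.

```python
# Shortened copy of the template; only the two placeholders matter here.
template = """Use the following pieces of retrieved context to answer the question.
Question: {question}
Context: {context}
Answer:
"""

# Placeholder values standing in for the user query and the retrieved chunks.
question = "What did the president say about Justice Breyer"
context = "\n\n".join(["<text of retrieved chunk 1>", "<text of retrieved chunk 2>"])

filled = template.format(question=question, context=context)
print(filled)
```

The filled string is what the LLM ultimately receives: the instructions, the user's question, and the retrieved text in one prompt.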

Finally, we can build a chain for the RAG pipeline, chaining together the retriever, the prompt template and the LLM. Once the RAG chain is defined, we can invoke it.

In [None]:
from langchain.chat_models import ChatOpenAI
from langchain.schema.runnable import RunnablePassthrough
from langchain.schema.output_parser import StrOutputParser

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

rag_chain = (
    {"context": retriever,  "question": RunnablePassthrough()} 
    | prompt 
    | llm
    | StrOutputParser() 
)

query = "What did the president say about Justice Breyer?"
rag_chain.invoke(query)
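
The data flow through the chain can be hard to read from the | syntax alone. The sketch below spells out the same sequence with plain functions: retrieve, build the prompt, generate, parse. All four callables are hypothetical stand-ins (the "LLM" is a stub that returns a fixed-format string), so this shows only the order of operations, not LangChain's internals.

```python
def run_rag(query, retrieve, build_prompt, generate, parse):
    """Run the four RAG steps in order and return the parsed answer."""
    # Mirrors {"context": retriever, "question": RunnablePassthrough()}:
    # the query is sent to the retriever and also passed through unchanged.
    inputs = {"context": retrieve(query), "question": query}
    prompt_text = build_prompt(inputs)  # corresponds to `| prompt`
    message = generate(prompt_text)     # corresponds to `| llm`
    return parse(message)               # corresponds to `| StrOutputParser()`


# Stand-in components, for illustration only.
def stub_retrieve(query):
    return ["<text of retrieved chunk>"]


def stub_build_prompt(inputs):
    context = "\n\n".join(inputs["context"])
    return f"Question: {inputs['question']}\nContext: {context}\nAnswer:"


def stub_generate(prompt_text):
    return {"content": f"(stub answer for a prompt of {len(prompt_text)} characters)"}


def stub_parse(message):
    return message["content"]


answer = run_rag(
    "What did the president say about Justice Breyer?",
    stub_retrieve, stub_build_prompt, stub_generate, stub_parse,
)
print(answer)
```

Note that stub_build_prompt joins the retrieved chunks into a single string before inserting them. In the real chain, the retriever returns a list of Document objects, so a similar formatting step may be worth adding if the raw list representation clutters the prompt.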