# RAG On JFK Speeches: Part 2


__[1. Introduction](#first-bullet)__

__[2. Retrieving Documents With Contextual Search](#second-bullet)__

__[3. Building A RAG Pipeline](#third-bullet)__

__[4. A CI/CD Pipeline For RAG](#fourth-bullet)__

__[5. Deploying A RAG Application](#fifth-bullet)__

__[6. Conclusions](#sixth-bullet)__



## 1. Introduction <a class="anchor" id="first-bullet"></a>
------------------------------

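In this post, I will continue from my last post: I retrieve JFK speeches from a Pinecone vector database and use them to build a RAG pipeline with LangChain.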

In [2]:
from langchain_core.documents import Document

In [21]:
# LangChain
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_pinecone import PineconeVectorStore
from langchain.chains.retrieval import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.output_parsers import StrOutputParser

from langchain import hub

# Pinecone VectorDB
from pinecone import Pinecone
from pinecone import ServerlessSpec

import os

# API Keys
from dotenv import load_dotenv
load_dotenv()


True

## 2. Retrieving Documents With Contextual Search <a class="anchor" id="second-bullet"></a>

Create the connection to Pinecone and list the available indexes:

In [4]:
index_name = "prez-speeches"

pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))
pc.list_indexes()

{'indexes': [{'deletion_protection': 'disabled',
              'dimension': 1536,
              'host': 'prez-speeches-2307pwa.svc.aped-4627-b74a.pinecone.io',
              'metric': 'cosine',
              'name': 'prez-speeches',
              'spec': {'serverless': {'cloud': 'aws', 'region': 'us-east-1'}},
              'status': {'ready': True, 'state': 'Ready'}}]}

Instantiate the embedding model:

In [5]:
embedding = OpenAIEmbeddings(model='text-embedding-ada-002')


Create the initial connection to the Vector database:

In [6]:
vectordb = PineconeVectorStore(
                    pinecone_api_key=os.getenv("PINECONE_API_KEY"),
                    embedding=embedding,
                    index_name=index_name
)

In [7]:
question = "How did President Kennedy feel about the Berlin Wall?"

In [8]:
results = await vectordb.asimilarity_search(query=question, k=5)

In [9]:
results

 Document(id='0fa5431f-a374-429e-a622-a1ed1c2b0a21', metadata={'filename': 'berlin-w-germany-rudolph-wilde-platz-19630626', 'seq_num': 1.0, 'source': 'gs://prezkennedyspeches/berlin-w-germany-rudolph-wilde-platz-19630626.json', 'title': 'Remarks of President John F. Kennedy at the Rudolph Wilde Platz, Berlin, June 26, 1963', 'url': 'https://www.jfklibrary.org//archives/other-resources/john-f-kennedy-speeches/berlin-w-germany-rudolph-wilde-platz-19630626'}, page_content='Listen to speech. \xa0\xa0 View related documents. \nPresident John F. Kennedy\nWest Berlin\nJune 26, 1963\n[This version is published in the Public Papers of the Presidents: John F. Kennedy, 1963. Both the text and the audio versions omit the words of the German translator. The audio file was edited by the White House Signal Agency (WHSA) shortly after the speech was recorded. The WHSA was charged with recording only the words of the President. The Kennedy Library has an audiotape of a network broadcast of the full spe
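The index listing above reports `'metric': 'cosine'`, so the search ranks stored vectors by cosine similarity to the embedded question. The sketch below shows that ranking on toy three-dimensional vectors. The names `cosine_similarity`, `top_k`, and `toy_index` are hypothetical, and the brute-force scan is only illustrative: a managed vector database would not compare the query against every vector like this.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k=5):
    """Return the ids of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in stored.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Toy "embeddings" (made up for illustration; real ones have 1536 dimensions).
toy_index = {
    "berlin-speech": [0.9, 0.1, 0.0],
    "space-speech": [0.0, 0.2, 0.9],
    "berlin-crisis-report": [0.8, 0.3, 0.1],
}
query_vec = [1.0, 0.2, 0.0]

nearest = top_k(query_vec, toy_index, k=2)
print(nearest)
```

The two Berlin vectors point in nearly the same direction as the query, so they rank ahead of the unrelated one.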

## 3. Building A RAG Pipeline <a class="anchor" id="third-bullet"></a>
--------------------------------

Now we can use the vector database as a [retriever](https://python.langchain.com/docs/integrations/retrievers/). A retriever is a special LangChain [Runnable](https://python.langchain.com/api_reference/core/runnables.html) object that takes in a string (the query) and returns a list of [Documents](https://python.langchain.com/api_reference/core/documents/langchain_core.documents.base.Document.html). We can see this in action:

In [14]:
retriever = vectordb.as_retriever()
print(type(retriever))

<class 'langchain_core.vectorstores.base.VectorStoreRetriever'>


In [17]:
documents = retriever.invoke(input=question)
documents

 Document(id='0fa5431f-a374-429e-a622-a1ed1c2b0a21', metadata={'filename': 'berlin-w-germany-rudolph-wilde-platz-19630626', 'seq_num': 1.0, 'source': 'gs://prezkennedyspeches/berlin-w-germany-rudolph-wilde-platz-19630626.json', 'title': 'Remarks of President John F. Kennedy at the Rudolph Wilde Platz, Berlin, June 26, 1963', 'url': 'https://www.jfklibrary.org//archives/other-resources/john-f-kennedy-speeches/berlin-w-germany-rudolph-wilde-platz-19630626'}, page_content='Listen to speech. \xa0\xa0 View related documents. \nPresident John F. Kennedy\nWest Berlin\nJune 26, 1963\n[This version is published in the Public Papers of the Presidents: John F. Kennedy, 1963. Both the text and the audio versions omit the words of the German translator. The audio file was edited by the White House Signal Agency (WHSA) shortly after the speech was recorded. The WHSA was charged with recording only the words of the President. The Kennedy Library has an audiotape of a network broadcast of the full spe

Next, let's talk about the prompt for RAG. I used the classic [rlm/rag-prompt](https://smith.langchain.com/hub/rlm/rag-prompt) from [LangSmith](https://www.langchain.com/langsmith), with one change: the function [create_retrieval_chain](https://python.langchain.com/api_reference/langchain/chains/langchain.chains.retrieval.create_retrieval_chain.html) expects the human input in a variable named `input`, while the original prompt names it `question`. The adapted prompt is:

In [16]:
from langchain.prompts import PromptTemplate

template = """You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: {input} 
Context: {context} 
Answer:
"""

prompt = PromptTemplate(
    template=template,
    input_variables=["input", "context"],
)

I can now give an example of how to use the prompt with the documents retrieved from Pinecone and the question from the user.

In [25]:
print(
    prompt.invoke({
        "input": question,
        "context": documents
    }).text
)

You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: How did President Kennedy feel about the Berlin Wall? 
Answer:



We'll use this more later.

Next, we create the LLM chat model, which generates the response that follows `Answer:` in the prompt above, based on the context and the question.

In [26]:
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

The LLM will be used as the generative part of the RAG pipeline in a function called [create_stuff_documents_chain](https://python.langchain.com/api_reference/langchain/chains/langchain.chains.combine_documents.stuff.create_stuff_documents_chain.html). We'll call this the `generate_chain`:

In [27]:
generate_chain = create_stuff_documents_chain(llm=llm, prompt=prompt)

We can see the Runnable and the components of the chain:

In [258]:
print(generate_chain)

bound=RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableLambda(format_docs)
}), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
| PromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, template="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: {input} \nContext: {context} \nAnswer:\n")
| ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x168f11890>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x168a2e010>, root_client=<openai.OpenAI object at 0x169946590>, root_async_client=<openai.AsyncOpenAI object at 0x168f13310>, model_name='gpt-4o-mini', temperature=0.0, model_kwargs={}, openai_api_key=SecretStr('**********'))
| StrOutputParser() kwa
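The first step of the chain, `format_docs`, turns the list of documents into one block of text that fills `{context}`. The sketch below imitates that "stuffing" step with plain Python. The `Doc` dataclass, the shortened `TEMPLATE`, and the `"\n\n"` separator are assumptions for illustration, not LangChain's own code.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


# Shortened version of the prompt template, for illustration only.
TEMPLATE = "Question: {input} \nContext: {context} \nAnswer:\n"


def format_docs(docs, separator="\n\n"):
    """Join the text of every document into a single context string."""
    return separator.join(doc.page_content for doc in docs)


def build_prompt(question, docs):
    """Fill the template with the question and the stuffed context."""
    return TEMPLATE.format(input=question, context=format_docs(docs))


toy_docs = [Doc("First passage."), Doc("Second passage.")]
prompt_text = build_prompt("What was said?", toy_docs)
print(prompt_text)
```

Because every retrieved document is pasted into the prompt, the number and size of the documents directly determine how many tokens each call sends to the LLM.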

Now we can call it with the `invoke` method and see the answer. The chain formats the documents into the prompt, passes the prompt to the LLM, and then passes the LLM output to the string output parser, so we expect a string as the return type.

In [29]:
answer = generate_chain.invoke(
       {
        'context': documents,
        "input": question
      }
)

In [30]:
print(answer)

President Kennedy viewed the Berlin Wall as a significant symbol of the failures of the Communist system and an offense against humanity, as it separated families and divided people. He expressed pride in the resilience of West Berlin and emphasized the importance of freedom and the right to make choices. Kennedy's speeches reflected a commitment to supporting the people of Berlin and a broader struggle for freedom worldwide.


Now we can put this all together as a RAG chain by passing the Pinecone Vector database retriever and the generative chain. The retriever will take in the input question and perform similarity search and return the documents. These documents along with the input question will be passed to the `generate_chain` to return the output. The full RAG chain is below:

In [31]:
rag_chain = create_retrieval_chain(
                    retriever=retriever, 
                    combine_docs_chain=generate_chain)
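To make the data flow concrete, here is a plain-Python sketch of the same composition: retrieve with the question, generate with the question plus the documents, and return everything in one dictionary. The helpers `make_rag_chain`, `toy_retrieve`, and `toy_generate` are hypothetical stand-ins (a keyword match and a canned string), not the LangChain implementation.

```python
def make_rag_chain(retrieve, generate):
    """Compose a retriever and a generator into a single callable."""
    def rag(inputs):
        question = inputs["input"]
        docs = retrieve(question)  # step 1: fetch context for the question
        answer = generate({"input": question, "context": docs})  # step 2
        return {"input": question, "context": docs, "answer": answer}
    return rag


TOY_CORPUS = [
    "Kennedy spoke in West Berlin in 1963.",
    "The space program aims for the moon.",
]


def toy_retrieve(query):
    """Keyword match standing in for the vector similarity search."""
    words = set(query.lower().split())
    return [text for text in TOY_CORPUS if words & set(text.lower().split())]


def toy_generate(inputs):
    """Canned response standing in for the prompt + LLM + parser chain."""
    return f"Answered '{inputs['input']}' using {len(inputs['context'])} passage(s)."


toy_chain = make_rag_chain(toy_retrieve, toy_generate)
toy_response = toy_chain({"input": "berlin"})
print(toy_response["answer"])
```

Keeping the retrieved documents in the output alongside the answer is what makes it possible to cite sources later.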

We can inspect the prompts used in the chain:

In [32]:
rag_chain.get_prompts()

[PromptTemplate(input_variables=['page_content'], input_types={}, partial_variables={}, template='{page_content}'),
 PromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, template="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: {input} \nContext: {context} \nAnswer:\n")]

Now we can test this out:

In [33]:
response = rag_chain.invoke({"input": question})

In [34]:
response

{'input': 'How did President Kennedy feel about the Berlin Wall?',
  Document(id='0fa5431f-a374-429e-a622-a1ed1c2b0a21', metadata={'filename': 'berlin-w-germany-rudolph-wilde-platz-19630626', 'seq_num': 1.0, 'source': 'gs://prezkennedyspeches/berlin-w-germany-rudolph-wilde-platz-19630626.json', 'title': 'Remarks of President John F. Kennedy at the Rudolph Wilde Platz, Berlin, June 26, 1963', 'url': 'https://www.jfklibrary.org//archives/other-resources/john-f-kennedy-speeches/berlin-w-germany-rudolph-wilde-platz-19630626'}, page_content='Listen to speech. \xa0\xa0 View related documents. \nPresident John F. Kennedy\nWest Berlin\nJune 26, 1963\n[This version is published in the Public Papers of the Presidents: John F. Kennedy, 1963. Both the text and the audio versions omit the words of the German translator. The audio file was edited by the White House Signal Agency (WHSA) shortly after the speech was recorded. The WHSA was charged with recording only the words of the President. The Ken

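The response is a dictionary with the keys `input`, `context`, and `answer`. We can pull the title, text, and URL of each retrieved document out of `context` to use as references: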

In [248]:
references = [(doc.metadata["title"],
               doc.page_content, doc.metadata["url"]) 
               for doc in response['context']]

In [249]:
references

[('Radio and Television Report to the American People on the Berlin Crisis, July 25, 1961',
  'https://www.jfklibrary.org//archives/other-resources/john-f-kennedy-speeches/berlin-crisis-19610725'),
 ('Remarks of President John F. Kennedy at the Rudolph Wilde Platz, Berlin, June 26, 1963',
  'Listen to speech. \xa0\xa0 View related documents. \nPresident John F. Kennedy\nWest Berlin\nJune 26, 1963\n[This version is published in the Public Papers of the Presidents: John F. Kennedy, 1963. Both the text and the audio versions omit the words of the German translator. The audio file was edited by the White House Signal Agency (WHSA) shortly after the speech was recorded. The WHSA was charged with recording only the words of the President. The Kennedy Library has an audiotape of a network broadcast of the full speech, with the translator\'s words, and a journalist\'s commentary. Because of copyright restrictions, it is only available for listening at the Library.]\nI am proud to come to this 
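If several retrieved chunks come from the same speech, the reference list will repeat its title and URL. The sketch below removes such repeats while keeping the retrieval order. It uses plain dictionaries with made-up titles and URLs instead of real `Document` objects, and `unique_sources` is a hypothetical helper.

```python
def unique_sources(context_docs):
    """Return (title, url) pairs, keeping only the first hit per URL."""
    seen = set()
    sources = []
    for doc in context_docs:
        url = doc["metadata"]["url"]
        if url not in seen:
            seen.add(url)
            sources.append((doc["metadata"]["title"], url))
    return sources


# Made-up stand-ins for response['context']; the URLs are placeholders.
toy_context = [
    {"metadata": {"title": "Speech A", "url": "https://example.org/a"}},
    {"metadata": {"title": "Speech B", "url": "https://example.org/b"}},
    {"metadata": {"title": "Speech A", "url": "https://example.org/a"}},
]

sources = unique_sources(toy_context)
print(sources)
```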

In [264]:
from langchain_core.load import dumpd, dumps, load, loads

string_representation = dumps(rag_chain, pretty=True)

In [265]:
import json

with open("rag_chain.json", "w") as fp:
    json.dump(string_representation, fp)

In [266]:
with open("rag_chain.json", "r") as fp:
    chain = loads(json.load(fp), secrets_map={"OPENAI_API_KEY": os.getenv("OPENAI_API_KEY"),
                                             "PINECONE_API_KEY": os.getenv("PINECONE_API_KEY")})

NotImplementedError: Trying to load an object that doesn't implement serialization: {'lc': 1, 'type': 'not_implemented', 'id': ['langchain_core', 'runnables', 'base', 'RunnableLambda'], 'repr': "RunnableLambda(lambda x: x['input'])"}
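The error says the chain contains a `RunnableLambda` that does not implement serialization, so the saved JSON cannot be loaded back. One possible workaround, sketched below, is to save the settings needed to rebuild the chain instead of the chain object. The helpers `save_config`, `load_config`, and `build_rag_chain` and the `rag_config` layout are hypothetical. `build_rag_chain` reuses only the calls shown earlier in this post and needs the API keys and network access, so it is defined here but not called.

```python
import json
import os
import tempfile

# Settings taken from the cells above; the dictionary layout is an assumption.
rag_config = {
    "index_name": "prez-speeches",
    "embedding_model": "text-embedding-ada-002",
    "chat_model": "gpt-4o-mini",
    "temperature": 0,
    "template": (
        "You are an assistant for question-answering tasks. Use the following "
        "pieces of retrieved context to answer the question. If you don't know "
        "the answer, just say that you don't know. Use three sentences maximum "
        "and keep the answer concise.\n"
        "Question: {input} \nContext: {context} \nAnswer:\n"
    ),
}


def save_config(config, path):
    """Write the chain settings to a JSON file."""
    with open(path, "w") as fp:
        json.dump(config, fp, indent=2)


def load_config(path):
    """Read the chain settings back from a JSON file."""
    with open(path) as fp:
        return json.load(fp)


def build_rag_chain(config):
    """Rebuild the RAG chain from settings (requires API keys; not run here)."""
    # Local imports so the config helpers work without LangChain installed.
    from langchain.chains.combine_documents import create_stuff_documents_chain
    from langchain.chains.retrieval import create_retrieval_chain
    from langchain.prompts import PromptTemplate
    from langchain_openai import ChatOpenAI, OpenAIEmbeddings
    from langchain_pinecone import PineconeVectorStore

    embedding = OpenAIEmbeddings(model=config["embedding_model"])
    vectordb = PineconeVectorStore(
        pinecone_api_key=os.getenv("PINECONE_API_KEY"),
        embedding=embedding,
        index_name=config["index_name"],
    )
    llm = ChatOpenAI(model=config["chat_model"],
                     temperature=config["temperature"])
    prompt = PromptTemplate(template=config["template"],
                            input_variables=["input", "context"])
    generate_chain = create_stuff_documents_chain(llm=llm, prompt=prompt)
    return create_retrieval_chain(retriever=vectordb.as_retriever(),
                                  combine_docs_chain=generate_chain)


# Round-trip the settings through a temporary file.
config_path = os.path.join(tempfile.mkdtemp(), "rag_config.json")
save_config(rag_config, config_path)
restored_config = load_config(config_path)
```

With this approach no secrets are written to disk: the keys are still read from the environment when the chain is rebuilt.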

## 4. A CI/CD Pipeline For RAG <a class="anchor" id="fourth-bullet"></a>
-------------------

## 5. Deploying A RAG Application <a class="anchor" id="fifth-bullet"></a>
-------------------

## 6. Conclusions  <a class="anchor" id="sixth-bullet"></a>
-------------

In [75]:
from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import OpenAI
from pydantic import BaseModel, Field, model_validator

model = OpenAI(model_name="gpt-3.5-turbo-instruct", temperature=0.0)


# Define your desired data structure.
class Joke(BaseModel):
    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")

    # You can add custom validation logic easily with Pydantic.
    @model_validator(mode="before")
    @classmethod
    def question_ends_with_question_mark(cls, values: dict) -> dict:
        setup = values.get("setup")
        if setup and setup[-1] != "?":
            raise ValueError("Badly formed question!")
        return values


# Set up a parser + inject instructions into the prompt template.
parser = PydanticOutputParser(pydantic_object=Joke)

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

# And a query intended to prompt a language model to populate the data structure.
prompt_and_model = prompt | model
output = prompt_and_model.invoke({"query": "Tell me a joke."})
parser.invoke(output)

Joke(setup='Why did the tomato turn red?', punchline='Because it saw the salad dressing!')