# Building a RAG application from scratch

Let's start by installing the dependencies and loading the environment variables we need.

In [None]:
pip install -r requirements.txt

Collecting git+https://github.com/openai/whisper.git (from -r requirements.txt (line 15))
  Cloning https://github.com/openai/whisper.git to /tmp/pip-req-build-tvtoxq1c
  Running command git clone --filter=blob:none --quiet https://github.com/openai/whisper.git /tmp/pip-req-build-tvtoxq1c
  Resolved https://github.com/openai/whisper.git to commit ba3f3cd54b0e5b8ce1ab3de13e32122d0d5f98ab
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting openai (from -r requirements.txt (line 1))
  Downloading openai-1.13.3-py3-none-any.whl (227 kB)
Collecting pypdf (from -r requirements.txt (line 2))
  Downloading pypdf-4.1.0-py3-none-any.whl (286 kB)

In [None]:
import os
from google.colab import userdata

os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')
os.environ['PINECONE_API_KEY'] = userdata.get('PINECONE_API_KEY')
os.environ['PINECONE_API_ENV'] = userdata.get('PINECONE_API_ENV')



# This is the YouTube video we're going to use.
YOUTUBE_VIDEO = "https://www.youtube.com/watch?v=cdiD-9MMpb0"

## Setting up the model
Let's define the LLM that we'll use as part of the workflow.

In [None]:
from langchain_openai.chat_models import ChatOpenAI

model = ChatOpenAI(openai_api_key=userdata.get('OPENAI_API_KEY'), model="gpt-3.5-turbo")

We can test the model by asking a simple question.

In [None]:
model.invoke("What MLB team won the World Series during the COVID-19 pandemic?")

AIMessage(content='The Los Angeles Dodgers won the World Series during the COVID-19 pandemic in 2020.')

In [None]:
from langchain_core.output_parsers import StrOutputParser

parser = StrOutputParser()

chain = model | parser
chain.invoke("What MLB team won the World Series during the COVID-19 pandemic?")

'The Los Angeles Dodgers won the World Series during the COVID-19 pandemic in 2020.'

## Introducing prompt templates

We want to provide the model with some context and the question. [Prompt templates](https://python.langchain.com/docs/modules/model_io/prompts/quick_start) are a simple way to define and reuse prompts.

In [None]:
from langchain.prompts import ChatPromptTemplate

template = """
Answer the question based on the context below. If you can't
answer the question, reply "I don't know".

Context: {context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
prompt.format(context="Mary's sister is Susana", question="Who is Mary's sister?")

'Human: \nAnswer the question based on the context below. If you can\'t\nanswer the question, reply "I don\'t know".\n\nContext: Mary\'s sister is Susana\n\nQuestion: Who is Mary\'s sister?\n'

We can now chain the prompt with the model and the output parser.

In [None]:
chain = prompt | model | parser
chain.invoke({
    "context": "Mary's sister is Susana",
    "question": "Who is Mary's sister?"
})

'Susana'
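
To make the `|` chaining less magical, here is a minimal sketch of the idea in plain Python. The `Step` class and the `toy_*` stages are hypothetical stand-ins written for this illustration, not LangChain's implementation: each stage exposes `invoke`, and `a | b` builds a new stage that feeds the output of `a` into `b`.

```python
class Step:
    """Hypothetical stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # `a | b` returns a new step that passes a's output to b.
        return Step(lambda value: other.invoke(self.invoke(value)))


TEMPLATE = "Context: {context}\n\nQuestion: {question}"

# Toy stages: fill in the prompt, fake a model reply, then extract the text.
toy_prompt = Step(lambda inputs: TEMPLATE.format(**inputs))
toy_model = Step(lambda text: {"content": text.upper()})  # fake "model": uppercases the prompt
toy_parser = Step(lambda message: message["content"])

toy_chain = toy_prompt | toy_model | toy_parser
result = toy_chain.invoke(
    {"context": "Mary's sister is Susana", "question": "Who is Mary's sister?"}
)
print(result)
```

The point of the sketch is only the data flow: the dictionary goes into the prompt stage, the formatted prompt goes into the model stage, and the parser stage reduces the model's message to a plain string.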

In [None]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("/content/Jean-Louis Vincent, Jesse B. Hall (eds.) - Encyclopedia of Intensive Care Medicine-Springer-Verlag Berlin Heidelberg (2012).pdf")
transcription = loader.load()

In [None]:
try:
    chain.invoke({
        "context": transcription,
        "question": "Is reading papers a good idea?"
    })
except Exception as e:
    print(e)

## Splitting the transcription

Since we can't use the entire transcription as the context for the model, a potential solution is to split the transcription into smaller chunks. We can then invoke the model using only the relevant chunks to answer a particular question:

In [None]:
'''from langchain_community.document_loaders import TextLoader

loader = TextLoader("transcription.txt")
text_documents = loader.load()
text_documents'''

There are many different ways to split a document. For this example, we'll use a simple splitter that splits the document into chunks of a fixed size. Check [Text Splitters](https://python.langchain.com/docs/modules/data_connection/document_transformers/) for more information about different approaches to splitting documents.

For illustration purposes, let's split the transcription into chunks of 100 characters with an overlap of 20 characters and display the first few chunks:

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=20)
text_splitter.split_documents(transcription)[:5]

[Document(page_content='Encyclopedia of Intensive Care Medicine', metadata={'source': '/content/Jean-Louis Vincent, Jesse B. Hall (eds.) - Encyclopedia of Intensive Care Medicine-Springer-Verlag Berlin Heidelberg (2012).pdf', 'page': 0}),
 Document(page_content='Jean-Louis Vincent and Jesse B. Hall (Eds)\nEncyclopedia of\nIntensive Care Medicine', metadata={'source': '/content/Jean-Louis Vincent, Jesse B. Hall (eds.) - Encyclopedia of Intensive Care Medicine-Springer-Verlag Berlin Heidelberg (2012).pdf', 'page': 2}),
 Document(page_content='With 716 Figures and 450 Tables', metadata={'source': '/content/Jean-Louis Vincent, Jesse B. Hall (eds.) - Encyclopedia of Intensive Care Medicine-Springer-Verlag Berlin Heidelberg (2012).pdf', 'page': 2}),
 Document(page_content='Editors\nJean-Louis VincentHead Dept of Intensive CareErasme Hospital (Free University of Brussels)', metadata={'source': '/content/Jean-Louis Vincent, Jesse B. Hall (eds.) - Encyclopedia of Intensive Care Medicine-Springe

For our specific application, let's use 1000 characters instead:

In [None]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
documents = text_splitter.split_documents(transcription)
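
To make `chunk_size` and `chunk_overlap` concrete, the sketch below splits a string into fixed-size windows that share a few characters with their neighbors. It is a deliberate simplification: `split_fixed` is a hypothetical helper, and `RecursiveCharacterTextSplitter` itself first tries to break on separators such as paragraphs and line breaks, so its chunks will not match this output exactly.

```python
def split_fixed(text, chunk_size, chunk_overlap):
    """Split text into windows of chunk_size characters overlapping by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
sample_chunks = split_fixed(sample, chunk_size=10, chunk_overlap=3)
print(sample_chunks)
```

Each chunk repeats the last three characters of the previous one, which is what keeps a sentence that straddles a boundary from being lost entirely.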

## Finding the relevant chunks

Given a particular question, we need to find the relevant chunks from the transcription to send to the model. Here is where the idea of **embeddings** comes into play.

An embedding is a mathematical representation of the semantic meaning of a word, sentence, or document. It's a projection of a concept in a high-dimensional space. Embeddings have a simple characteristic: the projections of related concepts will be close to each other, while concepts with different meanings will lie far apart. We can use [Cohere's Embed Playground](https://dashboard.cohere.com/playground/embed) to visualize embeddings in two dimensions.

To provide the model with the most relevant chunks, we can use the embeddings of the question and the chunks of the transcription to compute the similarity between them. We can then select the chunks with the highest similarity to the question and use them as the context for the model.

Let's generate embeddings for an arbitrary query:

In [None]:
from langchain_openai.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(openai_api_key=userdata.get('OPENAI_API_KEY'))
embedded_query = embeddings.embed_query("Who is Mary's sister?")

print(f"Embedding length: {len(embedded_query)}")
print(embedded_query[:10])

To illustrate how embeddings work, let's first generate the embeddings for two different sentences:

In [None]:
sentence1 = embeddings.embed_query("Mary's sister is Susana")
sentence2 = embeddings.embed_query("Pedro's mother is a teacher")

We can now compute the similarity between the query and each of the two sentences. The closer the embeddings are, the more similar the sentences will be.

We can use [Cosine Similarity](https://en.wikipedia.org/wiki/Cosine_similarity) to calculate the similarity between the query and each of the sentences:

In [None]:
from sklearn.metrics.pairwise import cosine_similarity

query_sentence1_similarity = cosine_similarity([embedded_query], [sentence1])[0][0]
query_sentence2_similarity = cosine_similarity([embedded_query], [sentence2])[0][0]

query_sentence1_similarity, query_sentence2_similarity
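
For intuition about what the similarity score measures, here is a small pure-Python sketch of cosine similarity on hand-written two-dimensional vectors. The `cosine` helper and the `toy_*` vectors are made up for illustration; real embeddings have many more dimensions.

```python
import math


def cosine(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


toy_query = [1.0, 0.0]
toy_close = [2.0, 0.0]  # same direction as the query, different length
toy_far = [0.0, 3.0]    # perpendicular to the query

print(cosine(toy_query, toy_close))  # 1.0
print(cosine(toy_query, toy_far))    # 0.0
```

The score depends only on the direction of the vectors, not on their length, which is why `toy_close` scores 1.0 even though it is twice as long as the query.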

## Setting up a Vector Store

We need an efficient way to store document chunks and their embeddings, and to perform similarity searches at scale. To do this, we'll use a **vector store**.

A vector store is a database of embeddings that specializes in fast similarity searches.


To understand how a vector store works, let's create one in memory and add a few embeddings to it:

In [None]:
from langchain_community.vectorstores import DocArrayInMemorySearch

vectorstore1 = DocArrayInMemorySearch.from_texts(
    [
        "Mary's sister is Susana",
        "John and Tommy are brothers",
        "Patricia likes white cars",
        "Pedro's mother is a teacher",
        "Lucia drives an Audi",
        "Mary has two siblings",
    ],
    embedding=embeddings,
)

We can now query the vector store to find the most similar embeddings to a given query:

In [None]:
vectorstore1.similarity_search_with_score(query="Who is Mary's sister?", k=3)
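
Conceptually, a similarity search compares the query embedding against every stored embedding and returns the best matches. The sketch below shows that idea with a brute-force scan. `ToyVectorStore` and its hand-written three-dimensional vectors are assumptions for illustration only; production vector stores typically rely on specialized indexes rather than scanning every entry.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class ToyVectorStore:
    """Hypothetical in-memory store: keeps (text, vector) pairs and scans them all."""

    def __init__(self):
        self.entries = []

    def add(self, text, vector):
        self.entries.append((text, vector))

    def search(self, query_vector, k=3):
        # Score every entry against the query, then keep the k highest scores.
        scored = [(text, cosine(query_vector, vector)) for text, vector in self.entries]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


store = ToyVectorStore()
store.add("Mary's sister is Susana", [0.9, 0.1, 0.0])
store.add("Patricia likes white cars", [0.0, 0.2, 0.9])
store.add("Mary has two siblings", [0.8, 0.3, 0.1])

top = store.search([1.0, 0.0, 0.0], k=2)
for text, score in top:
    print(f"{score:.3f}  {text}")
```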

## Connecting the vector store to the chain

We can use the vector store to find the most relevant chunks from the transcription to send to the model. Here is how we can connect the vector store to the chain:

We need to configure a [Retriever](https://python.langchain.com/docs/modules/data_connection/retrievers/). The retriever will run a similarity search in the vector store and return the most similar documents back to the next step in the chain.

We can get a retriever directly from the vector store we created before:

In [None]:
retriever1 = vectorstore1.as_retriever()
retriever1.invoke("Who is Mary's sister?")

Our prompt expects two parameters, "context" and "question." We can use the retriever to find the chunks we'll use as the context to answer the question.

We can create a map with the two inputs by using the [`RunnableParallel`](https://python.langchain.com/docs/expression_language/how_to/map) and [`RunnablePassthrough`](https://python.langchain.com/docs/expression_language/how_to/passthrough) classes. This will allow us to pass the context and question to the prompt as a map with the keys "context" and "question."

In [None]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

setup = RunnableParallel(context=retriever1, question=RunnablePassthrough())
setup.invoke("What color is Patricia's car?")
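
The shape of the map is easier to see with plain functions. In this sketch, `run_parallel_map`, `toy_retriever`, and `passthrough` are hypothetical helpers: every branch receives the same input, and the results are collected under the branch keys. Unlike the real `RunnableParallel`, which may run its branches concurrently, this version simply calls them one after another.

```python
def run_parallel_map(branches, value):
    """Call every branch with the same input and collect results by key."""
    return {key: branch(value) for key, branch in branches.items()}


def toy_retriever(question):
    # Hypothetical retriever: always returns the same single chunk.
    return ["Patricia likes white cars"]


def passthrough(value):
    # Mirrors the role of RunnablePassthrough: returns the input unchanged.
    return value


setup_output = run_parallel_map(
    {"context": toy_retriever, "question": passthrough},
    "What color is Patricia's car?",
)
print(setup_output)
```

The resulting dictionary has exactly the two keys the prompt template expects, which is why it can be piped straight into the prompt.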

Let's now add the setup map to the chain and run it:



In [None]:
chain = setup | prompt | model | parser
chain.invoke("What color is Patricia's car?")

Let's invoke the chain using another example:

In [None]:
chain.invoke("What car does Lucia drive?")

## Loading transcription into the vector store

We initialized the vector store with a few random strings. Let's create a new vector store using the chunks from the video transcription.

Let's set up a new chain using the correct vector store. This time we are using a different equivalent syntax to specify the [`RunnableParallel`](https://python.langchain.com/docs/expression_language/how_to/map) portion of the chain:

In [None]:
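# Assumption: the cell that built `vectorstore2` is missing from this export, so we
# create it here from the chunks in `documents` using the same in-memory store as before.
vectorstore2 = DocArrayInMemorySearch.from_documents(documents, embeddings)

chain = (
    {"context": vectorstore2.as_retriever(), "question": RunnablePassthrough()}
    | prompt
    | model
    | parser
)
chain.invoke("What is synthetic intelligence?")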

## Setting up Pinecone

So far we've used an in-memory vector store. In practice, we need a vector store that can handle large amounts of data and perform similarity searches at scale. For this example, we'll use [Pinecone](https://www.pinecone.io/).

The first step is to create a Pinecone account, set up an index, get an API key, and set it as an environment variable `PINECONE_API_KEY`.

Then, we can load the transcription documents into Pinecone:

In [None]:
from langchain_pinecone import PineconeVectorStore
from langchain_openai.embeddings import OpenAIEmbeddings
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

embeddings = OpenAIEmbeddings(openai_api_key=userdata.get('OPENAI_API_KEY'))
pinecone = PineconeVectorStore.from_documents(documents, embeddings, index_name=userdata.get('PINECONE_API_ENV'))

Let's now run a similarity search on Pinecone to make sure everything works:

In [None]:
pinecone.similarity_search("What is pseudoaneurysm ?")[:3]

[Document(page_content='muscle protein. Crit Care Med 35(9 Suppl):S630–S634\nProtein-Energy Wasting\n▶Metabolic Disorders, Other\nPSB\nProtected Specimen Brush (PSB).\nPseudoaneurysm\nSCOTT E. B ELL1,KATHRYN M. B EAUCHAMP2\n1Department of Neurosurgery, School of Medicine,\nUniversity of Colorado Health Sciences Center,\nDenver, CO, USA\n2Department of Neurosurgery, Denver Health Medical\nCenter, University of Colorado School of Medicine,Denver, CO, USA\nDefinition\nPseudoaneurysm, or false aneurysm, describes a communi-\ncation between a lumen and the surrounding soft tissue. It is\ncaused by direct or indirect injury to a vessel, but may alsooccur with ischemic injury to the heart. This is a condition ofiatrogenic, traumatic, anastomotic, infected, or ischemic', metadata={'page': 1916.0, 'source': '/content/Jean-Louis Vincent, Jesse B. Hall (eds.) - Encyclopedia of Intensive Care Medicine-Springer-Verlag Berlin Heidelberg (2012).pdf'}),
 Document(page_content='pseudoaneurysms. One dan

Let's set up the new chain using Pinecone as the vector store:

In [None]:
chain = (
    {"context": pinecone.as_retriever(), "question": RunnablePassthrough()}
    | prompt
    | model
    | parser
)


chain.invoke("Give me a set of clinical guidelines to treat a pseudoaneurysm")

AIMessage(content='- Observation without treatment is reasonable for pseudoaneurysms <3 cm in size, non-expansile, non-painful, and without post-procedure anticoagulation\n- Surgical remediation is the gold standard for difficult-to-access lesions\n- Endovascular treatment is reserved for select circumstances\n- Early identification and treatment of pseudoaneurysms is crucial to prevent complications\n- Use of anticoagulation by the patient is important for nonsurgical treatments\n- Activity restrictions post-treatment are recommended to allow for proper healing and prevent complications')