### Build RAG Application

In [2]:
import os
from dotenv import load_dotenv
load_dotenv()
YOUTUBE_VIDEO = "https://www.youtube.com/watch?v=cdiD-9MMpb0"

## Set Up the Model

In [3]:
from langchain_openai.chat_models import ChatOpenAI
model = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.0)

# Test the model
model.invoke("Who is Elon Musk?")

AIMessage(content='Elon Musk is a billionaire entrepreneur and CEO of multiple companies, including Tesla, SpaceX, Neuralink, and The Boring Company. He is known for his ambitious vision for the future, including the colonization of Mars, the development of sustainable energy solutions, and the advancement of artificial intelligence. Musk is also known for his outspoken and sometimes controversial statements on social media.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 73, 'prompt_tokens': 12, 'total_tokens': 85, 'completion_tokens_details': {'audio_tokens': 0, 'reasoning_tokens': 0, 'accepted_prediction_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-aa057e9d-45a7-49c5-b5e2-9f373ad0d083-0', usage_metadata={'input_tokens': 12, 'output_tokens': 73, 'total_toke

In [4]:
# Let's use StrOutputParser to extract the answer as a string
from langchain_core.output_parsers import StrOutputParser
parser = StrOutputParser()
chain = model | parser
chain.invoke("Who is Elon Musk?")

'Elon Musk is a billionaire entrepreneur and CEO of multiple companies, including Tesla, SpaceX, Neuralink, and The Boring Company. He is known for his work in the fields of electric vehicles, space exploration, and renewable energy. Musk is also a prominent figure in popular culture and is often referred to as a visionary and innovator.'
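The `|` operator in `model | parser` builds a pipeline in which each step's output becomes the next step's input. The toy class below is a minimal sketch of that idea, not LangChain's implementation: `Step`, `FakeMessage`, `fake_model` and `str_parser` are hypothetical stand-ins. It shows why appending a parser turns a message object (like the `AIMessage` above) into a plain string.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class FakeMessage:
    """Hypothetical stand-in for an AIMessage: it only carries text in .content."""
    content: str


class Step:
    """Toy pipeline step: wraps a function and supports `a | b` composition."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # The composed step runs self first, then feeds its output into other.
        return Step(lambda x: other.fn(self.fn(x)))

    def invoke(self, x: Any) -> Any:
        return self.fn(x)


# Hypothetical "model": wraps a canned answer in a message object.
fake_model = Step(lambda q: FakeMessage(content=f"Answer to: {q}"))
# Parser step: pulls the text out of the message, like a string output parser.
str_parser = Step(lambda msg: msg.content)

toy_chain = fake_model | str_parser
print(toy_chain.invoke("Who is Elon Musk?"))  # Answer to: Who is Elon Musk?
```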

## Introduce Prompt Templates

In [5]:
from langchain_core.prompts import ChatPromptTemplate
template = """
Answer the question based on the context below. If you can't answer the question,
reply "My apologies, but I have no clue".

Context: {context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
mycontext = "Britney's sister is Alyssia"
myquestion = "Is Alyssia's sister?"
prompt.format(context=mycontext, question=myquestion)

'Human: \nAnswer the question based on the context below. If you can\'t answer the question,\nreply "My apologies, but I have no clue".\n\nContext: Britney\'s sister is Alyssia\n\nQuestion: Is Alyssia\'s sister?\n'
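Filling a template boils down to substituting named placeholders. The sketch below does this with plain `str.format`, which is our own simplification: `ChatPromptTemplate` additionally wraps the result in a human message, which is why the output above starts with `Human:`. `TOY_TEMPLATE` and `fill_template` are hypothetical names.

```python
# Hypothetical template mirroring the one above, filled with plain str.format.
TOY_TEMPLATE = """
Answer the question based on the context below. If you can't answer the question,
reply "My apologies, but I have no clue".

Context: {context}

Question: {question}
"""


def fill_template(template: str, **values: str) -> str:
    # str.format replaces each {name} placeholder with the matching keyword argument
    # and raises KeyError if a placeholder has no value.
    return template.format(**values)


filled = fill_template(
    TOY_TEMPLATE,
    context="Britney's sister is Alyssia",
    question="Who is Alyssia's sister?",
)
print(filled)
```

Forgetting a variable fails loudly with `KeyError`, which is the behaviour you want when a chain silently drops a key.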

#### Let's chain the prompt with the model and output parser

In [6]:
chain = prompt | model | parser
chain.invoke({
    "context" :  mycontext, 
    "question" : myquestion
})

"Yes, Britney is Alyssia's sister."

#### Combining Chains

In [7]:
# Let's create a second prompt template that translates the answer into another language (e.g., Spanish or French)
translation_prompt = ChatPromptTemplate.from_template(
    "Translate the {answer} to {language}"
)

In [8]:
# Let's combine the first chain with the translation prompt: the dict runs `chain` to produce "answer"
# and copies "language" straight from the input
from operator import itemgetter
translation_chain = (
    {"answer": chain, "language": itemgetter("language")} | translation_prompt | model | parser
)
translation_chain.invoke(
    {
        "context": "Sarra's sister is Cerine. She does not have any more siblings.",
        "question": "How many sisters does Sarra have?",
        "language": ["Spanish", "Portuguese", "French"],
    }
)

'Spanish: Sarra tiene una hermana, Cerine.\nPortuguese: Sarra tem uma irmã, Cerine.\nFrench: Sarra a une sœur, Cerine.'
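When a dict sits at the start of a chain, LangChain treats it as a parallel step: every value receives the same input, and the results are collected under the same keys. A toy sketch with plain functions (`run_parallel` and `fake_answer_chain` are hypothetical) shows that `itemgetter("language")` just plucks one field from the input dict, while the nested chain consumes the whole dict.

```python
from operator import itemgetter
from typing import Any, Callable, Dict


def run_parallel(steps: Dict[str, Callable[[Any], Any]], inputs: Any) -> Dict[str, Any]:
    # Every step receives the *same* input; outputs are gathered under the same keys.
    return {key: step(inputs) for key, step in steps.items()}


def fake_answer_chain(inputs: Dict[str, Any]) -> str:
    # Hypothetical stand-in for `prompt | model | parser`: it reads context and question.
    return f"Based on '{inputs['context']}', answering '{inputs['question']}'"


toy_inputs = {
    "context": "Sarra's sister is Cerine.",
    "question": "How many sisters does Sarra have?",
    "language": "Spanish",
}

mapped = run_parallel(
    {"answer": fake_answer_chain, "language": itemgetter("language")},
    toy_inputs,
)
print(mapped["language"])  # Spanish
print(mapped["answer"])
```

The resulting dict has exactly the keys the translation prompt expects (`answer` and `language`), which is what lets it be piped into `translation_prompt`.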

### Transcribing the YouTube Video

In [9]:
# # We want to give the model context from a YouTube video, so let's transcribe it with OpenAI Whisper.
# import tempfile
# import whisper
# from pytubefix import YouTube

# YOUTUBE_VIDEO = "https://www.youtube.com/watch?v=u47GtXwePms"
# # Only transcribe if transcription.txt does not exist yet (transcription is slow)
# if not os.path.exists("transcription.txt"):
#     youtube = YouTube(YOUTUBE_VIDEO)
#     audio = youtube.streams.filter(only_audio=True).first()
#
#     # Load the "base" Whisper model: fast, but less accurate than the larger models
#     whisper_model = whisper.load_model("base")
#
#     with tempfile.TemporaryDirectory() as tmpdir:
#         audio_file = audio.download(output_path=tmpdir)
#         transcription = whisper_model.transcribe(audio_file, fp16=False)["text"].strip()
#
#         with open("transcription.txt", "w") as file:
#             file.write(transcription)
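The cell above only transcribes when `transcription.txt` is missing, so the slow Whisper step runs once and later runs just read the file. The same compute-once pattern can be sketched generically; `load_or_create` and `slow_transcribe` are hypothetical names, and `slow_transcribe` merely stands in for the download-plus-Whisper step.

```python
import os
import tempfile
from typing import Callable


def load_or_create(path: str, produce: Callable[[], str]) -> str:
    """Return the cached text at `path`, calling `produce` only if the file is missing."""
    if not os.path.exists(path):
        text = produce()
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
    with open(path, encoding="utf-8") as f:
        return f.read()


calls = []


def slow_transcribe() -> str:
    # Hypothetical stand-in for the YouTube download + Whisper transcription.
    calls.append(1)
    return "Let's talk about RAG versus fine-tuning."


with tempfile.TemporaryDirectory() as tmpdir:
    cache_path = os.path.join(tmpdir, "transcription.txt")
    first = load_or_create(cache_path, slow_transcribe)
    second = load_or_create(cache_path, slow_transcribe)

print(len(calls))  # 1: the second call read the cached file
```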
    


#### Let's read the transcription and display the first few characters to make sure everything worked

In [10]:
with open("transcription.txt") as file:
    transcription = file.read()
transcription[:120]

"Let's talk about RAG versus fine-tuning. Now, they're both powerful ways to enhance the capabilities of large language m"

### Using the entire transcription as context

In [11]:
try:
    response = chain.invoke({
        "context": transcription,
        "question": "What are some challenges of LLM? "
    })
    print(response)
except Exception as e:
    print(e)

Some challenges of LLM include limitations in providing accurate or up-to-date information for specific queries, being very generalistic in nature, and the need to specialize them for specific use cases and adapt them in enterprise applications.
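Passing the whole transcript as context works here because the video is short, but every model has a finite context window, and each input token is billed. A rough pre-check can be sketched with the common rule of thumb of about four characters per English token. Both that ratio and the `CONTEXT_BUDGET_TOKENS` value are assumptions for illustration; an exact count requires the model's own tokenizer (e.g. OpenAI's `tiktoken`).

```python
CHARS_PER_TOKEN = 4  # rough English heuristic, not an exact tokenizer
CONTEXT_BUDGET_TOKENS = 4_000  # hypothetical budget reserved for context


def estimate_tokens(text: str) -> int:
    # Ceiling division, so any non-empty text counts as at least one token.
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_budget(text: str, budget: int = CONTEXT_BUDGET_TOKENS) -> bool:
    return estimate_tokens(text) <= budget


sample = "Let's talk about RAG versus fine-tuning. " * 100
print(estimate_tokens(sample), fits_in_budget(sample))
```

When the estimate exceeds the budget, the usual fix is the one the rest of this notebook builds: split the text into chunks and send only the relevant ones.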


### Load the transcript into memory

In [12]:
from langchain_community.document_loaders import TextLoader
loader = TextLoader("transcription.txt")
text_documents = loader.load()
text_documents

[Document(metadata={'source': 'transcription.txt'}, page_content="Let's talk about RAG versus fine-tuning. Now, they're both powerful ways to enhance the capabilities of large language models, but today you're going to learn about their strengths, their use cases, and how you can choose between them. So one of the biggest issues with dealing with generative AI right now is one, enhancing the models, but also two, dealing with their limitations. For example, I just recently asked my favorite LLM a simple question, who won the Euro 2024 World Championship? And while this might seem like a simple query for my model, well, there's a slight issue. Because the model wasn't trained on that specific information, it can't give me an accurate or up-to-date answer. At the same time, these popular models are very generalistic. And so how do we think about specializing them for specific use cases and adapt them in enterprise applications? Because your data is one of the most important things that y

### Split text into chunks

In [13]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
text_splitter.split_documents(text_documents)


[Document(metadata={'source': 'transcription.txt'}, page_content="Let's talk about RAG versus fine-tuning. Now, they're both powerful ways to enhance the capabilities of large language models, but today you're going to learn about their strengths, their use cases, and how you can choose between them. So one of the biggest issues with dealing with generative AI right now is one, enhancing the models, but also two, dealing with their limitations. For example, I just recently asked my favorite LLM a simple question, who won the Euro 2024 World Championship? And while this might seem like a simple query for my model, well, there's a slight issue. Because the model wasn't trained on that specific information, it can't give me an accurate or up-to-date answer. At the same time, these popular models are very generalistic. And so how do we think about specializing them for specific use cases and adapt them in enterprise applications? Because your data is one of the most important things that y
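`chunk_size=1000, chunk_overlap=200` asks for chunks of at most 1,000 characters, with neighbouring chunks sharing up to 200 characters so a sentence cut at a boundary still appears whole in one of them. `RecursiveCharacterTextSplitter` tries separators such as paragraph breaks, newlines and spaces before cutting. The sketch below deliberately uses a simpler fixed character window instead (not the library's algorithm) just to make size and overlap concrete; `fixed_window_chunks` is a hypothetical helper.

```python
from typing import List


def fixed_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into windows of `chunk_size` chars, each starting
    `chunk_size - chunk_overlap` chars after the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


toy_chunks = fixed_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
print(toy_chunks)  # ['abcd', 'cdef', 'efgh', 'ghij']
```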

### Finding the most relevant chunks

In [26]:
# Generate the embedding for a sample query
from langchain_openai.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")
embedded_query = embeddings.embed_query("Who is Britney's sister?")
print(f"Embedded length: {len(embedded_query)}")
print(embedded_query[:20])

Embedded length: 1536
[-0.0021738149225711823, -0.008245505392551422, -0.007408461533486843, -0.008813945576548576, -0.006939966697245836, 0.017240602523088455, -0.0003338024253025651, 0.013417685404419899, 0.004094643052667379, -0.022512728348374367, 0.00396346440538764, -0.012892971746623516, 0.01120014488697052, -0.010675430297851562, 0.006149772554636002, -0.02646057680249214, 0.025136297568678856, -0.008645287714898586, 0.02948392741382122, -0.026935316622257233]


In [15]:
# Generate embeddings for two sample contexts
context1 = embeddings.embed_query("Britney's sister is Alyssia")
context2 = embeddings.embed_query("Hatim's mother is a lecturer")

In [16]:
# Let's use cosine similarity to compare the query with both contexts (higher means more similar)
from sklearn.metrics.pairwise import cosine_similarity
query_context1_similarity = cosine_similarity([embedded_query], [context1])[0][0]
query_context2_similarity = cosine_similarity([embedded_query], [context2])[0][0]
print(f"Similarities: {query_context1_similarity, query_context2_similarity}")


Similarities: (0.9266278574518928, 0.7321724059856504)
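Cosine similarity compares the direction of two vectors: cos(a, b) = (a · b) / (‖a‖ ‖b‖). It is 1 for vectors pointing the same way and 0 for orthogonal ones, regardless of their lengths. A dependency-free sketch of what `cosine_similarity` computes for a single pair, with tiny toy vectors in place of the 1,536-dimensional embeddings (`cosine` is a hypothetical helper name):

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector norms."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print(cosine([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine([1.0, 0.0], [0.0, 1.0]))  # 0.0
print(cosine([1.0, 1.0], [1.0, 0.0]))  # ~0.7071
```

OpenAI's embeddings are normalized to unit length, so for them the cosine reduces to a plain dot product.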


### Setting up a Knowledge Base (KB) / Vector Store (VS)


In [17]:
'''
A KB/VS is a database that stores embeddings and is optimized for fast similarity search.
'''
from langchain_community.vectorstores import DocArrayInMemorySearch
vectorstore1 = DocArrayInMemorySearch.from_texts(
    [
        "Britney's sister is Alyssia",
        "Kamil's beloved is Angelina Jolie",
        "Steve and Bill are brothers",
        "Susu likes blue cars",
        "hatim's mother is a lecturer",
        "Aziz drives Ferrari",
        "Newton has two siblings",
    ],
    embedding=embeddings,
)

##### Query the vector store to retrieve the entries most similar to a given query

In [18]:
query = "What is the world population?"
vectorstore1.similarity_search_with_score(query = query, k=5)

[(Document(page_content='Newton has two siblings'), 0.7401043216058089),
 (Document(page_content='Steve and Bill are brothers'), 0.736737873485554),
 (Document(page_content="Kamil's beloved is Angelina Jolie"),
  0.7177979498989019),
 (Document(page_content="hatim's mother is a lecturer"), 0.716337677717846),
 (Document(page_content="Britney's sister is Alyssia"), 0.7068967830519695)]
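The output above shows a property of similarity search: it always returns the `k` nearest entries, even when nothing in the store relates to the query. The world-population question scores roughly 0.71 to 0.74 against every sentence, below the 0.93 measured earlier for the matching Britney pair (assuming the store uses its default cosine metric, where higher is better). An in-memory store of this kind can be sketched as a brute-force scan; `ToyVectorStore` and its 2-D vectors are hypothetical, whereas real stores hold high-dimensional embeddings and may use approximate search at scale.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class ToyVectorStore:
    """Hypothetical brute-force store: (text, vector) pairs scanned linearly per query."""

    def __init__(self) -> None:
        self.items: List[Tuple[str, List[float]]] = []

    def add(self, text: str, vector: List[float]) -> None:
        self.items.append((text, vector))

    def search_with_score(self, query_vec: Sequence[float], k: int = 4) -> List[Tuple[str, float]]:
        # Score every stored vector, then keep the k highest similarities.
        scored = [(text, cosine(query_vec, vec)) for text, vec in self.items]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


toy_store = ToyVectorStore()
toy_store.add("cars", [1.0, 0.1])
toy_store.add("siblings", [0.1, 1.0])
toy_store.add("jobs", [0.7, 0.7])

results = toy_store.search_with_score([0.0, 1.0], k=2)
print(results)
```

Because the top `k` results come back regardless of relevance, applications often add a minimum-score threshold before trusting them as context.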

### Connecting the KB / VS to the chain

In [19]:
'''
We can store the chunks of transcription.txt in a vector store, retrieve the chunks relevant
to a question, and send them to the model as context.
- TODO:
    - Configure a retriever: it runs a similarity search in the vector store and returns the most similar chunks (4 by default)
    - We can get a retriever directly from the vector store with as_retriever()
'''
chunks_retriever = vectorstore1.as_retriever()
chunks_retriever.invoke("How many siblings does Newton have?")

[Document(page_content='Newton has two siblings'),
 Document(page_content='Steve and Bill are brothers'),
 Document(page_content="Britney's sister is Alyssia"),
 Document(page_content="hatim's mother is a lecturer")]

#### Reminder:

**Our prompt expects two parameters, namely "context" and "question".
We can use the retriever to find the relevant chunks and use them as the context for answering
the question.**


In [20]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
setup = RunnableParallel(context = chunks_retriever, question=RunnablePassthrough())
setup.invoke("How many siblings does Newtom have?")

{'context': [Document(page_content='Newton has two siblings'),
  Document(page_content='Steve and Bill are brothers'),
  Document(page_content="Britney's sister is Alyssia"),
  Document(page_content="hatim's mother is a lecturer")],
 'question': 'How many siblings does Newtom have?'}
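`RunnableParallel` sends the same input string to both branches: the retriever turns it into a list of documents, while `RunnablePassthrough` hands it on unchanged. A stdlib-only sketch of that setup step follows; `toy_retriever`, `passthrough` and `toy_setup` are hypothetical, and the word-overlap ranking is a crude stand-in for embedding similarity.

```python
from typing import Callable, Dict, List

FACTS = [
    "Newton has two siblings",
    "Aziz drives Ferrari",
    "Susu likes blue cars",
]


def toy_retriever(question: str, k: int = 2) -> List[str]:
    # Rank facts by how many lowercase words they share with the question
    # (a crude stand-in for embedding similarity).
    q_words = set(question.lower().split())
    ranked = sorted(FACTS, key=lambda fact: len(q_words & set(fact.lower().split())), reverse=True)
    return ranked[:k]


def passthrough(x):
    # Identity step: forwards its input unchanged, like RunnablePassthrough.
    return x


def toy_setup(question: str) -> Dict[str, object]:
    branches: Dict[str, Callable] = {"context": toy_retriever, "question": passthrough}
    # Each branch gets the same question; results are keyed like the prompt's variables.
    return {key: branch(question) for key, branch in branches.items()}


print(toy_setup("How many siblings does Newton have?"))
```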

#### Add the setup map to the chain

In [21]:
chain = setup | prompt | model | parser
chain.invoke("How many siblings does Newtom have?")

'Newton has two siblings.'
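With `setup` as written, the prompt's `{context}` slot receives the Python list of `Document` objects, so the model sees their `repr` (`Document(page_content=...)`) rather than clean text. The model still answers correctly here, but a common refinement is to join the retrieved `page_content` strings before they reach the prompt; in the real chain, such a function could be piped after the retriever (e.g. `chunks_retriever | format_docs`). A sketch, where `ToyDocument` is a hypothetical stand-in for LangChain's `Document` and `format_docs` is our own helper:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyDocument:
    """Hypothetical stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def format_docs(docs: List[ToyDocument]) -> str:
    # Keep only the text of each retrieved chunk, separated by blank lines.
    return "\n\n".join(doc.page_content for doc in docs)


retrieved = [
    ToyDocument("Newton has two siblings"),
    ToyDocument("Steve and Bill are brothers"),
]
print(format_docs(retrieved))
```

Since it only reads `.page_content`, the same function works unchanged on real `Document` objects.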

In [22]:
chain.invoke("What car does Aziz drive?")

'Aziz drives a Ferrari.'

In [23]:
chain.invoke("Whose beloved is Angelina Jolie?")

"Kamil's beloved is Angelina Jolie."

### Loading the Transcript into the Vector Store

In [24]:
# Split the transcription into chunks and load them into a new in-memory vector store
documents = text_splitter.split_documents(text_documents)
trans_vectorstore = DocArrayInMemorySearch.from_documents(documents=documents, embedding=embeddings)

In [25]:
chain = (
    {"context": trans_vectorstore.as_retriever(), "question": RunnablePassthrough()}
    | prompt
    | model
    | parser
)
chain.invoke("What is RAG?")


'RAG stands for retrieval augmented generation, which is a technique used to increase the capabilities of a model by retrieving external and up-to-date information, augmenting the original prompt given to the model, and generating a response using that context and information.'

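### Let's use Pinecone as the vector store
Pinecone is a managed, hosted vector database: Pinecone Database stores and searches vector data at scale, and Pinecone Assistant offers a quicker route to a running RAG application. Unlike `DocArrayInMemorySearch`, a Pinecone index lives outside the notebook process, so the stored vectors persist between sessions. The cell below expects a `PINECONE_API_KEY` environment variable (e.g. in `.env`) and an existing index named `rcwcourses2024` whose dimension matches the 1,536-dimensional `text-embedding-ada-002` embeddings.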

In [27]:
from langchain_pinecone import PineconeVectorStore
pinecone_index_name = "rcwcourses2024"

pinecone = PineconeVectorStore.from_documents(
    documents=documents, embedding=embeddings, index_name=pinecone_index_name
)


### Use Pinecone for retrieval

In [30]:
pinecone.similarity_search("How does RAG differ from Fine-Tune?", k=3)

[Document(metadata={'source': 'transcription.txt'}, page_content="that we had with the World Cup example. So both of these have their strengths and weaknesses. But let's actually see this in some examples and use cases here. So when you're thinking about choosing between RAG and fine tuning, it's really important to consider your AI-enabled application's priorities and requirements. So mainly, this starts off with the data. Is the data that you're working with slow moving, or is it fast? For example, if we need to use up-to-date external information and have that ready contextually every time we use a model, then this could be a great use case for RAG. For example, a product documentation chatbot where we can continually update the responses with up-to-date information. Now, at the same time, let's think about the industry that you might be in. Now, fine tuning is really powerful for specific industries that have nuances in their writing styles, terminology, vocabulary. And so, for exa

### Combine Pinecone with the Chain

In [29]:
chain = (
    {"context": pinecone.as_retriever(), "question": RunnablePassthrough()}
    | prompt
    | model
    | parser
)
chain.invoke("How does RAG differ from Fine-Tune?")

'RAG (Retrieval Augmented Generation) enhances the capabilities of a model by retrieving external and up-to-date information, augmenting the original prompt given to the model, and generating a response using that context and information. On the other hand, Fine-Tuning is a technique that specializes the model in a certain domain by adjusting its parameters based on specific data from that domain.'