### Build a RAG Application

In [31]:
import os
from dotenv import load_dotenv
load_dotenv()
YOUTUBE_VIDEO = "https://www.youtube.com/watch?v=cdiD-9MMpb0"

### Set Up the Model

In [32]:
from langchain_openai.chat_models import ChatOpenAI
# from langchain_openai import ChatOpenAI
model = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.0)

# Test the model
model.invoke("Who is Elon Musk?")

AIMessage(content='Elon Musk is a billionaire entrepreneur and CEO of multiple companies, including Tesla, SpaceX, Neuralink, and The Boring Company. He is known for his work in the fields of electric vehicles, space exploration, and renewable energy. Musk is also known for his ambitious goals, such as colonizing Mars and developing a high-speed transportation system known as the Hyperloop.', response_metadata={'token_usage': {'completion_tokens': 74, 'prompt_tokens': 12, 'total_tokens': 86, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-d7eb3d4e-bcec-482d-8388-1f2bbf4deb70-0')

In [33]:
# Let's use StrOutputParser to extract the answer as a string
from langchain_core.output_parsers import StrOutputParser
parser = StrOutputParser()
chain = model | parser
chain.invoke("Who is Elon Musk?")

'Elon Musk is a billionaire entrepreneur and CEO of multiple companies, including Tesla, SpaceX, Neuralink, and The Boring Company. He is known for his work in the fields of electric vehicles, space exploration, and renewable energy. Musk is also a prominent figure in popular culture and is often referred to as a visionary and innovator.'
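
The `|` operator in `model | parser` builds a pipeline in which the output of the left step becomes the input of the right step. The following is a toy sketch of that idea in plain Python, not LangChain's actual implementation; the `Step` class and the `fake_model` and `fake_parser` objects are hypothetical stand-ins introduced only for illustration.

```python
class Step:
    """Wraps a function so that steps can be composed with the | operator."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # a | b returns a new Step that runs a first, then feeds its result to b
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value):
        return self.func(value)


# Hypothetical stand-ins: the "model" returns a message-like dict,
# and the "parser" extracts only the text content from it.
fake_model = Step(lambda question: {"content": f"Answer to: {question}", "metadata": {}})
fake_parser = Step(lambda message: message["content"])

toy_chain = fake_model | fake_parser
print(toy_chain.invoke("Who is Elon Musk?"))
```

The parser step plays the same role as `StrOutputParser` above: it discards the metadata and keeps only the answer text.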

#### Introduce Prompt Templates

In [47]:
from langchain_core.prompts import ChatPromptTemplate
template = """
Answer the question based on the context below. If you can't answer the question, 
reply "My apologies, but I have no clue".
Context: {context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
mycontext = "Britney's sister is Alyssia"
myquestion = "Who is Alyssia's sister?"
prompt.format(context=mycontext, question=myquestion)

'Human: \nAnswer the question based on the context below. If you can\'t answer the question, \nreply "My apologies, but I have no clue".\nContext: Britney\'s sister is Alyssia\n\nQuestion: Who is Alyssia\'s sister?\n'

#### Chain the prompt with the model and output parser

In [48]:
chain = prompt | model | parser
chain.invoke({
    "context" :  mycontext, 
    "question" : myquestion
})

"Alyssia's sister is Britney."

#### Combining Chains

In [37]:
# Let's create a new prompt template for translating the output into another language (e.g. Spanish or French)
translation_prompt = ChatPromptTemplate.from_template(
    "Translate the {answer} to {language}"
)

In [49]:
# Let's create a new translation chain that combines the first chain with the second one (translation prompt)
from operator import itemgetter
translation_chain = (
    {"answer": chain, "language": itemgetter("language")} | translation_prompt | model | parser
)
translation_chain.invoke(
    {
        "context": "Sarra's sister is Cerine. She does not have any more siblings.",
        "question": "How many sisters does Sarra have?",
        "language": ["Spanish", "Portuguese", "French", "Farsi"],
    }
)

'Spanish: Sarra tiene una hermana, Cerine.\nPortuguese: Sarra tem uma irmã, Cerine.\nFrench: Sarra a une sœur, Cerine.\nFarsi: سارا یک خواهر دارد، سرین.'
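
The dictionary at the start of `translation_chain` acts as a mapping step: every value receives the same input dictionary, and the results are collected under the corresponding keys before being handed to the translation prompt. A rough plain-Python sketch of that data flow follows; `run_mapping` and `fake_answer_chain` are hypothetical helpers for illustration, not LangChain functions.

```python
from operator import itemgetter


def run_mapping(mapping, inputs):
    """Call every value in `mapping` with the same inputs and collect the results by key."""
    return {key: step(inputs) for key, step in mapping.items()}


def fake_answer_chain(inputs):
    # Hypothetical stand-in for the first chain (prompt | model | parser).
    return f"(answer to '{inputs['question']}')"


inputs = {
    "context": "Sarra's sister is Cerine.",
    "question": "How many sisters does Sarra have?",
    "language": "French",
}

mapped = run_mapping(
    {"answer": fake_answer_chain, "language": itemgetter("language")},
    inputs,
)
print(mapped)
```

`itemgetter("language")` simply picks the `language` entry out of the input, which is why that key has to be supplied alongside `context` and `question` when the combined chain is invoked.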

### Transcribing the YouTube Video

In [None]:
# We want to give the model context taken from a YouTube video, so let's transcribe its audio with OpenAI Whisper.
import os
import tempfile
import whisper
from pytubefix import YouTube

YOUTUBE_VIDEO = "https://www.youtube.com/watch?v=u47GtXwePms"
# Transcribe only if the transcription file does not already exist.
if not os.path.exists("transcription.txt"):
    youtube = YouTube(YOUTUBE_VIDEO)
    audio = youtube.streams.filter(only_audio=True).first()

    # Load the "base" Whisper model: a small model that trades accuracy for speed.
    whisper_model = whisper.load_model("base")

    with tempfile.TemporaryDirectory() as tmpdir:
        # Download the audio track into the temporary directory, then transcribe it.
        audio_path = audio.download(output_path=tmpdir)
        transcription = whisper_model.transcribe(audio_path, fp16=False)["text"].strip()

        with open("transcription.txt", "w") as file:
            file.write(transcription)

    


#### Display the first few characters to ensure everything is working

In [40]:
with open("transcription.txt") as file:
    transcription = file.read()
transcription[:120]

"Hello everyone, now welcome to my Code to Care video series. What I'm doing is I'm rotating through three different type"

### Using the entire transcription as context

In [41]:
try:
    response = chain.invoke({
        "context": transcription,
        "question": "What are some challenges of LLM? "
    })
    print(response)
except Exception as e:
    print(e)

Some challenges of LLM include the need for prompt engineering to ensure accurate responses, the difficulty of extracting relevant information from large amounts of content, and the potential bias in the generated responses. Additionally, LLM systems may struggle with understanding context and providing coherent answers to complex questions.


### Load the transcript in memory

In [42]:
from langchain_community.document_loaders import TextLoader
loader = TextLoader("transcription.txt")
text_documents = loader.load()
text_documents

[Document(page_content="Hello everyone, now welcome to my Code to Care video series. What I'm doing is I'm rotating through three different types of topics, educational topics, use case topics, and then kind of bias, ethics, safety topics. So now on the education rotation. And today what I wanted to talk about is what is retrieval, augmented generation or reg. And you may think that I'm going into some kind of nook and cranny of the AI field, but this is a very important and popular kind of solution pattern that I see being used over and over and over again for how to leverage large language models. So I thought I would explain it to you. And the thing that this is used for is basically systems that leverage large language models, but on your own content. So let me describe that. If you think of like the chat GPT experience, and if you think about that relative to like the search engine experience that we had before, if you ask a question like, I don't know what color is the sky or how

### Split text into chunks

In [43]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
# Split the text into chunks (small parts) with some overlap to preserve context between them
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)  
text_splitter.split_documents(text_documents)


[Document(page_content="Hello everyone, now welcome to my Code to Care video series. What I'm doing is I'm rotating through three different types of topics, educational topics, use case topics, and then kind of bias, ethics, safety topics. So now on the education rotation. And today what I wanted to talk about is what is retrieval, augmented generation or reg. And you may think that I'm going into some kind of nook and cranny of the AI field, but this is a very important and popular kind of solution pattern that I see being used over and over and over again for how to leverage large language models. So I thought I would explain it to you. And the thing that this is used for is basically systems that leverage large language models, but on your own content. So let me describe that. If you think of like the chat GPT experience, and if you think about that relative to like the search engine experience that we had before, if you ask a question like, I don't know what color is the sky or how
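
To make `chunk_size` and `chunk_overlap` concrete, here is a deliberately simplified sketch that cuts a string into fixed-size windows sharing a few characters with their neighbours. It is an assumption-laden illustration: `split_with_overlap` is a hypothetical helper, and unlike `RecursiveCharacterTextSplitter` it ignores natural separators such as paragraphs and sentences.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into fixed-size chunks where consecutive chunks share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in split_with_overlap(sample, chunk_size=10, chunk_overlap=4):
    print(chunk)
```

Each chunk starts with the last four characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.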

### Finding the most relevant chunks

In [50]:
# Generate the embeddings for an arbitrary query
from langchain_openai.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")
embedded_query = embeddings.embed_query("Who is Britney's sister?")
print(f"Embedded length: {len(embedded_query)}")
print(embedded_query[:20])

Embedded length: 1536
[-0.002130595652592444, -0.008266212101502673, -0.007428968971531963, -0.008891019629032127, -0.006960362860223603, 0.01723220878284994, -0.00038289024073184796, 0.01342087839457209, 0.004142477949460095, -0.02244310926213952, 0.004011268107908599, -0.012946024971672916, 0.011177817861998838, -0.010659226738706205, 0.006144988114788353, -0.02631691928558327, 0.025204759986682864, -0.00879105053638612, 0.029515935838190756, -0.02684175678914418]


In [None]:
# Generate the embeddings of two sample contexts
context1 = embeddings.embed_query("Britney's sister is Alyssia")
context2 = embeddings.embed_query("Hatim's mother is a lecturer")

In [None]:
# Use cosine similarity to compute the similarity between the query and both contexts

from sklearn.metrics.pairwise import cosine_similarity
# cosine_similarity returns a 2D matrix, so take element [0][0]
query_context1_similarity = cosine_similarity([embedded_query], [context1])[0][0] 
query_context2_similarity = cosine_similarity([embedded_query], [context2])[0][0]
print(f"Similarities: {query_context1_similarity, query_context2_similarity}")


Similarities: (0.9266274878875663, 0.7323881018213361)
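
For intuition about what the similarity score measures, the sketch below computes cosine similarity by hand: the dot product of two vectors divided by the product of their lengths. The function is named `cosine` to avoid clashing with the scikit-learn import, and the three-dimensional vectors are made up for illustration; real embeddings here have 1536 dimensions.

```python
import math


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Tiny made-up vectors, for illustration only.
query_vec = [1.0, 0.0, 1.0]
close_vec = [2.0, 0.0, 2.0]  # same direction as the query, different length
far_vec = [0.0, 1.0, 0.0]    # orthogonal to the query

print(cosine(query_vec, close_vec))
print(cosine(query_vec, far_vec))
```

Vectors pointing in the same direction score close to 1 regardless of their length, while orthogonal vectors score 0. This is why the context about Britney's sister scores higher against the query than the unrelated one.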


### Setting up a Knowledge Base (KB) / Vector Store (VS)
###### A Vector Store is a knowledge base that stores text as embedding vectors capturing its meaning, which makes semantic search possible.

In [18]:
'''
A KB/VS is a database consisting of embeddings that specializes in fast similarity searches

'''
from langchain_community.vectorstores import DocArrayInMemorySearch
vectorstore1 = DocArrayInMemorySearch.from_texts(
    [
        "Britney's sister is Alyssia",
        "Kamil's beloved is Angelina Jolie",
        "Steve and Bill are brothers",
        "Susu likes blue cars",
        "hatim's mother is a lecturer",
        "Aziz drives Ferrari",
        "Newton has two siblings"
    ],embedding=embeddings
)

##### Query the vector store to find and retrieve the embeddings most similar to a given query

In [None]:
query = "What is the world population?"
vectorstore1.similarity_search_with_score(query = query, k=5)

[(Document(metadata={}, page_content='Newton has two siblings'),
  0.7396408197524729),
 (Document(metadata={}, page_content='Steve and Bill are brothers'),
  0.7364655602869042),
 (Document(metadata={}, page_content="Kamil's beloved is Angelina Jolie"),
  0.7175855140937656),
 (Document(metadata={}, page_content="hatim's mother is a lecturer"),
  0.7158917297626433),
 (Document(metadata={}, page_content='Susu likes blue cars'),
  0.707576119773465)]
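
Conceptually, a similarity search scores every stored vector against the query and keeps the `k` best. The brute-force sketch below illustrates that idea only; it is not how `DocArrayInMemorySearch` is implemented, and `top_k`, `toy_store`, and the two-dimensional "embeddings" are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def top_k(query_vec, store, k):
    """Score every stored text against the query and return the k best (text, score) pairs."""
    scored = [(text, cosine(query_vec, vec)) for text, vec in store]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical 2-dimensional "embeddings"; real ones have many more dimensions.
toy_store = [
    ("text about family", [0.9, 0.1]),
    ("text about cars", [0.1, 0.9]),
    ("text about both", [0.6, 0.6]),
]

for text, score in top_k([1.0, 0.0], toy_store, k=2):
    print(f"{score:.3f}  {text}")
```

A search of this kind always returns `k` results, even when nothing in the store is relevant, which matches the output above: the question about the world population still gets five matches, all with similar middling scores.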

### Connecting the KB / VS to the chain

In [None]:
'''
We can store the chunks of transcription.txt in the vector store, retrieve the ones
most relevant to a question, and send them to the model.
- TODO:
    - Configure a Retriever: it runs a similarity search in the VS and returns the most similar chunks
    - We can get a retriever directly from the vector store
'''

# as_retriever: turns the vector store into a Retriever.
# Think of a Retriever as a smart search engine: you give it a question,
# and it returns the best-matching pieces of text from your knowledge base.
# It simply wraps the vector store so that you can plug it into a chain later.

# Chain: combines retriever + prompt + model to generate full answers
chunks_retriever = vectorstore1.as_retriever()
chunks_retriever.invoke("How many siblings does Newton have?")

[Document(page_content='Newton has two siblings'),
 Document(page_content='Steve and Bill are brothers'),
 Document(page_content="Britney's sister is Alyssia"),
 Document(page_content="hatim's mother is a lecturer")]

#### Reminder: 

**Our prompt expects two parameters, to wit, "context" and "question".
We can use the retriever to find the relevant chunks that will serve as the context for answering
the question.**


In [26]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
setup = RunnableParallel(context = chunks_retriever, question=RunnablePassthrough())
setup.invoke("How many siblings does Newton have?")

{'context': [Document(page_content='Newton has two siblings'),
  Document(page_content='Steve and Bill are brothers'),
  Document(page_content="Britney's sister is Alyssia"),
  Document(page_content="hatim's mother is a lecturer")],
 'question': 'How many siblings does Newton have?'}

#### Add the setup map to the chain

In [None]:
# It takes the question, finds the relevant information, formats it into the prompt, asks the model, and cleans up the answer.
chain = setup | prompt | model | parser
chain.invoke("How many siblings does Newton have?")

'Newton has two siblings.'

In [None]:
chain.invoke("What car does Aziz drive?")

'Aziz drives a Ferrari.'

In [None]:
chain.invoke("Whose beloved is Angelina Jolie?")

"Kamil's beloved is Angelina Jolie."

### Loading Transcript into the Vector Store

In [19]:
# Split the transcript into chunks and load them into a new in-memory vector store
documents = text_splitter.split_documents(text_documents)
trans_vectorstore = DocArrayInMemorySearch.from_documents(documents=documents, embedding=embeddings)

In [None]:
chain = (
    {"context": trans_vectorstore.as_retriever(), "question": RunnablePassthrough()}
    | prompt
    | model
    | parser
)
chain.invoke("What is RAG?")


'RAG stands for Retrieval Augmented Generation, which is an architecture used to retrieve relevant documents from content, augment the generation process, and improve the ability of large language models to generate AI based on the retrieved documents.'

### Let's use Pinecone as the Vector Store
Pinecone offers a hosted database for storing and searching vector data at scale, as well as Pinecone Assistant for getting a RAG application running quickly. Unlike the in-memory store used above, a Pinecone index lives in a hosted service, so the embeddings persist between sessions.

In [None]:
# Take the documents, turn them into embedding vectors (which capture their meaning),
# and store them in Pinecone so the assistant can later search them and answer questions based on them.
from dotenv import load_dotenv
load_dotenv()

from langchain_pinecone import PineconeVectorStore
pinecone_index_name = "rcw2025"

pinecone = PineconeVectorStore.from_documents(
    documents=documents, embedding=embeddings, index_name=pinecone_index_name
)


### Use Pinecone for retrieval

In [28]:
pinecone.similarity_search("How does RAG differ from Fine-Tune?")[:3]

[Document(page_content="pull out only the parts of your content that are relevant to the patient's question. So this is another tricky part of this whole rag architecture. And the way that works is that you take all your content and you break it into chunks or these systems will break it into chunks. So chunks might be a paragraph of content or a couple paragraphs, a page, something like that. And then those chunks are sent to a large language model, could be the same one or a different one. And they are turned into a vector. And so each each paragraph or each chunk will have a vector which is just a series of numbers. And that series of numbers, you can think of it as the numeric representation of the essence of that paragraph. And what's different about these numbers, they're not random numbers, but paragraphs to talk about a similar topic have close by numbers. They almost have the same vectors. So in addition to the, it's a numericized version of the paragraph, but it's such that s

### Combine Pinecone with the Chain

In [27]:
chain = (
    {"context": pinecone.as_retriever(), "question": RunnablePassthrough()}
    | prompt
    | model
    | parser
)
chain.invoke("How does RAG differ from Fine-Tune?")

'My apologies, but I have no clue.'