### Build a RAG Application

In [28]:
import os
from dotenv import load_dotenv
load_dotenv()

True

### Set Up the Model

In [29]:
from langchain_openai.chat_models import ChatOpenAI

model = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.0)

# Test the model
model.invoke("Who is Elon Musk?")

AIMessage(content='Elon Musk is a billionaire entrepreneur and CEO of multiple companies, including Tesla, SpaceX, Neuralink, and The Boring Company. He is known for his work in the fields of electric vehicles, space exploration, and renewable energy. Musk is also a prominent figure in popular culture and is often referred to as a visionary and innovator.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 68, 'prompt_tokens': 12, 'total_tokens': 80, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-99226f9b-5f0b-4db4-8c96-64b6fa152987-0', usage_metadata={'input_tokens': 12, 'output_tokens': 68, 'total_tokens': 80})

In [30]:
# use StrOutputParser to extract the answer as a string
from langchain_core.output_parsers import StrOutputParser
parser = StrOutputParser()
chain = model | parser
chain.invoke("Who is Elon Musk?")

'Elon Musk is a billionaire entrepreneur and CEO of multiple companies, including Tesla, SpaceX, Neuralink, and The Boring Company. He is known for his work in the fields of electric vehicles, space exploration, and renewable energy. Musk is also a prominent figure in popular culture and is often referred to as a visionary and innovator.'

#### Introduce Prompt Templates

In [31]:
from langchain_core.prompts import ChatPromptTemplate
template = """
Answer the question based on the context below. If you can't answer the question,
reply "My apologies, but I have no clue".
Context: {context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

#### Chain the prompt with the model and output parser

In [32]:
chain = prompt | model | parser 
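
The `|` operator pipes each step's output into the next step. The sketch below imitates that behaviour in plain Python so the data flow is visible without calling the API. The `Step` class and the `toy_prompt`, `toy_model`, and `toy_parser` names are hypothetical stand-ins for illustration, not LangChain's implementation.

```python
class Step:
    """Toy stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # a | b builds a new Step that runs a, then feeds its result to b
        return Step(lambda value: other.fn(self.fn(value)))

    def invoke(self, value):
        return self.fn(value)


TEMPLATE = "Context: {context}\n\nQuestion: {question}"

# Hypothetical stand-ins for the prompt, the model, and the output parser
toy_prompt = Step(lambda inputs: TEMPLATE.format(**inputs))
toy_model = Step(lambda text: {"content": f"(model saw {len(text)} characters)"})
toy_parser = Step(lambda message: message["content"])

toy_chain = toy_prompt | toy_model | toy_parser
result = toy_chain.invoke(
    {"context": "Sarra's sister is Cerine.", "question": "Who is Cerine?"}
)
print(result)
```

The real chain is invoked the same way: with a dict that provides both `context` and `question`.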

#### Combining Chains

In [33]:
# Create a new prompt template that translates the answer into other languages (e.g. Spanish or French)
translation_prompt = ChatPromptTemplate.from_template(
    "Translate the {answer} to {language}"
)

In [34]:
# Let's create a new translation chain that combines the first chain with the second one (translation prompt)
from operator import itemgetter
translation_chain = (
    {"answer": chain, "language": itemgetter("language")} | translation_prompt | model | parser
)
translation_chain.invoke(
    {
        "context": "Sarra's sister is Cerine. She does not have any more siblings.",
        "question": "How many sisters does Sarra have?",
        "language": ["Spanish", "Portuguese", "French", "Farsi"],
    }
)

'Spanish: Sarra tiene una hermana, Cerine.\nPortuguese: Sarra tem uma irmã, Cerine.\nFrench: Sarra a une sœur, Cerine.\nFarsi: سارا یک خواهر دارد، سرین.'
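
A dict placed at the start of a chain runs every branch on the same input and collects the results under the dict's keys. The sketch below reproduces that fan-out in plain Python. `toy_answer_chain` is a hypothetical stand-in for `chain`; `itemgetter` is the same standard-library helper used in the cell above.

```python
from operator import itemgetter

inputs = {
    "context": "Sarra's sister is Cerine.",
    "question": "How many sisters does Sarra have?",
    "language": "French",
}


def toy_answer_chain(payload):
    # Hypothetical stand-in for `chain`; it reads only the keys it needs
    return f"Answer to: {payload['question']}"


# Every branch is called with the same input dict
branches = {"answer": toy_answer_chain, "language": itemgetter("language")}
fanned_out = {key: branch(inputs) for key, branch in branches.items()}
print(fanned_out)
```

The resulting dict has exactly the `answer` and `language` keys that `translation_prompt` expects.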

### Transcribing the YouTube Video

In [35]:
# We want to send the model context from a YouTube video. Let's transcribe it with OpenAI Whisper.
import tempfile
import whisper
from pytubefix import YouTube

YOUTUBE_VIDEO = "https://www.youtube.com/watch?v=u47GtXwePms"
# Transcribe only if the transcription file does not already exist
if not os.path.exists("transcription.txt"):
    youtube = YouTube(YOUTUBE_VIDEO)
    audio = youtube.streams.filter(only_audio=True).first()

    # Load the "base" Whisper model: fast, but less accurate than the larger models
    whisper_model = whisper.load_model("base")

    with tempfile.TemporaryDirectory() as tmpdir:
        # Download the audio stream into the temporary directory
        audio_path = audio.download(output_path=tmpdir)
        transcription = whisper_model.transcribe(audio_path, fp16=False)["text"].strip()

        # Cache the transcript so later runs skip the download and transcription
        with open("transcription.txt", "w") as file:
            file.write(transcription)

    


#### Display the first few characters to make sure everything works

In [36]:
with open("transcription.txt") as file:
    transcription = file.read()
transcription[: 120]

"Hello everyone, now welcome to my Code to Care video series. What I'm doing is I'm rotating through three different type"

### Using the entire transcription as context

In [37]:
try:
    response = chain.invoke({
        "context": transcription,
        "question": "What are some challenges of LLM? "
    })
    print(response)
except Exception as e:
    print(e)

Some challenges of LLM include the need for prompt engineering to ensure accurate responses, the difficulty of extracting relevant information from large amounts of content, and the potential bias in the generated responses.
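
Sending the whole transcript only works while it fits in the model's context window, which is presumably why the call is wrapped in `try`/`except`. The sketch below is a rough pre-check, not an exact tokenizer. The 4-characters-per-token ratio is a common rule of thumb for English text, and both `ASSUMED_CONTEXT_WINDOW` and `reserved_for_answer` are assumptions to adjust for the model you use.

```python
import math

# Assumption: context window of the model in use, in tokens; check your model's documentation
ASSUMED_CONTEXT_WINDOW = 16_385


def estimate_tokens(text, chars_per_token=4.0):
    # Rough heuristic only; a real tokenizer gives exact counts
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(text, reserved_for_answer=500, window=ASSUMED_CONTEXT_WINDOW):
    # Leave room for the model's answer as well as the prompt
    return estimate_tokens(text) + reserved_for_answer <= window


sample = "word " * 20_000  # 100,000 characters of stand-in text
print(estimate_tokens(sample), fits_in_context(sample))
```

When the estimate says the text will not fit, splitting it into chunks and retrieving only the relevant ones is the way around the limit.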


### Load the transcript in memory

In [38]:
from langchain_community.document_loaders import TextLoader
loader = TextLoader("transcription.txt")
text_documents = loader.load()
text_documents

[Document(metadata={'source': 'transcription.txt'}, page_content="Hello everyone, now welcome to my Code to Care video series. What I'm doing is I'm rotating through three different types of topics, educational topics, use case topics, and then kind of bias, ethics, safety topics. So now on the education rotation. And today what I wanted to talk about is what is retrieval, augmented generation or reg. And you may think that I'm going into some kind of nook and cranny of the AI field, but this is a very important and popular kind of solution pattern that I see being used over and over and over again for how to leverage large language models. So I thought I would explain it to you. And the thing that this is used for is basically systems that leverage large language models, but on your own content. So let me describe that. If you think of like the chat GPT experience, and if you think about that relative to like the search engine experience that we had before, if you ask a question like,

### Split text into chunks

In [39]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
# Split the text into chunks (small parts) with some overlap to preserve context between them
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)  
text_splitter.split_documents(text_documents)


[Document(metadata={'source': 'transcription.txt'}, page_content="Hello everyone, now welcome to my Code to Care video series. What I'm doing is I'm rotating through three different types of topics, educational topics, use case topics, and then kind of bias, ethics, safety topics. So now on the education rotation. And today what I wanted to talk about is what is retrieval, augmented generation or reg. And you may think that I'm going into some kind of nook and cranny of the AI field, but this is a very important and popular kind of solution pattern that I see being used over and over and over again for how to leverage large language models. So I thought I would explain it to you. And the thing that this is used for is basically systems that leverage large language models, but on your own content. So let me describe that. If you think of like the chat GPT experience, and if you think about that relative to like the search engine experience that we had before, if you ask a question like,
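
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal fixed-width sliding-window splitter. It is a simplification: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs and sentences, whereas this hypothetical `split_with_overlap` helper cuts at exact character offsets.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=200):
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(2500))  # 2,500 stand-in characters
chunks = split_with_overlap(sample)
print([len(chunk) for chunk in chunks])
```

Each chunk starts 800 characters after the previous one, so the last 200 characters of one chunk are repeated at the start of the next.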

### Finding the most relevant chunks

In [40]:
# Generate the embeddings for an arbitrary query
from langchain_openai.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")
embedded_query = embeddings.embed_query("What is RAG?")
print(f"Embedded length: {len(embedded_query)}")
print(embedded_query[:20])

Embedded length: 1536
[-0.012432429939508438, -0.016618860885500908, 0.007928845472633839, -0.017041731625795364, 0.011424585245549679, -0.004905312322080135, -0.03549303859472275, -0.004852453712373972, -0.03693079948425293, -0.023398904129862785, 0.009274987503886223, -0.00261651910841465, 0.0011558495461940765, 0.0021266925614327192, 0.002008640905842185, -0.00010714954260038212, 0.029093578457832336, 0.004961695522069931, -0.004193478263914585, 0.001330284052528441]
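
Relevance between a query and a chunk is usually measured by comparing their embedding vectors, and cosine similarity is a common choice. The sketch below uses tiny made-up 3-dimensional vectors in place of the 1536-dimensional embeddings; the values of `query`, `close`, and `far` are illustrative only.

```python
import math


def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for real embeddings
query = [1.0, 0.0, 1.0]
close = [0.9, 0.1, 0.8]
far = [-1.0, 0.5, -0.7]

print(cosine_similarity(query, close) > cosine_similarity(query, far))
```

A score near 1 means the vectors point in almost the same direction; a score near 0 or below means the texts are unrelated or opposed in this space.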


### Setting up a Knowledge Base (KB) / Vector Store (VS)
###### A vector store is a knowledge base that stores text as embedding vectors, so you can search by meaning (semantic search) rather than by exact keywords.

### Loading Transcript into the Vector Store

In [41]:
# A knowledge base / vector store is a database of embeddings that specializes in fast similarity searches.
# Build an in-memory one from the transcript chunks.
from langchain_community.vectorstores import DocArrayInMemorySearch
documents = text_splitter.split_documents(text_documents)
trans_vectorstore = DocArrayInMemorySearch.from_documents(documents=documents, embedding=embeddings)
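
Conceptually, an in-memory vector store keeps one vector per text and ranks the texts by similarity to the query vector. The sketch below is a toy version under stated assumptions: `ToyVectorStore`, `toy_embed`, and `VOCAB` are hypothetical, and the word-count "embedding" stands in for a real embedding model. It is not how `DocArrayInMemorySearch` is implemented.

```python
import math


class ToyVectorStore:
    """Keeps (unit vector, text) pairs and ranks them by dot product."""

    def __init__(self, embed):
        self.embed = embed
        self.rows = []

    @staticmethod
    def _normalize(vec):
        length = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid dividing by zero
        return [x / length for x in vec]

    def add_texts(self, texts):
        for text in texts:
            self.rows.append((self._normalize(self.embed(text)), text))

    def similarity_search(self, query, k=2):
        q = self._normalize(self.embed(query))
        # On unit vectors the dot product equals the cosine similarity
        scored = [(sum(x * y for x, y in zip(q, vec)), text) for vec, text in self.rows]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]


VOCAB = ["rag", "retrieval", "vector", "bias", "ethics"]


def toy_embed(text):
    # Hypothetical embedding: counts of a few vocabulary words
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


store = ToyVectorStore(toy_embed)
store.add_texts([
    "rag combines retrieval with generation",
    "bias and ethics in ai",
    "a vector is a series of numbers",
])
top = store.similarity_search("what is rag retrieval", k=1)
print(top)
```

A production store does the same ranking with real embeddings and indexing structures that avoid scanning every row.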

In [42]:
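from langchain_core.runnables import RunnablePassthrough

# The retriever fills in the context; the question is passed through unchanged
chain = (
    {"context": trans_vectorstore.as_retriever(), "question": RunnablePassthrough()}
    | prompt
    | model
    | parser
)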
chain.invoke("What is RAG?")


'RAG stands for Retrieval Augmented Generation, which is an architecture used to retrieve relevant documents from content, augment the generation process, and improve the ability of large language models to generate AI based on the retrieved documents.'

### Use Pinecone as the Vector Store
Pinecone is a managed vector database for building AI applications. Use Pinecone Database to store and search vector data at scale, or start with Pinecone Assistant to get a RAG application running quickly.

In [43]:
# Embed the document chunks and store the vectors in Pinecone,
# so the chain can later search them to answer questions.
# Assumption: the index named below already exists in your Pinecone project
# and the Pinecone API key is available from the .env file.
from dotenv import load_dotenv
load_dotenv()

from langchain_pinecone import PineconeVectorStore
pinecone_index_name = "rcw2025"

pinecone = PineconeVectorStore.from_documents(
    documents=documents, embedding=embeddings, index_name=pinecone_index_name
)


### Use Pinecone as a retriever

In [44]:
pinecone.similarity_search("How does RAG differ from Fine-Tune?")[:3]

[Document(metadata={'source': 'transcription.txt'}, page_content="pull out only the parts of your content that are relevant to the patient's question. So this is another tricky part of this whole rag architecture. And the way that works is that you take all your content and you break it into chunks or these systems will break it into chunks. So chunks might be a paragraph of content or a couple paragraphs, a page, something like that. And then those chunks are sent to a large language model, could be the same one or a different one. And they are turned into a vector. And so each each paragraph or each chunk will have a vector which is just a series of numbers. And that series of numbers, you can think of it as the numeric representation of the essence of that paragraph. And what's different about these numbers, they're not random numbers, but paragraphs to talk about a similar topic have close by numbers. They almost have the same vectors. So in addition to the, it's a numericized vers

### Combine Pinecone with the chain

In [45]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
chain = (
    {"context": pinecone.as_retriever(), "question": RunnablePassthrough()}
    | prompt
    | model
    | parser
)
chain.invoke("How does RAG differ from Fine-Tune?")

'RAG (Retrieval Augmented Generation) differs from Fine-Tune in that RAG architecture involves breaking content into chunks, converting them into vectors, and using large language models to generate responses based on similar topics. On the other hand, Fine-Tune typically involves fine-tuning a pre-trained language model on specific tasks or datasets to improve performance on those tasks.'