# RAG

A continuación se muestra una descripción general de alto nivel del sistema que queremos construir:


<img src='images/img_1.png' width="800">

# PARTE I

Empecemos cargando las variables de entorno que necesitamos utilizar.

## Setting up the model
Definamos el modelo LLM que utilizaremos como parte del flujo de trabajo.

In [4]:
import os
from dotenv import load_dotenv

load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

# This is the YouTube video we're going to use.
YOUTUBE_VIDEO = "https://www.youtube.com/watch?v=jymzFKtMPac"

In [5]:
from langchain_openai.chat_models import ChatOpenAI

model = ChatOpenAI(openai_api_key=OPENAI_API_KEY, model="gpt-3.5-turbo")

Probamos el modelo haciendo una pregunta sencilla

In [6]:
pregunta_sencilla = "¿Cuál es la capital de Túnez?"
respuesta = model.invoke(pregunta_sencilla)

# Imprimimos el contenido de la respuesta
print(respuesta.content)

La capital de Túnez es Túnez.


El resultado del modelo es una instancia de `AIMessage` que contiene la respuesta. Podemos extraer esta respuesta encadenando el modelo con un analizador de salida [outputParser](https://python.langchain.com/docs/modules/model_io/output_parsers/).

Así es como se ve el encadenamiento del modelo con un analizador de salida:

<img src='images/chain1.png' width="1200">

Para este ejemplo, utilizaremos un `StrOutputParser` simple para extraer la respuesta como una cadena.

In [7]:
from langchain_core.output_parsers import StrOutputParser

#(convertir AIMessage a string)
parser = StrOutputParser()

# Creamos la cadena simple combinando el modelo y el parser
chain = model | parser

# Probamos la cadena con la misma pregunta sencilla
pregunta_sencilla = "¿Cuál es la capital de Alemaña?"
respuesta_parseada = chain.invoke(pregunta_sencilla)

# Imprimimos la respuesta (ahora debería ser una cadena directamente)
print(respuesta_parseada)
print(type(respuesta_parseada)) # Para verificar que es un string

La capital de Alemania es Berlín.
<class 'str'>


## Presentamos las plantillas de preguntas

Queremos contextualizar el modelo y la pregunta. [Prompt templates](https://python.langchain.com/docs/modules/model_io/prompts/quick_start) Son una forma sencilla de definir y reutilizar indicaciones.

In [8]:
from langchain.prompts import ChatPromptTemplate

template = """
Responda la pregunta según el contexto descrito a continuación. Si no puede responder, responda "No lo sé".

Contexto: {contexto}

Pregunta: {pregunta}
"""


Ahora podemos encadenar el mensaje con el modelo y el analizador de salida.

<img src='images/chain2.png' width="1200">

In [9]:
from langchain_core.runnables import  RunnablePassthrough

chain = (
         {
            "contexto": RunnablePassthrough(),
            "pregunta": RunnablePassthrough()
        }
        | ChatPromptTemplate.from_template(template)
        | model
        | parser
    )

## Combinación de cadenas

Podemos combinar diferentes cadenas para crear flujos de trabajo más complejos. Por ejemplo, creemos una segunda cadena que traduzca la respuesta de la primera a otro idioma.

Comencemos creando una nueva plantilla de solicitud para la cadena de traducción:

In [10]:
translation_prompt = ChatPromptTemplate.from_template(
    "Traduce {answer} al {language}"
)

Ahora podemos crear una nueva cadena de traducción que combine el resultado de la primera cadena con la solicitud de traducción.

Así es como se ve el nuevo flujo de trabajo:

<img src='images/chain3.png' width="1200">

In [11]:
from operator import itemgetter

# Cadena de respuesta usando contexto (primera cadena, ya definida anteriormente)
qa_chain = (
    {
        "contexto": RunnablePassthrough(),
        "pregunta": RunnablePassthrough()
    }
    | ChatPromptTemplate.from_template(template)
    | model
    | parser
)

# Crear la cadena de traducción
translation_chain = (
    {
        "answer": RunnablePassthrough(),
        "language": lambda _: "Castellano"  # valor predeterminado
    }
    | translation_prompt
    | model
    | parser
)

# Combinamos ambas cadenas: primero obtenemos la respuesta, luego la traducimos
combined_chain = qa_chain | (lambda answer: translation_chain.invoke({"answer": answer, "language": "Castellano"}))

# Probemos la cadena combinada
respuesta_traducida = combined_chain.invoke({
    "contexto": "París es la capital de Francia y una de las ciudades más visitadas del mundo.",
    "pregunta": "¿Cuál es la capital de Francia?"
})

print("Respuesta traducida:", respuesta_traducida)

Respuesta traducida: {'respuesta': 'La capital de Francia es París.', 'idioma': 'Castellano'}


# PARTE II

## Transcripcion de video de YouTube

El contexto que queremos enviar al modelo proviene de un video de YouTube. Descargamos el video y transcribámoslo con [OpenAI's Whisper](https://openai.com/research/whisper).

In [12]:
import tempfile
import whisper
import os
import yt_dlp  # Using yt-dlp instead of pytube

if not os.path.exists("transcription.txt"):
    print(f"Downloading video: {YOUTUBE_VIDEO}")
    
    # Create a temporary directory for the download
    with tempfile.TemporaryDirectory() as tmpdir:
        # yt-dlp options for downloading audio only
        ydl_opts = {
            'format': 'bestaudio/best',
            'outtmpl': os.path.join(tmpdir, 'audio.%(ext)s'),
            'postprocessors': [{
                'key': 'FFmpegExtractAudio',
                'preferredcodec': 'mp3',
                'preferredquality': '192',
            }],
            'quiet': False
        }
        
        # Download the audio
        with yt_dlp.YoutubeDL(ydl_opts) as ydl:
            ydl.extract_info(YOUTUBE_VIDEO, download=True)
            audio_file = os.path.join(tmpdir, 'audio.mp3')
        
        print(f"Transcribing audio file: {audio_file}")
        
        # Load Whisper model
        whisper_model = whisper.load_model("base")
        
        # Transcribe the audio
        transcription = whisper_model.transcribe(audio_file, fp16=False)["text"].strip()
        
        # Save the transcription to a file
        with open("transcription.txt", "w") as file:
            file.write(transcription)
        
        print("Transcription completed and saved to 'transcription.txt'")
else:
    print("Transcription file already exists!")

Downloading video: https://www.youtube.com/watch?v=jymzFKtMPac
[youtube] Extracting URL: https://www.youtube.com/watch?v=jymzFKtMPac
[youtube] jymzFKtMPac: Downloading webpage
[youtube] jymzFKtMPac: Downloading tv client config
[youtube] jymzFKtMPac: Downloading player 4fcd6e4a
[youtube] jymzFKtMPac: Downloading tv player API JSON
[youtube] jymzFKtMPac: Downloading ios player API JSON
[youtube] jymzFKtMPac: Downloading m3u8 information
[info] jymzFKtMPac: Downloading 1 format(s): 251
[download] Destination: /var/folders/fl/0425_ksd7gl2jnqxhr8qwwhw0000gn/T/tmpa1w3hh9o/audio.webm
[download] 100% of   15.70MiB in 00:00:00 at 23.79MiB/s    
[ExtractAudio] Destination: /var/folders/fl/0425_ksd7gl2jnqxhr8qwwhw0000gn/T/tmpa1w3hh9o/audio.mp3
Deleting original file /var/folders/fl/0425_ksd7gl2jnqxhr8qwwhw0000gn/T/tmpa1w3hh9o/audio.webm (pass -k to keep)
Transcribing audio file: /var/folders/fl/0425_ksd7gl2jnqxhr8qwwhw0000gn/T/tmpa1w3hh9o/audio.mp3
Transcription completed and saved to 'transcrip

Vamos a leer la transcripción y mostrar los primeros caracteres para asegurarnos de que todo funciona como se espera.

In [13]:
with open("transcription.txt") as file:
    transcription = file.read()

transcription[:100]

"What's really happening in Antarctica? Why was the entire continent suddenly locked down after a US "

## Usando la transcripción completa como contexto

Si intentamos invocar la cadena usando la transcripción como contexto, el modelo devolverá un error porque el contexto es demasiado largo.

Los modelos de lenguaje grandes admiten tamaños de contexto limitados. El vídeo que estamos usando es demasiado largo para que el modelo lo pueda procesar, por lo que necesitamos buscar una solución diferente.

In [14]:
# Primero, aseguramos que tengamos la transcripción cargada
with open("transcription.txt") as file:
    transcription = file.read()

# Intentamos usar la transcripción como contexto y guardamos la respuesta
try:
    respuesta = chain.invoke({
        "contexto": transcription,
        "pregunta": "¿Es una buena idea leer artículos?"
    })
    print("Respuesta:", respuesta)
except Exception as e:
    print("Error:", e)

Respuesta: No lo sé.


## División de la transcripción

Dado que no podemos usar la transcripción completa como contexto para el modelo, una posible solución es dividir la transcripción en fragmentos más pequeños. Así, podemos invocar el modelo utilizando solo los fragmentos relevantes para responder a una pregunta específica:

<img src='images/system2.png' width="1200">

Comencemos cargando la transcripción en la memoria:

In [15]:
from langchain_community.document_loaders import TextLoader

# Load the transcription file
loader = TextLoader("transcription.txt")
documents = loader.load()

# Print basic info about the loaded document
print(f"Loaded {len(documents)} document")
print(f"Text length: {len(documents[0].page_content)} characters")



Loaded 1 document
Text length: 17825 characters


Hay muchas maneras de dividir un documento. En este ejemplo, usaremos un divisor simple que divide el documento en fragmentos de tamaño fijo. Consulta [Divisores de texto](https://python.langchain.com/docs/modules/data_connection/document_transformers/) para obtener más información sobre los diferentes enfoques para dividir documentos.

A modo de ejemplo, dividiremos la transcripción en fragmentos de 100 caracteres con una superposición de 20 caracteres y mostraremos los primeros fragmentos:

In [16]:
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# First load the document
loader = TextLoader("transcription.txt")
documents = loader.load()

# Create a text splitter with chunk size of 100 and overlap of 20 characters
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=100,
    chunk_overlap=20,
    length_function=len,
)

# Split the document into chunks
chunks = text_splitter.split_documents(documents)

# Display information about the chunks
print(f"Split the document into {len(chunks)} chunks")

# Show the first 3 chunks as an example
print("\nFirst three chunks:")
for i, chunk in enumerate(chunks[:3]):
    print(f"\nChunk {i+1}:")
    print(f"Length: {len(chunk.page_content)} characters")
    print(f"Content: {chunk.page_content}")

Split the document into 223 chunks

First three chunks:

Chunk 1:
Length: 99 characters
Content: What's really happening in Antarctica? Why was the entire continent suddenly locked down after a US

Chunk 2:
Length: 95 characters
Content: down after a US drone captured something no one was ever supposed to see? Stick around. Because

Chunk 3:
Length: 98 characters
Content: around. Because what we're about to uncover might change everything you thought you knew about the


Para nuestra aplicación específica, utilizaremos 1000 caracteres en su lugar:

In [17]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Create a text splitter with chunk size of 1000 and overlap of 200 characters
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len,
)

# Split the document into chunks
chunks = text_splitter.split_documents(documents)

# Display information about the chunks
print(f"Split the document into {len(chunks)} chunks")

# Show the first chunk as an example
if chunks:
    print("\nFirst chunk:")
    print(f"Length: {len(chunks[0].page_content)} characters")
    print(f"Content: {chunks[0].page_content}")

Split the document into 23 chunks

First chunk:
Length: 999 characters
Content: What's really happening in Antarctica? Why was the entire continent suddenly locked down after a US drone captured something no one was ever supposed to see? Stick around. Because what we're about to uncover might change everything you thought you knew about the icy depths of our planet. In January of 2018, an email from a retired naval flight engineer who went by the name of Brian revealed that a US drone had flown over the frozen wasteland of Antarctica and captured something that no one was ever supposed to see. A massive, glowing hole in the ice perfectly round in shape exposed something mysterious beneath the surface. Within hours the entire continent was placed under an immediate lockdown and those who dared to talk about it vanished without a trace. Brian had spent years flying over Antarctica, logging thousands of hours in the sky. But there was one flight in the late 1980s that stood out among all 

# PARTE III

## Configuración de un Vector Store

Necesitamos una forma eficiente de almacenar fragmentos de documentos, sus Embeddings y realizar búsquedas de similitud a gran escala. Para ello, usaremos un Vector Store.

Un Vector Store es una base de datos de Embeddings especializada en búsquedas rápidas de similitud.


<img src='images/chain4.png' width="1200">

Necesitamos configurar un retriever (https://python.langchain.com/docs/how_to/#retrievers). Este retriever realizará una búsqueda de similitud en el almacén vectorial y devolverá los documentos más similares al siguiente paso de la cadena.

## Configurar Pinecone

Para este ejemplo, usaremos [Pinecone](https://www.pinecone.io/).

<img src="images/pinecone.png" width="800">

El primer paso es crear una cuenta de Pinecone, configurar un índice, obtener una clave API y configurarla como variable de entorno `PINECONE_API_KEY`.

In [18]:
from langchain_pinecone import PineconeVectorStore
from langchain_community.embeddings import HuggingFaceEmbeddings
import os
from pinecone import Pinecone, ServerlessSpec

# Initialize Pinecone
PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")

# Create Pinecone client
pc = Pinecone(api_key=PINECONE_API_KEY)

# Create the index name 
index_name = "rag-transcription"

# Si el índice existe, lo eliminamos primero para recrearlo con la dimensión correcta
if index_name in pc.list_indexes().names():
    print(f"Eliminating existing index '{index_name}' to recreate with correct dimensions")
    pc.delete_index(index_name)
    # Esperar un momento para que la eliminación se complete
    import time
    time.sleep(5)

# Create the index with the correct dimensions
pc.create_index(
    name=index_name,
    dimension=384,  # HuggingFace 'all-MiniLM-L6-v2' embeddings have 384 dimensions
    metric="cosine",
    spec=ServerlessSpec(
        cloud="aws",
        region="us-east-1"
    )
)
print(f"Created new index '{index_name}' with dimension 384")

# Initialize HuggingFace embeddings model (no API key needed, runs locally)
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

# Create the vector store and load the documents
vectorstore = PineconeVectorStore.from_documents(
    documents=chunks,  # Use your previously created chunks
    embedding=embeddings,
    index_name=index_name
)

print(f"Successfully loaded {len(chunks)} chunks into Pinecone index '{index_name}'")

  from tqdm.autonotebook import tqdm


Eliminating existing index 'rag-transcription' to recreate with correct dimensions
Created new index 'rag-transcription' with dimension 384


  embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")


Successfully loaded 23 chunks into Pinecone index 'rag-transcription'


Ahora ejecutemos una búsqueda de similitud en pinecone para asegurarnos de que todo funciona:

In [19]:
# Test similarity search
query = "What are the main topics discussed in the video?"
docs = vectorstore.similarity_search(query, k=3)

print("\nResults for query:", query)
print("-" * 50)
for i, doc in enumerate(docs, 1):
    print(f"\nResult {i}:")
    print(doc.page_content)
    print("-" * 50)


Results for query: What are the main topics discussed in the video?
--------------------------------------------------

Result 1:
forever. Be from Brian's encounter with the glowing hole in the ice to the lost testimonies of Admiral Bird and the frightening reports of military suppression, it's clear that there's far more beneath the surface of this icy continent than meets the eye. What do you think is really going on in Antarctica? What do you believe is being concealed beneath the ice and why? Could it be something extraterrestrial, an ancient civilization, or a government cover-up that stretches beyond imagination? If you enjoyed today's video, please give it a like and don't forget to subscribe to the channel. Your support means a lot to us. Click on the video that appears on your screen right now. I'm sure you'll love this content. Comment below about the next topic you'd like to see featured on our channel. Thank you so much for watching and we'll see you in the next video. Tha

Configuremos la nueva cadena usando Pinecone como almacén vectorial:

In [20]:
from langchain_pinecone import PineconeVectorStore
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Create the prompt template
template = """Answer the following question based on the provided context:

Context:
{context}

Question:
{question}

Answer the question based on the context provided. If you cannot find the answer in the context, say so."""

prompt = ChatPromptTemplate.from_template(template)

# Create the RAG chain
model = ChatOpenAI(temperature=0, model="gpt-3.5-turbo")

# Create the chain components
chain = (
    RunnableParallel(
        context=lambda x: vectorstore.similarity_search(x["question"], k=3),
        question=RunnablePassthrough()
    )
    | prompt
    | model
    | StrOutputParser()
)

# Test the chain
question = "What are the key points discussed in the video?"
response = chain.invoke({"question": question})
print("\nQuestion:", question)
print("\nAnswer:", response)


Question: What are the key points discussed in the video?

Answer: The key points discussed in the video include encounters with a glowing hole in the ice, lost testimonies of Admiral Bird, reports of military suppression, mysterious dark shapes moving inside the glowing hole, malfunctioning of aircraft systems near the hole, military response to the coded message about the hole, deployment of an entire operation to the site, questioning of Brian by a secret government agency, and the deepening secrecy surrounding the bizarre phenomena in Antarctica.
