## RAG Techniques

### Setup

In [1]:
import os
from dotenv import load_dotenv, find_dotenv

load_dotenv(find_dotenv())

os.environ['OPENAI_API_KEY']=os.getenv("OPENAI_API_KEY")
api_key = os.getenv("OPENAI_API_KEY")
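Note that `os.environ['OPENAI_API_KEY']=os.getenv(...)` raises a `TypeError` if the variable is missing, because `os.getenv` then returns `None` and environment values must be strings. It is also redundant: `load_dotenv` already exports the `.env` values into `os.environ`. A minimal sketch of a helper that fails with a clearer message instead (`require_env` is a hypothetical name, not part of `python-dotenv`):

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value or fail with a clear message.

    Hypothetical helper: os.getenv() returns None for missing variables, and
    assigning None into os.environ would raise a less obvious TypeError.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set; add it to your .env file")
    return value


# Usage, after load_dotenv(find_dotenv()):
# api_key = require_env("OPENAI_API_KEY")
```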

### Simple RAG (LlamaIndex)

In [2]:
from llama_index.readers.wikipedia import WikipediaReader

reader = WikipediaReader()
pages = ['Jenna Ortega', 'Beetlejuice Beetlejuice', 'Winona Ryder']
documents_wiki = reader.load_data(pages=pages, auto_suggest=False, redirect=False)

In [3]:
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import VectorStoreIndex

# Initialize the models
gpt = OpenAI()
embed_model = OpenAIEmbedding()

# Split the documents into chunks and embed them as vectors
index = VectorStoreIndex.from_documents(documents_wiki, embed_model=embed_model)
# Build a retriever that returns the 3 most similar chunks
retriever = index.as_retriever(similarity_top_k=3)
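Conceptually, `similarity_top_k=3` asks the retriever to embed the query and return the three chunks whose vectors are closest to it. A minimal, dependency-free sketch of that idea, assuming toy bag-of-words vectors in place of OpenAI embeddings and cosine similarity as the score (an illustration of the technique, not LlamaIndex's implementation):

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 3) -> list[tuple[float, str]]:
    """Score every chunk against the query and keep the top_k highest-scoring ones."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


chunks = [
    "Winona Ryder is an American actress born in Minnesota.",
    "Jenna Ortega starred in Beetlejuice Beetlejuice.",
    "Beetlejuice Beetlejuice is a 2024 film.",
    "Faiss is a library for similarity search.",
]
for score, chunk in retrieve("Which film did Jenna Ortega star in?", chunks, top_k=2):
    print(f"{score:.3f}  {chunk}")
```

Real embeddings capture meaning rather than exact word overlap, but the top-k selection step works the same way.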

In [4]:
from llama_index.core.prompts import PromptTemplate

# Build a prompt template that only allows answers grounded in the loaded documents
template = (
    "We have provided context information below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given this information, please answer the question: {query_str}\n"
    "Don't give an answer unless it is supported by the context above.\n"
)

qa_template = PromptTemplate(template)

In [None]:
question = "What is the name of the movie in which Winona Ryder acted alongside Jenna Ortega?"

# Retrieve the context for the model
contexts = retriever.retrieve(question)
context_list = [n.get_content() for n in contexts]
prompt = qa_template.format(context_str="\n\n".join(context_list), query_str=question)

# Generate the answer
response = gpt.complete(prompt)
print(str(response))

The name of the movie in which Winona Ryder acted alongside Jenna Ortega is "Beetlejuice Beetlejuice" (2024).


In [8]:
def rag_function(question):
    
    contexts = retriever.retrieve(question)
    context_list = [n.get_content() for n in contexts]
    prompt = qa_template.format(context_str="\n\n".join(context_list), query_str=question)

    response = gpt.complete(prompt)
    print(str(response))
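The same pipeline can be written with the retriever and the LLM passed in as plain callables, which makes it testable without API calls. A minimal sketch using the same template; `build_prompt`, `rag_answer`, `fake_retrieve` and `fake_complete` are hypothetical stand-ins for `retriever.retrieve` and `gpt.complete`:

```python
from typing import Callable

TEMPLATE = (
    "We have provided context information below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given this information, please answer the question: {query_str}\n"
    "Don't give an answer unless it is supported by the context above.\n"
)


def build_prompt(question: str, contexts: list[str], template: str = TEMPLATE) -> str:
    """Join the retrieved chunks with blank lines and fill the template."""
    return template.format(context_str="\n\n".join(contexts), query_str=question)


def rag_answer(
    question: str,
    retrieve: Callable[[str], list[str]],
    complete: Callable[[str], str],
) -> str:
    """Retrieve context for the question, build the grounded prompt and ask the LLM."""
    prompt = build_prompt(question, retrieve(question))
    return complete(prompt)


def fake_retrieve(question: str) -> list[str]:
    """Hypothetical retriever that always returns the same chunk."""
    return ["Winona Ryder and Jenna Ortega star in Beetlejuice Beetlejuice (2024)."]


def fake_complete(prompt: str) -> str:
    """Hypothetical LLM that only 'answers' when the context supports it."""
    return "Beetlejuice Beetlejuice" if "Beetlejuice" in prompt else "I don't know."


print(rag_answer("Which movie?", fake_retrieve, fake_complete))
```

Swapping the fakes for `lambda q: [n.get_content() for n in retriever.retrieve(q)]` and `lambda p: str(gpt.complete(p))` should reproduce `rag_function`, except that the answer is returned instead of printed.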

In [29]:
question = "What is the plot of the movie in which Winona Ryder acted alongside Jenna Ortega?"

rag_function(question)

The plot of the movie in which Winona Ryder acted alongside Jenna Ortega is not provided in the context above.


In [10]:
question = "Compare the families of Winona Ryder and Jenna Ortega"

rag_function(question)

Based on the context provided, Winona Ryder's family background is described in detail, including her parents, siblings, and family friends. It is mentioned that her father is of Ashkenazi Jewish descent and her mother is a Buddhist. Winona has a younger brother, Urie, and two older half-siblings from her mother's prior marriage. Her family friends include Timothy Leary, Allen Ginsberg, Lawrence Ferlinghetti, and Philip K. Dick. Additionally, it is noted that Winona's family lived in a commune in California when she was a child.

On the other hand, there is no information provided about Jenna Ortega's family background in the context above. Therefore, it is not possible to compare the families of Winona Ryder and Jenna Ortega based on the information given.


### Simple RAG with a .CSV File (LangChain)

In [11]:
%pip install --upgrade pip

In [12]:
from langchain_community.document_loaders.csv_loader import CSVLoader
from pathlib import Path
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

llm = ChatOpenAI(model="gpt-3.5-turbo-0125")

In [13]:
import pandas as pd

file_path = "../data/customers.csv"
data = pd.read_csv(file_path)

data.head()

| Unnamed: 0 | Customer Id | First Name | Last Name | Company | City | Country | Phone 1 | Phone 2 | Email | Subscription Date | Website |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 45dd324a-fb0f-49 | Jason | Archer | Thomas-Carlson | New Christopherside | Somalia | 001-493-077-0319x60273 | 001-424-199-8089x6114 | kathrynfischer@yahoo.com | 2023-07-04 | https://www.keith-arroyo.biz/ |
| 1 | d66b5a9f-9f1a-45 | Carrie | Mckinney | Clark-Delacruz | Darleneland | Antigua and Barbuda | 001-872-481-3319 | 361.647.1195x4993 | lortega@mckay.com | 2022-07-17 | http://young.com/ |
| 2 | 557fa3f3-81c2-46 | Lisa | Diaz | Atkinson-Williams | Hillborough | Gabon | 423.071.0191x24285 | 224-812-6105x726 | pmendoza@gmail.com | 2024-04-03 | http://www.keith.com/ |
| 3 | 0858fcf6-ae5f-4c | Tanner | Wood | Delacruz-Holt | Murphyhaven | French Polynesia | 992-273-1012x1982 | 615.487.1127x74443 | ryantucker@gmail.com | 2024-06-19 | https://www.wood-coleman.net/ |
| 4 | b4d17245-ff34-45 | Tamara | Carey | Yoder Inc | East Justinshire | Qatar | 540-200-5160x938 | 001-756-500-8500x189 | markmolina@ward.com | 2022-06-11 | https://dickson.info/ |


In [14]:
loader = CSVLoader(file_path=file_path)
docs = loader.load_and_split()
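`CSVLoader` turns each CSV row into one document whose `page_content` lists the row's `column: value` pairs, one per line, and keeps the file path and row number in the metadata (`source`, `row`); `load_and_split()` then runs a text splitter, which usually leaves short rows like these as single chunks. A rough stdlib approximation of that behaviour for illustration (`rows_to_documents` is a hypothetical name, and plain dicts stand in for LangChain `Document` objects):

```python
import csv
import io


def rows_to_documents(csv_text: str, source: str) -> list[dict]:
    """Approximate CSVLoader: one document per row, 'column: value' lines as text.

    Assumes a well-formed CSV (every row has the same number of fields as the header).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    documents = []
    for row_number, row in enumerate(reader):
        content = "\n".join(f"{key.strip()}: {(value or '').strip()}" for key, value in row.items())
        documents.append({"page_content": content, "metadata": {"source": source, "row": row_number}})
    return documents


sample = "First Name,Last Name,Country\nJason,Archer,Somalia\nCarrie,Mckinney,Antigua and Barbuda\n"
docs_sketch = rows_to_documents(sample, "customers.csv")
print(docs_sketch[0]["page_content"])
```

Because every row becomes its own small document, retrieval over this CSV works row by row: a question is answered from the few customer records whose text is most similar to it.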

In [22]:
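import faiss
from langchain_community.docstore.in_memory import InMemoryDocstore
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

# Create the embedding model once and reuse it
embeddings = OpenAIEmbeddings()
# The FAISS index dimension must match the embedding size, so embed a dummy string to measure it.
# Named faiss_index so it does not overwrite the LlamaIndex `index` used in the re-ranking section.
faiss_index = faiss.IndexFlatL2(len(embeddings.embed_query(" ")))
vector_store = FAISS(
    embedding_function=embeddings,
    index=faiss_index,
    docstore=InMemoryDocstore(),
    index_to_docstore_id={},
)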

In [None]:
vector_store.add_documents(documents=docs)
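`faiss.IndexFlatL2` performs exact (brute-force) nearest-neighbour search: every stored vector is compared with the query by squared Euclidean distance, and every vector must have the index's dimension, which is why the code above embeds a dummy string to measure it. A pure-Python sketch of the same search, for illustration only (Faiss does this in optimized native code; `FlatL2Index` is a hypothetical class):

```python
class FlatL2Index:
    """Toy exact-search index that mimics the idea behind faiss.IndexFlatL2."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.vectors: list[list[float]] = []

    def add(self, vector: list[float]) -> None:
        # Faiss also requires every vector to have exactly `dim` components
        if len(vector) != self.dim:
            raise ValueError(f"expected dimension {self.dim}, got {len(vector)}")
        self.vectors.append(vector)

    def search(self, query: list[float], k: int) -> list[tuple[float, int]]:
        """Return (squared L2 distance, position) pairs for the k closest vectors."""
        distances = [
            (sum((q - v) ** 2 for q, v in zip(query, vec)), i)
            for i, vec in enumerate(self.vectors)
        ]
        return sorted(distances)[:k]


index_sketch = FlatL2Index(dim=2)
for vec in ([0.0, 0.0], [1.0, 1.0], [3.0, 4.0]):
    index_sketch.add(vec)
print(index_sketch.search([0.9, 1.2], k=2))
```

Brute force is exact but scales linearly with the number of vectors, which is fine for a small CSV.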

In [None]:
! pip show langchain-core

In [21]:
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

retriever = vector_store.as_retriever()

system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)

prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("human", "{input}"),
])

# Create a question-answering chain that inserts the retrieved documents into {context}
question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)
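The "stuff" strategy behind `create_stuff_documents_chain` is simple: all retrieved documents are concatenated into the `{context}` slot of the system prompt and sent to the LLM in a single call, while `create_retrieval_chain` feeds the user's `input` both to the retriever and to the `{input}` slot. A stdlib sketch of the messages this produces, assuming documents are joined with a blank line (`stuff_messages` is a hypothetical name; the document texts are illustrative):

```python
SYSTEM_PROMPT = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)


def stuff_messages(question: str, documents: list[str], separator: str = "\n\n") -> list[tuple[str, str]]:
    """Build the (role, content) messages sent to the LLM in a single call."""
    context = separator.join(documents)  # every retrieved document goes into one prompt
    return [
        ("system", SYSTEM_PROMPT.format(context=context)),
        ("human", question),
    ]


messages = stuff_messages(
    "Which customer lives in Somalia?",
    ["First Name: Jason\nCountry: Somalia", "First Name: Lisa\nCountry: Gabon"],
)
for role, content in messages:
    print(f"[{role}]\n{content}\n")
```

Stuffing keeps the chain to one LLM call, but the retrieved documents must fit in the model's context window.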


### Re-ranking

1. Dedicated re-ranking models (cross-encoders) as an alternative to embedding similarity for ordering the retrieved chunks.
2. LLMs as re-rankers.
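Both approaches share the same two-stage pattern: a fast first stage (embedding similarity) returns a handful of candidates, and a slower but more accurate scorer looks at each query/candidate pair jointly and reorders them, keeping the best `top_n`. A toy sketch of the pattern; `first_stage_score` and `pair_score` are hypothetical stand-ins (word overlap, plus a bonus for matching word pairs) for an embedding model and a cross-encoder or LLM:

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase word set of a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def bigrams(text: str) -> set[tuple[str, str]]:
    """Set of consecutive word pairs, used to reward phrase matches."""
    words = re.findall(r"[a-z]+", text.lower())
    return set(zip(words, words[1:]))


def first_stage_score(query: str, doc: str) -> float:
    """Cheap stand-in for embedding similarity: fraction of query words found in the doc."""
    q = tokens(query)
    return len(q & tokens(doc)) / len(q) if q else 0.0


def pair_score(query: str, doc: str) -> float:
    """Stand-in for a cross-encoder or LLM: scores the pair jointly, rewarding matching phrases."""
    return first_stage_score(query, doc) + 2.0 * len(bigrams(query) & bigrams(doc))


def retrieve_then_rerank(query: str, docs: list[str], top_k: int = 3, top_n: int = 2) -> list[str]:
    """Stage 1 keeps top_k candidates; stage 2 re-scores only those and keeps top_n."""
    candidates = sorted(docs, key=lambda d: first_stage_score(query, d), reverse=True)[:top_k]
    return sorted(candidates, key=lambda d: pair_score(query, d), reverse=True)[:top_n]


QUERY = "Jenna Ortega early life"
DOCS = [
    "Ortega life early Jenna filmography",
    "Jenna Ortega early life in California",
    "Winona Ryder early life in Minnesota",
    "Faiss similarity search",
]
print(retrieve_then_rerank(QUERY, DOCS))
```

The first stage ranks the scrambled first document at the top (it contains every query word), while the pair scorer demotes it below both documents that contain the phrase "early life".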

In [5]:
def display_source_node(node, source_length=500):
    """
    Print a retrieved node in a readable format.

    Args:
    - node (NodeWithScore): the retrieved node; it must expose the 'node_id', 'score' and 'text' attributes.
    - source_length (int): number of characters of the node text to display.
    """
    print(f"Node ID: {node.node_id}")
    print(f"Similarity: {node.score}")
    print(f"Text: {node.text[:source_length]}...")  # Show only the first 'source_length' characters


In [6]:
# Retrieve the top three chunks for the family-comparison query
retriever = index.as_retriever(similarity_top_k=3)
query = "Compare the families of Winona Ryder and Jenna Ortega"
nodes = retriever.retrieve(query)
# Print the chunks
for node in nodes:
    print('----------------------------------------------------')
    display_source_node(node, source_length=500)


----------------------------------------------------
Node ID: 85d9eb22-5112-4ce4-b6fc-53ae7d064e01
Similarity: 0.8308654058813671
Text: Winona Laura Horowitz (born (1971-10-29)October 29, 1971), known professionally as Winona Ryder, is an American actress. Having come to attention playing quirky characters in the late 1980s, she achieved success with her more dramatic performances in the 1990s. Ryder's many accolades include a Golden Globe, as well as nominations for two Academy Awards, a BAFTA Award, three Screen Actors Guild Awards and a Grammy Award.
Following her film debut in Lucas (1986), Ryder rose to prominence when she ...
----------------------------------------------------
Node ID: ea42c64a-f3f8-4621-b5e2-efb1f33c815f
Similarity: 0.825133846283455
Text: == Early life ==
Winona Laura Horowitz was born in Winona County, Minnesota, to Cynthia Palmer (née Istas) and Michael D. Horowitz. Winona's mother is an author, video producer, and editor, and her father is an author, editor

#### Re-Ranking with FlagEmbeddingReranker

In [7]:
load_dotenv(find_dotenv())

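# Note: huggingface_hub reads its token from HF_TOKEN; BAAI/bge-reranker-base is a public model, so no token is strictly required
os.environ['HUGGING_FACE_TOKEN'] = os.getenv("HUGGING_FACE_TOKEN")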

In [None]:
%pip install llama-index-postprocessor-flag-embedding-reranker
%pip install git+https://github.com/FlagOpen/FlagEmbedding.git


In [8]:
# Import packages
from llama_index.postprocessor.flag_embedding_reranker import FlagEmbeddingReranker
from llama_index.core.schema import QueryBundle

# Re-rank the retrieved chunks with the bge-reranker-base cross-encoder
reranker = FlagEmbeddingReranker(
    top_n=3,
    model="BAAI/bge-reranker-base",
)

# Return the re-ranked chunks (postprocess_nodes is the public wrapper around _postprocess_nodes)
query_bundle = QueryBundle(query_str=query)
ranked_nodes = reranker.postprocess_nodes(nodes, query_bundle=query_bundle)

for ranked_node in ranked_nodes:
    print('----------------------------------------------------')
    display_source_node(ranked_node, source_length=500)




----------------------------------------------------
Node ID: 56faa4eb-c2b3-4167-92d6-76c4646705f7
Similarity: 4.252162933349609
Text: Jenna Marie Ortega (born September 27, 2002) is an American actress. She began her career as a child and received recognition for her role as a younger version of Jane in The CW comedy-drama series Jane the Virgin (2014–2019). She then won an Imagen Award for her leading role as Harley Diaz in the Disney Channel series Stuck in the Middle (2016–2018). She played Ellie Alves in the thriller series You (2019) and starred in the family film Yes Day (2021), both for Netflix.
Ortega received praise f...
----------------------------------------------------
Node ID: 85d9eb22-5112-4ce4-b6fc-53ae7d064e01
Similarity: 2.778338670730591
Text: Winona Laura Horowitz (born (1971-10-29)October 29, 1971), known professionally as Winona Ryder, is an American actress. Having come to attention playing quirky characters in the late 1980s, she achieved success with her more 
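Unlike the cosine similarities printed before re-ranking (around 0.83), the "Similarity" values here (4.25 and 2.78) are raw relevance logits from the bge-reranker cross-encoder, so they are not bounded to [0, 1]. If a probability-like score is needed, for example to drop weak matches with a threshold, a common choice is to pass the logits through a sigmoid. A small sketch using the two scores above (node IDs shortened; the `min_prob` cut-off is an illustrative assumption, not a recommended value):

```python
import math


def sigmoid(x: float) -> float:
    """Map an unbounded relevance logit to the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-x))


def filter_by_probability(scored: list[tuple[str, float]], min_prob: float = 0.5) -> list[tuple[str, float]]:
    """Convert logits to probabilities and keep only those at or above min_prob."""
    converted = [(node_id, sigmoid(logit)) for node_id, logit in scored]
    return [(node_id, p) for node_id, p in converted if p >= min_prob]


scores = [("56faa4eb", 4.252162933349609), ("85d9eb22", 2.778338670730591)]
for node_id, prob in filter_by_probability(scores, min_prob=0.95):
    print(node_id, round(prob, 3))
```

The sigmoid is monotonic, so it never changes the ranking; it only makes the scores easier to threshold.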

In [9]:
# Initialize the query engine with re-ranking
query_engine = index.as_query_engine(similarity_top_k=3, node_postprocessors=[reranker])
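`as_query_engine(similarity_top_k=3, node_postprocessors=[reranker])` chains three steps: retrieve the top 3 chunks, pass them through each postprocessor in list order, and synthesize the answer from what remains. Because `similarity_top_k` and the re-ranker's `top_n` are both 3 here, the re-ranker can only reorder the same three chunks; retrieving more candidates than `top_n` lets it change which chunks reach the LLM. A toy sketch of that flow with plain callables (all names are hypothetical; this is not LlamaIndex's internal code):

```python
from typing import Callable

Retriever = Callable[[str], list[str]]
Postprocessor = Callable[[str, list[str]], list[str]]
Synthesizer = Callable[[str, list[str]], str]

# Hypothetical corpus, already sorted by embedding similarity to the query
CORPUS = ["chunk A", "chunk B", "chunk C", "chunk D", "chunk E"]


def run_query_pipeline(
    query: str,
    retrieve: Retriever,
    postprocessors: list[Postprocessor],
    synthesize: Synthesizer,
) -> str:
    """Retrieve candidates, apply each postprocessor in list order, then synthesize."""
    nodes = retrieve(query)
    for postprocess in postprocessors:
        nodes = postprocess(query, nodes)
    return synthesize(query, nodes)


def make_retriever(top_k: int) -> Retriever:
    """Toy retriever: return the top_k most similar chunks."""
    return lambda query: CORPUS[:top_k]


def make_reranker(preferred: set[str], top_n: int) -> Postprocessor:
    """Toy re-ranker: move 'preferred' chunks to the front, then keep top_n."""
    def rerank(query: str, nodes: list[str]) -> list[str]:
        # sorted() is stable, so non-preferred chunks keep their retrieval order
        return sorted(nodes, key=lambda node: node not in preferred)[:top_n]
    return rerank


def join_context(query: str, nodes: list[str]) -> str:
    """Stand-in for the LLM synthesis step: show which chunks it would see."""
    return ", ".join(nodes)


reranker_sketch = make_reranker(preferred={"chunk E"}, top_n=3)
print(run_query_pipeline("q", make_retriever(3), [reranker_sketch], join_context))  # top_k == top_n
print(run_query_pipeline("q", make_retriever(5), [reranker_sketch], join_context))  # top_k > top_n
```

With `top_k == top_n` the preferred chunk is never retrieved, so the re-ranker cannot surface it; with a wider first stage it moves to the front.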

In [10]:
# Query the engine (the answer is printed in the next cell)
response = query_engine.query("Compare the families of Jenna Ortega and Winona Ryder")



In [11]:
print(response)

Jenna Ortega comes from a family with a father of Mexican descent and a mother of Mexican and Puerto Rican descent. She has four siblings and has described her childhood as "loud and extroverted." On the other hand, Winona Ryder's family background includes Ashkenazi Jewish ancestry from Ukraine and Romania. She has a younger brother and two older half-siblings from her mother's previous marriage. Ryder's family friends included notable figures like Timothy Leary, Allen Ginsberg, and Philip K. Dick.


#### Re-Ranking with RankGPTRerank

In [None]:
%pip install llama-index-postprocessor-rankgpt-rerank

In [16]:
# Import packages
from llama_index.postprocessor.rankgpt_rerank import RankGPTRerank
from llama_index.core.schema import QueryBundle

# Re-rank the top three chunks with an LLM (gpt-4o) acting as the re-ranker
reranker = RankGPTRerank(
    top_n=3,
    llm=OpenAI(model="gpt-4o"),
)

# Show the top three chunks according to RankGPT
query_bundle = QueryBundle(query_str=query)
ranked_nodes = reranker.postprocess_nodes(nodes, query_bundle=query_bundle)

for ranked_node in ranked_nodes:
    print('----------------------------------------------------')
    display_source_node(ranked_node, source_length=500)


----------------------------------------------------
Node ID: ea42c64a-f3f8-4621-b5e2-efb1f33c815f
Similarity: -0.320730596780777
Text: == Early life ==
Winona Laura Horowitz was born in Winona County, Minnesota, to Cynthia Palmer (née Istas) and Michael D. Horowitz. Winona's mother is an author, video producer, and editor, and her father is an author, editor, publisher, and antiquarian bookseller. He also worked as an archivist for psychologist Timothy Leary (Ryder's godfather). Winona's father's family is of Ashkenazi Jewish descent and hails from Ukraine and Romania. Growing up, Winona visited her paternal grandparents in Bro...
----------------------------------------------------
Node ID: 56faa4eb-c2b3-4167-92d6-76c4646705f7
Similarity: 4.252162933349609
Text: Jenna Marie Ortega (born September 27, 2002) is an American actress. She began her career as a child and received recognition for her role as a younger version of Jane in The CW comedy-drama series Jane the Virgin (2014–2019)
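RankGPT is a listwise approach: the LLM sees the query and the candidate passages labelled [1], [2], ... and answers with a permutation such as `[2] > [1] > [3]`, which is then used to reorder the nodes. The "Similarity" values printed above appear to be the scores left on the nodes by the previous bge re-ranking step rather than scores produced by RankGPT; only the order comes from the LLM. A sketch of the parsing step, under stated assumptions: `parse_permutation` and `apply_permutation` are hypothetical helpers, and appending passages the LLM omitted is a fallback chosen for illustration, not the LlamaIndex implementation:

```python
import re


def parse_permutation(response: str, num_passages: int) -> list[int]:
    """Extract 0-based passage indices from an answer like '[2] > [1] > [3]'."""
    order: list[int] = []
    for match in re.findall(r"\[(\d+)\]", response):
        idx = int(match) - 1  # labels are 1-based in the prompt
        if 0 <= idx < num_passages and idx not in order:
            order.append(idx)  # skip out-of-range and duplicate labels
    # Fallback: keep any passages the LLM forgot, in their original order
    order.extend(i for i in range(num_passages) if i not in order)
    return order


def apply_permutation(passages: list[str], response: str, top_n: int) -> list[str]:
    """Reorder passages according to the LLM's permutation and keep top_n."""
    return [passages[i] for i in parse_permutation(response, len(passages))][:top_n]


passages = ["Winona Ryder career overview", "Winona Ryder early life", "Jenna Ortega career overview"]
print(apply_permutation(passages, "[2] > [3] > [1]", top_n=2))
```

Validating the permutation matters because an LLM can repeat, skip or invent labels.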

In [13]:
# Initialize the query engine with re-ranking
query_engine = index.as_query_engine(similarity_top_k=3, node_postprocessors=[reranker])

In [17]:
# Query the engine and print the model's answer
response = query_engine.query("Compare the families of Jenna Ortega and Winona Ryder")
print(response)

Jenna Ortega comes from a large family with four siblings, born to a father of Mexican descent who works in law enforcement and a mother of Mexican and Puerto Rican descent who is a nurse. On the other hand, Winona Ryder was born to parents with diverse backgrounds - her mother is an author, video producer, and editor, while her father is an author, editor, publisher, and antiquarian bookseller. Ryder's family friends included notable figures like Timothy Leary, Allen Ginsberg, and Philip K. Dick. Additionally, Ryder has half-siblings from her mother's prior marriage, and her family has Jewish roots from Ukraine and Romania.


### Update Chunk Size