# RAG Fusion

Re-implemented from [this GitHub repo](https://github.com/Raudaschl/rag-fusion); all credit goes to the original author.

> RAG-Fusion, a search methodology that aims to bridge the gap between traditional search paradigms and the multifaceted dimensions of human queries. Inspired by the capabilities of Retrieval Augmented Generation (RAG), this project goes a step further by employing multiple query generation and Reciprocal Rank Fusion to re-rank search results.

## Setup

For this example, we will use Pinecone as the vector store and a small set of fake documents.

**Install Dependencies:**

In [None]:
%pip install pinecone-client langchain openai tiktoken langchainhub

**Add your API keys and Pinecone environment:**

In [None]:
PINECONE_API_KEY=""
PINECONE_ENVIRONMENT=""
PINECONE_INDEX_NAME=""
OPENAI_API_KEY=""


In [None]:
import pinecone
from langchain.vectorstores import Pinecone
from langchain.embeddings import OpenAIEmbeddings

pinecone.init(api_key=PINECONE_API_KEY,environment=PINECONE_ENVIRONMENT)



In [None]:
all_documents = {
    "doc1": "Climate change and economic impact.",
    "doc2": "Public health concerns due to climate change.",
    "doc3": "Climate change: A social perspective.",
    "doc4": "Technological solutions to climate change.",
    "doc5": "Policy changes needed to combat climate change.",
    "doc6": "Climate change and its impact on biodiversity.",
    "doc7": "Climate change: The science and models.",
    "doc8": "Global warming: A subset of climate change.",
    "doc9": "How climate change affects daily weather.",
    "doc10": "The history of climate change activism."
}

## Define the Query Generator

We first index the fake documents in Pinecone, then define a chain that generates several search queries from a single input query.

In [None]:
vectorstore = Pinecone.from_texts(list(all_documents.values()), OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY), index_name=PINECONE_INDEX_NAME)

In [None]:
from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.schema.output_parser import StrOutputParser

In [None]:
from langchain import hub

prompt = hub.pull('langchain-ai/rag-fusion-query-generation')

In [None]:
# prompt = ChatPromptTemplate.from_messages([
#     ("system", "You are a helpful assistant that generates multiple search queries based on a single input query."),
#     ("user", "Generate multiple search queries related to: {original_query}"),
#     ("user", "OUTPUT (4 queries):")
# ])

In [None]:
generate_queries = prompt | ChatOpenAI(temperature=0,openai_api_key=OPENAI_API_KEY) | StrOutputParser() | (lambda x: x.split("\n"))
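The last step of the chain above splits the model output on newlines. Depending on how the model formats its answer, that can leave blank entries or list markers such as `1.` in front of each query. The following is a minimal sketch of a stricter parser; `clean_queries` and `sample_output` are hypothetical names and data invented for illustration, and whether the hub prompt actually produces numbered lines is not stated here.

```python
import re


def clean_queries(raw_output: str) -> list[str]:
    """Hypothetical helper: split LLM output into one clean query per line."""
    queries = []
    for line in raw_output.split("\n"):
        # Drop a leading "1." / "2)" / "-" / "*" list marker, then trim whitespace.
        line = re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", line).strip()
        if line:  # skip blank lines
            queries.append(line)
    return queries


# Invented sample text, not real model output.
sample_output = (
    "1. impact of climate change on economy\n"
    "\n"
    "2. climate change health effects\n"
    "- biodiversity loss\n"
)
print(clean_queries(sample_output))
```

Because the chain already accepts a plain callable as its last step, a function like this could take the place of the `lambda` if the extra cleanup turns out to be needed.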

## Define the full chain

We can now put it all together and define the full chain. This chain:

1. Generates several search queries from the original query
2. Looks up each query in the retriever
3. Joins all the results together using reciprocal rank fusion

Note that it does NOT do a final generation step.

In [None]:
original_query = "impact of climate change"

In [None]:
vectorstore = Pinecone.from_existing_index(PINECONE_INDEX_NAME, OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY))
retriever = vectorstore.as_retriever()

In [None]:
from langchain.load import dumps, loads


def reciprocal_rank_fusion(results: list[list], k=60):
    """Fuse several ranked result lists into one list of (doc, score) pairs."""
    fused_scores = {}
    for docs in results:
        # Assumes the docs are returned in sorted order of relevance
        for rank, doc in enumerate(docs):
            # Serialize the document so it can be used as a dictionary key
            doc_str = dumps(doc)
            if doc_str not in fused_scores:
                fused_scores[doc_str] = 0
            # rank is zero-based, so the top hit of each list contributes 1 / k
            fused_scores[doc_str] += 1 / (rank + k)

    # Sort by fused score, highest first, and deserialize the documents
    reranked_results = [
        (loads(doc), score)
        for doc, score in sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)
    ]
    return reranked_results
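To see what reciprocal rank fusion does without LangChain or Pinecone, here is a minimal sketch on plain strings. It uses the same zero-based rank and `k=60` as the function above; the name `rrf_scores` and the toy rankings are hypothetical and only serve as an illustration.

```python
from collections import defaultdict


def rrf_scores(rankings, k=60):
    """Hypothetical helper: fuse ranked lists of hashable items with RRF."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking):  # rank is zero-based
            scores[item] += 1 / (rank + k)
    # Highest fused score first
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


# Invented rankings, one list per generated query.
toy_rankings = [
    ["doc1", "doc6", "doc2"],
    ["doc6", "doc1", "doc4"],
    ["doc1", "doc2", "doc6"],
]
fused = rrf_scores(toy_rankings)
for item, score in fused:
    print(f"{item}: {score:.6f}")
```

A document that appears near the top of several lists accumulates more score than one that appears in a single list, which is why `doc1` ends up ahead of `doc4` in this toy input.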

In [None]:
chain = generate_queries | retriever.map() | reciprocal_rank_fusion

In [None]:
chain.invoke({"original_query": original_query})

[(Document(page_content='Climate change and economic impact.'),
  0.08093517329371375),
 (Document(page_content='Climate change and its impact on biodiversity.'),
  0.06506215742069787),
 (Document(page_content='Public health concerns due to climate change.'),
  0.049189141547682),
 (Document(page_content='Technological solutions to climate change.'),
  0.049189141547682),
 (Document(page_content='Policy changes needed to combat climate change.'),
  0.015873015873015872)]
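The scores in this output can be read as sums of `1 / (rank + k)` terms with `k = 60`. The sketch below shows one combination of zero-based ranks that reproduces two of the values. The actual retrieval positions are not shown in the output, so the decomposition is an assumption, and other combinations may give the same totals.

```python
k = 60
# Contribution of a document found at zero-based position `rank` in one result list.
contribution = [1 / (rank + k) for rank in range(4)]

# A document found at positions 0, 1 and 2 across three lists.
three_lists = contribution[0] + contribution[1] + contribution[2]
# A document found once, at position 3.
single_hit = contribution[3]

print(three_lists)
print(single_hit)
```

Under this reading, the lowest-scored document was returned for only one of the generated queries, while the higher-scored ones were returned for several.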