# RAG Fusion
A LangChain JS port of (this Github repo)[https://github.com/Raudaschl/rag-fusion], all credit to the original author.
> RAG-Fusion, a search methodology that aims to bridge the gap between traditional search paradigms and the multifaceted dimensions of human queries. Inspired by the capabilities of Retrieval Augmented Generation (RAG), this project goes a step further by employing multiple query generation and Reciprocal Rank Fusion to re-rank search results.

## Setup
For this example we'll use HSNW as our vector store/retriever, and some fake data.


In [None]:
// Deno.env.set("OPENAI_API_KEY", "");

import { OpenAIEmbeddings } from "npm:langchain@0.0.172/embeddings/openai";
import { HNSWLib } from "npm:langchain@0.0.172/vectorstores/hnswlib";

In [None]:
/** Define our fake data */
const allDocuments = [
  { id: "doc1", text: "Climate change and economic impact." },
  { id: "doc2", text: "Public health concerns due to climate change." },
  { id: "doc3", text: "Climate change: A social perspective." },
  { id: "doc4", text: "Technological solutions to climate change." },
  { id: "doc5", text: "Policy changes needed to combat climate change." },
  { id: "doc6", text: "Climate change and its impact on biodiversity." },
  { id: "doc7", text: "Climate change: The science and models." },
  { id: "doc8", text: "Global warming: A subset of climate change." },
  { id: "doc9", text: "How climate change affects daily weather." },
  { id: "doc10", text: "The history of climate change activism." },
];

In [None]:
/** Initialize our vector store with the fake data and OpenAI embeddings. */
const vectorStore = await HNSWLib.fromTexts(
  allDocuments.map(({ text }) => text),
  allDocuments.map(({ id }) => ({ id })),
  new OpenAIEmbeddings()
);
/** Create the retriever */
const retriever = vectorStore.asRetriever();

## Define the Query Generator

We will now define a chain to do the query generation
This chain (pulls a prompt)[https://smith.langchain.com/hub/langchain-ai/rag-fusion-query-generation] from the (LangChain Hub)[https://smith.langchain.com/hub] that when provided a query, it tasks the model to generate multiple search queries related to the original. In our case, we're asking for 4 additional queries.

In [None]:
import { ChatOpenAI } from "npm:langchain@0.0.172/chat_models/openai";
import { pull } from "npm:langchain@0.0.172/hub";
import { StringOutputParser } from "npm:langchain@0.0.172/schema/output_parser";
import { RunnableLambda, RunnableSequence } from "npm:langchain@0.0.172/schema/runnable";

In [None]:
/** Define the chat model */
const model = new ChatOpenAI({
  temperature: 0,
  openAIApiKey: Deno.env.get("OPENAI_API_KEY"),
});

In [None]:
/** Pull a prompt from the hub */
const prompt = await pull("langchain-ai/rag-fusion-query-generation");
//  const prompt = ChatPromptTemplate.fromMessages([
//    ["system", "You are a helpful assistant that generates multiple search queries based on a single input query."],
//    ["user", "Generate multiple search queries related to: {original_query}"],
//    ["user", "OUTPUT (4 queries):"],
//  ]);

In [None]:
/** Define our chain for generating queries  */
const generateQueries = RunnableSequence.from([
  prompt,
  model,
  new StringOutputParser(),
  RunnableLambda.from((output) => output.split("\n")),
]);

## Construct the Reciprocal Rank Fusion function
This function is used for combining the results of multiple search queries to produce a single ranked list of results. This is a common technique in information retrieval known as data fusion or result merging.

In [None]:
import { Document } from "npm:langchain@0.0.172/document";

In [None]:
const reciprocalRankFusion = (results: string[][], k = 60) => {
  const fusedScores: Record<string, number> = {};
  for (const result of results) {
    // Assumes the docs are returned in sorted order of relevance
    result.forEach((item, index) => {
      if (!(item in fusedScores)) {
        fusedScores[item] = 0;
      }
      fusedScores[item] += 1 / (index + k);
    });
  }

  const rerankedResults = Object.entries(fusedScores)
    .sort((a, b) => b[1] - a[1])
    .map(
      ([doc, score]) => new Document({ pageContent: doc, metadata: { score } })
    );
  return rerankedResults;
};

## Define the full chain
Now we can put all our pieces together in one chain.
The chain preforms the following steps:
1. Generate 4 search queries based on the original query
2. Perform lookups with the retriever for each generated query
3. Pass the results of the vector store lookup to the `reciprocalRankFusion` function

In [None]:
const chain = RunnableSequence.from([
  generateQueries,
  retriever.map(),
  reciprocalRankFusion,
]);

In [None]:
const result = await chain.invoke({
  original_query: originalQuery,
});

console.log(result);

The result of the chain is the following:

In [None]:
[
  Document {
    pageContent: '{"pageContent":"Climate change and economic impact.","metadata":{"id":"doc1"}}',
    metadata: { score: 0.06558258417063283 }
  },
  Document {
    pageContent: '{"pageContent":"Climate change and its impact on biodiversity.","metadata":{"id":"doc6"}}',
    metadata: { score: 0.04866871479774705 }
  },
  Document {
    pageContent: '{"pageContent":"How climate change affects daily weather.","metadata":{"id":"doc9"}}',
    metadata: { score: 0.048131080389144903 }
  },
  Document {
    pageContent: '{"pageContent":"Public health concerns due to climate change.","metadata":{"id":"doc2"}}',
    metadata: { score: 0.03306010928961749 }
  },
  Document {
    pageContent: '{"pageContent":"Climate change: A social perspective.","metadata":{"id":"doc3"}}',
    metadata: { score: 0.031746031746031744 }
  },
  Document {
    pageContent: '{"pageContent":"Technological solutions to climate change.","metadata":{"id":"doc4"}}',
    metadata: { score: 0.016666666666666666 }
  },
  Document {
    pageContent: '{"pageContent":"Policy changes needed to combat climate change.","metadata":{"id":"doc5"}}',
    metadata: { score: 0.01639344262295082 }
  }
]