<a href="https://colab.research.google.com/github/franlin1860/llm/blob/main/relative_score_dist_fusion_v240830.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/docs/examples/retrievers/relative_score_dist_fusion.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Relative Score Fusion and Distribution-Based Score Fusion

In this example, we demonstrate using QueryFusionRetriever with two methods which aim to improve on Reciprocal Rank Fusion:
1. Relative Score Fusion ([Weaviate](https://weaviate.io/blog/hybrid-search-fusion-algorithms))
2. Distribution-Based Score Fusion ([Mazzeschi: blog post](https://medium.com/plain-simple-software/distribution-based-score-fusion-dbsf-a-new-approach-to-vector-search-ranking-f87c37488b18))
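For reference, the baseline that both methods try to improve on can be sketched in a few lines. This is a minimal illustration of Reciprocal Rank Fusion, not LlamaIndex's implementation; the function name, the toy node IDs, and the smoothing constant `k=60` (a commonly used default) are assumptions made for this example. Note that RRF looks only at rank positions and discards the retrievers' actual scores, which is the information the two methods below try to keep.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60.0):
    """Fuse ranked lists of node IDs using ranks only (scores are ignored)."""
    fused = defaultdict(float)
    for ranking in ranked_lists:
        for rank, node_id in enumerate(ranking):  # rank is 0-based
            fused[node_id] += 1.0 / (k + rank + 1)
    # Highest fused score first
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical rankings from two retrievers (best match first)
vector_ranking = ["a", "b", "c"]
bm25_ranking = ["b", "d", "a"]

rrf_results = reciprocal_rank_fusion([vector_ranking, bm25_ranking])
for node_id, score in rrf_results:
    print(f"{node_id}: {score:.4f}")
```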

# Prevent Disconnection

In [1]:
#@markdown <h3>← Run this cell after entering the code to prevent disconnection</h3>
import IPython
from google.colab import output

display(IPython.display.Javascript('''
 function ClickConnect(){
   btn = document.querySelector("colab-connect-button")
   if (btn != null){
     console.log("Click colab-connect-button");
     btn.click()
     }

   btn = document.getElementById('ok')
   if (btn != null){
     console.log("Click reconnect");
     btn.click()
     }
  }

setInterval(ClickConnect,60000)
'''))

print("Done.")

<IPython.core.display.Javascript object>

Done.


In [None]:
function ConnectButton(){
    console.log("Connect pushed");
    document.querySelector("#connect").click()
}
setInterval(ConnectButton,60000);

# LLM Setup

In [2]:
import os

os.environ["ZHIPU_API_KEY"] = ""

In [3]:
!pip install llama_index-llms-openai_like
!pip install llama_index-embeddings-huggingface

Collecting llama_index-llms-openai_like
  Downloading llama_index_llms_openai_like-0.2.0-py3-none-any.whl.metadata (753 bytes)
Collecting llama-index-core<0.12.0,>=0.11.0 (from llama_index-llms-openai_like)
  Downloading llama_index_core-0.11.3-py3-none-any.whl.metadata (2.4 kB)
Collecting llama-index-llms-openai<0.3.0,>=0.2.0 (from llama_index-llms-openai_like)
  Downloading llama_index_llms_openai-0.2.0-py3-none-any.whl.metadata (648 bytes)
Collecting dataclasses-json (from llama-index-core<0.12.0,>=0.11.0->llama_index-llms-openai_like)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting deprecated>=1.2.9.3 (from llama-index-core<0.12.0,>=0.11.0->llama_index-llms-openai_like)
  Downloading Deprecated-1.2.14-py2.py3-none-any.whl.metadata (5.4 kB)
Collecting dirtyjson<2.0.0,>=1.0.8 (from llama-index-core<0.12.0,>=0.11.0->llama_index-llms-openai_like)
  Downloading dirtyjson-1.0.8-py3-none-any.whl.metadata (11 kB)
Collecting httpx (from llama-index-core<0.1

In [4]:
import os
import logging
import sys
from llama_index.llms.openai_like import OpenAILike
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Configure logging
logging.basicConfig(stream=sys.stdout, level=logging.INFO)

# Define the LLM: GLM-4-Flash through Zhipu's OpenAI-compatible endpoint
llm = OpenAILike(model="glm-4-flash",
                 api_base="https://open.bigmodel.cn/api/paas/v4/",
                 api_key=os.environ["ZHIPU_API_KEY"],
                 temperature=0.6,
                 is_chat_model=True)

# Register the LLM as the global default
Settings.llm = llm

# Set the embedding model
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-zh-v1.5")
Settings.embed_model = embed_model
Settings.chunk_size = 256

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



In [8]:
!pip install llama_index
!pip install llama-index-retrievers-bm25

Collecting llama_index
  Downloading llama_index-0.11.2-py3-none-any.whl.metadata (11 kB)
Collecting llama-index-agent-openai<0.4.0,>=0.3.0 (from llama_index)
  Downloading llama_index_agent_openai-0.3.0-py3-none-any.whl.metadata (728 bytes)
Collecting llama-index-cli<0.4.0,>=0.3.0 (from llama_index)
  Downloading llama_index_cli-0.3.0-py3-none-any.whl.metadata (1.5 kB)
Collecting llama-index-embeddings-openai<0.3.0,>=0.2.0 (from llama_index)
  Downloading llama_index_embeddings_openai-0.2.3-py3-none-any.whl.metadata (635 bytes)
Collecting llama-index-indices-managed-llama-cloud>=0.3.0 (from llama_index)
  Downloading llama_index_indices_managed_llama_cloud-0.3.0-py3-none-any.whl.metadata (3.8 kB)
Collecting llama-index-legacy<0.10.0,>=0.9.48 (from llama_index)
  Downloading llama_index_legacy-0.9.48.post3-py3-none-any.whl.metadata (8.5 kB)
Collecting llama-index-multi-modal-llms-openai<0.3.0,>=0.2.0 (from llama_index)
  Downloading llama_index_multi_modal_llms_openai-0.2.0-py3-none-an

## Setup


If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

Download Data

In [6]:
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

--2024-08-30 07:54:26--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.109.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75042 (73K) [text/plain]
Saving to: ‘data/paul_graham/paul_graham_essay.txt’


2024-08-30 07:54:26 (1.92 MB/s) - ‘data/paul_graham/paul_graham_essay.txt’ saved [75042/75042]



In [9]:
from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader("./data/paul_graham/").load_data()

Next, we will set up a vector index over the essay.

In [10]:
from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(chunk_size=256)

index = VectorStoreIndex.from_documents(
    documents, transformations=[splitter], show_progress=True
)

Parsing nodes:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/423 [00:00<?, ?it/s]

## Create a Hybrid Fusion Retriever using Relative Score Fusion

In this step, we fuse our index with a BM25-based retriever. This lets us capture both semantic relations and keywords in our input queries.

Since both of these retrievers calculate a score, we can use the `QueryFusionRetriever` to re-sort our nodes without any additional models or excessive computation.

The following example uses the [Relative Score Fusion](https://weaviate.io/blog/hybrid-search-fusion-algorithms) algorithm from Weaviate, which applies a MinMax scaler to each result set, then makes a weighted sum. Here, we'll give the vector retriever slightly more weight than BM25 (0.6 vs. 0.4).
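The scaling and weighting described above can be sketched in plain Python. This is an illustrative approximation rather than the code `QueryFusionRetriever` runs internally; the function names, the toy scores, and the handling of a result set whose scores are all equal (mapped to 1.0 here) are assumptions made for the example.

```python
def min_max_scale(scores):
    """Rescale one retriever's scores to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption: treat a flat result set as all-equal top scores
        return {node_id: 1.0 for node_id in scores}
    return {node_id: (s - lo) / (hi - lo) for node_id, s in scores.items()}


def relative_score_fusion(result_sets, weights):
    """Weighted sum of min-max scaled scores across retrievers."""
    fused = {}
    for scores, weight in zip(result_sets, weights):
        for node_id, scaled in min_max_scale(scores).items():
            # A node missing from one result set contributes 0 from it
            fused[node_id] = fused.get(node_id, 0.0) + weight * scaled
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical raw scores: cosine similarities vs. unbounded BM25 scores
vector_scores = {"a": 0.82, "b": 0.76, "c": 0.70}
bm25_scores = {"b": 12.0, "d": 9.0, "a": 3.0}

rsf_results = relative_score_fusion([vector_scores, bm25_scores], [0.6, 0.4])
for node_id, score in rsf_results:
    print(f"{node_id}: {score:.2f}")
```

Scaling first matters because the two retrievers score on different ranges: without it, the unbounded BM25 values would swamp the cosine similarities regardless of the weights.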

First, we create our retrievers. The vector retriever returns its top 5 most similar nodes and the BM25 retriever its top 10.

In [11]:
from llama_index.retrievers.bm25 import BM25Retriever

vector_retriever = index.as_retriever(similarity_top_k=5)

bm25_retriever = BM25Retriever.from_defaults(
    docstore=index.docstore, similarity_top_k=10
)

DEBUG:bm25s:Building index from IDs objects


Next, we can create our fusion retriever, which will return the top 10 most similar nodes from the up to 15 nodes returned by the two retrievers.

Note that the vector and BM25 retrievers may have returned all the same nodes, only in different orders; in this case, it simply acts as a re-ranker.
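To see how much the two result sets actually overlap, one option is to compare their node IDs directly. The helper below is a hypothetical sketch (its name and the toy IDs are not part of LlamaIndex) that works on plain lists of IDs.

```python
def overlap_report(vector_ids, bm25_ids):
    """Summarize which node IDs the two retrievers share."""
    vector_set, bm25_set = set(vector_ids), set(bm25_ids)
    return {
        "shared": sorted(vector_set & bm25_set),
        "vector_only": sorted(vector_set - bm25_set),
        "bm25_only": sorted(bm25_set - vector_set),
        # Number of distinct candidates the fusion step has to rank
        "candidates": len(vector_set | bm25_set),
    }


# Hypothetical node IDs returned by each retriever
report = overlap_report(["n1", "n2", "n3"], ["n2", "n3", "n4", "n5"])
print(report)
```

In this notebook the ID lists could probably be built with something like `[n.node.node_id for n in vector_retriever.retrieve(query)]`; if `vector_only` and `bm25_only` both come back empty, fusion is acting purely as a re-ranker.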

In [12]:
from llama_index.core.retrievers import QueryFusionRetriever

retriever = QueryFusionRetriever(
    [vector_retriever, bm25_retriever],
    retriever_weights=[0.6, 0.4],
    similarity_top_k=10,
    num_queries=1,  # set this to 1 to disable query generation
    mode="relative_score",
    use_async=True,
    verbose=True,
)

In [13]:
# apply nested async to run in a notebook
import nest_asyncio

nest_asyncio.apply()

In [14]:
nodes_with_scores = retriever.retrieve(
    "What happened at Interleafe and Viaweb?"
)

In [15]:
for node in nodes_with_scores:
    print(f"Score: {node.score:.2f} - {node.text[:100]}...\n-----")

Score: 0.92 - The following spring, lightning struck. I was invited to give a talk at a Lisp conference, so I gave...
-----
Score: 0.52 - The subset I would build as an open source project was the new Lisp, whose parentheses I now wouldn'...
-----
Score: 0.44 - I started working on the application builder, Dan worked on network infrastructure, and the two unde...
-----
Score: 0.40 - I certainly did. So at the end of the summer Dan and I switched to working on this new dialect of Li...
-----
Score: 0.28 - You wouldn't need versions, or ports, or any of that crap. At Interleaf there had been a whole group...
-----
Score: 0.12 - [9] We'd had a code editor in Viaweb for users to define their own page styles. They didn't know it,...
-----
Score: 0.11 - The point is that it was really cheap, less than half market price.

[8] Most software you can launc...
-----
Score: 0.09 - [7] Technically the apartment wasn't rent-controlled but rent-stabilized, but this is a refinement o...
-----
Score: 0

### Distribution-Based Score Fusion

A variant of Relative Score Fusion, [Distribution-Based Score Fusion](https://medium.com/plain-simple-software/distribution-based-score-fusion-dbsf-a-new-approach-to-vector-search-ranking-f87c37488b18) scales the scores differently: instead of using each result set's minimum and maximum, it uses the mean and standard deviation of the scores in each result set.
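A rough sketch of that scaling follows. It assumes the bounds are the mean plus or minus three standard deviations and uses the population standard deviation; both choices, along with the function names and toy scores, are assumptions for illustration and may differ from what `QueryFusionRetriever` does internally.

```python
from statistics import mean, pstdev


def dist_based_scale(scores):
    """Scale scores so that mean - 3*std maps to 0 and mean + 3*std maps to 1."""
    values = list(scores.values())
    mu, sigma = mean(values), pstdev(values)
    lo, hi = mu - 3 * sigma, mu + 3 * sigma
    if hi == lo:
        # Assumption: a result set with zero variance gets a flat score
        return {node_id: 1.0 for node_id in scores}
    return {node_id: (s - lo) / (hi - lo) for node_id, s in scores.items()}


def dist_based_score_fusion(result_sets, weights):
    """Weighted sum of distribution-scaled scores across retrievers."""
    fused = {}
    for scores, weight in zip(result_sets, weights):
        for node_id, scaled in dist_based_scale(scores).items():
            fused[node_id] = fused.get(node_id, 0.0) + weight * scaled
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical raw scores from the two retrievers
vector_scores = {"a": 0.82, "b": 0.76, "c": 0.70}
bm25_scores = {"b": 12.0, "d": 9.0, "a": 3.0}

dbsf_results = dist_based_score_fusion([vector_scores, bm25_scores], [0.6, 0.4])
for node_id, score in dbsf_results:
    print(f"{node_id}: {score:.2f}")
```

Unlike min-max scaling, the top and bottom results are not pinned to exactly 1 and 0. A score equal to the mean lands at 0.5, and an outlier pulls the others less, which is one reason the fused scores in the output below are more compressed than in the previous section.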

In [16]:
from llama_index.core.retrievers import QueryFusionRetriever

retriever = QueryFusionRetriever(
    [vector_retriever, bm25_retriever],
    retriever_weights=[0.6, 0.4],
    similarity_top_k=10,
    num_queries=1,  # set this to 1 to disable query generation
    mode="dist_based_score",
    use_async=True,
    verbose=True,
)

nodes_with_scores = retriever.retrieve(
    "What happened at Interleafe and Viaweb?"
)

for node in nodes_with_scores:
    print(f"Score: {node.score:.2f} - {node.text[:100]}...\n-----")

Score: 0.72 - The following spring, lightning struck. I was invited to give a talk at a Lisp conference, so I gave...
-----
Score: 0.55 - The subset I would build as an open source project was the new Lisp, whose parentheses I now wouldn'...
-----
Score: 0.38 - I started working on the application builder, Dan worked on network infrastructure, and the two unde...
-----
Score: 0.32 - I certainly did. So at the end of the summer Dan and I switched to working on this new dialect of Li...
-----
Score: 0.26 - You wouldn't need versions, or ports, or any of that crap. At Interleaf there had been a whole group...
-----
Score: 0.20 - So while he agreed that it sounded like a plausible idea, he firmly refused to work on it.

Hmph. We...
-----
Score: 0.19 - By then there was a name for the kind of company Viaweb was, an "application service provider," or A...
-----
Score: 0.17 - [9] We'd had a code editor in Viaweb for users to define their own page styles. They didn't know it,...
-----
Score: 0

## Use in a Query Engine!

Now, we can plug our retriever into a query engine to synthesize natural language responses.

In [17]:
from llama_index.core.query_engine import RetrieverQueryEngine

query_engine = RetrieverQueryEngine.from_args(retriever)

In [18]:
response = query_engine.query("What happened at Interleafe and Viaweb?")

In [19]:
from llama_index.core.response.notebook_utils import display_response

display_response(response)

**`Final Response:`** At Interleaf, the context suggests that there was a significant group focused on release engineering, which was as large as the group that developed the software. This implies that the process of updating and releasing software was complex and resource-intensive. In contrast, Viaweb, the company founded by the author, aimed to simplify software updates by allowing direct updates on the server, reducing the need for versions and ports. Viaweb was initially an application service provider (ASP), which evolved into the concept of software as a service (SaaS). The company was started with seed funding and involved the development of an application builder, network infrastructure, and initial services such as image and phone call hosting.