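# FlashRank Reranker
## Overview
> [FlashRank](https://github.com/PrithivirajDamodaran/FlashRank) is an ultra-lightweight and ultra-fast Python library for adding reranking to existing search and **retrieval** pipelines. It is based on state-of-the-art (**SoTA**) **cross-encoders**.

This notebook shows how to use ```FlashRank-Reranker``` within the LangChain framework and how reranking can improve the quality of search or **retrieval** results. It provides practical code examples and explanations for integrating ```FlashRank``` into a LangChain pipeline, with an emphasis on efficiency and effectiveness. The focus is on using ```FlashRank``` to improve output ranking in a simple, scalable way.

### Table of Contents
- [Overview](#overview)
- [Environment Setup](#environment-setup)
- [FlashrankRerank](#flashrankrerank)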

---

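## My Take

FlashRank's main value lies in being lightweight and fast, which makes it well suited to applications that need efficient reranking, especially in resource-constrained environments.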

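## Supplementary Learning Notes

**Core strengths:**
- **Ultra-lightweight**: minimal memory and storage requirements
- **Ultra-fast**: inference speed optimized for real-time applications
- **SoTA foundation**: built on state-of-the-art cross-encoder architectures
- **Easy to integrate**: a simple API design

**Technical characteristics:**
- **Performance-oriented design**: focused on optimizing speed and resource usage
- **Plug and play**: integrates without complex configuration
- **Cross-platform support**: supports a variety of deployment environments
- **Free and open source**: lowers the barrier to adoption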

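**Suitable scenarios:**
- **Real-time search**: search systems that need millisecond-level responses
- **Edge computing**: resource-constrained edge devices
- **High throughput**: online services with heavy concurrent load
- **Prototyping**: quickly validating the effect of reranking

**Comparison with other options:**
- **vs HuggingFace CrossEncoder**: lighter and faster
- **vs Jina Reranker**: runs locally, with no API dependency
- **vs a self-built solution**: works out of the box, with no training required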

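**Deployment advantages:**
- **Low resource consumption**: suitable for small and medium-sized applications
- **Fast deployment**: reduces infrastructure requirements
- **Cost-effective**: no additional API fees
- **Offline operation**: does not depend on external services

**Best practices:**
- Consider it first in resource-sensitive environments
- Use it for interactive applications that need fast responses
- Use it as a lightweight alternative to other rerankers
- Use it for proofs of concept and rapid prototyping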
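
The speed claims above are qualitative. One way to check them on your own hardware is to time the reranking call directly. The sketch below is a minimal timing helper built on the standard library only; `median_latency_ms` and `fake_rerank` are hypothetical names introduced here for illustration, and `fake_rerank` is just a stand-in for whichever reranker call you want to measure.

```python
import statistics
import time


def median_latency_ms(fn, *args, repeats=5, **kwargs):
    """Call fn(*args, **kwargs) `repeats` times; return the median wall-clock latency in ms."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def fake_rerank(query, passages):
    """Hypothetical stand-in for a reranker: sort passages by number of shared words."""
    query_words = set(query.lower().split())
    return sorted(
        passages,
        key=lambda p: len(query_words & set(p.lower().split())),
        reverse=True,
    )


passages = ["FAISS is a vector index", "Word2Vec maps words to vectors"]
latency = median_latency_ms(fake_rerank, "tell me about word2vec", passages)
print(f"median latency: {latency:.3f} ms")
```

Using the median rather than the mean keeps a single slow first call (for example, one that loads a model) from dominating the result.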

## Environment Setup

Set up the environment. You may refer to [Environment Setup](https://wikidocs.net/257836) for more details.

**[Note]**
- ```langchain-opentutorial``` is a package that provides easy-to-use environment setup helpers, functions, and utilities for the tutorials.
- See the [```langchain-opentutorial```](https://github.com/LangChain-OpenTutorial/langchain-opentutorial-pypi) repository for more details.

In [None]:
# Set environment variables
from langchain_opentutorial import set_env

set_env(
    {
        "OPENAI_API_KEY": "",
    }
)

Alternatively, you can set ```OPENAI_API_KEY``` in a ```.env``` file and load it.

[Note] This is not necessary if you have already set ```OPENAI_API_KEY``` in a previous step.

In [1]:
# Configuration file to manage API keys as environment variables
from dotenv import load_dotenv

# Load API key information
load_dotenv(override=True)

True
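
The `True` returned above does not by itself tell you that `OPENAI_API_KEY` in particular is set, so it may be worth checking for the key before calling the API. A minimal check using only the standard library might look like the following; `require_env` is a hypothetical helper, not part of `python-dotenv` or `langchain-opentutorial`, and `DEMO_API_KEY` is a dummy variable used so the snippet runs without a real key.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or raise if it is missing or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file or environment")
    return value


# Dummy variable (hypothetical) so the example does not need a real API key.
os.environ["DEMO_API_KEY"] = "sk-demo"
print(require_env("DEMO_API_KEY"))  # prints: sk-demo
```

In the notebook itself you would call `require_env("OPENAI_API_KEY")` right after `load_dotenv`.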

In [None]:
%%capture --no-stderr
%pip install langchain-opentutorial

In [2]:
# Install required packages
from langchain_opentutorial import package

package.install(
    [
        "flashrank"
    ],
    verbose=False,
    upgrade=False,
)

In [3]:
def pretty_print_docs(docs):
    # Print each document's content and metadata, separated by a dashed line
    print(
        f"\n{'-' * 100}\n".join(
            [
                f"Document {i+1}:\n\n{d.page_content}\nMetadata: {d.metadata}"
                for i, d in enumerate(docs)
            ]
        )
    )

## FlashrankRerank

Load data for a simple example and create a **retriever**.

In [4]:
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings

# Load the documents
documents = TextLoader("./data/appendix-keywords.txt").load()

# Initialize the text splitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)

# Split the documents
texts = text_splitter.split_documents(documents)

# Add a unique ID to each text
for idx, text in enumerate(texts):
    text.metadata["id"] = idx

# Initialize the retriever
retriever = FAISS.from_documents(
    texts, OpenAIEmbeddings()
).as_retriever(search_kwargs={"k": 10})

# Define the query
query = "Tell me about Word2Vec"

# Search for documents
docs = retriever.invoke(query)

# Print the retrieved documents
pretty_print_docs(docs)

Document 1:

Word2Vec
Definition: Word2Vec is a technique in NLP that maps words to a vector space, representing their semantic relationships based on context.
Example: In a Word2Vec model, "king" and "queen" are represented by vectors located close to each other.
Related Keywords: Natural Language Processing (NLP), Embedding, Semantic Similarity
Metadata: {'source': './data/appendix-keywords.txt', 'id': 12}
----------------------------------------------------------------------------------------------------
Document 2:

Embedding
Definition: Embedding is the process of converting textual data, such as words or sentences, into low-dimensional continuous vectors that computers can process and understand.
Example: The word "apple" can be represented as a vector like [0.65, -0.23, 0.17].
Related Keywords: Natural Language Processing (NLP), Vectorization, Deep Learning
Metadata: {'source': './data/appendix-keywords.txt', 'id': 1}
-------------------------------------------------------------
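
To see what `chunk_size=500` and `chunk_overlap=100` mean in the splitting step, the sketch below implements a plain fixed-width sliding window over characters. This is a deliberate simplification: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraph and line breaks, so its chunks will generally differ from these. `split_with_overlap` is a hypothetical helper, not a LangChain function.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Split text into fixed-width character windows that share `chunk_overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


# Small numbers make the overlap easy to see.
chunks = split_with_overlap("abcdefghijkl", chunk_size=5, chunk_overlap=2)
print(chunks)  # ['abcde', 'defgh', 'ghijk', 'jkl']
```

Each chunk repeats the last two characters of the previous one, which is the role the 100-character overlap plays in the notebook: text near a chunk boundary stays available in both neighbors.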

Now, let's wrap the base **retriever** with a ```ContextualCompressionRetriever``` and use ```FlashrankRerank``` as the compressor.

In [None]:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import FlashrankRerank
from langchain_openai import ChatOpenAI

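# Initialize the LLM (large language model)
# temperature=0 makes the generated output more stable and reproducible
llm = ChatOpenAI(temperature=0)

# Initialize the FlashrankRerank reranker
# This uses a cross-encoder model hosted on Hugging Face
# "ms-marco-MultiBERT-L-12" is suited to multilingual retrieval and reorders documents by their relevance to the query
compressor = FlashrankRerank(model="ms-marco-MultiBERT-L-12")

# Initialize the ContextualCompressionRetriever
# base_compressor: the component that reranks/compresses the documents
# base_retriever: the original retriever (`retriever` must already be defined)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=retriever
)

# Query through the compression retriever
# It first fetches candidates from base_retriever, then FlashrankRerank reorders and filters them
compressed_docs = compression_retriever.invoke(
    "Tell me about Word2Vec."
)

# Print the IDs of the documents that were kept (assumes metadata has an "id" field)
print([doc.metadata["id"] for doc in compressed_docs])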

INFO:flashrank.Ranker:Downloading ms-marco-MultiBERT-L-12...
ms-marco-MultiBERT-L-12.zip: 100%|██████████| 98.7M/98.7M [00:01<00:00, 54.1MiB/s]
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


[12, 0, 18]
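
The cell above follows a two-stage pattern: the base retriever returns a broad candidate list (`k=10` here), and the reranker rescores each (query, document) pair and keeps only the best few. The pure-Python sketch below mimics that flow with a toy word-overlap scorer standing in for the cross-encoder. The names `toy_score`, `retrieve`, and `rerank`, the sample corpus, and the `top_n=3` value (chosen to match the three IDs printed above) are assumptions for illustration; none of them are FlashRank or LangChain APIs.

```python
def toy_score(query, text):
    """Toy relevance score: fraction of query words that appear in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


def retrieve(corpus, k):
    """Stage 1 stand-in: return the first k documents as candidates, with no ranking."""
    return corpus[:k]


def rerank(query, candidates, top_n):
    """Stage 2: score every (query, document) pair, sort by score, keep the top_n."""
    scored = [
        {**doc, "relevance_score": toy_score(query, doc["text"])}
        for doc in candidates
    ]
    scored.sort(key=lambda d: d["relevance_score"], reverse=True)
    return scored[:top_n]


# Hypothetical mini-corpus; each item carries an id like the notebook's metadata["id"].
corpus = [
    {"id": 0, "text": "semantic search understands the meaning of a query"},
    {"id": 1, "text": "embedding converts text into vectors"},
    {"id": 2, "text": "word2vec maps words into a vector space"},
    {"id": 3, "text": "tokenizer splits text into tokens"},
]

candidates = retrieve(corpus, k=4)
top_docs = rerank("tell me about word2vec", candidates, top_n=3)
print([doc["id"] for doc in top_docs])  # [2, 0, 1]
```

In the real pipeline the scoring function is the cross-encoder model, which reads the query and the document together, and the score is attached to each returned document's metadata as `relevance_score`.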


Compare the results after the reranker is applied.

In [6]:
# Print the results of document compressions
pretty_print_docs(compressed_docs)

Document 1:

Word2Vec
Definition: Word2Vec is a technique in NLP that maps words to a vector space, representing their semantic relationships based on context.
Example: In a Word2Vec model, "king" and "queen" are represented by vectors located close to each other.
Related Keywords: Natural Language Processing (NLP), Embedding, Semantic Similarity
Metadata: {'id': 12, 'relevance_score': 0.9994176, 'source': './data/appendix-keywords.txt'}
----------------------------------------------------------------------------------------------------
Document 2:

Semantic Search
Definition: Semantic search is a search technique that understands the meaning of a user's query beyond simple keyword matching, returning results that are contextually relevant.
Example: If a user searches for "planets in the solar system," the system provides information about planets like Jupiter and Mars.
Related Keywords: Natural Language Processing (NLP), Search Algorithms, Data Mining
Metadata: {'id': 0, 'relevance_sc