# Adaptive Retrieval for Enhanced RAG Systems

In this notebook, I implement an Adaptive Retrieval system that dynamically selects the most appropriate retrieval strategy based on the type of query. This approach significantly enhances our RAG system's ability to provide accurate and relevant responses across a diverse range of questions.

Different questions demand different retrieval strategies. Our system:

1. Classifies the query type (Factual, Analytical, Opinion, or Contextual)
2. Selects the appropriate retrieval strategy
3. Executes specialized retrieval techniques
4. Generates a tailored response


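### Key Concepts
- **Adaptive Retrieval**: adjusts the retrieval strategy dynamically based on the characteristics of the query, instead of applying one fixed method.
- **Query Classification**: assigns each user question to a category:
  - **Factual**: seeks a specific, verifiable answer (e.g., "Who invented the steam engine?").
  - **Analytical**: requires causal analysis or logical reasoning (e.g., "Why does climate change affect agriculture?").
  - **Opinion**: involves subjective evaluation or advice (e.g., "What do you think of this film?").
  - **Contextual**: depends on conversation history or a specific situation (e.g., "Based on our earlier discussion, what should we do next?").


### System Advantages
- **Higher precision**: each question type gets a retrieval strategy suited to it, avoiding the errors of a one-size-fits-all approach.
- **Broader coverage**: handles everything from simple factual lookups to complex analytical questions.
- **Better user experience**: answers fit the question more closely, with less redundant or missing information.

With this adaptive mechanism, the RAG system can match each question's underlying intent to the most suitable retrieval method, which improves overall answer quality.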

## Setting Up the Environment
We begin by installing PyMuPDF and importing the necessary libraries.

In [None]:
%pip install PyMuPDF


In [None]:
import os
import numpy as np
import json
import fitz
from openai import OpenAI
import re

## Extracting Text from a PDF File
To implement RAG, we first need a source of textual data. In this case, we extract text from a PDF file using the PyMuPDF library.

In [None]:
def extract_text_from_pdf(pdf_path):
    """
    Extracts the text of every page in a PDF file and returns it as one string.

    Args:
    pdf_path (str): Path to the PDF file.

    Returns:
    str: Extracted text from the PDF.
    """
    # Open the PDF file; the context manager closes it when we are done
    with fitz.open(pdf_path) as mypdf:
        # Extract the text layer of each page and join the pages into one string
        all_text = "".join(page.get_text("text") for page in mypdf)

    return all_text  # Return the extracted text

## Chunking the Extracted Text
Once we have the extracted text, we divide it into smaller, overlapping chunks to improve retrieval accuracy.

In [None]:
def chunk_text(text, n, overlap):
    """
    Chunks the given text into segments of n characters with overlap.

    Args:
    text (str): The text to be chunked.
    n (int): The number of characters in each chunk.
    overlap (int): The number of overlapping characters between chunks.

    Returns:
    List[str]: A list of text chunks.
    """
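    # Guard against a zero or negative step, which would break the loop below
    if n <= 0 or not 0 <= overlap < n:
        raise ValueError("n must be positive and overlap must satisfy 0 <= overlap < n")

    chunks = []  # Initialize an empty list to store the chunks

    # Loop through the text with a step size of (n - overlap)
    for i in range(0, len(text), n - overlap):
        # Append a chunk of text from index i to i + n to the chunks list
        chunks.append(text[i:i + n])

    return chunks  # Return the list of text chunks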
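
As a quick sanity check of the sliding-window behaviour, the sketch below re-declares a compact `chunk_text` (so the example is self-contained) and runs it on a toy string. The sample string and the sizes `n=4`, `overlap=2` are illustrative assumptions, not values used elsewhere in this notebook.

```python
def chunk_text(text, n, overlap):
    """Compact re-statement of the chunker above, for illustration only."""
    step = n - overlap  # each chunk starts `step` characters after the previous one
    return [text[i:i + n] for i in range(0, len(text), step)]

sample = "abcdefghij"  # hypothetical 10-character input
chunks = chunk_text(sample, n=4, overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

Each chunk shares its last two characters with the start of the next one. Note that the final chunk can be shorter than `n` and may be fully contained in the previous chunk.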

## Setting Up the OpenAI API Client
We initialize the OpenAI client to generate embeddings and responses.

In [None]:
client = OpenAI(
    base_url="xxxxx1/",
    api_key=os.getenv("OPENAI_API_KEY")  # Retrieve the API key from environment variables
)

## Simple Vector Store Implementation
We'll create a basic vector store to manage document chunks and their embeddings.

In [None]:
class SimpleVectorStore:
    """
    A simple vector store implementation using NumPy.
    """
    def __init__(self):
        """
        Initialize the vector store.
        """
        self.vectors = []  # List to store embedding vectors
        self.texts = []  # List to store original texts
        self.metadata = []  # List to store metadata for each text

    def add_item(self, text, embedding, metadata=None):
        """
        Add an item to the vector store.

        Args:
        text (str): The original text.
        embedding (List[float]): The embedding vector.
        metadata (dict, optional): Additional metadata.
        """
        self.vectors.append(np.array(embedding))  # Convert embedding to numpy array and add to vectors list
        self.texts.append(text)  # Add the original text to texts list
        self.metadata.append(metadata or {})  # Add metadata to metadata list, default to empty dict if None

    def similarity_search(self, query_embedding, k=5, filter_func=None):
        """
        Find the most similar items to a query embedding.

        Args:
        query_embedding (List[float]): Query embedding vector.
        k (int): Number of results to return.
        filter_func (callable, optional): Function to filter results.

        Returns:
        List[Dict]: Top k most similar items with their texts and metadata.
        """
        if not self.vectors:
            return []  # Return empty list if no vectors are stored

        # Convert query embedding to numpy array
        query_vector = np.array(query_embedding)

        # Calculate similarities using cosine similarity
        similarities = []
        for i, vector in enumerate(self.vectors):
            # Apply filter if provided
            if filter_func and not filter_func(self.metadata[i]):
                continue

            # Calculate cosine similarity
            similarity = np.dot(query_vector, vector) / (np.linalg.norm(query_vector) * np.linalg.norm(vector))
            similarities.append((i, similarity))  # Append index and similarity score

        # Sort by similarity (descending)
        similarities.sort(key=lambda x: x[1], reverse=True)

        # Return top k results
        results = []
        for i in range(min(k, len(similarities))):
            idx, score = similarities[i]
            results.append({
                "text": self.texts[idx],  # Add the text
                "metadata": self.metadata[idx],  # Add the metadata
                "similarity": score  # Add the similarity score
            })

        return results  # Return the list of top k results

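The code above implements a small but complete vector store on top of NumPy arrays. A vector store holds the vector representations (embeddings) of text, images, or other data and retrieves related items by vector similarity, which makes it a core component of retrieval-based AI systems.

The following sections walk through how it works, part by part.


### Class Structure and Initialization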
```python
class SimpleVectorStore:
    """
    A simple vector store implementation using NumPy.
    """
    def __init__(self):
        """
        Initialize the vector store.
        """
        self.vectors = []  # List to store embedding vectors
        self.texts = []  # List to store original texts
        self.metadata = []  # List to store metadata for each text
```

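The class has three core attributes:
1. `vectors`: the embedding of each stored text
2. `texts`: the corresponding original texts
3. `metadata`: extra information about each text (such as its source or creation time)

The three lists are linked by index position: `vectors[0]`, `texts[0]`, and `metadata[0]` all belong to the same record.


### Adding Items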
```python
def add_item(self, text, embedding, metadata=None):
    """
    Add an item to the vector store.

    Args:
    text (str): The original text.
    embedding (List[float]): The embedding vector.
    metadata (dict, optional): Additional metadata.
    """
    self.vectors.append(np.array(embedding))  # Convert embedding to numpy array and add to vectors list
    self.texts.append(text)  # Add the original text to texts list
    self.metadata.append(metadata or {})  # Add metadata to metadata list, default to empty dict if None
```

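This method adds a new record to the vector store:
1. Converts the incoming embedding to a NumPy array, so similarity can be computed with NumPy operations
2. Stores the original text
3. Stores the metadata (an empty dict if none is provided)

The three lists grow in lockstep, so their index positions stay aligned.


### Similarity Search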
```python
def similarity_search(self, query_embedding, k=5, filter_func=None):
    """
    Find the most similar items to a query embedding.

    Args:
    query_embedding (List[float]): Query embedding vector.
    k (int): Number of results to return.
    filter_func (callable, optional): Function to filter results.

    Returns:
    List[Dict]: Top k most similar items with their texts and metadata.
    """
    if not self.vectors:
        return []  # Return empty list if no vectors are stored
    
    # Convert query embedding to numpy array
    query_vector = np.array(query_embedding)
    
    # Calculate similarities using cosine similarity
    similarities = []
    for i, vector in enumerate(self.vectors):
        # Apply filter if provided
        if filter_func and not filter_func(self.metadata[i]):
            continue
            
        # Calculate cosine similarity
        similarity = np.dot(query_vector, vector) / (np.linalg.norm(query_vector) * np.linalg.norm(vector))
        similarities.append((i, similarity))  # Append index and similarity score
    
    # Sort by similarity (descending)
    similarities.sort(key=lambda x: x[1], reverse=True)
    
    # Return top k results
    results = []
    for i in range(min(k, len(similarities))):
        idx, score = similarities[i]
        results.append({
            "text": self.texts[idx],  # Add the text
            "metadata": self.metadata[idx],  # Add the metadata
            "similarity": score  # Add the similarity score
        })
    
    return results  # Return the list of top k results
```

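This is the core method of the vector store; it retrieves items by vector similarity.

#### Input parameters:
- `query_embedding`: the query vector (the embedding of the user's query)
- `k`: the number of results to return (default 5)
- `filter_func`: an optional filter function applied to each record's metadata

#### Execution flow:
1. **Input check**: if the store is empty, return an empty list immediately
2. **Vector conversion**: convert the query embedding to a NumPy array
3. **Similarity computation**:
   - Iterate over all stored vectors
   - Skip any record whose metadata fails the filter (if one is given)
   - Compute the cosine similarity between the query vector and the stored vector
   - Cosine similarity formula: `dot(a, b) / (norm(a) * norm(b))`
   - Record the result (index and similarity score)

4. **Sorting**: sort by similarity score in descending order
5. **Result collection**: take the top k results, each with its original text, metadata, and similarity score

#### Cosine similarity:
Cosine similarity is one of the most widely used similarity measures in vector retrieval. Its value lies in [-1, 1]:
- 1 means the two vectors point in exactly the same direction (most similar)
- 0 means the two vectors are orthogonal (unrelated)
- -1 means the two vectors point in exactly opposite directions

In natural language processing, a higher cosine similarity between two embeddings generally indicates that the two texts are closer in meaning.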
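
To make the three reference values concrete, here is a minimal pure-Python sketch of the formula `dot(a, b) / (norm(a) * norm(b))`. The helper name `cosine_similarity` and the toy vectors are assumptions chosen for illustration; the notebook itself computes the same quantity with NumPy inside `similarity_search`.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

same_direction = cosine_similarity([1, 2], [2, 4])   # parallel vectors, close to 1
orthogonal = cosine_similarity([1, 0], [0, 1])       # perpendicular vectors, 0
opposite = cosine_similarity([1, 2], [-1, -2])       # anti-parallel vectors, close to -1
print(same_direction, orthogonal, opposite)
```

Because the formula divides by both norms, a zero vector would cause a division by zero; real embeddings are not expected to be all zeros, but a guard may be worth adding in production code.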


### Filtering
```python
if filter_func and not filter_func(self.metadata[i]):
    continue
```

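This condition lets the caller filter results with a custom function. For example:
- Retrieve only documents from a particular source: `lambda meta: meta['source'] == 'book1'`
- Retrieve only documents added within the last month: `lambda meta: meta['timestamp'] > one_month_ago`

This adds flexibility to vector retrieval: semantic similarity can be combined with other conditions for more targeted results.


### Why Does This Implementation Matter?
The vector store is a core component of a RAG (retrieval-augmented generation) system. It addresses two key needs:
1. **Semantic retrieval**: matching on the meaning of the user's query rather than on keywords alone
2. **Robust matching**: finding relevant content by vector similarity even when it is worded differently from the query

This simple implementation uses a brute-force linear scan, which is adequate for small datasets, but it demonstrates the core principle of a vector store. In production, large-scale vector databases typically use more efficient index structures (such as KD-trees or HNSW graphs) to speed up retrieval.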
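
Before reaching for a dedicated index, the linear scan itself can be vectorized. The sketch below is one possible way to do that with NumPy, assuming all embeddings fit in a single 2-D array; `top_k_cosine` is a hypothetical helper, not part of `SimpleVectorStore`.

```python
import numpy as np

def top_k_cosine(query, matrix, k=5):
    """Return (index, score) pairs for the k rows of `matrix` most similar to `query`."""
    query = np.asarray(query, dtype=float)
    matrix = np.asarray(matrix, dtype=float)
    # Normalise once so a single matrix-vector product yields every cosine score
    query_unit = query / np.linalg.norm(query)
    matrix_unit = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    scores = matrix_unit @ query_unit
    order = np.argsort(-scores)[:k]  # indices sorted by descending similarity
    return [(int(i), float(scores[i])) for i in order]

# Toy 2-D "embeddings" for illustration only
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hits = top_k_cosine([1.0, 0.1], vectors, k=2)
print(hits)
```

This still compares the query against every stored vector, so it scales linearly with the number of chunks; it only removes the Python-level loop.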

## Creating Embeddings

In [None]:
def create_embeddings(text, model="text-embedding-ada-002"):
    """
    Creates embeddings for the given text.

    Args:
    text (str or List[str]): The input text(s) for which embeddings are to be created.
    model (str): The model to be used for creating embeddings.

    Returns:
    List[float] or List[List[float]]: The embedding vector(s).
    """
    # Handle both string and list inputs by converting string input to a list
    input_text = text if isinstance(text, list) else [text]

    # Create embeddings for the input text using the specified model
    response = client.embeddings.create(
        model=model,
        input=input_text
    )

    # If the input was a single string, return just the first embedding
    if isinstance(text, str):
        return response.data[0].embedding

    # Otherwise, return all embeddings for the list of texts
    return [item.embedding for item in response.data]
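
Embedding endpoints usually cap how many inputs one request may carry, so a long document may need to be embedded in batches. The sketch below shows the batching pattern only: `batched`, `embed_in_batches`, the batch size of 100, and the stand-in `fake_embed` are all assumptions for illustration. In the notebook, `create_embeddings` would be passed as `embed_fn`.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements each."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def embed_in_batches(texts, embed_fn, batch_size=100):
    """Call `embed_fn` once per batch and concatenate the results in input order."""
    embeddings = []
    for batch in batched(texts, batch_size):
        embeddings.extend(embed_fn(batch))
    return embeddings

def fake_embed(batch):
    """Stand-in embedder: maps each text to a 1-D vector holding its length."""
    return [[float(len(text))] for text in batch]

result = embed_in_batches(["a", "bb", "ccc"], fake_embed, batch_size=2)
print(result)  # [[1.0], [2.0], [3.0]]
```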

In [None]:
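# `response` only exists inside create_embeddings, so inspect the returned value instead
sample_embedding = create_embeddings("Hello, world!")
print(len(sample_embedding))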

## Document Processing Pipeline

In [None]:
def process_document(pdf_path, chunk_size=1000, chunk_overlap=200):
    """
    Process a document for use with adaptive retrieval.

    Args:
    pdf_path (str): Path to the PDF file.
    chunk_size (int): Size of each chunk in characters.
    chunk_overlap (int): Overlap between chunks in characters.

    Returns:
    Tuple[List[str], SimpleVectorStore]: Document chunks and vector store.
    """
    # Extract text from the PDF file
    print("Extracting text from PDF...")
    extracted_text = extract_text_from_pdf(pdf_path)

    # Chunk the extracted text
    print("Chunking text...")
    chunks = chunk_text(extracted_text, chunk_size, chunk_overlap)
    print(f"Created {len(chunks)} text chunks")

    # Create embeddings for the text chunks
    print("Creating embeddings for chunks...")
    chunk_embeddings = create_embeddings(chunks)

    # Initialize the vector store
    store = SimpleVectorStore()

    # Add each chunk and its embedding to the vector store with metadata
    for i, (chunk, embedding) in enumerate(zip(chunks, chunk_embeddings)):
        store.add_item(
            text=chunk,
            embedding=embedding,
            metadata={"index": i, "source": pdf_path}
        )

    print(f"Added {len(chunks)} chunks to the vector store")

    # Return the chunks and the vector store
    return chunks, store

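The function above implements the full document-processing pipeline, turning a PDF into a form the adaptive retrieval system can use. It performs three core tasks, **text extraction**, **chunking**, and **embedding**, and ends with a vector store ready for semantic retrieval.

The following sections walk through the function step by step.


### Function Overview and Parameters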
```python
def process_document(pdf_path, chunk_size=1000, chunk_overlap=200):
    """
    Process a document for use with adaptive retrieval.

    Args:
    pdf_path (str): Path to the PDF file.
    chunk_size (int): Size of each chunk in characters.
    chunk_overlap (int): Overlap between chunks in characters.

    Returns:
    Tuple[List[str], SimpleVectorStore]: Document chunks and vector store.
    """
```

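The function takes three parameters:
- `pdf_path`: path to the PDF file
- `chunk_size`: size of each text chunk (default 1000 characters)
- `chunk_overlap`: overlap between adjacent chunks (default 200 characters)

It returns a tuple containing:
- The list of all text chunks
- A vector store instance already populated with those chunks


### Text Extraction and Chunking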
```python
# Extract text from the PDF file
print("Extracting text from PDF...")
extracted_text = extract_text_from_pdf(pdf_path)

# Chunk the extracted text
print("Chunking text...")
chunks = chunk_text(extracted_text, chunk_size, chunk_overlap)
print(f"Created {len(chunks)} text chunks")
```

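This part performs two key steps:
1. **Text extraction**: calls `extract_text_from_pdf` to pull the raw text out of the PDF. That function reads each page's text layer with PyMuPDF (`page.get_text("text")`); it does not perform OCR, so scanned pages without a text layer yield no text.
2. **Chunking**: calls `chunk_text` to split the full text into smaller pieces. It uses a **sliding window**: adjacent chunks overlap, which reduces the chance that a passage is cut in two at a chunk boundary and lost to retrieval.


### Embedding Generation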
```python
# Create embeddings for the text chunks
print("Creating embeddings for chunks...")
chunk_embeddings = create_embeddings(chunks)
```

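This line calls `create_embeddings` to convert every chunk into a vector representation (embedding). An embedding maps text to a vector in a high-dimensional space such that semantically similar texts lie closer together.

Here `create_embeddings` calls an embedding model (by default OpenAI's `text-embedding-ada-002`). The function accepts a list, so embeddings for many texts are requested in a single call.


### Building the Vector Store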
```python
# Initialize the vector store
store = SimpleVectorStore()

# Add each chunk and its embedding to the vector store with metadata
for i, (chunk, embedding) in enumerate(zip(chunks, chunk_embeddings)):
    store.add_item(
        text=chunk,
        embedding=embedding,
        metadata={"index": i, "source": pdf_path}
    )

print(f"Added {len(chunks)} chunks to the vector store")
```

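This part stores the chunks and their embeddings in the vector store:
1. **Initialize the store**: create a `SimpleVectorStore` instance
2. **Add the data**: iterate over the chunks and their embeddings, calling `add_item` for each pair
3. **Attach metadata**: record each chunk's index and source file path

The metadata matters for later retrieval and for interpreting results: when the system returns a chunk, we know which file it came from and where in that file it sits.


### Return Value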
```python
# Return the chunks and the vector store
return chunks, store
```

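The function returns two objects:
1. **Chunk list**: all the text fragments produced by chunking
2. **Vector store**: the embeddings of those chunks, ready for semantic retrieval

Both feed the adaptive retrieval system that follows: the chunks supply the original content, and the vector store supports similarity-based retrieval over it.


### Why Is This Design Important?
This pipeline is the foundation of the whole adaptive retrieval system. It addresses several key problems:

1. **Handling long documents**: most LLMs have an input length limit, so an entire document often cannot be passed in directly. Chunking turns a long document into pieces the model can handle.

2. **Preserving context**: overlap between chunks keeps adjacent chunks semantically continuous and reduces information loss at boundaries.

3. **Semantic retrieval**: once text is represented as vectors, measures such as cosine similarity can retrieve by meaning, which catches matches that keyword search misses.

4. **Traceability**: the metadata records each chunk's source and position, so retrieved results can be traced back to the original document, which makes answers easier to verify.

Simple as it is, this implementation covers the core elements of document processing for a RAG system.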

## Query Classification

In [None]:
def classify_query(query, model="o1"):
    """
    Classify a query into one of four categories: Factual, Analytical, Opinion, or Contextual.

    Args:
        query (str): User query
        model (str): LLM model to use

    Returns:
        str: Query category
    """
    # Define the system prompt to guide the AI's classification
    system_prompt = """You are an expert at classifying questions.
        Classify the given query into exactly one of these categories:
        - Factual: Queries seeking specific, verifiable information.
        - Analytical: Queries requiring comprehensive analysis or explanation.
        - Opinion: Queries about subjective matters or seeking diverse viewpoints.
        - Contextual: Queries that depend on user-specific context.

        Return ONLY the category name, without any explanation or additional text.
    """

    # Create the user prompt with the query to be classified
    user_prompt = f"Classify this query: {query}"

    # Generate the classification response from the AI model
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ],
        temperature=0
    )

    # Extract and strip the category from the response
    category = response.choices[0].message.content.strip()

    # Define the list of valid categories
    valid_categories = ["Factual", "Analytical", "Opinion", "Contextual"]

    # Ensure the returned category is valid
    for valid in valid_categories:
        if valid in category:
            return valid

    # Default to "Factual" if classification fails
    return "Factual"
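
The validation step at the end of `classify_query` matches category names case-sensitively, so a reply such as "analytical." would fall through to the default. A slightly more forgiving variant might look like the sketch below; `normalize_category` is a hypothetical helper, and like the original loop it returns the first category (in list order) that appears in the reply.

```python
VALID_CATEGORIES = ("Factual", "Analytical", "Opinion", "Contextual")

def normalize_category(raw_reply, default="Factual"):
    """Map a free-text LLM reply onto one of the four category names."""
    reply = raw_reply.lower()
    for category in VALID_CATEGORIES:
        # Case-insensitive substring match tolerates punctuation and extra words
        if category.lower() in reply:
            return category
    return default  # fall back when no category name is present

print(normalize_category("analytical."))        # Analytical
print(normalize_category("Category: OPINION"))  # Opinion
print(normalize_category("not sure"))           # Factual
```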

## Implementing Specialized Retrieval Strategies
### 1. Factual Strategy - Focus on Precision

In [None]:
def factual_retrieval_strategy(query, vector_store, k=4):
    """
    Retrieval strategy for factual queries focusing on precision.

    Args:
        query (str): User query
        vector_store (SimpleVectorStore): Vector store
        k (int): Number of documents to return

    Returns:
        List[Dict]: Retrieved documents
    """
    print(f"Executing Factual retrieval strategy for: '{query}'")

    # Use LLM to enhance the query for better precision
    system_prompt = """You are an expert at enhancing search queries.
        Your task is to reformulate the given factual query to make it more precise and
        specific for information retrieval. Focus on key entities and their relationships.

        Provide ONLY the enhanced query without any explanation.
    """

    user_prompt = f"Enhance this factual query: {query}"

    # Generate the enhanced query using the LLM
    response = client.chat.completions.create(
        model="o1",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ],
        temperature=0
    )

    # Extract and print the enhanced query
    enhanced_query = response.choices[0].message.content.strip()
    print(f"Enhanced query: {enhanced_query}")

    # Create embeddings for the enhanced query
    query_embedding = create_embeddings(enhanced_query)

    # Perform initial similarity search to retrieve documents
    initial_results = vector_store.similarity_search(query_embedding, k=k*2)

    # Initialize a list to store ranked results
    ranked_results = []

    # Score and rank documents by relevance using LLM
    for doc in initial_results:
        relevance_score = score_document_relevance(enhanced_query, doc["text"])
        ranked_results.append({
            "text": doc["text"],
            "metadata": doc["metadata"],
            "similarity": doc["similarity"],
            "relevance_score": relevance_score
        })

    # Sort the results by relevance score in descending order
    ranked_results.sort(key=lambda x: x["relevance_score"], reverse=True)

    # Return the top k results
    return ranked_results[:k]

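### Factual Retrieval Strategy Explained: A "Smart Magnifying Glass" for Pinpointing Information

This function implements the retrieval strategy for factual questions. Its goal is to home in on a specific answer, like a magnifying glass. Factual questions seek specific, verifiable information (e.g., "Who invented the light bulb?"), so the strategy is designed around retrieval precision. The sections below walk through how it works.


### Overall Flow
```python
def factual_retrieval_strategy(query, vector_store, k=4):
    """Retrieval strategy for factual queries (focused on precision)."""
    print(f"Executing Factual retrieval strategy for: '{query}'")
    # ... (intermediate code)
    return ranked_results[:k]
```

The strategy follows a three-stage logic, **enhance the query → initial retrieval → second-pass filtering**:
1. Use the LLM to enhance the query so it is more precise
2. Retrieve a larger set of candidates (`k*2`)
3. Use the LLM to score each candidate's relevance and keep the k most relevant


### Step 1: Query Enhancement (Making the Question "Sharper")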
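```python
# Use LLM to enhance the query for better precision
system_prompt = """You are an expert at enhancing search queries.
    Your task is to reformulate the given factual query to make it more precise and
    specific for information retrieval. Focus on key entities and their relationships.

    Provide ONLY the enhanced query without any explanation.
"""

user_prompt = f"Enhance this factual query: {query}"

# Generate the enhanced query using the LLM
response = client.chat.completions.create(
    model="o1",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt}
    ],
    temperature=0  # Make the output as deterministic as possible
)

enhanced_query = response.choices[0].message.content.strip()
print(f"Enhanced query: {enhanced_query}")
```

**Key technique**: the LLM turns a vague question into a precise query. For example:
- Input query: "the originator of the theory of relativity"
- Enhanced query: "In which year did Albert Einstein propose the theory of relativity?"

**Why it works**:
1. It names the key entity (Einstein) and the relationship (proposed the theory of relativity)
2. It adds a concrete dimension (time), which makes the query more specific
3. `temperature=0` makes the LLM's output as deterministic as possible and keeps it from drifting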


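### Step 2: Initial Retrieval (Casting a Wide Net)
```python
# Create embeddings for the enhanced query
query_embedding = create_embeddings(enhanced_query)

# Perform the initial similarity search (fetch k*2 results)
initial_results = vector_store.similarity_search(query_embedding, k=k*2)
```

**Key points**:
1. The embedding is built from the enhanced query, so the search targets the sharpened wording
2. The search fetches `k*2` results (8 when k=4), leaving room for the filtering step
3. The vector store's `similarity_search` method retrieves the chunks closest to the query by cosine similarity


### Step 3: Second-Pass Filtering (The LLM as "Referee")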
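```python
# Initialize a list to store ranked results
ranked_results = []

# Score each document's relevance using the LLM
for doc in initial_results:
    relevance_score = score_document_relevance(enhanced_query, doc["text"])
    ranked_results.append({
        "text": doc["text"],
        "metadata": doc["metadata"],
        "similarity": doc["similarity"],
        "relevance_score": relevance_score
    })

# Sort the results by relevance score in descending order
ranked_results.sort(key=lambda x: x["relevance_score"], reverse=True)

# Return the top k results
return ranked_results[:k]
```

**The key helper `score_document_relevance`, in outline**:
```python
def score_document_relevance(query, document):
    """Ask the LLM to judge how relevant the document is to the query (0-10)."""
    system_prompt = """You are an expert at evaluating relevance. Rate the document from 0 to 10:
    0 = completely irrelevant, 10 = a perfect match for the query"""
    # ... (build the prompt and call the LLM)
    # Extract the score and return it
```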
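
The score-extraction step that the outline above leaves as a comment could look roughly like the sketch below. `parse_relevance_score` and its fallback value of 5.0 are hypothetical choices for illustration, not taken from the notebook's implementation; the idea is simply to tolerate replies that wrap the number in extra text.

```python
import re

def parse_relevance_score(reply, default=5.0):
    """Pull a 0-10 score out of a free-text LLM reply, falling back to `default`."""
    match = re.search(r"\d+(?:\.\d+)?", reply)  # first integer or decimal in the reply
    if match is None:
        return default
    # Clamp to the 0-10 range in case the model answers out of scale
    return max(0.0, min(10.0, float(match.group())))

print(parse_relevance_score("8"))               # 8.0
print(parse_relevance_score("Score: 7.5/10"))   # 7.5
print(parse_relevance_score("no idea"))         # 5.0
```

Falling back to a mid-range score keeps an unparseable reply from pushing a document to either end of the ranking.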

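**What the second pass adds**:
1. Vector similarity measures semantic closeness, which is not the same as answering the question
2. The LLM can pick up on finer distinctions in natural language. For example:
   - Query: "New features in Python 3.8"
   - Document 1: "Python 3.8 adds the walrus operator" (relevance 10)
   - Document 2: "Performance optimizations in Python 3.9" (vector similarity may be high, but the LLM would score it low)
3. Vector similarity selects the candidates and `relevance_score` alone determines the final order, so the results are both semantically related and a close match for the question


### Core Strength: Precision Beyond Plain Vector Retrieval
1. **Query enhancement layer**: turns a natural-language question into a retrieval-friendly expression, much as a person would translate a question into a database query
2. **Two-stage filtering**:
   - Vector similarity: quickly narrows the store down to semantically related candidates
   - LLM relevance scoring: picks the best-matching answers among those candidates
3. **Interpretability**: every result carries a `relevance_score`, which helps explain why the document was selected


### Example Walkthrough
**User query**: "What is the longest river in the world?"

#### Execution flow:
1. **Query enhancement**:
   - The LLM rewrites the query as: "Which river is the longest in the world, and what is its exact length?"
2. **Initial retrieval**:
   - Retrieves 8 chunks related to "river length", including:
     - "The Nile is about 6,650 km long"
     - "The dispute over the Amazon's length"
     - "The Yangtze is the longest river in Asia"
3. **Second-pass filtering**:
   - The LLM scores each document:
     - Nile document: 10 (answers the question directly)
     - Amazon document: 7 (about length, but disputed)
     - Yangtze document: 5 (covers Asia only)
4. **Returned results**:
   - Of the top 4 results, the Nile document ranks first, so the most direct answer is surfaced first


### Comparison with Plain Retrieval
| Dimension | Plain retrieval (vector similarity only) | Factual retrieval strategy |
|-----------------|---------------------------------|-----------------------------|
| **Query handling** | Uses the original query as-is | Enhances the query first to pin down key entities and relationships |
| **Result count** | Returns k results directly | Retrieves k*2 candidates, then keeps k |
| **Relevance judgment** | Vector distance only | Vector distance plus LLM relevance scoring |
| **Precision** | May return related content that does not answer the question directly | Targets content that answers the question directly |

By adding LLM-based query enhancement and reranking, this strategy improves retrieval precision for factual questions, and it is best suited to cases that call for a specific answer.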

### 2. Analytical Strategy - Comprehensive Coverage

In [None]:
def analytical_retrieval_strategy(query, vector_store, k=4):
    """
    Retrieval strategy for analytical queries focusing on comprehensive coverage.

    Args:
        query (str): User query
        vector_store (SimpleVectorStore): Vector store
        k (int): Number of documents to return

    Returns:
        List[Dict]: Retrieved documents
    """
    print(f"Executing Analytical retrieval strategy for: '{query}'")

    # Define the system prompt to guide the AI in generating sub-questions
    system_prompt = """You are an expert at breaking down complex questions.
    Generate sub-questions that explore different aspects of the main analytical query.
    These sub-questions should cover the breadth of the topic and help retrieve
    comprehensive information.

    Return a list of exactly 3 sub-questions, one per line.
    """

    # Create the user prompt with the main query
    user_prompt = f"Generate sub-questions for this analytical query: {query}"

    # Generate the sub-questions using the LLM
    response = client.chat.completions.create(
        model="o1",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ],
        temperature=0.3
    )

    # Extract and clean the sub-questions
    sub_queries = response.choices[0].message.content.strip().split('\n')
    sub_queries = [q.strip() for q in sub_queries if q.strip()]
    print(f"Generated sub-queries: {sub_queries}")

    # Retrieve documents for each sub-query
    all_results = []
    for sub_query in sub_queries:
        # Create embeddings for the sub-query
        sub_query_embedding = create_embeddings(sub_query)
        # Perform similarity search for the sub-query
        results = vector_store.similarity_search(sub_query_embedding, k=2)
        all_results.extend(results)

    # Ensure diversity by selecting from different sub-query results
    # Remove duplicates (same text content)
    unique_texts = set()
    diverse_results = []

    for result in all_results:
        if result["text"] not in unique_texts:
            unique_texts.add(result["text"])
            diverse_results.append(result)

    # If we need more results to reach k, add more from initial results
    if len(diverse_results) < k:
        # Direct retrieval for the main query
        main_query_embedding = create_embeddings(query)
        main_results = vector_store.similarity_search(main_query_embedding, k=k)

        for result in main_results:
            if result["text"] not in unique_texts and len(diverse_results) < k:
                unique_texts.add(result["text"])
                diverse_results.append(result)

    # Return the top k diverse results
    return diverse_results[:k]

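### Analytical Retrieval Strategy Explained: Breaking Down Complex Questions from Multiple Angles

This function implements the retrieval strategy for analytical questions. The core idea is to break a complex question into several dimensions and retrieve information for each, so that the analysis rests on broad evidence. Analytical questions usually require weighing several factors (e.g., "the impact of artificial intelligence on the job market"), so the strategy is designed around comprehensive coverage and multi-dimensional analysis.


### Overall Flow
```python
def analytical_retrieval_strategy(query, vector_store, k=4):
    """Retrieval strategy for analytical queries (focused on comprehensive coverage)."""
    print(f"Executing Analytical retrieval strategy for: '{query}'")
    # ... (intermediate code)
    return diverse_results[:k]
```

The strategy follows a three-stage logic, **decompose the question → retrieve per dimension → merge and deduplicate**:
1. Use the LLM to break the complex question into 3 sub-questions covering different dimensions
2. Retrieve relevant documents for each sub-question separately
3. Merge the results and remove duplicates to limit repetition and one-sidedness


### Step 1: Question Decomposition (Slicing the Complex Question)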
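```python
# Define the system prompt to guide the AI in generating sub-questions
system_prompt = """You are an expert at breaking down complex questions.
Generate sub-questions that explore different aspects of the main analytical query.
These sub-questions should cover the breadth of the topic and help retrieve
comprehensive information.

Return a list of exactly 3 sub-questions, one per line.
"""

user_prompt = f"Generate sub-questions for this analytical query: {query}"

# Generate the sub-questions using the LLM
response = client.chat.completions.create(
    model="o1",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt}
    ],
    temperature=0.3  # A little randomness, for more varied sub-questions
)

# Extract and clean the sub-questions
sub_queries = response.choices[0].message.content.strip().split('\n')
sub_queries = [q.strip() for q in sub_queries if q.strip()]
print(f"Generated sub-queries: {sub_queries}")
```

**Key technique**: the prompt steers the LLM to break the question down along different dimensions. For example:
- Original query: "The impact of artificial intelligence on the job market"
- Generated sub-questions:
  1. "Which industries' jobs are most likely to be automated away by AI?" (impact dimension)
  2. "What new job opportunities has AI created?" (opportunity dimension)
  3. "What measures should governments and companies take in response to changes in the employment structure?" (response dimension)

**Why `temperature=0.3`**:
- A low temperature keeps the generated sub-questions close to the topic
- A small amount of randomness avoids producing identical sub-questions every time and adds variety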
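
Even when asked for "one per line", a model often prefixes its lines with numbers or bullets, and those markers would then be embedded along with the sub-question. One way to strip them is sketched below; `parse_lines` is a hypothetical helper, and the set of markers it recognises is an assumption.

```python
import re

def parse_lines(reply):
    """Split an LLM reply into non-empty lines and strip leading list markers."""
    cleaned = []
    for line in reply.splitlines():
        # Remove a leading bullet ("-", "*", "•") or numbering such as "1." or "2)"
        line = re.sub(r"^\s*(?:[-*•]|\d+[.)])\s*", "", line).strip()
        if line:
            cleaned.append(line)
    return cleaned

reply = "1. First question?\n\n2) Second question?\n- Third question?"
print(parse_lines(reply))  # ['First question?', 'Second question?', 'Third question?']
```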


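### Step 2: Multi-Dimensional Retrieval (One Net per Sub-Question)
```python
# Retrieve documents for each sub-query
all_results = []
for sub_query in sub_queries:
    sub_query_embedding = create_embeddings(sub_query)
    results = vector_store.similarity_search(sub_query_embedding, k=2)
    all_results.extend(results)
```

**Key points**:
1. Each sub-question gets its own embedding
2. Each sub-question retrieves 2 documents (with 3 sub-questions, at most 6 documents)
3. Collecting results per dimension avoids relying on a single angle

**Example**:
- For the sub-question "Which industries are most easily replaced by AI?", retrieval might return:
  - "Automation levels in manufacturing and projected job losses"
  - "The current use of AI chatbots in customer service"
- For the sub-question "What new opportunities has AI created?", retrieval might return:
  - "Career paths for AI trainers and data annotators"
  - "Market demand analysis for machine learning engineers"


### Step 3: Merging and Deduplication (Keeping the Information Diverse)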
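```python
# Remove duplicates (keep only one copy of identical text)
unique_texts = set()
diverse_results = []

for result in all_results:
    if result["text"] not in unique_texts:
        unique_texts.add(result["text"])
        diverse_results.append(result)

# If there are fewer than k results, top up with a direct search on the main query
if len(diverse_results) < k:
    main_query_embedding = create_embeddings(query)
    main_results = vector_store.similarity_search(main_query_embedding, k=k)
    
    for result in main_results:
        if result["text"] not in unique_texts and len(diverse_results) < k:
            unique_texts.add(result["text"])
            diverse_results.append(result)

# Return the top k diverse results
return diverse_results[:k]
```

**Deduplication and top-up**:
1. **Deduplication**: a set of already-selected texts prevents the same chunk from appearing twice
2. **Top-up**: if fewer than k results remain after deduplication, the original query is searched directly to fill the gap
3. **Diversity**: results are kept in sub-question order, so earlier sub-questions take precedence; with 3 sub-questions, 2 results each, and `k=4`, documents from the last sub-question can be cut by the final `[:k]` slice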
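
If every sub-question should be represented before any sub-question contributes a second document, the per-sub-question lists can be merged round-robin instead of concatenated. The sketch below illustrates that alternative; it is not what the function above does, and `interleave_unique` and the toy result dicts are hypothetical.

```python
from itertools import zip_longest

def interleave_unique(result_lists, k):
    """Round-robin over per-sub-query result lists, skipping duplicate texts."""
    seen, merged = set(), []
    # zip_longest yields the 1st result of every list, then the 2nd, and so on
    for tier in zip_longest(*result_lists):
        for result in tier:
            if result is None or result["text"] in seen:
                continue  # padding from a shorter list, or a duplicate chunk
            seen.add(result["text"])
            merged.append(result)
            if len(merged) == k:
                return merged
    return merged

per_sub_query = [
    [{"text": "A1"}, {"text": "A2"}],
    [{"text": "B1"}, {"text": "A1"}],  # "A1" repeats and is skipped
    [{"text": "C1"}, {"text": "C2"}],
]
picked = [r["text"] for r in interleave_unique(per_sub_query, k=4)]
print(picked)  # ['A1', 'B1', 'C1', 'A2']
```

Using this would mean collecting one list per sub-question rather than extending a single `all_results` list.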


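### Core Strength: Comprehensive, Multi-Dimensional Coverage
1. **Question decomposition**: uses the LLM's language understanding to identify the key dimensions of a complex question automatically
2. **Multi-dimensional retrieval**: searches each dimension separately, avoiding the blind spots a single search can have
3. **Deduplication**: keeps the information broad while preventing redundant content from crowding the analysis
4. **Flexibility**: when the sub-questions yield too few documents, the main query supplies additional relevant content


### Example Walkthrough
**User query**: "How do you evaluate the investment value of a technology company?"

#### Execution flow:
1. **Question decomposition**:
   - The LLM generates 3 sub-questions:
     1. "What are the key financial metrics for evaluating a technology company's investment value?"
     2. "How can a technology company's capacity for innovation be measured?"
     3. "Which market factors affect a technology company's long-term growth potential?"
2. **Multi-dimensional retrieval**:
   - Financial metrics: retrieves documents on valuation methods such as P/E and P/S
   - Innovation: retrieves content on patent counts, R&D spending, and similar measures
   - Market factors: retrieves material on the competitive landscape, policy environment, and so on
3. **Merging**:
   - Takes the most relevant documents from each dimension and deduplicates them into the final result set

#### Example results:
1. "Five core metrics for valuing technology stocks" (financial dimension)
2. "Using patent analysis to evaluate technology companies" (technology dimension)
3. "The global semiconductor industry: competitive landscape and future trends" (market dimension)
4. "A study of the correlation between R&D spending and market performance in technology companies" (cross-cutting)


### Comparison with Factual Retrieval
| Dimension | Factual retrieval strategy | Analytical retrieval strategy |
|-----------------|-----------------------------|-----------------------------|
| **Question type** | Seeks a specific, definite answer | Needs synthesis and multiple angles |
| **Query handling** | Enhances the query for precision | Decomposes the question for coverage |
| **Retrieval** | One search plus LLM reranking | Several searches (one per sub-question) |
| **Result character** | Pinpoints a direct answer | Covers information from different dimensions |
| **Diversity control** | None | Deduplicates by text across sub-questions |

This strategy is particularly suited to scenarios that call for synthesis, such as business decisions, academic research, and policy making. By breaking a complex question into several retrievable sub-questions, it gathers information systematically and gives deeper analysis a broader base.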

### 3. Opinion Strategy - Diverse Perspectives

In [None]:
def opinion_retrieval_strategy(query, vector_store, k=4):
    """
    Retrieval strategy for opinion queries focusing on diverse perspectives.

    Args:
        query (str): User query
        vector_store (SimpleVectorStore): Vector store
        k (int): Number of documents to return

    Returns:
        List[Dict]: Retrieved documents
    """
    print(f"Executing Opinion retrieval strategy for: '{query}'")

    # Define the system prompt to guide the AI in identifying different perspectives
    system_prompt = """You are an expert at identifying different perspectives on a topic.
        For the given query about opinions or viewpoints, identify different perspectives
        that people might have on this topic.

        Return a list of exactly 3 different viewpoint angles, one per line.
    """

    # Create the user prompt with the main query
    user_prompt = f"Identify different perspectives on: {query}"

    # Generate the different perspectives using the LLM
    response = client.chat.completions.create(
        model="o1",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ],
        temperature=0.3
    )

    # Extract and clean the viewpoints
    viewpoints = response.choices[0].message.content.strip().split('\n')
    viewpoints = [v.strip() for v in viewpoints if v.strip()]
    print(f"Identified viewpoints: {viewpoints}")

    # Retrieve documents representing each viewpoint
    all_results = []
    for viewpoint in viewpoints:
        # Combine the main query with the viewpoint
        combined_query = f"{query} {viewpoint}"
        # Create embeddings for the combined query
        viewpoint_embedding = create_embeddings(combined_query)
        # Perform similarity search for the combined query
        results = vector_store.similarity_search(viewpoint_embedding, k=2)

        # Mark results with the viewpoint they represent
        for result in results:
            result["viewpoint"] = viewpoint

        # Add the results to the list of all results
        all_results.extend(results)

    # Select a diverse range of opinions
    # Ensure we get at least one document from each viewpoint if possible
    selected_results = []
    for viewpoint in viewpoints:
        # Filter documents by viewpoint
        viewpoint_docs = [r for r in all_results if r.get("viewpoint") == viewpoint]
        if viewpoint_docs:
            selected_results.append(viewpoint_docs[0])

    # Fill remaining slots with highest similarity docs
    remaining_slots = k - len(selected_results)
    if remaining_slots > 0:
        # Sort remaining docs by similarity
        remaining_docs = [r for r in all_results if r not in selected_results]
        remaining_docs.sort(key=lambda x: x["similarity"], reverse=True)
        selected_results.extend(remaining_docs[:remaining_slots])

    # Return the top k results
    return selected_results[:k]

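### Opinion Retrieval Strategy Explained: A "Debate Organizer" for Diverse Perspectives

This function implements the retrieval strategy for opinion queries. Its core goal is to act like a debate organizer: collect and present multiple viewpoints on a topic. Opinion questions usually involve subjective judgment (for example, "Is the metaverse worth investing in?"), so the design focuses on perspective diversity and unbiased presentation.


### Overview of the Strategy
```python
def opinion_retrieval_strategy(query, vector_store, k=4):
    """Retrieval strategy for opinion queries (focus on diverse perspectives)."""
    print(f"Executing Opinion retrieval strategy for: '{query}'")
    # ... (body omitted)
    return selected_results[:k]
```

The strategy follows a three-stage logic, "**identify perspectives → targeted retrieval → balanced selection**":
1. Use the LLM to identify different viewpoint angles on the topic
2. Retrieve supporting documents for each viewpoint angle
3. Select documents evenly across viewpoints so that no single view dominates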


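### Step 1: Perspective Identification, Finding the Sides of the Debate
```python
# System prompt guiding the LLM to identify different perspectives
system_prompt = """You are an expert at identifying different perspectives on a topic.
For the given query about opinions or viewpoints, identify different perspectives
that people might have on this topic.

Return a list of exactly 3 different viewpoint angles, one per line.
"""

user_prompt = f"Identify different perspectives on: {query}"

# Call the LLM to generate the perspectives
response = client.chat.completions.create(
    model="o1",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt}
    ],
    temperature=0.3  # Mild randomness to encourage varied perspectives
)

# Extract and clean the viewpoints
viewpoints = response.choices[0].message.content.strip().split('\n')
viewpoints = [v.strip() for v in viewpoints if v.strip()]
print(f"Identified viewpoints: {viewpoints}")
```

**Key technique**: the prompt asks the LLM to act as a neutral observer and surface the different positions in the debate. For example:
- Original query: "The impact of social media on teenagers"
- Generated viewpoint angles:
  1. "Positive effects of social media on teenagers' cognitive development" (in favor)
  2. "Social media causes anxiety and distraction in teenagers" (against)
  3. "A neutral view that calls for balanced use" (neutral)

**Why `temperature=0.3`**:
- Low enough to keep the viewpoints anchored to the topic rather than drifting randomly
- High enough to allow some variation between viewpoints, reducing repetition and one-sidedness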
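The viewpoint extraction above splits the reply on newlines and strips whitespace. Models asked for "one per line" often add numbering or bullets anyway, and those prefixes would then be embedded together with the viewpoint text. The following is a minimal sketch of a more defensive parser; `parse_viewpoints` and its `expected` parameter are hypothetical names, not part of the notebook.

```python
import re

def parse_viewpoints(raw_text, expected=3):
    """Split an LLM reply into clean viewpoint strings, one per line.

    Hypothetical helper: strips list markers and quotes, drops empty lines,
    and keeps at most `expected` viewpoints.
    """
    viewpoints = []
    for line in raw_text.splitlines():
        # Remove a leading bullet ("-", "*", "•") or numbering ("1.", "2)")
        cleaned = re.sub(r"^\s*(?:[-*•]|\d+[.)])\s*", "", line).strip()
        # Remove surrounding straight quotes
        cleaned = cleaned.strip('"\'')
        if cleaned:
            viewpoints.append(cleaned)
    return viewpoints[:expected]

sample_reply = '1. Economic benefits\n2) Environmental costs\n\n- "Social equity"\nExtra line'
parsed = parse_viewpoints(sample_reply)
print(parsed)
```

Truncating to `expected` is a design choice for this sketch: it caps the number of embedding and search calls when the model returns more lines than requested.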


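### Step 2: Targeted Retrieval, Finding Evidence for Each Viewpoint
```python
# Retrieve supporting documents for each viewpoint
all_results = []
for viewpoint in viewpoints:
    combined_query = f"{query} {viewpoint}"  # Combine the main query with the viewpoint
    viewpoint_embedding = create_embeddings(combined_query)
    results = vector_store.similarity_search(viewpoint_embedding, k=2)

    # Tag each result with the viewpoint it was retrieved for
    for result in results:
        result["viewpoint"] = viewpoint

    all_results.extend(results)
```

**Key points**:
1. Combine the main query with the viewpoint (for example, "Is the metaverse worth investing in? Supporting view")
2. Embed each combined query and retrieve the 2 most similar documents
3. Tag each result with the viewpoint it was retrieved for, so results can be grouped later

**Example**:
- Viewpoint 1: "The metaverse has huge investment potential"
  - Retrieved: "Metaverse user growth data and market size forecasts"
  - Retrieved: "How tech giants are positioning and investing in the metaverse"
- Viewpoint 2: "The metaverse is a bubble with high uncertainty"
  - Retrieved: "Case studies of failed metaverse projects"
  - Retrieved: "Regulatory risks of virtual assets explained"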


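### Step 3: Balanced Selection, Acting as an Unbiased Integrator
```python
# Make sure each viewpoint has at least one representative document
selected_results = []
for viewpoint in viewpoints:
    viewpoint_docs = [r for r in all_results if r.get("viewpoint") == viewpoint]
    if viewpoint_docs:
        selected_results.append(viewpoint_docs[0])  # Take the first match for this viewpoint

# Fill the remaining slots with the highest-similarity documents
remaining_slots = k - len(selected_results)
if remaining_slots > 0:
    remaining_docs = [r for r in all_results if r not in selected_results]
    remaining_docs.sort(key=lambda x: x["similarity"], reverse=True)
    selected_results.extend(remaining_docs[:remaining_slots])

# Return the top k balanced results
return selected_results[:k]
```

**Balanced selection mechanism**:
1. **Perspective diversity first**: each viewpoint contributes at least one document, if any were retrieved for it
2. **Similarity fill**: if fewer than `k` documents have been selected at that point, the remaining slots go to the highest-similarity documents among the rest
3. **Deduplication**: the `r not in selected_results` check keeps the same result entry from being selected twice. It compares whole dictionaries, so a chunk retrieved under two viewpoints carries two different `viewpoint` tags and may still appear twice

**Key logic**:
- No single viewpoint takes too large a share (with `k=4` and three viewpoints that each return documents, three results come from different viewpoints and the fourth is the most similar of those remaining)
- The `viewpoint` tag keeps results traceable, making it clear which viewpoint each document was retrieved for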
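The selection step compares whole result dictionaries, so the same chunk text retrieved under two different viewpoints is not treated as a duplicate. The sketch below shows one way to deduplicate by chunk text while keeping the "one per viewpoint, then fill by similarity" logic. `select_balanced` is a hypothetical helper, and the toy results assume the `text`, `similarity`, and `viewpoint` keys used in the notebook.

```python
def select_balanced(all_results, viewpoints, k=4):
    """Pick one document per viewpoint, then fill by similarity, deduplicating by text."""
    selected, seen_texts = [], set()

    # Pass 1: the best not-yet-selected document for each viewpoint
    for viewpoint in viewpoints:
        candidates = [
            r for r in all_results
            if r["viewpoint"] == viewpoint and r["text"] not in seen_texts
        ]
        if candidates:
            best = max(candidates, key=lambda r: r["similarity"])
            selected.append(best)
            seen_texts.add(best["text"])

    # Pass 2: fill remaining slots with the most similar unseen documents
    leftovers = sorted(all_results, key=lambda r: r["similarity"], reverse=True)
    for r in leftovers:
        if len(selected) >= k:
            break
        if r["text"] not in seen_texts:
            selected.append(r)
            seen_texts.add(r["text"])

    return selected[:k]

# Toy data: chunk "A" is retrieved under both "pro" and "con"
demo_results = [
    {"text": "A", "similarity": 0.90, "viewpoint": "pro"},
    {"text": "B", "similarity": 0.80, "viewpoint": "pro"},
    {"text": "A", "similarity": 0.85, "viewpoint": "con"},
    {"text": "C", "similarity": 0.70, "viewpoint": "con"},
    {"text": "D", "similarity": 0.60, "viewpoint": "neutral"},
]
picked = select_balanced(demo_results, ["pro", "con", "neutral"], k=4)
picked_texts = [r["text"] for r in picked]
print(picked_texts)
```

In the toy data, chunk "A" is selected once for "pro", so "con" falls back to its next-best chunk "C" instead of repeating "A".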


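### Core Strengths: Neutrality and Perspective Diversity
1. **Automatic perspective discovery**: no manual labeling is needed; the LLM identifies candidate angles of the debate
2. **Targeted retrieval**: each viewpoint gets its own search for supporting documents
3. **Balanced selection**: the selection step gives every viewpoint at least one slot whenever documents exist for it
4. **Interpretability**: every result is tagged with the viewpoint it was retrieved for, which helps users understand where it sits in the debate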


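### A Worked Example
**User query**: "Should plastic packaging be banned entirely?"

#### Execution flow:
1. **Perspective identification**:
   - The LLM generates 3 viewpoints:
     1. "Support from an environmental protection angle"
     2. "Opposition from an economic impact angle"
     3. "A neutral view on the feasibility of alternatives"
2. **Targeted retrieval**:
   - Supporting view: "Research on the impact of plastic pollution on marine ecosystems"
   - Opposing view: "Analysis of the cost impact of plastic packaging on small and medium-sized businesses"
   - Neutral view: "Progress and challenges in developing biodegradable materials"
3. **Balanced selection**:
   - First pick 1 document per viewpoint (3 in total)
   - Then pick the remaining document with the highest similarity (for example, "Comparing the effects of plastic bans across countries")

#### Example results (illustrative):
1. "Harm of microplastics to marine life, with case studies" (supports a ban)
2. "Economic reasons for reliance on plastic packaging in developing countries" (opposes a full ban)
3. "Cost and performance analysis of corn-starch-based biodegradable packaging" (neutral view)
4. "Evaluation of the effects of the EU plastic tax policy" (cross-cutting view)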


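### Comparison with the Other Strategies
| Dimension                  | Factual strategy        | Analytical strategy                 | Opinion strategy                 |
|----------------------------|-------------------------|-------------------------------------|----------------------------------|
| **Question type**          | Specific facts          | Complex analysis                    | Subjective views                 |
| **Core goal**              | Precise targeting       | Comprehensive coverage              | Diverse presentation             |
| **Query handling**         | Refine the query        | Decompose the question              | Identify perspectives            |
| **Result characteristics** | A single correct answer | Evidence across multiple dimensions | Evidence for multiple viewpoints |
| **Guiding value**          | Objectivity             | Systematic coverage                 | Balance                          |

This strategy is particularly suited to decision support, public-opinion analysis, and academic discussion. By systematically gathering diverse viewpoints, it helps users look beyond a single perspective and make better-informed judgments once they understand each side, reducing the cognitive bias that comes from an information cocoon (filter bubble).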

### 4. Contextual Strategy - User Context Integration

In [None]:
def contextual_retrieval_strategy(query, vector_store, k=4, user_context=None):
    """
    Retrieval strategy for contextual queries integrating user context.

    Args:
        query (str): User query
        vector_store (SimpleVectorStore): Vector store
        k (int): Number of documents to return
        user_context (str): Additional user context

    Returns:
        List[Dict]: Retrieved documents
    """
    print(f"Executing Contextual retrieval strategy for: '{query}'")

    # If no user context provided, try to infer it from the query
    if not user_context:
        system_prompt = """You are an expert at understanding implied context in questions.
For the given query, infer what contextual information might be relevant or implied
but not explicitly stated. Focus on what background would help answering this query.

Return a brief description of the implied context."""

        user_prompt = f"Infer the implied context in this query: {query}"

        # Generate the inferred context using the LLM
        response = client.chat.completions.create(
            model="o1",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt}
            ],
            temperature=0.1
        )

        # Extract and print the inferred context
        user_context = response.choices[0].message.content.strip()
        print(f"Inferred context: {user_context}")

    # Reformulate the query to incorporate context
    system_prompt = """You are an expert at reformulating questions with context.
    Given a query and some contextual information, create a more specific query that
    incorporates the context to get more relevant information.

    Return ONLY the reformulated query without explanation."""

    user_prompt = f"""
    Query: {query}
    Context: {user_context}

    Reformulate the query to incorporate this context:"""

    # Generate the contextualized query using the LLM
    response = client.chat.completions.create(
        model="o1",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ],
        temperature=0
    )

    # Extract and print the contextualized query
    contextualized_query = response.choices[0].message.content.strip()
    print(f"Contextualized query: {contextualized_query}")

    # Retrieve documents based on the contextualized query
    query_embedding = create_embeddings(contextualized_query)
    initial_results = vector_store.similarity_search(query_embedding, k=k*2)

    # Rank documents considering both relevance and user context
    ranked_results = []

    for doc in initial_results:
        # Score document relevance considering the context
        context_relevance = score_document_context_relevance(query, user_context, doc["text"])
        ranked_results.append({
            "text": doc["text"],
            "metadata": doc["metadata"],
            "similarity": doc["similarity"],
            "context_relevance": context_relevance
        })

    # Sort by context relevance and return top k results
    ranked_results.sort(key=lambda x: x["context_relevance"], reverse=True)
    return ranked_results[:k]
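LLM scores on a 0-10 scale are coarse, so several documents will often share the same `context_relevance`. Python's sort is stable, which means ties in the function above keep whatever order `similarity_search` returned. The sketch below makes the tie-break explicit by sorting on a `(context_relevance, similarity)` tuple; `rank_by_context` is a hypothetical helper and the scores are made-up toy values.

```python
def rank_by_context(docs, k=4):
    """Sort by LLM context score, breaking ties with embedding similarity."""
    ranked = sorted(
        docs,
        key=lambda d: (d["context_relevance"], d["similarity"]),
        reverse=True,
    )
    return ranked[:k]

# Toy documents: "a" and "b" tie on context relevance
toy_docs = [
    {"text": "a", "similarity": 0.62, "context_relevance": 8.0},
    {"text": "b", "similarity": 0.71, "context_relevance": 8.0},
    {"text": "c", "similarity": 0.90, "context_relevance": 3.0},
]
top_docs = rank_by_context(toy_docs, k=2)
print([d["text"] for d in top_docs])
```

Document "c" has the highest embedding similarity but is ranked last because the context score takes priority; similarity only decides between "a" and "b".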

## Helper Functions for Document Scoring

In [None]:
def score_document_relevance(query, document, model="o1"):
    """
    Score document relevance to a query using LLM.

    Args:
        query (str): User query
        document (str): Document text
        model (str): LLM model

    Returns:
        float: Relevance score from 0-10
    """
    # System prompt to instruct the model on how to rate relevance
    system_prompt = """You are an expert at evaluating document relevance.
        Rate the relevance of a document to a query on a scale from 0 to 10, where:
        0 = Completely irrelevant
        10 = Perfectly addresses the query

        Return ONLY a numerical score between 0 and 10, nothing else.
    """

    # Truncate document if it's too long
    doc_preview = document[:1500] + "..." if len(document) > 1500 else document

    # User prompt containing the query and document preview
    user_prompt = f"""
        Query: {query}

        Document: {doc_preview}

        Relevance score (0-10):
    """

    # Generate response from the model
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ],
        temperature=0
    )

    # Extract the score from the model's response
    score_text = response.choices[0].message.content.strip()

    # Extract numeric score using regex
    match = re.search(r'(\d+(\.\d+)?)', score_text)
    if match:
        score = float(match.group(1))
        return min(10, max(0, score))  # Ensure score is within 0-10
    else:
        # Default score if extraction fails
        return 5.0

In [None]:
def score_document_context_relevance(query, context, document, model="o1"):
    """
    Score document relevance considering both query and context.

    Args:
        query (str): User query
        context (str): User context
        document (str): Document text
        model (str): LLM model

    Returns:
        float: Relevance score from 0-10
    """
    # System prompt to instruct the model on how to rate relevance considering context
    system_prompt = """You are an expert at evaluating document relevance considering context.
        Rate the document on a scale from 0 to 10 based on how well it addresses the query
        when considering the provided context, where:
        0 = Completely irrelevant
        10 = Perfectly addresses the query in the given context

        Return ONLY a numerical score between 0 and 10, nothing else.
    """

    # Truncate document if it's too long
    doc_preview = document[:1500] + "..." if len(document) > 1500 else document

    # User prompt containing the query, context, and document preview
    user_prompt = f"""
    Query: {query}
    Context: {context}

    Document: {doc_preview}

    Relevance score considering context (0-10):
    """

    # Generate response from the model
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ],
        temperature=0
    )

    # Extract the score from the model's response
    score_text = response.choices[0].message.content.strip()

    # Extract numeric score using regex
    match = re.search(r'(\d+(\.\d+)?)', score_text)
    if match:
        score = float(match.group(1))
        return min(10, max(0, score))  # Ensure score is within 0-10
    else:
        # Default score if extraction fails
        return 5.0
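Both scoring helpers end with the same parse-and-clamp logic. As a sketch of how that shared step behaves on different replies, it can be pulled into one small function and exercised without calling the LLM. `parse_score` is a hypothetical name; the regex, the 0-10 clamp, and the 5.0 fallback mirror the code above.

```python
import re

def parse_score(score_text, default=5.0, low=0.0, high=10.0):
    """Extract the first number in an LLM reply and clamp it to [low, high]."""
    match = re.search(r"\d+(?:\.\d+)?", score_text)
    if not match:
        return default  # no number found: fall back to a neutral score
    return min(high, max(low, float(match.group())))

print(parse_score("8"))              # plain number
print(parse_score("Score: 7.5/10"))  # first number wins
print(parse_score("12"))             # clamped to the upper bound
print(parse_score("not sure"))       # fallback
```

Because the first number wins, a reply that restates the scale before the score (for example `"0-10: 9"`) parses as 0. The prompts' instruction to return only a number guards against this, but it is worth keeping in mind when changing the prompt wording.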

## The Core Adaptive Retriever

In [None]:
def adaptive_retrieval(query, vector_store, k=4, user_context=None):
    """
    Perform adaptive retrieval by selecting and executing the appropriate strategy.

    Args:
        query (str): User query
        vector_store (SimpleVectorStore): Vector store
        k (int): Number of documents to retrieve
        user_context (str): Optional user context for contextual queries

    Returns:
        List[Dict]: Retrieved documents
    """
    # Classify the query to determine its type
    query_type = classify_query(query)
    print(f"Query classified as: {query_type}")

    # Select and execute the appropriate retrieval strategy based on the query type
    if query_type == "Factual":
        # Use the factual retrieval strategy for precise information
        results = factual_retrieval_strategy(query, vector_store, k)
    elif query_type == "Analytical":
        # Use the analytical retrieval strategy for comprehensive coverage
        results = analytical_retrieval_strategy(query, vector_store, k)
    elif query_type == "Opinion":
        # Use the opinion retrieval strategy for diverse perspectives
        results = opinion_retrieval_strategy(query, vector_store, k)
    elif query_type == "Contextual":
        # Use the contextual retrieval strategy, incorporating user context
        results = contextual_retrieval_strategy(query, vector_store, k, user_context)
    else:
        # Default to factual retrieval strategy if classification fails
        results = factual_retrieval_strategy(query, vector_store, k)

    return results  # Return the retrieved documents
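The `if`/`elif` chain above can also be written as a lookup table, which keeps the label-to-strategy mapping in one place and makes the fallback explicit. The following is only a sketch: `dispatch_strategy` is a hypothetical helper, and the lambda stubs stand in for the notebook's real strategy functions (which also take a `vector_store`).

```python
def dispatch_strategy(query_type, strategies, default="Factual"):
    """Return the strategy callable for a query-type label, falling back to the default."""
    # Normalize the label so "opinion" or " Opinion " still match
    label = query_type.strip().capitalize()
    return strategies.get(label, strategies[default])

# Stub strategies with a uniform signature (assumption for this sketch)
stub_strategies = {
    "Factual": lambda query, k, user_context=None: ("factual", query, k),
    "Analytical": lambda query, k, user_context=None: ("analytical", query, k),
    "Opinion": lambda query, k, user_context=None: ("opinion", query, k),
    "Contextual": lambda query, k, user_context=None: ("contextual", query, k, user_context),
}

strategy = dispatch_strategy(" opinion ", stub_strategies)
print(strategy("Is AI moving too fast?", 4))

fallback = dispatch_strategy("Unknown", stub_strategies)
print(fallback("What is XAI?", 4))
```

Giving every strategy the same signature, including an optional `user_context`, is the design choice that makes a table like this workable; with the notebook's current signatures only the contextual strategy accepts that argument.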

## Response Generation

In [None]:
def generate_response(query, results, query_type, model="o1"):
    """
    Generate a response based on query, retrieved documents, and query type.

    Args:
        query (str): User query
        results (List[Dict]): Retrieved documents
        query_type (str): Type of query
        model (str): LLM model

    Returns:
        str: Generated response
    """
    # Prepare context from retrieved documents by joining their texts with separators
    context = "\n\n---\n\n".join([r["text"] for r in results])

    # Create custom system prompt based on query type
    if query_type == "Factual":
        system_prompt = """You are a helpful assistant providing factual information.
    Answer the question based on the provided context. Focus on accuracy and precision.
    If the context doesn't contain the information needed, acknowledge the limitations."""

    elif query_type == "Analytical":
        system_prompt = """You are a helpful assistant providing analytical insights.
    Based on the provided context, offer a comprehensive analysis of the topic.
    Cover different aspects and perspectives in your explanation.
    If the context has gaps, acknowledge them while providing the best analysis possible."""

    elif query_type == "Opinion":
        system_prompt = """You are a helpful assistant discussing topics with multiple viewpoints.
    Based on the provided context, present different perspectives on the topic.
    Ensure fair representation of diverse opinions without showing bias.
    Acknowledge where the context presents limited viewpoints."""

    elif query_type == "Contextual":
        system_prompt = """You are a helpful assistant providing contextually relevant information.
    Answer the question considering both the query and its context.
    Make connections between the query context and the information in the provided documents.
    If the context doesn't fully address the specific situation, acknowledge the limitations."""

    else:
        system_prompt = """You are a helpful assistant. Answer the question based on the provided context. If you cannot answer from the context, acknowledge the limitations."""

    # Create user prompt by combining the context and the query
    user_prompt = f"""
    Context:
    {context}

    Question: {query}

    Please provide a helpful response based on the context.
    """

    # Generate response using the OpenAI client
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ],
        temperature=0.2
    )

    # Return the generated response content
    return response.choices[0].message.content

## Complete RAG Pipeline with Adaptive Retrieval

In [None]:
def rag_with_adaptive_retrieval(pdf_path, query, k=4, user_context=None):
    """
    Complete RAG pipeline with adaptive retrieval.

    Args:
        pdf_path (str): Path to PDF document
        query (str): User query
        k (int): Number of documents to retrieve
        user_context (str): Optional user context

    Returns:
        Dict: Results including query, retrieved documents, query type, and response
    """
    print("\n=== RAG WITH ADAPTIVE RETRIEVAL ===")
    print(f"Query: {query}")

    # Process the document to extract text, chunk it, and create embeddings
    chunks, vector_store = process_document(pdf_path)

    # Classify the query to determine its type
    query_type = classify_query(query)
    print(f"Query classified as: {query_type}")

    # Retrieve documents using the adaptive retrieval strategy based on the query type
    retrieved_docs = adaptive_retrieval(query, vector_store, k, user_context)

    # Generate a response based on the query, retrieved documents, and query type
    response = generate_response(query, retrieved_docs, query_type)

    # Compile the results into a dictionary
    result = {
        "query": query,
        "query_type": query_type,
        "retrieved_documents": retrieved_docs,
        "response": response
    }

    print("\n=== RESPONSE ===")
    print(response)

    return result
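In the pipeline above, `classify_query` runs once directly and once more inside `adaptive_retrieval`, so each query costs two classification calls, and the label used for response generation could differ from the one used for retrieval. The sketch below shows one way to classify once and pass the label along. All names ending in `_stub` or `_sketch` are hypothetical stand-ins, and the optional `query_type` parameter is an assumption, not the notebook's current signature.

```python
classification_calls = {"count": 0}

def classify_query_stub(query):
    """Stand-in for classify_query that counts how often it is called."""
    classification_calls["count"] += 1
    return "Factual"

def adaptive_retrieval_sketch(query, k=4, query_type=None):
    """Use a precomputed query_type when given; classify only if it is missing."""
    if query_type is None:
        query_type = classify_query_stub(query)
    # Real code would dispatch to a retrieval strategy here
    return [{"text": f"{query_type} result for: {query}"}][:k]

def pipeline_sketch(query, k=4):
    """Classify once, then reuse the label for retrieval and in the result."""
    query_type = classify_query_stub(query)
    docs = adaptive_retrieval_sketch(query, k, query_type=query_type)
    return {"query": query, "query_type": query_type, "retrieved_documents": docs}

sketch_result = pipeline_sketch("What is Explainable AI (XAI)?")
print(sketch_result["query_type"], classification_calls["count"])
```

Keeping `query_type` optional means existing callers that pass only the query would still work, while the pipeline avoids the duplicate call.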

## Evaluation Framework

In [None]:
def evaluate_adaptive_vs_standard(pdf_path, test_queries, reference_answers=None):
    """
    Compare adaptive retrieval with standard retrieval on a set of test queries.

    This function processes a document, runs both standard and adaptive retrieval methods
    on each test query, and compares their performance. If reference answers are provided,
    it also evaluates the quality of responses against these references.

    Args:
        pdf_path (str): Path to PDF document to be processed as the knowledge source
        test_queries (List[str]): List of test queries to evaluate both retrieval methods
        reference_answers (List[str], optional): Reference answers for evaluation metrics

    Returns:
        Dict: Evaluation results containing individual query results and overall comparison
    """
    print("=== EVALUATING ADAPTIVE VS. STANDARD RETRIEVAL ===")

    # Process document to extract text, create chunks and build the vector store
    chunks, vector_store = process_document(pdf_path)

    # Initialize collection for storing comparison results
    results = []

    # Process each test query with both retrieval methods
    for i, query in enumerate(test_queries):
        print(f"\n\nQuery {i+1}: {query}")

        # --- Standard retrieval approach ---
        print("\n--- Standard Retrieval ---")
        # Create embedding for the query
        query_embedding = create_embeddings(query)
        # Retrieve documents using simple vector similarity
        standard_docs = vector_store.similarity_search(query_embedding, k=4)
        # Generate response using a generic approach
        standard_response = generate_response(query, standard_docs, "General")

        # --- Adaptive retrieval approach ---
        print("\n--- Adaptive Retrieval ---")
        # Classify the query to determine its type (Factual, Analytical, Opinion, Contextual)
        query_type = classify_query(query)
        # Retrieve documents using the strategy appropriate for this query type
        adaptive_docs = adaptive_retrieval(query, vector_store, k=4)
        # Generate a response tailored to the query type
        adaptive_response = generate_response(query, adaptive_docs, query_type)

        # Store complete results for this query
        result = {
            "query": query,
            "query_type": query_type,
            "standard_retrieval": {
                "documents": standard_docs,
                "response": standard_response
            },
            "adaptive_retrieval": {
                "documents": adaptive_docs,
                "response": adaptive_response
            }
        }

        # Add reference answer if available for this query
        if reference_answers and i < len(reference_answers):
            result["reference_answer"] = reference_answers[i]

        results.append(result)

        # Display preview of both responses for quick comparison
        print("\n--- Responses ---")
        print(f"Standard: {standard_response[:200]}...")
        print(f"Adaptive: {adaptive_response[:200]}...")

    # Calculate comparative metrics if reference answers are available
    if reference_answers:
        comparison = compare_responses(results)
        print("\n=== EVALUATION RESULTS ===")
        print(comparison)

    # Return the complete evaluation results
    return {
        "results": results,
        "comparison": comparison if reference_answers else "No reference answers provided for evaluation"
    }
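The comparison above relies on an LLM judging the two responses. A cheap complementary check, sketched here as an assumption rather than part of the notebook's evaluation, is to measure how much the two retrieval methods overlap in the chunks they return: if the overlap is close to 1, differences in the responses come mostly from the prompt rather than from retrieval. `retrieval_overlap` is a hypothetical helper that assumes each document dict has a `text` key.

```python
def retrieval_overlap(standard_docs, adaptive_docs):
    """Jaccard overlap between the chunk texts returned by two retrieval methods."""
    standard_texts = {d["text"] for d in standard_docs}
    adaptive_texts = {d["text"] for d in adaptive_docs}
    union = standard_texts | adaptive_texts
    if not union:
        return 0.0  # nothing retrieved by either method
    return len(standard_texts & adaptive_texts) / len(union)

# Toy example: one shared chunk out of three distinct chunks
standard_demo = [{"text": "x"}, {"text": "y"}]
adaptive_demo = [{"text": "y"}, {"text": "z"}]
overlap = retrieval_overlap(standard_demo, adaptive_demo)
print(round(overlap, 3))
```

With the result structure built above, this could be called as `retrieval_overlap(result["standard_retrieval"]["documents"], result["adaptive_retrieval"]["documents"])` for each query.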

In [None]:
def compare_responses(results):
    """
    Compare standard and adaptive responses against reference answers.

    Args:
        results (List[Dict]): Results containing both types of responses

    Returns:
        str: Comparison analysis
    """
    # Define the system prompt to guide the AI in comparing responses
    comparison_prompt = """You are an expert evaluator of information retrieval systems.
    Compare the standard retrieval and adaptive retrieval responses for each query.
    Consider factors like accuracy, relevance, comprehensiveness, and alignment with the reference answer.
    Provide a detailed analysis of the strengths and weaknesses of each approach."""

    # Initialize the comparison text with a header
    comparison_text = "# Evaluation of Standard vs. Adaptive Retrieval\n\n"

    # Iterate through each result to compare responses
    for i, result in enumerate(results):
        # Skip if there is no reference answer for the query
        if "reference_answer" not in result:
            continue

        # Add query details to the comparison text
        comparison_text += f"## Query {i+1}: {result['query']}\n"
        comparison_text += f"*Query Type: {result['query_type']}*\n\n"
        comparison_text += f"**Reference Answer:**\n{result['reference_answer']}\n\n"

        # Add standard retrieval response to the comparison text
        comparison_text += f"**Standard Retrieval Response:**\n{result['standard_retrieval']['response']}\n\n"

        # Add adaptive retrieval response to the comparison text
        comparison_text += f"**Adaptive Retrieval Response:**\n{result['adaptive_retrieval']['response']}\n\n"

        # Create the user prompt for the AI to compare the responses
        user_prompt = f"""
        Reference Answer: {result['reference_answer']}

        Standard Retrieval Response: {result['standard_retrieval']['response']}

        Adaptive Retrieval Response: {result['adaptive_retrieval']['response']}

        Provide a detailed comparison of the two responses.
        """

        # Generate the comparison analysis using the OpenAI client
        response = client.chat.completions.create(
            model="o1",
            messages=[
                {"role": "system", "content": comparison_prompt},
                {"role": "user", "content": user_prompt}
            ],
            temperature=0.2
        )

        # Add the AI's comparison analysis to the comparison text
        comparison_text += f"**Comparison Analysis:**\n{response.choices[0].message.content}\n\n"

    return comparison_text  # Return the complete comparison analysis

## Evaluating the Adaptive Retrieval System (Customized Queries)

The final step is to call `evaluate_adaptive_vs_standard()` with your PDF document and test queries:

In [None]:
# Path to your knowledge source document
# This PDF file contains the information that the RAG system will use
pdf_path = "AI_Information.pdf"

# Define test queries covering different query types to demonstrate
# how adaptive retrieval handles various query intentions
test_queries = [
    "What is Explainable AI (XAI)?",                                              # Factual query - seeking definition/specific information
    # "How do AI ethics and governance frameworks address potential societal impacts?",  # Analytical query - requiring comprehensive analysis
    # "Is AI development moving too fast for proper regulation?",                   # Opinion query - seeking diverse perspectives
    # "How might explainable AI help in healthcare decisions?",                     # Contextual query - benefits from context-awareness
]

# Reference answers for more thorough evaluation
# These can be used to objectively assess response quality against a known standard
reference_answers = [
    "Explainable AI (XAI) aims to make AI systems transparent and understandable by providing clear explanations of how decisions are made. This helps users trust and effectively manage AI technologies.",
    # "AI ethics and governance frameworks address potential societal impacts by establishing guidelines and principles to ensure AI systems are developed and used responsibly. These frameworks focus on fairness, accountability, transparency, and the protection of human rights to mitigate risks and promote beneficial outcomes.",
    # "Opinions on whether AI development is moving too fast for proper regulation vary. Some argue that rapid advancements outpace regulatory efforts, leading to potential risks and ethical concerns. Others believe that innovation should continue at its current pace, with regulations evolving alongside to address emerging challenges.",
    # "Explainable AI can significantly aid healthcare decisions by providing transparent and understandable insights into AI-driven recommendations. This transparency helps healthcare professionals trust AI systems, make informed decisions, and improve patient outcomes by understanding the rationale behind AI suggestions."
]

In [None]:
# Run the evaluation comparing adaptive vs standard retrieval
# This will process each query using both methods and compare the results
evaluation_results = evaluate_adaptive_vs_standard(
    pdf_path=pdf_path,                  # Source document for knowledge extraction
    test_queries=test_queries,          # List of test queries to evaluate
    reference_answers=reference_answers  # Optional ground truth for comparison
)

# The results will show a detailed comparison between standard retrieval and
# adaptive retrieval performance across different query types, highlighting
# where adaptive strategies provide improved outcomes
print(evaluation_results["comparison"])

=== EVALUATING ADAPTIVE VS. STANDARD RETRIEVAL ===
Extracting text from PDF...
Chunking text...
Created 42 text chunks
Creating embeddings for chunks...
Added 42 chunks to the vector store


Query 1: What is Explainable AI (XAI)?

--- Standard Retrieval ---

--- Adaptive Retrieval ---
Query classified as: Factual
Executing Factual retrieval strategy for: 'What is Explainable AI (XAI)?'
Enhanced query: What are the core principles, methodologies, and real-world applications of Explainable AI (XAI) in improving transparency and interpretability of machine learning models?

--- Responses ---
Standard: Explainable AI (XAI) refers to techniques and approaches designed to make an AI system’s decision-making process more transparent and understandable. It helps users see how and why a particular outcom...
Adaptive: Explainable AI (XAI) refers to methods and techniques designed to make AI systems and their decision-making processes more transparent and understandable to humans. By offering ins

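### Adaptive Retrieval: Letting a RAG System Adapt to the Question the Way a Person Would

The code above implements a RAG system that switches retrieval strategy automatically based on the type of question, much as a person approaches different questions in different ways. The rest of this section walks through the core logic in plain language: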


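### 1. Labeling the Question: The Query Classifier

Imagine asking a question at a library: the librarian first works out what kind of information you are after. The system's `classify_query` function plays that role:

```python
def classify_query(query):
    # Have the LLM act as a "question detective" and sort the query into one of 4 types
    # Factual: e.g. "Who invented the light bulb?" (wants a specific answer)
    # Analytical: e.g. "Why do phones affect sleep?" (wants causal analysis)
    # Opinion: e.g. "Do you think the metaverse has a future?" (wants different views)
    # Contextual: e.g. "Can you give an example of the algorithm mentioned earlier?" (depends on prior conversation)
    # Returns only the category name, nothing else
    ...  # outline only; see the full definition in the notebook
```

For example, when a user asks "Why does the Earth rotate?", the system identifies it as an **analytical** question that needs material on cause and effect.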
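The classifier is expected to return only a category name, but an LLM reply can arrive with different casing, trailing punctuation, or a short preamble. The sketch below shows one way to map such a reply onto the four labels, falling back to "Factual" as `adaptive_retrieval` does. `normalize_query_type` and `VALID_TYPES` are hypothetical names, and this is not the notebook's `classify_query` implementation.

```python
VALID_TYPES = ("Factual", "Analytical", "Opinion", "Contextual")

def normalize_query_type(raw_label, default="Factual"):
    """Map a free-form classifier reply onto one of the four category names."""
    lowered = raw_label.lower()
    for category in VALID_TYPES:
        # The first category name found in the reply wins
        if category.lower() in lowered:
            return category
    return default

print(normalize_query_type("Analytical"))
print(normalize_query_type("  opinion.\n"))
print(normalize_query_type("Category: Contextual"))
print(normalize_query_type("I am not sure"))
```

Substring matching is deliberately forgiving. A reply that mentions several category names resolves to whichever comes first in `VALID_TYPES`, which is a limitation of this sketch.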


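### 2. Four Ways of Solving: A Different Strategy for Each Question Type

#### 1. Factual questions: precise, like looking up a dictionary

```python
def factual_retrieval_strategy(query, vector_store, k):
    # Example: the user asks "Who proposed the theory of relativity?"
    # Step 1: sharpen the question, e.g. "Which physicist proposed the theory of relativity, and when?"
    # Step 2: search with the refined question and keep only the best-matching passages
    # Step 3: have the LLM score each passage (0-10) and return only the top-scoring ones
    ...  # outline only; see the full definition in the notebook
```

Key technique: turn a vague question into a precise one, just as knowing the exact headword makes a dictionary lookup easier.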

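#### 2. Analytical questions: assembling the full picture, like a jigsaw puzzle

```python
def analytical_retrieval_strategy(query, vector_store, k):
    # Example: the user asks "How does artificial intelligence affect employment?"
    # Step 1: break the big question into smaller ones, such as:
    #   "Which industries does AI affect most?"
    #   "What new occupations has AI created?"
    #   "What are some cases of AI causing job losses?"
    # Step 2: retrieve material for each sub-question, then combine the results
    # Step 3: deduplicate (different sub-questions may find the same content) so coverage is broad without repetition
    ...  # outline only; see the full definition in the notebook
```

Core logic: split a complex question into several smaller ones, the way you find the individual pieces before assembling a jigsaw.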
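The combine-and-deduplicate step can be sketched as follows. This is one possible approach, not necessarily how the notebook's `analytical_retrieval_strategy` does it: `merge_subquery_results` is a hypothetical helper that keeps the highest-similarity copy of each chunk and assumes results are dicts with `text` and `similarity` keys.

```python
def merge_subquery_results(results_per_subquery, k=4):
    """Merge result lists from several sub-questions, keeping the best copy of each chunk."""
    best_by_text = {}
    for results in results_per_subquery:
        for doc in results:
            current = best_by_text.get(doc["text"])
            # Keep whichever copy of this chunk scored higher
            if current is None or doc["similarity"] > current["similarity"]:
                best_by_text[doc["text"]] = doc
    merged = sorted(best_by_text.values(), key=lambda d: d["similarity"], reverse=True)
    return merged[:k]

# Toy data: chunk "A" is found by both sub-questions with different scores
toy_subquery_results = [
    [{"text": "A", "similarity": 0.6}, {"text": "B", "similarity": 0.5}],
    [{"text": "A", "similarity": 0.8}, {"text": "C", "similarity": 0.7}],
]
merged_docs = merge_subquery_results(toy_subquery_results)
print([(d["text"], d["similarity"]) for d in merged_docs])
```

Sorting purely by similarity favors sub-questions that match the corpus well; a variant that guarantees each sub-question at least one slot would trade some relevance for coverage.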

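#### 3. Opinion questions: gathering different views, like a debate

```python
def opinion_retrieval_strategy(query, vector_store, k=4):
    # Example: the user asks "Should self-driving cars become widespread?"
    # Step 1: identify the different positions, such as:
    #   "Reasons to support wide adoption"
    #   "Risks that argue against wide adoption"
    #   "Neutral technical challenges"
    # Step 2: retrieve material for each position
    # Step 3: select documents evenly across positions so the views are presented without bias
    ...  # outline only; see the full definition above
```

Distinctive feature: rather than giving a single answer, it lays out each side's view, like a debate judge summarizing both teams' arguments.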

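#### 4. Contextual questions: understanding the background, like a conversation

```python
def contextual_retrieval_strategy(query, vector_store, k=4, user_context=None):
    # Example: the user first asks "What is machine learning?" and then "What about deep learning?"
    # If the earlier exchange is passed in as user_context, the current question can be read as
    #   "How does deep learning relate to machine learning?"
    # Step 1: if no context is supplied, infer a likely background (e.g. the user is a student who needs a basic explanation)
    # Step 2: merge the question with the context, e.g. "The differences between deep learning and machine learning that a student should know"
    # Step 3: rank the retrieved passages by how well they fit the context
    ...  # outline only; see the full definition above
```

Key capability: it folds the surrounding context into the search so answers stay on point. The function does not store conversation history itself; the caller supplies it through `user_context`, and when none is given the LLM infers the implied context from the query alone.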


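### 3. Scoring the Material: The LLM as Judge

```python
def score_document_relevance(query, document, model="o1"):
    # Example: the user asks "How do I write a loop in Python?" and a passage about Java loops is retrieved
    # The LLM gives that passage a low score (say 2) because it is off topic
    # A passage about Python loops gets a high score (say 9)
    # Callers then keep only the high-scoring passages
    ...  # outline only; see the full definition above
```

The scoring function is the system's relevance filter: it keeps irrelevant material from reaching the user, much as a teacher grading homework checks whether an answer addresses the question.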


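### 4. The Full Workflow: From Question to Answer

1. **User asks**: for example, "Why do leaves turn yellow?"
2. **Classification**: the system labels it an **analytical** question (it calls for causal analysis)
3. **Strategy selection**: the analytical retrieval strategy is invoked
4. **Decomposition into sub-questions**:
   - "What are the main factors that cause leaves to turn yellow?"
   - "How do seasonal changes affect leaf color?"
   - "How does chlorophyll break down?"
5. **Separate retrieval**: relevant material is retrieved for each sub-question
6. **Consolidation**: the results for the sub-questions are combined into one set of context
7. **Response generation**: the LLM explains the retrieved information in natural language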


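### 5. How It Differs from Standard RAG: Why Adaptive Is Smarter

Standard RAG uses the same hammer for every nail, while adaptive RAG picks the tool to fit the nail:

- **Standard RAG**: whatever the question, it runs one similarity search on the raw query, so it may return passages that are only superficially related (for example, a question about "Apple phones" returning material on "apple growing")
- **Adaptive RAG**:
  - For factual questions, precise like a dictionary
  - For analytical questions, thorough like a researcher
  - For opinion questions, balanced like a debate moderator
  - For contextual questions, aware of the background like a conversation partner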


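### 6. Example Application Scenarios

#### Scenario 1: A student studying

- **Question**: "What are the basic principles of quantum mechanics?" (factual)
- **System behavior**:
  1. Refines the question to something like "A detailed explanation of the basic principles of quantum mechanics"
  2. Pinpoints the textbook passages containing the definitions and formulas
  3. Returns a concise, accurate definition without extra commentary

#### Scenario 2: A workplace decision

- **Question**: "Should our department adopt AI tools?" (opinion)
- **System behavior**:
  1. Identifies three viewpoint angles, such as reasons to adopt, risks of adopting, and a neutral middle ground
  2. Retrieves material for each angle, covering topics such as efficiency gains, cost savings, and staff training
  3. Presents the arguments on each side to support the user's decision

#### Scenario 3: Technical consulting

- **Question**: "How is the recommendation algorithm we discussed earlier used in e-commerce?" (contextual)
- **System behavior**:
  1. Receives the earlier discussion of "collaborative filtering" as user context
  2. Interprets the current question as "Use cases of collaborative filtering in e-commerce"
  3. Retrieves concrete e-commerce recommendation examples and data rather than repeating how the algorithm works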


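### 7. Summary: Letting the Machine Think Flexibly, Like a Person

The core value of this adaptive RAG system is that it does not apply one fixed search to every query. It first works out the intent and type of the question, then chooses the most suitable way to find an answer. It is like asking a friend a question: depending on what you ask, they might look something up, lay out the facts, reason it through, or recall an earlier conversation. That flexibility is what makes the responses more precise and more useful.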