## Response Synthesis Optimization for RAG

Response Synthesis Optimization (RSO) is a key technique in Retrieval-Augmented Generation (RAG) systems. It selects and combines information from the retrieved documents to produce a response that is as relevant and coherent as possible. The optimization targets three properties:

- Relevance: The generated response directly addresses the user's query.
- Coherence: The response follows a logical flow and is internally consistent.
- Factuality: The information presented is accurate and supported by the retrieved sources.

### Strategies for implementing RSO

Below are some strategies for implementing response synthesis in RAG systems:

- Extractive Summarization - Select the most relevant sentences or paragraphs from the retrieved documents. Common techniques include TextRank and LexRank.
- Sequence-to-Sequence Models - Generate the response directly from the retrieved documents with an encoder-decoder model such as BART or T5.
- Create and Refine - Generate an initial response from the first retrieved node (chunk), then refine that answer using the context of each subsequent node.
- Hierarchical Summarization - Generate an answer from each node independently, then combine the answers hierarchically until one answer remains.
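To make the extractive option concrete, here is a minimal TextRank-style sketch in plain Python. It is an illustration under simplifying assumptions, not a reference TextRank or LexRank implementation. Sentences are split naively on end punctuation. Similarity is a variant of TextRank's word-overlap measure computed on unique words. Scores come from a weighted PageRank power iteration over the sentence graph. The names `split_sentences`, `similarity`, and `textrank_summary` are hypothetical.

```python
import math
import re


def split_sentences(text):
    """Naively split text into sentences after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a, b):
    """Word overlap normalized by log sentence lengths (TextRank-style)."""
    wa = set(re.findall(r"\w+", a.lower()))
    wb = set(re.findall(r"\w+", b.lower()))
    denom = math.log(len(wa)) + math.log(len(wb)) if wa and wb else 0.0
    return len(wa & wb) / denom if denom > 0 else 0.0


def textrank_summary(text, num_sentences=2, damping=0.85, iterations=50):
    """Return the `num_sentences` highest-ranked sentences in original order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= num_sentences:
        return sentences
    # Weighted, undirected graph: edge weight = sentence similarity
    weights = [
        [similarity(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sums = [sum(row) for row in weights]
    scores = [1.0] * n
    # Weighted PageRank by power iteration (each pass uses the previous scores)
    for _ in range(iterations):
        scores = [
            (1 - damping)
            + damping
            * sum(
                weights[j][i] / out_sums[j] * scores[j]
                for j in range(n)
                if out_sums[j] > 0
            )
            for i in range(n)
        ]
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:num_sentences]
    return [sentences[i] for i in sorted(top)]
```

You could apply this to each retrieved chunk and pass the selected sentences to the generator as condensed context. Sentences that share no words with any other sentence keep only the base score `1 - damping`, so they rank last.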

### Code sample for "Create and Refine" strategy


Employ a feedback loop to improve the response:
- Human feedback: Collect ratings or suggestions from human evaluators.
- Model-based feedback: Use the language model to identify potential issues, such as factual inaccuracies or a lack of coherence.

Then refine the response:
- Revise the response by incorporating new information from the retrieved documents or by modifying the generation process.
- Repeat the refinement process until the desired level of quality is reached.

The code below implements the document-driven part of this loop. The first retrieved node produces an initial answer, and each subsequent node is used to refine it.
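The feedback loop described above can be sketched in plain Python. This is a minimal, hypothetical sketch and not a LlamaIndex or LangChain API. `critique` and `revise` stand in for whatever produces feedback (a human reviewer or an LLM check) and whatever applies it. The loop stops when the critique reports no issues or after `max_rounds` revisions. The toy critique and revise functions only demonstrate the control flow.

```python
from typing import Callable, List, Tuple


def refine_with_feedback(
    draft: str,
    critique: Callable[[str], List[str]],
    revise: Callable[[str, List[str]], str],
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Revise `draft` until `critique` reports no issues or `max_rounds` is hit.

    Returns the final response and the number of revisions applied. If the
    limit is reached, the last revision is returned without a final check.
    """
    response = draft
    for round_idx in range(max_rounds):
        issues = critique(response)  # human ratings or a model-based check
        if not issues:
            return response, round_idx
        response = revise(response, issues)  # apply the feedback
    return response, max_rounds


# Toy stand-ins for illustration only: require a citation marker.
def toy_critique(text: str) -> List[str]:
    return [] if "[source]" in text else ["missing source citation"]


def toy_revise(text: str, issues: List[str]) -> str:
    return text + " [source]"


final, rounds = refine_with_feedback("Draft answer.", toy_critique, toy_revise)
```

Here the first critique flags the missing marker and one revision fixes it, so `rounds` is 1. Capping the rounds matters with model-based feedback, because an LLM critic may never report zero issues.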

In [2]:
%pip install llama-index-readers-file pymupdf
%pip install llama-index-vector-stores-pinecone
%pip install llama-index-llms-openai



In [3]:
%pip install llama-index


In [4]:
%pip install --upgrade langchain

In [5]:
%pip install langchain-community langchain-text-splitters pypdf

In [6]:
from langchain_community.document_loaders import PyPDFLoader

In [None]:
loader = PyPDFLoader("Benchmark-GLUE-data-pdf.pdf")
documents = loader.load()

In [None]:

import re

from langchain_core.documents import Document

# Function to preprocess the text
def preprocess_text(text):
    # 1. Remove extra whitespaces
    text = re.sub(r'\s+', ' ', text).strip()
    
    # 2. Remove emails
    text = re.sub(r'\S+@\S+', '', text)
    
    # 3. Remove special characters (except basic punctuation)
    text = re.sub(r'[^a-zA-Z0-9.,;:\'"\s-]', '', text)
    
    # 4. Optionally remove standalone numbers (disabled here)
    # text = re.sub(r'\b\d+\b', '', text)
    
    # 5. Convert to lowercase for uniformity
    text = text.lower()
    
    # 6. Remove headers/footers if present
    text = re.sub(r'published as a conference paper.*?iclr \d{4}', '', text, flags=re.IGNORECASE)
    
    # 7. Collapse whitespace left behind by the removals and return the cleaned text
    return re.sub(r'\s+', ' ', text).strip()

# Preprocess each document's content
preprocessed_documents = [
    Document(metadata=doc.metadata, page_content=preprocess_text(doc.page_content))
    for doc in documents
]

# Output the preprocessed documents
for idx, doc in enumerate(preprocessed_documents):
    print(f"Document {idx + 1}")
    print(f"Metadata: {doc.metadata}")
    print(f"Content: {doc.page_content[:500]}")  # Print the first 500 characters of content
    print("-" * 40)

In [None]:

from langchain_text_splitters import CharacterTextSplitter

# Define the character splitter
text_splitter = CharacterTextSplitter(
    separator=" ",  # Split on spaces
    chunk_size=300,  # Maximum chunk size, in characters (len() is the default length function)
    chunk_overlap=50  # Characters shared by consecutive chunks to preserve context
)

# Apply the splitter to preprocessed documents
split_documents = []
for doc in preprocessed_documents:
    chunks = text_splitter.split_text(doc.page_content)
    for chunk in chunks:
        # Create new Document objects for each chunk
        split_documents.append(Document(metadata=doc.metadata, page_content=chunk))

# Output the split documents
for idx, doc in enumerate(split_documents[:5]):  # Display only the first 5 chunks
    print(f"Chunk {idx + 1}")
    print(f"Metadata: {doc.metadata}")
    print(f"Content: {doc.page_content[:200]}")  # Print the first 200 characters
    print("-" * 40)

In [None]:
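from llama_index.llms.openai import OpenAI
from llama_index.core import PromptTemplate
# LLM for both synthesis strategies: llama-index's default OpenAI model, key read from OPENAI_API_KEY
llm = OpenAI()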

In [None]:
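# QA prompt for the first node (assumed wording, following the LlamaIndex text-QA style)
qa_prompt = PromptTemplate(
    """\
Context information is below.
---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {query_str}
Answer: \
"""
)

refine_prompt = PromptTemplate(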
    """\
The original query is as follows: {query_str}
We have provided an existing answer: {existing_answer}
We have the opportunity to refine the existing answer \
(only if needed) with some more context below.
------------
{context_str}
------------
Given the new context, refine the original answer to better answer the query. \
If the context isn't useful, return the original answer.
Refined Answer: \
"""
)

In [None]:
from llama_index.core.response.notebook_utils import display_source_node


def generate_response_cr(
    retrieved_nodes, query_str, qa_prompt, refine_prompt, llm
):
    """Generate a response using create and refine strategy.

    The first node uses the 'QA' prompt.
    All subsequent nodes use the 'refine' prompt.

    """
    cur_response = None
    fmt_prompts = []
    for idx, node in enumerate(retrieved_nodes):
        print(f"[Node {idx}]")
        display_source_node(node, source_length=2000)
        context_str = node.get_content()
        if idx == 0:
            fmt_prompt = qa_prompt.format(
                context_str=context_str, query_str=query_str
            )
        else:
            fmt_prompt = refine_prompt.format(
                context_str=context_str,
                query_str=query_str,
                existing_answer=str(cur_response),
            )

        cur_response = llm.complete(fmt_prompt)
        fmt_prompts.append(fmt_prompt)

    return str(cur_response), fmt_prompts

In [None]:
response, fmt_prompts = generate_response_cr(
    retrieved_nodes, query_str, qa_prompt, refine_prompt, llm
)

In [None]:
print(response)

### Code sample for Hierarchical Summarization Strategy

The hierarchical approach has three parts:

- Hierarchical Document Structure: Represent documents as a tree whose nodes are paragraphs or sections. Assign each node an importance score based on its relevance to the query.
- Top-Down Summarization: Traverse the tree from the root to the leaves, selecting the important nodes. Summarize them with extractive or abstractive techniques.
- Bottom-Up Refinement: Combine child-node summaries into parent-node summaries, removing redundancy and improving coherence.

The code below implements a simplified, bottom-up version. It answers the query against each retrieved node separately, then combines the answers in batches of `num_children` until a single answer remains.
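The three steps above can be illustrated on a small, self-contained tree. This is a hypothetical sketch, not the code used later in this notebook. Relevance is plain keyword overlap with the query. `summarize` is a stand-in that drops duplicates and joins texts, where in practice you would use an extractive or LLM-based summarizer. `Section`, `score`, `summarize`, and `summarize_tree` are illustrative names.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Section:
    """A document-tree node: a section or paragraph plus its children."""
    text: str
    children: List["Section"] = field(default_factory=list)


def score(text: str, query: str) -> float:
    """Importance score: fraction of query words that occur in the text."""
    q = set(re.findall(r"\w+", query.lower()))
    t = set(re.findall(r"\w+", text.lower()))
    return len(q & t) / len(q) if q else 0.0


def summarize(texts: List[str]) -> str:
    """Stand-in summarizer: drop exact duplicates and join the rest.

    Replace with an extractive or abstractive (LLM) summarizer in practice.
    """
    return " ".join(dict.fromkeys(texts))


def summarize_tree(node: Section, query: str, threshold: float = 0.3) -> str:
    # Top-down: descend only into children whose own text scores >= threshold
    child_summaries = [
        summarize_tree(child, query, threshold)
        for child in node.children
        if score(child.text, query) >= threshold
    ]
    own = [node.text] if score(node.text, query) >= threshold else []
    # Bottom-up: merge this node's text with its children's summaries
    return summarize(own + [s for s in child_summaries if s])


root = Section(
    "GLUE benchmark overview",
    [
        Section(
            "GLUE tasks include sentiment and entailment.",
            [Section("GLUE tasks are scored with task-specific metrics.")],
        ),
        Section("Acknowledgements and funding."),
    ],
)
result = summarize_tree(root, "Which GLUE tasks are there?")
```

The "Acknowledgements" branch is pruned because it shares no words with the query. The root's own text scores below the threshold, so only the relevant child subtree contributes. Pruning on a node's own text saves work but can miss relevant children under a generically titled section. Lowering `threshold` trades cost for recall.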

In [None]:
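def combine_results(
    texts,
    query_str,
    qa_prompt,
    llm,
    cur_prompt_list,
    num_children=10,
):
    """Recursively answer the query over batches of `num_children` texts.

    Each batch is answered with the QA prompt; the process repeats on the
    batch answers until a single answer remains.
    """
    if not texts:
        raise ValueError("combine_results needs at least one text")
    if num_children < 2:
        raise ValueError("num_children must be at least 2 for the tree to shrink")

    new_texts = []
    for idx in range(0, len(texts), num_children):
        text_batch = texts[idx : idx + num_children]
        context_str = "\n\n".join(text_batch)
        fmt_qa_prompt = qa_prompt.format(
            context_str=context_str, query_str=query_str
        )
        combined_response = llm.complete(fmt_qa_prompt)
        new_texts.append(str(combined_response))
        cur_prompt_list.append(fmt_qa_prompt)

    if len(new_texts) == 1:
        return new_texts[0]
    # Recurse on the batch answers, passing cur_prompt_list through so every
    # level's prompts are recorded
    return combine_results(
        new_texts,
        query_str,
        qa_prompt,
        llm,
        cur_prompt_list,
        num_children=num_children,
    )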


def generate_response_hs(
    retrieved_nodes, query_str, qa_prompt, llm, num_children=10
):
    """Generate a response using hierarchical summarization strategy.

    Combine num_children nodes hierarchically until we get one root node.

    """
    fmt_prompts = []
    node_responses = []
    for node in retrieved_nodes:
        context_str = node.get_content()
        fmt_qa_prompt = qa_prompt.format(
            context_str=context_str, query_str=query_str
        )
        node_response = llm.complete(fmt_qa_prompt)
        node_responses.append(node_response)
        fmt_prompts.append(fmt_qa_prompt)

    response_txt = combine_results(
        [str(r) for r in node_responses],
        query_str,
        qa_prompt,
        llm,
        fmt_prompts,
        num_children=num_children,
    )

    return response_txt, fmt_prompts
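The two strategies also differ in cost. `generate_response_cr` makes one sequential LLM call per retrieved node. `generate_response_hs` makes one call per node plus one call per batch at every combining level. The hypothetical helpers below count the `llm.complete` calls each function issues for a given number of nodes. This lets you estimate latency and token cost before running them.

```python
import math


def count_calls_cr(num_nodes: int) -> int:
    """generate_response_cr: one sequential llm.complete call per node."""
    return num_nodes


def count_calls_hs(num_nodes: int, num_children: int = 10) -> int:
    """generate_response_hs: one call per node, then one per batch per level."""
    if num_nodes < 1 or num_children < 2:
        raise ValueError("need num_nodes >= 1 and num_children >= 2")
    calls = num_nodes  # per-node answers
    texts = num_nodes
    while True:
        batches = math.ceil(texts / num_children)  # combine_results calls
        calls += batches
        if batches == 1:
            return calls
        texts = batches


print(count_calls_cr(25), count_calls_hs(25))  # 25 29
```

With 25 nodes and `num_children=10`, hierarchical summarization makes 25 per-node calls, then 3 batch calls, then 1 final call. Those 25 per-node calls are independent, so they could run concurrently (the code above runs them in a loop). Create-and-refine calls must run one after another, because each one depends on the previous answer.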

In [None]:
response, fmt_prompts = generate_response_hs(
    retrieved_nodes, query_str, qa_prompt, llm
)

In [None]:
print(str(response))