## Presteps to Load llama3.2 On Colab

In [None]:
import tensorflow as tf
from psutil import virtual_memory

# Check GPU
gpu_info = tf.config.list_physical_devices('GPU')
print(f"GPU Info: {gpu_info}")

# Check RAM
ram_info = virtual_memory()
print(f"Total RAM: {ram_info.total / (1024**3)} GB")

GPU Info: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
Total RAM: 12.67477035522461 GB


In [None]:
!sudo apt-get install -y pciutils
!curl -fsSL https://ollama.com/install.sh | sh # download ollama api
from IPython.display import clear_output

# Create a Python script to start the Ollama API server in a separate thread

import os
import threading
import subprocess
import requests
import json

def ollama():
    os.environ['OLLAMA_HOST'] = '0.0.0.0:11434'
    os.environ['OLLAMA_ORIGINS'] = '*'
    subprocess.Popen(["ollama", "serve"])

ollama_thread = threading.Thread(target=ollama)
ollama_thread.start()


Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
pciutils is already the newest version (1:3.7.0-6).
0 upgraded, 0 newly installed, 0 to remove and 49 not upgraded.
>>> Cleaning up old version at /usr/local/lib/ollama
>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
############################################################################################# 100.0%
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> NVIDIA GPU installed.
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.


In [None]:
from IPython.display import clear_output
!ollama pull llama3.2:3b  & ollama pull nomic-embed-text

pulling manifest 
pulling dde5aa3fc5ff... 100% ▕▏ 2.0 GB                         
pulling 966de95ca8a6... 100% ▕▏ 1.4 KB                         
pulling fcc5a6bec9da... 100% ▕▏ 7.7 KB                         
pulling a70ff7e570d9... 100% ▕▏ 6.0 KB                         
pulling 56bb8bd477a5... 100% ▕▏   96 B                         
pulling 34bb5ab01051... 100% ▕▏  561 B                         
verifying sha256 digest 
writing manifest 
success 
pulling manifest 
pulling 970aa74c0a90...   0% ▕▏    0 B/274 MB

## Presteps to Load llama3.2 Locally

**Hardware Requirements** <br>
**CPU**: Multicore processor<br>
**RAM**: Minimum of 16 GB recommended<br>
**GPU**: NVIDIA RTX series (for optimal performance), at least 8 GB VRAM<br>

**Step 1**:<br>
Download Ollama for your operating system from this site:<br>
https://ollama.com/download/linux<br>
<br>
**Step 2**:<br>
Open your terminal.<br>
<br>
**Step 3**:<br>
Run the following commands in your terminal:<br>
\$ ollama serve<br>
\$ ollama pull llama3.2:3b  & ollama pull nomic-embed-text<br>

## Load LlaMA3.2

In [None]:
!pip install -r "/content/RAG requirements.txt"



In [None]:
from langchain_community.llms import Ollama
from langchain_community.embeddings import OllamaEmbeddings

MODEL = "llama3.2:3b"

# Initialize the Llama model
model = Ollama(model=MODEL)

# Create an embedding model
embeddings = OllamaEmbeddings(model="nomic-embed-text")

pulling manifest 
pulling dde5aa3fc5ff... 100% ▕▏ 2.0 GB                         
pulling 966de95ca8a6... 100% ▕▏ 1.4 KB                         
pulling fcc5a6bec9da... 100% ▕▏ 7.7 KB                         
pulling a70ff7e570d9... 100% ▕▏ 6.0 KB                         
pulling 56bb8bd477a5... 100% ▕▏   96 B                         
pulling 34bb5ab01051... 100% ▕▏  561 B                         
verifying sha256 digest 
writing manifest 
success 
pulling manifest 
pulling 970aa74c0a90... 100% ▕▏ 274 MB                         
pulling c71d239df917... 100% ▕▏  11 KB                         
pulling ce4a164fc046... 100% ▕▏   17 B                         
pulling 31df23ea7daa... 100% ▕▏  420 B                         
verifying sha256 dig

In [None]:
print(model.invoke("Hi. Are you LlaMA, the language model?"))

Hello! Yes, I'm an instance of Llama, a large language model developed by Meta. I'm designed to process and generate human-like text based on the input I receive. How can I help you today?


## Part1 Standard RAG

In [None]:
from langchain import hub
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough, RunnableMap
from chromadb.errors import InvalidDimensionException

#### INDEXING ####

loader = PyPDFLoader("/content/RAG_survey.pdf")
docs = loader.load()

# Split
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
# text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=50)
splits = text_splitter.split_documents(docs)

# Embed
## NOTE: run Chroma().delete_collection() before building the Chroma vectorstore
## to delete previously loaded documents.
Chroma().delete_collection()
vectorstore = Chroma.from_documents(documents = splits, embedding=embeddings)

retriever = vectorstore.as_retriever()

# Prompt
prompt = hub.pull("rlm/rag-prompt")

# LLM
llm = model

### (a) Chain the Components:

In [None]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


# Parser
output_parser = StrOutputParser()

# Chain

rag_chain = (
    RunnablePassthrough.assign(
        # Retrieve documents for the question and format them with format_docs (defined above)
        context=(lambda x: x["question"]) | retriever | format_docs
    ) |                 # "|" chains the components together
    RunnableMap({
        "context": lambda x: x['context'],
        "question": lambda x: x['question']
    }) |
    (lambda x: f"Context:\n{x['context']}\n\nQuestion:\n{x['question']}") |
    llm |
    output_parser
)

In [None]:
# The chain fills in "context" itself, so only the question is passed in
rag_chain.invoke({"question" : "what is this paper about?"})
rag_chain.invoke({"question" : "Is it difficult to implement RAG ?"})

'The text does not explicitly state that implementing RAG is difficult. However, it mentions some challenges that RAG currently faces (Section VII), such as:\n\n* The need for continuous knowledge updates\n* Integration of domain-specific information\n* Wide adoption of ChatGPT\n\nIt also discusses the difficulties in addressing these challenges and points out prospective avenues for research and development.\n\nAdditionally, the text mentions that there are different versions of RAG paradigms (Naive RAG, Advanced RAG, Modular RAG), which might imply that implementing RAG can be complex due to the need to understand and integrate these different components.'

### (b) Explain TextSplitter Settings

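Discussion: The text splitter used in this assignment is RecursiveCharacterTextSplitter. Because a language model can only accept a limited number of tokens, documents have to be preprocessed first. The text splitter first splits the text into pieces on separator characters, then merges those pieces into chunks up to the chunk size. If adjacent chunks do not overlap at all, important context at the chunk boundaries may be lost, so a chunk overlap can be set to preserve it.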


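The initial chunk size is 500 with a chunk overlap of 100, so each chunk is up to 500 characters long and adjacent chunks share up to 100 characters. Asking what the RAG paper is about then gives the following answer: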
"This paper appears to be about the Reader-Actor-Generator (RAG) system, specifically its indexing and retrieval phases. It explains how the RAG system processes documents, encodes them into vectors, and retrieves relevant chunks based on semantic similarity scores. The paper seems to focus on optimizing the indexing phase to improve the efficiency of the system's performance in answering questions."


Now the chunk size is changed to 100 and the chunk overlap to 50. The same question gives the following answer:

Based on the provided snippet, it appears that this paper is discussing vector databases and their application in information retrieval (IR). The text mentions indexing, parallel search capabilities, and diversifying results for different query perspectives.

Given the title of the PDF, "RAG Survey", I would venture to guess that this paper is a survey or overview of the state-of-the-art in vector-based systems, possibly covering aspects such as:

1. Vector databases (e.g., Faiss, Annoy)
2. Indexing and search techniques
3. Applications in IR, NLP, and other areas

However, without reading the full text, I can only make an educated guess about the specific focus of the paper.

The answer now contains more uncertainty and the summary is wordier, so a suitable chunk size and overlap are essential for analyzing the text.
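To make the effect of chunk size and chunk overlap concrete, here is a minimal sketch of a fixed-window character chunker. The helper `chunk_with_overlap` is hypothetical and much simpler than RecursiveCharacterTextSplitter, which prefers to split at separators such as paragraph and line breaks; it only illustrates how neighbouring chunks share characters.

```python
def chunk_with_overlap(text, chunk_size, chunk_overlap):
    """Split text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_with_overlap(sample, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
```

With a smaller chunk size each chunk carries less surrounding context, which is one plausible reason the answer with chunk size 100 is less certain than the one with chunk size 500.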




### (c) Experiment with Retriever Settings

1. Change the number of retrieved documents from the default of 4 to 5

In [None]:
## TODO: Try some different settings for the retriever and output some examples
## you can change the question if you want
## you can duplicate this cell to output different examples
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
retrieved_docs = retriever.invoke("what is this paper about?")
for doc in retrieved_docs:
    print()
    print(doc)



page_content='The paper unfolds as follows: Section II introduces the
main concept and current paradigms of RAG. The following
three sections explore core components—“Retrieval”, “Gen-
eration” and “Augmentation”, respectively. Section III focuses
on optimization methods in retrieval,including indexing, query
and embedding optimization. Section IV concentrates on post-
retrieval process and LLM fine-tuning in generation. Section V
analyzes the three augmentation processes. Section VI focuses' metadata={'page': 1, 'source': '/content/RAG_survey.pdf'}

page_content='analyzes the three augmentation processes. Section VI focuses
on RAG’s downstream tasks and evaluation system. Sec-
tion VII mainly discusses the challenges that RAG currently
faces and its future development directions. At last, the paper
concludes in Section VIII.
II. O VERVIEW OF RAG
A typical application of RAG is illustrated in Figure 2.
Here, a user poses a question to ChatGPT about a recent,
widely discussed news. Giv

2. Retrieve documents using a similarity score threshold



In [None]:
## TODO: Try some different settings for the retriever and output some examples
## you can change the question if you want
## you can duplicate this cell to output different examples
retriever = vectorstore.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={
        "k": 5,  # retrieve at most 5 documents
        "score_threshold": 0.7  # similarity threshold of 0.7
    }
)
# NOTE: this call queries the vectorstore directly, so the retriever's k and
# score_threshold are not applied; it returns (Document, score) tuples.
retrieved_docs = vectorstore.similarity_search_with_score("what is this paper about?")
for doc in retrieved_docs:
    print()
    print(doc)


(Document(metadata={'page': 1, 'source': '/content/RAG_survey.pdf'}, page_content='The paper unfolds as follows: Section II introduces the\nmain concept and current paradigms of RAG. The following\nthree sections explore core components—“Retrieval”, “Gen-\neration” and “Augmentation”, respectively. Section III focuses\non optimization methods in retrieval,including indexing, query\nand embedding optimization. Section IV concentrates on post-\nretrieval process and LLM fine-tuning in generation. Section V\nanalyzes the three augmentation processes. Section VI focuses'), 326.7880554199219)

(Document(metadata={'page': 1, 'source': '/content/RAG_survey.pdf'}, page_content='analyzes the three augmentation processes. Section VI focuses\non RAG’s downstream tasks and evaluation system. Sec-\ntion VII mainly discusses the challenges that RAG currently\nfaces and its future development directions. At last, the paper\nconcludes in Section VIII.\nII. O VERVIEW OF RAG\nA typical application of R

3. Retrieve using Maximal Marginal Relevance (MMR)

In [None]:
## TODO: Try some different settings for the retriever and output some examples
## you can change the question if you want
## you can duplicate this cell to output different examples
retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={
        "k": 5,
        "fetch_k": 10,  # fetch 10 candidate documents before selecting
        "lambda_mult": 0.5  # balance similarity against diversity
    }
)
retrieved_docs = retriever.invoke("what is this paper about?")
for doc in retrieved_docs:
    print()
    print(doc)


page_content='The paper unfolds as follows: Section II introduces the
main concept and current paradigms of RAG. The following
three sections explore core components—“Retrieval”, “Gen-
eration” and “Augmentation”, respectively. Section III focuses
on optimization methods in retrieval,including indexing, query
and embedding optimization. Section IV concentrates on post-
retrieval process and LLM fine-tuning in generation. Section V
analyzes the three augmentation processes. Section VI focuses' metadata={'page': 1, 'source': '/content/RAG_survey.pdf'}

page_content='analyzes the three augmentation processes. Section VI focuses
on RAG’s downstream tasks and evaluation system. Sec-
tion VII mainly discusses the challenges that RAG currently
faces and its future development directions. At last, the paper
concludes in Section VIII.
II. O VERVIEW OF RAG
A typical application of RAG is illustrated in Figure 2.
Here, a user poses a question to ChatGPT about a recent,
widely discussed news. Giv

Discussion:

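Three ways of retrieving documents are tried here:

1. Retrieve 5 documents by similarity score: this simply takes the five documents with the highest similarity ranking. Because it always returns five documents, when all similarity scores are low it may return five poorly matching results.
2. Similarity score threshold: putting a threshold on the similarity score filters out low-quality matches. It no longer necessarily returns five documents, only those that reach the threshold, which keeps the results closely related to the query.
3. MMR: balances diversity and accuracy. It also returns five documents, but it first builds a candidate list (10 here, via fetch_k) and then uses the lambda_mult parameter to decide whether to favor accuracy, diversity, or both. The value 0.5 used here balances the two; 0 considers only diversity and 1 only accuracy.


Different scenarios call for different settings. If only similarity matters, use the first approach and set k. If the database contains many irrelevant documents, use the second approach and filter out unnecessary information with a score threshold. If the answer should cover different aspects while avoiding repetition, consider the third approach, MMR.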
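The trade-off controlled by lambda_mult can be illustrated with a small, self-contained sketch of MMR selection. This is not the vectorstore's own code: the helper names `cosine` and `mmr_select` and the toy 2-D vectors are assumptions made for illustration. Each step picks the candidate that maximizes `lambda_mult * relevance - (1 - lambda_mult) * redundancy`.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mmr_select(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Greedily pick k document indices, trading relevance against redundancy."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def mmr_score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Redundancy is the highest similarity to any document already selected
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.2]
docs = [
    [1.0, 0.0],   # close to the query
    [1.0, 0.05],  # near-duplicate of the first document
    [0.3, 1.0],   # less relevant, but covers a different direction
]
print(mmr_select(query, docs, k=2, lambda_mult=1.0))  # relevance only
print(mmr_select(query, docs, k=2, lambda_mult=0.5))  # relevance and diversity
```

With lambda_mult=1.0 the two near-duplicate documents are chosen; with 0.5 the second pick switches to the document that points in a different direction.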




## Part2 Multi-Query RAG

### (a) Prompt Template for Multi-Query:

In [None]:
from langchain.prompts import ChatPromptTemplate

## TODO: Please design a prompt template that instructs the language model to respond to questions from multiple perspectives.
template = """
You are tasked with improving the coverage of a search query by generating related queries.
Given the original question, generate multiple related questions that approach the topic from different perspectives.
These perspectives could include:
- Broader or narrower interpretations.
- Alternative phrasing or synonyms.
- Questions targeting specific subtopics or contexts.
- Questions considering different viewpoints or assumptions.

Original question: {question}

Please provide 5 diverse related queries:
1.
2.
3.
4.
5.
"""
prompt_perspectives = ChatPromptTemplate.from_template(template)

In [None]:
generate_queries = (
    prompt_perspectives
    | llm
    | StrOutputParser()
    | (lambda x: x.split("\n"))
)

In [None]:
# You may generate some queries here to see if the queries are diverse enough
question = "What is this paper about?"
generate_queries.invoke({"question": question})

['Here are 5 diverse related questions that approach the topic from different perspectives:',
 '',
 '1. **Broader interpretation:** What topics or subjects might be covered in a paper of this type?',
 '',
 'This question broadens the scope to consider what types of papers might be similar to the one being asked about, rather than focusing solely on its content.',
 '',
 '2. **Alternative phrasing:** Can someone summarize the main points of this paper in 50 words or less?',
 '',
 'This question encourages an alternative way of approaching the topic by asking for a concise summary, which could reveal key takeaways and main ideas.',
 '',
 '3. **Subtopic-specific query:** What specific aspect of the paper does it focus on, such as methodology, results, or conclusions?',
 '',
 'This question targets a specific subtopic within the paper, encouraging the generation of questions that explore a particular facet of the content.',
 '',
 '4. **Alternative viewpoint:** How does this paper relate to 

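After designing the prompt, some queries are generated here to inspect the results. The first question is about how a reader interprets the content; it leans toward a meta-analytic approach and assumes the reader has an academic background. The second question is about finding synonyms across different fields, so its focus differs from the first. Overall, as the last paragraph of the generated output says, these related queries approach the question (What is this paper about?) from different angles, such as reader interpretation, linguistic differences, understanding in the digital age, cultural background, and critical thinking, so they should broaden what gets retrieved.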
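Because the raw model output is split on newlines, blank lines and explanatory sentences would also be passed to the retriever as queries. One possible way to keep only the numbered questions is sketched below; the helper `extract_numbered_queries` and the sample string are hypothetical and not part of the original notebook.

```python
import re


def extract_numbered_queries(raw_output: str) -> list[str]:
    """Keep only lines that start with a number and a dot, e.g. '1. ...'."""
    queries = []
    for line in raw_output.split("\n"):
        match = re.match(r"^\s*\d+\.\s*(.+)$", line)
        if match:
            queries.append(match.group(1).strip())
    return queries


# Shortened, made-up output in the same shape as the model response above
raw = (
    "Here are 5 related questions:\n"
    "\n"
    "1. **Broader interpretation:** What topics might be covered?\n"
    "\n"
    "This question broadens the scope.\n"
    "\n"
    "2. What is the main contribution?\n"
)
queries = extract_numbered_queries(raw)
print(queries)
```

Such a function could replace the `lambda x: x.split("\n")` step in `generate_queries` if the extra lines turn out to hurt retrieval.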


In [None]:
from langchain.load import dumps, loads

def get_unique_union(documents: list[list]):
    """ Unique union of retrieved docs """
    # Flatten list of lists, and convert each Document to string
    flattened_docs = [dumps(doc) for sublist in documents for doc in sublist]
    # Get unique documents
    unique_docs = list(set(flattened_docs))
    # Return
    return [loads(doc) for doc in unique_docs]

# Retrieve
question = "What is this paper about?"
retrieval_chain = generate_queries | retriever.map() | get_unique_union # takes a single question but fans it out into multiple queries
docs = retrieval_chain.invoke({"question":question})
len(docs)

33

### (b) Multi-Query RAG Chain:

In [None]:
from operator import itemgetter
from langchain.chains import LLMChain, SimpleSequentialChain
from langchain.prompts import ChatPromptTemplate
from langchain.schema import Document
# RAG (prompt template used for the final answer)
template = """Answer the following question based on this context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)


# # TODO: Construct a Multi-Query RAG Chain.
# Hint1: use the retrieval_chain in this chain
# Hint2: consider the format of the prompt above and also use it in the chain
multi_query_rag_chain = (

    {"context": retrieval_chain,
    "question": itemgetter("question")}
    | prompt
    | llm
    | StrOutputParser()
)
# Use two different questions to test the quality of the model's answers
# question = "What is this paper about?"  # simply asks what the paper is about
question = "Is it difficult to implement RAG"
answer = multi_query_rag_chain.invoke({"question": question})
print(answer)

Based on the text, it does not appear that implementing RAG is particularly difficult. In fact, the paper highlights "state-of-the-art technologies" and a new evaluation framework for RAG systems, suggesting that research has made significant progress in understanding and improving the framework.

However, there are some challenges mentioned in the paper that may make implementing RAG more complicated than simply following a recipe. For example:

* Addressing the semantic gap between questions and documents
* Handling cases where external knowledge retrieval is necessary but also when it's not
* Adapting to different tasks and domains

Overall, while implementing RAG may require some technical expertise and creativity, there doesn't seem to be an indication that it's inherently difficult.


### (c) Example Comparisons:

In [None]:
## TODO:  show a standard RAG output example alongside a multi-query RAG output example.
# Hint1: You may adjust the question to highlight the advantages of multi-query RAG over standard RAG.

Discussion:

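1.  Standard RAG (single query): when asked what the paper is about, the output of standard RAG already meets the need, so the question "Is it difficult to implement RAG?" was tried next to see how the model responds. The response was: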

  The text does not explicitly state that implementing RAG is difficult. However, it mentions some challenges that RAG currently faces (Section VII), such as:\n\n* The need for continuous knowledge updates\n* Integration of domain-specific information\n* Wide adoption of ChatGPT\n\nIt also discusses the difficulties in addressing these challenges and points out prospective avenues for research and development.\n\nAdditionally, the text mentions that there are different versions of RAG paradigms (Naive RAG, Advanced RAG, Modular RAG), which might imply that implementing RAG can be complex due to the need to understand and integrate these different components.

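  Standard RAG picks up keywords such as "challenge" and mentions the different RAG paradigms at the end, but it does not clearly answer whether implementing RAG is difficult.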

2.  Multi-query RAG: asked the same question, whether implementing RAG is difficult, it answers:

  Based on the text, it does not appear that implementing RAG is particularly difficult. In fact, the paper highlights "state-of-the-art technologies" and a new evaluation framework for RAG systems, suggesting that research has made significant progress in understanding and improving the framework.

  However, there are some challenges mentioned in the paper that may make implementing RAG more complicated than simply following a recipe. For example:

  * Addressing the semantic gap between questions and documents
  * Handling cases where external knowledge retrieval is necessary but also when it's not
  * Adapting to different tasks and domains

  Overall, while implementing RAG may require some technical expertise and creativity, there doesn't seem to be an indication that it's inherently difficult.

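  The first thing to note is that the answer is more concise and does not drift into general remarks about the paper the way standard RAG does. Multi-query RAG tells the user directly that implementing RAG is not particularly difficult and lists the challenges that may come up. Because it handles the user's question from several angles, its answer is more definite and more comprehensive, so it does perform better than standard RAG here.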




## Part3 RAG Fusion

In [None]:
# TODO: Use the same template as Part2
template = """
You are tasked with improving the coverage of a search query by generating related queries.
Given the original question, generate multiple related questions that approach the topic from different perspectives.
These perspectives could include:
- Broader or narrower interpretations.
- Alternative phrasing or synonyms.
- Questions targeting specific subtopics or contexts.
- Questions considering different viewpoints or assumptions.

Original question: {question}

Please provide 5 diverse related queries:
1.
2.
3.
4.
5.
"""

# template = """Answer the following question based on this context:

# {context}

# Question: {question}
# """
prompt_rag_fusion = ChatPromptTemplate.from_template(template)

In [None]:
generate_queries = (
    prompt_rag_fusion
    | llm
    | StrOutputParser()
    | (lambda x: x.split("\n"))
)

In [None]:
question = "What is this paper about?"
generate_queries.invoke(question)

['Here are five diverse related queries that approach the topic from different perspectives:',
 '',
 '1. **Narrower interpretation**: What are some challenges in implementing a Readability Analysis Graph (RAG) for a specific industry or domain?',
 '',
 'This query targets a specific context, such as an industry or domain, which may require more tailored approaches to RAG implementation.',
 '',
 '2. **Alternative phrasing or synonyms**: How do I create a RAG model from scratch using Python, and what are some common pitfalls to avoid?',
 '',
 'This query uses different wording and focuses on the technical implementation details of creating a RAG model using Python.',
 '',
 '3. **Targeting specific subtopics or contexts**: Is there an RAG-based approach that can be applied to sentiment analysis for social media posts, and if so, how does it work?',
 '',
 'This query targets a specific application of RAG (sentiment analysis) in a particular context (social media posts), which may require a

### (a) Implement Reciprocal Rank Fusion (RRF)

In [None]:
def reciprocal_rank_fusion(results: list[list], c=60):
    """ Reciprocal rank fusion that takes multiple lists of ranked documents
        and an optional constant c used in the RRF formula """

    # Initialize a dictionary to hold fused scores for each unique document
    fused_scores = {}

    # Iterate through each list of ranked documents
    for docs in results:
        # Iterate through each document in the list, with its rank (position in the list)
        for rank, doc in enumerate(docs):
            ## TODO:  Implement Reciprocal Rank Fusion here
            # Serialize the Document so it can be used as a dict key (Document is unhashable)
            doc_str = dumps(doc)
            # rank is 0-based here, so the top document in each list contributes 1 / c
            score = 1 / (c + rank)
            if doc_str not in fused_scores:
                fused_scores[doc_str] = 0
            fused_scores[doc_str] += score


    # Sort the documents based on their fused scores in descending order to get the final reranked results
    reranked_results = [
        (loads(doc), score)
        for doc, score in sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)
    ]

    # Return the reranked results as a list of tuples, each containing the document and its fused score
    return reranked_results

retrieval_chain_rag_fusion = generate_queries | retriever.map() | reciprocal_rank_fusion

### (b) RRF Example and c-Value Discussion:

In [None]:
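## TODO: Provide an example showing the documents after re-ranking using RRF.

c_values = [10, 60, 100]
# Retrieve the raw ranked lists (one list per generated query). Passing the output of
# retrieval_chain_rag_fusion here would re-fuse already-fused (document, score) tuples.
ranked_lists = (generate_queries | retriever.map()).invoke(question)
for c in c_values:
    reranked_results = reciprocal_rank_fusion(ranked_lists, c=c)
    print(f"Results for c = {c}:")
    for doc, score in reranked_results:
        print(f"  {doc}: {score:.4f}")
    print()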

Results for c = 10:
  0.016129032258064516: 0.3636
  0.015625: 0.3636
  0.015873015873015872: 0.2727
  0.032266458495966696: 0.1818
  0.016666666666666666: 0.1818
  0.01639344262295082: 0.1818
  page_content='structural and semantic nuances. The initial phase focuses on
the retriever, where contrastive learning is harnessed to refine
the query and document embeddings.
Aligning LLM outputs with human or retriever preferences
through reinforcement learning is a potential approach. For
instance, manually annotating the final generated answers
and then providing feedback through reinforcement learning.
In addition to aligning with human preferences, it is also' metadata={'page': 9, 'source': '/content/RAG_survey.pdf'}: 0.1000
  page_content='In addition to extracting metadata from the original doc-
uments, metadata can also be artificially constructed. For
example, adding summaries of paragraph, as well as intro-
ducing hypothetical questions. This method is also known as
Reverse HyDE. Spe

Discussion:

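Here the question is again what the paper is about, and the RRF algorithm is used to rank the retrieved content. The top-ranked chunks mention keywords such as query, embedding, and LLM, while the lower-ranked chunks cover the application scope of RAG, which drifts somewhat from the answer the question is looking for.

Next, the effect of different c values. Online sources suggest c=60 because it works for most data and can break ties among low-ranked results whose scores differ only slightly, but this is not absolute. Below, c=10, 60, and 100 are compared:

1. c=10: with a small c, the top-ranked retrieved content dominates and lower-ranked content contributes little. The top scores are around 0.36, and several later entries with no displayed content have a score of 0.09.

2. c=60: with c at 60, most scores cluster around 0.01, so both high-ranked and low-ranked content contribute, but the downside is that the gaps are small.

3. c=100: with a large c, lower-ranked results have more influence. Scores become even more uniform and the gaps between them shrink further, which makes the results hard to separate.

In this assignment there is only one source document, so the choice of c has little effect on the retrieved content. With many document sources, c should have a larger effect.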
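The effect of c can be checked on a toy example without any model or vectorstore. The sketch below uses the same scoring rule as `reciprocal_rank_fusion` above (0-based rank, score `1 / (c + rank)`), but on made-up document IDs; the helper `rrf_scores` and the three ranked lists are assumptions for illustration only.

```python
def rrf_scores(ranked_lists, c=60):
    """Fuse several ranked lists of document IDs into one score per ID."""
    scores = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked):  # rank starts at 0
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (c + rank)
    # Highest fused score first
    return dict(sorted(scores.items(), key=lambda item: item[1], reverse=True))


# Three hypothetical query results over four documents
ranked_lists = [
    ["A", "B", "C"],
    ["B", "A", "D"],
    ["B", "C", "A"],
]

ratios = {}
for c in (10, 60, 100):
    scores = rrf_scores(ranked_lists, c=c)
    ratios[c] = max(scores.values()) / min(scores.values())
    rounded = {doc_id: round(score, 4) for doc_id, score in scores.items()}
    print(f"c={c}: {rounded}  top/bottom ratio={ratios[c]:.2f}")
```

In this example the ordering stays the same for all three values, while the ratio between the highest and lowest score shrinks as c grows, which matches the observation that larger c values make the scores more uniform.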


### (c) RAG Fusion Chain:

In [None]:
# RAG
template = """Answer the following question based on this context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)


## TODO: Implement the RAG Fusion chain
# Hint1: use the retrieval_chain_rag_fusion in this chain
# Hint2: consider the format of the prompt above and also use it in the chain
rag_fusion_chain = (
    {"context": retrieval_chain_rag_fusion,
    "question": itemgetter("question")}
    | prompt
    | llm
    | StrOutputParser()
)
# question = "What is this paper about?"
question = "Is it difficult to implement RAG ?"

rag_fusion_chain.invoke({"question":question})

'The text does not explicitly state that implementing RAG is difficult, but it does mention some challenges and limitations. For example, it states that including irrelevant documents can unexpectedly increase accuracy, which may require specialized strategies for integration with language generation models. Additionally, there is a notable paucity of research dedicated to evaluating the distinct characteristics of RAG models.\n\nHowever, the text also mentions that combining RAG with fine-tuning is emerging as a leading approach, which suggests that implementing RAG can be done effectively with the right techniques and strategies.\n\nOverall, while there may be some challenges to implementing RAG, it does not seem to be an insurmountable difficulty. With careful consideration of its strengths and limitations, and the development of specialized strategies for integration with language generation models, it appears that implementing RAG is feasible.'

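Originally the first question put to the RAG fusion chain was also what the paper is about, but the answer differed little from standard RAG, so the question was changed to "Is it difficult to implement RAG ?". The answer reads more naturally than the multi-query RAG answer above: it states directly that the paper does not explicitly say whether implementing RAG is difficult and instead points to the challenges and limitations. This shows that the RAG fusion chain clearly outperforms standard RAG on questions that have no clear-cut answer. For a plain summary of the paper, the three approaches (standard RAG, multi-query, RAG fusion chain) perform about the same, but for open-ended questions the RAG fusion chain is still the recommended choice.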