<a href="https://colab.research.google.com/github/lollipop6370/ML2025/blob/main/MLhw1_RAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ML2025 Homework 1 - Retrieval Augmented Generation with Agents

## Environment Setup

First, we will mount your own Google Drive and change the working directory.

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
# Change the working directory to somewhere in your Google Drive.
# You can find the path by right-clicking the folder.
%cd /content/drive/MyDrive/ML

/content/drive/MyDrive/ML


In this section, we install the necessary Python packages and download the weights of the quantized (Q8_0) version of LLaMA 3.1 8B, as well as the dataset (public.txt and private.txt). Note that the model weights file is around 8 GB. If you are using your Google Drive as the working directory, make sure you have enough space for the model.
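If you prefer to keep the download logic in plain Python rather than shell `!wget` calls, the same skip-if-present check can be sketched with the standard library. This is a minimal sketch and not part of the assignment code. `download_if_missing` is a hypothetical helper name. Writing to a temporary `.part` file first is an added assumption: it keeps an interrupted download of the ~8 GB weights from being mistaken for a complete file on the next run.

```python
import urllib.request
from pathlib import Path


def download_if_missing(url: str, dest: str) -> bool:
    """Download `url` to `dest` unless `dest` already exists (hypothetical helper).

    Returns True if a download happened, False if the file was already present.
    """
    target = Path(dest)
    if target.exists():
        # Same idea as the notebook's `if not Path(...).exists()` guards.
        return False
    # Download under a temporary name, then rename, so a partial file never
    # sits at `dest` and gets skipped on a later run.
    tmp = target.with_name(target.name + ".part")
    urllib.request.urlretrieve(url, tmp)
    tmp.replace(target)
    return True
```

For example, `download_if_missing('https://www.csie.ntu.edu.tw/~ulin/public.txt', './public.txt')` would fetch the public question set only if it is not already in the working directory.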

In [4]:
#!python3 -m pip install --no-cache-dir llama-cpp-python==0.3.4 --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu122
#!python3 -m pip install googlesearch-python bs4 charset-normalizer requests-html lxml_html_clean

# Install all packages in one go and upgrade websockets
!python3 -m pip install --upgrade websockets llama-cpp-python==0.3.4 googlesearch-python bs4 charset-normalizer requests-html lxml_html_clean --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu122

from pathlib import Path
if not Path('./Meta-Llama-3.1-8B-Instruct-Q8_0.gguf').exists():
    !wget https://huggingface.co/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/resolve/main/Meta-Llama-3.1-8B-Instruct-Q8_0.gguf
if not Path('./public.txt').exists():
    !wget https://www.csie.ntu.edu.tw/~ulin/public.txt
if not Path('./private.txt').exists():
    !wget https://www.csie.ntu.edu.tw/~ulin/private.txt

Looking in indexes: https://pypi.org/simple, https://abetlen.github.io/llama-cpp-python/whl/cu122
Collecting websockets
  Downloading websockets-15.0.1-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.8 kB)


In [5]:
import torch
if not torch.cuda.is_available():
    raise Exception('You are not using the GPU runtime. Change it first or you will suffer from the super slow inference speed!')
else:
    print('You are good to go!')

You are good to go!


## Prepare the LLM and LLM utility function

By default, we use the quantized version of LLaMA 3.1 8B. You can get full marks on this homework with the provided LLM and LLM utility function, but you are also welcome to try out other LLMs.

The following code block first loads the downloaded model weights onto the GPU.
It then defines the generate_response() function, which makes it easier to get a response from the LLM.

You can safely ignore the "llama_new_context_with_model: n_ctx_per_seq (16384) < n_ctx_train (131072) -- the full capacity of the model will not be utilized" warning.
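The context window (n_ctx) is measured in tokens, not characters. A length cap expressed in characters is therefore only a rough proxy, especially for Chinese text. Below is a minimal sketch of token-based truncation. The tokenizer is passed in as a function so the sketch runs without the model. It assumes that llama-cpp-python's `Llama.tokenize()` (which takes bytes) and `Llama.detokenize()` (which returns bytes) are suitable for this. `truncate_to_tokens` and `truncate_messages` are hypothetical names.

```python
from typing import Callable, Dict, List


def truncate_to_tokens(
    text: str,
    max_tokens: int,
    tokenize: Callable[[bytes], List[int]],
    detokenize: Callable[[List[int]], bytes],
) -> str:
    """Keep at most `max_tokens` tokens of `text`."""
    tokens = tokenize(text.encode("utf-8"))
    if len(tokens) <= max_tokens:
        return text
    # errors="ignore" drops a multi-byte character (e.g. a Chinese character)
    # that the cut may have split in half.
    return detokenize(tokens[:max_tokens]).decode("utf-8", errors="ignore")


def truncate_messages(
    messages: List[Dict[str, str]],
    max_tokens_per_message: int,
    tokenize: Callable[[bytes], List[int]],
    detokenize: Callable[[List[int]], bytes],
) -> List[Dict[str, str]]:
    """Return copies of chat messages whose `content` fits a per-message token budget."""
    return [
        {**m, "content": truncate_to_tokens(m["content"], max_tokens_per_message, tokenize, detokenize)}
        for m in messages
    ]
```

In this notebook, one might call `truncate_messages(messages, 4096, lambda b: llama3.tokenize(b, add_bos=False), llama3.detokenize)` before `create_chat_completion()`. A per-message budget is a simplification. The actual constraint is that all messages together, plus the chat template and the `max_tokens` of output, must fit within `n_ctx`.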

In [6]:
from llama_cpp import Llama

# Load the model onto GPU
llama3 = Llama(
    "./Meta-Llama-3.1-8B-Instruct-Q8_0.gguf",
    verbose=False,
    n_gpu_layers=-1,
    n_ctx=16384,    # This argument is how many tokens the model can take. The longer the better, but it will consume more memory. 16384 is a proper value for a GPU with 16GB VRAM.
)

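from typing import Dict, List

def generate_response(_model: Llama, _messages: List[Dict[str, str]]) -> str:
    '''
    This function runs chat-completion inference on the model with the given messages.
    '''
    # Cap each message's content length (in characters, a rough proxy for tokens) so long inputs are less likely to overflow n_ctx.
    _messages = [{**m, "content": m["content"][:16383]} for m in _messages]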
    _output = _model.create_chat_completion(
        _messages,
        stop=["<|eot_id|>", "<|end_of_text|>"],
        max_tokens=512,    # This argument is how many tokens the model can generate, you can change it and observe the differences.
        temperature=0,      # This argument is the randomness of the model. 0 means no randomness. You will get the same result with the same input every time. You can try to set it to different values.
        repeat_penalty=2.0,     # Penalty for repeating recently generated tokens; values above 1.0 discourage repetition.
    )["choices"][0]["message"]["content"]
    return _output

llama_new_context_with_model: n_ctx_per_seq (16384) < n_ctx_train (131072) -- the full capacity of the model will not be utilized


## Search Tool

The TA has implemented a search tool that searches Google for the given keywords. You can use it to find **web pages** relevant to the given question and integrate it into the pipelines in the following sections.
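The tool's post-processing drops pages that failed to download, extracts their text, removes all whitespace, and keeps the first n_results. This can be sketched using only the standard library. Here, Python's `html.parser.HTMLParser` stands in for BeautifulSoup. Unlike `get_text()`, it deliberately skips `<script>` and `<style>` contents. The UTF-8 encoding check is omitted. `html_to_compact_text` and `pages_to_texts` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List, Optional


class _TextCollector(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so "&amp;" arrives as "&"
        self.parts: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_compact_text(html: str) -> str:
    parser = _TextCollector()
    parser.feed(html)
    parser.close()  # flush any buffered text
    # Same whitespace handling as the notebook: split on any whitespace, join with ''.
    return ''.join(''.join(parser.parts).split())


def pages_to_texts(pages: List[Optional[str]], n_results: int) -> List[str]:
    """Drop failed downloads (None), extract compact text, keep the first n_results."""
    texts = [html_to_compact_text(p) for p in pages if p is not None]
    return texts[:n_results]
```

Stripping every space suits Chinese pages but merges English words (e.g. "Hello world" becomes "Helloworld"). This trade-off is inherited from the original tool.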

In [7]:
from typing import List
from googlesearch import search as _search
from bs4 import BeautifulSoup
from charset_normalizer import detect
import asyncio
from requests_html import AsyncHTMLSession
import urllib3
urllib3.disable_warnings()

async def worker(s:AsyncHTMLSession, url:str):
    try:
        header_response = await asyncio.wait_for(s.head(url, verify=False), timeout=10)
        if 'text/html' not in header_response.headers.get('Content-Type', ''):
            return None
        r = await asyncio.wait_for(s.get(url, verify=False), timeout=10)
        return r.text
    except Exception:  # e.g. timeouts, connection errors, SSL errors
        return None

async def get_htmls(urls):
    session = AsyncHTMLSession()
    tasks = (worker(session, url) for url in urls)
    return await asyncio.gather(*tasks)

async def search(keyword: str, n_results: int=3) -> List[str]:
    '''
    This function will search the keyword and return the text content in the first n_results web pages.

    Warning: You may run into HTTP 429 (Too Many Requests) errors if you search too many times within a short period. This is unavoidable, so request more results at once at your own risk.
    Google does not publish its rate limit, so there is not much we can do other than change the IP address or wait until Google lifts the block (we don't know how long the penalty lasts either).
    '''
    keyword = keyword[:100]
    # First, search the keyword and get the results. Request twice as many results in case some of them are invalid.
    results = list(_search(keyword, n_results * 2, lang="zh", unique=True))
    # Then, get the HTML from the results. Also, the helper function will filter out the non-HTML urls.
    results = await get_htmls(results)
    # Filter out the None values.
    results = [x for x in results if x is not None]
    # Parse the HTML.
    results = [BeautifulSoup(x, 'html.parser') for x in results]
    # Get the text from the HTML and remove all whitespace. Also, keep only pages whose encoding is detected as UTF-8.
    results = [''.join(x.get_text().split()) for x in results if detect(x.encode()).get('encoding') == 'utf-8']
    # Return the first n results.
    return results[:n_results]
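The docstring above warns about HTTP 429 errors. One common mitigation is to retry with exponential backoff. This is an assumption sketched here; the TA's tool does not do it. The exact exception raised on a 429 is not pinned down here, so the sketch retries on any Exception. `retry_with_backoff` is a hypothetical helper.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    func: Callable[[], Awaitable[T]],
    max_attempts: int = 3,
    base_delay: float = 5.0,
) -> T:
    """Await `func()`, retrying with exponential backoff if it raises.

    Waits base_delay, 2*base_delay, 4*base_delay, ... (plus up to 10% random jitter)
    between attempts, and re-raises the last exception after `max_attempts` failures.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return await func()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt)
            await asyncio.sleep(delay + random.uniform(0, 0.1 * delay))
```

Inside the notebook's async code, a call might look like `key_data = await retry_with_backoff(lambda: search(question), max_attempts=3, base_delay=30)`. The lambda matters because each attempt needs a fresh coroutine, and a coroutine object can only be awaited once.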

## Test the LLM inference pipeline

In [8]:
# You can try out different questions here.
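test_question='請問誰是 Taylor Swift？'  # "Who is Taylor Swift?"

messages = [
    # System prompt: "You are LLaMA-3.1-8B, an AI for answering questions. When using Chinese, you answer only in Traditional Chinese."
    {"role": "system", "content": "你是 LLaMA-3.1-8B，是用來回答問題的 AI。使用中文時只會使用繁體中文來回答問題。"},
    {"role": "user", "content": test_question}, # User prompt
]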

print(generate_response(llama3, messages))

泰勒絲（Taylor Swift）是一位美國歌手、詞曲作家和音樂製作人。她出生於1989年，來自田納西州。她的音乐风格从乡村摇滚发展到流行搖擺，並且她被誉为当代最成功的女艺人的之一。

泰勒絲早期在鄉郊小鎮演唱會時開始發展音樂事業，她推出了多張專輯，包括《Taylor Swift》、《Fearless》，以及後來更為知名的大熱作如 《1989》（2014年）、_reputation（）和 _Lover （）。她的歌曲經常探討愛情、友誼及自我成長等主題。

泰勒絲獲得了許多獎項，包括13座格萊美奖，並且是史上最快達到百萬銷量的女藝人之一。


## Agents

The TA has implemented the Agent class (LLMAgent) for you. You can use this class to create agents that interact with the LLM. The class has the following attributes and methods:
- Attributes:
    - role_description: The role of the agent. For example, if you want this agent to be a history expert, you can set the role_description to "You are a history expert. You will only answer questions based on what really happened in the past. Do not generate any answer if you don't have reliable sources.".
    - task_description: The task of the agent. For example, if you want this agent to answer questions only in yes/no, you can set the task_description to "Please answer the following question in yes/no. Explanations are not needed."
    - llm: An identifier of the LLM backend used by the agent.
- Method:
    - inference: This method takes a message (plus an optional question, which is appended to the system prompt) and returns the generated response from the LLM. The message is first formatted into a proper input for the LLM. (This is where you can set global instructions such as "Please speak in a polite manner" or "Please provide a detailed explanation.")
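The hint in the class below suggests separating the task description from the user message with something clearer than a line break. Here is a minimal sketch of that formatting. It assumes a hypothetical `SEPARATOR` string and an English "Question:" label, and places the optional question in the system prompt as `LLMAgent.inference()` does. `build_messages` is a hypothetical name.

```python
from typing import Dict, List

# Hypothetical separator; any marker unlikely to appear in real content would do.
SEPARATOR = "\n\n----------\n\n"


def build_messages(role_description: str, task_description: str,
                   message: str, question: str = "") -> List[Dict[str, str]]:
    """Format one agent call into chat messages for create_chat_completion()."""
    system = role_description
    if question:
        # As in LLMAgent.inference(), the optional question goes into the system prompt.
        system += SEPARATOR + "Question:\n" + question
    # Keep the instruction and the (possibly long) input text visibly apart.
    user = task_description + SEPARATOR + message
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]
```

A distinct separator helps the model tell where the instruction ends. This matters especially when the message is a long scraped web page that may itself contain line breaks and instruction-like text.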

In [9]:
class LLMAgent():
    def __init__(self, role_description: str, task_description: str, llm:str="bartowski/Meta-Llama-3.1-8B-Instruct-GGUF"):
        self.role_description = role_description   # Role means who this agent should act like, e.g. a history expert or a manager.
        self.task_description = task_description    # Task description describes which task this agent should solve.
        self.llm = llm  # LLM indicates which LLM backend this agent is using.
    def inference(self, message:str, question="") -> str:
        if self.llm == 'bartowski/Meta-Llama-3.1-8B-Instruct-GGUF': # If using the default one.
            # TODO: Design the system prompt and user prompt here.
            # Format the messages first.
            messages = [
                {"role": "system", "content": f"{self.role_description}\n{question}"},  # Hint: you may want the agents to speak Traditional Chinese only.
                {"role": "user", "content": f"{self.task_description}\n{message}"}, # Hint: you may want the agents to clearly distinguish the task descriptions and the user messages. A proper separation text rather than a simple line break is recommended.
            ]
            return generate_response(llama3, messages)
        else:
            # TODO: If you want to use LLMs other than the given one, please implement the inference part on your own.
            return ""

TODO: Design the role description and task description for each agent.

In [10]:
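# TODO: Design the role and task description for each agent.
# The prompts are written in Chinese so that the agents reply in Traditional Chinese; an English gloss is given above each agent.

# This agent may help you filter out the irrelevant parts in question descriptions.
# Role: "You are a question-refining expert who rewrites the question precisely. The question must not invent
#   anything and must keep the original meaning. Follow the example format: 'Question: ...'"
# Task: "Please refine the following question: "
question_extraction_agent = LLMAgent(
    role_description="你是提問大師，負責優化問題並精準命題，注意:問題不可以無中生有，要依照原題意思。必須遵循以下範例格式:\n'題目:2025年台灣的牌照稅多少錢?\n題目:3+8=?\n題目:台大進階英文免修門檻要求 TOEFL iBT 達到多少分才能申請?'",
    task_description="請根據以下問題優化: ",
)

# This agent may help you extract the keywords in a question so that the search tool can find more accurate results.
# Role: "You are a search engine expert who extracts the keywords in a question for the search engine.
#   Give only the keywords, without explanation."
# Task: "Please find the keywords in the following text: "
keyword_extraction_agent = LLMAgent(
    role_description="你是搜尋引擎專家，負責找出問題中關鍵字用來給搜尋引擎查找資料。請給出關鍵字就好，不需要多做解釋。",
    task_description="請根據以下敘述找出關鍵字: ",
)

# This agent summarizes each retrieved web page with respect to the question.
# Role: "You are a summarization expert who connects the question with the article and writes a reasonable summary
#   for the question. Think carefully and give a reasonable result; do not repeat the word '總結' (summary)."
# Task: "Please summarize the following text according to the question: "
summarize_agent = LLMAgent(
    role_description="你是總結專家，負責連接題目與文章敘述做出對題目合理的總結。你只需要詳細思考後給合理的結果，不要重複'總結'這兩個字。",
    task_description="請根據題目對以下敘述做總結: ",
)

# This agent is the core component that answers the question.
# Role: "You are LLaMA-3.1-8B, an AI that answers questions. When using Chinese, answer only in Traditional Chinese, never
#   Simplified Chinese; names may be written in English. You will be given summaries of web content about the question,
#   which may be partly wrong. After reading every summary and the question, consolidate them and think carefully about a
#   reasonable answer; if unsure, choose the most likely answer. Answer briefly without explaining how you got it."
# Task: "Please answer the following question:"
qa_agent = LLMAgent(
    role_description="你是 LLaMA-3.1-8B，是用來回答問題的 AI。使用中文時只會使用繁體中文來回答問題，不可以使用簡體中文，遇到名字可以用英文。我會給你一些網路上關於此問題的總結資料，這些資訊可能會有部分錯誤，請看完每一段總結及給定的問題後，統整並詳細思考問題合理的答案，沒信心時就選你認為最可能的答案。只需要簡單回答答案就好，不要描述答案怎麼來的。",
    task_description="請回答以下問題：",
)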

## RAG pipeline

TODO: Implement the RAG pipeline.

Please refer to the homework description slides for hints.

There may also be heuristics not shown in the flow charts, e.g. classifying questions by length, deciding whether a question needs a search at all, or double-checking the answer before returning it to the user. Use your creativity to come up with a better solution!

- Naive approach (simple baseline)

    ![](https://www.csie.ntu.edu.tw/~ulin/naive.png)

- Naive RAG approach (medium baseline)

    ![](https://www.csie.ntu.edu.tw/~ulin/naive_rag.png)

- RAG with agents (strong baseline)

    ![](https://www.csie.ntu.edu.tw/~ulin/rag_agent.png)
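Below is a rough illustration of the medium baseline (naive RAG). It searches once, pastes the retrieved text into the prompt, and asks the LLM. The search and LLM functions are injected so the sketch runs without network access or a GPU. The prompt wording and the character budget are assumptions, not the reference solution. `naive_rag` is a hypothetical name.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def naive_rag(
    question: str,
    search_fn: Callable[[str], Awaitable[List[str]]],
    llm_fn: Callable[[List[Dict[str, str]]], str],
    max_context_chars: int = 8000,
) -> str:
    """Naive RAG: retrieve web text for the question, then answer with it in the prompt."""
    docs = await search_fn(question)
    # Concatenate the retrieved pages and cut them to a rough character budget.
    context = "\n\n".join(docs)[:max_context_chars]
    messages = [
        {"role": "system", "content": "Answer the question using the reference material. Reply in Traditional Chinese."},
        {"role": "user", "content": f"Reference material:\n{context}\n\nQuestion:\n{question}"},
    ]
    return llm_fn(messages)
```

In the notebook, this would be called as `await naive_rag(question, search, lambda m: generate_response(llama3, m))`. Dropping the search step gives the naive approach. The agent-based pipeline below replaces the raw page text with per-page summaries.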

In [11]:
async def pipeline(question: str) -> str:
    # TODO: Implement your pipeline.

    # Question extraction agent: rewrite the raw question into a clean, precise one.
    question = question_extraction_agent.inference(question)
    print(question)

    # Keyword extraction agent (disabled; the refined question is used as the search query instead).
    #keywords = keyword_extraction_agent.inference(question)
    #print("K: ", keywords)

    # Search the web with the refined question.
    key_data = await search(question)
    #print(key_data)

    # Summarize agent: summarize each retrieved page with respect to the question.
    key_answer = []
    print("summarize:\n")
    for data in key_data:
        summarize = summarize_agent.inference(data[:16384], question)
        print(summarize + "\n")
        key_answer.append(summarize)

    # Augment the question by prepending the page summaries.
    augment_question = "\n".join(key_answer) + "\n" + question

    # QA agent
    return qa_agent.inference(augment_question)

## Answer the questions using your pipeline!

Since Colab has usage limits, you might encounter disconnections. The following code saves the answer to each question in a separate file. If you have mounted your Google Drive as instructed, you can simply rerun the whole notebook to resume where you left off.
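The resume-on-rerun behavior comes from skipping any question whose answer file already exists. The sketch below packages the same idea as a reusable async function. Writing to a temporary file and then renaming it is an added assumption: it prevents a disconnect mid-write from leaving a truncated answer that later runs would skip. `answer_all` is a hypothetical name.

```python
from pathlib import Path
from typing import Awaitable, Callable, Iterable, List


async def answer_all(
    questions: Iterable[str],
    start_id: int,
    out_dir: str,
    prefix: str,
    answer_fn: Callable[[str], Awaitable[str]],
) -> List[int]:
    """Answer each question, skipping ids whose output file already exists.

    Returns the ids answered in this run.
    """
    answered = []
    for qid, question in enumerate(questions, start_id):
        path = Path(out_dir) / f"{prefix}_{qid}.txt"
        if path.exists():
            continue  # answered in an earlier run
        answer = (await answer_fn(question)).replace("\n", " ")
        # Write to a temporary file, then rename it into place.
        tmp = path.with_name(path.name + ".tmp")
        tmp.write_text(answer + "\n", encoding="utf-8")
        tmp.replace(path)
        answered.append(qid)
    return answered
```

For the public set, this could be called as `await answer_all(questions, 1, '.', STUDENT_ID, pipeline)`.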

In [12]:
from pathlib import Path

# Fill in your student ID first.
STUDENT_ID = "n96141139"

STUDENT_ID = STUDENT_ID.lower()
with open('./public.txt', 'r') as input_f:
    questions = input_f.readlines()
    questions = [l.strip().split(',')[0] for l in questions]
    for id, question in enumerate(questions, 1):
        if Path(f"./{STUDENT_ID}_{id}.txt").exists():
            continue
        answer = await pipeline(question)
        answer = answer.replace('\n',' ')
        print(id, answer)
        with open(f'./{STUDENT_ID}_{id}.txt', 'w') as output_f:
            print(answer, file=output_f)

with open('./private.txt', 'r') as input_f:
    # Strip the trailing newline so it does not end up in the prompt.
    questions = [l.strip() for l in input_f.readlines()]
    for id, question in enumerate(questions, 31):
        if Path(f"./{STUDENT_ID}_{id}.txt").exists():
            continue
        answer = await pipeline(question)
        answer = answer.replace('\n',' ')
        print(id, answer)
        with open(f'./{STUDENT_ID}_{id}.txt', 'w') as output_f:
            print(answer, file=output_f)

題目:「虎山雄風飛揚」是哪間學校的校歌？
summarize:

題目「虎山雄風飛揚」是哪間學校的校歌？

根據提供的情況，答案為光華國小。

根據題目「虎山雄風飛揚」是哪間學校的校歌？，可以得出以下結論：這是一首與某所學 校相關聯 的音樂作品。

「虎山雄風飛揚」是南投縣光華國小的校歌。

1 光華國小的校歌。
題目:2025年初，NCC透過行政命令規定境外郵購自用產品（如無線鍑盤、滑鼠和藍芽耳機）回台的審查費為多少？
summarize:

根據NCC的公告，自2025年初起，如果境外郵購回台使用自己的產品（如無線鍑盤、滑鼠和藍芽耳機），需要繳交審查費高達750元。

2025年初，NCC透過行政命令規定境外郵購自用產品（如無線鍑盤、滑鼠和藍芽耳機）回台的審查費為750元。

2025年初，NCC透過行政命令規定境外郵購自用產品（如無線鍑盤、滑鼠和藍芽耳機）回台的審查費為750元。

2 750元
題目:第一代 iPhone 是由哪位蘋果 CEO 發表？
summarize:

第一代 iPhone 是由史蒂夫·乔布斯在 2007 年的 Macworld Conference & Expo 上发布，同年六月二十九日正式发售。

第一代 iPhone 是由前蘋果公司 CEO 史蒂夫·乔布斯 發表的。

賈伯斯SteveJobs是蘋果Apple的創辦人，他在2007年MacWorldExpo上首次發表初代iPhone，改變了手機和通訊方式。

3 第一代 iPhone 是由史蒂夫·乔布斯 (Steve Jobs) 發表。
題目:台大進階英文免修門檻要求 TOEFL iBT 達到多少分才能申請？
summarize:

台大進階英文免修門檻要求 TOEFL iBT 達到 72 分才能申請。

台大進階英文免修門檻要求為TOEFL iBT達到78分。

Instagram是一個社交媒體平台，讓用戶可以分享照片和視頻，並與朋友、家人或其他使用者互動。

4 台大進階英文免修門檻要求為TOEFL iBT達到78分。
題目:Rugby Union 中觸地 try 可得幾分？
summarize:

觸地 try 可得 5 分。

根據題目，觸地 try 是 Rugby Union 中的一種得分方式。

這個敘述與題目無關，主要是Reddit網站的錯誤訊息。

5 5 分。

summarize:

Xpark水族館的企鵝寶貝最後被命名為「Tomorin」，這個名字來自日本動畫《BanGDream!It’sMyGO!!!!!》中的角色高松燈（日文發音是 「 Tomori n」），而粉絲們在活動中踴躍參與投票，讓它成功以1.3萬多張選擇成為企鵝寶貝的名字。

這個敘述不是關於Xpark水族館的企鵝寶貝命名，實際上是一則提示使用者更換瀏覽器以便正常訪問網站。

Xpark水族館的企鵝寶貝最後被命名為「Tomorin」。

36 因為粉絲們在活動中踴躍參與投票，讓它成功以1.3萬多張選擇成為企鵝寶貝的名字。
題目:國立臺灣大學物理治療學系的正常修業年限為幾個月？
summarize:

國立臺灣大學物理治療學系的正常修業年限為四個月。

國立臺灣大學物理治療學系的正常修業年限為六個月。

亞洲大學物理治療學系的正常修業年限為四個月。

37 國立臺灣大學物理治療學系的正常修業年限為四個月。
題目:《BanG Dream!》中「呼嘿 嘵」是哪位角色的笑聲習慣？
summarize:

《BanG Dream!》中「呼嘿 嘵」是Rimi Ushigome的笑聲習慣。

《BanG Dream!》中「呼嘿 嘵」是大和麻弥的笑聲習慣。

38 大和麻弥
題目:日本戰國時代被稱為「甲斐之虎」的人物是誰？
summarize:

武田信玄是日本戰國時代的一位大名，為清和源氏的後代。他的外號「甲斐之虎」，所舉“风林火山”（其疾如風，其徐 如 林、侵掠似 火，不動若 山）軍旗，是《孙子兵法》中的典故。他積極開發耕地，克服了當時的問題，並利用金礦的事業引入先進技術。武田信玄重視民政，他制定的“甲州分國法律”是戰爭時代著名的一種地方律令。

他在1541年繼承家督後，便開始進行統一行動，擊敗了多個敵對勢力，並與其他大將結盟。武田信玄還參考《今川假面目錄》、《朝倉敏景十七箇條》，創立了一種新的分國法律——“甲州法度次第”。

在1561年，第四場的會戰爆發了，他成功擊退上杉軍，但損失慘重。武田信玄還參與多個其他大名之間的大規模衝突，並且積極發展自己的勢力。

他於1573年的元龜四歲時病逝，享年五十二岁。在他的遺言中，他要求家臣們嚴守秘密，不要讓敵人知道自己死了。

武田信玄是日本战国时代的大名，为清和源氏的后代，甲斐原野第19世家督、 武 田 氏 第 16 代当主

In [None]:
# Combine the results into one file.
with open(f'./{STUDENT_ID}.txt', 'w') as output_f:
    for id in range(1,91):
        with open(f'./{STUDENT_ID}_{id}.txt', 'r') as input_f:
            answer = input_f.readline().strip()
            print(answer, file=output_f)
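The combining cell raises FileNotFoundError if any per-question file is missing, for example after a Colab disconnect. Running a small check beforehand lists which question ids still need an answer. `missing_answer_ids` is a hypothetical helper; it assumes the same `{STUDENT_ID}_{id}.txt` naming and 90 questions in total.

```python
from pathlib import Path
from typing import List


def missing_answer_ids(student_id: str, total: int = 90, out_dir: str = ".") -> List[int]:
    """Return the question ids in 1..total that have no per-question answer file yet."""
    return [
        i for i in range(1, total + 1)
        if not (Path(out_dir) / f"{student_id}_{i}.txt").exists()
    ]
```

For example, `print(missing_answer_ids(STUDENT_ID))` before combining shows whether the pipeline cell needs to be rerun.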