# ML2025 Homework 1 - Retrieval Augmented Generation with Agents

## Environment Setup

### Step 1: Install Required Packages and Download Model

This stage completes the following tasks:
1. **Install LLaMA Model Support Package**: `llama-cpp-python` for running the quantized LLaMA 3.1 8B model
2. **Install Web Search Related Packages**:
   - `googlesearch-python`: Google Search result scraping (an unofficial library, not the official Google API)
   - `bs4`: BeautifulSoup HTML parsing
   - `charset-normalizer`, `requests-html`, `lxml_html_clean`: Web content processing
3. **Download Model Weights**: The roughly 8 GB quantized model file `Meta-Llama-3.1-8B-Instruct-Q8_0.gguf`
4. **Download Question Datasets**: `public.txt` and `private.txt` containing questions to be answered

**Note**: Model download requires significant time and sufficient storage space.

```python
# Install llama-cpp-python with CUDA 12.2 support
!python3 -m pip install --no-cache-dir llama-cpp-python==0.3.4 --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu122

# Install packages for web search and HTML parsing
!python3 -m pip install googlesearch-python bs4 charset-normalizer requests-html lxml_html_clean

from pathlib import Path

# Download the quantized LLaMA 3.1 8B model file (about 8 GB)
if not Path('./Meta-Llama-3.1-8B-Instruct-Q8_0.gguf').exists():
    !wget https://huggingface.co/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/resolve/main/Meta-Llama-3.1-8B-Instruct-Q8_0.gguf

# Download the public question set
if not Path('./public.txt').exists():
    !wget https://www.csie.ntu.edu.tw/~ulin/public.txt

# Download the private question set
if not Path('./private.txt').exists():
    !wget https://www.csie.ntu.edu.tw/~ulin/private.txt
```

### Step 2: GPU Environment Check

Make sure the runtime uses a GPU; otherwise inference is extremely slow. Even the quantized LLaMA 3.1 8B model is very slow on a CPU.

```python
import torch

# Stop early with an error if no GPU is available
if not torch.cuda.is_available():
    raise Exception('You are not using the GPU runtime. Change it first or you will suffer from the super slow inference speed!')
else:
    print('You are good to go!')
```

## Model Loading and Inference Phase

### Step 3: Load LLaMA Model and Create Inference Function

This stage establishes the core inference capability of the entire system:

1. **Model Loading Configuration**:
   - `n_gpu_layers=-1`: Load all model layers onto GPU
   - `n_ctx=16384`: Set context window to 16K tokens (suitable for 16GB VRAM GPU)
   - `verbose=False`: Disable verbose logging to reduce output

2. **Inference Function Parameter Explanation**:
   - `max_tokens=512`: Limit generation length to avoid overly long responses
   - `temperature=0`: Set to 0 for reproducible results, eliminating randomness
   - `repeat_penalty=2.0`: Prevent model from repeating identical content

**Important**: Context window size directly affects memory usage and needs adjustment based on hardware.

In the following code block, we first load the downloaded LLM weights onto the GPU. Then we implement the `generate_response()` function so that you can get responses from the LLM more easily.

You can ignore the "llama_new_context_with_model: n_ctx_per_seq (16384) < n_ctx_train (131072) -- the full capacity of the model will not be utilized" warning; it only means the model runs with a 16K-token window instead of the 128K-token window it was trained with.

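```python
from llama_cpp import Llama

# Load the LLaMA 3.1 8B model onto the GPU
llama3 = Llama(
    "./Meta-Llama-3.1-8B-Instruct-Q8_0.gguf",  # path to the model file
    verbose=False,    # suppress verbose logging
    n_gpu_layers=-1,  # offload all layers to the GPU (-1 means all)
    n_ctx=16384,      # context window of 16K tokens, suitable for a 16 GB VRAM GPU
)

def generate_response(_model: Llama, _messages: list[dict]) -> str:
    '''
    Generate a response with the LLaMA model.

    Args:
        _model: the LLaMA model instance
        _messages: chat messages, a list of {"role": ..., "content": ...} dicts

    Returns:
        str: the content of the model's response
    '''
    _output = _model.create_chat_completion(
        _messages,
        stop=["<|eot_id|>", "<|end_of_text|>"],  # stop tokens
        max_tokens=512,      # maximum number of generated tokens
        temperature=0,       # 0 = no sampling randomness, reproducible results
        repeat_penalty=2.0,  # repetition penalty: discourages repeating the same content
    )["choices"][0]["message"]["content"]
    return _output
```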
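The note that context size drives memory use can be made concrete with a back-of-the-envelope KV-cache estimate. This sketch assumes Llama 3.1 8B's published architecture (32 layers, 8 key/value heads under grouped-query attention, head dimension 128) and a 16-bit (f16) KV cache, which is llama.cpp's default cache type; it ignores the model weights, compute buffers and any quantized-cache options, so it is a lower bound rather than a measurement.

```python
def kv_cache_bytes(n_ctx: int,
                   n_layers: int = 32,
                   n_kv_heads: int = 8,
                   head_dim: int = 128,
                   bytes_per_value: int = 2) -> int:
    """Estimate KV-cache size: one key and one value vector per layer, per KV head, per token."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value  # 2 = key + value
    return n_ctx * per_token

for n_ctx in (4096, 16384, 131072):
    gib = kv_cache_bytes(n_ctx) / 2**30
    print(f"n_ctx={n_ctx:>6}: ~{gib:.1f} GiB of KV cache")
```

Under these assumptions each token costs 128 KiB, so the `n_ctx=16384` setting above needs about 2 GiB of cache on top of the roughly 8 GB of Q8_0 weights, while the full 131072-token training window would need about 16 GiB for the cache alone, which does not fit on a 16 GB GPU.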

## Web Search Tool Phase

### Step 4: Implement Google Search and Web Content Extraction

This is the **information retrieval core** of the RAG system, responsible for obtaining relevant information from the web:

**Search Process Explanation**:
1. **Keyword Processing**: Limit search term length to avoid overly long queries
2. **Search Strategy**: Request twice as many results as needed, since some pages will fail to load or be filtered out
3. **Asynchronous Processing**: Use `AsyncHTMLSession` to fetch multiple pages concurrently
4. **Content Filtering**:
   - Check Content-Type to ensure HTML format
   - Set a 10-second timeout to avoid hanging on slow websites
   - Skip content that is not UTF-8 encoded so Chinese text is decoded correctly

**Limitations and Considerations**:
- **HTTP 429 Errors**: Google rate-limits searches; excessive use results in temporary blocking
- **Uncontrollable Limits**: Google does not publish its exact rate limits
- **Solutions**: Reduce search frequency or change IP address

The quality of this search tool directly affects the accuracy of the RAG system's answers.

The TA has implemented a search tool for you to search certain keywords using Google Search. You can use this tool to search for the relevant **web pages** for the given question. The search tool can be integrated in the following sections.
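The TA's search implementation itself is not reproduced in this notebook. As a rough illustration of the filtering strategy listed above (over-fetch 2x, 10-second timeout, HTML-only Content-Type, UTF-8-only bodies), here is a minimal, self-contained sketch. The page fetcher is injected as a hypothetical `fetch(url)` coroutine returning `(content_type, body_bytes)`, so the logic can be tried without network access; it is not the TA's code and makes no Google requests.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional, Tuple

# Hypothetical fetcher signature: url -> (Content-Type header, raw body bytes)
Fetcher = Callable[[str], Awaitable[Tuple[str, bytes]]]

def decode_if_usable(content_type: str, body: bytes) -> Optional[str]:
    """Keep only HTML responses whose body decodes cleanly as UTF-8."""
    if "text/html" not in content_type.lower():
        return None
    try:
        return body.decode("utf-8")
    except UnicodeDecodeError:
        return None

async def collect_pages(urls: List[str], fetch: Fetcher,
                        n_results: int, timeout: float = 10.0) -> List[str]:
    """Fetch up to 2 * n_results URLs concurrently; return the first n_results usable pages."""
    async def fetch_one(url: str) -> Optional[str]:
        try:
            content_type, body = await asyncio.wait_for(fetch(url), timeout)
        except Exception:  # timeouts and connection errors count as unusable pages
            return None
        return decode_if_usable(content_type, body)

    candidates = urls[: 2 * n_results]  # over-fetch because some pages will be filtered out
    pages = await asyncio.gather(*(fetch_one(u) for u in candidates))
    return [p for p in pages if p is not None][:n_results]
```

In a notebook, `await collect_pages(urls, fetch, n_results=5)` can be called directly because Jupyter supports top-level `await`. Order is preserved, so earlier (usually higher-ranked) results win when more than `n_results` pages survive.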

## Test the LLM Inference Pipeline

### Step 5: Test Basic Inference Pipeline

Before building a more complex RAG system, test whether basic LLM inference works. This test confirms that:
- The model loads correctly and can run inference
- The output is in Traditional Chinese, as the system prompt requires
- Inference speed is acceptable

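```python
# Test basic LLM inference
test_question = '請問誰是 Taylor Swift？'  # "Who is Taylor Swift?"

# Build the chat messages
messages = [
    # System prompt: "You are LLaMA-3.1-8B, an AI for answering questions. When using Chinese, you answer only in Traditional Chinese."
    {"role": "system", "content": "你是 LLaMA-3.1-8B，是用來回答問題的 AI。使用中文時只會使用繁體中文來回答問題。"},
    {"role": "user", "content": test_question},  # user question
]

print(generate_response(llama3, messages))
```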

## AI Agent Architecture Phase

### Step 6: LLMAgent Class Design Explanation

This stage establishes the foundational architecture of the **multi-agent collaborative system**. The LLMAgent class is the core component of the entire RAG system:

**Agent Design Philosophy**:
- **Role Separation**: Each agent handles specific tasks (question understanding, keyword extraction, Q&A, etc.)
- **Modularity**: Individual agents can be easily replaced or adjusted
- **Scalability**: Additional specialized agents can be added in the future

**Class Attribute Explanation**:
- `role_description`: Defines the agent's identity and expertise domain
- `task_description`: Clearly specifies the specific task the agent needs to complete
- `llm`: Specifies the language model backend to use

**Inference Method Features**:
- **Prompt Engineering**: Places role description and task description in system and user prompts respectively
- **Format Processing**: Ensures input format matches LLaMA's conversation template
- **Extensibility**: Reserves interface to support other LLM models

This design allows us to create specialized agents to handle different stages in the RAG process.

The TA has implemented the Agent class for you. You can use this class to create agents that can interact with the LLM model. The Agent class has the following attributes and methods:
- Attributes:
    - role_description: The role of the agent. For example, if you want this agent to be a history expert, you can set the role_description to "You are a history expert. You will only answer questions based on what really happened in the past. Do not generate any answer if you don't have reliable sources.".
    - task_description: The task of the agent. For example, if you want this agent to answer questions only in yes/no, you can set the task_description to "Please answer the following question in yes/no. Explanations are not needed."
    - llm: Just an indicator of the LLM model used by the agent.
- Method:
    - inference: This method takes a message as input and returns the generated response from the LLM model. The message will first be formatted into proper input for the LLM model. (This is where you can set some global instructions like "Please speak in a polite manner" or "Please provide a detailed explanation".) The generated response will be returned as the output.
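The TA's `LLMAgent` source is not shown in this notebook. Based only on the description above (role description as the system prompt, task description plus the incoming message as the user prompt), a minimal stand-in might look like the following. The class name `AgentSketch`, the injected `generate` callable and the default `llm` label are hypothetical; in this notebook `generate` would wrap `generate_response(llama3, messages)`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Messages = List[Dict[str, str]]

@dataclass
class AgentSketch:
    """Hypothetical stand-in for the TA's LLMAgent, following the attribute list above."""
    role_description: str
    task_description: str
    generate: Callable[[Messages], str]  # e.g. lambda m: generate_response(llama3, m)
    llm: str = "Meta-Llama-3.1-8B-Instruct-Q8_0"  # label only; not used for inference

    def build_messages(self, message: str) -> Messages:
        # Role description -> system prompt; task description + input -> user prompt
        return [
            {"role": "system", "content": self.role_description},
            {"role": "user", "content": f"{self.task_description}\n{message}"},
        ]

    def inference(self, message: str) -> str:
        return self.generate(self.build_messages(message))
```

Because the backend is passed in as a function, switching to another LLM only means supplying a different `generate`, which is one way to meet the extensibility goal described above.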

### Step 7: Design Three Specialized Agents

Based on RAG process requirements, create three agents with distinct responsibilities:

**1. Question Extraction Agent (question_extraction_agent)**
- **Function**: Extract core questions from complex descriptions
- **Importance**: Remove interfering information for more precise search
- **Example**: Simplify "School songs are representative songs of schools, which school's song is 'Tiger Mountain Heroic Wind Flying'?" to "Which school's song is 'Tiger Mountain Heroic Wind Flying'?"

**2. Keyword Extraction Agent (keyword_extraction_agent)**
- **Function**: Extract 2-5 most suitable search keywords from questions
- **Strategy**: Focus on entity nouns, proper nouns, and other concrete searchable terms
- **Output Format**: Comma-separated keyword list

**3. Q&A Agent (qa_agent)**
- **Function**: Answer questions based on retrieved data
- **Role**: Serves as the final knowledge integrator
- **Output Requirements**: Use Traditional Chinese, answer based on provided context

Splitting the work among three narrowly scoped agents keeps each prompt focused, which can make each step more accurate.

TODO 1: Design the role description and task description for each agent.

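```python
# Three specialized agents for the RAG workflow

# Agent 1: question extraction - pulls the core question out of a long description
question_extraction_agent = LLMAgent(
    # "You are a professional question analyst who finds the question that actually needs answering. You answer only in Traditional Chinese."
    role_description="你是一位專業的問題分析師，擅長從複雜的敘述中找出真正需要解決的問題。你只會用繁體中文回答。",
    # "Extract the core question, ignoring irrelevant background or extra information. Output only a concise, clear question."
    task_description="請從下列敘述中，萃取出最核心、需要解答的問題，並忽略與問題無關的背景或多餘資訊。只需輸出精簡明確的問題句。",
)

# Agent 2: keyword extraction - produces keywords suitable for a web search
keyword_extraction_agent = LLMAgent(
    # "You are a keyword extraction expert who finds the best search keywords in a question. You answer only in Traditional Chinese."
    role_description="你是一位專業的關鍵字萃取專家，擅長從問題中找出最適合用來搜尋的關鍵字。你只會用繁體中文回答。",
    # "Extract the 2-5 keywords or phrases best suited for searching. Output only the keywords, separated by commas."
    task_description="請從下列問題中，萃取出最適合用來搜尋的 2~5 個關鍵字或短語。只需輸出關鍵字，並以逗號分隔。",
)

# Agent 3: question answering - answers based on the retrieved material
qa_agent = LLMAgent(
    # "You are LLaMA-3.1-8B, an AI for answering questions. When using Chinese, you answer only in Traditional Chinese."
    role_description="你是 LLaMA-3.1-8B，是用來回答問題的 AI。使用中文時只會使用繁體中文來回答問題。",
    task_description="請回答以下問題：",  # "Please answer the following question:"
)
```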
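The keyword agent is asked for comma-separated keywords, but a Traditional Chinese reply may use full-width commas (，) or enumeration commas (、) and may add stray whitespace. The RAG pipeline passes the raw reply straight to `search()`; if a cleaner query is wanted, a small normalizer such as this hypothetical `parse_keywords` helper could be applied first. The separator set, the deduplication and the 5-keyword cap are assumptions drawn from the task description above.

```python
import re
from typing import List

# ASCII comma, full-width comma, ideographic enumeration comma, semicolons, newlines
_SEPARATORS = re.compile(r"[,，、;；\n]+")

def parse_keywords(reply: str, max_keywords: int = 5) -> List[str]:
    """Split an agent reply into at most max_keywords unique, non-empty keywords."""
    keywords: List[str] = []
    for part in _SEPARATORS.split(reply):
        kw = part.strip()
        if kw and kw not in keywords:
            keywords.append(kw)
    return keywords[:max_keywords]

def to_query(keywords: List[str]) -> str:
    """Join keywords with spaces, the usual way to write a multi-term search query."""
    return " ".join(keywords)
```

For example, `to_query(parse_keywords("虎山雄風飛揚， 校歌、學校,校歌"))` yields `"虎山雄風飛揚 校歌 學校"`.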

## RAG pipeline

TODO 2: Implement the RAG pipeline.

Please refer to the homework description slides for hints.

Also, there may be more heuristics (e.g., classifying the questions by length, deciding whether a question needs a search at all, or re-checking the answer before returning it to the user...) that are not shown in the flow charts. You can use your creativity to come up with a better solution!

- Naive approach (simple baseline)

    ![](https://www.csie.ntu.edu.tw/~ulin/naive.png)

## RAG Core Implementation Phase

### Step 8: Install RAG-Related Packages

To implement Retrieval-Augmented Generation, the following key packages need to be installed:

**Core Package Explanation**:
- `sentence-transformers`: Pre-trained models for text vectorization
- `chromadb`: Lightweight vector database supporting similarity search
- `langchain`: Provides RAG toolchain and embedding wrappers
- `langchain-community`: Extends LangChain functionality

These packages will help us:
1. Convert text into high-dimensional vector representations
2. Store and quickly retrieve similar documents
3. Calculate semantic similarity scores

```python
# Install the extra packages needed for RAG
!pip install sentence-transformers chromadb langchain
!pip install -U langchain-community
```

### Step 9: Load Multilingual Embedding Model

**Model Selection Rationale**:
- `paraphrase-multilingual-MiniLM-L12-v2` is a multilingual Sentence-Transformers model trained for paraphrase and semantic-similarity tasks
- Supports Chinese, making it suitable for Traditional Chinese questions
- Moderate model size (~471MB), balancing performance and resource usage

**Embedding Function**:
- Converts text into 384-dimensional vectors
- Semantically similar texts end up closer together in vector space
- Supports cross-lingual semantic search

This step lays the foundation for the similarity search that follows.
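The cell that actually creates `embedding_model` (used by the RAG pipeline) is not included in this notebook export. A plausible version, assuming LangChain's community `HuggingFaceEmbeddings` wrapper and the model named above, is wrapped in a function here so nothing is downloaded until it is called. The cosine helper illustrates the "closer in vector space" idea on plain vectors and does not depend on LangChain.

```python
import math
from typing import Sequence

MODEL_NAME = "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"

def load_embedding_model(model_name: str = MODEL_NAME):
    """Create the LangChain embedding wrapper (downloads the model on first use)."""
    # Assumes the langchain-community package installed in Step 8
    from langchain_community.embeddings import HuggingFaceEmbeddings
    return HuggingFaceEmbeddings(model_name=model_name)

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)
```

In the notebook, `embedding_model = load_embedding_model()` followed by `embedding_model.embed_query("巴黎奧運")` should return a 384-element list, and `cosine_similarity` can then compare two such vectors.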

### Step 10: Implement Complete RAG Pipeline

This is the **core function** of the entire system, integrating all components to complete the end-to-end Q&A process:

**RAG Process Detailed Explanation**:

**Stage 1: Question Understanding and Preprocessing**
- Use Question Extraction Agent to remove irrelevant background information
- Generate search keywords through Keyword Extraction Agent

**Stage 2: Information Retrieval**
- Run a Google search for relevant web pages (5 results)
- Split long documents into 500-character chunks to avoid context overflow
- Discard short fragments (50 characters or fewer)

**Stage 3: Vectorization and Similarity Search**
- Store all document chunks in a Chroma vector database
- Rank chunks by embedding distance to the core question (Chroma's default metric is squared L2; cosine distance requires `collection_metadata={"hnsw:space": "cosine"}`)
- Retrieve the 5 most relevant chunks

**Stage 4: Content Summarization and Answer Generation**
- Summarize each relevant chunk in about 100 characters to control input length
- Provide the summaries as context to the QA Agent
- Generate the final answer from the retrieved data

**Design Considerations**:
- **Asynchronous Processing**: The search step is async, so pages can be fetched concurrently
- **Context Management**: Summarization keeps the prompt within the model's context window
- **Quality Control**: Filtering at several stages (page type and encoding during search, minimum chunk length, top-k retrieval) discards low-value text
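Stage 2's chunking rule can be isolated into a small function to see exactly which text survives. This hypothetical `chunk_document` helper mirrors the chunking loop in `pipeline()` (fixed 500-character windows, keeping only chunks longer than 50 characters); note that a short tail at the end of a page is silently dropped.

```python
from typing import Iterable, List

def chunk_document(text: str, chunk_size: int = 500, min_len: int = 50) -> List[str]:
    """Split text into fixed-size character windows, discarding windows of min_len chars or fewer."""
    chunks = []
    for start in range(0, len(text), chunk_size):
        chunk = text[start:start + chunk_size]
        if len(chunk) > min_len:  # same strict '>' test as the pipeline
            chunks.append(chunk)
    return chunks

def chunk_all(documents: Iterable[str], chunk_size: int = 500, min_len: int = 50) -> List[str]:
    """Apply chunk_document to every search result and flatten the output."""
    return [c for doc in documents for c in chunk_document(doc, chunk_size, min_len)]
```

A 1,030-character page, for instance, yields two 500-character chunks; its last 30 characters are discarded.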

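```python
import uuid

from langchain_community.vectorstores import Chroma

# Assumes `search` (Step 4), the three agents (Step 7) and `embedding_model` (Step 9) are already defined
async def pipeline(question: str) -> str:
    # 1. Extract the core question
    core_question = question_extraction_agent.inference(question)

    # 2. Extract search keywords
    keywords = keyword_extraction_agent.inference(core_question)

    # 3. Search the web (5 results for more diverse information)
    search_results = await search(keywords, n_results=5)

    # 4. Split each search result into 500-character chunks, skipping very short ones
    chunk_size = 500
    docs = []
    for doc in search_results:
        for i in range(0, len(doc), chunk_size):
            chunk = doc[i:i + chunk_size]
            if len(chunk) > 50:
                docs.append(chunk)

    # Chroma cannot be built from an empty list; fall back to answering without context
    if not docs:
        return qa_agent.inference(core_question)

    # 5. Build a Chroma vector store; a fresh collection name per call keeps
    #    chunks from earlier questions out of this question's search
    vector_db = Chroma.from_texts(
        texts=docs,
        embedding=embedding_model,
        collection_name=f"q_{uuid.uuid4().hex}",
        collection_metadata={"hnsw:space": "cosine"},  # cosine distance instead of the default squared L2
    )

    # 6. Retrieve the most relevant chunks (top 5); scores are distances, lower = more similar
    top_k = 5
    relevant_docs_and_scores = vector_db.similarity_search_with_score(core_question, k=top_k)
    relevant_docs = [doc.page_content for doc, _score in relevant_docs_and_scores]
    vector_db.delete_collection()  # free the per-question collection

    # 7. Summarize each chunk (optional; keeps the context short)
    summaries = []
    for chunk in relevant_docs:
        # Prompt: "Summarize the following material into 100 characters of key points:"
        summary = qa_agent.inference(f"請將以下資料摘要成100字重點：\n{chunk}")
        summaries.append(summary)
    context = "\n".join(summaries)

    # 8. Final answer. Prompt: "Answer the question based on the following material: ... Question:"
    final_input = f"根據以下資料回答問題：\n{context}\n問題：{core_question}"
    answer = qa_agent.inference(final_input)
    return answer
```

### Step 11: Test RAG Pipeline

Use the dates of the 2024 Paris Olympics as a test case to verify basic RAG functionality:
- Check that the search function works
- Check the quality of the generated answer
- Make sure the whole process runs end to end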

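```python
# Check that the RAG pipeline runs end to end (top-level await works in Jupyter/Colab)
# Question: "When is the 2024 Paris Olympics held? Please explain in detail."
result = await pipeline("請問2024年巴黎奧運的舉辦日期是什麼？請詳細說明。")
print(result)
```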

## Batch Processing and Result Output Phase

### Step 12: Batch Process All Questions

**Processing Strategy Explanation**:
- **Resume Mechanism**: Check existing answer files to avoid duplicate processing
- **Per-Question Saving**: Save each answer immediately to prevent progress loss due to interruption
- **Memory Management**: Release per-question resources (such as the temporary vector store) after each question

**File Naming Convention**:
- Individual answers: `{STUDENT_ID}_{question_number}.txt`
- Convenient for tracking progress and debugging

**Important Notes**:
- Colab environment may disconnect due to usage limits
- Mounting Google Drive ensures persistent file storage
- Re-execution will automatically skip completed questions
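No batch-processing cell appears in this export. A minimal sketch of the strategy described above (skip questions whose answer file already exists, write each answer as soon as it is generated) could look like this. `answer_fn` stands for the notebook's async `pipeline` function; the `out_dir` argument and the whitespace flattening (so that the later `readline()` in Step 13 captures the whole answer) are assumptions of this sketch.

```python
import os
from typing import Awaitable, Callable, List

async def answer_all(questions: List[str],
                     answer_fn: Callable[[str], Awaitable[str]],
                     student_id: str,
                     out_dir: str = ".") -> int:
    """Answer each question, saving to {student_id}_{idx}.txt; return how many were newly answered."""
    newly_answered = 0
    for idx, question in enumerate(questions, 1):
        path = os.path.join(out_dir, f"{student_id}_{idx}.txt")
        if os.path.exists(path):  # resume: this question was finished in an earlier run
            continue
        answer = await answer_fn(question)
        answer = " ".join(answer.split())  # collapse newlines/extra spaces so one line holds the answer
        with open(path, "w", encoding="utf-8") as f:  # save immediately to survive disconnects
            f.write(answer + "\n")
        newly_answered += 1
        print(f"[{idx}/{len(questions)}] saved {path}")
    return newly_answered
```

In the notebook this would be called as `await answer_all(questions, pipeline, STUDENT_ID)`, with `questions` read from `public.txt` and `private.txt` the same way Step 13 reads them. Pointing `out_dir` at a mounted Google Drive folder keeps the files across Colab disconnects.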

### Step 13: Integrate Results and Generate CSV File

**Output Format Explanation**:
- **CSV Format**: Contains Question (Q) and Answer (A) columns
- **Encoding Handling**: Use UTF-8 to ensure proper Chinese display
- **Question Source**: Merge public.txt (first 30 questions) and private.txt (last 60 questions)

**File Purpose**:
- Convenient result viewing and analysis
- Meets assignment submission format requirements
- Can be imported into Excel and other tools for further processing

```python
import csv

STUDENT_ID = "20250707"
output_csv = f'./{STUDENT_ID}.csv'

# Read all questions (public.txt + private.txt)
questions = []
with open('./public.txt', 'r', encoding='utf-8') as f:
    # Keep only the question part (the text before the first ASCII comma)
    questions += [l.strip().split(',')[0] for l in f.readlines()]
with open('./private.txt', 'r', encoding='utf-8') as f:
    questions += [l.strip().split(',')[0] for l in f.readlines()]

# Write the results to the CSV file
with open(output_csv, 'w', newline='', encoding='utf-8') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['Q', 'A'])  # header row

    for idx, question in enumerate(questions, 1):
        ans_path = f'./{STUDENT_ID}_{idx}.txt'
        try:
            # Read the matching answer file (first line only)
            with open(ans_path, 'r', encoding='utf-8') as ans_f:
                answer = ans_f.readline().strip()
        except FileNotFoundError:
            answer = ''  # leave empty if the answer file does not exist yet
        writer.writerow([question, answer])
```

### Step 14: Merge All Answers into Single Text File

Combine the answers to all 90 questions into one text file, in question order with one answer per line. This format is convenient for:
- Quickly browsing all answers
- Batch processing or analysis
- Serving as a backup
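The merge cell itself is missing from this export. A minimal sketch, assuming the per-question files are named `{STUDENT_ID}_{idx}.txt` as in the previous steps and that the merged file is (hypothetically) called `{STUDENT_ID}.txt`, writes one line per question and leaves a blank line for any unanswered question so that line numbers stay aligned with question numbers.

```python
import os

def merge_answers(student_id: str, n_questions: int = 90, directory: str = ".") -> str:
    """Concatenate the first line of each per-question answer file into one file, in order."""
    merged_path = os.path.join(directory, f"{student_id}.txt")
    with open(merged_path, "w", encoding="utf-8") as out:
        for idx in range(1, n_questions + 1):
            ans_path = os.path.join(directory, f"{student_id}_{idx}.txt")
            answer = ""
            if os.path.exists(ans_path):
                with open(ans_path, "r", encoding="utf-8") as f:
                    answer = f.readline().strip()
            out.write(answer + "\n")  # blank line keeps numbering aligned for missing answers
    return merged_path
```

In the notebook, `merge_answers("20250707")` would produce `20250707.txt` next to the individual answer files.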

### Step 15: Package All Result Files

**Package Contents**:
- Main CSV result file
- 90 individual answer files (convenient for debugging and checking)

**Download Functionality**:
- Automatically generates a download link
- Cleans up the temporary folder to save space
- Suitable file export method for the Colab environment

**File Organization**:
- Systematically manages all output files
- Ensures nothing is missed at submission time

```python
import shutil
import os
from IPython.display import FileLink, display

STUDENT_ID = "20250707"

# 1. List the files to package
files_to_zip = [f"{STUDENT_ID}.csv"]  # main CSV result file
files_to_zip += [f"{STUDENT_ID}_{i}.txt" for i in range(1, 91)]  # 90 individual answer files

# 2. Create a temporary folder and copy the files into it (missing files are skipped)
tmp_dir = "tmp_zip"
os.makedirs(tmp_dir, exist_ok=True)
for file in files_to_zip:
    if os.path.exists(file):
        shutil.copy(file, tmp_dir)

# 3. Compress the folder into a zip archive
zip_name = f"{STUDENT_ID}_all_answers"
shutil.make_archive(zip_name, 'zip', tmp_dir)

# 4. Display a download link
#    (in Colab, google.colab.files.download(f"{zip_name}.zip") is an alternative)
display(FileLink(f"{zip_name}.zip"))

# 5. Remove the temporary folder to save space
shutil.rmtree(tmp_dir)
```

## System Performance Analysis and Problem Summary

### Main Reasons for Slow Runtime

**1. Multiple LLM Inference Calls**
- Each question requires up to 8 model inferences (with `top_k = 5`):
  - Question Extraction Agent: 1 call
  - Keyword Extraction Agent: 1 call
  - Summary generation: 5 calls (one per retrieved chunk)
  - Final Q&A: 1 call
- Each inference runs on the GPU, so across 90 questions (up to 720 calls) the time adds up quickly

**2. Sequential Execution Bottleneck**
- Each RAG stage depends on the previous stage's output, so the stages run one after another
- Nothing can proceed until the web search results arrive, and vectorization and similarity search must finish before summarization starts
- The five summary calls are independent of each other, but they share one llama.cpp model instance and therefore still run one at a time

**3. Network I/O Overhead**
- Latency of each Google search request
- Network latency while scraping pages; concurrent fetching still waits for the slowest page
- Timeouts (up to 10 seconds per slow page) and any retries of failed requests add waiting time

**4. Vector Operation Cost**
- Document embedding computation (each chunk needs vectorization)
- Distance calculations for similarity search
- ChromaDB creation and query operations
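The list above names likely causes; timing each stage shows which one actually dominates for a given question. The sketch below is a hypothetical helper (not part of the original pipeline) built on `time.perf_counter`. Wrapping each numbered step of `pipeline()` in `with timer.stage("search"):` (this works inside `async def` too, since `await` is allowed inside a plain `with` block) and printing `timer.report()` afterwards would show where the time goes.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Dict, Iterator

class StageTimer:
    """Accumulate wall-clock time and call counts per named stage."""

    def __init__(self) -> None:
        self.totals: Dict[str, float] = defaultdict(float)
        self.counts: Dict[str, int] = defaultdict(int)

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def report(self) -> str:
        """One line per stage, slowest first, with its share of the total time."""
        grand_total = sum(self.totals.values()) or 1.0
        lines = []
        for name, secs in sorted(self.totals.items(), key=lambda kv: -kv[1]):
            share = 100 * secs / grand_total
            lines.append(f"{name:<12} {secs:8.2f}s  {share:5.1f}%  ({self.counts[name]} calls)")
        return "\n".join(lines)
```

Keeping one `StageTimer` across all 90 questions gives aggregate totals; creating a new one per question gives per-question breakdowns.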

### Answer Accuracy Problem Analysis

**Core Issue**: Fundamental reasons why the RAG system cannot accurately answer questions

**1. Keyword Extraction Failure**
- Example: "Which school's song is 'Tiger Mountain Heroic Wind Flying'?"
- System-extracted keywords may be too broad
- Causes search results to deviate from question core

**2. Low Search Result Relevance**
- Google search returns web content that doesn't match questions
- Particularly for specific, detailed questions
- Lacks verification mechanism for search result quality

**3. Semantic Similarity Misjudgment**
- Embedding model may not correctly understand Chinese semantic differences
- Vector similarity search finds document fragments that aren't truly relevant
- Fixed 500-character splitting may break semantic integrity

**4. Answer Generation Drift**
- QA Agent generates answers based on incorrect or irrelevant context
- Lacks assessment of retrieved content credibility
- Model tends to answer "based on data" even when data is irrelevant
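One way to act on points 2 to 4 is to discard retrieved chunks whose distance to the question is too large, and to tell the QA agent explicitly when nothing relevant is left. The sketch operates on `(text, score)` pairs, i.e. the output of `similarity_search_with_score` mapped to `(doc.page_content, score)`, where Chroma's score is a distance (lower means more similar). The threshold is a hypothetical placeholder that would need tuning on the public questions, and the fallback prompt wording is an assumption, not a recommended constant.

```python
from typing import List, Sequence, Tuple

def filter_by_distance(results: Sequence[Tuple[str, float]], max_distance: float) -> List[str]:
    """Keep chunk texts whose distance score is at most max_distance, closest first."""
    kept = sorted((r for r in results if r[1] <= max_distance), key=lambda r: r[1])
    return [text for text, _ in kept]

def build_qa_prompt(question: str, chunks: List[str]) -> str:
    if not chunks:
        # Hypothetical fallback: "No reliable reference material was found; answer carefully
        # from your own knowledge and say so if you are unsure."
        return f"找不到可靠的參考資料，請根據你的知識謹慎回答；若不確定請說明。\n問題：{question}"
    context = "\n".join(chunks)
    # Same wording as the pipeline: "Answer the question based on the following material: ... Question:"
    return f"根據以下資料回答問題：\n{context}\n問題：{question}"
```

Making the "no context" case explicit gives the model a chance to hedge instead of answering "based on data" that is actually irrelevant.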

### Implementation Improvement Suggestions

**Short-term Improvements**:
1. Adjust keyword extraction strategy, add entity recognition
2. Increase search result relevance filtering
3. Implement multi-round search mechanism (re-search when first attempt fails)
4. Improve document segmentation method (semantic boundary cutting)

**Long-term Optimization**:
1. Use specialized Chinese embedding models
2. Build question type classification system
3. Implement answer confidence assessment
4. Add knowledge graph-assisted retrieval
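As an illustration of short-term suggestion 4 (semantic boundary cutting), here is a minimal sketch that splits on Chinese and Western sentence-ending punctuation and then packs whole sentences into chunks of at most 500 characters. The punctuation set is an assumption: ASCII periods are deliberately left out because they also appear in numbers and URLs. A single sentence longer than the limit is still hard-split, so no chunk exceeds it.

```python
import re
from typing import List

# Split after sentence terminators, keeping each terminator with its sentence
_SENTENCE_END = re.compile(r"(?<=[。！？!?；;\n])")

def sentence_chunks(text: str, max_chars: int = 500) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters."""
    chunks: List[str] = []
    current = ""
    for sentence in _SENTENCE_END.split(text):
        sentence = sentence.strip()
        if not sentence:
            continue
        # Hard-split any single sentence that is longer than the limit
        pieces = [sentence[i:i + max_chars] for i in range(0, len(sentence), max_chars)]
        for piece in pieces:
            if current and len(current) + len(piece) > max_chars:
                chunks.append(current)
                current = ""
            current += piece
    if current:
        chunks.append(current)
    return chunks
```

Because whitespace between sentences is stripped, this suits Chinese text best; for mixed-language pages, re-inserting a space between packed sentences would be a small adjustment.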

### System Applicability Assessment

**Suitable Question Types**:
- General knowledge questions
- Current event-related queries
- Latest information requiring web search

**Unsuitable Question Types**:
- Questions requiring precise answers
- Local, detailed professional knowledge
- Mathematical questions requiring reasoning or calculation

This analysis shows that the current RAG implementation is more suitable as a general Q&A system rather than a precise Q&A tool for specific domains.
