<a href="https://colab.research.google.com/github/hahaamg/Generative_AI/blob/main/Week_8/RAG_System.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## RAG System

### Download the file from Google Drive

In [2]:
URL='https://drive.google.com/uc?export=download&id=1egucKTAI2gjnfyQrFsrPmpX8WMJHWKZY'
!wget -O faiss_db.zip "$URL"

--2025-04-16 16:23:49--  https://drive.google.com/uc?export=download&id=1egucKTAI2gjnfyQrFsrPmpX8WMJHWKZY
Resolving drive.google.com (drive.google.com)... 74.125.137.101, 74.125.137.113, 74.125.137.138, ...
Connecting to drive.google.com (drive.google.com)|74.125.137.101|:443... connected.
HTTP request sent, awaiting response... 303 See Other
Location: https://drive.usercontent.google.com/download?id=1egucKTAI2gjnfyQrFsrPmpX8WMJHWKZY&export=download [following]
--2025-04-16 16:23:49--  https://drive.usercontent.google.com/download?id=1egucKTAI2gjnfyQrFsrPmpX8WMJHWKZY&export=download
Resolving drive.usercontent.google.com (drive.usercontent.google.com)... 142.251.2.132, 2607:f8b0:4023:c0d::84
Connecting to drive.usercontent.google.com (drive.usercontent.google.com)|142.251.2.132|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 335560 (328K) [application/octet-stream]
Saving to: ‘faiss_db.zip’


2025-04-16 16:23:51 (4.72 MB/s) - ‘faiss_db.zip’ saved [335560/33556

In [3]:
!unzip faiss_db.zip

Archive:  faiss_db.zip
   creating: faiss_db/
  inflating: faiss_db/index.faiss    
  inflating: faiss_db/index.pkl      


### Install and import the required packages

In [None]:
!pip install -U langchain langchain-community sentence-transformers faiss-cpu gradio openai

In [5]:
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
# ChatOpenAI and ConversationalRetrievalChain were imported here but never used
# in this notebook; the LLM is called directly through the OpenAI client below.

In [6]:
from openai import OpenAI
import gradio as gr

### Custom E5 embedding class

In [7]:
class CustomE5Embedding(HuggingFaceEmbeddings):
    # E5 models expect a "passage: " prefix on indexed texts
    def embed_documents(self, texts):
        texts = [f"passage: {t}" for t in texts]
        return super().embed_documents(texts)

    # ... and a "query: " prefix on search queries
    def embed_query(self, text):
        return super().embed_query(f"query: {text}")
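
To make the effect of the two prefixes visible without downloading a model, here is a minimal sketch. `RecordingEmbeddings` is a hypothetical stand-in for `HuggingFaceEmbeddings` (it is not part of LangChain) that simply records the text it receives, and `PrefixedE5` repeats the prefixing logic of `CustomE5Embedding` on top of it.

```python
class RecordingEmbeddings:
    """Hypothetical stand-in for HuggingFaceEmbeddings: records the texts it receives."""

    def __init__(self):
        self.seen = []

    def embed_documents(self, texts):
        self.seen.extend(texts)
        return [[float(len(t))] for t in texts]  # dummy one-dimensional "vectors"

    def embed_query(self, text):
        self.seen.append(text)
        return [float(len(text))]


class PrefixedE5(RecordingEmbeddings):
    """Same prefixing logic as CustomE5Embedding, applied to the stand-in."""

    def embed_documents(self, texts):
        return super().embed_documents([f"passage: {t}" for t in texts])

    def embed_query(self, text):
        return super().embed_query(f"query: {text}")


emb = PrefixedE5()
emb.embed_documents(["FAISS stores vectors.", "Gradio builds UIs."])
emb.embed_query("What does FAISS store?")
print(emb.seen)
```

The recorded list shows that documents reach the underlying model with `passage: ` in front and the query with `query: ` in front, which is the input format the E5 models were trained on.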

### Inspect basic FAISS index information

In [8]:
import faiss

# Load the index
index = faiss.read_index("faiss_db/index.faiss")

# Print basic information
print("Vector dimension:", index.d)
print("Total vectors:", index.ntotal)
print("Index type:", type(index))

# For example, reconstruct the first 5 vectors (if there are that many)
vectors = index.reconstruct_n(0, min(5, index.ntotal))
print("First few vectors:", vectors)


Vector dimension: 384
Total vectors: 196
Index type: <class 'faiss.swigfaiss_avx512.IndexFlatL2'>
First few vectors: [[ 0.07144093  0.00418101 -0.07644258 ...  0.05103176  0.07116929
   0.06415357]
 [ 0.08832071  0.02076687 -0.08648844 ...  0.07845695  0.05574175
   0.06062891]
 [ 0.09389171 -0.01096293 -0.06845181 ...  0.08108832  0.07432593
   0.05513625]
 [ 0.08489361  0.00102893 -0.07083521 ...  0.06170662  0.06771576
   0.05746834]
 [ 0.05797962 -0.02049389 -0.05799684 ...  0.06483547  0.04998692
   0.05814803]]
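
The index type reported above, `IndexFlatL2`, performs an exhaustive search and ranks results by squared Euclidean distance. The sketch below imitates that behaviour in plain Python on a toy dataset; `flat_l2_search` and `toy_vectors` are illustrative names, not part of FAISS, and the real index is far faster.

```python
import heapq


def flat_l2_search(vectors, query, k):
    """Return the k nearest (squared_distance, row_id) pairs by exhaustive scan."""
    scored = (
        (sum((a - b) ** 2 for a, b in zip(vec, query)), row_id)
        for row_id, vec in enumerate(vectors)
    )
    return heapq.nsmallest(k, scored)


# Toy 2-dimensional "index"; the notebook's index holds 196 vectors of dimension 384.
toy_vectors = [
    [0.0, 0.0],
    [1.0, 0.0],
    [0.0, 2.0],
    [3.0, 3.0],
]
result = flat_l2_search(toy_vectors, query=[1.0, 0.5], k=2)
print(result)  # [(0.25, 1), (1.25, 0)]
```

With only 196 vectors, an exhaustive scan like this is cheap, so an exact flat index is a reasonable choice here.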


### Load `faiss_db`

In [None]:
embedding_model = CustomE5Embedding(model_name="intfloat/multilingual-e5-small")
db = FAISS.load_local("faiss_db", embedding_model, allow_dangerous_deserialization=True)
retriever = db.as_retriever()

### Configure the LLM

In [10]:
import os
from google.colab import userdata

In [11]:
api_key = userdata.get('Groq')
os.environ["OPENAI_API_KEY"] = api_key
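
The cell above only works inside Colab, because `google.colab` is not importable elsewhere. As a rough sketch, the key lookup could fall back to an environment variable when the notebook runs locally. The helper name `get_api_key` and its `env_var` default are assumptions made for illustration.

```python
import os


def get_api_key(secret_name, env_var="OPENAI_API_KEY"):
    """Read a secret from Colab if available, otherwise from an environment variable."""
    try:
        from google.colab import userdata  # only importable inside Colab
    except ImportError:
        key = os.environ.get(env_var)
        if not key:
            raise RuntimeError(
                f"Set {env_var}, or run inside Colab with the '{secret_name}' secret."
            )
        return key
    return userdata.get(secret_name)
```

Calling `get_api_key("Groq")` would then return the Colab secret when it exists and the value of `OPENAI_API_KEY` otherwise.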

The model and `base_url` below are set up for Groq; change them if you use a different service.

In [12]:
model = "llama3-70b-8192"
# model = "gemma3:27b"
base_url="https://api.groq.com/openai/v1"

In [13]:
client = OpenAI(
    base_url=base_url # not needed when calling OpenAI itself
)

### Prompt design

In [14]:
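system_prompt = "You are the blog's content marketing specialist. Answer the user's questions based on the provided material. Be friendly and concise, and include concrete suggestions. Please respond in Traditional Chinese."

prompt_template = """
You are a senior knowledge-expert assistant who is skilled at finding the key points in documents and synthesizing them in Traditional Chinese.

Below are snippets retrieved from the knowledge base (possibly from different documents). Read all of them before synthesizing, and do not answer from only part of the material:
-------------------------------------
{retrieved_chunks}
-------------------------------------

Based on the material above, answer the user's question completely and in an organized way.
1. If the information is sufficient, explain in detail and give examples.
2. If the information is insufficient, state why the question cannot be answered.

The user's question is:
{question}
"""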
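
As a quick illustration of how the two placeholders get filled, the sketch below formats a shortened template with made-up chunks. The `template`, `chunks`, and question text are hypothetical stand-ins, not the notebook's data; the `[Source N]` labels mirror the labelling the notebook applies to retrieved documents.

```python
# Shortened, hypothetical stand-in for the notebook's prompt_template.
template = """Context:
-----
{retrieved_chunks}
-----
Question:
{question}
"""

# Made-up chunks standing in for retrieved documents.
chunks = [
    "E5 models expect a 'query: ' prefix.",
    "Curly braces like {this} in a chunk are safe.",
]
retrieved_chunks = "\n\n".join(f"[Source {i + 1}]\n{c}" for i, c in enumerate(chunks))

final_prompt = template.format(
    retrieved_chunks=retrieved_chunks,
    question="Which prefix does E5 need?",
)
print(final_prompt)
```

`str.format` only parses the template itself, so braces inside retrieved text or the user's question are inserted verbatim and do not raise an error.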

### Respond using RAG

Retrieve the information relevant to the user's question, then have the LLM answer according to our prompt template.

In [15]:
chat_history = []

def chat_with_rag(user_input):
    # Retrieve the most relevant document chunks from the FAISS vector store.
    # k is the number of chunks returned per query.
    retriever = db.as_retriever(search_kwargs={"k": 10})
    # invoke() replaces the deprecated get_relevant_documents()
    docs = retriever.invoke(user_input)
    # Label each chunk so the model can tell the sources apart
    retrieved_chunks = "\n\n".join([f"[Source {i+1}]\n{doc.page_content}" for i, doc in enumerate(docs)])

    # Fill the custom prompt template
    final_prompt = prompt_template.format(retrieved_chunks=retrieved_chunks, question=user_input)

    # Call the OpenAI-compatible chat completions API
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": final_prompt},
        ],
        max_tokens=1536,  # upper limit on response length; raise it if answers get cut off
        temperature=0.7,
    )
    answer = response.choices[0].message.content

    # Record the turn (the history is stored but not sent back to the model)
    chat_history.append((user_input, answer))
    return answer
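
`chat_history` is filled in but never passed to the model, so each question is answered without memory of earlier turns. One possible way to add memory, sketched under the assumption that replaying the last few turns is enough, is to build the `messages` list from the history. `build_messages` and `max_turns` are hypothetical names introduced here, not part of the notebook.

```python
def build_messages(system_prompt, history, final_prompt, max_turns=3):
    """Build a chat-completions message list that replays the last few turns."""
    messages = [{"role": "system", "content": system_prompt}]
    # history[-0:] would return the whole list, so guard the zero case explicitly.
    recent = history[-max_turns:] if max_turns > 0 else []
    for user_text, assistant_text in recent:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": final_prompt})
    return messages


history = [
    ("What is RAG?", "Retrieval-augmented generation."),
    ("Which index?", "IndexFlatL2."),
]
messages = build_messages(
    "You are a helpful assistant.", history, "How many vectors are stored?", max_turns=1
)
print([m["role"] for m in messages])  # ['system', 'user', 'assistant', 'user']
```

The returned list could be passed as the `messages` argument of `client.chat.completions.create`. Capping the number of replayed turns keeps the prompt from growing without bound, at the cost of forgetting older context.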

### Build a web app with Gradio

In [16]:

with gr.Blocks() as demo:
    gr.Markdown("# AI Blog Content Management")
    chatbot = gr.Chatbot()
    msg = gr.Textbox(placeholder="Enter your question...")

    def respond(message, chat_history_local):
        response = chat_with_rag(message)
        chat_history_local.append((message, response))
        return "", chat_history_local

    msg.submit(respond, [msg, chatbot], [msg, chatbot])

demo.launch(debug=True)

It looks like you are running Gradio on a hosted a Jupyter notebook. For the Gradio app to work, sharing must be enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
* Running on public URL: https://78b8451609f77c3198.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


Keyboard interruption in main thread... closing server.
Killing tunnel 127.0.0.1:7860 <> https://78b8451609f77c3198.gradio.live


