# MLflow Part 2


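🎯 **What you will learn in this chapter:**

- Use the MLflow CallbackHandler to log LangChain inputs, outputs, and model records automatically  
- Understand how Autolog traces runs automatically and when to use it  
- Inspect an LLMChain's execution, inputs and outputs, and run time in the MLflow UI  

📘 **Skills you will have at the end:**  
You will be able to set up a traceable LLM experiment environment that integrates LangChain with MLflow, laying the groundwork for model development, debugging, and experiment management.  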


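Start a local MLflow tracking server before running the cells below (the notebook expects it at http://127.0.0.1:8080):

mlflow server --host 127.0.0.1 --port 8080

## What gets logged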

In [None]:
import os

os.chdir("../../../")

In [None]:
import mlflow
from textwrap import dedent
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate, ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate
from langchain_community.callbacks import MlflowCallbackHandler
from langchain_openai import ChatOpenAI
from langchain.chains import LLMChain

from src.initialization import credential_init

def build_standard_chat_prompt_template(kwargs):

    messages = []
 
    if 'system' in kwargs:
        content = kwargs.get('system')
        prompt = PromptTemplate(**content)
        message = SystemMessagePromptTemplate(prompt=prompt)
        messages.append(message)  

    if 'human' in kwargs:
        content = kwargs.get('human')
        prompt = PromptTemplate(**content)
        message = HumanMessagePromptTemplate(prompt=prompt)
        messages.append(message)
        
    chat_prompt_template = ChatPromptTemplate.from_messages(messages)
    
    return chat_prompt_template

experiment = "Week-4"
tracking_url = "http://127.0.0.1:8080"

credential_init()

In [None]:
mlflow.set_tracking_uri(uri=tracking_url)

# Select the experiment that runs will be logged under
mlflow.set_experiment(experiment)

### MlflowCallbackHandler

Tracks and logs the inputs and outputs of the language model.

In [None]:
with mlflow.start_run(run_name="my-llm-run") as run:
    run_id = run.info.run_id
    print(f"Using run_id={run_id}")

    # Attach the run_id so all logs go into this run
    mlflow_cb = MlflowCallbackHandler(
        experiment=experiment,
        run_id=run_id,
        tracking_uri=tracking_url,
    )

    model = ChatOpenAI(
        model_name="gpt-4o-mini",
        temperature=0,
        callbacks=[mlflow_cb]
    )

    prompt = PromptTemplate(
        input_variables=["product"],
        template="What is a good name for a company that makes {product}?",
    )

    chain = LLMChain(llm=model, prompt=prompt)

    # First call logs into this run
    chain.invoke({"product": "陽電子攻城炮"})

    # Second call also logs into the SAME run_id
    chain.invoke({"product": "旋風魚雷 (Warhammer 40k, Exterminatus)"})

    chain.invoke({"product": "人形MS/Gundam"})
    
    # Finally flush once
    mlflow_cb.flush_tracker()

In [None]:
# run_id is still in scope from the previous cell
artifact_path = "table_session_analysis.html"   # path relative to the run's artifact root

# Download to a local directory
local_dir = mlflow.artifacts.download_artifacts(run_id=run_id, artifact_path=artifact_path,
                                                dst_path="tutorial/LLM+Langchain/Week-4", 
                                                tracking_uri=tracking_url,
                                                )

print("Downloaded to:", local_dir)

In [None]:
# !pip install lxml

In [None]:
import pandas as pd

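# Pass encoding='utf-8', otherwise the Chinese text comes out garbled.
# Note that pd.read_html returns a list of DataFrames, one per table in the file.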
df = pd.read_html("tutorial/LLM+Langchain/Week-4/table_session_analysis.html", encoding='utf-8')

In [None]:
df[0]

In [None]:
print(df[0].iloc[0]['output'])

## Autolog

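This mode writes no JSON files at all. Instead, it captures your LangChain execution as traces and spans and records them in MLflow's experiment tracking and tracing UI. You can therefore inspect inputs and outputs, execution time, and the nested call structure in the MLflow UI rather than in stored .json files.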

In [None]:
run_id

In [None]:
# Enable autologging — this instruments LangChain automatically
mlflow.langchain.autolog()

model = ChatOpenAI(
    model_name="gpt-4o-mini",
    temperature=0
)

prompt = PromptTemplate(
    input_variables=["product"],
    template="What is a good name in traditional Chinese for a company that makes {product}?",
)

chain = LLMChain(llm=model, prompt=prompt)

# Point MLflow at the tracking server
mlflow.set_tracking_uri(uri=tracking_url)

# Select the experiment that runs will be logged under
mlflow.set_experiment(experiment)

with mlflow.start_run(run_id=run_id) as run:
    chain.invoke({"product": "光茅 (戰槌40k)"})
    chain.invoke({"product": "旋風魚雷 (Warhammer 40k, Exterminatus)"})
    chain.invoke({"product": "人形MS/Gundam"})

## Reflection + Revision Pipeline

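One model is not enough? Use two. Both examples above used a single model, with no extras such as RAG.

🎯 **What you will learn in this chapter:**

- Understand how to build a feedback (Reflection) and Revision flow with multiple models  
- Learn to design structured output (PydanticOutputParser) that yields parseable results  
- Use RunnablePassthrough and RunnableLambda to build composable LangChain pipelines  
- Understand how to log tracking data for multi-stage models in MLflow  

📘 **Skills you will have at the end:**  
You will be able to build a composite LLM pipeline that reflects and revises, and trace its full execution history in MLflow.  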

### Reflection

The example is an essay from a past GSAT/AST (Taiwan's college entrance exams).

In [None]:
query = dedent("""
俗話說：「龍生九子，各有不同。」在廣闊浩瀚的海洋之中，就有一頭孤獨的鯨魚——五十二赫茲鯨魚。牠聲音的頻率天生便比同伴還要高，這項特別之處，也導致了牠與同伴產生了無法溝通的鴻溝。看見這則故事的我，不禁思考，在如此多元的人間，是否也有像五十二赫茲鯨魚一般，天生便與眾不同？

回首童年，我印象最為深刻的一刻，是初識字時，與文字互相理解的那一瞬、是當我第一次讀完一個句子時，它將自身的意義傳入我腦中的那一瞬。自此，我便對文字、語言抱有特殊的感情，也十分享受閱讀與朗誦。那種將自身與文字經由一點一滴積累而連接起來的感情，使我心靈感到十分富足。

而當我步入校園接觸同儕時，驀然驚覺我與別人的閱讀速度十分不同。每當我已讀完一篇文章，但同學可能只完成了一半甚至三分之二。同時，我在生字讀音方面也異常的執著，因此被同學抱怨有「文字潔癖」。面對同儕抱怨的我，也只好強忍對耳邊時而出現字錯讀音的不適，開始刻意忽略心裡對它的執念，只為想要與別人一樣，想要和朋友互相理解。

直到多年前，因緣際會之下認識了「五十二赫茲」這獨特的存在。牠的身影在我心中烙下一道深刻的痕跡。因為牠，我開始接受自己與他人的不同；也因為牠，我明白了，我對文字的執著，並不是一種負面的特質，而是上天賜予我的禮物，我開始在寫作上揮灑自如。這讓我知道，不要在一開始便用否定的眼光看待自己的特質。也許這特別之處，會使我們與五十二赫茲鯨魚一般孤獨，會使我們遭受他人的不理解與排斥，但也會讓我們與眾不同。

關於此，我想說的是，勇敢地綻放自己的特別，也讓自己成為自己和世人眼中，最閃耀的五十二赫茲鯨魚。
""")


def create_feedback_pipeline(mlflow_callback):

    ## Teacher LLM
    system_template = dedent("""
    你是一個教學與寫作經驗豐富的台灣大學中文系教授，你要來負責給予作文評分與回饋。
    """)
    
    human_template = dedent("""
    Title: {title}
    
    Article:
    {article}
    """)
    
    model = ChatOpenAI(openai_api_key=os.environ['OPENAI_API_KEY'],
                       model_name="gpt-4o-mini", temperature=0,
                       callbacks=[mlflow_callback])
    
    # PromptTemplate expects the key `input_variables` (plural)
    input_ = {"system": {"template": system_template},
              "human": {"template": human_template,
                        "input_variables": ["title", "article"],
                        }}
    
    chat_prompt_template = build_standard_chat_prompt_template(input_)
    
    feedback_pipeline = chat_prompt_template|model|StrOutputParser()

    return feedback_pipeline

In [None]:
from pydantic import BaseModel, Field
from langchain_core.output_parsers import PydanticOutputParser


class Output(BaseModel):
    name: str = Field(description="The revised article in traditional Chinese (繁體中文), please do not include the title.")


output_parser = PydanticOutputParser(pydantic_object=Output)
format_instructions = output_parser.get_format_instructions()


def create_revision_pipeline(mlflow_callback):
    ## Generate
    system_template = dedent("""
    你是一個在準備考試的高中生，你將根據反饋強化的作文內容。
    """)
    
    human_template = dedent("""
    Title: {title}
    
    Old Article:
    {article}
    
    Feedback:
    {feedback}

    Output format instructions: {format_instructions}
    
    Revised Article:
    """)
    
    # PromptTemplate expects the key `input_variables` (plural)
    input_ = {"system": {"template": system_template},
              "human": {"template": human_template,
                        "input_variables": ["title", "article", "feedback"],
                        "partial_variables": {'format_instructions': format_instructions}}}
    
    chat_prompt_template = build_standard_chat_prompt_template(input_)

    model = ChatOpenAI(openai_api_key=os.environ['OPENAI_API_KEY'],
                       model_name="gpt-4o-mini", temperature=0,
                       callbacks=[mlflow_callback])
    
    revision_pipeline = chat_prompt_template|model|output_parser

    return revision_pipeline

With MLflow set up, run Reflection -> Revision.
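The data flow of the combined pipeline can be sketched in plain Python. This is only an illustration: `fake_feedback`, `fake_revision`, and `assign` are hypothetical stand-ins (not LangChain APIs) for the two LLM calls and for what `RunnablePassthrough.assign(feedback=...)` does, namely keeping the original keys and adding a `feedback` key before the revision step runs.

```python
from typing import Callable, Dict


def fake_feedback(inputs: Dict[str, str]) -> str:
    # Stand-in for the teacher LLM: returns a short critique.
    return f"Feedback on '{inputs['title']}': tighten the conclusion."


def fake_revision(inputs: Dict[str, str]) -> str:
    # Stand-in for the student LLM: it needs both the article and the feedback.
    return inputs["article"] + " [revised using: " + inputs["feedback"] + "]"


def assign(**steps: Callable[[Dict[str, str]], str]):
    # Mimics the idea of RunnablePassthrough.assign: copy the input dict and
    # add one new key per step, each computed from the original input.
    def run(inputs: Dict[str, str]) -> Dict[str, str]:
        merged = dict(inputs)
        for key, step in steps.items():
            merged[key] = step(inputs)
        return merged

    return run


def whole_pipeline(inputs: Dict[str, str]) -> str:
    with_feedback = assign(feedback=fake_feedback)(inputs)
    return fake_revision(with_feedback)


intermediate = assign(feedback=fake_feedback)({"title": "T", "article": "Draft."})
result = whole_pipeline({"title": "T", "article": "Draft."})
print(sorted(intermediate))
print(result)
```

The point of the sketch is that the revision step sees `title`, `article`, and `feedback` together, which is why the revision prompt can reference all three variables.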

In [None]:
from langchain_core.runnables import RunnablePassthrough, RunnableLambda

run_name = "Reflection_Revision"

mlflow.set_tracking_uri(uri=tracking_url)

# Select the experiment, then start a named run explicitly
mlflow.set_experiment(experiment)
with mlflow.start_run(run_name=run_name) as run:
    run_id = run.info.run_id
    print(f"Using run_id={run_id}")

    # Attach the run_id so all logs go into this run
    mlflow_cb = MlflowCallbackHandler(
        experiment=experiment,
        run_id=run_id,
        tracking_uri=tracking_url,
    )

    feedback_pipeline = create_feedback_pipeline(mlflow_callback=mlflow_cb)
    revision_pipeline = create_revision_pipeline(mlflow_callback=mlflow_cb)
    
    whole_pipeline = RunnablePassthrough.assign(feedback=feedback_pipeline)|revision_pipeline|RunnableLambda(lambda x: x.name)

    result = whole_pipeline.invoke({"article": query,
                                    "title": "關於五十二赫茲，我想說的是…"},
                                    # config={"callbacks": [mlflow_cb]} 
                                  )
    
    
# Finally flush once
mlflow_cb.flush_tracker()

Building on last week's material, package this pipeline as an artifact, upload it to the MLflow server, and then invoke the pipeline through MLflow.

## Upload model as a python script

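🎯 **What you will learn in this chapter:**

- Understand what MLflow ModelSignature is for and how to define its schema  
- Learn to set the model's input and output formats so type validation works at deployment  
- Learn the flow for uploading the model and its source code as artifacts  
- Understand how to register and load an MLflow model for inference  

📘 **Skills you will have at the end:**  
You will be able to define, package, and upload a reusable LLM model, with MLflow as the core platform for model versioning and deployment.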


### ModelSignature

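A ModelSignature in MLflow defines and validates the schema of a model's inputs and outputs.

ModelSignature is an MLflow object that describes the model's:

- Inputs: the data format the model expects to receive (column names, data types, and so on)
- Outputs: the data format the model is expected to return

It lets MLflow:

- Record this information when the model is stored (log_model / save_model).
- Automatically check that input data matches the definition at deployment or inference (predict) time.
- Show the model's input/output structure clearly in the MLflow UI.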


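The model in this example takes two input columns:

- title (type: string)

- article (type: string)

and returns one output column:

- unnamed (or default-named), of type string.

In other words, the signature states the model's input and output structure explicitly,
which lets MLflow validate types and track them automatically when logging or deploying the model.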
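To make the idea of schema enforcement concrete, here is a conceptual sketch in plain Python. It is not MLflow's implementation: `INPUT_SCHEMA` and `validate_row` are hypothetical names, and the check is reduced to "column present and of the expected Python type".

```python
# Hypothetical, simplified stand-in for the signature's input schema.
INPUT_SCHEMA = {"title": str, "article": str}


def validate_row(row: dict, schema: dict) -> list:
    """Return a list of problems; an empty list means the row conforms."""
    problems = []
    for column, expected_type in schema.items():
        if column not in row:
            problems.append(f"missing column: {column}")
        elif not isinstance(row[column], expected_type):
            problems.append(
                f"column {column} should be {expected_type.__name__}, "
                f"got {type(row[column]).__name__}"
            )
    return problems


good = validate_row({"title": "t", "article": "a"}, INPUT_SCHEMA)
bad = validate_row({"title": 42}, INPUT_SCHEMA)
print(good)
print(bad)
```

A real signature check happens inside MLflow when a logged model is loaded and `predict` is called; the sketch only shows the kind of mismatch (missing column, wrong type) that such a check is meant to catch early.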

In [None]:
from textwrap import dedent

import pandas as pd
from mlflow.models.signature import ModelSignature
from mlflow.types.schema import Schema, ColSpec

from src.io.path_definition import get_project_dir

model_path = os.path.join(get_project_dir(), 'tutorial', 'LLM+Langchain', 
                          "Week-4", "llmchain_mlflow_experiment_tracing.py")

# You need to know what you will put into it and what you will get out of it.
input_schema = Schema([ColSpec("string", "title"),
                       ColSpec("string", "article")])
output_schema = Schema([ColSpec("string")])
signature = ModelSignature(inputs=input_schema, outputs=output_schema)


query = dedent("""
俗話說：「龍生九子，各有不同。」在廣闊浩瀚的海洋之中，就有一頭孤獨的鯨魚——五十二赫茲鯨魚。牠聲音的頻率天生便比同伴還要高，這項特別之處，也導致了牠與同伴產生了無法溝通的鴻溝。看見這則故事的我，不禁思考，在如此多元的人間，是否也有像五十二赫茲鯨魚一般，天生便與眾不同？

回首童年，我印象最為深刻的一刻，是初識字時，與文字互相理解的那一瞬、是當我第一次讀完一個句子時，它將自身的意義傳入我腦中的那一瞬。自此，我便對文字、語言抱有特殊的感情，也十分享受閱讀與朗誦。那種將自身與文字經由一點一滴積累而連接起來的感情，使我心靈感到十分富足。

而當我步入校園接觸同儕時，驀然驚覺我與別人的閱讀速度十分不同。每當我已讀完一篇文章，但同學可能只完成了一半甚至三分之二。同時，我在生字讀音方面也異常的執著，因此被同學抱怨有「文字潔癖」。面對同儕抱怨的我，也只好強忍對耳邊時而出現字錯讀音的不適，開始刻意忽略心裡對它的執念，只為想要與別人一樣，想要和朋友互相理解。

直到多年前，因緣際會之下認識了「五十二赫茲」這獨特的存在。牠的身影在我心中烙下一道深刻的痕跡。因為牠，我開始接受自己與他人的不同；也因為牠，我明白了，我對文字的執著，並不是一種負面的特質，而是上天賜予我的禮物，我開始在寫作上揮灑自如。這讓我知道，不要在一開始便用否定的眼光看待自己的特質。也許這特別之處，會使我們與五十二赫茲鯨魚一般孤獨，會使我們遭受他人的不理解與排斥，但也會讓我們與眾不同。

關於此，我想說的是，勇敢地綻放自己的特別，也讓自己成為自己和世人眼中，最閃耀的五十二赫茲鯨魚。
""")

run_name = "Reflection_Revision_Py"

with mlflow.start_run(run_name=run_name) as run:

    os.environ['experiment'] = experiment
    os.environ['run_id'] = run.info.run_id
    os.environ['run_name'] = run_name
    
    mlflow.log_artifact(model_path, artifact_path="source_code")

    input_example = pd.DataFrame(data=[[query, "關於五十二赫茲，我想說的是…"]], columns=['article', 'title'])
    
    model_info = mlflow.pyfunc.log_model(
        python_model=model_path,  # Define the model as the path to the Python file
        name="langchain_model",
        input_example=input_example,
        signature=signature,
        registered_model_name="Generation_Reflection_Demo"
    )

In [None]:
query = dedent("""\
鳴大鐘一次！推動杠杆，啟動活塞和泵
鳴大鐘兩次！按下按鈕，發動引擎，點燃渦輪，注入生命
鳴大鐘三次！齊聲歌唱，讚美萬機之神！
""")


with mlflow.start_run(run_name=run_name) as run:

    os.environ['experiment'] = experiment
    os.environ['run_id'] = run.info.run_id
    os.environ['run_name'] = run_name

    loaded_model = mlflow.pyfunc.load_model("models:/Generation_Reflection_Demo/1")
    
    input_ = pd.DataFrame(data=[[query, '關於五十二赫茲，我想說的是…']], columns=['article', 'title'])
    
    output = loaded_model.predict(input_)

In [None]:
print(output)

# LangServe API

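🎯 **What you will learn in this chapter:**

- Understand the LangServe architecture and API call flow  
- Learn to stand up a simple backend server and send requests from a client to get model responses  
- Use RemoteRunnable so the model can be called remotely  
- Understand how to combine MLflow models with the LangServe API for integrated deployment  

📘 **Skills you will have at the end:**  
You will be able to design an LLM system with a backend API that supports remote inference and modular deployment.  

## 1. Calling the backend API from the client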

In [None]:
import requests

response = requests.post(
    "http://localhost:5000/openai/invoke",
    json={'input': "Where is Taiwan?"}
)

In [None]:
response.json()

In [None]:
response.json()['output']['content']

From the Windows CLI (command line interface):

curl -X POST "http://localhost:5000/openai/invoke" -H "Content-Type: application/json" -d "{""input"": ""Where is Taiwan?""}"
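The request shape used above (a JSON body with a single `input` key, posted to the `/invoke` route) can also be built with the standard library alone. This is a minimal sketch; `build_invoke_request` is a hypothetical helper, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_invoke_request(base_url: str, payload) -> urllib.request.Request:
    # Wrap the payload under the "input" key, as in the requests/curl examples.
    body = json.dumps({"input": payload}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/invoke",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_invoke_request("http://localhost:5000/openai", "Where is Taiwan?")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To actually send it (requires the backend server to be running):
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp)["output"]["content"])
```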

## 2. Combining with the earlier MLflow work: download the model from the MLflow server, then call it from the client

In [None]:
import requests

response = requests.post(
    "http://localhost:5000/demo/invoke",
    json={'input': {"article": query,
                    "title": "關於五十二赫茲，我想說的是…"}}
)

response.json()

## 3. RemoteRunnable

In [None]:
from langserve import RemoteRunnable

remote_llm = RemoteRunnable("http://localhost:5000/openai/")

In [None]:
# Supports astream
async for msg in remote_llm.astream("Where is Taiwan?"):
    print(msg.content, end="", flush=True)

In [None]:
remote_llm.invoke("花蓮縣光復鄉因為馬太鞍溪堰塞湖潰堤，導致被泥石流淹過。就安全的考量，沒接受過專業訓練的平民是否應該去花蓮縣光復鄉參與救災。")

In [None]:
import os

os.chdir("../../../")

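# The ChatBot itself and its memory mechanism

🎯 **What you will learn in this chapter:**

- Understand the difference between stateless and stateful conversation systems  
- Learn to manage conversation records with ChatMessageHistory  
- Use ChatPromptTemplate and MessagesPlaceholder  
- Implement a simple multi-turn conversation mechanism  

📘 **Skills you will have at the end:**  
You will be able to design a ChatBot with "conversation memory" that carries and understands context across turns.  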


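1. N-Shot Learning and conversation history

    - The conversation history can be viewed as a list of Q&A pairs.
    - At inference time, the model receives the earlier conversation as part of the prompt, followed by the user's latest input. This is effectively few-shot / N-shot learning: the model draws context from the examples to work out how to answer now.

2. Stateless vs. Stateful
    - Stateless: every request is fully independent and no conversation history is carried in.
    - Stateful: the system keeps the conversation history, whether by sending it back to the model or by storing it in an external memory system.
    - Whether the system "remembers" the past therefore depends on the design, not on any ability built into the model.

3. The role of tools
    - Tools are what make a ChatBot powerful.
    - The model on its own can generate language, but only when combined with external tools (database queries, calculators, web search, code execution, image generation) can a ChatBot both reason and act, no longer limited to the knowledge stored in its parameters.

    - Think of it this way: the model is the "brain" and the tools are the "hands and feet".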
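The stateless/stateful distinction can be shown without any LLM at all. In the sketch below, `fake_model` is a hypothetical stand-in for a model call: like a real LLM, it can only "remember" what is in the message list it receives, so memory comes entirely from whether the caller resends the history.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (role, text)


def fake_model(messages: List[Message]) -> str:
    # Stand-in for an LLM: its only knowledge of the past is `messages`.
    user_turns = [text for role, text in messages if role == "user"]
    if len(user_turns) < 2:
        return "I have no earlier message to refer to."
    return f"Before this you said: {user_turns[-2]}"


# Stateless: each request carries only the latest message.
stateless_reply = fake_model([("user", "What did I just say?")])

# Stateful: the caller keeps a history and resends it on every turn.
history: List[Message] = []


def chat(user_text: str) -> str:
    history.append(("user", user_text))
    reply = fake_model(history)
    history.append(("assistant", reply))
    return reply


first_reply = chat("I love programming.")
stateful_reply = chat("What did I just say?")
print(stateless_reply)
print(stateful_reply)
```

The model function is identical in both cases; only the caller's bookkeeping differs, which is the sense in which memory is a design decision rather than a model capability.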

In [None]:
from IPython.display import Image

Image(url="https://python.langchain.com/v0.1/assets/images/chat_use_case-eb8a4883931d726e9f23628a0d22e315.png")

First, learn how to drive tools: the model is like a well-trained Astartes, and the tools are its power armour, jump pack, bolter, and chainsword.

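## Tool binding and tool calling (Tools & ToolMessage)

🎯 **What you will learn in this chapter:**

- Understand how tools and LLMs interact  
- Learn to bind external functions with LangChain's @tool and StructuredTool  
- Understand the design of ToolMessage and the call logic  
- Integrate multiple tools (calculation, lookup, WebSearch) to give the model the ability to act  

📘 **Skills you will have at the end:**  
You will be able to build a ChatBot that can "think + act" and combine multiple tools to complete automated tasks.  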

In [None]:
import os

os.chdir("../../../")

from langchain_core.prompts import PromptTemplate, ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langchain_google_genai import ChatGoogleGenerativeAI

from src.initialization import credential_init

credential_init()

# Define a calculator tool
@tool
def add_numbers(a: int, b: int) -> int:
    """Adds two numbers together."""
    return a + b

# # Create the LLM
# llm = ChatOpenAI(model="gpt-4o-mini")
llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=6,
    disable_streaming=False
    # other params...
)

# Bind the tool to the model
llm_with_tools = llm.bind_tools([add_numbers])

# Run
resp = llm_with_tools.invoke("What is 42 + 58?")

In [None]:
resp

In [None]:
tool_call = resp.tool_calls[0]
print(tool_call)

Compute the result from tool_call
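A tool call is a dictionary with a `name`, an `args` mapping, and an `id`. One common way to execute it is to look the name up in an explicit registry, so that only functions you registered can ever run. The sketch below uses a plain function rather than a LangChain tool; `TOOL_REGISTRY`, `dispatch`, and the example `id` value are hypothetical.

```python
def add_numbers(a: int, b: int) -> int:
    """Adds two numbers together."""
    return a + b


# Explicit allow-list: only functions registered here can be called.
TOOL_REGISTRY = {"add_numbers": add_numbers}


def dispatch(tool_call: dict):
    name = tool_call["name"]
    if name not in TOOL_REGISTRY:
        raise KeyError(f"Unknown tool requested: {name}")
    # The model supplies keyword arguments matching the tool's signature.
    return TOOL_REGISTRY[name](**tool_call["args"])


# Hand-written stand-in for resp.tool_calls[0]; the id value is made up.
example_call = {"name": "add_numbers", "args": {"a": 42, "b": 58}, "id": "call-0"}
result = dispatch(example_call)
print(result)
```

Compared with evaluating the tool name as code, a registry fails loudly on an unexpected name instead of executing whatever the model produced.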

In [None]:
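# Look the tool up by name in an explicit registry rather than eval-ing model output,
# and call it through .invoke
tools_by_name = {add_numbers.name: add_numbers}
result = tools_by_name[tool_call['name']].invoke(tool_call['args'])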

In [None]:
result

Build the ToolMessage

In [None]:
from langchain_core.messages import ToolMessage

tool_msg = ToolMessage(
    content=str(result),          # usually a string or simple text
    tool_call_id=tool_call['id']    # must match the AIMessage tool_call id
)

In [None]:
resp

In [None]:
tool_msg

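Finally, pass the AIMessage and the ToolMessage back to the model, together with the original question, and invoke once more to get the final answer

In [None]:
from langchain_core.messages import HumanMessage

# Include the original question so the tool call follows a user turn
llm_with_tools.invoke([HumanMessage(content="What is 42 + 58?"), resp, tool_msg])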

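## OpenAI WebSearch API and web search applications

🎯 **What you will learn in this chapter:**

- Learn to call the OpenAI WebSearch tool to fetch up-to-date data  
- Understand the action, annotations, and sources fields of the WebSearch API  
- Learn how to cite sources in responses and analyse the results  
- Understand the strengths, weaknesses, and usage strategies of WebSearch  

📘 **Skills you will have at the end:**  
You will be able to design an AI assistant that retrieves, analyses, and reports on information in real time, and picks the best information source for the situation.  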


### Basic usage

In [None]:
from openai import OpenAI

client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])

In [None]:
response = client.responses.create(
         model="gpt-4o-mini",
         tools=[{"type": "web_search"}],
         input="幫我查詢HunterXHunter 最新的進度"
    )

response

1. A web_search_call output item with the ID of the search call, along with the action taken in web_search_call.action. The action is one of:
    - ***search***, which represents a web search. It will usually (but not always) include the search query and the domains which were searched. Search actions incur a tool call cost (see pricing).
    - ***open_page***, which represents a page being opened. Supported in reasoning models.
    - ***find_in_page***, which represents searching within a page. Supported in reasoning models.
2. A message output item containing:
    - The text result in message.content[0].text
    - Annotations message.content[0].annotations for the cited URLs
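The structure described above can be walked with a small helper that collects the cited URLs. This is a sketch under assumptions: `collect_citations` is a hypothetical helper, the test data is hand-built with `SimpleNamespace` rather than a real API response, and it assumes citation annotations carry `type == "url_citation"` plus `title` and `url` attributes.

```python
from types import SimpleNamespace as NS


def collect_citations(output_items) -> list:
    """Return (title, url) pairs from url_citation annotations in message items."""
    citations = []
    for item in output_items:
        if getattr(item, "type", None) != "message":
            continue  # skip web_search_call items
        for part in item.content:
            for annotation in getattr(part, "annotations", None) or []:
                if getattr(annotation, "type", None) == "url_citation":
                    citations.append((annotation.title, annotation.url))
    return citations


# Hand-built stand-in for response.output; every value here is made up.
fake_output = [
    NS(type="web_search_call", action=NS(type="search", query="example")),
    NS(
        type="message",
        content=[
            NS(
                text="Answer text",
                annotations=[
                    NS(type="url_citation", title="Example", url="https://example.com"),
                ],
            )
        ],
    ),
]
citations = collect_citations(fake_output)
print(citations)
```

Filtering on the item type rather than on a fixed index avoids relying on the message always being the second output item.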

In [None]:
response.output[0]

In [None]:
response.output[1].content[0].annotations

Web search result:

In [None]:
response.output_text

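#### The sources field
To see every URL retrieved during the web search, use the sources field.
Unlike inline citations, which show only the most relevant references, sources returns the complete list of URLs the model consulted while generating the response.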

In [None]:
response = client.responses.create(
         model="gpt-4o-mini",
         tools=[{"type": "web_search"}],
         include=["web_search_call.action.sources"],
         input="幫我查詢HunterXHunter 最新的進度"
    )

# response

In [None]:
response.output[0].action.sources

In [None]:
print(response.output_text)

#### User location
To refine search results based on geography, you can specify an approximate user location using country, city, region, and/or timezone.

- The city and region fields are free text strings, like Minneapolis and Minnesota respectively.
- The country field is a two-letter ISO country code, like US. (ISO 3166-1 alpha-2)
- The timezone field is an IANA timezone like America/Chicago.

In [None]:
response = client.responses.create(
         model="gpt-4o-mini",
         tools=[{"type": "web_search",
                 "user_location": {
                    "type": "approximate",
                    "country": "US",
                     },
                 "search_context_size": "medium"
                }],
         include=["web_search_call.action.sources"],
         input="幫我查詢HunterXHunter 最新的進度"
    )
                
print(response.output_text)

Other arguments:

- Domain filtering (gpt-5 and o-series models only)
- reasoning (gpt-5 and o-series models only)
    - effort:
        - ***minimal***
        - ***low***
        - ***medium***
        - ***high***
    -  summary:
        - ***auto***
        - ***concise***
        - ***detailed***  
- tool_choice
    - ***none*** means the model will not call any tool and instead generates a message.
    - ***auto*** means the model can pick between generating a message or calling one or more tools.
    - ***required*** means the model must call one or more tools. 

In [None]:
response = client.responses.create(
  model="gpt-5",
  reasoning={"effort": "low"},
  tools=[
      {
          "type": "web_search",
          "filters": {
              "allowed_domains": [
                  "pubmed.ncbi.nlm.nih.gov",
                  "clinicaltrials.gov",
                  "www.who.int",
                  "www.cdc.gov",
                  "www.fda.gov",
              ]
          },
      }
  ],
  tool_choice="auto",
  include=["web_search_call.action.sources"],
  input="Please perform a web search on how semaglutide is used in the treatment of diabetes.",
)


### 建立websearch工具

In [None]:
@tool
def websearh_tool(query: str) -> str:
    """Use this tool to find the latest information or information you are not sure about"""

    response = client.responses.create(
                    model="gpt-4o-mini",
                    tools=[
                        {"type": "web_search",}
                    ],
                    tool_choice="auto",
                    input=query)
    
    return response.output_text

In [None]:
# Bind the tool to the model
llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=6,
    disable_streaming=False
    # other params...
)
llm_with_tools = llm.bind_tools([websearh_tool])

# Run
resp = llm_with_tools.invoke("台灣2024總統大選結果")

In [None]:
resp

In [None]:
tool_call = resp.tool_calls[0]
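# Look the tool up by name in an explicit registry rather than eval-ing model output
result = {websearh_tool.name: websearh_tool}[tool_call['name']].invoke(tool_call['args'])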

print(result)

In [None]:
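def call_function(tool_call):

    # Dispatch through an explicit registry instead of eval-ing the model-chosen name
    tools_by_name = {websearh_tool.name: websearh_tool}
    return tools_by_name[tool_call['name']].invoke(tool_call['args'])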


def follow_up_answer(aimessage):

    tool_call = aimessage.tool_calls[0]
    
    result = call_function(tool_call)

    tool_msg = ToolMessage(
        content=str(result),          # usually a string or simple text
        tool_call_id=tool_call['id']    # must match the AIMessage tool_call id
    )
    
    follow_up = llm_with_tools.invoke([aimessage, tool_msg])

    return follow_up

follow_up_answer(aimessage=resp)

## The ChatBot itself

### LLMs have no memory

In [None]:
from langchain_core.messages import HumanMessage, AIMessage

model = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=6,
    disable_streaming=False
    # other params...
)

message = HumanMessage(
            content="Translate this sentence from English to Chinese (繁體中文): I love programming."
        )

model.invoke(
    [message]
)

In [None]:
model.invoke([HumanMessage(content="What did you just say?")])

### External memory

How do we add external memory?

In [None]:
model.invoke(
    [
        HumanMessage(
            content="Translate this sentence from English to  Chinese (繁體中文): I love programming."
        ),
        AIMessage(content="我愛程式設計."),
        HumanMessage(content="What did you just say?"),
    ]
)

Use MessagesPlaceholder to receive the external memory

In [None]:
from langchain_core.prompts import PromptTemplate, ChatPromptTemplate, MessagesPlaceholder, SystemMessagePromptTemplate

system_prompt = PromptTemplate(template=("You are a helpful assistant. Answer all questions to the best of your "
                                         "ability."))
system_message = SystemMessagePromptTemplate(prompt=system_prompt)

prompt = ChatPromptTemplate.from_messages(
    [
        system_message,
        MessagesPlaceholder(variable_name="messages"),
    ]
)

### Building the chain

In [None]:
pipeline_ = prompt | model

In [None]:
from langchain_core.messages import HumanMessage, AIMessage

pipeline_.invoke(
    {
        "messages": [
            HumanMessage(
            content="Translate this sentence from English to Chinese (繁體中文): I love programming."
            ),
            AIMessage(content="我愛程式設計."),
            HumanMessage(content="What did you just say?"),
            ],
    }
)

## Storing the conversation in ChatMessageHistory

### Import and create a ChatMessageHistory

In [None]:
from langchain.memory import ChatMessageHistory

demo_chat_history = ChatMessageHistory()

### Add user and AI messages

In [None]:
demo_chat_history.add_user_message("Translate this sentence from English to Chinese (繁體中文): I love programming.")

demo_chat_history.add_ai_message("我愛程式設計.")

demo_chat_history.messages

In [None]:
demo_chat_history.add_user_message(
    "What did you just say?"
)

response = pipeline_.invoke({"messages": demo_chat_history.messages})

response

### Minimal example

In [None]:
from langchain_core.output_parsers import StrOutputParser

chat_history = ChatMessageHistory()
pipeline_ = prompt|model|StrOutputParser()

while True:
    question = input("What do you want to ask: ")
    if question == "QUIT":
        break
    chat_history.add_user_message(question)
    response = pipeline_.invoke({"messages": chat_history.messages})

    print(response)
    chat_history.add_ai_message(response)
    

In [None]:
# import os

# os.chdir("../../../")

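## ChatBot + retrieval system integration

🎯 **What you will learn in this chapter:**

- Understand how to combine a FAISS vector database with a ChatBot  
- Learn to build a RAG (Retrieval-Augmented Generation) workflow over searchable text  
- Apply StructuredTool and retriever tools in practice  
- Let the ChatBot answer domain questions from a knowledge base (for example, Tang poem retrieval)  

📘 **Skills you will have at the end:**  
You will be able to build a ChatBot with "external memory and knowledge retrieval" that combines semantic search with generation.  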

In [None]:
from textwrap import dedent

from langchain_openai import ChatOpenAI
from langchain_community.vectorstores import FAISS
from langchain.embeddings import HuggingFaceEmbeddings
from langchain_core.runnables import ConfigurableField
from langchain_core.tools import tool
from langchain.tools import StructuredTool
from pydantic import BaseModel, Field
from langchain.memory import ChatMessageHistory
from langchain_core.prompts import PromptTemplate, ChatPromptTemplate, MessagesPlaceholder, SystemMessagePromptTemplate, HumanMessagePromptTemplate

from src.io.path_definition import get_project_dir
from src.initialization import credential_init


credential_init()

# Load the Tang poem vector database
filename = os.path.join(get_project_dir(), "tutorial", "LLM+Langchain", "Week-2", "poem_faiss_index")

embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-m3")

vectorstore = FAISS.load_local(
    filename, embeddings, allow_dangerous_deserialization=True
)

# search_kwargs is exposed as a configurable field so k can be set per call
retriever = vectorstore.as_retriever(search_type='similarity').configurable_fields(
        search_kwargs=ConfigurableField(id="search_kwargs")
    )


class PoemRetrieverArgs(BaseModel):
    query: str = Field(description="The keyword or phrase to search for Tang poems. 用來搜尋唐詩的關鍵字或是句子")
    k: int = Field(1, description="The number of poems to retrieve.")


def _poem_retriever(query: str, k: int):
    output = retriever.invoke(query, config={"configurable": {"search_kwargs": {"k": k}}})
    return output


# Build the tool with StructuredTool
# and declare the expected input format through args_schema

poem_retriever = StructuredTool.from_function(
    func=_poem_retriever,
    args_schema=PoemRetrieverArgs,
    description="使用這個工具來搜尋唐詩; Use this tool to search for Tang poems.",
)

In [None]:
from langchain_google_genai import ChatGoogleGenerativeAI

system_prompt = PromptTemplate(template=dedent("""You are a helpful assistant. 
Answer all questions to the best of your ability.
"""))

system_message = SystemMessagePromptTemplate(prompt=system_prompt)

prompt = ChatPromptTemplate.from_messages(
    [
        system_message,
        MessagesPlaceholder(variable_name="messages"),
    ]
)

# model = ChatOpenAI(openai_api_key=os.environ['OPENAI_API_KEY'],
#                    model_name="gpt-4o-2024-05-13", temperature=0)

model = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=6,
    disable_streaming=False
    # other params...
)

model_with_tools = model.bind_tools([poem_retriever])

chatbot_pipeline = prompt | model_with_tools

In [None]:
chat_history = ChatMessageHistory()

question = "幫我找3首關於對於人生感嘆的唐詩"

chat_history.add_user_message(question)

output = chatbot_pipeline.invoke({"messages": chat_history.messages})

In [None]:
output

In [None]:
from langchain_core.messages import ToolMessage

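def call_function(tool_call):

    # Dispatch through an explicit registry instead of eval-ing the model-chosen name
    tools_by_name = {poem_retriever.name: poem_retriever}
    return tools_by_name[tool_call['name']].invoke(tool_call['args'])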


def follow_up_answer(aimessage):

    tool_call = aimessage.tool_calls[0]
    
    result = call_function(tool_call)

    tool_msg = ToolMessage(
        content=str(result),          # usually a string or simple text
        tool_call_id=tool_call['id']    # must match the AIMessage tool_call id
    )
    
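    # Send the conversation so far as well, so the tool call follows the user's question
    follow_up = model_with_tools.invoke(chat_history.messages + [aimessage, tool_msg])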

    return follow_up

output = follow_up_answer(aimessage=output)

print(output)
# follow_up_answer(human_message=question, ai_message=output.content, additional_kwargs=output.additional_kwargs)

In [None]:
chat_history = ChatMessageHistory()

while True:
    question = input("What do you want to ask: ")
    if question == "QUIT":
        break
    chat_history.add_user_message(question)
    output = chatbot_pipeline.invoke({"messages": chat_history.messages})

    if output.tool_calls != []:
        response = follow_up_answer(output).content
    else:
        response = output.content

    print("***********************")
    print(response)
    print("***********************")
    
    chat_history.add_ai_message(response)

### A tool that solves math problems with code

- OpenAI API: https://platform.openai.com/docs/guides/tools-code-interpreter
- Try it yourself. We can also hand-roll our own code-writing tool.

1. The code-generation chain
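The core of such a tool is two steps: pull the Python source out of the model's reply, then execute it and read back a variable named `answer`. The sketch below shows both steps with the standard library only; `extract_python`, `run_and_get_answer`, and the sample replies are hypothetical, and the fence marker is built programmatically so the example can be displayed inside a fenced block.

```python
import re

FENCE = "`" * 3  # three backticks, built programmatically


def extract_python(reply: str) -> str:
    # Prefer a fenced python block; fall back to the raw reply if there is none.
    pattern = FENCE + r"python\n(.*?)\n" + FENCE
    match = re.search(pattern, reply, re.DOTALL)
    return match.group(1) if match else reply.strip()


def run_and_get_answer(reply: str):
    namespace = {}
    # WARNING: exec runs arbitrary code; sandbox this outside of demos.
    exec(extract_python(reply), namespace)
    return namespace.get("answer")


fenced_reply = FENCE + "python\nimport math\nanswer = math.factorial(5)\n" + FENCE
fenced_result = run_and_get_answer(fenced_reply)
bare_result = run_and_get_answer("answer = 2 + 3")
print(fenced_result, bare_result)
```

Returning only `answer` keeps the tool result small; returning the whole namespace would also include imported modules and interpreter internals.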

In [None]:
import os

os.chdir("../../../")

from src.initialization import credential_init

In [None]:
import re
from textwrap import dedent

from langchain.memory import ChatMessageHistory
from langchain_openai import ChatOpenAI
from langchain_core.runnables import chain
from langchain_core.output_parsers import StrOutputParser, PydanticOutputParser
from langchain_core.tools import tool
from langchain_core.messages import ToolMessage
from langchain.tools import StructuredTool
from langchain_core.prompts import PromptTemplate, ChatPromptTemplate, MessagesPlaceholder, SystemMessagePromptTemplate, HumanMessagePromptTemplate
from pydantic import BaseModel, Field

from src.initialization import credential_init

credential_init()


def build_standard_chat_prompt_template(kwargs):

    messages = []
 
    if 'system' in kwargs:
        content = kwargs.get('system')
        prompt = PromptTemplate(**content)
        message = SystemMessagePromptTemplate(prompt=prompt)
        messages.append(message)  

    if 'human' in kwargs:
        content = kwargs.get('human')
        prompt = PromptTemplate(**content)
        message = HumanMessagePromptTemplate(prompt=prompt)
        messages.append(message)
        
    chat_prompt = ChatPromptTemplate.from_messages(messages)
    
    return chat_prompt


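@chain
def code_execution(code):

    # Prefer the fenced python block; fall back to the raw reply when the model omits the fence
    match = re.findall(r"python\n(.*?)\n```", code, re.DOTALL)
    python_code = match[0] if match else code

    # NOTE: exec runs model-generated code in this process; only do this in a trusted, sandboxed setup
    local_vars = {}
    exec(python_code.strip(), local_vars)

    # exec injects __builtins__ into the namespace; drop it so the tool result stays readable
    local_vars.pop("__builtins__", None)

    return local_vars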


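def call_function(tool_call):

    # Dispatch through an explicit registry instead of eval-ing the model-chosen name
    # (mathematic_tool is defined in a later cell and resolved when this function is called)
    tools_by_name = {mathematic_tool.name: mathematic_tool}
    return tools_by_name[tool_call['name']].invoke(tool_call['args'])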


def follow_up_answer(aimessage):

    tool_call = aimessage.tool_calls[0]
    
    result = call_function(tool_call)

    tool_msg = ToolMessage(
        content=str(result),          # usually a string or simple text
        tool_call_id=tool_call['id']    # must match the AIMessage tool_call id
    )

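    try:
        follow_up = model_with_tools.invoke([aimessage, tool_msg])
    except Exception as exc:
        # Surface the messages that triggered the failure and keep the original traceback
        raise ValueError(f"aimessage={aimessage}\ntool_msg={tool_msg}") from exc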
    
    return follow_up


system_template = (
    "You are a highly skilled Python developer. Your task is to generate Python code strictly based on the user's instructions.\n"
    "Leverage statistical and mathematical libraries such as `statsmodels`, `scipy`, and `numpy` where appropriate to solve the problem.\n"
    "Your response must contain only the Python code — no explanations, comments, or additional text.\n\n"
)

human_template = dedent("""{query}\n\n
                            Always copy the final answer to a variable `answer`
                            Code:
                        """)


# PromptTemplate expects the key `input_variables` (plural)
input_ = {"system": {"template": system_template},
          "human": {"template": human_template,
                    "input_variables": ["query"]}}


model = ChatOpenAI(model="gpt-4o-mini")

# model = ChatGoogleGenerativeAI(
#     model="gemini-2.0-flash",
#     temperature=0,
#     max_tokens=None,
#     timeout=None,
#     max_retries=6,
#     disable_streaming=False
#     # other params...
# )


chat_prompt_template = build_standard_chat_prompt_template(input_)

code_generation = chat_prompt_template|model|StrOutputParser()

code_pipeline = code_generation|code_execution

In [None]:
class CodeArgs(BaseModel):
    query: str = Field(description="User request; 用戶需求")


def _calculator(query: str,):
    output = code_pipeline.invoke(query)
    return output


mathematic_tool = StructuredTool.from_function(
    func=_calculator,
    args_schema=CodeArgs,
    description="Use this tool to solve mathematic related problem; 使用這個工具解決數學問題",
)

In [None]:
from langchain_core.prompts import MessagesPlaceholder

model_with_tools = model.bind_tools([mathematic_tool])

system_prompt = PromptTemplate(template=dedent("""You are a helpful assistant. 
Answer all questions to the best of your ability.
"""))

system_message = SystemMessagePromptTemplate(prompt=system_prompt)

prompt = ChatPromptTemplate.from_messages(
    [
        system_message,
        MessagesPlaceholder(variable_name="messages"),
    ]
)

chatbot_pipeline = prompt | model_with_tools

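### Challenge: Taipei First Girls' High School Exam Questions

https://drive.google.com/file/d/1csHdgvc5WtbJZ4n39eozogVVPIkPWABf/view

- Q1: Is the following statement correct: the set of points (x, y) satisfying x^2 + y^2 + 2x − 10y + 30 = 0 is a circle
- Q2: Is the following statement correct: exactly one circle passes through the three points A(1, −3), B(2, 6), C(4, 24)
- Q3: Is the following statement correct: the line 3x − 4y + 7 = 0 and the circle (x − 2)^2 + (y + 3)^2 = 5 have exactly one intersection point
- Q4: Is the following statement correct: exactly two points on the circle (x − 2)^2 + (y + 3)^2 = 5 are at distance 2 from the line 3x − 4y − 13 = 0
- Q5: Is the following statement correct: P(a, b) is a point on the circle (x − 2)^2 + (y + 3)^2 = 4, and there are exactly 8 such points for which (a^2 + b^2)^0.5 is an integer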
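
To have a reference for judging the chatbot's answers, Q1 to Q3 can be checked directly with three standard facts. Completing the square gives r², three collinear points lie on no circle, and a line touches a circle at exactly one point only when the center-to-line distance equals the radius. The sketch below uses plain Python; the helper names are hypothetical and are not part of the notebook.

```python
import math


def circle_radius_squared(d, e, f):
    """r^2 of x^2 + y^2 + d*x + e*y + f = 0, obtained by completing the square."""
    return (d / 2) ** 2 + (e / 2) ** 2 - f


def are_collinear(p, q, r):
    """True when the cross product of (q - p) and (r - p) is zero."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0]) == 0


def point_line_distance(point, a, b, c):
    """Distance from `point` to the line a*x + b*y + c = 0."""
    return abs(a * point[0] + b * point[1] + c) / math.hypot(a, b)


# Q1: the equation describes a real circle only if r^2 > 0.
q1_r2 = circle_radius_squared(2, -10, 30)

# Q2: a circle through three points exists only if they are not collinear.
q2_collinear = are_collinear((1, -3), (2, 6), (4, 24))

# Q3: tangency requires distance(center, line) == radius = sqrt(5).
q3_distance = point_line_distance((2, -3), 3, -4, 7)

print(q1_r2, q2_collinear, q3_distance, math.sqrt(5))
```

Under these checks r² is −4 for Q1, the three points in Q2 are collinear, and the distance in Q3 is 5, which exceeds √5, so all three statements are false. Q4 and Q5 are left for the chatbot.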

In [None]:
from langchain.memory import ChatMessageHistory

chat_history = ChatMessageHistory()

In [None]:
while True:
    question = input("Your question: ")
    if question == 'quit':
        break
    chat_history.add_user_message(question)
    output = chatbot_pipeline.invoke({"messages": chat_history.messages})
    
    try:
        if output.tool_calls != []:
            print("Call tool")
            final_answer = follow_up_answer(aimessage=output)
        else:
            print("No Tool")
            final_answer = output
        print(f"AI: {final_answer.content}")
        chat_history.add_ai_message(final_answer.content)
    except KeyError:
        print(f"AI: {output.content}")
        chat_history.add_ai_message(output.content)
    
    

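### Can we add some basic machine learning for analysis?

I don't know yet, but it should be quite interesting.

By now you should see that writing the ChatBot itself is not difficult; how useful a ChatBot is depends on the tools bound to it.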

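# Websearch Strategy and Data Source Design

🎯 **What you will learn in this chapter:**

- Understand the strengths, limitations, and common risks of Websearch
- Choose the right information source (Wikipedia, Fandom, API) for the use case
- Apply data quality control and source filtering strategies
- Use an LLM to generate Pydantic schemas dynamically
- Build an automated pipeline that extracts structured data from Wikipedia / Fandom

📘 **Skills you will have by the end:**
- Retrieve and analyze web information with an LLM at the core, and build high-quality, multi-source AI research and content generation systems.
- Design intelligent data analysis systems that combine web scraping, LLM code generation, and structured data extraction.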

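## Websearch: Pros, Cons, and Usage Strategy

### Advantages
- **Diverse sources**: broad coverage that provides information from multiple angles.
- **Timeliness**: quickly retrieves the latest content on public web pages.
- **Flexibility**: well suited to questions that require cross-checking multiple sources.

### Disadvantages
- **Fragmentation**: information is scattered and inconsistently formatted, so it is hard to use systematically as-is.
- **Uneven quality**: source reliability varies, and content may be wrong or outdated.
- **Restrictions and risks**: some APIs or search steps may block specific content for policy, safety, or licensing reasons.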
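
One simple way to act on the uneven-quality point is to keep only results from domains you already trust. The sketch below is a minimal illustration using the standard library; `TRUSTED_DOMAINS`, `is_trusted`, and `filter_results` are hypothetical names, and the allowlist contents are an assumption to adapt to your own use case.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; replace with the sources you actually trust.
TRUSTED_DOMAINS = {"wikipedia.org", "fandom.com"}


def is_trusted(url: str) -> bool:
    """True if the URL's host is a trusted domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in TRUSTED_DOMAINS)


def filter_results(urls):
    """Keep only URLs whose host passes the allowlist check."""
    return [u for u in urls if is_trusted(u)]


candidates = [
    "https://ja.wikipedia.org/wiki/HUNTER×HUNTER",
    "https://warhammer40k.fandom.com/wiki/Titan",
    "https://example.com/some-blog-post",
    "https://notwikipedia.org.evil.example/wiki/Titan",
]
print(filter_results(candidates))
```

Matching on the parsed hostname rather than on a substring of the URL avoids accepting look-alike hosts such as the last candidate.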

---

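### When Websearch Is a Good Fit
- You need **multiple perspectives** (news, forums, social media).
- **Open-ended exploration** where source precision is not critical.
- No single reliable database can meet the need.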

---

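### When a Specific Source Is a Better Fit
- **Specialized domains**: for example Wikipedia or a Fandom wiki (games, novels, Warhammer 40k).
- **Structured data needs**: the source has regular URLs and content organization, which makes programmatic retrieval easy.
- **High-reliability needs**: there is less noise to process.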
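
Regular URL patterns are what make these sources easy to query from code. As a minimal sketch, the two helpers below rebuild the Wikipedia API endpoint and the Fandom page URL used later in this chapter. The function names are hypothetical, and the patterns are inferred from those two URLs rather than taken from official documentation.

```python
from urllib.parse import quote


def wikipedia_api_url(lang: str) -> str:
    """API endpoint for a given Wikipedia language edition, e.g. 'ja' or 'en'."""
    return f"https://{lang}.wikipedia.org/w/api.php"


def fandom_page_url(wiki: str, title: str) -> str:
    """Page URL on a Fandom wiki; spaces in titles become underscores."""
    return f"https://{wiki}.fandom.com/wiki/{quote(title.replace(' ', '_'))}"


print(wikipedia_api_url("ja"))
print(fandom_page_url("warhammer40k", "Titan"))
```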

---

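### Notes on Using APIs
- **Safety-review blocking**: public data may be unavailable because it involves "disallowed or sensitive content".
- **Licensing and limits**: paywalls, rate limits, privacy rules, and so on.
- **Fallback role**: Websearch can serve as a supplement, but it is not a cure-all.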
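
Rate limits are usually handled by retrying with a growing delay. The sketch below shows one generic way to do that with exponential backoff. `retry_with_backoff` and `flaky_fetch` are hypothetical names, and the `sleep` parameter is injected so the example runs without actually waiting. In real code you would likely catch only the specific error that signals a rate limit instead of every exception.

```python
import time


def retry_with_backoff(func, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(); after each failure wait base_delay * 2**attempt, then retry."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            # Give up and re-raise once the last attempt has failed.
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


calls = {"count": 0}


def flaky_fetch():
    """Stand-in for an API call that fails twice before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("rate limited")
    return "ok"


delays = []
result = retry_with_backoff(flaky_fetch, sleep=delays.append)
print(result, delays)
```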

---

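### Summary
Websearch provides **broad and diverse information**, but it also brings **fragmentation and quality problems**.  
If the need is well defined and a structured, trustworthy source (such as Wikipedia or Fandom) is available, prefer that source.  
Websearch delivers the most value when you need multiple perspectives, open-ended exploration, or have no specific database to rely on.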


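## Back to Basics

- When you are certain that a particular source has the information, get that source's URL
- Use the techniques from Week 1 and Week 3 to scrape the content at that URL
- Use an LLM to extract the information you need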

In [None]:
from textwrap import dedent
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate, ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate
from langchain_openai import ChatOpenAI

from src.initialization import credential_init


def build_standard_chat_prompt_template(kwargs):

    messages = []
 
    if 'system' in kwargs:
        content = kwargs.get('system')
        prompt = PromptTemplate(**content)
        message = SystemMessagePromptTemplate(prompt=prompt)
        messages.append(message)  

    if 'human' in kwargs:
        content = kwargs.get('human')
        prompt = PromptTemplate(**content)
        message = HumanMessagePromptTemplate(prompt=prompt)
        messages.append(message)
        
    chat_prompt_template = ChatPromptTemplate.from_messages(messages)
    
    return chat_prompt_template

credential_init()

### Extracting Wikipedia Content

In [None]:
import requests

session = requests.Session()

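query = "獵人中的念能力系統"  # "The Nen ability system in Hunter x Hunter"

# Choose the Wikipedia language edition (Japanese here)
URL = "https://ja.wikipedia.org/w/api.php"


PARAMS = {
    "action": "parse",
    # Title of the Wikipedia page to parse
    "page": "HUNTER×HUNTER",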
    "prop": "text",
    "format": "json"
}

HEADERS = {
    "User-Agent": "AI Tutorial Bot/1.0 (mengchiehling@gmail.com)"
}

response = session.get(url=URL, params=PARAMS, headers=HEADERS)

data = response.json()

Process the data with bs4

In [None]:
from bs4 import BeautifulSoup

html_content = data['parse']['text']["*"]

soup = BeautifulSoup(html_content, "html.parser")

# Remove <style> and <script> tags
for tag in soup(["style", "script"]):
    tag.decompose()

# Extract the text
text_content = soup.get_text(separator="\n")

# Strip whitespace and drop empty lines
cleaned_text = "\n".join(
    line.strip() for line in text_content.splitlines() if line.strip()
)
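
To see what this cleaning step produces, here is a tiny self-contained sketch run on a made-up HTML snippet. It deliberately swaps bs4 for the standard library's `html.parser` so it needs no extra install, so it is an illustration of the same idea, not the notebook's method. `TextExtractor`, `html_to_clean_text`, and the sample HTML are all hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <style> and <script>."""

    SKIPPED = {"style", "script"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that is not inside a skipped tag.
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_clean_text(html: str) -> str:
    """Extract text, strip each line, and drop empty lines."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "\n".join(parser.parts)
    return "\n".join(line.strip() for line in text.splitlines() if line.strip())


sample = """
<div>
  <style>p { color: red; }</style>
  <h2>Nen</h2>
  <p>  Enhancement  </p>
  <script>console.log("x");</script>
  <p>Emission</p>
</div>
"""
print(html_to_clean_text(sample))
```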

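Writing Pydantic objects by hand can get the job done, but in a real application we want more automation: generating the object automatically from the user's request.

We try combining this with code generation to help reach that goal.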

In [None]:
code_example = dedent("""
    # Example 1: Name extraction using Pydantic and LangChain

    from pydantic import BaseModel, Field
    from langchain_core.output_parsers import PydanticOutputParser

    class NameExtractor(BaseModel):
        name: str = Field(description="The extracted name from the input text")

    output_parser = PydanticOutputParser(pydantic_object=NameExtractor)
    format_instructions = output_parser.get_format_instructions()

    ---
    # Example 2: Multiple product information extraction using Pydantic and Langchain

    from typing import List
    
    class Product(BaseModel):
        name: str = Field(description="Product")
        brand: str = Field(description="The brand name")
        country_code: str = Field(description="ISO 3166-1 alpha-2 of the country of the brand")

    class ProductOutput(BaseModel):
        products: List[Product] = Field(description="A list of products")

    output_parser = PydanticOutputParser(pydantic_object=ProductOutput)
    format_instructions = output_parser.get_format_instructions()
    
""")


system_template = dedent(f"""
    You are an expert Python developer specializing in large language models and the LangChain framework.
    Your objective is to generate **only valid, executable Python code** that solves the user's request.

    Requirements:
    - Use Pydantic models when defining structured outputs.
    - Ensure imports are correct and minimal.
    - Follow PEP 8 formatting standards.
    - Do not include any explanations, markdown, comments, or extra text outside the code block.
    - You must define the output_parser and the format_instructions of the Pydantic models.

    Example structure:
    {code_example}
""")

human_template = dedent("""
                        {query}
                        Code:
                        """)


input_ = {"system": {"template": system_template},
          "human": {"template": human_template,
                    "input_variables": ["query"]}}

chat_prompt_template = build_standard_chat_prompt_template(input_)

model = ChatOpenAI(model="gpt-4o-mini")

code_generation = chat_prompt_template|model|StrOutputParser()

In [None]:
generated_code = code_generation.invoke({"query": query})
print(generated_code)

### Code Execution Tool

In [None]:
import re

from langchain_core.runnables import chain

@chain
def code_execution(code):
    
    match = re.findall(r"python\n(.*?)\n```", code, re.DOTALL)
    python_code = match[0]
    
    lines = python_code.strip()

    local_vars = {}
    exec(lines, local_vars)

    return local_vars
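
The `code_execution` step above assumes the model always wraps its answer in a fenced python block and always defines the two names the later cells read (`output_parser` and `format_instructions`). A minimal, more defensive sketch is shown below. `extract_python_block`, `run_generated_code`, and `REQUIRED_NAMES` are hypothetical names, and falling back to the raw text when no fence is found is an assumption about how you may want to handle that case. As in the original, `exec` runs arbitrary code, so this is only appropriate for output you are willing to trust.

```python
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable
CODE_BLOCK = re.compile(FENCE + r"(?:python)?\s*\n(.*?)" + FENCE, re.DOTALL)
REQUIRED_NAMES = ("output_parser", "format_instructions")


def extract_python_block(text: str) -> str:
    """Return the first fenced block's body, or the whole text if unfenced."""
    match = CODE_BLOCK.search(text)
    code = (match.group(1) if match else text).strip()
    if not code:
        raise ValueError("No code found in the model output")
    return code


def run_generated_code(text: str) -> dict:
    """Execute the generated code and check that the expected names exist."""
    namespace = {}
    exec(extract_python_block(text), namespace)  # runs untrusted code as-is
    missing = [name for name in REQUIRED_NAMES if name not in namespace]
    if missing:
        raise KeyError(f"Generated code did not define: {missing}")
    return namespace


# Made-up model output, used only to exercise the helpers.
fake_output = (
    f"{FENCE}python\n"
    "output_parser = 'parser'\n"
    "format_instructions = 'instructions'\n"
    f"{FENCE}"
)
namespace = run_generated_code(fake_output)
print(namespace["output_parser"], namespace["format_instructions"])
```

Failing early with a message that names the missing variable is easier to debug than the `IndexError` or `KeyError` that would otherwise surface in a later cell.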

Run the generated code

In [None]:
local_vars = code_execution.invoke(generated_code)

Build the required pipeline

In [None]:
def build_answer_pipeline(output_parser, format_instructions):

    human_template = dedent("""{query}
                           
                           context:
                           {context}
                           output format instruction = {format_instruction} 
                        """)


    input_ = {"system": {"template": "You are a helpful AI assistant."},
              "human": {"template": human_template,
                        "input_variables": ["query", "context"],
                        "partial_variables": {'format_instruction': format_instructions}
                        }}
    
    chat_prompt_template = build_standard_chat_prompt_template(input_)
    
    model = ChatOpenAI(model="gpt-4o-mini")
    
    answer_pipeline = chat_prompt_template|model|output_parser

    return answer_pipeline

Run it

In [None]:
answer_pipeline = build_answer_pipeline(output_parser=local_vars['output_parser'], format_instructions=local_vars['format_instructions'])

output = answer_pipeline.invoke({"query": query, "context": cleaned_text})

### Fan Wiki

Avatars of the Omnissiah, the Machine God

https://warhammer40k.fandom.com/wiki/Titan

In [None]:
import requests

from bs4 import BeautifulSoup


def parsing_process(url):
    """
    Fetches and extracts text content from a given URL.

    Parameters:
    url (str): The URL of the web page to fetch and parse.

    Returns:
    str: Cleaned text content extracted from the web page.

    Raises:
    requests.exceptions.RequestException: If an error occurs while fetching the URL.

    Notes:
    - This function sends a GET request to the specified URL.
    - It uses BeautifulSoup to parse the HTML content of the response.
    - Any <style> and <script> tags in the HTML are removed to extract only textual content.
    - The extracted text is cleaned by removing extra whitespace and empty lines.
    """
    # Send a GET request to the URL
    response = requests.get(url)

    # Get the content of the response
    html_content = response.text
    
    soup = BeautifulSoup(html_content, 'html.parser')
    
    # Remove <style> and <script> tags
    for tag in soup(["style", "script"]):
        tag.decompose()

    # Extract only the text content
    text_content = soup.get_text(separator='\n')

    # Clean up the text (optional)
    cleaned_text = '\n'.join(line.strip() for line in text_content.splitlines() if line.strip())
    
    return cleaned_text


url = "https://warhammer40k.fandom.com/wiki/Titan"

Extract the page content

In [None]:
cleaned_text = parsing_process(url)

#### A Basic Question

In [None]:
query = "幫我找出所有忠誠派泰坦級別"  # "Find all Loyalist Titan classes"

In [None]:
generated_code = code_generation.invoke({"query": query})

local_vars = code_execution.invoke(generated_code)

Build a new pipeline and run it

In [None]:
answer_pipeline = build_answer_pipeline(output_parser=local_vars['output_parser'], format_instructions=local_vars['format_instructions'])

output = answer_pipeline.invoke({"query": query, "context": cleaned_text})

#### Try a More Challenging Question

In [None]:
query = "幫我找出所有陣營的所有泰坦級別"  # "Find all Titan classes of every faction"

generated_code = code_generation.invoke({"query": query})

In [None]:
local_vars = code_execution.invoke(generated_code)

answer_pipeline = build_answer_pipeline(output_parser=local_vars['output_parser'], format_instructions=local_vars['format_instructions'])

In [None]:
output = answer_pipeline.invoke({"query": query, "context": cleaned_text})

## Adding a Callback to Trace the ChatBot