# Image Generation

OpenAI provides both text-to-image and image-to-image APIs.

## GPT-Image-1

GPT-Image-1 is the latest image generation model released by OpenAI, so this section works with it throughout.

Update the `openai` package to 1.97.0 or later so that the built-in documentation and the parameters used below are available.

Advantages:
- Stable output (the random seed appears to be fixed)
- High image quality with fine detail

English prompts are recommended.

### OpenAI Image API Parameters:

https://platform.openai.com/docs/guides/image-generation?image-generation-model=gpt-image-1
https://cookbook.openai.com/examples/generate_images_with_gpt_image

- model: gpt-image-1
    - size (str): 1024x1024 (square), 1536x1024 (landscape), 1024x1536 (portrait) or auto (default)
    - quality: low, medium, high or auto
    - moderation: auto, low
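The parameter values listed above can be checked before a request is sent. The following is a minimal sketch, assuming a hypothetical helper named `build_image_kwargs` (not part of the `openai` package) that validates the options and returns keyword arguments suitable for `client.images.generate`.

```python
# Allowed values as listed above for gpt-image-1
ALLOWED = {
    "size": {"1024x1024", "1536x1024", "1024x1536", "auto"},
    "quality": {"low", "medium", "high", "auto"},
    "moderation": {"auto", "low"},
}


def build_image_kwargs(prompt, size="auto", quality="auto", moderation="auto"):
    """Hypothetical helper: validate the options and build the request kwargs."""
    options = {"size": size, "quality": quality, "moderation": moderation}
    for name, value in options.items():
        if value not in ALLOWED[name]:
            raise ValueError(f"{name}={value!r} is not one of {sorted(ALLOWED[name])}")
    return {"model": "gpt-image-1", "prompt": prompt, **options}


kwargs = build_image_kwargs("a red fox in the snow", size="1024x1536", quality="high")
print(kwargs)
```

A value such as `quality="hd"` is rejected by this check, which catches the mistake locally instead of waiting for an API error.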

In [None]:
import os

os.chdir("../../../")

Example

In [None]:
from openai import OpenAI

from src.initialization import credential_init

credential_init()

client = OpenAI()

In [None]:
import base64

from IPython.display import display, HTML

prompt = ("A Sumi-e style watercolor painting of mountains during sunset. The sky is depicted with bold "
          "splashes of orange, pink, and purple hues, blending and overlapping in a dynamic composition. "
          "The mountains are represented with expressive brushstrokes, emphasizing their majestic and serene "
          "presence. The focus is on capturing the essence and mood of the scene rather than detailed realism. "
          "The overall effect is serene and contemplative, with a harmonious balance of color and form.")

response = client.images.generate(
    model="gpt-image-1",
    prompt=prompt,
    size="1024x1024",
    # quality="hd",
    quality='high',
    n=1,
    # response_format = 'b64_json'
)

image_base64 = response.data[0].b64_json

# Render the returned base64 string inline as an image
HTML(f'<img src="data:image/png;base64,{image_base64}"/>')
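The cell above only displays the image. To keep it on disk, the base64 string can be decoded and written to a file. This is a minimal sketch using only the standard library; the helper name `save_base64_image`, the stand-in payload, and the output path are assumptions made for illustration.

```python
import base64
import tempfile
from pathlib import Path


def save_base64_image(image_base64: str, path: Path) -> int:
    """Decode a base64 string, write the bytes to `path`, and return the byte count."""
    data = base64.b64decode(image_base64)
    path.write_bytes(data)
    return len(data)


# Stand-in payload for illustration only; in the notebook this would be
# response.data[0].b64_json
fake_png_bytes = b"\x89PNG\r\n\x1a\n(not a real image)"
sample_b64 = base64.b64encode(fake_png_bytes).decode("ascii")

out_path = Path(tempfile.gettempdir()) / "sumi_e_sample.png"
written = save_base64_image(sample_b64, out_path)
print(f"wrote {written} bytes to {out_path}")
```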

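## Challenges

### 1. How to write effective text-to-image prompts

When generating images with AI (for example, OpenAI's GPT-Image-1), how the prompt is written largely determines the result. There are two main prompt formats:

1. Tag-based prompts (Danbooru tags):

    - Example: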

        masterpiece, best quality, beautiful eyes, clear eyes, detailed eyes, Blue-eyes, 1girl, 20_old, full-body, break, smoking, break, high_color, blue-hair, beauty, black-boots,break, break, Flat vector art, Colorful art, white_shirt, simple_background, blue_background, Ink art, peeking out upper body, Eyes


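    - Characteristics and caveats:

        - Whether a tag takes effect depends on the model; different models may interpret the same tag differently.
        - Some tags are widely recognized, such as 1girl and ulzzang, but the rendered result can vary a lot.
        - Some tags require domain knowledge, such as chiaroscuro (the light-dark contrast technique).
        - Finding the best combination takes repeated trial and fine-tuning.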

2. Natural language prompts:

    - Example:

       A Japanese idol with a breathtakingly glamorous ulzzang appearance,  She has a slim, v-shaped face with large, almond-shaped eyes that sparkle with a lustrous, captivating charm, exuding an aura of youth and ethereal beauty. Her expression is innocent yet alluring, with flawless porcelain skin that enhances her delicate, anime-inspired features. The setting is carefully crafted to complement her enchantment, with soft, diffused lighting that accentuates her mesmerizing, glamorous presence, creating a dreamy and youthful, anime-like allure.


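    - Characteristics and caveats:

        - Fluent, well-written sentences can improve the quality of the generated image.

        - For non-native speakers, combining many descriptive words is a major challenge.

        - Some words may be blocked by strictly moderated models, for example serafuku.

        - Models such as GPT-Image-1 may block overtly NSFW prompts. For NSFW content, consider open-source communities such as TensorArt.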

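### 2. How to integrate this into LCEL?

### Let a language model improve the prompt

#### Step 1

Supply a description and let a text model write the image prompt. Consider using MLflow to monitor the prompts it produces.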

In [None]:
from langchain_core.prompts.image import ImagePromptTemplate
from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate, SystemMessagePromptTemplate
from langchain_core.output_parsers import StrOutputParser


def build_standard_chat_prompt_template(kwargs):
    messages = []

    if 'system' in kwargs:
        content = kwargs.get('system')

        # allow list of prompts for multimodal
        if isinstance(content, list):
            prompts = [PromptTemplate(**c) for c in content]
        else:
            prompts = [PromptTemplate(**content)]

        message = SystemMessagePromptTemplate(prompt=prompts)
        messages.append(message)

    if 'human' in kwargs:
        content = kwargs.get('human')

        # allow list of prompts for multimodal
        if isinstance(content, list):
            prompts = []
            for c in content:
                if c.get("type") == "image":
                    prompts.append(ImagePromptTemplate(**c))
                else:
                    prompts.append(PromptTemplate(**c))
        else:
            if content.get("type") == "image":
                prompts = [ImagePromptTemplate(**content)]
            else:
                prompts = [PromptTemplate(**content)]

        message = HumanMessagePromptTemplate(prompt=prompts)
        messages.append(message)

    chat_prompt_template = ChatPromptTemplate.from_messages(messages)
    
    return chat_prompt_template



system_template = ("You are a helpful AI assistant and an art expert with extensive knowledge of photography "
                   "and illustration. You excel at creating breathtaking masterpieces with the DALLE-3 model. "
                   "For this task, you will be provided with a description of an image, and you will generate a "
                   "corresponding DALLE-3 prompt. The prompt should be detailed and descriptive, capturing the "
                   "essence of the image.")

human_template = "{image_desc}"

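input_ = {"system": {"template": system_template},
          "human": {"template": human_template,
                    "input_variables": ["image_desc"]}}

chat_prompt_template = build_standard_chat_prompt_template(input_)

# The chain needs a chat model; gpt-4o is an assumption here
# (it is the model used for the agents later in this notebook)
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o", temperature=0)

nl_prompt_generation_chain = chat_prompt_template | model | StrOutputParser()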

#### Step 2

Pass the generated prompt to the image generation API.

In [None]:
from operator import itemgetter
from typing import Dict

from langchain_core.runnables import chain, RunnableLambda, RunnableParallel, RunnablePassthrough


@chain
def gpt_image_worker(kwargs: Dict):

    """
    Generates an image using OpenAI's GPT-Image-1 model based on the provided prompt and optional parameters.
    
    Parameters:
    kwargs (Dict): A dictionary containing the following keys:
        - 'nl_prompt' (str): The natural language prompt describing the image to be generated.
        - 'size' (str, optional): The size of the generated image. Default is "1024x1024".
        - 'quality' (str, optional): The quality of the generated image. Default is "medium".
    
    Returns:
    str: image base64 string
    """
    
    print("Start generating image...")
    print(f"prompt: {kwargs['nl_prompt']}")
    client = OpenAI()

    response = client.images.generate(
        model="gpt-image-1",
        prompt=kwargs['nl_prompt'],
        size=kwargs.get("size", "1024x1024"),
        quality=kwargs.get('quality', 'medium'),
        moderation=kwargs.get('moderation', 'auto'),
        n=1)

    image_base64 = response.data[0].b64_json
    
    return image_base64


@chain
def base64_to_file(kwargs):

    """
    Save the image from a base64 string
    """
    
    image_base64 = kwargs['image_base64']
    filename = kwargs['filename']
    
    with open(f"{filename}", "wb") as fh:
        fh.write(base64.b64decode(image_base64))

In [None]:
# Step 1: turn the image description into an image prompt
step_1 = RunnablePassthrough.assign(nl_prompt=itemgetter('image_desc')|nl_prompt_generation_chain)

# Step 2: generate the image and store the base64 string under image_base64
step_2 = RunnablePassthrough.assign(image_base64=gpt_image_worker)

# Step 3: save the base64 string as an image file
step_3 = base64_to_file

# Connect the three steps with the pipe operator (|)
gpt_image_chain = step_1 | step_2 | step_3

In [None]:
from textwrap import dedent

gpt_image_chain.invoke({"size": "1024x1536",
                     "quality": "medium",
                     "image_desc": dedent("""warhammer 40k, astartes, power armor, chain sword, purity seal, 
                     oil painting, cinematic view, battle field, black templars, sacred light upon the """),
                     "filename": "tutorial/LLM+Langchain/Week-8/astartes.png"
                    })

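# Image Rendering (Image-to-Image)

"Image rendering" (image-to-image, or Img2Img for short) means generating a new image, with a changed style or content, from an existing image combined with a new prompt.

## ✨ Characteristics

1. Input and output

    - Input: an existing image plus a prompt

    - Output: the image reworked according to the prompt

2. Flexibility

    - The structure of the original image (for example, a character's pose) can be kept while only details change (hair color, clothing, scene).

    - Style transfer is also possible, turning a photo into an oil painting, comic, or illustration.

3. Common applications

    - Retouching: removing the background, adjusting facial details, changing clothes.

    - Stylization: converting real-world photos into anime or illustration style.

    - Iterative design: quickly trying different character outfits, hairstyles, or environments.

    - Local editing (inpainting): replacing or repairing only a designated region of the image.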

In [None]:
import os

os.chdir("../../../")

In [None]:
from pathlib import Path
from IPython.display import display, HTML

# Build HTML string
html = '<div style="display: flex; flex-direction: column;">'

html += '<div style="display: flex; justify-content: space-around; margin-bottom: 10px;">'
html += f'''
    <div>
        <img src="Eve_Stellar_Blade.png" style="width: 300px; height: auto; border-radius: 8px; box-shadow: 2px 2px 6px rgba(0,0,0,0.2);" />
    </div>
'''
html += '</div>'

html += '</div>'

# Display the HTML
display(HTML(html))

In [None]:
from textwrap import dedent

prompt = dedent("""
Please render this image as a realistic photo of a girl cosplaying. A Korean girl with a
slim, v-shaped face with large, almond-shaped eyes that sparkle with captivating charm, exuding 
an aura of youth and ethereal beauty. With flawless skin that enhances her delicate, 
anime-inspired features. The setting is carefully crafted to complement her enchantment, with 
soft, diffused lighting that accentuates her mesmerizing, glamorous presence, creating a dreamy 
and youthful, anime-like allure. Her makeup should resemble the features of K-beauty, such as pale skin tones 
and dewed skin texture. 
""")


image_path = os.path.join("tutorial", "LLM+Langchain", "Week-7", "Eve_Stellar_Blade.png")

result_edit = client.images.edit(
    model="gpt-image-1",
    image=open(image_path, "rb"), 
    prompt=prompt,
    size="1024x1536",
    input_fidelity="high", # This option requires openai 1.97.0 or later
    #quality="high"
)

image_base64 = result_edit.data[0].b64_json

In [None]:
client.images.edit?

In [None]:
HTML(f'<img src="data:image/png;base64,{image_base64}" />')

You can use one or more images as a reference to generate a new image.

In this example, four reference images of the same character are available; the call below passes one of them (the other three are commented out) to generate a perfume advertisement.

- Noshiro (能代, Azur Lane)

In [None]:
html = '<div style="display: flex; flex-direction: column;">'

html += '<div style="display: flex; justify-content: space-around; margin-bottom: 10px;">'
html += f'''
    <div>
        <div>
            <img src="Noshiro - Spring.png" style="width: 300px; height: auto; border-radius: 8px; box-shadow: 2px 2px 6px rgba(0,0,0,0.2);" />
            <img src="Noshiro - Summer.png" style="width: 300px; height: auto; border-radius: 8px; box-shadow: 2px 2px 6px rgba(0,0,0,0.2);" />
        </div>
        <div>
            <img src="Noshiro - Fall.png" style="width: 300px; height: auto; border-radius: 8px; box-shadow: 2px 2px 6px rgba(0,0,0,0.2);" />
            <img src="Noshiro - Winter.png" style="width: 300px; height: auto; border-radius: 8px; box-shadow: 2px 2px 6px rgba(0,0,0,0.2);" />
        </div>
    </div>
'''
html += '</div>'

html += '</div>'

# Display the HTML
display(HTML(html))

In [None]:
image_1 = os.path.join("tutorial", "LLM+Langchain", "Week-7", "Noshiro - Spring.png")
image_2 = os.path.join("tutorial", "LLM+Langchain", "Week-7", "Noshiro - Summer.png")
image_3 = os.path.join("tutorial", "LLM+Langchain", "Week-7", "Noshiro - Fall.png")
image_4 = os.path.join("tutorial", "LLM+Langchain", "Week-7", "Noshiro - Winter.png")

result_edit = client.images.edit(
    model="gpt-image-1",
    image=[
        # open(image_1, "rb"),
        open(image_2, "rb"),
        # open(image_3, "rb"),
        # open(image_4, "rb"),
    ],
    prompt=dedent("""
    Create an advertisement of a high end perfume based on the reference image. 
    The advertisement should deliver a mesmerizingly glamorous texture. 
    """),
    size="1024x1536",
    input_fidelity="high", # This option requires openai 1.97.0 or later
    quality="high"
)

image_base64 = result_edit.data[0].b64_json

In [None]:
HTML(f'<img src="data:image/png;base64,{image_base64}"/>')

In [None]:
with open(os.path.join("tutorial", "LLM+Langchain", "Week-7", "Noshiro - Summer - Advertisement.png"), "wb") as fh:
    fh.write(base64.b64decode(image_base64))

### Blending content from different images

Image source: https://tensor.art/u/629260971684229814

In [None]:
html = '<div style="display: flex; flex-direction: column;">'

html += '<div style="display: flex; justify-content: space-around; margin-bottom: 10px;">'
html += f'''
    <div>
        <div>
            <img src="maehara-1.jpg" style="width: 300px; height: auto; border-radius: 8px; box-shadow: 2px 2px 6px rgba(0,0,0,0.2);" />
            <img src="maehara-2.jpg" style="width: 300px; height: auto; border-radius: 8px; box-shadow: 2px 2px 6px rgba(0,0,0,0.2);" />
        </div>
    </div>
'''
html += '</div>'

html += '</div>'

# Display the HTML
display(HTML(html))

In [None]:
image_a = os.path.join("tutorial", "LLM+Langchain", "Week-7", "maehara-1.jpg")
image_b = os.path.join("tutorial", "LLM+Langchain", "Week-7", "maehara-2.jpg")

In [None]:
result_edit = client.images.edit(
    model="gpt-image-1",
    image=[
        open(image_a, "rb"),
        open(image_b, "rb"),
    ],
    prompt=dedent("""
    Fuse the two images to create a high definition 8k movie poster with the text as the background. 
    """),
    size="1024x1536",
    input_fidelity="high", # This option requires openai 1.97.0 or later
    quality="high"
)

image_base64 = result_edit.data[0].b64_json

In [None]:
HTML(f'<img src="data:image/png;base64,{image_base64}"/>')

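### Inpainting

You can provide a mask to specify which region of the image should be edited.

When a mask is used with GPT Image, additional instructions are sent to the model along with it to better guide the editing process.

#### Mask requirements

The image to edit and the mask must have the same format and dimensions, and each file must be smaller than 50MB.

The mask image must contain an alpha channel. If you create the mask with an image editing tool, make sure the alpha channel is preserved when saving.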
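The mask requirements can be checked locally before calling the API. The sketch below uses only the standard library and reads the PNG IHDR header directly. The helper names (`read_png_header`, `check_mask`, `fake_png_header`) are hypothetical, and the check assumes both files are PNGs. It treats only color types 4 and 6 as carrying an alpha channel, so palette images with a transparency chunk are not covered.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MAX_BYTES = 50 * 1024 * 1024  # size limit stated above


def read_png_header(data: bytes):
    """Return (width, height, has_alpha) read from the IHDR chunk of a PNG."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height, _bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    # Color types 4 (grayscale + alpha) and 6 (RGBA) carry an alpha channel
    return width, height, color_type in (4, 6)


def check_mask(image_bytes: bytes, mask_bytes: bytes) -> None:
    """Raise ValueError if the image/mask pair violates the requirements."""
    iw, ih, _ = read_png_header(image_bytes)
    mw, mh, mask_has_alpha = read_png_header(mask_bytes)
    if (iw, ih) != (mw, mh):
        raise ValueError(f"size mismatch: image {iw}x{ih}, mask {mw}x{mh}")
    if not mask_has_alpha:
        raise ValueError("mask has no alpha channel")
    if max(len(image_bytes), len(mask_bytes)) >= MAX_BYTES:
        raise ValueError("file is 50MB or larger")


def fake_png_header(width: int, height: int, color_type: int) -> bytes:
    """Build just the signature and IHDR fields, for demonstration only."""
    ihdr = struct.pack(">IIBBBBB", width, height, 8, color_type, 0, 0, 0)
    return PNG_SIGNATURE + struct.pack(">I", len(ihdr)) + b"IHDR" + ihdr


image = fake_png_header(1024, 1536, 2)   # RGB, no alpha
mask = fake_png_header(1024, 1536, 6)    # RGBA
check_mask(image, mask)                  # passes silently
```

In the notebook, the bytes would come from reading the image and mask files in binary mode.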


#### Bug Report

https://community.openai.com/t/gpt-image-1-problems-with-mask-edits/1240639/15

GPT-Image-1 appears to perform poorly at inpainting.

In [None]:
html = '<div style="display: flex; flex-direction: column;">'

html += '<div style="display: flex; justify-content: space-around; margin-bottom: 10px;">'
html += f'''
    <div>
        <div>
            <img src="Noshiro - Winter - Mask.png" style="width: 300px; height: auto; border-radius: 8px; box-shadow: 2px 2px 6px rgba(0,0,0,0.2);" />
        </div>
    </div>
'''
html += '</div>'

html += '</div>'

# Display the HTML
display(HTML(html))

In [None]:
image_in = os.path.join("tutorial", "LLM+Langchain", "Week-7", "Noshiro - Winter.png")
image_mask = os.path.join("tutorial", "LLM+Langchain", "Week-7", "Noshiro - Winter - Mask.png")

result_edit = client.images.edit(
    model="gpt-image-1",
    image=open(image_in, "rb"),
    mask=open(image_mask, "rb"),
    prompt=dedent("""
    In the winter, a girl walking on water and holding Mjölnir. Mjölnir is surrounded with electricity and current. 
    """),
    size="1024x1536",
    input_fidelity="high", # This option requires openai 1.97.0 or later
    quality="high"
)

image_base64 = result_edit.data[0].b64_json

In [None]:
HTML(f'<img src="data:image/png;base64,{image_base64}"/>')

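# Agent

An agentic system is an AI that can act on its own: it understands input, reasons, plans, and carries out tasks to reach a goal.
Unlike a plain model that only responds to a single prompt, an agentic system can generate its own prompts, use external tools, remember the conversation, and adjust its behavior through planning and reflection.
This lets it solve problems with more automation and combine different capabilities to help users get things done more efficiently.

## ReAct Framework

- ReAct: Reasoning + Acting

- A ReAct agent roughly works as follows:

    1. Reasoning: generate internal reasoning or a plan from the current context.

    2. Acting: decide which action to take based on that reasoning (for example, querying a tool, calling an API, or retrieving knowledge).

    3. Observation: receive feedback from the tool or the environment.

    4. Iteration: feed the observation back in and start the next round of reasoning.

    The loop continues until:

    a. a final answer is reached, or

    b. a configured stop condition is hit (for example, a token limit, a step limit, or an explicit "finish" signal).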
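The loop described above can be sketched in plain Python. This is a simplified illustration, not LangChain's implementation: `react_loop` and `scripted_policy` are hypothetical names, and the scripted policy stands in for the LLM so the example runs without an API key.

```python
def react_loop(question, policy, tools, max_steps=5):
    """Run a reason/act/observe loop.

    `policy(scratchpad)` returns ("final", answer) or ("act", tool_name, tool_input).
    """
    scratchpad = [f"Question: {question}"]
    for _ in range(max_steps):
        decision = policy(scratchpad)                  # 1. reasoning
        if decision[0] == "final":
            return decision[1]                         # stop: final answer
        _, tool_name, tool_input = decision            # 2. acting
        if tool_name in tools:
            observation = tools[tool_name](tool_input)  # 3. observation
        else:
            observation = f"{tool_name} is not a valid tool"
        # 4. iteration: the observation is fed back into the next round
        scratchpad.append(
            f"Action: {tool_name}({tool_input}) -> Observation: {observation}"
        )
    return "Stopped: step limit reached"               # stop: step limit


def scripted_policy(scratchpad):
    """Stand-in for the LLM: call the tool once, then report what it returned."""
    if len(scratchpad) == 1:
        return ("act", "square", 12)
    return ("final", scratchpad[-1].split("Observation: ")[1])


tools = {"square": lambda x: x * x}
result = react_loop("What is 12 squared?", scripted_policy, tools)
print(result)
```

Both stop conditions appear in the sketch: the policy returning a final answer, and the `max_steps` limit.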

In [None]:
from IPython.display import Image

Image(url='https://statusneo.com/wp-content/uploads/2024/01/fe9fa1ac-dfde-4d91-8b5b-4497b742c414_1400x686.jpg')

### ReAct Template

Answer the following questions as best you can. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer

Thought: you should always think about what to do

Action: the action to take, should be one of [{tool_names}]

Action Input: the input to the action

Observation: the result of the action

... (this Thought/Action/Action Input/Observation can repeat N times)

Thought: I now know the final answer

Final Answer: the final answer to the original input question

Begin!

Question: {input}

Thought:{agent_scratchpad}
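Because the template fixes the output format, an agent runtime can parse each model reply with a few regular expressions. The sketch below is a simplified stand-in, not LangChain's actual output parser, and `parse_react_output` is a hypothetical name.

```python
import re


def parse_react_output(text: str):
    """Return ("final", answer) or ("action", tool, tool_input) for one LLM reply."""
    final = re.search(r"Final Answer:\s*(.*)", text, re.DOTALL)
    if final:
        return ("final", final.group(1).strip())
    action = re.search(r"Action:\s*(.*?)\n+Action Input:\s*(.*)", text, re.DOTALL)
    if action:
        return ("action", action.group(1).strip(), action.group(2).strip())
    raise ValueError("reply does not follow the ReAct format")


step = parse_react_output(
    "Thought: I need a calculator\n"
    "Action: Math calculator\n"
    "Action Input: area of a circle with radius 10.923"
)
done = parse_react_output("Thought: I now know the final answer\nFinal Answer: 42")
print(step)
print(done)
```

The `Observation:` line is never produced by the model; the runtime appends it after running the tool and then sends the growing scratchpad back as `{agent_scratchpad}`.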

In [1]:
import os

os.chdir("../../../")

### Set up MLflow monitoring

Start the MLflow tracking server in a terminal first:

mlflow server --host 127.0.0.1 --port 8080

In [2]:
import mlflow
from langchain_community.callbacks import MlflowCallbackHandler
from langchain_openai import ChatOpenAI

from src.initialization import credential_init

credential_init()

experiment = "Week-7"
uri = "http://127.0.0.1:8080"

mlflow.set_tracking_uri(uri=uri)

# Start or get an MLflow run explicitly
mlflow.set_experiment(experiment)


<Experiment: artifact_location='mlflow-artifacts:/211253271335259748', creation_time=1761307703813, experiment_id='211253271335259748', last_update_time=1761307703813, lifecycle_stage='active', name='Week-7', tags={'mlflow.experimentKind': 'custom_model_development'}>

In [3]:
class CleanMlflowCallbackHandler(MlflowCallbackHandler):

    def __init__(self, experiment, run_id, tracking_uri, name="CleanMLflow"):
        super().__init__(experiment=experiment, run_id=run_id, tracking_uri=tracking_uri, name=name)
    
    def on_llm_new_token(self, token: str, **kwargs):
        # Suppress per-token logging to MLflow
        # Without this override:
        # 1. the artifacts folder fills up with thousands of tiny JSON files;
        # 2. worse, the per-token I/O slows execution down considerably.
        return None

In [4]:
run = mlflow.start_run(run_name="react-agent")

# then use this handler instead of the default one
mlflow_cb_model = CleanMlflowCallbackHandler(
    experiment=experiment,
    run_id=run.info.run_id,
    tracking_uri=uri,
)


model = ChatOpenAI(openai_api_key=os.environ['OPENAI_API_KEY'],
                   model_name="gpt-4o", temperature=0, 
                   callbacks=[mlflow_cb_model],
                   name="my_model"
                  )

Could not import spacy python package. Please install it with `pip install spacy`.
Could not import textstat python package. Please install it with `pip install textstat`.


In [5]:
from langchain.prompts import PromptTemplate
from langchain.agents import AgentExecutor, create_react_agent

from src.agent.react_zero_shot import prompt_template as zero_shot_prompt_template

prompt = PromptTemplate(template=zero_shot_prompt_template)

zero_shot_agent = create_react_agent(
    llm=model, ## the LLM is the agent's reasoning core and largely determines overall performance; the stronger the better
    tools=[],
    prompt=prompt,
)

class DebugMlflowCallbackHandler(MlflowCallbackHandler):
    
    def __init__(self, experiment, run_id, tracking_uri, name="CleanMLflow"):
        super().__init__(experiment=experiment, run_id=run_id, tracking_uri=tracking_uri)
        self.name = name
    
    def on_chain_error(self, error, **kwargs):
        print(f"Chain error: {error}")
        super().on_chain_error(error, **kwargs)

    def on_tool_error(self, error, **kwargs):
        print(f"Tool error: {error}")
        super().on_tool_error(error, **kwargs)


mlflow_cb_agent = DebugMlflowCallbackHandler(
    experiment=experiment,
    run_id=run.info.run_id,
    tracking_uri=uri,
)


agent_executor = AgentExecutor(agent=zero_shot_agent, tools=[], verbose=True, callbacks=[mlflow_cb_agent], name='my_agent')


In [6]:
agent_executor.invoke({"input": "Please calculate the area of a circle that has a radius of 10.923mm"})

Error in DebugMlflowCallbackHandler.on_chain_start callback: AttributeError("'NoneType' object has no attribute 'items'")




> Entering new my_agent chain...
To calculate the area of a circle, we use the formula:

\[ \text{Area} = \pi \times r^2 \]

where \( r \) is the radius of the circle.

Given that the radius \( r = 10.923 \) mm, we can substitute this value into the formula.

Action: Calculate the area using the formula

Action Input: \( \pi \times (10.923)^2 \)
Calculate the area using the formula is not a valid tool, try one of [].
I need to manually calculate the area of the circle using the given formula.

Given:
- Radius \( r = 10.923 \) mm

The formula for the area of a circle is:
\[ \text{Area} = \pi \times r^2 \]

Substitute the given radius into the formula:
\[ \text{Area} = \pi \times (10.923)^2 \]

Calculate \( (10.923)^2 \):
\[ (10.923)^2 = 119.349729 \]

Now, multiply by \( \pi \) (approximately 3.14159):
\[ \text{Area} \approx 3.14159 \times 119.349729 \]

\[ \text{Area} \approx 374.860 \, \text{mm}^2 \]

Final Answer: The area of the circle is appro

{'input': 'Please calculate the area of a circle that has a radius of 10.923mm',
 'output': 'The area of the circle is approximately 374.860 mm².'}

In [None]:
mlflow.end_run()

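## Calling Tools

LLMs are not designed for arithmetic. Integer addition and subtraction may work, but anything more involved is unreliable (the run above squares 10.923 incorrectly), so we hand the calculation to a tool.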
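As a reference point for the run above, the same area can be computed deterministically in Python. This short sketch uses only the standard library; it yields roughly 374.83 mm², whereas the model's hand calculation reported 374.860 mm².

```python
from math import pi

radius_mm = 10.923
area_mm2 = pi * radius_mm ** 2  # area of a circle: pi * r^2

print(f"{area_mm2:.3f} mm^2")
```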

In [None]:
run = mlflow.start_run(run_name="react-agent-with-tool")

In [None]:
import re
from math import pi
from typing import Union
from textwrap import dedent

from langchain.tools import BaseTool
from langchain_core.runnables import chain
from langchain_core.prompts import PromptTemplate, ChatPromptTemplate, \
SystemMessagePromptTemplate, HumanMessagePromptTemplate
from langchain_core.output_parsers import StrOutputParser


def build_standard_chat_prompt_template(kwargs):

    messages = []
 
    if 'system' in kwargs:
        content = kwargs.get('system')
        prompt = PromptTemplate(**content)
        message = SystemMessagePromptTemplate(prompt=prompt)
        messages.append(message)  

    if 'human' in kwargs:
        content = kwargs.get('human')
        prompt = PromptTemplate(**content)
        message = HumanMessagePromptTemplate(prompt=prompt)
        messages.append(message)
        
    chat_prompt = ChatPromptTemplate.from_messages(messages)
    
    return chat_prompt


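@chain
def code_execution(code):

    # Extract the body of the first ```python fenced block; fall back to the raw text
    match = re.findall(r"```python\n(.*?)\n```", code, re.DOTALL)
    python_code = match[0] if match else code

    # WARNING: exec runs model-generated code without sandboxing;
    # only use this in a trusted environment.
    # A single namespace lets generated functions see generated imports.
    namespace = {}
    exec(python_code.strip(), namespace)

    # Drop the injected builtins so only names defined by the code are returned
    namespace.pop("__builtins__", None)

    return namespace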


system_template = (
    "You are a highly skilled Python developer. Your task is to generate Python code strictly based on the user's instructions.\n"
    "Leverage statistical and mathematical libraries such as `statsmodels`, `scipy`, and `numpy` where appropriate to solve the problem.\n"
    "Your response must contain only the Python code — no explanations, comments, or additional text.\n\n"
)

human_template = dedent("""{query}\n\n
                            Always copy the final answer to a variable `answer`
                            Code:
                        """)


input_ = {"system": {"template": system_template},
          "human": {"template": human_template,
                    "input_variables": ["query"]}}

code_chat_prompt_template = build_standard_chat_prompt_template(input_)


# then use this handler instead of the default one
mlflow_cb_model = CleanMlflowCallbackHandler(
    experiment=experiment,
    run_id=run.info.run_id,
    tracking_uri=uri,
)

model_agent = ChatOpenAI(openai_api_key=os.environ['OPENAI_API_KEY'],
                         model_name="gpt-4o", temperature=0, 
                         callbacks=[mlflow_cb_model]
                        )

model_coder = ChatOpenAI(openai_api_key=os.environ['OPENAI_API_KEY'],
                         model_name="gpt-4o-mini", temperature=0, 
                         callbacks=[mlflow_cb_model]
                        )

code_generation = code_chat_prompt_template|model_coder|StrOutputParser()

code_pipeline = code_generation|code_execution


class MathTool(BaseTool):
    name:str = "Math calculator"
    description:str = dedent("""
    Use this tool to solve algorithmic problem by python programming.
    """)
    
    def _run(self, query: str):
        
        return  code_pipeline.invoke({"query": query})
    
    def _arun(self, query: str):
        raise NotImplementedError("This tool does not support async")
    

In [None]:
from langchain.agents import AgentExecutor, create_react_agent

from src.agent.react_zero_shot import prompt_template as zero_shot_prompt_template


prompt = PromptTemplate(template=zero_shot_prompt_template)

tools = [MathTool()]

zero_shot_agent = create_react_agent(
    llm=model_agent, ## the LLM is the agent's reasoning core and largely determines overall performance; the stronger the better
    tools=tools,
    prompt=prompt,
)

agent_executor = AgentExecutor(agent=zero_shot_agent, tools=tools, verbose=True)

In [None]:
agent_executor.invoke({"input": "Please calculate the area of a circle that has a radius of 10.923mm"})

In [None]:
mlflow.end_run()

## WebSearch Tool

### How can a tool accept multiple arguments?

Use Pydantic, which we covered earlier; ResponseSchema also works.
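A ReAct agent passes a single string as the action input, so several arguments have to be packed into that one string (typically JSON) and unpacked inside the tool. The sketch below shows the idea with the standard library only. The dataclass `SearchInputs` and the helper `parse_tool_input` are hypothetical stand-ins for the Pydantic model and `PydanticOutputParser` used in the notebook.

```python
import json
from dataclasses import dataclass


@dataclass
class SearchInputs:
    query: str
    country_code: str  # ISO 3166-1 alpha-2


def parse_tool_input(raw: str) -> SearchInputs:
    """Unpack the single JSON string emitted by the agent into typed fields."""
    payload = json.loads(raw)
    return SearchInputs(
        query=str(payload["query"]),
        country_code=str(payload["country_code"]).upper(),
    )


inputs = parse_tool_input('{"query": "weather in Taipei", "country_code": "tw"}')
print(inputs)
```

The Pydantic version adds two things this sketch lacks: validation of the fields and format instructions that can be embedded in the tool description, so the LLM knows which JSON shape to produce.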

In [None]:
from openai import OpenAI

client = OpenAI()

In [None]:
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field


class Inputs(BaseModel):
    query: str = Field(description="User query")
    country_code: str = Field(description="ISO 3166-1 alpha-2 suggested by the language of the user query")

In [None]:
class SearchTool(BaseTool):

    input_output_parser: PydanticOutputParser = PydanticOutputParser(pydantic_object=Inputs)
    input_format_instructions: str = input_output_parser.get_format_instructions()
    
    name:str = "websearch tool"
    description_template:str = dedent("""
    Currently it is 2025.    
    Use this tool to collect information from the internet, when you are not sure you know the answer.
    The input contains the user's question `query` and the ISO 3166-1 alpha-2 `country_code` inferred from the user's language.
    input format instructions: {input_format_instructions}
    """)

    description: str = description_template.format(input_format_instructions=input_format_instructions)
    
    def _run(self, query):
        
        input_ = self.input_output_parser.parse(query)
        
        query = input_.query
        country_code = input_.country_code
        
        response = client.responses.create(
                    model="gpt-4o-mini",
                    tools=[
                        {"type": "web_search",
                         "user_location":{
                             "type": "approximate",
                             "country": country_code,
                         },
                        "search_context_size": "medium"
                        }],
                    tool_choice="auto",
                    input=query)
        
        return response.output_text
    
    def _arun(self, query: str):
        raise NotImplementedError("This tool does not support async")

In [None]:
tools = [SearchTool()]

zero_shot_agent = create_react_agent(
    llm=model,
    tools=tools,
    prompt=prompt,
)

agent_executor = AgentExecutor(agent=zero_shot_agent, tools=tools, verbose=True)

In [None]:
agent_executor.invoke({"input": "現任台灣總統的老家是否是違建?"})

### Wikipedia query setup

In [None]:
from datetime import datetime

from langchain_core.tools import Tool
from langchain_core.runnables import chain, Runnable
from langchain_community.tools.wikipedia.tool import WikipediaQueryRun
from langchain_community.utilities.wikipedia import WikipediaAPIWrapper

wikipedia = WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())

now = datetime.now()
current_time = now.strftime("%Y-%B")

# A quick way to build a tool from a plain function
search_tool = Tool(
    name="Wikipedia search engine tool",
    func=wikipedia.run,
    description=f'Wikipedia is up to date to {current_time}. Use this tool to help you answer questions.'
)

tools = [search_tool]

In [None]:
isinstance(wikipedia, Runnable)

### Using a custom vector store as a tool

In [None]:
import pandas as pd
from langchain.docstore.document import Document
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-m3")

vectorstore = FAISS.load_local(
    "tutorial/LLM+Langchain/Week-5/warhammer 40k codex", embeddings, 
    allow_dangerous_deserialization=True
)

In [None]:
from langchain_core.runnables import ConfigurableField

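# k must be passed inside search_kwargs; a bare k=10 argument is not used by the retriever
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 10},
).configurable_fields(
    search_kwargs=ConfigurableField(id="search_kwargs")
)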

In [None]:
from typing import Literal

class Inputs(BaseModel):
    query: str = Field(description="User query")
    clan: Literal['Adeptus Mechanicus', 'Aeldari', 'Black Templars'] = Field(description="Faction (codex) the user query is about")


class CodexRetrievalTool(BaseTool):

    input_output_parser: PydanticOutputParser = PydanticOutputParser(pydantic_object=Inputs)
    input_format_instruction: str = input_output_parser.get_format_instructions()
    
    name:str = "warhammer 40k codex"
    description_template:str = dedent("""
    This tool can be used to retrieve relevant information about warhammer 40k, 
    particularly Adeptus Mechanicus, Aeldari, Black Templars.
    The inputs contains user's question `query` and the party/clan `clan`.
    input format instructions: {input_format_instruction}
    """)

    description: str = description_template.format(input_format_instruction=input_format_instruction)
    
    def _run(self, query):
        
        input_ = self.input_output_parser.parse(query)

        query = input_.query
        clan = input_.clan
        
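        # The configurable id is "search_kwargs"; FAISS takes metadata filters under "filter"
        retrieved_documents = retriever.invoke(
            query,
            config={"configurable": {"search_kwargs": {"k": 10,
                                                       "filter": {"filename": f"Codex - {clan}"}}}})
        
        return retrieved_documents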
    
    def _arun(self, query: str):
        raise NotImplementedError("This tool does not support async")

In [None]:
tools = [CodexRetrievalTool()]

model = ChatOpenAI(openai_api_key=os.environ['OPENAI_API_KEY'],
                   model_name="gpt-4o", temperature=0, 
                   callbacks=[mlflow_cb_model]
                  )

zero_shot_agent = create_react_agent(
    llm=model,
    tools=tools,
    prompt=prompt,
)

agent_executor = AgentExecutor(agent=zero_shot_agent, tools=tools, verbose=True, callbacks=[mlflow_cb_agent])

In [None]:
agent_executor.invoke({"input": "Who is the leader of Aeldari?"})

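### Tools module

In the MathTool and SearchTool examples above, each custom tool had to be built by hand, yet they all share the same structure:

- runnable
- description
- name
- input_parser

Could we write down just these pieces and then use a for loop to stamp out tools from a template?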
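The idea can be illustrated without LangChain: describe each tool as a small specification and build the tool objects in a loop. This is a minimal sketch; `SimpleTool` and `TOOL_SPECS` are hypothetical names, and plain functions stand in for the runnables.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class SimpleTool:
    name: str
    description: str
    func: Callable[[str], Any]  # stand-in for a LangChain runnable

    def run(self, query: str) -> Any:
        return self.func(query)


# One dictionary per tool: only the parts that differ between tools
TOOL_SPECS = [
    {"name": "echo", "description": "Return the query unchanged.",
     "func": lambda q: q},
    {"name": "word count", "description": "Count the words in the query.",
     "func": lambda q: len(q.split())},
]

# Build every tool from the same template in one loop
tools = [SimpleTool(**spec) for spec in TOOL_SPECS]
tools_by_name = {tool.name: tool for tool in tools}

print(tools_by_name["word count"].run("purge the heretic"))
```

Adding a tool then means adding one entry to the specification list rather than writing another class.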

In [None]:
from typing import Optional


class ToolTemplate(BaseTool):
    runnable: Runnable
    name: str
    description: str
    input_parser: Optional[PydanticOutputParser] = None

    @classmethod
    def create(cls, runnable: Runnable, name: str, description: str,
               input_parser: Optional[PydanticOutputParser] = None):

        # `description` is treated as a template: an {input_format_instruction}
        # placeholder, if present, is filled with the parser's format instructions
        if input_parser is not None:
            description = description.format(
                input_format_instruction=input_parser.get_format_instructions()
            )

        return cls(runnable=runnable, name=name, description=description,
                   input_parser=input_parser)

    def _run(self, query: str):

        if self.input_parser is None:
            # Without a parser, the runnable must accept {"query": ...} as its input
            return self.runnable.invoke({"query": query})

        # Parse the raw string into the pydantic object, then convert it to a dict
        input_pydantic = self.input_parser.parse(query)
        return self.runnable.invoke(input_pydantic.model_dump())

    def _arun(self, query: str):
        raise NotImplementedError("This tool does not support async")

## Conversational Agent

In [None]:
import os

os.chdir("../../../")

In [None]:
from src.agent.react_chat import prompt_template as chat_prompt_template
from src.initialization import credential_init

credential_init()

print(chat_prompt_template)

In [None]:
from textwrap import dedent  # used by SearchTool.description_template below

from langchain.output_parsers import PydanticOutputParser
from langchain.tools import BaseTool
from pydantic import BaseModel, Field
from openai import OpenAI


client = OpenAI()


class Inputs(BaseModel):
    query: str = Field(description="User query")
    country_code: str = Field(description="ISO 3166-1 alpha-2 suggested by the language of the user query")


class SearchTool(BaseTool):

    input_output_parser: PydanticOutputParser = PydanticOutputParser(pydantic_object=Inputs)
    input_format_instructions: str = input_output_parser.get_format_instructions()
    
    name:str = "websearch tool"
    description_template:str = dedent("""
    Currently it is 2025.    
    Use this tool to collect information from the internet, when you are not sure you know the answer.
    The input contains the user's question `query` and the ISO 3166-1 alpha-2 `country_code` inferred from the user's language.
    input format instructions: {input_format_instructions}
    """)

    description: str = description_template.format(input_format_instructions=input_format_instructions)
    
    def _run(self, query):
        
        input_ = self.input_output_parser.parse(query)
        
        query = input_.query
        country_code = input_.country_code
        
        messages = [{"role": "user",
                     "content": query}]

        response = client.chat.completions.create(
            model='gpt-4o-mini-search-preview',
            web_search_options={"search_context_size": 'medium',
                                "user_location": {
                                        "type": "approximate",
                                        "approximate": {
                                            "country": country_code,
                                        }
                                    },
                                },
            messages=messages
        )
        
        return response.choices[0].message.content
    
    def _arun(self, query: str):
        raise NotImplementedError("This tool does not support async")


search_tool = SearchTool()

In [None]:
from textwrap import dedent
from datetime import datetime

from langchain.agents import Tool
from langchain_openai import ChatOpenAI
from langchain_community.tools.wikipedia.tool import WikipediaQueryRun
from langchain_community.utilities.wikipedia import WikipediaAPIWrapper
from langchain.prompts import PromptTemplate
from langchain.agents import AgentExecutor, create_react_agent

# Try simply adding the chat history to the prompt

template = dedent("""
Answer the following questions as best you can. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer

Thought: you should always think about what to do

Action: the action to take, should be one of [{tool_names}]

Action Input: the input to the action

Observation: the result of the action

... (this Thought/Action/Action Input/Observation can repeat N times)

Thought: I now know the final answer

Final Answer: the final answer to the original input question

Begin!

Previous conversation history:

{chat_history}

Question: {input}

Thought:{agent_scratchpad}
"""
)

tools = [search_tool]

prompt = PromptTemplate.from_template(template)

model = ChatOpenAI(openai_api_key=os.environ['OPENAI_API_KEY'],
                   model_name="gpt-4o", temperature=0, 
                  )

conversation_agent = create_react_agent(
    llm=model,
    tools=tools,
    prompt=prompt,
)

agent_executor = AgentExecutor(agent=conversation_agent, tools=tools, verbose=True,
                               handle_parsing_errors=True)

In [None]:
tools

When we covered chatbots in week 4, we learned how to store the conversation in a ChatMessageHistory object.

In [None]:
from langchain.memory import ChatMessageHistory

chat_history = ChatMessageHistory()

while True:
    input_ = input("Enter your question (type quit to exit): ")
    if input_ == 'quit':
        break
    output = agent_executor.invoke({"chat_history": chat_history,
                                    "input": input_})

    print(f"\n***{output['output']}***\n")
    
    chat_history.add_user_message(input_)
    chat_history.add_ai_message(output['output'])
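
Because `{chat_history}` is an ordinary placeholder in the string template, the history has to be rendered as text before it reaches the model. A minimal sketch of that rendering, assuming a `Human:` / `AI:` line prefix: `render_history`, the shortened `template`, and the sample turns are hypothetical and stand in for whatever string form the history object produces.

```python
from typing import List, Tuple


def render_history(turns: List[Tuple[str, str]]) -> str:
    """Render (role, content) pairs as one line per turn."""
    lines = []
    for role, content in turns:
        prefix = "Human" if role == "human" else "AI"
        lines.append(f"{prefix}: {content}")
    return "\n".join(lines)


# Shortened version of the prompt tail used above (assumption for illustration).
template = (
    "Previous conversation history:\n\n{chat_history}\n\n"
    "Question: {input}"
)

turns = [("human", "Where is Taiwan?"), ("ai", "Taiwan is in East Asia.")]

prompt_text = template.format(
    chat_history=render_history(turns),
    input="What is its capital?",
)
print(prompt_text)
```

The history grows with every turn, so a long conversation will eventually need truncation or summarization before it is placed in the prompt.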

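### Building an Agent-Based Chatbot with Streamlit

- The agent will be deployed on LangServe.
- Serving an agent through LangServe has a few gotchas: you must additionally declare the input and output types with pydantic data models.
    - BaseMessage objects (such as HumanMessage and AIMessage) are themselves Pydantic models, so:
    - c.model_dump() returns a dict, which is exactly the format that requests.post(..., json=...) needs.

First, confirm how to talk to LangServe through requests.

Call model_dump() on each message in the chat_history object. The API then receives a chat_history that is a list of dicts (each with type and content fields), and LangServe can correctly rebuild them into BaseMessage objects.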
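
The request body described above can be sketched with plain dicts. The `type` and `content` fields come from the description above. `to_message_dict`, `build_payload`, and the sample turns are hypothetical helpers for illustration, and a real `model_dump()` returns additional fields beyond these two.

```python
import json
from typing import Dict, List, Tuple


def to_message_dict(role: str, content: str) -> Dict[str, str]:
    """Reduced message dict: only the two fields named in the text."""
    return {"type": role, "content": content}


def build_payload(user_input: str,
                  history: List[Tuple[str, str]]) -> Dict[str, dict]:
    """Wrap the agent inputs under the top-level 'input' key."""
    return {
        "input": {
            "input": user_input,
            "chat_history": [to_message_dict(r, c) for r, c in history],
        }
    }


history = [("human", "Where is Taiwan?"), ("ai", "Taiwan is in East Asia.")]
payload = build_payload("What did I ask before?", history)

# This is the JSON text that requests.post(..., json=payload) would send.
body = json.dumps(payload, ensure_ascii=False, indent=2)
print(body)
```

Note the two levels of `input`: the outer key is the request envelope, and the inner key is the agent's own `input` variable.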

In [None]:
import requests

from langchain.memory import ChatMessageHistory

chat_history = ChatMessageHistory()

chat_history.add_user_message("Please give me 橋本有菜's measurements.")
chat_history.add_ai_message("橋本有菜's measurements are 84-58-84 cm.")

response = requests.post(
    "http://localhost:8080/chatbot/invoke",
    json={'input': {"input": "Where is Taiwan?",
                    "chat_history": [c.model_dump() for c in chat_history.messages]
                   }
         }
)

In [None]:
import json

json.loads(response.text)

In [None]:
response.json()

In [None]:
response = requests.post(
    "http://localhost:8080/chatbot/invoke",
    json={'input': {"input": "Please give me my previous question and the answer you gave me",
                    "chat_history": [c.model_dump() for c in chat_history.messages]
                   }
         }
)

In [None]:
[c.model_dump() for c in chat_history.messages]