# GPT-Image-1

GPT-Image-1 is the latest image generation model released by OpenAI, so we will use it throughout this notebook.

Please upgrade the `openai` package to version 1.76.0 or later to see the latest documentation.

In [1]:
import os

os.chdir("../../../")

In [2]:
from langchain_openai import ChatOpenAI

from src.initialization import credential_init


credential_init()

model = ChatOpenAI(openai_api_key=os.environ['OPENAI_API_KEY'],
                   model_name="gpt-4o-2024-05-13", temperature=0)

### OpenAI Image API Parameters:

https://platform.openai.com/docs/guides/image-generation?image-generation-model=gpt-image-1
https://cookbook.openai.com/examples/generate_images_with_gpt_image

<!-- - model: dall-e-3
- size (str): 1024x1024, 1024x1792, 1792x1024
- quality: hd, standard
- style: vivid, natural. Default vivid -->

- model: gpt-image-1
    - size (str): 1024x1024 (square), 1536x1024 (landscape), 1024x1536 (portrait) or auto (default)
    - quality: low, medium, high or auto
    - moderation: auto, low
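
The allowed values listed above can be checked locally before a request is sent. The following is a minimal sketch, not part of the OpenAI SDK: `build_image_kwargs` is a hypothetical helper that validates `size`, `quality`, and `moderation` against the values in the list and returns keyword arguments that could be passed to `client.images.generate`.

```python
# Allowed values copied from the parameter list above.
ALLOWED_SIZES = {"1024x1024", "1536x1024", "1024x1536", "auto"}
ALLOWED_QUALITIES = {"low", "medium", "high", "auto"}
ALLOWED_MODERATION = {"auto", "low"}


def build_image_kwargs(prompt, size="auto", quality="auto", moderation="auto"):
    """Hypothetical helper: validate the options and build kwargs for images.generate."""
    if size not in ALLOWED_SIZES:
        raise ValueError(f"unsupported size: {size!r}")
    if quality not in ALLOWED_QUALITIES:
        raise ValueError(f"unsupported quality: {quality!r}")
    if moderation not in ALLOWED_MODERATION:
        raise ValueError(f"unsupported moderation: {moderation!r}")
    return {
        "model": "gpt-image-1",
        "prompt": prompt,
        "size": size,
        "quality": quality,
        "moderation": moderation,
        "n": 1,
    }


kwargs = build_image_kwargs("A watercolor mountain at sunset",
                            size="1024x1536", quality="medium")
print(kwargs)
# With a configured client: response = client.images.generate(**kwargs)
```

A DALL-E 3 style value such as `quality="hd"` is rejected by this check, which catches a common mistake when migrating old code.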

In [3]:
from openai import OpenAI

prompt = ("A Sumi-e style watercolor painting of mountains during sunset. The sky is depicted with bold "
          "splashes of orange, pink, and purple hues, blending and overlapping in a dynamic composition. "
          "The mountains are represented with expressive brushstrokes, emphasizing their majestic and serene "
          "presence. The focus is on capturing the essence and mood of the scene rather than detailed realism. "
          "The overall effect is serene and contemplative, with a harmonious balance of color and form.")

client = OpenAI()

response = client.images.generate(
    model="gpt-image-1",
    prompt=prompt,
    size="1024x1024",
    # quality="hd",
    quality='high',
    n=1,
    # response_format = 'b64_json'
)

image_base64 = response.data[0].b64_json

In [None]:
client.images.generate?

## Save the image to your local computer

In [4]:
import base64

with open("tutorial/LLM+Langchain/Week-8/test.png", "wb") as fh:
    fh.write(base64.b64decode(image_base64))
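
The save step above fails if the target folder does not exist. Below is a small sketch of a reusable variant that creates missing parent folders first. `save_base64_image` is a hypothetical helper name, and the demo uses placeholder bytes in a temporary folder rather than a real API response.

```python
import base64
import tempfile
from pathlib import Path


def save_base64_image(image_base64: str, filename: str) -> Path:
    """Decode a base64 string and write the raw bytes to `filename`."""
    path = Path(filename)
    # Create missing parent folders so the write does not fail.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(base64.b64decode(image_base64))
    return path


# Demo with placeholder bytes instead of a real image payload.
fake_payload = base64.b64encode(b"not a real PNG").decode("ascii")
demo_dir = tempfile.mkdtemp()
saved = save_base64_image(fake_payload, f"{demo_dir}/nested/test.png")
print(saved, saved.stat().st_size)
```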

## Two Challenges:

### 1. How can we write prompts more efficiently?

There are two types of prompts:

1. Danbooru Tag: masterpiece, best quality, beautiful eyes, clear eyes, detailed eyes, Blue-eyes, 1girl, 20_old, full-body, break, smoking, break, high_color, blue-hair, beauty, black-boots,break, break, Flat vector art, Colorful art, white_shirt, simple_background, blue_background, Ink art, peeking out upper body, Eyes

2. Natural language: A Sumi-e style watercolor painting of mountains during sunset. The sky is depicted with bold splashes of orange, pink, and purple hues, blending and overlapping in a dynamic composition. The mountains are represented with expressive brushstrokes, emphasizing their majestic and serene presence. The focus is on capturing the essence and mood of the scene rather than detailed realism. The overall effect is serene and contemplative, with a harmonious balance of color and form.

Natural language prompts like this one rely on specialized terminology and advanced vocabulary. That makes them hard to write for non-native English speakers like us, and they can be challenging even for native speakers.

### 2. How do we turn it into an LCEL chain?

## Some websites for natural language prompts

- https://leonardo.ai/: An image generation SaaS. Many of its works are created with natural language prompts.
- https://blog.mlq.ai/dalle-prompts/: A tutorial on how to come up with natural language prompts.

In [5]:
def build_standard_chat_prompt_template(kwargs):

    system_content = kwargs['system']
    human_content = kwargs['human']
    
    system_prompt = PromptTemplate(**system_content)
    system_message = SystemMessagePromptTemplate(prompt=system_prompt)
    
    human_prompt = PromptTemplate(**human_content)
    human_message = HumanMessagePromptTemplate(prompt=human_prompt)
    
    chat_prompt = ChatPromptTemplate.from_messages([system_message,
                                                     human_message
                                                   ])

    return chat_prompt

### Natural Language Prompt Generation

In [6]:
from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate, SystemMessagePromptTemplate
from langchain_core.output_parsers import StrOutputParser


system_template = ("You are a helpful AI assistant and an art expert with extensive knowledge of photography "
                   "and illustration. You excel at creating breathtaking masterpieces with the DALLE-3 model. "
                   "For this task, you will be provided with a description of an image, and you will generate a "
                   "corresponding DALLE-3 prompt. The prompt should be detailed and descriptive, capturing the "
                   "essence of the image.")

human_template = "{image_desc}"

input_ = {"system": {"template": system_template},
          "human": {"template": human_template,
                    "input_variables": ["image_desc"]}}
    
chat_prompt = build_standard_chat_prompt_template(input_)

nl_prompt_generation_chain = chat_prompt | model | StrOutputParser()

## Wrap the OpenAI API call as a function for use in LangChain

In [7]:
from typing import Dict
from langchain_core.runnables import chain


def gpt_image_worker(kwargs: Dict):

    """
    Generates an image using OpenAI's GPT-Image-1 model based on the provided prompt and optional parameters.
    
    Parameters:
    kwargs (Dict): A dictionary containing the following keys:
        - 'nl_prompt' (str): The natural language prompt describing the image to be generated.
        - 'size' (str, optional): The size of the generated image. Default is "1024x1024".
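        - 'quality' (str, optional): The quality of the generated image. Default is "medium".
        - 'moderation' (str, optional): The moderation level, "auto" or "low". Default is "auto".
    
    Returns:
    str: The generated image as a base64-encoded string.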
    """
    
    print("Start generating image...")
    print(f"prompt: {kwargs['nl_prompt']}")
    client = OpenAI()

    response = client.images.generate(
        model="gpt-image-1",
        prompt=kwargs['nl_prompt'],
        size=kwargs.get("size", "1024x1024"),
        quality=kwargs.get('quality', 'medium'),
        moderation=kwargs.get('moderation', 'auto'),
        n=1)

    image_base64 = response.data[0].b64_json

    print("Image is generated successfully.")
    
    return image_base64

@chain
def base64_to_file(kwargs):

    """
    Save the image from a base64 string
    """
    
    image_base64 = kwargs['image_base64']
    filename = kwargs['filename']
    
    with open(f"{filename}", "wb") as fh:
        fh.write(base64.b64decode(image_base64))
    

In [8]:
from operator import itemgetter

from langchain_core.runnables import RunnableLambda, RunnableParallel, RunnablePassthrough

# step 1: create the image prompt
step_1 = RunnablePassthrough.assign(nl_prompt=itemgetter('image_desc')|nl_prompt_generation_chain)

# step 2: image generation process, as a base64
step_2 = RunnablePassthrough.assign(image_base64=gpt_image_worker)

# step 3: save the image
step_3 = base64_to_file

# chain step 1, step 2, step 3 together
gpt_image_chain =  step_1|step_2|step_3
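
To make the data flow of the three steps easier to follow, here is a plain-Python sketch of what each `assign` step does to the input dictionary. It is an illustration only, not LangChain's implementation: `assign`, `run_pipeline`, and the two `fake_*` functions are hypothetical stand-ins that make no LLM or API calls, and each stand-in receives the whole state rather than a single key.

```python
def assign(key, func):
    """Return a step that copies the input dict and adds key = func(input)."""
    def step(state):
        return {**state, key: func(state)}
    return step


def fake_prompt_writer(state):
    # Stand-in for nl_prompt_generation_chain (no LLM call here).
    return f"A detailed painting of {state['image_desc']}"


def fake_image_worker(state):
    # Stand-in for gpt_image_worker (no API call here).
    return f"<base64 for: {state['nl_prompt']}>"


def run_pipeline(state, steps):
    """Feed the output of each step into the next one, like the | operator."""
    for step in steps:
        state = step(state)
    return state


steps = [assign("nl_prompt", fake_prompt_writer),
         assign("image_base64", fake_image_worker)]
result = run_pipeline({"image_desc": "mountains", "filename": "out.png"}, steps)
print(sorted(result))
```

The original keys (`image_desc`, `filename`) survive every step, which is why the final save step can still read `filename` alongside the newly added `image_base64`.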

In [10]:
gpt_image_chain.invoke({"size": "1024x1536",
                     "quality": "medium",
                     "image_desc": ("warhammer 40k, astartes, power armor, chain sword, purity seal, oil painting"),
                     "filename": "tutorial/LLM+Langchain/Week-8/astartes.png"
                    })

Start generating image...
prompt: Create an oil painting of a Warhammer 40k Astartes in full power armor, wielding a menacing chain sword. The Astartes should be depicted in a dynamic battle stance, with intricate details on the armor, including purity seals and chapter insignias. The background should be a war-torn battlefield, with smoke, fire, and debris adding to the chaotic atmosphere. The painting style should be rich and textured, capturing the grimdark essence of the Warhammer 40k universe.
Image is generated succesfully.


In [11]:
gpt_image_chain.invoke({"size": "1536x1024",
                     "quality": "medium",
                     "image_desc": ("Tifa Lockhart, kimono, head ornament, looking at viewer, cherry blossom, "
                                    "black-white hightech combat suite, chibi style."),
                     "filename": "tutorial/LLM+Langchain/Week-8/Tifa-01.png",
                     "moderation": 'low'
                    })


Start generating image...
prompt: Create a detailed and descriptive DALLE-3 prompt for the given image:

"Create a chibi-style illustration of Tifa Lockhart wearing a traditional Japanese kimono adorned with intricate patterns and a delicate head ornament. She is looking directly at the viewer with a gentle smile. Surround her with blooming cherry blossom trees, their petals gently falling around her. Tifa's kimono contrasts with a sleek, black-and-white high-tech combat suit visible underneath, blending traditional and futuristic elements seamlessly. The background should be a serene, pastel-colored landscape, enhancing the overall charm and tranquility of the scene."
Image is generated succesfully.


Every model has its own strengths.
In my opinion:

- SDXL: style
- PONY, Illustrious: pose control, view angle control
- FLUX: realistic

In [13]:
gpt_image_chain.invoke({"size": "1024x1536",
                        "quality": "medium",
                        "image_desc": ("1girl, azur lane style outfit, high ponytail, very long hair, sidelocks, bangs, "
                                       "in a whisky bar, rest head on hand, dim lighting, exuding an aura of youth and "
                                       "ethereal beauty, sketch, illustration"),
                        "filename": "tutorial/LLM+Langchain/Week-8/azur_lane_style.png",
                        "moderation": 'low'
                      })

Start generating image...
prompt: "Create a sketch illustration of a young girl with an ethereal beauty, styled in an Azur Lane-inspired outfit. She has a high ponytail with very long hair, sidelocks, and bangs. The scene is set in a dimly lit whisky bar, where she is resting her head on her hand, exuding an aura of youth and elegance. The overall atmosphere should be intimate and warm, with the dim lighting casting soft shadows that enhance the sketch's delicate lines and details."
Image is generated succesfully.


### Image Render

In [None]:
# client.images.edit?

In [14]:
from src.io.path_definition import get_project_dir

In [15]:
image_path = os.path.join(get_project_dir(), "tutorial", "LLM+Langchain", "Week-8", "Prinz_Eugen.png")

result_edit = client.images.edit(
    model="gpt-image-1",
    image=open(image_path, "rb"), 
    prompt="generate a photorealistic image",
    size="1024x1536"
)

image_base64 = result_edit.data[0].b64_json

with open("tutorial/LLM+Langchain/Week-8/Eugen_Prinz_Render.png", "wb") as fh:
    fh.write(base64.b64decode(image_base64))

In [16]:
result_edit = client.images.edit(
    model="gpt-image-1",
    image=open(image_path, "rb"), 
    prompt="Generate a photorealistic image. Make the character have an ulzzang look",
    size="1024x1536"
)

In [17]:
image_base64 = result_edit.data[0].b64_json

with open("tutorial/LLM+Langchain/Week-8/Eugen_Prinz_Render_Ulzzang.png", "wb") as fh:
    fh.write(base64.b64decode(image_base64))

In [None]:
result_edit = client.images.edit(
    model="gpt-image-1",
    image=open(image_path, "rb"), 
    prompt="Generate a photorealistic image. Make the character have a soft, flawless, youthful look with large eyes and gentle features",
    size="1024x1536"
)

image_base64 = result_edit.data[0].b64_json

with open("tutorial/LLM+Langchain/Week-8/Eugen_Prinz_Render_Youth.png", "wb") as fh:
    fh.write(base64.b64decode(image_base64))

In [None]:
result_edit = client.images.edit(
    model="gpt-image-1",
    image=open(image_path, "rb"), 
    prompt="Generate a photorealistic image.\n"
           "Give the character an ulzzang appearance with a soft, flawless, youthful look, large eyes, and gentle features, "
           "while keeping the polished porcelain skin texture and glamorous appearance of the girl.",
    size="1024x1536"
)

image_base64 = result_edit.data[0].b64_json

with open("tutorial/LLM+Langchain/Week-8/Eugen_Prinz_Render_Combined.png", "wb") as fh:
    fh.write(base64.b64decode(image_base64))

You can experiment with different art styles to render your images — there's much more than just the Ghibli style!

Some examples of art styles you can explore:
- Studio Ghibli
- Disney animation
- Pixel art
- Cyberpunk
- Watercolor
- Oil painting
- Dark fantasy aesthetic
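
To try several of the styles listed above in one go, the edit prompt and output file name can be generated per style. This is only a sketch: the prompt wording, the `render_<style>.png` naming pattern, and the helper names `slugify` and `build_style_jobs` are assumptions made for illustration.

```python
ART_STYLES = [
    "Studio Ghibli",
    "Disney animation",
    "Pixel art",
    "Cyberpunk",
    "Watercolor",
    "Oil painting",
    "Dark fantasy aesthetic",
]


def slugify(style: str) -> str:
    """Turn a style name into a lowercase, file-name friendly token."""
    return style.lower().replace(" ", "_")


def build_style_jobs(styles, out_dir):
    """Build one (prompt, filename) job per art style."""
    jobs = []
    for style in styles:
        jobs.append({
            "prompt": f"Render the image in {style} style.",
            "filename": f"{out_dir}/render_{slugify(style)}.png",
        })
    return jobs


jobs = build_style_jobs(ART_STYLES, "tutorial/LLM+Langchain/Week-8")
for job in jobs:
    print(job["filename"], "<-", job["prompt"])
```

Each job's `prompt` could then be passed to `client.images.edit` and the result saved under `filename`, in the same way as the cells above.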

## Use this as a tool for an agent

In [None]:
# prompt_template = """
# Answer the following questions as best you can. You have access to the following tools:

# {tools}

# Use the following format:

# Question: the input question you must answer

# Thought: you should always think about what to do

# Action: the action to take, should be one of [{tool_names}]

# Action Input: the input to the action

# Observation: the result of the action

# ... (this Thought/Action/Action Input/Observation can repeat N times)

# Thought: I now know the final answer

# Final Answer: the final answer to the original input question

# Begin!

# Question: {input}

# Thought:{agent_scratchpad}
# """

In [None]:
from langchain.agents import Tool, AgentExecutor, create_react_agent
from langchain.prompts import PromptTemplate
from langchain.tools import BaseTool
from langchain_core.output_parsers import StrOutputParser, PydanticOutputParser
from pydantic import BaseModel, Field

from src.agent.react_zero_shot import prompt_template as zero_shot_prompt_template

# We need both the query and filename (minimal requirement):
# Some control variables
# What we learned last week?

# step 1: create the image prompt
step_1 = RunnablePassthrough.assign(nl_prompt=itemgetter('image_desc')|nl_prompt_generation_chain)

# step 2: image generation process, as a base64
step_2 = RunnablePassthrough.assign(image_base64=gpt_image_worker)

# step 3: save the image
step_3 = base64_to_file

# chain step 1, step 2, step 3 together
gpt_image_chain =  step_1|step_2|step_3


class ImageInput(BaseModel):
    image_desc: str = Field(description=("image description / prompt"))
    filename: str = Field(description="the location at which the image will be saved")
    size: str = Field(description='image size, can be 1024x1024 (square), 1536x1024 (landscape), 1024x1536 (portrait) or auto (default)')
    quality: str = Field(description='image quality, low, medium, high or auto')


class ImageTool(BaseTool):

    name: str = "Image generator with GPT-Image-1"

    input_output_parser: PydanticOutputParser = PydanticOutputParser(pydantic_object=ImageInput)
    
    input_format_instructions: str = input_output_parser.get_format_instructions()

    description_template: str = ("Use this tool when you need to create an image\n\n"
                                 "input format_instructions: {input_format_instructions}")

    description: str = description_template.format(input_format_instructions=input_format_instructions)
    
    def _run(self, query):

        input_ = self.input_output_parser.parse(query)
        
        image_desc = input_.image_desc
        size = input_.size
        quality = input_.quality
        filename = input_.filename
        
        gpt_image_chain.invoke({"image_desc": image_desc,
                                "size": size,
                                "quality": quality,
                                "filename": filename,
                                "moderation": 'low'})
        
        return "Done"

    async def _arun(self, query: str):
        raise NotImplementedError("This tool does not support async")

# Standard zero-shot template
prompt = PromptTemplate.from_template(zero_shot_prompt_template)

# Build the tool list
tools = [ImageTool()]

# Create the agent
zero_shot_agent = create_react_agent(
    llm=model,
    tools=tools,
    prompt=prompt,
)

# Create the agent executor
agent_executor = AgentExecutor(agent=zero_shot_agent, tools=tools, verbose=True)
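
The tool's `_run` method receives the agent's action input as a single string and has to turn it into structured fields. The sketch below shows that idea with the standard library only. `ImageRequest` and `parse_action_input` are hypothetical stand-ins for `ImageInput` and the Pydantic parser, and the regular expression assumes the input contains one flat JSON object.

```python
import json
import re
from dataclasses import dataclass


@dataclass
class ImageRequest:
    """Stdlib stand-in for the ImageInput model."""
    image_desc: str
    filename: str
    size: str
    quality: str


def parse_action_input(text: str) -> ImageRequest:
    """Extract the JSON object embedded in `text` and map it to ImageRequest."""
    match = re.search(r"\{.*\}", text, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in the action input")
    data = json.loads(match.group(0))
    # Missing or unexpected keys raise TypeError here.
    return ImageRequest(**data)


raw = ('Action Input: {"image_desc": "a snowy owl", "filename": "owl.png", '
       '"size": "1024x1024", "quality": "low"}')
request = parse_action_input(raw)
print(request)
```

When parsing fails, raising an error (rather than guessing defaults) lets the agent executor surface the problem so the model can retry with a corrected input.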

In [None]:
image_prompt = """
brown hair, bangs, two side up, twin ponytails, sidelocks, black hat, jewelry, black cheongsam, intricate golden embroidery, long sleeves,
detached sleeves, sheer long skirt, head ornament, a 17-years-old ethereal and glamorous beautiful japanese idol,
translucent skin tone, anime-like face, profound facial features, bright eyes, faint rosy blush, mesmerizing city view. 
night, photorealistic
"""

filename = "tutorial/LLM+Langchain/Week-8/test_04.png"

In [None]:
agent_executor.invoke({"input": f"Generate an image with the following information: \n {image_prompt}. Save the image at {filename}"})

### OpenAI WebSearch Update:

- https://platform.openai.com/docs/guides/tools-web-search?api-mode=responses

The world changes very fast.

Below are a few notable implementation considerations when using web search.

- Web search is currently not supported in the gpt-4.1-nano model.
- The gpt-4o-search-preview and gpt-4o-mini-search-preview models used in Chat Completions only support a subset of API parameters - view their model data pages for specific information on rate limits and feature support.
- When used as a tool in the Responses API, web search has the same tiered rate limits as the models above.
- Web search is limited to a context window size of 128000 (even with gpt-4.1 and gpt-4.1-mini models).
- Refer to this guide for data handling, residency, and retention information.

In [None]:
from openai import OpenAI
client = OpenAI()

## ACG Characters

### Genshin

In [None]:
from openai import OpenAI
client = OpenAI()

response = client.responses.create(
    model="gpt-4.1",
    tools=[{
        "type": "web_search_preview",
        "search_context_size": "low",
    }],
    input="What is the appearance of Hu Tao from Genshin? Please find the result on the internet.",
)

print(response.output_text)

In [None]:
print(response.output[1].content[0].text)

In [None]:
print(response.output[1].content[0].annotations)
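
Indexing `response.output[1]` assumes the message always comes right after a single web search call. A more defensive sketch is to scan the output items by type. `extract_text_and_citations` is a hypothetical helper, and the demo runs on a hand-built stand-in object rather than a real response, so the field layout it relies on (`type`, `content`, `text`, `annotations`, `url`, `title`) should be checked against the SDK version in use.

```python
from types import SimpleNamespace


def extract_text_and_citations(response):
    """Collect message text and URL citations without hard-coded indices."""
    texts, citations = [], []
    for item in response.output:
        if getattr(item, "type", None) != "message":
            continue  # skip tool-call items such as the web search call
        for part in item.content:
            if getattr(part, "type", None) != "output_text":
                continue
            texts.append(part.text)
            for ann in getattr(part, "annotations", None) or []:
                if getattr(ann, "type", None) == "url_citation":
                    citations.append((ann.title, ann.url))
    return "\n".join(texts), citations


# Hand-built stand-in for a response object, used only to exercise the helper.
fake_response = SimpleNamespace(output=[
    SimpleNamespace(type="web_search_call"),
    SimpleNamespace(type="message", content=[
        SimpleNamespace(
            type="output_text",
            text="placeholder answer",
            annotations=[SimpleNamespace(type="url_citation",
                                         title="Example page",
                                         url="https://example.com/page")],
        ),
    ]),
])

text, citations = extract_text_and_citations(fake_response)
print(text)
print(citations)
```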

### Stellar Blade

In [None]:
response = client.responses.create(
    model="gpt-4.1",
    tools=[{
        "type": "web_search_preview",
        "search_context_size": "high",
    }],
    input="What is the appearance of Eve of Stellar Blade? Please find the result on the internet.",
)

print(response.output_text)

In [None]:
print(response.output[1].content[0].text)

In [None]:
print(response.output[1].content[0].annotations)

# **** Expected end of the first hour ****

## LCEL ACG character appearance chain

In [None]:
from langchain.docstore.document import Document
from langchain.agents import Tool, AgentExecutor, create_react_agent
from langchain.tools import BaseTool
from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate, SystemMessagePromptTemplate
from langchain_core.output_parsers import StrOutputParser

system_template = ("You are a helpful AI assistant with deep knowledge of anime, manga, "
                   "and mobile games. You will generate the face, body, attire, hairstyle, and accessories of a character in great " 
                   "detail. The output should consist of:\n\n"
                   "- Face:\n"
                   "- Body:\n"
                   "- Attire:\n"
                   "- Hairstyle:\n"
                   "- Accessories:\n\n"
                   "If you are not sure about the answer, please find the content from the internet.")

# Make this simple in the beginning
@chain
def gpt_web_search_tool(text):

    client = OpenAI()

    response = client.responses.create(
        model="gpt-4.1",
        tools=[{
            "type": "web_search_preview",
            "search_context_size": "low",
        }],
        input=[{"role": "system",
                "content": [{"type": "input_text",
                             "text": system_template}]},
               {"role": "user",
                "content": [{"type": "input_text",
                             "text": text}]}]
    )
    
    return response

In [None]:
output = gpt_web_search_tool.invoke("What is the appearance of Hu Tao from Genshin")

In [None]:
print(output.output[1].content[0].annotations)

In [None]:
print(output.output[1].content[0].text)

In [None]:
class ACGLLMTool(BaseTool):

    name: str = "Anime character design generator"
    description: str = "Use this tool to generate and explore detailed designs for anime and ACG (Animation, Comics, and Games) characters."

    def _run(self, query: str):
        
        response = gpt_web_search_tool.invoke(query)
        
        return response.output[1].content[0].text

    async def _arun(self, query: str):
        raise NotImplementedError("This tool does not support async")
        
        
class ImageTool(BaseTool):

    name:str = "ACG characters image generator with GPT-Image-1"

    input_output_parser: PydanticOutputParser = PydanticOutputParser(pydantic_object=ImageInput)
    
    input_format_instructions: str = input_output_parser.get_format_instructions()

    description_template: str = ("Use this tool when you need to create an image\n\n"
                                 "input format_instructions: {input_format_instructions}")

    description: str = description_template.format(input_format_instructions=input_format_instructions)
    
    def _run(self, query):
        
        input_ = self.input_output_parser.parse(query)
        
        image_desc = input_.image_desc
        size = input_.size
        quality = input_.quality
        filename = input_.filename
        
        gpt_image_chain.invoke({"image_desc": image_desc,
                                "size": size,
                                "quality": quality,
                                "filename": filename,
                                "moderation": 'low'})
        
        return "Done"

    async def _arun(self, query: str):
        raise NotImplementedError("This tool does not support async")

        
prompt_template = """
Answer the following questions as best you can. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer

Thought: you should always think about what to do

Action: the action to take, should be one of [{tool_names}]

Action Input: the input to the action

Observation: the result of the action

... (this Thought/Action/Action Input/Observation can repeat N times)

Thought: I now know the final answer

Final Answer: the final answer to the original input question

Begin!

Question: {input}

Thought:{agent_scratchpad}
"""        
             
prompt = PromptTemplate.from_template(prompt_template)

tools = [ImageTool(), ACGLLMTool()]

zero_shot_agent = create_react_agent(
    llm=model,
    tools=tools,
    prompt=prompt,
)

agent_executor = AgentExecutor(agent=zero_shot_agent, tools=tools, verbose=True)

In [None]:
agent_executor.invoke({"input": "Generate an image of Hu Tao from Genshin in pastel art style"})

In [None]:
agent_executor.invoke({"input": "Generate an image of Yae Miko from Genshin in impressionism oil painting style with chiaroscuro lighting."})

## Audible audiobooks

- Text-to-speech: TTS tool
- Text-to-image: Image tool

### Children Book Image Generator

- Generate an image according to the story

In [None]:
system_template = ("You are a helpful AI assistant and an art expert with extensive knowledge of illustration.\n"
                   "You excel at creating Pencil and Ink Style illustrations for 6-year-old children using the GPT-Image-1 model. "
                   "This style is characterized by detailed line work, often in black and white or with minimal color, and has a classic, "
                   "timeless feel. For this task, you will be provided with a paragraph of a story, and you will generate a corresponding "
                   "GPT-Image-1 prompt which captures the storyline. The prompt should be detailed and descriptive, capturing the essence of "
                   "the image.")


system_prompt = PromptTemplate(template=system_template)

# System prompt
system_message = SystemMessagePromptTemplate(prompt=system_prompt)

human_prompt = PromptTemplate(template="{story}",
                              input_variables=['story'])

# Create a human message prompt template based on the prompt
human_message = HumanMessagePromptTemplate(prompt=human_prompt)

chat_prompt = ChatPromptTemplate.from_messages([system_message, human_message])

nl_prompt_generation_chain = chat_prompt | model | StrOutputParser()     

step_1 = RunnablePassthrough.assign(nl_prompt=itemgetter('story')|nl_prompt_generation_chain)
step_2 = RunnablePassthrough.assign(image_base64=gpt_image_worker)
step_3 = base64_to_file
image_chain = step_1 | step_2 | step_3

In [None]:
# step_1 = RunnablePassthrough.assign(nl_prompt=itemgetter('image_desc')|nl_prompt_generation_chain)

# # step 2: image generation process, as a base64
# step_2 = RunnablePassthrough.assign(image_base64=dalle3_worker)

# # step 3: save the image
# step_3 = RunnableLambda(base64_to_file)

# # chain step 1, step 2, step 3 together
# dalle3_chain =  step_1|step_2|step_3

- Generate the story

In [None]:
system_template = ("You are a helpful AI assistant who likes children. You are a great storyteller and know how to create content for kindergarten kids. "
                   "Create one short chapter at a time.")

system_prompt = PromptTemplate(template=system_template)

# System prompt
system_message = SystemMessagePromptTemplate(prompt=system_prompt)

human_prompt = PromptTemplate(template="{input}",
                              input_variables=['input'])

# Create a human message prompt template based on the prompt
human_message = HumanMessagePromptTemplate(prompt=human_prompt)

chat_prompt = ChatPromptTemplate.from_messages([system_message, human_message])

story_chain = chat_prompt | model | StrOutputParser()     

In [None]:
story = story_chain.invoke({"input": "Create a chapter of a baby owl capturing a rodent in the night as his dinner"})

In [None]:
image_chain.invoke({"story":story,
                    "filename": "tutorial/LLM+Langchain/Week-8/story_2_image.png"})

In [None]:
import json

from src.agent.react_zero_shot import prompt_template as zero_shot_prompt_template
# from langchain_core.prompts import MessagesPlaceholder

client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])

class TTSInput(BaseModel):
    text: str = Field(description=("The story"))
    filename: str = Field(description="the location at which the audio file will be saved")

    
class TTSTool(BaseTool):

    name: str = "Text to Speech (tts) tool"

    input_output_parser: PydanticOutputParser = PydanticOutputParser(pydantic_object=TTSInput)
    
    input_format_instructions: str = input_output_parser.get_format_instructions()

    description_template: str = ("Use this tool to generate an audio file of the story.\n"
                                 "input format: {input_format_instructions}.")

    description: str = description_template.format(input_format_instructions=input_format_instructions)

    
    def _run(self, text: str):

        input_ = self.input_output_parser.parse(text)

        text = input_.text
        filename = input_.filename
        response = self.tts(text)
        
        response.stream_to_file(filename)
        
        return filename

    async def _arun(self, query: str):
        raise NotImplementedError("This tool does not support async")
        
        
    def tts(self, text: str):
        
        response = client.audio.speech.create(
          model="tts-1",
          voice="nova",
          input=text
        )

        return response
           
            
prompt = PromptTemplate.from_template(zero_shot_prompt_template)

tools = [TTSTool(), 
         ImageTool(),
         Tool(name="StoryTeller",
              func=story_chain.invoke,
              description="useful for creating a story",
        )]

zero_shot_agent = create_react_agent(
    llm=model,
    tools=tools,
    prompt=prompt,
)

agent_executor = AgentExecutor(agent=zero_shot_agent, tools=tools, verbose=True, handle_parsing_errors=True)

In [None]:
prompt = ("Create a chapter of a baby owl capturing a rodent in the night as his dinner.\n"
          "After having the final answer, please create a corresponding image and record the story as an mp3. "
          "The saved image (.png) and mp3 (.mp3) should have the same name in the folder `tutorial/LLM+Langchain/Week-8`.")

agent_executor.invoke({"input": prompt})

In [None]:
prompt = """
         Assume that Harry Potter is in the world of Warhammer 40k. He still has his magical power.
         He lives in the lower part of a Hive city. It is about time for the Tithe, and the Black Ship is coming.
         Describe the fate of Harry Potter. Please keep the darkness of the Warhammer 40k worldview.
         The saved image and mp3 should have the same name in the folder `tutorial/LLM+Langchain/Week-8`
        """

agent_executor.invoke({"input": prompt})

## Can we create a story with multiple pages?

I do not know the answer, let me try...

We use 4 pages to keep the cost down, but the approach can be extended.

In [None]:
prompt = """
         I want to create a 4-page story for a child. He likes snowy owls.
         For each page, please create a corresponding image and record the story as an mp3.
         The saved image and mp3 should have the same name, following the structure of 
         <Page - idx>, with idx as a number starting from 1, in the folder `tutorial/LLM+Langchain/Week-8`
         """

agent_executor.invoke({"input": prompt})
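
The naming rule in the prompt can also be written down as code, which is handy for checking afterwards that every page produced both files. This is a sketch under one assumption: `<Page - idx>` is read literally as `Page - 1`, `Page - 2`, and so on. `page_paths` is a hypothetical helper, not something the agent calls.

```python
from pathlib import Path


def page_paths(idx: int, folder: str = "tutorial/LLM+Langchain/Week-8"):
    """Return the (image, audio) paths that share one name for page `idx`."""
    stem = f"Page - {idx}"
    base = Path(folder)
    return base / f"{stem}.png", base / f"{stem}.mp3"


for page in range(1, 5):
    png_path, mp3_path = page_paths(page)
    print(png_path.name, "|", mp3_path.name)
```

After the agent finishes, `png_path.exists()` and `mp3_path.exists()` could be used to spot pages where one of the two tools was skipped.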

## Can we create a story interactively, in a chat-based way?

-- Rolling back...

In [None]:
from langchain.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough

from langchain.output_parsers import StructuredOutputParser, ResponseSchema

output_response_schemas = [
        ResponseSchema(name="story", description="the story content in the page"),
        ResponseSchema(name="page index", description="The page number of the story"),
    ]

output_parser = StructuredOutputParser.from_response_schemas(output_response_schemas)

output_format_instructions = output_parser.get_format_instructions()


template = """
           Create a story page {idx}, based on the description: {text}

           The answer continues from previous content:
           {context}

           After having the final answer, please create a corresponding image and record the story as an mp3. 
           The saved image and mp3 should have the same name, following the structure of 
           <Page - idx>, in the folder `tutorial/LLM+Langchain/Week-8`

           The output should have the following format: {output_format_instruction}
           """

prompt_template = PromptTemplate(template=template,
                                 input_variables=["text", "context", "idx"],
                                 partial_variables={"output_format_instruction": output_format_instructions})

agent_chain = RunnablePassthrough.assign(input=prompt_template)|agent_executor

In [None]:
Q = agent_chain.invoke({"text": "A little cat just woke up in the morning",
                        "context": "The beginning of the story:\n",
                        "idx": str(1)})

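If the following steps fail, try regenerating. This is a large language model, so there is no guarantee that it will produce the format you want 100% of the time. We can only raise the probability of a successful output as much as possible.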
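
Regenerating on failure can be automated with a small retry loop. The sketch below is generic: `invoke_with_retry` is a hypothetical helper that takes any `invoke` callable returning a dict with an `output` key and any `parse` callable that raises on malformed text. The demo uses `json.loads` and a fake invoke function so that it runs without an LLM.

```python
import json


def invoke_with_retry(invoke, parse, payload, max_attempts=3):
    """Call `invoke` until `parse` accepts its output, up to `max_attempts` times."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        result = invoke(payload)
        try:
            return parse(result["output"])
        except Exception as err:  # parsers raise different exception types
            last_error = err
            print(f"attempt {attempt} failed to parse: {err}")
    raise RuntimeError(f"no parsable output after {max_attempts} attempts") from last_error


# Stand-ins so the sketch runs offline: the first reply is malformed.
replies = iter(["not json", '{"story": "Whisker woke up.", "page index": "1"}'])


def fake_invoke(payload):
    return {"output": next(replies)}


parsed = invoke_with_retry(fake_invoke, json.loads, {"text": "demo"})
print(parsed["story"])
```

In the notebook, the same helper could be called with `agent_chain.invoke` and `output_parser.parse`. Keep in mind that every retry runs the whole agent again, including image and audio generation, so a low `max_attempts` keeps the cost bounded.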

In [None]:
Q['output']

In [None]:
output_parser.parse(Q['output'])

In [None]:
output_parser.parse(Q['output'])['story']

In [None]:
output_parser.parse(Q['output'])['page index']

### Page 2

In [None]:
context_list = [output_parser.parse(Q['output'])['story']]
print(context_list)

In [None]:
Q_2 = agent_chain.invoke({"text": "Whisker found a dove and wanted to hunt it down!",
                          "context": "\n".join(context_list),
                          "idx": str(2)})

In [None]:
context_list

In [None]:
output_parser.parse(Q_2['output'])['story']

### Okay, it looks fine. Let us see how to make it interactive

In [None]:
# Story so far
context_list = []

# Starting page index
idx = 1

while True:
    if len(context_list) == 0:
        context = "The beginning of the story:\n"
    else:
        context = "\n".join(context_list)

    text = input("Enter the story content (type `QUIT` to finish): ")

    if text == "QUIT":
        break
    
    Q = agent_chain.invoke({"text": text,
                            "context": context,
                            "idx": str(idx)})

    story = output_parser.parse(Q['output'])['story']
    
    # Move on to the next page
    idx += 1

    context_list.append(story)