# DALLE-3

In [1]:
import os

os.chdir("../../")

In [2]:
from langchain.chat_models import ChatOpenAI

from src.initialization import credential_init


credential_init()

model = ChatOpenAI(openai_api_key=os.environ['OPENAI_API_KEY'],
                   model_name="gpt-4o-2024-05-13", temperature=0)

  warn_deprecated(


### OpenAI Image API Parameters:

- model: dall-e-3
- size (str): 1024x1024, 1024x1792, 1792x1024
- quality: hd, standard
- style: vivid, natural. Default vivid

In [3]:
from openai import OpenAI

prompt = """
A Sumi-e style watercolor painting of mountains during sunset. The sky is depicted with bold splashes of orange, pink, and purple hues, 
blending and overlapping in a dynamic composition. The mountains are represented with expressive brushstrokes, emphasizing their majestic and serene presence. 
The focus is on capturing the essence and mood of the scene rather than detailed realism. The overall effect is serene and contemplative, with a harmonious 
balance of color and form.
"""

client = OpenAI()

response = client.images.generate(
    model="dall-e-3",
    prompt=prompt,
    size="1024x1024",
    quality="hd",
    n=1,
    response_format = 'b64_json'
)

image_base64 = response.data[0].b64_json

## Save the image in your local computer

In [4]:
import base64

with open("tutorial/Week-8/test.png", "wb") as fh:
    fh.write(base64.b64decode(image_base64))

## Two Challenges:

### 1. How to create prompt more efficiently? 

There are two types of prompt: 

1. Danbooru Tag: masterpiece, best quality, beautiful eyes, clear eyes, detailed eyes, Blue-eyes, 1girl, 20_old, full-body, break, smoking, break, high_color, blue-hair, beauty, black-boots,break, break, Flat vector art, Colorful art, white_shirt, simple_background, blue_background, Ink art, peeking out upper body, Eyes

2. Narutal language: A Sumi-e style watercolor painting of mountains during sunset. The sky is depicted with bold splashes of orange, pink, and purple hues, blending and overlapping in a dynamic composition. The mountains are represented with expressive brushstrokes,emphasizing their majestic and serene presence. The focus is on capturing the essence and mood of the scene rather than detailed realism. The overall effect is serene and contemplative, with a harmonious balance of color and form.

As non-native English speakers, we find the natural language prompt challenging, even for native speakers, due to the inclusion of specialized terminologies and advanced vocabulary.

由於涉及專業術語和高級詞彙，我們作為非母語英語使用者，發現這個自然語言提示對我們來說是具有挑戰性的，即使對母語使用者來說也是如此。

### 2. How to make it an LCEL?

## Some websites for natural language prompt

- https://leonardo.ai/: An Image generation SaaS. A lot of works are created with natural language prompt. 
- https://blog.mlq.ai/dalle-prompts/: Some tutorial about how to come up with a natural language prompt.

### Natural Language Prompt Generation

In [5]:
from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate, SystemMessagePromptTemplate
from langchain_core.output_parsers import StrOutputParser


system_prompt = PromptTemplate.from_template('''You are a helpful AI assistant and an art expert with extensive knowledge of photography and illustration. 
You excel at creating breathtaking masterpieces with the DALLE-3 model. For this task, you will be provided with a description of an image, and you will 
generate a corresponding DALLE-3 prompt. The prompt should be detailed and descriptive, capturing the essence of the image. The length of the prompt should be 
around 100-500 tokens.''')

# System prompt
system_message = SystemMessagePromptTemplate(prompt=system_prompt)

human_prompt = PromptTemplate(template="{image_desc}",
                              input_variables=['image_desc'])

# Create a human message prompt template based on the prompt
human_message = HumanMessagePromptTemplate(prompt=human_prompt)

chat_prompt = ChatPromptTemplate.from_messages([system_message, human_message])

nl_prompt_generation_chain = chat_prompt | model | StrOutputParser()

## We wrap OPENAI API call as a function for the langchain usage 

In [6]:
from typing import Dict


def dalle3_worker(kwargs: Dict):

    """
    Generates an image using OpenAI's DALL-E 3 model based on the provided prompt and optional parameters.
    
    Parameters:
    kwargs (Dict): A dictionary containing the following keys:
        - 'nl_prompt' (str): The natural language prompt describing the image to be generated.
        - 'size' (str, optional): The size of the generated image. Default is "1024x1024".
        - 'quality' (str, optional): The quality of the generated image. Default is "standard".
        - 'style' (str, optional): The style of the generated image. Default is "vivid".
    
    Returns:
    str: The URL of the generated image.
    
    Example:
    >>> kwargs = {
    ...     "nl_prompt": "A futuristic city skyline at sunset",
    ...     "size": "1024x1024",
    ...     "quality": "high",
    ...     "style": "photorealistic"
    ... }
    >>> image_base64 = dalle3_worker(kwargs)
    Start generating image...
    prompt: A futuristic city skyline at sunset
    generated_image: https://example.com/generated_image.png
    >>> print(image_base64)
    https://example.com/generated_image.png
    """
    
    print("Start generating image...")
    print(f"prompt: {kwargs['nl_prompt']}")
    client = OpenAI()

    response = client.images.generate(
        model="dall-e-3",
        prompt=kwargs['nl_prompt'],
        size=kwargs.get("size", "1024x1024"),
        quality=kwargs.get('quality', 'standard'),
        style=kwargs.get('style', 'vivid'),
        n=1,
        response_format = 'b64_json')

    image_base64 = response.data[0].b64_json

    print("Image is generated succesfully.")
    
    return image_base64


def base64_to_file(kwargs):

    """
    Save the image from a base64 string
    """
    
    image_base64 = kwargs['image_base64']
    filename = kwargs['filename']
    
    with open(f"{filename}", "wb") as fh:
        fh.write(base64.b64decode(image_base64))
    

In [8]:
from operator import itemgetter

from langchain_core.runnables import RunnableLambda, RunnableParallel, RunnablePassthrough

# step 1: create the image prompt
step_1 = RunnablePassthrough.assign(nl_prompt=itemgetter('image_desc')|nl_prompt_generation_chain)

# step 2: image generation process, as a base64
step_2 = RunnablePassthrough.assign(image_base64=dalle3_worker)

# step 3: save the image
step_3 = RunnableLambda(base64_to_file)

# chain step 1, step 2, step 3 together
dalle3_chain =  step_1|step_2|step_3

In [9]:
dalle3_chain.invoke({"size": "1024x1792",
                     "quality": "hd",
                     "image_desc": """
                                     masterpiece, best quality, beautiful eyes, clear eyes, detailed eyes, Blue-eyes, 1girl, 20_old, full-body, 
                                     break, smoking, break, high_color, blue-hair, beauty, black-boots,break, break, Flat vector art, Colorful art, white_shirt, 
                                     simple_background, blue_background, Ink art,peeking out upper body,Eyes, portrait
                                     """,
                     
                     "filename": "tutorial/Week-8/test_01.png"
                    })

Start generating image...
prompt: Create a stunning flat vector art masterpiece featuring a 20-year-old woman with striking blue eyes and vibrant blue hair. The artwork should be highly detailed, especially focusing on her beautiful, clear eyes. She is depicted in a full-body pose, taking a break and smoking, exuding a sense of calm and relaxation. She is dressed in a white shirt and black boots, standing against a simple, solid blue background that enhances the colorful and high-contrast nature of the illustration. The style should blend elements of ink art with colorful, high-quality vector art, capturing her beauty and the serene moment. The composition should also include a portrait-like focus on her upper body, with her eyes peeking out, drawing the viewer's attention.
Image is generated succesfully.


In [10]:
dalle3_chain.invoke({"size": "1024x1792",
                     "quality": "hd",
                     "image_desc": """
                                 close-up portrait, black fox ears, animal ear fluff, black fox tail, black hair, red inner hair, hair ornament, 
                                 magatama necklace, fur trim, black short kimono, exquisite design, cat_collar, off-shoulder,wide sleeves, 
                                 long sleeves, obi, miniskirt, perfect model body, a 17-years-old ethereal and glamorously beautiful girl, from above, 
                                 eating donut, holding a donut, a large cup of coffee on table, in a coffee shop, pencil sketch, perfect detail, intricate detail, 
                                 masterpiece, best quality, beauty & aesthetic, sketch
                                 """,
                     "filename": "tutorial/Week-8/test_02.png"
                    })


Start generating image...
prompt: A breathtaking pencil sketch of an ethereal and glamorously beautiful 17-year-old girl with a perfect model body, captured from above in a coffee shop. She has striking black fox ears with fluffy fur, a black fox tail, and long black hair with red inner highlights. Her hair is adorned with a delicate ornament, and she wears a magatama necklace. Her attire is a meticulously designed black short kimono with fur trim, featuring wide, long sleeves and an off-shoulder style. The kimono is paired with an obi and a miniskirt, showcasing an exquisite design. She also wears a cat collar, adding a touch of charm. The girl is holding a donut and eating it, with a large cup of coffee placed on the table in front of her. The sketch is rendered with perfect and intricate detail, capturing the beauty and aesthetic of the scene, making it a true masterpiece.
Image is generated succesfully.


### There is censorship in OpenAI...so I do not like it that much.

Every model has its strength
In my opinion:

- SDXL: style
- PONY: pose control, view angle control
- FLUX: realistic

In [11]:
dalle3_chain.invoke({"size": "1024x1024",
                     "quality": "hd",
                     "image_desc": """
                                 close-up portrait, black fox ears, animal ear fluff, black fox tail, black hair, red inner hair, hair ornament, 
                                 magatama necklace, fur trim, black short kimono, exquisite design, cat_collar, off-shoulder,wide sleeves, 
                                 long sleeves, obi, miniskirt, perfect model body, a 17-years-old ethereal and glamorously beautiful girl, from above, 
                                 eating donut, holding a donut, a large cup of coffee on table, in a coffee shop, pencil sketch, perfect detail, intricate detail, 
                                 masterpiece, best quality, beauty & aesthetic, sketch
                                 """,
                     "filename": "tutorial/Week-8/test_03.png"
                    })

Start generating image...
prompt: A close-up portrait of a 17-year-old ethereal and glamorously beautiful girl with perfect model features, depicted in a pencil sketch style with perfect and intricate detail. She has black fox ears with fluffy fur, a black fox tail, and long black hair with striking red inner highlights. Her hair is adorned with a delicate ornament, and she wears a magatama necklace. Her outfit is a black short kimono with an exquisite design, featuring fur trim, wide and long sleeves, and an off-shoulder style. The kimono is paired with an obi and a miniskirt, and she also wears a cat collar. The scene is set in a coffee shop, viewed from above, where she is holding a donut and eating it, with a large cup of coffee on the table. The sketch captures the beauty and aesthetic of the moment, emphasizing the girl's elegance and the intricate details of her attire and surroundings.
Image is generated succesfully.


## Use this as a tool for Agent

In [12]:
from langchain.agents import Tool, AgentExecutor, create_react_agent
from langchain.prompts import PromptTemplate
from langchain.tools import BaseTool
from langchain.output_parsers import StructuredOutputParser, ResponseSchema

from src.agent.react_zero_shot import prompt_template as zero_shot_prompt_template

# We need both the query and filename (minimal requirement):
# Some control variables
# What we learned last week?


class ImageTool(BaseTool):

    name = "Image generator with DALLE-3"

    input_response_schemas = [
        ResponseSchema(name="image_desc", description="image description / prompt"),
        ResponseSchema(name="filename", description="the location at which the image will be saved"),
        ResponseSchema(name="size", description="image size, can be `1024x1024`, `1024x1792`, `1792x1024`"),
        ResponseSchema(name="quality", description="image quality, can be `hd` or `standard`"),
        ResponseSchema(name="style", description="image style, can be `vivid` or `natural`")]
    
    input_output_parser = StructuredOutputParser.from_response_schemas(input_response_schemas)
    
    input_format_instructions = input_output_parser.get_format_instructions()

    description_template = """
                           Use this tool when you need to create an image:
                           input format_instructions: {input_format_instructions}
                           """

    description = description_template.format(input_format_instructions=input_format_instructions)
    
    def _run(self, query):

        input_ = self.input_output_parser.parse(query)
        
        image_desc = input_['image_desc']
        size = input_['size']
        quality = input_['quality']
        style = input_['style']
        filename = input_['filename']
        
        dalle3_chain.invoke({"image_desc": image_desc,
                             "size": size,
                             "quality": quality,
                             "style": style,
                             "filename": filename})
        
        return "Done"

    def _arun(self, radius: int):
        raise NotImplementedError("This tool does not support async")

# Zero Shot 標準模板
prompt = PromptTemplate.from_template(zero_shot_prompt_template)

# 建立工具庫 
tools = [ImageTool()]

# 創造Agent 
zero_shot_agent = create_react_agent(
    llm=model,
    tools=tools,
    prompt=prompt,
)

# 創造Agent Executor
agent_executor = AgentExecutor(agent=zero_shot_agent, tools=tools, verbose=True)

In [13]:
image_prompt = """
brown hair, bangs, two side up, twin ponytails, sidelocks, black hat, jewelry, black cheongsam, intricate golden embroidery, long sleeves,
wide sleeves, black shorts, hat ornament, hat flower, a 17-years-old ethereal and glamorous beautiful japanese idol,
translucent skin tone, profound facial features, bright eyes, faint rosy blush, ultra realistic, raw photo, award-winning photo, masterpiece, 
best quality, high resolution, official art, 8k uhd, high fidelity, depth of field, on the top of a skyscaper, mesmetizing city view, 
night
"""

filename = "tutorial/Week-8/test_04.png"

In [14]:
agent_executor.invoke({"input": f"Generate in image with the following information: \n {image_prompt}. and save the image at {filename}"})



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3mTo generate the requested image, I will use the Image generator with DALLE-3 tool. The image will feature a 17-year-old ethereal and glamorous beautiful Japanese idol with specific physical attributes and attire, set against a mesmerizing city view at night from the top of a skyscraper. The image will be saved at the specified location.

Action: Image generator with DALLE-3

Action Input:
```json
{
	"image_desc": "A 17-year-old ethereal and glamorous beautiful Japanese idol with brown hair, bangs, two side up, twin ponytails, sidelocks, wearing a black hat with a hat ornament and hat flower, jewelry, and a black cheongsam with intricate golden embroidery, long wide sleeves, and black shorts. She has translucent skin tone, profound facial features, bright eyes, and a faint rosy blush. The setting is on the top of a skyscraper with a mesmerizing city view at night. The image is ultra realistic, raw photo, award-winning photo, m

{'input': 'Generate in image with the following information: \n \nbrown hair, bangs, two side up, twin ponytails, sidelocks, black hat, jewelry, black cheongsam, intricate golden embroidery, long sleeves,\nwide sleeves, black shorts, hat ornament, hat flower, a 17-years-old ethereal and glamorous beautiful japanese idol,\ntranslucent skin tone, profound facial features, bright eyes, faint rosy blush, ultra realistic, raw photo, award-winning photo, masterpiece, \nbest quality, high resolution, official art, 8k uhd, high fidelity, depth of field, on the top of a skyscaper, mesmetizing city view, \nnight\n. and save the image at tutorial/Week-8/test_04.png',
 'output': 'The image has been generated and saved at the location `tutorial/Week-8/test_04.png`.'}

## ACG Characters



In [28]:
from langchain.utilities.tavily_search import TavilySearchAPIWrapper
from langchain.tools.tavily_search import TavilySearchResults

search = TavilySearchAPIWrapper()
tavily_tool = TavilySearchResults(api_wrapper=search)

outputs = tavily_tool.invoke("What is the appearance of Hu Tao from Genshin")

In [29]:
for output in outputs:
    print("\n**************************\n")
    print(output['content'])
    print("\n")
    


**************************

Hu Tao[Note 2] (Chinese: 胡桃 Hú Táo) is a playable Pyro character in Genshin Impact. Hu Tao's antics and eccentricity belies her role as the 77th Director of the Wangsheng Funeral Parlor and her talent as a poet. Nevertheless, she treats the parlor's operations with utmost importance, and holds funeral ceremonies with the highest dignity and solemnity. Toggle Ascension MaterialsTotal Cost



**************************

Hu Tao (Chinese: 胡桃; pinyin: Hú Táo; lit. 'Walnut') is a playable character in the action role-playing game Genshin Impact.She is voiced by Brianna Knickerbocker in English, Tao Dian [] in Chinese, Rie Takahashi in Japanese, and Kim Ha-ru [] in Korean. In the game, she serves as the 77th Director of the Wangsheng Funeral Parlor within the China-like nation of Liyue.



**************************

Appearance. Hu Tao's most immediately recognizable features are her porkpie hat adorned with a red plum branch with a large wooden talisman at its ce

# **** 預計第一個小時結束 ****

## LCEL ACG character appearance chain

In [17]:
from langchain.docstore.document import Document
from langchain.agents import Tool, AgentExecutor, create_react_agent
from langchain.tools import BaseTool
from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate, SystemMessagePromptTemplate
from langchain_core.output_parsers import StrOutputParser


def tavily_content_parser(outputs):

    """
    Parses the content from a list of outputs into Document objects.
    
    Parameters:
    outputs (list): A list of dictionaries where each dictionary contains a 'content' key 
                    representing the content of a document.
    
    Returns:
    list: A list of Document objects, each initialized with the content from the 'outputs'.
    
    Example:
    >>> outputs = [
    ...     {'content': 'This is document 1 content.'},
    ...     {'content': 'This is document 2 content.'}
    ... ]
    >>> documents = tavily_content_parser(outputs)
    >>> for doc in documents:
    ...     print(doc.page_content)
    This is document 1 content.
    This is document 2 content.
    """
    
    documents = [Document(page_content=output['content']) for output in outputs]
    
    return documents


system_prompt = PromptTemplate.from_template('''You are a helpful AI assistant with deep knowledge of anime, manga, 
and mobile games. You will generate the face, body, attire, hairstyle, and accessories of an character in great 
detail with data provided from the `context` The output should look like:

 - Face:
 - Body:
 - Attire:
 - Hairstyle:
 - Accessories:

''')

system_message = SystemMessagePromptTemplate(prompt=system_prompt)


human_prompt = PromptTemplate(template="context: {context}",
                                  input_variables=['context'])

# Create a human message prompt template based on the prompt
human_message = HumanMessagePromptTemplate(prompt=human_prompt)

# Create a chat prompt template from system and human message prompt templates
chat_prompt = ChatPromptTemplate.from_messages([system_message, human_message])

# Construct the processing chain
step_1 = {'context': tavily_tool|tavily_content_parser}
step_2 = chat_prompt
step_3 = model
step_4 = StrOutputParser()
acg_chain = step_1 | step_2 | step_3 | step_4



In [30]:
output = acg_chain.invoke("What is the appearance of Hu Tao from Genshin")

In [31]:
print(output)

Based on the provided context, here is a detailed description of Hu Tao from Genshin Impact:

- **Face**:
  - Hu Tao has bright scarlet eyes that are one of her most striking features. Her eyes often have a playful and mischievous glint, reflecting her eccentric personality.
  - Her face is youthful and expressive, often showing a range of emotions from playful mischief to solemn dignity.

- **Body**:
  - Hu Tao has a slender and agile build, fitting her role as a nimble and quick Pyro character in the game.
  - Her movements are graceful and fluid, indicative of her agility and dexterity.

- **Attire**:
  - Hu Tao wears a traditional Chinese-inspired outfit that is both elegant and functional. Her attire is primarily dark in color, with intricate red and gold patterns that symbolize her connection to the Pyro element.
  - She dons a long, dark coat with red accents and a high collar, which adds to her mysterious and authoritative presence as the Director of the Wangsheng Funeral Parlo

In [32]:
class ACGLLMTool(BaseTool):

    name = "`Anime character design generator`"
    description = "Use this tool to generate and explore detailed designs for anime and ACG (Animation, Comics, and Games) characters."

    def _run(self, query: str):
        
        description = acg_chain.invoke(query)
        
        return description

    def _arun(self, radius: int):
        raise NotImplementedError("This tool does not support async")
        
        
class ImageTool(BaseTool):

    name = "ACG characters image generator with DALLE-3"

    input_response_schemas = [
        ResponseSchema(name="image_desc", description="image description / prompt"),
        ResponseSchema(name="filename", description="the location at which the image will be saved"),
        ResponseSchema(name="size", description="image size, can be `1024x1024`, `1024x1792`, `1792x1024`"),
        ResponseSchema(name="quality", description="image quality, can be `hd` or `standard`"),
        ResponseSchema(name="style", description="image style, can be `vivid` or `natural`")]
    
    input_output_parser = StructuredOutputParser.from_response_schemas(input_response_schemas)
    
    input_format_instructions = input_output_parser.get_format_instructions()

    description_template = """
                           This is a tool for creating images. 
                           It's best used when you're considering the need for an ACG (anime, comics, games) character design. 
                           Before using this tool, you may want to utilize the `Anime character design generator` to gather 
                           relevant information. The generated image will maintain the specified art style. 
                           input format: {input_format_instructions}
                           """

    description = description_template.format(input_format_instructions=input_format_instructions)
    
    def _run(self, query):
        
        input_ = self.input_output_parser.parse(query)
        
        image_desc = input_['image_desc']
        size = input_['size']
        quality = input_['quality']
        style = input_['style']
        filename = input_['filename']
        
        dalle3_chain.invoke({"image_desc": image_desc,
                             "size": size,
                             "quality": quality,
                             "style": style,
                             "filename": filename})
        
        return "Done"

    def _arun(self, radius: int):
        raise NotImplementedError("This tool does not support async")

        
prompt_template = """
Answer the following questions as best you can. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer

Thought: you should always think about what to do

Action: the action to take, should be one of [{tool_names}]

Action Input: the input to the action

Observation: the result of the action

... (this Thought/Action/Action Input/Observation can repeat N times)

Thought: I now know the final answer

Final Answer: the final answer to the original input question

Begin!

Question: {input}

Thought:{agent_scratchpad}
"""        
             
prompt = PromptTemplate.from_template(prompt_template)

tools = [ImageTool(), ACGLLMTool()]

zero_shot_agent = create_react_agent(
    llm=model,
    tools=tools,
    prompt=prompt,
)

agent_executor = AgentExecutor(agent=zero_shot_agent, tools=tools, verbose=True)

In [33]:
agent_executor.invoke({"input": f"Generate an image of Hu Tao from Genshim in pastol art style"})



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3mTo generate an image of Hu Tao from Genshin Impact in a pastel art style, I need to first gather detailed design information about the character using the `Anime character design generator`. This will help ensure that the image generated by the `ACG characters image generator with DALLE-3` is accurate and meets the desired specifications.

Action: `Anime character design generator`

Action Input: Generate detailed design information for Hu Tao from Genshin Impact
[0m[33;1m[1;3mBased on the provided context, here is a detailed description of Hu Tao from Genshin Impact:

- **Face**:
  - Hu Tao has a youthful and cheerful face, reflecting her playful and mischievous nature. Her eyes are a striking crimson red, which is a common trait among Pyro characters in Genshin Impact. She often has a playful and slightly mischievous expression, with a hint of a smile that suggests she is always up to something.

- **Body**:
  - Hu Tao h

{'input': 'Generate an image of Hu Tao from Genshim in pastol art style',
 'output': 'The image of Hu Tao from Genshin Impact in a pastel art style has been generated and saved as "hu_tao_pastel_art.png" with the specified details.'}

In [34]:
agent_executor.invoke({"input": "Generate an award-winning portrait photo of a 17-years-old japanese girl cosplaying Hu Tao from Genshim" })



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3mTo generate an award-winning portrait photo of a 17-year-old Japanese girl cosplaying Hu Tao from Genshin Impact, I need to first gather detailed design information about the character using the `Anime character design generator`. This will help ensure the cosplay is accurate and detailed.

Action: Anime character design generator

Action Input: Generate a detailed design for Hu Tao from Genshin Impact
[0mAnime character design generator is not a valid tool, try one of [ACG characters image generator with DALLE-3, `Anime character design generator`].[32;1m[1;3mIt seems I made an error in my previous thought. I should directly use the `ACG characters image generator with DALLE-3` to create the image based on the description of Hu Tao from Genshin Impact.

Action: ACG characters image generator with DALLE-3

Action Input: 

```json
{
	"image_desc": "A 17-year-old Japanese girl cosplaying as Hu Tao from Genshin Impact. She is

{'input': 'Generate an award-winning portrait photo of a 17-years-old japanese girl cosplaying Hu Tao from Genshim',
 'output': 'The award-winning portrait photo of a 17-year-old Japanese girl cosplaying Hu Tao from Genshin Impact has been generated and saved as "hu_tao_cosplay_portrait.png". The image features her in Hu Tao\'s traditional Chinese outfit with a dark brown coat, red shorts, and a black hat with a red ribbon, set against a vivid, colorful background.'}

## Audible 有聲書

- 文轉語音: TTS tool
- 文轉圖: Image tool

### Children Book Image Generator

- Generate image according to the story

In [36]:
system_prompt = PromptTemplate.from_template('''You are a helpful AI assistant and an art expert with extensive knowledge of illustration. 
You excel at creating Pencil and Ink Style illustrations for 6-year-old children using the DALLE-3 model. This style is characterized by 
detailed line work, often in black and white or with minimal color, and has a classic, timeless feel. For this task, you will be provided with 
a paragraph of a story, and you will generate a corresponding DALLE-3 prompt which captures the storyline. The prompt should be 
detailed and descriptive, capturing the essence of the image. The length of the prompt should be around 100-500 tokens.''')

# System prompt
system_message = SystemMessagePromptTemplate(prompt=system_prompt)

human_prompt = PromptTemplate(template="{story}",
                              input_variables=['story'])

# Create a human message prompt template based on the prompt
human_message = HumanMessagePromptTemplate(prompt=human_prompt)

chat_prompt = ChatPromptTemplate.from_messages([system_message, human_message])

nl_prompt_generation_chain = chat_prompt | model | StrOutputParser()     

step_1 = RunnablePassthrough.assign(nl_prompt=itemgetter('story')|nl_prompt_generation_chain)
step_2 = RunnableLambda(dalle3_worker)
image_chain = step_1 | step_2

- Generate the story

In [37]:
system_prompt = PromptTemplate.from_template('''You are a helpful AI assistant who likes children. 
You are great storyteller and know how to create content for kindergarten kids. A short chapter is 
created once at a time.''')

# System prompt
system_message = SystemMessagePromptTemplate(prompt=system_prompt)

human_prompt = PromptTemplate(template="{input}",
                              input_variables=['input'])

# Create a human message prompt template based on the prompt
human_message = HumanMessagePromptTemplate(prompt=human_prompt)

chat_prompt = ChatPromptTemplate.from_messages([system_message, human_message])

story_chain = chat_prompt | model | StrOutputParser()     

In [55]:
import json

from src.agent.react_zero_shot import prompt_template as zero_shot_prompt_template
# from langchain_core.prompts import MessagesPlaceholder

client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])

    
class TTSTool(BaseTool):
    
    name = "Text to Sound"
    description = "Use this tool to generate an audio file"

    name = "Text to Sound (tts) tool"

    input_response_schemas = [
        ResponseSchema(name="text", description="The story"),
        ResponseSchema(name="filename", description="the location at which the audio file will be saved")]
    
    input_output_parser = StructuredOutputParser.from_response_schemas(input_response_schemas)
    
    input_format_instructions = input_output_parser.get_format_instructions()

    description_template = """
                           Use this tool to generate an audio file of the story. 
                           input format: {input_format_instructions}
                           """

    description = description_template.format(input_format_instructions=input_format_instructions)

    
    def _run(self, text: str):

        input_ = self.input_output_parser.parse(text)

        text = input_['text']
        filename = input_['filename']
        
        response = self.tts(text)
        
        response.stream_to_file(filename)
        
        return filename

    def _arun(self, radius: int):
        raise NotImplementedError("This tool does not support async")
        
        
    def tts(self, text: str):
        
        response = client.audio.speech.create(
          model="tts-1",
          voice="nova",
          input=text
        )

        return response
    
    
class ImageTool(BaseTool):

    name = "Image generator with DALLE-3"

    input_response_schemas = [
        ResponseSchema(name="image_desc", description="image description / prompt"),
        ResponseSchema(name="filename", description="the location at which the image will be saved"),
        ResponseSchema(name="size", description="image size, can be `1024x1024`, `1024x1792`, `1792x1024`"),
        ResponseSchema(name="quality", description="image quality, can be `hd` or `standard`"),
        ResponseSchema(name="style", description="image style, can be `vivid` or `natural`")]
    
    input_output_parser = StructuredOutputParser.from_response_schemas(input_response_schemas)
    
    input_format_instructions = input_output_parser.get_format_instructions()

    description_template = """
                           An image is generated and saved according to the input instruction.
                           input instruction:\n {input_format_instructions}
                           """

    description = description_template.format(input_format_instructions=input_format_instructions)
    
    def _run(self, query):
        
        input_ = self.input_output_parser.parse(query)
        
        image_desc = input_['image_desc']
        size = input_['size']
        quality = input_['quality']
        style = input_['style']
        filename = input_['filename']
        
        dalle3_chain.invoke({"image_desc": image_desc,
                             "size": size,
                             "quality": quality,
                             "style": style,
                             "filename": filename})
        
        return "Done"

    def _arun(self, radius: int):
        raise NotImplementedError("This tool does not support async")
           
            
prompt = PromptTemplate.from_template(zero_shot_prompt_template)

tools = [TTSTool(), 
         ImageTool(),
         Tool(name="StoryTeller",
              func=story_chain.invoke,
              description="useful for create story",
        )]

zero_shot_agent = create_react_agent(
    llm=model,
    tools=tools,
    prompt=prompt,
)

agent_executor = AgentExecutor(agent=zero_shot_agent, tools=tools, verbose=True, handle_parsing_errors=True)

In [56]:
prompt = """
        Create a chapter of a baby owl capturing a rodent in the night as his dinner. \
        After having the final answer, plrease create a corresponding image and record the story as an mp3. 
        The saved image and mp3 should have same name in the folder `tutorial/Week-8`
        """

agent_executor.invoke({"input": prompt})



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3mTo create a chapter of a baby owl capturing a rodent in the night as his dinner, I will first generate the story using the StoryTeller tool. After that, I will create an image using the Image generator with DALLE-3 and record the story as an mp3 using the Text to Sound (tts) tool. 

Let's start by generating the story.

Action: StoryTeller
Action Input: "Create a chapter of a baby owl capturing a rodent in the night as his dinner."
[0m[38;5;200m[1;3m**Chapter 1: Ollie the Brave Baby Owl**

Once upon a time, in a cozy nest high up in an old oak tree, lived a baby owl named Ollie. Ollie had big, round eyes that sparkled like the stars and soft, fluffy feathers that kept him warm at night. He loved to listen to the stories his mama told him about the adventures of brave owls.

One night, as the moon shone brightly in the sky, Ollie felt a rumble in his tummy. "Mama, I'm hungry," he hooted softly.

Mama Owl smiled and ruffled 

  response.stream_to_file(filename)


[36;1m[1;3mtutorial/Week-8/ollie_the_baby_owl.mp3[0m[32;1m[1;3mI have successfully generated the story, created the corresponding image, and recorded the story as an mp3 file. Both the image and the mp3 file have been saved in the folder `tutorial/Week-8` with the same name.

Final Answer: The chapter of the baby owl capturing a rodent in the night as his dinner has been created. The corresponding image and mp3 file have been saved in the folder `tutorial/Week-8` with the name `ollie_the_baby_owl`.

Here are the details:
- Image: `tutorial/Week-8/ollie_the_baby_owl.png`
- Audio: `tutorial/Week-8/ollie_the_baby_owl.mp3`[0m

[1m> Finished chain.[0m


{'input': '\n        Create a chapter of a baby owl capturing a rodent in the night as his dinner.         After having the final answer, plrease create a corresponding image and record the story as an mp3. \n        The saved image and mp3 should have same name in the folder `tutorial/Week-8`\n        ',
 'output': 'The chapter of the baby owl capturing a rodent in the night as his dinner has been created. The corresponding image and mp3 file have been saved in the folder `tutorial/Week-8` with the name `ollie_the_baby_owl`.\n\nHere are the details:\n- Image: `tutorial/Week-8/ollie_the_baby_owl.png`\n- Audio: `tutorial/Week-8/ollie_the_baby_owl.mp3`'}

## Can we create a story with multiple pages?

I do not know the answer, let me try...

4 pages to save the cost. But it can be extended.

In [44]:
prompt = """
         I want to create an 4 pages story for a child. He likes snow owl.
         For each page, please create a corresponding image and record the story as an mp3.
         After having the final answer, please create a corresponding image and record the story as an mp3. 
         The saved image and mp3 should have same name, following the structure of 
         <Page - idx>, with idx as a number starting from 1, in the folder `tutorial/Week-8`
        """

agent_executor.invoke({"input": prompt})



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3mTo create a 4-page story for a child who likes snow owls, I will first generate the story using the StoryTeller tool. Then, I will create corresponding images for each page using the Image generator with DALLE-3. Finally, I will record the story as an mp3 file using the Text to Sound (tts) tool. Each page will have a corresponding image and audio file saved in the specified folder structure.

Let's start by generating the story.

Action: StoryTeller
Action Input: "Create a 4-page story for a child who likes snow owls."
[0m[38;5;200m[1;3m**Page 1: The Snowy Forest**

Once upon a time, in a land far, far away, there was a magical forest covered in sparkling snow. The trees were tall and their branches were heavy with fluffy white snowflakes. In this enchanted forest lived a very special snow owl named Oliver. Oliver had the softest, whitest feathers and big, bright eyes that twinkled like stars.

**Page 2: Oliver's Friends**

  response.stream_to_file(filename)


[36;1m[1;3mtutorial/Week-8/Page-1.mp3[0m[32;1m[1;3mI have successfully generated the audio file for Page 1. Now, I will proceed to create the audio file for Page 2.

Action: Text to Sound (tts) tool
Action Input: 
```json
{
	"text": "Oliver wasn't alone in the snowy forest. He had many friends! There was Bella the bunny, who loved to hop around in the snow, and Max the moose, who was always ready for a fun adventure. Oliver and his friends loved to play together, especially when the moon was full and the snow glowed like silver.",
	"filename": "tutorial/Week-8/Page-2.mp3"
}
```
[0m

  response.stream_to_file(filename)


[36;1m[1;3mtutorial/Week-8/Page-2.mp3[0m[32;1m[1;3mI have successfully generated the audio file for Page 2. Now, I will proceed to create the audio file for Page 3.

Action: Text to Sound (tts) tool
Action Input: 
```json
{
	"text": "One chilly evening, as Oliver and his friends were playing, the wind started to howl. The sky turned dark, and snow began to fall faster and faster. It was a big snowstorm! Oliver knew they needed to find shelter quickly. 'Follow me, everyone!' he hooted. With his keen eyes, Oliver led his friends to a cozy cave where they could stay safe and warm.",
	"filename": "tutorial/Week-8/Page-3.mp3"
}
```
[0m

  response.stream_to_file(filename)


[36;1m[1;3mtutorial/Week-8/Page-3.mp3[0m[32;1m[1;3mI have successfully generated the audio file for Page 3. Now, I will proceed to create the audio file for Page 4.

Action: Text to Sound (tts) tool
Action Input: 
```json
{
	"text": "Inside the cave, Oliver and his friends snuggled close together. Bella the bunny found some soft moss to make a bed, and Max the moose used his big antlers to hang up a lantern. They told stories and sang songs until the storm passed. When the morning sun peeked through the cave entrance, the forest was even more beautiful than before. Oliver and his friends stepped outside, ready for a new day of snowy adventures. And so, Oliver the snow owl and his friends lived happily ever after, always looking out for each other in their magical, snowy forest. The end.",
	"filename": "tutorial/Week-8/Page-4.mp3"
}
```
[0m

  response.stream_to_file(filename)


[36;1m[1;3mtutorial/Week-8/Page-4.mp3[0m[32;1m[1;3mI have successfully generated the audio files for all four pages. Now, I have the final answer with all the required images and audio files saved in the specified folder structure.

Final Answer: The 4-page story for a child who likes snow owls has been created. Each page has a corresponding image and audio file saved in the folder `tutorial/Week-8` with the following structure:

- `tutorial/Week-8/Page-1.png`
- `tutorial/Week-8/Page-1.mp3`
- `tutorial/Week-8/Page-2.png`
- `tutorial/Week-8/Page-2.mp3`
- `tutorial/Week-8/Page-3.png`
- `tutorial/Week-8/Page-3.mp3`
- `tutorial/Week-8/Page-4.png`
- `tutorial/Week-8/Page-4.mp3`

Here is the story:

**Page 1: The Snowy Forest**

Once upon a time, in a land far, far away, there was a magical forest covered in sparkling snow. The trees were tall and their branches were heavy with fluffy white snowflakes. In this enchanted forest lived a very special snow owl named Oliver. Oliver had the s

{'input': '\n         I want to create an 4 pages story for a child. He likes snow owl.\n         For each page, please create a corresponding image and record the story as an mp3.\n         After having the final answer, please create a corresponding image and record the story as an mp3. \n         The saved image and mp3 should have same name, following the structure of \n         <Page - idx>, with idx as a number starting from 1, in the folder `tutorial/Week-8`\n        ',
 'output': 'The 4-page story for a child who likes snow owls has been created. Each page has a corresponding image and audio file saved in the folder `tutorial/Week-8` with the following structure:\n\n- `tutorial/Week-8/Page-1.png`\n- `tutorial/Week-8/Page-1.mp3`\n- `tutorial/Week-8/Page-2.png`\n- `tutorial/Week-8/Page-2.mp3`\n- `tutorial/Week-8/Page-3.png`\n- `tutorial/Week-8/Page-3.mp3`\n- `tutorial/Week-8/Page-4.png`\n- `tutorial/Week-8/Page-4.mp3`\n\nHere is the story:\n\n**Page 1: The Snowy Forest**\n\nOnce 

## Can we create a story in an interactive way: chat based

-- Rolling back...

In [57]:
from langchain.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough

from langchain.output_parsers import StructuredOutputParser, ResponseSchema

output_response_schemas = [
        ResponseSchema(name="story", description="the story content in the page"),
        ResponseSchema(name="page index", description="The page number of the story"),
    ]

output_parser = StructuredOutputParser.from_response_schemas(output_response_schemas)

output_format_instructions = output_parser.get_format_instructions()

template = """
           Create a story paragraph, based on the description: {text}

           Previous content:
           {context}

            
           After having the final answer, please create a corresponding image and record the story as an mp3. 
           The saved image and mp3 should have same name in the folder `tutorial/Week-8`

           output format instruction: {output_format_instruction}
           """

prompt_template = PromptTemplate(template=template,
                                 input_variables=["text", "context"],
                                 partial_variables={"output_format_instruction": output_format_instructions})

agent_chain = RunnablePassthrough.assign(input=prompt_template)|agent_executor

In [58]:
Q = agent_chain.invoke({"text": "A little cat just woke up in the morning",
                        "context": "the beginning of the story, page 1"})



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3mTo create the story paragraph, I will use the StoryTeller tool. After generating the story, I will create a corresponding image using the Image generator with DALLE-3 and record the story as an mp3 using the Text to Sound (tts) tool. The saved image and mp3 will have the same name in the folder `tutorial/Week-8`.

Let's start by generating the story paragraph.

Action: StoryTeller

Action Input: "A little cat just woke up in the morning"
[0m[38;5;200m[1;3m### Chapter 1: The Morning Adventure of Whiskers the Cat

Once upon a time, in a cozy little house at the edge of a friendly forest, there lived a tiny, fluffy cat named Whiskers. Whiskers had the softest fur, the brightest green eyes, and a curious little nose that twitched whenever he was excited.

One sunny morning, Whiskers woke up to the sound of birds chirping happily outside his window. He stretched his little paws, yawned a big kitty yawn, and blinked his sleepy e

  response.stream_to_file(filename)


[36;1m[1;3mtutorial/Week-8/Whiskers_Morning_Adventure.mp3[0m[32;1m[1;3mI now have the final answer. The story has been created, the corresponding image has been generated, and the story has been recorded as an mp3. The saved image and mp3 have the same name in the folder `tutorial/Week-8`.

Final Answer:
```json
{
	"story": "Once upon a time, in a cozy little house at the edge of a friendly forest, there lived a tiny, fluffy cat named Whiskers. Whiskers had the softest fur, the brightest green eyes, and a curious little nose that twitched whenever he was excited. One sunny morning, Whiskers woke up to the sound of birds chirping happily outside his window. He stretched his little paws, yawned a big kitty yawn, and blinked his sleepy eyes open. The sun was shining, and the day was full of possibilities! Whiskers hopped out of his comfy bed and padded over to the window. He saw his friends, the bluebirds, fluttering around the garden, and the butterflies dancing in the air. 'What a 

In [59]:
Q

{'text': 'A little cat just woke up in the morning',
 'context': 'the beginning of the story, page 1',
 'input': StringPromptValue(text='\n           Create a story paragraph, based on the description: A little cat just woke up in the morning\n\n           Previous content:\n           the beginning of the story, page 1\n\n            \n           After having the final answer, please create a corresponding image and record the story as an mp3. \n           The saved image and mp3 should have same name in the folder `tutorial/Week-8`\n\n           output format instruction: The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":\n\n```json\n{\n\t"story": string  // the story content in the page\n\t"page index": string  // The page number of the story\n}\n```\n           '),
 'output': '```json\n{\n\t"story": "Once upon a time, in a cozy little house at the edge of a friendly forest, there lived a tiny, fluf