why bother learning how to prompt an image generation model when we can fine-tune an LM to do that for us?

In [None]:
%pip install dspy-ai
%pip install ipywidgets
%pip install IPython
%pip install requests
%pip install markdownify
%pip install openai
%pip install python-dotenv
%pip install nest_asyncio

In [22]:
import dspy
import os
import dotenv
import pydantic
import json
from IPython.display import Markdown, display, Image
from openai import OpenAI

dotenv.load_dotenv() #load via .env in this folder (.env is in .gitignore)
#os.environ['OPENAI_API_KEY'] = 'sk-YOUR_OPENAI_API_KEY' #or set directly here, just remember not to commit to GitHub
assert 'OPENAI_API_KEY' in os.environ

client = OpenAI() #create the client only after the key is loaded, otherwise the constructor cannot find it

llm = dspy.OpenAI(model='gpt-4o', temperature=0.1, max_tokens=4096) #later need to add vision
dspy.settings.configure(lm=llm)

later: add message history and vision, and constrain the visual description.
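
One way to approach the last item, constraining the visual description, is to enforce the length cap in code rather than trusting the model to respect it. The sketch below is a hypothetical helper (`clamp_description` and its `limit` parameter are not part of DSPy or the OpenAI client). It assumes the 4000-character cap this notebook asks the model for, and trims at the last sentence boundary that still fits.

```python
def clamp_description(text: str, limit: int = 4000) -> str:
    """Return text unchanged if it fits, else cut it at the last sentence end within limit."""
    text = text.strip()
    if len(text) <= limit:
        return text
    head = text[:limit]
    # Prefer to end on a full sentence; fall back to a hard cut if there is none.
    cut = max(head.rfind(". "), head.rfind(".\n"))
    return head[: cut + 1] if cut > 0 else head


# Example with a tiny limit so the effect is visible.
sample = "Brass goggles. Leather corset. A long cape that doubles as a glider."
print(clamp_description(sample, limit=35))
```

In the notebook, this could wrap `response.visual_description` just before it is sent to the image model.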

In [None]:
#concept = input("What is your vision?") #from interactive input
concept = "a steampunk chic superhero." #alternatively write the string directly into the code

#signature
class Visualization(dspy.Signature):
    """You are a visual artist. You are always learning and improving based on user feedback. Develop a detailed but concise visual description of the concept. The description cannot exceed 4000 characters."""
    concept = dspy.InputField(desc="this is the user's query. use it to generate a detailed visual description.")
    rating = dspy.InputField(desc="This is the user's rating of your last response.")
    feedback = dspy.InputField(desc="This is the user's feedback on your last response.")
    rationale = dspy.OutputField(desc="this is your rationale for the visual description.")
    visual_description = dspy.OutputField(desc="concise visual description that will be passed to an image generation model. 4000 characters max.") #later can constrain

#module 
class VisualizerModule(dspy.Module):
    def __init__(self, **kwargs):
        super().__init__()
        self.signature = Visualization
        self.predictor = dspy.ChainOfThought(self.signature)
        self.kwargs = {
            **kwargs,
        }

    def forward(self, concept, rating=None, feedback=None):
        result = self.predictor(concept=concept, rating=rating, feedback=feedback)
        return dspy.Prediction(rationale=result.rationale, visual_description=result.visual_description)



In [29]:
#this will initialize (or reset) the image log
image_log = []



In [45]:
#instance
visualizer = VisualizerModule()

response = visualizer(concept=concept)
display(Markdown(response.rationale))
display(Markdown(response.visual_description))
visual_description_length = len(response.visual_description)
print(f"Character count of visual_description: {visual_description_length}")


import nest_asyncio
import asyncio
from concurrent.futures import ThreadPoolExecutor

nest_asyncio.apply()

def generate_image_sync(prompt):
    this_image_data = client.images.generate(
        model="dall-e-3",
        prompt=prompt,
        n=1,
        size="1792x1024"
    )
    return this_image_data

async def generate_image(prompt):
    from datetime import datetime
    current_time = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    #display(f"starting at {current_time}")
    #run the blocking OpenAI call in a worker thread so several requests can overlap
    loop = asyncio.get_running_loop()
    this_image_data = await loop.run_in_executor(None, generate_image_sync, prompt)
    for image_data in this_image_data.data:
        image_log.append({
            "concept": concept,
            "rationale": response.rationale,
            "visual_description": response.visual_description,
            "revised_prompt": image_data.revised_prompt,
            "url": image_data.url
        })
        display(Image(url=image_data.url))

async def generate_images(prompt, n):
    tasks = [generate_image(prompt) for _ in range(n)]
    await asyncio.gather(*tasks)






The rationale behind this concept is to blend the intricate, mechanical aesthetics of steampunk with the sleek, stylish elements of chic fashion, all while embodying the heroic and dynamic qualities of a superhero. This character should exude a sense of vintage elegance combined with futuristic innovation, making them both visually striking and functionally formidable.

The steampunk chic superhero stands tall and confident, their presence commanding attention. They are adorned in a meticulously crafted ensemble that seamlessly merges Victorian-era fashion with advanced, steam-powered technology.

**Head and Face:**
The superhero's face is partially obscured by a sleek, brass and leather mask that covers the eyes and nose, featuring intricate gears and small, glowing lenses that enhance vision. Their hair is styled in a sophisticated yet practical manner, perhaps in a high ponytail or a short, tousled look, with streaks of metallic color running through it.

**Upper Body:**
The torso is protected by a fitted, corset-like armor made of dark leather and reinforced with brass plates. The corset is adorned with intricate engravings and small, functional gadgets, such as a retractable grappling hook and a mini steam-powered engine that provides additional strength. Over the corset, they wear a tailored, high-collared jacket with puffed sleeves, made of rich, dark fabric with subtle metallic threads woven throughout.

**Arms and Hands:**
Their arms are covered in long, leather gloves that extend past the elbows, each glove embedded with small, brass gears and tubes that enhance dexterity and strength. The gloves also feature retractable claws and hidden compartments for various tools and weapons. On one wrist, they wear a multi-functional, steam-powered wristwatch that can project holographic maps and communicate with allies.

**Lower Body:**
The lower half of the superhero's outfit consists of fitted, high-waisted trousers made of durable, dark fabric, with brass buttons and buckles adding both style and functionality. The trousers are tucked into knee-high, leather boots that are reinforced with metal plating and equipped with small, steam-powered jets for short bursts of flight or enhanced jumps.

**Accessories:**
A long, flowing cape made of a lightweight, shimmering fabric is attached to the shoulders, providing both dramatic flair and practical use as a glider. The cape is lined with pockets and compartments for storing gadgets and tools. Around their waist, they wear a utility belt with various pouches and holsters, each containing essential items like smoke bombs, a collapsible staff, and a steam-powered pistol.

**Overall Appearance:**
The steampunk chic superhero's overall appearance is a perfect blend of elegance and functionality. Their outfit is a harmonious mix of dark, rich fabrics and gleaming brass, with every detail serving a purpose. The combination of Victorian-inspired fashion and advanced, steam-powered technology creates a unique and captivating look that sets them apart from other superheroes. Their presence is both commanding and inspiring, embodying the spirit of innovation and heroism.

Character count of visual_description: 2790


'starting at 2024-05-25 08:16:29'

'starting at 2024-05-25 08:16:29'

'starting at 2024-05-25 08:16:29'

[{'concept': 'a steampunk chic superhero.',
  'rationale': 'The rationale behind this concept is to blend the intricate, mechanical aesthetics of steampunk with the sleek, stylish elements of chic fashion, all while embodying the heroic and dynamic qualities of a superhero. This character should exude a sense of vintage elegance combined with futuristic innovation, making them both visually striking and functionally formidable.',
  'visual_description': "The steampunk chic superhero stands tall and confident, their presence commanding attention. They are adorned in a meticulously crafted ensemble that seamlessly merges Victorian-era fashion with advanced, steam-powered technology.\n\n**Head and Face:**\nThe superhero's face is partially obscured by a sleek, brass and leather mask that covers the eyes and nose, featuring intricate gears and small, glowing lenses that enhance vision. Their hair is styled in a sophisticated yet practical manner, perhaps in a high ponytail or a short, tous

In [None]:

# how many do you want at a time? DALL-E 3 is billed per image: roughly $0.04 at standard 1024x1024, and more at the 1792x1024 size used here
n = 3
await generate_images(response.visual_description, n)

display(image_log)

In [47]:
# Initialize storage for ratings and feedback. re-running this cell will also overwrite it
ratings_feedback = []

In [50]:
import ipywidgets as widgets
from IPython.display import display, clear_output

# Variable to store the current index
current_index = 0

# Variable to store the selected rating
selected_rating = None

# Create buttons for ratings 1 to 5
buttons = [widgets.Button(description=str(i)) for i in range(1, 6)]

# Arrange the buttons horizontally and center them
button_box = widgets.HBox(buttons, layout=widgets.Layout(justify_content='center'))

# Create a Textarea widget for feedback
feedback_area = widgets.Textarea(
    value='',
    placeholder='Type your feedback here',
    description='Feedback:',
    disabled=False,
    layout=widgets.Layout(width='100%', height='100px')
)

# Create a submit button
submit_button = widgets.Button(description="Submit")

# Center the submit button
submit_button_box = widgets.HBox([submit_button], layout=widgets.Layout(justify_content='center'))

# Function to display the current response and feedback widgets
def display_current_image():
    global current_index
    clear_output()
    
    # Display the current image and its details
    current_image = image_log[current_index]
    display(widgets.HTML(f"<div style='text-align: center;'><img src='{current_image['url']}' width='80%'></div>"))
    
    # Display the rating buttons
    display(button_box)
    
    # Display the feedback area
    display(feedback_area)
    
    # Display the submit button
    display(submit_button_box)

# Function to handle submit button click
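def on_submit_click(b):
    global current_index, selected_rating
    
    # Require a rating first; a missing rating would break the float() conversion in the metric later
    if selected_rating is None:
        print("Please pick a rating from 1 to 5 before submitting.")
        return
    
    # Get the feedback text
    feedback_text = feedback_area.value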
    
    # Save the rating and feedback
    ratings_feedback.append({
        **image_log[current_index],
        'rating': selected_rating,
        'feedback': feedback_text
    })
    
    # Move to the next response
    current_index += 1
    
    # Reset the selected rating and feedback area
    selected_rating = None
    feedback_area.value = ''
    
    # Reset button styles
    for button in buttons:
        button.style.button_color = None
    
    # Check if there are more images to rate
    if current_index < len(image_log):
        display_current_image()
    else:
        clear_output()
        display(widgets.HTML("<b>All images have been rated. Thank you!</b>"))
        display(ratings_feedback)

# Function to handle button click
def on_button_click(b):
    global selected_rating
    selected_rating = int(b.description)
    
    # Reset button styles
    for button in buttons:
        button.style.button_color = None
    
    # Highlight the selected button
    b.style.button_color = 'lightblue'
    print(f"Selected rating: {selected_rating}")

# Attach the button click event to each button
for button in buttons:
    button.on_click(on_button_click)

# Attach the submit button click event
submit_button.on_click(on_submit_click)

# Display the first response
display_current_image()

HTML(value='<b>All images have been rated. Thank you!</b>')


that took a lot of attention, so let's make sure to save the feedback.

In [51]:
save = ratings_feedback
display(save)
import json

with open('demo_ratings_feedback.json', 'w') as f:
    json.dump(save, f)
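
To pick this up again in a later session, the saved file can be read back before building the training set. A minimal sketch, assuming the list-of-dicts layout written above. `load_feedback` is a hypothetical helper, and dropping entries without a rating is my own choice rather than something the notebook does. The demo uses made-up records in a temporary folder.

```python
import json
import tempfile
from pathlib import Path


def load_feedback(path):
    """Read saved feedback records and keep only those that carry a rating."""
    records = json.loads(Path(path).read_text())
    return [r for r in records if r.get("rating") is not None]


# Round-trip demo with made-up records written to a scratch file.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "feedback.json"
    demo_path.write_text(json.dumps([
        {"concept": "a steampunk chic superhero.", "rating": 4, "feedback": "more brass"},
        {"concept": "a steampunk chic superhero.", "rating": None, "feedback": ""},
    ]))
    loaded = load_feedback(demo_path)

print(len(loaded), loaded[0]["rating"])
```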



now we need to convert our feedback into `dspy.Example` objects

In [52]:
# Cast each rating to a string so it matches the text fields in the signature
for item in ratings_feedback:
    item['rating'] = str(item['rating'])

#convert to a dspy.example
from dspy import Example
ratings_feedback_examples = [
    Example(base=item).with_inputs('concept','rating','feedback') for item in ratings_feedback #we could consider making dalle-3 rewritten prompt an input 
]
#display(ratings_feedback_examples)

try a simple optimizer

In [53]:
from dspy.teleprompt import BootstrapFewShot

compiled = BootstrapFewShot(
    #score by the user's rating scaled to 0-1; return a number, since a non-empty string is always truthy
    metric=lambda example, prediction, *args: float(example['rating'])/5,
    max_labeled_demos=5,
).compile(
    student=VisualizerModule(),
    trainset=ratings_feedback_examples,
)

compiled.save('demo_compiled_first_try.json')

 40%|████      | 4/10 [00:43<01:04, 10.82s/it]

Bootstrapped 4 full traces after 5 examples in round 0.
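
A note on the metric: as far as I can tell, `BootstrapFewShot` uses the metric's return value as a pass/fail signal when deciding which traces to keep, so a score that is always truthy accepts everything. The sketch below is one possible alternative, not this notebook's method: a named function that passes only ratings at or above a cutoff. `rating_metric` and `min_rating` are hypothetical names, and plain dicts stand in for `dspy.Example` objects so the snippet runs without DSPy.

```python
def rating_metric(example, prediction=None, trace=None, min_rating=4):
    """Pass only examples whose user rating (stored as text or number) meets the cutoff."""
    try:
        rating = float(example["rating"])
    except (KeyError, TypeError, ValueError):
        return False  # a missing or non-numeric rating never passes
    return rating >= min_rating


# Plain dicts stand in for dspy.Example objects here (an assumption for the demo).
print(rating_metric({"rating": "5"}))     # True
print(rating_metric({"rating": "2"}))     # False
print(rating_metric({"rating": "None"}))  # False
```

It would be passed as `metric=rating_metric` in place of the lambda. Note that it judges the rating attached to the training example, not the new prediction, which is the same simplification the lambda makes.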





now we see if we like the outputs any better.

In [54]:
response = compiled(concept=concept)
compiled_output = generate_image_sync(response.visual_description)
display(Image(url=compiled_output.data[0].url))



from here, we could:
- save our images with wget
- use new "concept" prompts to get more variability in the training set; optionally these could be "bootstrapped" from the compiled version and then re-compiled
- see how this ports over to another image model, like SDXL, etc.
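
For the first idea, Python can do the download directly instead of shelling out to `wget`. A rough sketch, assuming each `image_log` entry has a `url` key as above. `save_images`, `fetch_bytes`, and the `fetch` parameter are hypothetical names, the `.png` extension is an assumption about the returned format, and the default fetcher uses the standard library's `urllib.request.urlopen`. The demo swaps in a fake fetcher and placeholder URLs so nothing is downloaded.

```python
import tempfile
import urllib.request
from pathlib import Path


def fetch_bytes(url):
    """Download a URL and return the raw bytes (standard library only)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def save_images(log, out_dir, fetch=fetch_bytes):
    """Write each logged image to out_dir as image_000.png, image_001.png, ..."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, entry in enumerate(log):
        path = out_dir / f"image_{i:03d}.png"
        path.write_bytes(fetch(entry["url"]))
        paths.append(path)
    return paths


# Demo with a fake fetcher and placeholder URLs, so no network access is needed.
fake_log = [{"url": "https://example.invalid/a.png"}, {"url": "https://example.invalid/b.png"}]
with tempfile.TemporaryDirectory() as tmp:
    saved = save_images(fake_log, tmp, fetch=lambda url: b"not-a-real-image")
    saved_names = [p.name for p in saved]
print(saved_names)
```

In the notebook this would be called as `save_images(image_log, 'images')`. The hosted image URLs may stop working after a while, so it is probably worth saving soon after generation.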