# Python Text2Image (Stable Diffusion) Tutorial

This tutorial demonstrates how to use the Text2Image Python API to generate images from text prompts on Hailo hardware.

The Text2Image API provides direct generation as well as advanced controls over diffusion steps, guidance scale, seed, scheduler type and more.

**Best Practice: context managers**
This tutorial does not use context managers, so that resources can be shared between notebook cells. In your own code, create `VDevice` and `Text2Image` with `with` statements whenever possible. When not using `with`, call `VDevice.release()` and `Text2Image.release()` to clean up resources.
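
A minimal sketch of the `with` pattern described above, assuming (as the note states) that both classes support the context-manager protocol. `generate_once` is a hypothetical helper, not part of the Hailo API; it receives the two classes as arguments so the sketch can be defined without Hailo hardware present.

```python
def generate_once(vdevice_cls, text2image_cls, hef_paths, prompt, **generate_kwargs):
    """Generate images with both resources scoped by 'with' blocks.

    Hypothetical helper: vdevice_cls and text2image_cls are expected to be
    VDevice and Text2Image, and hef_paths is the tuple
    (denoiser HEF, text encoder HEF, image decoder HEF).
    """
    # The device is opened first and therefore closed last.
    with vdevice_cls() as vdevice:
        # The pipeline is closed when this block exits, even on an exception.
        with text2image_cls(vdevice, *hef_paths) as t2i:
            return t2i.generate(prompt=prompt, **generate_kwargs)
```

With real hardware this might be called as `generate_once(VDevice, Text2Image, (DENOISE_HEF, TEXT_ENCODER_HEF, IMAGE_DECODER_HEF), "A scenic mountain landscape at sunrise", samples_count=1, steps_count=5)`.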

**Requirements:**

* Run the notebook inside the Python virtual environment: ```source hailo_virtualenv/bin/activate```
* Three HEF files: denoiser, text encoder, and image decoder
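
Because initialization needs all three HEF files, it can help to check the paths before creating the pipeline. The following is a small sketch using only the standard library; `missing_hef_files` is a hypothetical helper name, not part of the Hailo API.

```python
from pathlib import Path


def missing_hef_files(paths):
    """Return the paths (in input order) that do not point to an existing file."""
    return [str(p) for p in paths if not Path(p).is_file()]


# Example with placeholder paths; replace them with your own HEF locations.
missing = missing_hef_files([
    "/your/hef/path/denoise.hef",
    "/your/hef/path/text_encoder.hef",
    "/your/hef/path/image_decoder.hef",
])
if missing:
    print("Missing HEF files:", ", ".join(missing))
```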

**Tutorial Structure:**

* Basic Text2Image initialization and simple generation
* Generation parameters (steps_count, guidance_scale, samples_count, seed)
* Output image metadata and tokenization
* Optional: using a different diffusion scheduler (DDIM)

When inside the ```virtualenv```, use the command ``jupyter-notebook <tutorial-dir>`` to open a Jupyter server that contains the tutorials (default folder on GitHub: ``hailort/libhailort/bindings/python/platform/hailo_tutorials/notebooks/``).



In [None]:
# Text2Image Tutorial: Setup and Configuration

from hailo_platform import VDevice
from hailo_platform.genai import Text2Image
from hailo_platform.pyhailort.pyhailort import HailoDiffuserSchedulerType

# Configuration - Update these paths for your setup
DENOISE_HEF = "/your/hef/path/denoise.hef"           # Update this path
TEXT_ENCODER_HEF = "/your/hef/path/text_encoder.hef"  # Update this path
IMAGE_DECODER_HEF = "/your/hef/path/image_decoder.hef"  # Update this path

print("DENOISE_HEF: {}".format(DENOISE_HEF))
print("TEXT_ENCODER_HEF: {}".format(TEXT_ENCODER_HEF))
print("IMAGE_DECODER_HEF: {}".format(IMAGE_DECODER_HEF))

vdevice = VDevice()
print("Initializing Text2Image... this may take a moment...")
t2i = Text2Image(vdevice, DENOISE_HEF, TEXT_ENCODER_HEF, IMAGE_DECODER_HEF)
print("Text2Image initialized successfully!")


## Basic Generation Example

Generate a single image from a short prompt, setting only the sample count and step count and leaving the remaining parameters at their defaults.


In [None]:
# Best practice: pass a negative prompt that describes what the generated image should avoid
NEGATIVE_PROMPT = "ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, extra limbs, "\
    "disfigured, deformed, body out of frame, bad anatomy, watermark, signature, cut off, low contrast, underexposed, "\
    "overexposed, bad art, beginner, amateur, distorted face"

images = t2i.generate(
    prompt="A scenic mountain landscape at sunrise",
    negative_prompt=NEGATIVE_PROMPT,
    samples_count=1,
    steps_count=5,
)
print("Generated {} image(s)".format(len(images)))
img = images[0]
print("Image shape: {} | dtype: {}".format(img.shape, img.dtype))


## Generation Parameters Example

Tune quality, prompt adherence, reproducibility, and the number of generated samples with `steps_count`, `guidance_scale`, `seed`, and `samples_count`.
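
To compare settings systematically, one option is to build the keyword arguments up front and loop over them. The sketch below uses only the parameter names that appear in this tutorial; `parameter_grid` is a hypothetical helper, and the values are arbitrary examples.

```python
from itertools import product


def parameter_grid(steps_counts, guidance_scales, seed=None, samples_count=1):
    """Build one keyword-argument dict per (steps_count, guidance_scale) pair."""
    grid = []
    for steps, scale in product(steps_counts, guidance_scales):
        kwargs = {
            "samples_count": samples_count,
            "steps_count": steps,
            "guidance_scale": scale,
        }
        if seed is not None:
            # Sharing one seed means only the tuned parameters differ between runs.
            kwargs["seed"] = seed
        grid.append(kwargs)
    return grid


# Example: four settings that share one seed.
for kwargs in parameter_grid([4, 8], [5.0, 7.5], seed=42):
    print(kwargs)
    # With an initialized pipeline:
    # images = t2i.generate(prompt="A cat wearing sunglasses", **kwargs)
```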


In [None]:
# Multiple samples with an explicit guidance scale
images_multi = t2i.generate(
    prompt="A cat wearing sunglasses",
    negative_prompt=NEGATIVE_PROMPT,
    samples_count=2,
    steps_count=8,
    guidance_scale=7.5,
)
print("Batch generated {} images".format(len(images_multi)))

# Seed reproducibility
seed = 42
img_a = t2i.generate("A red sports car", samples_count=1, steps_count=6, seed=seed)[0]
img_b = t2i.generate("A red sports car", samples_count=1, steps_count=6, seed=seed)[0]
print("Deterministic with same seed:", (img_a == img_b).all())


## Output Image Metadata & Tokenization

Query the output shape, format type, and format order, and use the built-in tokenizer for prompt preprocessing.
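
One use for the tokenizer is flagging prompts that may be too long for the text encoder. The sketch below is hedged: `over_token_budget` is a hypothetical helper, and the default limit of 77 is an assumption borrowed from the CLIP text encoder used by Stable Diffusion v1.x; the limit of your text encoder HEF may differ.

```python
def over_token_budget(tokenize, prompts, max_tokens=77):
    """Return {prompt: token_count} for prompts longer than max_tokens.

    tokenize: callable returning a sized token sequence, e.g. t2i.tokenize.
    max_tokens: assumed limit, not confirmed for any particular HEF.
    """
    counts = {prompt: len(tokenize(prompt)) for prompt in prompts}
    return {prompt: n for prompt, n in counts.items() if n > max_tokens}


# Stand-in tokenizer for illustration only: splits on whitespace.
too_long = over_token_budget(str.split, ["Hello world", "A beautiful sunset"], max_tokens=2)
print(too_long)
# With an initialized pipeline: over_token_budget(t2i.tokenize, prompts)
```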


In [None]:
# Output metadata
shape = t2i.output_sample_shape()
dtype = t2i.output_sample_format_type()
order = t2i.output_sample_format_order()
print("Expected output shape: {} | dtype: {} | order: {}".format(shape, dtype, order))

# Tokenization
for text in ["Hello world", "A beautiful sunset"]:
    tokens = t2i.tokenize(text)
    print("'{}' -> {} tokens".format(text, len(tokens)))


## Optional: Different Diffusion Scheduler (DDIM)

Create `Text2Image` with a different scheduler to vary denoising behavior.
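
The cell in this section releases the existing instance before constructing its replacement. A small sketch that makes this order explicit follows; `reinitialize` is a hypothetical helper, and whether releasing first is strictly required is not stated in this tutorial.

```python
def reinitialize(current, factory):
    """Release current (if any), then return a new object built by factory().

    Hypothetical helper: current only needs a release() method, and factory
    is a zero-argument callable that constructs the replacement.
    """
    if current is not None:
        current.release()
    return factory()
```

With the names from this notebook, a call might look like `t2i = reinitialize(t2i, lambda: Text2Image(vdevice, DENOISE_HEF, TEXT_ENCODER_HEF, IMAGE_DECODER_HEF, scheduler_type=HailoDiffuserSchedulerType.DDIM))`.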


In [None]:
# Re-initialize with DDIM scheduler (optional)
t2i.release()

t2i = Text2Image(vdevice, DENOISE_HEF, TEXT_ENCODER_HEF, IMAGE_DECODER_HEF,
                 scheduler_type=HailoDiffuserSchedulerType.DDIM)

# Quick generation with DDIM
img_ddim = t2i.generate(
    "A watercolor painting of a city skyline",
    negative_prompt=NEGATIVE_PROMPT,
    samples_count=1,
    steps_count=6,
)[0]
print("Generated image with DDIM | shape: {} | dtype: {}".format(img_ddim.shape, img_ddim.dtype))


## Clean up resources (best practice: use context managers when possible)


In [None]:
t2i.release()
vdevice.release()
print("Resources released successfully")
print("Text2Image tutorial completed!")
