# Z-Image-Turbo: Text-to-Image Generation

This notebook provides a simple interface to generate images using the **Tongyi-MAI/Z-Image-Turbo** model.

- **Model**: 6B-parameter text-to-image generation model
- **Speed**: Fast few-step inference (optimized for 8-step generation)
- **Resolution**: Supports up to 2048×2048 (about 4 MP)
- **Languages**: English and Chinese text rendering

---

## 1. Install Dependencies

Install the required packages, including `diffusers` from source (required for `ZImagePipeline`).

In [None]:
!pip install -q torch transformers accelerate protobuf sentencepiece
!pip install -q git+https://github.com/huggingface/diffusers.git
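
After installing, it can help to confirm that the environment actually exposes `ZImagePipeline` before loading any weights. The following is a minimal sketch; the helper name `module_exposes` is hypothetical and not part of `diffusers`.

```python
import importlib


def module_exposes(module_name, attr_name):
    """Return True if module_name imports and exposes attr_name."""
    try:
        module = importlib.import_module(module_name)
        getattr(module, attr_name)
    except (ImportError, AttributeError):
        return False
    return True


# False here usually means the installed diffusers build predates ZImagePipeline
# or the package is missing entirely.
print("ZImagePipeline available:", module_exposes("diffusers", "ZImagePipeline"))
```

If this prints `False` right after the install cell, restarting the notebook kernel and re-running the check is a reasonable first step, since an older copy of the package may still be loaded in memory.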

## 2. Load Model

Load the Z-Image-Turbo pipeline. This will download the model on first run (~12GB).

In [None]:
import torch
from diffusers import ZImagePipeline
from IPython.display import display

# Load the pipeline
print("Loading Z-Image-Turbo model...")
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" else torch.float32

pipe = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=dtype,
    low_cpu_mem_usage=False
)
pipe.to(device)

print(f"Model loaded successfully on {device}!")
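
The ~12GB figure lines up with a back-of-the-envelope estimate: 6B parameters stored at 2 bytes each in `bfloat16`. The sketch below reproduces that arithmetic and shows what the `float32` CPU fallback implies. It covers only the 6B parameters quoted in the overview; other pipeline components, such as the text encoder, would add to it. The helper name `weights_size_gb` is illustrative.

```python
# Bytes used per parameter for the two dtypes chosen in the loading cell.
BYTES_PER_PARAM = {"bfloat16": 2, "float32": 4}


def weights_size_gb(num_params, dtype_name):
    """Approximate weight storage in GB (1 GB = 1e9 bytes), ignoring activations."""
    return num_params * BYTES_PER_PARAM[dtype_name] / 1e9


NUM_PARAMS = 6_000_000_000  # "6B-parameter" from the overview

print(weights_size_gb(NUM_PARAMS, "bfloat16"))  # 12.0
print(weights_size_gb(NUM_PARAMS, "float32"))   # 24.0
```

On a CPU-only machine the `float32` fallback therefore needs roughly twice the memory for the same weights.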

## 3. Configure Generation Parameters

Set your prompt and generation parameters here.

In [None]:
# ============================================
# GENERATION PARAMETERS - Edit these!
# ============================================

# Your text prompt
prompt = "A serene landscape with mountains and a lake at sunset, highly detailed, 4k"

# Image dimensions (must be divisible by 16)
width = 1024
height = 1024

# Generation settings
steps = 8              # Number of inference steps (1-50, default: 8)
guidance_scale = 0.0   # Guidance scale (0-10, default: 0.0)
seed = -1              # Seed for reproducibility (-1 for random)

# ============================================

print(f"Prompt: {prompt}")
print(f"Resolution: {width}×{height}")
print(f"Steps: {steps}, Guidance: {guidance_scale}, Seed: {seed}")
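
If you start from a target size that is not a multiple of 16 (for example 1920×1080), one option is to round each dimension up to the next valid value. This is a small sketch; `snap_up` is a hypothetical helper, and rounding up rather than down is a design choice, not something the model requires.

```python
def snap_up(value, multiple=16):
    """Round value up to the nearest multiple; aligned values are returned unchanged."""
    if value <= 0:
        raise ValueError("value must be positive")
    return ((value + multiple - 1) // multiple) * multiple


print(snap_up(1080))  # 1088
print(snap_up(1024))  # 1024
print(snap_up(1000))  # 1008
```

Usage in this notebook would be `width, height = snap_up(1920), snap_up(1080)`, which gives 1920×1088.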

## 4. Generate and Display Image

Run this cell to generate the image based on your prompt.

In [None]:
# Validate dimensions
if height % 16 != 0 or width % 16 != 0:
    raise ValueError("Height and Width must be divisible by 16")

# Set up generator for reproducibility
generator = None
if seed != -1:
    generator = torch.Generator(device).manual_seed(seed)

# Generate image
print("Generating image...")
image = pipe(
    prompt=prompt,
    height=height,
    width=width,
    num_inference_steps=steps,
    guidance_scale=guidance_scale,
    generator=generator,
).images[0]

print("Done! Displaying image...")

# Display the generated image
display(image)

# Save the image (comment out the lines below to skip saving)
output_filename = "generated_image.png"
image.save(output_filename)
print(f"Image saved as: {output_filename}")
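
With `seed = -1` the cell above passes no generator, so a result you like cannot be regenerated later. One way around this, sketched below under the assumption that a 32-bit seed range is sufficient, is to draw a concrete seed up front and record it. The helper `resolve_seed` is hypothetical.

```python
import random


def resolve_seed(seed):
    """Return seed unchanged, or draw a random 32-bit seed when seed is -1."""
    if seed == -1:
        return random.randrange(2**32)
    return seed


actual_seed = resolve_seed(-1)
print(f"Using seed: {actual_seed}")

# In the generation cell (assumes `torch`, `device`, and `image` from earlier cells):
# generator = torch.Generator(device).manual_seed(actual_seed)
# image.save(f"generated_seed{actual_seed}.png")
```

Putting the seed in the filename keeps each saved image paired with the value needed to reproduce it.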

## 5. Common Presets (Optional)

Here are some common resolution presets you can use:

In [None]:
# Uncomment the preset you want to use, then re-run the generation cell

# Square formats
# width, height = 512, 512      # Standard square
# width, height = 768, 768      # Medium square
# width, height = 1024, 1024    # Large square (default)

# Portrait format (7:9, close to 3:4)
# width, height = 896, 1152

# Landscape format (9:7, close to 4:3)
# width, height = 1152, 896

# Widescreen formats (at or near 16:9)
# width, height = 1344, 768     # 7:4
# width, height = 1280, 720     # HD, exact 16:9
# width, height = 1920, 1088    # Full HD, height padded from 1080 to 1088 (multiple of 16)

print(f"Current resolution: {width}×{height}")
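
To compare presets at a glance, the sketch below computes each one's exact aspect ratio, its size in megapixels, and whether it passes the divisible-by-16 rule. The `describe` helper is illustrative; 2048×2048 is included only because the overview lists it as the maximum.

```python
from math import gcd

PRESETS = [
    (512, 512), (768, 768), (1024, 1024),
    (896, 1152), (1152, 896),
    (1344, 768), (1280, 720), (1920, 1088),
    (2048, 2048),  # stated maximum from the overview
]


def describe(width, height):
    """Return the reduced aspect ratio, megapixels, and divisibility check."""
    divisor = gcd(width, height)
    return {
        "ratio": f"{width // divisor}:{height // divisor}",
        "megapixels": round(width * height / 1e6, 2),
        "divisible_by_16": width % 16 == 0 and height % 16 == 0,
    }


for w, h in PRESETS:
    info = describe(w, h)
    print(f"{w}×{h}: ratio {info['ratio']}, {info['megapixels']} MP, "
          f"valid={info['divisible_by_16']}")
```

Pixel count is a rough proxy for relative cost: 2048×2048 has four times the pixels of 1024×1024.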

## Tips for Best Results

1. **Prompt Writing**: Be descriptive and specific. Include style keywords like "highly detailed", "4k", "photorealistic", etc.
2. **Steps**: The model is optimized for 8 steps, but you can experiment with 4-20 steps.
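3. **Guidance Scale**: Keep at 0.0 (default). Few-step distilled models such as this one are typically run without classifier-free guidance, so nonzero values may degrade output; treat any increase as an experiment.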
4. **Seed**: Use a specific seed (e.g., 42) to reproduce the same image with the same parameters.
5. **Resolution**: Higher resolutions require more VRAM. Start with 1024×1024 on most GPUs.
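
Tips 2 and 4 combine naturally: fixing the seed while varying the step count makes it easier to see what the extra steps change. The sketch below keeps the loop independent of the pipeline by taking a callable. Both `sweep_steps` and `generate_fn` are hypothetical names, and the commented wiring assumes `pipe`, `prompt`, `width`, `height`, and `device` from the earlier cells.

```python
def sweep_steps(generate_fn, step_counts, seed):
    """Call generate_fn(steps=..., seed=...) per step count; return {steps: result}."""
    results = {}
    for steps in step_counts:
        results[steps] = generate_fn(steps=steps, seed=seed)
    return results


# Possible wiring for this notebook:
# def generate_fn(steps, seed):
#     generator = torch.Generator(device).manual_seed(seed)
#     return pipe(prompt=prompt, height=height, width=width,
#                 num_inference_steps=steps, guidance_scale=0.0,
#                 generator=generator).images[0]
#
# images = sweep_steps(generate_fn, [4, 8, 12, 20], seed=42)
# for steps, img in images.items():
#     img.save(f"steps_{steps}_seed_42.png")
```

A fresh generator is created for every call so that each run starts from the same initial noise.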

---

## Model Information

- **Model ID**: Tongyi-MAI/Z-Image-Turbo
- **Architecture**: S3-DiT (Scalable Single-Stream Diffusion Transformer)
- **Text Encoder**: Qwen 4B LLM
- **License**: Apache 2.0
- **GitHub**: [https://github.com/Aaryan-Kapoor/z-image-turbo](https://github.com/Aaryan-Kapoor/z-image-turbo)