Muhammad Mansoor Alam

Task 02: "Image Generation with Pre-trained Models"

Utilize pre-trained generative models like DALL-E-mini or Stable Diffusion to create images from text prompts.

**How does it work?**

Stable Diffusion is a model that generates images from text prompts. It's based on a concept called "latent diffusion," which involves a few key steps:

Super-Resolution Concept: Imagine you have a blurry image, and you train a model to make it clearer. The model doesn't recover lost details magically; it guesses what those details might be based on its training data. This process is called "super-resolution."

Denoising Noise: If you take a noisy image (pure random noise) and apply the same idea, the model can "denoise" it, gradually turning the noise into a clear image by making educated guesses.

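Latent Diffusion Model: This idea was expanded in a model called "latent diffusion," which runs the repeated denoising on a small, compressed (latent) representation of the image rather than on full-resolution pixels, and then decodes the result into a clear, high-resolution image.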

Text-to-Image Generation: To create images from text, you add another step called "conditioning." This involves adding a text representation to the noise and training the model with image-caption pairs.
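
The "start from noise and refine it step by step" idea above can be illustrated with a toy loop. This is only a sketch under strong assumptions: the function name `toy_denoise`, the step size of 0.3, and the four-value "image" are made up for illustration, and the "model" cheats by knowing the target, whereas a real diffusion model has to predict the noise from what it learned during training.

```python
import random


def toy_denoise(target, steps=20, seed=0):
    """Start from pure noise and remove part of the estimated noise at each step."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # pure random noise
    errors = []
    for _ in range(steps):
        # A trained model would *predict* this noise from x (and the text prompt).
        # Here we cheat and compute it from the known target.
        predicted_noise = [xi - ti for xi, ti in zip(x, target)]
        # Remove 30% of the estimated noise (0.3 is an arbitrary toy step size).
        x = [xi - 0.3 * ni for xi, ni in zip(x, predicted_noise)]
        errors.append(max(abs(xi - ti) for xi, ti in zip(x, target)))
    return x, errors


target = [0.0, 0.5, 1.0, 0.5]  # stand-in for a tiny "clean image"
result, errors = toy_denoise(target)
print(f"error after first step: {errors[0]:.4f}")
print(f"error after last step:  {errors[-1]:.6f}")
```

The error shrinks at every step, which is the behavior the text describes: each pass makes the noisy input a little closer to a plausible clean image.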

The Stable Diffusion architecture consists of three parts:

- Text Encoder: Converts your text prompt into a vector.
- Diffusion Model: Repeatedly denoises a 64x64 latent image patch.
- Decoder: Converts the final 64x64 latent patch into a high-resolution 512x512 image.

Here's the process in simple steps:

1. Text Prompt: Your text is converted into a vector by the text encoder.
2. Denoising: This vector is combined with random noise, and the diffusion model refines this noise over several steps.
3. Rendering: The final refined image patch is upscaled to a high-resolution image by the decoder.

Even though the system seems magical, it's based on training with billions of images and captions. The code for this system is relatively short, which shows that the complexity comes from the vast amount of data and training, not from the code itself.
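
To make the data flow between the three parts concrete, here is a minimal shape-only sketch. All function names (`encode_text`, `denoise_latent`, `decode_latent`, `generate`) are hypothetical stand-ins, not the diffusers API: the "encoder" is a seeded random vector, the "denoiser" simply pulls values toward the embedding mean, and the "decoder" is nearest-neighbour upscaling, whereas the real components are trained neural networks.

```python
import random

LATENT_SIZE = 64                     # side of the latent patch, as described above
IMAGE_SIZE = 512                     # side of the final image
SCALE = IMAGE_SIZE // LATENT_SIZE    # upscaling factor: 8


def encode_text(prompt, dim=8):
    """Stand-in text encoder: a deterministic pseudo-embedding derived from the prompt."""
    rng = random.Random(prompt)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def denoise_latent(embedding, steps=10, seed=0):
    """Stand-in diffusion model: refine random noise, conditioned on the embedding."""
    rng = random.Random(seed)
    latent = [[rng.gauss(0.0, 1.0) for _ in range(LATENT_SIZE)]
              for _ in range(LATENT_SIZE)]
    condition = sum(embedding) / len(embedding)  # toy "conditioning" signal
    for _ in range(steps):
        latent = [[0.5 * v + 0.5 * condition for v in row] for row in latent]
    return latent


def decode_latent(latent):
    """Stand-in decoder: nearest-neighbour upscaling from 64x64 to 512x512."""
    return [[latent[y // SCALE][x // SCALE] for x in range(IMAGE_SIZE)]
            for y in range(IMAGE_SIZE)]


def generate(prompt, seed=0):
    """Text prompt -> vector -> denoised latent -> full-size image."""
    return decode_latent(denoise_latent(encode_text(prompt), seed=seed))


image = generate("purple sky")
print(len(image), "x", len(image[0]))
```

The same prompt and seed always give the same output here, which mirrors why the notebook exposes a `seed` parameter.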

**Imports and Downloads**

In [None]:
# Install the dependencies once (omegaconf is already included in this list).
%pip install --quiet --upgrade diffusers transformers accelerate invisible_watermark mediapy omegaconf

import mediapy as media
import torch
from diffusers import DiffusionPipeline

# Keep this False unless a refiner pipeline is also loaded; this notebook does not load one.
use_refiner = False


**Preparing The Model**

In [None]:
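model = "stabilityai/stable-diffusion-xl-base-1.0"

# float16 is intended for GPU inference; fall back to float32 when only a CPU is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

pipe = DiffusionPipeline.from_pretrained(
    model,
    torch_dtype=dtype,
    safety_checker=None,
    requires_safety_checker=False
).to(device)  # move the pipeline to the selected device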

**Prompt**

**NOTE: Colab will use the CPU to process the model, which takes around 35-40 minutes to run for a single 40x40 image.**

**For optimal results, download the .ipynb file and run it with NVIDIA CUDA if you have an NVIDIA GPU. This will let you process up to 50 images at once, at resolutions from 720p up to 4K, within minutes.**
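
The width and height sliders in the next cell move in steps of 8. A small sketch of why: assuming the decoder upscales by a factor of 8 (consistent with the 64x64 to 512x512 description earlier), the requested size should be a multiple of 8 so that it maps to a whole number of latent pixels. The helper names `snap_to_multiple` and `latent_size` are hypothetical and not part of diffusers.

```python
VAE_SCALE = 8  # assumed latent-to-pixel scale factor (512 / 64)


def snap_to_multiple(value, multiple=VAE_SCALE):
    """Round a requested size down to a multiple of `multiple`, with a floor of one unit."""
    return max(multiple, (int(value) // multiple) * multiple)


def latent_size(width, height):
    """Return the (width, height) of the latent patch for a requested image size."""
    return (snap_to_multiple(width) // VAE_SCALE,
            snap_to_multiple(height) // VAE_SCALE)


for requested in [(512, 512), (40, 40), (45, 100)]:
    print(requested, "-> latent", latent_size(*requested))
```

For the default 40x40 setting this gives a latent patch of only 5x5, which helps explain why such small outputs carry very little detail.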

In [None]:
import os
prompt = "purple sky" #@param {type:"string"}
seed = 9181757 # @param {type:"slider", min:0, max:9000000000, step:1}

negative_prompt = "bad-picture-chill-75v, ng_deepnegative_v1_75t, badhandv4, (worst quality:2), (low quality:2), (normal quality:2), (lowres:2), (bad anatomy:2), (bad hands:2), (watermark:2), (mole:1.5), (freckles:1.5)" #@param {type:"string"}

width = 40  #@param {type:"slider", min:8, max:2048, step:8}
height = 40  #@param {type:"slider", min:8, max:2048, step:8}

width = int(width)
height = int(height)

images = pipe(
    prompt=prompt,
    width=width,
    height=height,
    negative_prompt=negative_prompt,
    output_type="latent" if use_refiner else "pil",
    generator=torch.Generator().manual_seed(seed)
).images

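if use_refiner:
    # NOTE: `refiner` is not defined anywhere in this notebook; a refiner
    # pipeline must be loaded before setting use_refiner = True.
    images = refiner(
        prompt=prompt,
        negative_prompt=negative_prompt,
        image=images,
    ).images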

print(f"Prompt:\t{prompt}\nSeed:\t{seed}")

base_filename = "output.jpg"
new_filename = base_filename

if os.path.exists(base_filename):
    index = 1
    while True:
        new_filename = f"output_{index}.jpg"
        if not os.path.exists(new_filename):
            break
        index += 1

images[0].save(new_filename)
media.show_images(images)
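
The file-naming logic in the cell above (use `output.jpg`, then `output_1.jpg`, `output_2.jpg`, and so on) could be factored into a small helper if it is needed in more than one place. This is one possible sketch; the function name `next_free_filename` and its parameters are assumptions, not part of the original notebook.

```python
import os


def next_free_filename(base="output", ext=".jpg", directory="."):
    """Return the first path among base.ext, base_1.ext, base_2.ext, ... that does not exist."""
    candidate = os.path.join(directory, f"{base}{ext}")
    index = 1
    while os.path.exists(candidate):
        candidate = os.path.join(directory, f"{base}_{index}{ext}")
        index += 1
    return candidate


print(next_free_filename())
```

With such a helper, the save step would reduce to `images[0].save(next_free_filename())`.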
