# Week 6a: Image generation with Stable Diffusion

In this notebook, you will look at text-to-image generation with Stable Diffusion and, later, at some different ways to make animations with text-to-image diffusion techniques. Following that, there are some tasks to build a Streamlit interface around different parts of this functionality.

Today you will be using [kjsman's simplified (and hackable!) stable diffusion PyTorch implementation](https://github.com/kjsman/stable-diffusion-pytorch).

### Setting up your Python environment

Before you work through this notebook, please follow the instructions in [Setup-and-test-conda-environment.ipynb](Setup-and-test-conda-environment.ipynb).

Once you have done that, make sure that the environment selected to run this notebook, and all the other notebooks used in this unit, is called `aim`.

To do this, click the **Select kernel** button in the top right corner of this notebook, and then select `aim`.

To check that it is configured properly, hit the run cell button (▶) on the cell below:

In [1]:
import os
print(os.environ['CONDA_DEFAULT_ENV'])

aim


Does it output the text `aim`?

If it does not output the text `aim`, please revisit and follow the instructions in [Setup-and-test-conda-environment.ipynb](Setup-and-test-conda-environment.ipynb).

If you still cannot get it working, please raise this with the course instructor. 
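
Indexing `os.environ` directly raises a `KeyError` when `CONDA_DEFAULT_ENV` is not set at all, which can happen if the kernel was not started from a conda environment. The following is a minimal sketch of a more forgiving check; the helper names `active_conda_env` and `is_expected_env` are hypothetical and not part of the course materials.

```python
import os


def active_conda_env(environ=None):
    """Return the active conda environment name, or None if it is not set."""
    env = os.environ if environ is None else environ
    return env.get("CONDA_DEFAULT_ENV")


def is_expected_env(expected="aim", environ=None):
    """Return True if the active conda environment is called `expected`."""
    return active_conda_env(environ) == expected


name = active_conda_env()
if name is None:
    print("No conda environment detected (CONDA_DEFAULT_ENV is not set).")
elif name != "aim":
    print(f"Active environment is '{name}', expected 'aim'.")
else:
    print("Environment OK: aim")
```

Using `.get` means the cell always prints a readable message instead of stopping with a traceback.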

### Download model weights

Next, try downloading the weights for Stable Diffusion with curl. If that doesn't work, copy and paste the URL directly into your browser and the download will start. You will then need to move the file into the same folder as this notebook before running the `tar` command that follows.

In [2]:
!curl -OL https://huggingface.co/jinseokim/stable-diffusion-pytorch-data/resolve/main/data.v20221029.tar

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  1185  100  1185    0     0   6459      0 --:--:-- --:--:-- --:--:--  6546

  0 4069M    0 32768    0     0   102k      0 11:18:03 --:--:-- 11:18:03  102k
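
If `curl` is not available on your machine, the same file can be fetched from Python using only the standard library. This is a sketch rather than part of the original instructions: `download_if_missing` is a hypothetical helper, and it skips the download when the file is already present so that re-running the cell does not fetch roughly 4 GB again.

```python
import os
import shutil
import urllib.request

DATA_URL = (
    "https://huggingface.co/jinseokim/stable-diffusion-pytorch-data/"
    "resolve/main/data.v20221029.tar"
)


def download_if_missing(url, dest):
    """Download `url` to `dest` unless `dest` already exists; return `dest`."""
    if os.path.exists(dest):
        return dest
    tmp = dest + ".part"  # write to a temporary name so a failed download is not mistaken for a complete one
    with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
        shutil.copyfileobj(response, out)  # stream to disk in chunks
    os.replace(tmp, dest)
    return dest


# Uncomment to start the download (the file is about 4 GB):
# download_if_missing(DATA_URL, "data.v20221029.tar")
```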

We can then unpack the downloaded archive using the following command.

In [3]:
!tar -xf data.v20221029.tar
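
On systems where the `tar` command is unavailable, Python's built-in `tarfile` module can do the same job. The sketch below is an alternative to the cell above, not a replacement for it; `unpack_archive` is a hypothetical helper, and it returns the top-level names in the archive so you can see what was unpacked.

```python
import os
import tarfile


def unpack_archive(archive_path, dest_dir="."):
    """Extract a tar archive into `dest_dir` and return its top-level names."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path) as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions support extraction filters that block unsafe paths.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
    return sorted({name.split("/")[0] for name in names})


# Uncomment once the archive is in the same folder as this notebook:
# print(unpack_archive("data.v20221029.tar"))
```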

### Imports

In [8]:
#@title Preload models (takes about 20 seconds on default settings)
import os
import numpy as np
import IPython.display

from PIL import Image
from matplotlib.pyplot import imshow

from stable_diffusion_pytorch import pipeline
from stable_diffusion_pytorch import model_loader

%matplotlib inline
models = model_loader.preload_models('cpu')
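
The cell above loads the models onto the CPU. If your machine has an NVIDIA GPU that PyTorch can see, generation will usually be much faster there. The sketch below picks a device string without failing when no GPU (or no PyTorch) is present; `pick_device` is a hypothetical helper, and whether your GPU has enough memory for these models is something you would need to check yourself.

```python
def pick_device():
    """Return 'cuda' when PyTorch reports a usable GPU, otherwise 'cpu'."""
    try:
        import torch
    except ImportError:
        return "cpu"  # without PyTorch there is nothing to run on a GPU
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Using device: {device}")
```

The resulting string could then be used wherever this notebook passes `device='cpu'`.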

## Generate images from text

The first task is to familiarise yourself with image generation using Stable Diffusion.

To begin with, we will generate images from random noise, guided by a text prompt. Experiment with different prompts and different generation parameters to familiarise yourself with the possible ways of configuring this kind of model.
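
One way to keep such experiments organised is to list the combinations you want to try before running anything, since each image can take several minutes on a CPU. The sketch below only builds the list of settings and does not call the model; `build_grid` is a hypothetical helper, and the dictionary keys are chosen to mirror the parameter names used in this notebook's generation cell. Keeping the seed fixed across the grid is an assumption made here so that differences between images come from the parameter being varied rather than from the initial noise.

```python
from itertools import product


def build_grid(prompts, cfg_scales, step_counts, seed=42):
    """Return one settings dict per combination of prompt, cfg_scale and step count."""
    return [
        {"prompt": p, "cfg_scale": c, "n_inference_steps": n, "seed": seed}
        for p, c, n in product(prompts, cfg_scales, step_counts)
    ]


grid = build_grid(
    prompts=["A Cloudy day in London", "A sunny day in London"],
    cfg_scales=[5.0, 7.5, 10.0],
    step_counts=[20],
)
for settings in grid:
    print(settings)
```

Each dictionary can then be copied into the variables of the generation cell, one run at a time.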

In [9]:
import os
import numpy as np
from PIL import Image
import torch

# Your prompt and generation setup
prompt = "A Cloudy day in London"
prompts = [prompt]
uncond_prompt = ""
uncond_prompts = [uncond_prompt] if uncond_prompt else None
device = 'cpu'
strength = 0.8
do_cfg = True
cfg_scale = 7.5
height = 512
width = 512
sampler = "k_lms"
n_inference_steps = 20
use_seed = False
seed = 42 if use_seed else None

# Generate the image
image = pipeline.generate(prompts=prompts, uncond_prompts=uncond_prompts,
                  input_images=[], strength=strength,
                  do_cfg=do_cfg, cfg_scale=cfg_scale,
                  height=height, width=width, sampler=sampler,
                  n_inference_steps=n_inference_steps, seed=seed,
                  models=models, device=device, idle_device='cpu')[0]

# ✅ Convert to a PIL Image if needed
image_pil = Image.fromarray(np.asarray(image))

# ✅ Create a folder to save the image
output_folder = "generated_images"
os.makedirs(output_folder, exist_ok=True)

# ✅ Create a filename (can include the prompt or timestamp)
filename = os.path.join(output_folder, "cloudy_london.png")

# ✅ Save the image
image_pil.save(filename)

print(f"Image saved at: {filename}")


100%|██████████| 20/20 [02:59<00:00,  8.99s/it]


Image saved at: generated_images\cloudy_london.png
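
The cell above always writes to `cloudy_london.png`, so each new run overwrites the previous image. As its comment suggests, the filename can instead be built from the prompt and a timestamp. The following is a minimal sketch; `prompt_to_filename` and its `max_len` parameter are hypothetical names introduced for illustration.

```python
import os
import re
from datetime import datetime


def prompt_to_filename(prompt, folder="generated_images", when=None, max_len=40):
    """Build a path like <folder>/<prompt-slug>_<YYYYMMDD_HHMMSS>.png."""
    when = datetime.now() if when is None else when
    # Lowercase the prompt and replace anything that is not a letter or digit with "_".
    slug = re.sub(r"[^a-z0-9]+", "_", prompt.lower()).strip("_")
    slug = slug[:max_len].strip("_") or "image"  # fall back if nothing usable is left
    stamp = when.strftime("%Y%m%d_%H%M%S")
    return os.path.join(folder, f"{slug}_{stamp}.png")


print(prompt_to_filename("A Cloudy day in London"))
```

Passing the result to `image_pil.save(...)` in place of the fixed filename would keep every generated image.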
