In [1]:
!pip install -q transformers diffusers accelerate safetensors --upgrade 

This command is used to install or upgrade several Python packages using pip, the Python package installer. Let's break it down:

1. `!pip`: In Jupyter notebooks and Google Colab, a leading `!` runs the rest of the line as a shell command. In a regular terminal you would omit the `!`.

2. `install`: This tells pip to install the packages.

3. `-q`: This flag stands for "quiet". It reduces the output verbosity of pip, showing less information during the installation process.

4. The packages being installed or upgraded are:
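   - `transformers`: A library of pre-trained models, best known for natural language processing. Here it supplies the tokenizers and text encoders that Stable Diffusion XL uses to read the prompt.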
   - `diffusers`: A library for state-of-the-art diffusion models in computer vision and audio.
   - `accelerate`: A library to easily write distributed deep learning code.
   - `safetensors`: A library for serializing and deserializing tensors safely.

5. `--upgrade`: This flag tells pip to upgrade these packages to their latest versions if they're already installed.

This command is commonly used in machine learning and deep learning projects, particularly those involving natural language processing or image generation tasks. It sets up a development environment with the latest versions of these essential libraries.
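
Because `--upgrade` installs whatever is newest at the time the cell runs, it can help to record which versions were actually picked up. The following is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical and not part of the original notebook.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in the current environment.
            found[name] = None
    return found


versions = installed_versions(["transformers", "diffusers", "accelerate", "safetensors"])
for name, ver in versions.items():
    print(f"{name}: {ver or 'not installed'}")
```

Running this right after the install cell gives a simple record to compare against if a later run behaves differently.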


In [2]:
from diffusers import StableDiffusionXLPipeline
import torch
import itertools
import cv2
import pandas as pd
from random import sample


The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.


0it [00:00, ?it/s]

This code is importing several Python libraries and modules. Let's break it down line by line:

1. `from diffusers import StableDiffusionXLPipeline`
   This line imports the `StableDiffusionXLPipeline` class from the `diffusers` library. Stable Diffusion XL is an advanced text-to-image model, and this pipeline provides an easy way to use it for image generation.

2. `import torch`
   This imports PyTorch, the deep learning library that the `diffusers` pipeline runs on. The notebook uses it directly to request half precision (`torch.float16`) when loading the model, and PyTorch also provides the GPU acceleration the model relies on.

3. `import itertools`
   This imports the `itertools` module from Python's standard library. It provides functions for working with iterators; the notebook uses `itertools.product` later to build every combination of prompt attributes.

4. `import cv2`
   This imports OpenCV (cv2), a library for computer vision tasks such as reading, writing, or manipulating images. None of the cells shown here call it.

5. `import pandas as pd`
   This imports the pandas library and aliases it as `pd`. Pandas is widely used for data manipulation and analysis, particularly with structured data like CSV files or SQL tables. None of the cells shown here call it either.

6. `from random import sample`
   This imports the `sample` function from Python's `random` module. This function is used to generate a random sample from a sequence.
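
Of these imports, `sample` is the one whose behavior is easiest to misread: it draws without replacement and returns a new list, leaving the input untouched. A small sketch, using the profession values that appear later in the notebook (the variable name `professions` is only for illustration):

```python
from random import sample

professions = ['lawyer', 'doctor', 'athlete', 'singer']

# Draw 2 distinct items; the original list is not modified.
picked = sample(professions, 2)

print(picked)
print(professions)
```

The result differs from run to run because no seed is set.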

In [3]:
class CFG:
    model = "stabilityai/stable-diffusion-xl-base-1.0"
    howmany = 1

In [4]:
# initialize the pipe
pipe = StableDiffusionXLPipeline.from_pretrained(
    CFG.model, 
    torch_dtype=torch.float16, 
    variant="fp16", use_safetensors=True).to("cuda")

model_index.json:   0%|          | 0.00/609 [00:00<?, ?B/s]

Fetching 19 files:   0%|          | 0/19 [00:00<?, ?it/s]

text_encoder/config.json:   0%|          | 0.00/565 [00:00<?, ?B/s]

tokenizer/merges.txt:   0%|          | 0.00/525k [00:00<?, ?B/s]

tokenizer/special_tokens_map.json:   0%|          | 0.00/472 [00:00<?, ?B/s]

model.fp16.safetensors:   0%|          | 0.00/246M [00:00<?, ?B/s]

tokenizer/tokenizer_config.json:   0%|          | 0.00/737 [00:00<?, ?B/s]

text_encoder_2/config.json:   0%|          | 0.00/575 [00:00<?, ?B/s]

model.fp16.safetensors:   0%|          | 0.00/1.39G [00:00<?, ?B/s]

scheduler/scheduler_config.json:   0%|          | 0.00/479 [00:00<?, ?B/s]

tokenizer_2/special_tokens_map.json:   0%|          | 0.00/460 [00:00<?, ?B/s]

unet/config.json:   0%|          | 0.00/1.68k [00:00<?, ?B/s]

tokenizer_2/vocab.json:   0%|          | 0.00/1.06M [00:00<?, ?B/s]

diffusion_pytorch_model.fp16.safetensors:   0%|          | 0.00/5.14G [00:00<?, ?B/s]

tokenizer_2/tokenizer_config.json:   0%|          | 0.00/725 [00:00<?, ?B/s]

vae/config.json:   0%|          | 0.00/642 [00:00<?, ?B/s]

diffusion_pytorch_model.fp16.safetensors:   0%|          | 0.00/167M [00:00<?, ?B/s]

diffusion_pytorch_model.fp16.safetensors:   0%|          | 0.00/167M [00:00<?, ?B/s]

Loading pipeline components...:   0%|          | 0/7 [00:00<?, ?it/s]

This code initializes a Stable Diffusion XL pipeline for image generation. Let's break it down in detail:

1. `pipe = StableDiffusionXLPipeline.from_pretrained(`
   - This line creates an instance of the StableDiffusionXLPipeline.
   - The `from_pretrained` method is used to load a pre-trained model.

2. `CFG.model,`
   - `CFG` is the small configuration class defined in cell `In [3]`; its `model` attribute specifies which pre-trained model to use.
   - Here it is the string `"stabilityai/stable-diffusion-xl-base-1.0"`, the identifier of the SDXL base model on the Hugging Face Hub.

3. `torch_dtype=torch.float16,`
   - This sets the data type for the model's parameters to 16-bit floating point (float16).
   - Using float16 reduces memory usage and can speed up computation, especially on GPUs, at a small cost to precision.

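4. `variant="fp16",`
   - This selects the weight files that were saved in 16-bit floating point, which is why the downloaded files above are named `*.fp16.safetensors`.
   - This is consistent with the `torch_dtype` parameter above and roughly halves the download size compared with 32-bit weights.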

5. `use_safetensors=True`
   - This tells the pipeline to use the `safetensors` format for loading the model weights.
   - SafeTensors is a fast serialization format that stores only tensor data. Unlike pickle-based checkpoints, loading a file cannot execute arbitrary code embedded in it.

6. `).to("cuda")`
   - This moves the entire model to the CUDA device (GPU).
   - CUDA is NVIDIA's parallel computing platform and programming model for general computing on GPUs.
   - Moving the model to GPU significantly speeds up the image generation process.

In summary, this code is setting up a Stable Diffusion XL model with the following characteristics:
- It's using a specific pre-trained model (defined in CFG.model).
- It's optimized for memory efficiency and speed by using 16-bit floating point precision.
- It's using the SafeTensors format for added security.
- It's set to run on a CUDA-enabled GPU for faster processing.

This setup is typical for when you want to generate images using Stable Diffusion XL, balancing between performance (by using the GPU and fp16), memory usage (again with fp16), and safety (with SafeTensors).
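
To make the memory argument concrete, here is a rough back-of-the-envelope sketch. It assumes the `5.14G` shown in the download log means about 5.14 × 10^9 bytes, that the file holds only float16 weights (file header overhead is ignored), and that the same weights in float32 would take 4 bytes each. The function names are hypothetical.

```python
BYTES_PER_PARAM = {"float16": 2, "float32": 4}


def estimate_param_count(file_bytes, dtype):
    """Approximate the number of parameters from the size of a weight file."""
    return file_bytes / BYTES_PER_PARAM[dtype]


def estimate_size_bytes(param_count, dtype):
    """Approximate the storage needed for param_count values of the given dtype."""
    return param_count * BYTES_PER_PARAM[dtype]


# Largest fp16 file in the download log, read as 5.14e9 bytes (assumption).
fp16_file_bytes = 5.14e9

params = estimate_param_count(fp16_file_bytes, "float16")
fp32_bytes = estimate_size_bytes(params, "float32")

print(f"~{params / 1e9:.2f}B parameters")
print(f"~{fp32_bytes / 1e9:.2f} GB if stored as float32")
```

Under these assumptions the largest component alone would need about twice the space in float32, which is the main reason for loading in half precision on a single GPU.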

In [5]:
# prepare the prompts 
gender_list = ['man', 'woman']


origin_list = [ 'North European',   'South American',
                'Middle Eastern', 'South East Asian'
                ]


age_list = ['young', 'middle aged', 'elderly']

profession_list = ['lawyer', 'doctor', 'athlete', 'singer']

In [6]:
totality =  [origin_list, age_list, gender_list, profession_list ]
combo = itertools.product(*totality)
combo_list = []
for f in combo:
    combo_list.append(f)

This code is creating combinations of elements from multiple lists. Let's break it down step by step:

1. `totality = [origin_list, age_list, gender_list, profession_list]`
   - This line creates a list called `totality` that contains four other lists.
   - Each of these lists (origin_list, age_list, gender_list, profession_list) was defined in the previous cell and holds the possible values for one attribute of the person described in the prompt.

2. `combo = itertools.product(*totality)`
   - This line uses the `itertools.product()` function to create a Cartesian product of the lists in `totality`.
   - The `*` operator unpacks the `totality` list, passing each of its contained lists as a separate argument to `product()`.
   - The Cartesian product gives all possible combinations where we choose one item from each list: 4 origins * 3 ages * 2 genders * 4 professions = 96 tuples.

3. `combo_list = []`
   - This initializes an empty list called `combo_list` to store the combinations.

4. ```python
   for f in combo:
       combo_list.append(f)
   ```
   - This loop iterates over each combination in `combo`.
   - Each combination `f` is appended to `combo_list`.
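
The steps above can be condensed, since `list()` consumes the iterator directly. The sketch below uses the same four lists as the notebook and also checks that the Cartesian product matches what four nested loops would produce; the name `nested` is only for illustration.

```python
import itertools

origin_list = ['North European', 'South American', 'Middle Eastern', 'South East Asian']
age_list = ['young', 'middle aged', 'elderly']
gender_list = ['man', 'woman']
profession_list = ['lawyer', 'doctor', 'athlete', 'singer']

# One call replaces the explicit loop with append().
combo_list = list(itertools.product(origin_list, age_list, gender_list, profession_list))

# Equivalent nested loops: the rightmost list varies fastest.
nested = [
    (o, a, g, p)
    for o in origin_list
    for a in age_list
    for g in gender_list
    for p in profession_list
]

print(len(combo_list))       # 4 * 3 * 2 * 4 combinations
print(combo_list == nested)  # same tuples in the same order
```

Because the last list changes fastest, the first few tuples differ only in profession.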


In [7]:
combo_list[0:3]

[('North European', 'young', 'man', 'lawyer'),
 ('North European', 'young', 'man', 'doctor'),
 ('North European', 'young', 'man', 'athlete')]

In [8]:
prompt_list = []
for xx in combo_list:
    # join the four attributes (origin, age, gender, profession) with spaces
    prom = "High-definition, cinematic, full-body photograph of " + " ".join(xx)
    prompt_list.append(prom)
    
prompt_list[0:3]

['High-definition, cinematic, full-body photograph of North European young man lawyer',
 'High-definition, cinematic, full-body photograph of North European young man doctor',
 'High-definition, cinematic, full-body photograph of North European young man athlete']

In [9]:
## subset to run faster - otherwise it's 2 * 4 * 3 * 4 = 96 images
prompt_list = sample(prompt_list,3)

for (ii, prompt) in enumerate(prompt_list):
    
    for jj in range(CFG.howmany):
        image = pipe(prompt=prompt).images[0]
        imgname = "img_" + str(ii) + "x" + str(jj) + ".jpg"
        image = image.save(imgname)
        print(prompt)

  0%|          | 0/50 [00:00<?, ?it/s]

High-definition, cinematic, full-body photograph of South American young man singer


  0%|          | 0/50 [00:00<?, ?it/s]

High-definition, cinematic, full-body photograph of North European elderly man doctor


  0%|          | 0/50 [00:00<?, ?it/s]

High-definition, cinematic, full-body photograph of South East Asian young woman lawyer


This code is generating images and saving them. Let's break it down:

1. `prompt_list = sample(prompt_list,3)`
   - This line uses the `sample()` function to randomly select 3 prompts from the original `prompt_list`, without repeats.
   - This is done to reduce the number of images generated, making the process faster.
   - As the comment notes, without this sampling there would be 96 prompts (2 * 4 * 3 * 4), one for each combination created earlier.

2. `for (ii, prompt) in enumerate(prompt_list):`
   - This starts a loop that iterates over the sampled prompts.
   - `enumerate()` is used to get both the index (`ii`) and the prompt itself.

3. `for jj in range(CFG.howmany):`
   - This is a nested loop that generates multiple images for each prompt.
   - `CFG.howmany` is the configuration value set in cell `In [3]` that determines how many images to generate per prompt. It is 1 here, so the inner loop runs once per prompt.

4. `image = pipe(prompt=prompt).images[0]`
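   - This line uses the Stable Diffusion pipeline (`pipe`) to generate an image based on the current prompt. Each call runs 50 denoising steps, which is what the `0/50` progress bars in the output count.
   - `.images[0]` retrieves the first generated image. With the default settings the pipeline returns one image per prompt, so it is also the only one.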

5. `imgname = "img_" + str(ii) + "x" + str(jj) + ".jpg"`
   - This creates a filename for the image, incorporating the prompt index (`ii`) and the image index for that prompt (`jj`).

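6. `image = image.save(imgname)`
   - This saves the generated image under the filename built in the previous step; the `.jpg` extension makes Pillow write a JPEG.
   - `save()` returns `None`, so the assignment leaves `image` set to `None` afterwards. This is harmless here because `image` is overwritten on the next iteration, but calling `image.save(imgname)` without the assignment would be clearer.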

7. `print(prompt)`
   - This prints the current prompt, likely for logging or monitoring purposes.
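
The naming scheme in steps 2 to 5 can be checked without running the model. The sketch below only plans the output: it reproduces the two loops and the filename pattern, using placeholder prompts. The helper `planned_outputs` is hypothetical.

```python
def planned_outputs(prompts, images_per_prompt):
    """Yield (filename, prompt) pairs in the order the loop would save them."""
    for ii, prompt in enumerate(prompts):
        for jj in range(images_per_prompt):
            # Same pattern as the notebook: img_<prompt index>x<image index>.jpg
            yield f"img_{ii}x{jj}.jpg", prompt


demo_prompts = ["prompt A", "prompt B", "prompt C"]  # placeholders, not real prompts
plan = list(planned_outputs(demo_prompts, images_per_prompt=2))

for filename, prompt in plan:
    print(filename, "<-", prompt)
```

Note that the filename encodes only the position of the prompt in the sampled list, not its text, so the printed prompts are the only link between a file and the prompt that produced it.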


In summary, this code is:
1. Randomly selecting a subset of prompts to work with.
2. For each of these prompts, generating a specified number of images.
3. Saving each generated image with a unique filename.
4. Printing each prompt after its image is saved, to track progress.

This approach is useful when you want to generate a diverse set of images based on different text prompts, but don't want to generate images for all possible combinations (which could be time-consuming). The random sampling helps to explore the variety of possible outputs without exhaustively generating every combination.
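
One consequence of unseeded sampling is that every run picks a different subset of prompts. If a repeatable subset is wanted, a seeded generator is one option. This is a minimal sketch, not part of the original notebook; the helper name `reproducible_subset`, the placeholder prompts, and the seed value 42 are all arbitrary choices.

```python
import random


def reproducible_subset(items, k, seed):
    """Draw k distinct items using a private, seeded random generator."""
    rng = random.Random(seed)  # does not touch the global random state
    return rng.sample(items, k)


all_prompts = [f"prompt {n}" for n in range(96)]  # stand-ins for the 96 real prompts

first = reproducible_subset(all_prompts, 3, seed=42)
second = reproducible_subset(all_prompts, 3, seed=42)

print(first)
print(first == second)  # the same seed yields the same subset
```

This fixes only which prompts are chosen. The generated images still vary between runs unless the pipeline's own random state is also controlled.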
