# Experimental Setup

In this experiment I'll use the [Stable Diffusion 2](https://huggingface.co/stabilityai/stable-diffusion-2) model to generate an image and the [BLIP Large](https://huggingface.co/Salesforce/blip-image-captioning-large) model to caption it, then feed that caption back to Stable Diffusion as the next input prompt. I'll run this for N cycles and look at the semantic decay that occurs both overall and between consecutive cycles.

In [1]:
#import relevant packages
import pandas as pd
import os
import random
import numpy as np
from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler
import torch
from PIL import Image
import transformers
from transformers import BlipProcessor, BlipForConditionalGeneration

### Starting with some Diffusers weirdness

From the [docs](https://huggingface.co/docs/diffusers/optimization/mps): "We recommend to “prime” the pipeline using an additional one-time pass through it. This is a temporary workaround for a weird issue we have detected: the first inference pass produces slightly different results than subsequent ones. You only need to do this pass once, and it’s ok to use just one inference step and discard the result."

In [2]:
#set up the stable diffusion pipeline

diffusion_model_id = 'stabilityai/stable-diffusion-2'

#load the scheduler config stored in the model repo's 'scheduler' subfolder
scheduler = EulerDiscreteScheduler.from_pretrained(diffusion_model_id, subfolder = 'scheduler')

sd_pipe = StableDiffusionPipeline.from_pretrained(diffusion_model_id, scheduler = scheduler)

#run on Apple Silicon (MPS); attention slicing lowers peak memory use
sd_pipe = sd_pipe.to('mps')
sd_pipe.enable_attention_slicing()


#warm-up prompt: a single throwaway inference step to prime the pipeline

initial_prompt = 'An oil painting of a pirate ship made of Swiss cheese'

_ = sd_pipe(initial_prompt, num_inference_steps = 1)

#warmup_image = sd_pipe(initial_prompt).images[0]

Cannot initialize model with low cpu memory usage because `accelerate` was not found in the environment. Defaulting to `low_cpu_mem_usage=False`. It is strongly recommended to install `accelerate` for faster and less memory-intense model loading. You can do so with: `pip install accelerate`.
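The setup cell hard-codes `'mps'`, which only exists on Apple Silicon builds of PyTorch. Below is a minimal sketch of a fallback, assuming you might want to rerun the notebook on other hardware. `pick_device` is a hypothetical helper introduced here for illustration; it is not part of diffusers or PyTorch.

```python
def pick_device(preferred=("mps", "cuda")):
    """Return the first usable device name from `preferred`, else 'cpu'.

    Hypothetical helper: the notebook itself simply uses 'mps'.
    """
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to probe, so report the CPU.
        return "cpu"

    def mps_available():
        # Older PyTorch builds have no torch.backends.mps attribute.
        mps = getattr(torch.backends, "mps", None)
        return mps is not None and mps.is_available()

    checks = {"mps": mps_available, "cuda": torch.cuda.is_available}

    for name in preferred:
        check = checks.get(name)
        if check is not None and check():
            return name
    return "cpu"


device = pick_device()
print(device)
```

With this in place, the pipeline could be moved with `sd_pipe.to(pick_device())` instead of the literal `'mps'`.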


In [4]:
#initialize the experiment
#create the output folder if it is missing
os.makedirs('images', exist_ok = True)

#seed the Python and PyTorch random number generators
random.seed(42)
torch.manual_seed(42)
    
cycles = 10

first_real_prompt = 'A brown and white corgi is eating a large watermelon while sitting on a towel at the beach'

prompts = np.array([first_real_prompt])

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")
captioning_model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-large")

In [5]:
for i in range(cycles):
    #generate an image from the current prompt
    prompt = prompts[i]
    image = sd_pipe(prompt).images[0]

    #save it, then reload from disk so BLIP sees the stored JPEG
    image_path = f'images/image_{i}.jpeg'
    image.save(image_path, 'JPEG')
    raw_image = Image.open(image_path).convert('RGB')

    #caption the image with BLIP (no text prefix)
    caption_inputs = processor(raw_image, return_tensors = 'pt')
    blip_out = captioning_model.generate(**caption_inputs)
    caption = processor.decode(blip_out[0], skip_special_tokens = True)

    #the caption becomes the prompt for the next cycle
    prompts = np.append(prompts, [caption])


In [7]:
data = pd.DataFrame(prompts, columns = ['prompts'])
data.to_csv('data.csv')
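
The introduction mentions looking at semantic decay both overall and between cycles. One rough way to put numbers on that is sketched below, under a strong assumption: it uses word-set overlap (Jaccard similarity) as a crude stand-in for semantic similarity. This is not necessarily the measure used in the rest of this experiment, and the helper names `tokenize`, `jaccard`, and `decay_table` are hypothetical. The example strings are toy inputs, not captions produced by the run above.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Word-set overlap between two strings, from 0.0 to 1.0."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta and not tb:
        return 1.0  # two empty strings are treated as identical
    return len(ta & tb) / len(ta | tb)


def decay_table(prompts):
    """For each cycle i >= 1, compare prompt i with the previous prompt
    (between-cycle decay) and with prompt 0 (overall decay)."""
    rows = []
    for i in range(1, len(prompts)):
        rows.append({
            "cycle": i,
            "sim_to_previous": jaccard(prompts[i - 1], prompts[i]),
            "sim_to_original": jaccard(prompts[0], prompts[i]),
        })
    return rows


# Toy inputs for illustration only (assumption: NOT real model output).
toy_prompts = ["a red ball", "a ball", "a field"]
for row in decay_table(toy_prompts):
    print(row)
```

To apply it to this experiment, `decay_table(list(prompts))` could be passed to `pd.DataFrame`. An embedding-based similarity would capture paraphrases that word overlap misses, so treat these scores as a lower bound on how much meaning survives.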