# Endless Image Replication

### Context

In this post, I play around with OpenAI's GPT-5 image generation capabilities. Starting with a photo of my dog, I repeatedly give the prompt *"create a replica of this image. don't change a thing"*, feeding the output of each iteration back in as the input to the next.
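
The feedback loop itself is easy to express in code. Here's a minimal sketch, assuming a hypothetical `replicate` function that stands in for whatever call sends an image plus the prompt to the model and returns the new image. The function name and the bytes-in/bytes-out shape are my own placeholders, not a real API.

```python
from typing import Callable, List

PROMPT = "create a replica of this image. don't change a thing"


def run_replication_chain(
    first_image: bytes,
    replicate: Callable[[bytes, str], bytes],
    iterations: int,
) -> List[bytes]:
    """Return [original, output_1, ..., output_n].

    `replicate` is a hypothetical stand-in for the model call: it takes the
    current image and the prompt, and returns the generated image.
    """
    images = [first_image]
    for _ in range(iterations):
        # The newest output always becomes the next input.
        images.append(replicate(images[-1], PROMPT))
    return images
```

Keeping every intermediate image (rather than only the last one) is what makes it possible to build an animation or compare neighboring iterations afterwards.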

<img src="blog_cover.png" width="100%"/>

I was inspired by this [Reddit post](https://www.reddit.com/r/ChatGPT/comments/1n8dung/chatgpt_prompted_to_create_the_exact_replica_of/), where a user shows how an image of a person became completely unrecognizable after the 74th iteration! My experiment didn't go that far, though, and it yielded different (and slightly more underwhelming) results.

<img src="reddit.gif" width="50%"/>

### Results

I performed 11 iterations of the prompt using the GPT-5 model. Here's an animation of my results:

<img src="dogresult.gif" width="50%"/>

The first iteration was extremely good. Below, the original image is on the left and the output is on the right. I initially had high hopes, since I couldn't spot any noticeable differences when viewing the images side by side.

<div style="display: flex; align-items: flex-start; margin: 20px 0;">
  <!-- First image + caption -->
  <figure style="margin: 0; flex: 0 0 auto; margin-right: 28px; text-align: center;">
    <img src="1.JPG" alt="Original image"
         style="display: block; height: 400px; width: auto; border-radius: 8px;">
  </figure>

  <!-- Second image + caption -->
  <figure style="margin: 0; flex: 0 0 auto; text-align: center;">
    <img src="2.png" alt="First output"
         style="display: block; height: 400px; width: auto; border-radius: 8px;">
  </figure>
</div>

However, things quickly went downhill after that. The biggest jump seemed to be between the third and fourth images: the fourth suddenly gained a lot of noise, along with higher contrast and sharpness.

<div style="display: flex; align-items: flex-start; margin: 20px 0;">
  <!-- First image + caption -->
  <figure style="margin: 0; flex: 0 0 auto; margin-right: 28px; text-align: center;">
    <img src="3.png" alt="Third image"
         style="display: block; height: 400px; width: auto; border-radius: 8px;">
  </figure>

  <!-- Second image + caption -->
  <figure style="margin: 0; flex: 0 0 auto; text-align: center;">
    <img src="4.png" alt="Fourth image"
         style="display: block; height: 400px; width: auto; border-radius: 8px;">
  </figure>
</div>

After that, the results seemed to plateau: the image turned black and white and extremely noisy, with the outline of my dog still faintly distinguishable. This was the final photo after 11 rounds:

<figure style="margin: 0; flex: 0 0 auto; margin-right: 28px; text-align: center;">
  <img src="11.png" alt="Final image"
       style="display: block; height: 400px; width: auto; border-radius: 8px;">
</figure>
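
"Black and white" and "higher contrast" are eyeball judgments, but they could be put into rough numbers. Below is a small sketch using only the Python standard library. It assumes the pixels have already been loaded as 8-bit `(R, G, B)` tuples by whatever image library you prefer; the loading step and both function names are my own additions, and I haven't run this on the actual images.

```python
import colorsys
from statistics import pstdev
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # 8-bit (R, G, B)


def mean_saturation(pixels: List[Pixel]) -> float:
    """Average HSV saturation in [0, 1]; 0.0 means fully grayscale."""
    total = 0.0
    for r, g, b in pixels:
        _, s, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        total += s
    return total / len(pixels)


def luma_contrast(pixels: List[Pixel]) -> float:
    """Standard deviation of Rec. 601 luma: a rough global contrast measure."""
    lumas = [0.299 * r + 0.587 * g + 0.114 * b for r, g, b in pixels]
    return pstdev(lumas)
```

Computed per iteration, a saturation value falling toward zero would line up with the image losing its color, and a rising luma spread would line up with the harsher contrast.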

### Reflections

I couldn't find a detailed, definitive source online on how the GPT-5 model generates images, but my assumption is that some noise or mismatch was introduced at some stage and then amplified as the iterations continued. According to an article on [The Verge](https://www.theverge.com/openai/635118/chatgpt-sora-ai-image-generation-chatgpt), the GPT-4o model uses an autoregressive approach (not diffusion), where it generates an image token by token, just like text. Instead of starting from random noise, as diffusion does, it predicts a sequence of image tokens that make up the image. Because of this, I'm surprised that the model introduced so much static/noise rather than veering off the path like the Reddit example.
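
To make the "small error gets amplified" intuition concrete, here's a toy sketch. It says nothing about how GPT-5 actually works: it simply assumes each pass exaggerates every brightness value's distance from mid-gray by a fixed factor (`gain`, a number I made up) and then clips to the 0-255 range.

```python
from typing import List


def one_pass(values: List[float], gain: float = 1.1) -> List[float]:
    """One imperfect 'copy': push each 0-255 value away from mid-gray, then clip."""
    return [min(255.0, max(0.0, 128 + gain * (v - 128))) for v in values]


def repeat_passes(values: List[float], passes: int, gain: float = 1.1) -> List[float]:
    """Feed the output of each pass into the next, like the replication chain."""
    for _ in range(passes):
        values = one_pass(values, gain)
    return values
```

Because the output of one pass is the input of the next, the distortion multiplies instead of adding: a 10% exaggeration per pass compounds to roughly 2.6x after 10 passes, and with enough passes every value that didn't start at mid-gray ends up pinned at pure black or pure white.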

In fact, I even fed the final image into GPT-5 and asked if it could identify what it was. It said: "This is an image of a dog lying in the grass. The photo appears to have been processed with a strong filter or edge-detection effect, which makes it look like a high-contrast sketch or engraving. You can still make out the dog’s head, body, and the surrounding grass and trees in the background, though the details are stylized." Given that it can still recognize the subject, it's puzzling that it didn't retain a more obvious "dog token" in its image generation process.

To push the model a bit further, I asked it to "generate an image of what you think the original photo looked like." This was what it generated -- pretty good!

(From left to right: the original image, the final image, and the reconstructed image)

<div style="display: flex; align-items: flex-start; margin: 20px 0;">
  
  <figure style="margin: 0; flex: 0 0 auto; margin-right: 28px; text-align: center;">
    <img src="1.JPG" alt="Original image"
         style="display: block; height: 400px; width: auto; border-radius: 8px;">
  </figure>

  
  <figure style="margin: 0; flex: 0 0 auto; margin-right: 28px; text-align: center;">
    <img src="11.png" alt="Last output"
         style="display: block; height: 400px; width: auto; border-radius: 8px;">
  </figure>

   <figure style="margin: 0; flex: 0 0 auto; text-align: center;">
    <img src="gptreconstruction.png" alt="Reconstructed output"
         style="display: block; height: 400px; width: auto; border-radius: 8px;">
  </figure>
</div>