
Colab notebook #2

Closed

woctezuma opened this issue Dec 21, 2021 · 6 comments

woctezuma commented Dec 21, 2021

I have slightly changed the structure of your text2im notebook so that:

  • it runs straight away on Google Colab with GPU toggled ON,
  • it is easier for everyone to try different prompts.

Run text2im.ipynb (Open in Colab)

Reference: https://github.com/woctezuma/glide-text2im-colab

loretoparisi commented Dec 21, 2021

@woctezuma thanks!!! Is the base checkpoint the only one available for the base diffusion model? I cannot reproduce the results shown in Figure 1 of the paper with the same text prompt.
In the references I can also see CLIP-guided diffusion models for both 256x256 and 512x512.

Crowson, K. CLIP guided diffusion HQ 256x256. https://colab.research.google.com/drive/12a_Wrfi2_gwwAuN3VvMTwVMz9TfqctNj, 2021a.
Crowson, K. CLIP guided diffusion 512x512, secondary model method. https://twitter.com/RiversHaveWings/status/1462859669454536711, 2021b.

woctezuma commented Dec 21, 2021

I cannot reproduce the results shown in Figure 1 of the paper with the same text prompt.

Unfortunately, this is normal, because the publicly available model:

  • is smaller, as it has roughly 10x fewer parameters,
  • was trained on a filtered dataset.

You should get outputs similar to the third row of Figure 9.

[Figure 9 from the paper]

From a user perspective, the main benefit of GLIDE is that it is much faster than the CLIP-guided methods which I have tried so far.

Is the base the only checkpoint available for the base diffusion model?

I think so. From what I can see in the code below, there are 6 checkpoints:

  • two for classifier-free guidance (sampling and upsampling),
  • two for inpainting (sampling and upsampling),
  • two for CLIP (text encoding and image encoding).

MODEL_PATHS = {
    "base": "https://openaipublic.blob.core.windows.net/diffusion/dec-2021/base.pt",
    "upsample": "https://openaipublic.blob.core.windows.net/diffusion/dec-2021/upsample.pt",
    "base-inpaint": "https://openaipublic.blob.core.windows.net/diffusion/dec-2021/base_inpaint.pt",
    "upsample-inpaint": "https://openaipublic.blob.core.windows.net/diffusion/dec-2021/upsample_inpaint.pt",
    "clip/image-enc": "https://openaipublic.blob.core.windows.net/diffusion/dec-2021/clip_image_enc.pt",
    "clip/text-enc": "https://openaipublic.blob.core.windows.net/diffusion/dec-2021/clip_text_enc.pt",
}
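
For reference, here is a minimal sketch of how the "base" checkpoint can be created and loaded, following the pattern of the official notebook (the 100-step timestep_respacing is an assumption for fast sampling):

import torch as th

from glide_text2im.download import load_checkpoint
from glide_text2im.model_creation import (
    create_model_and_diffusion,
    model_and_diffusion_defaults,
)

device = th.device('cuda' if th.cuda.is_available() else 'cpu')

# Create the base model with its default hyperparameters.
options = model_and_diffusion_defaults()
options['use_fp16'] = th.cuda.is_available()
options['timestep_respacing'] = '100'  # fewer diffusion steps for faster sampling
model, diffusion = create_model_and_diffusion(**options)
model.eval()
if options['use_fp16']:
    model.convert_to_fp16()
model.to(device)

# Download (and cache) the "base" weights listed above, then load them.
model.load_state_dict(load_checkpoint('base', device))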

woctezuma commented Dec 21, 2021

I see the following nice commits:

  • 146bd9c add install command to notebooks -> git and pip at the start of the notebook (sketched below),
  • f468908 add colab links -> Colab badges for the links in the README,
  • 9cc8e56 colab GPU backend -> GPU support toggled ON.
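
For completeness, the install cell added at the start of the notebook presumably boils down to something like this (the exact commands are an assumption based on the commit message):

# Clone the official repository, then install the package with pip.
!git clone https://github.com/openai/glide-text2im
!pip install -e ./glide-text2im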

loretoparisi commented Dec 22, 2021

@woctezuma thanks! I can see that the sampling part is slightly different from yours, adding the model_fn function to the sample loop. Is this related to the fact that they just do classifier-free guidance (cond_fn=None) rather than CLIP guidance like in your Colab? Also, I have tried to combine the last two, and the results seem to be better, as if CLIP guidance introduces too much randomness for the small model. Any idea why?

# Create the text tokens to feed to the model.
tokens = model.tokenizer.encode(prompt)
tokens, mask = model.tokenizer.padded_tokens_and_mask(
    tokens, options['text_ctx']
)

# Create the classifier-free guidance tokens (empty)
full_batch_size = batch_size * 2
uncond_tokens, uncond_mask = model.tokenizer.padded_tokens_and_mask(
    [], options['text_ctx']
)

# Pack the tokens together into model kwargs.
model_kwargs = dict(
    tokens=th.tensor(
        [tokens] * batch_size + [uncond_tokens] * batch_size, device=device
    ),
    mask=th.tensor(
        [mask] * batch_size + [uncond_mask] * batch_size,
        dtype=th.bool,
        device=device,
    ),
)

# Create a classifier-free guidance sampling function
def model_fn(x_t, ts, **kwargs):
    # The batch is [conditional | unconditional]; both halves share the same x_t.
    half = x_t[: len(x_t) // 2]
    combined = th.cat([half, half], dim=0)
    model_out = model(combined, ts, **kwargs)
    # The first 3 channels are the predicted noise (eps); the rest is the variance.
    eps, rest = model_out[:, :3], model_out[:, 3:]
    cond_eps, uncond_eps = th.split(eps, len(eps) // 2, dim=0)
    # Classifier-free guidance: push eps away from the unconditional prediction.
    half_eps = uncond_eps + guidance_scale * (cond_eps - uncond_eps)
    eps = th.cat([half_eps, half_eps], dim=0)
    return th.cat([eps, rest], dim=1)

# Sample from the base model.
model.del_cache()
samples = diffusion.p_sample_loop(
    model_fn,
    (full_batch_size, 3, options["image_size"], options["image_size"]),
    device=device,
    clip_denoised=True,
    progress=True,
    model_kwargs=model_kwargs,
    cond_fn=None,
)[:batch_size]
model.del_cache()

# Show the output
show_images(samples)

[attached output image: cat2]

woctezuma commented Dec 22, 2021

I can see that the sampling part is slightly different from yours, adding the model_fn function to the sample loop. Is this related to the fact that they just do classifier-free guidance (cond_fn=None) rather than CLIP guidance like in your Colab?

To clarify any confusion:

Unless I am missing something, the model_fn function is added to the sample loop in both notebooks called text2im.ipynb.

# Sample from the base model.
model.del_cache()
samples = diffusion.p_sample_loop(
    model_fn,
    (full_batch_size, 3, options["image_size"], options["image_size"]),
    device=device,
    clip_denoised=True,
    progress=True,
    model_kwargs=model_kwargs,
    cond_fn=None,
)[:batch_size]
model.del_cache()

Also, I have tried to combine the last two, and the results seem to be better, as if CLIP guidance introduces too much randomness for the small model. Any idea why?

I need to see the diff of what you did to understand better.

I would be glad to test this and see the results, if they are better. :) The black cat with white paws looks nice. 👍

loretoparisi commented
Thanks! I have two versions, this one

samples = diffusion.p_sample_loop(
    model,
    (batch_size, 3, options["image_size"], options["image_size"]),
    device=device,
    clip_denoised=True,
    progress=True,
    model_kwargs=model_kwargs,
    cond_fn=cond_fn,
)

where

cond_fn = clip_model.cond_fn([prompt] * batch_size, guidance_scale)

and in the latest colab from the repo

samples = diffusion.p_sample_loop(
    model_fn,
    (full_batch_size, 3, options["image_size"], options["image_size"]),
    device=device,
    clip_denoised=True,
    progress=True,
    model_kwargs=model_kwargs,
    cond_fn=None,
)[:batch_size]

with cond_fn=None and the following model_fn:

def model_fn(x_t, ts, **kwargs):
    half = x_t[: len(x_t) // 2]
    combined = th.cat([half, half], dim=0)
    model_out = model(combined, ts, **kwargs)
    eps, rest = model_out[:, :3], model_out[:, 3:]
    cond_eps, uncond_eps = th.split(eps, len(eps) // 2, dim=0)
    half_eps = uncond_eps + guidance_scale * (cond_eps - uncond_eps)
    eps = th.cat([half_eps, half_eps], dim=0)
    return th.cat([eps, rest], dim=1)
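
For anyone who wants to reproduce the "combined" experiment, here is a minimal sketch of what passing both guidance signals at once could look like. Mixing classifier-free guidance (via model_fn) with CLIP guidance (via cond_fn) is not something the official notebooks do, so treat it as an experiment; clip_model is assumed to be set up as in the clip_guided notebook, and the prompt list is doubled to match the doubled batch used by model_fn:

from glide_text2im.clip.model_creation import create_clip_model
from glide_text2im.download import load_checkpoint

# Set up the CLIP model and load its two checkpoints.
clip_model = create_clip_model(device=device)
clip_model.image_encoder.load_state_dict(load_checkpoint('clip/image-enc', device))
clip_model.text_encoder.load_state_dict(load_checkpoint('clip/text-enc', device))

# One prompt per element of the doubled (conditional + unconditional) batch.
cond_fn = clip_model.cond_fn([prompt] * full_batch_size, guidance_scale)

# Experimental: classifier-free guidance and CLIP guidance at the same time.
samples = diffusion.p_sample_loop(
    model_fn,
    (full_batch_size, 3, options["image_size"], options["image_size"]),
    device=device,
    clip_denoised=True,
    progress=True,
    model_kwargs=model_kwargs,
    cond_fn=cond_fn,
)[:batch_size]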
