[QUESTION] [BEGINNER] How to save an image from a 4D tensor? Generating plain noise. #80

Closed
dani3lh00ps opened this issue May 10, 2022 · 2 comments

@dani3lh00ps

Hi, I am running the following code:

import torch
from dalle2_pytorch import DALLE2, DiffusionPriorNetwork, DiffusionPrior, Unet, Decoder, OpenAIClipAdapter

# openai pretrained clip - defaults to ViT-B/32

clip = OpenAIClipAdapter()

# mock data

text = torch.randint(0, 49408, (4, 256)).cuda()
images = torch.randn(4, 3, 256, 256).cuda()

# prior networks (with transformer)

prior_network = DiffusionPriorNetwork(
    dim = 512,
    depth = 6,
    dim_head = 64,
    heads = 8
).cuda()

diffusion_prior = DiffusionPrior(
    net = prior_network,
    clip = clip,
    timesteps = 100,
    cond_drop_prob = 0.2
).cuda()

loss = diffusion_prior(text, images)
loss.backward()

# do above for many steps ...

# decoder (with unet)

unet1 = Unet(
    dim = 128,
    image_embed_dim = 512,
    cond_dim = 128,
    channels = 3,
    dim_mults=(1, 2, 4, 8)
).cuda()

unet2 = Unet(
    dim = 16,
    image_embed_dim = 512,
    cond_dim = 128,
    channels = 3,
    dim_mults = (1, 2, 4, 8, 16)
).cuda()

decoder = Decoder(
    unet = (unet1, unet2),
    image_sizes = (128, 256),
    clip = clip,
    timesteps = 100,
    image_cond_drop_prob = 0.1,
    text_cond_drop_prob = 0.5,
    condition_on_text_encodings = False  # set this to True if you wish to condition on text during training and sampling
).cuda()

for unet_number in (1, 2):
    loss = decoder(images, unet_number = unet_number) # this can optionally be decoder(images, text) if you wish to condition on the text encodings as well, though it was hinted in the paper it didn't do much
    loss.backward()

# do above for many steps

dalle2 = DALLE2(
    prior = diffusion_prior,
    decoder = decoder
)

Then, generating images:

images = dalle2(
    ['a butterfly trying to escape a tornado'],
    cond_scale = 2. # classifier free guidance strength (> 1 would strengthen the condition)
)

and trying to save:

from torchvision.utils import save_image
save_image(images[0], 'img.png')

but img.png is just plain noise... What am I missing here? Can anyone please tell me? I just want to try out the code; I am new to ML.
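
For what it's worth, my understanding of how torchvision's save_image treats this tensor (a minimal sketch; the clamp is only a safeguard in case the sampled values fall slightly outside [0, 1]):

from torchvision.utils import save_image

# `images` returned by dalle2 has shape (batch, 3, H, W)
save_image(images.clamp(0, 1), 'batch.png')    # 4D tensor: the whole batch is tiled into one grid image
save_image(images[0].clamp(0, 1), 'img.png')   # 3D tensor: a single image, as in my snippet above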

@rom1504
Collaborator

rom1504 commented May 10, 2022 via email

@dani3lh00ps
Author

Okay, so even though we use the pretrained CLIP model from OpenAI,
we still need to train the prior and decoder...
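
To make that concrete, my reading of the "# do above for many steps" comments is a loop along these lines (a minimal sketch with a mocked batch function and made-up optimizer settings, not the library's own training utilities):

import torch
from torch.optim import Adam

def get_mock_batch():
    # same mock shapes as above; a real run would draw these from a dataset of (caption, image) pairs
    text = torch.randint(0, 49408, (4, 256)).cuda()
    images = torch.randn(4, 3, 256, 256).cuda()
    return text, images

prior_optim = Adam(diffusion_prior.parameters(), lr = 3e-4)
decoder_optim = Adam(decoder.parameters(), lr = 3e-4)

for step in range(100_000):  # "many steps", not a single backward pass
    text, images = get_mock_batch()

    # train the diffusion prior
    prior_optim.zero_grad()
    loss = diffusion_prior(text, images)
    loss.backward()
    prior_optim.step()

    # train each unet of the decoder
    for unet_number in (1, 2):
        decoder_optim.zero_grad()
        loss = decoder(images, unet_number = unet_number)
        loss.backward()
        decoder_optim.step()

# only after training like this would dalle2(...) produce something other than noise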
