[QUESTION] [BEGINNER] How to save an image from a 4D tensor? Generating plain noise. #80
Comments
You're missing the part "use a large dataset and a GPU for many hours to do a lot of forward, backward and optimization passes". There is no pretrained model yet, so a freshly initialized network can only produce noise.
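As a rough sketch of what "many optimization passes" means (illustrative only; dataloader and the Adam hyperparameters below are hypothetical, not from this repository):

import torch

# hypothetical optimizer setup -- any standard choice works
opt = torch.optim.Adam(diffusion_prior.parameters(), lr = 3e-4)

for text, images in dataloader: # real (text, image) pairs, not mock tensors
    loss = diffusion_prior(text, images)
    opt.zero_grad()
    loss.backward()
    opt.step()

The same pattern is then repeated for each unet of the decoder, over a large dataset, for many GPU-hours.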
On Tue, May 10, 2022, 10:14, dani3lh00ps wrote:
Hi, I am running the following code:
import torch
from dalle2_pytorch import DALLE2, DiffusionPriorNetwork, DiffusionPrior, Unet, Decoder, OpenAIClipAdapter
# openai pretrained clip - defaults to ViT/B-32
clip = OpenAIClipAdapter()
# mock data
text = torch.randint(0, 49408, (4, 256)).cuda()
images = torch.randn(4, 3, 256, 256).cuda()
# prior networks (with transformer)
prior_network = DiffusionPriorNetwork(
    dim = 512,
    depth = 6,
    dim_head = 64,
    heads = 8
).cuda()
diffusion_prior = DiffusionPrior(
    net = prior_network,
    clip = clip,
    timesteps = 100,
    cond_drop_prob = 0.2
).cuda()
loss = diffusion_prior(text, images)
loss.backward()
# do above for many steps ...
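# (in practice this means looping over a large real dataset and stepping an
#  optimizer after every backward pass -- see the sketch in the reply above)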
# decoder (with unet)
unet1 = Unet(
    dim = 128,
    image_embed_dim = 512,
    cond_dim = 128,
    channels = 3,
    dim_mults = (1, 2, 4, 8)
).cuda()
unet2 = Unet(
    dim = 16,
    image_embed_dim = 512,
    cond_dim = 128,
    channels = 3,
    dim_mults = (1, 2, 4, 8, 16)
).cuda()
decoder = Decoder(
    unet = (unet1, unet2),
    image_sizes = (128, 256),
    clip = clip,
    timesteps = 100,
    image_cond_drop_prob = 0.1,
    text_cond_drop_prob = 0.5,
    condition_on_text_encodings = False # set this to True if you wish to condition on text during training and sampling
).cuda()
for unet_number in (1, 2):
    loss = decoder(images, unet_number = unet_number) # this can optionally be decoder(images, text) if you wish to condition on the text encodings as well, though it was hinted in the paper it didn't do much
    loss.backward()
# do above for many steps
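# (likewise: many optimizer steps per unet over real data, not a single pass)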
dalle2 = DALLE2(
    prior = diffusion_prior,
    decoder = decoder
)
generating images:
images = dalle2(
    ['a butterfly trying to escape a tornado'],
    cond_scale = 2. # classifier free guidance strength (> 1 would strengthen the condition)
)
and trying to save:
from torchvision.utils import save_image
save_image(images[0], 'img.png')
but the img.png is just plain noise... What am I missing here? Can anyone please tell me? I just want to try out the code; I am new to ML.
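For reference, the saving step itself is not the problem. A minimal sketch, assuming the sampler returns a (batch, 3, height, width) tensor with values in [0, 1] (the min-max normalization is only needed for tensors in an arbitrary range):

from torchvision.utils import save_image

save_image(images[0], 'img.png')          # save the first image in the batch
save_image(images, 'grid.png', nrow = 2)  # save the whole batch as one grid

# for a tensor that is not already in [0, 1], min-max normalize it first
img = images[0]
img = (img - img.min()) / (img.max() - img.min() + 1e-8)
save_image(img, 'img_normalized.png')

The noise in img.png comes from the untrained model, not from save_image.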
okay, so given we use clip model from openai,