
While running the clip_guided notebook in CPU mode I get: "RuntimeError - Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.FloatTensor instead" #28

Closed · illtellyoulater opened this issue Feb 25, 2022 · 8 comments

@illtellyoulater

When I run the clip_guided notebook in CPU mode, I get the following error at the "Sample from the base model" cell:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_9272/4093479580.py in <module>
     20 # Sample from the base model.
     21 model.del_cache()
---> 22 samples = diffusion.p_sample_loop(
     23     model,
     24     (batch_size, 3, options["image_size"], options["image_size"]),

c:\users\alf\downloads\glide-text2im\glide_text2im\gaussian_diffusion.py in p_sample_loop(self, model, shape, noise, clip_denoised, denoised_fn, cond_fn, model_kwargs, device, progress)
    387         """
    388         final = None
--> 389         for sample in self.p_sample_loop_progressive(
    390             model,
    391             shape,

c:\users\alf\downloads\glide-text2im\glide_text2im\gaussian_diffusion.py in p_sample_loop_progressive(self, model, shape, noise, clip_denoised, denoised_fn, cond_fn, model_kwargs, device, progress)
    439             t = th.tensor([i] * shape[0], device=device)
    440             with th.no_grad():
--> 441                 out = self.p_sample(
    442                     model,
    443                     img,

c:\users\alf\downloads\glide-text2im\glide_text2im\gaussian_diffusion.py in p_sample(self, model, x, t, clip_denoised, denoised_fn, cond_fn, model_kwargs)
    351         )  # no noise when t == 0
    352         if cond_fn is not None:
--> 353             out["mean"] = self.condition_mean(cond_fn, out, x, t, model_kwargs=model_kwargs)
    354         sample = out["mean"] + nonzero_mask * th.exp(0.5 * out["log_variance"]) * noise
    355         return {"sample": sample, "pred_xstart": out["pred_xstart"]}

c:\users\alf\downloads\glide-text2im\glide_text2im\respace.py in condition_mean(self, cond_fn, *args, **kwargs)
     95 
     96     def condition_mean(self, cond_fn, *args, **kwargs):
---> 97         return super().condition_mean(self._wrap_model(cond_fn), *args, **kwargs)
     98 
     99     def condition_score(self, cond_fn, *args, **kwargs):

c:\users\alf\downloads\glide-text2im\glide_text2im\gaussian_diffusion.py in condition_mean(self, cond_fn, p_mean_var, x, t, model_kwargs)
    287         This uses the conditioning strategy from Sohl-Dickstein et al. (2015).
    288         """
--> 289         gradient = cond_fn(x, t, **model_kwargs)
    290         new_mean = p_mean_var["mean"].float() + p_mean_var["variance"] * gradient.float()
    291         return new_mean

c:\users\alf\downloads\glide-text2im\glide_text2im\respace.py in __call__(self, x, ts, **kwargs)
    122         new_ts_2 = map_tensor[ts.ceil().long()]
    123         new_ts = th.lerp(new_ts_1, new_ts_2, frac)
--> 124         return self.model(x, new_ts, **kwargs)

c:\users\alf\downloads\glide-text2im\glide_text2im\clip\model_creation.py in cond_fn(x, t, grad_scale, **kwargs)
     57             with torch.enable_grad():
     58                 x_var = x.detach().requires_grad_(True)
---> 59                 z_i = self.image_embeddings(x_var, t)
     60                 loss = torch.exp(self.logit_scale) * (z_t * z_i).sum()
     61                 grad = torch.autograd.grad(loss, x_var)[0].detach()

c:\users\alf\downloads\glide-text2im\glide_text2im\clip\model_creation.py in image_embeddings(self, images, t)
     47 
     48     def image_embeddings(self, images: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
---> 49         z_i = self.image_encoder((images + 1) * 127.5, t)
     50         return z_i / (torch.linalg.norm(z_i, dim=-1, keepdim=True) + 1e-12)
     51 

~\.conda\envs\glide-text2im\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

c:\users\alf\downloads\glide-text2im\glide_text2im\clip\encoders.py in forward(self, image, timesteps, return_probe_features)
    483     ) -> torch.Tensor:
    484         n_batch = image.shape[0]
--> 485         h = self.blocks["input"](image, t=timesteps)
    486 
    487         for i in range(self.n_xf_blocks):

~\.conda\envs\glide-text2im\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

c:\users\alf\downloads\glide-text2im\glide_text2im\clip\encoders.py in forward(self, x, t)
    124             self.pred_state[None, None].expand(x.shape[0], -1, -1)
    125             if self.n_timestep == 0
--> 126             else F.embedding(cast(torch.Tensor, t), self.w_t)[:, None]
    127         )
    128         x = torch.cat((sot, x), dim=1) + self.w_pos[None]

~\.conda\envs\glide-text2im\lib\site-packages\torch\nn\functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
   1850         # remove once script supports set_grad_enabled
   1851         _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 1852     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
   1853 
   1854 

RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.FloatTensor instead (while checking arguments for embedding)

Can anyone help?
Thanks!
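
For reference, the dtype mismatch at the bottom of the traceback can be reproduced in isolation: F.embedding only accepts integer (Long) indices, while the th.lerp call in respace.py produces a float tensor. A minimal standalone sketch, independent of the GLIDE code (w_t here is just a stand-in for the encoder's embedding table):

import torch
import torch.nn.functional as F

w_t = torch.randn(10, 4)          # stand-in for an embedding table like self.w_t
t = torch.lerp(torch.tensor([3.0]),
               torch.tensor([7.0]), 0.5)  # float timestep, as produced in respace.py

# F.embedding(t, w_t)             # RuntimeError: indices must have scalar type Long
z = F.embedding(t.long(), w_t)    # casting the indices to Long makes the lookup work
print(z.shape)                    # torch.Size([1, 4])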

@woctezuma

woctezuma commented Mar 4, 2022

Not sure what is happening here, but you should try to use a GPU if possible.

See the comment in the notebook:

# This notebook supports both CPU and GPU.
# On CPU, generating one sample may take on the order of 20 minutes.
# On a GPU, it should be under a minute.

CPU mode takes roughly 20 times longer than GPU mode.

@illtellyoulater (Author)

I know, but my current GPU doesn't have enough VRAM... that's why I was running in CPU mode.
I'm getting a new GPU soon, but I think it would still be cool if this worked on CPU...

@woctezuma

Yes, sure. In the meantime, try to use a free GPU on Google Colab.

@illtellyoulater (Author)

@woctezuma I finally got hold of a new GPU with 6 GB of VRAM... so I am now running the clip_guided notebook again in GPU mode, but I am seeing exactly the same error I documented above...

@illtellyoulater (Author)

Thanks! I saw them already, but I don't have the necessary knowledge of ML and the related libraries to make proper use of them...
I also tried somewhat blindly playing with those types and their conversions, but without success...
Honestly, I find it very unlikely that I can come up with something useful just by myself... 🤷‍♂️

@woctezuma

woctezuma commented Mar 8, 2022

It could be just a simple change to this line:

sot = (
    self.pred_state[None, None].expand(x.shape[0], -1, -1)
    if self.n_timestep == 0
    else F.embedding(cast(torch.Tensor, t), self.w_t)[:, None]
)

You could try to replace:

F.embedding(cast(torch.Tensor, t), self.w_t)

with either:

F.embedding(cast(torch.Tensor, t.long()), self.w_t)

or:

F.embedding(cast(torch.Tensor, t).long(), self.w_t)
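
Both variants should behave identically at runtime, since typing.cast is a no-op and each one ends up calling .long() on t before the lookup. Note that .long() truncates the fractional timestep produced by th.lerp in respace.py, which seems acceptable here because F.embedding needs an integer row index anyway. A sketch of the resulting block in encoders.py, assuming the second variant:

sot = (
    self.pred_state[None, None].expand(x.shape[0], -1, -1)
    if self.n_timestep == 0
    # cast the (possibly float) timesteps to Long so F.embedding gets valid indices
    else F.embedding(cast(torch.Tensor, t).long(), self.w_t)[:, None]
)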

@illtellyoulater (Author)

illtellyoulater commented Mar 8, 2022

OK, thanks! Now it works, at least in CPU mode!
In GPU mode a completely black image is generated (at some point the tensors become NaN), but I'll open another thread for that, as it must be caused by a different problem.
