
option to replace noise during sample #170

Closed
deepglugs opened this issue Aug 8, 2022 · 33 comments

@deepglugs (Contributor)

The idea is that it would allow some of the functionality of cond_image without any training. This has been tested by m9 (discord).

Something like:

imagen.sample(texts, noise=an_image_tensor)
@lucidrains (Owner)

@deepglugs i'm not sure i understand

could you point to the line of code that you are trying to substitute with your own noise. and how does it relate to the conditioning image?

@deepglugs (Contributor, Author)

Afaiu, the cond_image is concatenated onto the noise. This argument would replace the noise altogether with a "starting image" of sorts.
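Schematically, the distinction looks like this (illustrative tensors only, a sketch of the idea rather than imagen-pytorch's actual internals):

import torch

noise = torch.randn(1, 3, 64, 64)
image = torch.rand(1, 3, 64, 64) * 2 - 1  # an image normalized to [-1, 1]

# cond_image conditioning: the image rides along as extra input channels
x = torch.cat((noise, image), dim = 1)

# proposed option: the image itself replaces the starting noise entirely
x = image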

@Nodja (Contributor) commented Aug 8, 2022

I've seen this before in other diffusion models and it works well; it's usually called init_image. Having a skip_steps parameter would also be useful: it would skip the first x steps of sampling.

edit: skip_steps would only apply to normal diffusion, as the elucidated sampler works differently

@lucidrains (Owner)

@deepglugs ohh got it, but why wouldn't you use the inpainting feature (repaint)? that should work a lot better

@lucidrains (Owner)

@Nodja ohh cool, i wasn't aware of this technique

you would manually noise the init image and guess at which timestep to start at? that doesn't sound very rigorous

does it work?

@lucidrains (Owner)

https://github.com/lucidrains/imagen-pytorch#inpainting this should be state of the art for inpainting, if that is what you are trying to achieve

@lucidrains (Owner)

@Nodja @deepglugs i could add it for exploration purposes i suppose

it is such a young technology, we have no idea what is possible yet

@lucidrains (Owner) commented Aug 8, 2022

tell you what, i need to walk Ice Cream and run some errands downtown, but post some of the results that m9 is seeing with this approach (or link me to the paper) and i can start thinking about how to fit in this feature (but only if the results look good). if there is no associated paper, it would maybe also require a tiny writeup for the readme explaining how to use it

@marunine commented Aug 8, 2022

I don't know if there's a paper specifically, but I can comment on the approach I took in lieu of inpainting, which I couldn't get working with satisfactory results, unfortunately.

https://github.com/marunine/imagen-pytorch/blob/b36bceb853d3110e92332dacf19952068fda7ae9/imagen_pytorch/elucidated_imagen.py#L348

This adds the initial noise onto an initial image. The sigma min/max let you control the sampling schedule and how much noise ends up getting added to the image. It leverages the model's ability to denoise images, just as it does during training, but is more flexible than cond_image because you can scale how much of the initial image is kept via your sigma_max.

I've tried skipping sampling steps, but determined it wasn't necessary because of the control elucidated gives you. The regular Imagen would need it, but I haven't tested it.
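For anyone following along, a minimal sketch of the idea, assuming an elucidated / EDM-style sampler whose starting latent is normally pure noise scaled by sigma_max (an illustration under that assumption, not the linked code verbatim):

import torch

def initial_latent(sigma_max, shape, init_image = None):
    # vanilla elucidated sampling starts from sigma_max * noise;
    # seeding with an init image adds that noise on top of the image,
    # so lowering sigma_max preserves more of the init image.
    # init_image is assumed to be normalized to [-1, 1]
    if init_image is None:
        return sigma_max * torch.randn(shape)
    return init_image + sigma_max * torch.randn_like(init_image)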

@lucidrains (Owner)

@marunine ohh maru"9"nine! haha yes i know you!

so i think the repaint paper actually does exactly what you converged on doing

they renoise the conditioning image in the unmasked region to the appropriate noise level for a given timestep. then they repeat that a couple times to harmonize (thus re-paint). i think it should be much better than the other types of inpainting techniques out there, if i coded it up correctly
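Roughly, the RePaint resampling loop (Lugmayr et al. 2022) does the following per timestep; q_sample (forward noising) and p_sample (one reverse step) are placeholder callables for illustration, not imagen-pytorch's API:

def repaint_step(x_t, t, cond_image, mask, q_sample, p_sample, resamples = 5):
    # mask == 1 marks the known region taken from cond_image
    for u in range(resamples):
        known = q_sample(cond_image, t - 1)            # re-noise known pixels to level t - 1
        unknown = p_sample(x_t, t)                     # one reverse diffusion step on the rest
        x_prev = mask * known + (1 - mask) * unknown   # stitch the two regions together
        if u < resamples - 1:
            x_t = q_sample(x_prev, t, from_t = t - 1)  # diffuse back one step, then resample ("harmonize")
    return x_prev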

@marunine commented Aug 8, 2022

I was under the impression that inpaint should be equivalent to this if you pass all zeroes as the mask, but in my experience I ended up getting gray images or noisy images at 1/5 resample loops and equivalent sigma schedule. At the normal sigma schedule it ended up ignoring the initial image altogether, but I suppose that's expected.

I didn't carefully review the code to see if there's a bug somewhere in there, but I agree that it should be more fully functional than what I ended up doing.

@lucidrains (Owner)

@marunine oh, maybe i misunderstood the purpose here

are you trying to lightly condition the generation by giving it a subliminal image at the very start?

@lucidrains (Owner)

it sounds like it isn't working anyways, so maybe we should not add it

however, let's definitely reopen an issue for inpainting, if repaint is not working as expected

@deepglugs (Contributor, Author)

> it sounds like it isn't working anyways, so maybe we should not add it

Which part isn't working? the init_image?

@lucidrains (Owner)

@deepglugs

> I don't know if there's a paper specifically, but I can comment on the approach I took in lieu of inpainting, which I couldn't get working with satisfactory results, unfortunately.

@deepglugs (Contributor, Author)

ah, the in-painting isn't working for this use-case. I see.

@marunine commented Aug 8, 2022

@deepglugs

> I don't know if there's a paper specifically, but I can comment on the approach I took in lieu of inpainting, which I couldn't get working with satisfactory results, unfortunately.

I meant that I couldn't get inpainting to work. The init_images method works better than I expected: it can pick up pose, style, etc. based on how much noise you apply through the sigma schedule. The text caption can be used to guide what you change on that initial image.

It's a little bit more flexible than the binary mask of the inpainting in that regard, but I think inpainting is the better approach overall provided the implementation works.

@lucidrains (Owner) commented Aug 8, 2022

@marunine @deepglugs can you upload some examples of the success you are seeing with init_image technique? seeing is believing

and thanks for letting me know that repaint isn't working that well

@marunine commented Aug 8, 2022

<images removed for now>

@lucidrains (Owner)

@marunine ohhh i see, yea, this type of cartoon image is challenging for inpainting. unless one trains with pretty much the whole internet

@marunine commented Aug 8, 2022

The init image approach shines for style transfer. You can make inpainting work if it understands how to add or subtract parts of the image, but it's probably better to just specify the mask and replace it with noise.

I'll let others comment on their experience with inpainting since their version of the repo is closer to your upstream. Mine is a little out of date and I might have made some mistakes when I backported it as a test.

@lucidrains (Owner)

@marunine @deepglugs ok! i'll definitely consider it later this week when i get back to ddpms

thank you for sharing this!

@Nodja (Contributor) commented Aug 8, 2022

The place I saw init_image used previously was the laion logo generator on replicate (linked here, along with its github repo). (I'm linking the old version on replicate, as init_skip_fraction seems to be broken in the newer ones.)

Here's a simple demo of what init_image can accomplish; all values were left default unless specified. The init image I used is on the left of the screenshot, just something I quickly drew in Paint.


No init image: [image]

init_image with 0.02 init_skip_fraction (doesn't accept 0): [image]

init_image with 0.1 init_skip_fraction: [image]

init_image with 0.2 init_skip_fraction: [image]

init_image with 0.3 init_skip_fraction: [image]

As you can see, the more steps you skip, the bigger the influence the init_image has on the final result, to the point that it's mostly a straight-up copy at values above 0.5.
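For standard (DDPM-style) sampling, the mechanism behind those numbers looks roughly like this; the helper name is illustrative, not the demo's actual code:

import torch

def seed_from_init_image(init_image, alphas_cumprod, init_skip_fraction):
    # skipping a fraction of the schedule means the init image only gets
    # noised up to the remaining timestep, so the larger the fraction,
    # the more of the init image survives into the final sample
    num_timesteps = len(alphas_cumprod)
    t_start = max(int((1 - init_skip_fraction) * num_timesteps) - 1, 0)
    a = alphas_cumprod[t_start]
    noise = torch.randn_like(init_image)
    x_t = a.sqrt() * init_image + (1 - a).sqrt() * noise  # standard forward noising to t_start
    return x_t, t_start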

@lucidrains (Owner)

ok I'm convinced! I'll add it later this week with credit to whoever discovered this

@lucidrains (Owner) commented Aug 9, 2022

for the example above, is the initial image normalized to -1 to 1 before summing the noise?

@Nodja (Contributor) commented Aug 9, 2022

If you're talking about the latent-diffusion demo, this (linked) seems to be the relevant code. (I'm not familiar enough with vqgan to give a straight yes/no.)
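For what it's worth, the usual convention in these codebases is a simple rescale (the general pattern, not a verified reading of the linked code):

init = image_in_0_1 * 2.0 - 1.0  # map [0, 1] pixel values to the [-1, 1] range the model expects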

@lucidrains (Owner)

yup, it is appropriately normalized! thanks!

@lucidrains (Owner)

@deepglugs do you want to see if 1.7.0 works? 37953b2. also welcoming any PRs with a small tutorial on how to use it (and how to effectively choose the number of steps to skip)
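For anyone trying this, a hedged sketch of the new call; the init_images and skip_steps keyword names are inferred from this thread and commit 37953b2, so double-check them against the current readme:

images = imagen.sample(
    texts = ['a photo of a corgi'],   # hypothetical prompt
    init_images = init_image_tensor,  # the starting image in place of pure noise
    skip_steps = 100                  # initial denoising steps to skip (non-elucidated)
)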

@deepglugs (Contributor, Author)

Sure. I'll give it a shot.

@deepglugs (Contributor, Author)

Looks like there might be a typo in configs.py with the 'video' keyword?

Traceback (most recent call last):
  File "/home/kev/ai/src/deep-imagen/imagen.py", line 731, in <module>
    main()
  File "/home/kev/ai/src/deep-imagen/imagen.py", line 134, in main
    sample(args)
  File "/home/kev/ai/src/deep-imagen/imagen.py", line 141, in sample
    imagen = load(args.imagen).to(args.device)
  File "/home/kev/ai/src/deep-imagen/imagen.py", line 217, in load
    imagen = load_imagen_from_checkpoint(path)
  File "/home/kev/anaconda3/envs/ai/lib/python3.10/site-packages/imagen_pytorch/utils.py", line 37, in load_imagen_from_checkpoint
    imagen = imagen_klass(**imagen_params).create()
  File "/home/kev/anaconda3/envs/ai/lib/python3.10/site-packages/imagen_pytorch/configs.py", line 125, in create
    imagen = ElucidatedImagen(unets, **decoder_kwargs)
TypeError: ElucidatedImagen.__init__() got an unexpected keyword argument 'video'

@lucidrains (Owner)

@deepglugs 😅 9c6ff50

@deepglugs (Contributor, Author) commented Aug 10, 2022

Looks like I need to set sigma_max (for Elucidated) down to something really low to get the sampled images to come out resembling the init image:

init_image and sigma_max=1.0: [image]

I wasn't able to go below 1.0, however; 0.1, 0.5, and 0.75 resulted in black images. It would be nice to be able to tune sigma_min/max and the number of sample steps inside of the sample function. You can't adjust these if you are using load_imagen_from_checkpoint(). I ended up having to construct a new ElucidatedImagen instance and then manually torch.load the model state dict.

Super cool that we can do this though :)
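The workaround described above looks roughly like this; the unet hyperparameters, checkpoint path, and checkpoint keys are placeholders that must match whatever was actually trained:

import torch
from imagen_pytorch import Unet, ElucidatedImagen

unet1 = Unet(dim = 128)  # must mirror the trained unet's configuration

imagen = ElucidatedImagen(
    unets = (unet1,),
    image_sizes = (64,),
    sigma_max = 1.0,       # the knob being tuned here
    num_sample_steps = 64
)

ckpt = torch.load('./checkpoint.pt')   # hypothetical path
imagen.load_state_dict(ckpt['model'])  # assumed checkpoint layout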

@lucidrains (Owner)

@deepglugs here you go! a936ea7

AIDevMonster added a commit to AIDevMonster/Text-to-Image-Neural-Network-Pytorch that referenced this issue Jun 27, 2023
whiteghostDev added a commit to whiteghostDev/Text-to-Image-Neural-Network-Pytorch that referenced this issue Aug 6, 2023