
option to replace noise during sample #170

Closed
deepglugs opened this issue Aug 8, 2022 · 33 comments

@deepglugs (Contributor)

The idea is that it would allow some of the functionality of cond_image without any training. This has been tested by m9 (discord).

Something like:

imagen.sample(texts, noise=an_image_tensor)
@lucidrains (Owner)

@deepglugs i'm not sure i understand

could you point to the line of code that you are trying to substitute with your own noise. and how does it relate to the conditioning image?

@deepglugs (Contributor, Author)

Afaiu, the cond_image is concatenated onto the noise. This argument would replace the noise altogether with a "starting image" of sorts.
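Schematically, the distinction looks like this (illustrative tensors only, a sketch of the idea rather than imagen-pytorch's actual internals):

import torch

noise = torch.randn(1, 3, 64, 64)
image = torch.rand(1, 3, 64, 64) * 2 - 1  # an image normalized to [-1, 1]

# cond_image conditioning: the image rides along as extra input channels
x = torch.cat((noise, image), dim = 1)

# proposed option: the image itself replaces the starting noise entirely
x = image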

@Nodja (Contributor) commented Aug 8, 2022

I've seen this before in other diffusion models and it works well; it's usually called init_image. Having a skip_steps parameter would also be useful: it would skip the first x steps of sampling.

edit: skip_steps would only apply to normal diffusion, as the elucidated sampler works differently

@lucidrains (Owner)

@deepglugs ohh got it, but why wouldn't you use the inpainting feature (repaint)? that should work a lot better

@lucidrains (Owner)

@Nodja ohh cool, i wasn't aware of this technique

you would manually noise the init image and guess at which timestep to start at? that doesn't sound very rigorous

does it work?

@lucidrains (Owner)

https://github.com/lucidrains/imagen-pytorch#inpainting this should be state of the art for inpainting, if that is what you are trying to achieve

@lucidrains (Owner)

@Nodja @deepglugs i could add it for exploration purposes i suppose

it is such a young technology, we have no idea what is possible yet

@lucidrains (Owner) commented Aug 8, 2022

tell you what, i need to walk Ice Cream and run some errands downtown, but post some of the results that m9 is seeing with this approach (or link me to the paper) and i can start thinking about how to fit in this feature (but only if the results look good). if there is no associated paper, it would maybe also require a tiny writeup for the readme explaining how to use it

@marunine commented Aug 8, 2022

I don't know if there's a paper specifically, but I can comment on the approach I took in lieu of inpainting, which I couldn't get working with satisfactory results, unfortunately.

https://github.com/marunine/imagen-pytorch/blob/b36bceb853d3110e92332dacf19952068fda7ae9/imagen_pytorch/elucidated_imagen.py#L348

This adds the initial noise onto an initial image. The sigma min/max let you control the sampling schedule and how much noise ends up getting added to the image. It leverages the model's ability to denoise images, just as it does during training, but is more flexible than cond_image because you can scale how much of the initial image is kept via your sigma_max.

I've tried skipping sampling steps, but determined it wasn't necessary because of the control elucidated gives you. The regular Imagen would need it, but I haven't tested it.
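For anyone following along, a minimal sketch of the idea, assuming an elucidated / EDM-style sampler whose starting latent is normally pure noise scaled by sigma_max (an illustration under that assumption, not the linked code verbatim):

import torch

def initial_latent(sigma_max, shape, init_image = None):
    # vanilla elucidated sampling starts from sigma_max * noise;
    # seeding with an init image adds that noise on top of the image,
    # so lowering sigma_max preserves more of the init image.
    # init_image is assumed to be normalized to [-1, 1]
    if init_image is None:
        return sigma_max * torch.randn(shape)
    return init_image + sigma_max * torch.randn_like(init_image)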

@lucidrains (Owner)

@marunine ohh maru"9"nine! haha yes i know you!

so i think the repaint paper actually does exactly what you converged on doing

they renoise the conditioning image in the unmasked region to the appropriate noise level for a given timestep. then they repeat that a couple times to harmonize (thus re-paint). i think it should be much better than the other types of inpainting techniques out there, if i coded it up correctly
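Roughly, the RePaint resampling loop (Lugmayr et al. 2022) does the following per timestep; q_sample (forward noising) and p_sample (one reverse step) are placeholder callables for illustration, not imagen-pytorch's API:

def repaint_step(x_t, t, cond_image, mask, q_sample, p_sample, resamples = 5):
    # mask == 1 marks the known region taken from cond_image
    for u in range(resamples):
        known = q_sample(cond_image, t - 1)            # re-noise known pixels to level t - 1
        unknown = p_sample(x_t, t)                     # one reverse diffusion step on the rest
        x_prev = mask * known + (1 - mask) * unknown   # stitch the two regions together
        if u < resamples - 1:
            x_t = q_sample(x_prev, t, from_t = t - 1)  # diffuse back one step, then resample ("harmonize")
    return x_prev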

@marunine commented Aug 8, 2022

I was under the impression that inpaint should be equivalent to this if you pass all zeroes as the mask, but in my experience I ended up getting gray images or noisy images at 1/5 resample loops and equivalent sigma schedule. At the normal sigma schedule it ended up ignoring the initial image altogether, but I suppose that's expected.

I didn't carefully review the code to see if there's a bug somewhere in there, but I agree that it should be more fully functional than what I ended up doing.

@lucidrains (Owner)

@marunine oh, maybe i misunderstood the purpose here

are you trying to lightly condition the generation by giving it a subliminal image at the very start?

@lucidrains (Owner)

it sounds like it isn't working anyways, so maybe we should not add it

however, let's definitely reopen an issue for inpainting, if repaint is not working as expected

@deepglugs (Contributor, Author)

> it sounds like it isn't working anyways, so maybe we should not add it

Which part isn't working? the init_image?

@lucidrains (Owner)

@deepglugs

> I don't know if there's a paper specifically, but I can comment on the approach I took in lieu of inpainting, which I couldn't get working with satisfactory results, unfortunately.

@deepglugs (Contributor, Author)

ah, the in-painting isn't working for this use-case. I see.

@marunine commented Aug 8, 2022

@deepglugs

> I don't know if there's a paper specifically, but I can comment on the approach I took in lieu of inpainting, which I couldn't get working with satisfactory results, unfortunately.

I meant that I couldn't get inpainting to work. The init_images method works better than I expected: it can pick up pose, style, etc. based on how much noise you apply through the sigma schedule. The text caption can be used to guide what you change on that initial image.

It's a little bit more flexible than the binary mask of the inpainting in that regard, but I think inpainting is the better approach overall provided the implementation works.

@lucidrains (Owner) commented Aug 8, 2022

@marunine @deepglugs can you upload some examples of the success you are seeing with init_image technique? seeing is believing

and thanks for letting me know that repaint isn't working that well

@marunine commented Aug 8, 2022

<images removed for now>

@lucidrains (Owner)

@marunine ohhh i see, yea, this type of cartoon image is challenging for inpainting. unless one trains with pretty much the whole internet

@marunine commented Aug 8, 2022

The init image approach shines for style transfer. You can make inpainting work if it understands how to add or subtract parts of the image, but it's probably better to just specify the mask and replace it with noise.

I'll let others comment on their experience with inpainting since their version of the repo is closer to your upstream. Mine is a little out of date and I might have made some mistakes when I backported it as a test.

@lucidrains (Owner)

@marunine @deepglugs ok! i'll definitely consider it later this week when i get back to ddpms

thank you for sharing this!

@Nodja (Contributor) commented Aug 8, 2022

The place I saw init_image used previously was the laion logo generator on replicate (linked here, along with its github repo). (I'm linking the old version on replicate, as init_skip_fraction seems to be broken in the newer ones.)

Here's a simple demo of what init_image can accomplish; all values were left default unless specified. The init image I used is on the left of the screenshot, just something I quickly drew in Paint.


No init image: [image]

init_image with 0.02 init_skip_fraction (doesn't accept 0): [image]

init_image with 0.1 init_skip_fraction: [image]

init_image with 0.2 init_skip_fraction: [image]

init_image with 0.3 init_skip_fraction: [image]

As you can see, the more steps you skip, the bigger the influence the init_image has on the final result, to the point that it's mostly a straight-up copy at values above 0.5.
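For standard (DDPM-style) sampling, the mechanism behind those numbers looks roughly like this; the helper name is illustrative, not the demo's actual code:

import torch

def seed_from_init_image(init_image, alphas_cumprod, init_skip_fraction):
    # skipping a fraction of the schedule means the init image only gets
    # noised up to the remaining timestep, so the larger the fraction,
    # the more of the init image survives into the final sample
    num_timesteps = len(alphas_cumprod)
    t_start = max(int((1 - init_skip_fraction) * num_timesteps) - 1, 0)
    a = alphas_cumprod[t_start]
    noise = torch.randn_like(init_image)
    x_t = a.sqrt() * init_image + (1 - a).sqrt() * noise  # standard forward noising to t_start
    return x_t, t_start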

@lucidrains (Owner)

ok I'm convinced! I'll add it later this week with credit to whoever discovered this

@lucidrains (Owner) commented Aug 9, 2022

for the example above, is the initial image normalized to -1 to 1 before summing the noise?

@Nodja (Contributor) commented Aug 9, 2022

If you're talking about the latent-diffusion demo, this (linked) seems to be the relevant code. (I'm not familiar enough with vqgan to give a straight yes/no.)
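For what it's worth, the usual convention in these codebases is a simple rescale (the general pattern, not a verified reading of the linked code):

init = image_in_0_1 * 2.0 - 1.0  # map [0, 1] pixel values to the [-1, 1] range the model expects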

@lucidrains (Owner)

yup, it is appropriately normalized! thanks!

@lucidrains (Owner)

@deepglugs do you want to see if 1.7.0 works? 37953b2. also welcoming any PRs with a small tutorial on how to use it (and how to effectively choose the number of steps to skip)
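For anyone trying this, a hedged sketch of the new call; the init_images and skip_steps keyword names are inferred from this thread and commit 37953b2, so double-check them against the current readme:

images = imagen.sample(
    texts = ['a photo of a corgi'],   # hypothetical prompt
    init_images = init_image_tensor,  # the starting image in place of pure noise
    skip_steps = 100                  # initial denoising steps to skip (non-elucidated)
)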

@deepglugs (Contributor, Author)

Sure. I'll give it a shot.

@deepglugs (Contributor, Author)

Looks like there might be a typo in configs.py with the 'video' keyword?

Traceback (most recent call last):
  File "/home/kev/ai/src/deep-imagen/imagen.py", line 731, in <module>
    main()
  File "/home/kev/ai/src/deep-imagen/imagen.py", line 134, in main
    sample(args)
  File "/home/kev/ai/src/deep-imagen/imagen.py", line 141, in sample
    imagen = load(args.imagen).to(args.device)
  File "/home/kev/ai/src/deep-imagen/imagen.py", line 217, in load
    imagen = load_imagen_from_checkpoint(path)
  File "/home/kev/anaconda3/envs/ai/lib/python3.10/site-packages/imagen_pytorch/utils.py", line 37, in load_imagen_from_checkpoint
    imagen = imagen_klass(**imagen_params).create()
  File "/home/kev/anaconda3/envs/ai/lib/python3.10/site-packages/imagen_pytorch/configs.py", line 125, in create
    imagen = ElucidatedImagen(unets, **decoder_kwargs)
TypeError: ElucidatedImagen.__init__() got an unexpected keyword argument 'video'

@lucidrains (Owner)

@deepglugs 😅 9c6ff50

@deepglugs (Contributor, Author) commented Aug 10, 2022

Looks like I need to set sigma_max (for Elucidated) down to something really low to get the sampled images to come out resembling the init image:

init_image and sigma_max=1.0: [image]

I wasn't able to go below 1.0, however; 0.1, 0.5, and 0.75 resulted in black images. It would be nice to be able to tune sigma_min/max and the number of sample steps inside of the sample function. You can't adjust these if you are using load_imagen_from_checkpoint(). I ended up having to construct a new ElucidatedImagen instance and then manually torch.load the model state dict.

Super cool that we can do this though :)
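The workaround described above looks roughly like this; the unet hyperparameters, checkpoint path, and checkpoint keys are placeholders that must match whatever was actually trained:

import torch
from imagen_pytorch import Unet, ElucidatedImagen

unet1 = Unet(dim = 128)  # must mirror the trained unet's configuration

imagen = ElucidatedImagen(
    unets = (unet1,),
    image_sizes = (64,),
    sigma_max = 1.0,       # the knob being tuned here
    num_sample_steps = 64
)

ckpt = torch.load('./checkpoint.pt')   # hypothetical path
imagen.load_state_dict(ckpt['model'])  # assumed checkpoint layout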

@lucidrains (Owner)

@deepglugs here you go! a936ea7

AIDevMonster added a commit to AIDevMonster/Text-to-Image-Neural-Network-Pytorch that referenced this issue Jun 27, 2023
whiteghostDev added a commit to whiteghostDev/Text-to-Image-Neural-Network-Pytorch that referenced this issue Aug 6, 2023