option to replace noise during sample #170
The idea is that it would allow some of the functionality of cond_image without any training. This has been tested by m9 (discord). Something like:
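```python
import torch

# sketch of the proposed call; the keyword name is just a placeholder,
# and `imagen` is an already-instantiated (Elucidated)Imagen
my_starting_image = torch.randn(1, 3, 64, 64).clamp(-1, 1)  # stand-in for a real image in [-1, 1]

images = imagen.sample(
    texts = ['a photo of a dog running on the beach'],
    init_images = my_starting_image,  # used in place of the randomly sampled starting noise
    cond_scale = 3.
)
```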
@deepglugs i'm not sure i understand. could you point to the line of code where you are trying to substitute your own noise? and how does it relate to the conditioning image? |
Afaiu, the cond_image is concatenated onto the noise. This argument would replace the noise altogether with a "starting image" of sorts. |
I've seen this before in other diffusion models and it works well; it's usually called init_image. Having a skip_steps parameter would also be useful: it would skip the first x steps of sampling. Edit: skip_steps would only apply to normal diffusion, as elucidated works differently. |
@deepglugs ohh got it, but why wouldn't you use the inpainting feature (repaint)? that should work a lot better |
@Nodja ohh cool, i wasn't aware of this technique. you would manually noise the init image and guess at which timestep to start at? that doesn't sound very rigorous. does it work? |
https://github.com/lucidrains/imagen-pytorch#inpainting this should be state of the art for inpainting, if that is what you are trying to achieve |
@Nodja @deepglugs i could add it for exploration purposes, i suppose. it is such a young technology, we have no idea what is possible yet |
tell you what, i need to walk Ice Cream and run some errands downtown, but post some of the results that m9 is seeing with this approach (or link me to the paper) and i can start thinking about how to fit in this feature (but only if the results look good). if there is no associated paper, it would maybe also require a tiny writeup for the readme explaining how to use it |
I don't know if there's a paper specifically, but I can comment on the approach I took in lieu of inpainting, which I couldn't get working with satisfactory results, unfortunately. The approach adds the initial noise onto an initial image. The sigma min/max let you control the sampling schedule and how much noise ends up getting added onto the image. This leverages the model's ability to denoise images like it does during training, but is more flexible than cond_image because you can scale how much of the initial image you keep based on your sigma max. I've tried skipping sampling steps, but determined it wasn't necessary because of the control elucidated gives you. The regular imagen would need it, but I haven't tested that. |
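Roughly, the idea is this (a sketch, not my exact code; names are illustrative):

```python
import torch

def noised_init(init_image: torch.Tensor, sigma_start: float) -> torch.Tensor:
    # elucidated (EDM-style) sampling normally starts from pure noise, x = sigma_max * noise.
    # starting from the init image plus noise at sigma_start instead lets the denoiser
    # "finish" the image; a smaller sigma_start keeps more of the init image in the result.
    return init_image + sigma_start * torch.randn_like(init_image)
```

The sampler loop then runs over the usual sigma schedule from sigma_start down to sigma_min, exactly as if the noised init image had been drawn as the starting point. |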
@marunine ohh maru"9"nine! haha yes i know you! so i think the repaint paper actually does exactly what you converged on doing. they renoise the conditioning image in the unmasked region to the appropriate noise level for a given timestep, then repeat that a couple of times to harmonize (thus re-paint). i think it should be much better than the other types of inpainting techniques out there, if i coded it up correctly |
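in pseudocode, a repaint loop looks something like this (the helper names are just stand-ins for the usual ddpm forward/reverse steps):

```python
for t in reversed(range(num_timesteps)):
    for r in range(resample_times):
        x_known = q_sample(cond_image, t)        # forward-noise the known pixels to level t
        x = mask * x_known + (1 - mask) * x      # mask = 1 where pixels are known / kept
        x = p_sample(model, x, t)                # one reverse (denoising) step, t -> t - 1
        if r < resample_times - 1 and t > 0:
            x = renoise_one_step(x, t)           # diffuse back up to level t to harmonize
```
|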
I was under the impression that inpaint should be equivalent to this if you pass all zeroes as the mask, but in my experience I ended up getting gray or noisy images at 1/5 resample loops and an equivalent sigma schedule. At the normal sigma schedule it ended up ignoring the initial image altogether, but I suppose that's expected. I didn't carefully review the code to see if there's a bug somewhere in there, but I agree that it should be more fully functional than what I ended up doing. |
@marunine oh, maybe i misunderstood the purpose here. are you trying to lightly condition the generation by giving it a subliminal image at the very start? |
it sounds like it isn't working anyways, so maybe we should not add it. however, let's definitely reopen an issue for inpainting, if repaint is not working as expected |
Which part isn't working? the init_image? |
ah, the inpainting isn't working for this use-case. I see.
I meant that I couldn't get inpainting to work. The init_images method works better than I expected: it can pick up pose, style, etc. based on how much noise you apply through the sigma schedule. The text caption can be used to guide what you change on that initial image. It's a little more flexible than the binary mask of inpainting in that regard, but I think inpainting is the better approach overall, provided the implementation works. |
@marunine @deepglugs can you upload some examples of the success you are seeing with this approach? and thanks for letting me know that repaint isn't working that well |
[marunine posts example images] |
@marunine ohhh i see, yea, this type of cartoon image is challenging for inpainting, unless one trains on pretty much the whole internet |
The init image approach shines for style transfer. You can make inpainting work if it understands how to add or subtract parts of the image, but it's probably better to just specify the mask and replace it with noise. I'll let others comment on their experience with inpainting since their version of the repo is closer to your upstream. Mine is a little out of date and I might have made some mistakes when I backported it as a test. |
@marunine @deepglugs ok! i'll definitely consider it later this week when i get back to ddpms. thank you for sharing this! |
The place I saw init_image used previously was the laion logo generator on replicate here, github repo. (Linking the old version on replicate, as init_skip_fraction seems to be broken in the newer ones.) Here's a simple demo of what init_image can accomplish, all values left at their defaults unless specified. The init image I used is on the left of the screenshot, just something I quickly drew in paint.

[Images: no init image; init_image with init_skip_fraction 0.02 (it doesn't accept 0); init_image with init_skip_fraction 0.1; init_image with init_skip_fraction 0.2; init_image with init_skip_fraction 0.3]

As you can see, the more steps you skip, the bigger the influence the init_image has on the final result, to the point that it's mostly a straight-up copy at values above 0.5. |
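Conceptually, the skip fraction works something like this (a sketch; q_sample/p_sample are stand-ins for the model's usual forward and reverse steps):

```python
def sample_with_init(model, init_image, num_timesteps, init_skip_fraction):
    # skip the earliest (noisiest) fraction of the reverse process and start
    # from the init image noised to the matching level, instead of pure noise
    t_start = int(num_timesteps * (1 - init_skip_fraction))
    x = q_sample(init_image, t_start)      # forward-noise the init image to timestep t_start
    for t in reversed(range(t_start)):
        x = p_sample(model, x, t)          # the usual reverse (denoising) steps
    return x
```

At init_skip_fraction = 0 this reduces to ordinary sampling from (nearly) pure noise, and as it approaches 1 the output is essentially the init image untouched. |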
ok I'm convinced! I'll add it later this week with credit to whoever discovered this |
for the example above, is the initial image normalized to -1 to 1 before the noise is added?
If you're talking about the latent-diffusion demo, that seems to be the relevant code. (I'm not familiar enough with vqgan to give a straight yes/no.)
yup, it is appropriately normalized! thanks! |
@deepglugs do you want to see if 1.7.0 works? 37953b2. also welcoming any PRs with a small tutorial on how to use it (and how to effectively choose the number of steps to skip)
Sure. I'll give it a shot. |
Looks like there might be a typo in config.py with the 'video' keyword?
Looks like I need to set sigma_max (for Elucidated) down to something really low to get the images to pop out looking like each other. I wasn't able to go below 1.0, however; 0.1, 0.5 and 0.75 resulted in black images. It would be nice to be able to tune sigma_min/max and the number of sample steps inside the sample function. You can't adjust these if you are using load_from_checkpoint(). I ended up having to construct a new ElucidatedImagen class and then manually torch.load the model state dict. Super cool that we can do this though :)
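For reference, the workaround looked roughly like this (a sketch from memory; the constructor arguments and the checkpoint layout are illustrative, so adjust them to match how your model and checkpoint were actually saved):

```python
import torch
from imagen_pytorch import Unet, ElucidatedImagen

unet = Unet(dim = 128, cond_dim = 512, dim_mults = (1, 2, 4, 8))

# rebuild the model with the sigma range and step count you want to sample with
imagen = ElucidatedImagen(
    unets = (unet,),
    image_sizes = (64,),
    num_sample_steps = 32,
    sigma_min = 0.002,
    sigma_max = 1.0           # lowered from the usual default of 80
)

# load the weights manually instead of going through load_from_checkpoint()
state = torch.load('./checkpoint.pt', map_location = 'cpu')
imagen.load_state_dict(state['model'])  # the 'model' key depends on how the checkpoint was saved
```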
@deepglugs here you go! a936ea7 |