
Question about training #5

Closed

JacksonCakes opened this issue Mar 31, 2022 · 4 comments

Comments

@JacksonCakes

Hi, this is really impressive work! Two questions here.

  1. I would like to ask: does the overall text-guided image editing process use only the pre-trained model, without any extra training or fine-tuning?
  2. If it does not require any further fine-tuning or training, what is the purpose of the diffusion-guided loss (which combines the CLIP loss and the background preservation loss)?

Thanks in advance for your clarification!

@omriav
Owner

omriav commented Mar 31, 2022

Hi,

Thank you very much for the kind words!

Yes - there is no need for any further fine-tuning, we simply use the diffusion model as-is.
Essentially, the purpose of the diffusion model is to restrict the edit to the domain of natural images.
We want the edit operation to correspond to the guiding text (this is what CLIP is used for) and to look natural (this is what the diffusion model is used for).
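
To make this concrete, below is a minimal sketch (not this repository's actual code) of how a CLIP loss can steer each reverse-diffusion step: the gradient of the CLIP loss with respect to the current noisy image shifts the predicted denoising mean toward the text prompt. `denoiser`, `clip_loss`, and `guidance_scale` are placeholder names introduced only for illustration.

```python
# Minimal sketch of CLIP-guided sampling, assuming hypothetical `denoiser` and
# `clip_loss` callables (placeholders, not this repository's API).
import torch

def clip_guided_step(denoiser, clip_loss, x_t, t, text_emb, guidance_scale=1.0):
    """One reverse-diffusion step nudged by the gradient of a CLIP loss."""
    x_t = x_t.detach().requires_grad_(True)
    mean, variance = denoiser(x_t, t)            # predicted posterior mean / variance
    loss = clip_loss(x_t, text_emb)              # distance of x_t from the text prompt
    grad = torch.autograd.grad(loss, x_t)[0]     # d(loss)/d(x_t)
    guided_mean = mean - guidance_scale * variance * grad  # shift the mean toward the prompt
    return guided_mean + variance.sqrt() * torch.randn_like(x_t)
```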

Hope this clarifies things.

Omri

@JacksonCakes
Author

JacksonCakes commented Mar 31, 2022

Thanks for your quick response!

I think I should rephrase my question 2.
I can see that you are using a CLIP loss at each reverse denoising step to guide the generation toward a seamless output between the background and the edited region. So when does the diffusion-guided loss in Algorithm 1, which combines the CLIP loss and the background preservation loss, come into play?
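
For readers following along, here is a rough sketch of the kind of combined per-step objective the question refers to, assuming Algorithm 1 adds a background-preservation term (e.g. MSE + LPIPS outside the mask) on top of the CLIP term. `clip_loss`, `lpips_loss`, and `lambda_bg` are hypothetical names, not identifiers from this repository.

```python
# Rough sketch: CLIP term on the edited region plus an explicit background
# penalty outside the mask. All names are illustrative placeholders.
import torch.nn.functional as F

def diffusion_guided_loss(x_hat, source, mask, text_emb, clip_loss, lpips_loss,
                          lambda_bg=1000.0):
    edit_term = clip_loss(x_hat * mask, text_emb)    # match the prompt inside the mask
    bg = 1 - mask
    bg_term = F.mse_loss(x_hat * bg, source * bg) + lpips_loss(x_hat * bg, source * bg)
    return edit_term + lambda_bg * bg_term           # background kept by an explicit penalty
```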

@omriav
Owner

omriav commented Apr 3, 2022

Algorithm 1 is a weak baseline that we included in the paper; we showed that Algorithm 2 (a.k.a. Blended Diffusion) produces better results without needing a background preservation loss.
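
As an illustrative sketch of why Algorithm 2 needs no such loss: at each step the CLIP-guided foreground can be composited with the input image noised to the same timestep, so the background is preserved by construction rather than by a penalty. `q_sample` and `guided_step` below are assumed helpers for forward noising and the guided reverse step, not this repository's actual functions.

```python
# Sketch of the per-step blending idea (illustrative only; `q_sample` and
# `guided_step` are assumed helpers, not this repo's API).
def blended_step(x_t, t, source, mask, q_sample, guided_step):
    x_fg = guided_step(x_t, t)               # CLIP-guided proposal for the edited region
    x_bg = q_sample(source, t)               # input image noised to the same timestep t
    return mask * x_fg + (1 - mask) * x_bg   # edit inside the mask, source outside
```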

@JacksonCakes
Author

Alright, thanks for your clarification!
