Scribble-guided editing #8

wileewang · 2022-06-14T05:28:02Z

Hi! I wonder if a loss such as MSE or LPIPS is used between the user-provided scribbles and the scribbled regions of $\widehat{x}_0$, in addition to the CLIP loss. I am curious how the shapes and colors stay consistent when only text with no specific description, e.g., "blanket" in Fig 9, is given.

omriav · 2022-06-14T09:32:13Z

Hi,

Thank you for your interest in our work.
No, there is no need for an MSE/LPIPS loss. The only signal for the scribbles comes from the partial noising of the image (i.e., noising the image to a certain noise level).
The shapes and colors stay somewhat consistent because of the way the diffusion model operates: the initial stages generate a rough sketch of the image and the finer details are added later, so we can noise the image up to a point that still preserves the colors/shapes. For more details, please see Figure 32 in the paper.
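
To illustrate what "noising the image to a certain noise level" could look like, here is a minimal sketch that assumes the standard DDPM forward process, $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1 - \bar{\alpha}_t}\,\epsilon$, with a linear variance schedule. It is not taken from the repository: the function names, the schedule values, and the choice of `t=400` are all hypothetical, and the text-guided denoising from step `t` back to 0 is left out.

```python
import math
import random


def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear variance schedule (DDPM-style defaults; an assumption here)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bar(betas, t):
    """Cumulative product of (1 - beta_s) for s = 0..t."""
    prod = 1.0
    for beta in betas[: t + 1]:
        prod *= 1.0 - beta
    return prod


def partially_noise(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(a_bar) * x0, (1 - a_bar) * I)."""
    a_bar = alpha_bar(betas, t)
    signal, noise = math.sqrt(a_bar), math.sqrt(1.0 - a_bar)
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]


betas = linear_betas()
rng = random.Random(0)
# Toy "image": a flat list of pixel values in [-1, 1], scribble included.
scribbled = [1.0, -1.0, 0.5, 0.0]
# Stop at an intermediate step so coarse colors/shapes are still present.
x_mid = partially_noise(scribbled, t=400, betas=betas, rng=rng)
```

A smaller `t` keeps more of the scribble's colors and shapes, while a larger `t` gives the model more freedom to repaint the region, so the stopping step acts as the fidelity/realism trade-off.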

wileewang · 2022-06-16T01:31:22Z

I see. Thanks for the explanation.

wileewang closed this as completed Jun 16, 2022
