nodes/ui: upscaling & hi-res fix #3599
Closed
Hi-Res Fix
This is usually thought of as a text-to-image feature, but it's really just resizing an image after generation and then running img2img on the resized image. So the main question for this feature is: how do we resize the image?
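To make the "resize plus img2img" framing concrete, here is a minimal sketch of the flow. It is not the code in this PR: `hires_fix` and the three callables it takes (`generate`, `resize`, `img2img`) are hypothetical placeholders for whatever the backend actually provides.

```python
from typing import Callable, TypeVar

# Stands in for whatever image type the backend uses (PIL image, tensor, ...).
Image = TypeVar("Image")


def hires_fix(
    prompt: str,
    generate: Callable[[str], Image],
    resize: Callable[[Image], Image],
    img2img: Callable[[str, Image, float], Image],
    strength: float = 0.5,
) -> Image:
    """Generate at the model's native size, enlarge, then refine with img2img.

    All three callables are hypothetical hooks, not real InvokeAI functions.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    small = generate(prompt)  # 1. ordinary text-to-image pass
    large = resize(small)     # 2. enlarge (interpolation or an AI upscaler)
    return img2img(prompt, large, strength)  # 3. img2img pass to restore detail
```

Written this way, the only step that really varies is `resize`, which is why the rest of this description focuses on resizing methods.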
Resizing
Each of these methods supports an interpolation mode.
Unfortunately, due to the lack of detail after resizing, you usually have to turn the img2img strength up quite high to restore detail, resulting in a lot of hallucinated weirdness. ControlNet helps, but the results still aren't great.
In v2.3, hires fix uses torch to resize the latents.
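For a sense of the sizes involved when resizing latents, here is a rough sketch of planning the two passes. It relies on one fact (Stable Diffusion latents are 1/8 of the pixel resolution) and one assumption that I have not checked against the v2.3 code: that the first pass keeps the target aspect ratio at roughly the model's native pixel count. `plan_hires_sizes` and `round_down` are hypothetical names.

```python
import math

LATENT_SCALE = 8  # Stable Diffusion latents are 1/8 of the pixel resolution


def round_down(value: float, multiple: int = LATENT_SCALE) -> int:
    """Round down to a multiple of `multiple`, never below one multiple."""
    return max(multiple, int(value) // multiple * multiple)


def plan_hires_sizes(target_w: int, target_h: int, native: int = 512) -> dict:
    """Plan first-pass and final sizes for a latent-resize hi-res fix.

    Assumption: the first pass has about native*native pixels and the
    same aspect ratio as the target. Targets at or below that area are
    generated directly, with no downscaled first pass.
    """
    scale = min(1.0, math.sqrt((native * native) / (target_w * target_h)))
    first_w = round_down(target_w * scale)
    first_h = round_down(target_h * scale)
    final_w = round_down(target_w)
    final_h = round_down(target_h)
    return {
        "first_pass": (first_w, first_h),
        "first_latents": (first_w // LATENT_SCALE, first_h // LATENT_SCALE),
        "final_latents": (final_w // LATENT_SCALE, final_h // LATENT_SCALE),
    }
```

Under these assumptions, a 1024x1024 target is generated at 512x512, and its 64x64 latents are then resized to 128x128 before the second denoising pass.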
AI Upscaling
There are a number of upscalers out there. The most popular are in the `ESRGAN` family:
- `RealESRGAN_x4plus`: best on photos/realistic images
- `RealESRGAN_x4plus_anime_6B`: best for anime
- `ESRGAN_SRx4_DF2KOST_official-ff704c30`: less smoothing, more detail (OG ESRGAN, this is the name of the `pth` file)
There are many fine-tunes of these models for various use-cases.
Other upscalers include:
- LDSR: Latent Diffusion Super Resolution, an SD1.4 upscaling model, very resource intensive
- Remacri: Cannot find an official source for this
- TopazLabs Gigapixel: Good but closed-source, if I understand correctly
ControlNet
The `tile` ControlNet model also produces excellent results in img2img. I haven't tested this because we don't have a functioning implementation, but I have tried using Canny ControlNet on the img2img inference for an AI-upscaled image and it does help to preserve quality.
I wonder if the best results will be had by using AI upscaling followed by tiled ControlNet on the upscaled image...
User Experience
Results with the AI upscalers are so much better than torch/PIL that offering these is mandatory. I mean, we need to do it anyway, right?
I think the most sensible workflow for creating images larger than 512 (or whatever the optimal size is for a given model) is:
Note that our current image to image `fit` parameter is simply using PIL to resize the image before inference. So we already do this, but only in the least effective way.
So what I'd like to do is evolve the simple `fit` toggle into a "resize before inference" feature (a new accordion), which lets you choose resizing methods:
Hi-Res Fix
Finally, I don't really know if this feature even makes sense given the suggested workflow above.
Parameters that make a lot of sense to expose in Hi-Res Fix:
Obviously it's not feasible to expose all of this on the txt2img tab - plus, we already have all of this on the img2img tab.
I don't think it even really makes sense to have a minimal version that always uses one particular RealESRGAN model and only exposes the img2img strength as a parameter.
Intuitive batch image processing (as described above) sounds like a way more effective workflow, and I suspect the reason it's not popular is because nobody has implemented it yet.
Implementation
So far I've done a lot of experimentation today, and made a very simple RealESRGAN node. The existing `upscaling` and `restoration` services may no longer be totally necessary, but we do still need a way to download and provide the upscaling models.
That sounds like a good candidate for the model manager service. It would be nice if this was provided via a model context like main SD models.
The three RealESRGAN models I mentioned above (and two others, which are in the node in this PR) are all hosted on xinntao's GitHub, so we can download from there to load them.
I think this is a good starting point for upscaling in general. We can extend this to support user-provided upscalers in the future.
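As a rough idea of what "download and provide" could look like before the model manager takes this over, here is a minimal download-once-then-cache sketch. `ensure_model` is a hypothetical helper, not an existing service method, and the URL is deliberately left as a parameter; it would come from the relevant release page.

```python
import urllib.request
from pathlib import Path
from typing import Callable


def ensure_model(
    filename: str,
    url: str,
    cache_dir: Path,
    fetch: Callable[[str, str], object] = urllib.request.urlretrieve,
) -> Path:
    """Return the local path of an upscaler checkpoint, downloading it once.

    `fetch(url, destination)` is injectable so the download can be replaced
    (for example by a model manager, or by a fake in tests).
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / filename
    if target.exists():
        return target  # already cached, no network access needed
    # Download to a temporary name so an interrupted download is never
    # mistaken for a complete checkpoint.
    partial = cache_dir / (filename + ".part")
    fetch(url, str(partial))
    partial.replace(target)
    return target
```

A node would call this with the checkpoint's filename and release URL, then load the returned path. Supporting user-provided upscalers later could be as simple as skipping the download when the file is already present.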