
Conversation

@psychedelicious
Collaborator

Hi-Res Fix

This is usually thought of as a text-to-image feature, but actually, it's just resizing an image after generation plus img2img on the resized image. So the main question for this feature is how we resize the image.

Resizing

  • resize the latents with torch (implemented as a node)
  • resize the image with PIL (implemented as a node)

Each of these methods supports an interpolation mode.
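To make the distinction concrete, here's a minimal sketch of the two approaches (the function names are illustrative, not the actual node implementations):

```python
# Illustrative sketch of the two resizing approaches; function names are
# hypothetical, not the actual node implementations.
import torch
import torch.nn.functional as F
from PIL import Image


def resize_latents(latents: torch.Tensor, scale: float, mode: str = "bilinear") -> torch.Tensor:
    # latents: [B, 4, H/8, W/8]; interpolation happens in latent space
    return F.interpolate(latents, scale_factor=scale, mode=mode)


def resize_image(image: Image.Image, scale: float, resample: int = Image.LANCZOS) -> Image.Image:
    # plain pixel-space resize using PIL's resampling filters
    new_size = (int(image.width * scale), int(image.height * scale))
    return image.resize(new_size, resample=resample)
```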

Unfortunately, due to the lack of detail after resizing, you usually have to turn the img2img strength up quite high to restore detail, resulting in a lot of hallucinated weirdness. ControlNet helps, but the results are still not great.

In v2.3, hires fix uses torch to resize the latents.

AI Upscaling

There are a number of upscalers out there. The most popular are in the ESRGAN family:

  • RealESRGAN_x4plus: best on photos/realistic images
  • RealESRGAN_x4plus_anime_6B: best for anime
  • ESRGAN_SRx4_DF2KOST_official-ff704c30: less smoothing, more detail (the original ESRGAN; this is the name of the .pth file)

There are many fine-tunes of these models for various use-cases.

Other upscalers include:

  • LDSR: Latent Diffusion Super Resolution, an SD 1.4-based upscaling model; very resource intensive
  • Remacri: Cannot find an official source for this
  • TopazLabs Gigapixel: Good but closed-source if I understand correctly

ControlNet

The tile ControlNet model is also reported to produce excellent results in img2img. I haven't tested it because we don't have a functioning implementation yet, but I have tried using Canny ControlNet during img2img inference on an AI-upscaled image, and it does help to preserve quality.

I wonder if the best results will be had by using AI upscaling followed by tiled ControlNet on the upscaled image...
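For reference, here's roughly what that combination could look like using diffusers directly, outside our node graph (the model IDs and parameter values here are assumptions for illustration, not tested settings):

```python
# Sketch using diffusers directly (outside our node graph) to illustrate the idea;
# the model IDs and parameter values here are assumptions, not tested settings.
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11f1e_sd15_tile", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

upscaled = Image.open("upscaled.png")  # output of the AI upscaler
result = pipe(
    prompt="...",
    image=upscaled,          # img2img init image
    control_image=upscaled,  # tile ControlNet conditions on the same image
    strength=0.3,            # keep strength low to preserve the upscaled detail
    num_inference_steps=30,
).images[0]
result.save("hires.png")
```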

User Experience

Results with the AI upscalers are so much better than torch/PIL that offering them is mandatory. We need to do it anyway, right?

I think the most sensible workflow for creating images larger than 512px (or whatever the optimal size for a given model is) is:

  • Make a lot of images with txt2img (or upload some)
  • Choose the best ones
  • Drop them all as a batch onto img2img
  • AI upscale before inference
  • optionally ControlNet during inference

Note that our current image-to-image fit parameter simply uses PIL to resize the image before inference. So we already do this, but only in the least effective way.

So what I'd like to do is evolve the simple fit toggle into a "resize before inference" feature (a new accordion) that lets you choose a resizing method:

  • torch (latents + interpolation methods)
  • image (PIL + interpolation methods)
  • AI upscaler (ESRGAN/RealESRGAN)
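
As a rough sketch of what the parameters for such an accordion might look like (purely hypothetical names and defaults, not an actual schema in the codebase):

```python
# Purely hypothetical parameter schema for a "resize before inference" accordion;
# names and defaults are assumptions, not an actual schema in the codebase.
from enum import Enum
from typing import Optional

from pydantic import BaseModel


class ResizeMethod(str, Enum):
    LATENTS = "latents"  # torch interpolation in latent space
    IMAGE = "image"      # PIL resize in pixel space
    ESRGAN = "esrgan"    # AI upscaler (ESRGAN/RealESRGAN family)


class ResizeBeforeInference(BaseModel):
    method: ResizeMethod = ResizeMethod.IMAGE
    scale: float = 2.0
    # interpolation applies to the latents/image methods
    interpolation: str = "bilinear"
    # model name applies only to the esrgan method
    esrgan_model: Optional[str] = "RealESRGAN_x4plus"
```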

Hi-Res Fix

Finally, I don't really know if this feature even makes sense given the suggested workflow above.

Parameters that make a lot of sense to expose in Hi-Res Fix:

  • scheduler
  • CFG scale
  • img2img strength
  • UNet model
  • steps
  • ControlNet

Obviously it's not feasible to expose all of this on the txt2img tab - plus, we already have all of this on the img2img tab.

I don't think it even really makes sense to have a minimal version that always uses one particular RealESRGAN model and only exposes the img2img strength as a parameter.

Intuitive batch image processing (as described above) sounds like a way more effective workflow, and I suspect the reason it's not popular is because nobody has implemented it yet.

Implementation

So far I've done a lot of experimentation today and made a very simple RealESRGAN node. The existing upscaling and restoration services may no longer be totally necessary, but we do still need a way to download and provide the upscaling models.

That sounds like a good candidate for the model manager service. It would be nice if this was provided via a model context like main SD models.

The three RealESRGAN models I mentioned above (and two others, which are in the node in this PR) are all hosted on xinntao's GitHub, so we can download from there to load them.

I think this is a good starting point for upscaling in general. We can extend this to support user-provided upscalers in the future.
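
For context, a minimal sketch of the node's core, using xinntao's realesrgan package (the helper name and model-path handling are illustrative; downloading the weights would come from the model manager service):

```python
# Minimal sketch of the node's core using xinntao's realesrgan package; the helper
# name and the model-path handling are illustrative (downloading would come from
# the model manager service).
import numpy as np
from PIL import Image
from basicsr.archs.rrdbnet_arch import RRDBNet
from realesrgan import RealESRGANer


def upscale_realesrgan(image: Image.Image, model_path: str, scale: int = 4) -> Image.Image:
    # RealESRGAN_x4plus uses num_block=23; the anime_6B variant uses num_block=6
    arch = RRDBNet(num_in_ch=3, num_out_ch=3, num_feat=64, num_block=23, num_grow_ch=32, scale=scale)
    upsampler = RealESRGANer(scale=scale, model_path=model_path, model=arch, tile=512, half=False)
    bgr = np.asarray(image.convert("RGB"))[:, :, ::-1].copy()  # RealESRGANer expects a BGR ndarray
    output, _ = upsampler.enhance(bgr, outscale=scale)
    return Image.fromarray(output[:, :, ::-1].copy())  # back to RGB
```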

psychedelicious marked this pull request as draft June 27, 2023 10:34
@mickr777
Contributor

mickr777 commented Jun 28, 2023

Only my opinion, but I'm thinking there will be basic/new users that really only use the txt2img/img2img tabs. I assume they would at least expect the same options they get in v2.3 (plus the new 3.0 ones that fit in with the linear UI) and not have to do anything more complicated than changing a few options and pressing Invoke.

Despite how nice nodes are, they can be daunting to new or basic users, and currently there are no preset layouts to show them how nodes should be used (however, I would be happy with everything being nodes, as I love the nodes) 😁

@psychedelicious
Collaborator Author

superseded by #3773
