
Conversation

@psychedelicious
Collaborator

Hi-Res Fix

This is usually thought of as a text-to-image feature, but actually, it's just resizing an image after generation plus img2img on the resized image. So the main question for this feature is how we resize the image.

Resizing

  • resize the latents with torch (implemented as a node)
  • resize the image with PIL (implemented as a node)

Each of these methods supports an interpolation mode.
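To make the distinction concrete, here's a minimal sketch of the two approaches (the function names are illustrative, not the actual node implementations):

```python
# Illustrative sketch of the two resizing approaches; function names are
# hypothetical, not the actual node implementations.
import torch
import torch.nn.functional as F
from PIL import Image


def resize_latents(latents: torch.Tensor, scale: float, mode: str = "bilinear") -> torch.Tensor:
    # latents: [B, 4, H/8, W/8]; interpolation happens in latent space
    return F.interpolate(latents, scale_factor=scale, mode=mode)


def resize_image(image: Image.Image, scale: float, resample: int = Image.LANCZOS) -> Image.Image:
    # plain pixel-space resize using PIL's resampling filters
    new_size = (int(image.width * scale), int(image.height * scale))
    return image.resize(new_size, resample=resample)
```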

Unfortunately, due to the lack of detail after resizing, you usually have to turn the img2img strength up quite high to restore detail, resulting in a lot of hallucinated weirdness. ControlNet helps, but the results are still not great.

In v2.3, hires fix uses torch to resize the latents.

AI Upscaling

There are a number of upscalers out there. The most popular are in the ESRGAN family:

  • RealESRGAN_x4plus: best on photos/realistic images
  • RealESRGAN_x4plus_anime_6B: best for anime
  • ESRGAN_SRx4_DF2KOST_official-ff704c30: less smoothing, more detail (the original ESRGAN; this is the name of the .pth file)

There are many fine-tunes of these models for various use-cases.

Other upscalers include:

  • LDSR: Latent Diffusion Super Resolution, an SD 1.4-based upscaling model; very resource intensive
  • Remacri: Cannot find an official source for this
  • TopazLabs Gigapixel: Good but closed-source if I understand correctly

ControlNet

The tile ControlNet model is also reported to produce excellent results in img2img. I haven't tested it because we don't have a functioning implementation yet, but I have tried using Canny ControlNet during img2img inference on an AI-upscaled image, and it does help to preserve quality.

I wonder if the best results will be had by using AI upscaling followed by tiled ControlNet on the upscaled image...
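For reference, here's roughly what that combination could look like using diffusers directly, outside our node graph (the model IDs and parameter values here are assumptions for illustration, not tested settings):

```python
# Sketch using diffusers directly (outside our node graph) to illustrate the idea;
# the model IDs and parameter values here are assumptions, not tested settings.
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11f1e_sd15_tile", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

upscaled = Image.open("upscaled.png")  # output of the AI upscaler
result = pipe(
    prompt="...",
    image=upscaled,          # img2img init image
    control_image=upscaled,  # tile ControlNet conditions on the same image
    strength=0.3,            # keep strength low to preserve the upscaled detail
    num_inference_steps=30,
).images[0]
result.save("hires.png")
```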

User Experience

Results with the AI upscalers are so much better than torch/PIL that offering them is mandatory. We need to do it anyway, right?

I think the most sensible workflow for creating images larger than 512px (or whatever the optimal size for a given model is) is:

  • Make a lot of images with txt2img (or upload some)
  • Choose the best ones
  • Drop them all as a batch onto img2img
  • AI upscale before inference
  • optionally ControlNet during inference

Note that our current image-to-image fit parameter simply uses PIL to resize the image before inference. So we already do this, but only in the least effective way.

So what I'd like to do is evolve the simple fit toggle into a "resize before inference" feature (a new accordion) that lets you choose a resizing method:

  • torch (latents + interpolation methods)
  • image (PIL + interpolation methods)
  • AI upscaler (ESRGAN/RealESRGAN)
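
As a rough sketch of what the parameters for such an accordion might look like (purely hypothetical names and defaults, not an actual schema in the codebase):

```python
# Purely hypothetical parameter schema for a "resize before inference" accordion;
# names and defaults are assumptions, not an actual schema in the codebase.
from enum import Enum
from typing import Optional

from pydantic import BaseModel


class ResizeMethod(str, Enum):
    LATENTS = "latents"  # torch interpolation in latent space
    IMAGE = "image"      # PIL resize in pixel space
    ESRGAN = "esrgan"    # AI upscaler (ESRGAN/RealESRGAN family)


class ResizeBeforeInference(BaseModel):
    method: ResizeMethod = ResizeMethod.IMAGE
    scale: float = 2.0
    # interpolation applies to the latents/image methods
    interpolation: str = "bilinear"
    # model name applies only to the esrgan method
    esrgan_model: Optional[str] = "RealESRGAN_x4plus"
```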

Hi-Res Fix

Finally, I don't really know if this feature even makes sense given the suggested workflow above.

Parameters that make a lot of sense to expose in Hi-Res Fix:

  • scheduler
  • CFG scale
  • img2img strength
  • UNet model
  • steps
  • ControlNet

Obviously it's not feasible to expose all of this on the txt2img tab - plus, we already have all of this on the img2img tab.

I don't think it even really makes sense to have a minimal version that always uses one particular RealESRGAN model and only exposes the img2img strength as a parameter.

Intuitive batch image processing (as described above) sounds like a way more effective workflow, and I suspect the reason it's not popular is because nobody has implemented it yet.

Implementation

So far I've done a lot of experimentation today and made a very simple RealESRGAN node. The existing upscaling and restoration services may no longer be totally necessary, but we do still need a way to download and provide the upscaling models.

That sounds like a good candidate for the model manager service. It would be nice if this was provided via a model context like main SD models.

The three RealESRGAN models I mentioned above (and two others, which are in the node in this PR) are all hosted on xinntao's GitHub, so we can download from there to load them.

I think this is a good starting point for upscaling in general. We can extend this to support user-provided upscalers in the future.
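
For context, a minimal sketch of the node's core, using xinntao's realesrgan package (the helper name and model-path handling are illustrative; downloading the weights would come from the model manager service):

```python
# Minimal sketch of the node's core using xinntao's realesrgan package; the helper
# name and the model-path handling are illustrative (downloading would come from
# the model manager service).
import numpy as np
from PIL import Image
from basicsr.archs.rrdbnet_arch import RRDBNet
from realesrgan import RealESRGANer


def upscale_realesrgan(image: Image.Image, model_path: str, scale: int = 4) -> Image.Image:
    # RealESRGAN_x4plus uses num_block=23; the anime_6B variant uses num_block=6
    arch = RRDBNet(num_in_ch=3, num_out_ch=3, num_feat=64, num_block=23, num_grow_ch=32, scale=scale)
    upsampler = RealESRGANer(scale=scale, model_path=model_path, model=arch, tile=512, half=False)
    bgr = np.asarray(image.convert("RGB"))[:, :, ::-1].copy()  # RealESRGANer expects a BGR ndarray
    output, _ = upsampler.enhance(bgr, outscale=scale)
    return Image.fromarray(output[:, :, ::-1].copy())  # back to RGB
```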

psychedelicious marked this pull request as draft June 27, 2023 10:34
@mickr777
Contributor

mickr777 commented Jun 28, 2023

Only my opinion, but I'm thinking there will be basic/new users that really only use the txt2img/img2img tabs. I assume they would at least expect the same options they get in v2.3 (plus the new 3.0 ones that fit in with the linear UI) and not have to do anything more complicated than changing a few options and pressing Invoke.

Despite how nice nodes are, they can be daunting to new or basic users, and currently there are no preset layouts to show them how nodes should be used (however, I would be happy with everything being nodes, as I love the nodes) 😁

@psychedelicious
Collaborator Author

superseded by #3773
