Comparison discussion #3

x-legion · 2023-03-07T06:29:46Z

MultiDiffusion seems to be doing worse (not sharp), or am I doing something wrong?
original:

MultiDiffusion:

Ultimate SD Upscale:

pkuliyi2015 · 2023-03-07T10:13:37Z

Hello, could you please provide your weights (including the checkpoint and any LoRA you used) for your original image? I need them to reproduce your results in an oil-painting style. MultiDiffusion results can be severely affected by the model checkpoint and LoRA you use.

But generally speaking, an extraordinarily high CFG scale and a slightly higher denoising value will give you satisfying details. Example positive prompts are "highres, masterpiece, best quality, ultra-detailed unity 8k wallpaper, extremely clear, very clear, ultra-clear". You don't need anything concrete in the positive prompt; then drag the CFG scale to an extra-large value. Denoising values between 0.1 and 0.4 are all OK, but the content will change accordingly.

Here is my result with CFG=20, Sampler=DPM++ SDE Karras, and denoising strength=0.3, for example. As I use the protogenX34 checkpoint, my painting style will be wildly different from yours:

Please comment on this issue if you find your results have significantly improved after you use a proper model and CFG values.

jurandfantom · 2023-03-09T15:55:02Z

Hi there, I'm writing here so as not to create a new issue about a similar thing.
Would it be possible to write down or screenshot all the settings that were used to upscale the picture attached in the extension description? I think I tested everything, but all I get is a blurred upscaled picture. Here is one example result that shows how blurry it is (not to mention the lack of extra details with denoise at 0.3 and CFG at 20, for example). At the moment I want to copy everything 1:1 to see whether the issue is on my side. Thanks for creating this extension, I have high hopes.
Example picture.

pkuliyi2015 · 2023-03-09T17:07:27Z

Hello, as you wish, here is the PNG info:

Here is the text version for your convenience. All resources are publicly available, but I'm quite busy and cannot provide you with links.

masterpiece, best quality, highres, extremely detailed 8k unity wallpaper, ultra-detailed
Negative prompt: EasyNegative
Steps: 24, Sampler: DPM++ SDE Karras, CFG scale: 7, Seed: 1614054406, Size: 4096x3200, Model hash: 2ccfc34fe3, Model: 0.9(Gf_style2) + 0.1(abyssorangemix2_Hard), Denoising strength: 0.4, Clip skip: 3, Mask blur: 4, MultiDiffusion upscaler: 4x_foolhardy_Remacri, MultiDiffusion scale factor: 4, MultiDiffusion tile width: 128, MultiDiffusion tile height: 128, MultiDiffusion overlap: 64

If you don't know any of them, you can Google them. But your result most likely comes from poor positive and negative prompts; I use a Textual Inversion called EasyNegative from civitai.com.
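For anyone copying these settings by hand, here is a minimal Python sketch that splits a PNG info parameter line like the one above into key/value pairs. The function name parse_png_info_params is hypothetical and not part of the WebUI or this extension. It assumes fields are separated by ", " and that no value contains a comma, which holds for the line above but not for every PNG info string.

```python
def parse_png_info_params(line: str) -> dict:
    """Split a 'Key: value, Key: value' parameter line into a dict.

    Hypothetical helper for illustration only; values stay as strings.
    """
    params = {}
    for field in line.split(", "):
        key, sep, value = field.partition(": ")
        if sep:  # skip fragments that have no "Key: value" shape
            params[key.strip()] = value.strip()
    return params


# Abridged from the parameter line posted above.
line = (
    "Steps: 24, Sampler: DPM++ SDE Karras, CFG scale: 7, Size: 4096x3200, "
    "Denoising strength: 0.4, Clip skip: 3, "
    "MultiDiffusion tile width: 128, MultiDiffusion tile height: 128, "
    "MultiDiffusion overlap: 64"
)
params = parse_png_info_params(line)
for key in ("CFG scale", "Denoising strength", "MultiDiffusion overlap"):
    print(key, "=", params[key])
```

Comparing two such dicts key by key is a quick way to spot which setting differs between two runs.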

x-legion · 2023-03-09T18:19:34Z

Click Here for Better Comparison View

original

masterpiece, best quality, portrait,
blue fire, silver hair, fox girl, mage, arm extended, holding blue fire, by jordan grimmer and greg rutkowski and pine ハイネ wlop, intricate, beautiful, trending artstation, pixiv, digital art, anime, no torch,
<lora:Noise:1.75>
Negative prompt: EasyNegative, lowres, ((bad anatomy)), ((bad hands)), text, missing finger, extra digits, fewer digits, blurry, ((mutated hands and fingers)), (poorly drawn face), ((mutation)), ((deformed face)), (ugly), ((bad proportions)), ((extra limbs)), extra face, (double head), (extra head), ((extra feet)), monster, logo, cropped, worst quality, low quality, normal quality, jpeg, humpbacked, long body, long neck, ((jpeg artifacts))
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 3857533696, Size: 640x960, Model: dreamniji3fp16, Clip skip: 2, ENSD: 31337, Discard penultimate sigma: True

Ultimate SD upscaler

masterpiece, best quality, portrait,
blue fire, silver hair, fox girl, mage, arm extended, holding blue fire, by jordan grimmer and greg rutkowski and pine ハイネ wlop, intricate, beautiful, trending artstation, pixiv, digital art, anime, no torch,
<lora:Noise:1.75>
Negative prompt: EasyNegative, lowres, ((bad anatomy)), ((bad hands)), text, missing finger, extra digits, fewer digits, blurry, ((mutated hands and fingers)), (poorly drawn face), ((mutation)), ((deformed face)), (ugly), ((bad proportions)), ((extra limbs)), extra face, (double head), (extra head), ((extra feet)), monster, logo, cropped, worst quality, low quality, normal quality, jpeg, humpbacked, long body, long neck, ((jpeg artifacts))
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 14, Seed: 3857533696, Size: 1280x1920, Model: dreamniji3fp16, Denoising strength: 0.4, Clip skip: 2, ENSD: 31337, Mask blur: 4, Ultimate SD upscale upscaler: 4x_foolhardy_Remacri, Ultimate SD upscale tile_width: 768, Ultimate SD upscale tile_height: 768, Ultimate SD upscale mask_blur: 8, Ultimate SD upscale padding: 32, Discard penultimate sigma: True

MultiDiffusion

masterpiece, best quality, portrait,
blue fire, silver hair, fox girl, mage, arm extended, holding blue fire, by jordan grimmer and greg rutkowski and pine ハイネ wlop, intricate, beautiful, trending artstation, pixiv, digital art, anime, no torch,
<lora:Noise:1.75>
Negative prompt: EasyNegative, lowres, ((bad anatomy)), ((bad hands)), text, missing finger, extra digits, fewer digits, blurry, ((mutated hands and fingers)), (poorly drawn face), ((mutation)), ((deformed face)), (ugly), ((bad proportions)), ((extra limbs)), extra face, (double head), (extra head), ((extra feet)), monster, logo, cropped, worst quality, low quality, normal quality, jpeg, humpbacked, long body, long neck, ((jpeg artifacts))
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 14, Seed: 3857533696, Size: 1280x1920, Model: dreamniji3fp16, Denoising strength: 0.4, Clip skip: 2, ENSD: 31337, Mask blur: 4, MultiDiffusion upscaler: 4x_foolhardy_Remacri, MultiDiffusion scale factor: 2, Discard penultimate sigma: True

jurandfantom · 2023-03-09T18:24:15Z

OK, now I know it might be something wrong on my side. I can see additional details (I will check whether that's because of clip skip 3, the upscaler, or something else), but it's still blurred. That's super weird. Ah, and thanks for the reply. The pictures attached to the description don't have the PNG info embedded (that's why I asked :) )

x-legion · 2023-03-09T18:46:59Z

https://imgsli.com/MTYwOTcx same here again

pkuliyi2015 · 2023-03-09T23:23:06Z

Hello, thanks for your interest in this work. I tried for several minutes on your image and here is my result with no tuning:
https://imgsli.com/MTYxMDI5.

It's hard to tell which is better; if you like illustration-style sharpness and faithfulness to the original image, maybe Ultimate SD Upscaler + 4x Ultra Sharp is your best choice. But personally I'd like to see some fabricated details on a realistic human face, so I prefer this tool.

It's noteworthy that the biggest difference between MultiDiffusion and other upscalers is that it currently doesn't support any concrete content in the prompt when you upscale an image; otherwise each tile will contain a small character and your image ends up blurry and messy.

The correct prompt is as follows. I don't even use a LoRA:

And my configurations, FYI:

DenkingOfficial · 2023-03-09T23:23:08Z

I provide the PNG info

I tried to replicate your settings with an image provided by OP and it's still very blurry:

Compared to an image you sent:

As you can see, settings are pretty much the same except CFG scale:

pkuliyi2015 · 2023-03-09T23:26:26Z

Update: Oh, I just noticed that EasyNegative is a textual inversion from civitai.com; it is not a word. Please download that textual inversion.

Here is the link: https://civitai.com/models/7808/easynegative

The upscalers are important too. I personally use two: 4x-UltraSharp and 4x-remacri. Here is the link:
https://upscale.wiki/wiki/Model_Database
There you can find the two upscalers; put them in your ESRGAN folder.

DenkingOfficial · 2023-03-09T23:30:11Z

4x-remacri

I used it with the image above

EasyNegative is a textual inversion

Already downloaded this embedding

pkuliyi2015 · 2023-03-09T23:30:46Z

4x-remacri

I used it with the image above

Do you use EasyNegative embeddings?

You mean you have used it in the above images?

DenkingOfficial · 2023-03-09T23:33:13Z

You mean you have used it in the above images?

Yes, it was used

UPD:

pkuliyi2015 · 2023-03-09T23:48:14Z

You mean you have used it in the above images?

Yes, it was used

UPD:

I spent some time finding the original PNG info. Here it is; please try to reproduce using my params:

pkuliyi2015 · 2023-03-09T23:55:46Z

It may not be as easy to use as the Ultimate Upscaler, as it's essentially a complete redraw without post-processing. Personally, I have some intuitions for using it:

No concrete positive prompts. Just something like clear, very clear, ultra clear.
Don't use too large a tile size, as SD 1.4 is only good at 512 - 768 pixels (divide by 8 to get a latent tile size of 64 - 96).
Large CFG scales, Euler a or DPM++ SDE Karras, denoising = 0.2-0.4.
Try both 4x-UltraSharp and 4x-Remacri.
Clip skip = 2 or 3 is worth trying.
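The divide-by-8 rule in the tips above can be written out as a small Python sketch. It assumes the usual Stable Diffusion setup in which the latent image is 1/8 of the pixel resolution on each side; the helper names are hypothetical and only for illustration.

```python
VAE_SCALE = 8  # assumption: SD latents are 1/8 of the pixel size per side


def pixels_to_latent(pixels: int) -> int:
    """Convert a pixel dimension to latent units (hypothetical helper)."""
    return pixels // VAE_SCALE


def latent_to_pixels(latent: int) -> int:
    """Convert a latent tile dimension back to pixels (hypothetical helper)."""
    return latent * VAE_SCALE


# The comfortable range for SD 1.4 mentioned above, in both units.
for px in (512, 768):
    print(px, "px ->", pixels_to_latent(px), "latent units")

# A latent tile size of 128 corresponds to this many pixels per side.
print(128, "latent units ->", latent_to_pixels(128), "px")
```

So a tile width of 64 to 96 in the extension's settings keeps each tile within the 512 to 768 pixel range.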

DenkingOfficial · 2023-03-10T00:10:24Z

please try to reproduce using my params

I just did it and it's a lot better

Settings (Even seed is the same):

But it still can't generate a result as good as yours
I know it depends heavily on the hardware, but there's a very large difference in details
No optimizations were used (such as xformers, opt-split-attention, etc.)

Mine:

And yours:

pkuliyi2015 · 2023-03-10T00:18:53Z

please try to reproduce using my params

I just did it and it's a lot better

Settings (Even seed is the same):

But still it can't generate a result as good as yours I know it highly depends on a hardware, but there's a very large difference in details No any optimizations used (Such as xformers, opt-split-attention etc.)

My:

And yours:

I'm also confused. Are you using this model?

https://civitai.com/models/3666/protogen-x34-photorealism-official-release

I see our model hashes are different. Apart from this, I couldn't find anything else.

DenkingOfficial · 2023-03-10T08:56:49Z

I'm also confused. Are you using this model?

Yes, I used protogen_x3.4, but pruned
Now I downloaded the 5GB version with the same hash as yours and THAT'S AMAZING

A huge improvement in details:

It still doesn't produce exactly the same result as yours, I guess it depends on the hardware, but the details are unbelievable; I can clearly see the stitch seam on the sleeve

pkuliyi2015 · 2023-03-10T09:02:24Z

Oh, thanks for your feedback. I didn't know that a pruned model could affect the details too until you tested it.

jurandfantom · 2023-03-10T09:56:43Z

Ohh! I think not many people know that, to be honest o_O As far as I understand pruning, it should not affect a task such as upscaling via small tiles? I'm going to try with a non-pruned model as well and let you know.

Edit: No clue why, but today everything works as it should. Maybe it's necessary to turn everything off and on, not just restart the UI, just like when installing Dreambooth.

2blackbar · 2023-03-10T14:33:48Z

Tried it, and to be honest the ESRGAN upscalers do 99% of the lifting; it barely does anything when used with Lanczos, unless there are going to be examples of it with Lanczos where it introduces new details? The best bet is to upscale with ESRGAN by 2 and then go to inpaint and mask the parts one by one to upscale them, since you'll have more pixel area to resolve detail. So unless someone automates that, it's going to stay the best way to upscale.

jurandfantom · 2023-03-10T15:58:38Z

More tests. ControlNet doesn't work, or it needs a much lower denoise than I used.
The upscaling for the attached image was done in two passes plus the dynamic CFG script. Agreed, it's way too far off from the original picture, but now that I know what and where, it's time for fine-tuning (and hopefully to figure out the issue with ControlNet).

Indeed, it's essential to test a couple of upscalers because the differences are huge, even bigger than those from the SD model used.

jurandfantom · 2023-03-10T16:03:22Z

Left is mine, right is pkuliyi2015's
As you can see, the left has way more details, but some noise and weird issues as well; pure Remacri x4 looks almost like pkuliyi2015's version. Plenty of room for tests.

x-legion · 2023-03-11T16:39:13Z

tried it and to be honest esrgan upscalers do 99% of the lifting, it barely does anything when used with lanczos, unless theres gonna be examples of it with lanczos where it introduces new details ? Best bet is to just upscale with esrgan by 2 and go to inpaint with it to mask the parts one by one to upscale them since you gonna have more pixel area to resolve detail, so unless someone will automate that , its gonna stay as the best way to upscale

This is basically a tile-by-tile img2img SD redraw. So if you don't give it high strength it doesn't work as you expected. However, one of the weakness is that it currently cannot automatically map your prompts to different areas... If you can use stronger prompts, it should be way better.

But I'm working on Automatic Prompt Mapping. In img2img, it works by first estimate the attention map of your prompt to the original picture, and then re-apply them to multidiffusion tiles. In txt2img this may be similar, but I need time to do so.

https://github.com/dustysys/ddetailer.git try this one

pkuliyi2015 · 2023-03-11T19:56:26Z

tried it and to be honest esrgan upscalers do 99% of the lifting, it barely does anything when used with lanczos, unless theres gonna be examples of it with lanczos where it introduces new details ? Best bet is to just upscale with esrgan by 2 and go to inpaint with it to mask the parts one by one to upscale them since you gonna have more pixel area to resolve detail, so unless someone will automate that , its gonna stay as the best way to upscale

I'm sorry for the accidental wrong edit.

This is basically a tile-by-tile img2img SD redraw. So if you don't give it high strength, it doesn't work as you expected. However, one of its weaknesses is that it currently cannot automatically map your prompts to different areas... If you could use stronger prompts, it should be way better.

But I'm working on Automatic Prompt Mapping. In img2img, it works by first estimating the attention map of your prompt on the original picture, and then re-applying it to the MultiDiffusion tiles. In txt2img this may be similar, but I need time to do so.
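To make "tile-by-tile redraw" concrete, here is a toy Python sketch on a 1-D list of numbers. It assumes that overlapping tile outputs are merged by plain averaging, which is the basic MultiDiffusion idea; the extension's actual merge weights may differ. The names fuse_tiles and flatten are hypothetical, and flatten is only a stand-in for a real img2img pass on one tile.

```python
from typing import Callable, List, Sequence


def fuse_tiles(signal: Sequence[float], tile: int, overlap: int,
               process: Callable[[Sequence[float]], List[float]]) -> List[float]:
    """Process overlapping tiles of a 1-D signal and average the overlaps."""
    stride = tile - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than the tile size")
    n = len(signal)
    total = [0.0] * n  # running sum of tile outputs per position
    weight = [0] * n   # number of tiles that covered each position
    start = 0
    while True:
        end = min(start + tile, n)
        begin = max(0, end - tile)  # shift the last tile back so it fits
        for i, value in enumerate(process(signal[begin:end])):
            total[begin + i] += value
            weight[begin + i] += 1
        if end == n:
            break
        start += stride
    return [t / w for t, w in zip(total, weight)]


def flatten(tile_values: Sequence[float]) -> List[float]:
    """Stand-in for one redraw pass: replace a tile by its mean value."""
    mean = sum(tile_values) / len(tile_values)
    return [mean] * len(tile_values)


fused = fuse_tiles([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], tile=4, overlap=2, process=flatten)
print(fused)
```

Positions covered by both tiles end up halfway between the two tile results, which is why a larger overlap gives smoother seams and why every tile sees the same prompt unless something maps prompts to regions.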

pkuliyi2015 · 2023-03-12T03:28:01Z

The key point is that I need a user interface for drawing bounding boxes, so that you can draw rectangles and control MultiDiffusion with different prompts. In this way the result should get way better.

Why? Because in this way you can just select the woman's face and tell SD to draw a beautiful woman's face. Then SD will try its best, using its 512 * 512 resolution to draw ONLY a face. The resolution will be unprecedentedly high for SD models, as it is dedicated to drawing only one part of the image to the best of its capabilities.

However, when I was adding features I saw this f**king issue:
gradio-app/gradio#2316

Someone opened a PR for a bbox tool, but the maintainers declined to merge it:
gradio-app/gradio#3220

I don't know what they are thinking, denying such a good PR (from my perspective) without providing their own solution. It has been half a year since it was first proposed.

So it will be hard to draw rectangles on images directly. I must find another way to draw rectangles. Do you have any other idea?

ManOrMonster · 2023-03-12T19:26:54Z

So it will be hard to draw rectangles on images directly. I must find another way to draw rectangles. Do you have any other idea?

Check out this extension: https://github.com/hnmr293/sd-webui-llul

It fakes it by having you move around a rectangle in a separate window.

x-legion · 2023-03-13T13:08:41Z

https://www.reddit.com/r/StableDiffusion/comments/11pyiro/new_feature_zoom_enhance_for_the_a111_webui/

New Feature: "ZOOM ENHANCE" for the A111 WebUI. Automatically fix small details like faces and hands!

Hello, fellow Stable Diffusion users! I'm excited to share with you a new feature that I've added to the Unprompted extension: it's the [zoom_enhance] shortcode.

If you're not familiar with Unprompted, it's a powerful extension that lets you use various shortcodes in your prompts to enhance your text generation experience. You can learn more about it here.

The [zoom_enhance] shortcode is inspired by the fictional technology from CSI, where they can magically zoom in on any pixelated image and reveal crisp details. Of course, this is not possible in real life, but we can get pretty close with Stable Diffusion and some clever tricks.

The shortcode allows you to automatically upscale small details within your image where Stable Diffusion tends to struggle. It is particularly good at fixing faces and hands in long-distance shots.

How does it work?

The [zoom_enhance] shortcode searches your image for specified target(s), crops out the matching regions and processes them through [img2img]. It then blends the result back into your original image. All of this happens behind-the-scenes without adding any unnecessary steps to your workflow. Just set it and forget it.

Features and Benefits

Great in both txt2img and img2img modes.
The shortcode is powered by the [txt2mask] implementation of clipseg, which means you can search for literally anything as a replacement target, and you get access to the full suite of [txt2mask] settings, such as "padding" and "negative_mask."
It's also pretty good at deepfakes. Set mask="face" and replacement="another person's face" and check out the results.
It applies a gaussian blur to the boundaries of the upscaled image which helps it blend seamlessly with the original.
It is equipped with Dynamic Denoising Strength which is based on a simple idea: the smaller your replacement target, the worse it probably looks. Think about it: when you generate a character who's far away from the camera, their face is often a complete mess. So, the shortcode will use a high denoising strength for small objects and a low strength for larger ones.
It is significantly faster than Hires Fix and won't mess up the rest of your image.
Compatible with A111's color correction setting.
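The Dynamic Denoising Strength idea in the list above (smaller target, higher strength) can be sketched as a simple linear mapping in Python. This is only an illustration of the idea, not Unprompted's actual formula; the function name, the linear shape, and the 0.2 to 0.6 range are all assumptions.

```python
def dynamic_denoise(target_area: float, image_area: float,
                    min_strength: float = 0.2,
                    max_strength: float = 0.6) -> float:
    """Map a target's relative size to a denoising strength (illustrative)."""
    if image_area <= 0:
        raise ValueError("image_area must be positive")
    # Fraction of the image covered by the target, clamped to [0, 1].
    ratio = max(0.0, min(1.0, target_area / image_area))
    # Small targets (ratio near 0) get max_strength; large ones get min_strength.
    return min_strength + (max_strength - min_strength) * (1.0 - ratio)


face_far = dynamic_denoise(32 * 32, 512 * 512)      # tiny face in a wide shot
face_close = dynamic_denoise(256 * 256, 512 * 512)  # face filling a quarter of the frame
print(face_far, face_close)
```

The far-away face gets the stronger redraw, while the close-up face is changed less.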

How to use it?

To use this feature, you need to have Unprompted installed on your WebUI. If you don't have it yet, you can get it from here.

Once you have Unprompted, simply add this line anywhere in your prompt:

pkuliyi2015 · 2023-03-13T14:20:59Z

I have investigated a new technique, DDNM (https://github.com/wyhuai/DDNM), that is very powerful for super-resolution. It is also compatible with MultiDiffusion. In initial tests I found it amazing. I believe this can beat their new feature in a compelling way.

The automatic mask technique does not seem very compatible with MultiDiffusion txt2img, but I will try it in img2img.

Vuhiep190297 · 2023-03-13T18:26:29Z

How long does it take you to upscale a photo, and how can it be made faster? Here are my settings

Rkkss · 2023-04-03T17:42:44Z

I made a few comparisons between Ultimate upscaler (default settings, CFG 10, DDIM, denoise 0.23) and Mixture of Diffusers.
The original image

vs Denoise 0.23, DDIM:
https://imgsli.com/MTY2Njkw

vs Denoise 0.35 DDIM
https://imgsli.com/MTY2Njkx

vs Denoise 0.35 Euler A - cfg 14
https://imgsli.com/MTY2Njky

MD is good at adding extra details without overcooking the image, and you can go with high denoise and CFG, but as far as upscaling goes, Ultimate SD upscaler still has a less pixelated texture when you zoom in, especially on the hands and face.
Parameters for MD:
Tiled Diffusion upscaler: 4x-UltraSharp, Tiled Diffusion scale factor: 2, Tiled Diffusion: "{'Method': 'Mixture of Diffusers', 'Latent tile width': 64, 'Latent tile height': 64, 'Overlap': 48, 'Tile batch size': 4, 'Upscaler': '4x-UltraSharp', 'Scale factor': 2, 'Keep input size': True}"
I got much worse results with the recommended settings. Maybe I'm doing something wrong.

I inpainted a bit before upscaling; here is the actual original image if anyone wants to try it out:
https://files.catbox.moe/wek7ed.png

RainehDaze · 2023-04-06T02:38:23Z

Hm, I've been using the region prompt, and I've noticed that if anything, it seems even worse about concatenating random people on the boundaries--even if there's nothing in the main prompt about people.

This sort of thing is nearly constant:

In many regards, it's performing even worse than just straight generating a 1024x1024 image. 20 images in a batch, and only one didn't have extra people (instead, it duplicated the entire horizon):

SamBigAbs · 2023-04-06T17:02:42Z

Hm, I've been using the region prompt, and I've noticed that if anything, it seems even worse about concatenating random people on the boundaries--even if there's nothing in the main prompt about people.

This sort of thing is nearly constant:

In many regards, it's performing even worse than just straight generating a 1024x1024 image. 20 images in a batch, and only one didn't have extra people (instead, it duplicated the entire horizon):

Because most models are 512x512, this is more likely to occur the larger the images you try to create. I would first check that your batch size in multidiffusion is 1 with . Increasing the tile size and/or overlap may decrease the likelihood of this occurring. Try 80x80 with an overlap of 16 and a latent tile batch size of 8, or 96x96 with an overlap of 32, or even 128x128 with an overlap of 64 and a batch size of 4. If that does not fix it, then create the images at 768x768 or even 512x512 and upscale them.
The greater the overlap the more context each tile has from its surrounding tiles.
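To see what the suggested tile sizes mean for a 1024x1024 image, here is a small Python sketch that lists where tiles would start along one axis. It assumes latent units are 1/8 of the pixel size and a plain sliding-window layout with stride = tile size minus overlap; the extension's real tiling code may place tiles differently, and tile_starts is a hypothetical helper.

```python
import math
from typing import List

VAE_SCALE = 8  # assumption: SD latents are 1/8 of the pixel resolution


def tile_starts(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets of overlapping tiles along one axis (latent units)."""
    if tile >= length:
        return [0]  # a single tile already covers the whole axis
    stride = tile - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than the tile size")
    count = math.ceil((length - tile) / stride) + 1
    # Clamp the last start so the final tile ends at the image border.
    return [min(i * stride, length - tile) for i in range(count)]


latent_side = 1024 // VAE_SCALE  # a 1024 px side is 128 latent units
for tile, overlap in ((80, 16), (96, 32), (128, 64)):
    starts = tile_starts(latent_side, tile, overlap)
    print(f"tile {tile}, overlap {overlap}: {len(starts)} tile(s) per axis at {starts}")
```

Under these assumptions, 80 and 96 each give two overlapping tiles per axis, while a 128-unit tile covers a 1024 px side on its own, so nothing is left to merge along that axis.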

RainehDaze · 2023-04-06T17:04:59Z

Because most models are 512x512 this is more likely to occur the larger the images you try to create. I would first check that your batch size in multidiffusion is 1 with . Increasing Tile size and/or overlap may decrease the likelihood of this occurring. Try 80x80 with an overlap of 16 and latent tile batch size of 8, or 96x96 with an overlap of 32, or even 128x128 with an overlap of 64 and batch size of 4. If that does not fix it than create the images at 768x768 or even 512x512 then upscale them. The greater the overlap the more context each tile has from its surrounding tiles.

The point is, the region control is supposed to be used to help avoid such things while composing larger images. This is literally the point of it, and what the demonstration pictures were showing.

SamBigAbs · 2023-04-06T17:30:34Z

Because most models are 512x512 this is more likely to occur the larger the images you try to create. I would first check that your batch size in multidiffusion is 1 with . Increasing Tile size and/or overlap may decrease the likelihood of this occurring. Try 80x80 with an overlap of 16 and latent tile batch size of 8, or 96x96 with an overlap of 32, or even 128x128 with an overlap of 64 and batch size of 4. If that does not fix it than create the images at 768x768 or even 512x512 then upscale them. The greater the overlap the more context each tile has from its surrounding tiles.

The point is, the region control is supposed to be used to help avoid such things while composing larger images. This is literally the point of it, and what the demonstration pictures were showing.

Have you tried what I suggested?

RainehDaze · 2023-04-06T17:35:47Z

Because most models are 512x512 this is more likely to occur the larger the images you try to create. I would first check that your batch size in multidiffusion is 1 with . Increasing Tile size and/or overlap may decrease the likelihood of this occurring. Try 80x80 with an overlap of 16 and latent tile batch size of 8, or 96x96 with an overlap of 32, or even 128x128 with an overlap of 64 and batch size of 4. If that does not fix it than create the images at 768x768 or even 512x512 then upscale them. The greater the overlap the more context each tile has from its surrounding tiles.

The point is, the region control is supposed to be used to help avoid such things while composing larger images. This is literally the point of it, and what the demonstration pictures were showing.

Have you tried what I suggested what I suggested?

You realise your suggestions are totally irrelevant, right?

Like, the point of region prompting is that you can have a larger image (with a background prompt using the usual MD merging), and then a specific foreground region (or regions) that are meant to contain specific things. It's even spelled out on the main page, including that tile size doesn't really matter for this one.

Creating an image at 512 or 768 and upscaling also completely defeats the point, which is that your standard SD-sized generation would only be a component of an image with a different aspect ratio, and not full of body part concatenation.

(I think it might actually be that some quality-related things tend to act as catalysts for drawing people; I'm not sure and I'm going to keep poking away)

SamBigAbs · 2023-04-06T17:47:30Z

image at 512 or 768 and upscaling also completely defeats the point, which is that your standard SD-sized generation would only be a component of an image with a different aspect ratio, and not full of body pa

Image height is 1024. 8 tiles are in a batch so a height of 128 prevents the problem you are having. You can see my settings in the screenshot.

RainehDaze · 2023-04-06T17:49:57Z

image at 512 or 768 and upscaling also completely defeats the point, which is that your standard SD-sized generation would only be a component of an image with a different aspect ratio, and not full of body pa

Image height is 1024. 8 tiles are in a batch so a height of 128 prevents the problem you are having. You can see my settings in the screenshot.

It doesn't, actually, because 128x128 tiles were what I was using when I was testing, and the concatenation kept happening. I'm pretty sure, after some more tests, that random tokens were actually prompting for people (for some unspeakable reason). Getting to this sort of thing consistently was a matter of changing the prompt settings, not messing with the tiles:

SamBigAbs · 2023-04-06T17:52:23Z

image at 512 or 768 and upscaling also completely defeats the point, which is that your standard SD-sized generation would only be a component of an image with a different aspect ratio, and not full of body pa

Image height is 1024. 8 tiles are in a batch so a height of 128 prevents the problem you are having. You can see my settings in the screenshot.

It doesn't, actually, because 128x128 tiles were what I was using when I was testing, and the concatenation kept happening. I'm pretty sure, after some more tests, that random tokens were actually prompting for people (for some unspeakable reason). Getting to this sort of thing consistently was a matter of changing the prompt settings, not messing with the tiles:

If you check powershell or command line it shows that at 128x128 Multi Diffusion does not take effect because the image is too small. You have to use a tile width smaller than 128.

RainehDaze · 2023-04-06T17:55:14Z

Looking at the command line, multidiffusion was doing its thing. Probably because of region control. Which, again, to reference the MAIN PAGE FOR THIS REPO, says (with regard to region prompt)

The tile size parameters become useless; just ignore them

seriously, do you think the person maintaining this knows less about how it works than you do?

SamBigAbs · 2023-04-06T18:16:38Z

Looking at the command line, multidiffusion was doing its thing. Probably because of region control. Which, again, to reference the MAIN PAGE FOR THIS REPO, says (with regard to region prompt)

The tile size parameters become useless; just ignore them

seriously, do you think the person maintaining this knows less about how it works than you do?

He is probably referring to it in the context of img2img, not txt2img.
And yes, it is possible to know more about how to use a tool than the person who made it. Musicians are better at their instruments than the people who made them.

RainehDaze · 2023-04-06T18:17:39Z

That's like saying a guitar player knows more about how an amplifier works.

SamBigAbs · 2023-04-06T18:19:26Z

That's like saying a guitar player knows more about how an amplifier works.

No. It's like the thing I said. You can't just come up with a different analogy to discredit my first one.

SamBigAbs · 2023-04-06T18:25:23Z

That's like saying a guitar player knows more about how an amplifier works.

In fact that is the exact opposite of my original analogy which is that the artist can utilize the tool better than the creator. It does not imply that the artist has the ability to design or create the tool.

RainehDaze · 2023-04-06T18:28:22Z

Your analogy was flawed, because I said how it works. The creator of something is more likely to know whether a certain setting actually does anything for a given setting than a user, even if the user is extremely good at it.

Anyway, I did more testing. It was the prompt causing humans to be generated where they really shouldn't be (like the entire half of an image that was only supposed to be scenery) and concatenating things when adjacent. Seriously, it was doing things like this:

or this:

When there was supposed to be only scenery to either side (and obviously nothing was describing those particular people). As I noted, it seems that a lot of tags that describe image quality are actually tied really strongly to generating people.

pkuliyi2015 · 2023-04-07T03:50:45Z

Thank you for making attempts on this. This is a classic noise pollution problem, where the foreground noise triggers undesirable multi-character changes in the background when your model is not that good at high-resolution image generation.

This can be partly mitigated by adding some negative prompts in the background regions. However, this may not solve the problem entirely. I am considering a much more powerful merging strategy and a corresponding UI that lets you fuse images better.

You will definitely like it.

RainehDaze · 2023-04-07T05:31:09Z

Thank you for making attempts on this. This is a classical noise pollution problem where the foreground noises triggered the undesirable multi-character change in the background, when your model is not that good for high resolution image generation.

This can be partly mitigate by adding some negative prompts in the background regions. However, this may not solve the problem totally. I am considering a much more powerful merging strategy and corresponding ui that lets you fuses images better.

you will definitely like it.

It wasn't too bad once there were no triggering tags in the general prompt (only 5 or 6 out of 100), and I got this out of it all with region control and the noise inversion:

But anything that would make for better image composition is great (only about 9 of the 100 had reasonable background coherency).

Rorowalnuss · 2023-04-08T16:15:01Z

Hello, I am trying to use MultiDiffusion to place kemono characters in the background, but the checkpoint I am using requires Hires fix and a hypernetwork to be enabled by default; otherwise it will generate humans.

The overall prompt only describes the camera and background, and enables the hypernetwork. I enter character prompts for the foreground and no prompts for the background. The first few steps of the denoising process generate kemono normally, but in the end the Hires fix transforms the character into a human. I tried reducing the denoising value of the Hires fix, but that results in fewer and blurrier image details. Increasing the denoising makes the character look more like a human.

I don't know whether this is due to an incompatibility between Hires fix and MultiDiffusion or whether the hypernetwork did not start properly.

pkuliyi2015 · 2023-04-08T16:42:48Z

I have made a trial fix. Please switch to the dev branch and test it. If it works, please let me know.

Rorowalnuss · 2023-04-09T02:57:42Z

It doesn't work well. The first image uses MultiDiffusion with Hires fix denoising = 0.7, while the second image does not use MultiDiffusion.
You can see that using MultiDiffusion generates completely different characters. The third image is a screenshot of the denoising process.

I tried turning off Hires fix when using MultiDiffusion in t2i and moving the generated blurry image to i2i, but the background details did not increase. To be honest, the image only became higher-resolution, whereas Hires fix can add things that were not in the original image.

$01407-1454243371-(masterpiece_1 3), (2D_1 0), (anime_1 0), (illustration_1 0), (sharp_1 2),_(hard light_1 0), (shadow_1 0),(reflection, refractio$
$00035-3734885203-(masterpiece_1 3), (2D_1 0), (anime_1 0), (illustration_1 0), (sharp_1 2),_(hard light_1 0), (shadow_1 0),(reflection, refractio (1)$

I also tried the other three models proposed by the checkpoint author, none of which require a hypernet to be enabled. However, two of these models also had the problem of the character changing when both Hires fix and MultiDiffusion are enabled, while the other model was able to generate kemono characters normally.
If you are interested, the model address is below.

https://civitai.com/models/11888?modelVersionId=32830

I have verified that the model that works normally with MultiDiffusion is crossfemono2.0, while the models that do not work normally are G, G2, F, and D.

qiuchengzhi · 2023-04-14T14:47:57Z

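Hello, when I use the Qingming River painting (清明上河图) together with ControlNet to generate an ultra-long image, ControlNet does not seem to take effect. What is the reason for this? Is it because the preprocessor resolution is not high enough?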

zc61536337 · 2023-05-10T03:02:07Z

"RuntimeError: Invalid buffer size: 6.89 GB" How to solve it?

zc61536337 · 2023-05-10T03:10:56Z

With Tiled VAE, it displays 'min and input tensors must be of the same shape'.

ShivaeAI · 2023-05-12T10:31:30Z

"4x-UltraSharp upscaler and put it in the ESRGAN folder": I could not find the relevant folder.

PotatoBananaApple · 2023-05-12T19:01:09Z

The folder can be found at ...\stable-diffusion-webui\models\ESRGAN
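
If the folder is not there, a minimal sketch like the following can create it and list any upscaler weights already in place. The function name `find_upscaler_models` and the idea of passing the webui root as an argument are hypothetical, for illustration only; the only thing taken from the thread is the `models\ESRGAN` location under the `stable-diffusion-webui` folder.

```python
from pathlib import Path


def find_upscaler_models(webui_root: str) -> list[str]:
    """List .pth files in <webui_root>/models/ESRGAN.

    Hypothetical helper: creates the folder if it is missing,
    on the assumption that it simply has not been created yet.
    """
    esrgan_dir = Path(webui_root) / "models" / "ESRGAN"
    # Create models/ESRGAN (and parents) if it does not exist yet.
    esrgan_dir.mkdir(parents=True, exist_ok=True)
    # Return file names only, sorted for stable output.
    return sorted(p.name for p in esrgan_dir.glob("*.pth"))


if __name__ == "__main__":
    # Assumption: the script is run from the directory that contains
    # the stable-diffusion-webui folder; adjust the path otherwise.
    print(find_upscaler_models("stable-diffusion-webui"))
```

An empty list means the folder exists but holds no `.pth` file yet, so the upscaler weights still need to be copied in.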

laoraozi · 2023-06-03T16:00:49Z

Could the author show how to generate the realistic-style Qingming River painting through the interface? That would make it easier for me to understand Tiled Diffusion, region prompts, and drawing full-canvas backgrounds. Thank you very much.

laoraozi · 2023-06-04T13:41:01Z

As you can see, I don't know how the region prompts and the Draw full canvas background option you mentioned apply to this painting.

halr9000 · 2023-06-12T09:12:14Z

To those of you asking questions on a closed discussion, you need to take some lessons from an old master at the art of asking questions online.

leopard-LSG · 2023-08-08T14:23:42Z

Is there a setting that works with an Intel 16-inch high-end model with 16 GB of RAM and an AMD Radeon Pro 5500M with 6 GB of VRAM?

Also, does it matter which Python and PyTorch versions are used? Currently, I cannot create the desired image size with Python 3.10.12 and PyTorch Nightly 2.1.0. If the R-ESRGAN 4x+ scale exceeds 1.7 at 512 size, cmd exits with an MPS shortage error.
I followed the settings as described in the description, but it fails.

pkuliyi2015 added the documentation label (Improvements or additions to documentation) on Mar 10, 2023

pkuliyi2015 closed this as completed May 28, 2023
