Comparison discussion #3

Closed
x-legion opened this issue Mar 7, 2023 · 92 comments
Labels
documentation Improvements or additions to documentation

Comments

@x-legion

x-legion commented Mar 7, 2023

MultiDiffusion seems to be doing worse (not sharp), or am I doing something wrong?
original:
image

MultiDiffusion:
image
Ultimate SD Upscale:
image

@pkuliyi2015
Owner

pkuliyi2015 commented Mar 7, 2023

Hello, could you please provide your weights (the checkpoint, plus the LoRA if you use one) for your original image? I need them to reproduce your results in an oil-painting style. MultiDiffusion results can be severely affected by the checkpoint and LoRA you use.

Generally speaking, though, an extraordinarily high CFG Scale and a slightly higher denoising value will give you satisfying details. Example positive prompts are "highres, masterpiece, best quality, ultra-detailed unity 8k wallpaper, extremely clear, very clear, ultra-clear". You don't need anything concrete in the positive prompt; just drag the CFG Scale to an extra-large value. Denoising values between 0.1 and 0.4 are all OK, but the content will change accordingly.

Here is my result with CFG=20, Sampler=DPM++ SDE Karras, denoising strength=0.3, for example. Since I use the protogenX34 checkpoint, my painting style will be wildly different from yours:

00064-2792530863-20230307100606

Please comment on this issue if your results improve significantly once you use a suitable model and CFG values.

@jurandfantom

Hi there, I'm writing here so as not to create a new issue about a similar thing.
Would it be possible to write down or screenshot all the settings used to upscale the picture attached in the extension description? I think I have tested everything, but all I get is a blurred upscaled picture. Here is one example result that shows how blurry it is (not to mention the lack of extra details with denoise at 0.3 and CFG at 20, for example). For now I want to copy everything 1:1 to see whether the issue is on my side. Thanks for creating this extension, I have high hopes.
Example picture.

@pkuliyi2015
Owner

pkuliyi2015 commented Mar 9, 2023

Hello, as requested, here is the PNG info:
image

Here is the text version for your convenience. All the resources are public, but I'm quite busy and cannot provide links for you.

masterpiece, best quality, highres, extremely detailed 8k unity wallpaper, ultra-detailed
Negative prompt: EasyNegative
Steps: 24, Sampler: DPM++ SDE Karras, CFG scale: 7, Seed: 1614054406, Size: 4096x3200, Model hash: 2ccfc34fe3, Model: 0.9(Gf_style2) + 0.1(abyssorangemix2_Hard), Denoising strength: 0.4, Clip skip: 3, Mask blur: 4, MultiDiffusion upscaler: 4x_foolhardy_Remacri, MultiDiffusion scale factor: 4, MultiDiffusion tile width: 128, MultiDiffusion tile height: 128, MultiDiffusion overlap: 64

If you don't know any of them, you can Google them. But your result most likely comes from poor positive and negative prompts; I use a Textual Inversion called EasyNegative from civitai.com.
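
For anyone copying these settings 1:1, the text version above can be turned into a dictionary and compared field by field. This is only a minimal sketch: `parse_settings` is a hypothetical helper (not part of the web UI or the extension), the sample line is a shortened copy of the settings above, and the regex assumes no value contains a comma followed by something that looks like a `Key: ` label.

```python
import re


def parse_settings(line: str) -> dict:
    """Split a 'Key: value, Key: value' settings line into a dict (hypothetical helper)."""
    # Split only on commas that are followed by a "Key: " label, so that
    # ordinary commas inside a value are less likely to cut it apart.
    parts = re.split(r",\s*(?=[A-Za-z][\w +\-]*:\s)", line)
    settings = {}
    for part in parts:
        key, _, value = part.partition(":")
        settings[key.strip()] = value.strip()
    return settings


# Shortened copy of the settings line posted above.
line = (
    "Steps: 24, Sampler: DPM++ SDE Karras, CFG scale: 7, Seed: 1614054406, "
    "Size: 4096x3200, Denoising strength: 0.4, Clip skip: 3, "
    "MultiDiffusion tile width: 128, MultiDiffusion tile height: 128, "
    "MultiDiffusion overlap: 64"
)
settings = parse_settings(line)

for key, value in settings.items():
    print(f"{key} = {value}")
```

Parsing both your own PNG info and the posted one this way makes it easy to spot a differing field, such as the CFG scale or the model hash.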

@x-legion
Author

x-legion commented Mar 9, 2023

Click Here for Better Comparison View

original
image

masterpiece, best quality, portrait,
blue fire, silver hair, fox girl, mage, arm extended, holding blue fire, by jordan grimmer and greg rutkowski and pine ハイネ wlop, intricate, beautiful, trending artstation, pixiv, digital art, anime, no torch,
<lora:Noise:1.75>
Negative prompt: EasyNegative, lowres, ((bad anatomy)), ((bad hands)), text, missing finger, extra digits, fewer digits, blurry, ((mutated hands and fingers)), (poorly drawn face), ((mutation)), ((deformed face)), (ugly), ((bad proportions)), ((extra limbs)), extra face, (double head), (extra head), ((extra feet)), monster, logo, cropped, worst quality, low quality, normal quality, jpeg, humpbacked, long body, long neck, ((jpeg artifacts))
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 3857533696, Size: 640x960, Model: dreamniji3fp16, Clip skip: 2, ENSD: 31337, Discard penultimate sigma: True

Ultimate SD upscaler
image

masterpiece, best quality, portrait,
blue fire, silver hair, fox girl, mage, arm extended, holding blue fire, by jordan grimmer and greg rutkowski and pine ハイネ wlop, intricate, beautiful, trending artstation, pixiv, digital art, anime, no torch,
<lora:Noise:1.75>
Negative prompt: EasyNegative, lowres, ((bad anatomy)), ((bad hands)), text, missing finger, extra digits, fewer digits, blurry, ((mutated hands and fingers)), (poorly drawn face), ((mutation)), ((deformed face)), (ugly), ((bad proportions)), ((extra limbs)), extra face, (double head), (extra head), ((extra feet)), monster, logo, cropped, worst quality, low quality, normal quality, jpeg, humpbacked, long body, long neck, ((jpeg artifacts))
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 14, Seed: 3857533696, Size: 1280x1920, Model: dreamniji3fp16, Denoising strength: 0.4, Clip skip: 2, ENSD: 31337, Mask blur: 4, Ultimate SD upscale upscaler: 4x_foolhardy_Remacri, Ultimate SD upscale tile_width: 768, Ultimate SD upscale tile_height: 768, Ultimate SD upscale mask_blur: 8, Ultimate SD upscale padding: 32, Discard penultimate sigma: True

MultiDiffusion
image

masterpiece, best quality, portrait,
blue fire, silver hair, fox girl, mage, arm extended, holding blue fire, by jordan grimmer and greg rutkowski and pine ハイネ wlop, intricate, beautiful, trending artstation, pixiv, digital art, anime, no torch,
<lora:Noise:1.75>
Negative prompt: EasyNegative, lowres, ((bad anatomy)), ((bad hands)), text, missing finger, extra digits, fewer digits, blurry, ((mutated hands and fingers)), (poorly drawn face), ((mutation)), ((deformed face)), (ugly), ((bad proportions)), ((extra limbs)), extra face, (double head), (extra head), ((extra feet)), monster, logo, cropped, worst quality, low quality, normal quality, jpeg, humpbacked, long body, long neck, ((jpeg artifacts))
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 14, Seed: 3857533696, Size: 1280x1920, Model: dreamniji3fp16, Denoising strength: 0.4, Clip skip: 2, ENSD: 31337, Mask blur: 4, MultiDiffusion upscaler: 4x_foolhardy_Remacri, MultiDiffusion scale factor: 2, Discard penultimate sigma: True

@jurandfantom

OK, now I know something might be wrong on my side. I can see additional details (I will check whether that is because of clip skip 3, the upscaler, or something else), but it's still blurred. That's super weird. Oh, and thanks for the reply. The pictures attached to the description don't have PNG info attached (that's why I asked :) )
00147-1803174913

@x-legion
Author

x-legion commented Mar 9, 2023

https://imgsli.com/MTYwOTcx same here again

@pkuliyi2015
Owner

Hello, thanks for your interest in this work. I spent several minutes on your image, and here is my result with no tuning:
https://imgsli.com/MTYxMDI5.

It's hard to tell which is better; if you like illustration-style sharpness and faithfulness to the original image, maybe Ultimate SD Upscaler + 4x Ultra Sharp is your best choice. But personally I like to see some fabricated details on a realistic human face, so I prefer this tool.

It's worth noting that the biggest difference between MultiDiffusion and other upscalers is that it currently doesn't support any concrete content in the prompt when you upscale an image; otherwise each tile will contain a small character and your image ends up blurry and messy.

The correct prompts are as follows. I don't even use a LoRA:

image

And my configurations, FYI:

image

@DenkingOfficial

I provide the PNG info

I tried to replicate your settings with an image provided by OP and it's still very blurry:

image

Compared to an image you sent:

image

As you can see, settings are pretty much the same except CFG scale:

image

@pkuliyi2015
Owner

pkuliyi2015 commented Mar 9, 2023

Update: Oh, I just noticed that EasyNegative is a textual inversion from civitai.com, not a plain word. Please download that textual inversion.

Here is the link: https://civitai.com/models/7808/easynegative

The upscalers are important too. I personally use two: 4x-UltraSharp and 4x-remacri. Here is the link:
https://upscale.wiki/wiki/Model_Database
There you can find the two upscalers; put them in your ESRGAN folder.

@DenkingOfficial

DenkingOfficial commented Mar 9, 2023

4x-remacri

I used it with the image above

EasyNegative is a textual inversion

Already downloaded this embedding

@pkuliyi2015
Owner

pkuliyi2015 commented Mar 9, 2023

4x-remacri

I used it with the image above

Do you use EasyNegative embeddings?

You mean you have used it in the above images?

@DenkingOfficial

DenkingOfficial commented Mar 9, 2023

You mean you have used it in the above images?

Yes, it was used

UPD:

image

@pkuliyi2015
Owner

pkuliyi2015 commented Mar 9, 2023

You mean you have used it in the above images?

Yes, it was used

UPD:

I spent some time finding the original PNG info. Here it is; please try to reproduce using my params:
image

@pkuliyi2015
Owner

It may not be as easy to use as the Ultimate Upscaler, as it's essentially a complete redraw without post-processing. Personally, I have some rules of thumb for using it:

  • No concrete positive prompts. Just something like clear, very clear, ultra clear
  • Don't use too large a tile size, as SD 1.4 is only good at 512 - 768 (so you divide it by 8 and get 64 - 96).
  • Large CFG Scales, Euler a & DPM++ SDE Karras, Denoising=0.2-0.4
  • Try both 4x-UltraSharp and 4x-Remacri
  • Clip Skip=2 or 3 is worth trying.
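
The "divide it by 8" step in the tile-size tip can be written out as a tiny calculation. This is just a sketch of that arithmetic; `latent_tile_size` and `LATENT_DOWNSCALE` are illustrative names, not settings or functions of the extension.

```python
LATENT_DOWNSCALE = 8  # the "divide it by 8" factor from the tip above


def latent_tile_size(pixels: int) -> int:
    """Convert a tile edge in image pixels to the matching latent tile size."""
    if pixels % LATENT_DOWNSCALE != 0:
        raise ValueError("pixel size should be a multiple of 8")
    return pixels // LATENT_DOWNSCALE


for px in (512, 768):
    print(f"{px} px -> latent tile size {latent_tile_size(px)}")
```

So a tile width or height between 64 and 96 in the extension corresponds to the 512 - 768 pixel range the tip says the model handles well.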

@DenkingOfficial

please try to reproduce using my params

I just did it and it's a lot better

image

Settings (Even seed is the same):

image

But it still can't generate a result as good as yours.
I know it depends heavily on the hardware, but there's a very large difference in details.
No optimizations were used (such as xformers, opt-split-attention, etc.).

Mine:
image

And yours:
image

@pkuliyi2015
Owner

please try to reproduce using my params

I just did it and it's a lot better

image

Settings (Even seed is the same):

image

But still it can't generate a result as good as yours I know it highly depends on a hardware, but there's a very large difference in details No any optimizations used (Such as xformers, opt-split-attention etc.)

My: image

And yours: image

I'm also confused. Are you using this model?

https://civitai.com/models/3666/protogen-x34-photorealism-official-release

I see our model hashes are different. Apart from this, I couldn't find any other difference.

@DenkingOfficial

I'm also confused. Are you using this model?

Yes, I used protogen_x3.4, but the pruned version.
Now I have downloaded the 5GB version with the same hash as yours and THAT'S AMAZING

A huge improvement in details:

image

It still doesn't produce exactly the same result as yours, which I guess depends on the hardware, but the details are unbelievable. I can clearly see the stitch seam on the sleeve.

@pkuliyi2015
Owner

Oh, thanks for your feedback. I didn't know that a pruned model could affect the details too until you tested it.

@jurandfantom

jurandfantom commented Mar 10, 2023

Ohh! I don't think many people know that, to be honest o_O As far as I understand pruning, it should not affect a task such as upscaling via small tiles? I'm going to try with a non-pruned model as well and let you know.

Edit: No clue why, but today everything works as it should. Maybe it's necessary to turn everything off and on again, not just restart the UI, just like when installing Dreambooth.

pkuliyi2015 added the documentation label Mar 10, 2023
@2blackbar

2blackbar commented Mar 10, 2023

Tried it, and to be honest the ESRGAN upscalers do 99% of the lifting; it barely does anything when used with Lanczos. Or are there going to be examples with Lanczos where it introduces new details? The best bet is to upscale with ESRGAN by 2 and then go to inpaint, masking the parts one by one to upscale them, since you then have more pixel area to resolve detail. Unless someone automates that, it's going to stay the best way to upscale.

@jurandfantom

More tests. ControlNet doesn't work, or it needs a much lower denoise than I used.
The upscale for the attached picture was done in two passes plus the dynamic CFG script. I agree it is way too far off from the original picture, but now that I know what is what, it's time for fine-tuning (and hopefully figuring out the issue with ControlNet).
00034-715773611 - Copy
Indeed, it's essential to test a couple of upscalers because the differences are huge, even bigger than those from the SD model used.

@jurandfantom

23,03,10 - 16,01,21 - 7331 a
Left is mine, right is pkuliyi2015's.
As you can see, the left has way more details, but also some noise and weird issues; pure remacri x4 looks almost like pkuliyi2015's version. Plenty of room for tests.

@x-legion
Author

tried it and to be honest esrgan upscalers do 99% of the lifting, it barely does anything when used with lanczos, unless theres gonna be examples of it with lanczos where it introduces new details ? Best bet is to just upscale with esrgan by 2 and go to inpaint with it to mask the parts one by one to upscale them since you gonna have more pixel area to resolve detail, so unless someone will automate that , its gonna stay as the best way to upscale

This is basically a tile-by-tile img2img SD redraw. So if you don't give it high strength it doesn't work as you expected. However, one of the weakness is that it currently cannot automatically map your prompts to different areas... If you can use stronger prompts, it should be way better.

But I'm working on Automatic Prompt Mapping. In img2img, it works by first estimate the attention map of your prompt to the original picture, and then re-apply them to multidiffusion tiles. In txt2img this may be similar, but I need time to do so.

https://github.com/dustysys/ddetailer.git try this one

@pkuliyi2015
Owner

tried it and to be honest esrgan upscalers do 99% of the lifting, it barely does anything when used with lanczos, unless theres gonna be examples of it with lanczos where it introduces new details ? Best bet is to just upscale with esrgan by 2 and go to inpaint with it to mask the parts one by one to upscale them since you gonna have more pixel area to resolve detail, so unless someone will automate that , its gonna stay as the best way to upscale

Sorry, I accidentally edited the comment above.

This is basically a tile-by-tile img2img SD redraw, so if you don't give it high strength it won't work as you expect. However, one of its weaknesses is that it currently cannot automatically map your prompts to different areas. If you could use stronger prompts, it should be much better.

But I'm working on Automatic Prompt Mapping. In img2img, it works by first estimating the attention map of your prompt on the original picture and then re-applying it to the MultiDiffusion tiles. In txt2img this may be similar, but I need time to do so.
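
To make "tile-by-tile redraw" concrete, here is a rough one-dimensional sketch. It assumes overlapping tiles are fused by plain averaging of every tile that covers a position; the extension's real blending and its tile placement may differ. `tile_starts`, `blend_tiles`, and the `redraw` callback (a stand-in for one img2img pass on a tile) are hypothetical names.

```python
def tile_starts(length: int, tile: int, overlap: int) -> list:
    """Start offsets of tiles of size `tile` that cover `length` positions."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be non-negative and smaller than the tile")
    if tile >= length:
        return [0]  # a single tile already covers everything
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush with the edge
    return starts


def blend_tiles(signal, tile, overlap, redraw):
    """Redraw each tile separately, then average wherever tiles overlap."""
    total = [0.0] * len(signal)
    count = [0] * len(signal)
    for start in tile_starts(len(signal), tile, overlap):
        redrawn = redraw(signal[start:start + tile])
        for i, value in enumerate(redrawn):
            total[start + i] += value
            count[start + i] += 1
    return [t / c for t, c in zip(total, count)]


# Stand-in "redraw" that only brightens its tile; a real pass would run SD on it.
result = blend_tiles([1, 2, 3, 4, 5, 6, 7, 8], tile=4, overlap=2,
                     redraw=lambda t: [v * 2 for v in t])
print(result)
```

Because every tile is redrawn with the same prompt and without seeing the rest of the picture, a concrete prompt such as a character description can be painted into each tile, which is the failure mode described earlier in the thread.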

@pkuliyi2015
Owner

The key point is that I need a user interface for drawing bounding boxes, so that you can draw rectangles and control MultiDiffusion with different prompts. That way the result should get much better.

Why? Because then you can select just the woman's face and tell SD to draw a beautiful woman's face. SD will then do its best, using its full 512 * 512 resolution to draw ONLY a face. The resolution will be unprecedentedly high for SD models, since it is dedicated to drawing only one part of the image to the best of its capabilities.

However, when I was adding features I saw this f**king issue:
gradio-app/gradio#2316

Someone submitted a PR for a bbox tool, but the maintainers declined to merge it:
gradio-app/gradio#3220

I don't know what they were thinking when they rejected such a good PR (from my perspective) without providing their own solution. It has been half a year since it was first proposed.

So it will be hard to draw rectangles on images directly, and I must find another way to do it. Do you have any other ideas?

@ManOrMonster

So it will be hard to draw rectangles on images directly. I must find another way to draw rectangles. Do you have any other idea?

Check out this extension: https://github.com/hnmr293/sd-webui-llul

It fakes it by having you move around a rectangle in a separate window.

image

@x-legion
Author

https://www.reddit.com/r/StableDiffusion/comments/11pyiro/new_feature_zoom_enhance_for_the_a111_webui/

New Feature: "ZOOM ENHANCE" for the A111 WebUI. Automatically fix small details like faces and hands!

Hello, fellow Stable Diffusion users! I'm excited to share with you a new feature that I've added to the Unprompted extension: it's the [zoom_enhance] shortcode.

If you're not familiar with Unprompted, it's a powerful extension that lets you use various shortcodes in your prompts to enhance your text generation experience. You can learn more about it here.

The [zoom_enhance] shortcode is inspired by the fictional technology from CSI, where they can magically zoom in on any pixelated image and reveal crisp details. Of course, this is not possible in real life, but we can get pretty close with Stable Diffusion and some clever tricks.

The shortcode allows you to automatically upscale small details within your image where Stable Diffusion tends to struggle. It is particularly good at fixing faces and hands in long-distance shots.

How does it work?

The [zoom_enhance] shortcode searches your image for specified target(s), crops out the matching regions and processes them through [img2img]. It then blends the result back into your original image. All of this happens behind-the-scenes without adding any unnecessary steps to your workflow. Just set it and forget it.

Features and Benefits

  • Great in both txt2img and img2img modes.
  • The shortcode is powered by the [txt2mask] implementation of clipseg, which means you can search for literally anything as a replacement target, and you get access to the full suite of [txt2mask] settings, such as "padding" and "negative_mask."
  • It's also pretty good at deepfakes. Set mask="face" and replacement="another person's face" and check out the results.
  • It applies a gaussian blur to the boundaries of the upscaled image which helps it blend seamlessly with the original.
  • It is equipped with Dynamic Denoising Strength which is based on a simple idea: the smaller your replacement target, the worse it probably looks. Think about it: when you generate a character who's far away from the camera, their face is often a complete mess. So, the shortcode will use a high denoising strength for small objects and a low strength for larger ones.
  • It is significantly faster than Hires Fix and won't mess up the rest of your image.
  • Compatible with A111's color correction setting.
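
The Dynamic Denoising Strength idea in the list above (high strength for small targets, low strength for large ones) could look roughly like the following. This is only an illustration of the stated idea, not Unprompted's actual code: the function name, the linear mapping, and the `low`/`high` bounds are all assumptions.

```python
def dynamic_denoise(target_area: float, image_area: float,
                    low: float = 0.2, high: float = 0.6) -> float:
    """Map the target's share of the image to a denoising strength.

    Small targets get a strength near `high`, large targets near `low`.
    The linear ramp and the default bounds are illustrative assumptions.
    """
    if image_area <= 0:
        raise ValueError("image_area must be positive")
    fraction = min(max(target_area / image_area, 0.0), 1.0)
    return high - (high - low) * fraction


image_area = 1024 * 1024
small_face = dynamic_denoise(64 * 64, image_area)    # distant face
large_face = dynamic_denoise(512 * 512, image_area)  # close-up face
print(f"small target: {small_face:.3f}, large target: {large_face:.3f}")
```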

How to use it?

To use this feature, you need to have Unprompted installed on your WebUI. If you don't have it yet, you can get it from here.

Once you have Unprompted, simply add this line anywhere in your prompt:

@pkuliyi2015
Owner

pkuliyi2015 commented Mar 13, 2023

I have investigated a new technique, DDNM (https://github.com/wyhuai/DDNM), that is very powerful for super-resolution, and it is also compatible with MultiDiffusion. In initial tests I found it amazing. I believe it can convincingly beat their new feature.

The automatic mask technique does not seem very compatible with MultiDiffusion txt2img, but I will try it in img2img.

@Vuhiep190297

How long does it take you to upscale a photo, and how can it be made faster? Here are my settings:
image
image
image

@Rkkss

Rkkss commented Apr 3, 2023

I made a few comparisons between Ultimate upscaler (default settings, CFG 10, DDIM, denoise 0.23) and Mixture of Diffusers.
The original image
00026-57791799

vs Denoise 0.23, DDIM:
https://imgsli.com/MTY2Njkw

vs Denoise 0.35 DDIM
https://imgsli.com/MTY2Njkx

vs Denoise 0.35 Euler A - cfg 14
https://imgsli.com/MTY2Njky

MD is good at adding extra details without overcooking the image, so you can go with high denoise and CFG. But as far as upscaling goes, Ultimate SD upscaler still has less pixelated texture when you zoom in, especially on the hands and face.
Parameters for MD:
Tiled Diffusion upscaler: 4x-UltraSharp, Tiled Diffusion scale factor: 2, Tiled Diffusion: "{'Method': 'Mixture of Diffusers', 'Latent tile width': 64, 'Latent tile height': 64, 'Overlap': 48, 'Tile batch size': 4, 'Upscaler': '4x-UltraSharp', 'Scale factor': 2, 'Keep input size': True}"
I got much worse results with the recommended settings. Maybe I'm doing something wrong.

I inpainted a bit before upscaling; here is the actual original image if anyone wants to try it out:
https://files.catbox.moe/wek7ed.png

@RainehDaze

Hm, I've been using the region prompt, and I've noticed that if anything, it seems even worse about concatenating random people on the boundaries--even if there's nothing in the main prompt about people.

This sort of thing is nearly constant:
image

In many regards, it's performing even worse than just straight generating a 1024x1024 image. 20 images in a batch, and only one didn't have extra people (instead, it duplicated the entire horizon):
image

@SamBigAbs

Hm, I've been using the region prompt, and I've noticed that if anything, it seems even worse about concatenating random people on the boundaries--even if there's nothing in the main prompt about people.

This sort of thing is nearly constant: image

In many regards, it's performing even worse than just straight generating a 1024x1024 image. 20 images in a batch, and only one didn't have extra people (instead, it duplicated the entire horizon): image

Because most models are 512x512, this is more likely to occur the larger the images you try to create. I would first check that your batch size in multidiffusion is 1 with . Increasing the tile size and/or overlap may decrease the likelihood of this occurring. Try 80x80 with an overlap of 16 and a latent tile batch size of 8, or 96x96 with an overlap of 32, or even 128x128 with an overlap of 64 and a batch size of 4. If that does not fix it, then create the images at 768x768 or even 512x512 and upscale them.
The greater the overlap, the more context each tile has from its surrounding tiles.
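
As a rough way to compare the suggested settings, the sketch below counts how many tiles each combination would produce on a 1024x1024 image (128x128 in latent units, using the divide-by-8 rule mentioned earlier in the thread). `tiles_per_axis` is a hypothetical helper, and the stride and edge handling are assumptions; the extension's own tiling may place tiles differently.

```python
import math


def tiles_per_axis(length: int, tile: int, overlap: int) -> int:
    """Number of tiles needed to cover `length` latent units along one axis."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be non-negative and smaller than the tile")
    if tile >= length:
        return 1  # one tile already spans the whole axis
    stride = tile - overlap
    return math.ceil((length - tile) / stride) + 1


latent_side = 1024 // 8  # 1024 px image edge in latent units
for tile, overlap in [(80, 16), (96, 32), (128, 64)]:
    n = tiles_per_axis(latent_side, tile, overlap)
    print(f"tile {tile}, overlap {overlap}: {n} x {n} = {n * n} tiles")
```

Under these assumptions, a 128 tile spans the whole latent side of a 1024 px image, so that setting yields a single tile rather than an actual tiling.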

@RainehDaze

Because most models are 512x512 this is more likely to occur the larger the images you try to create. I would first check that your batch size in multidiffusion is 1 with . Increasing Tile size and/or overlap may decrease the likelihood of this occurring. Try 80x80 with an overlap of 16 and latent tile batch size of 8, or 96x96 with an overlap of 32, or even 128x128 with an overlap of 64 and batch size of 4. If that does not fix it than create the images at 768x768 or even 512x512 then upscale them. The greater the overlap the more context each tile has from its surrounding tiles.

The point is, the region control is supposed to be used to help avoid such things while composing larger images. This is literally the point of it, and what the demonstration pictures were showing.

@SamBigAbs

Because most models are 512x512 this is more likely to occur the larger the images you try to create. I would first check that your batch size in multidiffusion is 1 with . Increasing Tile size and/or overlap may decrease the likelihood of this occurring. Try 80x80 with an overlap of 16 and latent tile batch size of 8, or 96x96 with an overlap of 32, or even 128x128 with an overlap of 64 and batch size of 4. If that does not fix it than create the images at 768x768 or even 512x512 then upscale them. The greater the overlap the more context each tile has from its surrounding tiles.

The point is, the region control is supposed to be used to help avoid such things while composing larger images. This is literally the point of it, and what the demonstration pictures were showing.

Have you tried what I suggested?

@RainehDaze

Because most models are 512x512 this is more likely to occur the larger the images you try to create. I would first check that your batch size in multidiffusion is 1 with . Increasing Tile size and/or overlap may decrease the likelihood of this occurring. Try 80x80 with an overlap of 16 and latent tile batch size of 8, or 96x96 with an overlap of 32, or even 128x128 with an overlap of 64 and batch size of 4. If that does not fix it than create the images at 768x768 or even 512x512 then upscale them. The greater the overlap the more context each tile has from its surrounding tiles.

The point is, the region control is supposed to be used to help avoid such things while composing larger images. This is literally the point of it, and what the demonstration pictures were showing.

Have you tried what I suggested?

You realise your suggestions are totally irrelevant, right?

Like, the point of region prompting is that you can have a larger image (with a background prompt using the usual MD merging), and then a specific foreground region (or regions) that are meant to contain specific things. It's even spelled out on the main page, including that tile size doesn't really matter for this one.

Creating an image at 512 or 768 and upscaling also completely defeats the point, which is that your standard SD-sized generation would only be a component of an image with a different aspect ratio, and not full of body part concatenation.

(I think it might actually be that some quality-related things tend to act as catalysts for drawing people; I'm not sure and I'm going to keep poking away)

@SamBigAbs

image at 512 or 768 and upscaling also completely defeats the point, which is that your standard SD-sized generation would only be a component of an image with a different aspect ratio, and not full of body pa

image

Image height is 1024. 8 tiles are in a batch so a height of 128 prevents the problem you are having. You can see my settings in the screenshot.

@RainehDaze

image at 512 or 768 and upscaling also completely defeats the point, which is that your standard SD-sized generation would only be a component of an image with a different aspect ratio, and not full of body pa

image

Image height is 1024. 8 tiles are in a batch so a height of 128 prevents the problem you are having. You can see my settings in the screenshot.

It doesn't, actually, because 128x128 tiles were what I was using when I was testing, and the concatenation kept happening. I'm pretty sure, after some more tests, that random tokens were actually prompting for people (for some unspeakable reason). Getting to this sort of thing consistently was a matter of changing the prompt settings, not messing with the tiles:
image

@SamBigAbs

image at 512 or 768 and upscaling also completely defeats the point, which is that your standard SD-sized generation would only be a component of an image with a different aspect ratio, and not full of body pa

image
Image height is 1024. 8 tiles are in a batch so a height of 128 prevents the problem you are having. You can see my settings in the screenshot.

It doesn't, actually, because 128x128 tiles were what I was using when I was testing, and the concatenation kept happening. I'm pretty sure, after some more tests, that random tokens were actually prompting for people (for some unspeakable reason). Getting to this sort of thing consistently was a matter of changing the prompt settings, not messing with the tiles: image

If you check powershell or command line it shows that at 128x128 Multi Diffusion does not take effect because the image is too small. You have to use a tile width smaller than 128.

@RainehDaze

Looking at the command line, multidiffusion was doing its thing. Probably because of region control. Which, again, to reference the MAIN PAGE FOR THIS REPO, says (with regard to region prompt)

The tile size parameters become useless; just ignore them

seriously, do you think the person maintaining this knows less about how it works than you do?

@SamBigAbs

Looking at the command line, multidiffusion was doing its thing. Probably because of region control. Which, again, to reference the MAIN PAGE FOR THIS REPO, says (with regard to region prompt)

The tile size parameters become useless; just ignore them

seriously, do you think the person maintaining this knows less about how it works than you do?

He is probably referring to it in the context of img2img, not txt2img.
And yes, it is possible to know more about how to use a tool than the person who made it. Musicians are better at their instruments than the people who made them.

@RainehDaze

That's like saying a guitar player knows more about how an amplifier works.

@SamBigAbs

That's like saying a guitar player knows more about how an amplifier works.

No. It's like the thing I said. You can't just come up with a different analogy to discredit my first one.

@SamBigAbs

That's like saying a guitar player knows more about how an amplifier works.

In fact that is the exact opposite of my original analogy which is that the artist can utilize the tool better than the creator. It does not imply that the artist has the ability to design or create the tool.

@RainehDaze

Your analogy was flawed, because I said how it works. The creator of something is more likely to know whether a certain setting actually does anything for a given setting than a user, even if the user is extremely good at it.

Anyway, I did more testing. It was the prompt causing humans to be generated where they really shouldn't be (like the entire half of an image that was only supposed to be scenery) and concatenating things when adjacent. Seriously, it was doing things like this:
image
or this:
image
When there was supposed to be only scenery to either side (and obviously nothing was describing those particular people). As I noted, it seems that a lot of tags that describe image quality are actually tied really strongly to generating people.

@pkuliyi2015
Owner

Thank you for experimenting with this. This is a classic noise pollution problem, where the foreground noise triggers undesirable multi-character changes in the background when your model is not that good at high-resolution image generation.

This can be partly mitigated by adding some negative prompts in the background regions. However, this may not solve the problem completely. I am considering a much more powerful merging strategy and a corresponding UI that lets you fuse images better.

You will definitely like it.

@RainehDaze

Thank you for making attempts on this. This is a classical noise pollution problem where the foreground noises triggered the undesirable multi-character change in the background, when your model is not that good for high resolution image generation.

This can be partly mitigate by adding some negative prompts in the background regions. However, this may not solve the problem totally. I am considering a much more powerful merging strategy and corresponding ui that lets you fuses images better.

you will definitely like it.

It wasn't too bad once there were no triggering tags in the general prompt (only 5 or 6 out of 100), and I got this out of it all with region control and the noise inversion:
00308-2862577431-masterpiece_best_quality_highres_extremely_detailed_8k_wallpaper_very_clear

But anything that would make for better image composition is great (only about 9 of the 100 had reasonable background coherency).

@Rorowalnuss

Hello, I am trying to use MultiDiffusion to place kemono characters in the background, but the checkpoint I am using requires Hires fix and a hypernetwork to be enabled by default; otherwise it generates humans.

The overall prompt only describes the camera and background and enables the hypernetwork. I entered character prompts for the foreground and no prompts for the background. The first few denoising steps generate kemono characters normally, but in the end Hires fix transforms the character into a human. I tried reducing the Hires fix denoising value, but that results in fewer, blurrier image details. Increasing the denoising makes the character look more human.

I don't know whether this is due to an incompatibility between Hires fix and MultiDiffusion or because the hypernetwork did not load properly.

grid-0182bf1af6144efcd030d691538214e17759d15a55e1

@pkuliyi2015
Owner

I have made a trial fix. Please switch to the dev branch and test it. If it works, please let me know.

@Rorowalnuss

Rorowalnuss commented Apr 9, 2023

It doesn't work well. The first image uses MultiDiffusion with Hires fix denoising = 0.7, while the second image does not use MultiDiffusion.
You can see that using MultiDiffusion generates completely different characters. The third image is a screenshot of the denoising process.

grid-0201
grid-0200
20230409105552

I tried turning off Hires fix when using MultiDiffusion in txt2img and moving the generated blurry image to img2img, but the background details did not increase. To be honest, the image only became higher-definition, whereas Hires fix can add things that were not in the original image.

01407-1454243371-(masterpiece_1 3), (2D_1 0), (anime_1 0), (illustration_1 0), (sharp_1 2),_(hard light_1 0), (shadow_1 0),(reflection, refractio
00035-3734885203-(masterpiece_1 3), (2D_1 0), (anime_1 0), (illustration_1 0), (sharp_1 2),_(hard light_1 0), (shadow_1 0),(reflection, refractio (1)

I also tried the other three models proposed by the checkpoint author, none of which requires a hypernetwork. However, two of these models also changed the character's appearance when both Hires fix and MultiDiffusion were enabled, while the other model generated kemono characters normally.
If you are interested, the model address is below.

https://civitai.com/models/11888?modelVersionId=32830

I have verified that the model that works normally with MultiDiffusion is crossfemono2.0, while the models that do not work normally are G, G2, F, and D.

@qiuchengzhi

00662-343669817-Masterpiece, best quality, highres, ultra-detailed 8k unity wallpaper, bird's-eye view, trees, ancient architectures, stones, fa
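Hello, when I use Along the River During the Qingming Festival (清明上河图) together with ControlNet to generate an ultra-long image, ControlNet does not seem to take effect. What could be the reason? Is it because the preprocessor resolution is too low?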

@zc61536337

I get "RuntimeError: Invalid buffer size: 6.89 GB". How can I solve it?

@zc61536337

With Tiled VAE, it displays the error 'min and input tensors must be of the same shape'.

@ShivaeAI

Regarding "4x-UltraSharp upscaler and put it in the ESRGAN folder": I didn't find the relevant folder.
dad

@PotatoBananaApple

PotatoBananaApple commented May 12, 2023

The folder can be found in ...\stable-diffusion-webui\models\ESRGAN
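
If it helps, here is a minimal Python sketch for checking that folder from a script. It assumes you pass in the root of your own stable-diffusion-webui install, and that upscaler weights are `.pth` files (an assumption here, not something stated in this thread). The function name `list_esrgan_models` is made up for illustration; it is not part of the webui or of this extension.

```python
from pathlib import Path


def list_esrgan_models(webui_root):
    """Return the names of upscaler weight files under <webui_root>/models/ESRGAN."""
    esrgan_dir = Path(webui_root) / "models" / "ESRGAN"
    # Create the folder if it is missing, so a downloaded model has somewhere to go.
    esrgan_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: upscaler weights are distributed as .pth files.
    return sorted(p.name for p in esrgan_dir.glob("*.pth"))
```

Calling it with the path to your install returns an empty list when no upscaler has been placed there yet, and the file names once you have copied one in.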

@laoraozi

laoraozi commented Jun 3, 2023

Could the author show, step by step in the interface, how to generate the realistic-style Qingming River painting? That would make it easier for me to understand Tiled Diffusion, region prompts, and Draw full canvas background. Thank you very much.

@laoraozi

laoraozi commented Jun 4, 2023

01957-1152809661-Masterpiece,-best-quality,-highres,-ultra-detailed-8k-unity-wallpaper,-bird's-eye-view,-trees,-ancient-architectures,-stones,-fa
As you can see, I don't know how the region prompts and the Draw full canvas background option you mentioned apply to this painting.

@halr9000

To those of you asking questions on a closed discussion, you need to take some lessons from an old master at the art of asking questions online.

@leopard-LSG

Is there a setting that works on a high-end 16-inch Intel model with 16 GB of RAM and an AMD Radeon Pro 5500M with 6 GB of VRAM?

Also, does it matter which Python and PyTorch versions are used? Currently, I cannot create the desired image size with Python 3.10.12 and PyTorch Nightly 2.1.0. If the R-ESRGAN 4x+ scale exceeds 1.7 at a size of 512, cmd exits with an MPS out-of-memory error.
I followed the settings given in the description, but it still fails.
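
For a sense of the numbers involved, here is a rough arithmetic sketch of what a scale of 1.7 on a 512 image means in pixels. It only computes output dimensions and how much the pixel count grows; it does not model actual MPS memory use. The function name is hypothetical, and the real webui may round dimensions differently (plain rounding is an assumption).

```python
def upscaled_size(width, height, scale):
    """Return (new_width, new_height, pixel_ratio) after scaling both sides by `scale`."""
    # Assumption: plain rounding to the nearest pixel.
    new_w = round(width * scale)
    new_h = round(height * scale)
    # Pixel count grows roughly with the square of the scale factor.
    ratio = (new_w * new_h) / (width * height)
    return new_w, new_h, ratio
```

For example, `upscaled_size(512, 512, 1.7)` gives an 870 x 870 image with nearly 2.9 times as many pixels as the 512 x 512 input, so memory pressure can rise quickly even for small increases in scale.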
