Description
What API design would you like to have changed or added to the library? Why?
Hey there!
I'm not sure if this is the right section to post this, but I have a request/question for a write-up on the inference configuration used by diffusers, similar to a config.yaml in other model repos.
Recently I have been digging quite a bit into diffusers and comparing its outputs with those of other Stable Diffusion implementations (see post here).
I've noticed that there are quite noticeable differences (both in output and in code) between diffusers and the original stable-diffusion inference configuration https://github.com/CompVis/stable-diffusion/blob/main/configs/stable-diffusion/v1-inference.yaml as implemented in both Automatic1111 and SD-GUI, which give the same results as one another, while diffusers is an outlier.
I dug deeper into the model architecture in diffusers and noticed there are a few differences in the default values set for each block, for just about all steps of the Stable Diffusion process. However, my knowledge of the architecture itself isn't as good as I'd like it to be, so I'm mostly reading the original Stable Diffusion architecture and trying to match it against diffusers. That said, I did try to match the settings as best I could in order to get a one-to-one result. Modifying the parameters of the VAE encoder seems to have quite an effect on what image gets output, and it's led me to believe there must be a fundamental difference between the inference model and the base Stable Diffusion model.
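To make the comparison described above systematic, a small helper that diffs two flat config dicts can flag every default that differs between implementations. This is just a sketch; the example values below are hypothetical placeholders, not the actual settings from either diffusers or the CompVis repo.

```python
# Minimal sketch: diff two flat config dicts to spot mismatched defaults.
# The example values are hypothetical placeholders, NOT real settings
# from either diffusers or the CompVis repo.

def diff_configs(a: dict, b: dict) -> dict:
    """Return {key: (a_value, b_value)} for every key whose value differs;
    a key missing on one side is reported as None for that side."""
    out = {}
    for key in sorted(set(a) | set(b)):
        va, vb = a.get(key), b.get(key)
        if va != vb:
            out[key] = (va, vb)
    return out

compvis_cfg = {"scale_factor": 0.18215, "ddim_steps": 50, "channels": 4}
diffusers_cfg = {"scale_factor": 0.18215, "ddim_steps": 50, "channels": 4,
                 "clip_sample": False}

print(diff_configs(compvis_cfg, diffusers_cfg))
# {'clip_sample': (None, False)}
```

Running something like this over the block-level defaults of each sub-module (VAE, UNet, scheduler) would make the divergences easy to enumerate rather than spotting them by eye.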
I did find this script https://github.com/huggingface/diffusers/blob/main/scripts/convert_original_stable_diffusion_to_diffusers.py and ran through the procedure, but I wasn't entirely sure what it actually does or how its variables map onto the models used in diffusers; I do see that some defaults differ. I figure @patil-suraj may have a bit more info on the architecture within diffusers and how it differs from the original stable-diffusion repo.
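For reference, this is roughly how I invoked the script. The paths are placeholders, and the flag names are the ones I saw in the script's argparse setup at the time; they may change between versions, so check `--help` for the current interface.

```shell
# Placeholder paths; verify flag names with --help before running.
python scripts/convert_original_stable_diffusion_to_diffusers.py \
  --checkpoint_path ./sd-v1-4.ckpt \
  --original_config_file ./v1-inference.yaml \
  --dump_path ./sd-v1-4-diffusers
```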
I've noticed the largest differences seem to be in the img2img pipelines, where I believe the output is not as crisp and sharp as that of the base stable-diffusion repo, and I feel this is something that should probably be solved one way or another.
Would love to know if a config file, or a write-up on the usage of the conversion scripts in the /scripts directory, would be possible!