Is there a reference for the model/architecture used by diffusers anywhere? It doesn't seem to match the original stable-diffusion repo #901

@tonetechnician

What API design would you like to have changed or added to the library? Why?

Hey there!

I'm not sure if this is the right section to post this, but I have a request/question: could there be a write-up on the inference configuration used by diffusers, similar to the config.yaml in other model repos?

Recently I have been digging into diffusers quite a bit and comparing its outputs with those of other stable diffusion implementations (see post here).

I've noticed that there are quite noticeable differences (both in output and code) between diffusers and the regular stable-diffusion inference model (https://github.com/CompVis/stable-diffusion/blob/main/configs/stable-diffusion/v1-inference.yaml) as implemented in both Automatic1111 and SD-GUI. Those two give the same results as one another, but diffusers is an outlier.
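One place where the two codebases use different names for what appears to be the same thing is the noise schedule. As far as can be told from the two repos, the `linear` schedule in the original code (driven by `linear_start` and `linear_end` in the YAML) interpolates between square roots and then squares the result, which diffusers calls `scaled_linear`. The sketch below is plain Python under that assumption; the helper names are hypothetical, and the numeric values are the ones recalled from v1-inference.yaml, so check them against the actual file.

```python
# Sketch: "scaled linear" vs. plain linear beta schedules, stdlib only.
# Assumption: linear_start=0.00085, linear_end=0.012, timesteps=1000,
# as recalled from v1-inference.yaml. Verify before relying on them.

def linspace(start, stop, n):
    """Return n evenly spaced values from start to stop, inclusive."""
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]


def scaled_linear_betas(beta_start, beta_end, n):
    """Interpolate linearly in sqrt(beta) space, then square each value."""
    return [b ** 2 for b in linspace(beta_start ** 0.5, beta_end ** 0.5, n)]


def plain_linear_betas(beta_start, beta_end, n):
    """Interpolate linearly in beta space directly."""
    return linspace(beta_start, beta_end, n)


scaled = scaled_linear_betas(0.00085, 0.012, 1000)
plain = plain_linear_betas(0.00085, 0.012, 1000)

# Both schedules share endpoints but differ in between, so picking the
# wrong one while matching configs by name would change the outputs.
print("first:", scaled[0], plain[0])
print("middle:", scaled[500], plain[500])
print("last:", scaled[-1], plain[-1])
```

If this reading is right, a scheduler configured with a plain linear schedule would not reproduce the original results even with identical start and end values.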

I dug deeper into the model architecture in diffusers and noticed a few differences in the default values set for each block, across just about all steps of the stable diffusion process. However, my knowledge of the architecture itself isn't as good as I'd like it to be, so I'm mostly reading the stable diffusion architecture and trying to match it with diffusers. That being said, I did try to match the settings as best I could to get a one-to-one result. Modifying the parameters given to the VAE encoder seems to have quite an effect on the image that gets output, which has led me to believe there must be a fundamental difference between the inference model and the base stable diffusion model.
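A minimal sketch of the kind of default-by-default comparison described above, using only the standard library. The function names and the two example dictionaries are hypothetical and the values in them are made up for illustration; they are not the real diffusers or stable-diffusion defaults.

```python
# Sketch: report every key whose value differs between two nested configs.
# All names and example values here are illustrative assumptions.

MISSING = "<missing>"


def flatten(config, prefix=""):
    """Flatten nested dicts into {"a.b.c": value} form."""
    flat = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_configs(left, right):
    """Return {path: (left_value, right_value)} for every differing path."""
    flat_left, flat_right = flatten(left), flatten(right)
    diffs = {}
    for path in sorted(set(flat_left) | set(flat_right)):
        left_value = flat_left.get(path, MISSING)
        right_value = flat_right.get(path, MISSING)
        if left_value != right_value:
            diffs[path] = (left_value, right_value)
    return diffs


# Hypothetical configs standing in for two sets of component settings.
config_a = {
    "vae": {"latent_channels": 4, "sample_size": 256},
    "scheduler": {"beta_schedule": "scaled_linear"},
}
config_b = {
    "vae": {"latent_channels": 4, "sample_size": 512},
    "scheduler": {"beta_schedule": "scaled_linear", "skip_prk_steps": True},
}

differences = diff_configs(config_a, config_b)
for path, (a_value, b_value) in differences.items():
    print(f"{path}: {a_value!r} != {b_value!r}")
```

In practice the two dictionaries could be loaded with `json.load` from the per-component config files of two saved pipelines, assuming both are in the diffusers layout. Comparing against the original YAML would additionally need a key-name mapping, since the two repos name their parameters differently.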

I did find this script, https://github.com/huggingface/diffusers/blob/main/scripts/convert_original_stable_diffusion_to_diffusers.py, and ran through the procedure, but I wasn't entirely sure what it actually does or how its variables map onto the models used in diffusers. I do see that some defaults differ. I figure @patil-suraj may have a bit more info on the architecture within diffusers and how it differs from the original stable-diffusion repo.
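To make the question about the script's variables more concrete, here is a simplified sketch of how the UNet section of the original YAML seems to translate into diffusers-style fields. It is an illustration of the idea, not the conversion script itself: the function name is hypothetical, the input values are recalled from v1-inference.yaml and should be verified, and the real script handles more fields than shown.

```python
# Sketch: map CompVis-style UNet parameters to diffusers-style config fields.
# Assumption: input values as recalled from v1-inference.yaml.

original_unet_params = {
    "in_channels": 4,
    "out_channels": 4,
    "model_channels": 320,
    "attention_resolutions": [4, 2, 1],
    "num_res_blocks": 2,
    "channel_mult": [1, 2, 4, 4],
    "num_heads": 8,
    "context_dim": 768,
}


def to_diffusers_style_unet_config(params):
    """Hypothetical helper; a simplified version of the mapping."""
    # Channel widths per stage: base width times each multiplier.
    block_out_channels = [
        params["model_channels"] * mult for mult in params["channel_mult"]
    ]

    # A stage gets cross-attention if its downsampling factor is listed
    # in attention_resolutions; the factor doubles after every stage
    # except the last one.
    down_block_types = []
    resolution = 1
    for index in range(len(block_out_channels)):
        if resolution in params["attention_resolutions"]:
            down_block_types.append("CrossAttnDownBlock2D")
        else:
            down_block_types.append("DownBlock2D")
        if index != len(block_out_channels) - 1:
            resolution *= 2

    return {
        "in_channels": params["in_channels"],
        "out_channels": params["out_channels"],
        "block_out_channels": block_out_channels,
        "down_block_types": down_block_types,
        "layers_per_block": params["num_res_blocks"],
        "cross_attention_dim": params["context_dim"],
        "attention_head_dim": params["num_heads"],
    }


unet_config = to_diffusers_style_unet_config(original_unet_params)
for key, value in unet_config.items():
    print(f"{key}: {value}")
```

Seen this way, several apparent mismatches in "defaults" may just be the same quantity expressed under a different name (for example `channel_mult` times `model_channels` versus an explicit list of widths), which is part of why a written reference would help.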

I've noticed the largest differences seem to be in the img2img pipelines, where I believe the output is not as crisp and sharp as with the base stable-diffusion library. I feel this is something that should probably be solved one way or another.

Would love to know if a config file, or a write-up on the usage of the conversion scripts in the /scripts directory, would be possible!

Labels: stale (issues that haven't received updates)