
feature: support TAESD - Tiny Autoencoder for Stable Diffusion #4316

Merged
22 commits merged from feat/taesd into main on Sep 20, 2023

Conversation

keturn
Contributor

@keturn keturn commented Aug 18, 2023

TAESD - Tiny Autoencoder for Stable Diffusion - is a tiny VAE that provides significantly better results than my single-multiplication hack but is still very fast.

The entire TAESD model weights are under 10 MB!

This PR requires diffusers 0.20:

To Do

Test with

Have you discussed this change with the InvokeAI team?

Have you updated all relevant documentation?

  • No

Related Tickets & Documents

  • Related Issue #
  • Closes #

QA Instructions, Screenshots, Recordings

Should be able to import these models:

and use them as VAE.

Added/updated tests?

  • Some. There are new tests for VaeFolderProbe based on VAE configurations, but no tests that require the full model weights.
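A config-based probe check of this kind can be illustrated with a small standalone sketch (hypothetical helper; the real tests exercise `VaeFolderProbe` itself, and the heuristic values are the ones used by the probe code discussed in this PR):

```python
# Hypothetical standalone version of the config-based base-type heuristic,
# for illustration only; the real tests run against VaeFolderProbe.
def base_from_vae_config(config: dict) -> str:
    # SDXL VAEs ship with scaling_factor 0.13025; SD 1.x uses 0.18215.
    if config.get("scaling_factor", 0) == 0.13025 and config.get("sample_size") in [512, 1024]:
        return "sdxl"
    return "sd-1"

assert base_from_vae_config({"scaling_factor": 0.13025, "sample_size": 1024}) == "sdxl"
assert base_from_vae_config({"scaling_factor": 0.18215, "sample_size": 512}) == "sd-1"
```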

@blessedcoolant
Collaborator

I was testing out the Tiny AutoEncoder last night. It definitely is a massive speed improvement, but the quality of the output is a little inferior to the full encoder, as expected. This is especially noticeable in realistic models, where detail-heavy areas like the eyes just do not translate well.

Will give this PR a run in a bit.

Maybe we should also consider adding TAESD progress previews?

@keturn
Contributor Author

keturn commented Aug 18, 2023

Yep, I don't expect it to be a finishing step for any production work, but it'll be useful for previews and, I think, also certain kinds of intermediate steps that go on to be further mashed or re-noised.

And I guess I could imagine a workflow where, if the normal VAE is slow enough for some people, they might elect to do a quick batch generation and then only use the full VAE on the ones they want to keep?

@blessedcoolant
Collaborator

blessedcoolant commented Aug 18, 2023

Yeah, definitely handy to have, especially if we can do progress previews with TAESD too. That would cut down generation times, because each step-callback decode will take considerably less time.

I'm actually half of a mind to ship the TAESD models as core models, since they're pretty small, and give the user an option next to VAE to check "Tiny". In that case we use the Tiny AutoEncoder; otherwise we use the regular one.

Unless we anticipate more Tiny AE models, in which case having it locked to just TAESD might be an issue later.

@keturn
Contributor Author

keturn commented Aug 19, 2023

I think the encoder is buggy, and I think the bug is upstream: huggingface/diffusers#4676

Update: just waiting on a diffusers release > 0.20 for the fix (diffusers 0.20.2 is out but seems to be a narrowly focused release that doesn't include this; I assume that means we're waiting for 0.21):

@psychedelicious
Collaborator

@singledispatchmethod - cool!

I agree - we should make TAESD a core model. No reason not to.

Then the only question is how to manage preview images with it? The step_callback has access to the InvocationContext and therefore has access to the model manager, so it could easily decode the image in there. I'm just concerned about the overhead of doing this up to 40 times per second.

Anyways, that's for a followup PR.
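If the per-step decode overhead does turn out to matter, one option is simply throttling how often the preview refreshes. A minimal sketch of that idea (hypothetical, not part of this PR; the class and parameter names are made up):

```python
import time

# Hypothetical throttle (not part of this PR): cap how often the TAESD
# preview decode runs, so a fast sampler can't trigger it 40x per second.
class PreviewThrottle:
    def __init__(self, min_interval_s=0.25):
        self.min_interval_s = min_interval_s
        self._last = float("-inf")  # allow the first decode immediately

    def should_decode(self, now=None):
        """Return True if enough time has passed to decode another preview."""
        now = time.monotonic() if now is None else now
        if now - self._last >= self.min_interval_s:
            self._last = now
            return True
        return False
```

The step callback would call `should_decode()` and skip the TAESD decode when it returns False, bounding the preview cost regardless of sampler speed.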

# Conflicts:
#	invokeai/app/invocations/latent.py
@Millu
Contributor

Millu commented Aug 28, 2023

@lstein @StAlKeR7779 could you take a look at this? would be awesome if we could get this in for 3.1

@blessedcoolant
Collaborator

If we are adding this, then @lstein will need to add the model support during installation.

@hipsterusername
Member

@Millu happy to review, but I think this still needs model support in the installer, right?

cc: @lstein

@Millu
Contributor

Millu commented Sep 12, 2023

@Millu happy to review, but I think this still needs model support in the installer, right?

cc: @lstein

We can merge this PR and include TAESD as a core model in a follow-up. My understanding is that this PR supports the use of TAESD in Invoke; if a user wants it, they'll have to install it manually.

@keturn
Contributor Author

keturn commented Sep 13, 2023

Doing some manual testing after upgrading to diffusers 0.21.

TAESD encoding and decoding looks to be working okay, but the normal VAE is failing with

[2023-09-13 09:52:08,298]::[InvokeAI]::ERROR --> Error while invoking:
new_LoRACompatibleConv_forward() takes 2 positional arguments but 3 were given
File "…/diffusers/models/resnet.py", line 637, in forward
hidden_states = self.conv1(hidden_states, scale)

new_LoRACompatibleConv_forward seems to be something from Invoke's hotfixes 🤢 and I don't know what a LoRA-related hotfix is doing in the middle of a vae.decode call stack.

@keturn
Contributor Author

keturn commented Sep 13, 2023

But I've just confirmed that the same failure occurs on main, so this branch's changes to invocations.latents aren't to blame.

@RyanJDick
Collaborator

Doing some manual testing after upgrading to diffusers 0.21.

TAESD encoding and decoding looks to be working okay, but the normal VAE is failing with

[2023-09-13 09:52:08,298]::[InvokeAI]::ERROR --> Error while invoking:
new_LoRACompatibleConv_forward() takes 2 positional arguments but 3 were given
File "…/diffusers/models/resnet.py", line 637, in forward
hidden_states = self.conv1(hidden_states, scale)

new_LoRACompatibleConv_forward seems to be something from Invoke's hotfixes 🤢 and I don't know what a LoRA-related hotfix is doing in the middle of a vae.decode call stack.

I think this should be fixed by: #4534

@RyanJDick
Collaborator

I just tested installing both taesd models via the 'Import Models' UI. It seems like the model probe is not correctly detecting the base model.

[screenshot: Model Manager showing the imported TAESD models with the wrong base model]

@keturn
Contributor Author

keturn commented Sep 20, 2023

It seems like the model probe is not correctly detecting the base model.

Oh, in that taesdxl is showing up under SD and not SDXL? hmm.

# Conflicts:
#	invokeai/backend/model_management/model_probe.py
@keturn
Contributor Author

keturn commented Sep 20, 2023

The code that determines whether a VAE's base model is SD or SDXL is here:

class VaeFolderProbe(FolderProbeBase):
    def get_base_type(self) -> BaseModelType:
        config_file = self.folder_path / "config.json"
        if not config_file.exists():
            raise InvalidModelException(f"Cannot determine base type for {self.folder_path}")
        with open(config_file, "r") as file:
            config = json.load(file)
        return (
            BaseModelType.StableDiffusionXL
            if config.get("scaling_factor", 0) == 0.13025 and config.get("sample_size") in [512, 1024]
            else BaseModelType.StableDiffusion1
        )

it checks certain values inside the VAE's config.json.

However, for TAESD, the config.json files for taesd and taesdxl are identical. Indeed, there's no reason for them to have any different parameters, as the shapes of the two VAEs are the same.

The only heuristic I can think of is to check to see if the model name literally ends in XL.

Or to define some other metadata field and ask @madebyollin to add it to the model's config.
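The name-based fallback might look something like this (hypothetical sketch, not in this PR; the function name and return labels are made up):

```python
# Hypothetical fallback: infer the base model from the folder name alone,
# since taesd and taesdxl ship identical config.json files.
def guess_taesd_base(folder_name: str) -> str:
    name = folder_name.rstrip("/").lower()
    return "sdxl" if name.endswith("xl") else "sd-1"
```

This is obviously fragile, since it breaks for any rename of the model folder, which is why a dedicated metadata field would be the more robust option.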

@hipsterusername hipsterusername merged commit b64ade5 into main Sep 20, 2023
8 checks passed
@hipsterusername hipsterusername deleted the feat/taesd branch September 20, 2023 21:23
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[enhancement]: Support TAESD to display better progress image
6 participants