[Feat] Support SDXL Kohya-style LoRA #4287

Merged
merged 99 commits into main from feat/sdxl-lora-1
Jul 28, 2023

Conversation

sayakpaul
Member

@sayakpaul sayakpaul commented Jul 26, 2023

Introduces support for loading SDXL Kohya-style LoRAs in diffusers through load_lora_weights().

Currently, it doesn't work. Opening this PR so that we can discuss it further (as discussed with @patrickvonplaten internally).

If we try to do:

from diffusers import DiffusionPipeline
import torch 

base_model_id = "stabilityai/stable-diffusion-xl-base-0.9"
pipeline = DiffusionPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16).to("cuda:0")
pipeline.load_lora_weights(".", weight_name="Kamepan.safetensors")

prompt = "anime screencap, glint, drawing, best quality, light smile, shy, a full body of a girl wearing wedding dress in the middle of the forest beneath the trees, fireflies, big eyes, 2d, cute, anime girl, waifu, cel shading, magical girl, vivid colors, (outline:1.1), manga anime artstyle, masterpiece, offical wallpaper, glint <lora:kame_sdxl_v2:1>"
negative_prompt = "(deformed, bad quality, sketch, depth of field, blurry:1.1), grainy, bad anatomy, bad perspective, old, ugly, realistic, cartoon, disney, bad propotions"
generator = torch.manual_seed(2947883060)
num_inference_steps = 30
guidance_scale = 7

image = pipeline(
    prompt=prompt, negative_prompt=negative_prompt, num_inference_steps=num_inference_steps,
    generator=generator, guidance_scale=guidance_scale
).images[0]
image.save("Kamepan.png")

(The checkpoint was downloaded from https://civitai.com/models/22279?modelVersionId=118556).

This will lead to the following problem:

Traceback (most recent call last):
  File "/home/sayak/test_sdxl_lora.py", line 6, in <module>
    pipeline.load_lora_weights(".", weight_name="Kamepan.safetensors")
  File "/home/sayak/diffusers/src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py", line 857, in load_lora_weights
    self.load_lora_into_unet(state_dict, network_alphas=network_alphas, unet=self.unet)
  File "/home/sayak/diffusers/src/diffusers/loaders.py", line 1064, in load_lora_into_unet
    unet.load_attn_procs(unet_lora_state_dict, network_alphas=network_alphas)
  File "/home/sayak/diffusers/src/diffusers/loaders.py", line 433, in load_attn_procs
    raise ValueError(
ValueError: None does not seem to be in the correct format expected by LoRA or Custom Diffusion training.

And rightfully so. Let's see why.

I investigated a bit, and indeed there are some unexpected entries here. If we do:

from safetensors.torch import load_file

sd_lora_state_dict = load_file("Kamepan.safetensors")

print("Just loaded.\n")
for k in sd_lora_state_dict:
    new_k = k.replace("lora_unet_", "")
    new_k = new_k.replace("lora_te_", "")
    if "lora" not in new_k:
        print(new_k)

It prints keys like these:

input_blocks_4_1_proj_in.alpha
input_blocks_4_1_proj_out.alpha
input_blocks_4_1_transformer_blocks_0_attn1_to_k.alpha
input_blocks_4_1_transformer_blocks_0_attn1_to_out_0.alpha
input_blocks_4_1_transformer_blocks_0_attn1_to_q.alpha
input_blocks_4_1_transformer_blocks_0_attn1_to_v.alpha
input_blocks_4_1_transformer_blocks_0_attn2_to_k.alpha
input_blocks_4_1_transformer_blocks_0_attn2_to_out_0.alpha
input_blocks_4_1_transformer_blocks_0_attn2_to_q.alpha
input_blocks_4_1_transformer_blocks_0_attn2_to_v.alpha
input_blocks_4_1_transformer_blocks_0_ff_net_0_proj.alpha
input_blocks_4_1_transformer_blocks_0_ff_net_2.alpha
input_blocks_4_1_transformer_blocks_1_attn1_to_k.alpha
input_blocks_4_1_transformer_blocks_1_attn1_to_out_0.alpha
input_blocks_4_1_transformer_blocks_1_attn1_to_q.alpha
input_blocks_4_1_transformer_blocks_1_attn1_to_v.alpha
input_blocks_4_1_transformer_blocks_1_attn2_to_k.alpha
input_blocks_4_1_transformer_blocks_1_attn2_to_out_0.alpha
input_blocks_4_1_transformer_blocks_1_attn2_to_q.alpha
input_blocks_4_1_transformer_blocks_1_attn2_to_v.alpha
input_blocks_4_1_transformer_blocks_1_ff_net_0_proj.alpha
input_blocks_4_1_transformer_blocks_1_ff_net_2.alpha
input_blocks_5_1_proj_in.alpha
...

We need to figure out how to deal with these new keys, since we currently assume the lora identifier is always present in the state dict that we finally populate into the diffusers modules:

is_lora = all("lora" in k for k in state_dict.keys())
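
A minimal sketch of one way to separate these alpha entries so the remaining keys satisfy the check above (this is just an illustration, not the PR's actual implementation; it assumes the same Kamepan.safetensors file):

from safetensors.torch import load_file

state_dict = load_file("Kamepan.safetensors")

# Pull the scalar ".alpha" entries out into their own mapping so that every
# remaining key contains the "lora" identifier and passes the is_lora check.
network_alphas = {k: v.item() for k, v in state_dict.items() if k.endswith(".alpha")}
lora_weights = {k: v for k, v in state_dict.items() if not k.endswith(".alpha")}

assert all("lora" in k for k in lora_weights)
print(f"{len(network_alphas)} alpha entries, {len(lora_weights)} LoRA weight entries")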

There are three checkpoints worth checking out:

All of these LoRAs have multiple network_alpha values for which I have added support in the PR.
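
For context on what these alpha values do: in Kohya-style checkpoints each module typically stores a scalar alpha, and that module's LoRA update is scaled by alpha / rank. A rough illustration reusing a key from the listing above (the exact key layout follows the usual Kohya convention and is assumed here):

from safetensors.torch import load_file

state_dict = load_file("Kamepan.safetensors")

module = "lora_unet_input_blocks_4_1_proj_in"
down = state_dict[f"{module}.lora_down.weight"]
alpha = state_dict[f"{module}.alpha"].item()

rank = down.shape[0]   # rank is the output dim of lora_down
scale = alpha / rank   # multiplies (lora_up @ lora_down) when applied to the base weight
print(module, "rank:", rank, "alpha:", alpha, "scale:", scale)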

A couple of things also need to be discussed:

  • How do we handle state dict munging in the _convert_kohya_lora_to_diffusers() method for the second text encoder if it's present? We probably shouldn't override this in the SDXL pipeline source code, to avoid code duplication. I suggest we make the necessary changes in _convert_kohya_lora_to_diffusers() of LoraLoaderMixin.
  • Support for hada and skip-connection LoRAs should be revisited in a future PR.

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Jul 26, 2023

The documentation is not available anymore as the PR was closed or merged.

@patrickvonplaten
Contributor

Related: #4286

@isidentical
Contributor

How do we handle state dict munging in the _convert_kohya_lora_to_diffusers() method for the second text encoder if it's present? We probably shouldn't override this in the SDXL pipeline source code to avoid duplication of code. I suggest we do the changes necessary in _convert_kohya_lora_to_diffusers() of LoraLoaderMixin.

I was also working on a parallel PR (sorry, didn't see this one 😄). For this, I concluded the easiest way was to simply group lora_te1_ into the same group as lora_te_ (so te_state_dict) and create a separate state dict for lora_te2_, which gets merged into the final text-encoder state dict with the correct prefix (te_state_dict adds the TEXT_ENCODER prefix, which is for TE1; I just modified the code to do the same for te2_state_dict with TEXT_ENCODER_2 and merged the final result).

The LoRA loading logic in the SDXL pipeline already handled the rest, so I agree this is the simplest way.
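
A rough sketch of the grouping described above (names are illustrative, not the actual diffusers implementation):

def group_kohya_sdxl_state_dict(state_dict):
    """Split a Kohya SDXL LoRA state dict by prefix: UNet, text encoder 1, text encoder 2."""
    unet_sd, te1_sd, te2_sd = {}, {}, {}
    for key, value in state_dict.items():
        if key.startswith("lora_unet_"):
            unet_sd[key] = value
        elif key.startswith(("lora_te_", "lora_te1_")):
            # lora_te1_ is folded into the same group as the plain lora_te_ prefix
            te1_sd[key] = value
        elif key.startswith("lora_te2_"):
            # later remapped with the TEXT_ENCODER_2 prefix and merged into the final dict
            te2_sd[key] = value
    return unet_sd, te1_sd, te2_sd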

@patrickvonplaten
Contributor

I'm currently testing with the official SD-XL 1.0 LoRA: https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/sd_xl_offset_example-lora_1.0.safetensors

I've added some general SGM<>Diffusers block-structure renaming, as already noted here.

Also refactored some code to make it easier to use/read (I think).

I'm using the following code snippet for testing:

from diffusers import DiffusionPipeline
import torch
pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16)
pipe.load_lora_weights("./sd_xl_offset_example-lora_1.0.safetensors")
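
A hypothetical continuation of the snippet above, just to confirm the offset LoRA loads and actually influences the output (prompt and seed are arbitrary):

pipe.to("cuda")
generator = torch.manual_seed(0)
image = pipe(
    "a photo of an astronaut riding a horse on the moon",
    num_inference_steps=30,
    generator=generator,
).images[0]
image.save("offset_lora_test.png")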

@sayakpaul
Member Author

@isidentical I will be offline for some time today. But let's maybe chat internally a bit and figure it out so that we can collaborate effectively and ship this. WDYT?

@patrickvonplaten thanks a lot for chiming in.

@isidentical
Contributor

I will be offline for some time today. But let's maybe chat internally a bit and figure it out so that we can collaborate effectively and ship this. WDYT?

Would love that! I sent you an internal message about the details for the chat.

@patrickvonplaten patrickvonplaten merged commit 4a4cdd6 into main Jul 28, 2023
10 checks passed
@patrickvonplaten patrickvonplaten deleted the feat/sdxl-lora-1 branch July 28, 2023 17:50
@BitPhinix

You guys rock!

sayakpaul added a commit that referenced this pull request Jul 28, 2023
* sdxl lora changes.

* better name replacement.

* better replacement.

* debugging

* debugging

* debugging

* debugging

* debugging

* remove print.

* print state dict keys.

* print

* distingisuih better

* debuggable.

* fxi: tyests

* fix: arg from training script.

* access from class.

* run style

* debug

* save intermediate

* some simplifications for SDXL LoRA

* styling

* unet config is not needed in diffusers format.

* fix: dynamic SGM block mapping for SDXL kohya loras (#4322)

* Use lora compatible layers for linear proj_in/proj_out (#4323)

* improve condition for using the sgm_diffusers mapping

* informative comment.

* load compatible keys and embedding layer maaping.

* Get SDXL 1.0 example lora to load

* simplify

* specif ranks and hidden sizes.

* better handling of k rank and hidden

* debug

* debug

* debug

* debug

* debug

* fix: alpha keys

* add check for handling LoRAAttnAddedKVProcessor

* sanity comment

* modifications for text encoder SDXL

* debugging

* debugging

* debugging

* debugging

* debugging

* debugging

* debugging

* debugging

* denugging

* debugging

* debugging

* debugging

* debugging

* debugging

* debugging

* debugging

* debugging

* debugging

* debugging

* debugging

* debugging

* debugging

* debugging

* debugging

* debugging

* debugging

* debugging

* up

* up

* up

* up

* up

* up

* unneeded comments.

* unneeded comments.

* kwargs for the other attention processors.

* kwargs for the other attention processors.

* debugging

* debugging

* debugging

* debugging

* improve

* debugging

* debugging

* more print

* Fix alphas

* debugging

* debugging

* debugging

* debugging

* debugging

* debugging

* clean up

* clean up.

* debugging

* fix: text

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Batuhan Taskaya <batuhan@python.org>
orpatashnik pushed a commit to orpatashnik/diffusers that referenced this pull request Aug 1, 2023
orpatashnik pushed a commit to orpatashnik/diffusers that referenced this pull request Aug 1, 2023
orpatashnik pushed a commit to orpatashnik/diffusers that referenced this pull request Aug 1, 2023
@LWprogramming

I'm a bit confused by the q, k, and v LoRA layer shapes here. It seems like you can manually set q_hidden_size and v_hidden_size (but not k_hidden_size?), but the code assumes that the resulting dimensions equal hidden_size anyway (e.g. here for the query, and here for the key and value), because you need to be able to add things directly onto attn.to_q/k/v, which isn't possible if the dimensions don't match.

You can show this by trying the following:

import torch
from diffusers.models.attention_processor import Attention, LoRAAttnProcessor

d_q = 6
d_attn = 16
n_heads = 2
d_head = d_attn // n_heads
rank = 4
attn = Attention(query_dim=d_q, heads=n_heads, dim_head=d_head) # so attn.to_q is linear from 6 to 16
lora_processor = LoRAAttnProcessor(hidden_size=d_attn, rank=rank, q_hidden_size=d_q)
# but q_hidden_size is 6, so lora_processor.to_q_lora is functionally 6 to 6 and you will fail to add them together

# Create some placeholder data
batch = 4
channels = d_q
height = 5
width = 5
hidden_states = torch.randn(batch, channels, height, width)
# Call the LoRA processor on the placeholder data
processed_data = lora_processor(attn, hidden_states) # crashes

You can fix the immediate error here by changing to_q_lora to a q_hidden_size -> hidden_size LoRALinearLayer, but then you run into similar problems with k and v. I might be missing some detail with the shapes, and given that it works for the specific SDXL network this looks like it's meant for, it probably happens to be OK, but I wanted to check anyway in case there is something either I or the code is missing.
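
For what it's worth, a pure-PyTorch illustration of the shape change being suggested here (this is not the diffusers code, just the shapes):

import torch
import torch.nn as nn

q_hidden_size, hidden_size, rank = 6, 16, 4

# The query LoRA should map q_hidden_size -> hidden_size so its output can be
# added onto attn.to_q(hidden_states), whose output dimension is hidden_size.
to_q_lora_down = nn.Linear(q_hidden_size, rank, bias=False)
to_q_lora_up = nn.Linear(rank, hidden_size, bias=False)

x = torch.randn(2, q_hidden_size)
base_q = nn.Linear(q_hidden_size, hidden_size, bias=False)
out = base_q(x) + to_q_lora_up(to_q_lora_down(x))  # shapes now line up: (2, hidden_size)
print(out.shape)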

@gonzalojaimovitch

gonzalojaimovitch commented Aug 23, 2023

Hello there! I am a little bit confused reading the documentation. Sorry in advance if this is mentioned anywhere, I just couldn't find it.

In the documentation about using LoRAs trained with Kohya for SDXL, this is the example provided:

from diffusers import DiffusionPipeline
import torch

base_model_id = "stabilityai/stable-diffusion-xl-base-0.9"
pipeline = DiffusionPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16).to("cuda")
pipeline.load_lora_weights(".", weight_name="Kamepan.safetensors")

prompt = "anime screencap, glint, drawing, best quality, light smile, shy, a full body of a girl wearing wedding dress in the middle of the forest beneath the trees, fireflies, big eyes, 2d, cute, anime girl, waifu, cel shading, magical girl, vivid colors, (outline:1.1), manga anime artstyle, masterpiece, offical wallpaper, glint <lora:kame_sdxl_v2:1>"
negative_prompt = "(deformed, bad quality, sketch, depth of field, blurry:1.1), grainy, bad anatomy, bad perspective, old, ugly, realistic, cartoon, disney, bad propotions"
generator = torch.manual_seed(2947883060)
num_inference_steps = 30
guidance_scale = 7

image = pipeline(
    prompt=prompt, negative_prompt=negative_prompt, num_inference_steps=num_inference_steps,
    generator=generator, guidance_scale=guidance_scale
).images[0]
image.save("Kamepan.png")

As you can see, the prompt uses the Automatic1111-style weighting syntax. However, to my understanding, the official Diffusers documentation on prompt weighting says this must be done with the Compel library. A similar thing happens with the LoRA mention at the end of the prompt, "lora:kame_sdxl_v2:1"; I understand this is not how it is done with Diffusers.

Could you please confirm whether or not this prompt is correct for Diffusers?

It would be great to have a part of the documentation where the differences between the A1111 and Diffusers syntaxes are explained. I see some confusion around this topic, and such a section would help avoid performing operations such as prompt weighting in a way that has no real effect when using Diffusers rather than A1111.

@sayakpaul
Member Author

We copy-pasted the prompt from Civitai. The weighting bits in the prompt don't influence the generation quality in diffusers.

@gonzalojaimovitch

Thank you very much, @sayakpaul! Does that mean that diffusers also understands this type of weighting?

@sayakpaul
Member Author

It doesn't. We don't support prompt weighting as a part of the library.
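
For readers who do want A1111-style emphasis with diffusers, the docs point to the external Compel library. A rough sketch for an SDXL pipeline, based on Compel's documented SDXL usage (exact argument names may differ across versions):

import torch
from compel import Compel, ReturnedEmbeddingsType
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

compel = Compel(
    tokenizer=[pipe.tokenizer, pipe.tokenizer_2],
    text_encoder=[pipe.text_encoder, pipe.text_encoder_2],
    returned_embeddings_type=ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NON_NORMALIZED,
    requires_pooled=[False, True],
)

# "++" upweights a token in Compel's syntax (roughly analogous to A1111's parentheses)
conditioning, pooled = compel("a forest, fireflies++, anime style")
image = pipe(
    prompt_embeds=conditioning, pooled_prompt_embeds=pooled, num_inference_steps=30
).images[0]
image.save("compel_weighted.png")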

@gonzalojaimovitch

Thanks a lot. That was really helpful!

@linnanwang

@patrickvonplaten Hey, thanks for the great work. I found that many new Civitai LoRA models based on SDXL 1.0 are not supported in diffusers, and it would be much appreciated if you could make diffusers compatible with them. Thanks.

@sayakpaul
Member Author

Feel free to post links to the LoRAs that are not compatible, along with fully reproducible code snippets. At the moment, we natively support Kohya-style LoRAs.

@linnanwang

Ahh I see, thanks for the quick reply.
Let's take this one as an example:
https://civitai.com/models/113488/library-bookshelf

Many of these LoRAs fail with a "ValueError: Checkpoint not supported" error in the loaders.

@sayakpaul
Member Author

Then that might be the reason. Kohya is one of the most popular libraries to have provided support for LoRA, so we prioritized that.

The formats used by the different LoRA trainers are quite scattered at the moment, and it's very hard to centralize that effort.

@sayakpaul
Member Author

Also in the linked LoRA, I see the following:

(screenshot of the Civitai model page showing the listed base model)

The base model doesn't seem to be SDXL here IIUC.

@linnanwang

@sayakpaul Ah yeah, you're correct. After paying more attention to these settings, it already works with a few LoRA models on Civitai. Thanks for the great work and the detailed explanations!

@linnanwang

@sayakpaul another quick question: I saw Civitai has some amazing checkpoints (yes, SDXL checkpoints), and I'm not sure whether there is a conversion tool that makes those Civitai checkpoints compatible with diffusers? Thanks.
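
One option that exists in diffusers for single-file (A1111/Civitai-style) checkpoints, not an answer given in this thread and version-dependent, is from_single_file. A rough sketch:

import torch
from diffusers import StableDiffusionXLPipeline

# Load a single-file SDXL checkpoint directly into a diffusers pipeline.
pipe = StableDiffusionXLPipeline.from_single_file(
    "path/to/downloaded_sdxl_checkpoint.safetensors", torch_dtype=torch.float16
).to("cuda")

# Optionally save it in the diffusers multi-folder format for reuse.
pipe.save_pretrained("converted-sdxl-checkpoint")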

@MaxTran96 MaxTran96 mentioned this pull request Aug 25, 2023
yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this pull request Dec 25, 2023
AmericanPresidentJimmyCarter pushed a commit to AmericanPresidentJimmyCarter/diffusers that referenced this pull request Apr 26, 2024