
@abheesht17 (Collaborator) commented Feb 20, 2025

Resolves #2104

@abheesht17 changed the title to "Add `query_proj`, `value_proj` to target names for `enable_lora`" on Feb 20, 2025

```
"value",
"query_proj",
"value_proj",
]
```
Member


This approach works too, but I was thinking of adding a `get_lora_target_names()` method to this base class that returns `["query_dense", "value_dense", "query", "value"]`. Any model whose LoRA-able layers differ from these, such as PaliGemma with `"query_proj"` and `"value_proj"`, can override `get_lora_target_names()` and return its own layers. This would be more scalable and flexible, and it keeps this list from growing. WDYT?
PS: I said we don't need it as an argument because it's usually a constant list for a given model and users are unlikely to need to change it, so hard-coding it should be fine and leaves us with a simpler API.
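The override pattern proposed above can be sketched in plain Python. This is a minimal sketch, not the keras-hub code: `BaseBackbone` and `PaliGemmaLikeBackbone` are hypothetical stand-ins, and only the method name `get_lora_target_names` and the target names come from this thread.

```python
class BaseBackbone:
    """Hypothetical stand-in for the shared backbone base class."""

    def get_lora_target_names(self):
        # Default LoRA targets for most models. A new list is built on every
        # call, so subclasses can extend the result without mutating shared state.
        return ["query_dense", "value_dense", "query", "value"]


class PaliGemmaLikeBackbone(BaseBackbone):
    """Hypothetical subclass whose vision attention layers use other names."""

    def get_lora_target_names(self):
        # Start from the defaults and append the model-specific names.
        target_names = super().get_lora_target_names()
        target_names += ["query_proj", "value_proj"]
        return target_names
```

Because each model's list is a fixed constant, no constructor argument is needed, which matches the simpler API suggested in the PS.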

@abheesht17 (Collaborator, Author), Feb 20, 2025

Ah, I guess I misconstrued what you were saying during the call. Pushing the change. Thanks!

@SamanehSaadat (Member) left a comment

Thanks!

@abheesht17 merged commit ebc56b4 into keras-team:master on Feb 20, 2025 (7 checks passed)
@mattdangerw (Member) left a comment

Small comments!

```
"""
return ["query_dense", "value_dense", "query", "value"]

def enable_lora(self, rank):
```
@mattdangerw (Member), Feb 21, 2025

We could consider exposing this, e.g. as `enable_lora(rank, target_names=None)`, where `None` falls back to `backbone.get_default_lora_targets()` or something like that. That would allow some customization.
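A minimal sketch of the optional `target_names` idea, assuming layers are matched by exact name. `FakeLayer` and `SketchBackbone` are hypothetical, and the sketch reuses `get_lora_target_names()` as the default where the comment floats `get_default_lora_targets()`; neither name is asserted to be the final keras-hub API.

```python
class FakeLayer:
    """Hypothetical layer that only records the LoRA rank it was given."""

    def __init__(self, name):
        self.name = name
        self.lora_rank = None

    def enable_lora(self, rank):
        self.lora_rank = rank


class SketchBackbone:
    """Hypothetical backbone with an optional `target_names` override."""

    def __init__(self, layer_names):
        self.layers = [FakeLayer(name) for name in layer_names]

    def get_lora_target_names(self):
        # Model-level defaults, used when the caller passes nothing.
        return ["query_dense", "value_dense", "query", "value"]

    def enable_lora(self, rank, target_names=None):
        # Fall back to the defaults only when no explicit list is given.
        if target_names is None:
            target_names = self.get_lora_target_names()
        enabled = []
        for layer in self.layers:
            # Assumption: exact-name matching against the target list.
            if layer.name in target_names:
                layer.enable_lora(rank)
                enabled.append(layer.name)
        return enabled


backbone = SketchBackbone(["query", "key", "value", "ffw"])
print(backbone.enable_lora(rank=4))  # -> ['query', 'value']
print(backbone.enable_lora(rank=2, target_names=["key"]))  # -> ['key']
```

Using `None` as the sentinel keeps the common call unchanged while still letting a user target other layers.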

```
target_names = super().get_lora_target_names()

# Add these for `PaliGemmaVITAttention`.
target_names += ["query_proj", "value_proj"]
```
Member


Can we just align these names without breaking checkpoint compatibility? It's fine to keep the overridable method, but keeping the same naming convention here would save us headaches (and improve the overall UX). There's no reason for the divergence, right?
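One possible way to rename layers while still loading older checkpoints is to remap weight keys at load time. The sketch below is hypothetical: it assumes a flat `{path: value}` weights dict with slash-separated paths, and the aligned names in the mapping are placeholders, not a decided convention.

```python
# Hypothetical mapping from legacy sublayer names to an aligned convention.
LEGACY_TO_ALIGNED = {"query_proj": "query", "value_proj": "value"}


def remap_weight_keys(weights, mapping=None):
    """Return a copy of `weights` with legacy path segments renamed.

    Only whole path segments are replaced, so a segment such as
    "query_proj_extra" is left untouched.
    """
    if mapping is None:
        mapping = LEGACY_TO_ALIGNED
    remapped = {}
    for path, value in weights.items():
        segments = [mapping.get(seg, seg) for seg in path.split("/")]
        remapped["/".join(segments)] = value
    return remapped


legacy = {"vit/attn/query_proj/kernel": 1, "vit/attn/out_proj/kernel": 2}
print(remap_weight_keys(legacy))
# -> {'vit/attn/query/kernel': 1, 'vit/attn/out_proj/kernel': 2}
```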

Development: successfully merging this pull request may close the issue "Improve LoRA Compatibility by Renaming value_proj and query_proj in PaliGemmaVitAttention".
4 participants