ariG23498
Collaborator

This PR ports Gemma 2 transformers checkpoints to KerasNLP.

github-actions bot added the "Gemma" (Gemma model specific issues) label on Jun 27, 2024
Member

mattdangerw left a comment

Thanks! This all looks good to me. Let's add a test.

ariG23498 marked this pull request as ready for review on July 4, 2024, 07:21
Collaborator Author

ariG23498 commented Jul 4, 2024

@mattdangerw @grasskin this PR is ready for review!

Note: For the time being, the KerasNLP Gemma 2 model works only on the JAX backend.

Thanks also to the Hugging Face team (Matt et al.) for providing me with compute to work on this model.

Comment on lines +113 to +122
    if transformers_config["model_type"] == "gemma":
        port_weight(
            keras_variable=decoder_layer.pre_ffw_norm.variables[0],
            hf_weight_key=f"model.layers.{i}.post_attention_layernorm.weight",
        )
    elif transformers_config["model_type"] == "gemma2":
        port_weight(
            keras_variable=decoder_layer.pre_ffw_norm.variables[0],
            hf_weight_key=f"model.layers.{i}.pre_feedforward_layernorm.weight",
        )
Collaborator Author

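This was done to align the Gemma 1 and Gemma 2 checkpoints: the Keras `pre_ffw_norm` weight is read from `post_attention_layernorm` in a Gemma 1 checkpoint and from `pre_feedforward_layernorm` in a Gemma 2 checkpoint.

I am open to better ways to handle this.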
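
One possible alternative to the `if`/`elif` branch, sketched here under assumptions rather than taken from the merged code: keep the per-model key templates in a lookup table and resolve the key once per layer. The names `PRE_FFW_NORM_KEYS` and `pre_ffw_norm_key` are hypothetical; only the two weight key strings and the `model_type` values come from the diff above.

```python
# Hypothetical lookup table (not part of the PR): maps the transformers
# `model_type` to the checkpoint key template that feeds Keras `pre_ffw_norm`.
PRE_FFW_NORM_KEYS = {
    "gemma": "model.layers.{i}.post_attention_layernorm.weight",
    "gemma2": "model.layers.{i}.pre_feedforward_layernorm.weight",
}


def pre_ffw_norm_key(model_type: str, layer_index: int) -> str:
    """Return the checkpoint key for the pre-feedforward norm of one layer."""
    try:
        template = PRE_FFW_NORM_KEYS[model_type]
    except KeyError:
        # Fail loudly on unknown model types instead of silently skipping.
        raise ValueError(f"Unsupported model_type: {model_type!r}") from None
    return template.format(i=layer_index)


if __name__ == "__main__":
    # Print the resolved key for the first decoder layer of each model type.
    for model_type in ("gemma", "gemma2"):
        print(model_type, pre_ffw_norm_key(model_type, 0))
```

With such a helper, the call site would reduce to a single `port_weight(...)` call whose `hf_weight_key` is `pre_ffw_norm_key(transformers_config["model_type"], i)`, and an unsupported `model_type` would raise instead of leaving the weight unported.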

ariG23498 added the "kokoro:force-run" (Runs Tests on GPU) label on Jul 4, 2024
kokoro-team removed the "kokoro:force-run" (Runs Tests on GPU) label on Jul 4, 2024
ariG23498 changed the title from "[WIP] Porting Gemma 2 transformers checkpoint" to "Porting Gemma 2 transformers checkpoint" on Jul 4, 2024
@mattdangerw
Member

Thanks!

mattdangerw merged commit a219e96 into keras-team:master on Jul 8, 2024