Basic Llama2 Tuning #39

Merged: 10 commits merged into main from basic-training, May 15, 2024
Conversation

@tengomucho (Collaborator) commented May 13, 2024

What does this PR do?

Add support for Llama model tuning on TPU v5e, with a related example.
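
Below is a minimal, hedged sketch of what a Llama fine-tuning run with the transformers Trainer can look like. It is not the example shipped with this PR; the checkpoint, dataset, and hyper-parameters are placeholder assumptions.

```python
# Illustrative sketch only: NOT the example added by this PR.
# Model, dataset and hyper-parameters below are placeholder assumptions.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint; any Llama2 model works
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers define no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_id)

# Small public dataset used purely as a stand-in for real training data.
dataset = load_dataset("Abirate/english_quotes", split="train")
dataset = dataset.map(
    lambda batch: tokenizer(batch["quote"], truncation=True, max_length=512),
    batched=True,
    remove_columns=dataset.column_names,
)

args = TrainingArguments(
    output_dir="llama2-tpu-tuning",
    per_device_train_batch_size=8,
    num_train_epochs=1,
    optim="adafactor",  # smaller optimizer state, helpful on memory-constrained devices
    logging_steps=10,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)
trainer.train()
```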

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

This will then be tweaked for TPU/XLA.
Original transformers version is 4.40.0, commit 745bbfe.
This should reduce memory consumption with little performance loss.
Imported as-is, from version 4.39.3.
For RowParallelLinear and ColumnParallelLinear, use Linear instead of the
dedicated classes, to avoid issues with the backward step (see the sketch after these notes).
For now it does not seem to work; the issue seems to be related to the shapes
of key states that are not compatible with the attention calculation after
the update. We can investigate the reason and propose a better solution
later.
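
As a rough illustration of the substitution described in that commit note, here is a short sketch; the build_linear helper name is hypothetical and not code from this PR.

```python
import torch.nn as nn

def build_linear(in_features: int, out_features: int, bias: bool = False) -> nn.Module:
    # Instead of the tensor-parallel RowParallelLinear / ColumnParallelLinear
    # wrappers, fall back to a plain nn.Linear so the backward step does not
    # hit the issues mentioned above. Sharding can still be applied to the
    # plain layer's weights afterwards (e.g. via XLA SPMD mark_sharding).
    return nn.Linear(in_features, out_features, bias=bias)
```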
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@tengomucho tengomucho marked this pull request as ready for review May 13, 2024 14:14
@alanwaketan

Is there any reason why you chose to use 2D sharding instead of FSDPv2? The latter is integrated into the transformers Trainer: https://huggingface.co/blog/gemma-peft#accelerate-with-fsdp-via-spmd-on-tpu
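
For reference, a hedged sketch of what enabling FSDPv2 (SPMD-based FSDP) through the transformers Trainer looks like, following the linked blog post; the config keys are taken from that post and should be double-checked against the transformers and torch_xla versions in use.

```python
# Sketch based on the linked gemma-peft blog post, adapted for Llama.
# Treat the exact keys and API calls as assumptions to verify.
import torch_xla.runtime as xr
from transformers import TrainingArguments

xr.use_spmd()  # FSDPv2 relies on torch_xla's SPMD mode

fsdp_config = {
    "fsdp_transformer_layer_cls_to_wrap": ["LlamaDecoderLayer"],  # wrap each decoder block
    "xla": True,
    "xla_fsdp_v2": True,
    "xla_fsdp_grad_ckpt": True,
}

training_args = TrainingArguments(
    output_dir="llama2-fsdpv2",
    per_device_train_batch_size=8,
    fsdp="full_shard",
    fsdp_config=fsdp_config,
)
# training_args can then be passed to a regular Trainer as usual.
```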

@tengomucho (Collaborator, Author)

@alanwaketan yes, the reason is that it did not work out of the box: apparently the trainer was trying to use an XLA API that was experimental and is no longer available, so the code was raising an exception. I guess we can also fix FSDP in transformers and compare the two to see which one performs better.

@alanwaketan

> @alanwaketan yes, the reason is that it did not work out of the box: apparently the trainer was trying to use an XLA API that was experimental and is no longer available, so the code was raising an exception. I guess we can also fix FSDP in transformers and compare the two to see which one performs better.

Can you point me to the error? All the APIs the trainer uses should be available in 2.3.

@alanwaketan

FSDPv2's intent is to relieve regular users of the need for all these complicated shardings, unless you know you need 2D sharding instead of 1D.

@tengomucho (Collaborator, Author)

@alanwaketan sure, it makes total sense. I will get back to you with the error.

@tengomucho (Collaborator, Author)

@alanwaketan Ah, by the way, I was still using torch_xla 2.2; we haven't moved to 2.3 yet. I will do that so I can re-test FSDP.

@tengomucho (Collaborator, Author)

@alanwaketan you can check the errors I see with FSDP here.

This part uses the multi-host environment, tested on a v5e litepod-16.
@tengomucho (Collaborator, Author)

I raised an issue on the accelerate project to fix the FSDP/dataloader issue: huggingface/accelerate#2775

@alanwaketan

> @alanwaketan Ah, by the way, I was still using torch_xla 2.2; we haven't moved to 2.3 yet. I will do that so I can re-test FSDP.

2.2 is expected not to work. The 2.3 issue seems more generic than FSDPv2.

@mfuntowicz (Member) left a comment

LGTM! 🔥

@tengomucho tengomucho merged commit 8a70783 into main May 15, 2024
2 checks passed
@tengomucho tengomucho deleted the basic-training branch May 15, 2024 21:42