Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Phi-3 #30423

Merged
merged 18 commits into from
Apr 24, 2024
Merged

Phi-3 #30423

merged 18 commits into from
Apr 24, 2024

Conversation

gugarosa
Copy link
Contributor

@gugarosa gugarosa commented Apr 23, 2024

What does this PR do?

Integrates Phi-3 within transformers.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@gugarosa
Copy link
Contributor Author

gugarosa commented Apr 23, 2024

I bolded some questions that I had from the previous review. Could you please check them @ArthurZucker?

Done:

  • Standardize how Phi3LongScaledRotaryEmbedding is validated/initialized. Uses the same approach defined by Llama.
  • swiglu from flash-attention is required for now because we will need to ablate some alternative options. Will add a patch PR later on.
  • Uses copies for the unit tests in test_modeling_phi3.py.
  • Use # Ignore copy on Phi3ForCausalLM.forward.
  • Apply .float() on the apply_rotary_pos_emb.
  • Integration tests are added.

Review:

  • DbrxAttention has an extra clipping on the QKV, which prevents the copy. Or is there a way to ignore certain lines?
  • Use old get_usable_length - did not catch the reference, checked Llama and Mistral files, and they are using it as well.
  • Phi3Model uses an extra dropout - no sure how I should proceed with the copy. Can we ignore lines?

@CoderCowMoo
Copy link

I think documentation tests are fine, as they are only happening because they ran when phi-3-4k-instruct didn't exist. Waiting on approval for pull request

@gugarosa gugarosa marked this pull request as ready for review April 23, 2024 15:21
@ArthurZucker
Copy link
Collaborator

ArthurZucker commented Apr 23, 2024

Hey! As I have mentioned offline I think that converting to llama format will allow anyone to use your model without any release (for any other frameworks), and we can add the new scalings ( SuScaledRotaryEmbedding, YarnScaledRotaryEmbedding) separately!

This would unlock the entire community! 🤗

EDIT: this seems impossible as other frameworks are adapting to this format instead. We'll support the fused weights let's make the weights standardized for the good of the entire community! 🚀 🔥

@gugarosa
Copy link
Contributor Author

As mentioned in Slack, other team already sent a PR with the previous Phi3LongScaledRotaryEmbedding structure 😭

@ArthurZucker
Copy link
Collaborator

But that should not impact the weights no?

@gugarosa
Copy link
Contributor Author

No, let me see what I can do!

@gugarosa
Copy link
Contributor Author

gugarosa commented Apr 23, 2024

Updated the rotary embedding classes. Fingers crossed that all tests will pass!

Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, let's pay attention to ROPE and make it as simple and close to what our users are used to!

docs/source/en/model_doc/phi3.md Outdated Show resolved Hide resolved
docs/source/en/model_doc/phi3.md Show resolved Hide resolved
src/transformers/models/phi3/modeling_phi3.py Outdated Show resolved Hide resolved
src/transformers/models/phi3/modeling_phi3.py Outdated Show resolved Hide resolved
src/transformers/models/phi3/modeling_phi3.py Outdated Show resolved Hide resolved
src/transformers/models/phi3/modeling_phi3.py Outdated Show resolved Hide resolved
tests/models/phi3/test_modeling_phi3.py Show resolved Hide resolved
@fakerybakery
Copy link

EDIT: this seems impossible as other frameworks are adapting to this format instead. We'll support the fused weights let's make the weights standardized for the good of the entire community! 🚀 🔥

So no chance we could get a Llama-converted version?

@gugarosa
Copy link
Contributor Author

Overall LGTM, let's pay attention to ROPE and make it as simple and close to what our users are used to!

I think RoPE should be clearer now and closer to what transformers proposes. What do you think?

For now, I would like to keep the short_factor and long_factor on the config.json because I am still waiting for some replies on how they were created.

Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥 very clean, left 2 small nits about using the config instead of passing kwargs for ROPE, and no one liners. THat's it

src/transformers/models/phi3/modeling_phi3.py Show resolved Hide resolved
src/transformers/models/phi3/modeling_phi3.py Outdated Show resolved Hide resolved
src/transformers/models/phi3/modeling_phi3.py Outdated Show resolved Hide resolved
src/transformers/models/phi3/modeling_phi3.py Outdated Show resolved Hide resolved
@ArthurZucker
Copy link
Collaborator

Solving the conflicts: accept all changes from main, ignore current

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@gugarosa
Copy link
Contributor Author

Hopefully I was able to address everything! But please let me know if anything else is needed

Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀 great work

@ydshieh
Copy link
Collaborator

ydshieh commented Apr 24, 2024

Hi a CI is triggered https://github.com/huggingface/transformers/actions/runs/8818618312

@ydshieh
Copy link
Collaborator

ydshieh commented Apr 24, 2024

But since the CI in this PR is not green yet, you might consider to rebase on main once #30456 is merged into main.

Sorry about the mistake in the workflow file.

@ydshieh
Copy link
Collaborator

ydshieh commented Apr 24, 2024

Well

=========== 100 passed, 47 skipped, 26 warnings in 391.83s (0:06:31) ===========

💯

I ❤️ see this ! Thank you for the great contribution 🚀

@ydshieh
Copy link
Collaborator

ydshieh commented Apr 24, 2024

The circle CI seems crazy to run my branch #30455. If it is not green, it's fine. The last commit aeb6ae7ebefcd4360441c2df4f20c0bcd45ff6f5 is green anyway

https://app.circleci.com/pipelines/github/huggingface/transformers?branch=pull%2F30423

@ArthurZucker
Copy link
Collaborator

Merging! 👍🏻

@gugarosa
Copy link
Contributor Author

Thanks for everything folks!

@ArthurZucker ArthurZucker merged commit c9693db into huggingface:main Apr 24, 2024
25 checks passed
@ArthurZucker
Copy link
Collaborator

Thanks for yet another great contribution to the ecosystem !

@ydshieh
Copy link
Collaborator

ydshieh commented Apr 26, 2024

Hi @gugarosa

(I believe) The commit on Hub repo https://huggingface.co/microsoft/Phi-3-mini-4k-instruct/commit/3c0c9df9c11252fb61789d7847fa7d03f2825596

failed some tests

tests/models/phi3/test_modeling_phi3.py::Phi3IntegrationTest::test_phi3_mini_128k_instruct_generation FAILED [ 99%]
tests/models/phi3/test_modeling_phi3.py::Phi3IntegrationTest::test_phi3_mini_4k_instruct_generation FAILED [100%]

see this job https://github.com/huggingface/transformers/actions/runs/8842226344/job/24280809384

Could you take a look and open a PR for a fix please?

You can run things like

RUN_SLOW=1 python3 -m pytest -rs -v tests/models/phi3/test_modeling_phi3.py::Phi3IntegrationTest::test_phi3_mini_4k_instruct_generation

Same for the example in the file

docs/source/en/model_doc/phi3.md

@ydshieh
Copy link
Collaborator

ydshieh commented Apr 26, 2024

And the doc example for the class Phi3ForCausalLM doesn't seem working (from the first day)

from transformers import AutoTokenizer, Phi3ForCausalLM

model = Phi3ForCausalLM.from_pretrained("microsoft/phi-3-mini-4k-instruct")
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-3-mini-4k-instruct")

prompt = "This is an example script ."
inputs = tokenizer(prompt, return_tensors="pt")

# Generate
generate_ids = model.generate(inputs.input_ids, max_new_tokens=30)
o = tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
print(o)

It gives

This is an example script .

which is just the input text itself.

Would be great if you can check this too, thanks.

@ydshieh
Copy link
Collaborator

ydshieh commented May 2, 2024

#30423 (comment)

Hi @gugarosa It would very nice if you could take a look 🤗 .

itazap pushed a commit that referenced this pull request May 14, 2024
* chore(root): Initial commit of Phi-3 files.

* fix(root): Fixes Phi-3 missing on readme.

* fix(root): Ensures files are consistent.

* fix(phi3): Fixes unit tests.

* fix(tests): Fixes style of phi-3 test file.

* chore(tests): Adds integration tests for Phi-3.

* fix(phi3): Removes additional flash-attention usage, .e.g, swiglu and rmsnorm.

* fix(phi3): Fixes incorrect docstrings.

* fix(phi3): Fixes docstring typos.

* fix(phi3): Adds support for Su and Yarn embeddings.

* fix(phi3): Improves according first batch of reviews.

* fix(phi3): Uses up_states instead of y in Phi3MLP.

* fix(phi3): Uses gemma rotary embedding to support torch.compile.

* fix(phi3): Improves how rotary embedding classes are defined.

* fix(phi3): Fixes inv_freq not being re-computed for extended RoPE.

* fix(phi3): Adds last suggestions to modeling file.

* fix(phi3): Splits inv_freq calculation in two lines.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants