
Add qwen2 #28436

Merged: 76 commits merged into huggingface:main on Jan 17, 2024

Conversation

JustinLin610
Contributor

Adding Qwen2

This PR adds support for the upcoming Qwen2 models. For information about Qwen, please visit https://github.com/QwenLM/Qwen. @ArthurZucker

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@ArthurZucker left a comment

LGTM! My only concern is that the normal attention layer does not use the new argument (max_window_layers), so results might differ between it and the Qwen2FlashAttention2 layer.
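
For context, max_window_layers is the config field that decides which decoder layers should use sliding-window attention. A hypothetical sketch of such per-layer gating (the exact condition used in this PR may differ) could look like this:

def layer_uses_sliding_window(layer_idx: int, use_sliding_window: bool, max_window_layers: int) -> bool:
    # Assumed gating direction: lower layers use sliding-window attention, upper layers use full attention.
    return use_sliding_window and layer_idx < max_window_layers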

src/transformers/convert_slow_tokenizer.py (outdated, resolved)
Comment on lines 89 to 91
# We need to at least pass vocab_file and merges_file to base class
# in case a slow tokenizer needs to be initialized; other can be
# configured through files
Collaborator

I think we should also define the extra tokens here 😉, copying the way we initialize them in the slow tokenizer!
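
As a rough illustration only, "defining the extra tokens" in the fast tokenizer's __init__ might look like the following; the token strings and flags are assumptions mirroring the slow tokenizer, not the PR's final code:

from transformers import AddedToken

eos_token = AddedToken("<|endoftext|>", lstrip=False, rstrip=False)
unk_token = AddedToken("<|endoftext|>", lstrip=False, rstrip=False)
pad_token = AddedToken("<|endoftext|>", lstrip=False, rstrip=False)
# These would then be passed to super().__init__(...) alongside vocab_file and merges_file.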

tests/models/qwen2/test_tokenization_qwen2.py (outdated, resolved)
src/transformers/models/qwen2/modeling_qwen2.py (outdated, resolved)
return hidden_states.reshape(batch, num_key_value_heads * n_rep, slen, head_dim)


class Qwen2Attention(nn.Module):
Collaborator

This attention class will not really support max_swa_layers, as it only uses the attention mask to perform sliced attention; I doubt that it will have the same results, no?
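
To make the attention-mask point concrete, here is a minimal sketch (illustrative values, not the PR's code) of emulating sliding-window attention purely through an additive mask:

import torch

def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    # Positions that are non-causal or further than `window` tokens back get -inf,
    # so softmax assigns them zero attention weight.
    q = torch.arange(seq_len).unsqueeze(1)  # query positions
    k = torch.arange(seq_len).unsqueeze(0)  # key positions
    allowed = (k <= q) & (k > q - window)
    mask = torch.zeros(seq_len, seq_len)
    mask[~allowed] = float("-inf")
    return mask

print(sliding_window_causal_mask(6, 3))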

Contributor Author

A warning was added for the case where Flash Attention is not used.

Collaborator

@ArthurZucker left a comment

Alright, just a final nit on the tokenizer and I think we can merge!
(doc + unk_token for fast)

src/transformers/models/qwen2/tokenization_qwen2.py (outdated, resolved)
src/transformers/models/qwen2/tokenization_qwen2_fast.py (outdated, resolved)
src/transformers/models/qwen2/tokenization_qwen2_fast.py (outdated, resolved)
src/transformers/models/qwen2/modeling_qwen2.py (outdated, resolved)
@ArthurZucker
Collaborator

Thanks a lot for this PR and for bearing with me! 🤗

@ArthurZucker ArthurZucker merged commit d6ffe74 into huggingface:main Jan 17, 2024
21 checks passed
wgifford pushed a commit to wgifford/transformers that referenced this pull request Jan 21, 2024
* add config, modeling, and tokenization

* add auto and init

* update readme

* update readme

* update team name

* fixup

* fixup

* update config

* update code style

* update for fixup

* update for fixup

* update for fixup

* update for testing

* update for testing

* fix bug for config and tokenization

* fix bug for bos token

* not doctest

* debug tokenizer

* not doctest

* debug tokenization

* debug init for tokenizer

* fix style

* update init

* delete if in token auto

* add tokenizer doc

* add tokenizer in init

* Update dummy_tokenizers_objects.py

* update

* update

* debug

* Update tokenization_qwen2.py

* debug

* Update convert_slow_tokenizer.py

* add copies

* add copied from and make style

* update files map

* update test

* fix style

* fix merge reading and update tests

* fix tests

* fix tests

* fix style

* debug a variable in readme

* Update src/transformers/models/qwen2/configuration_qwen2.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* update test and copied from

* fix style

* update qwen2 tokenization  and tests

* Update tokenization_qwen2.py

* delete the copied from after property

* fix style

* update tests

* update tests

* add copied from

* fix bugs

* update doc

* add warning for sliding window attention

* update qwen2 tokenization

* fix style

* Update src/transformers/models/qwen2/modeling_qwen2.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix tokenizer fast

---------

Co-authored-by: Ren Xuancheng <jklj077@users.noreply.github.com>
Co-authored-by: renxuancheng.rxc <renxuancheng.rxc@alibaba-inc.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
AjayP13 pushed a commit to AjayP13/transformers that referenced this pull request Jan 22, 2024
(same squashed commit message as above)

@xd2333

xd2333 commented Feb 11, 2024

Qwen1.5-0.5B-Chat loses output.weight after loading with from_pretrained and saving with save_pretrained (screenshots attached in the original comment).
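
If this is the tied-embeddings case, the missing output.weight may be expected rather than a bug: with config.tie_word_embeddings=True the LM head shares its weight with the input embeddings and is not stored as a separate tensor. A quick check (model id taken from the report):

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-0.5B-Chat")
print(model.config.tie_word_embeddings)
print(model.get_output_embeddings().weight.data_ptr() == model.get_input_embeddings().weight.data_ptr())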

@fxmarty
Contributor

fxmarty commented Feb 27, 2024

The SDPA and eager implementations don't seem to match on main (5c341d4), even when not using an attn_mask:

FAILED tests/models/qwen2/test_modeling_qwen2.py::Qwen2ModelTest::test_eager_matches_sdpa_inference_0_float16 - AssertionError: False is not true : padding_side=left, use_mask=False, batch_size=1, enable_kernels=False: mean relative difference: 2.188e+00, torch at...
FAILED tests/models/qwen2/test_modeling_qwen2.py::Qwen2ModelTest::test_eager_matches_sdpa_inference_1_bfloat16 - AssertionError: False is not true : padding_side=left, use_mask=False, batch_size=1, enable_kernels=False: mean relative difference: 1.156e+00, torch at...
FAILED tests/models/qwen2/test_modeling_qwen2.py::Qwen2ModelTest::test_eager_matches_sdpa_inference_2_float32 - AssertionError: False is not true : padding_side=left, use_mask=False, batch_size=1, enable_kernels=False: mean relative difference: 1.369e+00, torch at...
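
A minimal standalone comparison along the lines of that test (model id, prompt, and dtype below are illustrative, not the test's actual settings):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-0.5B"  # illustrative checkpoint
tok = AutoTokenizer.from_pretrained(model_id)
inputs = tok("Hello, Qwen2!", return_tensors="pt")

eager = AutoModelForCausalLM.from_pretrained(model_id, attn_implementation="eager", torch_dtype=torch.float32)
sdpa = AutoModelForCausalLM.from_pretrained(model_id, attn_implementation="sdpa", torch_dtype=torch.float32)

with torch.no_grad():
    diff = (eager(**inputs).logits - sdpa(**inputs).logits).abs().max()
print(diff)  # should be small if the two attention paths agree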

@fxmarty
Contributor

fxmarty commented Feb 27, 2024

cc @JustinLin610 have you tested both code paths?

@YuanzeSun

Hi, can I use it with Qwen1? If not, how can I adapt it to Qwen1? Thank you!

@ydshieh
Collaborator

ydshieh commented Mar 13, 2024

@JustinLin610 In the tests, Qwen/Qwen2-450m-beta is used in several places, but there is no such repository on the Hugging Face Hub. Could you let us know which repository you used when adding those tests, please? Thank you in advance.

Currently, this kind of implies that no integration tests were actually run when this model was added to transformers.

@JustinLin610
Contributor Author

I'll fix the code paths. Previously we tested it and all the tests passed. However, we made name changes that caused trouble. Sorry about this.

@JustinLin610
Contributor Author

@JustinLin610 In the tests, Qwen/Qwen2-450m-beta is used in several places, but there is no such repository on the Hugging Face Hub. Could you let us know which repository you used when adding those tests, please? Thank you in advance.

Currently, this kind of implies that no integration tests were actually run when this model was added to transformers.

https://huggingface.co/Qwen/Qwen1.5-0.5B is the repository that corresponds to the original Qwen/Qwen2-450m-beta.

@JustinLin610
Contributor Author

The SDPA and eager implementations don't seem to match on main (5c341d4), even when not using an attn_mask:

FAILED tests/models/qwen2/test_modeling_qwen2.py::Qwen2ModelTest::test_eager_matches_sdpa_inference_0_float16 - AssertionError: False is not true : padding_side=left, use_mask=False, batch_size=1, enable_kernels=False: mean relative difference: 2.188e+00, torch at...
FAILED tests/models/qwen2/test_modeling_qwen2.py::Qwen2ModelTest::test_eager_matches_sdpa_inference_1_bfloat16 - AssertionError: False is not true : padding_side=left, use_mask=False, batch_size=1, enable_kernels=False: mean relative difference: 1.156e+00, torch at...
FAILED tests/models/qwen2/test_modeling_qwen2.py::Qwen2ModelTest::test_eager_matches_sdpa_inference_2_float32 - AssertionError: False is not true : padding_side=left, use_mask=False, batch_size=1, enable_kernels=False: mean relative difference: 1.369e+00, torch at...

This is strange to me, by the way, as this part of the code is copied exactly from the Mistral implementation.

@JustinLin610
Contributor Author

Hi, can I use it with Qwen1? If not, how can I adapt it to Qwen1? Thank you!

No, you can't use it directly. You need to transform the state dict to adapt it.
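
For illustration only: adapting a Qwen1 checkpoint would mean renaming (and, where projections are fused, splitting) state-dict tensors into the layout this modeling code expects. The key names below are placeholders, not a verified mapping:

def convert_state_dict(old_sd: dict, key_map: dict) -> dict:
    # Generic rename pass; fused tensors (e.g. a combined QKV projection) would
    # additionally need to be split into separate q/k/v tensors.
    new_sd = {}
    for old_key, tensor in old_sd.items():
        new_key = old_key
        for src, dst in key_map.items():
            new_key = new_key.replace(src, dst)
        new_sd[new_key] = tensor
    return new_sd

key_map = {"transformer.wte": "model.embed_tokens"}  # placeholder entry only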

merges_file,
errors="replace",
unk_token="<|endoftext|>",
bos_token=None,
Contributor

Why None?

Contributor

Cross-posting from slack: is it expected that qwen1.5/qwen1.5-moe models have bos_token_id in their config.json, but not in tokenizer_config.json, while bos_token defaults to None in the tokenizer class?

This results in a loaded tokenizer not having a BOS token, and I wonder if this is intended.

cc @Giuseppe5
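
A quick way to observe the discrepancy described above (model id illustrative):

from transformers import AutoConfig, AutoTokenizer

model_id = "Qwen/Qwen1.5-0.5B"
config = AutoConfig.from_pretrained(model_id)
tok = AutoTokenizer.from_pretrained(model_id)
# The config may carry a bos_token_id even though the tokenizer defines no BOS token.
print(config.bos_token_id, tok.bos_token, tok.bos_token_id)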
