
Refactor Cohere Model #30027

Merged
merged 3 commits into huggingface:main on Apr 4, 2024

Conversation

@saurabhdash2512 (Contributor) commented Apr 3, 2024

What does this PR do?

Refactor Cohere Model


Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@ArthurZucker and @younesbelkada

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@saurabhdash2512 changed the title from "Updates" to "Refactor Cohere Model" on Apr 3, 2024
@ArthurZucker (Collaborator) left a comment:

Thanks.
Let's keep backward compatibility (BC) for the layer norm name.
While we are at it, some `# Copied from` comments are missing! 🤗

```diff
@@ -76,10 +76,9 @@ def _get_unpad_data(attention_mask):


 class CohereLayerNorm(nn.Module):
-    def __init__(self, hidden_size, eps=1e-5, bias=False):
+    def __init__(self, param_shape=None, eps=1e-5, bias=False):
```
Collaborator:

Suggested change:

```diff
-    def __init__(self, param_shape=None, eps=1e-5, bias=False):
+    def __init__(self, hidden_size=None, eps=1e-5, bias=False):
+        """The hidden size can be a tuple or an int. If a tuple is used, the layer norm is applied over both dimensions."""
```

Comment on lines +153 to +160
```diff
+    dtype = q.dtype
+    q = q.float()
+    k = k.float()
     cos = cos.unsqueeze(unsqueeze_dim)
     sin = sin.unsqueeze(unsqueeze_dim)
     q_embed = (q * cos) + (rotate_half(q) * sin)
     k_embed = (k * cos) + (rotate_half(k) * sin)
-    return q_embed, k_embed
+    return q_embed.to(dtype=dtype), k_embed.to(dtype=dtype)
```
Collaborator:
If this is done outside `apply_rotary_pos_emb`, we can keep the `# Copied from` 😉 but it's not an issue.
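For illustration, a sketch of what hoisting the cast to the call site could look like; the variable names follow the Llama-style attention code and are assumptions, not the merged change:

```python
# Cast to float32 before the call so apply_rotary_pos_emb itself stays
# identical to the Llama version and keeps its "# Copied from" marker.
dtype = query_states.dtype
query_states, key_states = apply_rotary_pos_emb(
    query_states.float(), key_states.float(), cos, sin
)
query_states = query_states.to(dtype)
key_states = key_states.to(dtype)
```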


```python
        if (self.head_dim * self.num_heads) != self.hidden_size:
            raise ValueError(
                f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
                f" and `num_heads`: {self.num_heads})."
            )

        if self.use_qk_norm:
            # When sharding the model using Tensor Parallelism, need to be careful to use n_local_heads
```
Collaborator:
Not sure this comment is relevant, as the model is not sharded by default.

Contributor Author:

Oh, this is to warn people who port this model from HF to other frameworks.
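For context, a sketch of how the tuple-shaped layer norm is wired up for QK-norm and why the head count matters when sharding; the attribute names are assumptions based on the discussion above:

```python
# QK-norm normalizes each head's queries and keys, so the norm weights are
# shaped per head rather than over the flat hidden size.
self.q_norm = CohereLayerNorm(hidden_size=(self.num_heads, self.head_dim), eps=config.layer_norm_eps)
self.k_norm = CohereLayerNorm(hidden_size=(self.num_key_value_heads, self.head_dim), eps=config.layer_norm_eps)
# Under tensor parallelism each rank holds only num_heads / tp_world_size
# heads, so a ported implementation must size these norms with n_local_heads.
```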

@ArthurZucker (Collaborator):

As a follow-up, adding integration tests for the new model would be nice.
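A minimal sketch of such an integration test; the checkpoint id and prompt are assumptions, and a real test would pin the greedy output to a recorded expected string:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CohereForAI/c4ai-command-r-v01"  # assumed released checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
# An integration test would assert the decoded text equals a known-good string.
```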

@ArthurZucker merged commit 517a3e6 into huggingface:main on Apr 4, 2024
17 checks passed
@fblissjr commented Apr 8, 2024

I think there's an issue with the tokenizer.json (or with the way transformers is converting it): the model produces random multilingual output out of nowhere.

See: https://huggingface.co/CohereForAI/c4ai-command-r-plus/discussions/15

I found this after converting to mlx-lm, but it is reproducible with:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("CohereForAI/c4ai-command-r-plus")
tokenizer.save_pretrained(".")
```

@awni commented Apr 8, 2024

Concretely, loading -> saving -> loading the tokenizer, all through HF AutoTokenizer, produces a different tokenizer from the one initially loaded, with worse behavior. Presumably that should not be the case.
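A minimal round-trip check along those lines (the repo id is from this thread; the single-string comparison is just one way to surface a difference):

```python
import tempfile
from transformers import AutoTokenizer

repo = "CohereForAI/c4ai-command-r-plus"
original = AutoTokenizer.from_pretrained(repo)

with tempfile.TemporaryDirectory() as tmp:
    original.save_pretrained(tmp)
    reloaded = AutoTokenizer.from_pretrained(tmp)

text = "Hello, world!"
# If the round trip is lossless, both tokenizers should agree.
assert original.encode(text) == reloaded.encode(text)
```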

@fblissjr commented Apr 8, 2024

Worth noting: the tokenizer.json in the bitsandbytes 4-bit repo linked from the main Cohere repo is also a different size and looks very different from the original model repo's tokenizer.json (https://huggingface.co/CohereForAI/c4ai-command-r-plus-4bit/blob/main/tokenizer.json).

@fblissjr commented Apr 8, 2024

Another interesting difference between the 4-bit bnb tokenizer and the original: in the original, token id 255001 (<|END_OF_TURN_TOKEN|>) has special set to False; in the 4-bit bnb one, it's True.
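One way to inspect that flag from Python instead of diffing the JSON; a sketch using the standard added_tokens_decoder accessor, with the token id taken from the comment above:

```python
from transformers import AutoTokenizer

for repo in ("CohereForAI/c4ai-command-r-plus", "CohereForAI/c4ai-command-r-plus-4bit"):
    tok = AutoTokenizer.from_pretrained(repo)
    token = tok.added_tokens_decoder[255001]  # <|END_OF_TURN_TOKEN|>
    print(repo, token.content, token.special)
```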

@ArthurZucker (Collaborator):

I have no idea what you are talking about. A 4-bit tokenizer? Could you open an issue with a repro and a description of the problem?

@ahmetustun (Contributor) commented Apr 8, 2024

I think the difference between the two tokenizer.json files is the unicode encoding applied by tokenizer.save_pretrained, which was used to save the tokenizer in the 4-bit model repo. Also, <|END_OF_TURN_TOKEN|> is set as special in the config because it is used as the eos_token, and it is overridden the same way in the original tokenizer (command-r-plus). Therefore, the two tokenizers should behave the same. @fblissjr, it would help if you posted the text you tokenize so we can double-check and reproduce.
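If the only real difference is unicode escaping, parsing both files and comparing the resulting objects should show them equal; a sketch, with hypothetical local file paths:

```python
import json

# Hypothetical local copies of the two tokenizer files.
with open("command-r-plus/tokenizer.json", encoding="utf-8") as f:
    original = json.load(f)
with open("command-r-plus-4bit/tokenizer.json", encoding="utf-8") as f:
    four_bit = json.load(f)

# json.load decodes \uXXXX escapes, so purely cosmetic encoding
# differences disappear after parsing.
print(original == four_bit)
```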

@fblissjr commented Apr 8, 2024

> I have no idea what you are talking about. A 4-bit tokenizer? Could you open an issue with a repro and a description of the problem?

What @ahmetustun mentioned is the difference: not a 4-bit tokenizer, but the tokenizer.json in https://huggingface.co/CohereForAI/c4ai-command-r-plus vs. the one in https://huggingface.co/CohereForAI/c4ai-command-r-plus-4bit.

I'm not at my workstation now, but if nobody else is seeing this issue, I'll assume I've got something wrong on my end. Thanks for the clarification.

@ahmetustun (Contributor):

Please leave a further comment in the model repo if the problem continues. Thanks @fblissjr.

ArthurZucker pushed a commit that referenced this pull request Apr 22, 2024
* changes

* addressing comments

* smol fix
itazap pushed a commit that referenced this pull request May 14, 2024
* changes

* addressing comments

* smol fix