
Cohere Model Release #29622

Merged: 27 commits, Mar 15, 2024
Conversation

@saurabhdash2512 (Contributor) commented Mar 12, 2024:

Cohere Model Release

What does this PR do?

This PR adds the Cohere model family, starting with the release of the Command-R model weights.

More information about the model can be found here.

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@LysandreJik

@younesbelkada (Contributor) left a comment:

Huge work! Thanks a lot for this!

I left a couple of comments, the main one being that we can leverage the # Copied from mechanism almost everywhere and simply put # Ignore copy statements at the places where the implementation differs from Llama; please have a look at my comments.
Also, the documentation and testing files seem to be missing. They should be pretty easy to add, as the model is mostly copied from Llama, so I'd expect all tests to pass out of the box! For a similar PR, see #29215: Starcoder was mostly copied from Mistral with minor architectural changes. Please have a look at that PR as well and take some inspiration from there; it really shouldn't be hard to make this PR mergeable ASAP! 🚀

Also, let's remove pretraining_tp; from the checkpoints I inspected, it's either set to 1 (https://huggingface.co/CohereForAI/c4ai-command-r-v01/blob/main/config.json#L27) or None, so we can remove it completely from this model's code.

To make the CI happy, first make sure that make fixup passes on your local machine (this will check the # Copied from mechanism).

Let me know if you need any help!
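
For context, a minimal sketch of how the # Copied from mechanism is used (the CohereMLP body below is illustrative, mirroring the Llama MLP shape; it is not this PR's actual code):

import torch.nn as nn

# Copied from transformers.models.llama.modeling_llama.LlamaMLP with Llama->Cohere
class CohereMLP(nn.Module):
    def __init__(self, config):
        super().__init__()
        # The body must stay identical to LlamaMLP (modulo the Llama->Cohere rename);
        # make fixup re-copies or flags it otherwise.
        self.gate_proj = nn.Linear(config.hidden_size, config.intermediate_size, bias=False)
        self.up_proj = nn.Linear(config.hidden_size, config.intermediate_size, bias=False)
        self.down_proj = nn.Linear(config.intermediate_size, config.hidden_size, bias=False)
        self.act_fn = nn.SiLU()  # illustrative; the real code resolves the activation via ACT2FN

    def forward(self, x):
        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))

Where a single method intentionally diverges from the copied source, an # Ignore copy comment directly above that method exempts it from the check.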

@@ -331,6 +331,7 @@ Current number of checkpoints: ![](https://img.shields.io/endpoint?url=https://h
1. **[CLVP](https://huggingface.co/docs/transformers/model_doc/clvp)** released with the paper [Better speech synthesis through scaling](https://arxiv.org/abs/2305.07243) by James Betker.
1. **[CodeGen](https://huggingface.co/docs/transformers/model_doc/codegen)** (from Salesforce) released with the paper [A Conversational Paradigm for Program Synthesis](https://arxiv.org/abs/2203.13474) by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong.
1. **[CodeLlama](https://huggingface.co/docs/transformers/model_doc/llama_code)** (from MetaAI) released with the paper [Code Llama: Open Foundation Models for Code](https://ai.meta.com/research/publications/code-llama-open-foundation-models-for-code/) by Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve.
1. **[Cohere](https://huggingface.co/docs/transformers/main/model_doc/cohere)** (from Cohere) released with the paper [Command-R: Retrieval Augmented Generation at Production Scale](<https://txt.cohere.com/command-r/>) by Cohere.
Contributor:

You need to run make fix-copies to propagate these changes to all other READMEs

)


class LayerNorm(nn.Module):
Contributor:

Can you prepend the names of these modules with Cohere? --> CohereLayerNorm



class LayerNorm(nn.Module):
def __init__(self, hidden_size, eps=1e-5, bias=False):
Contributor:

This layer norm is different from the Llama layer norm because of the bias term; can we explicitly mention that in the docstring of this module?

def __init__(self, hidden_size, eps=1e-5, bias=False):
super().__init__()
self.weight = nn.Parameter(torch.ones(hidden_size))
self.bias = nn.Parameter(torch.zeros(hidden_size)) if bias else None
Contributor:

Actually, bias seems to always be set to None (looking at the LayerNorm definitions below). Can we instead remove bias here, copy the LayerNorm module from Llama, and add a # Copied from statement on top of the module?

Contributor Author:

The Llama LayerNorm is a bit different from this one, which happens in FP32. I kept the bias in case we decide to add bias in the future.
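
A minimal sketch of the module as described, assuming the FP32 behaviour mentioned above (full layer norm with mean subtraction, upcast to float32, optional bias kept for the future):

import torch
import torch.nn as nn

class CohereLayerNorm(nn.Module):
    """Layer norm computed in FP32, with an optional bias.

    Unlike Llama's RMSNorm, this subtracts the mean (a full layer norm) and
    performs the whole computation in float32 before casting back.
    """

    def __init__(self, hidden_size, eps=1e-5, bias=False):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.bias = nn.Parameter(torch.zeros(hidden_size)) if bias else None
        self.variance_epsilon = eps

    def forward(self, hidden_states):
        input_dtype = hidden_states.dtype
        hidden_states = hidden_states.to(torch.float32)
        mean = hidden_states.mean(-1, keepdim=True)
        variance = (hidden_states - mean).pow(2).mean(-1, keepdim=True)
        hidden_states = (hidden_states - mean) * torch.rsqrt(variance + self.variance_epsilon)
        hidden_states = self.weight.to(torch.float32) * hidden_states
        if self.bias is not None:
            hidden_states = hidden_states + self.bias.to(torch.float32)
        return hidden_states.to(input_dtype)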

Contributor:

ok!

return causal_mask


class CohereForCausalLM(CoherePreTrainedModel):
Contributor:

Suggested change
class CohereForCausalLM(CoherePreTrainedModel):
# Copied from transformers.models.llama.modeling_llama.LlamaForCausalLM with Llama->Cohere
class CohereForCausalLM(CoherePreTrainedModel):


@add_start_docstrings_to_model_forward(COHERE_INPUTS_DOCSTRING)
@replace_return_docstrings(output_type=CausalLMOutputWithPast, config_class=_CONFIG_FOR_DOC)
def forward(
Contributor:

Suggested change
def forward(
# Ignore copy
def forward(

Since the difference from Llama here is that we multiply the LM logits by logits_scale.
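
In other words, the only divergence from the Llama forward is one extra multiplication at the end. A hypothetical, self-contained fragment isolating that step (the class and the default scale value are illustrative, not the PR's actual code):

import torch.nn as nn

class ScaledLMHead(nn.Module):
    """Hypothetical module showing the divergent step: logits are scaled
    by a constant read from the config before being returned."""

    def __init__(self, hidden_size, vocab_size, logits_scale=0.0625):
        super().__init__()
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)
        self.logits_scale = logits_scale

    def forward(self, hidden_states):
        logits = self.lm_head(hidden_states)
        # This scaling is the reason the method needs # Ignore copy.
        return logits * self.logits_scale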

Contributor:

A file for modeling and tokenizer tests is missing.

Contributor Author:

What kind of tests are required for modeling and tokenization, @younesbelkada?

Contributor:

You can copy over tests/models/llama/test_modeling_llama.py and tests/models/llama/test_tokenization_llama.py and adapt them for Cohere. Note that we also need an integration test, so you can remove the Llama integration tests and replace them with new ones.

@@ -310,6 +310,8 @@
title: CodeGen
- local: model_doc/code_llama
title: CodeLlama
- local: model_doc/cohere
title: Cohere
Contributor:

The Cohere doc page is missing.

@younesbelkada (Contributor) left a comment:

Here you need to add a field for the slow tokenizer in order for the failing CI to pass, e.g.:

("bloom", (None, "BloomTokenizerFast" if is_tokenizers_available() else None)),

@@ -137,6 +143,10 @@
),
),
("codegen", ("CodeGenTokenizer", "CodeGenTokenizerFast" if is_tokenizers_available() else None)),
(
"cohere",
("CohereTokenizerFast" if is_tokenizers_available() else None,),
Contributor:

Suggested change
("CohereTokenizerFast" if is_tokenizers_available() else None,),
(None, "CohereTokenizerFast" if is_tokenizers_available() else None,),

(
"cohere",
(
"CohereTokenizerFast" if is_tokenizers_available() else None,
Contributor:

Suggested change
"CohereTokenizerFast" if is_tokenizers_available() else None,
None, "CohereTokenizerFast" if is_tokenizers_available() else None,

@younesbelkada (Contributor) left a comment:

Thanks for the huge work! This is much cleaner! 🤩 I left a few possible enhancements to make the code easier to maintain in the future. We should be really close to merging this! 🚀

Comment on lines 40 to 41
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
Contributor:

Suggested change
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

These won't be needed once we merge the PR, right?

Comment on lines 75 to 76
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
Contributor:

Suggested change
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

Comment on lines 102 to 103
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, quantization_config=bnb_config)
Contributor:

Suggested change
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, quantization_config=bnb_config)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)
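
For reference, the bnb_config used in this snippet would typically be built along these lines (the exact quantization settings here are illustrative):

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "CohereForAI/c4ai-command-r-v01"

# Illustrative 4-bit quantization settings
bnb_config = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)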

def __init__(self, hidden_size, eps=1e-5, bias=False):
super().__init__()
self.weight = nn.Parameter(torch.ones(hidden_size))
self.bias = nn.Parameter(torch.zeros(hidden_size)) if bias else None
Contributor:

ok!

Comment on lines 111 to 116
t = torch.arange(self.max_seq_len_cached, device=device, dtype=torch.int64).type_as(self.inv_freq)
t = t / self.scaling_factor
freqs = torch.outer(t, self.inv_freq)
emb = torch.repeat_interleave(freqs, 2, dim=-1)
self.register_buffer("_cos_cached", emb.cos().to(torch.get_default_dtype()), persistent=False)
self.register_buffer("_sin_cached", emb.sin().to(torch.get_default_dtype()), persistent=False)
Contributor:

Suggested change
t = torch.arange(self.max_seq_len_cached, device=device, dtype=torch.int64).type_as(self.inv_freq)
t = t / self.scaling_factor
freqs = torch.outer(t, self.inv_freq)
emb = torch.repeat_interleave(freqs, 2, dim=-1)
self.register_buffer("_cos_cached", emb.cos().to(torch.get_default_dtype()), persistent=False)
self.register_buffer("_sin_cached", emb.sin().to(torch.get_default_dtype()), persistent=False)

_sin_cached and _cos_cached seem to be unused below; I think we can remove them!
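
One way to drop the cached buffers is to compute cos/sin on the fly in forward from position_ids, roughly as sketched below (the class name and signature are assumptions, not the PR's code):

import torch
import torch.nn as nn

class RotaryEmbeddingSketch(nn.Module):
    def __init__(self, dim, base=10000):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
        self.register_buffer("inv_freq", inv_freq, persistent=False)

    @torch.no_grad()
    def forward(self, x, position_ids):
        # (batch, seq_len, dim // 2)
        freqs = position_ids[:, :, None].float() * self.inv_freq[None, None, :]
        # Note the repeat_interleave pairing of frequencies, which differs from
        # Llama's torch.cat([freqs, freqs], dim=-1) layout.
        emb = torch.repeat_interleave(freqs, 2, dim=-1)
        return emb.cos().to(x.dtype), emb.sin().to(x.dtype)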

class CohereForCausalLM(CoherePreTrainedModel):
_tied_weights_keys = ["model.embed_tokens.weight", "lm_head.weight"]

def __init__(self, config):
Contributor:

Suggested change
def __init__(self, config):
# Ignore copy
def __init__(self, config):

return causal_mask


class CohereForCausalLM(CoherePreTrainedModel):
Contributor:

Do you know why the # Copied from statement cannot be used here? It would be very useful for easily maintaining the methods below, such as _prepare_inputs_for_generation; otherwise, you can also try to put a # Copied from statement on the _prepare_inputs_for_generation method.
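
If the class-level statement can't be used, the tooling also supports a method-level marker, roughly like this (a fragment; the surrounding class body is elided, and the Llama source path and method name are assumed):

class CohereForCausalLM(CoherePreTrainedModel):
    ...

    # Copied from transformers.models.llama.modeling_llama.LlamaForCausalLM.prepare_inputs_for_generation
    def prepare_inputs_for_generation(self, input_ids, past_key_values=None, **kwargs):
        # Body kept verbatim from the Llama implementation; make fixup keeps it in sync.
        ...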

from transformers import CohereForCausalLM, CohereModel


class CohereModelTester:
Contributor:

Suggested change
class CohereModelTester:
# Copied from transformers.tests.models.llama.test_modeling_llama.LlamaModelTester with Llama->Cohere
class CohereModelTester:

Can we also add # Copied from statements on the tests?

Contributor Author:

There are differences in the tests.



@require_torch
class CohereModelTest(unittest.TestCase):
Contributor:

Same here for # Copied from.

class CohereIntegrationTest(unittest.TestCase):
@unittest.skip("Logits are not exactly the same; once we fix the instabilities somehow, will update!")
@slow
def test_model_logits(self):
Contributor:

Are these the real values obtained?
Could you rather add some end-to-end generation tests, similar to:

def test_starcoder2_batched_generation_sdpa(self):
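
An end-to-end generation test of that shape might look roughly like this (a sketch: the prompts and expected strings are placeholders to be filled in from a real run, and it assumes the tokenizer ships a pad token):

import unittest

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.testing_utils import require_torch_gpu, slow

class CohereIntegrationTest(unittest.TestCase):
    @slow
    @require_torch_gpu
    def test_batched_generation(self):
        model_id = "CohereForAI/c4ai-command-r-v01"
        EXPECTED_TEXT = ["<fill in from a real run>", "<fill in from a real run>"]

        tokenizer = AutoTokenizer.from_pretrained(model_id)
        model = AutoModelForCausalLM.from_pretrained(
            model_id, torch_dtype=torch.float16, device_map="auto"
        )
        inputs = tokenizer(
            ["Hello today I am going to show you how to",
             "Hi there, here is a longer prompt to exercise padding"],
            return_tensors="pt",
            padding=True,
        ).to(model.device)

        output = model.generate(**inputs, max_new_tokens=20, do_sample=False)
        self.assertEqual(
            tokenizer.batch_decode(output, skip_special_tokens=True), EXPECTED_TEXT
        )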

saurabhdash2512 and others added 2 commits March 14, 2024 11:43
* fix modeling final nits and add proper test file

* for now leave empty tests

* add integration test

* push new test
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Rocketknight1 (Member) commented Mar 14, 2024:

Hi @saurabhdash2512, this looks really good! We've just made some updates to the transformers library here to support some of the new chat template features this model uses - if possible, can you rebase your PR so we can use them in this port? I'll make a PR to your branch to add support for the features once you rebase!

(I realize you've already rebased quite recently - I'm sorry about this!)

@saurabhdash2512 (Contributor Author) replied:

> Hi @saurabhdash2512, this looks really good! We've just made some updates to the transformers library here to support some of the new chat template features this model uses - if possible, can you rebase your PR so we can use them in this port? I'll make a PR to your branch to add support for the features once you rebase!
>
> (I realize you've already rebased quite recently - I'm sorry about this!)

@Rocketknight1 Done! Synced with main.
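
For reference, the chat template features in question are exposed through tokenizer.apply_chat_template; a minimal sketch against the released checkpoint (the message content is illustrative):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("CohereForAI/c4ai-command-r-v01")

messages = [{"role": "user", "content": "Hello, how are you?"}]
input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
)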

@LysandreJik (Member) left a comment:

Awesome! LGTM

@younesbelkada (Contributor) left a comment:

Great work @saurabhdash and team!

@LysandreJik merged commit 0e4a1c3 into huggingface:main on Mar 15, 2024 (20 of 22 checks passed).
itazap pushed a commit that referenced this pull request May 14, 2024
* Cohere Model Release (#1)

Cohere Model Release

* Remove unnecessary files and code (#2)

Some cleanup

* Delete cohere-model directory (#3)

* Make Fix (#5)

* Pr fixes (#6)

* fixes for pr

* pr fixes for the format

* pr fixes for the format

* src/transformers/models/auto/tokenization_auto.py

* Tokenizer test (#8)

* tokenizer test

* format fix

* Adding Docs and other minor changes (#7)

* Add modeling tests (#9)

* Smol Fix (#11)

* tokenization tests are fixed

* format fixes

* fix pr doc tests

* fix pr doc tests

* fix pr doc tests

* fix pr style check

* small changes in cohere.md

* FIX: Address final comments for transformers integration (#13)

* fix modeling final nits and add proper test file

* for now leave empty tests

* add integration test

* push new test

* fix modeling cohere (#14)

* Update chat templates to use the new API (#15)

---------

Co-authored-by: ahmetustun <ahmetustun89@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
@ArthurZucker mentioned this pull request on Jul 26, 2024.