
add model resources for CPMAnt (new) #20906

Merged: 113 commits merged into huggingface:main on Apr 12, 2023

Conversation

@pioliverse (Contributor) commented Dec 27, 2022

What does this PR do?

Since the previous submission (#20711) had problems here and there, we have now resubmitted a new one.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@pioliverse (Contributor Author) commented Dec 27, 2022

Thanks very much @pioliverse for iterating! I left a couple of comments; I think that some refactoring needs to be considered, and after that we should be close to merging this! My main comments are:

  • I think that you can wrap CPMAntEmbedding around an nn.Embedding layer even though scaling is needed. You can scale down after each call to the embedding module and make sure the input is scaled down before the projection call.
  • Make sure to inherit CPMAntForCausalLM from CPMAntPreTrainedModel, and also make sure to follow the convention / good practices by checking what is done in OPT, for instance:
    class OPTForCausalLM(OPTPreTrainedModel):
    • this includes correctly defining an lm_head module and functions such as get_input_embeddings, set_input_embeddings, etc.
  • A lot of arguments from the modules' init seem to be unused, e.g. init_std. Also try to take the config object as a single argument in the init whenever possible (e.g. CPMAntEncoder).
  • Please make sure to follow the correct styling for docstrings (check my comments about that below).
  • If you have to initialize some weights with a specific distribution, try to initialize all the submodule weights inside the _init_weights function of CPMAntPreTrainedModel.
  • It's unclear to me why the forward function is not defined in CPMAntForCausalLM.
  • The code can be optimized here and there; I left some comments below on how you can achieve that.
  • Please do not raise RuntimeErrors outside if is_torch_available(), otherwise the flax & tf tests will fail.

Again thanks a lot for your efforts!

@younesbelkada Thanks for your patience in reviewing. I followed the OPT convention and made the following changes:

  • CPMAntEmbedding and CPMAntLinear have been replaced by nn.Embedding and nn.Linear, respectively.
  • CPMAntForCausalLM now inherits from CPMAntPreTrainedModel, and lm_head and the related helper functions (get_input_embeddings, set_input_embeddings, etc.) have been added; a generic sketch of this pattern is shown below.
  • Unused init arguments have been removed.
  • forward has been defined in CPMAntForCausalLM.
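For readers unfamiliar with the OPT-style convention referenced above, here is a minimal, self-contained sketch of the pattern (the tiny classes, sizes, and names below are illustrative assumptions, not the CPMAnt code itself):

import torch
import torch.nn as nn


class TinyBackbone(nn.Module):
    # Stand-in for the base model; it owns the input embedding.
    def __init__(self, vocab_size=100, hidden_size=32):
        super().__init__()
        self.input_embedding = nn.Embedding(vocab_size, hidden_size)
        self.layer = nn.Linear(hidden_size, hidden_size)

    def forward(self, input_ids):
        return self.layer(self.input_embedding(input_ids))


class TinyForCausalLM(nn.Module):
    # Causal-LM wrapper: backbone + lm_head + embedding accessors.
    def __init__(self, vocab_size=100, hidden_size=32):
        super().__init__()
        self.model = TinyBackbone(vocab_size, hidden_size)
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)

    def get_input_embeddings(self):
        return self.model.input_embedding

    def set_input_embeddings(self, value):
        self.model.input_embedding = value

    def forward(self, input_ids):
        hidden_states = self.model(input_ids)
        return self.lm_head(hidden_states)  # logits over the vocabulary


logits = TinyForCausalLM()(torch.randint(0, 100, (2, 8)))  # shape (2, 8, 100)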

@pioliverse (Contributor Author):

@younesbelkada Thanks again for your patience in reviewing.

@younesbelkada (Contributor) left a comment:

Thanks so much for your patience! Looks pretty clean, thank you! We should be close to merging this once most of the comments are addressed. My comments are:

Docstring and comments:

  • please harmonize the function docstrings to match the convention transformers models follow
  • please make sure to clean up some comments
  • it would also be nice to add a small explanation in the code of why generate needs to be overridden

dtype:

  • I don't think the argument dtype is needed. The dtype of the whole model is managed by the kwarg torch_dtype so you can load your model using model = xxxForCausalLM.from_pretrained(xxx, torch_dtype=torch.float16) or torch_dtype="auto" (if the weights are pushed in fp16) and the model will be loaded in the desired precision.
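    As an illustration of that suggestion (the checkpoint name is the one discussed in this PR; any causal-LM checkpoint is handled the same way, so treat this as a sketch rather than a recommendation to load the 10B model):

    import torch
    from transformers import AutoModelForCausalLM

    # Load directly in half precision, or let torch_dtype="auto" reuse the dtype stored with the weights.
    model = AutoModelForCausalLM.from_pretrained("openbmb/cpm-ant-10b", torch_dtype=torch.float16)
    model = AutoModelForCausalLM.from_pretrained("openbmb/cpm-ant-10b", torch_dtype="auto")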

tests

  • I think that a test is failing, please double check that

general comments

Thanks!

Comment on lines 109 to 111
# assert (
# pointer.shape == array.shape
# ), f"Pointer shape {pointer.shape} and array shape {array.shape} mismatched"
Contributor:

to clean up!

Comment on lines 227 to 228
# assert x.size(-1) == self.dim_norm
if x.size(-1) != self.dim_norm:
Contributor:

to clean up

Comment on lines 341 to 346
dim_model (int): Main dimension of modules in transformer blocks.
num_heads (int): Number of attention heads in the Transformer encoder.
dim_head (int): Dimension of attention heads for each attention layer in the Transformer encoder.
dtype (optional): Defaults to torch.float.
eps (float, optional): The epsilon used by the layer normalization layers.
dropout_p (float, optional): Defaults to 0.
Contributor:

Please respect the convention as above ;)

dim_model (`int`):
    Main dimension of ...
...

Comment on lines 499 to 503
dim_model (int): Main dimension of modules in transformer blocks.
dim_ff (int): Dimension of the "intermediate" (i.e., feed-forward) layer in the Transformer encoder.
dtype (optional): Defaults to torch.float.
eps (float, optional): The epsilon used by the layer normalization layers.
dropout_p (float, optional): Defaults to 0.
Contributor:

Same comment as above!

Comment on lines 648 to 655
num_layers (int): Number of layers.
dim_model (int): Main dimension of modules in transformer blocks.
dim_ff (int): Dimension of the "intermediate" (i.e., feed-forward) layer in the Transformer encoder.
num_heads (int): Number of attention heads in the Transformer encoder.
dim_head (int): Dimension of attention heads for each attention layer in the Transformer encoder.
dtype (optional): Defaults to torch.float.
eps (float, optional): The epsilon used by the layer normalization layers.
dropout_p (float, optional): Defaults to 0.
Contributor:

same as above

attention_mask,
position_bias,
past_key_value=past_key_values[i] if past_key_values else None,
use_cache=use_cache,
Contributor:

I see that use_cache is also passed here, I am slightly confused why you don't pass it above too

Comment on lines 795 to 796
# assert key_pos.size(0) == query_pos.size(0)
# assert keylen == key_segment.size(1) and querylen == query_segment.size(1)
Contributor:

to clean up

Comment on lines 1041 to 1051
with torch.no_grad():
device = input.device
directional_mask_2d = torch.arange(seqlen, device=device) <= torch.arange(seqlen, device=device).view(
-1, 1
)
attention_mask = context[:, None, :] | (
context[:, :, None].logical_not() & directional_mask_2d.view(1, seqlen, seqlen)
)
attention_mask = attention_mask & (span[:, None, :] == span[:, :, None])
mask_1d = torch.arange(seqlen, device=device)[None, :].repeat(batch, 1) < length[:, None]
attention_mask = mask_1d.view(batch, seqlen, 1) & mask_1d.view(batch, 1, seqlen) & attention_mask
Contributor:

nit: You can probably wrap that in a class method _prepare_attention_mask
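A sketch of what that refactor could look like; the method name comes from the comment above, the body simply reuses the mask logic quoted in the diff, and deriving batch/seqlen from input is an assumption:

def _prepare_attention_mask(self, input, span, context, length):
    batch, seqlen = input.size(0), input.size(1)
    device = input.device
    directional_mask_2d = torch.arange(seqlen, device=device) <= torch.arange(
        seqlen, device=device
    ).view(-1, 1)
    attention_mask = context[:, None, :] | (
        context[:, :, None].logical_not() & directional_mask_2d.view(1, seqlen, seqlen)
    )
    attention_mask = attention_mask & (span[:, None, :] == span[:, :, None])
    mask_1d = torch.arange(seqlen, device=device)[None, :].repeat(batch, 1) < length[:, None]
    return mask_1d.view(batch, seqlen, 1) & mask_1d.view(batch, 1, seqlen) & attention_mask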

Comment on lines 1111 to 1124
with torch.no_grad():
device = input.device
directional_mask_2d = torch.arange(seqlen, device=device) <= torch.arange(seqlen, device=device).view(
-1, 1
)
attention_mask = context[:, None, :] | (
context[:, :, None].logical_not() & directional_mask_2d.view(1, seqlen, seqlen)
)
attention_mask = attention_mask & (span[:, None, :] == span[:, :, None])
# mask for left padding
mask_1d = (
torch.tensor(list(range(seqlen))[::-1], device=device)[None, :].repeat(batch, 1) < length[:, None]
)
attention_mask = mask_1d.view(batch, seqlen, 1) & mask_1d.view(batch, 1, seqlen) & attention_mask
Contributor:

same, you can wrap that in a method

hidden_states, attention_mask, position_bias, True, past_key_values
)
logits = self.lm_head(hidden_states)
return logits, hidden_states, present_key_values
Contributor:

Can you return a dataclass object instead? For example:

or you can also define your own class
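One possibility is an existing ModelOutput dataclass such as CausalLMOutputWithPast from transformers.modeling_outputs (a hedged sketch with dummy tensors; the PR may use a different output class or define its own):

import torch
from transformers.modeling_outputs import CausalLMOutputWithPast

# Dummy tensors stand in for the values computed in the model's forward pass.
logits = torch.zeros(1, 4, 10)
hidden_states = torch.zeros(1, 4, 8)

outputs = CausalLMOutputWithPast(logits=logits, hidden_states=(hidden_states,))
print(outputs.logits.shape)  # named attribute access
print(outputs[0].shape)      # tuple-style indexing still works; None fields are skipped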

@HuggingFaceDocBuilderDev commented Jan 9, 2023

The documentation is not available anymore as the PR was closed or merged.

@pioliverse pioliverse closed this Jan 12, 2023
@pioliverse pioliverse reopened this Jan 12, 2023
@pioliverse pioliverse marked this pull request as draft January 12, 2023 03:22
@pioliverse pioliverse changed the title add model resources for CPMAnt (new) 【WIP】add model resources for CPMAnt (new) Jan 12, 2023
@pioliverse pioliverse changed the title 【WIP】add model resources for CPMAnt (new) [WIP] add model resources for CPMAnt (new) Jan 12, 2023
@pioliverse pioliverse changed the title [WIP] add model resources for CPMAnt (new) WIP: add model resources for CPMAnt (new) Jan 12, 2023
@HuggingFaceDocBuilderDev commented Jan 12, 2023

The documentation is not available anymore as the PR was closed or merged.

@pioliverse pioliverse changed the title WIP: add model resources for CPMAnt (new) add model resources for CPMAnt (new) Feb 2, 2023
@pioliverse (Contributor Author):

(Quoting @younesbelkada's review above.)

Hi @younesbelkada, we have made some changes as follows:

  1. added some docstrings.
  2. modified forward following the style of transformers.
  3. rewrote some functions to adapt to the generate function (see the sketch below):
  • in modeling_cpmant.py, we rewrote functions such as prepare_inputs_for_generation and _expand_inputs_for_generation
  • in tokenization_cpmant.py, we rewrote functions such as prepare_for_model, _pad, _encode_plus, and _batch_encode_plus
  4. cleaned up some comments.
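For context, such an override typically looks roughly like the sketch below (it follows other decoder-only models in transformers and is not the PR's exact implementation):

def prepare_inputs_for_generation(self, input_ids, past_key_values=None, **kwargs):
    # Once a cache exists, only the most recent token needs to be fed to the model.
    if past_key_values is not None:
        input_ids = input_ids[:, -1:]
    return {
        "input_ids": input_ids,
        "past_key_values": past_key_values,
        "use_cache": kwargs.get("use_cache"),
    }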

@pioliverse pioliverse marked this pull request as ready for review February 2, 2023 10:58
@younesbelkada (Contributor) left a comment:

Thanks a lot for addressing most of the comments of the previous review! And thank you for your huge work on refactoring the modeling script.
I left some comments, mostly nits that can be solved easily. Note that for arguments such as use_cache etc., we prefer to pass them through the forward pass rather than setting them as class attributes.
Also, please consider passing a CPMAntConfig to the classes that have several attributes, such as CPMAntEncoder.
Also make sure to correctly pass the required keyword arguments such as past_key_values, output_attentions, etc., which are crucial for the caching mechanism. You can check how this is done in OPT, for example.
Finally, the naming convention in transformers has changed a bit: we prefer to name models with a single capital letter (i.e. here CPMAnt -> Cpmant).
Again thanks for your efforts on this! Once the comments are resolved, we should be very close to merging this!

]


def load_tf_weights_in_cpmant(model, config, tf_checkpoint_path):
Contributor:

This seems to be a function that is adapted from:

def load_tf_weights_in_bert(model, config, tf_checkpoint_path):
Can you add a # Adapted from statement at the top of the function?

Collaborator:

This can be removed entirely, there are no TF weights to convert here, no?

Comment on lines 136 to 141

super().__init__()

self.eps = eps
self.dim_norm = dim_norm
self.weight = torch.nn.parameter.Parameter(torch.full((dim_norm,), init_var))
Contributor:

Suggested change
super().__init__()
self.eps = eps
self.dim_norm = dim_norm
self.weight = torch.nn.parameter.Parameter(torch.full((dim_norm,), init_var))
super().__init__()
self.eps = eps
self.dim_norm = dim_norm
self.weight = torch.nn.parameter.Parameter(torch.full((dim_norm,), init_var))

Comment on lines 128 to 131
"""RMS LayerNorm"""

Contributor:

Could you add more description here?

Contributor Author:

We added some comments.

Comment on lines 193 to 194
Indices of input sequence tokens of shape `(batch, len_q, dim_model)`. It will be embedded by model's
internal embedding lookup matrix.
Contributor:

This description seems to be wrong

Contributor Author:

Revised.

Avoid invalid areas to participate in the calculation of self-attention.
position_bias (`torch.Tensor` of shape `(batch, len_seq, len_seq)`):
Provide positional information to self-attention block.
past_kv (`Tuple(torch.FloatTensor)`, *optional*): Cached past key and value projection states.
Contributor:

Suggested change
past_kv (`Tuple(torch.FloatTensor)`, *optional*): Cached past key and value projection states.
past_kv (`Tuple(torch.FloatTensor)`, *optional*):
Cached past key and value projection states.

return hidden_states, current_key_values


class CPMAntIntermediate(nn.Module):
Contributor:

This seems to be copied from BertIntermediate:

class BertIntermediate(nn.Module):
can you add a # Copied from statement?

Contributor:

Here I meant:

# Copied from transformers.models.bert.modeling_bert.BertIntermediate with Bert->CPMAnt

Check for example here:

# Copied from transformers.models.t5.modeling_t5.T5LayerNorm with T5->SwitchTransformers

return relative_buckets


class CPMAntOutput(nn.Module):
Contributor:

This seems to be copied from BertOutput:

class BertOutput(nn.Module):

Contributor Author:

Yes, we added a statement.

Contributor:

Same as above, what I meant is:

# Copied from transformers.models.bert.modeling_bert.BertOutput with Bert->CPMAnt

Check

# Copied from transformers.models.t5.modeling_t5.T5LayerNorm with T5->SwitchTransformers

position_bias = self.position_bias(position, position, segment, segment)

hidden_states = self.encoder(hidden_states, attention_mask, position_bias)
logits = F.linear(hidden_states, self.input_embedding.weight)
Contributor:

can't you call self.input_embedding directly?

Contributor Author:

Maybe not, self.input_embedding works for input_ids, but not for hidden_states.
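A minimal demo of the distinction being made here: nn.Embedding maps integer ids to vectors, while F.linear with the embedding weight projects hidden states back onto the vocabulary (the usual weight-tying trick). Shapes are arbitrary:

import torch
import torch.nn.functional as F
from torch import nn

embedding = nn.Embedding(num_embeddings=10, embedding_dim=4)
input_ids = torch.tensor([[1, 2, 3]])

hidden_states = embedding(input_ids)                # ids -> vectors, shape (1, 3, 4)
logits = F.linear(hidden_states, embedding.weight)  # vectors -> vocab scores, shape (1, 3, 10)
# embedding(hidden_states) would fail: nn.Embedding only accepts integer indices.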

if not return_dict:
return tuple(v for v in [logits, hidden_states] if v is not None)

return BaseModelOutput(hidden_states=hidden_states)
Contributor:

some attributes are missing such as the attention outputs etc.

Contributor Author:

We added some attributes.

Comment on lines 883 to 759
span: Optional[torch.Tensor] = None,
return_dict: Optional[bool] = False,
**kwargs,
Contributor:

keyword arguments that are needed for public models such as output_attentions seem to be missing here

Contributor Author:

We added output_attentions.

@pioliverse (Contributor Author):

I am a bit surprised that when I use make style, some other files are also reformatted, which causes check_code_quality to fail.

@younesbelkada (Contributor):

Hi @pioliverse
You need to rebase with the main branch, as the styling has been updated for most of the files in transformers, and update your black version as follows:

pip install --upgrade -e .["quality"]

Then make style or make fixup

@pioliverse (Contributor Author):

(Quoting @younesbelkada's instructions above.)

Thanks @younesbelkada, this has been solved.

@pioliverse (Contributor Author):

(Quoting @younesbelkada's review above.)

Thanks for your review @younesbelkada, we have modified some code.

  • We now pass use_cache through the forward function instead of setting it as a class attribute.
  • We simplified the class attribute assignments and replaced them with a CPMAntConfig.
  • We added past_key_values and output_attentions to the forward of CPMAntModel.
  • I kind of wonder whether all files that contain the name CPMAnt should be changed to Cpmant?

@gongbaitao (Contributor) commented Feb 21, 2023

Hi @younesbelkada, I am a member of OpenBMB, and I will help @pioliverse finish this PR.

All the issues mentioned above have been resolved. Please kindly have a look.

For the unit tests, I rebased pioliverse:cpmantmodel onto huggingface:main, but it still cannot pass the tests. It seems some other models cause the failure?

For instance, in tests_onnx I hit this error:
ERROR tests/models/altclip/test_modeling_altclip.py ============ 72 passed, 551 skipped, 29 warnings, 1 error in 28.26s ============
How can I avoid such an error?

@younesbelkada (Contributor):

Hi @gongbaitao
Thanks for jumping in! And sorry for the delay.
Rebasing with main should probably solve this issue. I will look into the PR asap; let me know once you think this is ready for review!

@gongbaitao (Contributor) commented Mar 5, 2023

@younesbelkada @sgugger Thanks for the valuable comments!
According to the new comments, I have dropped some redundant code and renamed the model classes in a camel-cased way :)

@sgugger (Collaborator) left a comment:

Thanks for iterating! There are still a couple of issues here and there. Also, the test added for the tokenizer will need to be decorated with a requires_jieba decorator (which you will need to define in testing_utils, similar to the other requires_xxx functions). Lastly, you also need to add an import error for jieba in import_utils.py so that requires_backend(["jieba"]) works without error.
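A sketch of the requested pattern, mirroring how other optional backends are handled in the library; the exact helper names and messages in import_utils.py / testing_utils.py are assumptions here:

import importlib.util
import unittest

_jieba_available = importlib.util.find_spec("jieba") is not None


def is_jieba_available():
    return _jieba_available


def require_jieba(test_case):
    # Decorator that skips a test when jieba is not installed.
    return unittest.skipUnless(is_jieba_available(), "test requires jieba")(test_case)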

README.md Outdated
@@ -309,6 +309,7 @@ Current number of checkpoints: ![](https://img.shields.io/endpoint?url=https://h
1. **[ConvBERT](https://huggingface.co/docs/transformers/model_doc/convbert)** (from YituTech) released with the paper [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496) by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
1. **[ConvNeXT](https://huggingface.co/docs/transformers/model_doc/convnext)** (from Facebook AI) released with the paper [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.
1. **[CPM](https://huggingface.co/docs/transformers/model_doc/cpm)** (from Tsinghua University) released with the paper [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
1. **[CpmAnt](https://huggingface.co/docs/transformers/main/model_doc/cpmant)** (from OpenBMB) released by the [OpenBMB](https://www.openbmb.org/).
Collaborator:

Suggested change
1. **[CpmAnt](https://huggingface.co/docs/transformers/main/model_doc/cpmant)** (from OpenBMB) released by the [OpenBMB](https://www.openbmb.org/).
1. **[CPMAnt](https://huggingface.co/docs/transformers/main/model_doc/cpmant)** (from OpenBMB) released by the [OpenBMB](https://www.openbmb.org/).

This should use the model name casing. It's only the model/config/tokenizer classes that should be CpmAntXxx

Contributor:

Solved.

@@ -402,6 +404,7 @@
("convbert", "ConvBERT"),
("convnext", "ConvNeXT"),
("cpm", "CPM"),
("cpmant", "CpmAnt"),
Collaborator:

Suggested change
("cpmant", "CpmAnt"),
("cpmant", "CPM-Ant"),

Contributor:

Solved.

logger = logging.get_logger(__name__)

CPMANT_PRETRAINED_CONFIG_ARCHIVE_MAP = {
"cpm-ant-10b": "https://huggingface.co/openbmb/cpm-ant-10b/blob/main/config.json"
Collaborator:

Suggested change
"cpm-ant-10b": "https://huggingface.co/openbmb/cpm-ant-10b/blob/main/config.json"
"openbmb/cpm-ant-10b": "https://huggingface.co/openbmb/cpm-ant-10b/blob/main/config.json"

Contributor:

Solved.

This is the configuration class to store the configuration of a [`CpmAntModel`]. It is used to instantiate an
CPMAnt model according to the specified arguments, defining the model architecture. Instantiating a configuration
with the defaults will yield a similar configuration to that of the CPMAnt
[cpm-ant-10b](https://huggingface.co/openbmb/cpm-ant-10b) architecture.
Collaborator:

Suggested change
[cpm-ant-10b](https://huggingface.co/openbmb/cpm-ant-10b) architecture.
[openbmb/cpm-ant-10b](https://huggingface.co/openbmb/cpm-ant-10b) architecture.

Contributor:

Solved.


PRETRAINED_VOCAB_FILES_MAP = {
"vocab_file": {
"cpm-ant-10b": "https://huggingface.co/openbmb/cpm-ant-10b/blob/main/vocab.txt",
Collaborator:

Suggested change
"cpm-ant-10b": "https://huggingface.co/openbmb/cpm-ant-10b/blob/main/vocab.txt",
"openbmb/cpm-ant-10b": "https://huggingface.co/openbmb/cpm-ant-10b/blob/main/vocab.txt",

Contributor:

Solved.

}

PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
"cpm-ant-10b": 1024,
Collaborator:

Suggested change
"cpm-ant-10b": 1024,
"openbmb/cpm-ant-10b": 1024,

Contributor:

Solved.

Comment on lines 101 to 115
bod_token (`str`, *optional*, defaults to `<d>`):
The beginning of document token.
eod_token (`str`, *optional*, defaults to `</d>`):
The end of document token.
bos_token (`str`, *optional*, defaults to `<s>`):
The beginning of sequence token.
eos_token (`str`, *optional*, defaults to `</s>`):
The end of sequence token.
pad_token (`str`, *optional*, defaults to `<pad>`):
The token used for padding.
unk_token (`str`, *optional*, defaults to `<unk>`):
The unknown token.
line_token (`str`, *optional*, defaults to `</n>`):
The line token.
space_token (`str`, *optional*, defaults to `</_>`):
Collaborator:

Default values are all missing double quotes here.

Contributor:

Solved.

from transformers.models.cpmant import CpmAntTokenizer


@unittest.skip("CPMAntTokenizer process vocab in list format, so we skip the common test.")
Collaborator:

The test below won't be executed because of this global skip here.

Contributor:

I skip the test because its load_vocab logic is different from the TokenizerTesterMixin. Refactoring it does not seem convenient or necessary to me, so I just skip it. Is that OK, or do I need to make some changes?

Collaborator:

No this is not ok, since the test you wrote below will never be executed. If all tests fail in the TokenizerTesterMixin, that means your tokenizer does not have an API consistent with the other tokenizers of Transformers, and thus we can't accept it. You should fix your tokenizer so that it passes most of the tests of the common tester.

Contributor:

OK I will fix it, thanks!

Contributor:

The test below won't be executed because of this global skip here.

Solved.

@github-actions:

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@gongbaitao (Contributor):

Hi @sgugger @younesbelkada, sorry for the delay!
In the last few weeks, I have fixed the problems mentioned above and refactored the CPMAnt tokenizer. Please kindly have a look again, thanks for your help!

@sgugger (Collaborator) left a comment:

Thanks for iterating! We're almost good to go, just a couple of comments left to address in the tests.

@require_torch
class CPMAntModelIntegrationTest(unittest.TestCase):
@slow
@unittest.skip("skip this test as the model is very large for our daily runner")
Collaborator:

Corresponding changes here do not seem to have been pushed.

@require_torch
class CPMAntForCausalLMlIntegrationTest(unittest.TestCase):
@slow
@unittest.skip("skip this test as the model is very large for our daily runner")
Collaborator:

Same here.


@custom_tokenizers
class CPMAntTokenizationTest(TokenizerTesterMixin, unittest.TestCase):
def test_pre_tokenization(self):
Collaborator:

Still needs to be addressed.

Comment on lines 50 to 53
def __init__(
self,
config: CpmAntConfig,
):
Collaborator:

Suggested change
def __init__(
self,
config: CpmAntConfig,
):
def __init__(self, config: CpmAntConfig):

Contributor:

It seems to be the result of make style. After I fit the code on one line, I cannot pass the code quality check in CI.

Collaborator:

That's because you are not copy-pasting this suggestion as is (there is a button to accept it directly in GitHub FYI) but are leaving a trailing comma.

Contributor:

Sorry, I misunderstood the problem as being about fitting the code on one line. The trailing comma has now been fixed :)

Comment on lines 428 to 431
def __init__(
self,
config: CpmAntConfig,
):
Collaborator:

Suggested change
def __init__(
self,
config: CpmAntConfig,
):
def __init__(self, config: CpmAntConfig):

@gongbaitao (Contributor):

Thanks for your quick review, @sgugger!
It seems this problem (#20906 (comment)) arises because the changed-files view didn't show all commits. Maybe checking this page https://github.com/huggingface/transformers/pull/20906/files will be helpful :)
As for #20906 (comment), it cannot pass the code quality check, so shall I keep it unchanged?

@sgugger (Collaborator) left a comment:

Replied on the comment for the styling issue. The exact same line is present multiple times in the modeling file so I think you did not take the suggestion as it is written.

As for the tests, I'm sorry I was unclear: I meant that we should use the @tooslow decorator instead of skipping.

@require_torch
class CPMAntModelIntegrationTest(unittest.TestCase):
@slow
@unittest.skip("skip this test as the model is very large for our daily runner")
Collaborator:

Sorry, I meant we should use the @tooslow decorator here instead of skipping. You can import it from testing_utils.
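So the test class quoted above would end up looking roughly like this (a sketch; the test method name and body are placeholders):

import unittest

from transformers.testing_utils import require_torch, tooslow


@require_torch
class CPMAntModelIntegrationTest(unittest.TestCase):
    @tooslow
    def test_inference(self):
        ...  # full-size checkpoint test, skipped on the daily runners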

@gongbaitao (Contributor):

@sgugger Thanks for your valuable comments!
Sorry, I forgot to drop the trailing comma for the styling issue. I have now fixed the trailing comma problem and added the tooslow decorator. Please kindly have a review :)

@sgugger (Collaborator) left a comment:

Perfect, thanks for bearing with me :-)
Congrats on getting this new model merged into Transformers and thanks again for all your work!

@sgugger merged commit 523ca4e into huggingface:main Apr 12, 2023
novice03 pushed a commit to novice03/transformers that referenced this pull request Jun 23, 2023
* resolve conflicts

* rebase and make style

* test

* test

* test

* rebase and make style

* rebase and make style

* tests

* tests

* rewrite some functions

* rebase and make style

* fix load_tf_weights_in_cpmant

* reformat some unrelated files

* upgrade quality

* fix some bugs & docstring

* add models and tests

* solve conflicts

* resolve conflicts

* resolve conflicts

* resolve conflicts

* resolve conflicts

* tests

* resolve conflicts

* resolve conflicts

* fix load_tf_weights_in_cpmant

* reformat some unrelated files

* upgrade quality

* fix some bugs & docstring

* save resolution

* make style

* delete redefinition code

* reformat function

* reformat

* resolve conflicts

* resolve conflicts

* resolve conflicts

* resolve conflicts

* resolve conflicts

* tests

* resolve conflicts

* resolve conflicts

* fix load_tf_weights_in_cpmant

* reformat some unrelated files

* upgrade quality

* resolve conflicts

* resolve conflicts

* resolve conflicts

* resolve conflicts

* resolve conflicts

* fix load_tf_weights_in_cpmant

* reformat some unrelated files

* upgrade quality

* resolve conflicts

* make style

* fix bugs and refactor

* modify docstrings and make style

* unify import format in __init__.py

* fix import-altclp bug

* fix copies to update index.md

* fix unused config parameters

* fix unused config parameters

* fix unused config parameters

* update README_ja.md

* dummy commit for unit test

* fix attention mask

* add CPMAntTokenizer&-Fast to auto-mapping

* drop redundant changes in README_ko

* fix  defaults in docstring

* fix use_cache and some docstring

* add missing args in tokenizer

* modify tester inheritance

* add is_jieba_available

* fix some bugs

* make style and fix-copies

* add doctests

* skip integration tests

* add is_jieba_available

* fix bugs in common tests

* adjust docstrings and make style

* add argument docstring

* adjust code to some specifications

* make style and fix-copies

* add fast tokenization test

* dummy commit for unit test

* dummy commit for unit test

* dummy commit for unit test

* normalize some comments and names

* Bert->CPMAnt

* camel names and drop redundant codes

* make style and fix-coies

* add CpmTokenizerFast _import_structure

* drop cpmanttokenizerfast in model_doc

* fix some problems

* fix CPMAnt tokenization for common test

* make style and fixup

* fix copies and fixup

* fix bugs in tokenization test

* dummy commit for connection failure in unittest

* fix copies

* drop trailing comma

* fix decorator in tests

* dummy commit for connection failure in unittest

---------

Co-authored-by: Gong Baitao <gongbaitao11@gmail.com>