add model resources for CPMAnt (new) #20906
Conversation
@younesbelkada Thanks for your patience in reviewing. I followed the OPT convention and made the following changes:
@younesbelkada Thanks again for your patience in reviewing.
Thanks so much for your patience! Looks pretty clean, thank you! We should be close to merging this once most of the comments are addressed. My comments being:

Docstring and comments:
- please harmonize the function docstrings to match the convention `transformers` models follow
- please make sure to clean up some comments
- also, it would be nice to add a small explanation in the code on why `generate` needs to be overridden

`dtype`:
- I don't think the argument `dtype` is needed. The dtype of the whole model is managed by the kwarg `torch_dtype`, so you can load your model using `model = xxxForCausalLM.from_pretrained(xxx, torch_dtype=torch.float16)` or `torch_dtype="auto"` (if the weights are pushed in fp16) and the model will be loaded in the desired precision (a short usage sketch follows below).

tests:
- I think that a test is failing, please double check that

general comments:
- For classes that are public (i.e. that are exported in `__init__.py`, basically `CPMAntModel` & `CPMAntForCausalLM`), it is preferable to adopt this logic: https://github.com/huggingface/transformers/blob/main/src/transformers/models/gpt_neox_japanese/modeling_gpt_neox_japanese.py#L693-L703 --> output a tuple if not `return_dict`, otherwise return a dataclass. Please check other modeling files as reference.
- You can wrap the `attention_mask` creation process inside class methods, e.g. `_prepare_attention_mask`

Thanks!
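To illustrate the `torch_dtype` point above, a minimal sketch (the checkpoint id comes from this PR; `AutoModelForCausalLM` stands in for the concrete class):

```python
import torch
from transformers import AutoModelForCausalLM

# Load directly in fp16 via the torch_dtype kwarg -- no custom `dtype` argument needed
model = AutoModelForCausalLM.from_pretrained("openbmb/cpm-ant-10b", torch_dtype=torch.float16)

# Or let from_pretrained infer the precision from the checkpoint weights
model = AutoModelForCausalLM.from_pretrained("openbmb/cpm-ant-10b", torch_dtype="auto")
```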
# assert (
#     pointer.shape == array.shape
# ), f"Pointer shape {pointer.shape} and array shape {array.shape} mismatched"
to clean up!
# assert x.size(-1) == self.dim_norm
if x.size(-1) != self.dim_norm:
to clean up
dim_model (int): Main dimension of modules in transformer blocks.
num_heads (int): Number of attention heads in the Transformer encoder.
dim_head (int): Dimension of attention heads for each attention layer in the Transformer encoder.
dtype (optional): Defaults to torch.float.
eps (float, optional): The epsilon used by the layer normalization layers.
dropout_p (float, optional): Defaults to 0.
Please respect the convention as above ;)
dim_model (`int`):
Main dimension of ...
...
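Filled in with the arguments quoted above, the convention would look roughly like this (an illustrative sketch, not the exact text merged in the PR):

```
dim_model (`int`):
    Main dimension of modules in transformer blocks.
num_heads (`int`):
    Number of attention heads in the Transformer encoder.
dim_head (`int`):
    Dimension of attention heads for each attention layer in the Transformer encoder.
eps (`float`, *optional*):
    The epsilon used by the layer normalization layers.
```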
dim_model (int): Main dimension of modules in transformer blocks.
dim_ff (int): Dimension of the "intermediate" (i.e., feed-forward) layer in the Transformer encoder.
dtype (optional): Defaults to torch.float.
eps (float, optional): The epsilon used by the layer normalization layers.
dropout_p (float, optional): Defaults to 0.
Same comment as above!
num_layers (int): Number of layers.
dim_model (int): Main dimension of modules in transformer blocks.
dim_ff (int): Dimension of the "intermediate" (i.e., feed-forward) layer in the Transformer encoder.
num_heads (int): Number of attention heads in the Transformer encoder.
dim_head (int): Dimension of attention heads for each attention layer in the Transformer encoder.
dtype (optional): Defaults to torch.float.
eps (float, optional): The epsilon used by the layer normalization layers.
dropout_p (float, optional): Defaults to 0.
same as above
attention_mask,
position_bias,
past_key_value=past_key_values[i] if past_key_values else None,
use_cache=use_cache,
I see that `use_cache` is also passed here; I am slightly confused why you don't pass it above too.
# assert key_pos.size(0) == query_pos.size(0)
# assert keylen == key_segment.size(1) and querylen == query_segment.size(1)
to clean up
with torch.no_grad():
    device = input.device
    directional_mask_2d = torch.arange(seqlen, device=device) <= torch.arange(seqlen, device=device).view(
        -1, 1
    )
    attention_mask = context[:, None, :] | (
        context[:, :, None].logical_not() & directional_mask_2d.view(1, seqlen, seqlen)
    )
    attention_mask = attention_mask & (span[:, None, :] == span[:, :, None])
    mask_1d = torch.arange(seqlen, device=device)[None, :].repeat(batch, 1) < length[:, None]
    attention_mask = mask_1d.view(batch, seqlen, 1) & mask_1d.view(batch, 1, seqlen) & attention_mask
nit: You can probably wrap that in a class method _prepare_attention_mask
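A minimal sketch of that refactor, assembled from the quoted code above (the placement as a standalone helper and the exact signature are assumptions, not the final PR code):

```python
import torch


def _prepare_attention_mask(input, span, context, length):
    # shapes follow the quoted code: input (batch, seqlen), length (batch,)
    batch, seqlen = input.size(0), input.size(1)
    device = input.device
    # causal (directional) mask over positions
    directional_mask_2d = torch.arange(seqlen, device=device) <= torch.arange(
        seqlen, device=device
    ).view(-1, 1)
    attention_mask = context[:, None, :] | (
        context[:, :, None].logical_not() & directional_mask_2d.view(1, seqlen, seqlen)
    )
    # restrict attention to tokens within the same span
    attention_mask = attention_mask & (span[:, None, :] == span[:, :, None])
    # mask out padding positions beyond each sequence's true length
    mask_1d = torch.arange(seqlen, device=device)[None, :].repeat(batch, 1) < length[:, None]
    return mask_1d.view(batch, seqlen, 1) & mask_1d.view(batch, 1, seqlen) & attention_mask
```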
with torch.no_grad():
    device = input.device
    directional_mask_2d = torch.arange(seqlen, device=device) <= torch.arange(seqlen, device=device).view(
        -1, 1
    )
    attention_mask = context[:, None, :] | (
        context[:, :, None].logical_not() & directional_mask_2d.view(1, seqlen, seqlen)
    )
    attention_mask = attention_mask & (span[:, None, :] == span[:, :, None])
    # mask for left padding
    mask_1d = (
        torch.tensor(list(range(seqlen))[::-1], device=device)[None, :].repeat(batch, 1) < length[:, None]
    )
    attention_mask = mask_1d.view(batch, seqlen, 1) & mask_1d.view(batch, 1, seqlen) & attention_mask
same, you can wrap that in a method
hidden_states, attention_mask, position_bias, True, past_key_values
)
logits = self.lm_head(hidden_states)
return logits, hidden_states, present_key_values
Can you return a dataclass object instead? For example:
transformers/src/transformers/models/gpt_neox_japanese/modeling_gpt_neox_japanese.py
Line 697 in 8fb4d0e
return CausalLMOutputWithPast(
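A hedged sketch of what the quoted return could become (the field mapping is illustrative; `CausalLMOutputWithPast` is the dataclass referenced above):

```python
from transformers.modeling_outputs import CausalLMOutputWithPast

if not return_dict:
    return tuple(v for v in [logits, hidden_states, present_key_values] if v is not None)
return CausalLMOutputWithPast(
    logits=logits,
    hidden_states=hidden_states,
    past_key_values=present_key_values,
)
```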
The documentation is not available anymore as the PR was closed or merged.
Hi @younesbelkada, we have made some changes as follows:
Thanks a lot for addressing most of the comments of the previous review! And thank you for your huge work on refactoring the modeling script.
I left some comments, mostly nits that can be solved easily. Note that for arguments such as `use_cache` etc., we prefer to pass them through the forward pass rather than setting them as a class attribute.
Also, please consider passing a `CPMAntConfig` for the classes that have several attributes, such as `CPMAntEncoder`.
Make sure also to correctly pass the required keyword arguments such as `past_key_values`, `output_attentions` etc., which are crucial for the caching mechanism. You can check how this is done in OPT for example (a signature sketch follows below).
Finally, the naming convention in `transformers` has changed a bit: we prefer to name models with a single capital letter (i.e. here CPMAnt -> Cpmant).
Again thanks for your efforts on this! Once the comments are solved, we should be very close to merging this!
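A sketch of passing these arguments through the forward signature (names follow the OPT convention mentioned above; the class name is hypothetical):

```python
from typing import Optional, Tuple

import torch
from torch import nn


class CpmAntSketchModel(nn.Module):  # hypothetical name, for illustration only
    def forward(
        self,
        input_ids: Optional[torch.LongTensor] = None,
        attention_mask: Optional[torch.Tensor] = None,
        past_key_values: Optional[Tuple[Tuple[torch.Tensor]]] = None,
        use_cache: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ):
        # fall back to config defaults only when the caller does not override them
        use_cache = use_cache if use_cache is not None else self.config.use_cache
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict
        ...
```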
]


def load_tf_weights_in_cpmant(model, config, tf_checkpoint_path):
This seems to be a function that is adapted from:
def load_tf_weights_in_bert(model, config, tf_checkpoint_path):
Could you add an `# Adapted from` statement on the top of the function?
This can be removed entirely, there are no TF weights to convert here, no?
super().__init__()

self.eps = eps
self.dim_norm = dim_norm
self.weight = torch.nn.parameter.Parameter(torch.full((dim_norm,), init_var))
Suggested change:
super().__init__()
self.eps = eps
self.dim_norm = dim_norm
self.weight = torch.nn.parameter.Parameter(torch.full((dim_norm,), init_var))
"""RMS LayerNorm""" | ||
|
Could you add more description here?
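For instance, the docstring could spell out the computation along these lines (a sketch of what "more description" might look like; the class name is illustrative):

```python
from torch import nn


class CpmAntLayerNorm(nn.Module):  # hypothetical name, for illustration
    """
    RMS LayerNorm: scales hidden states by their inverse root mean square,
    i.e. output = weight * x / sqrt(mean(x ** 2, dim=-1) + eps).
    Unlike standard LayerNorm, no mean is subtracted and no bias is added.
    """
```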
We added some comments.
Indices of input sequence tokens of shape `(batch, len_q, dim_model)`. It will be embedded by model's
internal embedding lookup matrix.
This description seems to be wrong
Revised.
Avoid invalid areas to participate in the calculation of self-attention.
position_bias (`torch.Tensor` of shape `(batch, len_seq, len_seq)`):
    Provide positional information to self-attention block.
past_kv (`Tuple(torch.FloatTensor)`, *optional*): Cached past key and value projection states.
Suggested change:
past_kv (`Tuple(torch.FloatTensor)`, *optional*):
    Cached past key and value projection states.
return hidden_states, current_key_values


class CPMAntIntermediate(nn.Module):
This seems to be copied from `BertIntermediate`:
class BertIntermediate(nn.Module):
Could you add a `# Copied from` statement?
Here I meant:
# Copied from transformers.models.bert.modeling_bert.BertIntermediate with Bert->CPMAnt
Check for example here:
transformers/src/transformers/models/switch_transformers/modeling_switch_transformers.py
Line 233 in b29e2dc
# Copied from transformers.models.t5.modeling_t5.T5LayerNorm with T5->SwitchTransformers
return relative_buckets


class CPMAntOutput(nn.Module):
This seems to be copied from `BertOutput`:
class BertOutput(nn.Module):
Yes, we added a statement.
Same as above, what I meant is:
# Copied from transformers.models.bert.modeling_bert.BertOutput with Bert->CPMAnt
Check
transformers/src/transformers/models/switch_transformers/modeling_switch_transformers.py
Line 233 in b29e2dc
# Copied from transformers.models.t5.modeling_t5.T5LayerNorm with T5->SwitchTransformers
position_bias = self.position_bias(position, position, segment, segment)

hidden_states = self.encoder(hidden_states, attention_mask, position_bias)
logits = F.linear(hidden_states, self.input_embedding.weight)
Can't you call `self.input_embedding` directly?
Maybe not; `self.input_embedding` works for `input_ids`, but not for `hidden_states`.
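In other words, the two operations go in opposite directions; a small runnable sketch of the distinction (toy sizes, assuming tied input/output embeddings):

```python
import torch
import torch.nn.functional as F
from torch import nn

embedding = nn.Embedding(100, 16)         # vocab size 100, hidden dim 16
input_ids = torch.randint(0, 100, (2, 5))

# nn.Embedding is a lookup: token ids -> hidden vectors
hidden_states = embedding(input_ids)      # shape (2, 5, 16)

# The LM head needs the reverse mapping, hidden vectors -> vocab logits,
# a matrix product with the same (tied) weight matrix -- hence F.linear
logits = F.linear(hidden_states, embedding.weight)  # shape (2, 5, 100)
```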
if not return_dict:
    return tuple(v for v in [logits, hidden_states] if v is not None)

return BaseModelOutput(hidden_states=hidden_states)
some attributes are missing such as the attention outputs etc.
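As a hedged sketch, a more complete return could look like this (attribute names follow the `transformers` output dataclasses; the `present_key_values` / `all_*` variables are assumed to be collected in the encoder loop):

```python
from transformers.modeling_outputs import BaseModelOutputWithPast

return BaseModelOutputWithPast(
    last_hidden_state=hidden_states,
    past_key_values=present_key_values,  # assumption: gathered per layer
    hidden_states=all_hidden_states,     # assumption: tuple of per-layer states
    attentions=all_attentions,           # assumption: tuple of attention maps
)
```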
We added some attributes.
span: Optional[torch.Tensor] = None,
return_dict: Optional[bool] = False,
**kwargs,
Keyword arguments that are needed for public models, such as `output_attentions`, seem to be missing here.
We added the `output_attentions`.
I am a bit surprised that when I use …
Hi @pioliverse
Then …
Thanks @younesbelkada, this has been solved.
Thanks for your review @younesbelkada, we have modified some code.
Force-pushed from ca65283 to 6cad05d.
Hi @younesbelkada, I am a member of OpenBMB, and I will help @pioliverse finish this PR. All the issues mentioned above have been resolved. Please kindly have a look. For the unit tests, after I rebased, I met some errors; for instance, in tests_onnx I got the error: …
Hi @gongbaitao |
Force-pushed from b20d325 to 8f55e39.
@younesbelkada @sgugger Thanks for the valuable comments!
Thanks for iterating! There are still a couple of issues here and there. Also, the test added for the tokenizer will need to be decorated with a `requires_jieba` decorator (that you will need to define in `testing_utils`, similar to the other `requires_xxx` functions; see the sketch below). Lastly, you also need to add an import error for jieba in `import_utils.py` so that `requires_backend(["jieba"])` works without error.
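A minimal sketch of such a decorator, mirroring the existing `requires_xxx` helpers (`is_jieba_available` is the availability check this PR adds according to its commit log):

```python
import unittest

from transformers.utils import is_jieba_available


def requires_jieba(test_case):
    """Decorator marking a test that requires jieba; skipped when jieba is not installed."""
    return unittest.skipUnless(is_jieba_available(), "test requires jieba")(test_case)
```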
README.md (outdated)
@@ -309,6 +309,7 @@ Current number of checkpoints: ![](https://img.shields.io/endpoint?url=https://h
1. **[ConvBERT](https://huggingface.co/docs/transformers/model_doc/convbert)** (from YituTech) released with the paper [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496) by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
1. **[ConvNeXT](https://huggingface.co/docs/transformers/model_doc/convnext)** (from Facebook AI) released with the paper [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.
1. **[CPM](https://huggingface.co/docs/transformers/model_doc/cpm)** (from Tsinghua University) released with the paper [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
1. **[CpmAnt](https://huggingface.co/docs/transformers/main/model_doc/cpmant)** (from OpenBMB) released by the [OpenBMB](https://www.openbmb.org/).
Suggested change:
1. **[CPMAnt](https://huggingface.co/docs/transformers/main/model_doc/cpmant)** (from OpenBMB) released by the [OpenBMB](https://www.openbmb.org/).
This should use the model name casing. It's only the model/config/tokenizer classes that should be CpmAntXxx
Solved.
@@ -402,6 +404,7 @@
("convbert", "ConvBERT"),
("convnext", "ConvNeXT"),
("cpm", "CPM"),
("cpmant", "CpmAnt"),
("cpmant", "CpmAnt"), | |
("cpmant", "CPM-Ant"), |
Solved.
logger = logging.get_logger(__name__)

CPMANT_PRETRAINED_CONFIG_ARCHIVE_MAP = {
    "cpm-ant-10b": "https://huggingface.co/openbmb/cpm-ant-10b/blob/main/config.json"
"cpm-ant-10b": "https://huggingface.co/openbmb/cpm-ant-10b/blob/main/config.json" | |
"openbmb/cpm-ant-10b": "https://huggingface.co/openbmb/cpm-ant-10b/blob/main/config.json" |
Solved.
This is the configuration class to store the configuration of a [`CpmAntModel`]. It is used to instantiate an
CPMAnt model according to the specified arguments, defining the model architecture. Instantiating a configuration
with the defaults will yield a similar configuration to that of the CPMAnt
[cpm-ant-10b](https://huggingface.co/openbmb/cpm-ant-10b) architecture.
Suggested change:
[openbmb/cpm-ant-10b](https://huggingface.co/openbmb/cpm-ant-10b) architecture.
Solved.
PRETRAINED_VOCAB_FILES_MAP = {
    "vocab_file": {
        "cpm-ant-10b": "https://huggingface.co/openbmb/cpm-ant-10b/blob/main/vocab.txt",
"cpm-ant-10b": "https://huggingface.co/openbmb/cpm-ant-10b/blob/main/vocab.txt", | |
"openbmb/cpm-ant-10b": "https://huggingface.co/openbmb/cpm-ant-10b/blob/main/vocab.txt", |
Solved.
}

PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
    "cpm-ant-10b": 1024,
"cpm-ant-10b": 1024, | |
"openbmb/cpm-ant-10b": 1024, |
Solved.
bod_token (`str`, *optional*, defaults to `<d>`):
    The beginning of document token.
eod_token (`str`, *optional*, defaults to `</d>`):
    The end of document token.
bos_token (`str`, *optional*, defaults to `<s>`):
    The beginning of sequence token.
eos_token (`str`, *optional*, defaults to `</s>`):
    The end of sequence token.
pad_token (`str`, *optional*, defaults to `<pad>`):
    The token used for padding.
unk_token (`str`, *optional*, defaults to `<unk>`):
    The unknown token.
line_token (`str`, *optional*, defaults to `</n>`):
    The line token.
space_token (`str`, *optional*, defaults to `</_>`):
Default values are all missing double quotes here.
Solved.
from transformers.models.cpmant import CpmAntTokenizer


@unittest.skip("CPMAntTokenizer process vocab in list format, so we skip the common test.")
The test below won't be executed because of this global skip here.
I skip the test because its `load_vocab` logic is different from the `TokenizerTesterMixin`. A refactor is not that convenient, and I don't think it's necessary, so I just skip it. Is that OK, or do I need to make some changes?
No this is not ok, since the test you wrote below will never be executed. If all tests fail in the TokenizerTesterMixin, that means your tokenizer does not have an API consistent with the other tokenizers of Transformers, and thus we can't accept it. You should fix your tokenizer so that it passes most of the tests of the common tester.
OK I will fix it, thanks!
Solved.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hi @sgugger @younesbelkada, sorry for the delay!
Thanks for iterating! We're almost good to go, just a couple of comments left to address in the tests.
@require_torch
class CPMAntModelIntegrationTest(unittest.TestCase):
    @slow
    @unittest.skip("skip this test as the model is very large for our daily runner")
Corresponding changes here do not seem to have been pushed.
@require_torch
class CPMAntForCausalLMlIntegrationTest(unittest.TestCase):
    @slow
    @unittest.skip("skip this test as the model is very large for our daily runner")
Same here.
@custom_tokenizers
class CPMAntTokenizationTest(TokenizerTesterMixin, unittest.TestCase):
    def test_pre_tokenization(self):
Still needs to be addressed.
def __init__(
    self,
    config: CpmAntConfig,
):
Suggested change:
def __init__(self, config: CpmAntConfig):
It seems to be the result of `make style`. After I fit the code in one line, I cannot pass the CI check in code quality.
That's because you are not copy-pasting this suggestion as is (there is a button to accept it directly in GitHub FYI) but are leaving a trailing comma.
Sorry, I misunderstood the problem as fitting the code in one line. Now the trailing comma has been fixed :)
def __init__(
    self,
    config: CpmAntConfig,
):
Suggested change:
def __init__(self, config: CpmAntConfig):
Thanks for your quick review! @sgugger
Replied on the comment for the styling issue. The exact same line is present multiple times in the modeling file, so I think you did not take the suggestion as it is written.
As for the tests, I'm sorry I was unclear: I meant that we should use the `@tooslow` decorator instead of `skip`.
@require_torch
class CPMAntModelIntegrationTest(unittest.TestCase):
    @slow
    @unittest.skip("skip this test as the model is very large for our daily runner")
Sorry, I meant we should use the `@tooslow` decorator here instead of skipping. You can import it from `testing_utils`.
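That is, roughly (a sketch; `tooslow` marks tests too heavy even for the scheduled slow CI runs):

```python
import unittest

from transformers.testing_utils import tooslow


class CPMAntModelIntegrationTest(unittest.TestCase):
    @tooslow
    def test_inference(self):  # hypothetical test name
        ...
```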
@sgugger Thanks for your meaningful comments!
Perfect, thanks for bearing with me :-)
Congrats on getting this new model merged into Transformers and thanks again for all your work!
* resolve conflicts * rebase and make style * test * test * test * rebase and make style * rebase and make style * tests * tests * rewrite some functions * rebase and make style * fix load_tf_weights_in_cpmant * reformat some unrelated files * upgrade quality * fix some bugs & docstring * add models and tests * solve conflicts * resolve conflicts * resolve conflicts * resolve conflicts * resolve conflicts * tests * resolve conflicts * resolve conflicts * fix load_tf_weights_in_cpmant * reformat some unrelated files * upgrade quality * fix some bugs & docstring * save resolution * make style * delete redefinition code * reformat function * reformat * resolve conflicts * resolve conflicts * resolve conflicts * resolve conflicts * resolve conflicts * tests * resolve conflicts * resolve conflicts * fix load_tf_weights_in_cpmant * reformat some unrelated files * upgrade quality * resolve conflicts * resolve conflicts * resolve conflicts * resolve conflicts * resolve conflicts * fix load_tf_weights_in_cpmant * reformat some unrelated files * upgrade quality * resolve conflicts * make style * fix bugs and refactor * modify docstrings and make style * unify import format in __init__.py * fix import-altclp bug * fix copies to update index.md * fix unused config parameters * fix unused config parameters * fix unused config parameters * update README_ja.md * dummy commit for unit test * fix attention mask * add CPMAntTokenizer&-Fast to auto-mapping * drop redundant changes in README_ko * fix defaults in docstring * fix use_cache and some docstring * add missing args in tokenizer * modify tester inheritance * add is_jieba_available * fix some bugs * make style and fix-copies * add doctests * skip integration tests * add is_jieba_available * fix bugs in common tests * adjust docstrings and make style * add argument docstring * adjust code to some specifications * make style and fix-copies * add fast tokenization test * dummy commit for unit test * dummy commit for unit test * dummy commit for unit test * normalize some comments and names * Bert->CPMAnt * camel names and drop redundant codes * make style and fix-coies * add CpmTokenizerFast _import_structure * drop cpmanttokenizerfast in model_doc * fix some problems * fix CPMAnt tokenization for common test * make style and fixup * fix copies and fixup * fix bugs in tokenization test * dummy commit for connection failure in unittest * fix copies * drop trailing comma * fix decorator in tests * dummy commit for connection failure in unittest --------- Co-authored-by: Gong Baitao <gongbaitao11@gmail.com>
What does this PR do?
Since the previous submission (#20711) had problems here and there, we have now resubmitted a new one.
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.