Fixing some warnings in DeBerta #8176

Merged 2 commits on Oct 30, 2020
4 changes: 2 additions & 2 deletions src/transformers/modeling_deberta.py
@@ -15,7 +15,7 @@
""" PyTorch DeBERTa model. """

import math
-from collections import Sequence
+from collections.abc import Sequence

import torch
from packaging import version
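
For context on this hunk: importing ABCs such as `Sequence` directly from `collections` was deprecated in Python 3.3, emits a DeprecationWarning on Python 3.7-3.9, and the compatibility aliases were removed in Python 3.10, so `collections.abc` is the supported import path. A minimal sketch of the pattern, assuming only the standard library (the `as_tuple` helper is illustrative, not code from this PR):

```python
# Before this change: warns on Python 3.7-3.9 and fails on 3.10+.
#   from collections import Sequence
# After this change: works on all supported Python 3 versions.
from collections.abc import Sequence


def as_tuple(x):
    """Return x wrapped in a tuple, treating any non-string sequence as iterable."""
    if isinstance(x, Sequence) and not isinstance(x, str):
        return tuple(x)
    return (x,)


print(as_tuple([1, 2, 3]))  # (1, 2, 3)
print(as_tuple("abc"))      # ('abc',)
```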
@@ -480,7 +480,7 @@ class DisentangledSelfAttention(torch.nn.Module):
Parameters:
config (:obj:`str`):
A model config class instance with the configuration to build a new model. The schema is similar to
-`BertConfig`, \ for more details, please refer :class:`~transformers.DebertaConfig`
+`BertConfig`, for more details, please refer :class:`~transformers.DebertaConfig`

"""

10 changes: 5 additions & 5 deletions src/transformers/tokenization_deberta.py
@@ -297,15 +297,15 @@ class GPT2Tokenizer(object):
Args:
vocab_file (:obj:`str`, optional):
The local path of vocabulary package or the release name of vocabulary in `DeBERTa GitHub releases
-<https://github.com/microsoft/DeBERTa/releases>`_, \ e.g. "bpe_encoder", default: `None`.
+<https://github.com/microsoft/DeBERTa/releases>`_, e.g. "bpe_encoder", default: `None`.

If it's `None`, then it will download the vocabulary in the latest release from GitHub. The vocabulary file
-is a \ state dictionary with three items, "dict_map", "vocab", "encoder" which correspond to three files
-used in `RoBERTa`, i.e. `dict.txt`, `vocab.txt` and `encoder.json`. \ The difference between our wrapped
-GPT2 tokenizer and RoBERTa wrapped tokenizer are,
+is a state dictionary with three items, "dict_map", "vocab", "encoder" which correspond to three files used
+in `RoBERTa`, i.e. `dict.txt`, `vocab.txt` and `encoder.json`. The difference between our wrapped GPT2
+tokenizer and RoBERTa wrapped tokenizer are,

- Special tokens, unlike `RoBERTa` which use `<s>`, `</s>` as the `start` token and `end` token of a
-sentence. We use `[CLS]` and `[SEP]` as the `start` and `end`\ token of input sentence which is the same
+sentence. We use `[CLS]` and `[SEP]` as the `start` and `end` token of input sentence which is the same
as `BERT`.

- We remapped the token ids in our dictionary with regarding to the new special tokens, `[PAD]` => 0,
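
To make the special-token mapping described above concrete, here is a hedged sketch using the public `DebertaTokenizer` wrapper around this class; the checkpoint name is illustrative, and aside from `[PAD]` => 0 noted in the docstring, no exact ids are asserted by this diff.

```python
from transformers import DebertaTokenizer

# Load the wrapped GPT-2 BPE tokenizer used by DeBERTa (checkpoint name is illustrative).
tokenizer = DebertaTokenizer.from_pretrained("microsoft/deberta-base")

# BERT-style special tokens instead of RoBERTa's <s> / </s>.
print(tokenizer.cls_token, tokenizer.sep_token, tokenizer.pad_token)  # [CLS] [SEP] [PAD]
print(tokenizer.pad_token_id)  # 0, per the remapping described in the docstring

# Encoded inputs start with the [CLS] id and end with the [SEP] id.
print(tokenizer("Hello world")["input_ids"])
```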