BERT tokenizer - set special tokens #599

Closed
adigoryl opened this issue May 10, 2019 · 3 comments

@adigoryl

Hi,

I was wondering whether the team could extend BERT so that fine-tuning with newly defined special tokens is possible, just like GPT allows.
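
For reference, this is the GPT mechanism I'm referring to, roughly as in the repository's ROCStories fine-tuning example (the token strings below are just placeholders):

```python
from pytorch_pretrained_bert import OpenAIGPTTokenizer, OpenAIGPTDoubleHeadsModel

special_tokens = ["_start_", "_delimiter_", "_classify_"]

# The tokenizer appends the new tokens to the end of its vocabulary...
tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt", special_tokens=special_tokens)
special_token_ids = [tokenizer.convert_tokens_to_ids(t) for t in special_tokens]

# ...and the model grows its input embedding matrix by num_special_tokens rows,
# whose fresh embeddings are then learned during fine-tuning.
model = OpenAIGPTDoubleHeadsModel.from_pretrained("openai-gpt",
                                                  num_special_tokens=len(special_tokens))
```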

@thomwolf Could you share your thoughts on that?

Regards,
Adrian.

@thomwolf
Member

Hi Adrian, BERT already has a few unused tokens that can be used similarly to the special_tokens of GPT/GPT-2.
For more details see google-research/bert#9 (comment) and issue #405 for instance.
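
A minimal sketch of that approach with the pytorch_pretrained_bert API; which [unusedX] entries you pick, and what they mark, is up to your task:

```python
import torch
from pytorch_pretrained_bert import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

# "[unused0]"/"[unused1]" already exist in the vocab, so no resizing is needed; their
# embeddings are essentially untrained and only become meaningful after fine-tuning.
# Build the token list by hand so the basic tokenizer never splits the markers.
tokens = ["[CLS]", "[unused0]"] + tokenizer.tokenize("some example text") + ["[unused1]", "[SEP]"]
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

with torch.no_grad():
    encoded_layers, pooled_output = model(input_ids)
```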

@AlanHassen

AlanHassen commented May 22, 2019

In case we use one of the unused special tokens from the vocabulary, is it enough to fine-tune on a classification task, or does the embedding for that token need to be trained from scratch? Has anyone already done this?

Two different but somewhat related questions I had when looking into the implementation:

  1. The BERT paper mentions a (learned) positional embedding. How is this implemented here? examples/extract_features/convert_examples_to_features() defines tokens (the representation), input_type_ids (distinguishing the first from the second sequence), and an input_mask (distinguishing padding from real tokens), but no positional embedding. Is this done internally? (See my sketch of BertEmbeddings after this list.)

  2. Can I use a special token as input_type_ids for BERT? In the classification example, only values of [0, 1] are possible, and I'm wondering what would happen if I chose a special token instead. Is this possible with the pretrained embedding, or do I need to retrain the whole embedding as a consequence?
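
My current reading of the BertEmbeddings module, as a simplified sketch (layer norm and dropout omitted, sizes are those of bert-base); please correct me if this is wrong:

```python
import torch
import torch.nn as nn

class BertEmbeddingsSketch(nn.Module):
    def __init__(self, vocab_size=30522, hidden_size=768, max_position=512, type_vocab_size=2):
        super().__init__()
        self.word_embeddings = nn.Embedding(vocab_size, hidden_size)
        self.position_embeddings = nn.Embedding(max_position, hidden_size)       # learned, not sinusoidal
        self.token_type_embeddings = nn.Embedding(type_vocab_size, hidden_size)  # only rows 0 and 1

    def forward(self, input_ids, token_type_ids=None):
        # Question 1: position ids seem to be generated internally from the sequence
        # length, so convert_examples_to_features never builds them itself.
        seq_length = input_ids.size(1)
        position_ids = torch.arange(seq_length, dtype=torch.long, device=input_ids.device)
        position_ids = position_ids.unsqueeze(0).expand_as(input_ids)
        if token_type_ids is None:
            token_type_ids = torch.zeros_like(input_ids)
        # Question 2: token_type_ids index a 2-row table in the pretrained checkpoint,
        # so values other than 0/1 would index out of range unless that table were
        # extended and (re)trained.
        return (self.word_embeddings(input_ids)
                + self.position_embeddings(position_ids)
                + self.token_type_embeddings(token_type_ids))
```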

@stale

stale bot commented Jul 21, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Jul 21, 2019
@stale stale bot closed this as completed Jul 28, 2019