
Deberta tokenizer.detokenize() errors out with mask token #829

@mattdangerw

When working on the DeBERTa masked language model, we had to add special handling for the mask token in the tokenizer.

We left one bug outstanding on the main PR: `detokenize()` errors out when its input contains the mask token. See:
#732 (comment)

Here's a Colab that reproduces the error:
https://colab.research.google.com/gist/mattdangerw/5164a7cad80e9f5fcbb9a495264f80e1/deberta-detokenize-error.ipynb

`detokenize()` should either strip the mask token or render it properly, so that the call does not error out.
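
To make the two options concrete, here is a minimal sketch in plain Python. It does not use the real tokenizer: `VOCAB`, `MASK_ID`, and `base_detokenize` are hypothetical stand-ins, and it assumes the error arises because the mask id is unknown to the underlying detokenizer. The actual fix in the library may look quite different.

```python
# Hypothetical stand-ins: these ids and this vocabulary are made up for illustration.
MASK_TOKEN = "[MASK]"
MASK_ID = 5  # assumed to lie outside the underlying vocabulary

VOCAB = {0: "[PAD]", 1: "the", 2: "quick", 3: "brown", 4: "fox"}


def base_detokenize(ids):
    """Toy detokenizer that fails on ids it does not know, mimicking the bug."""
    return " ".join(VOCAB[i] for i in ids)  # raises KeyError for MASK_ID


def detokenize_strip_mask(ids):
    """Option 1: drop mask ids before handing the rest to the detokenizer."""
    return base_detokenize([i for i in ids if i != MASK_ID])


def detokenize_render_mask(ids):
    """Option 2: detokenize the runs between masks and render each mask as text."""
    pieces, run = [], []
    for i in ids:
        if i == MASK_ID:
            if run:
                pieces.append(base_detokenize(run))
                run = []
            pieces.append(MASK_TOKEN)
        else:
            run.append(i)
    if run:
        pieces.append(base_detokenize(run))
    return " ".join(pieces)


if __name__ == "__main__":
    print(detokenize_strip_mask([1, 2, MASK_ID, 4]))
    print(detokenize_render_mask([1, MASK_ID, 3, 4]))
```

Stripping is simpler but loses the position of the mask; rendering keeps the output aligned with the masked input, which is probably more useful when inspecting masked language model examples.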

Labels: stat:contributions welcome, type:Bug
