bert-base-multilingual-uncased vocabulary not consecutive #990

## 🐛 Bug

When saving the `bert-base-multilingual-uncased` vocabulary, I receive the warning "Saving vocabulary to ./vocab.txt: vocabulary indices are not consecutive. Please check that the vocabulary is not corrupted".

I ran the same commands on two different machines and got the same warning both times:

    from pytorch_transformers import BertTokenizer
    tokenizer = BertTokenizer.from_pretrained('bert-base-multilingual-uncased', do_lower_case=True)
    tokenizer.save_vocabulary('./')

I ran it on the following setup:

* OS:
* Python version: 3.5
* PyTorch version: 1.0.1.post2
* PyTorch Transformers version (or branch): 1.0
* Using GPU? Yes
* Distributed or parallel setup? No
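
To narrow this down, here is a minimal sketch of how one might check which indices are missing, assuming the warning fires when the vocabulary indices, sorted in ascending order, skip a value. `find_index_gaps` and the toy vocabulary are hypothetical names used only for illustration; they are not part of `pytorch_transformers`. With the real tokenizer one would presumably pass `tokenizer.vocab`, assuming it is a token-to-index mapping.

```python
from collections import OrderedDict


def find_index_gaps(vocab):
    """Return the indices missing from range(max_index + 1).

    `vocab` is any mapping from token string to integer index.
    """
    if not vocab:
        return []
    used = set(vocab.values())
    return [i for i in range(max(used) + 1) if i not in used]


# Toy vocabulary (hypothetical data): index 2 is deliberately missing.
toy_vocab = OrderedDict([("[PAD]", 0), ("[UNK]", 1), ("hello", 3), ("world", 4)])

missing = find_index_gaps(toy_vocab)
print(missing)  # [2]
```

An empty result would mean the indices are consecutive; a non-empty one shows which line numbers of the vocabulary file to inspect.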
