bert-character-mlm

A character-level tokenizer built on BertTokenizer (uncased).


Usage

Character tokenizer & character MLM

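from transformers import AutoTokenizer, BertForMaskedLM, BertConfig

# Load the character-level tokenizer by name (or local path)
MODEL_NAME = 'char-bert-base-uncased'
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=True)

# Build a randomly initialized BERT MLM whose vocabulary size matches the tokenizer
config = BertConfig(vocab_size=len(tokenizer))
model = BertForMaskedLM(config)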
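
To make the idea of character-level masked language modeling concrete, here is a minimal, library-free sketch of how an input might be prepared. It is an illustration only, not this repository's implementation: the helper names `char_tokenize` and `mask_tokens` are hypothetical, dropping whitespace is an assumption about how characters are split, and the masking is simplified to always substitute `[MASK]` (BERT's original recipe also keeps or randomizes some selected tokens).

```python
import random

# Special tokens follow the usual BERT naming convention.
CLS, SEP, MASK = "[CLS]", "[SEP]", "[MASK]"


def char_tokenize(text):
    """Lowercase the text and split it into single-character tokens.

    Assumption: whitespace is dropped. The real tokenizer may treat it differently.
    """
    return [ch for ch in text.lower() if not ch.isspace()]


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Mask each token independently with probability mask_prob.

    Returns (inputs, labels); labels hold the original character at masked
    positions and None everywhere else (positions the loss would ignore).
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            labels.append(tok)   # the model should predict this character
        else:
            inputs.append(tok)
            labels.append(None)  # not a prediction target
    # Special tokens are never masked and never used as targets.
    return [CLS] + inputs + [SEP], [None] + labels + [None]


tokens = char_tokenize("Hello world")
inputs, labels = mask_tokens(tokens)
print(inputs)
print(labels)
```

In practice, the masking step would more likely be delegated to a data collator from `transformers`, with the tokenizer above supplying the character vocabulary.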