KeyError: '[SEP]' #50
Comments
Hi @elyesmanai, are you training on a TPU? Then the
Hello, yes, I'm using TPUs and put everything in a cloud bucket, and I still got that error.
Could you post the output of …? I've already trained some models with my own vocab, and pre-training has always worked 🤔
Turns out I was not changing the path to the vocab, which has to be done in the configure_pretrain.py file. I changed it and now it works.
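The fix was to point the configuration at the intended vocab file. A minimal sketch of a fail-fast check is below. It assumes a config object that derives `vocab_file` from a data directory unless the path is set explicitly. The `PretrainingConfig` class and `check_vocab_path` helper are hypothetical illustrations, not the repository's actual code. The check prints and verifies the resolved path before training starts.

```python
import os
from dataclasses import dataclass
from typing import Optional


# Hypothetical config illustrating a common pattern: file paths are derived
# from a data directory unless set explicitly. Names are illustrative only.
@dataclass
class PretrainingConfig:
    data_dir: str
    vocab_file: Optional[str] = None  # explicit override; None means "derive it"

    def __post_init__(self) -> None:
        # Fall back to <data_dir>/vocab.txt only when no path was given.
        if self.vocab_file is None:
            self.vocab_file = os.path.join(self.data_dir, "vocab.txt")


def check_vocab_path(config: PretrainingConfig) -> str:
    """Print the resolved vocab path and fail fast if the file is missing."""
    print(f"using vocab file: {config.vocab_file}")
    # Local-filesystem check only; gs:// paths would need tf.io.gfile.exists.
    if not os.path.isfile(config.vocab_file):
        raise FileNotFoundError(f"vocab file not found: {config.vocab_file}")
    return config.vocab_file


# Usage (hypothetical paths):
# config = PretrainingConfig(data_dir="/data", vocab_file="/data/my_vocab.txt")
# check_vocab_path(config)
```

Printing the resolved path makes it obvious when training is still reading a default vocab location rather than the file you edited.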
When running run_pretraining.py, I get this error before pre-training starts:
```
================================================================================
Running training
2020-04-28 04:43:55.132186: W tensorflow/core/distributed_runtime/rpc/grpc_session.cc:356] GrpcSession::ListDevices will initialize the session with an empty graph and other defaults because the session has not yet been created.
ERROR:tensorflow:Error recorded from training_loop: '[SEP]'
Traceback (most recent call last):
  File "run_pretraining.py", line 384, in <module>
    main()
.
(lines ignored because they're not useful)
.
  File "/home/manai_elye2s/pretrain/electra/pretrain/pretrain_helpers.py", line 121, in _get_candidates_mask
    ignore_ids = [vocab["[SEP]"], vocab["[CLS]"], vocab["[MASK]"]]
KeyError: '[SEP]'
```
I get this error both with my own vocab and with the default one I downloaded from this repo.
Both vocab.txt files contain the [SEP], [CLS], and [MASK] tokens, written without spaces.
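Because `_get_candidates_mask` looks the tokens up with `vocab["[SEP]"]`, the KeyError means the vocab dictionary built at runtime has no such key. That can happen when the file actually read is not the one you inspected. A small stand-alone check can confirm the contents of the exact file the config points to. This is a sketch that assumes one token per line with surrounding whitespace stripped, in the style of BERT-like vocab loaders. `load_vocab` and `missing_special_tokens` are illustrative helpers, not the repository's functions.

```python
from typing import Dict, List

# The three tokens that the failing line looks up.
SPECIAL_TOKENS = ("[SEP]", "[CLS]", "[MASK]")


def load_vocab(path: str) -> Dict[str, int]:
    """Map each token to its line index, stripping surrounding whitespace
    (an assumed BERT-style format, not the repo's exact loader)."""
    vocab: Dict[str, int] = {}
    with open(path, encoding="utf-8") as f:
        for index, line in enumerate(f):
            token = line.strip()  # drops trailing spaces and "\r\n" endings
            if token:
                vocab[token] = index
    return vocab


def missing_special_tokens(vocab: Dict[str, int]) -> List[str]:
    """Return the special tokens that a lookup like vocab["[SEP]"] would miss."""
    return [tok for tok in SPECIAL_TOKENS if tok not in vocab]
```

For example, if `missing_special_tokens(load_vocab(path))` returns `[]` for the file you intended to use but training still fails, the configuration is most likely reading a different file.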