KeyError: '[SEP]' #50

Closed
elyesmanai opened this issue Apr 28, 2020 · 5 comments

Comments

@elyesmanai commented Apr 28, 2020

When running run_pretraining.py, I get this error before pretraining starts:

================================================================================
Running training

2020-04-28 04:43:55.132186: W tensorflow/core/distributed_runtime/rpc/grpc_session.cc:356] GrpcSession::ListDevices will initialize the session with an empty graph and other
defaults because the session has not yet been created.
ERROR:tensorflow:Error recorded from training_loop: '[SEP]'
Traceback (most recent call last):
  File "run_pretraining.py", line 384, in <module>
    main()
  ...
  (lines ignored because they're not useful)
  ...
  File "/home/manai_elye2s/pretrain/electra/pretrain/pretrain_helpers.py", line 121, in _get_candidates_mask
    ignore_ids = [vocab["[SEP]"], vocab["[CLS]"], vocab["[MASK]"]]
KeyError: '[SEP]'

I got this error both with my own vocab and with the default one downloaded from this repo. In both vocab.txt files, the [SEP], [CLS], and [MASK] tokens are present, without surrounding spaces.
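One quick way to check is to build the token-to-id mapping roughly the way the pretraining code does (one vocab entry per line, BERT-style) and probe the special tokens; a minimal sketch, with the bucket path as a placeholder:

import tensorflow.compat.v1 as tf  # ELECTRA targets TF 1.x; tf.gfile reads gs:// paths

VOCAB_PATH = "gs://<bucket-name>/vocab.txt"  # placeholder: your configured vocab path

vocab = {}
with tf.gfile.GFile(VOCAB_PATH, "r") as reader:
    for index, line in enumerate(reader):
        vocab[line.strip()] = index  # one token per line, id = line number

for token in ("[SEP]", "[CLS]", "[MASK]"):
    # vocab[token] is exactly the lookup that raises the KeyError above
    print(token, "->", vocab.get(token, "MISSING"))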

@stefan-it (Contributor)

Hi @elyesmanai, are you training on a TPU?

If so, the vocab.txt file needs to be stored in your Google Cloud Storage bucket, e.g. under gs://<bucket-name>/vocab.txt (if you didn't change the path in configure_pretraining.py).
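You can verify this quickly (a sketch; substitute your actual bucket name) by asking tf.gfile, which resolves gs:// paths, whether the file is visible where pretraining expects it:

import tensorflow.compat.v1 as tf

VOCAB_PATH = "gs://<bucket-name>/vocab.txt"  # placeholder: the path the config points at
# If this prints False, the TPU job can't see the vocab file either.
print(tf.gfile.Exists(VOCAB_PATH))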

@elyesmanai (Author)

Hello, yes, I'm using TPUs and put everything in the Cloud Storage bucket, but I still got the error.

@stefan-it (Contributor) commented Apr 29, 2020

Could you post the output of grep "\]$" vocab.txt, or point to the vocab file you found in the repo?

I've already trained some models with my own vocab, and pre-training always worked 🤔

@elyesmanai (Author)

The output was huge, so I narrowed it to grep "\SEP]$" and got this:
[screenshot of the grep output]

@elyesmanai (Author)

Turns out I wasn't updating the path to the vocab file, which has to be done in configure_pretraining.py. I changed it and it works now.
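For anyone hitting the same thing: the default vocab path is derived from the data directory inside configure_pretraining.py, so the fix is either to upload vocab.txt to that default location in the bucket or to point the config at the object you actually uploaded. A sketch of the idea (a hypothetical excerpt, not the exact upstream code):

import os

class PretrainingConfigExcerpt:  # hypothetical excerpt illustrating the fix
    def __init__(self, data_dir):
        # Default: the vocab is expected at <data-dir>/vocab.txt in the bucket.
        self.vocab_file = os.path.join(data_dir, "vocab.txt")
        # Fix: point this at the vocab object that was actually uploaded, e.g.:
        # self.vocab_file = "gs://<bucket-name>/vocab.txt"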
