no entry found for key #260

Closed
djstrong opened this issue May 6, 2020 · 7 comments

djstrong commented May 6, 2020

I am finetuning a model and for some hyperparameters (different number of epochs or learning rate) I get an error:

thread '<unnamed>' panicked at 'no entry found for key', /rustc/6d0e58bff88f620c1a4f641a627f046bf4cde4ad/src/libstd/collections/hash/map.rs:1023:9
stack backtrace:
   0: backtrace::backtrace::libunwind::trace
             at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.44/src/backtrace/libunwind.rs:86
   1: backtrace::backtrace::trace_unsynchronized
             at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.44/src/backtrace/mod.rs:66
   2: std::sys_common::backtrace::_print_fmt
             at src/libstd/sys_common/backtrace.rs:78
   3: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
             at src/libstd/sys_common/backtrace.rs:59
   4: core::fmt::write
             at src/libcore/fmt/mod.rs:1052
   5: std::io::Write::write_fmt
             at src/libstd/io/mod.rs:1428
   6: std::sys_common::backtrace::_print
             at src/libstd/sys_common/backtrace.rs:62
   7: std::sys_common::backtrace::print
             at src/libstd/sys_common/backtrace.rs:49
   8: std::panicking::default_hook::{{closure}}
             at src/libstd/panicking.rs:204
   9: std::panicking::default_hook
             at src/libstd/panicking.rs:224
  10: std::panicking::rust_panic_with_hook
             at src/libstd/panicking.rs:470
  11: rust_begin_unwind
             at src/libstd/panicking.rs:378
  12: core::panicking::panic_fmt
             at src/libcore/panicking.rs:85
  13: core::option::expect_failed
             at src/libcore/option.rs:1203
  14: serde::ser::Serializer::collect_map
  15: <tokenizers::models::bpe::model::BPE as tokenizers::tokenizer::Model>::save
  16: tokenizers::models::__init2023689508296420652::__init2023689508296420652::__wrap
  17: _PyMethodDef_RawFastCallKeywords
             at Objects/call.c:694
  18: _PyCFunction_FastCallKeywords
             at Objects/call.c:734
  19: call_function
             at Python/ceval.c:4568
  20: _PyEval_EvalFrameDefault
             at Python/ceval.c:3139
  21: _PyEval_EvalCodeWithName
             at Python/ceval.c:3930
  22: _PyFunction_FastCallKeywords
             at Objects/call.c:433
  23: call_function
             at Python/ceval.c:4616
  24: _PyEval_EvalFrameDefault
             at Python/ceval.c:3110
  25: function_code_fastcall
             at Objects/call.c:283
  26: _PyFunction_FastCallKeywords
             at Objects/call.c:408
  27: call_function
             at Python/ceval.c:4616
  28: _PyEval_EvalFrameDefault
             at Python/ceval.c:3110
  29: function_code_fastcall
             at Objects/call.c:283
  30: _PyFunction_FastCallKeywords
             at Objects/call.c:408
  31: call_function
             at Python/ceval.c:4616
  32: _PyEval_EvalFrameDefault
             at Python/ceval.c:3110
  33: _PyEval_EvalCodeWithName
             at Python/ceval.c:3930
  34: _PyFunction_FastCallKeywords
             at Objects/call.c:433
  35: call_function
             at Python/ceval.c:4616
  36: _PyEval_EvalFrameDefault
             at Python/ceval.c:3139
  37: _PyEval_EvalCodeWithName
             at Python/ceval.c:3930
  38: _PyFunction_FastCallDict
             at Objects/call.c:376
  39: _PyObject_Call_Prepend
             at Objects/call.c:908
  40: PyObject_Call
             at Objects/call.c:245
  41: do_call_core
             at Python/ceval.c:4645
  42: _PyEval_EvalFrameDefault
             at Python/ceval.c:3191
  43: _PyEval_EvalCodeWithName
             at Python/ceval.c:3930
  44: _PyFunction_FastCallKeywords
             at Objects/call.c:433
  45: call_function
             at Python/ceval.c:4616
  46: _PyEval_EvalFrameDefault
             at Python/ceval.c:3139
  47: _PyEval_EvalCodeWithName
             at Python/ceval.c:3930
  48: PyEval_EvalCodeEx
             at Python/ceval.c:3959
  49: PyEval_EvalCode
             at Python/ceval.c:524
  50: run_mod
             at Python/pythonrun.c:1035
  51: PyRun_FileExFlags
             at Python/pythonrun.c:988
  52: PyRun_SimpleFileExFlags
             at Python/pythonrun.c:429
  53: pymain_run_file
             at Modules/main.c:427
  54: pymain_run_filename
             at Modules/main.c:1606
  55: pymain_run_python
             at Modules/main.c:2867
  56: pymain_main
             at Modules/main.c:3028
  57: _Py_UnixMain
             at Modules/main.c:3063
  58: __libc_start_main
  59: <unknown>
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
fatal runtime error: failed to initiate panic, error 5
/var/spool/slurmd/job18823086/slurm_script: line 29:  4069 Aborted                 $1

Version: 0.5.2

djstrong changed the title from "fatal runtime error" to "no entry found for key" on May 6, 2020

djstrong commented May 10, 2020

It was a problem with vocab.json missing one entry.

ecchochan commented May 19, 2020

I encountered this issue too, because special_tokens contained tokens that do not exist in the corpus (I was testing with a small corpus).

I encountered this issue too, because I had a "<n>" in special_tokens.

:3

Why?

@GarethAusten

I'm also having this issue. Is there a way to know which entry is missing? It seems to save the vocab.json file but fails on the merges.txt file.
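
One way to look for a missing entry by hand is a minimal sketch like the one below. It assumes vocab.json is a flat token-to-id mapping and merges.txt holds one space-separated pair per line after an optional "#version" header. The function name find_vocab_problems is hypothetical and not part of tokenizers. It reports tokens that a merge refers to but the vocabulary lacks, plus any gaps in the id range. Whether either condition is what actually triggers the panic in 0.5.2 is an assumption that this thread does not confirm.

```python
import json


def find_vocab_problems(vocab, merges):
    """Return (missing_tokens, missing_ids) for a BPE vocab dict and merge lines.

    Hypothetical helper for illustration; not part of the tokenizers library.
    """
    missing_tokens = set()
    for line in merges:
        line = line.rstrip("\n")
        # Skip blank lines and the optional header line.
        if not line or line.startswith("#version"):
            continue
        parts = line.split(" ")
        if len(parts) != 2:
            continue
        left, right = parts
        # Both halves of a merge and the merged token should be in the vocab.
        for token in (left, right, left + right):
            if token not in vocab:
                missing_tokens.add(token)

    # Look for holes in the id range 0..max_id.
    used_ids = set(vocab.values())
    max_id = max(used_ids) if used_ids else -1
    missing_ids = [i for i in range(max_id + 1) if i not in used_ids]
    return sorted(missing_tokens), missing_ids


def check_files(vocab_path, merges_path):
    """Load vocab.json and merges.txt from disk and run the check."""
    with open(vocab_path, encoding="utf-8") as f:
        vocab = json.load(f)
    with open(merges_path, encoding="utf-8") as f:
        merges = f.readlines()
    return find_vocab_problems(vocab, merges)
```

Calling check_files("vocab.json", "merges.txt") on the saved files returns two lists; if both are empty, this check found nothing and the cause lies elsewhere.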

n1t0 (Member) commented Jun 4, 2020

This should be fixed with the latest release 0.8.0.dev2, can you confirm?

@GarethAusten

Yes, it's working in 0.8.0.dev2. Thanks for the quick response!

GarethAusten commented Jun 4, 2020

tl;dr: use save_model instead of save in 0.8.0.dev2

Leaving this comment here in case anyone else stumbles upon this issue: the behavior differs between 0.7.0 and 0.8.0.dev2. In 0.8.0.dev2 the tokenizer's save method seems to save the whole tokenizer object, while save_model in 0.8.0.dev2 seems to behave the same way as save did in 0.7.0.

This is a little confusing, because to load a saved tokenizer the tokenizer object expects a merges file and a vocab file. I don't see any way to load the whole object as saved, so the best bet seems to be to use save_model in 0.8.0.dev2.

n1t0 (Member) commented Jun 4, 2020

If you only want to load a model (BPE, for example), then save_model writes the files needed to load that model directly. You can then load the saved BPE using BPE(vocab_file, merges_file).

But if you want to load the whole tokenizer, then you can do:

tokenizer = Tokenizer.from_file(tokenizer_file)

This file will have been saved with save. Does it make sense?
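
To make the two paths concrete, here is a minimal sketch of both round trips. It is written against a more recent tokenizers release than the 0.8.0.dev2 discussed above, so some names differ from the thread: it assumes BPE accepts an in-memory vocab and merges, that the model files are written with tokenizer.model.save (the core-Tokenizer counterpart of save_model on the high-level wrappers), and that they are reloaded with BPE.from_file rather than BPE(vocab_file, merges_file). The tiny vocabulary and the function name save_both_ways are made up for illustration.

```python
import os
import tempfile

from tokenizers import Tokenizer
from tokenizers.models import BPE


def save_both_ways(workdir):
    """Save a toy tokenizer as one file and as model files, then reload both."""
    # Tiny hand-written BPE model; vocabulary and merges are illustrative only.
    model = BPE(vocab={"a": 0, "b": 1, "ab": 2}, merges=[("a", "b")])
    tokenizer = Tokenizer(model)

    # 1) Whole tokenizer: a single JSON file, reloaded with Tokenizer.from_file.
    tokenizer_file = os.path.join(workdir, "tokenizer.json")
    tokenizer.save(tokenizer_file)
    whole = Tokenizer.from_file(tokenizer_file)

    # 2) Model only: vocab.json and merges.txt, reloaded into a fresh BPE.
    tokenizer.model.save(workdir)
    bpe = BPE.from_file(
        os.path.join(workdir, "vocab.json"),
        os.path.join(workdir, "merges.txt"),
    )
    model_only = Tokenizer(bpe)

    return whole.encode("ab").ids, model_only.encode("ab").ids


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        print(save_both_ways(workdir))
```

The single-file route also keeps any normalizer, pre-tokenizer and added tokens, whereas the model-only route stores just the vocabulary and merges, so those other components would have to be set up again by hand.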
