create_pretraining_data.py has unknown argument FLAGS.ngrams_file #3

ShimonMalnick · 2021-06-18T11:18:04Z

While running the script with the default command given in https://github.com/oriram/splinter:

cd pretraining
python create_pretraining_data.py \
  --input_file=$INPUT_PATTERN \
  --output_dir=$OUTPUT_DIR \
  --vocab_file=vocabs/bert-cased-vocab.txt \
  --do_lower_case=False \
  --do_whole_word_mask=False \
  --max_seq_length=512 \
  --num_processes=63 \
  --dupe_factor=5 \
  --max_span_length=10 \
  --recurring_span_selection=True \
  --only_recurring_span_selection=True \
  --max_questions_per_seq=30

the script fails because it reads FLAGS.ngrams_file, but no ngrams_file flag is ever defined, so the following error occurs:

Traceback (most recent call last):
  File "create_pretraining_data.py", line 453, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "create_pretraining_data.py", line 441, in main
    with tf.gfile.GFile(FLAGS.ngrams_file, "w") as writer:
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/platform/flags.py", line 85, in __getattr__
    return wrapped.__getattr__(name)
  File "/usr/local/lib/python3.7/dist-packages/absl/flags/_flagvalues.py", line 480, in __getattr__
    raise AttributeError(name)
AttributeError: ngrams_file

I suggest a small fixup:
adding on line 78:
flags.DEFINE_string("ngrams_file", None, "The file that will store the ngrams.")

adding on line 453:
flags.mark_flag_as_required("ngrams_file")

and adding the ngrams_file parameter ($NGRAMS_FILE) to the command:
cd pretraining
python create_pretraining_data.py \
  --input_file=$INPUT_PATTERN \
  --output_dir=$OUTPUT_DIR \
  --vocab_file=vocabs/bert-cased-vocab.txt \
  --do_lower_case=False \
  --do_whole_word_mask=False \
  --max_seq_length=512 \
  --num_processes=63 \
  --dupe_factor=5 \
  --max_span_length=10 \
  --recurring_span_selection=True \
  --only_recurring_span_selection=True \
  --ngrams_file=$NGRAMS_FILE \
  --max_questions_per_seq=30

oriram · 2021-06-19T08:44:54Z

Thanks for bringing this to my attention!
I fixed this issue. N-gram statistics are now written to output_dir/ngrams.txt by default.
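
For readers who want to see what such a default could look like, here is a minimal sketch of the fallback logic. It is not the actual change made in the repository: the helper name resolve_ngrams_path, the optional ngrams_file override, and the DEFAULT_NGRAMS_FILENAME constant are assumptions introduced only for illustration. Only the default location, output_dir/ngrams.txt, comes from the comment above.

```python
import os
from typing import Optional

# Assumed constant; only the file name "ngrams.txt" comes from the thread.
DEFAULT_NGRAMS_FILENAME = "ngrams.txt"


def resolve_ngrams_path(output_dir: str, ngrams_file: Optional[str] = None) -> str:
    """Pick the file that will receive the n-gram statistics.

    Hypothetical helper: an explicitly given path wins, otherwise the
    statistics go to <output_dir>/ngrams.txt.
    """
    if ngrams_file:
        return ngrams_file
    return os.path.join(output_dir, DEFAULT_NGRAMS_FILENAME)


if __name__ == "__main__":
    # No explicit path: fall back to the default inside the output directory.
    print(resolve_ngrams_path("out"))
    # Explicit path: used unchanged.
    print(resolve_ngrams_path("out", "stats/my_ngrams.txt"))
```

With a fallback like this, the default command from the README would run without any extra flag, which matches the behaviour described above.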

oriram closed this as completed Jun 22, 2021
