create_pretraining_data.py has unknown argument FLAGS.ngrams_file #3

Closed
ShimonMalnick opened this issue Jun 18, 2021 · 1 comment

@ShimonMalnick

While running the script with the default command given in https://github.com/oriram/splinter:

cd pretraining
python create_pretraining_data.py \
    --input_file=$INPUT_PATTERN \
    --output_dir=$OUTPUT_DIR \
    --vocab_file=vocabs/bert-cased-vocab.txt \
    --do_lower_case=False \
    --do_whole_word_mask=False \
    --max_seq_length=512 \
    --num_processes=63 \
    --dupe_factor=5 \
    --max_span_length=10 \
    --recurring_span_selection=True \
    --only_recurring_span_selection=True \
    --max_questions_per_seq=30

the script fails with the following error, because main() reads FLAGS.ngrams_file but no ngrams_file flag is ever defined:

Traceback (most recent call last):
  File "create_pretraining_data.py", line 453, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "create_pretraining_data.py", line 441, in main
    with tf.gfile.GFile(FLAGS.ngrams_file, "w") as writer:
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/platform/flags.py", line 85, in __getattr__
    return wrapped.__getattr__(name)
  File "/usr/local/lib/python3.7/dist-packages/absl/flags/_flagvalues.py", line 480, in __getattr__
    raise AttributeError(name)
AttributeError: ngrams_file

I suggest a small fix. Add at line 78:
flags.DEFINE_string("ngrams_file", None, "The file that will store the ngrams.")

and at line 453, just before tf.app.run():
flags.mark_flag_as_required("ngrams_file")

Finally, pass the new ngrams_file flag ($NGRAMS_FILE) in the command:
cd pretraining
python create_pretraining_data.py \
    --input_file=$INPUT_PATTERN \
    --output_dir=$OUTPUT_DIR \
    --vocab_file=vocabs/bert-cased-vocab.txt \
    --do_lower_case=False \
    --do_whole_word_mask=False \
    --max_seq_length=512 \
    --num_processes=63 \
    --dupe_factor=5 \
    --max_span_length=10 \
    --recurring_span_selection=True \
    --only_recurring_span_selection=True \
    --ngrams_file=$NGRAMS_FILE \
    --max_questions_per_seq=30
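The two proposed lines can be checked in isolation. Below is a minimal sketch, assuming absl-py is installed (TensorFlow 1.x's tf.flags is a thin wrapper around absl.flags, as the traceback shows). It uses a private flags.FlagValues() registry instead of the script's global FLAGS; the build_flags helper and the /tmp/out paths are hypothetical.

```python
from absl import flags


def build_flags(define_ngrams_file):
    """Build an isolated flag registry, optionally applying the proposed fix.

    Hypothetical helper: a private FlagValues instance keeps this sketch off
    absl's global flags.FLAGS, which create_pretraining_data.py itself uses.
    """
    values = flags.FlagValues()
    flags.DEFINE_string("output_dir", None, "Output directory.",
                        flag_values=values)
    if define_ngrams_file:
        # The two proposed lines, pointed at the private registry.
        flags.DEFINE_string("ngrams_file", None,
                            "The file that will store the ngrams.",
                            flag_values=values)
        flags.mark_flag_as_required("ngrams_file", flag_values=values)
    return values


# Before the fix: the flag is never defined, so reading it raises
# AttributeError from FlagValues.__getattr__, as in the traceback.
before = build_flags(define_ngrams_file=False)
before(["prog", "--output_dir=/tmp/out"])
try:
    before.ngrams_file
except AttributeError as err:
    print("AttributeError:", err)

# After the fix: omitting --ngrams_file fails at parse time instead.
after = build_flags(define_ngrams_file=True)
try:
    after(["prog", "--output_dir=/tmp/out"])
except flags.IllegalFlagValueError as err:
    print("IllegalFlagValueError:", err)

# Supplying the flag parses cleanly and the value is readable.
ok = build_flags(define_ngrams_file=True)
ok(["prog", "--output_dir=/tmp/out", "--ngrams_file=/tmp/out/ngrams.txt"])
print(ok.ngrams_file)
```

Marking the flag as required moves the failure from deep inside main() to argument parsing, where the error names the missing flag.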

@oriram
Owner

oriram commented Jun 19, 2021

Thanks for bringing this to my attention!
I fixed this issue: n-gram statistics are now written to output_dir/ngrams.txt by default.
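The thread does not show the updated code. As a hypothetical sketch of the default described above, the n-gram output path can be derived from output_dir with the standard library. The name resolve_ngrams_path is invented, and the optional explicit override is an assumption, since the thread does not say whether a --ngrams_file flag is still accepted.

```python
import os
from typing import Optional


def resolve_ngrams_path(output_dir: str, ngrams_file: Optional[str] = None) -> str:
    """Return where n-gram statistics should be written.

    Hypothetical helper: an explicit ngrams_file (assumed override) wins;
    otherwise fall back to <output_dir>/ngrams.txt, the default named above.
    """
    if ngrams_file:
        return ngrams_file
    return os.path.join(output_dir, "ngrams.txt")


print(resolve_ngrams_path("/data/pretrain"))
print(resolve_ngrams_path("/data/pretrain", "/tmp/custom_ngrams.txt"))
```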

oriram closed this as completed on Jun 22, 2021