lm/train.py Data Loss Error #7

Closed
Muhammd-Hamza-Sabir opened this issue Jul 5, 2019 · 7 comments
@Muhammd-Hamza-Sabir

I am trying to train the generator model on CPU and am getting the error below. Could you please explain why this error occurs?

DataLossError (see above for traceback): corrupted record at 0

@rowanz
Owner

rowanz commented Jul 8, 2019

Hi Muhammad, my guess is that you don't have RealNews in the right format; it only accepts .tfrecord files. However, I don't think you'll be able to train a generator model on CPU, since it's quite large.
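One way to confirm the format guess is to check whether the input file is framed like a TFRecord at all. A TFRecord file is a sequence of records, each stored as an 8-byte little-endian length, a masked CRC32C of that length, the payload, and a masked CRC32C of the payload; TensorFlow's record reader reports a checksum failure on the first record as a DataLossError "corrupted record at 0". The stdlib-only sketch below is a hypothetical diagnostic (the function names are ours, not part of the Grover repo) that checks only the record at offset 0. A plain .jsonl file fails the very first CRC check, which is consistent with the error above.

```python
import os
import struct


def _make_crc32c_table():
    # Reflected CRC-32C (Castagnoli) polynomial, used by the TFRecord framing.
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_CRC32C_TABLE = _make_crc32c_table()


def crc32c(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _CRC32C_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc(data: bytes) -> int:
    # TFRecord stores a "masked" CRC: rotate right by 15 bits, then add a constant.
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF


def write_record(f, payload: bytes) -> None:
    # Frame one record: length, CRC of length, payload, CRC of payload.
    length_bytes = struct.pack("<Q", len(payload))
    f.write(length_bytes)
    f.write(struct.pack("<I", masked_crc(length_bytes)))
    f.write(payload)
    f.write(struct.pack("<I", masked_crc(payload)))


def first_record_ok(path: str) -> bool:
    """Return True if the record at offset 0 of `path` has valid TFRecord framing."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        header = f.read(12)
        if len(header) < 12:
            return False
        length_bytes = header[:8]
        (length_crc,) = struct.unpack("<I", header[8:])
        if masked_crc(length_bytes) != length_crc:
            return False  # e.g. a .jsonl file: the first 8 bytes are just text
        (length,) = struct.unpack("<Q", length_bytes)
        if 12 + length + 4 > size:
            return False  # truncated file
        payload = f.read(length)
        (payload_crc,) = struct.unpack("<I", f.read(4))
        return masked_crc(payload) == payload_crc
```

For real training data, TensorFlow's own `tf.data.TFRecordDataset` remains the authoritative reader; this is only a quick sanity check that does not need TensorFlow installed.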

@rowanz rowanz closed this as completed Jul 8, 2019
@leehelenah

leehelenah commented Jul 9, 2019

I am trying to prepare the .tfrecord files before training the generation model. I ran prepare_lm_data.py with generator=mega~dataset=p0.94.jsonl as input_fn.

Could you please help me solve the following error?

Traceback (most recent call last):
  File "realnews/prepare_lm_data.py", line 218, in <module>
    final_desired_size=max(args.max_seq_length + 1, 1025)):
  File "realnews/prepare_lm_data.py", line 187, in buffered_and_sliding_window_article_iterator
    for article in article_iterator(encoder, final_desired_size=final_desired_size):
  File "realnews/prepare_lm_data.py", line 145, in article_iterator
    unconditional_prob=.35)
  File "/workspace/grover/sample/encoder.py", line 258, in tokenize_for_grover_training
    article_pieces = _tokenize_article_pieces(encoder, item)
  File "/workspace/grover/sample/encoder.py", line 175, in _tokenize_article_pieces
    'article': [encoder.begin_article] + encoder.encode(item['text']) + [encoder.end_article],
KeyError: 'text'

Thanks.

@Muhammd-Hamza-Sabir
Author

I am getting the following error while training, even though I have converted the input files into .tfrecord. I couldn't find any help regarding this error. Could you please explain what it means and how it can be resolved?
"absl.flags._exceptions.IllegalFlagValueError: flag --input_file=None: Flag --input_file must have a value other than None."

Thanks

@rowanz
Owner

rowanz commented Jul 10, 2019

@helenalee1994: unfortunately, that file (for discrimination) can't be used for generation. Among other things, it has additional entries, and the metadata fields have different names 😢; for instance, I used 'text' instead of 'article'.

Would it help if I publicly shared a small subset of RealNews for debugging? For the full version, please fill out the Google form on the GitHub repo.
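Since `_tokenize_article_pieces` reads `item['text']` (per the traceback in the previous comment), a quick way to see whether a .jsonl file uses the field names the generation pipeline expects is to scan it before running prepare_lm_data.py. The sketch below is a hypothetical helper, not part of the Grover repo; it only checks for the `text` key, because the full set of fields the generator format requires is not listed in this thread. Passing this check does not make a discrimination file usable for generation.

```python
import json
from collections import Counter


def count_missing_key(path, key="text"):
    """Scan a JSON-lines file and return (total_items, items_missing_key, key_counts)."""
    total = missing = 0
    key_counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            item = json.loads(line)
            total += 1
            key_counts.update(item.keys())  # tally which field names actually appear
            if key not in item:
                missing += 1
    return total, missing, key_counts
```

Printing `key_counts` shows at a glance which field names the file actually uses, which is usually enough to spot a discrimination-format file.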

@Muhammd-Hamza-Sabir You need to pass in --input_file=your_tf_record.tfrecord or something like that 😄
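The message "Flag --input_file must have a value other than None" is the validator absl installs when a flag is marked required with `flags.mark_flag_as_required`, so it means the script never received a value for `--input_file` at all (for example, the flag name was mistyped or dropped by a wrapper command). Once the value does arrive, it also has to match real files. Many BERT-derived training scripts treat `--input_file` as a comma-separated list of glob patterns; assuming lm/train.py does the same (not confirmed in this thread), the hypothetical helper below expands the value with Python's `glob` and fails loudly if nothing matches.

```python
import glob


def expand_input_file(flag_value):
    """Expand a comma-separated list of glob patterns into a sorted list of files."""
    if not flag_value:
        # Mirrors the situation behind "must have a value other than None".
        raise ValueError("--input_file is empty; pass e.g. --input_file=your_tf_record.tfrecord")
    files = []
    for pattern in flag_value.split(","):
        pattern = pattern.strip()
        if pattern:
            files.extend(sorted(glob.glob(pattern)))
    if not files:
        raise FileNotFoundError(f"no files matched --input_file={flag_value!r}")
    return files
```

Running this on the exact string you pass on the command line is a cheap way to rule out a typo in the path before starting a long training job.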

@Muhammd-Hamza-Sabir
Author

@rowanz Thanks for your response.
I am doing the same but am still getting these errors.
Yes, it would be very helpful if you shared a small subset of RealNews.
Thanks

@leehelenah

@rowanz Thanks for your quick reply. It would be a great help if you publicly shared a small subset of RealNews. 😄😄😄

@kino188

kino188 commented Dec 29, 2020

@Muhammd-Hamza-Sabir Hello, have you solved the input_file error when converting input files into .tfrecord?


4 participants