libprotobuf error causes crash in the middle of training #16

Open
Feynman27 opened this issue Sep 21, 2017 · 4 comments

@Feynman27

I'm running into a strange error. During training (after several thousand iterations), my training script crashes with the following error:

libprotobuf FATAL google/protobuf/wire_format.cc:830] CHECK failed: (output->ByteCount()) == (expected_endpoint): : Protocol message serialized to a size different from what was originally expected.  Perhaps it was modified by another thread during serialization?
terminate called after throwing an instance of 'google::protobuf::FatalException'
  what():  CHECK failed: (output->ByteCount()) == (expected_endpoint): : Protocol message serialized to a size different from what was originally expected.  Perhaps it was modified by another thread during serialization?
Command terminated by signal 6

It doesn't look like I'm running out of memory on my GPU. Could this be caused by writing too much information to TensorBoard?
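The FATAL message points at the likely mechanism. Protobuf computes a message's expected size first and then writes the bytes, and the CHECK fires if the message changed in between, for example because another thread mutated it. Below is a minimal sketch of a common defensive pattern. It assumes summaries are handed to a background writer thread and snapshots the data before enqueueing, so the writer never serializes an object the training loop is still modifying. It uses only the Python standard library. `SnapshotWriter` is hypothetical, and JSON stands in for protobuf serialization. This is not the project's or TensorBoard's actual API.

```python
import copy
import json
import queue
import threading


class SnapshotWriter:
    """Hypothetical background writer that serializes records on its own thread.

    Records are deep-copied on the caller's thread before being queued, so
    later mutations by the training loop cannot race with serialization.
    """

    def __init__(self):
        self._queue = queue.Queue()
        self.lines = []  # serialized output; stands in for an event file
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def add(self, record):
        # Take the snapshot now; the caller may keep mutating `record`.
        self._queue.put(copy.deepcopy(record))

    def _run(self):
        while True:
            item = self._queue.get()
            if item is None:  # sentinel: stop the worker
                break
            self.lines.append(json.dumps(item, sort_keys=True))

    def close(self):
        # Flush everything queued so far, then stop the worker thread.
        self._queue.put(None)
        self._thread.join()


if __name__ == "__main__":
    writer = SnapshotWriter()
    stats = {"step": 0, "loss": 1.0}
    for step in range(3):
        stats["step"] = step
        stats["loss"] = 1.0 / (step + 1)
        writer.add(stats)  # the same dict is reused and mutated every step
    writer.close()
    for line in writer.lines:
        print(line)
```

Without the `deepcopy`, the writer thread could read `stats` while the loop is updating it. That is the same class of race the protobuf message describes.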

Feynman27 changed the title from "libprotobuf error causes crash in the middle if training" to "libprotobuf error causes crash in the middle of training" on Sep 21, 2017.
@ruotianluo
Owner

I've run into this too. It doesn't always happen for me, but it does happen occasionally. For now I just ignore it, but it would be good to investigate it at some point.

@Feynman27
Author

It's happening to me roughly every 10k iterations after 130k iterations; before that, it only happened every 70k iterations or so. Do you have any idea what may be causing it? I'm also getting a bad_alloc error, so something odd is happening with memory.
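A `bad_alloc` alongside the serialization failure suggests checking whether host memory grows over the course of training. Below is a minimal diagnostic sketch. It assumes a Unix host, because Python's `resource` module is not available on Windows. It logs the process's peak resident set size every `log_every` iterations so that a steady climb shows up before the crash. The `train` loop, `train_step`, and `log_every` are hypothetical placeholders, not part of this project.

```python
import resource
import sys


def peak_rss_mb():
    """Peak resident set size of this process in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    if sys.platform == "darwin":
        return peak / (1024 * 1024)
    return peak / 1024


def train(num_iters, train_step, log_every=1000):
    """Hypothetical training loop that records peak host memory periodically."""
    history = []
    for it in range(1, num_iters + 1):
        train_step(it)
        if it % log_every == 0:
            mb = peak_rss_mb()
            history.append((it, mb))
            print(f"iter {it}: peak RSS {mb:.1f} MiB")
    return history


if __name__ == "__main__":
    leak = []
    # Deliberately leaky step, so the logged peak visibly climbs.
    train(3000, lambda it: leak.append(bytearray(100_000)), log_every=1000)
```

Because this is a peak value, it never decreases. The signal to look for is a value that keeps rising between logging intervals instead of levelling off.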

@ruotianluo
Owner

@Feynman27
Author

Yeah, I just commented it out. I'll post if I discover a more permanent solution.
