Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error Message type "caffe.SolverParameter" has no field named "name". #88

Closed
Isaacpm opened this issue Mar 22, 2016 · 4 comments
Closed

Comments

@Isaacpm
Copy link

Isaacpm commented Mar 22, 2016

Hi,

I'm getting the following error when launching a training service in a server.
I've got the same configuration in a VM which I was using for testing a script and it works perfectly well.
I think the only difference is that the server has the latest dede and the has a version that's from sunday.
I may be missing something, but can't figure it out, because I've got the same ubuntu, fully patched, same folder structure, same commands from the script, etc...
Could this be cause by the new version?

[libprotobuf ERROR google/protobuf/text_format.cc:245] Error parsing text-format caffe.SolverParameter: 1:5: Message type "caffe.SolverParameter" has no field named "name".

INFO - user batch_size=300 / inputc batch_size=19093

INFO - batch_size=313 / test_batch_size=217 / test_iter=22
[libprotobuf FATAL /usr/include/google/protobuf/repeated_field.h:625] CHECK failed: (index) < (size()):

Thanks,
Isaac

@beniz
Copy link
Collaborator

beniz commented Mar 22, 2016

Hi, so FYI the unit tests run fine on the last version. Now, there are two errors reported in your log excerpt:

  • Message type "caffe.SolverParameter" has no field named "name".: this is very weird, there's no name field in the SolverParameter message type from the caffe.proto, nor has it been modified recently. Not sure what happens here, but you may want to verify and possibly post the content of your mlp_solver.prototoxt file for instance.
  • [libprotobuf FATAL /usr/include/google/protobuf/repeated_field.h:625] CHECK failed: (index) < (size()): this very much possibly means that there's a discrepancy between something you are asking via the API and something in the dataset or an existing model for instance.

In summary, these are very uncommon errors, if you cannot figure the origin out, post your API calls in full, and list the content of your model repository, and possibly give number of classes in your dataset, etc...

@Isaacpm
Copy link
Author

Isaacpm commented Mar 23, 2016

This is really weird, because what I'm doing is launching several services and training them, one after the other, via a python script.
The same python script works in the VM, but fails on the second service launched in the physical server.
I'm going to try a few things following your points and post here my results.

@beniz
Copy link
Collaborator

beniz commented Mar 23, 2016

The first error I believe clearly indicates that somehow the server is getting a model .prototxt file in place of a _solver.prototxt file. I never thought I would get results by googling it but you can witness this type of mistake here: https://groups.google.com/forum/#!topic/caffe-users/kRDWEzewrHE

Now, dd reads a model repository and automatically decides which file is what (solver or model). So I'd say either your script does something with the model files, either you are trying to train multiple models from the same model repository, thus mixing up files somehow, either files are named in a way that confuses detection by dd...

Listing your model repository right after your second training call fails should help debug this. Make sure there's no automatic DELETE call with clear=full after the failure or the full content of your model repository will be gone before inspection.

@beniz beniz added the type:bug label Mar 27, 2016
@beniz
Copy link
Collaborator

beniz commented Mar 27, 2016

OK, from our other exchanges, this is due to the model path that confuses the detection of files by dd. I'll fix that shortly with a more narrow check on Caffe model file naming scheme for dd. In the meantime, remove any occurence of solver in your model directory path.

@beniz beniz self-assigned this Mar 27, 2016
@beniz beniz closed this as completed in a8888fb Mar 27, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants