usage of the checkpoint throws error #82
Could you provide us some more information about the following:
However, when I try to freeze the graph using the following:
I got an error. After a little debugging, I found that all Gather nodes are converted from ResourceGather, which has two inputs:
Both are converted to a Gather node with the same name. I am not sure what the proper way to resolve this is. Any suggestions? Thanks for the help.
Hello, thanks for your interest in our work and the code.
This warning looks normal to me. For evaluation, only the model weights are needed. The optimizer states are not needed and are therefore not loaded from the checkpoint file (which contains both model weights and optimizer states). So please disregard the warning, especially since you mentioned that your code runs despite it, but do make sure that the evaluation result matches the expected number.
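As a toy illustration of why this warning is harmless (plain Python, not TensorFlow's actual restore logic), the sketch below splits a checkpoint into the entries an evaluation-only model consumes and the leftover optimizer entries that would be reported as unresolved. The key names and the `partial_restore` helper are hypothetical. In TensorFlow itself, calling `expect_partial()` on the status object returned by `tf.train.Checkpoint.restore()` is the usual way to signal that a partial restore is intended and to silence these warnings.

```python
from typing import Dict, List, Tuple


def partial_restore(
    checkpoint: Dict[str, object], wanted_prefixes: Tuple[str, ...]
) -> Tuple[Dict[str, object], List[str]]:
    """Split checkpoint entries into restored values and unresolved keys.

    Entries whose key starts with one of `wanted_prefixes` are restored;
    everything else (e.g. optimizer slots) is reported as unresolved.
    """
    restored: Dict[str, object] = {}
    unresolved: List[str] = []
    for key, value in checkpoint.items():
        if key.startswith(wanted_prefixes):
            restored[key] = value
        else:
            unresolved.append(key)
    return restored, unresolved


# Hypothetical checkpoint contents: model weights plus optimizer state.
ckpt = {
    "model/conv1/kernel": [0.1, 0.2],
    "model/conv1/bias": [0.0],
    "optimizer/iter": 60000,
    "optimizer/conv1/kernel/m": [0.0, 0.0],
}
weights, skipped = partial_restore(ckpt, ("model/",))
print(sorted(weights))  # ['model/conv1/bias', 'model/conv1/kernel']
print(skipped)          # ['optimizer/iter', 'optimizer/conv1/kernel/m']
```

Evaluation only needs `weights`; the `skipped` optimizer entries are exactly the kind of objects the "Unresolved object in checkpoint" warning lists.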
I think you are using it correctly. The checkpoint loader is designed to be flexible (it supports a file or a directory), so that we can load not only trained weights for evaluation, but also ImageNet-pretrained checkpoints, as well as the latest checkpoint in a directory for online evaluation during training.
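The file-or-directory behaviour described above can be sketched as a small path resolver. This is a hypothetical helper, not the DeepLab2 loader: it assumes TensorFlow-style file names (`ckpt-60000.index`, `ckpt-60000.data-00000-of-00001`) and, for a directory, picks the highest-numbered checkpoint by scanning `.index` files rather than reading the `checkpoint` state file that `tf.train.latest_checkpoint` uses.

```python
import os
import re
import tempfile
from typing import Optional

# Shard suffix of a data file, e.g. ".data-00000-of-00001" (assumed naming).
_DATA_SUFFIX = re.compile(r"\.data-\d+-of-\d+$")
# Index file name such as "ckpt-60000.index"; captures prefix and step.
_INDEX_NAME = re.compile(r"^(?P<prefix>.+-(?P<step>\d+))\.index$")


def resolve_checkpoint_prefix(path: str) -> Optional[str]:
    """Map a directory, an .index/.data file, or a prefix to a checkpoint prefix.

    Returns None when nothing usable is found.
    """
    if os.path.isdir(path):
        # Directory: choose the checkpoint with the highest step number.
        best_step, best_prefix = -1, None
        for name in os.listdir(path):
            match = _INDEX_NAME.match(name)
            if match and int(match.group("step")) > best_step:
                best_step = int(match.group("step"))
                best_prefix = os.path.join(path, match.group("prefix"))
        return best_prefix
    if path.endswith(".index"):
        return path[: -len(".index")]
    stripped = _DATA_SUFFIX.sub("", path)
    if stripped != path:
        return stripped
    # Otherwise treat the path as a prefix; valid only if its index file exists.
    return path if os.path.isfile(path + ".index") else None


# Usage: a directory holding two checkpoints resolves to the newest one.
with tempfile.TemporaryDirectory() as ckpt_dir:
    for step in (50000, 60000):
        for suffix in (".index", ".data-00000-of-00001"):
            open(os.path.join(ckpt_dir, f"ckpt-{step}{suffix}"), "w").close()
    print(resolve_checkpoint_prefix(ckpt_dir))  # <ckpt_dir>/ckpt-60000
```

Under these assumptions, the directory, either of the two files, and the bare prefix `<dir>/ckpt-60000` all resolve to the same prefix.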
I have not seen such an error before. What are you trying to achieve with the frozen graph? Is it for model exporting/deployment? Is the frozen graph related to the evaluation warning mentioned in this issue? Best,
Hi Huiyu, thanks for your answer. I figured out the checkpoint and the warning message (which was not clear from reading the documentation); however, the root cause of the last error is still unclear to me. Yes, I am trying to export the graph to other formats, mainly ONNX. The issue occurs when TensorFlow converts ResourceGather to a frozen node: it actually creates two nodes with the same name, since ResourceGather in this case has two inputs as its edges. I slightly modified the TensorFlow Python code to use a map for deduplication, but I am not entirely clear about the root cause.
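A minimal sketch of the kind of name-based deduplication described above, assuming a simplified, hypothetical node representation `(name, op_type, inputs)` rather than TensorFlow's `NodeDef` protos; `dedupe_nodes` is not part of TensorFlow. It keeps the first definition of each name and, as a deliberate design choice, refuses to merge nodes that share a name but differ in op or inputs, since silently dropping a non-identical node could change the exported model's output.

```python
from typing import Dict, List, Tuple

# Hypothetical node representation: (name, op_type, input_names).
Node = Tuple[str, str, Tuple[str, ...]]


def dedupe_nodes(nodes: List[Node]) -> Tuple[List[Node], Dict[str, int]]:
    """Keep the first node for each name and count the dropped duplicates.

    Raises ValueError if two nodes share a name but are not identical.
    """
    kept: Dict[str, Node] = {}
    dropped: Dict[str, int] = {}
    for node in nodes:
        name = node[0]
        if name not in kept:
            kept[name] = node
        elif kept[name] == node:
            # Exact duplicate: safe to drop.
            dropped[name] = dropped.get(name, 0) + 1
        else:
            raise ValueError(f"conflicting definitions for node {name!r}")
    return list(kept.values()), dropped


# Hypothetical graph fragment with a duplicated Gather node.
graph = [
    ("emb/Gather", "GatherV2", ("emb/weights", "ids", "axis")),
    ("emb/Gather", "GatherV2", ("emb/weights", "ids", "axis")),
    ("logits", "MatMul", ("emb/Gather", "w")),
]
nodes, dropped = dedupe_nodes(graph)
print(len(nodes), dropped)  # 2 {'emb/Gather': 1}
```

If the two same-named nodes turn out to differ, the `ValueError` points at the node worth inspecting instead of hiding a possible source of output mismatch.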
Hi, glad to know that the checkpoint issue is resolved. I am not an expert in exporting graphs, so I am not certain what the root cause is, but there are three points that may be helpful.
That said, were you able to make it work with your deduplication modification?
Thanks again for the tips. Deduplication got me past the first issue, but I had to remove the post-processing part to eliminate loops, which are not supported in ONNX or TFLite. However, after testing with ONNX Runtime, the final result does not match. I got stuck there, since debugging layer by layer is a lot of work. I think all three points you mentioned are worth a retry to see if they fix anything. Thanks again!
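One way to make layer-by-layer debugging less tedious is to dump intermediate outputs from both runtimes and search for the first layer that diverges, since errors propagate downstream from there. The sketch below assumes the dumps have already been flattened to lists of floats and keyed by matching layer names; the layer names, the tolerance, and the helper names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two flat outputs."""
    if len(a) != len(b):
        return math.inf  # a shape mismatch counts as an unbounded difference
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def first_divergence(
    layer_order: List[str],
    reference: Dict[str, Sequence[float]],
    candidate: Dict[str, Sequence[float]],
    atol: float = 1e-4,
) -> Optional[Tuple[str, float]]:
    """Return (layer, diff) for the first layer whose outputs differ by > atol."""
    for layer in layer_order:
        if layer not in reference or layer not in candidate:
            continue  # layer was not dumped by one of the runtimes
        diff = max_abs_diff(reference[layer], candidate[layer])
        if diff > atol:
            return layer, diff
    return None


# Hypothetical dumps from the TensorFlow graph and the ONNX Runtime session.
order = ["stem", "stage1", "stage2", "head"]
tf_out = {"stem": [0.5, 1.0], "stage1": [2.0, 3.0], "stage2": [4.0], "head": [1.0]}
ort_out = {"stem": [0.5, 1.0], "stage1": [2.0, 3.5], "stage2": [4.0], "head": [0.0]}
print(first_divergence(order, tf_out, ort_out))  # ('stage1', 0.5)
```

Only the first divergent layer matters here: later layers (like `head` above) inherit its error, so inspecting them first would be misleading.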
Hi @csrhddlam, I am trying to export a COCO Panoptic-DeepLab semantic segmentation model with a ResNet-Beta backbone to a saved_model file. (Can I, or should I, export to something else?) Thanks!
OK, I partly answered my own question. To change the setting, it needs to be changed in the model/encoder/axial_resnet_instances.py file.
Closing the issue, since @fschvart has opened a new one; @posEdgeOfLife can reopen it if there is still an issue.
I want to evaluate Axial-DeepLab.
Here is my command:
train.py --config_file ../configs/cityscapes/axial_deeplab/max_deeplab_s_backbone_os16.textproto --mode eval --model_dir C:\develop\max_deeplab_s_backbone_os16_axial_deeplab_cityscapes_trainfine\
I also set initial_checkpoint to C:\develop\max_deeplab_s_backbone_os16_axial_deeplab_cityscapes_trainfine\ckpt-60000, which is the prefix shared by the data and index files of the official checkpoint I downloaded.
When I run the script, I keep getting a warning:
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer
W0201 20:33:08.415140 51764 util.py:204] Unresolved object in checkpoint: (root).optimizer
....
I believe the dataset pattern and experiment name are correct, since there are no errors in the earlier stages; the warning only appears once the evaluation starts.
Am I using the checkpoint in the wrong way? I am confused by the checkpoint path: should it be a directory, one of the files (data or index), or dir/ckpt-60000?