Evaluation giving inconsistent results #739
Comments
It doesn't look like this has happened for built-in models, and we can't help you since we don't know what you did.
The model I'm using is identical to Mask R-CNN except for the FPN backbone, where I've implemented path aggregation as in the PANet paper. I don't understand how changing the FPN can cause the evaluations in the two stages to differ. Isn't there only one way to do evaluation? Here's a comparison of the two configs if it helps: the Mask R-CNN one and mine. Apart from the FPN, the output directory, and the eval period, they're identical.
If the evaluation done during training is correct and the evaluation done after loading the checkpoint is the issue, then the problem presumably lies in either the saving or the loading process. I trained a plain Mask R-CNN from scratch and did not observe the problem, so the issue does lie in the FPN modification I made. I think my modification (basically an additional bottom-up branch) is preventing the new FPN weights from being stored/loaded properly in the checkpoint, which would explain this huge difference in accuracy at evaluation time. How this is happening, I have no idea.
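One way to test this hypothesis is to diff the parameter names the model expects against the names actually present in the saved checkpoint. A library-agnostic sketch (the parameter names below are hypothetical, not taken from the issue):

```python
# Diff the parameter names a model expects against those in a checkpoint.
# Any name missing from the checkpoint was never saved -- e.g. parameters
# held in a plain Python list instead of a registered submodule.
def missing_keys(model_state, checkpoint_state):
    """Return parameter names the model has but the checkpoint lacks."""
    return sorted(set(model_state) - set(checkpoint_state))

# Hypothetical state dicts (keys only matter here, not values).
model_state = {
    "backbone.fpn_lateral3.weight": ...,
    "backbone.bottom_up_extra.conv1.weight": ...,  # added by the custom branch
}
checkpoint_state = {
    "backbone.fpn_lateral3.weight": ...,
}

print(missing_keys(model_state, checkpoint_state))
# ['backbone.bottom_up_extra.conv1.weight']
```

With real PyTorch models the same comparison works on `model.state_dict().keys()` versus the keys of the loaded checkpoint dict.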
The problem was that I had not explicitly added the list of convolution ops as a module via the
While writing a custom model (to train on MS COCO), I set `TEST.EVAL_PERIOD` to 5000 so that I could frequently evaluate the model and see how it's doing. The last logs show that I got 37.2 box AP and 33.8 mask AP. But when I run the evaluation command afterwards, it gives me different results. So my question is: why do the evaluation done during training (last period) and the evaluation done after training give drastically different results?

Tensorboard training logs:
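For context, periodic evaluation during training is driven by the config. A minimal sketch of the relevant setting, assuming the standard detectron2 config API (dataset registration and the rest of the config are omitted):

```python
from detectron2.config import get_cfg

cfg = get_cfg()
# Run the evaluator on cfg.DATASETS.TEST every 5000 iterations,
# matching the setup described above. 0 disables periodic evaluation.
cfg.TEST.EVAL_PERIOD = 5000
```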
Instructions To Reproduce the Issue:
What changes you made (`git diff`) or what code you wrote:
Logs observed by running the above command:
(Steps simplified so they do not require additional resources to run, such as a private dataset.)
Expected behavior:
I was expecting the same performance I got during the last evaluation before training ended, i.e. 37.2 box AP and 33.8 mask AP.
Following are the logs for the last evaluation:
Environment:
Please paste the output of `python -m detectron2.utils.collect_env`. If detectron2 hasn't been successfully installed, use `python detectron2/utils/collect_env.py`.