
Validation quality metrics "get stuck" #20

Closed
saskra opened this issue Apr 23, 2021 · 3 comments

Comments

@saskra

saskra commented Apr 23, 2021

With certain splits of the training, validation and test data sets, I consistently observe a strange behaviour that does not occur with a different split of the same overall data set: the metrics on the validation data appear to stand still from the very beginning of training, at enormously bad values, while the metrics on the training data keep improving. Unfortunately, I cannot really reconstruct which combination of images triggers this behaviour, but it probably depends on a combination rather than on individual images. Is this behaviour known, or is there perhaps even a solution for it?

```
Estimated 76.0 GB model memory usage
Train shape: (1166, 672, 672, 1); Val shape: (292, 672, 672, 1); Test shape: (59, 672, 672, 1)
  0%|| 2/15000 [04:11<523:59:44, 125.78s/epoch, val_loss=12.7, val_jaccard_round=0.171, loss=0.43, jaccard_round=0.748]
```
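In case it is useful for reproducing this on other splits, here is a minimal diagnostic sketch (not part of the project) of a Keras callback that flags a run as soon as `val_jaccard_round` (the metric name from the log above) stops moving, instead of waiting out the full 15000 epochs. The tolerance and patience values are arbitrary assumptions and would need tuning.

```python
import tensorflow as tf

class StalledValidationMonitor(tf.keras.callbacks.Callback):
    """Stops training when the monitored validation metric barely moves.

    Diagnostic sketch only: `val_jaccard_round` is the metric name from the
    log above; `tolerance` and `patience` are arbitrary assumptions.
    """

    def __init__(self, monitor="val_jaccard_round", tolerance=1e-3, patience=20):
        super().__init__()
        self.monitor = monitor
        self.tolerance = tolerance
        self.patience = patience
        self.values = []

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        if self.monitor not in logs:
            return
        self.values.append(logs[self.monitor])
        if len(self.values) >= self.patience:
            recent = self.values[-self.patience:]
            if max(recent) - min(recent) < self.tolerance:
                print(f"\n{self.monitor} has not moved by more than "
                      f"{self.tolerance} in the last {self.patience} epochs; "
                      "this split seems to reproduce the stall.")
                self.model.stop_training = True
```

It would be passed via `model.fit(..., callbacks=[StalledValidationMonitor()])`, so that runs on a "bad" split terminate early and can be compared against runs on a "good" split.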
@nibtehaz
Owner

Hi @saskra. Thanks for your interest in our project and for sharing this. It's a bit weird, but unless we can reproduce the exact issue, it would be difficult to solve. Maybe the split is somehow acting like an adversarial attack or something. What do you think?

@saskra
Author

saskra commented Apr 28, 2021

Yes, this could well be such a case. Unfortunately, it is also difficult for me to reproduce. A few examples:

  • Images taken with microscope A do not show this phenomenon, but those taken with microscope B do, even though everything else stays identical.
  • It does not seem to depend on individual images, but on a combination of images.
  • If the images are cropped into smaller tiles, it even seems to depend only on certain sub-areas.
  • If the validation split is very large or very small, it happens more often.
  • Changing the batch size or the downscaling factor (the original resolution does not fit into the graphics memory) seems to toggle this behaviour on and off, even when the images are otherwise identical.

Presumably this cannot be cleared up without considerable effort and the original data. I was mainly interested in whether I am the only one seeing this, i.e. whether it might be down to my data or my hardware; a rough sketch of the kind of split check I mean follows below.
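For what it's worth, here is a rough, self-contained sketch of that check: re-split the same data with several seeds and compare the label balance (foreground fraction) of each validation set. The arrays below are synthetic stand-ins for the real microscopy data, and label balance is only one guess at what might distinguish the problematic splits.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the real microscopy data (image shape as in the log
# above, but far fewer images to keep the example light).
rng = np.random.default_rng(0)
images = rng.random((40, 672, 672, 1), dtype=np.float32)
masks = (rng.random((40, 672, 672, 1)) > 0.9).astype(np.float32)

def foreground_fraction(y):
    """Fraction of foreground pixels -- a rough proxy for label balance."""
    return float(y.mean())

# Re-split the same data with different seeds; if the validation metrics only
# stall for some seeds, comparing those validation sets (here via label
# balance) is a cheap first check of what distinguishes them.
for seed in range(5):
    _, x_val, _, y_val = train_test_split(
        images, masks, test_size=0.2, random_state=seed
    )
    print(f"seed={seed}: val foreground fraction = {foreground_fraction(y_val):.4f}")
```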

@nibtehaz
Owner

Interesting points and observations. Sorry, I can't suggest or add anything, as I haven't faced anything like this myself. I hope this gets resolved and you come away with some significant findings.
