
Validation quality metrics "get stuck" #20

Closed
saskra opened this issue Apr 23, 2021 · 3 comments

Comments

@saskra

saskra commented Apr 23, 2021

With certain splits of the training, validation and test data sets, I consistently observe a strange behaviour that does not occur with a different split of the same overall data set: the metrics on the validation data appear to stand still from the very beginning of training, at enormously bad values, while the metrics on the training data keep improving. Unfortunately, I cannot really reconstruct which combination of images triggers this behaviour, but it probably depends on a combination rather than on individual images. Is this behaviour known, or is there perhaps even a solution for it?

```
Estimated 76.0 GB model memory usage
Train shape: (1166, 672, 672, 1); Val shape: (292, 672, 672, 1); Test shape: (59, 672, 672, 1)
  0%|| 2/15000 [04:11<523:59:44, 125.78s/epoch, val_loss=12.7, val_jaccard_round=0.171, loss=0.43, jaccard_round=0.748]
```
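In case it is useful for reproducing this on other splits, here is a minimal diagnostic sketch (not part of the project) of a Keras callback that flags a run as soon as `val_jaccard_round` (the metric name from the log above) stops moving, instead of waiting out the full 15000 epochs. The tolerance and patience values are arbitrary assumptions and would need tuning.

```python
import tensorflow as tf

class StalledValidationMonitor(tf.keras.callbacks.Callback):
    """Stops training when the monitored validation metric barely moves.

    Diagnostic sketch only: `val_jaccard_round` is the metric name from the
    log above; `tolerance` and `patience` are arbitrary assumptions.
    """

    def __init__(self, monitor="val_jaccard_round", tolerance=1e-3, patience=20):
        super().__init__()
        self.monitor = monitor
        self.tolerance = tolerance
        self.patience = patience
        self.values = []

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        if self.monitor not in logs:
            return
        self.values.append(logs[self.monitor])
        if len(self.values) >= self.patience:
            recent = self.values[-self.patience:]
            if max(recent) - min(recent) < self.tolerance:
                print(f"\n{self.monitor} has not moved by more than "
                      f"{self.tolerance} in the last {self.patience} epochs; "
                      "this split seems to reproduce the stall.")
                self.model.stop_training = True
```

It would be passed via `model.fit(..., callbacks=[StalledValidationMonitor()])`, so that runs on a "bad" split terminate early and can be compared against runs on a "good" split.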
@nibtehaz
Owner

Hi @saskra. Thanks for your interest in our project and for sharing this. It's a bit weird, but unless we can reproduce the exact issue, it would be difficult to solve. Maybe the split is somehow acting like an adversarial attack or something. What do you think?

@saskra
Author

saskra commented Apr 28, 2021

Yes, this could well be such a case. Unfortunately, it is also difficult for me to reproduce. A few examples:

  • Images taken with microscope A do not show this phenomenon, but those taken with microscope B do, even though everything else stays identical.
  • It does not seem to depend on individual images, but on a combination of images.
  • If the images are cropped into smaller tiles, it even seems to depend only on certain sub-areas.
  • If the validation split is very large or very small, it happens more often.
  • Changing the batch size or the downscaling factor (the original resolution does not fit into the graphics memory) seems to toggle this behaviour on and off, even when the images are otherwise identical.

Presumably this cannot be cleared up without considerable effort and the original data. I was mainly interested in whether I am the only one seeing this, i.e. whether it might be down to my data or my hardware; a rough sketch of the kind of split check I mean follows below.
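For what it's worth, here is a rough, self-contained sketch of that check: re-split the same data with several seeds and compare the label balance (foreground fraction) of each validation set. The arrays below are synthetic stand-ins for the real microscopy data, and label balance is only one guess at what might distinguish the problematic splits.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the real microscopy data (image shape as in the log
# above, but far fewer images to keep the example light).
rng = np.random.default_rng(0)
images = rng.random((40, 672, 672, 1), dtype=np.float32)
masks = (rng.random((40, 672, 672, 1)) > 0.9).astype(np.float32)

def foreground_fraction(y):
    """Fraction of foreground pixels -- a rough proxy for label balance."""
    return float(y.mean())

# Re-split the same data with different seeds; if the validation metrics only
# stall for some seeds, comparing those validation sets (here via label
# balance) is a cheap first check of what distinguishes them.
for seed in range(5):
    _, x_val, _, y_val = train_test_split(
        images, masks, test_size=0.2, random_state=seed
    )
    print(f"seed={seed}: val foreground fraction = {foreground_fraction(y_val):.4f}")
```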

@nibtehaz
Owner

Interesting points and observations. Sorry, I can't suggest or add anything, as I haven't faced anything like this myself. I hope this gets resolved and you come away with some significant findings.
