
In training, why is the loss decreasing while the val loss is increasing? #90

Open
kitterive opened this issue Jun 24, 2017 · 11 comments

Comments

@kitterive

I used the VOCtest_06-Nov-2007 dataset. First I used get_data_from_XML.py to convert the XML ground truth into VOC2007.pkl, then used it to train the network. During training, I found that the loss is decreasing while the val loss is increasing. Is it overfitting?
[attached image "train": training/validation loss plot]
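For reference, the .pkl ground truth is typically generated along these lines (a minimal sketch; it assumes the XML_preprocessor helper in get_data_from_XML.py parses the VOC Annotations folder into a dict of per-image box arrays, with coordinates already divided by image width/height):

```python
import pickle

# Sketch of generating VOC2007.pkl; XML_preprocessor is assumed to map each
# image filename to an array of [xmin, ymin, xmax, ymax, one-hot class] rows.
from get_data_from_XML import XML_preprocessor

data = XML_preprocessor('VOCdevkit/VOC2007/Annotations/').data
with open('VOC2007.pkl', 'wb') as f:
    pickle.dump(data, f)
```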

@meetps

meetps commented Jun 29, 2017

I'm observing the same phenomenon. Is there a fix for this?

@kitterive - What initial weights are you using? Also, how are you normalizing the coordinates?
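For context, "normalizing the coordinates" here usually means scaling the pixel box corners by the image size so they land in [0, 1] before they go into the .pkl (a minimal sketch with made-up names, not code from this repo):

```python
import numpy as np

def normalize_boxes(boxes, img_width, img_height):
    """Scale [xmin, ymin, xmax, ymax] pixel coordinates into [0, 1]."""
    boxes = np.asarray(boxes, dtype=np.float32).copy()
    boxes[:, [0, 2]] /= img_width   # x coordinates
    boxes[:, [1, 3]] /= img_height  # y coordinates
    return boxes

# A 300x300 image with one box:
print(normalize_boxes([[30, 60, 150, 240]], 300, 300))
# [[0.1 0.2 0.5 0.8]]
```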

@oarriaga
Contributor

I also observe the same behavior. However, I was able to get a val loss of 1.4 after 20 epochs; afterwards the val loss started increasing.
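One practical way to cope with this is to checkpoint on val_loss so you keep the weights from before the validation loss starts climbing, and optionally stop early (a generic Keras sketch; the model and generators are placeholders, not this repo's exact training script):

```python
from keras.callbacks import EarlyStopping, ModelCheckpoint

callbacks = [
    # Keep only the weights with the lowest validation loss seen so far.
    ModelCheckpoint('ssd_best.hdf5', monitor='val_loss',
                    save_best_only=True, save_weights_only=True),
    # Stop once val_loss has not improved for several epochs.
    EarlyStopping(monitor='val_loss', patience=5),
]

# Pass `callbacks=callbacks` to model.fit_generator(...) in the training loop.
```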

@meetps

meetps commented Jun 29, 2017

@oarriaga - In that case, which model weights did you use for fine-tuning: VGG16 (with the top removed) or the Caffe-converted SSD weights?

@oarriaga
Contributor

oarriaga commented Jun 29, 2017

I used the pre-trained weights provided in the README file, which I believe were converted from an older version of the original Caffe implementation.

@meetps

meetps commented Jul 4, 2017

@oarriaga @rykov8 - Has anyone successfully trained SSD from scratch (i.e., using only VGG16 weights) with this code? If not, then perhaps it would be wise to rethink the loss function.
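For what it's worth, starting from the VGG16 backbone only would look roughly like this (a sketch under assumptions: SSD300 is the model builder in this repo's ssd.py, the weight file is the standard Keras VGG16 "notop" dump, and by_name loading only helps if the backbone layer names in ssd.py match the names stored in that file):

```python
from ssd import SSD300  # assumed model builder from this repo

input_shape = (300, 300, 3)
model = SSD300(input_shape, num_classes=21)

# Load only layers whose names match the VGG16 backbone; the SSD-specific
# heads stay randomly initialized. Layers with non-matching names are skipped.
model.load_weights('vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5',
                   by_name=True)
```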

@MicBA

MicBA commented Jul 4, 2017

Hi @meetshah1995,
try adding a BN layer after each Conv, as in the sketch below.
(The pre-trained weights won't be a perfect match anymore, but it can be a good starting point for training.)
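In Keras terms the suggestion is something like this: insert BatchNormalization between each convolution and its activation (a generic sketch, not a patch against this repo's ssd.py):

```python
from keras.layers import Activation, BatchNormalization, Conv2D

def conv_bn_relu(x, filters, kernel_size, name):
    """Convolution followed by batch normalization, then the nonlinearity."""
    x = Conv2D(filters, kernel_size, padding='same', name=name)(x)
    x = BatchNormalization(name=name + '_bn')(x)
    x = Activation('relu', name=name + '_relu')(x)
    return x
```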

@Kramins

Kramins commented Jul 4, 2017

I am seeing the same issue when training on the MS COCO dataset.

I was following the training example from SSD_training.ipynb.

@oarriaga
Contributor

oarriaga commented Jul 4, 2017

@meetshah1995 I have trained SSD with only the VGG16 weights and it started overfitting after ~20 epochs; my lowest validation loss was 1.4. I believe better results could be obtained with a correct implementation of the random_size_crop function in the data augmentation part (see the sketch below). Also, the architecture ported in this repository is not the newest model from the latest arXiv version, which might lead to significant differences between the implementation presented here and the other ones around, such as the TF, PyTorch, and original Caffe implementations.
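For reference, a random-size crop in the spirit of the SSD augmentation looks roughly like this (a simplified sketch: it only handles the four box coordinates, drops boxes whose center leaves the crop, and skips the min-IoU patch sampling of the paper):

```python
import numpy as np

def random_size_crop(img, boxes, min_scale=0.3):
    """Crop a random sub-window and remap normalized [xmin, ymin, xmax, ymax] boxes."""
    h, w = img.shape[:2]
    scale = np.random.uniform(min_scale, 1.0)
    new_w, new_h = int(w * scale), int(h * scale)
    x0 = np.random.randint(0, w - new_w + 1)
    y0 = np.random.randint(0, h - new_h + 1)
    crop = img[y0:y0 + new_h, x0:x0 + new_w]

    # Convert normalized boxes to pixels and shift them into the crop window.
    px = boxes[:, [0, 2]] * w - x0
    py = boxes[:, [1, 3]] * h - y0

    # Keep boxes whose center stays inside the crop, then clip and re-normalize.
    keep = ((px.mean(axis=1) >= 0) & (px.mean(axis=1) < new_w) &
            (py.mean(axis=1) >= 0) & (py.mean(axis=1) < new_h))
    px = np.clip(px[keep], 0, new_w) / new_w
    py = np.clip(py[keep], 0, new_h) / new_h
    return crop, np.stack([px[:, 0], py[:, 0], px[:, 1], py[:, 1]], axis=1)
```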

@ujsyehao

Hi @oarriaga, can you share your training log? I want to know the loss after 120k iterations.
Thank you in advance!

@Hydrogenion

I am seeing the same issue while training on my own dataset.
Is it overfitting or not?

@jamebozo

jamebozo commented Aug 1, 2019

My minimum loss is also around 1.39–1.4.
Would adding random_size_crop help?
