
Failing to converge on small datasets (Getting zeros on small custom data) #125

Open
eslambakr opened this issue Jun 30, 2020 · 16 comments

@eslambakr

❓ How to do something using DETR

Hello All,

I am using DETR on custom data, which contains 2k images for training. I followed the suggestion proposed in #9 to fine-tune instead of training from scratch, to avoid getting zeros, and I succeeded in achieving comparable accuracy.

But when I tried to train from scratch using the default configuration in main.py, I got zeros for the first 300 epochs (and counting). Should I wait for more epochs? It seems very strange to me.
So what do you think I should do to get good accuracy from scratch?
Is this a limitation of DETR, due to the fact that transformers need more data to converge? I think we should have some tricks to overcome this :D
Another question: if there is no hope of training on data this small, what is the minimum dataset size at which DETR is proven to work properly?

Final note: I am posting a new issue because #9 contains other questions that are irrelevant here; I opened this one after reading the whole thread there.

Thanks for sharing your amazing work with the community. I hope to be able to give back and contribute to it.

@alcinos
Contributor

alcinos commented Jun 30, 2020

Hi @eslambakr
Thanks for your interest in DETR.

2k does sound too small to me; we had success with 10-15k but never tried smaller than that.
It's a bit difficult to know what's going on. You could check the predictions to see whether the model is doing anything at all (on both test and train images). I'd also look at the train/test losses for signs of divergence (the most likely explanation here). I wouldn't rule out the possibility of a bug either, especially if your mAP is exactly 0.
Finally, I'd like to point out that the important metric is not really the number of epochs but rather the number of updates. Since your dataset is about 50x smaller than COCO, one COCO epoch corresponds to 50 epochs on your dataset. In other words, it's as if you had trained for 6 epochs so far.

Hope this helps.
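For reference, a rough sketch of that equivalence (the COCO train2017 size here is an assumption, and your exact image count may differ):

```python
# Rough "COCO-equivalent epochs" calculation (illustrative only).
coco_train_images = 118_287  # COCO train2017 size (assumed)
custom_images = 2_000
epochs_on_custom = 300

# With the same batch size, the number of updates scales with dataset size.
equivalent = epochs_on_custom * custom_images / coco_train_images
print(f"~{equivalent:.1f} COCO-equivalent epochs")  # ~5.1
```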

@eslambakr
Author

Ah, I understand your point. Thanks for the clarification.
I will do more analysis and update you with my observations, to benefit others who may be stuck on the same issue.

@m-klasen

@eslambakr
Below are some of my experiments with a ~2k-image dataset featuring only 4 classes; my best result exceeded detectron2's Mask R-CNN ResNet-50 FPN by ~5% mAP. If you have further questions, please feel free to ask.
[image: training results]

@eslambakr
Author

Thanks for sharing your results, but I am wondering:
1- Is the x-axis in epochs? :D Do you mean you trained the model for only 50 epochs?
2- If yes, I wonder how you achieved that while training from scratch, without loading any weights. Did you change the default arguments?
3- The class error is stuck at 100. Did you face this too, or, from your experience, do you have an explanation for it?

I trained for almost 600 epochs and I am still getting zeros, which is weird to me: I have trained other models on the same data in the same format, so I don't think there is an error in my dataset.
Unfortunately, I didn't know I had to set output_dir to get output logs and weights (I thought it was on by default), so I couldn't draw training curves or test the model on images to debug this behavior. I will rerun the experiment with it set and update you. I will also change the number-of-classes variable to 2, since my data has only one class, and change num_queries to 30 to make it easier for the model (see the num_queries check sketched below). But I am asking you because your results are impressive.

Thanks in advance.
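A side note on lowering num_queries (a hedged observation from the model's design, not from these logs): each query can emit at most one box, so num_queries needs to comfortably exceed the maximum number of objects in any single image. A quick check against a COCO-format annotation file (the path here is hypothetical):

```python
import json
from collections import Counter

with open("annotations/train.json") as f:  # hypothetical path
    coco = json.load(f)

# Count ground-truth boxes per image; num_queries must exceed the maximum.
boxes_per_image = Counter(ann["image_id"] for ann in coco["annotations"])
print("max objects in one image:", max(boxes_per_image.values()))
```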

@fmassa
Contributor

fmassa commented Jul 2, 2020

@eslambakr I believe @mlk1337 is fine-tuning his model from a model pre-trained on COCO.

@m-klasen

m-klasen commented Jul 2, 2020

Hi @eslambakr,
I wrote a small gist on how I trained my model starting from the COCO weights. Here.
Hope this helps.
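For readers who cannot open the gist, a minimal sketch of the usual recipe from #9 (not necessarily identical to the gist): load the public COCO checkpoint, drop the class-dependent head, and fine-tune the rest. The num_classes value here is hypothetical, for a 1-class dataset.

```python
import torch

# Build DETR-R50 for a custom class count via the repo's hubconf
# (num_classes = max class id + 1; 2 is hypothetical for a 1-class dataset).
model = torch.hub.load("facebookresearch/detr", "detr_resnet50",
                       pretrained=False, num_classes=2)

# Public COCO-pretrained checkpoint released with the repo.
url = "https://dl.fbaipublicfiles.com/detr/detr-r50-e632da11.pth"
checkpoint = torch.hub.load_state_dict_from_url(url, map_location="cpu")

# The classification head's shape depends on num_classes, so drop it and
# load everything else; strict=False keeps the freshly initialized head.
state_dict = checkpoint["model"]
for key in ("class_embed.weight", "class_embed.bias"):
    state_dict.pop(key)
model.load_state_dict(state_dict, strict=False)
```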

@unanan

unanan commented Jul 3, 2020

> 3- The class error is stuck at 100. Did you face this too, or, from your experience, do you have an explanation for it?

If you mean the abnormally high class_error, you can check this reply in #41.
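For context, the class_error reported in DETR's logs is 100 minus the top-1 classification accuracy over the queries matched to ground-truth boxes, so a value stuck at 100 means every matched query is misclassified (typically everything is predicted as "no object"). A minimal sketch of the computation (paraphrasing SetCriterion.loss_labels, not a verbatim copy):

```python
import torch

def class_error(matched_logits: torch.Tensor, target_classes: torch.Tensor) -> float:
    """100 - top-1 accuracy on the queries matched to ground-truth boxes."""
    pred = matched_logits.argmax(dim=-1)
    accuracy = (pred == target_classes).float().mean().item() * 100
    return 100.0 - accuracy

# Example: every matched query predicts the "no object" class -> error of 100.
logits = torch.zeros(4, 3)
logits[:, 2] = 5.0                    # index 2 plays "no object" here
targets = torch.tensor([0, 0, 1, 1])  # real class ids
print(class_error(logits, targets))   # 100.0
```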

@eslambakr
Author

Hello @alcinos @fmassa,
Here are my results using fine-tuning on my custom dataset (2k images):
[image: fine-tuning training curves]
For me this is a very good result :D

And this is the result of training from scratch. I think it is quite bad, so do you have any ideas for making DETR converge on small datasets? Or, from the graphs, do you think I have to tune any hyper-parameters?
Note: I think I made a mistake in this experiment by keeping args.lr_drop=200; I will rerun after setting it to 700 (see the scheduler sketch below).
[image: from-scratch training curves]
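For context, lr_drop in main.py is the step size of a StepLR schedule, which multiplies the learning rate by 0.1 every lr_drop epochs, so with lr_drop=200 the drop landed early in a 600-epoch run. A rough sketch following the repo's main.py (the stand-in model is hypothetical):

```python
import torch

model = torch.nn.Linear(8, 8)  # stand-in for the DETR model

# As in DETR's main.py: StepLR scales the LR by the default gamma of 0.1
# every `lr_drop` epochs.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-4)
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=200)

for epoch in range(300):
    # train_one_epoch(...) would run here
    lr_scheduler.step()
    if epoch in (198, 199):
        print(epoch, lr_scheduler.get_last_lr())  # 1e-4, then 1e-5 after 200 steps
```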

@sompjang

sompjang commented Jul 10, 2020

I got similar results on small datasets. I tried several different configurations, but with no success.

I trained on a dataset of 560 images.

params:

lr_backbone = 1e-5
lr = 1e-2
weight_decay = 1e-4
epochs = 1200
lr_drop = 400
num_queries = 20
num_classes = 1
batch_size = 2

[image: training curves]

@fmassa
Contributor

fmassa commented Jul 10, 2020

@sompjang please try fine-tuning instead of training from scratch; I'm afraid training on 560 images from scratch might suffer from severe overfitting.

@sompjang

> @sompjang please try fine-tuning instead of training from scratch; I'm afraid training on 560 images from scratch might suffer from severe overfitting.

@fmassa Thanks for your answer. After fine-tuning, the results look much better. Are there any recommendations on dataset size?

@guysoft

guysoft commented Aug 10, 2020

How are you plotting the loss functions?
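For anyone else wondering: when --output_dir is set, DETR appends one JSON object per epoch to log.txt, and the repo also ships util/plot_utils.py for plotting such files. A hand-rolled minimal version (the path is hypothetical, and the field names are assumed to match what main.py writes):

```python
import json
from pathlib import Path

import matplotlib.pyplot as plt

# DETR appends one JSON dict per epoch to <output_dir>/log.txt.
log_file = Path("outputs/log.txt")  # hypothetical output_dir
records = [json.loads(line) for line in log_file.read_text().splitlines()]

epochs = [r["epoch"] for r in records]
for field in ("train_loss", "test_loss"):  # assumed key names
    plt.plot(epochs, [r[field] for r in records], label=field)

plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()
```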

@fmassa
Contributor

fmassa commented Aug 13, 2020

@cyy21

cyy21 commented Nov 26, 2020

@sompjang Hi, I got the same problem: do you also see the network producing identical outputs for every input? After you followed the fine-tuning advice, did the outputs improve? How many classes are there in your dataset?

@azamshoaib

azamshoaib commented Dec 16, 2020

> Hi @eslambakr,
> I wrote a small gist on how I trained my model starting from the COCO weights. Here.
> Hope this helps.

@m-klasen The link to your gist is not working. Can you please provide the link? I am training my network from scratch and it is not converging; any insights into my problem would be very helpful. Thank you.

@eslambakr did you solve your issue with training the network from scratch?

@Flyooofly

> @sompjang please try fine-tuning instead of training from scratch; I'm afraid training on 560 images from scratch might suffer from severe overfitting.

Hello, I used the pre-trained model of the DETR architecture provided in DETReg (https://github.com/amirbar/DETReg) for fine-tuning. I fine-tuned on about 1000 images for 50 epochs (I modified num_classes to be the max class id + 1), but all metrics are still 0. Can you help me find out why? Thanks.
[image: evaluation metrics]
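One thing worth double-checking in this situation (a hedged note, since the logs are not visible): DETR builds its classification head as Linear(hidden_dim, num_classes + 1), where the extra slot is the "no object" class, so num_classes must be at least max class id + 1. A quick sanity check against a COCO-format annotation file (hypothetical path):

```python
import json

with open("annotations/train.json") as f:  # hypothetical path
    coco = json.load(f)

# DETR expects num_classes >= max category id + 1; the model itself adds
# one extra logit for the "no object" class on top of that.
max_id = max(cat["id"] for cat in coco["categories"])
print(f"max category id = {max_id} -> num_classes should be {max_id + 1}")
```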
