
New SOTA reported by original TF repo, any plans to sync-up the changes? #70

Closed
factplay1 opened this issue Aug 12, 2020 · 8 comments

@factplay1

Thanks for your good work.

https://github.com/google/automl/tree/master/efficientdet

They now report 34.3 val mAP for EffDet-D0. I'm not sure what exactly they did, but it seems they made some changes to their model definition, etc. It'd be great if you could have a look at their changes and potentially sync up your repo so it reproduces those numbers.

Thanks a million :)

@rwightman
Owner

Just weight changes from a new set of runs; it's also not clear to me what they changed for the better result. I have D0-D5 converted and evaluated, but haven't had a moment to validate D6. I'd already done the D3, D7, D7X updates, which were trained for more epochs.

@factplay1
Author

Longer epochs? How many epochs? More than 300?

On D0, I have experimented a lot with the hyper-parameters in your code; to be honest, I don't think I can reach 34.3 with hyper-parameter changes alone.

@hal-314 commented Aug 19, 2020

@factplay1 Reviewing the changes between versions 6 and 7 of the paper, it seems that they now use soft-NMS for all models instead of NMS. They didn't train for more epochs. Here is an excerpt from the paragraph (section 5.1, first paragraph, at the end):

During training, we apply horizontal flipping and scale jittering [0.1, 2.0], which randomly resizes images between 0.1x and 2.0x of the original size before cropping. We apply soft-NMS [3] for eval. For D0-D6, each model is trained for 300 epochs with total batch size 128 on 32 TPUv3 cores, but to push the envelope, we train D7/D7x for 600 epochs on 128 TPUv3 cores.

While the old version read:

We use RetinaNet [23] preprocessing with training-time flipping and scaling. For D0-D6, each model is trained for 300 epochs with total batch size 128 on 32 TPUv3 cores and evaluated with standard NMS. To further push the envelope, we train D7 for 600 epochs and apply soft-NMS [3].
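
For illustration, since the quoted change is standard NMS → soft-NMS at eval time: below is a minimal sketch of the Gaussian soft-NMS variant from Bodla et al. [3], written from the paper's description rather than taken from either repo, so treat it as an approximation of the idea, not the exact implementation. Instead of discarding boxes that overlap the current top detection, it decays their scores, which is where a small eval-time mAP gain typically comes from.

```python
import torch


def soft_nms(boxes: torch.Tensor, scores: torch.Tensor,
             sigma: float = 0.5, score_thresh: float = 0.001):
    """Gaussian soft-NMS sketch: decay overlapping scores instead of dropping boxes.

    boxes: (N, 4) in (x1, y1, x2, y2); scores: (N,).
    Returns kept indices and their decayed scores.
    """
    scores = scores.clone()  # don't modify the caller's tensor
    idxs = torch.arange(boxes.size(0))
    areas = (boxes[:, 2] - boxes[:, 0]).clamp(min=0) * (boxes[:, 3] - boxes[:, 1]).clamp(min=0)
    keep, keep_scores = [], []

    while idxs.numel() > 0:
        # select the remaining box with the highest (possibly decayed) score
        top = scores[idxs].argmax().item()
        i = idxs[top]
        if scores[i] < score_thresh:
            break
        keep.append(i.item())
        keep_scores.append(scores[i].item())
        idxs = torch.cat([idxs[:top], idxs[top + 1:]])
        if idxs.numel() == 0:
            break

        # IoU of the selected box against all remaining boxes
        xx1 = torch.maximum(boxes[i, 0], boxes[idxs, 0])
        yy1 = torch.maximum(boxes[i, 1], boxes[idxs, 1])
        xx2 = torch.minimum(boxes[i, 2], boxes[idxs, 2])
        yy2 = torch.minimum(boxes[i, 3], boxes[idxs, 3])
        inter = (xx2 - xx1).clamp(min=0) * (yy2 - yy1).clamp(min=0)
        iou = inter / (areas[i] + areas[idxs] - inter)

        # Gaussian decay: high-overlap boxes are down-weighted, not removed
        scores[idxs] = scores[idxs] * torch.exp(-(iou ** 2) / sigma)

    return torch.tensor(keep, dtype=torch.long), torch.tensor(keep_scores)
```

A smaller `sigma` makes the decay sharper (closer to hard NMS). Real implementations run per class and use vectorized/batched routines; this loop is only meant to show the scoring rule.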

@rwightman
Owner

soft-NMS is the default for the eval results on the latest models, but most of the gains aren't due to that; they're due to the retraining. So it's still not clear what they changed for D0-D6 that improved the accuracy with the same number of epochs.

@factplay1
Author

It'd be very interesting to find that out. I'll try to also investigate the difference, will report if I find anything useful.

@Naivepro1990

Hi @rwightman

How did you train D7? As far as I know, it is not possible to fit it on a 16GB GPU even with a batch size of one.

@afaq-ahmad

Hi @rwightman

How did you train D7? As far as I know, it is not possible to fit it on a 16GB GPU even with a batch size of one.

It's the same problem I am facing.

@rwightman
Owner

@afaq-ahmad @Naivepro1990 I have never tried to train a D7. With the resolution in the paper you'd need 4-8 32GB V100 or A100 cards (if that even works) and many weeks. The originals were trained with (a lot of) TPUs. I have no plans to spend $100K+ on a system or $20-50K on cloud compute, so no, there is no point. I think a few people have fine-tuned a D7 at lower resolutions for some Kaggle challenges.
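
As a rough way to check that limit on a given card, here is a sketch that measures peak CUDA memory for one forward/backward pass at a chosen resolution. It is an assumption-laden probe, not part of this repo: `build_d7_train_bench()` is a hypothetical placeholder for however you construct the training-mode model (a real detection train bench also needs targets and returns losses); the memory-stat calls themselves are standard PyTorch.

```python
import torch
import torch.nn as nn


def peak_train_memory_mb(model: nn.Module, input_size: int, batch_size: int = 1) -> float:
    """Peak CUDA memory (in MB) for one forward/backward pass at the given resolution."""
    device = torch.device('cuda')
    model = model.to(device).train()
    torch.cuda.reset_peak_memory_stats(device)

    images = torch.randn(batch_size, 3, input_size, input_size, device=device)
    out = model(images)
    # Reduce whatever the model returns to a scalar so backward() runs;
    # a real train bench would give you a loss (and would also need targets).
    if torch.is_tensor(out):
        loss = out.sum()
    elif isinstance(out, dict):
        loss = sum(v.sum() for v in out.values() if torch.is_tensor(v))
    else:
        loss = sum(o.sum() for o in out)
    loss.backward()

    return torch.cuda.max_memory_allocated(device) / (1024 ** 2)


if __name__ == '__main__':
    # `build_d7_train_bench` is a stand-in name; swap in the actual model construction.
    # model = build_d7_train_bench()
    # print(peak_train_memory_mb(model, input_size=1536))  # D7 trains at 1536x1536 per the paper
    pass
```

If the probe OOMs even at batch size one, the usual fallbacks are gradient checkpointing, mixed precision, or a lower resolution, which matches the lower-resolution fine-tuning mentioned above.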
