Is EMA used in this work? #5

Closed
JacobYuan7 opened this issue Jan 6, 2022 · 4 comments

@JacobYuan7

Hello author, thanks for your great work. I have a question about the use of Exponential Moving Average (EMA) in this paper and hope you can give me some pointers. The paper does not seem to go into detail on this. As far as I know, MDETR uses EMA and evaluates with the EMA model. Is EMA used in this work as well? If so, why should we evaluate with the EMA model rather than the original one?
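
For readers unfamiliar with the technique, here is a minimal sketch of an EMA weight update in plain Python. It is not taken from MDETR or MDef-DETR: the function name `ema_update`, the dict-of-floats weights, and the decay value are all assumptions chosen for illustration.

```python
def ema_update(ema_weights, model_weights, decay=0.999):
    """Blend the current weights into the running average, in place.

    Both arguments are hypothetical name -> float mappings standing in
    for real parameter tensors; `decay` is an illustrative value only.
    """
    for name, value in model_weights.items():
        ema_weights[name] = decay * ema_weights[name] + (1.0 - decay) * value
    return ema_weights


# Toy run: the "trained" weight jumps to 1.0 and stays there,
# while the EMA copy approaches it gradually.
weights = {"w": 0.0}
ema = dict(weights)  # the EMA copy starts as a clone of the model weights
for new_value in [1.0, 1.0, 1.0]:
    weights["w"] = new_value          # stands in for an optimizer step
    ema_update(ema, weights, decay=0.5)
```

Because the EMA copy averages over recent steps, it changes more smoothly than the raw weights, which is the usual motivation for evaluating with it.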

@mmaaz60
Owner

mmaaz60 commented Jan 6, 2022

Hi @JacobYuan7,

Thank you for your interest in this work. Like MDETR, MDef-DETR uses EMA during training, and for evaluation the weights are loaded from the original (non-EMA) model. You can test with the EMA model instead by loading the weights from checkpoint["model_ema"] rather than checkpoint["model"]; it should give almost the same results. Let me know if you have any questions.
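
As a rough sketch of the switch described above, the snippet below picks one of the two weight sets out of a checkpoint dictionary. Only the keys "model" and "model_ema" come from this thread; the helper name `select_state_dict` and the stand-in checkpoint contents are assumptions for illustration.

```python
def select_state_dict(checkpoint, use_ema=True):
    """Return the EMA weights or the original weights from a checkpoint dict."""
    key = "model_ema" if use_ema else "model"
    if key not in checkpoint:
        raise KeyError(f"checkpoint has no '{key}' entry; found {sorted(checkpoint)}")
    return checkpoint[key]


# Stand-in for a loaded checkpoint (hypothetical values, not real weights).
checkpoint = {"model": {"w": 1.0}, "model_ema": {"w": 0.9}}

ema_state = select_state_dict(checkpoint, use_ema=True)
raw_state = select_state_dict(checkpoint, use_ema=False)
```

With a real PyTorch checkpoint, the dictionary would typically come from `torch.load(path, map_location="cpu")`, and the selected state would be passed to `model.load_state_dict(...)`. The checkpoint path and the model construction are not given in this thread.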

@JacobYuan7
Author

JacobYuan7 commented Jan 7, 2022

As I understand it, MDETR uses 'model_ema' to evaluate the model, which is shown in:
https://github.com/ashkamath/mdetr/blob/bf09d98b0b41cd615185dcb0082299a5ba24c319/scripts/eval_lvis.py#L101
Correct me if I am wrong, many thanks!

BTW, the training of the language model follows MDETR, right? That is, a warmup schedule followed by a linear decrease back to zero for the rest of training.

@mmaaz60
Owner

mmaaz60 commented Jan 9, 2022

> As I understand it, MDETR uses 'model_ema' to evaluate the model, which is shown in: https://github.com/ashkamath/mdetr/blob/bf09d98b0b41cd615185dcb0082299a5ba24c319/scripts/eval_lvis.py#L101 Correct me if I am wrong, many thanks!

Hi, my apologies for the delayed reply. Yes, your understanding is correct. MDETR uses model_ema for evaluation during training and model for inference (see hubconf.py). However, I think using model_ema for inference as well would be more appropriate.

> BTW, the training of the language model follows MDETR, right? With a warmup schedule and then decrease linearly back to zero for the rest of the training.

Yes, this is the case. Further, we are planning to release the training scripts by the end of this month. Stay tuned!
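
Until the training scripts are available, a schedule of the kind described (linear warmup, then linear decay to zero) can be sketched as follows. This is an illustrative assumption, not the repository's code: the function name `lr_at_step` and the base learning rate, warmup length, and total step count used in the example are all hypothetical.

```python
def lr_at_step(step, base_lr, warmup_steps, total_steps):
    """Linear warmup to base_lr, then linear decay to zero at total_steps."""
    if total_steps <= warmup_steps:
        raise ValueError("total_steps must be greater than warmup_steps")
    if step < warmup_steps:
        # Ramp up from 0 to base_lr over the warmup phase.
        return base_lr * step / warmup_steps
    # Decay from base_lr (at the end of warmup) down to 0 (at total_steps).
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


# Hypothetical settings: 10 warmup steps out of 100, peak rate 1e-4.
schedule = [lr_at_step(s, 1e-4, 10, 100) for s in range(101)]
```

In PyTorch, a function like this is often wrapped as a multiplicative factor (with `base_lr` set to 1.0) and passed to `torch.optim.lr_scheduler.LambdaLR`.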

@JacobYuan7
Author

Sure, I will! Thx so much for your kind response.

mmaaz60 closed this as completed Feb 3, 2022