Unable to reproduce the 71+% accuracy of EVE-Image model. #7

Open
HearyShen opened this issue Apr 23, 2020 · 5 comments

Comments

@HearyShen

Since the paper's code has not been released yet, I tried implementing the EVE-Image architecture from the description in your paper.

However, reproducing the 71.16% top-1 accuracy reported in your paper has proven difficult:

  1. My standard implementation reaches no more than 68% top-1 accuracy, even with residual connections, dropout, and LayerNorm added;
  2. Replacing the SDP (scaled dot-product) attention with a Transformer encoder layer improves accuracy, but still not beyond 69%.
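For reference, here is a minimal sketch of the kind of block described in point 1: scaled dot-product self-attention followed by dropout, a residual connection, and LayerNorm. It assumes single-head attention with Q = K = V = the input and no learned projections or LayerNorm gain/bias. All function names are hypothetical, and this is not the paper's exact layer configuration.

```python
import math
import random
from typing import List, Optional

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # (n x k) @ (k x m) -> (n x m)
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def sdp_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    scale = math.sqrt(len(q[0]))
    scores = matmul(q, transpose(k))
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v)


def layer_norm(row: List[float], eps: float = 1e-5) -> List[float]:
    # Normalize one token vector; learnable gain/bias omitted in this sketch.
    mean = sum(row) / len(row)
    var = sum((x - mean) ** 2 for x in row) / len(row)
    return [(x - mean) / math.sqrt(var + eps) for x in row]


def dropout(row: List[float], p: float, training: bool, rng: random.Random) -> List[float]:
    # Inverted dropout: kept units are rescaled so inference needs no change.
    if not training or p == 0.0:
        return list(row)
    return [0.0 if rng.random() < p else x / (1.0 - p) for x in row]


def attention_block(x: Matrix, p_drop: float = 0.1, training: bool = False,
                    rng: Optional[random.Random] = None) -> Matrix:
    """Self-attention (Q = K = V = x) -> dropout -> residual add -> LayerNorm."""
    rng = rng or random.Random(0)
    attended = sdp_attention(x, x, x)
    return [layer_norm([xi + ai for xi, ai in zip(x_row, dropout(a_row, p_drop, training, rng))])
            for x_row, a_row in zip(x, attended)]
```

A Transformer encoder layer (for example PyTorch's `torch.nn.TransformerEncoderLayer`) additionally applies learned multi-head Q/K/V projections and a position-wise feed-forward sublayer with its own residual connection and LayerNorm. These are the main structural differences from the bare block above.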

Would you please release the code for your paper?

@farleylai
Contributor

farleylai commented May 16, 2020

Hi,

We are currently short-handed for maintaining a past intern's work and cannot guarantee a release date. Nonetheless, follow-up works listed on the leaderboard, based on similar models and even on transformers, achieve better results. Those should serve as the SOTA baselines for your research on our dataset.

Good luck.

@furukawayuan-Yao

Could you please share how many epochs you trained the model for? The paper suggests 100 epochs as the maximum, but the model I reproduced converges within very few epochs. 😥

@farleylai
Contributor

@furukawayuan-Yao
100 epochs is just a reference.
We have seen results converge between 3x and 7x epochs.
However, this depends heavily on the architecture and implementation.
Without the details, we cannot say what to expect from your model, but you may refer to the recent SOTA baselines on the leaderboard.
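If the goal is to stop once the model has converged rather than always running the full 100 epochs, early stopping on validation accuracy is a common approach. Below is a minimal sketch, assuming a hypothetical `run_epoch(epoch)` callback that trains for one epoch and returns validation accuracy. The `patience` and `min_delta` defaults are illustrative assumptions, not settings from the paper or this thread.

```python
from typing import Callable, List, Tuple


def train_with_early_stopping(
    run_epoch: Callable[[int], float],
    max_epochs: int = 100,
    patience: int = 5,
    min_delta: float = 0.0,
) -> Tuple[int, float, List[float]]:
    """Run up to max_epochs, stopping once validation accuracy has not improved
    by more than min_delta for `patience` consecutive epochs.

    run_epoch(epoch) is a hypothetical callback: train one epoch, return val accuracy.
    Returns (best_epoch, best_accuracy, per-epoch accuracy history).
    """
    best_acc = float("-inf")
    best_epoch = -1
    history: List[float] = []
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        acc = run_epoch(epoch)
        history.append(acc)
        if acc > best_acc + min_delta:
            best_acc, best_epoch = acc, epoch
            epochs_without_improvement = 0
            # In a real run, checkpoint the model weights here.
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_epoch, best_acc, history
```

With a plateauing accuracy curve this stops a few epochs after the best one, so the reported epoch count reflects convergence rather than the 100-epoch upper bound.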

@HearyShen
Author

FYI, by extracting the image feature map from the raw image and bilinearly interpolating it to a fixed size, I recently got 70.86% test accuracy under the same experimental setting described in your paper, which is close to the 71.16% reported there.
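Here is a minimal sketch of the resizing step described above. It assumes a single feature-map channel stored as a nested list and the align-corners convention, where the corner samples of the input and output coincide. The comment does not state the backbone, the target size, or which convention was used, so `bilinear_resize` and its parameters are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def bilinear_resize(fmap: Grid, out_h: int, out_w: int) -> Grid:
    """Resize one H x W feature-map channel to out_h x out_w by bilinear
    interpolation, aligning corner samples (align-corners convention)."""
    in_h, in_w = len(fmap), len(fmap[0])

    def src_coord(dst: int, in_size: int, out_size: int) -> float:
        # Map an output index to a (possibly fractional) input coordinate.
        if out_size == 1:
            return 0.0
        return dst * (in_size - 1) / (out_size - 1)

    out: Grid = []
    for i in range(out_h):
        y = src_coord(i, in_h, out_h)
        y0 = int(y)                  # floor, since y >= 0
        y1 = min(y0 + 1, in_h - 1)   # clamp at the bottom edge
        wy = y - y0
        row: List[float] = []
        for j in range(out_w):
            x = src_coord(j, in_w, out_w)
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            # Interpolate horizontally on the two neighbouring rows, then vertically.
            top = (1 - wx) * fmap[y0][x0] + wx * fmap[y0][x1]
            bottom = (1 - wx) * fmap[y1][x0] + wx * fmap[y1][x1]
            row.append((1 - wy) * top + wy * bottom)
        out.append(row)
    return out
```

For a C x H x W feature map, apply it to each channel. PyTorch's `torch.nn.functional.interpolate(..., mode="bilinear")` uses `align_corners=False` by default, which samples slightly different source positions and therefore gives different values from this sketch.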

@farleylai
Contributor

@HearyShen Glad to hear that!
Some difference within a reasonable range is to be expected, given the feature extractor and object detector used.
One may also feed the detection boxes to a fine-tuned feature extractor, or simply train everything end-to-end.
Either approach is legitimate as long as you evaluate the related work the same way, for a fair comparison.
Good luck.
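One way to read "feed the detection boxes to a feature extractor" is to pool a region feature from the CNN feature map inside each detected box. Below is a minimal sketch. It assumes a known image-to-feature-map stride (the `stride` parameter is hypothetical) and plain average pooling over the covered cells. It is a simplified stand-in for RoI pooling, not the method used in the paper.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[float]]
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels


def box_to_cells(box: Box, stride: int, fh: int, fw: int) -> Tuple[int, int, int, int]:
    """Map an image-space box to feature-map cell ranges (r0, r1, c0, c1), end-exclusive."""
    x1, y1, x2, y2 = box
    c0 = max(0, min(fw - 1, int(x1 // stride)))
    r0 = max(0, min(fh - 1, int(y1 // stride)))
    # Round the far edge up, and always keep at least one cell.
    c1 = max(c0 + 1, min(fw, math.ceil(x2 / stride)))
    r1 = max(r0 + 1, min(fh, math.ceil(y2 / stride)))
    return r0, r1, c0, c1


def pool_box(channels: Sequence[Grid], box: Box, stride: int) -> List[float]:
    """Average-pool every channel of a C x H x W feature map inside one box."""
    fh, fw = len(channels[0]), len(channels[0][0])
    r0, r1, c0, c1 = box_to_cells(box, stride, fh, fw)
    n = (r1 - r0) * (c1 - c0)
    return [sum(ch[r][c] for r in range(r0, r1) for c in range(c0, c1)) / n
            for ch in channels]
```

In practice, RoIAlign (for example `torchvision.ops.roi_align`) is the more common choice, since it bilinearly samples each box at sub-cell precision instead of snapping to whole cells.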
