
Neural Abstract Reasoner: claims that they have achieved 80% accuracy on this dataset #82

Closed
bransGl opened this issue Mar 29, 2021 · 5 comments

Comments

@bransGl

bransGl commented Mar 29, 2021

I have found the Neural Abstract Reasoner paper: https://arxiv.org/pdf/2011.09860.pdf. They claim about 80% accuracy on this dataset. @fchollet, do you think they really achieved 80% accuracy on such a hard dataset? Looks suspicious to me. I can't find their code, any Kaggle submissions, or references to their paper.

@jmmcd
Contributor

jmmcd commented Mar 29, 2021

"As this work is still in progress, these are preliminary results evaluated on grids up to 10×10."

I guess this 80% is only on small grids, which are disproportionately the "easy" tasks involving mere rotation/mirroring.
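
One quick way to sanity-check how much of the corpus that restriction actually covers would be to count the tasks whose grids all fit within 10×10. A minimal sketch (not from the paper), assuming the data/training/*.json layout of this repo, where each task file holds "train"/"test" lists of input/output grids:

```python
# Sketch: count how many ARC training tasks only use grids up to 10x10,
# to gauge how much of the corpus a "<= 10x10" evaluation covers.
# Assumes the fchollet/ARC repo layout: data/training/*.json, each file a
# task with "train" and "test" lists of {"input": grid, "output": grid}.
import json
from pathlib import Path

def fits_10x10(task):
    """True if every input/output grid in the task is at most 10x10."""
    for split in ("train", "test"):
        for pair in task[split]:
            for grid in (pair["input"], pair["output"]):
                if len(grid) > 10 or any(len(row) > 10 for row in grid):
                    return False
    return True

task_files = sorted(Path("data/training").glob("*.json"))
small = sum(fits_10x10(json.loads(p.read_text())) for p in task_files)
print(f"{small}/{len(task_files)} training tasks use only grids <= 10x10")
```

Running the same count over data/evaluation would show how the restriction affects the public evaluation set.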

@enceladus2000

Has anyone tried implementing this paper? I can't find a working demonstration anywhere.

@hassanshallal

hassanshallal commented Feb 27, 2022 via email

@Sebastian-0

I think there are several strange things about their paper.

  • In a later presentation (2021) they say "NAR achieves 61.13% accuracy on the Abstraction and Reasoning Corpus" with no further explanation. Is that measured on the entire dataset (i.e. all grid sizes)? Otherwise, why is the number different? Their poster uses the exact same graphs as motivation as the old article, yet reports different numbers as the result, see: https://eucys2021.usal.es/computing-03-2021/
  • As far as I understand it, they evaluate on the public test set, yet they compare against the Kaggle competition, which ran on completely different, hidden tasks.
  • They claim to solve 78.8% of 100 hidden tasks but don't explain how they get the .8 when the tests are binary.
  • There is no discussion of the impact of excluding all larger grids, which is especially relevant when comparing against the Kaggle competition.
  • There is no source code, no one (AFAIK) has reproduced the results, and there is no official benchmark against the hidden test set.

It's possible they have devised an approach that is better than the previous state-of-the-art, but at this point I find it hard to take their numbers at face value.

@hassanshallal

hassanshallal commented Mar 3, 2022 via email
