
Neural Abstract Reasoner: claims that they have achieved 80% accuracy on this dataset #82

Closed
bransGl opened this issue Mar 29, 2021 · 5 comments

Comments

@bransGl

bransGl commented Mar 29, 2021

I have found the Neural Abstract Reasoner paper: https://arxiv.org/pdf/2011.09860.pdf. They claim about 80% accuracy on this dataset. @fchollet, do you think they really achieved 80% accuracy on such a hard dataset? Looks suspicious to me. I can't find their code, any Kaggle submissions, or references to their paper.

@jmmcd
Contributor

jmmcd commented Mar 29, 2021

"As this work is still in progress, these are preliminary results evaluated on grids up to 10×10."

I guess this 80% is only on small grids, which are disproportionately the "easy" tasks involving mere rotation/mirroring.
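
One quick way to sanity-check how much of the corpus that restriction actually covers would be to count the tasks whose grids all fit within 10×10. A minimal sketch (not from the paper), assuming the data/training/*.json layout of this repo, where each task file holds "train"/"test" lists of input/output grids:

```python
# Sketch: count how many ARC training tasks only use grids up to 10x10,
# to gauge how much of the corpus a "<= 10x10" evaluation covers.
# Assumes the fchollet/ARC repo layout: data/training/*.json, each file a
# task with "train" and "test" lists of {"input": grid, "output": grid}.
import json
from pathlib import Path

def fits_10x10(task):
    """True if every input/output grid in the task is at most 10x10."""
    for split in ("train", "test"):
        for pair in task[split]:
            for grid in (pair["input"], pair["output"]):
                if len(grid) > 10 or any(len(row) > 10 for row in grid):
                    return False
    return True

task_files = sorted(Path("data/training").glob("*.json"))
small = sum(fits_10x10(json.loads(p.read_text())) for p in task_files)
print(f"{small}/{len(task_files)} training tasks use only grids <= 10x10")
```

Running the same count over data/evaluation would show how the restriction affects the public evaluation set.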

@enceladus2000

Has anyone tried implementing this paper? I can't find a working demonstration anywhere.

@hassanshallal

hassanshallal commented Feb 27, 2022 via email

@Sebastian-0

I think there are several strange things about their paper.

  • In a later presentation (2021) they say "NAR achieves 61.13% accuracy on the Abstraction and Reasoning Corpus" with no further explanation. Is that measured on the entire dataset (i.e. all grid sizes)? Otherwise, why is the number different? Their poster uses the exact same graphs as motivation as the old article, yet reports different numbers as the result, see: https://eucys2021.usal.es/computing-03-2021/
  • As far as I understand it, they evaluate on the public test set, yet they compare against the Kaggle competition, which ran on completely different, hidden tasks.
  • They claim to solve 78.8% of 100 hidden tasks but don't explain how they get the .8 when the tests are binary.
  • There is no discussion of the impact of excluding all larger grids, which is especially relevant when comparing against the Kaggle competition.
  • There is no source code, no one (AFAIK) has reproduced the results, and there is no official benchmark against the hidden test set.

It's possible they have devised an approach that is better than the previous state-of-the-art, but at this point I find it hard to take their numbers at face value.

@hassanshallal

hassanshallal commented Mar 3, 2022 via email
