Fathom datasets should be open, public and freely available #28

tobigithub · 2017-05-19T00:48:51Z

Hi,
just my 2 cents: the datasets needed to run this benchmark should be public, open and freely available. Currently, some of the proposed sets are not: http://fathom.readthedocs.io/en/latest/quickstart/#downloading-data

Telling users to obtain Atari ROMs (potentially illegally), when Atari just recently filed copyright claims against several developers (https://www.google.com/search?q=Atari+ROM+copyright), creates a problem not only for developers but also for users.

Also, multiple datasets require signing licenses or logging in. That is usually counterproductive and tedious, and it limits the user base. As a solution, one could use synthetic datasets, sets that are broadly available, or maybe smaller sets that are simply duplicated. The benchmark will probably perform fine, though of course without a renowned dataset. I am sure that is not always possible, and coming from a different field I cannot propose any good replacements, but I think it's a valid thought.

svrama · 2017-05-19T01:31:05Z

Hi @tobigithub, thanks for your insightful feedback.

You raise many good points. Because Fathom's goal is to provide a suite of well-known models for profiling, we've sometimes had to grit our teeth and inconvenience users in obtaining the necessary datasets.

Regarding the Atari ROMs specifically, I'm not a lawyer, but I believe their use for academic research falls under fair use. That said, I agree that it would be nice for there to be a canonical task for reinforcement learning which does not skirt copyright law.

Unfortunately, several of the datasets (e.g., ImageNet, LDC datasets) do require registration, sometimes with a fee. While this is not ideal, it is the reality of obtaining the datasets considered standard in the machine learning community.

Adding synthetic datasets has been on our todo list for a while, but since most users had access to these canonical datasets already, we haven't prioritized that. If you have any specific needs which require synthetic datasets, let us know and we can try to set something up for you.

svrama closed this as completed May 19, 2017
