Download all datasets contained in all R-packages #185
Comments
mllg added the type-enhancement label Mar 17, 2016
mllg self-assigned this Mar 17, 2016
jakobbossek added prio-low and removed type-enhancement labels Mar 17, 2016
mllg assigned giuseppec and unassigned mllg Mar 17, 2016
mllg added the type-enhancement label Mar 17, 2016
I asked on Twitter whether there are ways to do this without having to install the packages. This is the best answer I got: Seems pretty promising.
Thanks Heidi!
    devtools::install_github("gaborcsardi/gh")
    library(gh)
    library(BBmisc)  # provides catf()
    # Search the GitHub CRAN mirror (user "cran") for .rda files
    repos = gh("GET /search/code?q=user:cran+extension:rda")
    catf("#Repos: %i", repos$total_count)
This way we can download the rda files only, e.g., via
Another point: why should the crawler avoid downloading all packages? Is it because of time and memory? We can simply download each package, extract the data sets, upload them to OpenML, and remove the package afterwards. The time aspect is unimportant: the crawler does not need to be fast.
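The download, extract, remove loop described above could look roughly like the following. This is only a minimal sketch, assuming source tarballs from a CRAN mirror and data sets stored as `.rda`/`.RData` files under `data/`; `extract_datasets` and `crawl_package` are hypothetical helper names, the mirror URL is an assumption, and the OpenML upload is left as a placeholder comment.

```r
# Load every .rda/.RData file found in the data/ directory of an unpacked
# package and return the contained objects as a named list.
extract_datasets = function(pkg_dir) {
  files = list.files(file.path(pkg_dir, "data"),
    pattern = "\\.(rda|RData)$", full.names = TRUE, ignore.case = TRUE)
  out = list()
  for (f in files) {
    env = new.env()
    # load() returns the names of the objects it restored into env
    for (nm in load(f, envir = env)) out[[nm]] = get(nm, envir = env)
  }
  out
}

# Download one package as a source tarball, extract its data sets, and
# delete the downloaded files afterwards. Needs network access.
# (hypothetical helper; the default mirror is an assumption)
crawl_package = function(pkg, repos = "https://cloud.r-project.org") {
  tmp = tempfile("crawl_")
  dir.create(tmp)
  on.exit(unlink(tmp, recursive = TRUE))  # remove the package again
  dl = download.packages(pkg, destdir = tmp, repos = repos, type = "source")
  if (nrow(dl) == 0) stop("download failed for package: ", pkg)
  untar(dl[1, 2], exdir = tmp)
  datasets = extract_datasets(file.path(tmp, pkg))
  # ... upload each element of `datasets` to OpenML here ...
  datasets
}
```

Data shipped in other formats (for example `.csv` or `.R` files in `data/`) would need extra handling that this sketch does not attempt.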
Started to work on a crawler which operates on the GitHub CRAN repositories and reads 1) the data itself and 2) the metadata from the corresponding Rd file. Works well so far. I just need to parallelize it and handle potential errors.
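The crawler itself is not shown in this thread. As an illustration of the metadata step only, reading the title and description from an Rd file might be sketched as below, using `tools::parse_Rd` from base R. `rd_field` and `read_rd_metadata` are hypothetical names, and the choice of `\title` and `\description` as the fields of interest is an assumption.

```r
# Return the text of one top-level Rd section (e.g. "\\title"),
# or NA if the section is missing.
rd_field = function(rd, tag) {
  tags = vapply(rd, function(x) {
    t = attr(x, "Rd_tag")
    if (is.null(t)) "" else t
  }, character(1))
  hit = which(tags == tag)
  if (length(hit) == 0) return(NA_character_)
  # Flatten nested markup into plain text and trim surrounding whitespace
  trimws(paste(unlist(rd[[hit[1]]]), collapse = ""))
}

# Collect title and description for a data set from its Rd file.
read_rd_metadata = function(rd_file) {
  rd = tools::parse_Rd(rd_file)
  list(
    title = rd_field(rd, "\\title"),
    description = rd_field(rd, "\\description")
  )
}
```

For the error handling mentioned above, one option is to wrap each call in `tryCatch` so that a single malformed Rd file does not stop the whole run.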
cvitolo commented May 25, 2017
@jakobbossek I'd love to see the results of your crawler/experiment. Did you publish it?
A huge collection can be found here: http://vincentarelbundock.github.io/Rdatasets/datasets.html
giuseppec commented Mar 17, 2016
We can do something like (ugly code) and then upload everything