Discovering good data packages
- Andy Teucher @ateucher
- Richie Cotton @richierocks
- Claudia Vitolo @cvitolo
- Jakub Nowosad @Nowosad
- Joe Stachelek @jsta
Most of us teach R in some capacity, and finding suitable datasets to teach with is a recurring struggle, especially across domains of expertise. Many packages contain data, but inadequate documentation makes it hard to find those packages and to know what is in them.
- Make it easy to discover suitable data
- Write some guidance on documenting data in packages
- A Google Doc describing best practices for documenting data.
Checklist of things to document.
Make sure your documentation answers as many of these questions as possible.
- What does the data represent?
- What format is the data in?
- How big is the dataset?
- Where does the data come from?
- How has the data been processed?
- What does the data look like?
- How do you analyze the data?
- Where is this data used?
- Is there a paper, or other external resource discussing this dataset?
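As an illustration of how the checklist above might map onto a package's documentation, here is a minimal roxygen2 sketch. The dataset name `my_dataset`, its columns, and all descriptive text are hypothetical placeholders; only the roxygen2 tags (`@format`, `@source`, `@references`, `@examples`) are real.

```r
# R/data.R: roxygen2 documentation for a dataset shipped in a package.
# `my_dataset`, its columns, and every description below are hypothetical
# placeholders, to be replaced with details of the real data.

#' One-line title saying what the data represent
#'
#' A longer description: what was measured, where the data come from,
#' how they were processed before being added to the package, and where
#' the data are used.
#'
#' @format A data frame with 100 rows and 2 columns:
#' \describe{
#'   \item{id}{Integer identifier of the observation.}
#'   \item{value}{Numeric measurement, in the units you state here.}
#' }
#' @source Name or URL of the original data provider.
#' @references Citation for a paper or other resource discussing the data.
#' @examples
#' # Show what the data look like and one way to analyze them
#' head(my_dataset)
#' summary(my_dataset$value)
"my_dataset"
```

The title and description cover what the data represent and how they were processed, `@format` covers format and size, `@source` and `@references` cover provenance and related publications, and `@examples` shows what the data look like and how they might be analyzed.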
A patch for usethis::use_readme_rmd() to display datasets in package README files.
A flexdashboard with a searchable table that shows metadata on datasets from many CRAN packages. It has information for over 4000 datasets.
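The sketch below shows one way such per-dataset metadata could be gathered for a single installed package, using only base R. It is not the code behind the dashboard; the helper name `dataset_metadata` and the choice of columns are assumptions made for illustration.

```r
# Collect basic metadata for every dataset shipped with an installed package.
# `dataset_metadata` is a hypothetical helper, not part of any package.
dataset_metadata <- function(pkg) {
  # data(package = ) returns an object whose $results matrix has the
  # columns "Package", "LibPath", "Item", and "Title".
  items <- utils::data(package = pkg)$results
  item <- items[, "Item"]

  # An item such as "beaver1 (beavers)" means object `beaver1` is stored
  # in the data file `beavers`; otherwise object and file share a name.
  objects <- sub(" \\(.*\\)$", "", item)
  files <- ifelse(grepl("(", item, fixed = TRUE),
                  sub("^.*\\((.*)\\)$", "\\1", item),
                  objects)

  rows <- lapply(seq_along(objects), function(i) {
    # Load into a private environment so the workspace is left untouched.
    env <- new.env()
    suppressWarnings(utils::data(list = files[i], package = pkg, envir = env))
    obj <- if (exists(objects[i], envir = env, inherits = FALSE)) {
      get(objects[i], envir = env)
    } else {
      NULL
    }
    d <- dim(obj)
    data.frame(
      package = pkg,
      name    = objects[i],
      title   = items[i, "Title"],
      class   = if (is.null(obj)) NA_character_ else class(obj)[1],
      rows    = if (is.null(obj)) NA_integer_ else if (is.null(d)) length(obj) else d[1],
      cols    = if (is.null(d)) NA_integer_ else d[2],
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

meta <- dataset_metadata("datasets")
head(meta)
```

Running the helper over many packages and binding the results together would give a table of the kind that can be searched and filtered.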
The state of data on CRAN
Installing and loading packages
What makes a good data package?
Potential Future Work
Additional Data Sources
Additional Package Stats
- Use GitHub URLs to pull geo-location of package maintainers
Scoring the quality of data in a package
Creating badges to advertise data quality
Contacting authors of packages with data quality deficiencies