Data for Data Science
Researchers face a growing data management challenge that starts with data collection and continues through data analysis, publication, and archival. Problems research labs may face include scaling their data management methods to many and/or very large data files, fully documenting data and its organization, and meeting the data sharing requirements of grants and publications. This four-week course introduces attendees to best practices in data organization and management. Each one-hour class will include lecture, discussion, and practice exercises. This course assumes no prior training in data science. By the end of this course, you will be able to identify resources at Fred Hutch for data management and apply best practices in data organization to your own research projects.
Software requirements for this course can be found on fredhutch.io's Software page.
- Week 1: Data entry and creating spreadsheets
- Week 2: Organizing data and project files
- Week 3: Documenting data with metadata
- Week 4: Data manipulation and reproducibility

Each week of class has a directory containing relevant materials, including:
- WeekX_Topic.pdf: PDF of slides for presentation, where X indicates the week of class
- weekX_instructor.md: notes to guide instructor presentation and activity engagement