Figure 1: The data life cycle. Image from “R for Data Science”
Figure 1 shows a model of the data life cycle from the free online book “R for Data Science”. The whirlpool in the middle, besides making you dizzy, I’m sure, is one of the most rewarding parts of the process: data exploration! There are some important take-home points from the “Explore” portion of the diagram:
- Data might require transformation before proper visualization.
- It is best to visualize your data before you do statistical analysis (also known as statistical modeling).
- Rarely, if ever, will you complete this cycle only once.
- Iterate towards perfection. In other words, don’t worry about perfectly formatted, publication-quality graphs and results while you explore. Exploration should be full of fun and curiosity. Save the publication-quality results for the “Communicate” step.
In this tutorial, we will introduce you to the tidyverse and start with the Transform part of the life cycle, exploring the tidyverse data manipulation (also known as data wrangling) package dplyr.
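To give a flavour of what a Transform step can look like, here is a minimal sketch using dplyr. It assumes the built-in `mtcars` dataset as a stand-in for your own data. The column name `kpl`, the result name `summary_by_cyl`, and the 20 mpg cutoff are illustrative choices, not part of the tutorial's exercises.

```r
library(dplyr)

# mtcars ships with R; it stands in here for whatever data you are exploring.
summary_by_cyl <- mtcars %>%
  # Keep only the more fuel-efficient cars (the cutoff of 20 is arbitrary).
  filter(mpg > 20) %>%
  # Add a derived column: miles per US gallon converted to km per litre.
  mutate(kpl = mpg * 0.425144) %>%
  # Summarise within each cylinder count.
  group_by(cyl) %>%
  summarise(
    n = n(),
    mean_kpl = mean(kpl),
    mean_hp = mean(hp)
  )

print(summary_by_cyl)
```

Each verb takes a data frame and returns a new one, so steps can be added, removed, or reordered as you iterate through the Explore loop.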
Quarto Live provides WebAssembly-powered code blocks and exercises for Quarto HTML documents. Quarto is the “next generation” of R Markdown. Quarto Live uses webR, which runs R code directly in the browser with no need for a server.
- This course is based on R for Data Science: Data Transformation
This project is licensed under the MIT License - see the LICENSE file for details.