Final project for the Coursera course "Getting and Cleaning Data"
This repository contains this README.md file, along with four key items of interest:
- An R script used to read and tidy the input data (run_analysis.R)
- The full tidy data set (tidy_data.csv)
- A summarized, tidy data set (tidy_data_summary.csv)
- A code book (codebook.md) describing the two tidy data sets listed above
The original data for this project can be found here, and a description of how the data was obtained and why it could be useful can be found here. The description below assumes that you have read and are familiar with that information.
This project set out to fulfill, and does fulfill, the following requirements:
- The submitted data set is tidy.
- The GitHub repo contains the required scripts.
- GitHub contains a code book that modifies and updates the available codebooks with the data to indicate all the variables and summaries calculated, along with units, and any other relevant information.
- The README that explains the analysis files is clear and understandable.
- The work submitted for this project is the work of the student who submitted it.
The key to this assignment was transforming the data so that it meets the three criteria of "tidy data" set out by Hadley Wickham:
- Each variable forms a column.
- Each observation forms a row.
- Each type of observational unit forms a table.
The transformations required to get our data in a tidy form were of three types:
- Bind columns, to pull descriptive variables into the same table
- Merge tables, to substitute descriptive names for numeric codes
- Reshape the 561 measurement columns into many rows, with two columns (a key and a value) holding that information
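The first two transformations (binding columns and merging a lookup table) can be sketched in base R as below. This is a minimal illustration on hypothetical toy data, not an excerpt from run_analysis.R: the data frames, column names (`f1`, `f2`, `activity_id`) and activity labels are all invented for the example.

```r
# Hypothetical toy inputs standing in for the project's real files:
# a measurement table, per-row subject IDs and per-row activity codes.
measurements   <- data.frame(f1 = c(0.28, 0.27, 0.26),
                             f2 = c(-0.02, -0.01, -0.03))
subjects       <- data.frame(subject = c(1, 1, 2))
activity_codes <- data.frame(activity_id = c(5, 4, 5))

# Hypothetical lookup table mapping numeric codes to descriptive labels.
activity_labels <- data.frame(activity_id = c(4, 5),
                              activity = c("SITTING", "STANDING"))

# 1. Bind columns: place the descriptive variables next to the measurements.
combined <- cbind(subjects, activity_codes, measurements)

# 2. Merge: join on the numeric code to attach the descriptive name.
#    merge() sorts rows by the join key, so reorder explicitly afterwards.
labelled <- merge(combined, activity_labels, by = "activity_id")
labelled <- labelled[order(labelled$subject, labelled$activity), ]
labelled$activity_id <- NULL  # the numeric code is no longer needed
print(labelled)
```

Sorting after `merge()` is a design choice for readability; `merge()` otherwise returns rows ordered by the join key rather than in the original row order.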
The first two transformations are straightforward, but the third requires the gather() and extract_numeric() functions from the tidyr package, which is well suited to this exact purpose.
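The reshaping step turns one column per measurement into one row per (observation, measurement) pair. The sketch below reproduces what a `gather()` call does, using base R only so that it runs without tidyr installed. The toy data frame `wide` and its column names are hypothetical, and the `extract_numeric()` step is not shown because this README does not say which column it is applied to.

```r
# Hypothetical wide table: identifier columns plus one column per measurement.
wide <- data.frame(subject         = c(1, 2),
                   activity        = c("STANDING", "SITTING"),
                   tBodyAcc_mean_X = c(0.28, 0.27),
                   tBodyAcc_std_X  = c(-0.99, -0.98))

# With tidyr, the equivalent call would be:
#   tidyr::gather(wide, feature, value, -subject, -activity)
id_cols      <- c("subject", "activity")
measure_cols <- setdiff(names(wide), id_cols)

# Repeat the identifier rows once per measurement column, then stack the
# measurement columns one after another, as gather() does.
long <- data.frame(
  wide[rep(seq_len(nrow(wide)), times = length(measure_cols)), id_cols],
  feature   = rep(measure_cols, each = nrow(wide)),
  value     = unlist(wide[measure_cols], use.names = FALSE),
  row.names = NULL
)
print(long)
```

The result has two rows per subject here (one per measurement column); with the real data, each observation would expand into 561 rows. Note that recent tidyr releases mark `gather()` as superseded by `pivot_longer()` and deprecate `extract_numeric()` in favour of `readr::parse_number()`.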