This is the repository I made for the Coursera course Getting and Cleaning Data. This README.md explains how I got to the (hopefully) tidy data set I submitted.
I started from the data set provided for the course project (the project description explains how that data was originally collected). The steps below are essentially the comments from my run_analysis.R file:

1. Ingest all of the label data found in `features.txt` and `activity_labels.txt`.
2. Read in the training data sets (`subject_train.txt`, `X_train.txt` and `Y_train.txt`).
3. Name the columns of `X_train` using the feature labels from step 1.
4. Name the remaining unnamed columns: the `Y_train` data becomes "activity" and the `subject_train` data becomes "subject".
5. Use `cbind` to append the activity and subject columns to the end of the `X_train` data.
6. Repeat steps 2-5 for the test data sets.
7. Use `rbind` to combine the newly built training and test data sets into one full set.
8. To keep only the mean and standard deviation columns (plus the newly created "activity" and "subject" columns), use a regular expression that keeps only the columns whose names match ".mean()", ".std()", "activity" or "subject".
9. Use the activity labels loaded in step 1 to replace the numeric activity codes with their descriptive names.
10. Use a melt function to reshape the data set into long format, with subject and activity as the identifier variables.
11. Apply an aggregate mean to the melted data frame, giving the mean of each measurement for every subject and activity combination.
12. Write the resulting table to a file called "courseproject.txt".
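
The reading and naming steps (1-6) could look roughly like this in R. This is a minimal sketch, not the code from run_analysis.R: it runs on a tiny toy data set written to a temporary directory, `build_split()` is a hypothetical helper, and the file names follow the ones listed above (check the letter case against your own copy of the data).

```r
# Minimal sketch of steps 1-6 on toy files (not the real data set).
# build_split() is a hypothetical helper; the file names follow this README
# (X_<part>.txt, Y_<part>.txt, subject_<part>.txt).
build_split <- function(root, part, feature_names) {
  path <- function(prefix) file.path(root, part, paste0(prefix, part, ".txt"))
  x <- read.table(path("X_"))
  y <- read.table(path("Y_"))
  s <- read.table(path("subject_"))
  names(x) <- feature_names  # step 3: feature labels become column names
  names(y) <- "activity"     # step 4: name the activity column...
  names(s) <- "subject"      # ...and the subject column
  cbind(x, y, s)             # step 5: append activity and subject at the end
}

# Write a tiny data set in the same layout to a temporary directory.
root <- file.path(tempdir(), "toy_har")
for (part in c("train", "test")) {
  dir.create(file.path(root, part), recursive = TRUE, showWarnings = FALSE)
}
writeLines(c("1 tBodyAcc-mean()-X", "2 tBodyAcc-std()-X"),
           file.path(root, "features.txt"))
writeLines(c("0.1 0.2", "0.3 0.4"), file.path(root, "train", "X_train.txt"))
writeLines(c("1", "2"), file.path(root, "train", "Y_train.txt"))
writeLines(c("1", "1"), file.path(root, "train", "subject_train.txt"))
writeLines("0.5 0.6", file.path(root, "test", "X_test.txt"))
writeLines("3", file.path(root, "test", "Y_test.txt"))
writeLines("2", file.path(root, "test", "subject_test.txt"))

# Step 1: the second column of features.txt holds the feature names.
feature_names <- as.character(read.table(file.path(root, "features.txt"))$V2)
train <- build_split(root, "train", feature_names)
test <- build_split(root, "test", feature_names)  # step 6: same steps for test
```

Wrapping steps 2-5 in one function is just a way to avoid writing the same code twice for the training and test sets.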
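
Steps 7 and 8 (combining the sets and keeping only the mean/std columns) can be sketched in base R as below. The data frames are toy stand-ins with hypothetical values. The pattern here is an assumption of this sketch rather than the exact one from run_analysis.R: it escapes the parentheses so that "mean()" and "std()" are matched literally, which also leaves out the meanFreq() columns.

```r
# Toy stand-ins for the per-split data frames from steps 2-6
# (hypothetical values; column names in the same style as features.txt).
train <- data.frame(`tBodyAcc-mean()-X` = c(0.1, 0.3),
                    `tBodyAcc-std()-X` = c(0.2, 0.4),
                    `tBodyAcc-meanFreq()-X` = c(0.7, 0.8),
                    activity = c(1, 2), subject = c(1, 1),
                    check.names = FALSE)
test <- data.frame(`tBodyAcc-mean()-X` = 0.5,
                   `tBodyAcc-std()-X` = 0.6,
                   `tBodyAcc-meanFreq()-X` = 0.9,
                   activity = 3, subject = 2,
                   check.names = FALSE)

full <- rbind(train, test)  # step 7: stack rows; columns are matched by name

# Step 8: keep the mean()/std() features plus the two id columns.
# Escaped parentheses match literally, so meanFreq() columns are dropped.
keep <- grepl("-(mean|std)\\(\\)|^activity$|^subject$", names(full))
selected <- full[, keep]
```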
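
Step 9 is a lookup from numeric code to label. A minimal sketch, assuming `activity_labels.txt` has the code in its first column and the label in its second; the labels below are a hypothetical toy subset read from a string instead of the real file, and `match()` is one way to do the lookup (a factor with `levels`/`labels` would work too).

```r
# Toy version of activity_labels.txt: numeric code in V1, label in V2
# (hypothetical subset of labels, read from a string instead of a file).
activity_labels <- read.table(text = "1 WALKING\n2 SITTING\n3 STANDING")

df <- data.frame(subject = c(1, 1, 2), activity = c(2, 1, 2))

# Find each numeric code's position in V1 and take the label at that position.
df$activity <- as.character(activity_labels$V2)[match(df$activity, activity_labels$V1)]
print(df)
```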
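
Steps 10 and 11 together produce one row per subject and activity, holding the mean of every measurement. The sketch below swaps in base R's `aggregate()` for the melt step, which gives a table of that shape (the one described in the Code Book) in a single call. With the reshape2 package, the melt-based route would be `melt(df, id.vars = c("subject", "activity"))` followed by `dcast(molten, subject + activity ~ variable, mean)`. Column names and values here are toy data.

```r
# Toy input after steps 1-9: id columns plus two measurement columns
# (hypothetical names and values).
df <- data.frame(subject = c(1, 1, 1, 2),
                 activity = c("WALKING", "WALKING", "SITTING", "WALKING"),
                 tBodyAccMeanX = c(0.2, 0.4, 0.5, 0.9),
                 tBodyAccStdX = c(1, 3, 2, 4))

# Every column that is not an id column is a measurement to average.
measures <- setdiff(names(df), c("subject", "activity"))

# One row per subject/activity pair, one mean per measurement column.
tidy <- aggregate(df[measures],
                  by = list(subject = df$subject, activity = df$activity),
                  FUN = mean)
print(tidy)
```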
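
Step 12 is a single `write.table()` call. This sketch assumes `row.names = FALSE` (so the file has no extra row-name column) and writes to a temporary directory rather than the working directory; reading the file back with `header = TRUE` recovers the same columns.

```r
# Toy tidy data set standing in for the output of step 11.
tidy <- data.frame(subject = c(1, 2),
                   activity = c("WALKING", "SITTING"),
                   tBodyAccMeanX = c(0.3, 0.9))

# Written to a temporary directory here; step 12 writes "courseproject.txt".
# row.names = FALSE is an assumption of this sketch.
out <- file.path(tempdir(), "courseproject.txt")
write.table(tidy, out, row.names = FALSE)

# Reading it back with header = TRUE recovers the same columns.
back <- read.table(out, header = TRUE)
print(back)
```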

### Code Book

- `subject`: a numeric ID of the test subject.
- `activity`: a description of what the test subject was doing when the measurement was taken.
- All other columns: aggregated means of the particular measurement that was taken, grouped by activity and subject.