- codebook.md
- run_analysis.R
codebook.md contains the code book for the dataset the analysis will produce.
run_analysis.R processes the files in the specified dataset directory and produces a tidy dataset. The script defines a function, run_analysis(), and calls it at the bottom of the file to produce the har.data variable.
Each file described in the dataset's README.txt that was used for this project was read with read.table(). Most files had only one column of interest, so the analysis script subsets each of these into a vector immediately after the read.table() call.
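As a rough illustration of that read-then-subset pattern, here is a minimal sketch. It uses an inline `text =` string as a stand-in for a real file such as activity_labels.txt, and the choice of the second column is an assumption made for the example rather than something taken from run_analysis.R.

```r
# Toy stand-in for a two-column file (id, label).
# The real script would pass a file path instead of `text =`.
raw <- read.table(text = "1 WALKING\n2 SITTING\n3 STANDING",
                  stringsAsFactors = FALSE)

# Keep only the column of interest as a plain vector.
# With header = FALSE (the default) the columns are named V1, V2, ...
activity_labels <- raw[[2]]
```

Indexing with `[[2]]` returns a bare vector rather than a one-column data frame, which makes later lookups such as `activity_labels[y]` straightforward.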
run_analysis() performs the following steps:
- Load activity_labels.txt as activity_labels
- Load features.txt as features
- Determine the subset of features we wish to keep as keep_features
- For each of the "train" and "test" cases:
  - Load X_*.txt from that case's directory, using features for column names
  - Drop the columns we decide not to keep
  - Prepend the following columns to the dataset:
    - 'subject' from subject_*.txt
    - 'activity' from y_*.txt, mapped through activity_labels
    - 'case' to reflect "train" or "test"
- rbind() both cases into a single data frame
- melt() the data frame into long form.
- dcast() the long form back into wide form, using mean() to aggregate the observation values.
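
The steps above can be sketched on toy data as follows. This is not the code in run_analysis.R: it assumes melt() and dcast() come from the reshape2 package, it replaces the dataset files with a small in-memory list (`toy`), and the `load_case()` helper, the mean/std selection rule, and the grouping columns in the dcast() formula are all hypothetical choices made for illustration.

```r
library(reshape2)  # assumption: melt()/dcast() are the reshape2 versions

activity_labels <- c("WALKING", "SITTING")
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-energy()-X")

# Hypothetical selection rule: keep mean and standard-deviation features.
keep_features <- grep("mean\\(\\)|std\\(\\)", features, value = TRUE)

# Toy stand-ins for the per-case files (X_*.txt, subject_*.txt, y_*.txt).
toy <- list(
  train = list(X = matrix(c(1, 3, 10, 30, 0, 0), nrow = 2),
               subject = c(1, 1), y = c(1, 1)),
  test  = list(X = matrix(c(5, 7, 50, 70, 0, 0), nrow = 2),
               subject = c(2, 2), y = c(2, 2))
)

# Hypothetical helper: build one case's data frame.
load_case <- function(case) {
  d <- toy[[case]]
  X <- as.data.frame(d$X)
  names(X) <- features                    # features as column names
  X <- X[, keep_features, drop = FALSE]   # drop the columns we do not keep
  data.frame(subject = d$subject,
             activity = activity_labels[d$y],  # map ids through the labels
             case = case,
             X,
             check.names = FALSE, stringsAsFactors = FALSE)
}

# rbind() both cases into a single data frame.
combined <- rbind(load_case("train"), load_case("test"))

# Long form: one row per (subject, activity, case, variable) observation.
long <- melt(combined, id.vars = c("subject", "activity", "case"))

# Back to wide form, averaging the observations within each group.
# Grouping by subject + activity + case is an assumption of this sketch.
har.data <- dcast(long, subject + activity + case ~ variable,
                  fun.aggregate = mean)
```

Passing `check.names = FALSE` keeps feature names such as `tBodyAcc-mean()-X` intact; without it, data.frame() would rewrite them into syntactic R names.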