The purpose of this project is to demonstrate your ability to collect, work with, and clean a data set.
- README.md
- CodeBook.md
- run_analysis.R
- Ignored the raw data in "/Inertial Signals", since the data files in the "/test" and "/train" folders are enough to produce an acceptable result. The merged data has 563 variables: 561 features plus Subject and Activity. Result: 10299 obs. of 563 variables.
- The required features are mean() and std(), which gives 66 variables, or 68 including Subject and Activity. Result: 10299 obs. of 68 variables.
- "Uses descriptive activity names to name the activities in the data set": renamed the feature column names.
- "Appropriately labels the data set with descriptive activity names.": replaced the activity codes with activity names as data values.
- Created a tidy data set with the average of each variable for each activity (6) and each subject (30). Result: 30 subjects x 6 activities = 180 obs. of 68 variables.
- Download "getdata-projectfiles-UCI HAR Dataset.zip" from the link above.
- Unzip "getdata-projectfiles-UCI HAR Dataset.zip"
- Place the "run_analysis.R" script in the "UCI HAR Dataset" folder, where the "test" and "train" folders reside.
Merges the training and the test sets to create one data set.
- Used `read.table()` and `cbind()` to create test and train data frames.
- Used `read.table()` and `colnames()` to rename column names.
- Used `rbind()` to merge test and train data frames.
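
The merge step can be sketched as follows. This is a minimal illustration, not the contents of run_analysis.R: the small data frames stand in for what `read.table()` would return, and all values and the two feature names shown are assumptions chosen for the example.

```r
# Toy stand-ins for the data frames returned by read.table();
# the values are illustrative only, not taken from the data set.
test_subject   <- data.frame(V1 = c(2, 4))
test_activity  <- data.frame(V1 = c(5, 1))
test_features  <- data.frame(V1 = c(0.25, 0.31), V2 = c(-0.98, -0.87))

train_subject  <- data.frame(V1 = c(1, 3, 5))
train_activity <- data.frame(V1 = c(5, 2, 6))
train_features <- data.frame(V1 = c(0.28, 0.27, 0.29), V2 = c(-0.99, -0.95, -0.91))

# Column-bind subject, activity and features into one data frame per set.
test  <- cbind(test_subject, test_activity, test_features)
train <- cbind(train_subject, train_activity, train_features)

# Give both sets identical column names so that rbind() can stack them.
feature_names <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X")
colnames(test)  <- c("Subject", "Activity", feature_names)
colnames(train) <- c("Subject", "Activity", feature_names)

# Row-bind the two sets into a single data frame.
merged <- rbind(test, train)
dim(merged)  # 5 rows, 4 columns
```

With the real files the same pattern would yield the 10299 obs. of 563 variables noted above.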
Extracts only the measurements on the mean and standard deviation for each measurement.
- Used `grep()` to select only the Subject, Activity, mean() and std() columns.
- Used data frame column selection to keep only those variables.
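
A possible form of the column selection is shown below. The regular expression is an assumption, not necessarily the one used in run_analysis.R. It escapes the parentheses so that only names containing the literal text mean() or std() match, which excludes names such as meanFreq(). The toy data frame and its values are illustrative.

```r
# Toy data frame with a few column names in the style of the feature list;
# check.names = FALSE keeps the "-", "(" and ")" characters in the names.
data <- data.frame(
  Subject                 = c(1, 2),
  Activity                = c(5, 1),
  `tBodyAcc-mean()-X`     = c(0.28, 0.27),
  `tBodyAcc-std()-X`      = c(-0.99, -0.95),
  `fBodyAcc-meanFreq()-X` = c(0.10, 0.12),
  `angle(X,gravityMean)`  = c(-0.84, -0.80),
  check.names = FALSE
)

# Indices of the columns to keep: Subject, Activity, and any name that
# contains the literal text "mean()" or "std()".
keep <- grep("^Subject$|^Activity$|mean\\(\\)|std\\(\\)", colnames(data))

# Keep only the selected variables.
selected <- data[, keep]
colnames(selected)
# "Subject" "Activity" "tBodyAcc-mean()-X" "tBodyAcc-std()-X"
```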
Uses descriptive activity names to name the activities in the data set
- Used `gsub()` to remove `-`, `(` and `)`.
- Used `gsub()` to replace `mean` with `Mean` and `std` with `Std` in the column names.
- Used `colnames()` to rename column names.
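
The renaming can be sketched as below, assuming a character class is used to strip the three punctuation characters in one pass. The one-row data frame is a placeholder for the selected data.

```r
# Placeholder data frame with two feature columns in their original naming.
data <- data.frame(
  Subject             = 1,
  Activity            = 5,
  `tBodyAcc-mean()-X` = 0.28,
  `tBodyAcc-std()-Y`  = -0.99,
  check.names = FALSE
)

clean <- colnames(data)
clean <- gsub("[-()]", "", clean)   # remove "-", "(" and ")"
clean <- gsub("mean", "Mean", clean) # mean -> Mean
clean <- gsub("std", "Std", clean)   # std  -> Std

# Apply the cleaned names to the data frame.
colnames(data) <- clean
colnames(data)
# "Subject" "Activity" "tBodyAccMeanX" "tBodyAccStdY"
```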
Appropriately labels the data set with descriptive activity names.
- Used `read.table()` to load a data frame to be used as activity name values.
- Used `match()` to replace the activity code with activity name values.
- Used `gsub()` to replace underscores with spaces.
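
The lookup can be sketched as follows. The `activity_labels` data frame is built inline here instead of being read from a file, and its column names `code` and `name` are assumptions made for the example.

```r
# Lookup table of activity codes and names, built inline as a stand-in
# for the data frame that read.table() would return.
activity_labels <- data.frame(
  code = 1:6,
  name = c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
           "SITTING", "STANDING", "LAYING"),
  stringsAsFactors = FALSE
)

# Toy data with numeric activity codes.
data <- data.frame(Subject = c(1, 1, 2), Activity = c(5, 2, 6))

# match() returns the row of the lookup table for each code;
# indexing the name column with it yields the activity names.
data$Activity <- activity_labels$name[match(data$Activity, activity_labels$code)]

# Replace underscores with spaces for readability.
data$Activity <- gsub("_", " ", data$Activity)
data$Activity
# "STANDING" "WALKING UPSTAIRS" "LAYING"
```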
Creates a second, independent tidy data set with the average of each variable for each activity and each subject.
- Load reshape2 library.
- Used `order()` to have a better sorting view of the data.
- Used `melt()` to prepare the measures for aggregation for Subject and Activity. Removed row names.
- Used `dcast()` to get the average values using `mean()` on the measures based on Subject and Activity.
- Used `write.table()` to save the tidy dataset as tidydata.txt.
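
The aggregation can be sketched as below, assuming the reshape2 package is installed. The toy data and its values are illustrative, and the sketch writes to a temporary directory rather than the working directory so that it does not overwrite an existing tidydata.txt.

```r
library(reshape2)

# Toy data with repeated measurements per subject and activity.
data <- data.frame(
  Subject       = c(2, 1, 1, 2, 1),
  Activity      = c("WALKING", "SITTING", "WALKING", "WALKING", "WALKING"),
  tBodyAccMeanX = c(0.30, 0.20, 0.26, 0.34, 0.30),
  tBodyAccStdX  = c(-0.90, -0.98, -0.94, -0.80, -0.96),
  stringsAsFactors = FALSE
)

# Sort by subject and activity, then drop the reordered row names.
data <- data[order(data$Subject, data$Activity), ]
rownames(data) <- NULL

# Long format: one row per (Subject, Activity, variable, value).
molten <- melt(data, id.vars = c("Subject", "Activity"))

# Wide format again, averaging each variable per subject and activity.
tidy <- dcast(molten, Subject + Activity ~ variable, fun.aggregate = mean)

# Save the result; the path here is a temporary location for the sketch.
out_file <- file.path(tempdir(), "tidydata.txt")
write.table(tidy, out_file, row.names = FALSE)
```

Here `tidy` has one row per subject and activity pair present in the toy data. With the full data set the same calls would give 30 x 6 = 180 rows.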
Code book that describes the variables, the data, and any transformations or work that you performed to clean up the data