GitHub - sadams261/Data-Cleaning: Course project for Coursera Data Cleaning course


This script produces a tidy data set per Course Assignment, using input from 
https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip 

Configuration requirements:
The zip file must be downloaded and decompressed into a directory called "UCI HAR Dataset".

The "UCI HAR Dataset" folder needs to exist in the working directory. It contains two folders (test and train) along with other files.
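
As a quick pre-flight check, the layout described above can be verified before running the script. This is a minimal sketch, not part of run_analysis.R: the helper name check_har_layout is hypothetical, and the file names assume the archive's standard layout (features.txt, activity_labels.txt, and X/y/subject files under train and test).

```r
# Hypothetical helper (not part of run_analysis.R): returns the expected
# input files that are missing under `root`, as paths relative to `root`.
check_har_layout <- function(root = "UCI HAR Dataset") {
  # Assumed file names, based on the standard layout of the archive.
  expected <- c(
    "features.txt",
    "activity_labels.txt",
    file.path("train", c("X_train.txt", "y_train.txt", "subject_train.txt")),
    file.path("test", c("X_test.txt", "y_test.txt", "subject_test.txt"))
  )
  expected[!file.exists(file.path(root, expected))]
}

missing_files <- check_har_layout()
if (length(missing_files) > 0) {
  message("Missing from 'UCI HAR Dataset': ",
          paste(missing_files, collapse = ", "))
}
```

An empty result means every assumed input file was found relative to the current working directory (see getwd()).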

Output:
write.table() is used with row.names=FALSE to create a text file called 'tidy.txt' in the working directory.

tidy.txt is a "long narrow dataset". Each row of this dataset contains four entries:
*	Subject
*	Activity
*	Variable
*	Mean of Variable for this combination of Subject and Activity.

The accompanying code book describes the fields in the output in detail.
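
To illustrate the output format, the sketch below writes a small made-up data frame the same way (write.table() with row.names=FALSE) and reads it back. The column names Subject, Activity, Variable and Mean and all values are assumptions for illustration; the code book is the reference for the real field names. A temporary file is used so that an existing tidy.txt is not overwritten.

```r
# Made-up rows in the four-column long format described above.
tidy <- data.frame(
  Subject  = c(1L, 1L),
  Activity = c("WALKING", "WALKING"),
  Variable = c("tBodyAcc-mean()-X", "tBodyAcc-std()-X"),
  Mean     = c(0.25, -0.5)
)

# Write without row names, as the script does for tidy.txt.
out_file <- file.path(tempdir(), "tidy.txt")
write.table(tidy, out_file, row.names = FALSE)

# Read it back; header = TRUE is needed because the first line holds names.
check <- read.table(out_file, header = TRUE, stringsAsFactors = FALSE)
str(check)
```

The real output can be loaded the same way with read.table("tidy.txt", header = TRUE).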

Assumptions 
The measures chosen as "mean and standard deviation measurements" are those whose names explicitly contain "std" or "-mean".
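
A small sketch of what this name-based selection keeps and drops, using grepl() on a handful of feature names in the style of the dataset. The pattern "-mean|-std" is an assumption about how the match could be written, not necessarily the expression used in run_analysis.R.

```r
# Sample feature names in the style of the dataset's feature list.
feature_names <- c(
  "tBodyAcc-mean()-X",
  "tBodyAcc-std()-X",
  "tBodyAcc-mad()-X",
  "fBodyAcc-meanFreq()-X",
  "angle(tBodyAccMean,gravity)"
)

# Assumed pattern: keep names containing "-mean" or "-std".
keep <- grepl("-mean|-std", feature_names)

# Show each name next to whether it would be selected.
print(data.frame(feature = feature_names, selected = keep))
```

Under this pattern, names such as "fBodyAcc-meanFreq()-X" are also selected because they contain "-mean", while the angle(...) features with a capitalised "Mean" are not.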

Operation
The script loads data (X, Y, Subjects) from the train and test folders. The training and test data are combined into one data frame.
The column names are replaced with the names from features.txt.
Columns with names containing "-mean" or "-std" are extracted; the remaining columns are not used.
Activity labels are merged in from activity_labels.txt.
The data is summarized (using mean), creating one row per subject/activity/variable combination.
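
The steps above can be sketched in base R on toy in-memory data. This is an illustration under stated assumptions, not the code in run_analysis.R: every value is made up, the object and column names (x_train, Value, Mean, ...) are hypothetical, and match() and aggregate() are just one way to do the label lookup and the summary.

```r
# Toy stand-ins for the input files (all values are made up).
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-mad()-X")
activity_labels <- data.frame(id = 1:2, Activity = c("WALKING", "SITTING"))

x_train <- data.frame(V1 = c(1, 3), V2 = c(10, 30), V3 = c(0, 0))
y_train <- data.frame(id = c(1L, 1L))
s_train <- data.frame(Subject = c(1L, 1L))

x_test <- data.frame(V1 = c(5, 7), V2 = c(50, 70), V3 = c(0, 0))
y_test <- data.frame(id = c(2L, 2L))
s_test <- data.frame(Subject = c(2L, 2L))

# 1. Combine training and test data, keeping the same row order in each piece.
x <- rbind(x_train, x_test)
y <- rbind(y_train, y_test)
subject <- rbind(s_train, s_test)

# 2. Replace the default column names with the feature names.
names(x) <- features

# 3. Keep only columns whose names contain "-mean" or "-std".
x <- x[, grepl("-mean|-std", features), drop = FALSE]

# 4. Look up the activity label for each row. match() preserves row order,
#    whereas merge() would re-sort the rows.
activity <- activity_labels$Activity[match(y$id, activity_labels$id)]

# 5. Reshape to long form: one row per observation and variable.
#    unlist() stacks the columns in order, so Variable repeats each name
#    nrow(x) times while Subject and Activity repeat once per column.
long <- data.frame(
  Subject  = rep(subject$Subject, times = ncol(x)),
  Activity = rep(activity, times = ncol(x)),
  Variable = rep(names(x), each = nrow(x)),
  Value    = unlist(x, use.names = FALSE)
)

# 6. Average per subject/activity/variable combination.
tidy <- aggregate(Value ~ Subject + Activity + Variable, data = long, FUN = mean)
names(tidy)[names(tidy) == "Value"] <- "Mean"
print(tidy)
```

With the real data, the toy data frames would be replaced by read.table() calls on the files in the train and test folders.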