Skip to content

sadams261/Data-Cleaning

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 

Repository files navigation

This script produces a tidy data set per Course Assignment, using input from 
https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip 

Configuration requirements:
The script requires that the zip file be downloaded and decompressed into a directory called /UCI HAR Dataset
  
The folder /UCI HAR Dataset needs to exist in the working directory. It will contain two folders, (test and train) along with other files.

Output:
write.table() is used with row.names=FALSE to create a text file called 'tidy.txt' in the working directory.

Tidy.txt is a "long narrow dataset" Each row of this dataset contains four entries:
*	Subject
*	Activity
*	Variable
*	Mean of Variable for this combination of Subject and Activity.

The accompanying code book describes the fields in the output in detail.

Assumptions 
The subset of measures chosen as "mean and standarde deviation measurements" explicitly contain "std" and "-mean" in their names

Operation
The script loads data (X,Y,Subjects) from train and test folders.The training and test data is combined into one dataframe.
The column names are updated with feature.txt names.
Columns with names containing "-mean" or "-std" are extracted and other data not used.
Activity labels are merged from the activity labels.txt
Data is summarized (using mean) creating one row per subject/activity/variable combination.

About

Course project for Coursera Data Cleaning course

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages