sadams261/Data-Cleaning
Folders and files
| Name | Name | Last commit date | ||
|---|---|---|---|---|
Repository files navigation
This script produces a tidy data set per Course Assignment, using input from https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip Configuration requirements: The script requires that the zip file be downloaded and decompressed into a directory called /UCI HAR Dataset The folder /UCI HAR Dataset needs to exist in the working directory. It will contain two folders, (test and train) along with other files. Output: write.table() is used with row.names=FALSE to create a text file called 'tidy.txt' in the working directory. Tidy.txt is a "long narrow dataset" Each row of this dataset contains four entries: * Subject * Activity * Variable * Mean of Variable for this combination of Subject and Activity. The accompanying code book describes the fields in the output in detail. Assumptions The subset of measures chosen as "mean and standarde deviation measurements" explicitly contain "std" and "-mean" in their names Operation The script loads data (X,Y,Subjects) from train and test folders.The training and test data is combined into one dataframe. The column names are updated with feature.txt names. Columns with names containing "-mean" or "-std" are extracted and other data not used. Activity labels are merged from the activity labels.txt Data is summarized (using mean) creating one row per subject/activity/variable combination.