lmsv-mx123/GetData_CourseProject

Getting and Cleaning Data

The raw data files

The main folder UCI HAR Dataset contains:

  1. a train data folder containing
  • X_train.txt: Train feature data set, consisting of 561 measurements/features from accelerometer and gyroscope
  • y_train.txt: Train activity data set (identified by activity_id)
  • subject_train.txt: Subject train data set (identified by subject_id)
  2. a test data folder containing
  • X_test.txt: Test feature data set, consisting of 561 measurements/features from accelerometer and gyroscope
  • y_test.txt: Test activity data set (identified by activity_id)
  • subject_test.txt: Subject test data set (identified by subject_id)
  3. activity_labels.txt: Data set mapping each activity_id to its activity label (WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING)

  4. features.txt: All of the features obtained, namely the combination of the following signals

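| Signal | Time domain | Frequency domain |
|---|---|---|
| Body Linear Acceleration | 1 | 1 |
| Gravity Linear Acceleration | 1 | 0 |
| Body Linear Jerk | 1 | 1 |
| Body Angular Velocity | 1 | 1 |
| Body Angular Acceleration | 1 | 0 |
| Body Linear Acceleration Magnitude | 1 | 1 |
| Gravity Linear Acceleration Magnitude | 1 | 0 |
| Body Linear Jerk Magnitude | 1 | 1 |
| Body Angular Velocity Magnitude | 1 | 1 |
| Body Angular Acceleration Magnitude | 1 | 1 |

(1 = the signal is available in that domain, 0 = it is not)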

with

| Function | Description |
|---|---|
| mean | Mean value |
| std | Standard deviation |
| mad | Median absolute deviation |
| max | Largest value in array |
| min | Smallest value in array |
| sma | Signal magnitude area |
| energy | Average sum of the squares |
| iqr | Interquartile range |
| entropy | Signal entropy |
| arCoeff | Autoregression coefficients |
| correlation | Correlation coefficient |
| maxFreqInd | Largest frequency component |
| meanFreq | Frequency signal weighted average |
| skewness | Frequency signal skewness |
| kurtosis | Frequency signal kurtosis |
| energyBand | Energy of a frequency interval |
| angle | Angle between two vectors |

Function descriptions taken from: https://www.elen.ucl.ac.be/Proceedings/esann/esannpdf/es2013-84.pdf

  5. features_info.txt: A description of the base feature list

  6. README.txt: a general readme file
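As a rough illustration of how the three files of one split fit together, the sketch below reads them and binds them column-wise. The helper name `read_har_split` is hypothetical (it is not taken from run_analysis.R), and the sub-folder names `train` and `test` are assumed from the file name suffixes.

```r
# Hypothetical helper: read one split ("train" or "test") into a data frame.
# Assumes the files live in <data_dir>/<split>/ and are whitespace-separated.
read_har_split <- function(data_dir, split) {
  split_dir <- file.path(data_dir, split)

  # One subject id per row.
  subject <- read.table(file.path(split_dir, paste0("subject_", split, ".txt")),
                        col.names = "subject_id")
  # One activity id per row.
  activity <- read.table(file.path(split_dir, paste0("y_", split, ".txt")),
                         col.names = "activity_id")
  # One row of feature measurements per observation (columns V1, V2, ...).
  features <- read.table(file.path(split_dir, paste0("X_", split, ".txt")))

  # The three files are row-aligned, so they can be bound side by side.
  cbind(subject, activity, features)
}
```

With the extracted data in the working directory, a call such as `train <- read_har_split("UCI HAR Dataset", "train")` would be expected to give one row per observation, with the subject and activity ids in the first two columns.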

Procedure for tidying up data

  1. Merge the training and the test sets to create one data set.
  2. Extract only the measurements on the mean and standard deviation for each measurement.
  3. Use descriptive activity names to name the activities in the data set.
  4. Appropriately label the data set with descriptive variable names.
  5. From the data set in step 4, create a second, independent tidy data set with the average of each variable for each activity and each subject.
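The five steps above can be sketched in base R on a tiny made-up data set. This is only an illustration under stated assumptions, not the code in run_analysis.R: the toy values, the regular expression, and the renaming rules (`t` to `time`, `f` to `freq`) are choices made for this example.

```r
# Toy stand-ins for the real files (values are made up for illustration).
feature_names <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-max()-X")
activity_labels <- data.frame(activity_id = 1:2,
                              activity = c("WALKING", "SITTING"),
                              stringsAsFactors = FALSE)
train <- data.frame(subject_id = c(1, 1, 2), activity_id = c(1, 1, 2),
                    V1 = c(0.1, 0.3, 0.5), V2 = c(1, 2, 3), V3 = c(9, 9, 9))
test <- data.frame(subject_id = 3, activity_id = 1, V1 = 0.7, V2 = 4, V3 = 9)

# Step 1: merge the training and test sets.
merged <- rbind(train, test)

# Step 2: keep only the mean() and std() features.
# The pattern requires "()" right after the name, so meanFreq() is skipped.
keep <- grepl("-(mean|std)\\(\\)", feature_names)
merged <- merged[, c(TRUE, TRUE, keep)]

# Step 3: replace activity ids with descriptive activity names.
merged$activity_id <- factor(merged$activity_id,
                             levels = activity_labels$activity_id,
                             labels = activity_labels$activity)
names(merged)[2] <- "activity"

# Step 4: give the feature columns descriptive, syntactic names.
clean <- feature_names[keep]
clean <- gsub("\\(\\)", "", clean)   # drop the parentheses
clean <- gsub("-", "_", clean)       # avoid "-" in column names
clean <- sub("^t", "time", clean)    # time-domain prefix
clean <- sub("^f", "freq", clean)    # frequency-domain prefix
names(merged)[3:4] <- clean

# Step 5: average every variable for each subject and activity.
tidy <- aggregate(merged[, -(1:2)],
                  by = list(subject_id = merged$subject_id,
                            activity = merged$activity),
                  FUN = mean)
```

Here `tidy` holds one row per subject and activity combination that actually occurs in the data, with one column per retained feature.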

Files included for Project

  1. README.md: General readme file for project.
  2. CodeBook.md: Code Book describing the variables, the data, and the transformations performed to clean up the data.
  3. run_analysis.R: R script that carries out the procedure described above to tidy the data.
  4. tidy_data.txt: Tidy data set file, output as a txt file.
  5. tidy_data.xls: Tidy data set file, output as an xls file for those who find data easier to read in xls files :)
  • Output files are not uploaded to the repository.

Running the script

  • Download the script to the home directory ("~/")

  • Execute the following commands. The script loads the required libraries and the zipped data file automatically; any that are missing are installed, or downloaded and extracted.

    • curl must be set up properly on the system for the script to fetch the zipped data file into the working directory; otherwise, download and extract the zipped file manually into the working directory ("~/").
```r
source("run_analysis.R")
run_analysis()
```
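The "use it if present, otherwise fetch it" behaviour described above might look roughly like the sketch below. Both helpers are hypothetical and not taken from run_analysis.R; the archive URL is left as a parameter because it is not given here, and the default file and folder names are assumptions.

```r
# Hypothetical helper: make sure the extracted data folder exists, downloading
# and unzipping the archive only when it is missing.
# Returns TRUE (invisibly) if anything was extracted, FALSE otherwise.
ensure_data <- function(zip_url, zip_file = "dataset.zip",
                        data_dir = "UCI HAR Dataset") {
  if (dir.exists(data_dir)) {
    return(invisible(FALSE))  # data already extracted, nothing to do
  }
  if (!file.exists(zip_file)) {
    # method = "curl" needs the curl binary to be available on the system.
    download.file(zip_url, destfile = zip_file, method = "curl")
  }
  # Assumes the archive contains data_dir as its top-level folder.
  unzip(zip_file, exdir = dirname(data_dir))
  invisible(TRUE)
}

# Hypothetical helper: attach a package, installing it first if it is missing.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
}
```

If curl is not available, extracting the archive manually into the working directory makes `ensure_data()` return without attempting a download.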

Viewing the text file in R

  • To view the text file in a readable way, run

```r
# tidy_data.txt must be in the current working directory!
tidydata <- read.table("tidy_data.txt", header = TRUE)
View(tidydata)
```
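For a quick check outside an interactive session, where `View()` is not available, a write and read round trip can be sketched as follows. The toy data frame and the use of `write.table(..., row.names = FALSE)` are assumptions for illustration; how run_analysis.R actually writes the file is not shown here.

```r
# Toy tidy data set (made-up values) standing in for the real output.
tidy <- data.frame(subject_id = c(1, 2),
                   activity = c("WALKING", "SITTING"),
                   timeBodyAcc_mean_X = c(0.2, 0.5),
                   stringsAsFactors = FALSE)

# Write to a temporary file so the working directory is left untouched.
out_file <- file.path(tempdir(), "tidy_data.txt")
write.table(tidy, out_file, row.names = FALSE)

# Read it back the same way as shown above and inspect the structure.
tidydata <- read.table(out_file, header = TRUE, stringsAsFactors = FALSE)
str(tidydata)
```

Writing without row names keeps the header aligned with the columns, which is why `header = TRUE` is enough when reading the file back.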

For further information

Read CodeBook.md for a description of the transformations used as well as the variables and data.
