getdata-015-courseproject

Course Project submission repository for the Getting and Cleaning Data course.

run_analysis.R

R script for the Course Project of "Getting and Cleaning Data" by Sungwook Moon

Task 1.

Merges the training and the test sets to create one data set.

Download project data

 temp <- tempfile()
 fileUrl <- "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip"
 download.file(fileUrl, temp, method="curl")

Read the training files under the train folder, then bind the subject IDs, activity labels, and measurements column-wise

 subject_train <- read.table(unz(temp, "UCI HAR Dataset/train/subject_train.txt"), stringsAsFactors=FALSE)
 ylabel_train <- read.table(unz(temp, "UCI HAR Dataset/train/y_train.txt")) 
 ds_train <- read.table(unz(temp, "UCI HAR Dataset/train/X_train.txt"))  # read train dataset 7,352 obs.
 ds_train <- cbind(subject_train, ylabel_train, ds_train)

Read the test files under the test folder, then bind the subject IDs, activity labels, and measurements column-wise

 subject_test <- read.table(unz(temp, "UCI HAR Dataset/test/subject_test.txt"), stringsAsFactors=FALSE)
 ylabel_test <- read.table(unz(temp, "UCI HAR Dataset/test/y_test.txt"))
 ds_test <- read.table(unz(temp, "UCI HAR Dataset/test/X_test.txt"))  # read test dataset from the same zip archive
 ds_test <- cbind(subject_test, ylabel_test, ds_test)

Merge both datasets and set column names

 ds_merged <- rbind(ds_train, ds_test)
 hdr <- read.table(unz(temp, "UCI HAR Dataset/features.txt"), stringsAsFactors=FALSE)  # read column header
 names(ds_merged) <- c("subjectID", "activity", hdr$V2)

Task 2.

Extracts only the measurements on the mean and standard deviation for each measurement.

Make column names unique before calling select (features.txt contains duplicate feature names)

 names(ds_merged) <- make.names(names=names(ds_merged), unique=TRUE)

Select the columns whose names contain "mean" or "std", excluding the meanFreq variables

 library(dplyr)  # provides select() and contains()
 ds_extracted <- select(ds_merged, subjectID, activity, 
                 contains(".mean"), contains(".std"), -contains(".meanFreq"))
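The selection rule above can be tried on a handful of names without downloading the data. This is a minimal base R sketch, not the script's own code: it swaps dplyr's contains() for grepl() with fixed = TRUE, and the four raw feature names are only a small sample chosen for illustration. Note that contains() ignores case by default, whereas grepl() here is case-sensitive; for these sample names the result is the same.

```r
# A few raw feature names in the style of features.txt (illustrative sample)
raw <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
         "fBodyAcc-meanFreq()-X", "tBodyAcc-max()-X")

# make.names() turns "-", "(" and ")" into "." so the names are valid in R
cols <- c("subjectID", "activity", make.names(raw, unique = TRUE))

# Literal (non-regex) matching, mirroring contains(".mean") / contains(".std")
keep <- grepl(".mean", cols, fixed = TRUE) | grepl(".std", cols, fixed = TRUE)
drop <- grepl(".meanFreq", cols, fixed = TRUE)

selected <- cols[cols %in% c("subjectID", "activity") | (keep & !drop)]
print(selected)
```

The meanFreq and max columns are dropped, while the identifier columns and the mean and std columns are kept.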

Task 3.

Uses descriptive activity names to name the activities in the data set

Read activity labels from file

 activity_labels <- read.table(unz(temp, "UCI HAR Dataset/activity_labels.txt"), stringsAsFactors=FALSE)

Change activity levels to factor type with descriptive labels

 ds_extracted$activity <- as.factor(ds_extracted$activity)
 levels(ds_extracted$activity) <- activity_labels$V2
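Assigning levels() directly relies on the sorted activity codes lining up with the row order of activity_labels. As a hedged alternative sketch, factor() can map codes to labels explicitly. The small activity_labels data frame and the activity_codes vector below are hypothetical stand-ins for the real files.

```r
# Hypothetical stand-in for activity_labels.txt (V1 = code, V2 = label)
activity_labels <- data.frame(
  V1 = 1:3,
  V2 = c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS"),
  stringsAsFactors = FALSE
)

# Hypothetical activity codes as they would appear in y_train.txt / y_test.txt
activity_codes <- c(3, 1, 2, 1)

# Map each code to its label explicitly instead of relying on level order
activity <- factor(activity_codes,
                   levels = activity_labels$V1,
                   labels = activity_labels$V2)
print(as.character(activity))
```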

Task 4.

Appropriately labels the data set with descriptive variable names. Use the gsub function to rewrite the column names of the dataset (the dot is escaped so that it matches a literal "." rather than any character)

 names(ds_extracted) <- gsub("\\.mean","Mean", names(ds_extracted))
 names(ds_extracted) <- gsub("\\.std","Std", names(ds_extracted))
 names(ds_extracted) <- gsub("\\.","", names(ds_extracted))
 names(ds_extracted) <- gsub("BodyBody","Body", names(ds_extracted))
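To see what the renaming chain does, it can be wrapped in a small function and applied to sample names. This is only a sketch: clean_names is a hypothetical helper that is not part of run_analysis.R, and it uses fixed = TRUE so every pattern is treated as a literal string.

```r
# Hypothetical helper that applies the four renaming steps in order
clean_names <- function(x) {
  x <- gsub(".mean", "Mean", x, fixed = TRUE)  # ".mean" -> "Mean"
  x <- gsub(".std", "Std", x, fixed = TRUE)    # ".std"  -> "Std"
  x <- gsub(".", "", x, fixed = TRUE)          # drop the remaining dots
  gsub("BodyBody", "Body", x, fixed = TRUE)    # collapse the doubled "Body"
}

cleaned <- clean_names(c("tBodyAcc.mean...X", "fBodyBodyGyroMag.std.."))
print(cleaned)
```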

Task 5.

From the data set in step 4, creates a second, independent tidy data set with the average of each variable for each activity and each subject.

Create tidy dataset

     library(plyr)  # provides ddply() and colwise()
     ds_tidy <- ddply(ds_extracted, .(subjectID, activity), colwise(mean))
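The same per-group averaging can be sketched in base R with aggregate(), which may help when plyr is not installed. This is an alternative technique, not the script's method, and the toy data frame below is invented for illustration.

```r
# Invented toy data: two subjects, two activities, one measurement column
toy <- data.frame(
  subjectID     = c(1, 1, 2, 2),
  activity      = factor(c("WALKING", "WALKING", "WALKING", "SITTING")),
  tBodyAccMeanX = c(0.2, 0.4, 0.1, 0.3)
)

# Average every remaining column within each subject/activity pair
toy_tidy <- aggregate(. ~ subjectID + activity, data = toy, FUN = mean)
print(toy_tidy)
```

Each subject/activity combination that occurs in the data yields exactly one row, which is the shape the tidy data set requires.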

Write the dataset to a tab-separated text file

     write.table(ds_tidy, file="./tidy_dataset.txt", row.names=FALSE, col.names=TRUE, sep="\t", quote=FALSE)

Delete the temporary file and remove the helper variables

     unlink(temp)
     rm(temp,fileUrl)

That is it.
