title	author	date	output
readme	phjrdla	September 30, 2017	html_document

Presentation

The files for the Data Cleaning (course 3) assignment are found at https://github.com/phjrdla/datasciencecoursera/tree/master

readme.md, this file
run_analysis.R, script loading and processing the datasets
tidy_dataset.md, description of the dataset created by run_analysis.R
tidy_dataset.txt, dataset required for assignment

Data to process

The data, packaged in a zip file, is downloaded from the provided URL https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip

The data consists of measurements recorded by various types of sensors while people perform predefined physical activities.

The data is unzipped into a working directory and processed to produce a subset containing only means and standard deviations, grouped by Activity and Subject. The result is written to the working directory.
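The download and unzip step could look roughly like the sketch below. It is not taken from run_analysis.R: the function name `fetch_dataset`, the local archive name `dataset.zip`, and the choice to skip the download when the archive already exists are all assumptions made for illustration.

```r
# URL given in this readme.
data_url <- "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip"

# Hypothetical helper: download the archive (if missing) and unpack it.
fetch_dataset <- function(url = data_url, dest_dir = ".", zip_name = "dataset.zip") {
  zip_path <- file.path(dest_dir, zip_name)
  # Download only when the archive is not already present.
  if (!file.exists(zip_path)) {
    download.file(url, destfile = zip_path, mode = "wb")  # "wb" keeps the zip intact on Windows
  }
  # Extract into dest_dir; unzip() returns the extracted file paths.
  unzip(zip_path, exdir = dest_dir)
}
```

Calling `fetch_dataset()` from the working directory would then leave the unpacked dataset folder next to the script.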

Processing

The processing implements the assignment requirements below. Each requirement is followed by the steps run_analysis.R takes to meet it.

Merges the training and the test sets to create one data set.

all files are loaded into dataframes
columns are named using the features dataframe
a unique key equal to the row number is added to the dataframes
a merged dataset is built by joining the relevant dataframes on the unique key and appending the training and test sets
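The steps above can be sketched on toy data as follows. This is a minimal illustration in base R, not the code from run_analysis.R: the helper `combine_set`, the column names `subject`, `activity_id` and `key`, and all values are assumptions, and dropping the key before appending is a choice made here so that row numbers from the two sets do not collide.

```r
# Toy stand-ins for the loaded files (values are made up).
feature_names <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-max()-X")
x_train       <- as.data.frame(matrix(1:9, nrow = 3))    # 3 rows, 3 measurements
y_train       <- data.frame(V1 = c(1, 2, 1))             # activity ids
subject_train <- data.frame(V1 = c(1, 1, 3))
x_test        <- as.data.frame(matrix(10:15, nrow = 2))  # 2 rows, 3 measurements
y_test        <- data.frame(V1 = c(2, 1))
subject_test  <- data.frame(V1 = c(2, 2))

# Hypothetical helper: name the columns, add a row-number key, join on it.
combine_set <- function(x, y, subject, feature_names) {
  names(x)       <- feature_names
  names(y)       <- "activity_id"
  names(subject) <- "subject"
  x$key       <- seq_len(nrow(x))
  y$key       <- seq_len(nrow(y))
  subject$key <- seq_len(nrow(subject))
  joined <- merge(merge(subject, y, by = "key"), x, by = "key")
  joined$key <- NULL  # the key is only needed for the join
  joined
}

# Append the test rows to the training rows.
combined <- rbind(
  combine_set(x_train, y_train, subject_train, feature_names),
  combine_set(x_test, y_test, subject_test, feature_names)
)
```

Because each file has one row per observation in the same order, the row number is enough to line up subject, activity, and measurements.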

Extracts only the measurements on the mean and standard deviation for each measurement.

a dataframe containing exclusively means and standard deviations is built by filtering the column names with grep.
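A possible form of this filtering is shown below. The regular expression is an assumption: this readme does not say which pattern the script uses, and in particular whether columns such as `meanFreq()` are kept. The sketch matches only names containing `mean()` or `std()`, and the identifier column names are hypothetical.

```r
# Toy dataframe with feature-style column names (values are made up).
measurements <- data.frame(
  subject                 = c(1, 2),
  activity_id             = c(1, 2),
  `tBodyAcc-mean()-X`     = c(0.1, 0.2),
  `tBodyAcc-std()-X`      = c(0.3, 0.4),
  `tBodyAcc-max()-X`      = c(0.5, 0.6),
  `fBodyAcc-meanFreq()-X` = c(0.7, 0.8),
  check.names = FALSE  # keep the original, non-syntactic names
)

id_cols   <- c("subject", "activity_id")
# Assumed pattern: literal "mean()" or "std()" anywhere in the name.
stat_cols <- grep("mean\\(\\)|std\\(\\)", names(measurements), value = TRUE)
mean_std  <- measurements[, c(id_cols, stat_cols)]
```

Widening the pattern to `"mean|std"` would also keep the `meanFreq()` columns, if those are wanted.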

Uses descriptive activity names to name the activities in the data set

the activity descriptions are added to the means and standard deviations dataframe by joining it with the activity labels dataframe
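As an illustration, the join could be written with base R `merge` as below. The column names `activity_id` and `activity` and the measurement values are assumptions. Only three of the dataset's activity labels are listed to keep the example short.

```r
# Activity labels as an id/name lookup table (subset, for brevity).
activity_labels <- data.frame(
  activity_id = 1:3,
  activity    = c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS"),
  stringsAsFactors = FALSE
)

# Toy means and standard deviations dataframe (values are made up).
mean_std <- data.frame(
  subject     = c(1, 1, 2),
  activity_id = c(3L, 1L, 1L),
  mean_x      = c(0.1, 0.2, 0.3)
)

# Join on the shared id column. Note that merge() sorts rows by the join key.
labelled <- merge(mean_std, activity_labels, by = "activity_id")
```

After the join, every row carries a readable `activity` name alongside its numeric id.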

Appropriately labels the data set with descriptive variable names.

Column names are made somewhat more intelligible by replacing cryptic abbreviations with clearer terms.
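The renaming can be pictured as a chain of `gsub` calls. The substitutions below are hypothetical examples chosen for illustration; the actual replacements used by run_analysis.R are documented in tidy_dataset.md and may differ.

```r
# Hypothetical helper: expand abbreviations in feature-style column names.
clean_names <- function(x) {
  x <- gsub("^t", "Time", x)            # leading t: time-domain signal
  x <- gsub("^f", "Frequency", x)       # leading f: frequency-domain signal
  x <- gsub("Acc", "Accelerometer", x)
  x <- gsub("Gyro", "Gyroscope", x)
  x <- gsub("Mag", "Magnitude", x)
  x <- gsub("-mean\\(\\)", "Mean", x)
  x <- gsub("-std\\(\\)", "StdDev", x)
  gsub("-", "", x)                      # drop the remaining separators
}

clean_names(c("tBodyAcc-mean()-X", "fBodyGyroMag-std()"))
# "TimeBodyAccelerometerMeanX" "FrequencyBodyGyroscopeMagnitudeStdDev"
```

In use, something like `names(df) <- clean_names(names(df))` would apply the mapping to the measurement columns.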

From the data set in step 4, create a second, independent tidy data set with the average of each variable for each activity and each subject

the means and standard deviations dataframe is piped into a grouping by activity type and subject, and the mean of every column is computed over each group
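The grouping and averaging can be sketched as below. The description above mentions piping, which suggests a pipeline-style implementation in the script. This example instead uses base R `aggregate` so that it runs without extra packages, and its column names and values are made up.

```r
# Toy labelled dataframe: several observations per activity/subject pair.
labelled <- data.frame(
  activity = c("WALKING", "WALKING", "SITTING", "SITTING", "WALKING"),
  subject  = c(1, 1, 1, 2, 2),
  mean_x   = c(1, 3, 5, 7, 9),
  std_x    = c(0.2, 0.4, 0.6, 0.8, 1.0),
  stringsAsFactors = FALSE
)

# One row per (activity, subject) group, holding the mean of every other column.
tidy <- aggregate(. ~ activity + subject, data = labelled, FUN = mean)
```

Here the two WALKING rows for subject 1 collapse into a single row with `mean_x` equal to 2, while groups with one observation keep their values unchanged.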

Output

The dataframe holding the means and standard deviations per group is written to the file 'tidy_dataset.txt' in the working directory. Values are separated by a space, and the first line holds the measurement names.
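A file in this format can be produced and read back as sketched below. The `row.names = FALSE` argument is an assumption, consistent with the first line holding only measurement names. The example writes to a temporary directory rather than the working directory, and its data is made up.

```r
# Toy result dataframe (values are made up).
tidy <- data.frame(
  activity = c("SITTING", "WALKING"),
  subject  = c(1L, 1L),
  mean_x   = c(5.5, 2.25),
  stringsAsFactors = FALSE
)

# The script writes to the working directory; a temp dir is used here.
out_file <- file.path(tempdir(), "tidy_dataset.txt")

# Space-separated values, header line with the column names, no row names.
write.table(tidy, out_file, sep = " ", row.names = FALSE)

# Reading the file back with matching settings recovers the table.
check <- read.table(out_file, header = TRUE, sep = " ", stringsAsFactors = FALSE)
```

The same `read.table` call, pointed at 'tidy_dataset.txt', is a convenient way to inspect the published dataset.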
