Human Activity Recognition Using Smartphones: Getting and Cleaning Data Course Project

jclopeztavera/human-activity

The main purpose of this project is to showcase the knowledge and skills acquired in the Getting and Cleaning Data course, offered by Johns Hopkins University on Coursera as part of the Data Science Specialization.

The first goal of this repository is to hold the raw data, the code to get and clean them, the tidy data, and the code book. I am focusing on making everything self-contained, self-explanatory, and reproducible.

The main contents of this repository, as required in the project instructions, are:

  • The tidy data set.
  • The R script run_analysis.R, which performs all the analyses. For more detail on each step taken:
  • The code book, which describes the variables, the data, and the data transformations performed. For more detail on the making of the code book:
    • The R Markdown file for knitting the code book: CodeBook.Rmd

Usage

  1. Clone this repository: git clone https://github.com/jclopeztavera/human-activity.git
  2. Open the R project file.
  3. Source the run_analysis.R script.
  4. Drop me a line if you find any areas for improvement.

To do

  • Merge the training and the test sets to create one data set.
  • Extract only the measurements on the mean and standard deviation for each measurement.
  • Use descriptive activity names to name the activities in the data set.
  • Appropriately label the data set with descriptive variable names.
  • From the data set in the previous step, create a second, independent tidy data set with the average of each variable for each activity and each subject.
  • Make the code book.
  • Submit the assignment.
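The five data steps in the To do list above can be sketched as follows. This is not the repository's run_analysis.R; it is a hedged Python sketch using only the standard library, and it assumes the raw data follow the UCI HAR Dataset layout (features.txt, activity_labels.txt, and train/ and test/ folders holding X_*.txt, y_*.txt and subject_*.txt). The function names (read_table, load_split, descriptive_name, tidy_means, run_analysis) and the variable-naming convention (time/frequency prefixes, Mean/StdDev suffixes) are illustrative assumptions.

```python
# Hedged sketch in Python (standard library only), NOT the repo's run_analysis.R.
# Assumes the UCI HAR Dataset layout: <root>/features.txt, <root>/activity_labels.txt,
# and <root>/<split>/X_<split>.txt, y_<split>.txt, subject_<split>.txt.
import re
from collections import defaultdict
from pathlib import Path
from statistics import mean

# Step 2 filter: mean() and std() only (excludes meanFreq() and angle(...) features).
MEAN_STD = re.compile(r"-(mean|std)\(\)")


def read_table(path):
    """Read a whitespace-separated text file into a list of token lists."""
    with open(path) as fh:
        return [line.split() for line in fh if line.strip()]


def load_split(root, split):
    """Bind subject, activity code and feature values for one split."""
    base = Path(root) / split
    x = read_table(base / f"X_{split}.txt")
    y = read_table(base / f"y_{split}.txt")
    subjects = read_table(base / f"subject_{split}.txt")
    return [(int(s[0]), int(a[0]), [float(v) for v in row])
            for s, a, row in zip(subjects, y, x)]


def descriptive_name(feature):
    """Step 4 (hypothetical convention): 'tBodyAcc-mean()-X' -> 'timeBodyAccMeanX'."""
    name = re.sub(r"^t", "time", feature)
    name = re.sub(r"^f", "frequency", name)
    name = name.replace("-mean()", "Mean").replace("-std()", "StdDev")
    return name.replace("-", "")


def tidy_means(features, labels, records):
    """Steps 2-5 on merged (subject, activity_code, values) records."""
    keep = [i for i, f in enumerate(features) if MEAN_STD.search(f)]  # step 2
    columns = [descriptive_name(features[i]) for i in keep]           # step 4
    groups = defaultdict(list)
    for subject, code, values in records:
        # Step 3: replace the numeric activity code with its label.
        groups[(subject, labels[code])].append([values[i] for i in keep])
    tidy = []
    for (subject, activity), rows in sorted(groups.items()):
        # Step 5: column-wise average within each subject/activity group.
        tidy.append([subject, activity] + [mean(col) for col in zip(*rows)])
    return ["subject", "activity"] + columns, tidy


def run_analysis(root):
    """Read the raw files, merge train and test (step 1), then tidy."""
    root = Path(root)
    features = [name for _, name in read_table(root / "features.txt")]
    labels = {int(code): name
              for code, name in read_table(root / "activity_labels.txt")}
    merged = load_split(root, "train") + load_split(root, "test")
    return tidy_means(features, labels, merged)
```

Grouping on the (subject, activity) pair before averaging is what yields one row per subject and activity, i.e. the second, independent tidy data set the last step asks for. Writing the result to disk (for example with the csv module) is left out to keep the sketch short.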

Review criteria

  • The submitted data set is tidy.
  • The GitHub repo contains the required scripts.
  • The GitHub repo contains a code book that modifies and updates the available code books with the data to indicate all the variables and summaries calculated, along with units and any other relevant information.
  • The README that explains the analysis files is clear and understandable.
  • The work submitted for this project is the work of the student who submitted it.
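The first criterion above can be spot-checked mechanically. The sketch below is a hypothetical Python helper (tidy_problems is not part of the repository), assuming the tidy set is held as a header list plus a list of rows, and that subject and activity together identify one observation. It checks structure only: unique column names, a consistent row width, and exactly one row per subject-activity pair. It cannot judge whether the variable names are descriptive.

```python
# Hypothetical structural check for a tidy summary table; not part of the repo.
from collections import Counter


def tidy_problems(header, rows, keys=("subject", "activity")):
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    # Each variable should appear as exactly one column.
    duplicates = [name for name, n in Counter(header).items() if n > 1]
    if duplicates:
        problems.append(f"duplicate column names: {duplicates}")
    missing = [k for k in keys if k not in header]
    if missing:
        problems.append(f"missing key columns: {missing}")
        return problems
    key_idx = [header.index(k) for k in keys]
    seen = Counter()
    for i, row in enumerate(rows):
        # Every row should carry one value per column.
        if len(row) != len(header):
            problems.append(f"row {i} has {len(row)} values, expected {len(header)}")
            continue
        seen[tuple(row[j] for j in key_idx)] += 1
    # Each observation (subject, activity) should occupy exactly one row.
    repeated = sorted(k for k, n in seen.items() if n > 1)
    if repeated:
        problems.append(f"repeated observations for: {repeated}")
    return problems
```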

Next steps

  • Read the paper behind the data set.
  • Explore the data set properly: look at all the variables and understand them.
  • Describe the data set in detail.
  • Hands-on ML: train, test, and compare classification algorithms (besides the paper, you can read this IPython Notebook by Mark Regan).

Built With

Acknowledgments

  • Jeff Leek, Roger D. Peng, and Brian Caffo from the Bloomberg School of Public Health.
  • R Core Team (2016). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL.
  • Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra and Jorge L. Reyes-Ortiz. A Public Domain Dataset for Human Activity Recognition Using Smartphones. 21st European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, ESANN 2013. Bruges, Belgium, 24-26 April 2013. URL.
