Assessment 1.
================================
Load the data
-------------------------------------------
```{r, echo=TRUE}
data <- read.csv("activity.csv", header = TRUE)
data.clean <- na.omit(data)
```
What is the mean total number of steps taken per day?
-------------------------------------------
**1. Histogram**
```{r,echo=TRUE}
# Total steps per day (rows with missing step counts were already dropped above)
agr <- aggregate(steps ~ date, data = data.clean, sum)
#pdf("histogram_of_steps_per_day.pdf")
hist(agr$steps, xlab = "steps per day", main = "histogram of total number of steps taken per day")
#dev.off()
```
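The aggregation step above can be sketched on a small made-up data frame. The name `toy` and its values are illustrative only and are not taken from `activity.csv`. The point it shows: with the formula interface, `aggregate()` drops rows containing `NA` by default, so a day with no recorded steps disappears from the result instead of appearing with a total of 0.

```r
# Hypothetical toy data: three days, two intervals each; the second day is entirely missing
toy <- data.frame(
  steps    = c(10, 20, NA, NA, 5, 15),
  date     = rep(c("2012-10-01", "2012-10-02", "2012-10-03"), each = 2),
  interval = rep(c(0, 5), times = 3)
)

# Formula interface: rows with NA are removed before the per-day sums are computed
toy.agr <- aggregate(steps ~ date, data = toy, sum)
toy.agr
```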
**2. Mean number of steps per day**
```{r,echo=TRUE}
steps.mean <- mean(agr$steps)
steps.mean
```
**Median number of steps per day**
```{r,echo=TRUE}
steps.median <- median(agr$steps)
steps.median
```
What is the average daily activity pattern?
-------------------------------------------
**1. Time series plot of the 5-minute interval (x-axis) and the average number of steps taken, averaged across all days (y-axis)**
```{r,echo=TRUE}
steps.int <- aggregate(steps~interval, data = data.clean, mean)
plot(steps.int, type = "l")
```
**2. Which 5-minute interval, on average across all the days in the dataset, contains the maximum number of steps?**
```{r, echo=TRUE}
steps.int[which.max(steps.int$steps),1]
```
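A minimal sketch of the lookup above, on made-up interval means (`toy.int` is hypothetical). It also converts the interval label to a clock time, assuming the label encodes the time as HHMM (for example `835` for 08:35); that encoding is an assumption about the data and worth checking against the dataset description.

```r
# Hypothetical interval means; the values are illustrative only
toy.int <- data.frame(
  interval = c(0, 5, 830, 835),
  steps    = c(1.5, 7.25, 3, 9)
)

# which.max() returns the position of the first maximum
peak <- toy.int$interval[which.max(toy.int$steps)]

# Assumed HHMM encoding: hours are the hundreds, minutes the remainder
label <- sprintf("%02d:%02d", as.integer(peak %/% 100), as.integer(peak %% 100))
label
```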
Imputing missing values
-------------------------------------------
**1. The number of missing values in the dataset is**
```{r,echo=TRUE}
# Count every NA cell in the data frame
sum(is.na(data))
```
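Counting `NA` cells and counting rows with at least one `NA` give the same number only when the missing values sit in a single column. A small sketch of the difference on made-up data (`toy` is hypothetical):

```r
# Hypothetical toy data with NAs in two different columns
toy <- data.frame(
  steps    = c(NA, 5, NA),
  date     = c(NA, "2012-10-02", "2012-10-03"),
  interval = c(0, 5, 10)
)

na.cells  <- sum(is.na(toy))             # number of missing cells
na.rows   <- sum(!complete.cases(toy))   # number of rows with any missing value
na.by.col <- colSums(is.na(toy))         # missing cells per column
na.by.col
```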
**2. & 3. Use the mean for a given interval to impute missing values and assign the result to a data set named 'data.c'**
```{r, echo=TRUE}
data.c <- data
# Rows whose step count is missing
missing <- is.na(data.c$steps)
# Replace each missing value with the mean of its 5-minute interval
data.c$steps[missing] <- steps.int$steps[match(data.c$interval[missing], steps.int$interval)]
```
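The same imputation idea can be sketched without a lookup table by using `ave()`, which returns the per-group mean aligned with the original rows. This is an alternative illustration on made-up data (`toy`, `int.mean` and `toy.imp` are hypothetical names), not the method used in the chunk above.

```r
# Hypothetical toy data: two intervals, one missing value in each
toy <- data.frame(
  steps    = c(10, NA, 30, 20, NA, 40),
  interval = c(0, 0, 0, 5, 5, 5)
)

# Mean of each interval over its non-missing rows, repeated for every row
int.mean <- ave(toy$steps, toy$interval, FUN = function(x) mean(x, na.rm = TRUE))

# Fill only the missing positions
toy.imp <- toy
miss <- is.na(toy.imp$steps)
toy.imp$steps[miss] <- int.mean[miss]
toy.imp
```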
**4. Make a histogram of the total number of steps taken each day on the imputed dataset**
```{r, echo=TRUE}
agr.imp <- aggregate(steps ~ date, data = data.c, sum)
hist(agr.imp$steps, xlab = "steps per day", main = "total number of steps taken each day (imputed data)")
```
**Mean number of steps per day (imputed)**
```{r,echo=TRUE}
steps.mean.imp <- mean(agr.imp$steps)
steps.mean.imp
```
**Median number of steps per day (imputed)**
```{r,echo=TRUE}
steps.median.imp <- median(agr.imp$steps)
steps.median.imp
```
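**In this case, imputing the missing values with interval means had little effect on the estimated mean and median of the total daily steps: compare `steps.mean.imp` and `steps.median.imp` with `steps.mean` and `steps.median` above.**
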
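
One way to see why interval-mean imputation can leave the mean of the daily totals unchanged is a toy example in which the missing values cover a whole day. That is an assumption made here for illustration (it can be checked on the real data with `table(data$date[is.na(data$steps)])`), and `toy` and its values are made up. The imputed day then receives the sum of the interval means, which equals the mean daily total of the observed days.

```r
# Hypothetical toy data: the second day is entirely missing
toy <- data.frame(
  steps    = c(10, 20, NA, NA, 5, 15),
  date     = rep(c("2012-10-01", "2012-10-02", "2012-10-03"), each = 2),
  interval = rep(c(0, 5), times = 3)
)

# Mean daily total before imputation (the missing day is dropped by aggregate)
mean.before <- mean(aggregate(steps ~ date, data = toy, sum)$steps)

# Impute with interval means, then recompute the mean daily total
int.mean <- ave(toy$steps, toy$interval, FUN = function(x) mean(x, na.rm = TRUE))
toy.imp <- toy
toy.imp$steps[is.na(toy$steps)] <- int.mean[is.na(toy$steps)]
mean.after <- mean(aggregate(steps ~ date, data = toy.imp, sum)$steps)

c(before = mean.before, after = mean.after)
```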
Are there differences in activity patterns between weekdays and weekends?
--------------------------------
**1. Create new factor variable - weekday/weekend**
```{r, echo=TRUE}
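# Convert each row's own date; using agr.imp$date (one row per day) would be silently recycled
data.c$date <- as.Date(as.character(data.c$date))
# Day names returned by weekdays() depend on the locale; English names are assumed here
data.c$day <- weekdays(data.c$date)
data.c$w <- factor(ifelse(data.c$day %in% c("Saturday", "Sunday"), "weekend", "weekday"))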
```
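Because `weekdays()` returns locale-dependent day names, a comparison against `"Saturday"` and `"Sunday"` only works in an English locale. A locale-independent sketch, using the numeric weekday from `as.POSIXlt()` on a few made-up dates (`d`, `wday` and `w` are hypothetical names):

```r
# Monday, Friday, Saturday, Sunday
d <- as.Date(c("2012-10-01", "2012-10-05", "2012-10-06", "2012-10-07"))

# $wday is 0 for Sunday through 6 for Saturday, regardless of locale
wday <- as.POSIXlt(d)$wday

w <- factor(ifelse(wday %in% c(0, 6), "weekend", "weekday"),
            levels = c("weekday", "weekend"))
w
```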
**2. Create a panel time series plot**
```{r, echo=TRUE}
agr.imp.weekday <- aggregate(steps ~ interval, data = data.c[data.c$w=="weekday",],mean)
agr.imp.weekend <- aggregate(steps ~ interval, data = data.c[data.c$w=="weekend",],mean)
par(mfrow = c(2,1))
plot(agr.imp.weekday, type = "l", xlab = "intervals",ylab = "average steps/interval",main = "weekdays")
plot(agr.imp.weekend, type = "l",xlab = "intervals",ylab = "average steps/interval", main = "weekends")
```
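The two aggregations above can also be done in one call by adding the factor to the formula. The sketch below uses made-up data (`toy`, `agr.w` and `ylim` are hypothetical names) and additionally computes a common y-axis range; passing `ylim = ylim` to both `plot()` calls would put the two panels on the same scale, which is a design suggestion rather than something the original plot does.

```r
# Hypothetical toy data: two intervals observed twice for each day type
toy <- data.frame(
  steps    = c(10, 30, 20, 40, 2, 4, 6, 8),
  interval = rep(c(0, 5), times = 4),
  w        = factor(rep(c("weekday", "weekend"), each = 4))
)

# Mean steps for every interval and day-type combination
agr.w <- aggregate(steps ~ interval + w, data = toy, mean)

# Shared y-axis range so the weekday and weekend panels are comparable
ylim <- range(agr.w$steps)
agr.w
```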