forked from rdpeng/RepData_PeerAssessment1
# Reproducible Research: Peer Assessment 1
## Loading data
Load the data and convert the `date` column to the `Date` class.
```{r load data}
library(ggplot2)
library(plyr)
data <- read.csv("activity.csv")
data$date <- as.Date(data$date,format="%Y-%m-%d")
```
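Before going further it helps to confirm that the conversion worked and to see how many step counts are missing. The sketch below uses a small made-up data frame (`toy` and its values are illustrative assumptions, not rows from `activity.csv`) with the same three column names used throughout this document.

```r
# Toy data with the same columns as the activity data (values are invented)
toy <- data.frame(
  steps    = c(NA, 0, 12, NA),
  date     = c("2012-10-01", "2012-10-01", "2012-10-02", "2012-10-02"),
  interval = c(0, 5, 0, 5),
  stringsAsFactors = FALSE
)

# Same conversion as in the chunk above
toy$date <- as.Date(toy$date, format = "%Y-%m-%d")

# Class of the converted column and number of missing step counts
date.class <- class(toy$date)
n.missing  <- sum(is.na(toy$steps))
```

On the real data the same two checks would be `class(data$date)` and `sum(is.na(data$steps))`.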
## Mean total number of steps taken per day
We can plot a histogram of the raw step counts. Note that each observation here is a single interval, not a daily total.
```{r mean-steps}
with(data, hist(steps, main = "Histogram of steps per interval"))
```
A histogram of the total steps per day is more informative. For that, we first
aggregate the number of steps by day.
```{r}
data.perday <- aggregate(steps ~ date, data, sum, na.rm = TRUE)
```
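One detail of the aggregation above is worth spelling out: with the formula interface, `aggregate` drops rows containing `NA` before the summary function runs (its default `na.action` is `na.omit`), so a day with no recorded steps disappears from the result instead of showing up as 0. A minimal sketch on a made-up three-day data frame (`toy` and its values are illustrative, not taken from `activity.csv`):

```r
# Toy data: the first day has no measurements, the second is partly missing
toy <- data.frame(
  date  = as.Date(c("2012-10-01", "2012-10-01",
                    "2012-10-02", "2012-10-02",
                    "2012-10-03", "2012-10-03")),
  steps = c(NA, NA, 10, NA, 5, 7)
)

# Formula interface: rows with NA are omitted before sum() is applied
toy.perday <- aggregate(steps ~ date, toy, sum)

# Two days remain; 2012-10-01 is absent rather than reported as 0
toy.perday
```

This matters when interpreting the histogram and the mean: days that are entirely missing do not pull the distribution toward zero.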
Now we can plot the histogram of total steps per day.
```{r}
ggplot(data.perday, aes(steps))+geom_histogram(binwidth=2000,fill="white",colour="black")
```
Mean and median of the total steps per day:
```{r}
mean(data.perday$steps, na.rm = TRUE)
median(data.perday$steps, na.rm = TRUE)
```
## Average daily activity pattern
Now we average the steps in each interval across all days and plot the result as a line.
```{r}
data.interval <- aggregate(steps ~ interval, data, mean, na.rm = TRUE)
# Refer to columns by bare name inside aes(); ggplot looks them up in data.interval
ggplot(data.interval, aes(x = interval, y = steps)) + geom_line()
```
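A natural follow-up to the line plot is to ask which interval has the highest average. A minimal sketch, using a made-up table of per-interval averages (`toy.interval` and its values are illustrative assumptions):

```r
# Toy per-interval averages (invented values)
toy.interval <- data.frame(
  interval = c(0, 5, 10, 15),
  steps    = c(1.5, 20.0, 7.25, 3.0)
)

# which.max() gives the row index of the largest average
peak <- toy.interval[which.max(toy.interval$steps), ]
peak$interval
```

Applied to the real data, the equivalent call would be `data.interval[which.max(data.interval$steps), ]`.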
## Imputing missing values
We can impute each missing value with the mean number of steps for the same interval, rounded to a whole number.
```{r}
# For each row, replace a missing step count with the rounded mean for that interval
data.impute <- adply(data, 1, function(x) {
  if (is.na(x$steps)) {
    x$steps <- round(data.interval$steps[data.interval$interval == x$interval])
  }
  x
})
```
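The same imputation can also be written without a row-by-row loop. The following is a sketch of a vectorized alternative using `match`, not the method used above; `toy`, `lookup`, and `missing` are hypothetical names and the values are invented.

```r
# Toy data: two intervals observed on three days, with two missing values
toy <- data.frame(
  interval = c(0, 5, 0, 5, 0, 5),
  steps    = c(2, NA, 4, 10, NA, 20)
)

# Mean steps per interval; the formula interface drops the NA rows
toy.interval <- aggregate(steps ~ interval, toy, mean)

# Look up the interval mean for every row, then fill only the missing entries
lookup  <- toy.interval$steps[match(toy$interval, toy.interval$interval)]
missing <- is.na(toy$steps)
toy.impute <- toy
toy.impute$steps[missing] <- round(lookup[missing])
```

Because `match` works on whole vectors, this avoids calling a function once per row, which can be noticeably faster on a data frame with many thousands of rows.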
Now let's look at the per-day and per-interval patterns again after imputing the
missing values.
First we aggregate the number of steps per day.
```{r}
data.impute.perday <- aggregate(steps ~ date, data.impute, sum)
```
Now we can plot the histogram of total steps per day.
```{r}
ggplot(data.impute.perday, aes(steps))+geom_histogram(binwidth=2000,fill="white",colour="black")
```
Mean and median of the total steps per day:
```{r}
mean(data.impute.perday$steps)
median(data.impute.perday$steps)
```
Now we average the steps in each interval across all days.
```{r}
data.impute.interval <- aggregate(steps ~ interval, data.impute, mean)
ggplot(data.impute.interval, aes(x=interval, y=steps))+geom_line()
```
## Are there differences in activity patterns between weekdays and weekends?
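First, determine whether each day is a weekday or a weekend day. Note that `weekdays()` returns day names in the current locale, so the comparison below assumes an English locale.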
```{r}
# Split the imputed data into weekend and weekday observations
is.weekend <- weekdays(data.impute$date) %in% c("Saturday", "Sunday")
data.impute.weekend <- data.impute[is.weekend, ]
data.impute.weekday <- data.impute[!is.weekend, ]
# Average the steps per interval within each group
data.impute.weekend <- aggregate(steps ~ interval, data.impute.weekend, mean)
data.impute.weekday <- aggregate(steps ~ interval, data.impute.weekday, mean)
# Label each group and stack them for faceting
data.impute.weekend <- cbind(data.impute.weekend, day = "Weekend")
data.impute.weekday <- cbind(data.impute.weekday, day = "Weekday")
data.impute.week <- rbind(data.impute.weekend, data.impute.weekday)
```
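The weekday/weekend split can also be done without relying on English day names. The sketch below is an alternative under stated assumptions, not the approach used above: it reads the numeric day of the week from `as.POSIXlt`, stores the label as a factor column, and aggregates both groups in one call. `toy` and its values are invented for illustration.

```r
# Toy data: Friday, Saturday, Sunday, Monday (one interval each)
toy <- data.frame(
  date     = as.Date(c("2012-10-05", "2012-10-06", "2012-10-07", "2012-10-08")),
  interval = c(0, 0, 0, 0),
  steps    = c(10, 20, 40, 30)
)

# $wday is 0 for Sunday and 6 for Saturday, regardless of locale
is.weekend <- as.POSIXlt(toy$date)$wday %in% c(0, 6)
toy$day <- factor(ifelse(is.weekend, "Weekend", "Weekday"),
                  levels = c("Weekday", "Weekend"))

# A single aggregate() call averages per interval within each day type
toy.week <- aggregate(steps ~ interval + day, toy, mean)
```

The resulting data frame has the same `interval`, `day`, and `steps` columns as `data.impute.week`, so it can be passed to the same faceted plot.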
Now plot the average steps per interval, with one panel for weekdays and one for weekends.
```{r}
ggplot(data.impute.week, aes(x = interval, y = steps)) +
  geom_line() +
  facet_grid(day ~ .) +
  labs(x = "Interval", y = "Number of steps")
```