---
title: "R Notebook"
output:
  html_document: default
  html_notebook: default
---
This is an [R Markdown](http://rmarkdown.rstudio.com) Notebook. When you execute code within the notebook, the results appear beneath the code.
Try executing this chunk by clicking the *Run* button within the chunk or by placing your cursor inside it and pressing *Cmd+Shift+Enter*.
```{r}
plot(cars)
```
Add a new chunk by clicking the *Insert Chunk* button on the toolbar or by pressing *Cmd+Option+I*.
When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the *Preview* button or press *Cmd+Shift+K* to preview the HTML file).
```{r}
brth <- read.csv("birthdaysExample.csv", sep=",")
```
```{r}
summary(brth)
```
The four calls below convert the dates to R's `Date` class, extract the months and days into separate columns, and drop the original `dates` column for convenience.
```{r}
brth$dates <- as.Date(brth$dates, "%m/%d")
```
```{r}
brth$months <- as.numeric(format(brth$dates, "%m"))
```
```{r}
brth$days <- as.numeric(format(brth$dates, "%d"))
```
```{r}
brth <- brth[c("months", "days")]
```
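The conversion steps above can be sketched on a small toy data frame. This is only an illustration: the `toy` values and the `m/d` layout of the strings are assumptions inferred from the `"%m/%d"` format string, not taken from `birthdaysExample.csv`. Because a date string without a year is completed with the current year when parsed, a value such as `2/29` can become `NA` in a non-leap year; the sketch avoids this by prefixing a fixed leap year (2000, chosen arbitrarily).

```r
# Toy input (hypothetical values); the "m/d" layout is an assumption.
toy <- data.frame(dates = c("3/14", "9/8", "2/29"), stringsAsFactors = FALSE)

# Prefix a fixed leap year so that "2/29" parses regardless of the current year.
parsed <- as.Date(paste0("2000/", toy$dates), format = "%Y/%m/%d")

# Extract month and day as numbers, then keep only those two columns.
toy$months <- as.numeric(format(parsed, "%m"))
toy$days <- as.numeric(format(parsed, "%d"))
toy <- toy[c("months", "days")]

print(toy)
```

Whether this guard is needed depends on whether the data contains any February 29 birthdays; checking `sum(is.na(brth$dates))` right after the conversion would show if any rows failed to parse.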
With the data extracted, we can begin some introductory analysis.
```{r}
summary(brth)
```
```{r}
library(ggplot2)
```
```{r}
ggplot(aes(x = months), data = brth) +
  geom_histogram(bins = 12)
```
Let us explore this further with frequency plots for months and for days:
```{r}
ggplot(aes(x = months), data = brth) +
  geom_freqpoly(bins = 12) +
  scale_x_continuous(breaks = 1:12)
```
```{r}
ggplot(aes(x = days), data = brth) +
  geom_freqpoly(bins = 12) +
  scale_x_continuous(breaks = 1:31)
```
The plots indicate the following trends:
- The most common month of birth in the group is March, with close to 100 births. It is closely followed by September.
- There are definite spikes in the days of birth, with the most common days (each counting over 100) being the 8th, 19th, and 27th.
We will confirm the most common values for months and days by tabulating each column and sorting the counts:
```{r}
dtab <- table(brth$days)
```
```{r}
sort(dtab, decreasing = TRUE)[1:5]
```
```{r}
mtab <- table(brth$months)
```
```{r}
sort(mtab, decreasing = TRUE)[1:5]
```
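These results differ somewhat from the frequency plots. The most common day of birth turns out to be the 14th, with 48 births. The second-highest count, 40 births, is shared by the 9th, 17th, and 19th. The discrepancy comes from the days plot, which uses only 12 bins for 31 possible days: each point pools the counts of two or three adjacent days, which is why the plot showed peaks above 100. The most common month of birth is March, with 98 births, closely followed by September, which agrees with the months plot.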
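The effect of pooling days into too few bins can be reproduced with a minimal base R sketch. The data here is simulated (it is not the birthday data set), and the break layout is an assumption meant to approximate a 12-bin frequency plot; ggplot2's exact break placement may differ.

```r
set.seed(1)  # reproducible toy data; NOT the birthday data set
days <- sample(1:31, size = 500, replace = TRUE)

# Exact count for each individual day of the month.
per_day <- table(factor(days, levels = 1:31))

# Pool the same values into 12 equal-width bins covering days 1 to 31.
# This break layout is an assumption approximating a 12-bin frequency plot.
width <- (31 - 1) / (12 - 1)
breaks <- seq(1 - width / 2, 31 + width / 2, length.out = 13)
binned <- hist(days, breaks = breaks, plot = FALSE)$counts

max(per_day)  # highest count for any single day
max(binned)   # highest count for any pooled bin; never smaller than the above
```

Since every day falls into exactly one bin, a bin's count can never be lower than the count of any single day it contains. A plot meant to show per-day peaks would therefore need one bin per day, for example `bins = 31`, or a bar chart of the tabulated counts.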