-
Notifications
You must be signed in to change notification settings - Fork 1
/
workshop01.Rmd
261 lines (183 loc) · 6.33 KB
/
workshop01.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
---
title: "Workshop 01"
author: "your name here"
date: "today's date here"
output:
html_document:
toc: true
number_sections: true
toc_float: true
df_print: paged
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE,
message = FALSE,
warning = FALSE,
comment = NA)
```
# Introduction
Today you will get some review of topics we have covered recently.
The Console window will display executed code
output and can also be used for quick code execution. However, any work done
in the Console window will be lost once you exit RStudio.
We will use package `ggplot2` from `tidyverse`; so let's load `tidyverse` now
with
```{r packages}
library(tidyverse)
```
It is good practice to load all packages you will use at the start of your
R Markdown document.
# Console computation
The console can be used as a calculator.
- addition: `+`
- subtraction: `-`
- division: `/`
- multiplication: `*`
- modulus: `%%`
- integer division: `%/%`
- raise to power: `^`
**Evaluate the following expressions in the Console window.**
1. `3 + 4 * 0 - (100 / 3)`
2. `(4 + 6) * (2 ^ 6)`
3. `1 / 0`
4. `10 ^ 10 ^ 10 ^ 10`
5. `0 / 0`
6. `0.0000003 * 2`
When you launch RStudio, numerous functions are immediately available to you.
A few of the mathematical and statistical functions are:
| R function | Purpose |
|-----------:|--------------------|
| `abs()` | absolute value |
| `sin()` | sine |
| `cos()` | cosine |
| `tan()` | tangent |
| `log()` | logarithm |
| `exp()` | exponential |
| `mean()` | arithmetic mean |
| `median()` | median |
| `sd()` | standard deviation |
**Evaluate the following expressions in the Console window.**
7. `abs(7)`
8. `sin(3.1415)`
9. `exp(1)`
10. logarithm of expressions 1 - 6
What logarithm did you just take? Was it the natural log, base 10, base 2?
**Type `?log` in the console.** A question mark that precedes a function's
name or built in data object will populate the help tab.
**Type the following in the Console window.**
11. `?sd`
12. `?mtcars`
13. `?longley`
**What are `mtcars` and `longley`?**
The most important aspects of R's help resource will be the description and
examples given. Examples are always at the end of the help reference.
**How many examples are given in the help of `sd`?**
14. Run the example provided in the help for `sd`.
**Investigate what the following functions do by creating an R chunk with**
**examples of each function in use.**
15. `sqrt`
16. `round`
17. `floor`
18. `ceiling`
```{r}
```
Now is good place to save your work. Stage, commit, and push your work to
GitHub. Did you remember to configure git?
```{r config-git, eval=FALSE}
library(usethis)
use_git_config(user.name = "", user.email = "")
```
# Longley data
## Examine the data
Consider Longley's Economic Regression Data. This data set is built-in to R.
That means it is available immediately once RStudio is launched.
Type `longley` in your Console to see the entire data set. The same data is
given below.
```{r echo=FALSE}
longley
```
**Answer the following questions about the `longley` data set.**
1. How many rows and columns does `longley` have?
2. What do you think is the difference between the first column of years and
the column with the label Year?
**Type `head(longley)` in the Console window. What does this do?**
**How about `tail(longley)`?**
## Summary statistics
Data set `longley` is stored in R as a data frame. Each column is a
vector with the components being of the same variable type.
We will learn about these details later this week.
For now, to access a specific vector use `longley$variable_name`, where
`variable_name` is one of the variables in `longley`. For example,
```{r}
longley$GNP
```
and
```{r}
longley$Year
```
give the GNP and Year vectors of data, respectively.
**In a new code chunk below get the following vectors:**
3. Unemployed
4. Population
5. Employed
**In another code chunk compute the mean, median, and standard deviation for**
**each of the vectors in 3 - 5.**
The `summary` function in R will give you many of these statistics. For example,
```{r}
summary(longley$GNP)
```
gives us the minimum, maximum, mean, and quartiles of the GNP vector of data.
**Insert a code chunk and use the `summary` function on two variables of**
**your choice in `longley`.**
Now is good place to save your work. Stage, commit, and push your work to
GitHub.
## Employment investigation
Suppose it is 1962. Two economists are discussing employment. Each makes
the following claim.
**Economist A:** Employment has never been higher in the past 15 years, we
have seen a gradual increase from 1947 to 1962.
**Economist B:** Employment has been range bound since 1947 and is at its
lowest level since 1947.
**Which economist is correct?**
Let's look at the variable Employed across time using function `ggplot()`.
```{r fig.width=9, fig.height=6}
ggplot(data = longley, mapping = aes(x = Year, y = Employed)) +
geom_point()
```
Is this the best way to look at employment over time? What other variables
in the data are changing as we progress from 1947 - 1962? Discuss this with
those near you. How would you create a more meaningful representation of
employment over time?
```{r}
```
**Based on your discussions, which economist do you think is correct?**
Use package `ggplot2` to recreate the plots below.
```{r}
```
*Hints:*
- color is "purple"
- size is 3
- theme is `theme_bw()`
<br/><br/>
```{r}
```
*Hints:*
- color is "darkgreen"
- size is 3
- shape is "square"
- theme is `theme_minimal()`
**Why the sudden jump in number of armed forces members from 1950 to 1951?**
<br/>
Use `longley` to create one plot of your choice. Be sure to choose an
appropriate representation for the variable(s) given the variable type(s) -
quantitative or qualitative.
# Final look over
Before you make your final commit and push, check that
1. all your code chunks have names,
2. your plots are of appropriate dimensions,
3. you have included written responses to the relevant questions.
Make your final stage, commit, and push. Verify your work is updated on GitHub.
# References
1. J. W. Longley (1967) An appraisal of least-squares programs from the point
of view of the user. Journal of the American Statistical Association 62,
819-841.