/
03-functions.qmd
420 lines (298 loc) · 16.5 KB
/
03-functions.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
# Custom Functions {#sec-functions}
[Recording](https://youtu.be/ebpEA0UaG0I)
Learn to write custom functions and include them in your package.
```{r, filename="Packages required for this chapter"}
library(usethis) # for setting up data sets in a package
library(glue) # for text joining
library(devtools) # for loading your package
```
## Function Overview
First, let's create a very basic `r glossary("function")` to learn about custom functions. Functions need a name (like any R `r glossary("object")`). They are created with the `function()` function. We're going to make a function that rounds numbers and keeps the trailing zeroes, so let's call it `round0`.
```{r, filename="R/round0.R"}
round0 <- function() {
# function code goes here
}
```
Most functions have `r glossary("argument", "arguments")` that set inputs to the function or options for how the function can work. These arguments can be required for the function to work, or have `r glossary("default value", "default values")`.
Check the help for the `round` function; we want our function to work almost the same, so it will be easier for users if the arguments have the same names in the same order, with the same default values. In the example below, the argument `digits` defaults to 0 unless you change it.
```{r, filename="R/round0.R"}
round0 <- function(x, digits = 0) {
# function code goes here
}
```
Functions use these arguments in their code to produce some kind of output (or side effect). Here, we use the value of `digits` to create a formatting string, and format the value of `x` with it using `sprintf()`. Finally, we use the `return()` function to return the value.
```{r, filename="R/round0.R"}
round0 <- function(x, digits = 0) {
fmt <- paste0("%.", digits, "f")
x0 <- sprintf(fmt, x)
return(x0)
}
```
:::{.callout-note}
You technically don't have to use the `return()` function. The last object created in the function code will be automatically returned. Most people don't use `return()`, but that can sometimes make it hard to figure out exactly what is being returned if you have a lot of if/else logic.
The `return()` function also stops all subsequent code from being run.
:::
Run the code above to define the function. After it's defined, if you type the function name into the console, without parentheses, it will show you the code for the function.
```{r, filename="run in the console"}
round0
```
:::{.callout-note}
You can do this for any function; try a few! Many of the base R functions, like `mean` have unsatisfying code like `UseMethod("mean")`, which you can lean about in the [S3 Chapter of Advanced R](https://adv-r.hadley.nz/s3.html), but other functions like `sd` will show you the code they use.
:::
Use your function like any other R function.
```{r, filename="run in the console"}
round0(7/3, 3)
round0(1.9999, 2)
```
However, you'll have to define it at the top of any script that uses it, unless you add it to a package.
## Function Development
For `demopkg`, we're going to create a function that produces the APA-formatted text for the results of a paired-samples t-test. Here's an example of APA format.
> A paired-samples t-test was conducted to compare {dv} between {level1} (M = {mean1}, SD = {sd1}) and {level2} (M = {mean2}, SD = {sd2}). There was a {non}significant difference; t({df}) = {t_value}, p = {p_value}.
### Specific Instance
The first step is to sort out a specific instance of your code. You can put this in a new R script for working out your code and delete it later. We'll load in the data we added to demopkg in @sec-load-data.
```{r}
# load example data
data("self_res_att", package = "demopkg")
```
If you haven't added the data to your package yet, use this code:
```{r, eval = FALSE}
self_res_att <- read.csv("https://raw.githubusercontent.com/PsyTeachR/intro-r-pkgs/main/data-raw/DeBruine_2004_PRSLB_att.csv")
```
Next, compare preferences for self-resembling female faces (`f_self`) to others' preferences for those same faces `f_non` using a paired-samples t-test.
```{r}
# analysis
t_results <- t.test(
x = self_res_att$f_self,
y = self_res_att$f_non,
paired = TRUE)
t_results
```
Now we set up the text template with variables inside curly brackets where we want to insert values from the analysis. Just make up the variable names, keeping them short but meaningful.
```{r}
template <- "A paired-samples t-test was conducted to compare {dv} between {level1} (M = {mean1}, SD = {sd1}) and {level2} (M = {mean2}, SD = {sd2}). There was a {non}significant difference; t({df}) = {t_value}, p = {p_value}."
```
The object `t_results` prints out like above, but the object is actually a list, Use the `str()` function to see the structure of the list.
```{r}
str(t_results)
```
Now we can set the value of each variable from the `t_results` object. You can't get the means and standard deviations from the `t_results` object, so we'll calculate those from the data. Round each numeric value to the appropriate level using the `round0` function we created above.
```{r}
glue::glue(
template,
dv = "preferences for female faces",
level1 = "participants who resembled those faces",
level2 = "non-self participants",
mean1 = round0(mean(self_res_att$f_self), 1),
sd1 = round0(sd(self_res_att$f_self), 1),
mean2 = round0(mean(self_res_att$f_non), 1),
sd2 = round0(sd(self_res_att$f_non), 1),
non = ifelse(t_results$p.value < .05, "", "non-"),
df = round0(t_results$parameter, 0),
t_value = round0(t_results$statistic, 2),
p_value = round0(t_results$p.value, 3)
)
```
::: {.callout-note}
Don't worry about `p = 0` just now. A further practice exercise is for you to add code to the function to change it to `p < .001` where appropriate.
:::
### Set up Function
Now we're ready to abstract this into a function. The function will need a name. This function will (for now) only give you the APA-style text for a paired-samples t-test, so let's call it `apa_t_pair`.
You can create an R script in the `R` directory called `apa_t_pair.R`, but the code below does this for you.
```{r, eval = FALSE, filename="run in the console"}
usethis::use_r("apa_t_pair")
```
We'll develop our function in this file. To start, set up a blank function definition.
```{r, filename="R/apa_t_pair.R"}
apa_t_pair <- function() {
}
```
Then add the code from your example script above inside the curly brackets.
```{r, filename="R/apa_t_pair.R"}
apa_t_pair <- function() {
t_results <- t.test(self_res_att$f_self,
self_res_att$f_non,
paired = TRUE)
template <- "A paired-samples t-test was conducted to compare {dv} between {level1} (M = {mean1}, SD = {sd1}) and {level2} (M = {mean2}, SD = {sd2}). There was a {non}significant difference; t({df}) = {t_value}, p = {p_value}."
glue::glue(
template,
dv = "preferences for female faces",
level1 = "participants who resembled those faces",
level2 = "non-self participants",
mean1 = round0(mean(self_res_att$f_self), 1),
sd1 = round0(sd(self_res_att$f_self), 1),
mean2 = round0(mean(self_res_att$f_non), 1),
sd2 = round0(sd(self_res_att$f_non), 1),
non = ifelse(t_results$p.value < .05, "", "non-"),
df = round0(t_results$parameter, 0),
t_value = round0(t_results$statistic, 2),
p_value = round0(t_results$p.value, 3)
)
}
```
Skip a few lines and copy the `round0` function definition below this one. You don't have to define functions in any particular order, as long as all function definitions are run before you try to use them. Run all the code in this file and test that the function works by running it once in the console.
```{r, filename="run in the console"}
apa_t_pair()
```
::: {.callout-warning}
If you get the message: `Error in apa_t_pair() : could not find function "apa_t_pair"`, this means you didn't run the code that created the function.
:::
### Add Arguments
We probably want this function to work for any pair of vectors we give it, not just the value of `f_self` and `f_non`. So we need to add `r glossary("argument", "arguments")` to the function. Add arguments `x` and `y` to the function and replace `self_res_att$f_self` with `x` and `self_res_att$f_self` with `y` everywhere in the function.
```{r, filename="R/apa_t_pair.R"}
apa_t_pair <- function(x, y) {
t_results <- t.test(x, y, paired = TRUE)
template <- "A paired-samples t-test was conducted to compare {dv} between {level1} (M = {mean1}, SD = {sd1}) and {level2} (M = {mean2}, SD = {sd2}). There was a {non}significant difference; t({df}) = {t_value}, p = {p_value}."
glue::glue(
template,
dv = "preferences for female faces",
level1 = "participants who resembled those faces",
level2 = "non-self participants",
mean1 = round0(mean(x), 1),
sd1 = round0(sd(x), 1),
mean2 = round0(mean(y), 1),
sd2 = round0(sd(y), 1),
non = ifelse(t_results$p.value < .05, "", "non-"),
df = round0(t_results$parameter, 0),
t_value = round0(t_results$statistic,2),
p_value = round0(t_results$p.value, 3)
)
}
```
Now, if you try to run the function without any arguments, you'll get an error message. This is because there are no default values for `x` and `y`.
```{r, error = TRUE}
apa_t_pair()
```
This also won't work.
```{r, error = TRUE}
x <- self_res_att$f_self
y <- self_res_att$f_non
apa_t_pair()
```
This is because the `x` and `y` inside of the function are in a different `r glossary("environment")` to any `x` and `y` outside of the function. This can seem confusing at first, but it's good that you don't need to worry about objects that exist outside of your function.
You can specify the vectors as arguments.
```{r}
apa_t_pair(x = self_res_att$f_self,
y = self_res_att$f_non)
```
### Further Arguments
Now we can further customise our function. You probably won't always be comparing "preferences for female faces" between "participants who resembled those faces" and "non-self participants", so let's add three new arguments to the function. We can set generic `r glossary("default value", "default values")` for these new arguments so that you don't have to specify them if the defaults are OK.
Since the values are defined with the variable names used in the glue template, we don't need to specify those in the `glue()` function anymore.
```{r, filename="R/apa_t_pair.R"}
apa_t_pair <- function(x, y,
dv = "the DV",
level1 = "level 1",
level2 = "level 2") {
t_results <- t.test(x, y, paired = TRUE)
template <- "A paired-samples t-test was conducted to compare {dv} between {level1} (M = {mean1}, SD = {sd1}) and {level2} (M = {mean2}, SD = {sd2}). There was a {non}significant difference; t({df}) = {t_value}, p = {p_value}."
glue::glue(
template,
mean1 = round0(mean(x), 1),
sd1 = round0(sd(x), 1),
mean2 = round0(mean(y), 1),
sd2 = round0(sd(y), 1),
non = ifelse(t_results$p.value < .05, "", "non-"),
df = round0(t_results$parameter, 0),
t_value = round0(t_results$statistic,2),
p_value = round0(t_results$p.value, 3)
)
}
```
Try running the function both with and without the new arguments.
```{r}
apa_t_pair(x = self_res_att$f_self,
y = self_res_att$f_non)
```
```{r}
apa_t_pair(x = self_res_att$f_self,
y = self_res_att$f_non,
dv = "preferences for female faces",
level1 = "participants who resembled those faces",
level2 = "non-self participants")
```
::: {.callout-warning}
If the output doesn't change, this usually means that you forgot to run the code that re-defined `apa_t_pair`.
:::
## Load in package
### Import dependencies
You need to "import" any non-base packages that you use in a function. These are called "dependencies" because your function depends on them. Our function above uses `glue()` from the glue package. The function below is a quick way to add a dependency.
```{r, eval = FALSE, filename="run in the console"}
usethis::use_package("glue")
```
You should see this output:
::: {.cell-output}
```
✔ Setting active project to '/Users/lisad/rproj/demopkg'
✔ Adding 'glue' to Imports field in DESCRIPTION
• Refer to functions with `glue::fun()`
```
:::
This means that you should always use the full form `glue::glue()`, rather than loading a package with the `library()` function and using just the function name `glue()`.
You can open the `DESCRIPTION` file to see what has changed. Alternatively, you can manually add dependencies to this file under "Imports:".
### Load
Now restart R and make sure that your environment is clear. Run the following code to load your package. You can also use a keyboard shortcut to run this function (Mac: cmd-shift-L, Windows: ctl-shift-L).
```{r, eval = FALSE, filename="run in the console"}
devtools::load_all(".")
```
The function should now be available.
```{r}
apa_t_pair(x = self_res_att$f_self,
y = self_res_att$f_non,
dv = "preferences for female faces",
level1 = "participants who resembled those faces",
level2 = "non-self participants")
```
And it should be easy to adapt for other pairs of values, such as the equivalent analysis for male faces.
```{r}
apa_t_pair(x = self_res_att$m_self,
y = self_res_att$m_non,
dv = "preferences for male faces",
level1 = "participants who resembled those faces",
level2 = "non-self participants")
```
### Check
Run the CMD check using `devtools::check()` or by clicking the Check icon in the Build tab. There will be a lot of grey output text, and hopefully a lot of green checkmarks. But at the end, you'll probably get a message like this:
::: {.cell-output}
```
❯ checking R code for possible problems ... NOTE
apa_t_pair: no visible global function definition for ‘t.test’
apa_t_pair: no visible global function definition for ‘sd’
Undefined global functions or variables:
sd t.test
Consider adding
importFrom("stats", "sd", "t.test")
to your NAMESPACE file.
0 errors ✔ | 0 warnings ✔ | 1 note ✖
R CMD check succeeded
```
:::
The `no visible global function definition` note means that we've used some functions that aren't from our own package and we haven't specified what package they are from. `t.test` and `sd` are functions from the `stats` package, which is automatically loaded when you start up R, but still needs to be added as a dependency.
You can add the `stats::` prefix to `t.test` and `sd` and add the dependency using `usethis::use_package("stats")`.
::: {.callout-note}
The part that suggests you add `importFrom("stats", "sd", "t.test")` to the NAMESAPCE file is a way for you to add specific functions from another package to your package so you can use them without specifying the package name first. However, that specific instruction should be ignored because you should never edit the NAMESPACE file yourself. Instead use roxygen to set this up, which will be explained in @sec-imports.
:::
### Install
When you're developing a package, you usually "load" it using `devtools::load_all(".")` to be able to access all of the functions in the package for testing and development. This way, the development package is only available during the current session in this project. You will load it every time you make some changes to the package.
If you want your package to be available outside of project sessions where you've explicitly loaded it, you need to **install** the package using `devtools::install()` or the Install button in the Build pane.
After your package is installed, make sure the environment is clear, and try the following code:
```{r, include=FALSE}
rm(round0)
```
```{r, error = TRUE}
round0(1, 3)
```
This is because `round0` is an internal, non-exported function, so only the developer (you) is supposed to be able to use it. Technically, you can also access internal functions using the triple-colon.
```{r}
demopkg:::round0(1, 3)
```
## Glossary
```{r, echo = FALSE, results='asis'}
glossary_table()
```
## Further Resources
* [Functions](https://adv-r.hadley.nz/functions.html) from @advanced-r
## Further Practice
1. Add an argument called `alpha` that allows the user to set an alpha criterion for determining significance. Make the default value 0.05.
1. Edit the function to handle p-values < .001.
1. Add 95% confidence intervals to the output.
1. Allow the user to set a custom confidence interval. Give this a sensible default value.
1. Create another function to produce the text for a different analysis you're familiar with in R, such as an ANOVA or correlation.