-
Notifications
You must be signed in to change notification settings - Fork 4
/
02-repro.Rmd
696 lines (476 loc) · 31.9 KB
/
02-repro.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
# Reproducible Workflows {#repro}
<div class="right meme"><img src="images/memes/repro_reports.jpg"
alt="Top left: young spongebob; top right: Using Base R for your analysis and copy pasting your results into tables in Word; middle left: older angry spongebob in workout clothes; middle right: learning how to use dplyr visualize data with ggplot2 and report your analysis in rmarkdown documents; bottom left: muscular spongebob shirtless in a boxing ring; bottom right: wielding the entire might of the tidyverse (with 50 hex stickers)" /></div>
## Learning Objectives {#ilo-repro}
1. Organise a [project](#projects) [(video)](https://youtu.be/y-KiPueC9xw ){class="video"}
2. Create and compile an [Rmarkdown document](#rmarkdown) [(video)](https://youtu.be/EqJiAlJAl8Y ){class="video"}
3. Edit the [YAML header](#yaml) to add table of contents and other options
4. Include a [table](#repro-tables)
5. Include a [figure](#repro-figures)
6. Report the output of an analysis using [inline R](#inline-r)
7. Add a [bibliography](#bibliography) and [citations](#citations)
## Setup {#setup-repro}
You will be given instructions in Section\ \@ref(new-project) below to set up a new project where you will keep all of your class notes. Section\ \@ref(r-markdown) gives instructions to set up an R Markdown script for this chapter.
For reference, here are the packages we will use in this chapter.
```{r setup-repro, message=FALSE}
# packages needed for this chapter
library(tidyverse) # various data manipulation functions
library(knitr) # for table and image display
library(kableExtra) # for styling tables
library(papaja) # for APA-style tables
library(gt) # for fancy tables
library(DT) # for interactive tables
```
Download the [R Markdown Cheat Sheet](https://www.rstudio.org/links/r_markdown_cheat_sheet).
## Why use reproducible reports?
Have you ever worked on a report, creating a summary table for the demographics, making beautiful plots, getting the analysis just right, and copying all the relevant numbers into your manuscript, only to find out that you forgot to exclude a test run and have to redo everything?
A `r glossary("reproducibility", "reproducible")` report fixes this problem. Although this requires a bit of extra effort at the start, it will more than pay you back by allowing you to update your entire report with the push of a button whenever anything changes.
Studies also show that many, if not most, papers in the scientific literature have reporting errors. For example, more than half of over 250,000 psychology papers published between 1985 and 2013 have at least one value that is statistically incompatible, such as a p-value that is not possible given a t-value and degrees of freedom [@nuijten2016prevalence]. Reproducible reports help avoid transcription and rounding errors.
We will make reproducible reports following the principles of [literate programming](https://en.wikipedia.org/wiki/Literate_programming). The basic idea is to have the text of the report together in a single document along with the code needed to perform all analyses and generate the tables. The report is then "compiled" from the original format into some other, more portable format, such as HTML or PDF. This is different from traditional cutting and pasting approaches where, for instance, you create a graph in Microsoft Excel or a statistics program like SPSS and then paste it into Microsoft Word.
## Organising a project {#projects}
First, we need to get organised. `r glossary("project", "Projects")` in RStudio are a way to group all of the files you need for one project. Most projects include scripts, data files, and output files like the PDF report created by the script or images.
### File System
Modern computers tend to hide the file system from users, but we need to understand a little bit about how files are stored on your computer in order to get a script to find your data. Your computer's file system is like a big box (or `r glossary("directory")`) that contains both files and smaller boxes, or "subdirectories". You can specify the location of a file with its name and the names of all the directories it is inside.
For example, if Lisa is looking for a file called `report.Rmd`on their Desktop, they can specify the full file `r glossary("path")` like this: `/Users/lisad/Desktop/report.Rmd`, because the `Desktop` directory is inside the `lisad` directory, which is inside the `Users` directory, which is located at the base of the whole file system. If that file was on *your* desktop, you would probably have a different path unless your user directory is also called `lisad`. You can also use the `~` shortcut to represent the user directory of the person who is currently logged in, like this: `~/Desktop/report.Rmd`.
### Working Directory
Where should you put all of your files? You usually want to have all of your scripts and data files for a single project inside one folder on your computer, the `r glossary("working directory")` for that project. You can organise files in subdirectories inside this main project directory, such as putting all raw data files in a directory called `r path("data")` and saving any image files to a directory called `r path("images")`.
Your script should only reference files in three types of locations, using the appropriate format.
| Where | Example |
|-------|---------|
| on the web | "https://psyteachr.github.io/reprores-v3/data/5factor.xlsx" |
| in the working directory | "5factor.xlsx" |
| in a subdirectory | "data/5factor.xlsx" |
::: {.warning data-latex=""}
Never set or change your working directory in a script.
:::
R Markdown files will automatically use the same directory the .Rmd file is in as the working directory.
If your script needs a file in a subdirectory of your working directory, such as, `r path("data/5factor.xlsx")`, load it in using a `r glossary("relative path")` so that it is accessible if you move the working directory to another location or computer:
```{r read-csv, eval = FALSE}
dat <- read_csv("data/5factor.xlsx") # correct
```
Do not load it in using an `r glossary("absolute path")`:
```{r abs-path, eval = FALSE}
dat <- read_csv("C:/My Files/2020-2021/data/5factor.xlsx") # wrong
```
::: {.info data-latex=""}
Also note the convention of using forward slashes, unlike the Windows-specific convention of using backward slashes. This is to make references to files work for everyone, regardless of their operating system.
:::
### Naming Things {#naming-things}
Name files so that both people and computers can easily find things. Here are some important principles:
- file and directory names should only contain letters, numbers, dashes, and underscores, with a full stop (`.`) between the file name and `r glossary("extension")` (that means no spaces!)
- be consistent with capitalisation (set a rule to make it easy to remember, like always use lowercase)
- use underscores (`_`) to separate parts of the file name, and dashes (`-`) to separate words in a section
- name files with a pattern that alphabetises in a sensible order and makes it easy for you to find the file you're looking for
- prefix a filename with an underscore to move it to the top of the list, or prefix all files with numbers to control their order
- use YYYY-MM-DD format for dates so they sort in chronological order
For example, these file names are a mess:
- `r path("Data (Participants) 11-15.xls")`
- `r path("final report2.doc")`
- `r path("Participants Data Nov 12.xls")`
- `r path("project notes.txt")`
- `r path("Questionnaire Data November 15.xls")`
- `r path("report.doc")`
- `r path("report final.doc")`
Here is one way to structure them so that similar files have the same structure and it's easy for a human to scan the list or to use code to find relevant files. See if you can figure out what the missing one should be.
- `r path("_project-notes.txt")`
- `r path("data_participants_2021-11-12.xls")`
- `r path("data_participants_2021-11-15.xls")`
- `r mcq(c("questionnaire-data_2021-11-15.xls", "data-questionnaire-2021_11_15.xls", answer = "data_questionnaire_2021-11-15.xls", "data_2021-11-15_questionnaire.xls"))`
- `r path("report_v1.doc")`
- `r path("report_v2.doc")`
- `r path("report_v3.doc")`
::: {.try data-latex=""}
Think of other ways to name the files above. Look at some of your own project files and see what you can improve.
:::
### Start a Project {#new-project}
Now that we understand how the file system work and how to name things to make it easier for scripts to access them, we're ready to make our class project.
First, make a new `r glossary("directory")` where you will keep all of your materials for this class (I called mine `reprores-2022`). You can set this directory to be the default working directory under the General tab of the Global Options. This means that files will be saved here by default if you aren't working in a project.
::: {.warning data-latex=""}
It can sometimes cause problems if this directory is in OneDrive or if the full file path has special characters or is [more than 260 characters](http://handbook.datalad.org/en/latest/intro/filenaming.html){target='_blank'} on some Windows machines.
:::
Next, choose **`New Project...`** under the **`File`** menu to create a new project called `r path("reprores-class-notes")`. Make sure you save it inside the directory you just made. RStudio will restart itself and open with this new project directory as the working directory.
```{r img-new-proj, echo = FALSE, out.width = "32%", fig.show = "hold"}
#| fig.cap="Starting a new project."
include_graphics(c("images/repro/new_proj_1.png",
"images/repro/new_proj_2.png",
"images/repro/new_proj_3.png"))
```
Click on the Files tab in the lower right pane to see the contents of the project directory. You will see a file called `reprores-class-notes.Rproj`, which is a file that contains all of the project information.You can double-click on it to open up the project.
::: {.info data-latex=""}
Depending on your settings, you may also see a directory called `.Rproj.user`, which contains your specific user settings. You can ignore this and other "invisible" files that start with a full stop.
:::
## R Markdown
In this lesson, we will learn to make an R Markdown document with a table of contents, appropriate headers, code chunks, tables, images, inline R, and a bibliography.
::: {.info data-latex=""}
There is a new type of reproducible report format called [quarto](https://quarto.org/){target="_blank"} that is very similar to R Markdown. We won't be using quarto in this class because it has a few small differences that get confusing if you're learning both quarto and R Markdown at the same time, but you should be able to pick up quarto very easily once you've learned R Markdown.
:::
We will use `r glossary("R Markdown")` to create reproducible reports, which enables mixing of text and code. A reproducible script will contain sections of code in code blocks. A code block starts and ends with three backtick symbols in a row, with some information about the code between curly brackets, such as `{r chunk-name, echo=FALSE}` (this runs the code, but does not show the text of the code block in the compiled document). The text outside of code blocks is written in `r glossary("markdown")`, which is a way to specify formatting, such as headers, paragraphs, lists, bolding, and links.
```{embed, file = "demos/repro.Rmd"}
```
If you open up a new R Markdown file from a template, you will see an example document with several code blocks in it. To create an HTML or PDF report from an R Markdown (Rmd) document, you compile it. Compiling a document is called `r glossary("knit", "knitting")` in RStudio. There is a button that looks like a ball of yarn with needles through it that you click on to compile your file into a report.
::: {.try data-latex=""}
Create a new R Markdown file from the **`File > New File > R Markdown...`** menu. Change the title and author, save the file as `02-repro.Rmd`, then click the knit button to create an html file.
:::
### YAML Header {#yaml}
The `r glossary("YAML")` header is where you can set several options.
```
---
title: "My Demo Document"
author: "Me"
output:
html_document:
toc: true
toc_float:
collapsed: false
smooth_scroll: false
number_sections: false
---
```
::: {.info data-latex=""}
Try changing the values from `false` to `true` to see what the options do.
:::
The `df_print: kable` option prints data frames using `knitr::kable`. You'll learn below how to further customise tables.
The built-in bootswatch themes are: default, cerulean, cosmo, darkly, flatly, journal, lumen, paper, readable, sandstone, simplex, spacelab, united, and yeti. You can [view and download more themes](https://bootswatch.com/4/).
```{r img-bootswatch, echo=FALSE, fig.cap="Light themes in versions 3 and 4."}
knitr::include_graphics("images/repro/bootswatch.png")
```
### Setup
When you create a new R Markdown file in RStudio using the default template, a setup chunk is automatically created.
```{r knitr-setup, eval=FALSE, verbatim="r setup, include=FALSE"}
knitr::opts_chunk$set(echo = TRUE)
```
You can set more default options for code chunks here. See the [knitr options documentation](https://yihui.name/knitr/options/){target="_blank"} for explanations of the possible options.
```{r knitr-setup2, eval=FALSE, verbatim="r setup, include=FALSE"}
knitr::opts_chunk$set(
fig.width = 8,
fig.height = 5,
fig.path = 'images/',
echo = FALSE,
warning = TRUE,
message = FALSE,
cache = FALSE
)
```
The code above sets the following options:
* `fig.width = 8` : default figure width is 8 inches (you can change this for individual figures)
* `fig.height = 5` : default figure height is 5 inches
* `fig.path = 'images/'` : figures are saved in the directory "images"
* `echo = FALSE` : do not show code chunks in the rendered document
* `warning = FALSE` : do not show any function warnings
* `message = FALSE` : do not show any function messages
* `cache = FALSE` : run all the code to create all of the images and objects each time you knit (set to `TRUE` if you have time-consuming code)
Find a list of the current chunk options by typing `r hl(str(knitr::opts_chunk$get()))` in the console.
You can also add the packages you need in this chunk using `r hl(library())`. Often when you are working on a script, you will realize that you need to load another add-on package. Don't bury the call to `library(package_I_need)` way down in the script. Put it in the top, so the user has an overview of what packages are needed.
::: {.try data-latex=""}
We'll be using function from the package `r pkg("tidyverse")`, so load that in your setup chunk.
:::
### Structure {#structure}
If you include a table of contents (`toc`), it is created from your document headers. Headers in `r glossary("markdown")` are created by prefacing the header title with one or more hashes (`#`).
Use the following structure when developing your own analysis scripts:
* load in any add-on packages you need to use
* define any custom functions
* load or simulate the data you will be working with
* work with the data
* save anything you need to save
::: {.try data-latex=""}
Delete the default text and add some structure to your document by creating headers and subheaders. We're going to load some data, create a summary table, plot the data, and analyse it.
:::
### Code Chunks
You can include `r glossary("chunk", "code chunks")` that create and display images, tables, or computations to include in your text. Let's start by loading some data.
First, create a code chunk in your document. This code loads some data from the web.
```{r}
pets <- read_csv("https://psyteachr.github.io/reprores/data/pets.csv",
show_col_types = FALSE)
```
### Comments
You can add comments inside R chunks with the hash symbol (`#`). The R interpreter will ignore characters from the hash to the end of the line.
```{r}
# simulating new data
n <- nrow(pets) # the total number of pet
mu <- mean(pets$score) # the mean score for all pets
sd <- sd(pets$score) # the SD for score for all pets
simulated_scores <- rnorm(n, mu, sd)
```
It's usually good practice to start a code chunk with a comment that explains what you're doing there, especially if the code is not explained in the text of the report.
If you name your objects clearly, you often don't need to add clarifying comments. For example, if I'd named the three objects above `total_pet_n`, `mean_score` and `sd_score`, I would omit the comments.
Another use for comments is to "comment out" a section of code that you don't want to run, but also don't want to delete. For example, you might include the code used to install a package in your script, but you should always comment it out so the script doesn't force a lengthy installation every time it's run.
```{r}
# install.packages("dplyr")
# install.packages("tidyr")
# install.packages("ggplot2")
```
::: {.info data-latex=""}
You can comment or uncomment multiple lines at once by selecting the lines and typing Cmd-shift-C (Mac) or Ctrl-shift-C (Windows).
:::
It's a bit of an art to comment your code well. The best way to develop this skill is to read a lot of other people's code and have others review your code.
### In-line R {#inline-r}
Now let's analyse the pets data to see if cats are heavier than ferrets. First we'll run the analysis code. Then we'll save any numbers we might want to use in our manuscript to variables and round them appropriately. Finally, we'll use `r hl(glue::glue())` to format a results string.
```{r}
# analysis
cat_weight <- filter(pets, pet == "cat") %>% pull(weight)
ferret_weight <- filter(pets, pet == "ferret") %>% pull(weight)
weight_test <- t.test(cat_weight, ferret_weight)
# round individual values you want to report
t <- weight_test$statistic %>% round(2)
df <- weight_test$parameter %>% round(1)
p <- weight_test$p.value %>% round(3)
# handle p-values < .001
p_symbol <- ifelse(p < .001, "<", "=")
if (p < .001) p <- .001
# format the results string
weight_result <- glue::glue("t = {t}, df = {df}, p {p_symbol} {p}")
```
You can insert the results into a paragraph with inline R code that looks like this:
<pre><code>Cats were significantly heavier than ferrets (`r weight_result`).</code></pre>
**Rendered text:**
Cats were significantly heavier than ferrets (`r weight_result`).
### Tables {#repro-tables}
Next, create a code chunk where you want to display a table of the descriptives (e.g., Participants section of the Methods). We'll use tidyverse functions you will learn in the [data wrangling lectures](#tidyr) to create summary statistics for each group.
```{r}
summary_table <- pets %>%
group_by(pet) %>%
summarise(
n = n(),
mean_weight = mean(weight),
mean_score = mean(score)
)
# print
summary_table
```
::: {.warning data-latex=""}
The table above will print in tibble format in the interactive view, but will use the format from the `df_print` setting in the YAML header when you knit.
:::
The table above is OK, but it could be more reader-friendly by changing the column labels, rounding the means, and adding a caption. You can use `r hl(knitr::kable())` for this, or more specialised functions from other packages to format your tables.
<!-- Tab links -->
<div class="tab">
<button class="tablinks" onclick="openCity(event, 'knitr')">knitr</button>
<button class="tablinks" onclick="openCity(event, 'kableExtra')">kableExtra</button>
<button class="tablinks" onclick="openCity(event, 'papaja')">papaja</button>
<button class="tablinks" onclick="openCity(event, 'gt')">gt</button>
</div>
<!-- Tab content -->
<div id="knitr" class="tabcontent">
```{r kable-demo}
newnames <- c("Pet Type", "N", "Mean Weight", "Mean Score")
knitr::kable(summary_table,
digits = 2,
col.names = newnames,
caption = "Summary statistics for the pets dataset.")
```
</div>
<!-- Tab content -->
<div id="kableExtra" class="tabcontent">
The [kableExtra](https://haozhu233.github.io/kableExtra/awesome_table_in_html.html) package gives you a lot of flexibility with table display.
```{r kableExtra-demo, warning = FALSE}
library(kableExtra)
kable(summary_table,
digits = 2,
col.names = c("Pet Type", "N", "Weight", "Score"),
caption = "Summary statistics for the pets dataset.") |>
kable_classic() |>
kable_styling(full_width = FALSE, font_size = 20) |>
add_header_above(c(" " = 2, "Means" = 2)) |>
kableExtra::row_spec(2, bold = TRUE, background = "lightyellow")
```
</div>
<!-- Tab content -->
<div id="papaja" class="tabcontent">
[papaja](https://crsh.github.io/papaja_man/reporting.html#tables) helps you create APA-formatted manuscripts, including tables.
```{r papaja-demo}
papaja::apa_table(summary_table,
col.names = c("Pet Type", "N", "Weight", "Score"),
caption = "Summary statistics for the pets dataset.",
col_spanners = list("Means" = c(3, 4)))
```
</div>
<!-- Tab content -->
<div id="gt" class="tabcontent">
The [gt](https://gt.rstudio.com/index.html) package allows for even more customisation.
```{r gt-demo}
library(gt)
gt(summary_table, caption = "Summary statistics for the pets dataset.") |>
fmt_number(columns = c(mean_weight, mean_score),
decimals = 2) |>
cols_label(pet = "Pet Type",
n = "N",
mean_weight = "Weight",
mean_score = "Score") |>
tab_spanner(label = "Means",
columns = c(mean_weight, mean_score)) |>
opt_stylize(style = 6, color = "blue")
```
</div>
### Images {#repro-figures}
Next, create a code chunk where you want to display an image in your document. Let's put it in the Results section. We'll use some code that you'll learn more about in the [data visualisation lecture](#ggplot) to show violin-boxplots for the groups.
Notice how the figure caption is formatted in the chunk options.
```{r pet-plot, eval = FALSE, verbatim = 'r pet-plot, fig.cap="Figure 1. Scores by pet type and country."'}
#| fig.cap: "Figure 1. Scores by pet type and country."
ggplot(pets, aes(pet, score, fill = country)) +
geom_violin(alpha = 0.5) +
geom_boxplot(width = 0.25,
position = position_dodge(width = 0.9),
show.legend = FALSE) +
scale_fill_manual(values = c("orange", "dodgerblue")) +
labs(x = "", y = "Score") +
theme(text = element_text(size = 20, family = "Times"))
```
```{r pet-plot-out, echo = FALSE}
#| fig.cap: "Figure 1. Scores by pet type and country."
ggplot(pets, aes(pet, score, fill = country)) +
geom_violin(alpha = 0.5) +
geom_boxplot(width = 0.25,
position = position_dodge(width = 0.9),
show.legend = FALSE) +
scale_fill_manual(values = c("orange", "dodgerblue")) +
labs(x = "", y = "Score") +
theme(text = element_text(size = 20, family = "Times"))
```
::: {.info data-latex=""}
The last line changes the default text size and font, which can be useful for generating figures that meet a journal's requirements.
:::
You can also include images that you did not create in R using the typical markdown syntax for images:
``` md
![All the Things by [Hyperbole and a Half](http://hyperboleandahalf.blogspot.com/)](images/memes/x-all-the-things.png){style="width: 50%"}
```
![All the Things by [Hyperbole and a Half](http://hyperboleandahalf.blogspot.com/)](images/memes/x-all-the-things.png){style="width: 50%"}
### Linked documents
If you need to create longer reports with links between sections, you can edit the YAML to use an output format from the `r pkg("bookdown")` package. `bookdown::html_document2` is a useful one that adds figure and table numbers automatically to any figures or tables with a caption and allows you to link to these by reference.
To create links to tables and figures, you need to name the code chunk that created your figures or tables, and then call those names in your inline coding:
```{r, eval = FALSE, verbatim="r my-table"}
# table code here
```
```{r, eval = FALSE, verbatim="r my-figure"}
# figure code here
```
```
See Table\ \@ref(tab:my-table) or Figure\ \@ref(fig:my-figure).
```
::: {.warning data-latex=""}
The code chunk names can only contain letters, numbers and dashes. If they contain other characters like spaces or underscores, the referencing will not work.
:::
You can also link to different sections of your report by naming your headings with `{#}`:
```
# My first heading {#heading-1}
## My second heading {#heading-2}
See Section\ \@ref(heading-1) and Section\ \@ref(heading-2)
```
The code below shows how to link text to figures or tables in a full report using the built-in `diamonds` dataset - use your `reports.Rmd` to create this document now. You can see the [HTML output here](demos/html_document2.html).
`r hide("Linked document code")`
```{embed, file = "demos/html_document2.Rmd"}
# readLines("demos/html_document2.Rmd") |>
# paste(collapse = "\n") |>
# paste0("<code style='font-size: smaller;'><pre>\n", ., "\n</pre></code>") |>
# cat()
```
`r unhide()`
This format defaults to numbered sections, so set `number_sections: false` in the `r glossary("YAML")` header if you don't want this. If you remove numbered sections, links like `\@ref(conclusion)` will show "??", so you need to use URL link syntax instead, like this:
```
See the [last section](#conclusion) for concluding remarks.
```
## Bibliography
There are several ways to do in-text references and automatically generate a [bibliography](https://bookdown.org/yihui/rmarkdown-cookbook/bibliography.html){target="_blank"} in R Markdown. Markdown files need to link to a BibTex or JSON file (a plain text file with references in a specific format) that contains the references you need to cite. You specify the name of this file in the YAML header, like `bibliography: refs.bib` and cite references in text using an at symbol and a shortname, like `[@tidyverse]`. You can also include a Citation Style Language (.csl) file to format your references in, for example, APA style.
```
---
title: "My Paper"
author: "Me"
output:
html_document:
toc: true
bibliography: refs.bib
csl: apa.csl
```
### Converting from reference software
Most reference software like EndNote or Zotero has exporting options that can export to BibTeX format. You just need to check the shortnames in the resulting file.
::: {.warning data-latex=""}
Please start using a reference manager consistently through your research career. It will make your life so much easier. Zotero is probably the best one.
:::
1. If you don't already have one, set up a [Zotero](https://www.zotero.org/){target="_blank"} account
2. Add the [connector for your web browser](https://www.zotero.org/download/){target="_blank"} (if you're on a computer you can add browser extensions to)
3. Navigate to [Easing Into Open Science](https://doi.org/10.1525/collabra.18684){target="_blank"} and add this reference to your library with the browser connector
4. Go to your library and make a new collection called "Open Research" (click on the + icon after **`My Library`**)
5. Drag the reference to Easing Into Open Science into this collection
6. Export this collection as BibTex
```{r zotero, echo = FALSE}
#| fig.cap: Export a bibliography file from Zotero
knitr::include_graphics("images/repro/zotero.png")
```
The exported file should look like this:
```{embed, file = "demos/export-data.bib"}
```
### Creating a BibTeX File
You can also add references manually. In RStudio, go to **`File`** > **`New File...`** > **`Text File`** and save the file as "refs.bib".
Next, add the line `bibliography: refs.bib` to your YAML header.
### Adding references {#references}
You can add references to a journal article in the following format:
```
@article{shortname,
author = {Author One and Author Two and Author Three},
title = {Paper Title},
journal = {Journal Title},
volume = {vol},
number = {issue},
pages = {startpage--endpage},
year = {year},
doi = {doi}
}
```
See [A complete guide to the BibTeX format](https://www.bibtex.com/g/bibtex-format/){target="_blank"} for instructions on citing books, technical reports, and more.
You can get the reference for an R package using the functions `citation()` and `toBibtex()`. You can paste the bibtex entry into your bibliography.bib file. Make sure to add a short name (e.g., "ggplot2") before the first comma to refer to the reference.
```{r}
citation(package="ggplot2") %>% toBibtex()
```
[Google Scholar](https://scholar.google.com/) entries have a BibTeX citation option. This is usually the easiest way to get the relevant values if you can't add a citation through the Zotero browser connector, although you have to add the DOI yourself. You can keep the suggested shortname or change it to something that makes more sense to you.
```{r google-scholar, echo = FALSE, fig.cap = "Get BibTex citations from Google Scholar."}
knitr::include_graphics("images/present/google-scholar.png")
```
### Citing references {#citations}
You can cite references in text like this:
```
This tutorial uses several R packages [@tidyverse;@rmarkdown].
```
This tutorial uses several R packages [@tidyverse;@rmarkdown].
Put a minus in front of the @ if you just want the year:
```
Kathawalla and colleagues [-@kathawalla_easing_2021] explain how to introduce open research practices into your postgraduate studies.
```
Kathawalla and colleagues [-@kathawalla_easing_2021] explain how to introduce open research practices into your postgraduate studies.
### Uncited references
If you want to add an item to the reference section without citing, it, add it to the YAML header like this:
```
nocite: |
@kathawalla_easing_2021, @broman2018data, @nordmann2022data
```
Or add all of the items in the .bib file like this:
```
nocite: '@*'
```
### Citation Styles
You can search a [list of style files](https://www.zotero.org/styles){target="_blank"} for various journals and download a file that will format your bibliography for a specific journal's style. You'll need to add the line `csl: filename.csl` to your YAML header.
::: {.try data-latex=""}
Add some citations to your refs.bib file, reference them in your text, and render your manuscript to see the automatically generated reference section. Try a few different citation style files.
:::
### Reference Section
By default, the reference section is added to the end of the document. If you want to change the position (e.g., to add figures and tables after the references), include `<div id="refs"></div>` where you want the references.
::: {.try data-latex=""}
Add in-text citations and a reference list to your report.
:::
## Custom Templates {#custom-templates}
Some packages provide custom R Markdown templates. `reprores` has a Report template that shows all of the possible options in the YAML header, has bibliography and style files, and explains how to set up linked figures and tables. Because it contains multiple files, RStudio will ask you to create a new folder to keep all of the files in.
```{r img-custom-rmd, echo=FALSE, fig.cap="The custom R markdown template from reprores.", out.width = '75%'}
knitr::include_graphics("images/custom-rmd.png")
```
::: {.try data-latex=""}
Start a report with the Report template and knit it. Try changing or deleting options.
:::
## Glossary {#glossary-repro}
`r glossary_table()`
## Further Resources {#resources-repro}
* [Chapter 27: R Markdown](http://r4ds.had.co.nz/r-markdown.html) in *R for Data Science*
* [R Markdown Cheat Sheet](https://github.com/rstudio/cheatsheets/raw/master/rmarkdown.pdf)
* [R Markdown reference Guide](https://www.rstudio.com/wp-content/uploads/2015/03/rmarkdown-reference.pdf)
* [R Markdown Tutorial](https://rmarkdown.rstudio.com/lesson-1.html)
* [R Markdown: The Definitive Guide](https://bookdown.org/yihui/rmarkdown/) by Yihui Xie, J. J. Allaire, & Garrett Grolemund
* [Project Structure](https://slides.djnavarro.net/project-structure/) by Danielle Navarro
* [How to name files](https://speakerdeck.com/jennybc/how-to-name-files) by Jenny Bryan
* [Papaja](https://crsh.github.io/papaja_man/) Reproducible APA Manuscripts
* [The Turing Way](https://the-turing-way.netlify.app/)