---
title: "R fundamentals"
author: "Tobias Gerstenberg, adapted by Bria Long"
date: "January 10th, 2022, adapted Sept, 2025"
format:
  html:
    toc: true
    toc-depth: 4
    theme: cosmo
    highlight-style: tango
    df-print: kable
execute:
  cache: true
---


In [2]:
library("knitr") # for rendering the document and `kable()` tables
library("skimr") # for summarizing data
library("visdat") # for visualizing data structure and missing values
library("DT") # for interactive data tables
library("tidyverse") # for data wrangling


── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.2
✔ ggplot2   4.0.0     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors



# Data wrangling 1

This notebook looks at how to wrangle data using the [dplyr](https://dplyr.tidyverse.org/) package. Thanks to the `tidyverse`, R is particularly powerful for both data visualization and data wrangling. Many analysis pipelines use both Python and R, but I often prefer R for visualizing and wrangling tabular data, which is common in experimental psychology.


## Learning goals

- Review R basics (incl. variable modes, data types, operators, control flow, and functions).
- Learn how the pipe operator `%>%` works.
- See different ways of getting a sense of one's data.
- Master key data manipulation verbs from the `dplyr` package (incl. `filter()`, `arrange()`, `rename()`, `relocate()`, `select()`, `mutate()`) as well as the helper functions `across()` and `where()`.


## Some R basics

To test your knowledge of the R basics, I recommend taking the free interactive tutorial on DataCamp: [Introduction to R](https://www.datacamp.com/courses/free-introduction-to-r). Here, I will give only a very quick overview of some of the basics.

### Modes

Variables in R can have different modes. Table \@ref(tab:variable-modes) shows the most common ones.


In [None]:
# Names of the variable modes and example values for each
name <- c("numeric", "character", "logical", "not available")
example <- c(
  "`1`, `3`, `48`",
  "`'Steve'`, `'a'`, `'78'`",
  "`TRUE`, `FALSE`",
  "`NA`"
)
# Render the two columns as a captioned table
kable(
  x = tibble(name, example),
  caption = "Most commonly used variable modes in R.",
  align = c("r", "l"),
  booktabs = TRUE
)




Table: Most commonly used variable modes in R.

|          name|example                  |
|-------------:|:------------------------|
|       numeric|`1`, `3`, `48`           |
|     character|`'Steve'`, `'a'`, `'78'` |
|       logical|`TRUE`, `FALSE`          |
| not available|`NA`                     |
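
As a quick check on the table above, base R's `mode()` function reports the mode of a value. The sketch below is only an illustration: the names `values` and `modes_demo` are made up for this example and do not appear elsewhere in these notes. Two details are worth noticing: `"78"` in quotes is a character string rather than a number, and a bare `NA` is stored as a logical value by default.

```r
# Hypothetical example values, one per row of the table above
values <- list(
  numeric = 48,
  character = "78", # quoted, so a character string, not a number
  logical = TRUE,
  not_available = NA
)

# mode() is a base R function; sapply() applies it to each list element
modes_demo <- sapply(values, mode)
print(modes_demo)

# is.na() tests whether a value is missing
is.na(values$not_available)
```

Running `mode()` on your own variables in this way is a simple first step when a function complains about an unexpected input type.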