# R

## Resources
* [knitr Documentation](https://yihui.org/knitr/options/)
* [What they forgot to teach you about R](https://rstats.wtf/)
* [Reproducible Research in R](https://r-cubed.rostools.org/)
* [renv: Project Environments for R](https://www.rstudio.com/blog/renv-project-environments-for-r/)
* [R for Data Science](https://r4ds.had.co.nz/index.html)

## General

- Use `<-` for assignment
- Indexing starts from 1, not 0
- Copy-on-modify: modifying a copy leaves the original object unchanged
- Lazy evaluation: function arguments are evaluated only when accessed
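
The points above can be seen in a few lines of base R. This is a minimal sketch; the names `v`, `a`, `b`, and `f` are made up for illustration.

```r
# indexing starts from 1
v <- c(10, 20, 30)
first <- v[1]          # the first element of v

# copy-on-modify: changing b leaves a untouched
a <- c(1, 2, 3)
b <- a
b[1] <- 99

# lazy evaluation: an argument is evaluated only when it is accessed,
# so the stop() below never runs because f() never uses y
f <- function(x, y) x * 2
result <- f(5, stop("never evaluated"))
```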

### What version of packages am I using?

```r
install.packages("devtools")
devtools::session_info() 
```

### Basic functions

```r
# import package
library(package)

# import from script
source("PATH/TO/script.R")

# return type of x
typeof(x)

# print information of x
## try using it on a data frame 
str(x)

# type testing, returns boolean 
is.logical(x)
is.integer(x)
is.double(x)
is.character(x)

# type casting
as.logical(x)
as.integer(x)
as.double(x)
as.character(x)
as_tibble(x)  # requires the tibble package

# length
length(x)

# create data frame
data <- data.frame(col1 = c(1, 2, 3), 
                   col2 = c(1, 2, 3))
```

### `for` loops

```r
# for each item in vector
for (item in sequence) {
    # action to perform every iteration
}

# by index 
for (i in seq_along(sequence)) {
    # action to perform every iteration
}
```
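
As a concrete instance of the two loop templates above, the sketch below uses a made-up vector `fruits`. Looping by index with `seq_along()` is generally safer than `1:length(x)`, because it yields an empty sequence when the vector is empty.

```r
fruits <- c("apple", "banana", "cherry")

# by item
for (fruit in fruits) {
    print(fruit)
}

# by index: store the number of characters of each element
n_chars <- integer(length(fruits))
for (i in seq_along(fruits)) {
    n_chars[i] <- nchar(fruits[i])
}
```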

### `if` statements

```r
if (condition1) {
    # then ...
} else if (condition2) {
    # then ...
} else {
    # all other cases
}
```

### Writing custom functions

```r
# named functions
f <- function(param1, param2, default_param1 = 0) {
       
    # exceptions
    if (condition)
        stop("message describing error")
    
    # content of function
    
    # return if necessary
    return (...)
}

# anonymous functions (for mapping etc.)
function(x) body_of_function
```
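
To make the template above concrete, here is a small sketch. The function `rescale()` is a hypothetical helper invented for this example, not part of any package used in these notes.

```r
# hypothetical helper: scale a numeric vector to the range [0, 1]
rescale <- function(x, na.rm = TRUE) {

    # exceptions
    if (!is.numeric(x))
        stop("x must be numeric")

    rng <- range(x, na.rm = na.rm)

    # the value of the last expression is returned
    (x - rng[1]) / (rng[2] - rng[1])
}

scaled <- rescale(c(2, 4, 6))

# anonymous function passed to sapply()
squares <- sapply(1:3, function(x) x^2)
```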

### Switching between name as string or object (column)

```r
# using enquo() and !!
f <- function(col, val) {
    col <- enquo(col)
    filter(data, !!col == val)
}

# using {{ }} 
f <- function(col, val) {
    filter(data, {{ col }} == val)
}

```
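
A runnable sketch of both directions, assuming dplyr is installed. The data frame `df` and the function names are made up for illustration. `{{ }}` handles a column passed as a bare name, and the `.data` pronoun handles a column passed as a string.

```r
library(dplyr)

df <- data.frame(species = c("cat", "dog", "cat"),
                 weight = c(4, 10, 5))

# column passed as an unquoted name
filter_equal <- function(data, col, val) {
    filter(data, {{ col }} == val)
}
cats <- filter_equal(df, species, "cat")

# column passed as a string
filter_equal_chr <- function(data, col_name, val) {
    filter(data, .data[[col_name]] == val)
}
cats_chr <- filter_equal_chr(df, "species", "cat")
```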

### Writing tests

``` r
library(testthat)

test_that("MSG TO PRINT WHEN TEST FAILS", {
    # test1
    # test2
    # ...
})
```

| statement | test |
|-----------|-------------|
| `expect_identical` | are two objects exactly equal? |
| `expect_equal` | are two objects nearly identical (within tolerance)? | 
| `expect_equivalent` | are two objects nearly identical (within tolerance)? (ignores attributes) |
| `expect_error` | does it raise an error? |
| `expect_warning` | does it raise a warning? |
| `expect_output` | is the output what it's supposed to be? |
| `expect_true` | does it evaluate to TRUE? |
| `expect_false` | does it evaluate to FALSE? |
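
A few of the expectations above in use, as a minimal sketch that assumes testthat is installed. The function `add_one()` is hypothetical and exists only to have something to test.

```r
library(testthat)

# hypothetical function under test
add_one <- function(x) {
    if (!is.numeric(x)) stop("x must be numeric")
    x + 1
}

test_that("add_one() increments numeric input", {
    expect_equal(add_one(1), 2)
    expect_equal(add_one(0.1 + 0.2), 1.3)      # equal within tolerance
    expect_true(is.numeric(add_one(1)))
    expect_error(add_one("a"), "must be numeric")
})
```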

### Ignoring errors

```r
try({
    # continue trying, even if it fails midway
})
```
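
`try()` lets the script continue after an error. When you want a fallback value instead, base R's `tryCatch()` is a common alternative. The sketch below uses a made-up helper `safe_log()`.

```r
# try() returns an object of class "try-error" when the expression fails
res <- try(log("a"), silent = TRUE)
failed <- inherits(res, "try-error")

# tryCatch() runs the error handler and returns its value instead
safe_log <- function(x) {
    tryCatch(log(x),
             error = function(e) NA_real_)
}
value <- safe_log("a")
```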

## Data structures

### Date and Time

```r
library(lubridate)

# current date
today()

# current date and time
now()

# converting string or numeric to datetime object
ymd("2017-01-31")
mdy("January 31st, 2017")
dmy(31012017)

ymd_hms("2017-01-31 20:11:59")
mdy_hm("01/31/2017 08:01", tz = "UTC")

# combine columns to make single date column 
data |> 
    mutate(date = make_date(year, month, day))

# extract information
year(datetime)
month(datetime)
mday(datetime)
yday(datetime)
wday(datetime, label = TRUE, abbr = FALSE)
```

### Strings

```r
library(stringr)
library(tidyr)


# is pattern in x?
str_detect(x, "pattern")

# what are the actual values that contain the pattern?
str_subset(x, "pattern")

# split string on delimiter
# returns a list
str_split(x, " ")

# split string on delimiter into a preset container 
# n = width of container 
str_split_fixed(x, pattern = " ", n = 2)

# splitting strings in a column into separate columns in data frame
data |> 
    separate(unsplit_col, into = c("col1", "col2"), sep = " ")

# length of each string
str_length(x)

# substrings 
str_sub(x, 1, 3)

# substrings for assignment
str_sub(x, 1, 3) <- "ABC"

# collapse a character vector of length n > 1 into a single string
char_vector |> 
    str_c(collapse = "-")

# concatenating multiple vectors
str_c(vec1, vec2, sep = " ")

# for concatenating in data frames
data |> 
    unite("combined_col", col1, col2, sep = " ")

# replace a pattern
str_replace(x, "pattern", "replacement")

# replace NA values with a string (default is "NA")
str_replace_na(x, replacement = "missing")

# replace NA in data frames
data |>
    replace_na(list(col = "missing"))

# filtering rows containing string
data |>
    filter(str_detect(col, "pattern"))

# replacing rows containing string
data |> 
    mutate(col = str_replace(col,
                             "pattern",
                             "replacement"))

# extract the first match in each string
# returns a character vector
str_extract(sentences, "pattern")

# extract all matches in each string
# returns a list
str_extract_all(sentences, "pattern")

# capture groups of regex
noun <- "(a|the) ([^ ]+)"
str_match(sentences, noun)
```
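
To show what the capture-group call returns, here is a small sketch on a made-up vector `x`, assuming stringr is installed. `str_match()` returns a character matrix: the first column is the full match, and each later column is one capture group. Strings with no match produce a row of `NA`.

```r
library(stringr)

x <- c("the cat sat", "a dog barked", "no match here")
noun <- "(a|the) ([^ ]+)"

m <- str_match(x, noun)

full_match <- m[, 1]   # e.g. "the cat"
article    <- m[, 2]   # first capture group
word       <- m[, 3]   # second capture group
```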

### Factors

Useful for data visualization

```r
library(forcats)

levels(col)
nlevels(col)

# drop unused levels
data$col |>
    fct_drop()

# changing order of factors
## by frequency 
data |>
    mutate(col = fct_infreq(col))

## by reverse frequency
data |> 
    mutate(col = fct_infreq(col),
           col = fct_rev(col))

# based on median of another variable / column
fct_reorder(reorder_col, by_col)

# based on min of another variable / column
fct_reorder(reorder_col, by_col, min)

# bring specific factors to front of order
data$col |>
    fct_relevel("level1", "level2")

```

## Reading / Writing data files

- Look at the data file first, to choose appropriate function arguments. 
  - Does it have headers?
  - Does it have index names? 
  - Does it have meta data? 
  - What delimiter does it use?
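
One way to work through the checklist above is to peek at the first raw lines before parsing. The sketch below assumes readr is installed and writes a small made-up file to a temporary path so that it runs on its own.

```r
library(readr)

# create a small example file with two lines of meta data (made-up content)
path <- tempfile(fileext = ".csv")
write_lines(c("Report generated for demo",
              "Units: arbitrary",
              "id,value",
              "1,10",
              "2,20"),
            path)

# look at the raw lines first
head_lines <- read_lines(path, n_max = 3)

# two meta data lines, then a header row, comma-delimited
data <- read_csv(path, skip = 2, show_col_types = FALSE)
```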

### For plain text files

```r
library(readr)

# Default 
data <- read_csv("PATH/TO/FILE/data.csv")

# Skip rows in the beginning (meta data)
data <- read_csv("PATH/TO/FILE.csv", skip = 2)

# If the file provides column names in its first row (the default)
data <- read_csv("PATH/TO/FILE.csv", col_names = TRUE)

# Skip rows at the end (meta data) by reading only the first n_max rows
data <- read_csv("PATH/TO/FILE.csv", n_max = 196)

# Specifying delimiter 
data <- read_delim("PATH/TO/FILE.tsv", 
                   delim = "\t")

# Reading from URL / website
data <- read_csv("https://URL.com/data.csv")

```

### For Microsoft Excel files

```r
library(readxl)

# Default
data <- read_excel("PATH/TO/FILE/data.xlsx")

# Specify sheet inside Excel file
data <- read_excel("PATH/TO/FILE/data.xlsx",
                  sheet = "SHEET-NAME")

# Reading from URL / website
## Unlike read_csv(), read_excel() cannot read directly from a URL,
## so download the file first
url <- "https://URL.com/data.xlsx"

## mode = "wb" keeps the binary file intact (needed on Windows)
download.file(url, "data.xlsx", mode = "wb")

data <- read_excel("data.xlsx")

```

### Writing files

```r
write_csv(data, "data/data.csv")
```

## Wrangling single data frame

### Cleaning column names

```r
# Manually renaming column by column
data <- rename(data, col1 = `Column One`,
                     col2 = `Column Two`,
                     col3 = `Column Three`)

# Using `janitor` package
library(janitor)

data <- clean_names(data)
```

### Subsetting data frame

```r
# COLUMNS 

# can be used to reorder columns
select(data, col1, col2, col3)

# subset a range of columns
select(data, col1:col3)

# select all columns
data |> 
    select(everything())

# extract column as vector
data |>
    pull(col1)

# ROWS

# subset rows based on condition
filter(data, col1 < 10)

# multiple conditions (AND)
filter(data, col1 == 10, col2 > 50)

# one of multiple conditions (OR)
filter(data, col1 == 10 | col2 > 50)

# value matches any of several cases
filter(data, col1 %in% c("case1", "case2", "case3"))

# select rows by index
data |> 
    slice(1:10)

```

For rows and columns simultaneously, use the base R subsetting operators:

| Operator | Example | Description |
|----------|---------|-------------|
| \[ | `data[1:10, ]` | rows 1-10, all columns |
| \[ | `data[1:10]` | columns 1-10 | 
| \[\[ | `data[[1]]` | column 1 as vector |
| \$ | `data$col` | column as vector | 

```r
# logical indexing
# select rows that meet conditions, for all columns
data[data$col1 == 5, ]
```
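
The operators in the table behave differently in what they return. The sketch below shows this on a small made-up data frame `df`, using base R only.

```r
df <- data.frame(id = 1:4, score = c(5, 8, 5, 2))

rows_1_2  <- df[1:2, ]         # data frame: rows 1-2, all columns
first_col <- df[1]             # data frame with one column
score_vec <- df[["score"]]     # plain numeric vector
same_vec  <- df$score          # same vector via $

# logical indexing: keep rows where score equals 5
fives <- df[df$score == 5, ]
```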

### Add new variables / columns

```r
# by assignment
data$col1 <- data$col2 + 10 

# by mutation
# multiple columns can be mutated at a time
data |> 
    mutate(new_col1 = col1 * col2,
           new_col2 = col1 + col2)

# changing column in place
data |> 
    mutate(col1 = round(col1, 0))
```

### Selective changes to values

If you're matching strings, make sure to convert any factor columns into character vectors.

```r
data |>
    mutate(col = case_when(col == "case1" ~ "replacement1",
                           col == "case2" ~ "replacement2",
                           TRUE ~ col)) # keep all other cases as is
```
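
A runnable sketch of the same pattern, including the factor-to-character conversion mentioned above. It assumes dplyr is installed; the data frame `df` and its values are made up.

```r
library(dplyr)

df <- data.frame(size = factor(c("S", "M", "L", "M")))

recoded <- df |>
    mutate(size = as.character(size),            # factor -> character first
           size = case_when(size == "S" ~ "small",
                            size == "L" ~ "large",
                            TRUE ~ size))        # keep all other cases as is
```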

### Mapping

```r
library(purrr)

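# map_*() applies a function to each element of data;
# the suffix (_lgl, _int, _dbl, _chr) sets the type of the returned vector
map_dbl(data, mean)
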
```

|  | List | Atomic | Same Type | Nothing | 
|---|-----|--------|-----------|---------|
| One argument | map() | map_lgl() | modify() | walk() | 
| Two arguments | map2() | map2_lgl() | modify2() | walk2() | 
| One argument + index | imap() | imap_lgl() | imodify() | iwalk() | 
| N arguments | pmap() | pmap_lgl() | - | pwalk() | 

Source: [Advanced R](https://adv-r.hadley.nz/) by Hadley Wickham
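
A few cells of the table in use, as a sketch that assumes purrr is installed. The list `x` is made up for illustration.

```r
library(purrr)

x <- list(a = 1:3, b = 4:6)

# one argument: list output vs. atomic (double) output
means_list <- map(x, mean)
means      <- map_dbl(x, mean)

# two arguments: iterate over two vectors in parallel
sums <- map2_dbl(1:3, 4:6, function(a, b) a + b)

# one argument + index: the function receives the value and its name
labels <- imap_chr(x, function(value, name) paste(name, sum(value)))
```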

```r
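# base R functionals
lapply()     # apply a function over a list or vector, returns a list
apply()      # apply a function over rows or columns of a matrix / array
tapply()     # apply a function over groups defined by a factor
integrate()  # numerically integrate a function
optim()      # minimise a function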

```

### Sort

```r
# Sort column in ascending order 
data |> 
    arrange(col1)

# Sort column in descending order
data |> 
    arrange(desc(col1))
```

### Pivoting

```r
# wide to long: stack col1 to col2 into name / value rows
data |>
    pivot_longer(`col1`:`col2`, names_to = "new_col1", values_to = "new_col2")

# long to wide: spread the values of col2 into columns named by col1
data |>
    pivot_wider(names_from = col1, values_from = col2)
```
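
A round trip on a small made-up data frame shows what the two functions do. This sketch assumes tidyr is installed; `wide`, `quarter`, and `sales` are illustrative names.

```r
library(tidyr)

wide <- data.frame(id = c(1, 2),
                   q1 = c(10, 20),
                   q2 = c(30, 40))

# one row per id / quarter combination
long <- wide |>
    pivot_longer(q1:q2, names_to = "quarter", values_to = "sales")

# back to one column per quarter
back <- long |>
    pivot_wider(names_from = quarter, values_from = sales)
```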

### Dealing with NA

```r
# for NA in just selected columns
data |> drop_na(col1:col2)

# for all affected rows
data |> drop_na()
```

### Summarizing data 

```r
# summaries on all rows 
data |>
    summarise(summary_col1 = func(col1),
              summary_col2 = mean(col2),
              summary_col3 = sum(col3))

# summaries by groups
data |>
    group_by(group_col) |>
    summarise(summary_col = mean(col))

# summaries by multiple hierarchies of groups
data |>
    group_by(group_col1, group_col2) |>
    summarise(summary_col = mean(col))

# nesting
```
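
A grouped summary on a small made-up data frame, assuming dplyr is installed. The output has one row per group.

```r
library(dplyr)

df <- data.frame(group = c("a", "a", "b"),
                 value = c(1, 3, 10))

by_group <- df |>
    group_by(group) |>
    summarise(mean_value = mean(value),
              n = n())          # n() counts rows in each group
```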

## Wrangling multiple data frames

### Binding (not very safe)

```r
bind_rows(df1, df2, df3)
bind_cols(df1, df2)

```

### Joining

[joining cheatsheet](https://stat545.com/join-cheatsheet.html)

| Wanted Data | Column Format | Join Function |
| ----------- | ------------- | ------------- |
| common values in x & y | combined columns of x & y | `inner_join(x,y)` |
| common values in x & y | format of x | `semi_join(x,y)` |
| common values in x & y | format of y | `semi_join(y,x)` |
| x with additional info from y | modified format of x |`left_join(x,y)` |
| y with additional info from x | modified format of y | `left_join(y,x)` |
| unique values in x but not in y | format of x | `anti_join(x,y)` |
| unique values in y but not in x | format of y | `anti_join(y,x)` |
| all values from x & y | all columns from x & y | `full_join(x,y)` |
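
The rows of the table can be checked against two small made-up data frames that share an `id` column. This sketch assumes dplyr is installed.

```r
library(dplyr)

x <- data.frame(id = c(1, 2, 3), name = c("Ann", "Bob", "Cy"))
y <- data.frame(id = c(2, 3, 4), score = c(80, 90, 70))

inner <- inner_join(x, y, by = "id")   # ids in both, columns of x and y
semi  <- semi_join(x, y, by = "id")    # ids in both, columns of x only
left  <- left_join(x, y, by = "id")    # all of x, score is NA where y has no match
anti  <- anti_join(x, y, by = "id")    # ids in x that are not in y
full  <- full_join(x, y, by = "id")    # every id from x and y
```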

## Documentation

###  R Markdown

In R Markdown, you can name code chunks and add options. 

`{r name-of-code-chunk, option = TRUE}`

Options can be set globally for all chunks at the beginning of document. Local options override the global options.

``` r
# Put this inside the first r code chunk. 
# Write options in (...), separated with commas.

knitr::opts_chunk$set(...)
```
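
For example, a setup chunk might hide code and suppress warnings and messages for the whole document. The particular option values below are just one possible choice.

```r
# global defaults for every chunk in the document
knitr::opts_chunk$set(echo = FALSE,
                      warning = FALSE,
                      message = FALSE,
                      fig.width = 6,
                      fig.height = 4)
```

A single chunk can still override these, for example with `echo = TRUE` in its own header.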

| Option | Default | Description |
|--------|---------|-------------|
| <code>eval</code> | TRUE | evaluate the code in the chunk |
| <code>echo</code> | TRUE | display the source code; when FALSE, hides the code and shows only the output |
| <code>warning</code> | TRUE | display warnings |
| <code>error</code> | FALSE | display errors |
| <code>message</code> | TRUE | display messages |
| <code>tidy</code> | FALSE | reformat code in tidy way |
| <code>cache</code> | FALSE | cache results for future renders |
| <code>fig.width</code> | 7 | set width of plot, in inches |
| <code>fig.height</code> | 7 | set height of plot, in inches |

More options can be found in [R Markdown Reference Guide](https://www.rstudio.com/wp-content/uploads/2015/03/rmarkdown-reference.pdf).


- Supports inline evaluated code
- Supports LaTeX math equations

### roxygen2

```r
#' Description of function
#'
#' @param x A short description of x
#' @param y A short description of y
#'
#' @return What the function returns
#'
#' @examples
#' f(1, 2)
f <- function(x, y) {

    # content

    return(value)
}
```

### Citing R 

```r
citation()
```

```
To cite R in publications use:

  R Core Team (2022). R: A language and environment for statistical
  computing. R Foundation for Statistical Computing, Vienna, Austria.
  URL https://www.R-project.org/.

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {R: A Language and Environment for Statistical Computing},
    author = {{R Core Team}},
    organization = {R Foundation for Statistical Computing},
    address = {Vienna, Austria},
    year = {2022},
    url = {https://www.R-project.org/},
  }

We have invested a lot of time and effort in creating R, please cite it
when using it for data analysis. See also ‘citation("pkgname")’ for
citing R packages.
```
