# Basics

## Working Directory

`library(here)` - Package that builds file paths relative to the project root directory, so scripts do not depend on the current working directory

| Function | Description |
|:---------|:------------|
| `getwd()` | Get working directory | 
| `setwd(PATH)` | Set working directory | 

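As a rough illustration of the two functions above, the sketch below switches into a temporary directory and back. Using `tempdir()` as the target and the file name `example.txt` are assumptions made only for this example.

```r
# Remember the current working directory so it can be restored later
original_wd <- getwd()

# setwd() invisibly returns the directory that was active before the change
previous_wd <- setwd(tempdir())

# Relative paths now resolve against the temporary directory
writeLines("hello", "example.txt")
file_found <- file.exists(file.path(tempdir(), "example.txt"))

# Restore the original working directory
setwd(original_wd)
```

Inside a project, building paths with `here::here("data", "data.csv")` usually removes the need to call `setwd()` at all.
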

## Resources
* [knitr Documentation](https://yihui.org/knitr/options/)
* [What they forgot to teach you about R](https://rstats.wtf/)
* [Reproducible Research in R](https://r-cubed.rostools.org/)
* [Environments for R renv](https://www.rstudio.com/blog/renv-project-environments-for-r/)
* [R for Data Science](https://r4ds.had.co.nz/index.html)

## General

- Use `<-` for assignment
- Indexing starts from 1, not 0
- Copy-on-modify: modifying a copy leaves the original object unchanged
- Lazy evaluation: function arguments are evaluated only when accessed

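The points above can be checked with a few lines of base R. This is only a small sketch; the vector values and the helper `ignore_second()` are made up for illustration.

```r
# Indexing starts at 1
x <- c(10, 20, 30)
first <- x[1]

# Copy-on-modify: changing y leaves x untouched
y <- x
y[1] <- 99

# Lazy evaluation: the second argument is never used, so stop() never runs
ignore_second <- function(a, b) {
    a * 2
}
result <- ignore_second(5, stop("never evaluated"))
```
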
### What version of packages am I using?

```r
install.packages("devtools")
devtools::session_info() 
```

### Basic functions

```r
# import package
library(package)

# import from script
source("PATH/TO/script.R")

# return type of x
typeof(x)

# print information of x
## try using it on a data frame 
str(x)

# type testing, returns boolean 
is.logical(x)
is.integer(x)
is.double(x)
is.character(x)

# type casting
as.logical(x)
as.integer(x)
as.double(x)
as.character(x)
tibble::as_tibble(x)  # requires the tibble package

# length
length(x)

# create data frame
data <- data.frame(col1 = c(1, 2, 3), 
                   col2 = c(1, 2, 3))
```

### `for` loops

```r
# for each item in vector
for (item in sequence) {
    # action to perform every iteration
}

# by index 
for (i in seq_along(sequence)) {
    # action to perform every iteration
}
```
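
A concrete version of the index-based loop might look like the sketch below (the vector `values` is an illustrative assumption). It also shows why `seq_along()` tends to be preferred over `1:length()`: the latter counts down from 1 to 0 when the vector is empty.

```r
values <- c(2, 4, 6)

# Pre-allocate the result, then fill it by index
squares <- numeric(length(values))
for (i in seq_along(values)) {
    squares[i] <- values[i]^2
}

# seq_along() yields zero iterations for an empty vector
empty <- c()
safe_iterations <- length(seq_along(empty))

# 1:length(empty) is 1:0, i.e. c(1, 0), which would loop twice
unsafe_iterations <- length(1:length(empty))
```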

### `if` statements

```r
if (condition1) {
    # then ...
} else if (condition2) {
    # then ...
} else {
    # all other cases
}
```

### Writing custom functions

```r
# named functions
f <- function(param1, param2, default_param1 = 0) {
       
    # exceptions
    if (condition) {
        stop("message describing error")
    }

    # content of function

    # return if necessary
    return(result)
}

# anonymous functions (for mapping etc.)
function(x) body_of_function
```
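
To make the template concrete, here is a minimal sketch. The function `rescale()` and its parameters are hypothetical and exist only for this example.

```r
# Hypothetical helper: multiply a numeric vector by a factor, then shift it
rescale <- function(x, factor, shift = 0) {
    # Fail early with a clear message on invalid input
    if (!is.numeric(x)) {
        stop("`x` must be numeric")
    }
    return(x * factor + shift)
}

# The default value of `shift` is used when it is not supplied
scaled <- rescale(c(1, 2, 3), factor = 2)
shifted <- rescale(c(1, 2, 3), factor = 2, shift = 1)

# Anonymous function passed directly to sapply()
doubled <- sapply(c(1, 2, 3), function(x) x * 2)
```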

### Switching between name as string or object using {{ }} 

```r
library(dplyr)

# using enquo() and !!
f <- function(col, val) {
    col <- enquo(col)
    filter(data, !!col == val)
}

# using {{ }} 
f <- function(col, val) {
    filter(data, {{ col }} == val)
}
```
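
A runnable sketch of both directions, assuming dplyr is installed and using the built-in `mtcars` data: `{{ }}` handles a column given as a bare name, while the `.data` pronoun handles a column given as a string. The function names are hypothetical, and unlike the template above they take the data frame as an argument instead of relying on a global `data`.

```r
library(dplyr)

# Column passed as a bare name: embrace it with {{ }}
filter_by_name <- function(df, col, val) {
    filter(df, {{ col }} == val)
}

# Column passed as a string: index the .data pronoun
filter_by_string <- function(df, col_name, val) {
    filter(df, .data[[col_name]] == val)
}

four_cyl_a <- filter_by_name(mtcars, cyl, 4)
four_cyl_b <- filter_by_string(mtcars, "cyl", 4)
```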

### Writing tests

```r
library(testthat)

test_that("MSG TO PRINT WHEN TEST FAILS", {
    # test1
    # test2
    # ...
})
```

| Statement | Test |
|-----------|-------------|
| `expect_identical` | are two objects exactly equal? |
| `expect_equal` | are two objects nearly identical (within tolerance)? |
| `expect_equivalent` | are two objects nearly identical (within tolerance)? (ignores attributes; deprecated in the testthat 3rd edition in favor of `expect_equal(ignore_attr = TRUE)`) |
| `expect_error` | does it raise an error? |
| `expect_warning` | does it raise a warning? |
| `expect_output` | is the output what it's supposed to be? |
| `expect_true` | does it evaluate to TRUE? |
| `expect_false` | does it evaluate to FALSE? |

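A few of these expectations in use, as a minimal sketch that assumes testthat is installed. The function under test, `add_one()`, is hypothetical.

```r
library(testthat)

# Hypothetical function under test
add_one <- function(x) {
    if (!is.numeric(x)) {
        stop("`x` must be numeric")
    }
    x + 1
}

test_that("add_one() increments numeric input", {
    expect_equal(add_one(1), 2)
    # 1L + 1 is a double, so it is identical to the double 2
    expect_identical(add_one(1L), 2)
    expect_true(is.numeric(add_one(0.5)))
})

test_that("add_one() rejects non-numeric input", {
    # The second argument is a regular expression matched against the message
    expect_error(add_one("a"), "must be numeric")
})
```
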
### Ignoring errors

```r
try({
    # continue trying, even if it fails midway
})
```
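
As a sketch of what this looks like in practice, the loop below keeps going after one input fails. The `inputs` list is made up for illustration, and `tryCatch()` is shown as a related base R alternative that is not part of the notes above.

```r
inputs <- list(1, "a", 10)
results <- vector("list", length(inputs))

for (i in seq_along(inputs)) {
    # log("a") raises an error; try() captures it so the loop continues
    results[[i]] <- try(log(inputs[[i]]), silent = TRUE)
}

# A failed call is stored as an object of class "try-error"
failed <- sapply(results, inherits, what = "try-error")

# Alternative: tryCatch() can return a fallback value instead
safe_log <- function(x) {
    tryCatch(log(x), error = function(e) NA_real_)
}
fallback <- safe_log("a")
```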

## Data structures

### Date and Time

```r
library(lubridate)

# current date
today()

# current date and time
now()

# converting string or numeric to datetime object
ymd("2017-01-31")
mdy("January 31st, 2017")
dmy(31012017)

ymd_hms("2017-01-31 20:11:59")
mdy_hm("01/31/2017 08:01", tz = "UTC")

# combine columns to make single date column (mutate() comes from dplyr)
data |> 
    dplyr::mutate(date = make_date(year, month, day))

# extract information
year(datetime)
month(datetime)
mday(datetime)
yday(datetime)
wday(datetime, label = TRUE, abbr = FALSE)
```
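
A short sketch tying parsing and extraction together, assuming lubridate is installed. The date is the one used in the parsing examples above; weekday labels are left out because they depend on the locale.

```r
library(lubridate)

d <- ymd("2017-01-31")

# Each accessor returns a number unless label = TRUE is set
date_parts <- c(year = year(d), month = month(d), mday = mday(d), yday = yday(d))

# make_date() builds the same date from separate components
rebuilt <- make_date(year = 2017, month = 1, day = 31)
same_date <- rebuilt == d
```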

### Strings

```r
library(stringr)
library(tidyr)
library(dplyr)  # for filter() and mutate()

# is pattern in x?
str_detect(x, "pattern")

# what are the actual values that contain the pattern?
str_subset(x, "pattern")

# split string on delimiter
# returns a list
str_split(x, " ")

# split string on delimiter into a fixed number of pieces
# returns a character matrix with n columns
str_split_fixed(x, pattern = " ", n = 2)

# splitting strings in a column into separate columns in data frame
data |> 
    separate(unsplit_col, into = c("col1", "col2"), sep = " ")

# length of each string
str_length(x)

# substrings 
str_sub(x, 1, 3)

# substrings for assignment
str_sub(x, 1, 3) <- "ABC"

# collapse a character vector of length n > 1 into a single string
char_vector |> 
    str_c(collapse = "-")

# concatenating multiple vectors
str_c(vec1, vec2, sep = " ")

# for concatenating in data frames
data |> 
    unite("combined_col", col1, col2, sep = " ")

# replace a pattern
str_replace(x, "pattern", "replacement")

# replace NA values with a string (default replacement is "NA")
str_replace_na(x, replacement = "replacement")

# replace NA in data frames (tidyr)
data |>
    replace_na(list(col = "replacement"))

# filtering rows containing string
data |>
    filter(str_detect(col, "pattern"))

# replacing rows containing string
data |> 
    mutate(col = str_replace(col,
                             "pattern",
                             "replacement"))

# extracts the first match in each string
# returns a character vector 
str_extract(sentences, "pattern")

# extracts all matches in each string
# returns a list 
str_extract_all(sentences, "pattern")

# capture groups of regex
# returns a character matrix: full match, then one column per group
noun <- "(a|the) ([^ ]+)"
str_match(sentences, noun)
```
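
To show what a few of these calls return, here is a small sketch assuming stringr is installed. The input vector and the sentence are made-up examples.

```r
library(stringr)

desserts <- c("apple pie", "banana split", NA)  # illustrative data

# NA becomes a literal string; without a second argument it would be "NA"
cleaned <- str_replace_na(desserts, "unknown")

# One row per input string, one column per piece
pieces <- str_split_fixed(cleaned[1:2], pattern = " ", n = 2)

# Collapse a vector into a single string
joined <- str_c(pieces[, 1], collapse = "-")

# Capture groups: column 1 is the full match, then one column per group
noun <- "(a|the) ([^ ]+)"
groups <- str_match("the quick fox", noun)
```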

### Factors

Useful for data visualization

```r
library(forcats)

levels(col)
nlevels(col)

# drop unused levels
data$col |>
    fct_drop()

# changing order of factors
## by frequency 
data |>
    mutate(col = fct_infreq(col))

## by reverse frequency
data |> 
    mutate(col = fct_infreq(col),
           col = fct_rev(col))

# based on median of another variable / column
fct_reorder(reorder_col, by_col)

# based on min of another variable / column
fct_reorder(reorder_col, by_col, min)

# bring specific factors to front of order
data$col |>
    fct_relevel("level1", "level2")

```
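
The effect of the reordering functions is easiest to see on the levels themselves. A minimal sketch, assuming forcats is installed and using a made-up factor `sizes`:

```r
library(forcats)

sizes <- factor(c("small", "large", "large", "medium", "large", "small"))

# Levels are alphabetical by default
default_levels <- levels(sizes)

# Most frequent level first: large (3), small (2), medium (1)
by_freq <- fct_infreq(sizes)

# Reverse of the frequency order
by_rev_freq <- fct_rev(by_freq)

# Move chosen levels to the front, keeping the rest in their existing order
releveled <- fct_relevel(sizes, "small", "medium")
```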

## Reading / Writing data files

- Look at the data file first to choose appropriate function arguments.
  - Does it have headers?
  - Does it have row names?
  - Does it have metadata?
  - What delimiter does it use?

### For plain text files

```r
library(readr)

# Default 
data <- read_csv("PATH/TO/FILE/data.csv")

# Skip rows at the beginning (metadata)
data <- read_csv("PATH/TO/FILE.csv", skip = 2)

# If the file provides column names in its first row (the default)
data <- read_csv("PATH/TO/FILE.csv", col_names = TRUE)

# Skip rows at the end (metadata) by reading at most n_max rows
data <- read_csv("PATH/TO/FILE.csv", n_max = 196)

# Specifying delimiter 
data <- read_delim("PATH/TO/FILE.tsv", 
                   delim = "\t")

# Reading from URL / website
data <- read_csv("https://URL.com/data.csv")

```
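
The `skip` and `n_max` arguments can be combined when a file has metadata at both ends. The sketch below assumes readr is installed and writes a small made-up file to a temporary path so that it runs on its own.

```r
library(readr)

# Illustrative file: two metadata lines, a header, two data rows, a summary row
path <- tempfile(fileext = ".csv")
writeLines(c("# exported by some tool",
             "# units: cm",
             "id,height",
             "1,170",
             "2,182",
             "total,352"), path)

# Skip the 2 metadata lines, then read only the 2 data rows after the header
data <- read_csv(path, skip = 2, n_max = 2)
```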

### For Microsoft Excel files

```r
library(readxl)

# Default
data <- read_excel("PATH/TO/FILE/data.xlsx")

# Specify sheet inside Excel file
data <- read_excel("PATH/TO/FILE/data.xlsx",
                  sheet = "SHEET-NAME")

# Reading from URL / website
## Unlike .csv files, Excel files cannot be read directly from a URL;
## download the file first
url <- "https://URL.com/data.xlsx"
download.file(url, "data.xlsx")

## If on Windows, download in binary mode instead
## download.file(url, "data.xlsx", mode = "wb")

data <- read_excel("data.xlsx")

```

### Writing files

```r
library(readr)

write_csv(data, "data/data.csv")
```