# Law, Bias, and Algorithms
## Introduction to `R` and `dplyr` (1/2)

In [1]:
# Some initial setup
options(digits = 3)

── Attaching packages ─────────────────────────────────────── tidyverse 1.2.1 ──
✔ ggplot2 2.2.1     ✔ purrr   0.2.5
✔ tibble  1.4.2     ✔ dplyr   0.7.5
✔ tidyr   0.8.1     ✔ stringr 1.3.0
✔ readr   1.1.1     ✔ forcats 0.3.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()


### `R` basics

#### Assignment

The conventional way to assign a value to a variable in `R` is the arrow operator (`<-`), where the direction of the arrow indicates the direction of assignment.
For example, to assign the value `12` to a variable named `A`:

In [5]:
A <- 12   # This works
print(A)  # ... and this statement shows us ("prints") the value currently assigned to A
12 -> A   # So does this
print(A)

[1] 12
[1] 12


The more "standard" assignment using the equal sign (`=`) also works, but _only for assignment to the left_. In other words:

In [6]:
A = 12  # This works

In [7]:
12 = A  # But this doesn't!

ERROR: Error in 12 = A: invalid (do_set) left-hand side to assignment


#### Vectors

The basic unit of data in `R` is a vector. For example, the `A` variable we created above is actually a _vector_ of length 1.
We can create vectors of longer length by `c`ombining multiple values together.

In [8]:
X <- c(1, 2, 3)
print(X)
Y <- c("this", "that", "those")
print(Y)

[1] 1 2 3
[1] "this"  "that"  "those"
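
To make the claim about length-1 vectors concrete, the short sketch below checks it with `length()` and `is.vector()`. The variable names are the ones used above; `longer` is a name introduced here purely for illustration.

```r
A <- 12
length(A)     # A single value is a vector of length 1
is.vector(A)  # TRUE

X <- c(1, 2, 3)
length(X)     # 3

# c() can also combine an existing vector with further values
longer <- c(X, 4, 5)
length(longer)  # 5
```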


A `seq`uence of numbers can be created using `seq(from, to, by = 1)`.
In other words, there is a function called `seq()`, used here with three arguments named `from`, `to`, and `by`.
The last argument (`by`) is optional, and defaults to `1` if not supplied.
For example,

In [9]:
seq(1, 5)  # Creates a sequence of 1 to 5

In [10]:
seq(1, 5, 2)  # Creates a sequence of 1 to 5, but in steps of 2

Since sequences in steps of 1 are created quite often, `R` provides a short-hand notation in the form `from:to`. 
For example,

In [11]:
1:5  # Short-hand notation for generating a sequence of 1 to 5, in increments of 1
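
As a small illustration of the points above, arguments can be passed either by position or by name, and the `from:to` shorthand gives the same values as `seq(from, to)`. The variable names `by_position` and `by_name` are made up for this sketch.

```r
by_position <- seq(1, 9, 2)               # arguments matched by position
by_name <- seq(from = 1, to = 9, by = 2)  # the same call, with named arguments

print(by_position)

all(by_position == by_name)  # TRUE: both calls produce the same sequence
all(seq(1, 5) == 1:5)        # TRUE: the shorthand matches seq() in steps of 1
```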

Use square brackets (`[]`) to index a vector. The first element is at index `1`, _not_ `0`.

In [13]:
X <- c(10, 11, 12, 13)
X[1]

Note that you _can_ use an index that is larger than the length of the vector.
`R` will NOT fail; instead, it returns a special value called `NA` ("not available").

In [15]:
X[500000]
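
Because an out-of-range index does not raise an error, it can help to check for `NA` explicitly. The sketch below uses base `R`'s `is.na()` for this; the variable name `out_of_range` is only illustrative.

```r
X <- c(10, 11, 12, 13)

out_of_range <- X[length(X) + 1]  # one position past the end of X
is.na(out_of_range)               # TRUE: nothing is stored at that position

is.na(X)  # FALSE for each of the four existing elements
```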

In `R`, a _negative_ index is used to _exclude_ elements.

In [16]:
X[-1]  # This will return all but the first element of X

A vector can also be used to index multiple elements of another vector.
For example, if you want the second and fourth elements of `X`,

In [17]:
ind <- c(2, 4)  # A vector that we create for the sole purpose of indexing another vector, X
X[ind]          # We get the second and fourth elements of X, because ind = (2, 4)
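
The two ideas above (negative indices and indexing with a vector) can be combined. The following is a minimal sketch; `exclude`, `kept`, and `middle` are names chosen for this example only.

```r
X <- c(10, 11, 12, 13)

exclude <- c(1, 2)   # positions to leave out
kept <- X[-exclude]  # everything except the first and second elements
print(kept)

middle <- X[2:3]     # a from:to sequence also works as an index vector
print(middle)
```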

#### Vector operations

Since the vector is the basic unit of data in `R`, many operations are "vectorized", meaning that they operate on whole vectors at once.
Basic math operations are done element-wise.

In [29]:
A <- c(1, 2)
B <- c(6, 2)

A + B  # == c(1 + 6, 2 + 2)

In [30]:
A - B  # == c(1 - 6, 2 - 2)

In [31]:
A * B  # == c(1 * 6, 2 * 2)

In [33]:
B / A  # == c(6 / 1, 2 / 2)

In [34]:
B^2  # == c(6*6, 2*2)

Comparisons are also done element-wise:

In [37]:
A == B  # == c(1 == 6, 2 == 2)

Note the double equal sign (`==`) for comparing equality! (One equal sign would be assignment.)

In [38]:
A < B  # == c(1 < 6, 2 < 2)
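
A comparison returns a vector of `TRUE`/`FALSE` values with one entry per element, which can be stored like any other vector. The sketch below illustrates this with the same `A` and `B`; `is_equal` is a name introduced here for illustration.

```r
A <- c(1, 2)
B <- c(6, 2)

is_equal <- A == B  # note: `==` compares, while `=` would assign
print(is_equal)     # FALSE TRUE
class(is_equal)     # "logical"

A != B  # == c(1 != 6, 2 != 2)
A >= B  # == c(1 >= 6, 2 >= 2)
```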

Many functions in `R` take a whole vector as input. Some return one value per element (element-wise), while others summarize the vector into a single value.
Some examples are:

In [70]:
X <- c(0.1, 1, 10, 100)
log(X)  # Element-wise log

In [71]:
exp(X)  # Element-wise exponential

In [72]:
sqrt(X)  # Element-wise square-root

In [73]:
mean(X)  # Mean

In [74]:
sd(X)  # (Sample) standard deviation

In [75]:
var(X)  # (Sample) variance

In [76]:
max(X)  # Maximum value

In [77]:
min(X)  # Minimum value

In [78]:
median(X)  # Median value

In [79]:
sum(X)  # Sum of all values

In [80]:
prod(X)  # Product of all values

In [81]:
quantile(X, probs = c(.1, .5, .9))  # Quantile at specified probs

In [83]:
length(X)  # Length of vector
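
One way to see the difference between element-wise functions and summary functions is to compare the length of their results. The sketch below also checks two relationships among the functions listed above, using `all.equal()` to allow for floating-point rounding.

```r
X <- c(0.1, 1, 10, 100)

length(sqrt(X))  # 4: one result per element
length(mean(X))  # 1: the whole vector is reduced to a single value

# The standard deviation is the square root of the variance
all.equal(sd(X), sqrt(var(X)))

# The mean is the sum divided by the number of elements
all.equal(mean(X), sum(X) / length(X))
```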

#### Strings

Use the `paste` function to concatenate two or more strings. 
Numerical values are automatically converted to strings.

In [86]:
paste("One plus one equals", 1 + 1, ".")

The `paste()` function has an optional `sep` argument, which specifies the separator placed between the strings when they are `paste`d together. By default, the separator is a single space.

In [88]:
paste("One plus one", 1 + 1, sep = " = ")
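
Since the default separator is a single space, the first example above ends up with a space before the final period. A minimal sketch of one way around this is to set `sep = ""` and put the spaces in the strings themselves; `default_sep` and `no_sep` are illustrative names.

```r
default_sep <- paste("One plus one equals", 1 + 1, ".")
print(default_sep)  # "One plus one equals 2 ."

no_sep <- paste("One plus one equals ", 1 + 1, ".", sep = "")
print(no_sep)       # "One plus one equals 2."
```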

If you're familiar with `C`-style formatting, there is a `sprintf()` function, which is a wrapper for the `C` library function of the same name.

In [99]:
sprintf("One plus one = %d, and e = %.3f", 1 + 1, exp(1))
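
For readers less familiar with `C`-style format strings, here is a short sketch of a few common conversion specifications: `%s` for strings, `%d` for integers, and `%.2f` for a number rounded to two decimal places. The variable names are made up for this example.

```r
label <- sprintf("%s has %d elements", "X", 4L)
print(label)    # "X has 4 elements"

rounded <- sprintf("%.2f", exp(1))  # two digits after the decimal point
print(rounded)  # "2.72"

padded <- sprintf("%5d", 42L)       # right-aligned in a field of width 5
print(padded)   # "   42"
```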

### Packages

`R` packages can be installed using the `install.packages()` function.
For example, to install the `tidyverse` package (the main package used in this course), you can run:
`install.packages("tidyverse")`

This is like installing a piece of software, and only needs to be done once on any machine.

Once a package is installed, it can be "loaded" into the current environment with the `library` function.
For example, to load the `tidyverse` package, run

In [102]:
library("tidyverse")

Unlike `install.packages()`, this needs to be done every time you start a new `R` session.
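
If you are unsure whether a package is already installed, one possible approach is to check before installing. The sketch below uses base `R`'s `requireNamespace()`; the helper `is_installed()` is a hypothetical name introduced here, not part of any package.

```r
# Hypothetical helper: TRUE if the package is available, FALSE otherwise
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

is_installed("stats")  # TRUE: "stats" ships with every R installation

# Install tidyverse only when it is missing. This line is commented out so
# that running the block never starts a download by accident.
# if (!is_installed("tidyverse")) install.packages("tidyverse")
```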