# Law, Order, and Algorithms
## Introduction to `R`

In [5]:
# Some initial setup
options(digits = 3)

### `R` basics

#### Assignment

The convention for assigning values to variables in `R` is an arrow (`<-`), where the direction of the arrow indicates the direction of assignment.
For example, to assign the value `12` to a variable named `A`:

In [6]:
A <- 12   # This works
print(A)  # ... and this statement shows us ("prints") the value currently assigned to A
12 -> A   # So does this
print(A)

[1] 12
[1] 12


The more "standard" assignment using the equal sign (`=`) also works, but _only for assignment to the left_. In other words:

In [7]:
A = 12  # This works

In [8]:
12 = A  # But this doesn't!
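
The failure above can be confirmed without stopping the notebook. The following is a minimal sketch, assuming we only want to know whether the statement raises an error; `parse()` and `eval()` are used here only so that the failing statement can be attempted inside `tryCatch()`.

```r
A = 12

# Try to run `12 = A` and record whether R raised an error.
failed <- tryCatch({
  eval(parse(text = "12 = A"))
  FALSE                      # reached only if the statement succeeded
}, error = function(e) TRUE) # an error means the assignment was rejected

print(failed)  # [1] TRUE
print(A)       # A still holds 12
```

A literal number is not a valid target for assignment, so `R` rejects the statement and `A` keeps its value.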

#### Vectors

The native unit for variables in `R` is a vector. For example, the `A` variable we created above is actually a _vector_ of length 1.
We can create vectors of longer length by `c`ombining multiple values together.

In [9]:
X <- c(1, 2, 3)
print(X)
Y <- c("this", "that", "those")
print(Y)

[1] 1 2 3
[1] "this"  "that"  "those"
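
As a quick check of the claim that a single value is a vector of length 1, the sketch below inspects `A` with `length()` and `is.vector()` and then combines it with more values. The variable names are only for illustration.

```r
A <- 12
len_A <- length(A)        # a single number has length 1
print(len_A)
print(is.vector(A))       # TRUE: A is a vector

longer <- c(A, 13, 14)    # c() combines vectors, so A can be extended
print(longer)
print(length(longer))     # 3
```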


A `seq`uence of numbers can be created using `seq(from, to, by = 1)`.
In other words, there is a function called `seq()` which takes three arguments, named `from`, `to`, and `by`.
The last argument (`by`) is optional and defaults to `1` if not supplied.
For example,

In [10]:
seq(1, 5)  # Creates a sequence of 1 to 5

In [11]:
seq(1, 5, 2)  # Creates a sequence of 1 to 5, but in steps of 2
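
The arguments can also be passed by name, which makes the call easier to read. The sketch below shows the same sequence written with named arguments, plus two variations (an end point that is not hit exactly, and a negative step) chosen here purely for illustration.

```r
s1 <- seq(from = 1, to = 5, by = 2)  # same as seq(1, 5, 2)
s2 <- seq(1, 6, by = 2)              # stops at 5, because 7 would go past `to`
s3 <- seq(5, 1, by = -1)             # a negative step counts down

print(s1)  # 1 3 5
print(s2)  # 1 3 5
print(s3)  # 5 4 3 2 1
```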

Since sequences in steps of 1 are created quite often, `R` provides a short-hand notation in the form `from:to`. 
For example,

In [12]:
1:5  # Short-hand notation for generating a sequence of 1 to 5, in increments of 1

Use square brackets (`[]`) to index a vector. The first element is at index `1`, _not_ `0`.

In [13]:
X <- c(10, 11, 12, 13)
X[1]

Note that you _can_ use an index that is larger than the length of the vector.
`R` will NOT fail, but will return a special value called `NA`.

In [14]:
X[500000]
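
Because an out-of-range index returns `NA` silently, it can help to test for it explicitly. The following is a small sketch using `is.na()`; note that comparing against `NA` with `==` does not work as a test, since the comparison itself returns `NA`.

```r
X <- c(10, 11, 12, 13)
out_of_range <- X[500000]

print(is.na(out_of_range))   # TRUE: use is.na() to detect NA
print(out_of_range == NA)    # NA, not TRUE
print(X[c(1, 10)])           # 10 NA: valid and invalid indices can be mixed
```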

In `R`, a _negative_ index is used to _exclude_ elements.

In [15]:
X[-1]  # This will return all but the first element of X

A vector can also be used to index multiple elements of another vector.
For example, if you want the second and fourth elements of `X`,

In [16]:
ind <- c(2, 4)  # A vector that we create for the sole purpose of indexing another vector, X
X[ind]          # We get the second and fourth elements of X, because ind = (2, 4)
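
The indexing rules above can be combined. As a short sketch, the index vector can be written inline, built with the `from:to` short-hand, negated to exclude several elements at once, or contain repeated positions.

```r
X <- c(10, 11, 12, 13)

middle   <- X[2:3]         # second and third elements
trimmed  <- X[-c(1, 4)]    # everything except the first and fourth elements
repeated <- X[c(4, 1, 1)]  # positions may repeat and appear in any order

print(middle)    # 11 12
print(trimmed)   # 11 12
print(repeated)  # 13 10 10
```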

#### Exercise: vector
Create a sequence of numbers from 5 to 10, and then select the numbers 6, 7, and 8 from this sequence.

In [17]:
# WRITE CODE HERE
# START solution
x <- 5:10
x[c(2, 3, 4)]  # positions 2, 3, and 4 hold the values 6, 7, and 8
# END solution

#### Vector operations

Vectors are a native data structure in `R`, and many operations are "vectorized", meaning that they work directly on vectors.
Basic math operations are done element-wise.

In [18]:
A <- c(1, 2)
B <- c(6, 2)

A + B  # == c(1 + 6, 2 + 2)

In [19]:
A - B  # == c(1 - 6, 2 - 2)

In [20]:
A * B  # == c(1 * 6, 2 * 2)

In [21]:
B / A  # == c(6 / 1, 2 / 2)

In [22]:
B^2  # == c(6*6, 2*2)

Comparisons are also done element-wise.

In [23]:
A == B  # == c(1 == 6, 2 == 2)

Note the double equal sign (`==`) for comparing equality! (One equal sign would be assignment.)

In [24]:
A < B  # == c(1 < 6, 2 < 2)

Many functions in `R` operate on whole vectors. Some transform each element, while others summarize the whole vector.
Some examples are:

In [25]:
X <- c(0.1, 1, 10, 100)
log(X)  # Element-wise log

In [26]:
exp(X)  # Element-wise exponential

In [27]:
sqrt(X)  # Element-wise square-root

In [28]:
mean(X)  # Mean

In [29]:
sd(X)  # (Sample) standard deviation

In [30]:
var(X)  # (Sample) variance

In [31]:
max(X)  # Maximum value

In [32]:
min(X)  # Minimum value

In [33]:
median(X)  # Median value

In [34]:
sum(X)  # Sum of all values

In [35]:
prod(X)  # Product of all values

In [36]:
quantile(X, probs = c(.1, .5, .9))  # Quantile at specified probs

In [37]:
length(X)  # Length of vector
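
The "(Sample)" labels on `sd()` and `var()` above mean that the sum of squared deviations is divided by `n - 1` rather than `n`. The sketch below recomputes the variance by hand to illustrate this; `manual_var` is a name introduced here for illustration only.

```r
X <- c(0.1, 1, 10, 100)
n <- length(X)

# Sample variance: squared deviations from the mean, divided by n - 1
manual_var <- sum((X - mean(X))^2) / (n - 1)

print(all.equal(manual_var, var(X)))       # TRUE
print(all.equal(sqrt(manual_var), sd(X)))  # TRUE: sd is the square root of var
```

`all.equal()` is used instead of `==` because the two results may differ by a tiny floating-point rounding error.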

#### Exercise: vector operations

Generate a sequence of 1,000 random numbers between 0 and 1, and calculate their
1. mean
2. variance
3. 25%, 50%, and 75% quantiles

Hint: you can use `runif(n)` to generate n random numbers between 0 and 1.

In [38]:
# WRITE CODE HERE
# START solution
x <- runif(1000)

# mean
mean(x)

# variance
var(x)

# 25%, 50%, and 75% quantile
quantile(x, probs = c(.25, .5, .75))

# END solution

#### Strings

Use the `paste` function to concatenate two or more strings. 
Numerical values are automatically converted to strings.

In [39]:
paste("One plus one equals", 1 + 1, ".")

The `paste()` function has an optional `sep` argument, which you can use to specify how the different strings are `paste`d together.

In [40]:
paste("One plus one", 1 + 1, sep = " = ")

Similar to `sep`, you can also use the optional `collapse` argument to concatenate a vector of strings instead of having them as individual arguments.

In [41]:
my_strings <- c("one", "plus", "one", "equals", "two")

paste(my_strings, collapse=" ")
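
The difference between `sep` and `collapse` is easiest to see when one of the arguments is a vector. In the sketch below (the strings are made up for illustration), `sep` joins the arguments element by element, and `collapse` then joins the resulting strings into one.

```r
labels_default <- paste("item", 1:3)             # one string per element
labels_sep     <- paste("item", 1:3, sep = "-")  # custom separator between arguments
one_string     <- paste("item", 1:3, sep = "-", collapse = ", ")  # single string

print(labels_default)  # "item 1" "item 2" "item 3"
print(labels_sep)      # "item-1" "item-2" "item-3"
print(one_string)      # "item-1, item-2, item-3"
```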

If you're familiar with `C`-style formatting, there is a `sprintf()` function, which is a wrapper for the `C` library function of the same name.

In [42]:
sprintf("One plus one = %d, and e = %.3f", 1 + 1, exp(1))
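
As a further sketch of the format specifiers, `%s` inserts a string, `%d` a whole number, `%.2f` a number with two decimal places, and `%%` a literal percent sign. Like most `R` functions, `sprintf()` is also vectorized. The name and numbers below are made up for illustration.

```r
# One string built from a string, an integer, and a decimal number
line <- sprintf("%s scored %d points (%.2f%% of the total)", "Alice", 42L, 87.5)
print(line)  # "Alice scored 42 points (87.50% of the total)"

# Vector arguments produce one formatted string per element
squares <- sprintf("%d squared is %d", 1:3, (1:3)^2)
print(squares)  # "1 squared is 1" "2 squared is 4" "3 squared is 9"
```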

#### Exercise: string operations

Suppose you are given a vector of strings denoting items that you have. Write `R` code to turn this vector into an English sentence in the form of "I have x, y, and z". Pay extra attention to the "and" at the end.

Example input: `c("one apple", "two pears", "three bananas")`

Example output: `I have one apple, two pears, and three bananas.`

In [43]:
my_items <- c("a can of Coke", "a bottle of Pepsi", "a glass of water")

# WRITE CODE HERE
# START solution
n <- length(my_items)

# Join all but the last item with commas
items_str <- paste(my_items[1:(n - 1)], collapse = ", ")

paste("I have ", items_str, ", and ", my_items[n], ".", sep = "")

# or equivalently
paste0("I have ", items_str, ", and ", my_items[n], ".")

# END solution

### Packages

`R` packages can be installed using the `install.packages()` function.
For example, to install the `tidyverse` package (which will be used primarily in this course) you can run:
`install.packages("tidyverse")`

This is like installing a piece of software, and only needs to be done once on any machine.

Once a package is installed, it can be "loaded" into the current environment with the `library` function.
For example, to load the `tidyverse` package, run

In [44]:
library("tidyverse")

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
✔ ggplot2 3.3.2     ✔ purrr   0.3.4
✔ tibble  3.0.3     ✔ dplyr   1.0.0
✔ tidyr   1.1.0     ✔ stringr 1.4.0
✔ readr   1.3.1     ✔ forcats 0.4.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()


Unlike `install.packages()`, this needs to be done in every new `R` session.
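
Since installing is a one-time step and loading is a per-session step, one possible pattern is to install a package only when it is missing. The sketch below is one way to do that, not the course's prescribed setup; `ensure_installed()` is a hypothetical helper name introduced here, built on the base `R` function `requireNamespace()`.

```r
# ensure_installed() is a hypothetical helper, not part of base R.
ensure_installed <- function(pkg) {
  already_there <- requireNamespace(pkg, quietly = TRUE)
  if (!already_there) {
    install.packages(pkg)  # runs only when the package is missing
  }
  invisible(already_there)
}

# "stats" ships with R, so this call finds it and installs nothing.
was_installed <- ensure_installed("stats")
print(was_installed)

# For this course, the same pattern would be:
# ensure_installed("tidyverse")
# library("tidyverse")
```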