# Basic concepts in R

## The R language
R has its own programming language. You work in R by writing lines of code in that language (writing commands), which R then interprets (running commands).

R (and RStudio) has a limited graphical user interface, meaning that almost all functionality (statistics, plots, simulations etc.) must be executed using code in the R language.

### R as a calculator
So what does it mean that R interprets our code?
It means that you tell R to do something by writing a command and R will do that (if R can understand you).

R, for example, understands mathematical expressions:

In [1]:
7 * 6

In [2]:
912 - 132
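
A few more expressions, as a minimal sketch of how R follows the usual order of operations. The numbers are arbitrary and only for illustration:

```r
2 + 3 * 4      # multiplication is done before addition: 14
(2 + 3) * 4    # parentheses change the order: 20
2^10           # exponentiation: 1024
10 / 4         # division returns a decimal number: 2.5
```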

## Using R scripts

Script files are text files containing code that R can interpret.

A script is your "analysis recipe": it shows what you have done and allows you to re-run commands easily.

Make a habit of writing your commands into a script once you have figured them out.

- `#` can be used for comments (skipped when run)
- `Ctrl` + `Enter`: Runs the current line or selection
- `Ctrl` + `Alt` + `R`: Runs the whole script

*NOTE!* There is no undo in R. Once code has been executed, the change has been made. The only way to undo it is to re-run earlier code to get back to a previous state. This is what scripts are used for.
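
As a minimal sketch, a short script might look like the one below. The file name, the object name `price` and the values are made up for illustration. The `<-` operator, which stores a value under a name, is explained in the section on defining objects.

```r
# my_script.R (hypothetical file name)
# Lines starting with # are comments and are skipped when the script is run

price <- 100           # starting value
price <- price * 1.25  # add 25 %; the old value of price is now gone

# There is no undo, but re-running the lines above from the top
# always brings price back to the same, known state
price
```

Because every step is written down, an accidental change can be repaired by running the script again from the first line.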

## The R Language: Objects and Functions

R works by storing values and information in "objects". These objects can then be used in various commands like calculating a statistical model, saving a file, creating a graph and so on. To simplify a bit: An object is some kind of stored information and a function is something that can manipulate that stored information (which then creates a new object). 

Most of R can be boiled down to these 3 basic steps:

1. Assign values to an object
2. Make sure R interprets the object correctly (its class)
3. Perform some operation or manipulation on the object using a function

Translated to data analysis, the steps would (in general terms) look as follows:

1. Load our dataset: `dataset <- read.csv("my_datafile.csv")`
2. Check that the variables are the correct class: `class(dataset$age)`
3. Perform some kind of analysis: `mean(dataset$age)`

The amount of work needed between these steps of course varies greatly.
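
The three steps can be sketched end to end. Since `my_datafile.csv` is only a placeholder, this sketch first writes a small made-up dataset to a temporary file so that it can be run as-is. The columns `id` and `age` and their values are assumptions for illustration:

```r
# Made-up data written to a temporary file so the example is self-contained
tmp_file <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:4, age = c(23, 35, 41, 29)),
          tmp_file, row.names = FALSE)

dataset <- read.csv(tmp_file)  # 1. Load the dataset into an object
class(dataset$age)             # 2. Check the class: "integer" (a numeric class)
mean(dataset$age)              # 3. Perform an analysis: 32

unlink(tmp_file)               # remove the temporary file again
```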

## Objects

A lot of writing in R is about defining objects: names used to call up stored information.

Objects can be a lot of things: 
- a word
- a number
- a series of numbers
- a dataset 
- a URL
- a formula
- a result 
- a filepath
- a series of datasets
- and so on...

When an object is defined, it is available in the current working space (or environment).

This makes it possible to store and work with a variety of information simultaneously.

### Defining objects
Objects are defined using the `<-` operator (`Alt` + `-`):

In [3]:
year <- 1964

In [4]:
year

Once defined, the object can be used like any other numeric value.

In [5]:
year + 10

Notice that R differentiates between lower- and upper-case letters:

In [6]:
Year # Does not exist

ERROR: Error in eval(expr, envir, enclos): object 'Year' not found


Using `' '` or `" "` denotes that the input should be read as text. *This also applies to numbers!*

In [7]:
name <-  "keenan"

In [8]:
name

In [9]:
year_now  <- '2021'

In [10]:
year_now

Notice that numbers stored as text will be enclosed in quotes. Numbers stored as text cannot immediately be used as numbers:

In [11]:
year_now - 5

ERROR: Error in year_now - 5: non-numeric argument to binary operator


This error happens because R differentiates between objects by assigning them to a specific *class*. The class denotes what is possible with the object.

### Naming objects
Objects can be named almost anything but a good rule of thumb is to use names that are indicative of what the object contains.

#### Restrictions for naming objects
- Most special characters are not allowed: `/`, `?`, `*`, `+` and so on (most characters mean something to R and will be read as an expression)
- Names that already exist in R should be avoided (the new object will mask the existing function/object in the environment)

#### Good naming conventions 
- Using '`_`': `my_object`, `room_number`

or:

- Capitalize each word except the first: `myObject`, `roomNumber`
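
A short sketch of the two conventions and of a name that R cannot accept. The object names and the value are made up for illustration:

```r
room_number <- 101   # words separated by _
roomNumber  <- 101   # each word after the first is capitalized

# Not allowed: R reads the line below as "room minus number"
# room-number <- 101

room_number == roomNumber  # TRUE, both objects hold the value 101
```

Whichever convention is chosen, using it consistently makes a script easier to read.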

## Classes in R

R differentiates between objects via the "class" of the object. The class determines what operations are possible.

The function `class()` is used to check the class of an object:

In [12]:
name <- "keenan"
year <- 1964

In [13]:
class(name)

In [14]:
class(year)

### Class coercion

In most cases, R can coerce values from one class to another. When doing this, values that are incompatible with the class are coded to missing (`NA`) so beware!

Values can be coerced to character values with `as.character()`

Values can be coerced to numeric values with `as.numeric()`
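
A minimal sketch of both conversions, including what happens to a value that cannot be converted. It reuses values from earlier in this document:

```r
year_now <- "2021"         # a number stored as text
as.numeric(year_now) - 5   # 2016: works once the text is converted to a number

as.character(1964)         # "1964": now text, so arithmetic no longer works

# Text that does not look like a number cannot be converted.
# R returns NA and warns: "NAs introduced by coercion"
as.numeric("keenan")
```

Note that none of these calls change `year_now` itself. The converted value has to be assigned to an object if it should be kept.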

In [15]:
as.character(year)

In [16]:
as.numeric(name)

"NAs introduced by coercion"


## Booleans / logical values

"Booleans" or "logical values" are values that are either `TRUE` or `FALSE`.

A number of operations in R always return a logical value:

- `>`
- `>=`
- `<`
- `<=`
- `==`
- `!=`

In [44]:
42 > 10

In [45]:
10 != 10

Some functions also return a logical value:

In [46]:
startsWith("R", "potato") # Does the text "R" start with "potato"? No, so FALSE
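
Logical values can be stored in objects like any other value. A small sketch, where the object names and the value of `age` are made up for illustration:

```r
age <- 21                    # made-up value
is_adult <- age >= 18        # the result of a comparison can be stored
is_adult                     # TRUE
class(is_adult)              # "logical"

startsWith("potato", "po")   # TRUE: the text "potato" starts with "po"
startsWith("potato", "R")    # FALSE
```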

## Functions

Functions are commands used to transform an object in some way and give an output.

The input to a function is an "argument". The number of arguments varies between functions.

Functions have the basic syntax: `function(arg1, arg2, arg3)`.

Some arguments are required while others are optional.
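
As an illustration, the base function `round()` has one required argument (the number to round) and one optional argument (`digits`, which defaults to 0). The number used here is arbitrary:

```r
round(3.14159)              # only the required argument: 3
round(3.14159, digits = 2)  # optional argument given by name: 3.14
round(3.14159, 2)           # the same, with the argument given by position
```

Naming optional arguments, as in the second line, tends to make code easier to read.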

In [17]:
name <- 'kilmister'
toupper(name) #Returns the object in upper-case

Most functions take the object as the first input but not all.

In [18]:
gsub("e", "a", name) #Replace all e's with a's

### Functions and their outputs

Note that functions almost *never* change the object. When calling functions you are asking R for a specific output but not to change anything.

The output of a function has to be assigned to an object if you want to keep it.

In [19]:
name # Unchanged even though it has been used in functions

In [20]:
name <- gsub("e", "a", name) # Store object with characters "e" swapped with "a" - replacing the object

In [21]:
name

## R Libraries - Packages 

R being open source means that a lot of developers are constantly adding new functions to R.
These new functions are distributed as *R packages* that can be loaded into the R library.

All the commands you have used so far are part of the packages that ship with R (most of them in the `base` package).

Packages are installed using (name of package *with* quotes!): 

`install.packages('packagename')` 

The functions from the package are loaded into the environment using (name of package *without* quotes!):
    
`library(packagename)` 

Information for installed packages can be found using (name of package *with* quotes!):

`library(help = 'packagename')` 

In [22]:
ymd('2021-02-04')

ERROR: Error in ymd("2021-02-04"): could not find function "ymd"


In [23]:
library(lubridate)
ymd('2021-02-04')


Attaching package: 'lubridate'


The following objects are masked from 'package:base':

    date, intersect, setdiff, union
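
A slightly more defensive sketch, assuming the package of interest is `lubridate` as in the example above. `requireNamespace()` checks whether a package is installed without loading its functions into the environment, so the script can give a clear message instead of failing:

```r
if (requireNamespace("lubridate", quietly = TRUE)) {
  # The package is installed: load it and use one of its functions
  library(lubridate)
  print(ymd("2021-02-04"))
} else {
  # The package is missing: tell the user how to get it
  message("lubridate is not installed; run install.packages('lubridate') first")
}
```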




## R Objects: Vectors

A "vector" is a basic data structure in R. A vector can be considered a series of values of the same class.

Vectors are created using `c()`:

In [24]:
names  <- c('araya', 'keenan', 'townsend')
years <- c(1961, 1964, 1972)

In [25]:
print(names)
print(years)

[1] "araya"    "keenan"   "townsend"
[1] 1961 1964 1972


In [26]:
mean(years)

Notice that vectors can only store values of the same type/class.

When trying to combine different types in a vector, R will coerce all values to a type compatible with all values (if possible).

In [27]:
names_years <- c('araya', 1961, 'keenan', 1964)

In [28]:
names_years # Notice the numbers are now converted to text

Because a vector only contains values of one class, the `class()` function works on vectors too.

In [29]:
class(names_years)
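
As an illustration of the coercion rule: when types are mixed, R picks the most general type present, in the order logical, numeric, character. The values are made up:

```r
c(1961, TRUE)            # logical becomes a number: 1961 1
c(1961, "araya")         # number becomes text: "1961" "araya"
c(1961, TRUE, "araya")   # everything becomes text: "1961" "TRUE" "araya"
```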

### Types of vectors

There are six types of vectors: logical, integer, double, character, complex, and raw.

The types primarily used for data analysis are: logical, integer, double, character.

"integer" and "double" are both referred to as *numeric vectors* (whole number and decimal point, respectively).

The type of vector can be examined with either `typeof` or `class`:

In [30]:
print(class(names))
print(class(years))

[1] "character"
[1] "numeric"


In [31]:
print(typeof(years))

[1] "double"
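
A short sketch of the four types most used in data analysis, and of how `class()` and `typeof()` differ for numbers. The values are arbitrary:

```r
typeof(TRUE)       # "logical"
typeof(1964L)      # "integer": the L suffix asks for a whole number
typeof(1964)       # "double": numbers are stored as double by default
typeof("keenan")   # "character"

class(1964L)       # "integer"
class(1964)        # "numeric"
is.numeric(1964L)  # TRUE: integers count as numeric too
```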


## R Objects: Data Frames

A "data frame" is the R-equivalent of a spreadsheet (a table of rows and columns). It is one of the most useful storage structures for data analysis in R.

R has some sample datasets that can be loaded with the `data()` command. `mtcars` is one such sample dataset:

In [32]:
data(mtcars)

### Data frames and vectors

Data frames are essentially a collection of vectors of the same length.

R treats single columns (or variables) as "vectors".

A single column in a data frame (a vector) is referred to with `$`.

In [33]:
head(mtcars$mpg) # First six values of the mpg variable
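
To see that a data frame is a collection of equal-length vectors, one can be built by hand with `data.frame()`. This is a sketch with made-up object and column names:

```r
# Two vectors of the same length
person_names <- c("araya", "keenan", "townsend")
person_years <- c(1961, 1964, 1972)

# Each vector becomes a column in the data frame
people <- data.frame(name = person_names, year = person_years,
                     stringsAsFactors = FALSE)

people$year         # the column as a vector: 1961 1964 1972
class(people$year)  # "numeric"
nrow(people)        # 3 rows, one per value in the vectors
```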

Each value in a vector is assigned an index referring to the position of the value in the vector (starts from 1).

A vector is indexed using `[]`:

In [34]:
mtcars$mpg[10] # Returns the 10th value (row 10) of the mpg variable

In [35]:
mtcars$mpg[2:10] # Returns values 2-10 of the mpg variable (both inclusive)

A range of useful functions exist for calculating descriptive measures for a vector, for example `mean()`, `min()`, `max()` and `length()`.

In [36]:
min(mtcars$mpg) # Returns smallest value
max(mtcars$mpg) # Returns largest value
mean(mtcars$mpg) # Returns mean value
length(mtcars$mpg) # Returns number of values in the vector (corresponding to the number of rows)

`unique()` returns the unique values in a vector (useful for getting familiar with a variable):

In [37]:
unique(mtcars$gear)

### Useful operations and functions on vectors
Below are some examples of different commands to interact with vectors.

| Code   | Description |
|:-------|:------------|
|`my_vec[-3]` | Everything but the 3rd element |
|`my_vec[c(1,4)]` | The 1st and 4th element |
|`my_vec[c(2:4)]` | The elements from index 2 to 4 |
|`length(my_vec)` | The number of elements |
|`sort(my_vec)` | Sorts the elements in ascending order |
|`sum(my_vec)` | The sum of the vector elements (numeric) |
|`mean(my_vec)`| The mean of the vector elements (numeric) |
|`min(my_vec)` | The vector element with the lowest value (numeric) |
|`max(my_vec)` | The vector element with the highest value (numeric) |
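
The commands in the table can be tried on a small vector. The values in `my_vec` are made up for illustration:

```r
my_vec <- c(8, 3, 5, 1, 9)

my_vec[-3]        # everything but the 3rd element: 8 3 1 9
my_vec[c(1, 4)]   # the 1st and 4th element: 8 1
my_vec[c(2:4)]    # the elements from index 2 to 4: 3 5 1
length(my_vec)    # 5
sort(my_vec)      # 1 3 5 8 9
sum(my_vec)       # 26
mean(my_vec)      # 5.2
min(my_vec)       # 1
max(my_vec)       # 9
```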

## R Objects: Lists

Lists are - simply put - collections of other R objects. This means that lists can be a collection of all kinds of objects regardless of class, type or data structure.

Lists are created using `list()`.

Lists are used in a variety of ways. They are, for example, the default data structure for hierarchical data (like JSON). Some functions also return their output as a list because they return several kinds of output (like a model that returns various estimates and input parameters).

Lists can also be used for iteration by repeating the same commands across each entry in a list (like performing the same data handling operations on each data frame in a list).

In [38]:
a_list <- list(42, "keenan", c(9, 3, 2))

Lists are indexed using `[]` for the list element and `[[]]` for the content of the list element:

In [39]:
a_list[3] # Returns element 3 - a list of length 1

In [41]:
class(a_list[3])

In [40]:
a_list[[3]] # Returns the content of element 3 - a vector of values 9, 3, 2

In [42]:
class(a_list[[3]])

It is important to remember the distinction between the list element and the content. A good mnemonic device is to think of a list as a train ([source](https://adv-r.hadley.nz/subsetting.html)):

![listtrain](https://d33wubrfki0l68.cloudfront.net/1f648d451974f0ed313347b78ba653891cf59b21/8185b/diagrams/subsetting/train.png)

By selecting the element using `[]`, you are asking for the train cart but by using `[[]]`, you are asking for the content of the train cart ([source](https://adv-r.hadley.nz/subsetting.html)):

![listtraincart](https://d33wubrfki0l68.cloudfront.net/aea9600956ff6fbbc29d8bd49124cca46c5cb95c/28eaa/diagrams/subsetting/train-single.png)
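
The use of lists for iteration mentioned earlier in this section can be sketched with the base function `lapply()`, which runs the same function on every element of a list and returns the results as a new list. The list `scores` and its element names are made up for illustration:

```r
# A named list of numeric vectors
scores <- list(group_a = c(1, 2, 3), group_b = c(10, 20))

scores["group_a"]     # a list of length 1 (the train cart)
scores[["group_a"]]   # the vector 1 2 3 (the content of the cart)

# Run mean() on each element of the list
group_means <- lapply(scores, mean)
group_means[["group_a"]]  # 2
group_means[["group_b"]]  # 15
```

The same pattern applies when the list elements are data frames and the function is a data handling step.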

## Using the help function

All R functions and commands are thoroughly documented so you do not have to remember what every function does or even how it should be written.

Every function and command in R has its own help file. The help file describes how to use the various functions and commands.

The help file for a specific function is accessed using the `?` operator followed by the name of the function, for example `?mean`.
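
As a small sketch, either of the following lines opens the help file for `mean()`. `help()` is the function form of the `?` operator and takes the name of the function as text:

```r
?mean          # opens the help file for mean()
help("mean")   # the same, written as a function call
```

In RStudio the help file is shown in the Help pane. When R is run from a terminal it is shown as plain text instead.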