This first part of the Introduction to R series introduces the R interpreter (which works much like this Jupyter notebook), data types, and parts of the R syntax.

## Vectors, arithmetic operations and functions
Arithmetic operators work just as you learnt to use them on a calculator, including precedence of operators and brackets. Some examples are below.

In [15]:
100 / 2 - 2 ^ 3 + 5 * 2

In [16]:
100 / (2 - 2 ^ 3) + 5 * 2
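
As a quick check of the precedence rules, the two expressions above can be broken into steps. This is only a sketch of the order in which R evaluates them: exponentiation first, then multiplication and division, then addition and subtraction, with brackets overriding that order. The variable names are purely illustrative.

```r
# Without brackets: 2 ^ 3 is evaluated first, then / and *, then - and +
power <- 2 ^ 3                            # 8
no_brackets <- 100 / 2 - power + 5 * 2    # 50 - 8 + 10

# With brackets: (2 - 2 ^ 3) is evaluated as a unit before the division
inner <- 2 - 2 ^ 3                        # 2 - 8 = -6
with_brackets <- 100 / inner + 5 * 2      # about -16.67 + 10

print(no_brackets)    # 52
print(with_brackets)  # about -6.67
```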

If you want to perform the same operations on more than one number, vectors are there to help. In R, a vector holds zero or more elements, all of the same type (typically numeric values, but there are character, complex, logical and integer vectors too). Vectors can be built "manually" using the concatenation function c() or they can be read from files.
To store a vector in a variable (called x below), assign it to x with the <- assignment operator.

In [2]:
x <- c(1, 1, 2, 3, 5, 8, 13)

In [3]:
print(x)

[1]  1  1  2  3  5  8 13


In [21]:
x * 2 - 1

Operations can be carried out between vectors too:

In [12]:
y <- c(3, 1, 4, 1, 5, 9, 2)

In [25]:
x + y
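
To make the element-wise behaviour explicit, here is a small sketch with two short vectors (a and b are illustrative names, chosen so that x and y above are left untouched). A single number such as 2 is reused for every element.

```r
a <- c(1, 2, 3)
b <- c(10, 20, 30)

# Each element of a is paired with the element of b at the same position
print(a + b)      # 11 22 33

# The single numbers 2 and 1 are applied to every element of a
print(a * 2 - 1)  # 1 3 5
```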

There are other ways of defining a vector. Numerical sequences can be generated using the seq() function. These can come in handy when timepoints have to be defined to examine a physical process over some period.  
seq() is typically called with at least two numbers: the lower and upper bounds of an interval. R then generates a sequence of numbers between the two limits using a step size of 1. If a third number is provided, it is used as the step size.

In [8]:
t <- seq(0, 1)
print(t)

[1] 0 1


In [10]:
t <- seq(0, 1, 0.1)
print(t)

 [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
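
The step size can also be passed by name, and seq() accepts a length.out argument when the number of points matters more than the spacing. A brief sketch follows (timepoints and timepoints_n are illustrative names; by and length.out are standard seq() arguments).

```r
# Step size given by name: 0, 0.25, 0.5, 0.75, 1
timepoints <- seq(0, 1, by = 0.25)
print(timepoints)

# Same interval, but asking for 5 evenly spaced points instead of a step size
timepoints_n <- seq(0, 1, length.out = 5)
print(timepoints_n)

# Compare the two definitions (all.equal tolerates tiny rounding differences)
print(all.equal(timepoints, timepoints_n))
```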


As mentioned before, vectors can contain logical values too (i.e. TRUE or FALSE). These values can represent states (e.g. a patient having type 2 diabetes) or can be a result of evaluating a logical expression, as shown below. We want to find the negative values in the difference between vectors x and y (defined earlier).

In [14]:
print(x - y)
negatives <- (x - y) < 0
print(negatives)

[1] -2  0 -2  2  0 -1 11
[1]  TRUE FALSE  TRUE FALSE FALSE  TRUE FALSE
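
Logical vectors can be summarised directly. As a sketch, sum() counts the TRUE values (TRUE counts as 1, FALSE as 0) and which() gives their positions. The vector d below simply repeats the differences printed above so that the example stands on its own.

```r
d <- c(-2, 0, -2, 2, 0, -1, 11)   # same values as x - y above
is_negative <- d < 0

print(sum(is_negative))    # number of negative values: 3
print(which(is_negative))  # their positions: 1 3 6
print(any(is_negative))    # TRUE, since at least one value is negative
```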


R has two special values: NaN (not a number) and NA (not available). Neither refers to a specific number: NaN represents an undefined result (e.g. 0/0 evaluates to NaN), while NA represents a missing value.

In [15]:
x <- c(1, 2, 3, 0/0)
print(x)

[1]   1   2   3 NaN
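
The two values can be told apart with is.nan() and is.na(). A short sketch, with v as an illustrative vector: note that is.na() reports TRUE for NaN as well, while is.nan() is TRUE only for NaN.

```r
v <- c(1, NA, 0/0)

print(is.na(v))   # FALSE  TRUE  TRUE
print(is.nan(v))  # FALSE FALSE  TRUE
```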


The R interpreter does exactly what it is told, so operations on a vector that contains NaN or NA values will usually return NaN or NA. R has to be told explicitly to skip these values, for example with the na.rm argument of mean(). The example below shows that the average of the numbers in vector x cannot be determined if the NaN element is included. Once it is removed, we get the average of 1, 2, and 3.

In [16]:
mean(x)

In [17]:
mean(x, na.rm=TRUE)
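
The na.rm argument is available in several other summary functions as well, such as sum(), min() and max(). The sketch below uses an illustrative vector v and also shows the manual approach of dropping the missing elements before averaging.

```r
v <- c(1, 2, 3, NA)

print(sum(v))                 # NA, because one element is missing
print(sum(v, na.rm = TRUE))   # 6
print(max(v, na.rm = TRUE))   # 3

# Manual alternative: keep only the non-missing elements, then average
print(mean(v[!is.na(v)]))     # 2
```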

It is sometimes necessary to refer to a __specific element__ or a __portion__ of a vector as part of a subset calculation. This can be done by indexing. R allows indexing of vectors using square brackets ([]) that take one or more numbers or a logical expression as input. Indices start at 1. For instance, the following line will return the second element of vector x:

In [18]:
x[2]

This line returns all elements from element 2 to element 3 (so just elements 2 and 3 in this case):

In [19]:
x[2:3]

Indexing can also be used to __exclude__ elements from a vector by prefixing the index term with a minus sign.

In [20]:
x[-2]
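
Index vectors do not have to be consecutive. As a sketch, c() can be used inside the brackets to pick or exclude several positions at once (v is an illustrative vector).

```r
v <- c(10, 20, 30, 40, 50)

print(v[c(1, 3, 5)])   # elements 1, 3 and 5: 10 30 50
print(v[-c(1, 2)])     # everything except elements 1 and 2: 30 40 50
print(v[-(2:4)])       # everything except elements 2 to 4: 10 50
```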

A more flexible way to refer to vector elements is by constructing a logical expression. Let's say we want to have only the numeric elements that are greater than 1. It may be tempting to try this:

In [21]:
x[x > 1]

... but the result includes an NA too, which we didn't want: comparing NaN with 1 yields NA rather than TRUE or FALSE. So we have to "filter" that one out by combining the greater-than condition with is.na(), which returns TRUE for NA and NaN elements, negated so that only non-missing elements are kept. Since we want both conditions to be true simultaneously, we use the & operator, which corresponds to logical __and__. Logical __or__ can be achieved with | and logical __negation__ is represented by !.

In [22]:
x[!is.na(x) & x > 1]

In [43]:
x[!is.na(x) & (x == 1 | x == 3)]
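
One alternative worth knowing is which(), which returns the positions where a condition is TRUE and drops positions where it is NA. The sketch below uses an illustrative vector v with the same values as x.

```r
v <- c(1, 2, 3, 0/0)

print(v > 1)            # FALSE TRUE TRUE NA
print(v[v > 1])         # 2 3 NA: the NA in the condition yields an NA element
print(v[which(v > 1)])  # 2 3: which() keeps only positions that are TRUE
```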

In addition to arithmetic and logical operations, several mathematical functions are available in R: min, max, log, exp, sin, cos, tan all work in the usual way.

In [23]:
print(y)
max(y)

[1] 3 1 4 1 5 9 2
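
Note the difference between functions that summarise a vector into a single value (min, max) and those that are applied to each element (log, exp, sin and so on). A brief sketch with an illustrative vector v follows; log10() is the base-10 variant of log().

```r
v <- c(1, 10, 100)

print(max(v))     # one value for the whole vector: 100
print(log10(v))   # one value per element: 0 1 2
print(exp(0))     # 1
print(sin(0))     # 0
```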


Two further functions tell you more about a vector. length() provides the number of elements in a vector, while range() returns the minimum and maximum values of a vector, corresponding to `c(min(y), max(y))`.

In [24]:
length(y)
range(y)
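
As a sketch of how these fit together, the block below checks that range() matches c(min(), max()) and uses diff() to get the spread between the two (v repeats the values of y so the example is self-contained).

```r
v <- c(3, 1, 4, 1, 5, 9, 2)   # same values as y above

print(length(v))                               # 7
print(range(v))                                # 1 9
print(identical(range(v), c(min(v), max(v))))  # TRUE
print(diff(range(v)))                          # max minus min: 8
```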

### Factors - categorical variables

In [58]:
steatosisGrade <- c("S0", "S0", "S2", "S1", "S2", "S0", "S1", "S3", "S1", "S2", "S3")

In [59]:
length(steatosisGrade)

In [60]:
steatosisGradeF <- factor(steatosisGrade)

In [61]:
levels(steatosisGradeF)

In [62]:
fatFraction <- c(2.3, 3.0, 45.3, 12.5, 35.6, 0.6, 7.8, 73.7, 13.3, 40.5, 89.0)

In [64]:
tapply(fatFraction, steatosisGradeF, mean)
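
The cells above convert a character vector into a factor, list its levels, and then use tapply() to compute the mean fat fraction within each steatosis grade. The same pattern is sketched below on a smaller, made-up data set (grade, value and their contents are hypothetical), together with table(), which counts how many observations fall into each level.

```r
# Hypothetical data: four observations in two categories
grade <- factor(c("A", "B", "A", "B"))
value <- c(1, 10, 3, 20)

print(levels(grade))    # "A" "B"
print(table(grade))     # 2 observations per level

# Apply mean() to the values of each level separately
group_means <- tapply(value, grade, mean)
print(group_means)      # A: 2, B: 15
```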