<font size="6"><b>BASIC R FEATURES</b></font>

<font size="5"><b>Serhat Çevikel</b></font>

First of all, please watch this short video on how to use the modes and shortcuts in Jupyter notebooks:

[![The Data Incubator - Keyboard Shortcuts in Jupyter](https://img.youtube.com/vi/cuHY1o3Cf2s/0.jpg)](https://www.youtube.com/watch?v=cuHY1o3Cf2s&index=4&list=PLjDTd-bDo6Q3nnt7y_GjMaYD79-stYZ-O&t=0s)

# Understanding R

> “To understand computations in R, two slogans are helpful:
> 
> Everything that exists is an object.
>
> Everything that happens is a function call.”
>
> — John Chambers

(http://adv-r.had.co.nz/Functions.html)

![image.png](attachment:c41cae36-d015-44f9-824d-745683ec2817.png)

# Data types

The basic (atomic) data structure in R is the vector. There is no separate scalar type: a single value is simply a vector of length 1:

A numeric vector of size 1

In [None]:
class(1)

An integer vector of size 1

In [None]:
class(1L)

A character vector of size 1

In [None]:
class("a")

A logical vector of size 1 (T and F are shorthands for TRUE and FALSE)

In [None]:
class(T)

Data types are checked with is.xxx() functions:

In [None]:
is.logical(T)
is.integer("a")

And converted with as.xxx() functions

In [None]:
1
as.character(1)
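A few more conversions, as a small sketch: as.integer() truncates toward zero, and a string that cannot be parsed as a number becomes NA with a warning (silenced here with suppressWarnings()):

```r
as.numeric("3.5")                     # character to numeric: 3.5
as.integer(3.9)                       # truncates toward zero: 3
as.integer(-3.9)                      # also toward zero: -3
as.logical("TRUE")                    # TRUE
as.logical(0)                         # zero becomes FALSE
suppressWarnings(as.numeric("abc"))   # not a number: NA
```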

# Operators

Before going on to vectors in detail, let's cover basic operators in R:

## Arithmetic

The four basic arithmetic operations:

In [None]:
1 + 1

In [None]:
2 / 1

In [None]:
4 - 3

In [None]:
5 * 3

Exponentiation

In [None]:
2^3

Modulo operator:

In [None]:
10 %% 3

Integer (floor) division operator:

In [None]:
10 %/% 3
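The two operators fit together: for any x and nonzero y, x equals (x %/% y) * y + x %% y. With a negative operand, %/% rounds down (toward minus infinity), so the remainder takes the sign of the divisor. A short check:

```r
x <- 10; y <- 3
(x %/% y) * y + x %% y   # 3 * 3 + 1 = 10
-10 %/% 3                # floor(-3.33...) = -4
-10 %% 3                 # -10 - (-4 * 3) = 2
```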

## Logical

And operator

In [None]:
T & F

Or operator

In [None]:
T | F

Not operator

In [None]:
!T

Comparison operators:

In [None]:
4 > 3

In [None]:
3 >= 3

In [None]:
3 < 4

In [None]:
3 <= 3

In [None]:
3 == 4

In [None]:
3 != 4

# Objects, variables and assignments

Assignment is done with the "<-" operator:

In [None]:
var_1 <- 1

In [None]:
var_1

In [None]:
class(var_1)

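When a variable is assigned to another variable, both names initially refer to the same object. R uses copy-on-modify semantics: as soon as one of them is modified, a copy is made, so the other one is not affected: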

In [None]:
var_2 <- var_1

In [None]:
var_1 <- var_1 + 1

In [None]:
var_1

In [None]:
var_2

A (syntactic) object name cannot start with a digit and cannot contain a hyphen, which would be parsed as the minus operator
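To check a candidate name, make.names() turns any string into a syntactic one, and backticks let you use a non-syntactic name anyway. This is only a sketch: such names work but are usually best avoided, and `my-var` is a made-up example:

```r
make.names(c("var_1", "1var", "my-var"))   # "var_1" "X1var" "my.var"
`my-var` <- 5                              # backticks allow a non-syntactic name
`my-var` * 2                               # 10
```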

# Vectors

A vector is an R object holding values of the same type

A single value is also a vector

A vector is created with assignment (no declaration is needed)

The c() (combine) function combines vectors into a single vector object:

In [None]:
var_3 <- c(1, 2, 3)

In [None]:
class(var_3)

In [None]:
var_4 <- c("a", "b", "c")
var_4

The colon operator (:) creates a sequence of integers:

In [None]:
var_5 <- 1:3
var_5
class(var_5)

## Vector attributes

The most common attribute of a vector is names:

In [None]:
var_6 <- 1:3
names(var_6) <- c("a", "b", "c")
var_6

In [None]:
attributes(var_6)

A vector does not have a dimension attribute (it is not a 1-dimensional array), so dim() returns NULL:

In [None]:
dim(var_6)

But it has length:

In [None]:
length(var_6)

## Vector subsetting

A vector can be subset by:

- A vector of numeric indices
- A vector of logical values for inclusion
- A vector of character values for names

**VECTOR INDEXING IN R STARTS AT 1 NOT 0!**

In [None]:
var_6

In [None]:
var_6[2:3]

A logical index vector should have the same length as the vector being subset (a shorter one is recycled):

In [None]:
var_6[c(T, F, T)]

In [None]:
var_6[c("a", "b")]

Subsetting can be used for both retrieval and assignment

In [None]:
var_7 <- var_6
var_7

In [None]:
var_7[2] <- var_7[3]

In [None]:
var_7

Reading a non-existent item returns NA, while assigning to it extends the vector (any gap in between is filled with NA):

In [None]:
var_7

In [None]:
var_7[5]

In [None]:
var_7[5] <- 10
var_7

Negative indices exclude those items:

In [None]:
var_7[-5]
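In practice, the logical index vector is usually produced by a comparison, so a condition can select (or overwrite) elements directly; which() converts such a logical vector into numeric positions. The vector scores is just an example:

```r
scores <- c(a = 5, b = 1, c = 8)   # a named numeric vector
scores > 2                         # logical vector: TRUE FALSE TRUE
scores[scores > 2]                 # keeps a and c
which(scores > 2)                  # named positions: a = 1, c = 3
scores[scores > 2] <- 0            # conditional assignment
scores                             # a = 0, b = 1, c = 0
```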

## Vectorization

Many basic operations are vectorized in R: the operation is applied to every element of the vector object at once

In [None]:
1:10
1:10 + 1

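Because the loop behind a vectorized operation runs in compiled code, vectorized R code can come close to the speed of compiled C code, while an explicit loop written in R is usually much slower

Vectorization is, in fact, an implicit loop done at the C level (R is mostly written in C and Fortran)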
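As a sketch, the two functions below compute the same sum of squares, one with an explicit R-level loop and one vectorized. The function names are made up for this example, and system.time() gives only a rough timing; the numbers depend on the machine, so none are quoted here:

```r
x <- runif(1e6)

# Explicit loop: every iteration is interpreted by R
sum_sq_loop <- function(v)
{
    total <- 0
    for (value in v) total <- total + value^2
    total
}

# Vectorized: ^ and sum() loop over the elements in C
sum_sq_vec <- function(v) sum(v^2)

system.time(loop_result <- sum_sq_loop(x))
system.time(vec_result <- sum_sq_vec(x))
all.equal(loop_result, vec_result)   # same value up to rounding
```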

## %in% operator

A set operator that checks membership. Returns logical values with length of the LHS

In [None]:
1:5 %in% 3:10

It is not symmetric

In [None]:
3:10 %in% 1:5

## rep() function

rep() function repeats given values in a vector to create a longer vector:

In [None]:
rep(1:3, 2)

In [None]:
rep(1:3, each = 2)

The two results resemble collated and uncollated multi-copy print-outs from a printer: times repeats the whole sequence, while each repeats every element in place
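rep() also accepts a vector for times (one count per element) and a length.out argument that cuts the repeated pattern to a given length:

```r
rep(c("x", "y"), times = c(2, 3))   # "x" "x" "y" "y" "y"
rep(1:3, length.out = 7)            # 1 2 3 1 2 3 1
```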

# Special variables and values

## NA

Missing values. Each atomic type has its own NA value (such as NA_integer_ and NA_character_), so NA can be used in a vector of any type. A plain NA is logical:

In [None]:
class(NA)

For integers:

In [None]:
var_14 <- 1:3
var_14
class(var_14)

In [None]:
var_14[2] <- NA
var_14

In [None]:
var_14[2]
class(var_14[2])

For characters:

In [None]:
var_15 <- c("a", "b", "c")
var_15
class(var_15)

In [None]:
var_15[2] <- NA
var_15

In [None]:
var_15[2]
class(var_15[2])
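Because NA means "unknown", most operations involving it return NA as well. As a sketch: is.na() is the way to test for missing values, and many summary functions take an na.rm argument that drops them first. The typed NA constants show the per-type missing values:

```r
typeof(NA)                 # "logical"
typeof(NA_integer_)        # "integer"
typeof(NA_character_)      # "character"

vals <- c(1, NA, 3)
vals == NA                 # NA NA NA: a comparison with NA is unknown too
is.na(vals)                # FALSE TRUE FALSE
sum(vals)                  # NA
sum(vals, na.rm = TRUE)    # 4
mean(vals, na.rm = TRUE)   # 2
```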

## NULL

NULL object is for a non-existent value, contrary to NA, which is an existing but missing value

In [None]:
length(NA)
length(NULL)

NA is a logical vector of length 1

NULL represents the absence of an object: it has length 0 and its own class ("NULL") rather than a data type, so it simply vanishes when combined with c()

In [None]:
class(NA)
class(NULL)

In [None]:
c(1, NA, 3)

In [None]:
c(1, NULL, 3)

## Inf, -Inf

Holds positive and negative infinite ($\infty$) values

In [None]:
var_16 <- Inf
var_16

In [None]:
Inf + 1e17

In [None]:
Inf / 1e22

In [None]:
-Inf + 1e22

Dividing a positive number by zero gives Inf, and dividing a negative number by zero gives -Inf:

In [None]:
1 / 0

In [None]:
-1 / 0

Dividing a finite number by Inf gives 0:

In [None]:
3 / Inf

## NaN

Not a Number: the result of undefined operations such as Inf - Inf or 0 / 0

Has numeric type

In [None]:
Inf - Inf

In [None]:
class(Inf - Inf)
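NaN and NA are related but not identical: is.na() treats NaN as missing, while is.nan() is TRUE only for NaN. A quick check:

```r
0 / 0          # NaN
is.nan(0 / 0)  # TRUE
is.na(NaN)     # TRUE: NaN also counts as missing for is.na()
is.nan(NA)     # FALSE: NA is not NaN
```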

# Functions

R is intended as a functional programming language:

> R, at its heart, is a functional programming (FP) language. This means that it provides many tools for the creation and manipulation of functions. In particular, R has what’s known as first class functions. You can do anything with functions that you can do with vectors: you can assign them to variables, store them in lists, pass them as arguments to other functions, create them inside functions, and even return them as the result of a function.

(http://adv-r.had.co.nz/Functional-programming.html)

Apart from built-in functions, new functions can be created with the function keyword

These functions can be assigned to named objects (the usual way):

In [None]:
add_one <- function(x) x + 1

In [None]:
add_one(1)
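To illustrate the quote above, here is a sketch of functions used as values: passed to another function (sapply()), stored in a list, and created and returned by another function. The names ops and make_adder() are hypothetical, made up for this example:

```r
add_one <- function(x) x + 1

# Pass a function as an argument
sapply(1:3, add_one)               # 2 3 4

# Store functions in a list and call them by name
ops <- list(inc = add_one, sq = function(x) x^2)
ops$sq(4)                          # 16

# A function that creates and returns a new function (a closure)
make_adder <- function(n) function(x) x + n
add_ten <- make_adder(10)
add_ten(5)                         # 15
```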

The function execution terminates after the first return() call encountered

If no return() call exists, the function returns the value of the last executed statement

In [None]:
func_1 <- function(x)
{
    return("stop here")
    return(x + 1)
}

func_1(3)

In [None]:
func_2 <- function(x)
{
    x
    x + 1
    x + 2
    x + 3
}

func_2(5)

## return value

A function can return only a single object, but that object may be one with multiple values or a nested one

In [None]:
func_3 <- function(x = 1, y = 3)
{
    a <- (x + y)^2
    b <- (x - y)^2
    return(c(a, b))
}

func_3()

## Default values for arguments

When default values are defined for arguments, they are used whenever no value is supplied for that argument:

In [None]:
func_3 <- function(x = 1, y = 3)
{
    (x + y)^2
}

In [None]:
func_3()
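Arguments can be supplied by position or by name; named arguments may appear in any order, and any argument left out keeps its default. func_3 is redefined here in one line so the cell runs on its own:

```r
func_3 <- function(x = 1, y = 3) (x + y)^2

func_3()               # (1 + 3)^2 = 16
func_3(2)              # x = 2 by position: (2 + 3)^2 = 25
func_3(y = 0)          # only y supplied: (1 + 0)^2 = 1
func_3(y = 1, x = 4)   # (4 + 1)^2 = 25
```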

# Operators as functions

Lisp (through its dialect Scheme) is one of the inspirations of the R language

Every operator in R is also a function:

In [None]:
exists("var_17")
"<-"("var_17", 1)
exists("var_17")
var_17

In [None]:
var_18 <- 1:3

In [None]:
var_18 <- ":"(1, 3)
var_18

In [None]:
var_18[2] <- 4
var_18

In [None]:
"<-"("["(var_18, 3), 5)
var_18
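Backticks make the same point with the usual names: arithmetic and subsetting operators can be called in prefix form, x[i] <- value is carried out by the replacement function `[<-`, and an operator can be passed to another function like any other function. A short sketch:

```r
`+`(2, 3)                       # same as 2 + 3: 5
v <- c(10, 20, 30)
`[`(v, 2)                       # same as v[2]: 20
`[<-`(c(10, 20, 30), 2, 99)     # the function behind x[2] <- 99: 10 99 30
sapply(1:3, `*`, 10)            # `*`(1, 10), `*`(2, 10), `*`(3, 10): 10 20 30
```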

# Control structures

## Loops

### for loop

Used for a definite number of iterations

for creates a loop variable that takes each value of a vector in turn:

In [None]:
for (i in 1:10)
{
    print(i^2)
}

The vector being iterated over is evaluated once, when the loop starts, so modifying it inside the loop does not change the iterations:

In [None]:
vec_1 <- 1:10

for (i in vec_1)
{
    vec_1[i] <- vec_1[i] + 1
    print(i)
}

vec_1

Although vec_1 is modified inside the loop, the iteration went over the original object

Combined with conditions, the next statement immediately skips to the next iteration, while break terminates the loop
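A minimal sketch of both statements: next skips the even numbers, and break stops the loop once the running total exceeds 10 (total and i are just example names):

```r
total <- 0
for (i in 1:10)
{
    if (i %% 2 == 0) next    # skip even numbers
    total <- total + i       # add odd numbers: 1, 3, 5, 7, ...
    if (total > 10) break    # stop once the total exceeds 10
}
total   # 1 + 3 + 5 + 7 = 16
i       # the loop stopped at i = 7
```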

### while loop

while loop is used when the number of iterations is not predetermined but dependent on a logical condition

A while loop does not iterate through a vector (hence it is not limited by object sizes), and the objects in its condition must already exist

A while loop continues as long as its condition evaluates to TRUE

In [None]:
x <- 5
while(x < 10)
{
    print(x < 10)
    print("x is still below 10")
    x <- x + 1
}

x
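A case where the number of iterations is not known in advance: counting the steps of the Collatz sequence (halve an even number, replace an odd number n with 3n + 1) until it reaches 1. collatz_steps() is a hypothetical helper for this sketch:

```r
collatz_steps <- function(n)
{
    steps <- 0
    while (n != 1)
    {
        if (n %% 2 == 0) n <- n / 2 else n <- 3 * n + 1
        steps <- steps + 1
    }
    steps
}

collatz_steps(6)   # 6 3 10 5 16 8 4 2 1: 8 steps
collatz_steps(1)   # the condition is FALSE at once: 0 steps
```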

## Conditionals

If else statements on logical conditions:

In [None]:
check3 <- function(x)
{
    if (x < 3)
    {
        print("x is smaller than 3")
    }
    else
    {
        print("x is larger than or equal to 3")
    }
}

check3(2)
check3(4)

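The condition of if must be a single TRUE or FALSE value. Older versions of R used only the first element of a longer vector (with a warning); since R 4.2.0, a condition of length greater than 1 is an error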

A vectorized version is given by ifelse() function:

In [None]:
ifelse(1:5 < 3,
       "a is smaller than 3",
       "a is larger than or equal to 3"
      )
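When a vector of conditions should drive a single if, any() or all() reduces it to the single TRUE or FALSE that if requires. A short sketch:

```r
x <- c(1, 5, 2)
if (any(x < 3)) print("at least one value is smaller than 3")
if (all(x < 3)) print("all values are smaller than 3") else print("not all values are smaller than 3")
```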

# Other data structures

## Matrix

If vector is a ball of wool:

<img src="../imagesba/wool.jpg" width="500"/>

a matrix is a pullover:

<img src="../imagesba/pullover.jpg" width="500"/>

A pullover inherits all attributes of the wool (its color, softness, the ability to shrink when soaked in hot water, etc)

However, the wool does not have all attributes of the pullover (No sleeves, collars)

You can think of a matrix as a folded form of a vector

We can create a matrix out of vector(s) by folding or binding

**JUST LIKE A VECTOR, A MATRIX HAS VALUES OF THE SAME TYPE**

### Matrix out of a vector

In [None]:
vec_1 <- 1:20

In [None]:
mat_1 <- matrix(vec_1, nrow = 4)
mat_1

It is filled in column-major order by default (byrow = TRUE fills it row by row)
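A small sketch of the difference: the same six values fill a 2 x 3 matrix column by column by default, or row by row with byrow = TRUE:

```r
m_col <- matrix(1:6, nrow = 2)                 # filled column by column
m_row <- matrix(1:6, nrow = 2, byrow = TRUE)   # filled row by row
m_col
m_row
m_col[1, ]   # first row: 1 3 5
m_row[1, ]   # first row: 1 2 3
```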

Let's provide row and column names:

In [None]:
rownames(mat_1) <- letters[1:nrow(mat_1)]
mat_1

In [None]:
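# ':' binds tighter than '+', so 1:ncol(mat_1) + nrow(mat_1) is 5:9 and the columns are named e to i
colnames(mat_1) <- letters[1:ncol(mat_1) + nrow(mat_1)]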
mat_1

Now let's check dimensions, attributes and structure:

In [None]:
dim(mat_1)
length(mat_1)
attributes(mat_1)
str(mat_1)

A matrix has dim and dimnames (row and column names) attributes

Note that a matrix is a vector with a dimension attribute, so it still has a length, equal to the number of rows times the number of columns

However a vector does not have dimensions!

In [None]:
mat_1

### Getting the dimensions of a matrix

dim() returns all dimensions as a vector:

In [None]:
dim(mat_1)

nrow() returns the number of rows:

In [None]:
nrow(mat_1)

ncol() returns the number of columns:

In [None]:
ncol(mat_1)

### rbind() vectors into a matrix

rbind() creates a matrix where the vectors become rows, and the names of the vector objects become rownames:

In [None]:
vec_2 <- 1:3
vec_3 <- 10:8
vec_2
vec_3

In [None]:
mat_2 <- rbind(vec_2, vec_3)
mat_2
class(mat_2)
attributes(mat_2)

### cbind() vectors into a matrix

cbind() creates a matrix where the vectors become columns, and the names of the vector objects become colnames:

In [None]:
vec_2 <- 1:3
vec_3 <- 10:8
vec_2
vec_3

In [None]:
mat_2 <- cbind(vec_2, vec_3)
mat_2
class(mat_2)
attributes(mat_2)

### Subsetting matrices with two vector arguments

Matrices can be subsetted by two arguments to return another matrix:

Two index vectors:

In [None]:
mat_1[2:3,3:5]

Or two character vectors for dimension names:

In [None]:
mat_1[c("a", "c"),c("f", "g", "h")]

Or two logical vectors:

In [None]:
mat_1[c(T, F), c(F, F, T, T, F)]

Negative indices exclude items:

In [None]:
mat_1[-(1:2), -3]

### Transpose a matrix

t() transposes a matrix: rows become columns and columns become rows:

In [None]:
mat_1

In [None]:
t(mat_1)

### Vectorization

Many operators and functions work in a vectorized manner on matrices as they do on vectors:

In [None]:
mat_1

In [None]:
mat_1 * 2

In [None]:
sqrt(mat_1)

### %*% operator

The \* operator performs element-wise multiplication of matrices:

In [None]:
mat_2 <- matrix(20:1, nrow = 4, byrow = T)
mat_2

In [None]:
mat_1 * mat_2

The %\*% operator performs matrix multiplication: an n x m matrix times an m x o matrix gives an n x o matrix:

In [None]:
mat_1 %*% t(mat_2)
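A small worked case makes the difference between \* and %\*% concrete. With A = matrix(1:4, nrow = 2), each cell of A %\*% A is a row of A times a column of A, summed; for example the top-left cell is 1 x 1 + 3 x 2 = 7:

```r
A <- matrix(1:4, nrow = 2)   # rows: (1, 3) and (2, 4)
A * A                        # element-wise: rows (1, 9) and (4, 16)
A %*% A                      # matrix product: rows (7, 15) and (10, 22)
A %*% diag(2)                # multiplying by the 2 x 2 identity returns A
```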

## List

A list is a special kind of vector whose items can be any R objects, including other lists, so lists can be nested:

In [None]:
list_1 <- list(a_vector = 1:3,
              a_matrix = outer(1:2, 1:2),
              another_list = list(1, 2, 3))

list_1

Lists are very versatile and powerful objects in R for handling non-regular data

Hierarchical data structures such as JSON and XML can be converted back and forth into list objects (for example with add-on packages such as jsonlite)

### Subsetting and modifying lists

Lists can be subset with three operators: [ ], [[ ]] and $

A single bracket returns a list of requested items. Multiple items can be subsetted this way:

In [None]:
list_1[1:2]

In [None]:
class(list_1[1:2])

In [None]:
list_1[1]

In [None]:
class(list_1[1])

Double bracket returns a single object. Numeric indices or names (with quotes) can be supplied. Multiple items cannot be subsetted:

In [None]:
list_1[[1]]

In [None]:
class(list_1[[1]])

In [None]:
list_1[["a_matrix"]]

In [None]:
class(list_1[["a_matrix"]])

The $ operator returns a single object, using the name of the item without quotes:

In [None]:
list_1$a_matrix

In [None]:
class(list_1$a_matrix)

A new item can be added by c() (the new item should also be inside a list to be added as is)

In [None]:
list_1 <- c(list_1, another_vector = list(10:5))
list_1

By assigning to a new name with $:

In [None]:
list_1$another_matrix <- matrix(1:4, nrow = 2)
list_1

Or by assigning to a new index with [[ ]]:

In [None]:
list_1[[6]] <- list(1:3)
list_1

Lists also support negative subsetting:

In [None]:
list_1[-1]
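Assigning NULL to an item removes it from the list; to keep the item but store NULL as its value, assign list(NULL) with single brackets instead. A sketch on a throwaway list:

```r
lst <- list(a = 1, b = "two", c = 3:4)
lst$b <- NULL            # removes item b
names(lst)               # "a" "c"
lst["a"] <- list(NULL)   # keeps a, but its value is now NULL
length(lst)              # still 2
is.null(lst$a)           # TRUE
```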

### Recursing a list

Items of nested lists can be reached by chaining subsetting operators:

In [None]:
list_1

In [None]:
list_1$another_list[[2]]

Or with indices:

In [None]:
list_1[[3]][[2]]

Or with a vector of indices inside [[ ]], which indexes recursively:

In [None]:
list_1[[c(3, 2)]]

## Data Frame

A data frame is a special type of list whose items are vectors of the same length (the columns). The data types of these vectors may differ.

Although it is a "list" by nature, a data frame is treated like a matrix since it has row and column dimensions, rownames for rows and names for columns

Data frame is the main data structure to use in data science since it allows for handling different data types in each column

In [None]:
df_1 <- data.frame(int1 = 1:5, char1 = letters[1:5], logi1 = c(T, T, F, T, T))
rownames(df_1) <- LETTERS[10:14]
df_1

In [None]:
str(df_1)
attributes(df_1)

The dimensions of a data frame are its row and column counts:

In [None]:
dim(df_1)

However, the length of a data frame is its column count (unlike a matrix)

In [None]:
length(df_1)

### Subsetting

Data frames can be subsetted with two vectors (like a matrix) or one vector (like a list)

In [None]:
df_1[1:2, 1:2]

In [None]:
df_1[2:3]

Data frames can be subsetted with negative indices like other R structures

In [None]:
df_1[-1]
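Columns can also be taken like list items with $ or [[ ]], and rows can be filtered with a logical vector built from a column. df_demo below copies the definition of df_1 (without the custom rownames) so the cell runs on its own; the character result of the last line assumes R 4.0.0 or later, where strings are no longer converted to factors by default:

```r
df_demo <- data.frame(int1 = 1:5, char1 = letters[1:5], logi1 = c(T, T, F, T, T))
df_demo$char1                      # a column as a plain vector
df_demo[["int1"]]                  # the same with [[ ]]
df_demo[df_demo$int1 > 2, ]        # rows where int1 > 2, all columns
df_demo[df_demo$logi1, "char1"]    # char1 where logi1 is TRUE: "a" "b" "d" "e"
```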

### Factors

Apart from numeric, integer, logical and character values, many datasets in data science have categorical variables. These take a fixed set of discrete values, which are stored internally as integers but printed with descriptive labels.

They are called factors and they are mostly used with data frames

First let's create a numeric variable:

In [None]:
vec_4 <- c(1,1,4,2,3,1)

And convert it into a factor with labels:

In [None]:
fct_1 <- factor(vec_4, levels = 1:4, labels = c("a", "b", "c", "d"))

In [None]:
fct_1

We can append or modify a value with any of the defined labels:

In [None]:
fct_1[7] <- "a"

And we still have a vector of factor type:

In [None]:
class(fct_1)

However, if we try to add a value with a new label:

In [None]:
fct_1[8] <- "e"
fct_1

It is not a defined level, so NA is stored instead (with a warning)

Now get the unique levels (as labels)

In [None]:
levels(fct_1)

And add a new level:

In [None]:
levels(fct_1) <- c(levels(fct_1), "e")

In [None]:
fct_1

Now you can add a value of that new level:

In [None]:
fct_1[8] <- "e"
fct_1

The forcats package from the tidyverse provides more convenient functions for handling factors
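Internally a factor is an integer vector of codes plus a levels attribute. As a sketch, as.integer() exposes the codes and table() counts each level; when levels are not given, factor() sorts the unique values to form them:

```r
sizes <- factor(c("small", "large", "small", "medium"))
levels(sizes)        # "large" "medium" "small" (sorted)
as.integer(sizes)    # codes into the levels: 3 1 3 2
table(sizes)         # counts per level: large 1, medium 1, small 2

# An explicit order of levels changes the codes
sizes2 <- factor(c("small", "large", "small", "medium"),
                 levels = c("small", "medium", "large"))
as.integer(sizes2)   # 1 3 1 2
```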

# APPENDIX

## Vectors

### Dynamic typing

R is dynamically typed: a variable can be reassigned to an object of a different type

In [None]:
var_8 <- 1:3

In [None]:
class(var_8)

In [None]:
var_8 <- "a"

In [None]:
class(var_8)

When an R vector is partially updated with a value of a different data type, the whole vector is coerced to the more general type when necessary

In [None]:
var_9 <- 1:3
var_9

In [None]:
class(var_9)

In [None]:
var_9[2] <- "a"

In [None]:
var_9
class(var_9)
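The coercion follows a fixed order from the least to the most general type: logical, integer, double, character. typeof() shows the result when c() combines mixed values:

```r
typeof(c(TRUE, 2L))    # "integer": TRUE becomes 1L
typeof(c(1L, 2.5))     # "double"
typeof(c(1.5, "a"))    # "character"
c(TRUE, 2L)            # 1 2
c(1.5, "a")            # "1.5" "a"
```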

### Recycling

In some operations involving vectors of different length, the shorter vector is recycled to the length of the longer one

In [None]:
var_10 <- 1:4
var_10

In [None]:
var_10[c(T, F)] # "T F" vector is recycled to length 4
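Arithmetic recycles the same way. When the longer length is not a multiple of the shorter one, R still recycles but emits a warning:

```r
1:6 + c(0, 10)   # c(0, 10) is recycled to length 6: 1 12 3 14 5 16
1:6 * 2          # a length-1 vector is recycled to every element
1:5 + c(0, 10)   # 5 is not a multiple of 2: 1 12 3 14 5, with a warning
```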

### Initializing an empty vector

An empty vector can be initialized by assigning NULL:

In [None]:
var_11 <- NULL
var_11
length(var_11)
class(var_11)

With the c() function, which also returns NULL:

In [None]:
var_12 <- c()
var_12
length(var_12)
class(var_12)

Or with the constructor function of the appropriate type, which creates a typed vector of length 0:

In [None]:
var_13 <- integer(0)

In [None]:
var_13

In [None]:
length(var_13)
class(var_13)
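When the final length is known, a common pattern is to preallocate with a constructor such as numeric(n) and fill by index, rather than growing the vector with c() in every iteration, which builds a new vector each time. A sketch comparing both (the names are examples):

```r
n <- 5
squares <- numeric(n)   # preallocated: 0 0 0 0 0
for (i in seq_len(n))
{
    squares[i] <- i^2
}
squares                 # 1 4 9 16 25

grown <- c()            # starts as NULL and grows by one item per iteration
for (i in seq_len(n))
{
    grown <- c(grown, i^2)
}
identical(squares, grown)   # TRUE
```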

### Vectorized and non-vectorized and/or

& is vectorized:

In [None]:
c(T, T, F) & c(F, T, T)

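While && is not: it expects single TRUE or FALSE values. Older versions of R used only the first elements; current versions (R 4.3.0 and later) signal an error for arguments longer than 1, so take the first elements explicitly if that is what you mean:

In [None]:
c(T, T, F)[1] && c(F, T, T)[1]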

The same applies to | and ||
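&& and || also short-circuit: the right-hand side is evaluated only when it can change the result. That makes them the natural choice inside if(), for example to guard against a missing value before testing it. A short sketch:

```r
FALSE && stop("never evaluated")   # FALSE: the right-hand side is skipped
TRUE || stop("never evaluated")    # TRUE: the right-hand side is skipped

x <- NA
!is.na(x) && x > 3                 # FALSE instead of NA
```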

## Special variables and values

### letters and LETTERS

Built-in constants for the lower and upper case letters of the English alphabet:

In [None]:
letters

In [None]:
LETTERS

### pi

In [None]:
pi

## Numeric accuracy

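Numeric (double) values are stored as 64-bit IEEE 754 floating-point numbers, accurate to about 15 to 16 significant decimal digits. The digits option can print up to 22 digits, but digits beyond that accuracy are representation noise

Add-on packages such as Rmpfr allow arbitrary precision with special data types

The "digits" option controls how many significant digits are printed (7 by default)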

In [None]:
getOption("digits")

In [None]:
options(digits = 1); 1/3
options(digits = 5); 1/3
options(digits = 10); 1/3
options(digits = 22); 1/3

In [None]:
options(digits = 7)
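Because doubles are binary floating-point numbers, many decimal fractions are stored only approximately, so == can fail on values that look equal. all.equal() compares with a small tolerance instead:

```r
0.1 + 0.2 == 0.3                    # FALSE
print(0.1 + 0.2, digits = 17)       # 0.30000000000000004
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE
.Machine$double.eps                 # spacing of doubles near 1: about 2.2e-16
```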

## Scientific notation

Very large or very small numbers are automatically printed in scientific notation:

In [None]:
getOption("scipen")

In [None]:
1111111111111111

In [None]:
0.000000000000000000001

To suppress scientific notation, set the scipen option to a large value such as 999:

In [None]:
options(scipen = 999)

In [None]:
1111111111111111

In [None]:
0.00000000000000000000001

## Numeric limits

Largest integer value to be held by R is 2^31 - 1

In [None]:
as.integer(2^31 - 1)

In [None]:
as.integer(2^31)

Doubles can represent every integer exactly only up to 2^53; above that, not every integer can be represented:

In [None]:
2^53
2^53 - 1
2^53 + 1
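The .Machine list stores these limits. Beyond 2^53 the gap between consecutive doubles exceeds 1, so adding 1 can leave a value unchanged:

```r
.Machine$integer.max               # 2147483647
.Machine$integer.max == 2^31 - 1   # TRUE
2^53 + 1 == 2^53                   # TRUE: 2^53 + 1 is not representable
2^53 - 1 == 2^53                   # FALSE: still exact below 2^53
```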

With add-on packages like gmp, much larger values can be held exactly with special data types

## Environments and scoping

R is lexically scoped: each function call creates its own environment, and a name is looked up first in that environment and then in the environment where the function was defined

For example a global object is created:

In [None]:
var_19 <- 3

In [None]:
func_3 <- function()
{
    var_19 <- 5
    return(var_19)
}

In [None]:
func_3()
var_19

var_19 in the global environment and var_19 in func_3's environment are different objects

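However, the superassignment operator "<<-" assigns in the enclosing environments instead: it searches the parent environments for the name and, if the name is not found, creates it in the global environment. Here it modifies the global var_20 from inside a function: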

In [None]:
func_4 <- function(x)
{
    var_20 <<- x
    return(var_20)
}

In [None]:
var_20 <- 10
var_20
func_4(12)
var_20
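A common use of <<- is a closure that keeps state between calls: the inner function updates a variable in the environment of the function that created it, not a global one. make_counter() is a hypothetical example:

```r
make_counter <- function()
{
    count <- 0
    function()
    {
        count <<- count + 1   # modifies count in make_counter's environment
        count
    }
}

counter <- make_counter()
first <- counter()    # 1
second <- counter()   # 2
```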

## Recursion

R supports recursion (the maximum depth of nested calls is limited by the expressions option and by the C stack size)

In [None]:
factorial_r <- function(x)
{
    if (x <= 1)   # base case (also stops the recursion for x = 0)
    {
        return(1)
    }
    else
    {
        return(x * factorial_r(x-1))
    }
}

In [None]:
factorial_r(5)
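Recursion is a natural fit for nested lists. count_leaves() below is a hypothetical helper that counts the non-list items at any depth (a vector counts as one item):

```r
count_leaves <- function(x)
{
    if (!is.list(x))
    {
        return(1)   # base case: a non-list item
    }
    sum(vapply(x, count_leaves, numeric(1)))   # recurse into each item
}

count_leaves(list(1, list(2, 3), list(list(4))))   # 4
```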

## Indenting

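Unlike Python, R does not use indentation to interpret code. Line breaks can still matter, though: at the top level, else must be on the same line as the closing brace of its if block

However, for good style, proper indentation should be followed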

## Matrix

### Drop

When subsetted with two arguments, the object retains its matrix structure:

In [None]:
mat_1[2:3,3:5]
class(mat_1[2:3,3:5])

However, if only a single row or column is returned, the matrix structure is dropped and a plain vector is returned:

In [None]:
mat_1[2,3:5]
class(mat_1[2,3:5])

To prevent this, the drop argument must be set to F (FALSE):

In [None]:
mat_1[2,3:5, drop = F]
class(mat_1[2,3:5, drop = F])

### Subsetting a matrix with a single vector argument

What if we subset with a single index:

In [None]:
mat_1[2:10]

In [None]:
class(mat_1[2:10])

In [None]:
attributes(mat_1[2:10])

A single index vector treats the matrix as its underlying vector (in column-major order), so the result is a plain vector without dimensions

### Subsetting a matrix by another matrix

Non-contiguous cells of a matrix can be subsetted by using a two-column matrix of row and column indices:

In [None]:
row_indices <- c(4, 1, 2)
col_indices <- c(2, 5, 3)

index_mat <- cbind(row_indices, col_indices)
index_mat

In [None]:
mat_1

In [None]:
mat_1[index_mat]

row() and col() functions return the row and column indices of all cells in a matrix:

In [None]:
row(mat_1)
col(mat_1)

These index matrices can be used for subsetting and manipulating matrices in more complicated ways, for example selecting an anti-diagonal (here the cells where row + column == 4, which are set to 0):

In [None]:
mat_1[row(mat_1) + col(mat_1) == 4] <- 0
mat_1

In [None]:
mat_1[row(mat_1)]
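In the same way, row(m) == col(m) selects the main diagonal, which can be checked against diag(). A sketch on a fresh 4 x 5 matrix:

```r
m <- matrix(1:20, nrow = 4)
m[row(m) == col(m)]   # 1 6 11 16
diag(m)               # the same values
```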

### outer() function

outer() applies a function (multiplication by default) to every pair of elements from two vectors or arrays, i.e. to their cartesian product

It can create a matrix out of two vectors:

In [None]:
outer(1:5, 1:5, "*")
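FUN can also be a user-defined function, as long as it is vectorized: outer() calls it once with two long vectors covering all pairs, not once per pair. f below is a made-up example:

```r
f <- function(x, y) x * 10 + y   # vectorized in both x and y
tab <- outer(1:3, 1:2, f)
tab         # rows: 11 12 / 21 22 / 31 32
dim(tab)    # 3 2
```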

## List

### Structure and attributes

In [None]:
str(list_1)

In [None]:
attributes(list_1)

List is closer to a vector than it is to a matrix: It does not have dimension, rownames or colnames attributes

A list can have names and it has a length

In [None]:
length(list_1)

### unlisting

A list in R can be flattened into an atomic vector by unlist()

In [None]:
list_1

In [None]:
unlist(list_1)
class(unlist(list_1))
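unlist() builds names for the flattened values by joining the names along the path with dots, and coerces everything to a common type (character here):

```r
nested <- list(a = 1, b = list(c = 2, d = "x"))
unlist(nested)   # a = "1", b.c = "2", b.d = "x"
```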

## Data Frame

### matrix to data.frame

A matrix can be converted to a data frame with as.data.frame:

In [None]:
df_2 <- as.data.frame(mat_1)
df_2
attributes(df_2)

### list to data.frame

A list of vectors of same sizes can be converted to a data frame:

In [None]:
list_2 <- list(a = 1:3, b = letters[2:4], c = rep(T, 3))
list_2

In [None]:
df_3 <- as.data.frame(list_2)
df_3
attributes(df_3)

## Arrays

Arrays generalize matrices to any number of dimensions: they can have more than 2 dimensions, but like a matrix they can only contain values of the same type

In [None]:
ar_1 <- array(1:12, dim = c(2, 3, 2))
ar_1

In [None]:
str(ar_1)
attributes(ar_1)

The abind package extends the rbind() and cbind() functionality to n-dimensional arrays

### Subsetting and modifying arrays

When drop = F is not provided, subsetting an array with a single value on a margin returns an object with fewer dimensions:

In [None]:
ar_1[1,,]
class(ar_1[1,,])

In [None]:
ar_1[1,1,]
class(ar_1[1,1,])

Multiple values on all margins:

In [None]:
ar_1[1:2,1:2,1:2]
class(ar_1[1:2,1:2,1:2])

Or the drop = F argument keeps the dimension of the array:

In [None]:
ar_1[1,1,1,drop = F]
class(ar_1[1,1,1,drop = F])

Negative indices can be used for subsetting arrays

In [None]:
ar_1[-1,-1,-1, drop = F]

### outer() to array

outer() function can create an array out of vector or matrix objects:

In [None]:
vec_4 <- 1:2
mat_2 <- outer(vec_4, vec_4, FUN = "*")
vec_4
mat_2

A 3D array:

In [None]:
ar_2 <- outer(vec_4, mat_2, FUN = "^")
ar_2
class(ar_2)
str(ar_2)
attributes(ar_2)

A 4D array:

In [None]:
ar_3 <- outer(mat_2, mat_2, FUN = "^")
ar_3
class(ar_3)
str(ar_3)
attributes(ar_3)