## data storage objects  
There are several objects in R that handle data storage. We will go through them one by one.   
But first, we need to explain the concept of placeholders, also called identifiers.  

You, as a programmer, are expected to create logical placeholder names. These names should be short, but still make clear what they contain. Placeholders can store any object type. It is recommended to:  
* separate words with underscores: my_variable.  
* use lowercase letters  
* for functions use verbs: join_tables()  
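A minimal sketch of these conventions. The names below (`sample_size`, `species_name`, `count_characters`) are hypothetical examples and are not used elsewhere in this section. The base R function `class()` reports what kind of object a placeholder currently holds.

```r
# Hypothetical placeholder names following the conventions above
sample_size <- 25                                  # a number
species_name <- "setosa"                           # a character string
count_characters <- function(text) nchar(text)     # a function, named with a verb

class(sample_size)         # "numeric"
class(species_name)        # "character"
class(count_characters)    # "function"
count_characters("iris")   # 4
```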


### placeholders  
As an example of a placeholder, let us define a variable that enables us to store a timestamp: 

```r
current_time <- Sys.time()
current_time
```

* exercise: Run the lines several times. Do you see how the placeholder `current_time` stores the output of the `Sys.time()` function each time the assignment is run?  

We can also store the output of a calculation:

```r
my_result <- 5 * (1 + 5)
my_result
```

We can then use this value later on in our analysis, because it is stored in *R memory* (the workspace)! 

```r
my_result + 1
```
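Storing a result does not freeze it: assigning to the same placeholder again replaces the old value. A small sketch reusing the `my_result` calculation from above:

```r
my_result <- 5 * (1 + 5)     # 30
my_result <- my_result + 1   # overwrite the placeholder with the updated value
my_result                    # 31
```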

* exercise 3: Explain and then fix the error message you get when running the following line of code: 

```r
my_other_result + 1
```

### vectors
A vector object is created with the `c()` function (short for combine); entries are separated by commas:  
`my_vector <- c( 4, 6, 2, 4, 1)`   
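A short sketch of working with a vector. `length()` counts the entries, and because a vector holds only one class, mixing numbers and text converts every entry to character (`mixed_vector` is a hypothetical example name):

```r
my_vector <- c(4, 6, 2, 4, 1)
length(my_vector)    # 5 entries
class(my_vector)     # "numeric"

# Mixing classes: the numbers are converted (coerced) to character strings
mixed_vector <- c(4, "six", 2)
class(mixed_vector)  # "character"
mixed_vector         # "4" "six" "2"
```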


* exercise 4: Now that we know how to create vectors, variables and strings (characters), let's practice:  
Create a vector with the `c()` function containing the sentence "hello, my name is " as the first entry and, as the second entry, a placeholder containing your name. Make sure to define the placeholder containing your name first, or you will get an error!  

Another way to create sequences of numbers is the `seq()` function. Have a look at its help file by typing `?seq`, and note the `from`, `to`, `by` and `length.out` arguments. To create a sequence of 6 equally spaced numbers from 0 to 10 you can use either
`my_seq <- seq(from = 0, to = 10, by = 2)` or `my_seq <- seq(from = 0, to = 10, length.out = 6)`.  
Another useful way to generate sequences of integers is the colon operator: `1:10`  
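A quick check that the two `seq()` calls above describe the same numbers. `all.equal()` is used instead of `==` because R does not promise whether `seq()` returns integers or doubles; the names `by_two` and `six_points` are hypothetical:

```r
by_two     <- seq(from = 0, to = 10, by = 2)           # 0 2 4 6 8 10
six_points <- seq(from = 0, to = 10, length.out = 6)   # 0 2 4 6 8 10
all.equal(by_two, six_points)   # TRUE

# The colon operator produces integers, in steps of 1
class(1:10)                     # "integer"
# Combined with arithmetic it can reproduce the same sequence
all.equal((0:5) * 2, by_two)    # TRUE
```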

* exercise: Try all three methods to generate a sequence of numbers from 12 to 2 with steps of 1. 

It is also possible to do calculations with numeric vectors. There is one behavior to be aware of, called recycling.  
* exercise: Calculate the sum of these vectors and explain recycling: `1:6 + 1:2`. Also try `1:5 + 1:2`. When and why do you get a warning?  
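As a starting point for the exercise, a minimal sketch of arithmetic on two vectors of the same length, where R works entry by entry, and on a vector and a single number (`weights` and `extra` are hypothetical example names):

```r
weights <- c(10, 20, 30)
extra   <- c(1, 2, 3)
weights + extra   # 11 22 33: first with first, second with second, ...
weights * 2       # 20 40 60: the single number is used for every entry
```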

Now let us move on to selection operations on vectors. Square brackets `[]` are used for selection. For objects with more than one dimension (rows, columns or more), the indices for each dimension are separated by commas inside the brackets. A vector, however, has only one dimension. To select the second entry of `my_seq`: `my_seq[2]`  
* exercise: Select entries 1, 3 and 5 of `my_seq`.  
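A few more selections on the same kind of vector, as a sketch: a single entry, a range of entries, and the last entry found with `length()`:

```r
my_seq <- seq(from = 0, to = 10, by = 2)  # 0 2 4 6 8 10
my_seq[2]                 # 2: the second entry
my_seq[2:4]               # 2 4 6: entries 2 up to and including 4
my_seq[length(my_seq)]    # 10: the last entry
```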

### matrices  
Tables in which all entries share a single class can be stored as matrices. Matrices are not so useful for us biologists, but super useful for fast matrix calculations in the form of linear algebra, which is needed for the various statistical tools available in R. We will have a quick look at them. To define a matrix:  

```r
my_matrix <- matrix(1:9, nrow = 3, ncol = 3)
my_matrix
```

```r
my_matrix2 <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)
my_matrix2
```
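By default `matrix()` fills its values column by column, while `byrow = TRUE` fills them row by row. A quick sketch confirming that the two matrices above are each other's transpose, using the base R function `t()`:

```r
my_matrix  <- matrix(1:9, nrow = 3, ncol = 3)                # filled column-wise
my_matrix2 <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)  # filled row-wise
dim(my_matrix)                        # 3 3: rows, columns
# Filling by row gives the transpose of filling by column
identical(t(my_matrix), my_matrix2)   # TRUE
```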

Selection in a matrix is two-dimensional: you need row and column indices, separated by a comma:  
`my_matrix[`*row-numbers*`,` *column-numbers*`]`. To select rows 2 and 3 of the 1st and 3rd columns: 

```r
my_matrix[2:3, c(1, 3)]
```
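Leaving the row or the column position empty selects all rows or all columns. A short sketch with the same `my_matrix`:

```r
my_matrix <- matrix(1:9, nrow = 3, ncol = 3)
my_matrix[2, ]             # row 2, all columns: 2 5 8
my_matrix[, 3]             # all rows, column 3: 7 8 9
my_matrix[2:3, c(1, 3)]    # rows 2-3 of columns 1 and 3: a 2 x 2 matrix
```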

### data.frames  
A data.frame allows mixing data classes across columns, so you can combine numeric, character (string) and factor columns. Let's have a look at the built-in `iris` data.frame, using the `head()` function to display only the first 7 rows. 

```r
head(iris, n = 7)
```

* exercise: Display the last 5 rows of `iris`  
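To get a feel for the size of a data frame before printing it, a sketch using the base R functions `dim()`, `nrow()`, `ncol()` and `names()`:

```r
dim(iris)     # 150 5: rows, columns
nrow(iris)    # 150
ncol(iris)    # 5
names(iris)   # the column names
```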

To define your own data frame use the `data.frame()` function:  

```r
my_dataframe <- data.frame(names = c("Steven", "Margriet", "Gerard", "Elke"),
                           hobbies = c("R", "Python", "cheminformatics", "pharmacokinetics"),
                           sex = c("M", "F", "M", "F"),
                           fakeage = c(38, 30, 34, 28))
my_dataframe
```

* exercise: Define your own data frame. The first column should contain 10 random letters, the second column random numbers drawn from the `1:6` sequence. Name the columns as you want.  
Hints: check out the `letters` and `LETTERS` objects and have a look at the help documentation for the `sample()` function.

To check the class of a column, here for columns one and five: 

```r
class(iris[, 1])
```

```r
class(iris[, 5])
```

This mixing of data types is not possible in matrices.  
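To see the difference, a sketch that checks the class of every `iris` column at once with `sapply()`, and then forces the same data into a matrix with `as.matrix()`. Because a matrix holds only one class, everything is converted to character (`iris_matrix` is a hypothetical name):

```r
# Each data.frame column keeps its own class:
# "numeric" for the four measurements, "factor" for Species
sapply(iris, class)

# A matrix holds one class only, so all values become character strings
iris_matrix <- as.matrix(iris)
typeof(iris_matrix)   # "character"
```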
Before we dive deeper into `data.frames`, we will discuss the `factor` class. Let us print the contents of the `Species` column from the `iris` `data.frame`: 

```r
iris[, 5]
```

Notice the `Levels` line at the bottom? (`levels(iris[, 5])` displays the levels on their own.) It tells us that this column has the class `factor`, and lists the unique factor levels. Factors are mainly important for statistical analysis, where the order of the levels matters. However, many functions and visualization tools in R also use factors, for example to define the order in which data labels are displayed or in which nominal values are plotted along the x-axis.  
You can create a factor with the `factor()` function. To demonstrate, we first convert `Species` to the character class, which has no levels, and then back to a factor:

```r
iris$Species <- as.character(iris$Species)  # character vector: no levels
levels(iris$Species)                        # NULL
iris$Species <- factor(iris$Species)        # back to a factor
levels(iris$Species)
```
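By default `factor()` sorts the levels alphabetically. The `levels` argument of `factor()` lets you choose the order yourself, which is what controls, for example, the order of categories on a plot axis. A sketch with hypothetical dose data:

```r
# Hypothetical example data: three categories with a natural order
dose <- c("low", "high", "medium", "low")

dose_default <- factor(dose)
levels(dose_default)       # "high" "low" "medium": alphabetical by default

dose_ordered <- factor(dose, levels = c("low", "medium", "high"))
levels(dose_ordered)       # "low" "medium" "high": the order we chose
as.integer(dose_ordered)   # 1 3 2 1: each entry's position in the levels
```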

Always think of `data.frames` column-wise: a column represents a variable. Besides `iris[, 5]`, we can access `data.frame` columns with `iris$Species` or `iris[, "Species"]`.  
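A quick sketch confirming that the three ways of accessing a column return exactly the same object (the `by_*` names are hypothetical):

```r
by_position <- iris[, 5]          # by column number
by_dollar   <- iris$Species       # by name with $
by_name     <- iris[, "Species"]  # by name inside the brackets
identical(by_position, by_dollar) # TRUE
identical(by_dollar, by_name)     # TRUE
```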

This is an opportune moment to explain Boolean and conditional selection. The Boolean class, in R called the logical class, is another class of data. It has the values `TRUE`, `FALSE` and `NA`, where `NA` marks a missing (unknown) value.  
Logicals can be used for selection. For example,  
`iris[iris$Petal.Length < 1.4, ]` displays all rows of the `iris` data for which the petal length is smaller than 1.4: only the rows where the condition is `TRUE` are kept.  
Printing the `iris$Petal.Length < 1.4` statement on its own shows what is happening: 

```r
iris$Petal.Length < 1.4
```

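Multiple conditions can be combined with the `&` (AND) operator, which keeps entries for which both conditions are `TRUE`; the `|` (OR) operator keeps entries for which at least one condition is `TRUE`.  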
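A minimal sketch of combining conditions, first on plain logical vectors, then on a small hypothetical vector `x`, and finally on `iris` rows using the sepal columns (so the exercises below are left to you):

```r
# Element-wise AND / OR on two logical vectors
c(TRUE, FALSE, TRUE) & c(TRUE, TRUE, FALSE)   # TRUE FALSE FALSE
c(TRUE, FALSE, TRUE) | c(TRUE, TRUE, FALSE)   # TRUE TRUE TRUE

# Combining two conditions to select entries of a vector
x <- c(5, 12, 8, 20)
x[x > 6 & x < 15]   # 12 8

# The same idea on iris rows: plants with long AND wide sepals
long_wide <- iris[iris$Sepal.Length > 7 & iris$Sepal.Width > 3, ]
long_wide
```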

* exercise: Display the `Petal.Length` for all plants that belong to the `virginica` species. 

* exercise: Display the `Petal.Length` for all plants that belong to the `virginica` species for which the petal length is smaller than 1.6.  

A very useful operator for selecting multiple entries is `%in%`, which returns `TRUE` for every element on its left that occurs in the vector on its right:  

```r
iris[iris$Species %in% c("virginica", "setosa") & iris$Sepal.Length %in% c(7.2, 5.3), ]
```
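A sketch of what `%in%` does on its own, and a check that it gives the same result as writing out several `==` comparisons joined by `|` (the names `in_two` and `either_or` are hypothetical):

```r
c("a", "d", "b") %in% c("a", "b", "c")   # TRUE FALSE TRUE

in_two    <- iris$Species %in% c("virginica", "setosa")
either_or <- iris$Species == "virginica" | iris$Species == "setosa"
identical(in_two, either_or)             # TRUE
```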

Finally, we will have a quick look at how we can perform calculations with data frames. To define a new variable (thus a new column) in your data frame, assign to a new column name: `iris$is_small_Sepal <- iris$Sepal.Length < 5 | iris$Sepal.Width < 3`  
Note that you can do calculations on logicals, where `TRUE` counts as 1 and `FALSE` as 0:

```r
sum(c(TRUE, TRUE, FALSE, FALSE, TRUE))
```
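Because `TRUE` counts as 1, `sum()` counts the `TRUE` values and `mean()` gives their proportion. A small sketch (`flags` is a hypothetical name):

```r
flags <- c(TRUE, TRUE, FALSE, FALSE, TRUE)
sum(flags)    # 3: number of TRUE values
mean(flags)   # 0.6: proportion of TRUE values
```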

* exercise: How many plants with small sepals (as defined above) are there in each of the three species?  

Note how we used multiple columns in a `data.frame` to calculate a new column and that these operations are performed row-wise, thus for each plant!  
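The same row-wise behavior applies to numeric calculations. A sketch that works on a copy (`iris_copy`, a hypothetical name) so the original `iris` is not changed; the `Petal.Ratio` column is illustrative only:

```r
iris_copy <- iris
# For each row (plant): petal length divided by petal width
iris_copy$Petal.Ratio <- iris_copy$Petal.Length / iris_copy$Petal.Width
ncol(iris_copy)               # 6: one column more than iris
head(iris_copy$Petal.Ratio)   # ratios for the first six plants
```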

### lists  

Lists are another object class in R to store data. Actually any object can be stored in lists: functions, data frames, vectors, characters and even other lists can all be placed in a list. If you put 1 object in a list, the list has length 1; putting 2 objects in the list creates a list of length 2, and so on. You can name the entries in the list, which makes access easy and explicit. Let's create a list with named entries:  

```r
my_list <- list(
  teachers = my_dataframe,
  # Welch two-sample t-test of Sepal.Length: virginica versus setosa
  stat = t.test(subset(iris, Species == "virginica")[, "Sepal.Length"],
                subset(iris, Species == "setosa")[, "Sepal.Length"])
)

my_list
```

As you can see, the first entry in the list is our `my_dataframe` and the second entry is the result of the `t.test()` function, where we tested whether `Sepal.Length` differs between the species `virginica` and `setosa`.  

To access an entry in our list there are several options. One can use the name:  

```r
my_list$teachers
my_list$stat
```

Or simply use index numbers:  

```r
my_list[[1]]
```

Double brackets `[[ ]]` return the entry itself. If you want to keep the list structure (including the entry name), use a single bracket, which returns a list:  

```r
my_list[1]
```

To select multiple slots simultaneously use single brackets:  

```r
my_list[1:2]
```
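A sketch of the difference between `[[ ]]` and `[ ]`, using a small hypothetical list so it runs on its own:

```r
# A small hypothetical list for illustration
example_list <- list(numbers = 1:3, word = "iris")

class(example_list[[1]])    # "integer": the entry itself
class(example_list[1])      # "list": a sub-list of length 1
names(example_list[1])      # "numbers": the name is kept
length(example_list[1:2])   # 2: a list with both entries
```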

Accessing objects inside lists is also straightforward: access the list entry, then select from that object as you normally would:  

```r
my_list$teachers$hobbies
```

Or you could do:  

```r
my_list[[1]][, "hobbies"]
```

* exercise: Store the `iris` data frame as an additional slot in the existing list object `my_list`.  

* exercise: Display the `Sepal.Length` values of all `virginica` and `setosa` plants from within `my_list`.  

### arrays  
Arrays are higher-dimensional matrices. For example, a three-dimensional array can be thought of as multiple matrices stacked behind one another, here for four 3x3 matrices: 

```r
# 3 x 3 x 4 = 36 values fill four 3x3 matrices
my_array <- array(1:36, dim = c(3, 3, 4))
dim(my_array)
my_array
```

To access the 2nd row and 2nd column of the 2nd matrix, use `[row, column, matrix]`:  

```r
my_array[2, 2, 2]
```

Or to access the 3rd matrix:  

```r
my_array[, , 3]
```
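A sketch that checks these selections on an array filled with exactly 36 values (3 x 3 x 4), so that no values are reused. Like a matrix, an array is filled column by column, one matrix after the other:

```r
my_array <- array(1:36, dim = c(3, 3, 4))
my_array[2, 2, 2]        # 14: row 2, column 2 of matrix 2
dim(my_array[, , 3])     # 3 3: the third matrix
my_array[, , 3][1, ]     # 19 22 25: first row of the third matrix
```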

You can also add more dimensions, so that the matrices themselves are indexed in two dimensions, for example a 2 x 2 grid of 3x3 matrices:  

```r
# 3 x 3 x 2 x 2 = 36 values
my_array <- array(1:36, dim = c(3, 3, 2, 2))
dim(my_array)
my_array
```
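Selection works the same way with four indices: `[row, column, grid row, grid column]`. A sketch with the 36-value array:

```r
my_array <- array(1:36, dim = c(3, 3, 2, 2))
dim(my_array)            # 3 3 2 2
my_array[, , 1, 2]       # the 3x3 matrix at grid position (1, 2): values 19 to 27
my_array[1, 1, 2, 2]     # 28: first entry of the matrix at grid position (2, 2)
```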