<h1>Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Arrays-&amp;-Matrices" data-toc-modified-id="Arrays-&amp;-Matrices-1">Arrays &amp; Matrices</a></span><ul class="toc-item"><li><span><a href="#Arrays" data-toc-modified-id="Arrays-1.1">Arrays</a></span></li><li><span><a href="#Accessing-an-Array" data-toc-modified-id="Accessing-an-Array-1.2">Accessing an Array</a></span></li><li><span><a href="#Matrices" data-toc-modified-id="Matrices-1.3">Matrices</a></span></li><li><span><a href="#Accessing-a-Matrix" data-toc-modified-id="Accessing-a-Matrix-1.4">Accessing a Matrix</a></span></li></ul></li><li><span><a href="#Lists" data-toc-modified-id="Lists-2">Lists</a></span><ul class="toc-item"><li><span><a href="#Accessing-items-in-a-List" data-toc-modified-id="Accessing-items-in-a-List-2.1">Accessing items in a List</a></span></li><li><span><a href="#Named-Lists" data-toc-modified-id="Named-Lists-2.2">Named Lists</a></span></li><li><span><a href="#Accessing-Named-Lists" data-toc-modified-id="Accessing-Named-Lists-2.3">Accessing Named Lists</a></span></li><li><span><a href="#Adding-items" data-toc-modified-id="Adding-items-2.4">Adding items</a></span></li><li><span><a href="#Modifying-items" data-toc-modified-id="Modifying-items-2.5">Modifying items</a></span></li><li><span><a href="#Removing-items" data-toc-modified-id="Removing-items-2.6">Removing items</a></span></li></ul></li><li><span><a href="#Data-Frames" data-toc-modified-id="Data-Frames-3">Data Frames</a></span><ul class="toc-item"><li><span><a href="#Accessing-Data-Frames" data-toc-modified-id="Accessing-Data-Frames-3.1">Accessing Data Frames</a></span></li><li><span><a href="#Data-Frame-Structure" data-toc-modified-id="Data-Frame-Structure-3.2">Data Frame Structure</a></span></li><li><span><a href="#Head-and-Tail" data-toc-modified-id="Head-and-Tail-3.3">Head and Tail</a></span></li><li><span><a href="#Inserting-a-new-column" data-toc-modified-id="Inserting-a-new-column-3.4">Inserting a new 
column</a></span></li><li><span><a href="#Inserting-a-new-row" data-toc-modified-id="Inserting-a-new-row-3.5">Inserting a new row</a></span></li><li><span><a href="#Deleting-Rows" data-toc-modified-id="Deleting-Rows-3.6">Deleting Rows</a></span></li><li><span><a href="#Deleting-Columns" data-toc-modified-id="Deleting-Columns-3.7">Deleting Columns</a></span></li></ul></li><li><span><a href="#Review-Questions" data-toc-modified-id="Review-Questions-4">Review Questions</a></span></li></ul></div>

# R 101

# Module 2 - Data structures in R

* Arrays & Matrices
* Lists
* Data frames

## Arrays & Matrices

### Arrays

* An array is a structure that stores elements of a single data type (all numbers, all character strings, and so on)
* Arrays can have more than one dimension; a two-dimensional array arranges its data in rows and columns

In [1]:
# In order to create an array, first create a vector:
movie_vector <- c("Akira", "Toy Story", "Room", "The Wave", "Whiplash", "Star Wars", "The Ring", "The Artist", "Jumanji")

movie_array <- array(movie_vector, dim = c(3, 3))  # dim sets the dimensions: 3 rows and 3 columns
movie_array

| `[,1]` | `[,2]` | `[,3]` |
|---|---|---|
| Akira | The Wave | The Ring |
| Toy Story | Whiplash | The Artist |
| Room | Star Wars | Jumanji |
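Arrays are not limited to two dimensions. The sketch below (the names `cube` and `mixed` are illustrative) builds a three-dimensional array and shows that combining values of different types coerces them all to one type:

```r
# 2 rows x 3 columns x 4 layers, filled column by column with the numbers 1 to 24
cube <- array(1:24, dim = c(2, 3, 4))
dim(cube)        # 2 3 4
cube[1, 2, 3]    # row 1, column 2, layer 3: 15

# c() coerces the mixed values to a single type (character) before the array is built
mixed <- array(c(1, "two", TRUE), dim = 3)
typeof(mixed)    # "character"
```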


### Accessing an Array

In [2]:
# To extract a particular element, e.g. "Whiplash"

movie_array[2,2]

In [3]:
# To extract an entire row, e.g. the entire first row

movie_array[1,]  # specify row but leave the column empty

In [4]:
# To extract an entire column, e.g. the entire second column

movie_array[,2]  # specify column but leave the row empty
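Two further details of bracket indexing are worth knowing: selecting a single row drops the result to a plain vector unless `drop = FALSE` is given, and a negative index excludes positions. A minimal sketch that rebuilds `movie_array` so it runs on its own (`row_vec` and `row_mat` are illustrative names):

```r
movie_vector <- c("Akira", "Toy Story", "Room", "The Wave", "Whiplash",
                  "Star Wars", "The Ring", "The Artist", "Jumanji")
movie_array <- array(movie_vector, dim = c(3, 3))

row_vec <- movie_array[2, ]                # a plain character vector, dimensions dropped
row_mat <- movie_array[2, , drop = FALSE]  # stays a 1 x 3 array

# Every row except the first, in column 3
movie_array[-1, 3]   # "The Artist" "Jumanji"
```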

### Matrices

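* A matrix is a two-dimensional array: like any array, all of its elements share one data type
* Unlike a general array, a matrix always has exactly two dimensions (rows and columns)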

In [5]:
movie_vector <- c("Akira", "Toy Story", "Room", "The Wave", "Whiplash", "Star Wars", "The Ring", "The Artist", "Jumanji")

# To build a three by three matrix:
movie_matrix <- matrix(movie_vector, nrow = 3, ncol = 3)
movie_matrix

| `[,1]` | `[,2]` | `[,3]` |
|---|---|---|
| Akira | The Wave | The Ring |
| Toy Story | Whiplash | The Artist |
| Room | Star Wars | Jumanji |


In [6]:
# By default, matrix() fills the values column by column. To fill them row by row, set byrow = TRUE:

movie_matrix <- matrix(movie_vector, nrow = 3, ncol = 3, byrow = TRUE)
movie_matrix

| `[,1]` | `[,2]` | `[,3]` |
|---|---|---|
| Akira | Toy Story | Room |
| The Wave | Whiplash | Star Wars |
| The Ring | The Artist | Jumanji |
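Because a matrix is simply a two-dimensional array, `matrix()` and `array()` can produce the same object, and for this square example filling by rows gives the transpose of the default column-wise fill. A short check, with illustrative object names:

```r
movie_vector <- c("Akira", "Toy Story", "Room", "The Wave", "Whiplash",
                  "Star Wars", "The Ring", "The Artist", "Jumanji")

m_cols <- matrix(movie_vector, nrow = 3, ncol = 3)    # column-wise (the default)
a_cols <- array(movie_vector, dim = c(3, 3))          # same data as a 2-D array
identical(m_cols, a_cols)       # TRUE
is.matrix(a_cols)               # TRUE

m_rows <- matrix(movie_vector, nrow = 3, byrow = TRUE) # ncol is inferred as 3
identical(m_rows, t(m_cols))    # TRUE for this square matrix
```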


### Accessing a Matrix

In [7]:
# To access a subset of a matrix:

movie_matrix[2:3, 1:2]  # Extract the values in rows 2 to 3, and in cols 1 to 2

| `[,1]` | `[,2]` |
|---|---|
| The Wave | Whiplash |
| The Ring | The Artist |
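Row and column indices do not have to be contiguous ranges; any vector of positions works. A sketch that rebuilds the row-filled `movie_matrix` so it is self-contained (`sub` and `corners` are illustrative names):

```r
movie_vector <- c("Akira", "Toy Story", "Room", "The Wave", "Whiplash",
                  "Star Wars", "The Ring", "The Artist", "Jumanji")
movie_matrix <- matrix(movie_vector, nrow = 3, ncol = 3, byrow = TRUE)

sub <- movie_matrix[2:3, 1:2]               # contiguous ranges
corners <- movie_matrix[c(1, 3), c(1, 3)]   # rows 1 and 3, columns 1 and 3
corners                                     # Akira, Room / The Ring, Jumanji
```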


## Lists

* A list is an ordered collection of objects, similar to a vector
* Unlike a vector, the elements of a list can have different data types (and can themselves be vectors or lists)

In [8]:
movie <- list("Toy Story", 1995, c("Animation", "Adventure", "Comedy"))
movie
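The difference from a vector shows up as soon as types are mixed: a list keeps each element's type, whereas `c()` coerces everything to a common type. A minimal sketch (`flat` is an illustrative name):

```r
movie <- list("Toy Story", 1995, c("Animation", "Adventure", "Comedy"))
length(movie)          # 3: the genre vector counts as a single element
sapply(movie, class)   # "character" "numeric" "character"

flat <- c("Toy Story", 1995)   # a vector holds one type,
flat                           # so 1995 becomes the string "1995"
```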

### Accessing items in a List

In [9]:
# Single brackets return a list containing the second element (use movie[[2]] for the element itself):
movie[2]

In [10]:
# To access the elements at indices 2 through 3 (also returned as a list):
movie[2:3]
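Single brackets always return a list, while double brackets extract the element itself. A sketch of the difference (`sub` and `elem` are illustrative names):

```r
movie <- list("Toy Story", 1995, c("Animation", "Adventure", "Comedy"))

sub  <- movie[2]     # a list of length 1 that contains 1995
elem <- movie[[2]]   # the number 1995 itself
class(sub)           # "list"
class(elem)          # "numeric"

movie[[3]][2]        # "Adventure": [[3]] extracts the genre vector, then [2] indexes it
```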

### Named Lists

In [11]:
# To give each element of the list a name, pass name = value pairs to list():

movie <- list(name = "Toy Story", year = 1995, genre = c("Animation", "Adventure", "Comedy"))
movie
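Names can also be attached to an existing list with `names()`; the result is identical to building the named list directly with `list(name = ..., year = ..., genre = ...)`. A minimal sketch:

```r
movie <- list("Toy Story", 1995, c("Animation", "Adventure", "Comedy"))
names(movie) <- c("name", "year", "genre")   # attach names after creation
names(movie)                                 # "name" "year" "genre"
```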

### Accessing Named Lists

In [12]:
# To access a list element by name (this returns the element itself, here a character vector):
movie$genre

In [13]:
# Alternatively, use single brackets with the name; this returns a one-element list rather than the vector:
movie["genre"]
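`$`, `[[ ]]` and `[ ]` therefore differ in what they return. The sketch below compares them (`field` and the `by_*` names are illustrative); note that `[[ ]]` also accepts a name stored in a variable, which `$` does not:

```r
movie <- list(name = "Toy Story", year = 1995,
              genre = c("Animation", "Adventure", "Comedy"))

by_dollar <- movie$genre        # character vector
by_double <- movie[["genre"]]   # the same character vector
by_single <- movie["genre"]     # a list of length 1 whose element is named "genre"

field <- "genre"
movie[[field]]   # uses the value of field, i.e. the "genre" element
movie$field      # NULL: $ looks for an element literally named "field"
```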

### Adding items

In [14]:
# Assigning to a name that does not exist yet appends a new element at the end of the list:
movie["age"] <- 5
movie
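A sketch of other ways to add elements: `$` and `[[ ]]` append at the end, while base R's `append()` can insert at a chosen position through its `after` argument (the element values here are illustrative):

```r
movie <- list(name = "Toy Story", year = 1995)

movie$age <- 5               # $ appends "age" at the end
movie[["length"]] <- 81      # [[ ]] appends "length" at the end
movie <- append(movie, list(genre = "Animation"), after = 1)  # insert at position 2

names(movie)   # "name" "genre" "year" "age" "length"
```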

### Modifying items

In [15]:
# To modify an element, assign a new value to its existing name:
movie["age"] <- 6
movie
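Overwriting keeps the element in its original position, and a vector stored inside a list can be edited in place. A sketch (the replacement genre `"Family"` is a placeholder value):

```r
movie <- list(name = "Toy Story", year = 1995, age = 5,
              genre = c("Animation", "Adventure", "Comedy"))

movie$age <- 6               # overwrite: "age" stays in third position
movie$genre[3] <- "Family"   # change one entry of the vector stored in the list
movie$genre                  # "Animation" "Adventure" "Family"
```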

### Removing items

In [16]:
# To remove an element from the list, assign NULL to it:
movie["age"] <- NULL
movie
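Several elements can be removed at once. Because assigning `NULL` deletes, storing an actual `NULL` value requires wrapping it in `list()`. A sketch with an illustrative `sequel` element:

```r
movie <- list(name = "Toy Story", year = 1995, age = 6, genre = "Animation")

movie[c("age", "genre")] <- NULL   # remove two elements in one assignment
names(movie)                       # "name" "year"

movie["sequel"] <- list(NULL)      # keeps an element whose value is NULL
length(movie)                      # 3
is.null(movie$sequel)              # TRUE
```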

## Data Frames

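* A data frame stores tabular data: each column holds one variable and each row holds one observation (here, one movie).
* Each argument to the `data.frame()` function is a vector that becomes one column; all columns must have the same length.
* Within a column every value has the same type, but different columns can have different types (e.g. character names and numeric years).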

In [17]:
# Data frame for storing movie titles along with their corresponding years:

movies <- data.frame(name = c("Toy Story", "Akira", "The Breakfast Club", "The Artist", "Modern Times","Fight Club", "City of God", "The Untouchables"),
                     year = c(1995, 1998, 1985, 2011, 1936, 1999, 2002, 1987))
movies

| name | year |
|---|---|
| `<chr>` | `<dbl>` |
| Toy Story | 1995 |
| Akira | 1998 |
| The Breakfast Club | 1985 |
| The Artist | 2011 |
| Modern Times | 1936 |
| Fight Club | 1999 |
| City of God | 2002 |
| The Untouchables | 1987 |
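A data frame is stored as a list of equal-length column vectors, which is why columns may differ in type but not in length. A sketch, assuming R 4.0 or later (where strings are not converted to factors by default); `df` and `res` are illustrative names:

```r
df <- data.frame(name = c("Toy Story", "The Artist"), year = c(1995, 2011))
is.list(df)          # TRUE: each column is one element of the list
length(df)           # 2, the number of columns
sapply(df, class)    # name: "character", year: "numeric"

# Columns of different lengths are rejected (3 names but only 2 years)
res <- tryCatch(data.frame(name = c("A", "B", "C"), year = c(1, 2)),
                error = function(e) "error")
res                  # "error"
```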


### Accessing Data Frames

In [18]:
# To access a single column (variable) as a vector, use $:
movies$name

In [19]:
# Alternatively, specify the column number inside single brackets; this returns a one-column data frame:
movies[1]

| name |
|---|
| `<chr>` |
| Toy Story |
| Akira |
| The Breakfast Club |
| The Artist |
| Modern Times |
| Fight Club |
| City of God |
| The Untouchables |


In [20]:
# To access an individual element, specify the row number and then the column number:
movies[1,2]
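Columns can also be selected by name, and rows can be filtered with a logical condition. A sketch on a smaller illustrative data frame:

```r
movies <- data.frame(name = c("Toy Story", "The Artist", "Modern Times", "City of God"),
                     year = c(1995, 2011, 1936, 2002))

movies[1, "year"]     # 1995: row 1 of the column named "year"
movies[["name"]]      # the name column as a vector, like movies$name

recent <- movies[movies$year > 2000, ]   # keep the rows where the condition is TRUE
recent$name           # "The Artist" "City of God"
```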

### Data Frame Structure

In [21]:
# To get information about the data frame's structure:
str(movies)

'data.frame':	8 obs. of  2 variables:
 $ name: chr  "Toy Story" "Akira" "The Breakfast Club" "The Artist" ...
 $ year: num  1995 1998 1985 2011 1936 ...
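The individual pieces that `str()` prints can also be retrieved directly with `dim()`, `nrow()`, `ncol()` and `names()`. A sketch on an illustrative data frame:

```r
movies <- data.frame(name = c("Toy Story", "The Artist", "Modern Times"),
                     year = c(1995, 2011, 1936))

dim(movies)     # 3 2 (rows, columns)
nrow(movies)    # 3
ncol(movies)    # 2
names(movies)   # "name" "year"
```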


### Head and Tail

In [22]:
# The "head" function displays the first six rows of a data frame:
head(movies)

|   | name | year |
|---|---|---|
|   | `<chr>` | `<dbl>` |
| 1 | Toy Story | 1995 |
| 2 | Akira | 1998 |
| 3 | The Breakfast Club | 1985 |
| 4 | The Artist | 2011 |
| 5 | Modern Times | 1936 |
| 6 | Fight Club | 1999 |


In [23]:
# The "tail" function displays the last six rows of a data frame:
tail(movies)

|   | name | year |
|---|---|---|
|   | `<chr>` | `<dbl>` |
| 3 | The Breakfast Club | 1985 |
| 4 | The Artist | 2011 |
| 5 | Modern Times | 1936 |
| 6 | Fight Club | 1999 |
| 7 | City of God | 2002 |
| 8 | The Untouchables | 1987 |
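Both functions take an `n` argument to change the number of rows shown; a negative `n` in `head()` drops that many rows from the end. A sketch on an illustrative ten-row data frame:

```r
df <- data.frame(id = 1:10, letter = letters[1:10])

head(df, n = 3)    # rows 1 to 3
tail(df, n = 2)    # rows 9 and 10
head(df, n = -7)   # every row except the last 7, i.e. rows 1 to 3
```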


### Inserting a new column

In [24]:
# Add a new column for "length"
movies["length"] <- c(81, 125, 97, 100, 87, 139, 130, 119)
movies

| name | year | length |
|---|---|---|
| `<chr>` | `<dbl>` | `<dbl>` |
| Toy Story | 1995 | 81 |
| Akira | 1998 | 125 |
| The Breakfast Club | 1985 | 97 |
| The Artist | 2011 | 100 |
| Modern Times | 1936 | 87 |
| Fight Club | 1999 | 139 |
| City of God | 2002 | 130 |
| The Untouchables | 1987 | 119 |
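A new column can also be computed from existing ones; when a vector is supplied directly, it needs one value per row. A sketch (the `decade` column is illustrative):

```r
movies <- data.frame(name = c("Toy Story", "The Artist", "Modern Times"),
                     year = c(1995, 2011, 1936))

movies$decade <- (movies$year %/% 10) * 10   # integer division, then scale back up
movies$length <- c(81, 100, 87)              # one value per row
movies$decade                                # 1990 2010 1930
```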


### Inserting a new row

In [25]:
# To append a row, pass rbind() the data frame and a one-row data frame with the same column names.
# Binding a plain vector built with c() instead would coerce every column to character.
movies <- rbind(movies, data.frame(name = "Dr. Strangelove", year = 1964, length = 94))
movies

| name | year | length |
|---|---|---|
| `<chr>` | `<dbl>` | `<dbl>` |
| Toy Story | 1995 | 81 |
| Akira | 1998 | 125 |
| The Breakfast Club | 1985 | 97 |
| The Artist | 2011 | 100 |
| Modern Times | 1936 | 87 |
| Fight Club | 1999 | 139 |
| City of God | 2002 | 130 |
| The Untouchables | 1987 | 119 |
| Dr. Strangelove | 1964 | 94 |


### Deleting Rows

In [26]:
# To delete the last row (row 9), use a negative row index:
movies <- movies[-9, ]  # leave the column index blank to keep all columns
movies

|   | name | year | length |
|---|---|---|---|
|   | `<chr>` | `<dbl>` | `<dbl>` |
| 1 | Toy Story | 1995 | 81 |
| 2 | Akira | 1998 | 125 |
| 3 | The Breakfast Club | 1985 | 97 |
| 4 | The Artist | 2011 | 100 |
| 5 | Modern Times | 1936 | 87 |
| 6 | Fight Club | 1999 | 139 |
| 7 | City of God | 2002 | 130 |
| 8 | The Untouchables | 1987 | 119 |


### Deleting Columns

In [27]:
# To delete the "length" column, assign NULL to it:
movies["length"] <- NULL
movies

|   | name | year |
|---|---|---|
|   | `<chr>` | `<dbl>` |
| 1 | Toy Story | 1995 |
| 2 | Akira | 1998 |
| 3 | The Breakfast Club | 1985 |
| 4 | The Artist | 2011 |
| 5 | Modern Times | 1936 |
| 6 | Fight Club | 1999 |
| 7 | City of God | 2002 |
| 8 | The Untouchables | 1987 |
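The sketch below contrasts appending a one-row data frame with appending a vector built by `c()`, then deletes rows by condition and a column by name. It assumes R 4.0 or later; `good`, `bad`, `old_only` and `no_year` are illustrative names:

```r
movies <- data.frame(name = c("Toy Story", "The Artist"), year = c(1995, 2011))

# Binding a one-row data frame keeps each column's type
good <- rbind(movies, data.frame(name = "Modern Times", year = 1936))
# Binding c(...) first builds a character vector, so year becomes character
bad <- rbind(movies, c(name = "Modern Times", year = 1936))
class(good$year)   # "numeric"
class(bad$year)    # "character"

# Delete rows by condition and a column by name
old_only <- good[good$year < 2000, ]
no_year  <- good[, names(good) != "year", drop = FALSE]
```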


## Review Questions

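1. Given a 5 x 5 matrix object, `movies`, how would you retrieve the bottom-left item?
* **Answer:** `movies[5, 1]` (row 5, column 1)

2. Below is a list containing a student's information. Which of the following options can be used to retrieve his courses? Select all that apply.
* **Answer:** All options are correct. Note that `john["courses"]` and `john[3]` return a one-element list containing the courses, while `john$courses` returns the character vector itself.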

In [28]:
john <- list("studentid" = 9, "age" = 18, "courses" = c("Data Science 101", "Data Science Methodology"))

In [29]:
john["courses"]

In [30]:
john$courses

In [31]:
john[3]

3. Which of the following options produces the result below?

In [32]:
# Answer:
data.frame("student" = c("John", "Mary"), "id" = c(1, 2))

| student | id |
|---|---|
| `<chr>` | `<dbl>` |
| John | 1 |
| Mary | 2 |


<hr>