chapter3.md

courseTitle: Introduction to R (v2)
chapterTitle: Matrices
description: In this chapter, you'll learn how to work with matrices in R. By the end of the chapter you'll be able to create matrices and understand how you can do basic computations with them. You'll analyze the box office numbers of Star Wars to illustrate the use of matrices in R. May the force be with you!
framework: datamind
mode: selfcontained

What's a matrix?

In R, a matrix is a collection of elements of the same data type (numeric, character, or logical) arranged into a fixed number of rows and columns. Since you are only working with rows and columns, a matrix is called two-dimensional.

In R, you can construct a matrix with the matrix function, for example: matrix(1:9, byrow=TRUE, nrow=3).

The first argument is the collection of elements that R will arrange into the rows and columns of the matrix. Here, we use 1:9, which constructs the vector c(1,2,3,4,5,6,7,8,9).

The argument byrow indicates that the matrix is filled by the rows. If we want the matrix to be filled by the columns, we just set byrow=FALSE, which is also the default.

The third argument nrow indicates that the matrix should have three rows.
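To make the effect of byrow concrete, here is a minimal sketch that builds the same nine numbers both ways. The variable names by.row and by.col are only illustrative and are not used elsewhere in the chapter.

```r
# Fill the matrix row by row
by.row <- matrix(1:9, byrow = TRUE, nrow = 3)
by.row
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
# [3,]    7    8    9

# Fill the matrix column by column (byrow = FALSE is the default)
by.col <- matrix(1:9, nrow = 3)
by.col
#      [,1] [,2] [,3]
# [1,]    1    4    7
# [2,]    2    5    8
# [3,]    3    6    9
```

Both matrices hold the same elements; only the order in which the cells are filled differs.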

*** =instructions

  1. Construct a matrix with 3 rows containing the numbers 1 up to 9 in the editor, then click the Submit Answer button and look at the output in the console. Hint: use matrix(1:9, byrow=TRUE, nrow=3).

*** =hint Read the assignment carefully, the answer is already given ;-).

*** =sample_code

# Construct a matrix with 3 rows containing the numbers 1 up to 9

*** =solution

matrix(1:9, byrow = TRUE, nrow = 3)

*** =sct

DM.result <- code_test("matrix(1:9, byrow=TRUE, nrow=3)")

*** =pre_exercise_code


Analyzing matrices, you shall

It is time to get your hands dirty. In the following exercises you will analyze the box office numbers of the Star Wars franchise. May the force be with you!

In the editor, three vectors are defined, one for each of the first three Star Wars movies. The first element of each vector is the US box office revenue, the second element is the non-US box office revenue (source: Wikipedia).

*** =instructions

  1. Construct a matrix with one row for each movie (thus with 3 rows), a column for the US box office revenue, and a second column for the non-US box office revenue. Name the matrix star.wars.matrix.

*** =hint Remember that you can construct a matrix containing the numbers 1 up to 9 with: matrix(1:9, byrow=TRUE, nrow=3). In this case, you don't want the numbers 1 up to 9, but the elements of the three Star Wars movie vectors, so the input vector is: c(new.hope,empire.strikes,return.jedi).

*** =sample_code

# Box office Star Wars: In Millions (!)  First element: US, Second
# element: Non-US
new.hope <- c(460.998007, 314.4)
empire.strikes <- c(290.475067, 247.9)
return.jedi <- c(309.306177, 165.8)

# Add your code below to construct the matrix
star.wars.matrix <- 
# Show me the
star.wars.matrix

*** =solution

# Box office Star Wars: In Millions (!)  First element: US, Second
# element: Non-US
new.hope <- c(460.998007, 314.4)
empire.strikes <- c(290.475067, 247.9)
return.jedi <- c(309.306177, 165.8)

# Add your code below to construct the matrix
star.wars.matrix <- matrix(c(new.hope, empire.strikes, return.jedi), nrow = 3, 
    byrow = TRUE)

# Show me the
star.wars.matrix

*** =sct

names <- "star.wars.matrix"
values <- "matrix( c(new.hope,empire.strikes,return.jedi),\nnrow=3, byrow=TRUE)"
DM.result <- closed_test(names, values)

*** =pre_exercise_code


Naming a matrix

To help you remember what's stored in star.wars.matrix, you'd like to add the names of the movies for the rows. Not only does this help you to read the data, it is also useful for selecting certain elements from the matrix later.

Similar to vectors, you can add names for the rows and the columns of a matrix with:

  • rownames(my.matrix) <- row.names.vectors
  • colnames(my.matrix) <- col.names.vectors
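As a small illustration of the two assignments above, the sketch below names a made-up 2 x 2 matrix and then selects an element by name. The matrix example.matrix and its values are hypothetical and serve only as a demonstration.

```r
# A small 2 x 2 example matrix (values are made up for illustration)
example.matrix <- matrix(c(10, 20, 30, 40), nrow = 2, byrow = TRUE)

# Attach row and column names
rownames(example.matrix) <- c("first", "second")
colnames(example.matrix) <- c("left", "right")

example.matrix
#        left right
# first    10    20
# second   30    40

# Names can now be used to select elements
example.matrix["second", "left"]
# [1] 30
```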

*** =instructions

  1. Give the columns of star.wars.matrix the names "US" and "non-US".
  2. Give the rows of the matrix star.wars.matrix the names of the three movies. In case you are not a fan ;-), the movie names are: "A new hope", "The empire strikes back" and "Return of the Jedi".

*** =hint Don't forget that R is case sensitive. The vector for the column names is thus: c("US","non-US") and for the row names: c("A new hope","The empire strikes back","Return of the Jedi").

*** =sample_code

# Box office Star Wars: In Millions (!)  First element: US, Second
# element: Non-US
new.hope <- c(460.998007, 314.4)
empire.strikes <- c(290.475067, 247.9)
return.jedi <- c(309.306177, 165.8)

# Construct matrix:
star.wars.matrix <- matrix(c(new.hope, empire.strikes, return.jedi), nrow = 3, 
    byrow = TRUE)

# Add your code here such that rows and columns of star.wars have a name!

# Print the matrix to the console:
star.wars.matrix

*** =solution

# Box office Star Wars: In Millions (!)  First element: US, Second
# element: Non-US
new.hope <- c(460.998007, 314.4)
empire.strikes <- c(290.475067, 247.9)
return.jedi <- c(309.306177, 165.8)

# Construct matrix:
star.wars.matrix <- matrix(c(new.hope, empire.strikes, return.jedi), nrow = 3, 
    byrow = TRUE)

colnames(star.wars.matrix) <- c("US", "non-US")
rownames(star.wars.matrix) <- c("A new hope", "The empire strikes back", "Return of the Jedi")

# Print the matrix to the console:
star.wars.matrix

*** =sct

names <- c("star.wars.matrix", "colnames(star.wars.matrix)", "rownames(star.wars.matrix)")
values <- list("star.wars.matrix", "c(\"US\",\"non-US\")", "c(\"A new hope\", \"The empire strikes back\", \"Return of the Jedi\")")
DM.result <- closed_test(names, values, check.existence = c(T, F, F))

*** =pre_exercise_code


Calculating the worldwide box office

The single most important thing for a movie to become an instant legend in Tinseltown is its worldwide box office figures.

To calculate the total box office revenue for the three Star Wars movies, you have to take the sum of the US revenue column and the non-US revenue column.

In R, the function rowSums() conveniently calculates the totals for each row of a matrix. This function creates a new vector: sum.of.rows.vector <- rowSums(my.matrix).
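A quick sketch of rowSums() on a tiny matrix, so you can check the result by hand. The names small.matrix and row.totals are illustrative only.

```r
# Two rows: (1, 2, 3) and (4, 5, 6)
small.matrix <- matrix(1:6, nrow = 2, byrow = TRUE)

# One total per row
row.totals <- rowSums(small.matrix)
row.totals
# [1]  6 15
```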

*** =instructions

  1. Calculate the worldwide box office figures for the three movies and put these in the vector named worldwide.vector.

*** =hint The rowSums() function will calculate the total box office for each row of the star.wars.matrix, if you supply star.wars.matrix as an argument to that function by putting it between the brackets.

*** =sample_code

# Box office Star Wars: In Millions (!) 
# Construct matrix: 
box.office.all <- c(461, 314.4,290.5, 247.9,309.3,165.8)
movie.names    <- c("A new hope","The empire strikes back","Return of the Jedi")
col.titles     <-  c("US","non-US")
star.wars.matrix      <- matrix(box.office.all, nrow=3, byrow=TRUE,dimnames=list(movie.names,col.titles))


worldwide.vector <- #Your code here!

# Show me the
worldwide.vector

*** =solution

# Box office Star Wars: In Millions (!)  Construct matrix:
box.office.all <- c(461, 314.4, 290.5, 247.9, 309.3, 165.8)
movie.names <- c("A new hope", "The empire strikes back", "Return of the Jedi")
col.titles <- c("US", "non-US")
star.wars.matrix <- matrix(box.office.all, nrow = 3, byrow = TRUE, dimnames = list(movie.names, 
    col.titles))

worldwide.vector <- rowSums(star.wars.matrix)

# Show me the
worldwide.vector

*** =sct

names <- c("star.wars.matrix", "worldwide.vector")
values <- c("star.wars.matrix", "rowSums(star.wars.matrix)")
DM.result <- closed_test(names, values)

*** =pre_exercise_code


Adding a column for the Worldwide box office (2)

In the previous exercise, you calculated the vector that contained the worldwide box office receipts for each of the three Star Wars movies. However, this vector is not yet part of star.wars.matrix...

When you want to add a column or multiple columns to a matrix, a good way to do this is cbind(), which merges matrices and/or vectors together by column. For example: new.combined.matrix <- cbind( matrix1, matrix2, vector1, ... ).
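The following sketch shows cbind() attaching a vector of row totals to a small named matrix. All names here (base.matrix, total, combined) are hypothetical. Note that cbind() uses the name of the vector variable as the new column name.

```r
# A small named matrix: rows a and b, columns x and y
base.matrix <- matrix(1:4, nrow = 2, byrow = TRUE,
                      dimnames = list(c("a", "b"), c("x", "y")))

# Row totals, then bind them on as a third column
total <- rowSums(base.matrix)
combined <- cbind(base.matrix, total)

combined
#   x y total
# a 1 2     3
# b 3 4     7
```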

*** =instructions

  1. Add worldwide.vector as a new column to star.wars.matrix and assign the result to all.wars.matrix. Use the cbind() function.

*** =hint Bind the worldwide.vector to the star.wars.matrix with the cbind() function, with cbind( the.correct.matrix, the.correct.vector) and assign the result to all.wars.matrix.

*** =sample_code

# Box office Star Wars: In Millions (!)  Construct matrix:
box.office.all <- c(461, 314.4, 290.5, 247.9, 309.3, 165.8)
movie.names <- c("A new hope", "The empire strikes back", "Return of the Jedi")
col.titles <- c("US", "non-US")
star.wars.matrix <- matrix(box.office.all, nrow = 3, byrow = TRUE, dimnames = list(movie.names, 
    col.titles))
worldwide.vector <- rowSums(star.wars.matrix)

# Bind the new variable worldwide.vector as a column to star.wars.matrix
all.wars.matrix <- 
# Show me the
all.wars.matrix

*** =solution

# Box office Star Wars: In Millions (!)  Construct matrix:
box.office.all <- c(461, 314.4, 290.5, 247.9, 309.3, 165.8)
movie.names <- c("A new hope", "The empire strikes back", "Return of the Jedi")
col.titles <- c("US", "non-US")
star.wars.matrix <- matrix(box.office.all, nrow = 3, byrow = TRUE, dimnames = list(movie.names, 
    col.titles))

# Calculate the worldwide box office per movie:
worldwide.vector <- rowSums(star.wars.matrix)
worldwide.vector  # Print worldwide revenue per movie

# Bind the new variable worldwide.vector as a column to star.wars.matrix
all.wars.matrix <- cbind(star.wars.matrix, worldwide.vector)

# Show me the:
all.wars.matrix

*** =sct

names <- c("star.wars.matrix", "worldwide.vector", "all.wars.matrix")
values <- c("star.wars.matrix", "rowSums(star.wars.matrix)", "cbind(star.wars.matrix,worldwide.vector)")
DM.result <- closed_test(names, values)

*** =pre_exercise_code


Adding rows for the second trilogy

Just like every action has a reaction, every cbind() has an rbind(). (OK, we admit, we are pretty bad with metaphors.)

Your R workspace now contains two matrices: star.wars.matrix, which we just used (the first trilogy), and star.wars.matrix2 for the second trilogy. Type the name of either matrix in the console and press Enter if you want to have a closer look.
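As a minimal sketch of how rbind() stacks matrices, the example below combines two small matrices that share the same column names. The matrices top and bottom are made up for illustration; the column counts must match for the rows to line up.

```r
# Two small matrices with the same columns (x and y)
top <- matrix(1:4, nrow = 2, byrow = TRUE,
              dimnames = list(c("r1", "r2"), c("x", "y")))
bottom <- matrix(5:8, nrow = 2, byrow = TRUE,
                 dimnames = list(c("r3", "r4"), c("x", "y")))

# Stack the rows of bottom underneath the rows of top
stacked <- rbind(top, bottom)
stacked
#    x y
# r1 1 2
# r2 3 4
# r3 5 6
# r4 7 8
```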

*** =instructions

  1. Assign to all.wars.matrix a new matrix with star.wars.matrix in the first three rows and star.wars.matrix2 in the next three rows.

*** =hint Bind the two matrices together in the following way: rbind( matrix1, matrix2 ) and assign to all.wars.matrix.

*** =sample_code

# Box office Star Wars: In Millions (!)
star.wars.matrix  # Matrix containing first trilogy box office
star.wars.matrix2  # Matrix containing second trilogy box office

# Combine both Star Wars trilogies in one matrix
all.wars.matrix <- 
# Show me the
all.wars.matrix

*** =solution

# Box office Star Wars: In Millions (!)
star.wars.matrix  # Matrix containing first trilogy box office
star.wars.matrix2  # Matrix containing second trilogy box office

# Combine both Star Wars trilogies in one matrix
all.wars.matrix <- rbind(star.wars.matrix, star.wars.matrix2)

# Show me the
all.wars.matrix

*** =sct

names <- c("star.wars.matrix", "star.wars.matrix2", "all.wars.matrix")
values <- c("star.wars.matrix", "star.wars.matrix2", "rbind(star.wars.matrix,star.wars.matrix2)")
DM.result <- closed_test(names, values)

*** =pre_exercise_code

# Construct matrix:
box.office.all <- c(461, 314.4, 290.5, 247.9, 309.3, 165.8)
movie.names <- c("A new hope", "The empire strikes back", "Return of the Jedi")
col.titles <- c("US", "non-US")
star.wars.matrix <- matrix(box.office.all, nrow = 3, byrow = TRUE, dimnames = list(movie.names, 
    col.titles))

# Construct matrix2:
box.office.all2 <- c(474.5, 552.5, 310.7, 338.7, 380.3, 468.5)
movie.names2 <- c("The Phantom Menace", "Attack of the Clones", "Revenge of the Sith")
star.wars.matrix2 <- matrix(box.office.all2, nrow = 3, byrow = TRUE, dimnames = list(movie.names2, 
    col.titles))

The total box office revenue for the entire saga

Just like every cbind() has an rbind(), every colSums() has a rowSums(). Your R workspace contains the all.wars.matrix you constructed in the previous exercise (Type all.wars.matrix in the console if you don't recall what it contains). Let's now calculate the total box office revenue for the entire saga.
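A short sketch of colSums() on a tiny matrix, mirroring what rowSums() does for rows. The names small.matrix and col.totals are illustrative only.

```r
# Three rows: (1, 2), (3, 4), (5, 6), with named columns
small.matrix <- matrix(1:6, nrow = 3, byrow = TRUE,
                       dimnames = list(NULL, c("x", "y")))

# One total per column; the column names carry over to the result
col.totals <- colSums(small.matrix)
col.totals   # x is 1 + 3 + 5 = 9, y is 2 + 4 + 6 = 12
```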

*** =instructions

  1. Calculate the total revenue for the US and the non-US region and assign the result to total.revenue.vector. You can use the colSums() function.

*** =hint You should use the colSums() function with all.wars.matrix as argument to find the total box office per region.

*** =sample_code

# Print box office Star Wars: In Millions (!) for 2 trilogies
all.wars.matrix

total.revenue.vector

*** =solution

# Print box office Star Wars: In Millions (!) for 2 trilogies
all.wars.matrix

total.revenue.vector <- colSums(all.wars.matrix)

*** =sct

names <- c("all.wars.matrix", "total.revenue.vector")
values <- c("all.wars.matrix", "colSums( all.wars.matrix )")
DM.result <- closed_test(names, values)

*** =pre_exercise_code

# Construct matrix:
box.office.all <- c(461, 314.4, 290.5, 247.9, 309.3, 165.8)
movie.names <- c("A new hope", "The empire strikes back", "Return of the Jedi")
col.titles <- c("US", "non-US")
star.wars.matrix <- matrix(box.office.all, nrow = 3, byrow = TRUE, dimnames = list(movie.names, 
    col.titles))

# Construct matrix2:
box.office.all2 <- c(474.5, 552.5, 310.7, 338.7, 380.3, 468.5)
movie.names2 <- c("The Phantom Menace", "Attack of the Clones", "Revenge of the Sith")
star.wars.matrix2 <- matrix(box.office.all2, nrow = 3, byrow = TRUE, dimnames = list(movie.names2, 
    col.titles))


# Box office Star Wars: In Millions (!)
star.wars.matrix  # Matrix containing first trilogy box office
star.wars.matrix2  # Matrix containing second trilogy box office

# Combine both Star Wars trilogies in one matrix
all.wars.matrix <- rbind(star.wars.matrix, star.wars.matrix2)

Selection of matrix elements

Similar to vectors, use the square brackets [ ] to select one or multiple elements from a matrix. Whereas vectors have 1 dimension, matrices have 2 dimensions, therefore use a comma to separate what to select from the rows and what from the columns. For example:

  • my.matrix[1,2] selects the element in the first row and second column
  • my.matrix[1:3,2:4] selects rows 1,2,3 and columns 2,3,4.

If you want to select all elements of a row or column, no number is needed before or after the comma:

  • my.matrix[,1] selects all elements of the first column
  • my.matrix[1,] selects all elements of the first row.
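The selection rules above can be tried on a small matrix. This is a sketch with a made-up 3 x 4 matrix; the names example.matrix, single, block, first.column and first.row are illustrative.

```r
# Rows: (1, 2, 3, 4), (5, 6, 7, 8), (9, 10, 11, 12)
example.matrix <- matrix(1:12, nrow = 3, byrow = TRUE)

single <- example.matrix[1, 2]        # first row, second column: 2
block <- example.matrix[2:3, 3:4]     # rows 2-3, columns 3-4: a 2 x 2 matrix
first.column <- example.matrix[, 1]   # all rows of column 1: 1 5 9
first.row <- example.matrix[1, ]      # all columns of row 1: 1 2 3 4
```

Selecting a single row or column returns a plain vector rather than a matrix, which is why functions such as mean() can be applied to the result directly.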

Back to Star Wars with this newly acquired knowledge:

*** =instructions

  1. Calculate the average non-US revenue per movie and assign it to the non.us.all variable (in other words: take the average of all elements of the second column).
  2. Same question, but now only for the first two Star Wars movies. Assign the result to non.us.some.

Hint: Use the function mean() to compute the average.

*** =hint You can use the function mean() to calculate the average of the inputs to the function. Remember that you can select all rows of a matrix for a specific column x, by typing my.matrix[,x]. Non-US is the second column.

*** =sample_code

# Box office Star Wars: In Millions (!) 
# Construct matrix: 
box.office.all <- c(461, 314.4,290.5, 247.9,309.3,165.8)
movie.names    <- c("A new hope","The empire strikes back","Return of the Jedi")
col.titles     <-  c("US","non-US")
star.wars.matrix  <- matrix(box.office.all, nrow=3, byrow=TRUE,  
                         dimnames=list(movie.names,col.titles)) 

star.wars.matrix 

non.us.all  <-     #your code here
non.us.some <- #your code here

# Print both averages  to console:
non.us.all
non.us.some

*** =solution

# Box office Star Wars: In Millions (!)  Construct matrix:
box.office.all <- c(461, 314.4, 290.5, 247.9, 309.3, 165.8)
movie.names <- c("A new hope", "The empire strikes back", "Return of the Jedi")
col.titles <- c("US", "non-US")
star.wars.matrix <- matrix(box.office.all, nrow = 3, byrow = TRUE, dimnames = list(movie.names, 
    col.titles))

star.wars.matrix

non.us.all <- mean(star.wars.matrix[, 2])
non.us.some <- mean(star.wars.matrix[1:2, 2])

# Print to console both averages:
non.us.all
non.us.some

*** =sct

names <- c("star.wars.matrix", "non.us.all", "non.us.some")
values <- c("star.wars.matrix", "mean( star.wars.matrix[,2] )", "mean( star.wars.matrix[1:2,2] )")
DM.result <- closed_test(names, values)

*** =pre_exercise_code


A little arithmetic with matrices

Similar to what you learned with vectors, the standard operators like +, -, /, *, etc. work in an element-wise way on matrices in R.

For example: 2*my.matrix multiplies each element of my.matrix by two.
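A minimal sketch of element-wise arithmetic with a single number, using a made-up matrix called revenue (the values are not the Star Wars figures):

```r
# Rows: (10, 20) and (30, 40)
revenue <- matrix(c(10, 20, 30, 40), nrow = 2, byrow = TRUE)

doubled <- 2 * revenue    # every element times two: (20, 40), (60, 80)
divided <- revenue / 5    # every element divided by five: (2, 4), (6, 8)
shifted <- revenue + 1    # one added to every element: (11, 21), (31, 41)
```

The result always has the same dimensions as the original matrix.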

As a newly-hired data analyst for Lucasfilm, your job is to find out how many visitors went to each movie for each geographical area. You already have the total revenue figures in star.wars.matrix. Assume that the price of a ticket was 5 dollars. Note that box office numbers divided by the ticket price gives you the number of visitors.

*** =instructions

  1. Assign to visitors the matrix with the estimated number of Non-US and US visitors for the three movies.

*** =hint The number of visitors is the revenue (which is stored in star.wars.matrix) divided by the price of a ticket (assumed to be $5).

*** =sample_code

# Box office Star Wars: In Millions (!) 
# Construct matrix: 
box.office.all <- c(461, 314.4,290.5, 247.9,309.3,165.8)
movie.names    <- c("A new hope","The empire strikes back","Return of the Jedi")
col.titles     <- c("US","non-US")
star.wars.matrix      <- matrix(box.office.all, nrow=3, byrow=TRUE,dimnames=list(movie.names,col.titles)) 

visitors <- #your code here!
  
# Show me (also in millions!) the 
visitors

*** =solution

# Box office Star Wars: In Millions (!)  Construct matrix:
box.office.all <- c(461, 314.4, 290.5, 247.9, 309.3, 165.8)
movie.names <- c("A new hope", "The empire strikes back", "Return of the Jedi")
col.titles <- c("US", "non-US")
star.wars.matrix <- matrix(box.office.all, nrow = 3, byrow = TRUE, dimnames = list(movie.names, 
    col.titles))

visitors <- star.wars.matrix/5

# Show me (also in millions!) the
visitors

*** =sct

names <- c("star.wars.matrix", "visitors")
values <- c("star.wars.matrix", "star.wars.matrix/5")
DM.result <- closed_test(names, values)

*** =pre_exercise_code


A little arithmetic with matrices (2)

Just like 2*my.matrix multiplied every element of my.matrix by two, my.matrix1 * my.matrix2 creates a matrix where each element is the product of the corresponding elements in my.matrix1 and my.matrix2.

After looking at the result of the previous exercise, big boss Lucas pointed out that the ticket prices went up over time by one dollar per movie. He asks you to redo the analysis based on the prices you can find in ticket.prices.matrix (source: imagination).

(Those familiar with matrices should note that this is not standard matrix multiplication, for which you should use %*% in R.)
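To see the difference between element-wise operations and true matrix multiplication, here is a small sketch with two made-up 2 x 2 matrices. The names a, b, elementwise, ratio and matrix.product are illustrative.

```r
a <- matrix(1:4, nrow = 2, byrow = TRUE)   # rows: (1, 2), (3, 4)
b <- matrix(5:8, nrow = 2, byrow = TRUE)   # rows: (5, 6), (7, 8)

# Element-wise: each cell of a is combined with the matching cell of b
elementwise <- a * b    # rows: (5, 12), (21, 32)
ratio <- b / a          # rows: (5, 3), (7/3, 2)

# Standard matrix multiplication: rows of a times columns of b
matrix.product <- a %*% b   # rows: (19, 22), (43, 50)
```

For the exercise, the element-wise division is the one you need: each revenue figure is divided by the ticket price in the same cell.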

*** =instructions

  1. Assign to visitors the matrix with your estimated number of Non-US and US visitors for the three movies.
  2. Assign to average.us.visitor the average number of visitors in the US for a Star Wars movie.
  3. Assign to average.non.us.visitor the average number of visitors in non-US areas.

*** =hint

  • You can use the function mean() to calculate the average of the inputs to the function.
  • To get the number of visitors in the US, select the first column from visitors using visitors[ ,1]

*** =sample_code

# Box office Star Wars: In Millions (!) 
# Construct matrix: 
box.office.all <- c(461, 314.4,290.5, 247.9,309.3,165.8)
movie.names    <- c("A new hope","The empire strikes back","Return of the Jedi")
col.titles     <- c("US","non-US")
star.wars.matrix      <- matrix(box.office.all, nrow=3, byrow=TRUE,dimnames=list(movie.names,col.titles)) 
ticket.prices.matrix  <- matrix( c(5,5,6,6,7,7), nrow=3,byrow=TRUE,dimnames=list(movie.names,col.titles)) 

visitors <- #your code here
average.us.visitor <- #your code here
average.non.us.visitor <- #your code here

#Show me the (all in Millions)
visitors
average.us.visitor
average.non.us.visitor

*** =solution

# Box office Star Wars: In Millions (!)  Construct matrix:
box.office.all <- c(461, 314.4, 290.5, 247.9, 309.3, 165.8)
movie.names <- c("A new hope", "The empire strikes back", "Return of the Jedi")
col.titles <- c("US", "non-US")
star.wars.matrix <- matrix(box.office.all, nrow = 3, byrow = TRUE, dimnames = list(movie.names, 
    col.titles))
ticket.prices.matrix <- matrix(c(5, 5, 6, 6, 7, 7), nrow = 3, byrow = TRUE, 
    dimnames = list(movie.names, col.titles))

visitors <- star.wars.matrix/ticket.prices.matrix
average.us.visitor <- mean(visitors[, 1])
average.non.us.visitor <- mean(visitors[, 2])

# Show me the (all in Millions)
visitors
average.us.visitor
average.non.us.visitor

*** =sct

names <- c("star.wars.matrix", "ticket.prices.matrix", "visitors", "average.us.visitor", 
    "average.non.us.visitor")
values <- c("star.wars.matrix", "ticket.prices.matrix", "star.wars.matrix/ticket.prices.matrix", 
    "mean( visitors[,1] )", "mean( visitors[,2] )")
DM.result <- closed_test(names, values)

*** =pre_exercise_code