# Matrices and Data Frames
**Author : Sanket Dave**

---
***In this lesson, we'll cover matrices and data frames. Both represent 'rectangular' data types, meaning that they are used
 to store tabular data, with rows and columns.***

***The main difference, as you'll see, is that matrices can only contain a single class of data, while data frames can
 consist of many different classes of data.***

Let's create a vector containing the numbers 1 through 20 using the `:` operator. Store the result in a variable called
 my_vect.

```r
my_vect <- 1:20
```

```r
my_vect
```

Now, let's check the dimensions of `my_vect` with the `dim()` function.

```r
dim(my_vect)  # NULL
```

It returns `NULL` because `my_vect` is a plain vector, and a vector has no dimensions.

What it does have is a length. Let's check it with `length()`.

```r
length(my_vect)  # 20
```

Ah! That's what we wanted. But, what happens if we give my_vect a `dim` attribute? Let's give it a try. 


```r
dim(my_vect) <- c(4, 5)
```

It's okay if that last command seemed a little strange to you. It should! The dim() function allows you to get OR set the
`dim` attribute for an R object. In this case, we assigned the value c(4, 5) to the `dim` attribute of my_vect.
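A minimal sketch of both uses of `dim()`, on a throwaway vector `v` (a name chosen for illustration only). The assignment form calls R's replacement function `` `dim<-` ``, which also checks that the requested dimensions multiply out to `length(v)`:

```r
v <- 1:20
dim(v)             # getter on a plain vector: NULL
dim(v) <- c(4, 5)  # setter: R calls `dim<-`(v, c(4, 5)) and reassigns v
dim(v)             # getter again: 4 5, stored as integers

# The dimensions must multiply to length(v); 3 * 7 = 21 is not 20, so R signals an error
w <- 1:20
failed <- tryCatch({
  dim(w) <- c(3, 7)
  FALSE
}, error = function(e) TRUE)
failed             # TRUE, and w is left without a dim attribute
```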

Use dim(my_vect) to confirm that we've set the `dim` attribute correctly.

```r
dim(my_vect)  # 4 5
```

Another way to see this is by calling the `attributes()` function on my_vect. Try it now.

```r
attributes(my_vect)  # a list with a single element, $dim, equal to 4 5
```

Just like in math class, when dealing with a 2-dimensional object (think rectangular table), the first number is the
 number of rows and the second is the number of columns. Therefore, we just gave my_vect 4 rows and 5 columns.
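To read those two numbers back separately, base R provides `nrow()` and `ncol()`. A small sketch on an illustrative matrix `m`:

```r
m <- 1:20
dim(m) <- c(4, 5)  # 4 rows, 5 columns

nrow(m)            # 4: the first element of dim(m)
ncol(m)            # 5: the second element of dim(m)
```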

But, wait! That doesn't sound like a vector any more. Well, it's not. Now it's a matrix. View the contents of my_vect
 now to see what it looks like.

```r
my_vect
```

```
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    5    9   13   17
[2,]    2    6   10   14   18
[3,]    3    7   11   15   19
[4,]    4    8   12   16   20
```
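Notice that the numbers run down the columns, not across the rows. R fills a matrix in column-major order, so element `[i, j]` sits at position `(j - 1) * nrow + i` of the underlying vector. Here is a quick check, again on an illustrative `m`:

```r
m <- 1:20
dim(m) <- c(4, 5)

m[, 1]                             # first column: 1 2 3 4
m[2, 3]                            # 10
m[2, 3] == (3 - 1) * nrow(m) + 2   # TRUE: [i, j] is element (j - 1) * nrow + i
as.vector(m)                       # dropping the dim attribute gives back 1:20
```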


Now, let's confirm it's actually a matrix by using the class() function. Type class(my_vect) to see what I mean.

```r
class(my_vect)  # "matrix" "array" in R >= 4.0.0; just "matrix" in older versions
```

Sure enough, my_vect is now a matrix. We should store it in a new variable that helps us remember what it is. Store the
 value of my_vect in a new variable called my_matrix.
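There is one caveat, assuming R 4.0.0 or later. In those versions `class()` returns two values for a matrix, `"matrix"` and `"array"`, so a test like `class(x) == "matrix"` yields a length-2 result. `is.matrix()` and `inherits()` each return a single `TRUE`/`FALSE`. A sketch:

```r
m <- 1:20
dim(m) <- c(4, 5)

class(m)               # "matrix" "array" in R >= 4.0.0
is.matrix(m)           # TRUE
inherits(m, "matrix")  # TRUE
is.matrix(1:20)        # FALSE: a plain vector has no dim attribute
```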

```r
my_matrix <- my_vect
```

Bring up the help file for the matrix() function now using the `?` function.


```r
?matrix
```

Now, look at the documentation for the matrix function and see if you can figure out how to create a matrix containing
 the same numbers (1-20) and dimensions (4 rows, 5 columns) by calling the matrix() function. Store the result in a
 variable called my_matrix2.

```r
# 4 rows, 5 columns, filled column by column (byrow = FALSE is the default)
my_matrix2 <- matrix(1:20, nrow = 4, ncol = 5)
```

Finally, let's confirm that my_matrix and my_matrix2 are actually identical. The identical() function will tell us if its
 first two arguments are the same. Try it out.

```r
identical(my_matrix, my_matrix2)  # TRUE
```
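The default `byrow = FALSE` is what makes `matrix(1:20, 4, 5)` match the `dim()` version. For contrast, filling by row produces a different matrix from the same numbers, and `identical()` catches the difference. The names `a`, `b`, and `c0` below are arbitrary:

```r
a <- matrix(1:20, nrow = 4, ncol = 5)                # column by column (the default)
b <- matrix(1:20, nrow = 4, ncol = 5, byrow = TRUE)  # row by row

c0 <- 1:20
dim(c0) <- c(4, 5)   # the dim() approach from earlier in the lesson

identical(a, c0)     # TRUE
identical(a, b)      # FALSE: same numbers, different positions
sum(a == b)          # 2: only the first and last elements line up
```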

Now, imagine that the numbers in our table represent some measurements from a clinical experiment, where each row
 represents one patient and each column represents one variable for which measurements were taken.

We may want to label the rows, so that we know which numbers belong to each patient in the experiment. One way to do this
 is to add a column to the matrix, which contains the names of all four people.

Let's start by creating a character vector containing the names of our patients -- Bill, Gina, Kelly, and Sean. Remember
 that double quotes tell R that something is a character string. Store the result in a variable called patients.

```r
patients <- c("Bill", "Gina", "Kelly", "Sean")
```

Now we'll use the cbind() function to 'combine columns'. Don't worry about storing the result in a new variable. Just
 call cbind() with two arguments -- the patients vector and my_matrix.

```r
cbind(patients, my_matrix)
```

```
     patients
[1,] "Bill"   "1" "5" "9"  "13" "17"
[2,] "Gina"   "2" "6" "10" "14" "18"
[3,] "Kelly"  "3" "7" "11" "15" "19"
[4,] "Sean"   "4" "8" "12" "16" "20"
```


Something is fishy about our result! It appears that combining the character vector with our matrix of numbers caused
 everything to be enclosed in double quotes. This means we're left with a matrix of character strings, which is no good.

If you remember back to the beginning of this lesson, I told you that matrices can only contain ONE class of data.
 Therefore, when we tried to combine a character vector with a numeric matrix, R was forced to 'coerce' the numbers to
 characters, hence the double quotes.
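You can confirm the coercion directly by asking for the storage type of the combined result. A minimal sketch that recreates the lesson's objects (`combined` is an illustrative name):

```r
patients <- c("Bill", "Gina", "Kelly", "Sean")
my_matrix <- matrix(1:20, nrow = 4, ncol = 5)

combined <- cbind(patients, my_matrix)
typeof(my_matrix)   # "integer"
typeof(combined)    # "character": every number was coerced to a string
combined[1, 2]      # "1", a string, not the number 1
```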

This is called 'implicit coercion', because we didn't ask for it. It just happened. But why didn't R just convert the
 names of our patients to numbers? I'll let you ponder that question on your own.
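Implicit coercion follows a fixed order. R converts every element to the most flexible type present, ranked logical, integer, double, character. A few one-line checks:

```r
typeof(c(TRUE, 2L))     # "integer":   TRUE becomes 1L
typeof(c(2L, 3.5))      # "double":    2L becomes 2
typeof(c(3.5, "Bill"))  # "character": 3.5 becomes "3.5"
c(3.5, "Bill")          # "3.5" "Bill"
```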

So, we're still left with the question of how to include the names of our patients in the table without destroying the
 integrity of our numeric data. Try the following: `my_data <- data.frame(patients, my_matrix)`

```r
my_data <- data.frame(patients, my_matrix)
```

```r
my_data
```

```
  patients X1 X2 X3 X4 X5
1     Bill  1  5  9 13 17
2     Gina  2  6 10 14 18
3    Kelly  3  7 11 15 19
4     Sean  4  8 12 16 20
```


 It looks like the data.frame() function allowed us to store our character vector of names right alongside our matrix of
 numbers. That's exactly what we were hoping for!

Behind the scenes, the data.frame() function takes any number of arguments and returns a single object of class
 `data.frame` that is composed of the original objects.
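Unlike a matrix, each column of a data frame keeps its own class. The sketch below rebuilds `my_data` and inspects each column. It assumes R 4.0.0 or later; earlier versions turned character columns into factors by default, because `stringsAsFactors` defaulted to `TRUE`.

```r
patients <- c("Bill", "Gina", "Kelly", "Sean")
my_matrix <- matrix(1:20, nrow = 4, ncol = 5)
my_data <- data.frame(patients, my_matrix)

names(my_data)          # "patients" "X1" "X2" "X3" "X4" "X5"
sapply(my_data, class)  # patients is "character" (R >= 4.0.0); X1 to X5 are "integer"
```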

Let's confirm this by calling the class() function on our newly created data frame.


```r
class(my_data)  # "data.frame"
```

It's also possible to assign names to the individual rows and columns of a data frame, which presents another possible
 way of determining which row of values in our table belongs to each patient.
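For comparison, here is a sketch of the row-name approach. It uses a hypothetical data frame `measurements` built from the numeric matrix alone. The patient names become labels instead of data, so the numeric columns stay numeric:

```r
patients <- c("Bill", "Gina", "Kelly", "Sean")
# hypothetical name; as.data.frame() labels the columns V1 to V5
measurements <- as.data.frame(matrix(1:20, nrow = 4, ncol = 5))

rownames(measurements) <- patients  # label each row with a patient
measurements["Gina", ]              # Gina's row, selected by name
```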

However, since we've already solved that problem, let's solve a different problem by assigning names to the columns of
 our data frame so that we know what type of measurement each column represents.


Since we have six columns (including patient names), we'll need to first create a vector containing one element for each
 column. Create a character vector called cnames that contains the following values (in order) -- "patient", "age",
 "weight", "bp", "rating", "test".

```r
cnames <- c("patient", "age", "weight", "bp", "rating", "test")
```

Now, use the colnames() function to set the `colnames` attribute for our data frame. This is similar to the way we used
 the dim() function earlier in this lesson.

```r
colnames(my_data) <- cnames
```
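For data frames, `colnames()` and `names()` read and write the same attribute, and either one can rename a single column by index. A minimal sketch that rebuilds `my_data` first:

```r
patients <- c("Bill", "Gina", "Kelly", "Sean")
my_data <- data.frame(patients, matrix(1:20, nrow = 4, ncol = 5))
cnames <- c("patient", "age", "weight", "bp", "rating", "test")

colnames(my_data) <- cnames
identical(names(my_data), cnames)  # TRUE: for a data frame, names() and colnames() agree

names(my_data)[1] <- "name"        # rename just the first column
colnames(my_data)                  # "name" "age" "weight" "bp" "rating" "test"
```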

```r
my_data
```

```
  patient age weight bp rating test
1    Bill   1      5  9     13   17
2    Gina   2      6 10     14   18
3   Kelly   3      7 11     15   19
4    Sean   4      8 12     16   20
```
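Once the columns have descriptive names, you can pull them out by name with `$` or with `[rows, column]`. A short sketch that rebuilds the final `my_data` (the values are still the placeholder numbers 1 to 20):

```r
patients <- c("Bill", "Gina", "Kelly", "Sean")
my_data <- data.frame(patients, matrix(1:20, nrow = 4, ncol = 5))
colnames(my_data) <- c("patient", "age", "weight", "bp", "rating", "test")

my_data$age                                    # 1 2 3 4
my_data[my_data$patient == "Kelly", "weight"]  # 7
mean(my_data$bp)                               # 10.5
```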


In this lesson, you learned the basics of working with two very important and common data structures -- matrices and data
 frames. There's much more to learn and we'll be covering more advanced topics, particularly with respect to data frames,
 in future lessons.
 
**Source: [Swirlstats](https://swirlstats.com/students.html)**