# Basics of R

In [1]:
# Assignment & Adding
my_apples <- 5
my_oranges <- 6

my_fruit <- my_apples + my_oranges
my_fruit

### <b>ls()</b> lists the names of all objects in the current environment.

In [2]:
ls()

## Basic data types in R
#### R works with numerous data types. Some of the most basic types to get started are:

- Decimal values like 4.5 are called numerics.
- Whole numbers written with an L suffix, like 4L, are called integers (a plain 4 is stored as a numeric). Integers are also numerics.
- Boolean values (TRUE or FALSE) are called logical.
- Text (or string) values are called characters.

In [3]:
my_numeric <- 42

my_character <- "universe"

my_logical <- FALSE

#### Checking the type (class) of each variable

In [4]:
class(my_numeric)

class(my_character)

class(my_logical)
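In R a literal such as 4 is stored as a double ("numeric"), and only the L suffix produces an integer. This short base-R sketch makes the distinction visible; `typeof()` reports the underlying storage type, while `class()` reports the class used by R functions.

```r
# A plain whole number is stored as a double, so its class is "numeric"
class(4)        # "numeric"
# The L suffix creates an integer literal
class(4L)       # "integer"
# Integers still count as numeric
is.numeric(4L)  # TRUE
# typeof() shows the storage type behind the class
typeof(4)       # "double"
typeof(4L)      # "integer"
```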

## Vectors

Vectors are one-dimensional arrays that can hold numeric data, character data, or logical data, with all elements of a vector sharing the same type. In other words, a vector is a simple tool to store data. <br>For example, you can store your daily gains and losses in the casinos.

In R, you create a vector with the combine function <b>c()</b>. 
You place the vector elements separated by a comma between the parentheses. 

In [5]:
numeric_vector <- c(1, 10, 49)
character_vector <- c("a", "b", "c")
boolean_vector <- c(TRUE,FALSE,TRUE)

#### EXAMPLE:

In [6]:
# Defining a vector
a_vector <- c(140, -50, 20, -120, 240)

# Assigning name to the vector elements
names(a_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

a_vector

In [7]:
is.vector(a_vector)

In [8]:
typeof(a_vector)

In [9]:
# If you add two vectors in R, the sum is taken element-wise.
# For example, the following three statements are equivalent:

c(1, 2, 3) + c(4, 5, 6)
c(1 + 4, 2 + 5, 3 + 6)
c(5, 7, 9)
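The other arithmetic operators behave the same way on vectors of equal length. A small sketch with two illustrative vectors `x` and `y`, using `identical()` to confirm that the element-wise sum matches the written-out result:

```r
x <- c(1, 2, 3)
y <- c(4, 5, 6)

x + y   # 5 7 9
x * y   # 4 10 18
y - x   # 3 3 3

# identical() checks that two objects are exactly the same
identical(x + y, c(5, 7, 9))  # TRUE
```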

In [10]:
# sum() calculates the sum of all elements of a vector
sum(a_vector)

### Vector selection

In [11]:
a_vector[3]

In [12]:
# Selecting multiple elements from a vector
a_vector[c(2,3,4)]

In [13]:
a_vector[2:4]
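Because the elements of a_vector have names, you can also select them by name, and a negative index drops elements instead of keeping them. A sketch that rebuilds a_vector so it runs on its own:

```r
a_vector <- c(140, -50, 20, -120, 240)
names(a_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

# Select by name instead of by position
a_vector["Monday"]
a_vector[c("Monday", "Friday")]

# A negative index removes elements: everything except the first day
a_vector[-1]
```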

### The (logical) comparison operators known to R are:

##### < for less than
##### > for greater than
##### <= for less than or equal to
##### >= for greater than or equal to
##### == for equal to each other
##### != not equal to each other

In [14]:
a_vector > 0
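The logical vector produced by a comparison can itself be placed inside the square brackets, which keeps only the elements at the TRUE positions. A sketch, where the name selection_vector is just illustrative:

```r
a_vector <- c(140, -50, 20, -120, 240)
names(a_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

# TRUE where the day was profitable
selection_vector <- a_vector > 0

# Keep only the elements where selection_vector is TRUE
winning_days <- a_vector[selection_vector]
winning_days   # Monday 140, Wednesday 20, Friday 240
```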

## Matrix

In R, a matrix is a collection of elements of the same data type (numeric, character, or logical) arranged into a fixed number of rows and columns.<br>
Since you are only working with rows and columns, a matrix is called two-dimensional.

You can construct a matrix in R with the <b>matrix()</b> function. 

In the matrix() function:

- The first argument is the collection of elements that R will arrange into the rows and columns of the matrix. Here, we use 1:9 which is a shortcut for c(1, 2, 3, 4, 5, 6, 7, 8, 9).
- The argument <b>byrow</b> indicates that the matrix is filled row by row. To fill the matrix column by column instead, set byrow = FALSE, which is the default.
- The third argument <b>nrow</b> indicates that the matrix should have three rows.

In [15]:
matrix(1:9, byrow = TRUE, nrow = 3)

```
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
```
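To see what byrow changes, you can build the same 1:9 matrix both ways and compare the first rows. A short sketch; `t()` is base R's transpose:

```r
# Filled row by row
by_row <- matrix(1:9, byrow = TRUE, nrow = 3)
# Filled column by column (byrow = FALSE is the default)
by_col <- matrix(1:9, byrow = FALSE, nrow = 3)

by_row[1, ]   # 1 2 3
by_col[1, ]   # 1 4 7

# For this square matrix, one filling order is the transpose of the other
identical(t(by_row), by_col)  # TRUE
```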


### Similar to vectors, you can add names for the rows and the columns of a matrix

- rownames(my_matrix) <- row_names_vector
- colnames(my_matrix) <- col_names_vector

In [16]:
# An Example of matrix

# Box office Star Wars (in millions!)
new_hope <- c(460.998, 314.4)
empire_strikes <- c(290.475, 247.900)
return_jedi <- c(309.306, 165.8)

# Construct matrix
star_wars_matrix <- matrix(c(new_hope, empire_strikes, return_jedi), nrow = 3, byrow = TRUE)

# Vectors region and titles, used for naming
region <- c("US", "non-US")
titles <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")

# Name the columns with region
colnames(star_wars_matrix) <- region

# Name the rows with titles
rownames(star_wars_matrix) <- titles

# Print out star_wars_matrix
star_wars_matrix

| | US | non-US |
|---|---|---|
| A New Hope | 460.998 | 314.4 |
| The Empire Strikes Back | 290.475 | 247.9 |
| Return of the Jedi | 309.306 | 165.8 |


#### In R, the function rowSums() conveniently calculates the totals for each row of a matrix and returns them as a new vector.

In [17]:
rowSums(star_wars_matrix)

#### Similarly, colSums() calculates the totals for each column of a matrix.

In [18]:
colSums(star_wars_matrix)

#### You can add a column or multiple columns to a matrix with the cbind() function, which merges matrices and/or vectors together by column.

In [19]:
# The worldwide box office figures
worldwide_vector <- rowSums(star_wars_matrix)

# Bind the new variable worldwide_vector as a column to star_wars_matrix
all_wars_matrix <- cbind(star_wars_matrix,worldwide_vector)
all_wars_matrix

| | US | non-US | worldwide_vector |
|---|---|---|---|
| A New Hope | 460.998 | 314.4 | 775.398 |
| The Empire Strikes Back | 290.475 | 247.9 | 538.375 |
| Return of the Jedi | 309.306 | 165.8 | 475.106 |


#### Similarly, the rbind() function merges matrices and/or vectors together by row, stacking them on top of each other.
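A minimal sketch contrasting rbind() and cbind() on two small illustrative matrices, m1 and m2. For rbind() the matrices need the same number of columns; for cbind(), the same number of rows.

```r
m1 <- matrix(1:4, nrow = 2)
m2 <- matrix(5:8, nrow = 2)

# rbind() stacks rows: 2 + 2 = 4 rows, 2 columns
stacked <- rbind(m1, m2)
dim(stacked)       # 4 2

# cbind() places the matrices side by side: 2 rows, 4 columns
side_by_side <- cbind(m1, m2)
dim(side_by_side)  # 2 4
```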

### Selection of matrix elements
Similar to vectors, you can use the square brackets <b>[ ]</b> to select one or multiple elements from a matrix.<br>
Whereas vectors have one dimension, matrices have two dimensions. You should therefore use a comma to separate the rows you want to select from the columns. <br>
For example:

- my_matrix[1,2] selects the element at the first row and second column.
- my_matrix[1:3,2:4] results in a matrix with the data on the rows 1, 2, 3 and columns 2, 3, 4.
- To select all elements of a row or a column, leave the column or row position empty, respectively:
  - my_matrix[,1] selects all elements of the first column.
  - my_matrix[1,] selects all elements of the first row.
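The selection rules above can be tried on a small 4 x 5 matrix filled row by row; the matrix here is only an illustration.

```r
# Rows: 1-5, 6-10, 11-15, 16-20
my_matrix <- matrix(1:20, nrow = 4, byrow = TRUE)

my_matrix[1, 2]       # single element: 2
my_matrix[1:3, 2:4]   # 3 x 3 sub-matrix
my_matrix[, 1]        # first column: 1 6 11 16
my_matrix[1, ]        # first row: 1 2 3 4 5
```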

### A little arithmetic with matrices
Similar to what you have learned with vectors, the standard operators like +, -, /, *, etc. work in an element-wise way on matrices in R.

For example, 2 * my_matrix multiplies each element of my_matrix by two.

Just like 2 * my_matrix multiplied every element of my_matrix by two, my_matrix1 * my_matrix2 creates a matrix where each element is the product of the corresponding elements in my_matrix1 and my_matrix2.
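A short sketch with two illustrative 2 x 2 matrices. Note that `*` is the element-wise product; the matrix product from linear algebra uses the separate `%*%` operator.

```r
m1 <- matrix(1:4, nrow = 2)               # columns: (1, 2) and (3, 4)
m2 <- matrix(c(10, 20, 30, 40), nrow = 2) # columns: (10, 20) and (30, 40)

2 * m1      # every element doubled
m1 * m2     # element-wise product
m1 %*% m2   # matrix multiplication
```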

## Factors

The term factor refers to a statistical data type used to store categorical variables. <br> 
The difference between a categorical variable and a continuous variable is that a categorical variable can only take values from a limited number of categories. A continuous variable, on the other hand, can take an infinite number of values.

It is important that R knows whether it is dealing with a continuous or a categorical variable, as the statistical models you will develop in the future treat both types differently.

A good example of a categorical variable is sex. In many circumstances you can limit the sex categories to "Male" or "Female".

To create factors in R, you make use of the function <b>factor()</b>. The first step is to create a vector that contains all the observations, each of which belongs to one of a limited number of categories.

In [20]:
# Gender vector
gender_vector <- c("Male", "Female", "Female", "Male", "Male")

# Convert gender_vector to a factor
factor_gender_vector <- factor(gender_vector)

# Print out factor_gender_vector
factor_gender_vector
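Once a vector is a factor, `levels()` lists its categories (sorted alphabetically by default) and `summary()` counts how many observations fall into each one. A sketch that repeats the gender example so it runs on its own:

```r
gender_vector <- c("Male", "Female", "Female", "Male", "Male")
factor_gender_vector <- factor(gender_vector)

# Categories, in alphabetical order by default
levels(factor_gender_vector)   # "Female" "Male"

# Number of observations per category
summary(factor_gender_vector)  # Female 2, Male 3
```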

#### There are two types of categorical variables: a nominal categorical variable and an ordinal categorical variable.

A nominal variable is a categorical variable without an implied order. This means that it is impossible to say that 'one is worth more than the other'. For example, think of the categorical variable animals_vector with the categories "Elephant", "Giraffe", "Donkey" and "Horse". Here, it is impossible to say that one stands above or below the other. 

In contrast, ordinal variables do have a natural ordering. Consider for example the categorical variable temperature_vector with the categories: "Low", "Medium" and "High". Here it is obvious that "Medium" stands above "Low", and "High" stands above "Medium".

In [21]:
# Animals (NOMINAL)
animals_vector <- c("Elephant", "Giraffe", "Donkey", "Horse", "Donkey")
factor_animals_vector <- factor(animals_vector)
factor_animals_vector

# Temperature (ORDINAL)
temperature_vector <- c("High", "Low", "High","Low", "Medium")
factor_temperature_vector <- factor(temperature_vector, ordered = TRUE, levels = c("Low", "Medium", "High"))
factor_temperature_vector
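The practical payoff of an ordered factor is that comparisons follow the level order. A sketch using the temperature example; with a nominal factor such as factor_animals_vector, the same `>` comparison would return NA with a warning, because no order is defined.

```r
temperature_vector <- c("High", "Low", "High", "Low", "Medium")
factor_temperature_vector <- factor(temperature_vector, ordered = TRUE,
                                    levels = c("Low", "Medium", "High"))

# "High" > "Low" according to the level order
factor_temperature_vector[1] > factor_temperature_vector[2]   # TRUE

# Compare every observation against a level given as a string
factor_temperature_vector >= "Medium"   # TRUE FALSE TRUE FALSE TRUE
```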

## DataFrame

You may remember from the chapter about matrices that all the elements that you put in a matrix should be of the same type. Back then, your data set on Star Wars only contained numeric elements.

When doing a market research survey, however, you often have questions such as:

- 'Are you married?' or 'yes/no' questions (logical)
- 'How old are you?' (numeric)
- 'What is your opinion on this product?' or other 'open-ended' questions (character)
- ...

The output, namely the respondents' answers to the questions formulated above, is a data set of different data types. You will often find yourself working with data sets that contain different data types instead of only one.

<b>A data frame has the variables of a data set as columns and the observations as rows.</b>

The function <b>head()</b> enables you to show the first 6 observations of a data frame.<br>
Similarly, the function <b>tail()</b> prints out the last observations (again 6 by default) in your data set.

The function <b>str()</b> shows you the structure of your data set. For a data frame it tells you:

- The total number of observations (e.g. 32 car types in the built-in mtcars data set)
- The total number of variables (e.g. 11 car features)
- A full list of the variable names (e.g. mpg, cyl ... )
- The data type of each variable (e.g. num)
- The first observations
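These three functions can be tried directly on mtcars, which ships with R in the datasets package. The `n` argument of head() and tail() changes how many rows are shown.

```r
head(mtcars)          # first 6 rows
tail(mtcars, n = 3)   # last 3 rows
str(mtcars)           # 'data.frame': 32 obs. of 11 variables, one line per column
```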

You construct a data frame with the <b>data.frame()</b> function. <br>As arguments, you pass the vectors from before: they will become the different columns of your data frame. Because every column has the same length, the vectors you pass should also have the same length. <br>But don't forget that it is possible (and likely) that they contain different types of data.

In [22]:
# An Example
name <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune")
type <- c("Terrestrial planet", "Terrestrial planet", "Terrestrial planet", 
          "Terrestrial planet", "Gas giant", "Gas giant", "Gas giant", "Gas giant")
diameter <- c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883)
rotation <- c(58.64, -243.02, 1, 1.03, 0.41, 0.43, -0.72, 0.67)
rings <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)

# Create a data frame from the vectors
planets_df <- data.frame(name, type, diameter, rotation, rings)
planets_df

| name | type | diameter | rotation | rings |
|---|---|---|---|---|
| Mercury | Terrestrial planet | 0.382 | 58.64 | FALSE |
| Venus | Terrestrial planet | 0.949 | -243.02 | FALSE |
| Earth | Terrestrial planet | 1.000 | 1.00 | FALSE |
| Mars | Terrestrial planet | 0.532 | 1.03 | FALSE |
| Jupiter | Gas giant | 11.209 | 0.41 | TRUE |
| Saturn | Gas giant | 9.449 | 0.43 | TRUE |
| Uranus | Gas giant | 4.007 | -0.72 | TRUE |
| Neptune | Gas giant | 3.883 | 0.67 | TRUE |


In [23]:
# Check the structure of planets_df
str(planets_df)

'data.frame':	8 obs. of  5 variables:
 $ name    : Factor w/ 8 levels "Earth","Jupiter",..: 4 8 1 3 2 6 7 5
 $ type    : Factor w/ 2 levels "Gas giant","Terrestrial planet": 2 2 2 2 1 1 1 1
 $ diameter: num  0.382 0.949 1 0.532 11.209 ...
 $ rotation: num  58.64 -243.02 1 1.03 0.41 ...
 $ rings   : logi  FALSE FALSE FALSE FALSE TRUE TRUE ...
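In the str() output above, name and type appear as factors. That was the default behaviour of data.frame() before R 4.0.0; since R 4.0.0, character vectors stay character unless you ask otherwise. Passing `stringsAsFactors` explicitly gives the same column types in every version, as this sketch with two of the planet vectors shows:

```r
name <- c("Mercury", "Venus")
rings <- c(FALSE, FALSE)

# Keep text columns as character vectors
df_chr <- data.frame(name, rings, stringsAsFactors = FALSE)
# Convert text columns to factors
df_fct <- data.frame(name, rings, stringsAsFactors = TRUE)

class(df_chr$name)  # "character"
class(df_fct$name)  # "factor"
```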


### Selection of data frame elements
Similar to vectors and matrices, you select elements from a data frame with the help of square brackets <b>[ ]</b>.<br> By using a comma, you can indicate what to select from the rows and the columns respectively. <br>
For example:

- my_df[1,2] selects the value at the first row and second column in my_df.
- my_df[1:3,2:4] selects rows 1, 2, 3 and columns 2, 3, 4 in my_df.

Sometimes you want to select all elements of a row or column:

- my_df[1, ] selects all elements of the first row.
- my_df[, 1] selects all elements of the first column.

In [24]:
# Diameter of Mercury (row 1, column 3)
planets_df[1,3]

# Data for Mars (entire fourth row)
planets_df[4,]

| | name | type | diameter | rotation | rings |
|---|---|---|---|---|---|
| 4 | Mars | Terrestrial planet | 0.532 | 1.03 | FALSE |


In [25]:
# Select first 5 values of diameter column
planets_df[1:5,"diameter"]
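Columns can also be selected by name, either inside the brackets or with the `$` shortcut. A sketch on a reduced version of planets_df (only the name, diameter and rings columns, to keep it short):

```r
name <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune")
diameter <- c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883)
rings <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)
planets_df <- data.frame(name, diameter, rings)

# Empty row position: the whole diameter column
planets_df[, "diameter"]
# $ selects a single column by name
planets_df$rings
# Rows 1 to 3 of two named columns
planets_df[1:3, c("name", "diameter")]
```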

### Subset
The <b>subset()</b> function selects the rows of a data frame for which a given condition is true.

```
subset(my_df, subset = some_condition)
```

- The first argument of subset() specifies the data set for which you want a subset.
- The second argument, subset, is a logical expression written in terms of the data frame's columns; only the rows where it evaluates to TRUE are kept.

In [26]:
subset(planets_df, subset = rings)

| | name | type | diameter | rotation | rings |
|---|---|---|---|---|---|
| 5 | Jupiter | Gas giant | 11.209 | 0.41 | TRUE |
| 6 | Saturn | Gas giant | 9.449 | 0.43 | TRUE |
| 7 | Uranus | Gas giant | 4.007 | -0.72 | TRUE |
| 8 | Neptune | Gas giant | 3.883 | 0.67 | TRUE |
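The condition does not have to be a logical column; any comparison on the columns works, and conditions can be combined with `&` (and) or `|` (or). A sketch on a reduced planets_df:

```r
name <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune")
diameter <- c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883)
rings <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)
planets_df <- data.frame(name, diameter, rings)

# Planets smaller than Earth
small_planets <- subset(planets_df, subset = diameter < 1)
small_planets$name   # Mercury, Venus, Mars

# Large planets that also have rings
big_ringed <- subset(planets_df, subset = diameter > 5 & rings)
big_ringed$name      # Jupiter, Saturn
```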


### Sorting a Data Frame

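<b>order()</b> is a function that, applied to a variable such as a vector, returns the positions of its elements in ascending order: the first value is the index of the smallest element, the second the index of the next smallest, and so on. Indexing the variable with these positions therefore sorts it.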

In [27]:
# for example
a <- c(100, 10, 1000)
order(a)
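For c(100, 10, 1000), order() and rank() happen to give the same result, which can hide the difference between them. With a vector where they differ, order() gives the positions that sort the vector, while rank() gives each element's rank in place:

```r
b <- c(30, 10, 20)

# Positions of the smallest, second smallest and largest element
order(b)     # 2 3 1
# Using those positions as an index sorts the vector
b[order(b)]  # 10 20 30

# rank(): the rank of each element, in the original order
rank(b)      # 3 1 2
```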

In [28]:
# Sort on the diameter column.
# Use order() to create positions
positions <-  order(planets_df$diameter)

# Use positions to sort planets_df
planets_df[positions,]

| | name | type | diameter | rotation | rings |
|---|---|---|---|---|---|
| 1 | Mercury | Terrestrial planet | 0.382 | 58.64 | FALSE |
| 4 | Mars | Terrestrial planet | 0.532 | 1.03 | FALSE |
| 2 | Venus | Terrestrial planet | 0.949 | -243.02 | FALSE |
| 3 | Earth | Terrestrial planet | 1.000 | 1.00 | FALSE |
| 8 | Neptune | Gas giant | 3.883 | 0.67 | TRUE |
| 7 | Uranus | Gas giant | 4.007 | -0.72 | TRUE |
| 6 | Saturn | Gas giant | 9.449 | 0.43 | TRUE |
| 5 | Jupiter | Gas giant | 11.209 | 0.41 | TRUE |
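To sort from largest to smallest, order() accepts `decreasing = TRUE`; for a numeric column, ordering on the negated values gives the same positions. A sketch on a reduced planets_df:

```r
name <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune")
diameter <- c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883)
planets_df <- data.frame(name, diameter)

# Positions from the largest to the smallest diameter
positions <- order(planets_df$diameter, decreasing = TRUE)
planets_df[positions, ]   # Jupiter first, Mercury last

# Equivalent for numeric columns: order on the negated values
planets_df[order(-planets_df$diameter), ]
```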


## Lists

### Lists, why would you need them?

- Vectors (one-dimensional arrays): can hold numeric, character or logical values. The elements in a vector all have the same data type.
- Matrices (two-dimensional arrays): can hold numeric, character or logical values. The elements in a matrix all have the same data type.
- Data frames (two-dimensional objects): can hold numeric, character or logical values. Within a column all elements have the same data type, but different columns can be of different data types.

A list in R is similar to your to-do list at work or school: the different items on that list most likely differ in length, characteristic, and type of activity that has to be done.

A list in R allows you to gather a variety of objects under one name (that is, the name of the list) in an ordered way. These objects can be matrices, vectors, data frames, even other lists, etc. It is not even required that these objects are related to each other in any way.

You could say that a list is some kind of super data type: you can store practically any piece of information in it!

To construct a list you use the function <b>list()</b>:

In [29]:
# Vector with numerics from 1 up to 10
my_vector <- 1:10 

# Matrix with numerics from 1 up to 9
my_matrix <- matrix(1:9, ncol = 3)

# First 5 rows of the built-in data frame mtcars
my_df <- mtcars[1:5,]

# Construct list with these different elements:
my_list <- list(my_vector,my_matrix,my_df)
my_list

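```
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
```

| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mazda RX4 | 21.0 | 6 | 160 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 360 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |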


### Naming a list

Just like on your to-do list, you want to avoid not knowing or remembering what the components of your list stand for. That is why you should give names to them:

```
my_list <- list(name1 = your_comp1, 
                name2 = your_comp2)
```
                
This creates a list with components that are named name1, name2, and so on. <br>If you want to name your lists after you've created them, you can use the <b>names()</b> function as you did with vectors. <br>The following commands are fully equivalent to the assignment above:
```
my_list <- list(your_comp1, your_comp2)
names(my_list) <- c("name1", "name2")
```
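A runnable version of the two naming approaches, with illustrative component names numbers and words, confirming that they produce the same list:

```r
# Names given at creation time
my_list <- list(numbers = 1:3, words = c("a", "b"))

# Names added afterwards with names()
other_list <- list(1:3, c("a", "b"))
names(other_list) <- c("numbers", "words")

names(my_list)                  # "numbers" "words"
identical(my_list, other_list)  # TRUE
```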

### Selecting elements from a list

One way to select a component is using the numbered position of that component.<br>
For example, to "grab" the first component of shining_list you type
```
shining_list[[1]]
```

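An important point to remember: to select components from a list you use double square brackets [[ ]], whereas to select elements from vectors you use single square brackets [ ]. Applied to a list, single square brackets return a smaller list rather than the component itself.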

You can also refer to the names of the components, with <b>[[ ]]</b> or with the <b>$</b> sign. Both will select the data frame representing the reviews:



```
shining_list[["reviews"]]
shining_list$reviews
```

Besides selecting components, you often need to select specific elements out of these components. For example, with shining_list[[2]][1] you select from the second component, actors (shining_list[[2]]), the first element ([1]).
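The text does not show how shining_list is built, so the sketch below assumes hypothetical contents: a movie name, a vector of actors as the second component, and a reviews data frame whose scores and sources are made-up placeholder values.

```r
# Hypothetical shining_list; the review values are placeholders
shining_list <- list(
  moviename = "The Shining",
  actors = c("Jack Nicholson", "Shelley Duvall"),
  reviews = data.frame(scores = c(4.5, 4.0), sources = c("Source A", "Source B"))
)

shining_list[[1]]           # "The Shining"
shining_list[["reviews"]]   # the reviews data frame
shining_list$reviews        # the same data frame
shining_list[[2]][1]        # first actor

# Single brackets return a sub-list, double brackets the component itself
class(shining_list[1])      # "list"
class(shining_list[[1]])    # "character"
```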

### Adding to the list
To conveniently add elements to lists, you can use the <b>c()</b> function, which you also used to build vectors:
```
ext_list <- c(my_list, my_val)
```
This extends the original list, my_list, with the component my_val, which is appended to the end of the list. If my_val is a vector with more than one element, c() adds each element as a separate component, so wrap it in list() to keep it as a single component. <br>If you want to give the new list item a name, add the name as you did before:

```
ext_list <- c(my_list, my_name = my_val)
```
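A sketch of appending to a small illustrative list. It shows that c() splits an atomic vector into separate list components, and two common ways to keep the vector as one component: wrapping it in list(), or assigning it by name with `$`.

```r
my_list <- list(a = 1)

# Appending a single named value with c()
ext_list <- c(my_list, b = 2)
length(ext_list)    # 2

# c() splits a longer atomic vector into separate components
split_list <- c(my_list, b = 1:3)
length(split_list)  # 4, not 2

# Wrapping the value in list() keeps it as one component
kept_list <- c(my_list, list(b = 1:3))
length(kept_list)   # 2

# Assigning by name also adds a single component
my_list$b <- 1:3
length(my_list)     # 2
```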
