<a class="anchor" id="jump_to_top"></a>
# Coding Basics
---

### Table of Contents
* [R as a calculator](#calc)
* [Calling functions](#functions)
* [Commenting](#commenting)
* [Common data types in R](#data_types)
    * [Numeric](#Numeric)
    * [Integer](#Integer)
    * [Logical (boolean)](#boolean)
    * [Characters](#Characters)
* [Common data structures in R](#data_str)
    * [Atomic Vector](#vector)
        * [Coercion](#Coercion)
        * [Missing values](#NA)
    * [List](#List)
    * [Factors](#Factors)
    * [Matrix and Array](#mat_arr)
    * [Data frame](#Dataframe)
* [Mathematical Operations](#math)
* [Comparison operators](#Comparison)
* [Tibbles](#Tibbles)

In [1]:
# loading libraries
library(tidyverse)

── Attaching packages ─────────────────────────────────────── tidyverse 1.2.1 ──
✔ ggplot2 3.0.0     ✔ purrr   0.2.5
✔ tibble  1.4.2     ✔ dplyr   0.7.6
✔ tidyr   0.8.1     ✔ stringr 1.3.1
✔ readr   1.1.1     ✔ forcats 0.3.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()


<div style="text-align: right"> [[Jump to top]](#jump_to_top) </div>

<a class="anchor" id="calc"></a>
## R as a calculator 

In [2]:
8 * 7

In [3]:
10 / 5 + 5

In [4]:
10 / (5 + 5)

In [5]:
pi

In [6]:
sin(pi/6)

## Create new objects with `<-`

In [7]:
x <- 4

In [8]:
x

All assignment statements, that is, statements that create objects, have the same form:

`myObject <- value`

When reading that code say "myObject gets value" in your head.

Shortcut: in RStudio, **Alt** + **-** inserts ` <- `.

<div style="text-align: right"> [[Jump to top]](#jump_to_top) </div>
<a class="anchor" id="functions"></a>
## Calling functions
R has a large collection of built-in functions that are called like this:

```FunctionName(arg1 = val1, arg2 = val2, ...)```

In [9]:
seq(1, 100)

In [10]:
#?seq

In [11]:
seq(1, 100, length.out = 5)

In [12]:
seq(1, 100, by = 10)
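
Arguments can be matched by position or by name. The sketch below uses `seq()`, whose first three arguments are named `from`, `to`, and `by`; the object names `byPosition` and `byName` are made up for illustration:

```r
# Positional matching: values are assigned to from, to, by in that order
byPosition <- seq(1, 100, 10)

# Named matching: the order of the arguments no longer matters
byName <- seq(by = 10, to = 100, from = 1)

identical(byPosition, byName)  # TRUE
```

Naming arguments is usually the safer choice once a call has more than two or three of them.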

In [13]:
myList <- seq(1, 100, by = 10)

In [14]:
myList

In [15]:
(myList <- seq(1, 100, by = 10))

In [16]:
myChar <- "hello world"

In [17]:
myChar

Now look at your RStudio environment in the upper right pane:

<img src="../png/rstudio_glb_env.png" width="400px" align="left">

--- 
### Exercise 1
Tweak each of the following R commands so that they run correctly:

```
ggplot(dota = mpg) 
  + geom_point(mapping = aes(x = displ, y = hwy))

fliter(mpg, cyl = 8)

filter(diamond, carat > 3)
```

---
In RStudio, press **Alt** + **Shift** + **K** to see all the keyboard shortcuts.
<div style="text-align: right"> [[Jump to top]](#jump_to_top) </div>
<a class="anchor" id="commenting"></a>
## Commenting along the code
Use `#` followed by a space and your comment. R ignores everything after `#` on that line.

In [18]:
# This is a comment. It's good practice to leave a space after the hash sign for readability

You can also leave a comment after the code on the same line. A good practice is to leave two spaces after the code, then a hash sign, a single space, and the comment. E.g.:

In [19]:
accountBalance <- 100
accountBalance <- accountBalance + 500  # adding 500 to accountBalance
accountBalance

<div style="text-align: right"> [[Jump to top]](#jump_to_top) </div>
<a class="anchor" id="data_types"></a>
## Common data types in R
Objects in R can hold several data types. Use the `class()` function to identify the type of an object. Everything in R is an object, so `class()` works on anything.

* Numeric
* Integer
* Logical (boolean)
* Character

<a class="anchor" id="Numeric"></a>
### Numeric
Decimal (double-precision) values.

Note: Whole numbers that are not explicitly declared as integers are also stored as numeric.

In [20]:
x1 <- 2.2
class(x1)

In [21]:
x2 <- 2
class(x2)

In [22]:
is.integer(x2)
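
To see how a value is actually stored, `typeof()` can be used next to `class()`. A small sketch, assuming nothing beyond base R (the value `y2` is redefined here only for the comparison):

```r
x2 <- 2    # no L suffix, so stored as a double
y2 <- 2L   # L suffix, so stored as an integer

class(x2)   # "numeric"
typeof(x2)  # "double"
class(y2)   # "integer"
typeof(y2)  # "integer"

# The two values still compare as equal
x2 == y2    # TRUE
```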

<a class="anchor" id="Integer"></a>
### Integer
To work with integer class we can use `as.integer()` to explicitly set an integer class:

In [23]:
y1 <- as.integer(2)
class(y1)

In [24]:
is.integer(y1)

One could also use `L` notation:

In [25]:
y2 <- 3L
print(y2)
class(y2)

[1] 3


<a class="anchor" id="boolean"></a>
### Logical (boolean)

In [26]:
a <- TRUE
a

In [27]:
T

In [28]:
b <- FALSE
b

In [29]:
F

In [30]:
class(b)

<a class="anchor" id="Characters"></a>
### Characters
We can use double quotes or single quotes to define character strings.

In [31]:
myChar <- "hello world"
myChar

In [32]:
class(myChar)

<div style="text-align: right"> [[Jump to top]](#jump_to_top) </div>
<a class="anchor" id="data_str"></a>
## Common data structures in R


| Dimension | Homogeneous | Heterogeneous |
|:---|:---|:---|
| 1d | Atomic Vector | List |
| 2d | Matrix | Data frame |
| nd | Array | - | 

<a class="anchor" id="vector"></a>
### Atomic Vector
Here we discuss four common types of atomic vectors: numeric (aka double), integer, character, and logical, which mirror the data types introduced above. A vector is a collection of values, and all the elements of an atomic vector have the same type.

Atomic vectors are typically created with `c()`, short for combine:

In [33]:
numVec <- c(0.01, 5, 90)
print(numVec)
class(numVec)

[1]  0.01  5.00 90.00


In [34]:
intVec <- c(1L, 5L, 9L)
print(intVec)
class(intVec)

[1] 1 5 9


In [35]:
charVec <- c("a", "b", "abc-xyz")
print(charVec)
class(charVec)

[1] "a"       "b"       "abc-xyz"


In [36]:
boolVec <- c(T, F, FALSE, TRUE)
print(boolVec)
class(boolVec)

[1]  TRUE FALSE FALSE  TRUE


We can use `length()` on any of these vectors to get their length:

In [37]:
length(numVec)

In [38]:
length(boolVec)

Check the type of a vector with "is" functions: `is.character()`, `is.numeric()`, `is.integer()`, `is.logical()`, or, more generally, `is.atomic()`:

In [39]:
is.numeric(numVec)

In [40]:
is.atomic(numVec)

In [41]:
is.character(numVec)

<a class="anchor" id="Coercion"></a>
#### Coercion
If you combine different data types with `c()`, R automatically coerces them to a single, most flexible type:

In [42]:
c('foo', 'bar', 4)

In [43]:
c(TRUE, 3, FALSE)
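
For the four types covered here, the result takes the most flexible type present, in the order logical, integer, double, character. A short sketch that checks this with `typeof()`; the vector name `passed` is made up for illustration:

```r
typeof(c(TRUE, 1L))   # "integer"
typeof(c(1L, 2.5))    # "double"
typeof(c(2.5, "a"))   # "character"

# Logical values become 1 and 0, which makes counting TRUEs easy
passed <- c(TRUE, FALSE, TRUE)
sum(passed)           # 2
```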

Note: `c()` coerces silently. If confusion is likely, explicitly coerce with `as.character()`, `as.double()`, `as.integer()`, or `as.logical()`; these functions warn when a value cannot be converted and return `NA` for it.

<a class="anchor" id="NA"></a>
#### Missing values
Missing values are denoted by `NA`:

In [44]:
c(1, NA, 8)
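
Missing values propagate through most calculations, so it helps to know how to detect and skip them. A minimal sketch with base R functions; the vector name `readings` is made up for illustration:

```r
readings <- c(1, NA, 8)

is.na(readings)               # FALSE  TRUE FALSE
mean(readings)                # NA, because one value is unknown
mean(readings, na.rm = TRUE)  # 4.5, NA is removed before averaging
readings[!is.na(readings)]    # 1 8
```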

<a class="anchor" id="List"></a>
### List
Lists are similar to vectors, with the major difference that their elements can be of any type, including other lists. We create them with `list()`:

In [45]:
myList <- list(5, "Sample character element", TRUE, c(2.71, 3.14), list("a", 6))
print(myList)

[[1]]
[1] 5

[[2]]
[1] "Sample character element"

[[3]]
[1] TRUE

[[4]]
[1] 2.71 3.14

[[5]]
[[5]][[1]]
[1] "a"

[[5]][[2]]
[1] 6




You can use function `str()` to compactly display the internal **str**ucture of an R object:

In [46]:
str(myList)

List of 5
 $ : num 5
 $ : chr "Sample character element"
 $ : logi TRUE
 $ : num [1:2] 2.71 3.14
 $ :List of 2
  ..$ : chr "a"
  ..$ : num 6


In [47]:
class(myList)

Accessing elements in a list:

In [48]:
myList[[4]]

In [49]:
myList[[4]][[2]]

In [50]:
myList[[5]]

In [51]:
myList[[5]][[1]]
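
Single and double brackets behave differently on lists, and list elements can also be named. A small sketch with a hypothetical list called `person`:

```r
person <- list(name = "Ada", scores = c(90, 85))

class(person["scores"])    # "list": single brackets return a sub-list
class(person[["scores"]])  # "numeric": double brackets return the element itself
person$scores[2]           # 85: $ is shorthand for [[ ]] with a name
```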

#### Names
Elements of a vector can have names. These names can be set when the vector is created or later with `names()`:

In [52]:
x <- 1:3
x

In [53]:
names(x)

NULL

In [54]:
names(x) <- c('a', 'b', 'c')

In [55]:
x

In [56]:
names(x)

At creation:

In [57]:
x <- c(a = 1, b = 2, c = 3)
x

In [58]:
highTemps <- c(96, 88, 94, 78, 85, 90, 72)
highTemps
days <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
names(highTemps) <- days
highTemps

In [59]:
highTemps['Mon']  # single brackets return the element together with its name

In [60]:
highTemps[['Mon']]

<a class="anchor" id="Factors"></a>
### Factors
Factors are vectors that can contain only predefined values, called levels. They are used to store categorical data. For example:

In [61]:
genderVec <- c('f', 'f', 'm', 'f', 'm')
genderVec

In [62]:
(gender <- factor(genderVec, levels = c('m', 'f')))

In [63]:
# modifying an element
gender[1] <- 'm'
gender

In [64]:
# factors reject values that are not part of `levels`; the element becomes NA
gender[2] <- 'x'
gender

“invalid factor level, NA generated”

In [65]:
# checking the levels
levels(gender)
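
If a new category is genuinely needed, one option is to extend the levels before assigning the value. A sketch under that assumption, using a made-up factor called `sizes`:

```r
sizes <- factor(c("small", "large", "small"),
                levels = c("small", "medium", "large"))

table(sizes)       # counts per level, including the unused "medium"
as.integer(sizes)  # 1 3 1: positions within levels(sizes)

# Add the level first; assigning it then works without a warning
levels(sizes) <- c(levels(sizes), "xlarge")
sizes[2] <- "xlarge"
sizes
```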

<div style="text-align: right"> [[Jump to top]](#jump_to_top) </div>
<a class="anchor" id="mat_arr"></a>
### Matrix and Array
A matrix is a special case of a multi-dimensional array: one with two dimensions.

In [66]:
a <- matrix(1:9, ncol = 3, nrow = 3)
a

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9


In [67]:
b <- matrix(1:9, ncol = 3, nrow = 3, byrow = TRUE)
b

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9


In [68]:
c <- array(1:12, c(2, 3, 2))
print(c)  # to print an array with more than 2 dimensions in Jupyter, you need to call print() explicitly

, , 1

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

, , 2

     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12



In [69]:
# Note that a 2-dimensional array is a matrix
array(1:9, c(3, 3))

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9


In [70]:
class(array(1:9, c(3, 3))) 

Naming columns and rows:

In [71]:
rowNames <- c("ROW1", "ROW2")
columnNames <- c("COL1", "COL2", "COL3")
matrixNames <- c("Matrix1", "Matrix2")

d <- array(1:12, c(2, 3, 2), dimnames = list(rowNames, columnNames, matrixNames))
print(d)

, , Matrix1

     COL1 COL2 COL3
ROW1    1    3    5
ROW2    2    4    6

, , Matrix2

     COL1 COL2 COL3
ROW1    7    9   11
ROW2    8   10   12



In [72]:
# Naming a matrix rows/columns
rowNames <- c("ROW1", "ROW2", "ROW2")
columnNames <- c("COL1", "COL2", "COL3")

mat <- matrix(1:9, ncol = 3, nrow = 3, dimnames = list(rowNames, columnNames))
mat

     COL1 COL2 COL3
ROW1    1    4    7
ROW2    2    5    8
ROW2    3    6    9


Accessing matrix elements:

In [73]:
print(mat[1, ])

COL1 COL2 COL3 
   1    4    7 


In [74]:
print(mat[, 1])

ROW1 ROW2 ROW2 
   1    2    3 


In [75]:
print(mat[, 'COL2'])

ROW1 ROW2 ROW2 
   4    5    6 


Manipulating matrix elements:

In [76]:
mat[1,2] <- 40
mat

     COL1 COL2 COL3
ROW1    1   40    7
ROW2    2    5    8
ROW2    3    6    9


In [77]:
# dimensions
dim(mat)

In [78]:
nrow(mat)

In [79]:
ncol(mat)

In [80]:
rownames(mat)

In [81]:
colnames(mat)

In [82]:
dimnames(mat)

In [83]:
# manipulating the names
dimnames(mat) <- list(c('A', 'B', 'C'), c('X', 'Y', 'Z'))
mat

  X  Y Z
A 1 40 7
B 2  5 8
C 3  6 9


In [84]:
# To transpose a matrix
t(mat)

   A B C
X  1 2 3
Y 40 5 6
Z  7 8 9


<div style="text-align: right"> [[Jump to top]](#jump_to_top) </div>
<a class="anchor" id="Dataframe"></a>
### Data frame
A data frame is the most common way of storing data in R. Under the hood, a data frame is a list of equal-length vectors, so it is a 2-dimensional data structure.

A data frame can be inspected with functions such as `colnames()`, `rownames()`, `ncol()`, `nrow()`, and `str()`.

In [85]:
df <- data.frame(x = 1:3, y = c(TRUE, FALSE, TRUE), z = c("a", "b", "c"))
df

  x     y z
1 1  TRUE a
2 2 FALSE b
3 3  TRUE c


In [86]:
str(df)

'data.frame':	3 obs. of  3 variables:
 $ x: int  1 2 3
 $ y: logi  TRUE FALSE TRUE
 $ z: Factor w/ 3 levels "a","b","c": 1 2 3


In R versions before 4.0.0, character variables passed to `data.frame()` are converted to factor columns unless you specify `stringsAsFactors = FALSE` (since R 4.0.0, `FALSE` is the default):

In [87]:
df <- data.frame(x = 1:3, y = c(TRUE, FALSE, TRUE), z = c("a", "b", "c"), stringsAsFactors = FALSE)
str(df)

'data.frame':	3 obs. of  3 variables:
 $ x: int  1 2 3
 $ y: logi  TRUE FALSE TRUE
 $ z: chr  "a" "b" "c"
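
Because a data frame is a list of equal-length vectors, list functions work on it column by column. A brief sketch with a made-up data frame called `scores`:

```r
scores <- data.frame(id = 1:3, passed = c(TRUE, FALSE, TRUE))

is.list(scores)        # TRUE
length(scores)         # 2: one list element per column
dim(scores)            # 3 2: rows, then columns
sapply(scores, class)  # class of each column
```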


**Slicing and indexing** 
Getting a subset of a `data.frame`

In [88]:
df <- data.frame(x = 1:3, y = c(TRUE, FALSE, TRUE), z = c("a", "b", "c"))
df

  x     y z
1 1  TRUE a
2 2 FALSE b
3 3  TRUE c


In [89]:
df$y  # column y, returned as a vector

In [90]:
df[1]  # first column, a data.frame subset

  x
1 1
2 2
3 3


In [91]:
df[[1]]  # returns the first column as a vector

In [92]:
df[2,]  # 2nd row

  x     y z
2 2 FALSE b


In [93]:
df[2,3]  # 2nd row, 3rd column element

In [94]:
df[c(2,3), c(2,3)]  # intersection of 2nd and 3rd rows with 2nd and 3rd columns

      y z
2 FALSE b
3  TRUE c
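
Rows can also be selected with a logical condition instead of positions. A minimal sketch, using a copy of the same data under the made-up name `people` (with `stringsAsFactors = FALSE` so that `z` stays character in any R version):

```r
people <- data.frame(x = 1:3, y = c(TRUE, FALSE, TRUE), z = c("a", "b", "c"),
                     stringsAsFactors = FALSE)

people[people$x > 1, ]       # rows where x is greater than 1
people[people$y, "z"]        # "a" "c": z values of the rows where y is TRUE
people[, "x", drop = FALSE]  # drop = FALSE keeps the result a data.frame
```

Without `drop = FALSE`, selecting a single column with `[ , ]` returns a plain vector.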


Subsetting vectors:

In [95]:
v1 <- 10:20
names(v1) <- letters[1:11]
v1

In [96]:
v1[1]

In [97]:
v1[[1]]

In [98]:
v1[5:8]

In [99]:
v1[c(1,3)]

#### Combining data frames
`cbind()` and `rbind()` combine two or more vectors, matrices, or `data.frame`s by column and by row, respectively.

In [100]:
m <- cbind(1, 1:5) # the '1' (= shorter vector) is recycled
m

     [,1] [,2]
[1,]    1    1
[2,]    1    2
[3,]    1    3
[4,]    1    4
[5,]    1    5


In [101]:
cbind(m, 6:10) # insert a column

     [,1] [,2] [,3]
[1,]    1    1    6
[2,]    1    2    7
[3,]    1    3    8
[4,]    1    4    9
[5,]    1    5   10


In [102]:
df1 <- data.frame(series = 1, Letter = letters[1:3], number = 1:3)
cat('df1')
df1
df2 <- data.frame(series = 2, Letter = LETTERS[4:6], number = 10:12)
cat('df2')
df2
cat('combine by row')
rbind(df1, df2)

df1

  series Letter number
1      1      a      1
2      1      b      2
3      1      c      3


df2

  series Letter number
1      2      D     10
2      2      E     11
3      2      F     12


combine by row

  series Letter number
1      1      a      1
2      1      b      2
3      1      c      3
4      2      D     10
5      2      E     11
6      2      F     12


Note: For this to work, all the column names must match. We will see how to merge data frames whose names don't match in future notebooks.

Non-matching names result in an error:

In [103]:
df1 <- data.frame(A = 1, number = 1:3)
df2 <- data.frame(B = 2, number = 10:12)
#rbind(df1, df2)

> Error in match.names(clabs, names(xi)): names do not match previous names
Traceback:
1. rbind(df1, df2)
2. rbind(deparse.level, ...)
3. match.names(clabs, names(xi))
4. stop("names do not match previous names")
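
One possible way to make this call succeed is to give both data frames the same column names first. The sketch below assumes that columns `A` and `B` hold the same kind of information, which may not be true for real data; `combined` is a made-up name:

```r
df1 <- data.frame(A = 1, number = 1:3)
df2 <- data.frame(B = 2, number = 10:12)

# Rename df2's columns to match df1 before stacking the rows
names(df2) <- names(df1)
combined <- rbind(df1, df2)
combined
```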

<div style="text-align: right"> [[Jump to top]](#jump_to_top) </div>
<a class="anchor" id="math"></a>
## Mathematical Operations

In [104]:
v1 <- c(1, 2, 5)
v2 <- c(3, 4, 2)
v1 + v2

In [105]:
v1 * v2

In [106]:
v1 / v2

In [107]:
10 * v1

In [108]:
sum(v2)

In [109]:
mean(v1)

In [110]:
prod(v1)
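
Arithmetic on vectors is element-wise, and a shorter operand is recycled to the length of the longer one. A short sketch of that behaviour; the names `long` and `short` are made up for illustration:

```r
long <- c(1, 2, 3, 4)
short <- c(10, 20)

long + short  # 11 22 13 24: short is reused as 10, 20, 10, 20
long * 2      # 2 4 6 8: a single number is recycled the same way
```

When the longer length is not a multiple of the shorter one, R still recycles but issues a warning.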

---

In [111]:
mat <- matrix(1:9, byrow = T, ncol = 3)
mat

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9


In [112]:
mat * 2

     [,1] [,2] [,3]
[1,]    2    4    6
[2,]    8   10   12
[3,]   14   16   18


In [113]:
mat + 10

     [,1] [,2] [,3]
[1,]   11   12   13
[2,]   14   15   16
[3,]   17   18   19


In [114]:
colSums(mat)

In [115]:
rowSums(mat)

In [116]:
rowMeans(mat)
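
Note that `*` between two matrices is also element-wise. For the matrix product from linear algebra, base R provides the `%*%` operator. A small sketch with a made-up 2 x 2 matrix called `sq`:

```r
sq <- matrix(1:4, ncol = 2)  # columns are (1, 2) and (3, 4)

sq * sq    # element-wise: every entry is squared
sq %*% sq  # matrix product: rows of the first times columns of the second
```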

<div style="text-align: right"> [[Jump to top]](#jump_to_top) </div>
<a class="anchor" id="Comparison"></a>
## Comparison operators

In [117]:
2 > 1

In [118]:
10 == 10

In [119]:
4 >= 5

In [120]:
x <- 12
y <- 10

x != y  # checks whether x is not equal to y; returns TRUE when they differ

In [121]:
v1 <- 10:20
names(v1) <- letters[1:11]
v1

In [122]:
v1 > 15  # returns a logical vector

In [123]:
# Filtering elements that are greater than 15
v1[v1 > 15]
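
Conditions can be combined with `&` (and) and `|` (or), and `which()` returns the positions where a condition holds. A sketch that reuses the same named vector:

```r
v1 <- 10:20
names(v1) <- letters[1:11]

v1[v1 > 12 & v1 < 16]  # elements d, e, f: 13 14 15
v1[v1 < 11 | v1 > 19]  # elements a, k: 10 20
which(v1 > 18)         # positions 10 and 11 (named j and k)
```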

---
### Exercise 2
Stock price matrix: follow the steps outlined below to create a matrix with some mock data. Assume the closing prices for each day of the week for these companies are as given below:

| Comp/Day | Mon | Tue | Wed | Thu | Fri |
|:---|:---:|:---:|:---:|:---:|:---:|
| GOOG | 1205 | 1248 | 1263 | 1268 | 1238 |
| AMZN | 1802 | 1829 | 1863 | 1808 | 1817 |

*Real values from last week of July 2018.*

For each company, create a vector that contains all the values for that company. Use the ticker symbols as the names of these vectors.

In [124]:
# your code goes here

Create a vector containing weekdays, call it `days`

In [125]:
# your code goes here

Assign `days` vector as names to all the vectors you created earlier

In [126]:
# your code goes here

Now use `rbind()` to combine these two vectors into a matrix and call it `stocksPrices`.

In [127]:
# your code goes here

We just got FB data for those days, let's add this to our matrix

| Comp/Day | Mon | Tue | Wed | Thu | Fri |
|:---|:---:|:---:|:---:|:---:|:---:|
| FB   | 210 | 214 | 217 | 176 | 174 |

In [128]:
# your code goes here

Your matrix should look similar to this:

| Comp/Day | Mon | Tue | Wed | Thu | Fri |
|:---|:---:|:---:|:---:|:---:|:---:|
| GOOG | 1205 | 1248 | 1263 | 1268 | 1238 |
| AMZN | 1802 | 1829 | 1863 | 1808 | 1817 |
| FB   | 210 | 214 | 217 | 176 | 174 |

Let's add a new column to this matrix that captures each company's average stock price for the week; call it `average`.

In [129]:
# your code goes here

You can access an entire column with `stocksPrices[, <COLUMN NUMBER or NAME>]`. Print the slices of data for `Mon` and `Fri` separately. Then create a new column called `percentageGain` and use those slices to calculate the week's gain (in percent) for each company.

In [130]:
# your code goes here

---
<div style="text-align: right"> [[Jump to top]](#jump_to_top) </div>
<a class="anchor" id="Tibbles"></a>

## Tibbles
Tibbles are refined data frames: they tweak some older behaviors to make life a little easier. Unfortunately, these conveniences do not fully carry over to Jupyter notebooks, although in RStudio tibbles are very comfortable to work with. In Jupyter, use `print()` to display a tibble; otherwise many rows will be printed.

**tibble** is one of the packages that come with tidyverse. To use its functionality, load either `tibble` or `tidyverse`.

Some of the features of tibble:
* Tibbles have a clean printing method that shows only the first 10 rows and all the columns that fit on the screen. The number of rows shown can be changed
* When printed, the data type of each column is shown
* Subsetting a tibble with `[` always returns a tibble

To convert a `data.frame` to tibble use `as_tibble()`:

In [131]:
print(as_tibble(iris))

# A tibble: 150 x 5
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl> <fct>  
 1          5.1         3.5          1.4         0.2 setosa 
 2          4.9         3            1.4         0.2 setosa 
 3          4.7         3.2          1.3         0.2 setosa 
 4          4.6         3.1          1.5         0.2 setosa 
 5          5           3.6          1.4         0.2 setosa 
 6          5.4         3.9          1.7         0.4 setosa 
 7          4.6         3.4          1.4         0.3 setosa 
 8          5           3.4          1.5         0.2 setosa 
 9          4.4         2.9          1.4         0.2 setosa 
10          4.9         3.1          1.5         0.1 setosa 
# ... with 140 more rows
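
The printing and subsetting behaviour listed above can be checked directly. A short sketch, assuming the tibble package is installed; `irisTbl` is a made-up name:

```r
library(tibble)

irisTbl <- as_tibble(iris)

print(irisTbl, n = 3)        # show 3 rows instead of the default 10

class(irisTbl[, "Species"])  # still a tibble
class(irisTbl[["Species"]])  # "factor": [[ ]] extracts the column itself
class(iris[, "Species"])     # "factor": a plain data.frame drops to a vector here
```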


As mentioned above, if you run this code in RStudio there is no need to spell out the `print()` function: just type the name of the tibble, or in this example `as_tibble(iris)`, and you will get the same result.

You can create a new tibble from individual vectors with `tibble()`:

In [132]:
myTibble <- tibble(x = 1, 
                   y = 1:5, 
                   z = y ^ 2 + x
                  )

print(myTibble)

# A tibble: 5 x 3
      x     y     z
  <dbl> <int> <dbl>
1     1     1     2
2     1     2     5
3     1     3    10
4     1     4    17
5     1     5    26


In [133]:
# The mpg dataset from ggplot2 that we worked with in the viz notebook is a tibble,
#   like most other datasets that come with tidyverse.
print(mpg)  

# A tibble: 234 x 11
   manufacturer model   displ  year   cyl trans   drv     cty   hwy fl    class
   <chr>        <chr>   <dbl> <int> <int> <chr>   <chr> <int> <int> <chr> <chr>
 1 audi         a4        1.8  1999     4 auto(l… f        18    29 p     comp…
 2 audi         a4        1.8  1999     4 manual… f        21    29 p     comp…
 3 audi         a4        2    2008     4 manual… f        20    31 p     comp…
 4 audi         a4        2    2008     4 auto(a… f        21    30 p     comp…
 5 audi         a4        2.8  1999     6 auto(l… f        16    26 p     comp…
 6 audi         a4        2.8  1999     6 manual… f        18    26 p     comp…
 7 audi         a4        3.1  2008     6 auto(a… f        18    27 p     comp…
 8 audi         a4 qua…   1.8  1999     4 manual… 4        18    26 p     comp…
 9 audi         a4 qua…   1.8  1999     4 auto(l… 4        16    25 p     comp…
10 audi         a4 qua…   2    2008     4 manual… 4        20    28 p     comp…
# ... with 224 more rows

In [134]:
class(mtcars)  # This is what class() function will output for a traditional data.frame

In [135]:
class(ggplot2::mpg)  # And this is what it shows for a tibble

In the examples below you can compare the behavior of a data.frame right next to its corresponding tibble:

In [136]:
df <- data.frame(abc = 1, xyz = "a")
df$x
df[, "xyz"]
df[, c("abc", "xyz")]

abc,xyz
1,a


In [137]:
df <- tibble(abc = 1, xyz = "a")
df$x
df[, "xyz"]
df[, c("abc", "xyz")]

“Unknown or uninitialised column: 'x'.”

NULL

xyz
a


abc,xyz
1,a


---
### Exercise 3
If you have the name of a variable stored in an object, e.g. `var <- "hwy"`, how can you extract the reference variable from a tibble?



In [138]:
# Your answer goes here

<div style="text-align: right"> [[Jump to top]](#jump_to_top) </div>