In [1]:
options(jupyter.rich_display = FALSE)

# Week 8 Tutorial: Fundamentals of R Programming I

## POP77001 Computer Programming for Social Scientists

##### Module website: [bit.ly/POP77001](https://bit.ly/POP77001)

## R and development environments

- Several integrated development environments (IDEs) are available for R (StatET, ESS, R Commander)
- However, over the last decade RStudio has become the de facto standard IDE for working in R
- You can also find R extensions for your favourite text editor (Atom, Sublime Text, Visual Studio Code, Vim)
- For consistency with the Python part of the module, we will continue using Jupyter with R
- But feel free to use RStudio as your primary IDE for R

## Running R in Jupyter

- To run the R kernel in Jupyter, you need to install the `IRkernel` package:
    - Open R (in the terminal) or RStudio
    - Run `install.packages("IRkernel")` to install the package
    - Wait until the package is installed
    - Run `IRkernel::installspec()` to make the R kernel available to Jupyter
    - Now you should be able to launch or edit a notebook with the R kernel
    
Tip: When you start working with R in Jupyter, run `options(jupyter.rich_display = FALSE)` to switch off pretty printing. The output is less neat, but it is consistent with the output in RStudio.
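
As a rough sketch, the setup can be checked from within R. The variable name `irkernel_installed` is only illustrative, and the snippet does not install anything itself: it reports whether `IRkernel` is available and then applies the display option from the tip.

```r
# Check whether IRkernel is already installed (nothing is installed here)
irkernel_installed <- requireNamespace("IRkernel", quietly = TRUE)
if (!irkernel_installed) {
  message('Run install.packages("IRkernel") and then IRkernel::installspec()')
}

# Switch off rich display, then confirm that the option is set
options(jupyter.rich_display = FALSE)
getOption("jupyter.rich_display")
```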

## `IRkernel` installation and initialization

<div style="text-align: center;">
    <img width="700" height="700" src="../imgs/irkernel.png">
</div>

## Jupyter Notebook demonstration

![Jupyter Notebook R_1](../imgs/jupyter_notebook_r_1.png)

## Jupyter Notebook demonstration continued

![Jupyter Notebook R_2](../imgs/jupyter_notebook_r_2.png)

## Naming conventions

- Although R allows it, do not use `.` in variable names (in Python, `.` is used to access object attributes)
- Do not give objects the names of existing functions and variables (e.g. `c`, `T`, `list`, `mean`)
- Use **UPPER_CASE_WITH_UNDERSCORE** for named constants (i.e. variables that remain fixed and unmodified)
- Use **lower_case_with_underscores** for function and variable names

Extra: [Style guide by Hadley Wickham](http://adv-r.had.co.nz/Style.html)
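
A minimal sketch of these conventions, and of what can go wrong when a built-in name is reused. All object names below (`MAX_ITERATIONS`, `sample_mean`, `masked_is_true`, `restored_is_true`) are made up for illustration.

```r
# Named constant: upper case with underscores
MAX_ITERATIONS <- 100

# Variable name: lower case with underscores
sample_mean <- mean(c(2, 4, 6))

# Why masking built-ins is risky: T is an ordinary variable, not a reserved word
T <- 0
masked_is_true <- isTRUE(T)    # our T is 0, so this is FALSE

# Removing our copy makes the built-in T visible again
rm(T)
restored_is_true <- isTRUE(T)  # TRUE
```

Writing `TRUE` and `FALSE` in full avoids this problem, because those two are reserved and cannot be reassigned.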

## Code layout

- Limit all lines to a maximum of 79 characters.
- Break up longer lines

```
my_long_vector <- c(
  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
  16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
  31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45,
  46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60
)
    
long_function_name <- function(a = "a long argument", 
                               b = "another argument",
                               c = "another long argument") {
  # As usual code is indented by two spaces.
}
```

## Reserved words

There are 14 reserved words in R (plus a few variants, such as `NA_integer_` and `NA_character_`) that cannot be used as identifiers.

|              |              |
|:-------------|:-------------|
| `break`      | `NA`         |
| `else`       | `NaN`        |
| `FALSE`      | `next`       |
| `for`        | `NULL`       |
| `function`   | `repeat`     |
| `if`         | `TRUE`       |
| `Inf`        | `while`      |


Source: [R reserved words](https://stat.ethz.ch/R-manual/R-devel/library/base/html/Reserved.html)
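
One way to check a candidate name, sketched here with base R's `make.names()`, which appends a `.` to reserved words and leaves syntactically valid names unchanged. The vector names `candidates`, `fixed` and `needs_fixing` are illustrative. Note that `make.names()` also alters names that are invalid for other reasons (for example, names starting with a digit), so this is a convenience check rather than a strict test for reserved words.

```r
# make.names() appends "." to reserved words and leaves valid names unchanged
candidates <- c("if", "for", "TRUE", "mean", "T")
fixed <- make.names(candidates)

# A name that comes back changed cannot be used as is
needs_fixing <- fixed != candidates
names(needs_fixing) <- candidates
needs_fixing
```

Here `mean` and `T` come back unchanged: they are not reserved, so R will let you overwrite them, which is why the naming conventions warn against doing so.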

## Exercise 1: Vector subsetting

- Load built-in R object `letters` (lower-case letters of the Roman alphabet)
- Calculate its length
- Generate a vector of integers that starts from 1 and has the same length as `letters`
- Assign each integer the corresponding lower-case letter as its name
- Use these names to subset all vowels
- Now, repeat the subsetting, but using indices rather than names

Tip: You can use function `which()` for determining the indices of vowels
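
A small illustration of `which()` on unrelated data, so as not to give away the exercise. The vector `scores` and its names are hypothetical.

```r
# Hypothetical named vector of scores
scores <- c(ann = 3, bob = 8, cat = 5)

# Logical vector: TRUE where the condition holds
above_four <- scores > 4

# which() turns the TRUE positions into integer indices (2 and 3 here)
idx <- which(above_four)

# Subsetting by index keeps the names
scores[idx]
```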

In [2]:
letters

 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
[20] "t" "u" "v" "w" "x" "y" "z"

In [3]:
length(letters)

[1] 26

In [4]:
# seq_along() is a safer equivalent of 1:length(letters)
v <- seq_along(letters)

In [5]:
v

 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[26] 26

In [6]:
names(v) <- letters

In [7]:
v

 a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t  u  v  w  x  y  z 
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 

In [8]:
v[c("a", "e", "i", "o", "u")]

 a  e  i  o  u 
 1  5  9 15 21 

In [9]:
# We can use function which() to automatically determine the indices of vowels
which(names(v) %in% c("a", "e", "i", "o", "u"))

[1]  1  5  9 15 21

In [10]:
# which() simply returns the indices of TRUE values in a logical vector
names(v) %in% c("a", "e", "i", "o", "u")

 [1]  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
[13] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
[25] FALSE FALSE

In [11]:
v[c(1, 5, 9, 15, 21)]

 a  e  i  o  u 
 1  5  9 15 21 

In [12]:
# Or we can directly use the output of which()
v[which(names(v) %in% c("a", "e", "i", "o", "u"))]

 a  e  i  o  u 
 1  5  9 15 21 

## Tabulation and crosstabulation in R

- R function `table()` provides an easy way of summarizing categorical variables
- Note that variables represented as character vectors are implicitly converted to factors
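
A brief sketch of what this implicit conversion means in practice, using made-up survey answers. The names `answers`, `tab_default`, `answers_f` and `tab_explicit` are illustrative.

```r
# Hypothetical survey answers stored as a character vector
answers <- c("yes", "no", "yes", "yes")

# table() treats the vector as a factor: levels are sorted alphabetically
tab_default <- table(answers)

# An explicit factor controls which levels appear, including empty ones
answers_f <- factor(answers, levels = c("yes", "no", "maybe"))
tab_explicit <- table(answers_f)
tab_explicit
```

With the explicit factor, the level `maybe` is reported with a count of 0 and the columns follow the order given in `levels`.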

In [13]:
# Top 10 most populous settlements on the island of Ireland
# https://en.wikipedia.org/wiki/List_of_settlements_on_the_island_of_Ireland_by_population
top_10_settlements <- c(
    "Dublin", "Belfast", "Cork", "Limerick", "Derry",
    "Galway", "Newtownabbey", "Bangor", "Waterford", "Lisburn"
)

In [14]:
# Corresponding provinces
provinces <- c(
    "Leinster", "Ulster", "Munster", "Munster", "Ulster",
    "Connacht", "Ulster", "Ulster", "Munster", "Ulster"
)

In [15]:
# Given that each town appears only once, cross-tabulation might not be the most informative
table(top_10_settlements, provinces)

                  provinces
top_10_settlements Connacht Leinster Munster Ulster
      Bangor              0        0       0      1
      Belfast             0        0       0      1
      Cork                0        0       1      0
      Derry               0        0       0      1
      Dublin              0        1       0      0
      Galway              1        0       0      0
      Limerick            0        0       1      0
      Lisburn             0        0       0      1
      Newtownabbey        0        0       0      1
      Waterford           0        0       1      0

In [16]:
# Instead, we can just tabulate the `provinces` vector
# and check the value counts for each province
table(provinces)

provinces
Connacht Leinster  Munster   Ulster 
       1        1        3        5 

## Exercise 2: Working with attributes and factors

- As you can see, the output of `table(provinces)` is sorted alphabetically
- Change this so that the provinces are ordered by their counts
- First, let's store the result of tabulation for later re-use
- Start by exploring the structure of this object with `str()`
- What are the 2 main parts of this object? How are they stored?
- Extract the relevant parts from the stored object
- Save them as a named vector with provinces as names and counts as values
- Use `sort()` function to sort the vector in a decreasing order (from largest to smallest)
- Convert the original `provinces` vector into a factor with the levels ordered accordingly
- Re-run `table(provinces)`

In [17]:
tab <- table(provinces)

In [18]:
# As you can see, under the hood a 1-dimensional table is no more than:
#    - an integer vector of counts
#    - a character vector of province names (stored as an attribute)
str(tab)

 'table' int [1:4(1d)] 1 1 3 5
 - attr(*, "dimnames")=List of 1
  ..$ provinces: chr [1:4] "Connacht" "Leinster" "Munster" "Ulster"


In [19]:
# Coercing the table to an integer vector removes all attributes (class, dim and dimnames)
counts <- as.integer(tab)

In [20]:
counts

[1] 1 1 3 5

In [21]:
# Names can be extracted with names() and assigned to the new vector
names(counts) <- names(tab)

In [22]:
counts

Connacht Leinster  Munster   Ulster 
       1        1        3        5 

In [23]:
# To sort the vector in decreasing order, set the argument `decreasing` to TRUE
names(sort(counts, decreasing = TRUE))

[1] "Ulster"   "Munster"  "Connacht" "Leinster"

In [24]:
provinces <- factor(provinces, levels = names(sort(counts, decreasing = TRUE)))

In [25]:
table(provinces)

provinces
  Ulster  Munster Connacht Leinster 
       5        3        1        1 
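
The step-by-step solution above can probably be shortened, since `sort()` also accepts a 1-dimensional table directly. This is only a sketch of one alternative: it recreates the data under the hypothetical name `provinces_chr` so that it runs on its own, and `sorted_tab` and `provinces_f` are likewise illustrative names.

```r
# Same data as above, kept as a character vector under a hypothetical name
provinces_chr <- c(
  "Leinster", "Ulster", "Munster", "Munster", "Ulster",
  "Connacht", "Ulster", "Ulster", "Munster", "Ulster"
)

# sort() also works on a 1-dimensional table and keeps the names
sorted_tab <- sort(table(provinces_chr), decreasing = TRUE)

# Use the sorted names as factor levels
provinces_f <- factor(provinces_chr, levels = names(sorted_tab))
table(provinces_f)
```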

## Week 8 Exercise (unassessed)

- Save the `letters` object under a different name
- Convert the saved object into a matrix with 13 rows and 2 columns
- Subset the letter 'f' using indices
- Concatenate 3 copies of the `letters` object into a single character vector
- Convert it into a 3-dimensional array in which each slice along the third dimension is a matrix like the one above
- Subset all letters 'f' across the 3 slices
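
As a hint rather than a solution, here is a minimal sketch of matrix and array indexing on small integer data. The objects `m` and `a` are illustrative and deliberately unrelated to `letters`.

```r
# matrix() fills column by column: the first column holds 1, 2, 3
m <- matrix(1:6, nrow = 3, ncol = 2)
m[3, 1]  # row 3, column 1

# array() with dim = c(3, 2, 2) stacks two 3 x 2 matrices
a <- array(1:12, dim = c(3, 2, 2))

# Leaving the third index empty selects the same cell in every slice
a[3, 1, ]
```

The same pattern of row, column and slice indices applies to character data.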