# **R Basic Introduction**





In [1]:
if (!require(tidyverse)) install.packages('tidyverse')

Loading required package: tidyverse

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.3     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.3     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors


In [2]:
library(tidyverse)

## Vectors

*Atomic vectors*<br/>
They can be logical, numeric (integer or double), character, complex, or raw. Atomic vectors are *homogeneous*.

*Lists*<br/>
They are also called recursive vectors. Lists can be *heterogeneous*.

Every vector has two key properties:


*   Its *type*, which you can determine with **typeof()**
*   Its *length*, which you can determine with **length()**
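
As a small illustration of these two properties, the sketch below inspects one atomic vector and one list. The variable names `dbl` and `lst` are only placeholders chosen for this example.

```r
# An atomic vector: every element has the same type
dbl <- c(1.5, 2, 3)
typeof(dbl)   # "double"
length(dbl)   # 3

# A list: elements may have different types
lst <- list("a", 1L, TRUE)
typeof(lst)   # "list"
length(lst)   # 3
```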

Besides, there are four important types of augmented vectors:


*   *factors*, which are built on top of integer vectors
*   *dates*, which are built on top of numeric vectors
*   *date-times*, which are also built on top of numeric vectors
*   *data frames*, which are built on top of lists





### Logical vectors
They can only take three values: FALSE, TRUE and NA. We can create them with **c()**.

In [3]:
c(TRUE, TRUE, FALSE, NA)

### Numerical vectors

In [4]:
# Numbers are doubles by default
typeof(1)

In [5]:
# Place an L after the number to get an integer
typeof(1L)

In [6]:
# Doubles have four special values: NA, NaN, Inf, -Inf
c(-1, 0, 1) / 0

Avoid using '==' ... :


*   ... to compare two doubles; use **dplyr::near()** instead
*   ... to check for *Inf*, *-Inf*, or *NaN*; use the helper functions instead.


<br/>Helper functions:
*   **is.finite()** is TRUE for ordinary numbers such as *0*
*   **is.infinite()** is TRUE for *Inf* and *-Inf*
*   **is.nan()** is TRUE for *NaN*
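
The sketch below shows why '==' is fragile for doubles and what the helper functions return. It uses base R only; the tolerance `.Machine$double.eps^0.5` is what **dplyr::near()** is understood to use by default, which is stated here as an assumption. The names `root` and `vals` are placeholders.

```r
# Floating-point arithmetic is approximate, so exact equality can fail
root <- sqrt(2)^2
root == 2                                  # FALSE

# Compare within a small tolerance instead
abs(root - 2) < .Machine$double.eps^0.5    # TRUE

# Helper functions on the special values
vals <- c(0, Inf, -Inf, NaN, NA)
is.finite(vals)     # TRUE FALSE FALSE FALSE FALSE
is.infinite(vals)   # FALSE TRUE TRUE FALSE FALSE
is.nan(vals)        # FALSE FALSE FALSE TRUE FALSE
```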




### Character vectors

Each element is a *string*, and a *string* can contain an arbitrary amount of data.<br/>

The following code cells provide additional information on *string* manipulation.

In [7]:
# Create a string variable
string1 <- "This is a string"
string2 <- 'To put a "quote" inside a string, use single quotes'
print(str_c("string1", string1, sep = ' --> '))
print(str_c("string2", string2, sep = ' --> '))

[1] "string1 --> This is a string"
[1] "string2 --> To put a \"quote\" inside a string, use single quotes"


In [8]:
# Obtain the length of a string
str_length(c("Data", "Science", NA))

In [9]:
# Collapse a vector of strings
str_c(c("M", "L", "S", "D"), collapse=" - ")

In [10]:
# Subsetting strings
x <- c("Apple", "Banana", "Pear")
# forward
print(str_sub(x, 1, 3))
# backward
print(str_sub(x, -3, -1))
# modify string in place, then print the updated vector
str_sub(x, 1, 1) <- str_to_lower(str_sub(x, 1, 1))
print(x)

[1] "App" "Ban" "Pea"
[1] "ple" "ana" "ear"
[1] "apple"  "banana" "pear"


### --- !!! --- **Exercises** --- !!! ---





Use **str_length()** and **str_sub()** to extract the middle character
from a string.<br/>
What will you do if the string has an even
number of characters?



In [11]:
# ../..

What does **str_wrap()** do? When might you want to use it?

In [12]:
# ../..

What does **str_trim()** do? What’s the opposite of str_trim()?

In [13]:
# ../..

Write a function that turns (e.g.) a vector **c("a", "b", "c")**
into the *string* **"a, b, and c"**. <br/>
Think carefully about what it
should do if given a vector of length 0, 1, or 2.

In [14]:
# ../..

## Using Atomic Vectors

### Coercion of type

There are two types of coercion:


*   *Explicit coercion*: <br/>
    using a function like **as.logical()**, **as.integer()**, **as.double()**, **as.character()**
*   *Implicit coercion*: <br/>
    happens when you use a vector in a specific context that expects a certain type of vector.



In [15]:
# Create a logical vector (rep() takes `times`, not `n`)
# mean() implicitly coerces TRUE to 1 and FALSE to 0
my_vec <- c(rep(TRUE, times = 10), rep(FALSE, times = 10))
mean(my_vec)
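
For comparison, here is a minimal sketch of explicit coercion and of the implicit coercion that **c()** applies when it is given mixed types. The variable `flags` is a placeholder for this example.

```r
# Explicit coercion
as.integer(c("1", "2", "3"))    # 1 2 3
as.character(c(TRUE, FALSE))    # "TRUE" "FALSE"
as.logical(c(0, 1, 2))          # FALSE TRUE TRUE

# Implicit coercion inside c(): the most general type wins
typeof(c(TRUE, 1L))     # "integer"
typeof(c(1L, 1.5))      # "double"
typeof(c(1.5, "a"))     # "character"

# Implicit coercion in a numeric context: count the TRUE values
flags <- c(TRUE, TRUE, FALSE)
sum(flags)              # 2
```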

### Naming vectors

In [16]:
# Naming during creation
print(c(x = 1, y = 2, z = 4))

x y z 
1 2 4 


In [17]:
# Naming after creation
print(set_names(1:3, c("a", "b", "c")))

a b c 
1 2 3 


### Subsetting vectors

In [18]:
# Subsetting with positive integers keeps the elements at those positions
x <- c("one", "two", "three", "four", "five")
x[c(3, 2, 5)]

In [19]:
# You can repeat positions
x[c(1, 1, 5, 5, 5, 2)]

In [20]:
# Negative values drop the elements at the specified positions
x[c(-1, -3, -5)]

In [21]:
# Keeping only TRUE values
x <- c(10, 3, NA, 5, 8, 1, NA)
x[!is.na(x)]

In [22]:
# Subsetting with names
x <- c(abc = 1, def = 2, xyz = 5)
print(x[c("xyz", "def")])

xyz def 
  5   2 


### --- !!! --- **Exercises** --- !!! ---

Create a vector. How can you obtain:


*   the last value?
*   the elements at even-numbered positions?
*   every element except the last value?  
*   only even numbers (and no missing value)?


In [23]:
# ../..

## Recursive vectors

In [24]:
# Creating a list
x <- list(1, 2, 3)
x

In [25]:
# Another view of the content
str(x)

List of 3
 $ : num 1
 $ : num 2
 $ : num 3


In [26]:
# Setting names
x_named <- list(a = 1, b = 2, c = 3)
str(x_named)

List of 3
 $ a: num 1
 $ b: num 2
 $ c: num 3


In [27]:
# Create a list with a mix of objects
y <- list("a", 1L, 1.5, TRUE)
str(y)

List of 4
 $ : chr "a"
 $ : int 1
 $ : num 1.5
 $ : logi TRUE


In [28]:
# Create a list of lists
z <- list(list(1, 2), list(3, 4))
str(z)

List of 2
 $ :List of 2
  ..$ : num 1
  ..$ : num 2
 $ :List of 2
  ..$ : num 3
  ..$ : num 4


Try to understand the structures of the following lists:

In [29]:
x1 <- list(c(1, 2), c(3, 4))
x2 <- list(list(1, 2), list(3, 4))
x3 <- list(1, list(2, list(3)))

In [30]:
# Subsetting a list
a <- list(a = 1:3, b = "a string", c = pi, d = list(-1, -5))
y <- list("a", 1L, 1.5, TRUE)

In [31]:
str(a[1:2])

List of 2
 $ a: int [1:3] 1 2 3
 $ b: chr "a string"


In [32]:
str(a[4])

List of 1
 $ d:List of 2
  ..$ : num -1
  ..$ : num -5


In [33]:
str(y[[1]])

 chr "a"


In [34]:
str(y[[4]])

 logi TRUE


In [35]:
print(a$a)

[1] 1 2 3


In [36]:
print(a[["a"]])

[1] 1 2 3
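
The difference between the subsetting operators can be summarised in a short sketch: `[` always returns a (smaller) list, while `[[` and `$` extract a single component. The list `a` is redefined here so the block runs on its own.

```r
a <- list(a = 1:3, b = "a string", c = pi, d = list(-1, -5))

# Single bracket: the result is still a list
typeof(a["a"])             # "list"

# Double bracket: the result is the component itself
typeof(a[["a"]])           # "integer"

# $ is shorthand for [[ with a name
identical(a$a, a[["a"]])   # TRUE

# Double brackets can be chained to reach nested elements
a[["d"]][[1]]              # -1
```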


## Factors

They represent categorical data. They can take a fixed set of possible values.
Factors are built on top of integers and have a *levels* attribute.

In [37]:
# Create a factor variable
x <- factor(c("ab", "cd", "ab"), levels = c("ab", "cd", "ef"))
x

In [38]:
# Check the type
typeof(x)

In [39]:
# See the attributes
attributes(x)
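
To make the integer representation visible, the sketch below rebuilds the same factor under the placeholder name `f` and looks at its underlying codes and its levels.

```r
f <- factor(c("ab", "cd", "ab"), levels = c("ab", "cd", "ef"))

# The underlying integer codes index into the levels
as.integer(f)   # 1 2 1
levels(f)       # "ab" "cd" "ef"

# Unused levels are kept, so they appear with a count of zero
table(f)
```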

## Dates

They are numeric vectors that represent the number of days since 1 January 1970.

In [40]:
# Create a date variable
x <- as.Date("1971-01-01")
x

In [41]:
unclass(x)

In [42]:
typeof(x)

In [43]:
attributes(x)
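
Because a date is stored as a day count, ordinary arithmetic works on it. The following is a minimal sketch using base R, with `d` as a placeholder name.

```r
d <- as.Date("1971-01-01")

# Number of days since 1 January 1970
as.numeric(d)                          # 365

# Going the other way: from a day count back to a date
as.Date(365, origin = "1970-01-01")    # "1971-01-01"

# Adding days and subtracting dates
d + 31                                 # "1971-02-01"
d - as.Date("1970-01-01")              # Time difference of 365 days
```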

## Date-times

They are numeric vectors with class POSIXct (Portable Operating System Interface calendar time) that represent the number of seconds since 1 January 1970.

In [44]:
# Create a date-time variable
x <- lubridate::ymd_hm("1970-01-01 01:00")
unclass(x)

In [45]:
attributes(x)

In [46]:
# Set the tzone attribute (how the time is printed)
attr(x, "tzone") <- "US/Pacific"
x

[1] "1969-12-31 17:00:00 PST"

In [47]:
# Set the tzone attribute (how the time is printed)
attr(x, "tzone") <- "US/Eastern"
x

[1] "1969-12-31 20:00:00 EST"
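
The cells above change only how the time is displayed. The sketch below, written with base R's `as.POSIXct()` rather than lubridate, checks that the stored number of seconds stays the same when the `tzone` attribute changes. The time zone name `America/Los_Angeles` is an assumption about the time zone database available on your system.

```r
dt <- as.POSIXct("1970-01-01 01:00:00", tz = "UTC")

# One hour after the epoch
as.numeric(dt)   # 3600

# Changing tzone affects printing only
attr(dt, "tzone") <- "America/Los_Angeles"
format(dt, usetz = TRUE)

# The underlying value is unchanged
as.numeric(dt)   # 3600
```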

## Functions

There are three key steps to creating a function:


1.   Pick a name<br/>
    --> *the name of your function should be short, but clearly evoke what the function does*
2.   List the inputs or *arguments*<br/>
    --> *generally, function names should be verbs, and arguments should be nouns*
3.   Place your code within the *body*

Let us have a look at the *rescale_data* function below.



In [48]:
# Create the function
rescale_data <- function(x){
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}
# NB: the value of the last evaluated expression is returned
# NB2: you should also consider unit testing...

In [49]:
# Call the function
rescale_data(c(0, 5, 10))

In [50]:
# Call the function
rescale_data(c(-10, 0, 10))

In [51]:
# Call the function
rescale_data(c(1, 2, 3, NA, 5))
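
The three calls above can be turned into informal checks. This is only a sketch of the idea using base R's `stopifnot()` and `all.equal()`, not a full unit-testing setup; the function is repeated so the block runs on its own.

```r
rescale_data <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Each check stops with an error if the result is not as expected
stopifnot(
  isTRUE(all.equal(rescale_data(c(0, 5, 10)), c(0, 0.5, 1))),
  isTRUE(all.equal(rescale_data(c(-10, 0, 10)), c(0, 0.5, 1))),
  isTRUE(all.equal(rescale_data(c(1, 2, 3, NA, 5)), c(0, 0.25, 0.5, NA, 1)))
)

# Known limitation: a constant vector divides 0 by 0 and yields NaN
rescale_data(c(3, 3, 3))   # NaN NaN NaN
```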

### --- !!! --- **Exercises** --- !!! ---

Write **both_na()**, a function that takes two vectors of the same<br/>
length and returns the number of positions that have an NA in<br/>
both vectors.



In [52]:
# ../..

### Conditional execution
An *if* statement allows you to conditionally execute code.

```
if (condition) {
  # code executed when condition is TRUE
} else {
  # code executed when condition is FALSE
}
```



In [53]:
# A simple conditional execution within a function
# GOAL: The function returns a logical vector describing
# whether or not each element of a vector is named

has_name <- function(x) {
  nms <- names(x)

  if (is.null(nms)) {
    rep(FALSE, length(x))
  } else {
    !is.na(nms) & nms != ""
  }
}
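
To see both branches in action, the sketch below calls the function on a partially named vector and on an unnamed one. The definition is repeated so the block runs on its own.

```r
has_name <- function(x) {
  nms <- names(x)

  if (is.null(nms)) {
    rep(FALSE, length(x))
  } else {
    !is.na(nms) & nms != ""
  }
}

# Partially named vector: the unnamed element has the name ""
has_name(c(a = 1, 2, b = 3))   # TRUE FALSE TRUE

# No names at all: names() returns NULL, so the first branch runs
has_name(1:3)                  # FALSE FALSE FALSE
```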

### Multiple Conditions

You can chain multiple if statements together:

```
if (this) {
  # do that
} else if (that) {
  # do something else
} else {
  #
}
```
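
As a concrete instance of this pattern, here is a minimal sketch with a hypothetical function `describe_sign()`. It assumes `x` is a single, non-missing number, because `if` expects a condition of length one.

```r
# Hypothetical helper: classify a single number by its sign
describe_sign <- function(x) {
  if (x > 0) {
    "positive"
  } else if (x < 0) {
    "negative"
  } else {
    "zero"
  }
}

describe_sign(3)    # "positive"
describe_sign(-2)   # "negative"
describe_sign(0)    # "zero"
```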



### --- !!! --- **Exercises** --- !!! ---

What’s the difference between *if* and *ifelse()*? Carefully read<br/>
the help and construct three examples that illustrate the key differences.

In [54]:
# ../..

Write a greeting function that says "good morning," "good afternoon,"<br/>
or "good evening," depending on the time of day. (Hint:<br/>
use a time argument that defaults to **lubridate::now()**. That<br/>
will make it easier to test your function.)

In [55]:
# ../..

What does this **switch()** call do? What happens if x is “e”?

```
switch(x,
  a = ,
  b = "ab",
  c = ,
  d = "cd"
)
```



In [56]:
# ../..

### Arguments

Generally, data arguments should come first. Detail arguments<br/>
should go at the end, and usually should have default values. You<br/>
specify a default value in the same way you call a function with a<br/>
named argument:

In [57]:
# Compute confidence interval around
# mean using normal approximation
mean_ci <- function(x, conf = 0.95) {
  se <- sd(x) / sqrt(length(x))
  alpha <- 1 - conf
  mean(x) + se * qnorm(c(alpha / 2, 1 - alpha / 2))
}

In [58]:
x <- runif(100)
round(mean_ci(x), digits = 4)

In [59]:
round(mean_ci(x, conf = 0.99), digits = 4)

### Return values

The value returned by the function is usually the last statement it<br/>
evaluates, but you can choose to return early by using return().

In [60]:
# Example return() usage
complicated_function <- function(x, y, z) {
  my_value <- NULL

  if (length(x) == 0 || length(y) == 0) {
    # Do something
    my_value <- -1
  } else {
    my_value <- 10
  }

  return(my_value)
}
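
An early return is most useful for handling a simple case up front so that the main logic is not nested inside an `else`. The sketch below uses a hypothetical function `safe_ratio()`, invented for this illustration.

```r
# Hypothetical helper: ratio of two sums, with an early exit for empty input
safe_ratio <- function(x, y) {
  if (length(x) == 0 || length(y) == 0) {
    return(NA_real_)   # leave the function immediately
  }

  # Main computation, reached only when both inputs are non-empty
  sum(x) / sum(y)
}

safe_ratio(numeric(0), 1:3)      # NA
safe_ratio(c(1, 2, 3), c(2, 4))  # 1
```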