# Using R as a calculator

In [None]:
3 + 2

In [None]:
3 * 4

In [None]:
3 / 2

In [None]:
3 ^ 2

In [None]:
3 ^ 0.5

In [None]:
# Unlike Python's "_", R has no support for thousand separators when inputting numbers.
revenue = 100000
costs = 40000
profit = revenue - costs

In [None]:
profit
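
Although there is no thousand separator, R does accept scientific notation, which can make large literals easier to read. A minimal sketch (the names `revenue_sci` and `costs_sci` are hypothetical, introduced only for this illustration):

```r
# Scientific notation as a readable way to type large numbers.
revenue_sci = 1e5   # same value as 100000
costs_sci = 4e4     # same value as 40000

revenue_sci == 100000
revenue_sci - costs_sci
```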

In [None]:
tax_rate = 0.15
net_profit = profit * (1 - tax_rate)

In [None]:
net_profit

# Integer vs. floating-point numbers

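Unlike most programming languages, a value in `R` carries three related descriptors: a class (an abstract type, used to decide how functions treat the value), a type (how the value is stored internally, reported by `typeof`), and a mode (a coarser grouping of types, kept for compatibility with the S language).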

In [None]:
class(revenue)

In [None]:
mode(revenue)

In [None]:
typeof(revenue)

By default, numeric literals are stored as doubles. To create an integer, add the `L` suffix.

In [None]:
foo = 10L

In [None]:
class(foo)

In [None]:
mode(foo)

In [None]:
typeof(foo)
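
To see how the three descriptors differ across basic types, the following sketch collects them side by side. The `samples` list and the `descriptors` matrix are hypothetical names used only for this illustration.

```r
# Hypothetical sample values, one per basic type.
samples = list(double = 1.5, integer = 2L, character = "a", logical = TRUE)

# For each value, collect the three descriptors into one column of a matrix.
descriptors = sapply(samples, function(x) {
  c(class = class(x), mode = mode(x), typeof = typeof(x))
})
descriptors
```

Doubles and integers share the mode `numeric`, but differ in their type.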

You can also convert to integer with `as.integer`.

In [None]:
as.integer(profit)
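
Note that `as.integer` drops the fractional part rather than rounding. A small sketch with made-up values (the variable `rounded` is hypothetical):

```r
# as.integer truncates toward zero.
as.integer(2.9)    # 2
as.integer(-2.9)   # -2

# round gives the nearest whole number, but the result is still a double.
rounded = round(2.9)
typeof(rounded)      # "double"
as.integer(rounded)  # 3
```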

# Dealing with text: strings

In [None]:
name = "Jean"
surname = 'Valjean'

R does not have an analog of Python's f-strings. The most similar effect can be achieved using a third-party package, `glue`.

In [None]:
library(glue)

In [None]:
glue("Hello, {name} {surname}.")
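
If installing a package is not an option, base R can produce the same string with `paste0` or `sprintf`. This is only a sketch of the alternatives; `greeting_paste` and `greeting_sprintf` are hypothetical names.

```r
name = "Jean"
surname = "Valjean"

# paste0 concatenates its arguments without any separator.
greeting_paste = paste0("Hello, ", name, " ", surname, ".")

# sprintf fills the %s placeholders in order.
greeting_sprintf = sprintf("Hello, %s %s.", name, surname)

greeting_paste
greeting_sprintf
```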

## Formatting numbers within strings

In [None]:
glue("The net profit is: {net_profit}")

However, `glue` does not support format specifiers like those of Python's f-strings. You can call the `format` function inside the curly braces to achieve the same result.

In [None]:
glue("The net profit is: {format(net_profit, nsmall=2)}")

In [None]:
glue("The net profit is: {format(net_profit, nsmall=2, big.mark=',')}")
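
For comparison, base R's `sprintf` and `formatC` accept C-style format specifiers. The sketch below assumes `net_profit` holds the value computed earlier (51000); `two_decimals` and `with_separator` are hypothetical names.

```r
net_profit = 51000  # value computed earlier: 60000 * (1 - 0.15)

# Fixed notation with two decimals.
two_decimals = sprintf("%.2f", net_profit)

# Two decimals plus a thousands separator.
with_separator = formatC(net_profit, format = "f", digits = 2, big.mark = ",")

paste("The net profit is:", with_separator)
```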

Similarly, a convenient way to print fractions as percentages is yet another package: `scales`. We will use its `percent` function.

In [None]:
library("scales")

In [None]:
glue("The tax rate is: {percent(tax_rate)}")
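
A base-R sketch of the same idea uses `sprintf`, where `%%` stands for a literal percent sign. The name `tax_label` is hypothetical.

```r
tax_rate = 0.15

# Scale to a percentage, print with no decimals, and append a literal "%".
tax_label = sprintf("%.0f%%", 100 * tax_rate)

paste("The tax rate is:", tax_label)
```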

# Logical operators

In [None]:
net_profit > 10000

In [None]:
net_profit <= 12000

In [None]:
net_profit == 51000

In [None]:
net_profit > 40000 && tax_rate < 0.25

In [None]:
net_profit <= 10000 || tax_rate >= 0.3
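
The `&&` and `||` operators are meant for single values. For element-by-element comparisons over vectors, R provides `&` and `|`. A minimal sketch with made-up data (`profits` and `rates` are hypothetical vectors):

```r
profits = c(51000, 8000, 45000)
rates = c(0.15, 0.30, 0.20)

# Element-wise "and": TRUE only where both conditions hold.
both = profits > 40000 & rates < 0.25

# Element-wise "or": TRUE where at least one condition holds.
either = profits <= 10000 | rates >= 0.3

both
either
```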

# The first data structure: a vector

In [None]:
v = c(1, 2, 3)

R has no separate scalar type: even a single number is a vector of length one. So `v` does not have any special class, mode, or type. Conversely, we could say that, e.g., `profit` is a length-one vector containing one double.

In [None]:
class(v)

In [None]:
mode(v)

In [None]:
typeof(v)

In [None]:
surnames = c("Valjean", "Javert", "Myriel")
ages = c(40, 38, 62)

In [None]:
surnames

In [None]:
ages

In [None]:
append(surnames, "Pontmercy")
append(ages, 26)

In [None]:
surnames

In R, `append` does not change the vector in place; it returns a new vector. To keep the result you must assign it back, which means typing the variable name twice.

In [None]:
surnames = append(surnames, "Pontmercy")
ages = append(ages, 26)

In [None]:
surnames

In [None]:
ages
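
Two related details, sketched on a separate hypothetical vector `people` so that `surnames` is left untouched: `c` can also be used to add elements at the end, and `append` accepts an `after` argument to insert at another position.

```r
people = c("Valjean", "Javert", "Myriel")

# Appending at the end: c and append give the same result.
with_c = c(people, "Pontmercy")
with_append = append(people, "Pontmercy")
identical(with_c, with_append)

# Inserting after the first element instead of at the end.
inserted = append(people, "Pontmercy", after = 1)
inserted
```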

You cannot have mixed-type vectors in R. If you try, every element is converted to the most general type present. If you try mixing numbers and strings, for example, everything becomes a string.

In [None]:
mixed_vector = c(24L, "Barcelona", TRUE, 5.6)

In [None]:
mixed_vector

In [None]:
class(mixed_vector)
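
The conversion follows a fixed order of generality: logical, then integer, then double, then character. The sketch below checks each step; the variable names are hypothetical.

```r
# Each combination is promoted to the more general of the two types.
logical_and_integer = typeof(c(TRUE, 2L))
integer_and_double = typeof(c(2L, 5.6))
double_and_character = typeof(c(5.6, "Barcelona"))

logical_and_integer
integer_and_double
double_and_character

# With a string present, every element becomes a string.
mixed = c(24L, "Barcelona", TRUE, 5.6)
mixed
```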

Use the `%in%` operator to check for membership.

In [None]:
"Valjean" %in% surnames

In [None]:
"Hugo" %in% surnames

R also has a `sum` function, which works much like Python's built-in `sum`.

In [None]:
sum(ages)
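
One R-specific detail is the treatment of missing values (`NA`): by default they propagate to the result, and the `na.rm` argument drops them. A small sketch with a hypothetical vector `ages_with_gap`:

```r
ages_with_gap = c(40, 38, NA, 26)

# A single missing value makes the whole sum missing.
total_default = sum(ages_with_gap)

# na.rm = TRUE removes missing values before summing.
total_without_na = sum(ages_with_gap, na.rm = TRUE)

total_default
total_without_na
```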

## Filtering a vector

In [None]:
ages[ages < 50]

Filtering a vector of strings takes one more step. One way of doing it is with the `grepl` function, which returns `TRUE` for each element that matches a pattern.

In [None]:
surnames[grepl("J", surnames)]

In [None]:
surnames[grepl("j", surnames)]

Exercise: filter surnames that contain either *j* or *J*.

In [None]:
surnames[grepl("J", surnames, ignore.case = TRUE)]
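
Another way to solve the exercise, sketched here as an alternative, is a regular-expression character class. The example also shows the logical mask that `grepl` produces, and `grep` with `value = TRUE`, which returns the matching elements directly. The name `mask` is hypothetical.

```r
surnames = c("Valjean", "Javert", "Myriel", "Pontmercy")

# One logical value per element: does it contain "j" or "J"?
mask = grepl("[jJ]", surnames)
mask

# Index with the mask, or let grep return the matches itself.
surnames[mask]
grep("[jJ]", surnames, value = TRUE)
```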

## Transforming a vector

In [None]:
ages ^ 2

In [None]:
ages[ages < 50] ^ 2

## Iterating over vectors: for loops

You will frequently see the `cat` function used to print, although what it actually does is concatenate its arguments and write the result to the output.

The named parameter `sep = "\n"` means: "use a new line as the separator".

In [None]:
for(surname in surnames) {
  cat(glue("Hello, Mr. {surname}!"), sep = "\n")
}

Hello, Mr. Valjean!
Hello, Mr. Javert!
Hello, Mr. Myriel!
Hello, Mr. Pontmercy!


R has no direct analog of Python's `zip`. We have a couple of different options to achieve the same result. The first one is perhaps conceptually simpler and involves using the vector's indices (if we are sure both vectors have the same length).

In [None]:
for(idx in seq_along(surnames)) {
  cat(sprintf("Mr. %s is %d years old.", surnames[idx], ages[idx]), sep = "\n")
}

Mr. Valjean is 40 years old.
Mr. Javert is 38 years old.
Mr. Myriel is 62 years old.
Mr. Pontmercy is 26 years old.


Remark:

*   We access the element at index `idx` of a vector such as `surnames` with the syntax `surnames[idx]`.
*   We use `seq_along` to obtain a sequence of valid indices for a given vector.
*   Here we use `sprintf` to build each sentence and `cat` to print it. `glue` would work as well, because it evaluates any R expression placed inside the curly braces, including `surnames[idx]`.

The next one is the more idiomatic `R` way: using vectorised functions. In our case, we use the vectorised `sprintf`.

In [None]:
cat(sprintf("Mr. %s is %d years old.", surnames, ages), sep = "\n")

Mr. Valjean is 40 years old.
Mr. Javert is 38 years old.
Mr. Myriel is 62 years old.
Mr. Pontmercy is 26 years old.


A similar version uses `paste` instead of `sprintf`. The `paste` function is simpler, but it offers fewer ways of customising the output. In our case, we can use either.

In [None]:
cat(paste("Mr.", surnames, "is", ages, "years old."), sep = "\n")

Mr. Valjean is 40 years old.
Mr. Javert is 38 years old.
Mr. Myriel is 62 years old.
Mr. Pontmercy is 26 years old.
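
A further option, closer in spirit to walking over pairs, is base R's `mapply`, which calls a function once per pair of elements taken from the two vectors. This is only a sketch; the helper `describe` and the variable `sentences` are hypothetical names.

```r
surnames = c("Valjean", "Javert", "Myriel", "Pontmercy")
ages = c(40, 38, 62, 26)

# Hypothetical helper: builds the sentence for one surname-age pair.
describe = function(surname, age) {
  sprintf("Mr. %s is %d years old.", surname, age)
}

# mapply walks both vectors in parallel, one pair per call.
sentences = mapply(describe, surnames, ages, USE.NAMES = FALSE)
cat(sentences, sep = "\n")
```

Because `sprintf` is already vectorised, `mapply` is not needed in this case; it becomes useful when the function applied to each pair is not itself vectorised.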


### Iterating over a sorted vector

In [None]:
for(age in sort(ages)) {
  cat(sprintf("%d ", age))
}

26 38 40 62 

In [None]:
for(age in sort(ages, decreasing = TRUE)) {
  cat(sprintf("%d ", age))
}

62 40 38 26 

Use `order` instead of `sort` to obtain the indices that would sort the vector.

In [None]:
order(ages)

Exercise: print surnames and ages in pairs, but the loop should go through the pairs in ascending order of age.

In [None]:
ord = order(ages)
cat(paste("Mr.", surnames[ord], "is", ages[ord], "years old."), sep = "\n")

Mr. Pontmercy is 26 years old.
Mr. Javert is 38 years old.
Mr. Valjean is 40 years old.
Mr. Myriel is 62 years old.


The vector of indices can be used inside the square brackets to reorder the elements of the vectors.
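
The same indexing works for any ordering. The sketch below, using the hypothetical name `ord_desc`, reorders both vectors by decreasing age and checks that indexing with `order` agrees with `sort`.

```r
surnames = c("Valjean", "Javert", "Myriel", "Pontmercy")
ages = c(40, 38, 62, 26)

# Indices of the ages from oldest to youngest.
ord_desc = order(ages, decreasing = TRUE)
ord_desc

# The same index vector reorders both vectors consistently.
surnames[ord_desc]
ages[ord_desc]

# Indexing with order gives the same result as sort.
identical(ages[order(ages)], sort(ages))
```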

# A new data structure: a list

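A named R "list" plays the role of what Python calls a "dictionary"! (More generally, an R list is an ordered collection whose elements may have different types and may or may not have names.)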

In [None]:
character_age = list('Valjean' = 40, 'Javert' = 38, 'Myriel' = 62)

In [None]:
character_age

In [None]:
character_age['Valjean']
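
Single square brackets return a smaller list, not the value itself. To extract the value, use double square brackets or the `$` operator. A short sketch (the names `sub_list` and `value` are hypothetical):

```r
character_age = list('Valjean' = 40, 'Javert' = 38, 'Myriel' = 62)

# Single brackets: a list of length one that still carries the name.
sub_list = character_age['Valjean']
class(sub_list)

# Double brackets (or $): the stored value, ready for arithmetic.
value = character_age[['Valjean']]
value + 1
character_age$Valjean
```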

You can add new elements to a list much as you would add new entries to a Python dictionary.

In [None]:
character_age['Pontmercy'] = 26

In [None]:
character_age

By default, looping through a Python dictionary with a `for` loop yields the keys. In R, looping through a list yields the values.

Incidentally, what Python calls *keys* are called *names* in R.

In [None]:
for(x in character_age) {
  cat(x, sep = "\n")
}

40
38
62
26


Use the `names` function to obtain the vector of names.

In [None]:
names(character_age)

You can use the `unlist` function to obtain the vector of values.

In [None]:
unlist(character_age, use.names = FALSE)

Use `names` and `unlist` to achieve the same result as Python's `.items()`, i.e., accessing each name-value pair. This works well with vectorised functions.

In [None]:
cat(paste("Mr.", names(character_age), "is", unlist(character_age), "years old."), sep = "\n")

Mr. Valjean is 40 years old.
Mr. Javert is 38 years old.
Mr. Myriel is 62 years old.
Mr. Pontmercy is 26 years old.


## Filtering a list

In [None]:
young = character_age[unlist(character_age) < 30]

In [None]:
young

In [None]:
has_j = character_age[grepl("J", names(character_age), ignore.case = TRUE)]

In [None]:
has_j
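
When the condition depends only on the values, base R's `Filter` function is an alternative worth knowing: it keeps the elements for which a function returns `TRUE`. A sketch under that assumption (the name `young_alt` is hypothetical):

```r
character_age = list('Valjean' = 40, 'Javert' = 38, 'Myriel' = 62, 'Pontmercy' = 26)

# Keep only the entries whose value is below 30; names are preserved.
young_alt = Filter(function(age) age < 30, character_age)
young_alt
```

`Filter` only sees the values, so conditions on the names, like the `has_j` example, still need the indexing approach.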

# Functions

In [None]:
after_taxes = function(profit, tax_rate) {
  profit * (1 - tax_rate)
}

In [None]:
after_taxes(profit = 100000, tax_rate = 0.15)

Remark: our function is automatically vectorised, because the arithmetic operators it uses work element by element.

In [None]:
after_taxes(profit = c(100000, 80000), tax_rate = 0.15)

We can specify default parameter values.

In [None]:
after_taxes = function(profit, tax_rate = 0.15) {
  profit * (1 - tax_rate)
}

In [None]:
after_taxes(profit = 100000)

We can handle the negative profits case...

In [None]:
# Seems ok... (but)

after_taxes = function(profit, tax_rate = 0.15) {
  if(profit > 0) {
    profit * (1 - tax_rate)
  } else {
    profit
  }
}

In [None]:
after_taxes(profit = -20000)

However, `if` expects a single `TRUE` or `FALSE` as its condition, so adding an `if` statement means the function is no longer vectorised.

In [None]:
after_taxes(profit = c(100000, -20000))

ERROR: Error in if (profit > 0) {: the condition has length > 1


The following function is vectorised.

In [None]:
# Slightly better

after_taxes = function(profit, tax_rate = 0.15) {
  profit[profit > 0] = profit[profit > 0] * (1 - tax_rate)
  profit
}

In [None]:
after_taxes(profit = c(100000, -20000))

And the following one is both vectorised and more "idiomatic".

In [None]:
# More idiomatic

after_taxes = function(profit, tax_rate = 0.15) {
  ifelse(profit > 0, profit * (1 - tax_rate), profit)
}

In [None]:
after_taxes(profit = c(100000, -20000))

In [None]:
# Or using a math trick...

after_taxes = function(profit, tax_rate = 0.15) {
  profit - pmax(profit, 0) * tax_rate
}

In [None]:
after_taxes(profit = c(100000, -20000))
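
The trick works because `pmax(profit, 0)` is the taxable part of each profit: it equals `profit` when the profit is positive and 0 otherwise, so the tax subtracted is zero for losses. The sketch below compares the two formulations on a few made-up values; `with_ifelse`, `with_pmax`, and `test_profits` are hypothetical names.

```r
# The ifelse formulation.
with_ifelse = function(profit, tax_rate = 0.15) {
  ifelse(profit > 0, profit * (1 - tax_rate), profit)
}

# The pmax formulation: subtract tax only on the positive part.
with_pmax = function(profit, tax_rate = 0.15) {
  profit - pmax(profit, 0) * tax_rate
}

test_profits = c(100000, -20000, 0, 2500)

# all.equal compares numbers up to a small floating-point tolerance.
all.equal(with_ifelse(test_profits), with_pmax(test_profits))
```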

Remark: we can pass different tax rates for the different profits.

In [None]:
after_taxes(profit = c(100000, 50000), tax_rate = c(0.1, 0.05))

Finally, we impose the constraint that the tax rate must be between 0 and 1.

In [None]:
# Okay (but...)

after_taxes = function(profit, tax_rate = 0.15) {
  if(tax_rate < 0 || tax_rate > 1) {
    stop("The tax rate must be between 0 and 1.")
  }

  ifelse(profit > 0, profit * (1 - tax_rate), profit)
}

In [None]:
after_taxes(profit = 100000, tax_rate = -0.5)

ERROR: Error in after_taxes(profit = 1e+05, tax_rate = -0.5): The tax rate must be between 0 and 1.


However, this function cannot handle a vector `tax_rate` parameter, because `||` expects length-one operands.

In [None]:
after_taxes(profit = c(100000, 50000), tax_rate = c(0.1, 0.05))

ERROR: Error in tax_rate < 0 || tax_rate > 1: 'length = 2' in coercion to 'logical(1)'


Let's fix it by using `any`, which reduces each vectorised comparison to a single logical value.

In [None]:
# Better!

after_taxes = function(profit, tax_rate = 0.15) {
  if(any(tax_rate < 0) || any(tax_rate > 1)) {
    stop("All tax rates must be between 0 and 1.")
  }

  ifelse(profit > 0, profit * (1 - tax_rate), profit)
}

In [None]:
after_taxes(profit = c(100000, 50000), tax_rate = c(0.1, 0.05))

In [None]:
after_taxes(profit = c(100000, 50000), tax_rate = c(0.1, -0.05))

ERROR: Error in after_taxes(profit = c(1e+05, 50000), tax_rate = c(0.1, -0.05)): All tax rates must be between 0 and 1.
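
As a possible variation, the same check can be written with base R's `stopifnot`, and a caller can intercept the error with `tryCatch`. This is a sketch, not a replacement for the version above: `after_taxes_checked`, `ok`, and `failed` are hypothetical names, and the error message produced by `stopifnot` is generated automatically rather than custom.

```r
# Hypothetical variant of after_taxes that validates with stopifnot.
after_taxes_checked = function(profit, tax_rate = 0.15) {
  stopifnot(is.numeric(tax_rate), all(tax_rate >= 0 & tax_rate <= 1))
  ifelse(profit > 0, profit * (1 - tax_rate), profit)
}

# Valid input: computed as before.
ok = after_taxes_checked(c(100000, 50000), c(0.1, 0.05))
ok

# Invalid input: tryCatch captures the error message instead of stopping.
failed = tryCatch(
  after_taxes_checked(c(100000, 50000), c(0.1, -0.05)),
  error = function(e) conditionMessage(e)
)
failed
```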
