
## All about arguments

Learn how to set defaults for arguments, how to pass arguments between functions, and how to check that users specified arguments correctly.

### Numeric defaults
cut_by_quantile() converts a numeric vector into a categorical variable where quantiles define the cut points. 
This is a useful function, but at the moment you have to specify five arguments to make it work. 
This is too much thinking and typing.
By specifying default arguments, you can make it easier to use. Let's start with n, which specifies 
how many categories to cut x into.
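The first two lines of cut_by_quantile() turn n into cut points. Here is a minimal sketch of that step on a made-up vector. The values 1:10 and n = 2 are illustrative assumptions, not the Snake River data.

```r
# Toy data: the integers 1 to 10 (illustrative only)
x <- 1:10
n <- 2  # number of categories

# n categories need n + 1 probabilities: 0, 0.5, 1
probs <- seq(0, 1, length.out = n + 1)

# The matching quantiles become the cut points: 1, 5.5, 10
qtiles <- quantile(x, probs, names = FALSE)

# include.lowest = TRUE keeps the minimum value (1) in the first category
groups <- cut(x, qtiles, include.lowest = TRUE)
table(groups)  # five values in each of the two categories
```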

# A numeric vector of the number of visits to Snake River is provided as n_visits.

snake_river_visits <- readRDS(file = "snake_river_visits.rds")
n_visits <- snake_river_visits$n_visits

# Set the default for n to 5
cut_by_quantile <- function(x, n = 5, na.rm, labels, interval_type) {
  probs <- seq(0, 1, length.out = n + 1)
  qtiles <- quantile(x, probs, na.rm = na.rm, names = FALSE)
  right <- switch(interval_type, "(lo, hi]" = TRUE, "[lo, hi)" = FALSE)
  cut(x, qtiles, labels = labels, right = right, include.lowest = TRUE)
}

# Remove the n argument from the call
cut_by_quantile(
  n_visits, 
  na.rm = FALSE, 
  labels = c("very low", "low", "medium", "high", "very high"),
  interval_type = "(lo, hi]"
)



### Logical defaults
cut_by_quantile() is now slightly easier to use, but you still always have to specify the na.rm argument. 
That argument controls whether missing values are removed; it behaves the same as the na.rm argument to mean() or sd().
Where functions have an argument for removing missing values, best practice is not to remove 
them by default (in case you hadn't spotted that you had missing values). 
That means that the default for na.rm should be FALSE.
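A short illustration of why FALSE is the safer default, using base R's mean() on a made-up vector with one missing value:

```r
# A vector with one missing value (illustrative data)
visits <- c(3, NA, 5)

# Default behaviour: the NA propagates, flagging that data are missing
mean(visits)                # NA

# Opting in to removal gives the mean of the observed values
mean(visits, na.rm = TRUE)  # 4
```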

# Set the default for na.rm to FALSE
cut_by_quantile <- function(x, n = 5, na.rm = FALSE, labels, interval_type) {
  probs <- seq(0, 1, length.out = n + 1)
  qtiles <- quantile(x, probs, na.rm = na.rm, names = FALSE)
  right <- switch(interval_type, "(lo, hi]" = TRUE, "[lo, hi)" = FALSE)
  cut(x, qtiles, labels = labels, right = right, include.lowest = TRUE)
}

# Remove the na.rm argument from the call
cut_by_quantile(
  n_visits, 
  labels = c("very low", "low", "medium", "high", "very high"),
  interval_type = "(lo, hi]"
)

### NULL defaults
The cut() function used by cut_by_quantile() can automatically provide sensible labels for each category. 
The code to generate these labels is fairly complicated, so rather than writing it into the function signature, 
cut() gives its labels argument a default of NULL, and the calculation details are explained on the ?cut help page.
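The sketch below shows both halves of the idea: cut() building its own labels when labels is NULL, and the usual "if NULL, compute the default in the body" pattern. describe_range() is a hypothetical helper invented only to show the pattern; it is not part of cut() or of this course.

```r
# With labels = NULL (the default), cut() builds interval labels itself
auto <- cut(c(1, 5, 10), breaks = c(0, 5, 10))
levels(auto)  # "(0,5]"  "(5,10]"

# Hypothetical helper: NULL means "work out a sensible label for me"
describe_range <- function(x, label = NULL) {
  if (is.null(label)) {
    label <- paste0(length(x), " values")
  }
  paste0(label, ": ", min(x), " to ", max(x))
}

describe_range(c(2, 9))                   # "2 values: 2 to 9"
describe_range(c(2, 9), label = "Visits") # "Visits: 2 to 9"
```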

# Set the default for labels to NULL
cut_by_quantile <- function(x, n = 5, na.rm = FALSE, labels = NULL, interval_type) {
  probs <- seq(0, 1, length.out = n + 1)
  qtiles <- quantile(x, probs, na.rm = na.rm, names = FALSE)
  right <- switch(interval_type, "(lo, hi]" = TRUE, "[lo, hi)" = FALSE)
  cut(x, qtiles, labels = labels, right = right, include.lowest = TRUE)
}

# Remove the labels argument from the call
cut_by_quantile(
  n_visits,
  interval_type = "(lo, hi]"
)


### Categorical defaults
When cutting up a numeric vector, you need to decide what happens if a value lands exactly on a boundary: it can go into either the lower or the higher interval. That is, you can choose intervals that include values at the top boundary but not the bottom ("open on the left, closed on the right" in mathematical terminology, written (lo, hi]), or the opposite ("closed on the left, open on the right", written [lo, hi)). cut_by_quantile() should allow both choices.
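A small sketch of the boundary case with base R's cut(). The value 2 and the breaks 1, 2, 3 are made-up numbers; the right argument is what the switch() in cut_by_quantile() sets from interval_type.

```r
# A value sitting exactly on the cut point 2 (illustrative data)
breaks <- c(1, 2, 3)

# right = TRUE gives "(lo, hi]" intervals: 2 falls in the lower one, [1,2]
as.integer(cut(2, breaks, right = TRUE, include.lowest = TRUE))   # 1

# right = FALSE gives "[lo, hi)" intervals: 2 falls in the upper one, [2,3]
as.integer(cut(2, breaks, right = FALSE, include.lowest = TRUE))  # 2
```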


# The pattern for categorical defaults is:

# function(cat_arg = c("choice1", "choice2")) {
#  cat_arg <- match.arg(cat_arg)
# }

# Set the categories for interval_type to "(lo, hi]" and "[lo, hi)"
cut_by_quantile <- function(x, n = 5, na.rm = FALSE, labels = NULL, 
                            interval_type = c("(lo, hi]", "[lo, hi)")) {
  # Match the interval_type argument
  interval_type <- match.arg(interval_type)
  probs <- seq(0, 1, length.out = n + 1)
  qtiles <- quantile(x, probs, na.rm = na.rm, names = FALSE)
  right <- switch(interval_type, "(lo, hi]" = TRUE, "[lo, hi)" = FALSE)
  cut(x, qtiles, labels = labels, right = right, include.lowest = TRUE)
}

# Remove the interval_type argument from the call
cut_by_quantile(n_visits)
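To see what match.arg() does on its own, here is a sketch with a hypothetical helper, pick_interval(), that mirrors the interval_type argument above. It shows the default, partial matching, and the error for an unknown choice.

```r
# Hypothetical helper that mirrors the interval_type argument
pick_interval <- function(interval_type = c("(lo, hi]", "[lo, hi)")) {
  match.arg(interval_type)
}

pick_interval()        # no argument: the first choice, "(lo, hi]"
pick_interval("[lo")   # partial matching: "[lo, hi)"

# An unknown value is rejected with an error listing the valid choices
tryCatch(pick_interval("closed"), error = function(e) conditionMessage(e))
```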

### Harmonic mean
The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of the data. That is,

$$\text{harmonic\_mean}(x) = \frac{1}{\text{arithmetic\_mean}(1/x)} = \frac{n}{\sum_{i=1}^{n} 1/x_i}$$
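As a quick check of the formula, here it is in base R on a made-up vector (c(1, 2, 4) is chosen only because the arithmetic is easy to follow):

```r
x <- c(1, 2, 4)

# Reciprocal of the arithmetic mean of the reciprocals
1 / mean(1 / x)   # 3 / (1 + 0.5 + 0.25) = 12 / 7, about 1.714

# For comparison, the arithmetic mean is larger
mean(x)           # 7 / 3, about 2.333
```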

The harmonic mean is often used to average ratio data. You'll be using it on the price/earnings ratio of stocks 
in the Standard and Poor's 500 index, provided as std_and_poor500. The price/earnings ratio is a measure of 
how expensive a stock is.
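Why ratios? A classic illustration, not taken from the P/E data, is average speed. Drive one kilometre at 60 km/h and another at 30 km/h, and the overall speed (total distance over total time) equals the harmonic mean of the two speeds, not their arithmetic mean. A sketch under those made-up numbers:

```r
speeds <- c(60, 30)  # km/h over two legs of equal distance (1 km each)

# Overall speed = total distance / total time
total_time <- sum(1 / speeds)     # hours: 1/60 + 1/30 = 0.05
overall <- 2 / total_time         # 40 km/h

harmonic <- 1 / mean(1 / speeds)  # also 40 km/h
arithmetic <- mean(speeds)        # 45 km/h, which overstates the speed
```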

library(dplyr)

std_and_poor500 <- readRDS(file = "std_and_poor500_with_pe_2019-06-21.rds")

# Look at the Standard and Poor's 500 data
glimpse(std_and_poor500)

# Write a function to calculate the reciprocal
get_reciprocal <- function(x) {
  1 / x
}

# Write a function, calc_harmonic_mean(), that calculates the harmonic mean of x (na.rm is passed on to mean()).
calc_harmonic_mean <- function(x, na.rm = FALSE) {
  x %>%
    get_reciprocal() %>%
    mean(na.rm = na.rm) %>%
    get_reciprocal()
}

# Using std_and_poor500, group by sector and summarize to calculate the harmonic mean of the price/earnings
# ratios in the pe_ratio column

std_and_poor500 %>% 
  # Group by sector
  group_by(sector) %>% 
  # Summarize, calculating harmonic mean of P/E ratio
  summarise(hmean_pe_ratio = calc_harmonic_mean(pe_ratio, na.rm = TRUE))


Rows: 505
Columns: 5
$ symbol   <chr> "MMM", "ABT", "ABBV", "ABMD", "ACN", "ATVI", "ADBE", "AMD"...
$ company  <chr> "3M Company", "Abbott Laboratories", "AbbVie Inc.", "ABIOM...
$ sector   <chr> "Industrials", "Health Care", "Health Care", "Health Care"...
$ industry <chr> "Industrial Conglomerates", "Health Care Equipment", "Phar...
$ pe_ratio <dbl> 18.31678, 57.66621, 22.43805, 45.63993, 27.00233, 20.13596...


`summarise()` ungrouping output (override with `.groups` argument)


| sector | hmean_pe_ratio |
|:--|--:|
| Communication Services | 17.51679 |
| Consumer Discretionary | 15.20211 |
| Consumer Staples | 19.7876 |
| Energy | 13.74589 |
| Financials | 12.86337 |
| Health Care | 26.57289 |
| Industrials | 18.17022 |
| Information Technology | 21.56407 |
| Materials | 16.30492 |
| Real Estate | 32.50105 |


### Passing arguments with '...'
Rather than explicitly giving calc_harmonic_mean() an na.rm argument, you can use ... to simply 
"pass other arguments" on to mean().

calc_harmonic_mean <- function(x, ...) {
  x %>%
    get_reciprocal() %>%
    mean(...) %>%
    get_reciprocal()
}

std_and_poor500 %>% 
  # Group by sector
  group_by(sector) %>% 
  # Summarize, calculating harmonic mean of P/E ratio
  summarise(hmean_pe_ratio = calc_harmonic_mean(pe_ratio, na.rm = TRUE))

# Did you notice that this code was the same as in the previous exercise? Using ... doesn't change
# how people use your function; it just makes the function more flexible. Whether more flexible means
# better is up to you to decide. Bear in mind that ... is a double-edged sword: arguments passed through
# it are not checked by your function, so a misspelled argument may be silently ignored.

`summarise()` ungrouping output (override with `.groups` argument)


| sector | hmean_pe_ratio |
|:--|--:|
| Communication Services | 17.51679 |
| Consumer Discretionary | 15.20211 |
| Consumer Staples | 19.7876 |
| Energy | 13.74589 |
| Financials | 12.86337 |
| Health Care | 26.57289 |
| Industrials | 18.17022 |
| Information Technology | 21.56407 |
| Materials | 16.30492 |
| Real Estate | 32.50105 |
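A self-contained sketch of both sides of that trade-off, written in base R without pipes so it runs on its own (the vector x is made up). Extra arguments reach mean() through the dots; but because mean.default() also accepts ..., a typo such as na_rm is swallowed without any warning.

```r
get_reciprocal <- function(x) 1 / x

calc_harmonic_mean <- function(x, ...) {
  # Everything in ... is forwarded untouched to mean()
  get_reciprocal(mean(get_reciprocal(x), ...))
}

x <- c(1, 2, 4, NA)

# Intended use: na.rm reaches mean() through the dots
calc_harmonic_mean(x, na.rm = TRUE)   # 12/7, about 1.714

# Typo: na_rm is not an argument of mean(), so it is silently ignored
# and the NA survives
calc_harmonic_mean(x, na_rm = TRUE)   # NA
```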


### Throwing errors with bad arguments
If a user provides a bad input to a function, the best course of action is to throw an error letting them know. 
The two rules are:

1. Throw the error message as soon as you realize there is a problem (typically at the start of the function).

2. Make the error message easily understandable.

You can use the assert_*() functions from assertive to check inputs and throw errors when they fail.
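If assertive is not available, base R can follow the same two rules. A minimal sketch, assuming R 4.0.0 or later, where a named condition in stopifnot() supplies its own error message. check_numeric_input() is a hypothetical name used only for illustration.

```r
check_numeric_input <- function(x) {
  # Fail fast, with a message the user can act on
  stopifnot("x must be numeric" = is.numeric(x))
  invisible(x)
}

msg <- tryCatch(
  check_numeric_input(c("a", "b")),
  error = function(e) conditionMessage(e)
)
msg  # "x must be numeric"
```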

# To use assertive, install it first, for example:
# install.packages("assertive")
# or, from Bitbucket:
# install.packages("devtools")
# devtools::install_bitbucket("richierocks/assertive")
# library(assertive)


# Add a line to the body of calc_harmonic_mean() to assert that x is numeric.
calc_harmonic_mean <- function(x, na.rm = FALSE) {
  # Assert that x is numeric
  # (with assertive installed, you could write assert_is_numeric(x) instead)
  if (!is.numeric(x)) {
    stop("x is not of class 'numeric'; it has class '", class(x), "'.")
  }
  x %>%
    get_reciprocal() %>%
    mean(na.rm = na.rm) %>%
    get_reciprocal()
}

# See what happens when you pass it strings
calc_harmonic_mean(std_and_poor500$sector)

ERROR: Error in calc_harmonic_mean(std_and_poor500$sector): x is not of class 'numeric'; it has class 'character'.


### Custom error logic
Sometimes the assert_*() functions in assertive don't give the most informative error message. 
For example, the assertions that check whether a number is in a numeric range will tell the user that 
a value is out of range, but they won't say why that's a problem. In that case, you can use 
the is_*() functions in conjunction with messages, warnings, or errors to define custom feedback.
The harmonic mean only makes sense when all values of x are positive. 
(Try calculating the harmonic mean of one and minus one to see why.) Make sure your users know this!
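Working through that suggestion in base R shows what goes wrong. The second case, with a zero, is an extra illustration of why zero must be rejected too.

```r
# The reciprocals 1 and -1 cancel, so their mean is 0 ...
mean(1 / c(1, -1))       # 0
# ... and the reciprocal of 0 is infinite
1 / mean(1 / c(1, -1))   # Inf

# Zero alone is just as bad: 1 / 0 is Inf, so the result collapses to 0
1 / mean(1 / c(0, 2))    # 0
```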

# If any values of x are non-positive (ignoring NAs), throw an error.

calc_harmonic_mean <- function(x, na.rm = FALSE) {
  # Check that x is numeric
  if (!is.numeric(x)) {
    stop("x is not of class 'numeric'; it has class '", class(x), "'.")
  }
  # Check whether any values of x are non-positive (zero counts too)
  # (with assertive installed, you could use is_non_positive() instead)
  if (any(x <= 0, na.rm = TRUE)) {
    # Throw an error
    stop("x contains non-positive values, so the harmonic mean makes no sense.")
  }
  x %>%
    get_reciprocal() %>%
    mean(na.rm = na.rm) %>%
    get_reciprocal()
}

# See what happens when you pass it negative numbers
calc_harmonic_mean(std_and_poor500$pe_ratio - 20)

ERROR: Error in calc_harmonic_mean(std_and_poor500$pe_ratio - 20): x contains non-positive values, so the harmonic mean makes no sense.


# Update the function definition to fix the na.rm argument
# (coerce_to() and use_first() come from assertive; without it, the call fails as shown below)
calc_harmonic_mean <- function(x, na.rm = FALSE) {
  # Check that x is numeric
  if (!is.numeric(x)) {
    stop("x is not of class 'numeric'; it has class '", class(x), "'.")
  }
  # Check whether any values of x are non-positive (zero counts too)
  # (with assertive installed, you could use is_non_positive() instead)
  if (any(x <= 0, na.rm = TRUE)) {
    # Throw an error
    stop("x contains non-positive values, so the harmonic mean makes no sense.")
  }
  # Use the first value of na.rm, and coerce to logical
  na.rm <- coerce_to(use_first(na.rm), target_class = "logical")
  x %>%
    get_reciprocal() %>%
    mean(na.rm = na.rm) %>%
    get_reciprocal()
}

# See what happens when you pass a non-logical na.rm with five values
calc_harmonic_mean(std_and_poor500$pe_ratio, na.rm = 1:5)

ERROR: Error in coerce_to(use_first(na.rm), target_class = "logical"): could not find function "coerce_to"
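Without assertive, the same cleanup can be sketched in base R. first_as_logical() is a hypothetical helper that only approximates use_first() and coerce_to(): it warns when extra values are dropped and then coerces with as.logical(). Inside calc_harmonic_mean() it would replace the coerce_to() line as `na.rm <- first_as_logical(na.rm)`.

```r
first_as_logical <- function(x) {
  if (length(x) > 1) {
    # Keep only the first value, but tell the user
    warning("Only the first value (", x[[1]], ") will be used.")
    x <- x[[1]]
  }
  # 0 becomes FALSE; any other number becomes TRUE
  as.logical(x)
}

first_as_logical(1:5)  # TRUE, with a warning that only the first value was used
first_as_logical(0)    # FALSE
```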
