# Developing R Packages

In this course, you will learn the end-to-end process for creating an R package from scratch. You will start by creating the basic structure for your package and adding important details such as functions and metadata. Once the basic components are in place, you will learn how to document your package, and why documentation matters for creating quality packages that other people, as well as your future self, can use with ease. You will then learn how to test that the components work properly by creating tests, running checks, and building your package. By the end of this course, you can expect to have the skills needed to create and share your own R packages.

## The R Package Structure

In this chapter, you will learn the basics of creating an R package. You will learn about the structure of R packages, set up a package, and write a function and include it in your package. You will also learn about the metadata stored in the DESCRIPTION and NAMESPACE files.

### The structure of an R package
You can create the basic structure of an R package with the create() function from devtools.

The function has some optional arguments, but the main one you will use is path. It specifies where your package will be created and the name that your package will take.

If you want to create the package in your current working directory, as you often will, you just need to supply the name for the package. When naming your package, remember to think about:

- Whether the name is already taken by another package.
- Whether the name makes it clear what the package does.

devtools is loaded in your workspace.
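
The first naming question can be partly screened in code. The sketch below uses a hypothetical helper, `check_pkg_name()`, which is not part of devtools. It checks the naming rule from Writing R Extensions (ASCII letters, digits and dots, at least two characters, starting with a letter and not ending in a dot) and whether a package of that name is already installed locally. A local check cannot tell you whether the name is taken on CRAN, so treat it as a first filter only.

```r
# Hypothetical helper (not part of devtools): screen a candidate package name.
check_pkg_name <- function(name) {
  # Naming rule: ASCII letters, digits and dots only, at least two
  # characters, starting with a letter and not ending in a dot.
  valid <- grepl("^[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]$", name)

  # Is a package of this name already installed in the local library?
  installed <- name %in% rownames(installed.packages())

  c(valid = valid, installed = installed)
}

check_pkg_name("datasummary")
check_pkg_name("stats")   # valid syntax, but clashes with a base package
check_pkg_name("2fast")   # invalid: starts with a digit
```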

In [2]:
library(devtools)
# Use the create function to set up your first package
create("datasummary")

# Take a look at the files and folders in your package
dir(path = "datasummary")

"package 'devtools' was built under R version 3.6.3"
Loading required package: usethis
"package 'usethis' was built under R version 3.6.3"
√ Creating 'datasummary/'
√ Setting active project to 'C:/Users/Migue/datacamp R/Developing R Packages/datasummary'
√ Creating 'R/'
√ Writing 'DESCRIPTION'


Package: datasummary
Title: What the Package Does (One Line, Title Case)
Version: 0.0.0.9000
Authors@R (parsed):
    * First Last <first.last@example.com> [aut, cre] (<https://orcid.org/YOUR-ORCID-ID>)
Description: What the package does (one paragraph).
License: `use_mit_license()`, `use_gpl3_license()` or friends to pick a
    license
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.1


√ Writing 'NAMESPACE'
√ Setting active project to '<no active project>'
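
The DESCRIPTION file printed above is plain text in the Debian control file format, so it can be read back into R with base `read.dcf()`. The sketch below does not touch the real package: it writes a stand-in DESCRIPTION to a temporary directory, with a few of the fields shown in the output, and parses it.

```r
# Write a stand-in DESCRIPTION to a temporary directory (assumed contents,
# copied from some of the fields that create() printed).
pkg_dir <- file.path(tempdir(), "datasummary")
dir.create(pkg_dir, showWarnings = FALSE)
writeLines(c(
  "Package: datasummary",
  "Title: What the Package Does (One Line, Title Case)",
  "Version: 0.0.0.9000",
  "Encoding: UTF-8",
  "LazyData: true"
), file.path(pkg_dir, "DESCRIPTION"))

# read.dcf() returns a character matrix with one column per field
desc <- read.dcf(file.path(pkg_dir, "DESCRIPTION"))
colnames(desc)
desc[1, "Version"]
```

Reading the metadata this way is a quick check that the fields you edited by hand are still well formed.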


### Writing a simple function
Whilst there are packages that contain only data, typically packages are created to collect together functions for performing a specific task. If you need a refresher on writing functions you might want to review the course Writing Functions in R.

For your package you are going to keep the functions simple. You are going to create a package that produces custom summary output for your data.

In [3]:
# Create numeric_summary() function
numeric_summary <- function(x, na.rm) {

    # Include an error if x is not numeric
    if(!is.numeric(x)){
        stop("Data must be numeric")
    }
    
    # Create data frame
    data.frame( min = min(x, na.rm = na.rm),
                median = median(x, na.rm = na.rm),
                sd = sd(x, na.rm = na.rm),
                max = max(x, na.rm = na.rm))
}

# Test numeric_summary() function
numeric_summary(runif(1000), TRUE)

| min | median | sd | max |
|---|---|---|---|
| 0.002214653 | 0.4997426 | 0.2869247 | 0.999113 |
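
The function has two behaviours worth checking by hand: the error for non-numeric input and the effect of `na.rm`. The sketch below repeats the definition so that it runs on its own, then tries a small vector with a missing value and a character vector. The object names `with_rm`, `without_rm` and `err` are illustrative.

```r
numeric_summary <- function(x, na.rm) {
  if (!is.numeric(x)) {
    stop("Data must be numeric")
  }
  data.frame(min = min(x, na.rm = na.rm),
             median = median(x, na.rm = na.rm),
             sd = sd(x, na.rm = na.rm),
             max = max(x, na.rm = na.rm))
}

x <- c(1, 2, 3, NA)

# Missing values are dropped before summarising
with_rm <- numeric_summary(x, na.rm = TRUE)

# Missing values propagate, so every statistic is NA
without_rm <- numeric_summary(x, na.rm = FALSE)

# Non-numeric input triggers the stop() call; capture the message
err <- tryCatch(numeric_summary(letters, na.rm = TRUE),
                error = function(e) conditionMessage(e))
```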


### Including functions in a package
Once you have written your function code you need to save it in the R directory of your package. Typically you can do that by saving an R script in the usual manner (i.e. "Save As").

If you have already created objects in your session that you want to write to the R directory, as you did in the last exercise, you can do this programmatically. The dump() function writes a named R object, such as a function, to a file. You need to pass it two arguments: the name of the R object, as a character string, and the path to the file that you want to create, including the .R extension.

The package datasummary has already been created, along with the function numeric_summary(), and both are available in your workspace.

In [4]:
# What is in the R directory before adding a function?
dir("datasummary/R")

# Use the dump() function to write the numeric_summary function
dump("numeric_summary", file = "datasummary/R/numeric_summary.R")

# Verify that the file is in the correct directory
dir("datasummary/R")
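
To see what dump() actually writes, here is a minimal round trip that stays out of the real package. It assumes a throwaway function, `add_one()`, and a temporary `demo_pkg/R` directory, both invented for this illustration.

```r
# Hypothetical example function, defined only for this illustration
add_one <- function(x) x + 1

# Write it to a temporary R/ directory rather than a real package
r_dir <- file.path(tempdir(), "demo_pkg", "R")
dir.create(r_dir, recursive = TRUE, showWarnings = FALSE)
path <- file.path(r_dir, "add_one.R")
dump("add_one", file = path)

# The file contains ordinary R code that recreates the object
readLines(path)

# Remove the function, then restore it from the file
rm(add_one)
source(path, local = TRUE)
add_one(1)
```

Because the dumped file is ordinary R code, it is equivalent to a script you could have saved by hand.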

### The use_* functions
Beyond the required structure, you can include a number of additional directories containing elements such as vignettes (user guides), data, and unit tests. The devtools package makes it simple to add to the package structure by providing a series of use_* functions, for example use_data() and use_vignette(). Note that when adding vignettes, it's best not to include any spaces in the vignette name.

When you are adding data, you need to provide the name of the data object. In older versions of devtools you also passed the argument pkg, giving the path to the package that you want to put your data in. In recent versions of devtools, the use_* functions come from the usethis package, which is attached along with devtools. They have no pkg argument and act on the active project, which you can set with proj_set().

devtools is loaded in your workspace.

In [7]:
## Not run
# What is in the package at the moment?
dir(path = "datasummary")

# Make datasummary the active project so the use_* functions act on it
proj_set("datasummary")

weather <- runif(10000)
# Add the weather data
use_data(weather, overwrite = TRUE)

# Add a vignette called "Generating Summaries with Data Summary"
use_vignette("Generating_Summaries_with_Data_Summary")

# What directories do you have in your package now?
dir(path = "datasummary")
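
To see roughly what adding data does to the package structure, the sketch below uses base R only. It is not the usethis implementation, which also handles details such as the overwrite check. It only shows the end result: an `.rda` file named after the object, stored in a `data` directory. The package directory here is a stand-in created under `tempdir()`.

```r
# Stand-in package directory in a temporary location (assumption: a real
# package would already exist at this path)
pkg_dir <- file.path(tempdir(), "datasummary")
data_dir <- file.path(pkg_dir, "data")
dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)

weather <- runif(10000)

# Save the object under its own name as data/weather.rda
rda_path <- file.path(data_dir, "weather.rda")
save(weather, file = rda_path)

# Loading the file into a fresh environment restores an object called "weather"
env <- new.env()
load(rda_path, envir = env)
ls(env)
identical(env$weather, weather)
```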

### Best practice for structuring code
A typical R package contains a number of functions that you need to maintain. Whilst there are no strict rules about how you should structure code in a package, you generally want to avoid having all of your code in a single script. Because the R directory can't contain sub-directories, you also need to think carefully about how you name your files so that you can find your code again in the future.

Suppose you were to write another function for your package that takes all numeric columns in your data and returns a data frame of all of their summary statistics. What would be the best way to structure this code?

In [8]:
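data_summary <- function(x, na.rm = TRUE){
  
  # Keep only the numeric columns (select_if() comes from dplyr)
  num_data <- dplyr::select_if(x, .predicate = is.numeric) 
  
  # Summarise each column and bind the rows (map_df() comes from purrr);
  # pass the caller's na.rm through rather than hard-coding TRUE
  purrr::map_df(num_data, .f = numeric_summary, na.rm = na.rm, .id = "ID")
  
}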

# Write the function to the R directory
dump("data_summary", file = "datasummary/R/data_summary.R")
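
If dplyr and purrr are not installed, the same idea can be tried with base R alone. The sketch below is an alternative written for illustration, not the course's version: `data_summary_base()` is a hypothetical name, and `numeric_summary()` is repeated so the block runs on its own. It uses the built-in iris data set as input.

```r
numeric_summary <- function(x, na.rm) {
  if (!is.numeric(x)) {
    stop("Data must be numeric")
  }
  data.frame(min = min(x, na.rm = na.rm),
             median = median(x, na.rm = na.rm),
             sd = sd(x, na.rm = na.rm),
             max = max(x, na.rm = na.rm))
}

# Hypothetical base R counterpart of data_summary()
data_summary_base <- function(x, na.rm = TRUE) {
  # Keep only the numeric columns of the data frame
  num_data <- Filter(is.numeric, x)

  # One single-row summary per column, stacked into one data frame
  summaries <- lapply(num_data, numeric_summary, na.rm = na.rm)
  out <- do.call(rbind, summaries)

  # Store the column names in an ID column instead of the row names
  data.frame(ID = names(num_data), out,
             row.names = NULL, stringsAsFactors = FALSE)
}

res <- data_summary_base(iris)
res
```

Each function still sits naturally in its own file under R/, which keeps the helper and the function that calls it easy to find.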