R/render_site.R

#' Render multiple documents as a website
#'
#' Render all of the R Markdown documents within a directory as a website.
#'
#' The \code{render_site} function enables you to render a collection of
#' markdown documents within a directory as a website. There are two
#' requirements for a directory to be rendered as a website:
#' \enumerate{
#'   \item{It must contain either an "index.Rmd" or "index.md" file.}
#'   \item{It must contain a site configuration file ("_site.yml").}
#' }
#'
#' The most minimal valid website is an empty "index.Rmd" and an empty
#' "_site.yml". With this configuration a single empty webpage would be
#' generated via a call to \code{render_site}. If you add additional markdown
#' documents to the directory they will also be rendered. By default a site is
#' rendered in the following fashion:
#'
#' \enumerate{
#'   \item{R Markdown (.Rmd) and plain markdown (.md) files in the root
#'   directory are rendered. Note however that markdown files beginning with "_"
#'   are not rendered (this is a convention to designate files that are included
#'   by top level documents).}
#'   \item{All output and supporting files are copied to a "_site" subdirectory
#'   of the website directory (this is configurable, see discussion below).}
#'   \item{The following files are \bold{not} copied to the "_site"
#'   sub-directory:
#'     \itemize{
#'       \item{Files beginning with "." (hidden files).}
#'       \item{Files beginning with "_"}
#'       \item{Files known to contain R source code (e.g. ".R", ".s", ".Rmd"), R
#'       data (e.g. ".RData", ".rds"), or configuration data (e.g. ".Rproj",
#'       "rsconnect")).}
#'     }
#'     Note that you can override which files are included or excluded via
#'     settings in "_site.yml" (described below).}
#'   \item{Normally R Markdown renders documents as self-contained HTML.
#'   However, \code{render_site} ensures that dependencies (e.g. CSS,
#'   JavaScript, images, etc.) remain in external files. CSS/JavaScript
#'   libraries are copied to a "site_libs" sub-directory and plots/images are
#'   copied to "_files" sub-directories.}
#' }
#'
#' You can remove the files generated by \code{render_site} using the
#' \code{clean_site} function.
#'
#' @section Configuration:
#' A "_site.yml" file can be used to configure the behavior of site generation.
#' Here is an example configuration file:
#'
#' \preformatted{
#' name: my-website
#' output_dir: _site
#' include: ["demo.R"]
#' exclude: ["docs.txt", "*.csv"]
#' navbar:
#'   title: "My Website"
#'   left:
#'     - text: "Home"
#'       href: index.html
#'     - text: "About"
#'       href: about.html
#' output:
#'   html_document:
#'     toc: true
#'     highlight: textmate
#' }
#'
#' The \code{name} field provides a suggested URL path for your website when it
#' is published (by default this is just the name of the directory containing
#' the site). The \code{output_dir} indicates which directory to copy site
#' content into ("_site" is the default if none is specified). Note that this
#' can be "." to keep all content within the root website directory alongside
#' the source code.
#'
#' The \code{include} and \code{exclude} fields enable you to override the
#' default behavior vis-a-vis what files are copied into the "_site" directory
#' (wildcards can be used as in the above example).
#'
#' The \code{navbar} field can be used to define a navigation bar for websites
#' based on the \code{\link{html_document}} format.
#'
#' Finally, the \code{output} field enables you to specify output options that
#' are common to all documents within the website (you can also still provide
#' local options within each document that override any common options).
#'
#' \code{new_session: true} causes each file to be rendered in a new R session.
#' This prevents the masking problem that arises when different files use
#' functions from different packages (namespaces) that share a common name, such
#' as \code{here::here} and \code{lubridate::here} or \code{dplyr::filter} and
#' \code{MASS::filter}. The default behaviour of \code{render_site} is to use a
#' common R session.
#'
#' \code{autospin: true} causes \code{.R} files to be spinned and rendered
#' (as well as \code{.Rmd} files). If \code{autospin} is set to false (the default),
#' \code{.R} files will not be spinned nor rendered. \code{autospin} can also
#' enumerate a list of .R files to be spinned and rendered.
#'
#' @section Custom Site Generation:
#' The behavior of the default site generation function
#' (\code{rmarkdown::default_site}) is described above. It is also possible to
#' define a custom site generator that has alternate behavior. A site generator
#' is an R function that is bound to by including it in the "site:" field of the
#' "index.Rmd" or "index.md" file. For example:
#'
#' \preformatted{
#' title: "My Book"
#' output: bookdown::gitbook
#' site: bookdown::bookdown_site
#' }
#'
#' A site generation function should return a list with the following elements:
#' \itemize{
#'   \item{\code{name}} {The name for the website (e.g. the parent directory
#'   name).}
#'   \item{\code{output_dir} {The directory where the website output is written
#'   to. This path should be relative to the site directory (e.g. "." or
#'   "_site")}}
#'   \item{\code{render}} {An R function that can be called to generate the
#'   site. The function should accept the \code{input_file},
#'   \code{output_format}, \code{envir}, and \code{quiet} arguments.}
#'   \item{\code{clean}} {An R function that returns relative paths to the files
#'   generated by \code{render_site} (these files are the ones which will be
#'   removed by the \code{clean_site} function.}
#' }
#'
#' Note that the \code{input_file} argument will be \code{NULL} when the entire
#' site is being generated. It will be set to a specific file name if a
#' front-end tool is attempting to preview it (e.g. RStudio IDE via the Knit
#' button).
#'
#' When \code{quiet = FALSE} the \code{render} function should also print a line
#' of output using the \code{\link{message}} function indicating which output
#' file should be previewed, for example:
#'
#' \preformatted{if (!quiet)
#'   message("\nOutput created: ", output)
#' }
#'
#' Emitting this line enables front-ends like RStudio to determine which file
#' they should open to preview the website.
#'
#' See the source code of the \code{rmarkdown::default_site} function for a
#' example of a site generation function.
#'
#' @param input Website directory (or the name of a file within the directory).
#' @param output_format R Markdown format to convert to (defaults to "all").
#' @param envir The environment in which the code chunks are to be evaluated
#'   during knitting (can use \code{\link{new.env}} to guarantee an empty new
#'   environment).
#' @param quiet \code{TRUE} to suppress messages and other output.
#' @inheritParams render
#'
#' @return \code{render_site} returns the name of the site output file (relative
#'   to the input directory). \code{clean_site} returns the names of the
#'   generated files removed during cleaning. \code{site_config} returns the
#'   contents of _site.yml as an R list. \code{default_site_generator} returns
#'   the default site generator for R Markdown websites.
#' @export
render_site <- function(input = ".",
                        output_format = "all",
                        envir = parent.frame(),
                        quiet = FALSE,
                        encoding = "UTF-8") {

  # capture original input
  original_input <- input

  # normalize to a directory
  input <- input_as_dir(input)

  # if it's a file then capture that and force output_format to be NULL
  # (to only render a single format for incremental/previewing)
  input_file <- NULL
  if (!dir_exists(original_input)) {
    input_file <- original_input
    if (output_format == "all")
      output_format <- NULL
  }

  # find the site generator
  generator <- site_generator(input, output_format)
  if (is.null(generator))
    stop("No site generator found.")

  # execute it
  generator$render(input_file = input_file,
                   output_format = output_format,
                   envir = envir,
                   quiet = quiet)

  # compute the name of the output file. if the input was a filename
  # use that as a base (otherwise use index.html)
  if (!dir_exists(original_input))
    output <- file_with_ext(basename(original_input), "html")
  else
    output <- "index.html"
  output <- file.path(input, generator$output_dir, output)
  output <- normalized_relative_to(input, output)

  # return it invisibly
  invisible(output)
}

#' @rdname render_site
#' @param preview Whether to list the files to be removed rather than actually
#'   removing them.
#' @export
clean_site <- function(input = ".", preview = FALSE, quiet = FALSE,
                       encoding = "UTF-8") {

  # normalize to a directory
  input <- input_as_dir(input)

  # find the site generator
  generator <- site_generator(input, output_format = NULL)
  if (is.null(generator))
    stop("No site generator found.")

  # get the files to be cleaned
  files <- generator$clean()

  # if it's just a preview then return the files, otherwise
  # actually remove the files
  if (preview)
    files
  else {
    if (!quiet) {
      cat("Removing files: \n")
      cat(paste0(" ", files), sep = "\n")
    }
    unlink(file.path(input, files), recursive = TRUE)
  }
}

#' @rdname render_site
#' @export
site_generator <- function(input = ".", output_format = NULL) {

  # look for the closest index file with 'site' metadata
  root <- tryCatch(
    proj_root(input, "^index.R?md$", "^\\s*site:.*::.*$"),
    error = function(e) NULL
  )

  # normalize input
  input <- input_as_dir(input)

  # if none found then look for a _site.yml file
  if (is.null(root)) {
    if (file.exists(site_config_file(input))) {
      return (default_site(input))
    } else {
      return(NULL)
    }
  }

  # determine the index file (will be index.Rmd or index.md)
  index <- file.path(root, "index.Rmd")
  if (!file.exists(index))
    index <- file.path(root, "index.md")

  # is this in a subdir of the site root? (only some generators support this)
  in_subdir <- !same_path(input, root)

  # read index.Rmd and extract the front matter
  front_matter <- yaml_front_matter(index)

  # create the site generator (passing the root dir)
  create_site_generator <- eval(parse(text = front_matter$site))
  generator <- create_site_generator(root)

  # if it's in a subdir check to see if the generator supports nested files
  if (in_subdir) {
    if (isTRUE(generator$subdirs)) {
      generator
    } else {
      NULL
    }
  } else {
    generator
  }
}


#' @rdname render_site
#' @export
site_config <- function(input = ".", encoding = "UTF-8") {

  # normalize input
  input <- input_as_dir(input)

  # check for config file
  config_file <- site_config_file(input)
  if (file.exists(config_file)) {

    # parse the yaml
    config_lines <- read_utf8(config_file)
    config <- yaml_load(config_lines)
    if (!is.list(config))
      config <- list()

    # provide defaults if necessary
    if (is.null(config$name))
      config$name <- basename(normalize_path(input))
    if (is.null(config$output_dir))
      config$output_dir <- "_site"
    if (is.null(config$new_session))
      config$new_session <- FALSE

    # return config
    config

  # no _site.yml
  } else {
    NULL
  }
}

# default site implementation (can be overridden by custom site generators)

#' @rdname render_site
#' @param ... Currently unused.
#' @export
default_site_generator <- default_site <- function(input, ...) {

  # get the site config
  config <- site_config(input)
  if (is.null(config))
    stop("No site configuration (_site.yml) file found.")

  # helper function to get all input files. includes all .Rmd and
  # .md files that don't start with "_" (note that we don't do this
  # recursively because rmarkdown in general handles applying common
  # options/elements across subdirectories poorly). Also excludes
  # README.R?md as those files are intended for GitHub. If
  # config$autospin is TRUE, we also spin and render .R files.
  input_files <- function() {
    pattern <- sprintf(
      "^[^_].*\\.%s$", if (isTRUE(config$autospin)) {
        "([Rr]|[Rr]?md)"
      } else {
        "[Rr]?md"
      }
    )
    files <- list.files(input, pattern)
    if (is.character(config$autospin)) files <- c(files, config$autospin)
    files[!grepl("^README\\.R?md$", files)]
  }

  # define render function (use ... to gracefully handle future args)
  render <- function(input_file,
                     output_format,
                     envir,
                     quiet,
                     ...) {

    # track outputs
    outputs <- c()

    # see if this is an incremental render
    incremental <- !is.null(input_file)

    # files list is either a single file (for incremental) or all
    # file within the input directory
    if (incremental)
      files <- input_file
    else {
      files <- file.path(input, input_files())
    }
    sapply(files, function(x) {
      render_one <- if (isTRUE(config$new_session)) {
        render_new_session
      } else {
        render_current_session
      }

      # log the file being rendered
      if (!quiet) message("\nRendering: ", x)

      output <- render_one(input = x,
                           output_format = output_format,
                           output_options = list(lib_dir = "site_libs",
                                                 self_contained = FALSE),
                           envir = envir,
                           quiet = quiet)

      # add to global list of outputs
      outputs <<- c(outputs, output)

      # check for files dir and add that as well
      sidecar_files_dir <- knitr_files_dir(output)
      files_dir_info <- file.info(sidecar_files_dir)
      if (isTRUE(files_dir_info$isdir))
        outputs <<- c(outputs, sidecar_files_dir)
    })

    # do we have a relative output directory? if so then remove,
    # recreate, and copy outputs to it (we don't however remove
    # it for incremental builds)
    if (config$output_dir != '.') {

      # remove and recreate output dir if necessary
      output_dir <- file.path(input, config$output_dir)
      if (file.exists(output_dir)) {
        if (!incremental) {
          unlink(output_dir, recursive = TRUE)
          dir.create(output_dir)
        }
      } else {
        dir.create(output_dir)
      }

      # move outputs
      for (output in outputs) {

        # don't move it if it's a _files dir that has a _cache dir
        if (grepl("^.*_files$", output)) {
          cache_dir <- gsub("_files$", "_cache", output)
          if (dir_exists(cache_dir))
            next;
        }

        output_dest <- file.path(output_dir, basename(output))
        if (dir_exists(output_dest))
          unlink(output_dest, recursive = TRUE)
        file.rename(output, output_dest)
      }

      # copy lib dir a directory at a time (allows it to work with incremental)
      lib_dir <- file.path(input, "site_libs")
      output_lib_dir <- file.path(output_dir, "site_libs")
      if (!file.exists(output_lib_dir))
        dir.create(output_lib_dir)
      libs <- list.files(lib_dir)
      for (lib in libs)
        file.copy(file.path(lib_dir, lib), output_lib_dir, recursive = TRUE)
      unlink(lib_dir, recursive = TRUE)

      # copy other files
      copy_site_resources(input)
    }

    # Print output created for rstudio preview
    if (!quiet) {
      # determine output file
      output_file <- ifelse(is.null(input_file),
                            "index.html",
                            file_with_ext(basename(input_file), "html"))
      if (config$output_dir != ".")
        output_file <- file.path(config$output_dir, output_file)
      message("\nOutput created: ", output_file)
    }
  }

  # define clean function
  clean <- function() {

    # build list of generated files
    generated <- c()

    # enumerate rendered markdown files
    files <- input_files()

    # get html files
    html_files <- file_with_ext(files, "html")

    # _files peers are always removed (they could be here due to
    # output_dir == "." or due to a _cache existing for the page)
    html_supporting <- paste0(knitr_files_dir(html_files), '/')
    generated <- c(generated, html_supporting)

    # _cache peers are always removed
    html_cache <- paste0(knitr_root_cache_dir(html_files), '/')
    generated <- c(generated, html_cache)

    # for rendering in the current directory we need to eliminate
    # output files for our inputs (including _files) and the lib dir
    if (config$output_dir == ".") {

      # .html peers
      generated <- c(generated, html_files)

      # site_libs dir
      generated <- c(generated, "site_libs/")

    # for an explicit output_dir just remove the directory
    } else {
      generated <- c(generated, paste0(config$output_dir, '/'))
    }

    # filter out by existence
    generated[file.exists(file.path(input, generated))]
  }

  # return site generator
  list(
    name = config$name,
    output_dir = config$output_dir,
    render = render,
    clean = clean
  )
}

# we suppress messages during render so that "Output created" isn't emitted
# (which could result in RStudio previewing the wrong file)
render_current_session <- function(...) suppressMessages(rmarkdown::render(...))

render_new_session <- function(...) {
  xfun::Rscript_call(
    function(...) { suppressMessages(rmarkdown::render(...)) },
    args = list(...)
  )
}

# utility function to copy all files into the _site directory
copy_site_resources <- function(input) {

  # get the site config
  config <- site_config(input)

  if (config$output_dir != ".") {

    # get the list of files
    files <- copyable_site_resources(input = input, config = config)

    # perform the copy
    output_dir <- file.path(input, config$output_dir)
    file.copy(from = file.path(input, files),
              to = output_dir,
              recursive = TRUE)
  }
}


#' Determine website resource files for a directory
#'
#' Determine which files within a given directory should be copied in
#' order to serve a website from the directory. Attempts to automatically
#' exclude source, data, hidden, and other files not required to serve
#' website content.
#'
#' @param site_dir Site directory to analyze
#' @param include Additional files to include (glob wildcards supported)
#' @param exclude Files to exclude  (glob wildcards supported)
#' @param recursive \code{TRUE} to return a full recursive file listing;
#'   \code{FALSE} to just provide top-level files and directories.
#'
#' @return Character vector of files and directories to copy
#'
#' @export
site_resources <- function(site_dir, include = NULL, exclude = NULL, recursive = FALSE) {

  # get the original file list (we'll need it to apply includes)
  all_files <- list.files(site_dir, all.files = TRUE)

  # excludes:
  #   - known source/data extensions
  #   - anything that starts w/ '.' or '_'
  #   - rsconnect and packrat directories
  #   - user excludes
  extensions <- c("R", "r", "S", "s",
                  "Rmd", "rmd", "md", "Rmarkdown", "rmarkdown",
                  "Rproj", "rproj",
                  "RData", "rdata", "rds")
  extensions_regex <- utils::glob2rx(paste0("*.", extensions))
  excludes <- c("^rsconnect$", "^packrat$", "^\\..*$", "^_.*$", "^.*_cache$",
                extensions_regex,
                utils::glob2rx(exclude))
  files <- all_files
  for (exclude in excludes)
    files <- files[!grepl(exclude, files)]

  # allow back in anything specified as an explicit "include"
  includes <- utils::glob2rx(include)
  for (include in includes) {
    include_files <- all_files[grepl(include, all_files)]
    files <- unique(c(files, include_files))
  }

  # if this is recursive then we need to blow out the directories
  if (recursive) {
    recursive_files <- c()
    for (file in files) {
      file_path <- file.path(site_dir, file)
      if (dir_exists(file_path)) {
        dir_files <- file.path(list.files(file_path,
                                          full.names = FALSE,
                                          recursive = TRUE))
        dir_files <- file.path(file, dir_files)
        recursive_files <- c(recursive_files, dir_files)
      } else {
        recursive_files <- c(recursive_files, file)
      }
    }
    recursive_files
  } else {
    files
  }
}


# utility function to list the files that should be copied
#
# NOTE: '...' is kept for backwards compatibility with older versions of
# RStudio which called 'copyable_site_resources(encoding = "UTF-8")'
copyable_site_resources <- function(input, config = site_config(input), ...) {

  include <- config$include

  exclude <- config$exclude
  if (config$output_dir != ".")
    exclude <- c(exclude, config$output_dir)

  site_resources(input, include, exclude)
}


# utility function to ensure that 'input' is a valid directory
# (converts from file to parent directory as necessary)
input_as_dir <- function(input) {

  # ensure the input dir exists
  if (!file.exists(input)) {
    input <- normalize_path(input, mustWork = FALSE)
    if (!file.exists(input)) stop(
      "The specified directory '", input, "' does not exist.", call. = FALSE
    )
  }

  # convert from file to directory if necessary
  if (!dir_exists(input))
    input <- dirname(input)

  # return it
  input
}

# get the path to the site config file
site_config_file <- function(input) {
  file.path(input, "_site.yml")
}

site_skeleton <- function(path, ...) {
  dir.create(path, recursive = TRUE, showWarnings = FALSE)
  file.copy(
    list.files(pkg_file('rmd', 'site'), full.names = TRUE),
    path, overwrite = TRUE
  )
}