
# Advanced Topics

In this appendix, we talk about a few advanced topics that may be of interest to developers and advanced users.

## More global options

There are a few more advanced global options\index{Global Options} in addition to those introduced in Section \@ref(global-options), and they are listed in Table \@ref(tab:global-options2).

```{r global-options2, echo=FALSE}
knitr::kable(matrix(c(
  'blogdown.hugo.dir', '', 'The directory of the Hugo executable',
  'blogdown.method', 'html', 'The building method for R Markdown',
  'blogdown.publishDir', '', 'The publish dir for local preview',
  'blogdown.new_bundle', FALSE, 'Create a new post in a bundle?',
  'blogdown.widgetsID', TRUE, "Incremental IDs for HTML widgets?",
  NULL
), ncol = 3, byrow = TRUE, dimnames = list(NULL, c('Option name', 'Default', 'Meaning'))), booktabs = TRUE, caption = 'A few more advanced global options.')
```

If you want to install Hugo to a custom path, you can set the global option `blogdown.hugo.dir` to a directory to store the Hugo executable before you call `install_hugo()`, e.g., `options(blogdown.hugo.dir = '~/Downloads/hugo_0.20.1/')`. This may be useful for you to use a specific version of Hugo for a specific website,^[You can set this option per project. See Section \@ref(global-options) for details.] or store a copy of Hugo on a USB Flash drive along with your website.

The option `blogdown.method` is explained in Section \@ref(methods).

When your website project is under version control in the RStudio IDE, continuously previewing the site can be slow, if it contains hundreds of files or more. The default publish directory is `public/` under the project root directory, and whenever you make a change in the source that triggers a rebuild, RStudio will be busy tracking file changes in the `public/` directory. The delay before you see the website in the RStudio Viewer can be 10 seconds or even longer. That is why we provide the option `blogdown.publishDir`. You may set a temporary publish directory to generate the website, and this directory should not be under the same RStudio project, e.g., `options(blogdown.publishDir = '../public_site')`, which means the website will be generated to the directory `public_site/` under the parent directory of the current project.

To understand the option `blogdown.new_bundle`, you have to know the concept of [page bundles](https://gohugo.io/content-management/page-bundles/) in Hugo. When this option is set to `TRUE`, the function `blogdown::new_post()` (or the RStudio addin "New Post") will create a new post as the index file of a page bundle, e.g., `post/2015-07-23-bye-world/index.md` instead of `post/2015-07-23-bye-world.md`. One benefit of using a page bundle instead of a normal page file is that you can put resource files associated with the post (such as images) under the same directory as the post. This means you no longer have to put them under the `static/` directory, which is often confusing to Hugo beginners.

The option `blogdown.widgetsID` is only relevant if your website source is under version control and you have HTML widgets on the website. If this option is `TRUE` (default), the random IDs of HTML widgets will be changed to incremental IDs in the HTML output, so these IDs are unlikely to change every time you recompile your website; otherwise, every time you will get different random IDs.
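As a rough sketch, the options discussed in this section could be set together in a project-level `.Rprofile`. The values below are only illustrative (the path `../public_site` is simply the example used above), and `options()` merely stores them; **blogdown** reads them later when you build or serve the site.

```r
# A possible project-level .Rprofile (illustrative values only)
options(
  # generate the site outside the RStudio project to avoid slow file tracking
  blogdown.publishDir = '../public_site',
  # create new posts as page bundles, e.g., post/<slug>/index.md
  blogdown.new_bundle = TRUE,
  # keep the default: incremental IDs for HTML widgets
  blogdown.widgetsID = TRUE
)

# check what has been set
getOption('blogdown.publishDir')
```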

## LiveReload

As we briefly mentioned\index{LiveReload} in Section \@ref(a-quick-example), you can use `blogdown::serve_site()` to preview a website, and the web page will be automatically rebuilt and reloaded in your web browser when the source file is modified and saved. This is called "LiveReload."

We have provided two approaches to LiveReload. The default approach is through `servr::httw()`, which will continuously watch the website directory for file changes, and rebuild the site when changes are detected. This approach is relatively slow because the website is fully regenerated every time. This may not be a real problem for Hugo, because Hugo is often fast enough: it takes about a millisecond to generate one page, so a website with a thousand pages may only take about one second to be fully regenerated.

If you are not concerned about the above issue, we recommend that you use the default approach. Otherwise, you can set the global option `options(blogdown.generator.server = TRUE)` to use an alternative approach to LiveReload, which is based on the native support for LiveReload from the static site generator. At the moment, this has only been tested against Hugo-based websites. It does not work with Jekyll, and we were not successful with Hexo, either.

This alternative approach requires an additional R package to be installed: **processx** [@R-processx]. You may use this approach when you primarily work on plain Markdown posts instead of R Markdown posts, because it can be much faster to preview Markdown posts using the web server of Hugo. The web server can be stopped by `blogdown::stop_server()`, and it will always be stopped when the R session is ended, so you can restart your R session if `stop_server()` fails to stop the server for some reason.
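One possible way to switch to this approach is sketched below. It only turns the option on when **processx** is available, and the calls to `serve_site()` and `stop_server()` are left as comments because they assume **blogdown** is installed and that you are working in a website project.

```r
# Use the LiveReload server of the site generator, but only if the
# required package is available (this code does not install it)
if (requireNamespace('processx', quietly = TRUE)) {
  options(blogdown.generator.server = TRUE)
} else {
  message('Please install processx first: install.packages("processx")')
}

# later, in an interactive session (assumes blogdown is installed):
# blogdown::serve_site()
# blogdown::stop_server()
```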

The web server is established via the command `hugo server` (see [its documentation](https://gohugo.io/commands/hugo_server/) for details). You can pass command-line arguments via the global option `blogdown.hugo.server`. The default value for this option is `c('-D', '-F')`, which means to render draft and future posts in the preview. We want to highlight a special argument `--navigateToChanged` in a recent version of Hugo, which asks Hugo to automatically navigate to the changed page. For example, you can set the options:

```{r eval=FALSE}
options(blogdown.hugo.server = c('-D', '-F', '--navigateToChanged'))
```

Then, when you edit a source file under `content/`, Hugo will automatically show you the corresponding output page in the web browser.

Note that Hugo renders and serves the website from memory by default, so no files will be generated to `public/`. If you need to publish the `public/` folder manually, you will have to manually build the website via `blogdown::hugo_build()` or `blogdown::build_site()`.

## Building a website for local preview {#local-preview}

The function\index{blogdown::build\_site()} `blogdown::build_site()` has an argument `local` that defaults to `FALSE`, which means building the website for publishing instead of local previewing. The mode `local = TRUE` is primarily for `blogdown::serve_site()` to serve the website locally. There are three major differences between `local = FALSE` and `TRUE`. When `local = TRUE`:

- The `baseurl` option in `config.toml` is temporarily overridden by `"/"` even if you have set it to a full URL like `"http://www.example.com/"`.^[If your `baseurl` contains a subdirectory, it will be overridden by the subdirectory name. For example, for `baseurl = "http://www.example.com/project/"`, `build_site(local = TRUE)` will temporarily remove the domain name and only use the value `/project/`.] This is because when a website is to be previewed locally, links should refer to local files. For example, `/about/index.html` should be used instead of the full link `http://www.example.com/about/index.html`; the function `serve_site()` knows that `/about/index.html` means the file under the `public/` directory, and can fetch it and display the content to you, otherwise your browser will take you to the website `http://www.example.com` instead of displaying a local file.

- Draft and future posts are always rendered when `local = TRUE`, but not when `local = FALSE`. This is for you to preview draft and future posts locally. If you know the [Hugo command line](https://gohugo.io/commands/hugo/), it means the `hugo` command is called with the flags `-D -F`, or equivalently, `--buildDrafts --buildFuture`.

- There is a caching mechanism to speed up building your website: an Rmd file will not be recompiled when its `*.html` output file is newer (in terms of file modification time). If you want to force `build_site(local = TRUE)` to recompile the Rmd file even if it is older than the HTML output, you need to delete the HTML output, or edit the Rmd file so that its modification time will be newer, or click the RStudio addin "Touch File". This caching mechanism does not apply to `local = FALSE`, i.e., `build_site(local = FALSE)` will always recompile all Rmd files, because when you want to publish a site, you may need to recompile everything to make sure the site is fully regenerated. If you have time-consuming code chunks in any Rmd files, you have to use either of these methods to save time:

    - Turn on **knitr**'s caching for time-consuming code chunks, i.e., the chunk option `cache = TRUE`.

    - Do not call `build_site()`, but `blogdown::hugo_build()` instead\index{blogdown::hugo\_build()}. The latter does not compile any Rmd files, but simply runs the `hugo` command to build the site. Please use this method only if you are sure that your Rmd files do not need to be recompiled.
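Since the caching mechanism above only compares file modification times, one way to force a single Rmd file to be recompiled is to update its modification time. The sketch below does this in base R with `Sys.setFileTime()`; the temporary file is only a stand-in for a real post such as `content/post/foo.Rmd`, and this is not necessarily how the "Touch File" addin is implemented.

```r
# a temporary file standing in for a post such as content/post/foo.Rmd
rmd <- tempfile(fileext = '.Rmd')
writeLines('Hello', rmd)

# pretend the file was last modified an hour ago
Sys.setFileTime(rmd, Sys.time() - 3600)
old <- file.mtime(rmd)

# "touch" the file: set its modification time to now, so that it becomes
# newer than an HTML output generated earlier
Sys.setFileTime(rmd, Sys.time())
file.mtime(rmd) > old
```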

You do not need to worry about these details if your website is automatically generated from source via a service like Netlify, which will make use of `baseurl` and not use `-D -F` by default. If you manually publish the `public/` folder, you need to be more careful:

- If your website does not work without the full `baseurl`, or you do not want the draft or future posts to be published, you should not publish the `public/` directory generated by `serve_site()`. Always run `blogdown::build_site()` or `blogdown::hugo_build()` before you upload this directory to a web server.

- If your drafts and future posts contain (time-)sensitive information, you are strongly recommended to delete the `/public/` directory before you rebuild the site for publishing every time, because Hugo never deletes it, and your sensitive information may be rendered by a certain `build_site(local = TRUE)` call last time and left in the directory. If the website is really important, and you need to make sure you absolutely will not screw up anything every time you publish it, put the `/public/` directory under version control, so you have a chance to see which files were changed before you publish the new site.
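The advice to delete the publish directory before building for publishing can be sketched in base R as follows. The helper `clean_public()` is hypothetical (it is not part of **blogdown**), and the temporary directory only imitates a site that still contains a previously rendered draft.

```r
# Remove a stale publish directory under a site root (hypothetical helper)
clean_public <- function(root = '.') {
  public <- file.path(root, 'public')
  unlink(public, recursive = TRUE)
  !dir.exists(public)  # TRUE if the directory is gone
}

# demonstration in a temporary "site" with a leftover draft page
site <- tempfile('site')
dir.create(file.path(site, 'public'), recursive = TRUE)
writeLines('<p>draft</p>', file.path(site, 'public', 'draft.html'))
clean_public(site)

# in a real project, follow with blogdown::build_site() or
# blogdown::hugo_build() to regenerate public/ for publishing
```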

## Functions in the blogdown package {#functions}

There are about 20 exported functions\index{Functions} in the **blogdown** package, and many more non-exported functions. Exported functions are documented and you can use them after `library(blogdown)` (or via `blogdown::`). Non-exported functions are not documented, but you can access them via `blogdown:::` (the triple-colon syntax). This package is not very complicated, and consists of only about 2000 lines of R code (the number is given by the word-counting command `wc`):

```{bash, comment=''}
wc -l ../R/*.R ../inst/scripts/*.R
```

You may take a look at the source code (https://github.com/rstudio/blogdown) if you want to know more about a non-exported function. In this section, we selectively list some exported and non-exported functions in the package for your reference.

### Exported functions

Installation: You can install Hugo with `install_hugo()`, and install a Hugo theme with `install_theme()`.

Wrappers of Hugo commands: `hugo_cmd()` is a general wrapper of `system2('hugo', ...)`, and all later functions execute specific Hugo commands based on this general wrapper function; `hugo_version()` executes the command `hugo version` (i.e., `system2('hugo', 'version')` in R); `hugo_build()` executes `hugo` with optional parameters; `new_site()` executes `hugo new site`; `new_content()` executes `hugo new` to create a new content file, and `new_post()` is a wrapper based on `new_content()` to create a new blog post with appropriate YAML metadata and filename; `hugo_convert()` executes `hugo convert`; `hugo_server()` executes `hugo server`.

Output format: `html_page()` is the only R Markdown output format function in the package. It inherits from `bookdown::html_document2()`, which in turn inherits from `rmarkdown::html_document()`. You need to read the documentation of these two functions to know the possible arguments. Section \@ref(output-format) has more detailed information about it.

Helper functions: `shortcode()` is a helper function to write a Hugo shortcode `{{% %}}` in an Rmd post; `shortcode_html()` writes out `{{< >}}`.

Serving a site: `serve_site()` starts a local web server to build and preview a site continuously; you can stop the server via `stop_server()`, or restart your R session.

Dealing with YAML metadata: `find_yaml()` can be used to find content files that contain a specified YAML field with specified values; `find_tags()` and `find_categories()` are wrapper functions based on `find_yaml()` to match specific tags and categories in content files, respectively; `count_yaml()` can be used to calculate the frequencies of specified fields.

### Non-exported functions

Some functions are not exported in this package because average users are unlikely to use them directly, and we list a subset of them below:

- You can find the path to the Hugo executable via `blogdown:::find_hugo()`. If the executable can be found via the `PATH` environment variable, it just returns `'hugo'`.

- The helper function `modify_yaml()` can be used to modify the YAML metadata of a file. It has a `...` argument that takes arbitrary YAML fields, e.g., `blogdown:::modify_yaml('foo.md', author = 'Frida Gomam', date = '2015-07-23')` will change the `author` field in the file `foo.md` to `Frida Gomam`, and `date` to `2015-07-23`. We have shown the advanced usage of this function in Section \@ref(from-jekyll).

- We have also mentioned a series of functions to clean up Markdown posts in Section \@ref(from-jekyll). They include `remove_extra_empty_lines()`, `process_bare_urls()`, `normalize_chars()`, `remove_highlight_tags()`, and `fix_img_tags()`.

- In Section \@ref(local-preview), we mentioned a caching mechanism based on the file modification time. It is implemented in `blogdown:::require_rebuild()`, which takes two arguments of filenames. The first file is the output file, and the second is the source file. When the source file is newer than the output file, or the output file does not exist or is empty, this function returns `TRUE`.
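To make the caching rule concrete, here is a minimal base R sketch of the idea: rebuild when the output is missing or empty, or when the source was modified more recently than the output. The function `needs_rebuild()` is a hypothetical stand-in written for illustration, not the actual source of `blogdown:::require_rebuild()`.

```r
# TRUE if the output must be regenerated from the source (a sketch only)
needs_rebuild <- function(output, source) {
  !file.exists(output) || file.size(output) == 0 ||
    file.mtime(source) > file.mtime(output)
}

src <- tempfile(fileext = '.Rmd')
out <- tempfile(fileext = '.html')
writeLines('Hello', src)
needs_rebuild(out, src)  # the output does not exist yet

writeLines('<p>Hello</p>', out)
Sys.setFileTime(src, Sys.time() - 3600)  # source older than output
needs_rebuild(out, src)

Sys.setFileTime(src, Sys.time() + 3600)  # source newer than output
needs_rebuild(out, src)
```

The three calls should return `TRUE`, `FALSE`, and `TRUE`, respectively.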

## Paths of figures and other dependencies {#dep-path}

One of the most challenging tasks in developing the **blogdown** package is to properly handle dependency files\index{Dependency Files} of web pages. If all pages of a website were plain text without dependencies like images or JavaScript libraries, it would be much easier for me to develop the **blogdown** package.

After **blogdown** compiles each Rmd document to HTML, it will try to detect the dependencies (if there are any) from the HTML source and copy them to the `static/` folder, so that Hugo will copy them to `public/` later. The detection depends on the paths of dependencies. By default, all dependencies, like R plots and libraries for HTML widgets, are generated to the `foo_files/` directory if the Rmd is named `foo.Rmd`. Specifically, R plots are generated to `foo_files/figure-html/` and the rest of files under `foo_files/` are typically from HTML widgets.

R plots under `content/*/foo_files/figure-html/` are copied to `static/*/foo_files/figure-html/`, and the paths in HTML tags like `<img src="foo_files/figure-html/bar.png" />` are substituted with `/*/foo_files/figure-html/bar.png`. Note the leading slash indicates the root directory of the published website, and the substitution works because Hugo will copy `*/foo_files/figure-html/` from `static/` to `public/`.

Any other files under `foo_files/` are treated as dependency files of HTML widgets, and will be copied to `static/rmarkdown-libs/`. The original paths in HTML will also be substituted accordingly, e.g., from `<script src="foo_files/jquery/jquery.min.js">` to `<script src="/rmarkdown-libs/jquery/jquery.min.js">`. It does not matter whether these files are generated by HTML widgets or not. The links on the published website will be correct and typically hidden from the readers of the pages.^[For example, a reader will not see the `<script>` tag on a page, so it does not really matter what its `src` attribute looks like as long as it is a path that actually exists.]
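The substitution of widget dependency paths can be imitated with a regular expression. The code below is only an illustration of the idea for a document named `foo.Rmd`; the actual implementation in **blogdown** is different and handles more cases. Paths under `foo_files/figure-html/` are deliberately left alone here, since R plots are processed separately as described above.

```r
# Illustrative only: rewrite paths of dependencies under foo_files/ (except
# figure-html/) so that they point to /rmarkdown-libs/
html <- c(
  '<script src="foo_files/jquery/jquery.min.js"></script>',
  '<img src="foo_files/figure-html/bar.png" />'
)
out <- gsub(
  '(src|href)="foo_files/(?!figure-html/)', '\\1="/rmarkdown-libs/',
  html, perl = TRUE
)
out
```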

You should not modify the **knitr** chunk option `fig.path` or `cache.path` unless the above process is completely clear to you, and you want to handle dependencies by yourself.

In the rare cases when **blogdown** fails to detect and copy some of your dependencies (e.g., you used a fairly sophisticated HTML widget package that writes out files to custom paths), you have two possible choices:

- Do not ignore `_files$` in the option `ignoreFiles` in `config.toml`, do not customize the `permalinks` option, and set the option `uglyURLs` to `true`. This way, **blogdown** will not substitute paths that it cannot recognize, and Hugo will copy these files to `public/`. The relative file locations of the `*.html` file and its dependencies will remain the same when they are copied to `public/`, so all links will continue to work.

- If you choose to ignore `_files$` or have customized the `permalinks` option, you need to make sure **blogdown** can recognize the dependencies. One approach is to use the path returned by the helper function `blogdown::dep_path()` to write out additional dependency files. Basically this function returns the current `fig.path` option in **knitr**, which defaults to `*_files/figure-html/`. For example, you can generate a plot manually under `dep_path()`, and **blogdown** will process it automatically (copy the file and substitute the image path accordingly).

If you do not understand all these technical details, we recommend that you use the first choice, and you will have to sacrifice custom permanent links and clean URLs (e.g., `/about.html` instead of `/about/`). With this choice, you can also customize the `fig.path` option for code chunks if you want.

## HTML widgets

We do not recommend that you use different HTML widgets\index{HTML Widgets} from many R packages on the same page, because it is likely to cause conflicts in JavaScript. For example, if your theme uses the jQuery library, it may conflict with the jQuery library used by a certain HTML widget. In this case, you can conditionally load the theme's jQuery\index{jQuery} library by setting a parameter in the YAML metadata of your post and revising the Hugo template that loads jQuery. Below is the example code to load jQuery conditionally in a Hugo template:

```html
{{ if not .Params.exclude_jquery }}
<script src="path/to/jquery.js"></script>
{{ end }}
```

Then if you set `exclude_jquery: true` in the YAML metadata of a post, the theme's jQuery will not be loaded, so there will not be conflicts when your HTML widgets also depend on jQuery.

Another solution is the [**widgetframe** package](https://github.com/bhaskarvk/widgetframe) [@R-widgetframe]. It solves this problem by embedding HTML widgets in `<iframe></iframe>`. Since an iframe is isolated from the main web page on which it is embedded, there will not be any JavaScript conflicts.

A widget is typically not of full width on the page. To set its width to 100%, you can use the chunk option `out.width = "100%"`.

## Version control

If your website source files are under version control\index{Version Control}, we recommend that you add at least these two folder names to your `.gitignore` file^[In these patterns, the leading `/` means the pattern is matched only relative to the directory of the `.gitignore` file instead of recursively in subdirectories. See more about `.gitignore` patterns at <https://git-scm.com/book/en/v2/Git-Basics-Recording-Changes-to-the-Repository#_ignoring>.]:

```bash
/blogdown/
/public/
```

The `blogdown/` directory is used to store cache files, and they are most likely to be useless to the published website. Only **knitr** may use them, and the published website will not depend on these files.

The `public/` directory should be ignored if your website is going to be automatically (re)built on a remote server such as Netlify.

As we mentioned in Section \@ref(dep-path), R plots will be copied to `static/`, so you may see new files in GIT after you render an Rmd file that has graphics output. You need to add and commit these new files in GIT, because the website will use them.

Although it is not relevant to **blogdown**, macOS users should remember to ignore `.DS_Store` and Windows users should ignore `Thumbs.db`.

If you are relatively familiar with GIT\index{GIT Submodules}, there is a special technique that may be useful for you to manage Hugo themes, which is called "GIT submodules." A submodule in GIT allows you to manage a particular folder of the main repository using a different remote repository. For example, if you used the default `hugo-lithium` from my GitHub repository, you might want to sync it with my repository occasionally, because I may update it from time to time. You can add the GIT submodule via the command line: 

```bash
git submodule add \
  https://github.com/yihui/hugo-lithium.git \
  themes/hugo-lithium
```

If the folder `themes/hugo-lithium` exists, you need to delete it before adding the submodule. Then you can see a SHA string associated with the "folder" `themes/hugo-lithium` in the GIT status of your main repository indicating the version of the submodule. Note that you will only see the SHA string instead of the full content of the folder. Next time when you want to sync with my repository, you may run the command:

```bash
git submodule update --recursive --remote
```

In general, if you are happy with how your website looks, you do not need to manage the theme using GIT submodules. Future updates in the upstream repository may not really be what you want. In that case, a physical and fixed copy of the theme is more appropriate for you.

## The default HTML template {#default-template}

As we mentioned in Section \@ref(output-format), the default output format for an Rmd document in **blogdown** is `blogdown::html_page`. This format passes a minimal HTML template to Pandoc\index{Pandoc} by default:

```{r default-template, engine='cat', code=readLines(blogdown:::pkg_file('resources', 'template-minimal.html')), class.source='html'}
```

You can find this template file via `blogdown:::pkg_file('resources', 'template-minimal.html')` in R, and this file path is the default value of the `template` argument of `html_page()`. You may change this default template, but you should understand what this template is supposed to do first.

If you are familiar with Pandoc templates, you should realize that this is not a complete HTML template, e.g., it does not have the tags `<html>`, `<head>`, or `<body>`. That is because we do not need or want Pandoc to return a full HTML document to us. The main thing we want Pandoc to do is to convert our Markdown document to HTML, and give us the body of the HTML document, which is in the template variable `$body$`. Once we have the body, we can further pass it to Hugo, and Hugo will use its own template to embed the body and generate the full HTML document. Let's explain this by a minimal example. Suppose we have an R Markdown document `foo.Rmd`:

```markdown
---
title: "Hello World"
author: "Yihui Xie"
---

I found a package named **blogdown**.
```

It is first converted to an HTML file `foo.html` through `html_page()`, and note that YAML metadata are ignored for now:

```html
I found a package named <strong>blogdown</strong>.
```

Then **blogdown** will read the YAML metadata of the Rmd source file, and insert the metadata into the HTML file so it becomes:

```html
---
title: "Hello World"
author: "Yihui Xie"
---

I found a package named <strong>blogdown</strong>.
```

This is the file to be picked up by Hugo and eventually converted to an HTML page of a website. Since the Markdown body has been processed to HTML by Pandoc, Hugo will basically use the HTML. That is how we bypass Hugo's Markdown engine BlackFriday.
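The step of inserting the metadata can be imitated in a few lines of base R. This is only a sketch of the idea using the `foo.Rmd` example above, under the assumption that the metadata is delimited by the first two `---` lines; it is not how **blogdown** actually implements this step.

```r
# Sketch: copy the YAML metadata of an Rmd source to the top of the HTML body
rmd <- c(
  '---', 'title: "Hello World"', 'author: "Yihui Xie"', '---', '',
  'I found a package named **blogdown**.'
)
body <- 'I found a package named <strong>blogdown</strong>.'

# the metadata is delimited by the first two lines that consist of ---
i <- which(rmd == '---')[1:2]
yaml <- rmd[i[1]:i[2]]

html <- c(yaml, '', body)
cat(html, sep = '\n')
```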

Besides `$body$`, you may have noticed other Pandoc template variables like `$header-includes$`, `$css$`, `$include-before$`, `$toc$`, and `$include-after$`. These variables make it possible to customize the `html_page` format. For example, if you want to generate a table of contents, and apply an additional CSS stylesheet to a certain page, you may set `toc` to `true` and pass the stylesheet path to the `css` argument of `html_page()`, e.g.,

```yaml
---
title: "Hello World"
author: "Yihui Xie"
output:
  blogdown::html_page:
    toc: true
    css: "/css/my-style.css"
---
```

## Different building methods {#methods}

If your website does not contain any Rmd files, it is very straightforward to render it --- just a system call to the `hugo` command. When your website contains Rmd files, **blogdown** has provided two rendering methods to compile these Rmd files. A website can be built using the function `blogdown::build_site()`:

```{r eval=FALSE, code=formatR::usage(blogdown::build_site, output=FALSE), tidy=FALSE}
```

As mentioned in Section \@ref(global-options), the default value of the `method` argument is determined by the global option `blogdown.method`, and you can set this option in `.Rprofile`.

For `method = 'html'`, `build_site()` renders `*.Rmd` to `*.html`, and `*.Rmarkdown` to `*.markdown`, and keeps the `*.html`/`*.markdown` output files under the same directory as `*.Rmd`/`*.Rmarkdown` files.

An Rmd file may generate two directories for figures (`*_files/`) and cache (`*_cache/`), respectively, if you have plot output or HTML widgets [@R-htmlwidgets] in your R code chunks, or enabled the chunk option `cache = TRUE` for caching. In the figure directory, there will be a subdirectory `figure-html/` that contains your plot output files, and possibly other subdirectories containing HTML dependencies from HTML widgets (e.g., `jquery/`). The figure directory is moved to `/static/`, and the cache directory is moved to `/blogdown/`.

After you run `build_site()`, your website is ready to be compiled by Hugo. This gives you the freedom to use deploying services like Netlify (Chapter \@ref(deployment)), where neither R nor **blogdown** is available, but Hugo is.

For `method = 'custom'`, `build_site()` will not process any R Markdown files, nor will it call Hugo to build the site. No matter which method you choose to use, `build_site()` will always look for an R script `/R/build.R` and execute it if it exists. This gives you the complete freedom to do anything you want for the website. For example, you can call `knitr::knit()` to compile Rmd to Markdown (`*.md`) in this R script instead of using `rmarkdown::render()`. This feature is designed for advanced users who are really familiar with the **knitr** package^[Honestly, it was originally designed for Yihui himself to build his own website, but he realized this feature could actually free users from Hugo. For example, it is possible to use Jekyll (another popular static site generator) with **blogdown**, too.] and Hugo or other static website generators (see Chapter \@ref(other-generators)).

When `R/build.R` exists and `method = 'html'`, the R Markdown files are knitted first, then the script `R/build.R` is executed, and lastly Hugo is called to build the website.
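For reference, a script `R/build.R` for `method = 'custom'` might look like the sketch below, which knits Rmd files to Markdown with `knitr::knit()`. The function name `knit_all()`, the `content/` directory, and the choice to knit each file in its own directory are assumptions made for illustration; adapt them to your own site and generator.

```r
# A sketch of R/build.R: knit all Rmd files under a directory to Markdown
knit_all <- function(dir = 'content') {
  files <- list.files(dir, '\\.Rmd$', recursive = TRUE, full.names = TRUE)
  for (f in files) {
    owd <- setwd(dirname(f))  # knit in the directory of the source file
    tryCatch(
      # foo.Rmd is knitted to foo.md by default
      knitr::knit(basename(f), quiet = TRUE, envir = new.env()),
      finally = setwd(owd)
    )
  }
  invisible(sub('\\.Rmd$', '.md', files))  # paths of the Markdown output
}

# only run when knitr is installed and the directory exists
if (requireNamespace('knitr', quietly = TRUE) && dir.exists('content')) {
  knit_all()
}
```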