Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle duplicate numeric footnotes via pandoc --file-scope #897

Merged
merged 16 commits into from Jun 17, 2020

Conversation

jjallaire
Copy link
Member

@jjallaire jjallaire commented Jun 4, 2020

Numeric footnotes duplicated across chapters can now automatically renumbered. This is done by setting the pandoc_file_scope option to true in _bookdown.yml, which results in the use of the --file-scope (https://pandoc.org/MANUAL.html#option--file-scope) argument to pandoc (and having it operate on split out individual chapters of the target .md file rather than a combined file).

See also rstudio/rmarkdown#1837 (which this PR depends on)

…meter.

Numeric footnotes duplicated across chapters are now automatically renumbered. This is done by passing the `--file-scope` argument to pandoc (and having it operate on split out individual chapters of the target .md file rather than a combined file). This behavior can be toggled off by setting `pandoc_file_scope` to `false` in `_bookdown.yml`.

See also rstudio/rmarkdown#1837 (which this PR depends on)
@jjallaire jjallaire requested a review from yihui June 4, 2020 18:07
@jjallaire
Copy link
Member Author

A consequence of using --file-scope is that reference links (https://pandoc.org/MANUAL.html#reference-links) and footnotes are no longer resolved globally (so you can't e.g. reference a named footnote or a named link across chapters). I don't know how common this is, but it would represent breakage of existing books.

As a result I think that this behavior should be disabled by default. I'll make that change to the PR.

Copy link
Member

@yihui yihui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good. One question: with --file-scope, do internal links still work? https://pandoc.org/MANUAL.html#internal-links

@jjallaire
Copy link
Member Author

The first form works -- [Introduction](#introduction), however the second form -- [Introduction], does not work.

To me this implies that I should not make this the default in the project template, as I think users should explicitly opt in to this behavior and understand the pros/cons rather than have it set by default. I'll update the PR.

@jjallaire
Copy link
Member Author

Some more thoughts on how we could make this work better.

  1. Don't enable pandoc_file_scope by default in new projects, but do a trial run pandoc render (to JSON) and capture the footnote warning message and in that case print a message to the user advising them to enable pandoc_file_scope.

  2. Have RStudio detect the situation and do the same advising/prompt.

  3. Implement an "auto" mode (and make it the default) whereby we'd do a trial run render to JSON (again looking for the warning message) and then use ---file-scope if needed. This would effectively mean that if you use duplicate footnotes across chapters you are implicitly telling us to use --file-scope (and accept that shorthand reference links won't work across chapters). This would only "break" existing books that (a) Contain duplicate footnotes so are already broken in some fashion; and (b) Use naked reference links (which is already a bad idea in a long form manuscript, esp. with the ability to use \@ref or explicit ids to create more resilient reference links)

  4. I haven't thought this all the way through, but perhaps we could fix footnotes as follows (note that in the Pandoc JSON AST all footnotes are inline -- i.e. there are no footnote identifiers in the AST even if they were used in the source document):

    • Render to JSON, looking for the warning about duplicate footnotes
    • If we see the warning, render to JSON again but this time with --file-scope. Now we have 2 JSON documents the first of which has a bunch of duplicate footnotes (I believe the last instance of a duplicated footnotes is used for all instances of the footnote id in the "broken" document).
    • Re-write the AST of the first document with the "fixed" footnotes from the second document.
      I guess one bit of leakage in this scenario would be that naked reference links to other chapters inside footnotes wouldn't work, but that's a pretty narrow case.

My biggest reservation about (4) is that there are some scenarios where it doesn't work at all (and thus all footnotes are off by one, etc.). I can't say what these scenarios would be, just that there is some risk that they exist. I guess this risk is mitigated a bit by the fact that we'd only do it if we saw the duplicate footnote warning in the first place.

I prefer (3) or (4) as solutions as they require less user fiddling/complexity/failure modes. Let me know what you think....

@jjallaire
Copy link
Member Author

@jcheng5 and I talked realtime about this today and agreed that the following scheme should work with very little risk of error or disruption of existing bookdown projects:

  1. Have bookdown pass the pandoc_file_scope argument to rmarkdown::render() by default (this can be turned off in _bookdown.yml).

  2. When receiving this argument, rmarkdown will not always use --file-scope, rather it will first determine whether duplicate footnotes exists in the target manuscript (via the warning that pandoc emits). If they do, then it will resolve those using the scheme suggested in (4) above. This means that existing bookdown manuscripts that either use inline/named references, or use numeric references w/o duplicates will not get any different rendering behavior. Note that any existing manuscript currently not meeting these conditions is effectively already broken (has duplicate notes)

  3. A couple refinements to (4) to make it more robust will be: (a) To only substitute footnotes that are in fact duplicated/incorrect; and (b) To do no substitution at all if the total number of footnotes in the two documents is different.

I think all of this nets out to the following:

  • Today you can't really use numeric footnotes in bookdown (unless you are willing to do global renumbering --- I doubt anyone is actually doing this)

  • Now you can use numeric footnotes, but when you do, you just need to realize that naked reference links (e.g. [Introduction]) won't resolve across chapters (this not being considered a huge give b/c bookdown documentation already advises against this and offer \@ref links as well).

@yihui If this sounds okay to you I'll update the PR (and possibly find a better name for the argument/option) and then put the logic described in (4) into rmarkdown.

@hadley
Copy link
Member

hadley commented Jun 12, 2020

@jjallaire that sounds good to me too

rmarkdown will be updated to only use `--file-scope` if it actually detects duplicate footnote identifiers
@jjallaire
Copy link
Member Author

Okay, I have changed the name of the parameter to references_scope (to reflect that it only comes into play w/ references) and further clarified in the docs that this will only trigger the use of --file-scope when duplicate references are detected. This option is now enabled by default (and can be turned off in _bookdown.yml as it should not affect the rendering of existing bookdown manuscripts that use inline and/or named footnote identifiers.

@jjallaire
Copy link
Member Author

I've been struggling w/ the naming/semantics of the argument/option and I think I have a better approach. The crux of the problem is that bookdown is providing something very general to rmarkdown::render() (a function to split its input markdown into a set of named files) while the actual consumption of that construct is associated with a very specific behavior.

I think we need to separate these concepts:

  1. Any caller of rmarkdown::render() can optionally provide a function to split the input markdown into multiple named files. This function is provided as a part of the output_format definition (which includes all of the existing hooks, knitr and pandoc options, behavioral stuff like keep_md, etc.). bookdown would always provide this function (as the mere providing of it won't trigger any behavior).

  2. Then, if we want to trigger behavior we can have more aptly named output format options that map to exactly what's being done. This could accommodate multiple strategies for resolving duplicate footnotes --- e.g. option (3) above is essentially equivalent to using raw pandoc --file-scope whereas option (4) tries to dance around the limitations of --file-scope.

@jjallaire
Copy link
Member Author

I just pushed a new scheme which I think is much more sensible. In summary:

  1. File scope and renumbering concerns are now handled as part of the underlying rmarkdown::output_format() (which is where similar concerns are already handled) rather than an argument to render() (which always felt weird to me)

  2. bookdown always provides a file_scope handler as part of its output format definitions (this is the function used to unroll chapters from an aggregated book file). Note that nothing is actually done w/ this function by default, but it's nice that it's there for other features we might come up with in the future that require it.

  3. There is a new renumber_footnotes output format option (potentially applicable to any rmarkdown format that passes a file_scope). When this option is present and duplicate footnote ids are detected, then rmarkdown::render() will use the pandoc --file-scope option. The idea is that by requesting footnote renumbering you are acknowledging that pandoc's way of doing that is --file-scope and you are okay with those semantics.

  4. Bookdown formats set renumber_footnotes = TRUE by default (not that this can be enabled/disabled on a per-format or project-wide basis via including a renumber_footnotes entry in _bookdown.yml). I think it's very important that we enable renumber_footnotes by default, as without that anyone authoring in visual mode will need to discover and enable this option to get footnotes to work correctly. Note that I don't expect this to have a negative impact on existing manuscripts (see next point for discussion of this).

  5. It's already a constraint of bookdown manuscripts that duplicate footnote identifiers aren't handled correctly (i.e. any manuscript w/ duplicates is getting warnings and has incorrect footnotes). The renumber_footnotes option only results in --file-scope if there are duplicates, meaning that it won't happen for well formed existing manuscripts. This is all the more likely considering that the existing constraint leads authors to use inline footnotes (or possibly named footnotes, for which they'd be getting warnings about any duplicates).

  6. Note that this implementation is the less aggressive option 3 from the choices I presented previously. I chose this b/c implementing option 4 was going to be be considerably more complex/risky than I thought it would, and also because to me it seems cleaner to just map to pandoc's underlying semantics than to create a new set of semantics enabled by a special hack.

In this new scheme, we simply explain to users that they can use duplicated numeric footnotes if they'd like to (note that w/ the visual editor this is currently the only option) and they'll be handled correctly subject to the constraints of --file-scope. In turn these "constraints" basically amount to following bookdown documentation recommend practice for reference links in a large manuscript (always link to an explicit id e.g. #foo rather than rely on fragile implicit header linking).

@jjallaire
Copy link
Member Author

I made the change to use just file_scope (and do so by default). There is one test failure occurring though related to footnotes & splitting by section:

* checking tests ...
  Running ‘test-all.R’
  Running ‘test-rmd.R’ ERROR
Running the tests in ‘tests/test-rmd.R’ failed.
Last 13 lines of output:
  +   }
  +   ## footnote two is moved to second section
  +   if (!any(readLines('rmd/subsection-footnotes-1.html') == '<div class="footnotes">') ||
  +       !any(grepl('id="fn2"', readLines('rmd/subsection-footnotes-1.html')))) {
  +     stop('Failed to move the footnotes back to subsection 1 in parse_footnotes.Rmd')
  +   }
  +   # multiline footnote is also moved
  +   if (!any(readLines('rmd/subsection-footnotes-1.html') == '<div class="footnotes">') ||
  +       !any(grepl('id="fn3"', readLines('rmd/subsection-footnotes-1.html')))) {
  +     stop('Failed to move the footnotes back to subsection 1 in parse_footnotes.Rmd')
  +   }
  + })
  Error in names[[idx]] : subscript out of bounds
  Calls: local ... eval -> eval -> <Anonymous> -> convert -> <Anonymous>
  Execution halted

Copy link
Member

@yihui yihui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@yihui yihui merged commit 593668b into master Jun 17, 2020
@yihui yihui deleted the feature/pandoc-file-scope branch June 17, 2020 19:22
@jtbayly
Copy link
Contributor

jtbayly commented Aug 14, 2020

This is fantastic work! Thanks everybody. I've been running a previous version of bookdown and just discovered this problem. And here you've already fixed it! 👍

@yihui
Copy link
Member

yihui commented Aug 14, 2020

I meant to make a new CRAN release as soon as possible but haven't gotten the time yet. I'll try next week.

@jtbayly
Copy link
Contributor

jtbayly commented Oct 7, 2020

Ugh. I didn't realize this had been removed. Isn't it possible to leave this in but check for duplicate section titles and disable --file-scope if they are found?

I'm really at a loss right now about how to handle footnotes. By far the most obvious way to do them is to number them per chapter, but right now in the latest dev version, that outputs a book with totally screwed up footnotes (edit: with warning).

@yihui
Copy link
Member

yihui commented Oct 12, 2020

@jtbayly It was not removed, but disabled by default: 285cba4 You can re-enable it through the global option by yourself if you want, but I'm not sure why you would want to do that. Our original motivation to add this feature is no longer there: the R Markdown visual editor in RStudio supports footnotes well, so we don't really need this feature ourselves.


Update: Oh I just saw https://community.rstudio.com/t/how-can-i-enable-the-file-scope-option/83741/7.

@jtbayly
Copy link
Contributor

jtbayly commented Oct 12, 2020

Yeah, I was going to throw that link up. One other thing to consider: most of my editors won't be using RStudio. The beauty of markdown being that it can be edited in any text editor, and all that. :)

linogaliana pushed a commit to InseeFrLab/utilitr-bonnes-pratiques that referenced this pull request Jan 20, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 14, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants