Links in epub and epubcheck #766

Closed
3 tasks done
muschellij2 opened this issue Aug 28, 2019 · 4 comments · Fixed by #1426
Labels
bug an unexpected problem or unintended behavior

Comments

@muschellij2

By filing an issue to this repo, I promise that

  • I have fully read the issue guide at https://yihui.name/issue/.
  • I have provided the necessary information about my issue.
    • If I'm asking a question, I have already asked it on Stack Overflow or RStudio Community, waited for at least 24 hours, and included a link to my question there.
    • If I'm filing a bug report, I have included a minimal, self-contained, and reproducible example, and have also included xfun::session_info('bookdown'). I have upgraded all my packages to their latest versions (e.g., R, RStudio, and R packages), and also tried the development version: remotes::install_github('rstudio/bookdown').
    • If I have posted the same issue elsewhere, I have also mentioned it in this issue.
  • I have learned the Github Markdown syntax, and formatted my issue correctly.

I understand that my issue may be closed if I don't fulfill my promises.

This was originally posted at rstudio/bookdown-demo#42, but it probably belongs here. I will look into pandoc and bookdown to see if I can diagnose the problem.

Clone the Repo

library(git2r)
library(bookdown)
local_path = "bookdown-demo"
git2r::clone("https://github.com/rstudio/bookdown-demo.git",
             local_path = local_path)
#> cloning into 'bookdown-demo'...
#> Receiving objects:   1% (6/530),    9 kb
#> Receiving objects:  11% (59/530),   17 kb
#> Receiving objects:  21% (112/530),  121 kb
#> Receiving objects:  31% (165/530),  321 kb
#> Receiving objects:  41% (218/530),  409 kb
#> Receiving objects:  51% (271/530),  473 kb
#> Receiving objects:  61% (324/530),  545 kb
#> Receiving objects:  71% (377/530),  577 kb
#> Receiving objects:  81% (430/530),  585 kb
#> Receiving objects:  91% (483/530),  593 kb
#> Receiving objects: 100% (530/530),  723 kb, done.
#> Local:    master /private/var/folders/1s/wrtqcpxn685_zk570bnx9_rr0000gr/T/Rtmpe46Igu/reprex95b433c7d4f8/bookdown-demo
#> Remote:   master @ origin (https://github.com/rstudio/bookdown-demo.git)
#> Head:     [4e34630] 2018-10-22: Add now.json and Dockerfile for building HTML book and deploy to now.sh (#36)
setwd(local_path)
epub_file = bookdown::render_book(
  "index.Rmd",
  bookdown::epub_book())
#> processing file: bookdown-demo.Rmd
#> output file: bookdown-demo.knit.md
#> /usr/local/bin/pandoc +RTS -K512m -RTS bookdown-demo.utf8.md --to epub3 --from markdown+autolink_bare_uris+ascii_identifiers+tex_math_single_backslash --output bookdown-demo.epub --number-sections --filter /usr/local/bin/pandoc-citeproc
#> 
#> Output created: _book/bookdown-demo.epub
epub_file = normalizePath(epub_file)

This function fixes one specific id, which is hard-coded:

fix_one_id = function(epub_file) {
  # extract the EPUB (a zip archive) into a temporary directory
  epub_dir = tempfile()
  dir.create(epub_dir, recursive = TRUE)
  # keep the archive's own file list so it can be re-zipped in the same order
  epub_files = unzip(epub_file, list = TRUE)$Name
  unzip(epub_file, exdir = epub_dir)
  
  all_xhtml = list.files(
    pattern = "\\.xhtml$", 
    path = file.path(epub_dir, "EPUB", "text"),
    recursive = FALSE, full.names = TRUE)
  
  # hard-coded: the figure is in the second chapter file
  ifile = all_xhtml[2]
  # for (ifile in all_xhtml) {
  x = readLines(ifile)
  # replace the line just before the one mentioning file0 (the figure image)
  # with a figure <div> that carries the missing id
  x[grep("file0", x)-1] = paste0(
    '<div class="figure" style="text-align: center" ', 
    'id="fig:nice-fig">')
  writeLines(x, ifile)
  # }
  owd = getwd()
  on.exit({
    setwd(owd)
  })
  # re-zip from inside the extracted directory so the paths stay relative
  setwd(epub_dir)
  new_epub = tempfile(fileext = ".epub")
  zip(new_epub, files = epub_files)
  # file.copy(new_epub, epub_file, overwrite = TRUE)
  return(new_epub)
}

Simple epub checker function

The epubcheck() R function captures the output of the epubcheck command-line tool.

epubcheck = function(epub_file) {
  res = system2("epubcheck", epub_file, stdout = TRUE, stderr = TRUE)
  res
}

Then num_errors() extracts the number of errors from that output:

num_errors = function(out) {
  out = grep("Messages", out, value = TRUE)
  out = sub(".* (.*) errors.*", "\\1", out)
  as.numeric(out)
}
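A slightly more general variant, offered only as a sketch: the hypothetical epubcheck_counts() parses every count on the "Messages:" summary line instead of only the errors. It assumes the summary keeps the "N fatals / N errors / N warnings / N infos" shape that epubcheck prints in this issue.

```r
# Hypothetical helper (not part of bookdown or epubcheck): parse the summary
# line, e.g. "Messages: 0 fatals / 5 errors / 0 warnings / 0 infos",
# into a named integer vector. Assumes that exact summary format.
epubcheck_counts = function(out) {
  msg = grep("^Messages:", out, value = TRUE)
  if (length(msg) == 0) stop("no 'Messages:' line found in epubcheck output")
  msg = msg[length(msg)]
  # capture "<number> <kind>" pairs such as "5 errors"
  pairs = regmatches(msg, gregexpr("[0-9]+ [a-z]+", msg))[[1]]
  counts = as.integer(sub(" .*", "", pairs))
  names(counts) = sub("^[0-9]+ ", "", pairs)
  counts
}
```

With the result below, epubcheck_counts(result)[["errors"]] would give the same number as num_errors(result).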

Test output

Here we see that epubcheck reports 5 errors for the original file:

result = epubcheck(epub_file)
#> Warning in system2("epubcheck", epub_file, stdout = TRUE, stderr
#> = TRUE): running command ''epubcheck' /private/var/folders/1s/
#> wrtqcpxn685_zk570bnx9_rr0000gr/T/Rtmpe46Igu/reprex95b433c7d4f8/bookdown-
#> demo/_book/bookdown-demo.epub 2>&1' had status 1
result
#>  [1] "Validating using EPUB version 3.2 rules."                                                                                                                                    
#>  [2] "ERROR(RSC-005): ./bookdown-demo/_book/bookdown-demo.epub/EPUB/nav.xhtml(19,9): Error while parsing file: element \"ol\" incomplete; missing required element \"li\""         
#>  [3] "ERROR(RSC-005): ./bookdown-demo/_book/bookdown-demo.epub/EPUB/text/ch002.xhtml(87,74): Error while parsing file: value of attribute \"width\" is invalid; must be an integer"
#>  [4] "ERROR(RSC-012): ./bookdown-demo/_book/bookdown-demo.epub/EPUB/text/ch002.xhtml(82,247): Fragment identifier is not defined."                                                 
#>  [5] "ERROR(RSC-012): ./bookdown-demo/_book/bookdown-demo.epub/EPUB/text/ch002.xhtml(92,123): Fragment identifier is not defined."                                                 
#>  [6] "ERROR(RSC-012): ./bookdown-demo/_book/bookdown-demo.epub/EPUB/text/ch002.xhtml(92,252): Fragment identifier is not defined."                                                 
#>  [7] ""                                                                                                                                                                            
#>  [8] "Check finished with errors"                                                                                                                                                  
#>  [9] "Messages: 0 fatals / 5 errors / 0 warnings / 0 infos"                                                                                                                        
#> [10] ""                                                                                                                                                                            
#> [11] "EPUBCheck completed"                                                                                                                                                         
#> attr(,"status")
#> [1] 1
num_errors(result)
#> [1] 5
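To see exactly which links are broken, the RSC-012 messages can be turned into a small table of file, line and column. epubcheck_locations() is a hypothetical helper, and it assumes the "ERROR(CODE): path(line,col): message" layout shown in the output above.

```r
# Hypothetical helper: extract file, line and column from epubcheck messages
# with a given code (RSC-012 = "Fragment identifier is not defined").
# Assumes messages look like "ERROR(CODE): path(line,col): text".
epubcheck_locations = function(out, code = "RSC-012") {
  hits = grep(sprintf("^ERROR\\(%s\\)", code), out, value = TRUE)
  pattern = "^ERROR\\([^)]+\\): (.*)\\(([0-9]+),([0-9]+)\\): .*$"
  data.frame(
    file   = sub(pattern, "\\1", hits),
    line   = as.integer(sub(pattern, "\\2", hits)),
    column = as.integer(sub(pattern, "\\3", hits)),
    stringsAsFactors = FALSE
  )
}
```

For example, epubcheck_locations(result) would list the three ch002.xhtml positions that point to undefined fragments.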

After adding the id, we get only 4 errors (one fixed):

fixed = fix_one_id(epub_file)
new_result = epubcheck(fixed)
#> Warning in system2("epubcheck", epub_file, stdout = TRUE,
#> stderr = TRUE): running command ''epubcheck' /var/folders/1s/
#> wrtqcpxn685_zk570bnx9_rr0000gr/T//RtmpRWHDVK/file993116f24b23.epub 2>&1'
#> had status 1
new_result
#>  [1] "Validating using EPUB version 3.2 rules."                                                                                                                                                                              
#>  [2] "ERROR(RSC-005): /var/folders/1s/wrtqcpxn685_zk570bnx9_rr0000gr/T//RtmpRWHDVK/file993116f24b23.epub/EPUB/nav.xhtml(19,9): Error while parsing file: element \"ol\" incomplete; missing required element \"li\""         
#>  [3] "ERROR(RSC-005): /var/folders/1s/wrtqcpxn685_zk570bnx9_rr0000gr/T//RtmpRWHDVK/file993116f24b23.epub/EPUB/text/ch002.xhtml(87,74): Error while parsing file: value of attribute \"width\" is invalid; must be an integer"
#>  [4] "ERROR(RSC-012): /var/folders/1s/wrtqcpxn685_zk570bnx9_rr0000gr/T//RtmpRWHDVK/file993116f24b23.epub/EPUB/text/ch002.xhtml(82,247): Fragment identifier is not defined."                                                 
#>  [5] "ERROR(RSC-012): /var/folders/1s/wrtqcpxn685_zk570bnx9_rr0000gr/T//RtmpRWHDVK/file993116f24b23.epub/EPUB/text/ch002.xhtml(92,252): Fragment identifier is not defined."                                                 
#>  [6] ""                                                                                                                                                                                                                      
#>  [7] "Check finished with errors"                                                                                                                                                                                            
#>  [8] "Messages: 0 fatals / 4 errors / 0 warnings / 0 infos"                                                                                                                                                                  
#>  [9] ""                                                                                                                                                                                                                      
#> [10] "EPUBCheck completed"                                                                                                                                                                                                   
#> attr(,"status")
#> [1] 1
num_errors(new_result)
#> [1] 4

Created on 2019-08-28 by the reprex package (v0.3.0)

Session info
devtools::session_info()
#> ─ Session info ──────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 3.6.0 (2019-04-26)
#>  os       macOS Mojave 10.14.6        
#>  system   x86_64, darwin15.6.0        
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       America/New_York            
#>  date     2019-08-28                  
#> 
#> ─ Packages ──────────────────────────────────────────────────────────────
#>  package     * version     date       lib source
#>  assertthat    0.2.1       2019-03-21 [1] CRAN (R 3.6.0)
#>  backports     1.1.4       2019-04-10 [1] CRAN (R 3.6.0)
#>  bookdown    * 0.11        2019-05-28 [1] CRAN (R 3.6.0)
#>  callr         3.3.1       2019-07-18 [1] CRAN (R 3.6.0)
#>  cli           1.1.0       2019-03-19 [1] CRAN (R 3.6.0)
#>  crayon        1.3.4       2017-09-16 [1] CRAN (R 3.6.0)
#>  curl          4.0         2019-07-22 [1] CRAN (R 3.6.0)
#>  desc          1.2.0       2019-07-10 [1] Github (muschellij2/desc@b0c374f)
#>  devtools      2.1.0       2019-07-06 [1] CRAN (R 3.6.0)
#>  digest        0.6.20      2019-07-04 [1] CRAN (R 3.6.0)
#>  evaluate      0.14        2019-05-28 [1] CRAN (R 3.6.0)
#>  fs            1.3.1       2019-05-06 [1] CRAN (R 3.6.0)
#>  git2r       * 0.26.1      2019-06-29 [1] CRAN (R 3.6.0)
#>  glue          1.3.1       2019-03-12 [1] CRAN (R 3.6.0)
#>  highr         0.8         2019-03-20 [1] CRAN (R 3.6.0)
#>  htmltools     0.3.6       2017-04-28 [1] CRAN (R 3.6.0)
#>  httr          1.4.1       2019-08-05 [1] CRAN (R 3.6.0)
#>  knitr         1.24        2019-08-08 [1] CRAN (R 3.6.0)
#>  magrittr      1.5         2014-11-22 [1] CRAN (R 3.6.0)
#>  memoise       1.1.0       2017-04-21 [1] CRAN (R 3.6.0)
#>  mime          0.7         2019-06-11 [1] CRAN (R 3.6.0)
#>  pkgbuild      1.0.3       2019-03-20 [1] CRAN (R 3.6.0)
#>  pkgload       1.0.2       2018-10-29 [1] CRAN (R 3.6.0)
#>  prettyunits   1.0.2       2015-07-13 [1] CRAN (R 3.6.0)
#>  processx      3.4.1       2019-07-18 [1] CRAN (R 3.6.0)
#>  ps            1.3.0       2018-12-21 [1] CRAN (R 3.6.0)
#>  R6            2.4.0       2019-02-14 [1] CRAN (R 3.6.0)
#>  Rcpp          1.0.2       2019-07-25 [1] CRAN (R 3.6.0)
#>  remotes       2.1.0       2019-06-24 [1] CRAN (R 3.6.0)
#>  rlang         0.4.0       2019-06-25 [1] CRAN (R 3.6.0)
#>  rmarkdown     1.14        2019-07-12 [1] CRAN (R 3.6.0)
#>  rprojroot     1.3-2       2018-01-03 [1] CRAN (R 3.6.0)
#>  rstudioapi    0.10.0-9000 2019-07-30 [1] Github (rstudio/rstudioapi@31d1afa)
#>  sessioninfo   1.1.1       2018-11-05 [1] CRAN (R 3.6.0)
#>  stringi       1.4.3       2019-03-12 [1] CRAN (R 3.6.0)
#>  stringr       1.4.0       2019-02-10 [1] CRAN (R 3.6.0)
#>  testthat      2.1.1       2019-04-23 [1] CRAN (R 3.6.0)
#>  usethis       1.5.1.9000  2019-08-15 [1] local
#>  withr         2.1.2       2018-03-15 [1] CRAN (R 3.6.0)
#>  xfun          0.8         2019-06-25 [1] CRAN (R 3.6.0)
#>  xml2          1.2.1       2019-07-29 [1] CRAN (R 3.6.0)
#>  yaml          2.2.0       2018-07-25 [1] CRAN (R 3.6.0)
#> 
#> [1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library
@cderv
Collaborator

cderv commented May 4, 2023

I finally had the opportunity to look more closely at this. We indeed do not add the id when resolving references for epub using our 'not so great' md resolution.

Here is what happens internally (FYI @yihui, as you may have thoughts):

We resolve references for epub in process_markdown() with a trick:

  • We render the input md document to HTML and look for the resolution there

    bookdown/R/ebook.R

    Lines 92 to 94 in feca9bc

    x = read_utf8(intermediate_html)
    x = clean_html_tags(x)
    figs = parse_fig_labels(x, global)
  • Based on what we resolved in the HTML, we replace each (#fig:id) found in the input md file with the correct label and number:
    content[i] = resolve_refs_md(content[i], c(figs$ref_table, parse_section_labels(x)), to_md)
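To make the "look for the resolution in the HTML" step concrete, here is a toy version of it. toy_ref_table() is hypothetical and far simpler than bookdown's parse_fig_labels(): it assumes labels appear as (#fig:...) or (#tab:...) in the rendered HTML and that everything belongs to a single chapter.

```r
# Toy illustration (hypothetical, not bookdown's parse_fig_labels()):
# find caption labels such as "(#fig:nice-fig)" in HTML lines and number
# them per type (figures and tables counted separately), in order of
# appearance, prefixed with an assumed chapter number.
toy_ref_table = function(html, chapter = 1) {
  m = regmatches(html, gregexpr("\\(#((fig|tab):[-[:alnum:]]+)\\)", html))
  labs = sub("^\\(#(.*)\\)$", "\\1", unlist(m))
  type = sub(":.*", "", labs)
  # running count within each type: 1, 2, ... for fig, separately for tab
  num = ave(seq_along(labs), type, FUN = seq_along)
  setNames(paste(chapter, num, sep = "."), labs)
}
```

The resulting named vector (label to number) plays the role of the reference table that is then used to replace the labels in the md file.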

As part of the resolution, we correctly create a <span> with the id:

bookdown/R/html.R

Lines 717 to 727 in e8dfa70

switch(type, fig = {
  if (length(grep('^<p class="caption', content[i - 0:1])) == 0) {
    # remove these labels, because there must be a caption on this or
    # previous line (possible negative case: the label appears in the alt
    # text of <img>)
    labs[[i]] = character(length(lab))
    next
  }
  labs[[i]] = label_prefix(type, sep = ': ')(num)
  k = max(figs[figs <= i])
  content[k] = paste(c(content[k], sprintf('<span style="display:block;" id="%s"></span>', lab)), collapse = '')

In HTML, this span serves as the anchor for the link that ref_to_number() creates for the reference:

bookdown/R/ebook.R

Lines 139 to 142 in feca9bc

# look for \@ref(label) and resolve to actual figure/table/section numbers
m = gregexpr('(?<!`)\\\\@ref\\(([-:[:alnum:]]+)\\)', content, perl = TRUE)
refs = regmatches(content, m)
regmatches(content, m) = lapply(refs, ref_to_number, ref_table, TRUE)
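A toy version of this resolution step, reusing the same regular expression. toy_resolve_refs() is hypothetical (it is not ref_to_number()): it turns each reference into a Markdown link to #label. Such a link only works if the output also contains an element with that id, which is exactly what is missing for epub here.

```r
# Toy resolver (hypothetical, not bookdown's ref_to_number()): replace each
# \@ref(label) with a Markdown link "[number](#label)". In this sketch,
# labels missing from ref_table become "**??**".
toy_resolve_refs = function(content, ref_table) {
  # same pattern as above; the lookbehind skips \@ref() preceded by a backtick
  m = gregexpr('(?<!`)\\\\@ref\\(([-:[:alnum:]]+)\\)', content, perl = TRUE)
  refs = regmatches(content, m)
  regmatches(content, m) = lapply(refs, function(r) {
    lab = sub('^\\\\@ref\\((.*)\\)$', '\\1', r)
    num = unname(ref_table[lab])
    out = sprintf('[%s](#%s)', num, lab)
    out[is.na(num)] = '**??**'
    out
  })
  content
}
```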

However, for epub we never use the $content part of that resolution, as we do for HTML here:

bookdown/R/html.R

Lines 599 to 601 in e8dfa70

res = parse_fig_labels(content, global)
content = res$content
ref_table = c(res$ref_table, parse_section_labels(content))

So indeed, we don't add the id. I am surprised by this, because it seems it has not worked from the start.
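One way to see the missing id without running epubcheck is to compare the same-document link targets with the ids present in the XHTML. undefined_fragments() is a hypothetical, regex-based sketch of that check (roughly what epubcheck reports as RSC-012); it is not a real XHTML parser and it ignores links into other files.

```r
# Hypothetical helper: return link fragments (href="#id") that have no
# matching id="..." attribute in the given XHTML lines. Regex-based rough
# check only; cross-file links such as "ch003.xhtml#id" are not considered.
undefined_fragments = function(xhtml) {
  grab = function(pattern) {
    x = unlist(regmatches(xhtml, gregexpr(pattern, xhtml)))
    sub(pattern, "\\1", x)
  }
  targets = unique(grab('href="#([^"]+)"'))
  ids = grab(' id="([^"]+)"')
  setdiff(targets, ids)
}
```

Adding the `<span ... id="fig:nice-fig">` anchor to the page is enough for the fragment to disappear from this list.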

Anyhow, I see several solutions:

  • Do as we do for theorems and add the id on the caption label

    bookdown/R/ebook.R

    Lines 123 to 127 in e8dfa70

    if (type %in% theorem_abbr) {
      id = sprintf('<span id="%s"></span>', j)
      sep = ''
    }
    label = label_prefix(type, sep = sep)(ref_table[j])

  • Leverage the recent fig.id option from knitr for the epub_format (Assign default id to each fig for html output yihui/knitr#2169)

  • Do epub-specific post-processing by unzipping the epub, instead of trying to do it in pre-processing.

The first one will probably solve it and is the easiest. The other two would probably be better in the long run, but they would be best done as part of a big rewrite of the processing (and I'm not sure that will happen). We'll see.
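A minimal sketch of the first option, assuming figures and tables get the same treatment as theorems: emit an empty <span> carrying the label's id in front of the caption label. caption_label_with_id() and its "Figure"/"Table" prefixes are hypothetical stand-ins for bookdown's label_prefix().

```r
# Sketch of option 1 (hypothetical helper, not bookdown code): build the
# caption label for a figure or table the way theorems are handled, i.e.
# prepend an empty <span> with the id so that links to #fig:... resolve.
caption_label_with_id = function(label, num,
                                 prefix = c(fig = "Figure", tab = "Table")) {
  type = sub(":.*", "", label)  # "fig" or "tab"
  anchor = sprintf('<span id="%s"></span>', label)
  paste0(anchor, prefix[[type]], " ", num, ": ")
}
```

Putting the anchor in the label itself means it travels with the caption text through pandoc, so no separate pre- or post-processing of the figure markup is needed.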

@cderv
Collaborator

cderv commented May 4, 2023

Additional note: it seems the issue is the same for tables and figures, hence the several "Fragment identifier is not defined." errors we see in epubcheck.

So it seems the first fix mentioned is the easiest for now.

I think the one remaining id error is due to a section reference. It seems related to pandoc, as we generate

<a href="javascript:void(0)" data-xuv2xq4gdswhm4kgz2zkee="{"name": "EPUB/text/ch002.xhtml", "frag": "methods"}">4</a>

when inspecting the epub in the calibre ebook viewer, and obviously there is no element with id "methods" in the current ch002.xhtml page.

This probably needs a different fix (tracked in #890).

@cderv
Collaborator

cderv commented May 4, 2023

@muschellij2 @N0rbert @tstratopoulos @jasonmosborne I believe PR #1426 should fix the issue of epubcheck errors
This is the simplest fix for the next release. Hopefully this is good.

If you want to try it...

On my side, it removes the error from the epubcheck results.


github-actions bot commented Nov 1, 2023

This old thread has been automatically locked. If you think you have found something related to this, please open a new issue by following the issue guide (https://yihui.org/issue/), and link to this old issue if necessary.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 1, 2023