html notebook rendering error when ploting data with unicode chinese char in ggplotly #1762
Comments
Btw since the problem lies in the function |
I am having the same issue, probably due to the Chinese characters in the data. Could you provide your solution to this issue? Thank you! |
Unfortunately, I can't reproduce on my end with the latest dev version. I get a plotly graph with Chinese characters in the legend, but I may not have the correct encoding... @caimiao0714 do you have another minimal reproducible example? @everdark if you think the issue is in Not sure what and where to fix this yet... 🤔 |
Hello, I think I have the same problem with French characters. I get an error when I try to rmarkdown::render() a DT of a data.frame that contains at least 21 letters with an accent (e.g. "é", which is very common in French). The RStudio preview is fine; the error occurs only with render(). A minimal reproducible Rmd file:
then:
returns:
In my case, it only happens when the number of characters with an accent is >= 21 (globally in the data.frame, it doesn't have to be in one cell or in one variable).
By filing an issue to this repo, I promise that
I understand that my issue may be closed if I don't fulfill my promises. |
Thanks @debdagybra for the reprex. It helped me find the issue, which I had previously missed in @everdark's post. Sorry! There is an issue with we need to find why the additional closing part is added without an opening one, in order to find where the issue is. Still not sure where to fix it. The threshold of at least 21 characters is also odd, but with one fewer é it works. 🤷‍♂️ Thanks for the report, everyone! |
To work around it, comment out the sanity check block of extractPreserveChunks:
extractPreserveChunks <- function(strval) {
  # Literal start/end marker text. Case sensitive.
  startmarker <- "<!--html_preserve-->"
  endmarker <- "<!--/html_preserve-->"
  # Start and end marker length MUST be different, it's how we tell them apart
  startmarker_len <- nchar(startmarker)
  endmarker_len <- nchar(endmarker)
  # Pattern must match both start and end markers
  pattern <- "<!--/?html_preserve-->"
  # It simplifies string handling greatly to collapse multiple char elements
  if (length(strval) != 1)
    strval <- paste(strval, collapse = "\n")
  # matches contains the index of all the start and end markers
  matches <- gregexpr(pattern, strval)[[1]]
  lengths <- attr(matches, "match.length", TRUE)
  # No markers? Just return.
  if (matches[[1]] == -1)
    return(list(value = strval, chunks = character(0)))
  # If TRUE, it's a start; if FALSE, it's an end
  boundary_type <- lengths == startmarker_len
  # Positive number means we're inside a region, zero means we just exited to
  # the top-level, negative number means error (an end without matching start).
  # For example:
  # boundary_type  - TRUE TRUE FALSE FALSE TRUE FALSE
  # preserve_level - 1    2    1     0     1    0
  preserve_level <- cumsum(ifelse(boundary_type, 1, -1))
  # Sanity check.
  if (any(preserve_level < 0) || tail(preserve_level, 1) != 0) {
    #stop("Invalid nesting of html_preserve directives")
  }
  # Identify all the top-level boundary markers. We want to find all of the
  # elements of preserve_level whose value is 0 and preceding value is 1, or
  # whose value is 1 and preceding value is 0. Since we know that preserve_level
  # values can only go up or down by 1, we can simply shift preserve_level by
  # one element and add it to preserve_level; in the result, any value of 1 is a
  # match.
  is_top_level <- 1 == (preserve_level + c(0, preserve_level[-length(preserve_level)]))
  preserved <- character(0)
  top_level_matches <- matches[is_top_level]
  # Iterate backwards so string mutation doesn't screw up positions for future
  # iterations
  for (i in seq.int(length(top_level_matches) - 1, 1, by = -2)) {
    start_outer <- top_level_matches[[i]]
    start_inner <- start_outer + startmarker_len
    end_inner <- top_level_matches[[i+1]]
    end_outer <- end_inner + endmarker_len
    id <- htmltools:::withPrivateSeed(
      paste("preserve", paste(
        format(as.hexmode(sample(256, 8, replace = TRUE)-1), width=2),
        collapse = ""),
        sep = "")
    )
    preserved[id] <- gsub(pattern, "", substr(strval, start_inner, end_inner-1))
    strval <- paste(
      substr(strval, 1, start_outer - 1),
      id,
      substr(strval, end_outer, nchar(strval)),
      sep="")
    substr(strval, start_outer, end_outer-1) <- id
  }
  list(value = strval, chunks = preserved)
}
Then, to overwrite the function exported from the package, run this line (after the above function) before you render the rmarkdown file:
assignInNamespace("extractPreserveChunks", extractPreserveChunks, "htmltools") |
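For readers wondering what the disabled sanity check looks at, here is a minimal sketch of the nesting-level computation from the function above, pulled out on its own. The helper name preserve_levels is hypothetical and not part of htmltools; it only mirrors the gregexpr/cumsum steps shown in the workaround.

```r
# Hypothetical helper (not part of htmltools): returns the nesting level
# reached after each html_preserve marker found in strval.
preserve_levels <- function(strval) {
  startmarker_len <- nchar("<!--html_preserve-->")
  matches <- gregexpr("<!--/?html_preserve-->", strval)[[1]]
  # No markers at all: nothing to count
  if (matches[[1]] == -1) return(numeric(0))
  lengths <- attr(matches, "match.length", TRUE)
  # +1 for every start marker, -1 for every end marker
  cumsum(ifelse(lengths == startmarker_len, 1, -1))
}

balanced  <- "a<!--html_preserve-->b<!--/html_preserve-->c"
stray_end <- "a<!--/html_preserve-->b"

levels_balanced  <- preserve_levels(balanced)   # ends back at level 0
levels_stray_end <- preserve_levels(stray_end)  # drops below 0

# Same condition as the sanity check in extractPreserveChunks
is_invalid <- function(lv) length(lv) > 0 && (any(lv < 0) || tail(lv, 1) != 0)
is_invalid(levels_balanced)
is_invalid(levels_stray_end)
```

An end marker without a matching start drives the level negative, which is the situation the original stop() call reports as "Invalid nesting of html_preserve directives". Commenting out the check silences that error but does not explain where the unmatched marker comes from.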
Here's another clue. I have extracted the argument Every time, we have a value (nr 35 in my example) that starts with But when we have any accent, the value ends with So, I still don't know where the error is, but it's before Edit: My guess is that the function that writes this string always has OK_noAccent OK_1Accent OK_20Accents ERR_21Accents ERR_25Accents |
Thanks @debdagybra for your analysis! I have now identified the issue, and you were right about the substring with a number of characters. In fact this is a number of bytes, and the issue is here. Lines 3 to 8 in ec8fd0f
It may be time to use
Yes this is it ! The fix is to be made in Thanks all for the investigation ! |
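The bytes-versus-characters mix-up described above can be illustrated in isolation. This is a sketch only, not the code from the referenced lines: it assumes a UTF-8 string and uses base R's nchar(), regexpr() and substr() to show how a byte offset points at the wrong place once multibyte characters precede the match.

```r
# Four copies of U+6211 followed by a marker; escapes keep the example
# independent of the source file's encoding.
x <- enc2utf8(paste0(strrep("\u6211", 4), "<!--html_preserve-->"))

n_chars <- nchar(x, type = "chars")  # counts characters
n_bytes <- nchar(x, type = "bytes")  # counts bytes (3 per CJK character in UTF-8)

# Position of the marker counted in characters vs. counted in bytes
char_pos <- as.integer(regexpr("<!--", x, fixed = TRUE))
byte_pos <- as.integer(regexpr("<!--", x, fixed = TRUE, useBytes = TRUE))

# substr() indexes by character, so only the character position is safe
right_slice <- substr(x, char_pos, char_pos + 3)
wrong_slice <- substr(x, byte_pos, byte_pos + 3)
```

With ASCII-only input the two positions coincide, which would be consistent with the error only showing up once accented or Chinese characters appear in the data.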
Yes, it works perfectly. With the example and with my real data. |
I've now added the changes in the PR. Waiting to be merged now. |
This old thread has been automatically locked. If you think you have found something related to this, please open a new issue by following the issue guide (https://yihui.org/issue/), and link to this old issue if necessary. |
A minimal reproducible Rmd file:
Then run:
Results in the following error:
Interestingly, the preview in RStudio works fine.
I've also found that the length matters: "我我我我" will cause the error but NOT "我我我".
Here is my system info:
I encounter the same error on my Ubuntu 16.04 machine.
I really have no idea what's going on here and it took me quite some time to finally pin down the root cause being UTF-8 Chinese characters. Still no clue why this happens. :(
By filing an issue to this repo, I promise that
xfun::session_info('rmarkdown').
I have upgraded all my packages to their latest versions (e.g., R, RStudio, and R packages), and also tried the development version: remotes::install_github('rstudio/rmarkdown').
I understand that my issue may be closed if I don't fulfill my promises.