Error on Latin1 locale on Debian #264
It took me a while to figure out how to reproduce the issue. Here's a build using Docker that results in the errors:
I've narrowed this down to Line 56 in 622c76a
(And probably the same issue would happen here: httpuv/src/filedatasource-unix.cpp Line 14 in 622c76a)
In gdb:
RD -d gdb
b fs.cpp:56
r
library(devtools)
library(testthat)
load_all()
# test code from: https://github.com/rstudio/httpuv/blob/622c76a749efbbb13903cd488cd1b8c54a48793c/tests/testthat/test-static-paths.R#L768-L795
nonascii_path <- test_path("apps/f\U00FC")
dir.create(nonascii_path)
on.exit(unlink(nonascii_path, recursive = TRUE))
index_file_path <- file.path(nonascii_path, "index.html")
writeLines("Hello world!", index_file_path)
file_content <- raw_file_content(index_file_path)
s <- startServer("0.0.0.0", randomPort(),
  list(
    call = function(req) {
      list(
        status = 200L,
        headers = list('Content-Type' = 'text/html'),
        body = "R code path"
      )
    },
    staticPaths = list(
      "/f\U00FC" = nonascii_path,
      "/foo" = nonascii_path
    )
  )
)
on.exit(s$stop(), add = TRUE)
# URL-encoded non-ASCII URL path, which maps to non-ASCII local path.
r <- fetch(local_url("/f%C3%BC", s$getPort()))
# ======= In GDB =======
# It can't lstat() the filename. (We use lstat() instead of stat() because gdb
# doesn't like that the stat function has the same name as the stat struct type.)
p (int)lstat(filename.c_str(), &sb)
#> $74 = -1
# With the explicit filename, copied and pasted
p filename.c_str()
#> $75 = 0x7fffe8013620 "/httpuv/tests/testthat/apps/fü"
p (int)lstat("/httpuv/tests/testthat/apps/fü" , &sb)
#> $76 = -1
# With the explicit filename, with native encoding (I think)
p (int)lstat("/httpuv/tests/testthat/apps/f\xfc", &sb)
#> $78 = 0
# Show that these strings are not identical - strlen is different:
p (int)strlen("/httpuv/tests/testthat/apps/fü")
#> $79 = 31
p (int)strlen("/httpuv/tests/testthat/apps/f\xfc")
#> $81 = 30
# \U00FC also returns the shorter byte sequence
p (int)strlen("/httpuv/tests/testthat/apps/f\U00FC")
#> $82 = 30
# Show the contents of the different encodings
# The filename.c_str() value is the UTF-8 encoding of ü, which is 195 188.
p (unsigned char) "ü"[0]
#> $115 = 195 '?'
p (unsigned char) "ü"[1]
#> $116 = 188 '?'
p (unsigned char) "ü"[2]
#> $117 = 0 '\000'
# Using "\xfc" is the ISO 8859-1 encoding.
p (unsigned char) "\xfc"[0]
#> $91 = 252 '?'
p (unsigned char) "\xfc"[1]
#> $92 = 0 '\000'
# Using "\U00FC" is also the ISO 8859-1 encoding.
p (unsigned char) "\U00FC"[0]
#> $93 = 252 '?'
p (unsigned char) "\U00FC"[1]
#> $94 = 0 '\000'
So I think the real solution would be to convert the string to the native encoding before looking for the file. However, I don't think that it's really worth doing at this time, for a couple of reasons:
In summary, this is an edge case where the cost and risk of fixing it aren't really worth it. I think we should just disable the test on Unix systems where there's a non-UTF-8 locale.
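For reference, here is a rough sketch of what "convert to the native encoding" could look like for the Latin-1 case seen in the gdb session. It is not httpuv code: `locale_is_utf8()` and `utf8_to_latin1()` are hypothetical helper names, and the conversion only handles code points up to U+00FF. A real fix would more likely use a general conversion facility such as `iconv`, since a non-UTF-8 locale isn't necessarily Latin-1.

```cpp
#include <cstddef>
#include <cstring>
#include <langinfo.h>
#include <string>

// Hypothetical helper (not part of httpuv): reports whether the current
// locale's codeset is UTF-8. Assumes the locale has already been set up
// (e.g. via setlocale(LC_ALL, "")). The exact codeset spelling can vary
// by platform; "UTF-8" is what glibc reports.
bool locale_is_utf8() {
  const char* codeset = nl_langinfo(CODESET);
  return std::strcmp(codeset, "UTF-8") == 0;
}

// Hypothetical helper: converts a UTF-8 string to ISO 8859-1 (Latin-1).
// Returns false if the input is malformed or contains a code point above
// U+00FF, which Latin-1 cannot represent. On failure, `out` is unspecified.
bool utf8_to_latin1(const std::string& in, std::string& out) {
  out.clear();
  out.reserve(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) {
    unsigned char c = static_cast<unsigned char>(in[i]);
    if (c < 0x80) {
      // ASCII is identical in both encodings.
      out.push_back(static_cast<char>(c));
    } else if ((c == 0xC2 || c == 0xC3) && i + 1 < in.size()) {
      // Two-byte sequences with lead byte C2 or C3 cover U+0080..U+00FF.
      unsigned char c2 = static_cast<unsigned char>(in[i + 1]);
      if ((c2 & 0xC0) != 0x80) return false;  // not a continuation byte
      out.push_back(static_cast<char>(((c & 0x03) << 6) | (c2 & 0x3F)));
      ++i;  // consume the continuation byte
    } else {
      // Anything else is malformed or outside the Latin-1 range.
      return false;
    }
  }
  return true;
}
```

With this, the 31-byte UTF-8 path ending in bytes 195 188 becomes the 30-byte path ending in byte 252, which is the form that `lstat()` accepted above. The same `locale_is_utf8()` style of check is also the kind of condition that could gate the test on Unix.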
An email from CRAN:
Check log