Error on Latin1 locale on Debian #264

Closed
wch opened this issue May 4, 2020 · 2 comments · Fixed by #265

@wch
Collaborator

wch commented May 4, 2020

An email from CRAN:

Please see the problems shown on
https://cran.r-project.org/web/checks/check_results_httpuv.html.
The Debian r-devel-clang problems can be reproduced by checking in a strict Latin1 locale (in my case, LANG=en_US.iso88591).

Check log
* using log directory '/home/hornik/tmp/R.check/r-devel-clang/Work/PKGS/httpuv.Rcheck'
* using R Under development (unstable) (2020-05-01 r78341)
* using platform: x86_64-pc-linux-gnu (64-bit)
* using session charset: ISO8859-15
* checking for file 'httpuv/DESCRIPTION' ... OK
* checking extension type ... Package
* this is package 'httpuv' version '1.5.2'
* package encoding: UTF-8
* checking package namespace information ... OK
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
* checking for executable files ... OK
* checking for hidden files and directories ... OK
* checking for portable file names ... OK
* checking for sufficient/correct file permissions ... OK
* checking serialization versions ... OK
* checking whether package 'httpuv' can be installed ... OK
* checking package directory ... OK
* checking for future file timestamps ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking for left-over files ... OK
* checking index information ... OK
* checking package subdirectories ... OK
* checking R files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated dependencies ... OK
* checking whether the package can be unloaded cleanly ... OK
* checking whether the namespace can be loaded with stated dependencies ... OK
* checking whether the namespace can be unloaded cleanly ... OK
* checking loading without being on the library search path ... OK
* checking use of S3 registration ... OK
* checking dependencies in R code ... OK
* checking S3 generic/method consistency ... OK
* checking replacement functions ... OK
* checking foreign function calls ... OK
* checking R code for possible problems ... [5s/6s] OK
* checking Rd files ... OK
* checking Rd metadata ... OK
* checking Rd line widths ... OK
* checking Rd cross-references ... OK
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... OK
* checking Rd \usage sections ... OK
* checking Rd contents ... OK
* checking for unstated dependencies in examples ... OK
* checking line endings in C/C++/Fortran sources/headers ... OK
* checking line endings in Makefiles ... OK
* checking compilation flags in Makevars ... OK
* checking for GNU extensions in Makefiles ... NOTE
GNU make is a SystemRequirements.
* checking for portable use of $(BLAS_LIBS) and $(LAPACK_LIBS) ... OK
* checking use of PKG_*FLAGS in Makefiles ... OK
* checking use of SHLIB_OPENMP_*FLAGS in Makefiles ... OK
* checking include directives in Makefiles ... OK
* checking pragmas in C/C++ headers and code ... OK
* checking compilation flags used ... OK
* checking compiled code ... OK
* checking examples ... [1s/1s] OK
* checking for unstated dependencies in 'tests' ... OK
* checking tests ... [6s/8s] ERROR
  Running 'testthat.R' [6s/7s]
Running the tests in 'tests/testthat.R' failed.
Complete output:
  > library(testthat)
  > library(httpuv)
  > 
  > test_check("httpuv")
  -- 1. Failure: Paths with non-ASCII characters (@test-static-paths.R#796)  -----
  r$status_code not identical to 200L.
  1/1 mismatches
  [1] 404 - 200 == 204
  
  -- 2. Failure: Paths with non-ASCII characters (@test-static-paths.R#797)  -----
  r$content not identical to `file_content`.
  Lengths (14, 13) differ (comparison on first 13 components)
  13 element mismatches
  
  -- 3. Failure: Paths with non-ASCII characters (@test-static-paths.R#801)  -----
  r$status_code not identical to 200L.
  1/1 mismatches
  [1] 404 - 200 == 204
  
  -- 4. Failure: Paths with non-ASCII characters (@test-static-paths.R#802)  -----
  r$content not identical to `file_content`.
  Lengths (14, 13) differ (comparison on first 13 components)
  13 element mismatches
  
  == testthat results  ===========================================================
  [ OK: 204 | SKIPPED: 8 | WARNINGS: 0 | FAILED: 4 ]
  1. Failure: Paths with non-ASCII characters (@test-static-paths.R#796) 
  2. Failure: Paths with non-ASCII characters (@test-static-paths.R#797) 
  3. Failure: Paths with non-ASCII characters (@test-static-paths.R#801) 
  4. Failure: Paths with non-ASCII characters (@test-static-paths.R#802) 
  
  Error: testthat unit tests failed
  Execution halted
* checking PDF version of manual ... OK
* checking for non-standard things in the check directory ... OK
* DONE
Status: 1 ERROR, 1 NOTE

@wch
Collaborator Author

wch commented May 4, 2020

It took me a while to figure out how to reproduce the issue. Here's a build using Docker that results in the errors:

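docker run --security-opt seccomp=unconfined --rm -ti rocker/r-devel /bin/bash

# Inside the container: refresh the package index, then install tools.
apt-get update && apt-get install -y git libssl-dev vim nano

# Generate a Latin-1 locale.
echo en_US ISO-8859-1 >> /etc/locale.gen
locale-gen

# Compile packages in parallel.
echo MAKEFLAGS=-j4 > ~/.Renviron

# Install httpuv and its dependencies with R-devel (RD).
git clone https://github.com/rstudio/httpuv.git
RD -e "install.packages('remotes')"
RD -e 'remotes::install_local("httpuv", dependencies = TRUE)'

# Switch to the Latin-1 locale.
export LANG=en_US.iso88591
# This prints a warning, but it seems to be necessary for things to work.
export LC_ALL=en_US.iso88591

# Confirm that R sees the Latin-1 locale.
RD --quiet -e 'Sys.getlocale()'

RD CMD build httpuv
RD CMD check httpuv_1.5.2.9000.tar.gz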

@wch
Collaborator Author

wch commented May 4, 2020

I've narrowed this down to this line:

if (stat(filename.c_str(), &sb) == 0 && S_ISDIR(sb.st_mode)) {

The same issue would probably also happen here:

_fd = open(path.c_str(), O_RDONLY);

The stat() call is passed filename.c_str(). I believe this is a UTF-8 encoded string, but the process is running with a non-UTF-8 locale and the directory name on disk uses the native (Latin-1) bytes, so stat() can't find the file.
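To check this diagnosis without gdb, here is a minimal standalone sketch (hypothetical code, not part of httpuv) that creates a directory named `fü` using the Latin-1 byte for `ü` and then calls `stat()` with both spellings of the name. It assumes Linux, where file names are raw byte strings, and a writable `/tmp`; `compare_encodings()` and `StatResult` are made-up names for illustration.

```cpp
// Hypothetical standalone demo (not httpuv code). Assumes Linux, where a file
// name is an arbitrary byte string and stat() compares names byte for byte.
#include <cassert>
#include <cstdlib>     // mkdtemp
#include <string>
#include <sys/stat.h>  // mkdir, stat, S_ISDIR
#include <unistd.h>    // rmdir

struct StatResult {
    bool created;    // was the Latin-1-named directory created?
    int utf8_rc;     // stat() result when "fü" is spelled in UTF-8
    int latin1_rc;   // stat() result when "fü" is spelled in ISO 8859-1
    bool is_dir;     // S_ISDIR() for the Latin-1 lookup
};

StatResult compare_encodings() {
    StatResult res{false, -1, -1, false};
    char tmpl[] = "/tmp/encdemoXXXXXX";
    char* base = mkdtemp(tmpl);
    if (base == nullptr) return res;

    // 'ü' in ISO 8859-1: the single byte 0xFC (what a Latin-1 process writes).
    const std::string latin1_path = std::string(base) + "/f\xFC";
    // 'ü' in UTF-8: the two bytes 0xC3 0xBC (what "/f%C3%BC" decodes to).
    const std::string utf8_path = std::string(base) + "/f\xC3\xBC";

    if (mkdir(latin1_path.c_str(), 0700) == 0) {
        res.created = true;
        struct stat sb;
        res.utf8_rc = stat(utf8_path.c_str(), &sb);
        res.latin1_rc = stat(latin1_path.c_str(), &sb);
        res.is_dir = res.latin1_rc == 0 && S_ISDIR(sb.st_mode);
        rmdir(latin1_path.c_str());
    }
    rmdir(base);
    return res;
}
```

On Linux, `compare_encodings()` should report `utf8_rc == -1` and `latin1_rc == 0`, mirroring the `lstat()` results in the gdb session below. On filesystems that require valid UTF-8 names, such as APFS on macOS, the `mkdir()` step may fail instead, leaving `created` false.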

In gdb:

RD -d gdb
b fs.cpp:56
r

library(devtools)
library(testthat)
load_all()

# test code from: https://github.com/rstudio/httpuv/blob/622c76a749efbbb13903cd488cd1b8c54a48793c/tests/testthat/test-static-paths.R#L768-L795

  nonascii_path <- test_path("apps/f\U00FC")
  dir.create(nonascii_path)
  on.exit(unlink(nonascii_path, recursive = TRUE))

  index_file_path <- file.path(nonascii_path, "index.html")
  writeLines("Hello world!", index_file_path)
  file_content <- raw_file_content(index_file_path)

  s <- startServer("0.0.0.0", randomPort(),
    list(
      call = function(req) {
        list(
          status = 200L,
          headers = list('Content-Type' = 'text/html'),
          body = "R code path"
        )
      },
      staticPaths = list(
        "/f\U00FC" = nonascii_path,
        "/foo" = nonascii_path
      )
    )
  )
  on.exit(s$stop(), add = TRUE)

  # URL-encoded non-ASCII URL path, which maps to non-ASCII local path.
  r <- fetch(local_url("/f%C3%BC", s$getPort()))

# ======= In GDB =======
# It can't lstat() the filename. (We use lstat() instead of stat() because gdb
# doesn't like that the stat function has the same name as the stat struct type.)
p (int)lstat(filename.c_str(), &sb)
#> $74 = -1


# With the explicit filename, copied and pasted
p filename.c_str()
#> $75 = 0x7fffe8013620 "/httpuv/tests/testthat/apps/fü"
p (int)lstat("/httpuv/tests/testthat/apps/fü" , &sb)
#> $76 = -1


# With the explicit filename, with native encoding (I think)
p (int)lstat("/httpuv/tests/testthat/apps/f\xfc", &sb)
#> $78 = 0


# Show that these strings are not identical - strlen is different:
p (int)strlen("/httpuv/tests/testthat/apps/fü")
#> $79 = 31
p (int)strlen("/httpuv/tests/testthat/apps/f\xfc")
#> $81 = 30
# \U00FC also returns the shorter byte sequence
p (int)strlen("/httpuv/tests/testthat/apps/f\U00FC")
#> $82 = 30


# Show the contents of the different encodings
# The filename.c_str() value is the UTF-8 encoding of ü, which is 195 188.
p (unsigned char) "ü"[0]
#> $115 = 195 '?'
p (unsigned char) "ü"[1]
#> $116 = 188 '?'
p (unsigned char) "ü"[2]
#> $117 = 0 '\000'

# Using "\xfc" is the ISO 8859-1 encoding.
p (unsigned char) "\xfc"[0]
#> $91 = 252 '?'
p (unsigned char) "\xfc"[1]
#> $92 = 0 '\000'

# Using "\U00FC" is also the ISO 8859-1 encoding.
p (unsigned char) "\U00FC"[0]
#> $93 = 252 '?'
p (unsigned char) "\U00FC"[1]
#> $94 = 0 '\000'
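The byte values printed above follow directly from the two encodings. U+00FC is `0xFC`, or `00011 111100` as an 11-bit value. UTF-8 packs the top five bits into a lead byte `110 00011` (`0xC3`, 195) and the low six bits into a continuation byte `10 111100` (`0xBC`, 188), while ISO 8859-1 stores the code point as the single byte `0xFC` (252). That extra byte is why `strlen()` reports 31 instead of 30. A small sketch of both encoders, with hypothetical function names and limited to code points below U+0800:

```cpp
// Sketch: the byte sequences for U+00FC ('ü') in UTF-8 and ISO 8859-1.
#include <cassert>
#include <cstdint>
#include <string>

// Encode a code point below U+0800 as UTF-8. Code points below U+0080 take one
// byte; U+0080..U+07FF take two: 110xxxxx 10xxxxxx (5 + 6 payload bits).
// Returns an empty string for larger code points, which this sketch skips.
std::string utf8_encode_small(std::uint32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));    // lead byte: top 5 bits
        out += static_cast<char>(0x80 | (cp & 0x3F));  // continuation: low 6 bits
    }
    return out;
}

// ISO 8859-1 maps U+0000..U+00FF to a single byte with the same value.
std::string latin1_encode(std::uint32_t cp) {
    return cp <= 0xFF ? std::string(1, static_cast<char>(cp)) : std::string();
}
```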

So I think the real solution would be to convert the string to the native encoding before looking for the file. However, I don't think it's really worth doing at this time, for a few reasons:

  • This only happens in a very weird case: on UNIX, when the process is using a non-UTF-8 locale and a directory with a non-ASCII name is being served. A user who does something like this is asking for problems; UNIX systems almost always use UTF-8 these days.
  • This solution can't work if the process's locale is unable to represent the filename. (For example, Chinese characters can't be represented in this ISO 8859-1 locale.) I don't think there's any way to make it work in that situation.
  • The current system does work on Windows, where the locale is likely to be non-UTF-8. So Windows is probably doing something different under the hood, or handling this more intelligently.
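The "convert to the native encoding" idea, and the reason it can't always work, can both be seen in a short sketch built on POSIX `iconv()`. `convert_from_utf8()` is a hypothetical helper, not httpuv code. It treats any conversion error, or any lossy replacement, as failure, since a lossy name would not match the file on disk anyway. Converting `fü` to ISO 8859-1 should succeed, while a Chinese character such as U+4E2D should fail.

```cpp
// Sketch: convert a UTF-8 path to another character encoding with POSIX iconv().
// Any error, or any lossy ("irreversible") conversion, is treated as failure.
#include <cassert>
#include <cstddef>
#include <iconv.h>
#include <string>
#include <vector>

bool convert_from_utf8(const std::string& in, const char* tocode, std::string* out) {
    iconv_t cd = iconv_open(tocode, "UTF-8");
    if (cd == (iconv_t)-1) return false;  // encoding pair not supported

    // 4 output bytes per input byte is ample for common target encodings.
    std::vector<char> buf(in.size() * 4 + 4);
    char* inptr = const_cast<char*>(in.data());  // iconv() only reads the input
    std::size_t inleft = in.size();
    char* outptr = buf.data();
    std::size_t outleft = buf.size();

    std::size_t rc = iconv(cd, &inptr, &inleft, &outptr, &outleft);
    if (rc == 0) {
        // Flush any shift state (a no-op for stateless encodings like ISO 8859-1).
        rc = iconv(cd, nullptr, nullptr, &outptr, &outleft);
    }
    iconv_close(cd);

    // (size_t)-1 means an error such as EILSEQ (character not representable);
    // a positive count means characters were replaced, which is also a miss.
    if (rc != 0) return false;
    out->assign(buf.data(), buf.size() - outleft);
    return true;
}
```

For the native encoding, `tocode` would presumably come from `nl_langinfo(CODESET)` in `<langinfo.h>`, which should report `ISO-8859-1` in the locale used here, and the converted string would then go to `stat()` or `open()`. This only illustrates the approach discussed above; the comment argues against implementing it.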

In summary, this is an edge case where fixing it isn't really worth the cost and risk. I think we should just disable the test on Unix systems that are running with a non-UTF-8 locale.
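Whether a process is in a non-UTF-8 locale comes down to the character set of its `LC_CTYPE` category. A minimal C++ check, using a hypothetical `locale_is_utf8()` helper, is sketched below; it is not necessarily how the fix in #265 decides to skip the test. On the R side, base R's `l10n_info()` reports the same property in its `UTF-8` element.

```cpp
// Sketch: detect whether the current locale's character set is UTF-8.
#include <cassert>
#include <clocale>
#include <cstring>
#include <langinfo.h>

// nl_langinfo(CODESET) names the LC_CTYPE character set, e.g. "UTF-8" or
// "ISO-8859-1". A C/C++ program starts in the "C" locale until it calls
// setlocale() with "" to pick up LANG/LC_ALL (R does this for LC_CTYPE).
bool locale_is_utf8() {
    const char* codeset = nl_langinfo(CODESET);
    return codeset != nullptr && std::strcmp(codeset, "UTF-8") == 0;
}
```

Because the proposal only disables the test on Unix, a real skip condition would also need a platform check, since the test passes on Windows.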
