Error on Latin1 locale on Debian #264

wch · 2020-05-04T17:14:49Z

An email from CRAN:

Please see the problems shown on
https://cran.r-project.org/web/checks/check_results_httpuv.html.
The Debian r-devel-clang problems can be reproduced by checking in a strict Latin1 locale (in my case, LANG=en_US.iso88591).

Check log

* using log directory '/home/hornik/tmp/R.check/r-devel-clang/Work/PKGS/httpuv.Rcheck'
* using R Under development (unstable) (2020-05-01 r78341)
* using platform: x86_64-pc-linux-gnu (64-bit)
* using session charset: ISO8859-15
* checking for file 'httpuv/DESCRIPTION' ... OK
* checking extension type ... Package
* this is package 'httpuv' version '1.5.2'
* package encoding: UTF-8
* checking package namespace information ... OK
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
* checking for executable files ... OK
* checking for hidden files and directories ... OK
* checking for portable file names ... OK
* checking for sufficient/correct file permissions ... OK
* checking serialization versions ... OK
* checking whether package 'httpuv' can be installed ... OK
* checking package directory ... OK
* checking for future file timestamps ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking for left-over files ... OK
* checking index information ... OK
* checking package subdirectories ... OK
* checking R files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated dependencies ... OK
* checking whether the package can be unloaded cleanly ... OK
* checking whether the namespace can be loaded with stated dependencies ... OK
* checking whether the namespace can be unloaded cleanly ... OK
* checking loading without being on the library search path ... OK
* checking use of S3 registration ... OK
* checking dependencies in R code ... OK
* checking S3 generic/method consistency ... OK
* checking replacement functions ... OK
* checking foreign function calls ... OK
* checking R code for possible problems ... [5s/6s] OK
* checking Rd files ... OK
* checking Rd metadata ... OK
* checking Rd line widths ... OK
* checking Rd cross-references ... OK
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... OK
* checking Rd \usage sections ... OK
* checking Rd contents ... OK
* checking for unstated dependencies in examples ... OK
* checking line endings in C/C++/Fortran sources/headers ... OK
* checking line endings in Makefiles ... OK
* checking compilation flags in Makevars ... OK
* checking for GNU extensions in Makefiles ... NOTE
GNU make is a SystemRequirements.
* checking for portable use of $(BLAS_LIBS) and $(LAPACK_LIBS) ... OK
* checking use of PKG_*FLAGS in Makefiles ... OK
* checking use of SHLIB_OPENMP_*FLAGS in Makefiles ... OK
* checking include directives in Makefiles ... OK
* checking pragmas in C/C++ headers and code ... OK
* checking compilation flags used ... OK
* checking compiled code ... OK
* checking examples ... [1s/1s] OK
* checking for unstated dependencies in 'tests' ... OK
* checking tests ... [6s/8s] ERROR
  Running 'testthat.R' [6s/7s]
Running the tests in 'tests/testthat.R' failed.
Complete output:
  > library(testthat)
  > library(httpuv)
  > 
  > test_check("httpuv")
  -- 1. Failure: Paths with non-ASCII characters (@test-static-paths.R#796)  -----
  r$status_code not identical to 200L.
  1/1 mismatches
  [1] 404 - 200 == 204
  
  -- 2. Failure: Paths with non-ASCII characters (@test-static-paths.R#797)  -----
  r$content not identical to `file_content`.
  Lengths (14, 13) differ (comparison on first 13 components)
  13 element mismatches
  
  -- 3. Failure: Paths with non-ASCII characters (@test-static-paths.R#801)  -----
  r$status_code not identical to 200L.
  1/1 mismatches
  [1] 404 - 200 == 204
  
  -- 4. Failure: Paths with non-ASCII characters (@test-static-paths.R#802)  -----
  r$content not identical to `file_content`.
  Lengths (14, 13) differ (comparison on first 13 components)
  13 element mismatches
  
  == testthat results  ===========================================================
  [ OK: 204 | SKIPPED: 8 | WARNINGS: 0 | FAILED: 4 ]
  1. Failure: Paths with non-ASCII characters (@test-static-paths.R#796) 
  2. Failure: Paths with non-ASCII characters (@test-static-paths.R#797) 
  3. Failure: Paths with non-ASCII characters (@test-static-paths.R#801) 
  4. Failure: Paths with non-ASCII characters (@test-static-paths.R#802) 
  
  Error: testthat unit tests failed
  Execution halted
* checking PDF version of manual ... OK
* checking for non-standard things in the check directory ... OK
* DONE
Status: 1 ERROR, 1 NOTE


wch · 2020-05-04T17:33:14Z

It took me a while to figure out how to reproduce the issue. Here's a build using Docker that results in the errors:

docker run --security-opt seccomp=unconfined --rm -ti rocker/r-devel /bin/bash

apt install git libssl-dev vim nano

echo en_US ISO-8859-1 >> /etc/locale.gen
locale-gen

echo MAKEFLAGS=-j4 > ~/.Renviron

git clone https://github.com/rstudio/httpuv.git
RD -e "install.packages('remotes')"
RD -e 'remotes::install_local("httpuv", dependencies = TRUE)'

export LANG=en_US.iso88591
# This prints a warning, but it seems to be necessary for things to work.
export LC_ALL=en_US.iso88591

RD --quiet -e 'Sys.getlocale()'

RD CMD build httpuv
RD CMD check httpuv_1.5.2.9000.tar.gz

wch · 2020-05-04T19:40:25Z

I've narrowed this down to

httpuv/src/fs.cpp

Line 56 in 622c76a

if (stat(filename.c_str(), &sb) == 0 && S_ISDIR(sb.st_mode)) {

(And probably the same issue would happen here:

httpuv/src/filedatasource-unix.cpp

Line 14 in 622c76a

_fd = open(path.c_str(), O_RDONLY);

)

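The stat() call is passed filename.c_str(). I believe this is a UTF-8 encoded string, but since the process is running with a non-UTF-8 locale, the file on disk was created with its name in the native encoding, so the UTF-8 bytes don't match and stat() can't find the file.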

In gdb:

RD -d gdb
b fs.cpp:56
r

library(devtools)
library(testthat)
load_all()

# test code from: https://github.com/rstudio/httpuv/blob/622c76a749efbbb13903cd488cd1b8c54a48793c/tests/testthat/test-static-paths.R#L768-L795

  nonascii_path <- test_path("apps/f\U00FC")
  dir.create(nonascii_path)
  on.exit(unlink(nonascii_path, recursive = TRUE))

  index_file_path <- file.path(nonascii_path, "index.html")
  writeLines("Hello world!", index_file_path)
  file_content <- raw_file_content(index_file_path)

  s <- startServer("0.0.0.0", randomPort(),
    list(
      call = function(req) {
        list(
          status = 200L,
          headers = list('Content-Type' = 'text/html'),
          body = "R code path"
        )
      },
      staticPaths = list(
        "/f\U00FC" = nonascii_path,
        "/foo" = nonascii_path
      )
    )
  )
  on.exit(s$stop(), add = TRUE)

  # URL-encoded non-ASCII URL path, which maps to non-ASCII local path.
  r <- fetch(local_url("/f%C3%BC", s$getPort()))

# ======= In GDB =======
# It can't lstat() the filename. (We use lstat() instead of stat() because gdb
# doesn't like that the stat function has the same name as the stat struct type.)
p (int)lstat(filename.c_str(), &sb)
#> $74 = -1


# With the explicit filename, copied and pasted
p filename.c_str()
#> $75 = 0x7fffe8013620 "/httpuv/tests/testthat/apps/fü"
p (int)lstat("/httpuv/tests/testthat/apps/fü" , &sb)
#> $76 = -1


# With the explicit filename, with native encoding (I think)
p (int)lstat("/httpuv/tests/testthat/apps/f\xfc", &sb)
#> $78 = 0


# Show that these strings are not identical - strlen is different:
p (int)strlen("/httpuv/tests/testthat/apps/fü")
#> $79 = 31
p (int)strlen("/httpuv/tests/testthat/apps/f\xfc")
#> $81 = 30
# \U00FC also returns the shorter byte sequence
p (int)strlen("/httpuv/tests/testthat/apps/f\U00FC")
#> $82 = 30


# Show the contents of the different encodings
# The filename.c_str() value is the UTF-8 encoding of ü, which is 195 188.
p (unsigned char) "ü"[0]
#> $115 = 195 '?'
p (unsigned char) "ü"[1]
#> $116 = 188 '?'
p (unsigned char) "ü"[2]
#> $117 = 0 '\000'

# Using "\xfc" is the ISO 8859-1 encoding.
p (unsigned char) "\xfc"[0]
#> $91 = 252 '?'
p (unsigned char) "\xfc"[1]
#> $92 = 0 '\000'

# Using "\U00FC" is also the ISO 8859-1 encoding.
p (unsigned char) "\U00FC"[0]
#> $93 = 252 '?'
p (unsigned char) "\U00FC"[1]
#> $94 = 0 '\000'

So I think the real solution would be to convert the string to the native encoding before looking for the file. However, I don't think that it's really worth doing at this time, for a few reasons:

This only happens in a very unusual case: on Unix, where the process is using a non-UTF-8 locale and there's a directory with a non-ASCII name to be served. If a user is doing something like this, they're asking for problems -- Unix systems almost always use UTF-8 these days.
This solution can't work if the process's locale is unable to represent the filename. (For example, in this ISO 8859-1 locale, Chinese characters can't be represented.) I don't think there's any way to make it work in this situation.
The current system does work on Windows, where the locale is likely to be non-UTF-8. So Windows is probably doing something different under the hood, or handling this more intelligently.
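
For reference, a minimal sketch of what "convert the string to the native encoding" could look like, using POSIX iconv. This is not httpuv code: the helper name utf8_to_encoding and its signature are hypothetical, and the target encoding is passed in explicitly rather than read from the locale. It also shows the second point above: a character with no ISO 8859-1 equivalent makes the conversion fail.

```cpp
#include <iconv.h>
#include <string>

// Hypothetical helper (not part of httpuv): convert a UTF-8 string to
// `to_encoding` using POSIX iconv. Returns false if the converter is not
// available, the input is malformed, or any character cannot be represented
// exactly in the target encoding.
bool utf8_to_encoding(const std::string& utf8, const char* to_encoding,
                      std::string* out) {
  iconv_t cd = iconv_open(to_encoding, "UTF-8");
  if (cd == (iconv_t)-1) return false;

  // Generous output buffer; for ISO 8859-1 the result is never longer
  // than the UTF-8 input.
  std::string buf(utf8.size() * 4 + 4, '\0');
  char* in_ptr = const_cast<char*>(utf8.data());
  size_t in_left = utf8.size();
  char* out_ptr = &buf[0];
  size_t out_left = buf.size();

  size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
  iconv_close(cd);

  // (size_t)-1 signals an error; a positive count means some characters
  // were replaced by substitutes. Both are treated as failure here.
  if (rc != 0) return false;

  buf.resize(buf.size() - out_left);
  *out = buf;
  return true;
}
```

With this sketch, the UTF-8 bytes for "fü" (0x66 0xC3 0xBC) convert to the two bytes 0x66 0xFC, which is the form that lstat() accepted in the gdb session above, while a Chinese character is rejected when the target is ISO-8859-1.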

In summary, this is an edge case where the cost and risk of fixing it isn't really worth it. I think we should just disable the test on Unix systems where there's a non-UTF-8 locale.
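
The condition "Unix system with a non-UTF-8 locale" can be checked from the process's codeset. The sketch below expresses that check in C++ with nl_langinfo(CODESET); it is only an illustration of the condition, not how the test skip itself is written, and both function names are hypothetical.

```cpp
#include <langinfo.h>
#include <cctype>
#include <clocale>
#include <string>

// Hypothetical helper: true if a codeset name denotes UTF-8. Case and
// punctuation are ignored, so "UTF-8", "utf8" and "UTF8" all match.
bool is_utf8_codeset(const std::string& codeset) {
  std::string norm;
  for (unsigned char c : codeset) {
    if (std::isalnum(c)) norm += static_cast<char>(std::toupper(c));
  }
  return norm == "UTF8";
}

// Hypothetical helper: true if the process locale, as set through LANG or
// LC_ALL, uses UTF-8. Note that setlocale() changes global state, so a
// library would normally rely on the host program having called it already.
bool process_locale_is_utf8() {
  std::setlocale(LC_CTYPE, "");  // adopt the locale from the environment
  return is_utf8_codeset(nl_langinfo(CODESET));
}
```

Under the en_US.iso88591 locale from the Docker reproduction, nl_langinfo(CODESET) would be expected to report an ISO 8859-1 name, so process_locale_is_utf8() would return false and the non-ASCII path tests would be skipped.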

wch added a commit that referenced this issue May 4, 2020

Skip filename encoding tests on non-UTF-8 Unix platforms. Resolves #264

32ba7d3

wch mentioned this issue May 4, 2020

Skip filename encoding tests on non-UTF-8 Unix platforms #265

Merged

wch closed this as completed in #265 May 4, 2020
