New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problems with encoding (latin1 vs. UTF-8) when running devtools:test() #1306

Closed
hansharhoff opened this Issue Aug 25, 2016 · 4 comments

Comments

Projects
None yet
3 participants
@hansharhoff
Contributor

hansharhoff commented Aug 25, 2016

TL;DR

UTF-8 encoded files with non-ASCII strings are not correctly represented when running devtools::test()
on Windows. There should be an option to get the encoding from the Description file or from e.g. devtools::test(encoding = "UTF-8")

Description of issue

I am writing a package where strings containing non-ASCII characters need to be sent to a SQL server through RODBC.

By saving a query string with non-ASCII characters and running this in RStudio, R-terminal or using source from RStudio I am able to get the correct result.

However if I use devtools:test() my code fails since it seems that it does not source the functions using the correct encoding.

I have made a minimal example test functions (not using the SQL server). Here is the relevant code:

library(testthat)
context("Test encoding")

test_that("encoding works", {
  query <- "Test encoding: ó æ ø å  END TEST"
  print(query)
})

Running this code in RStudio (using Ctrl-R) returns:

[1] "Test encoding: ó æ ø å END TEST"
Which is the correct and expected result.

Pressing the source button in RStudio runs:
source('C:/Users/hahad/workspaces/bugreports/tests/testthat/test_encoding.R', encoding = 'UTF-8', echo=TRUE)
and returns the correct, expected result:

> source('C:/Users/hahad/workspaces/bugreports/tests/testthat/test_encoding.R', echo=TRUE)

> library(testthat)

> context("Test encoding")

> test_that("encoding works", {
+   query <- "Test encoding: ó æ ø å  END TEST"
+   print(query)
+ })
[1] "Test encoding: ó æ ø å  END TEST"

From this I conclude that the files are saved as UTF-8 and should therefore be sourced as UTF-8 (also confirmed using external program and by inspecting the RStudio settings).

If I save this script in a testthat folder in an otherwise empty package and run

devtools::test()

I get the following result:

devtools::test()
Loading bugreports
Testing bugreports
Test encoding: [1] "Test encoding: ó æ ø å  END TEST"


DONE =========================================================================================

This means that devtools is sourcing the files as latin1 instead of the UTF-8 files that they are. This happens irrespective of the Encoding: UTF-8 statement in the description file.

I have attempted to change to a UTF-8 encoding for my locale using

> Sys.setlocale("LC_ALL", 'UTF-8')
[1] ""
Warning message:
In Sys.setlocale("LC_ALL", "UTF-8") :
  OS reports request to set locale to "UTF-8" cannot be honored

Summary

When using devtools::test() the encoding is not properly handled (at least on Windows). It is not possible to change to a UTF-8 encoding for the locale on Windows (as far as I can tell), even if this is the suggested encoding to use.

Proposed solutions

  1. Make devtools::test() respect the Encoding in the DESCRIPTION file.
  2. Make optional argument to devtools:test() for encoding (as there is for source-function.
  3. I should develop package entirely in latin1
  4. Find a way to use UTF-8 compatible locale in Windows.

I would prefer solution 1
:-)

@hansharhoff

This comment has been minimized.

Contributor

hansharhoff commented Aug 25, 2016

I traced the calling functions from

devtools::test()

to

testthat::test_dir

to

testthat::test_files

to

testthat::test_file

and finally to
>testthat::source_file

This uses readLines to get the lines from the test-R-file. I can confirm that using:

readLines(pathToFile, encoding = "UTF-8")

reads the file correctly.
But

readLines(pathToFile)

does not.

So propagating the encoding to this function would solve the problem.

@jimhester

This comment has been minimized.

Member

jimhester commented Aug 25, 2016

Looks like testthat::test_file() and testthat::source_file() need to gain an encoding option (defaulting to "unknown") and devtools::test() needs to call testthat::test_dir(..., encoding = pkg$encoding %||% "unknown").

@hansharhoff You seem to have a good grip on the problem and solution, do you want to try submitting pull requests to devtools and testthat with the above fixes? See https://github.com/hadley/devtools/blob/master/CONTRIBUTING.md#pull-requests for information on contributing to devtools.

hansharhoff pushed a commit to hansharhoff/testthat that referenced this issue Aug 29, 2016

hansharhoff pushed a commit to hansharhoff/devtools that referenced this issue Aug 29, 2016

This was referenced Aug 30, 2016

@hansharhoff

This comment has been minimized.

Contributor

hansharhoff commented Sep 6, 2016

I have made two pull requests in regards to this issue. They should fix the issue without disturbing any functionality. I also to add a test to test the filter parameter of test(), but I did not manage to make a consistent test since the indirection of pointing at another set of tests confused me.

I also did not add a test of the encoding issue, in part because it only occurs on some platforms.

@hadley hadley added the bug label Nov 3, 2016

@hadley hadley closed this Nov 4, 2016

hansharhoff pushed a commit to hansharhoff/testthat that referenced this issue Nov 24, 2016

Hans Harhoff Andersen
Add encoding parameter to source_file and test_file to ensure UTF-8 c…
…ompatability on Windows (see e.g. r-lib/devtools#1306)

Add NEWS.md line about the change.

hadley added a commit to r-lib/testthat that referenced this issue Dec 15, 2016

Add encoding support (#550)
* Add encoding parameter to source_file and test_file to ensure UTF-8 compatability on Windows (see e.g. r-lib/devtools#1306)

* Add encoding parameter to source_file and test_file to ensure UTF-8 compatability on Windows (see e.g. r-lib/devtools#1306)
Add NEWS.md line about the change.

* Simplify documentation of encoding parameter.
Remove ... from call/definition of test_files
@lock

This comment has been minimized.

lock bot commented Sep 18, 2018

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Sep 18, 2018

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.