testthat tests fail with the new stringi using ICU 63.1 #1604

Closed
gagolews opened this issue Feb 12, 2019 · 6 comments

@gagolews

gagolews commented Feb 12, 2019

Hi there!

Running the quanteda tests against the most recent stringi 1.3.1 (development version, https://github.com/gagolews/stringi) gives:

══ Failed ══════════════════════════════════════════════════════════════════════
── 1. Failure: summary.character works with character objects (#1285) (@test-sum
as.character(summary(txt)) not equal to c("2", "character", "character").
Lengths differ: 2 is not 3

── 2. Failure: tokens works for strange spaces (#796) (@test-tokens.R#301)  ────
ntoken(txt, remove_punct = FALSE, remove_separators = FALSE) not equal to c(text1 = 18).
1/1 mismatches
[1] 17 - 18 == -1

── 3. Failure: tokens works for strange spaces (#796) (@test-tokens.R#302)  ────
as.character(tokens(txt, remove_punct = FALSE, remove_separators = FALSE))[16:18] not equal to c("variationselector16", " ", ".").
3/3 mismatches
x[1]: " "
y[1]: "variationselector16"

x[2]: "."
y[2]: " "

x[3]: NA
y[3]: "."

── 4. Failure: tokens works for strange spaces (#796) (@test-tokens.R#306)  ────
ntoken(txt, remove_punct = TRUE, remove_separators = FALSE) not equal to c(text1 = 16).
1/1 mismatches
[1] 15 - 16 == -1

── 5. Failure: tokens works for strange spaces (#796) (@test-tokens.R#310)  ────
as.character(tokens(txt, remove_punct = TRUE, remove_separators = FALSE))[15:16] not equal to c("variationselector16", " ").
2/2 mismatches
x[1]: " "
y[1]: "variationselector16"

x[2]: NA
y[2]: " "

This happens on an Ubuntu 18.10 system with libicu-dev version 63.1 (stringi compiled against the system ICU).

@kbenoit
Collaborator

kbenoit commented Feb 12, 2019

I just tried it on macOS with stringi 1.3.2 and it passed those tests fine.

Also I just tried it on Ubuntu 18.04 and it passed as well.

> packageVersion("stringi")
[1] ‘1.3.2’
kbenoit@ubuntu:~$ sudo apt upgrade libicu-dev
Reading package lists... Done
Building dependency tree       
Reading state information... Done
libicu-dev is already the newest version (60.2-3ubuntu3).

How do I get ICU version 63.1?

@gagolews
Author

Hi Ken,
You now have the failure on CRAN, which uses (afaik) Debian testing. You can also replicate it on Ubuntu 18.10 (the non-LTS release).

see https://cran.r-project.org/web/checks/check_results_quanteda.html
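
For anyone checking whether their own machine offers a new enough ICU to reproduce this, here is a minimal sketch. It assumes `pkg-config` and a `sort` that supports `-V` (GNU coreutils) are available; the helper names `icu_dev_version` and `version_ge` are made up for illustration. Note that it only reports the system ICU development files, and stringi may instead have been built with its bundled ICU.

```shell
#!/bin/sh
# Print the version of the ICU development files that pkg-config can see,
# or "unknown" when pkg-config or the icu-uc module is not available.
icu_dev_version() {
    if command -v pkg-config >/dev/null 2>&1; then
        pkg-config --modversion icu-uc 2>/dev/null || echo "unknown"
    else
        echo "unknown"
    fi
}

# Succeed when dotted version $1 is greater than or equal to $2.
# Assumption: sort -V (version sort) is supported by the local sort.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

have="$(icu_dev_version)"
if [ "$have" = "unknown" ]; then
    echo "ICU development files not found"
elif version_ge "$have" 63.1; then
    echo "ICU $have: at least 63.1, so the failure may be reproducible here"
else
    echo "ICU $have: older than 63.1"
fi
```

On the Ubuntu 18.04 machine mentioned above (libicu-dev 60.2) this would be expected to take the "older than 63.1" branch.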

@kbenoit
Collaborator

kbenoit commented Feb 14, 2019

Yep, just got the notice from CRAN by email. I can install Ubuntu 18.10 in (another) VM and will test this next week. It's a relatively obscure test about a weirdo space character, so removing it is pretty inconsequential. Still, it would be good to understand more about the underlying ICU changes that triggered the test failure, so we will study it a bit too.

@gagolews
Author

You can reproduce these with Docker:

sudo docker run --rm -ti debian:testing /bin/bash

and then, once a Debian-Testing environment is up and running:

echo 'APT::Default-Release "testing";' > /etc/apt/apt.conf.d/default
apt-get -y -qq update
apt-get -y -qq upgrade
apt-get -y -qq install git libicu-dev g++ r-base-dev pkg-config libxml2-dev
R

In R:

install.packages(c("stringi", "quanteda", "testthat"))
library(quanteda)
library(testthat)

and now:

>     txt <- "space tab\t newline\n non-breakingspace\u00A0, em-space\u2003 variationselector16 \uFE0F."
>     expect_equal(ntoken(txt, remove_punct = FALSE, remove_separators = TRUE), c(text1 = 8))
>     expect_equal(
+         as.character(tokens(txt, remove_punct = TRUE, remove_separators = TRUE)),
+         c("space", "tab", "newline", "non-breakingspace", "em-space", "variationselector16")
+     )
>     expect_equal(ntoken(txt, remove_punct = FALSE, remove_separators = FALSE), c(text1 = 18))
Error: ntoken(txt, remove_punct = FALSE, remove_separators = FALSE) not equal to c(text1 = 18).
1/1 mismatches
[1] 17 - 18 == -1
>     expect_equal(
+         as.character(tokens(txt, remove_punct = FALSE, remove_separators = FALSE))[16:18],
+         c("variationselector16", " ", ".")
+     )
Error: as.character(tokens(txt, remove_punct = FALSE, remove_separators = FALSE))[16:18] not equal to c("variationselector16", " ", ".").
3/3 mismatches
x[1]: " "
y[1]: "variationselector16"

x[2]: "."
y[2]: " "

x[3]: NA
y[3]: "."
>     expect_equal(
+         ntoken(txt, remove_punct = TRUE, remove_separators = FALSE),
+         c(text1 = 16)
+     )
Error: ntoken(txt, remove_punct = TRUE, remove_separators = FALSE) not equal to c(text1 = 16).
1/1 mismatches
[1] 15 - 16 == -1
>     expect_equal(
+         as.character(tokens(txt, remove_punct = TRUE, remove_separators = FALSE))[15:16],
+         c("variationselector16", " ")
+     )
Error: as.character(tokens(txt, remove_punct = TRUE, remove_separators = FALSE))[15:16] not equal to c("variationselector16", " ").
2/2 mismatches
x[1]: " "
y[1]: "variationselector16"

x[2]: NA
y[2]: " "
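
The interactive steps above could also be scripted so that the check runs unattended. The following is only a rough sketch: the file name `repro.sh`, the `DEBIAN_FRONTEND` setting, and the explicit CRAN mirror URL are assumptions added here, not part of the original instructions. The block just writes the script; it does not start Docker itself.

```shell
#!/bin/sh
# Write the container-side steps to repro.sh (hypothetical file name).
# The quoted 'EOF' keeps the heredoc literal, so nothing is expanded here.
cat > repro.sh <<'EOF'
#!/bin/bash
set -e
# Assumption: avoid interactive prompts from apt when no terminal is attached.
export DEBIAN_FRONTEND=noninteractive
echo 'APT::Default-Release "testing";' > /etc/apt/apt.conf.d/default
apt-get -y -qq update
apt-get -y -qq upgrade
apt-get -y -qq install git libicu-dev g++ r-base-dev pkg-config libxml2-dev
# Assumption: a CRAN mirror has to be named explicitly when R is not interactive.
Rscript -e 'install.packages(c("stringi", "quanteda", "testthat"), repos = "https://cloud.r-project.org")'
# Print the ICU version stringi reports, then the token count from the failing test.
Rscript -e 'print(stringi::stri_info()$ICU.version)' \
        -e 'library(quanteda)' \
        -e 'txt <- "space tab\t newline\n non-breakingspace\u00A0, em-space\u2003 variationselector16 \uFE0F."' \
        -e 'print(ntoken(txt, remove_punct = FALSE, remove_separators = FALSE))'
EOF
echo "wrote repro.sh"
```

It could then be run with something like `sudo docker run --rm -v "$PWD/repro.sh:/repro.sh:ro" debian:testing /bin/bash /repro.sh`. Since `debian:testing` and the CRAN packages keep moving, this reproduces the procedure rather than guaranteeing the same ICU 63.1 result.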

kbenoit added a commit that referenced this issue Feb 18, 2019
kbenoit added a commit that referenced this issue Feb 21, 2019
@kbenoit
Collaborator

kbenoit commented Feb 21, 2019

@gagolews I think we fixed this in the current master, but I am having nothing but headaches trying to run the check either in the Docker container for which you helpfully provided instructions above, or using rhub. I just removed the offending test so it should pass fine now, but could you please check for us on your Ubuntu 18.10 build? Once you have confirmed, we will resubmit to CRAN. Thanks!

@gagolews
Author

Confirming, works like a charm. Cheers!
