Commit

work around failures in utf8_width(). @patperry: Would you support a version of `utf8_width()` that always returns a number?

- Work around failing CRAN tests on Windows.
krlmlr committed Nov 27, 2017
1 parent 8ffbd38 commit d00c11f
Showing 1 changed file with 5 additions and 1 deletion.
6 changes: 5 additions & 1 deletion R/extent.R
@@ -9,7 +9,11 @@
 #' get_extent(c("abc", "de"))
 #' get_extent("\u904b\u6c23")
 get_extent <- function(x) {
-  utf8::utf8_width(crayon::strip_style(x), encode = FALSE)
+  x <- crayon::strip_style(x)
+  width <- utf8::utf8_width(x, encode = FALSE)
+  is_na <- which(is.na(width))
+  width[is_na] <- nchar(x[is_na], type = "width")
+  width
 }
 
 #' @description
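
The fallback in this change can be tried in isolation. The following is a minimal base-R sketch, not code from the commit: the `width` vector is made up for illustration and stands in for a width function that returned NA for one element.

```r
# Illustrative input: pretend the primary width function failed on "de".
x <- c("abc", "de", "fgh")
width <- c(3L, NA, 3L)

# Same idiom as in get_extent(): locate the NA entries, then fill only
# those positions with the display width reported by base R's nchar().
is_na <- which(is.na(width))
width[is_na] <- nchar(x[is_na], type = "width")

width
# 3 2 3
```

When there are no NA entries, `is_na` is `integer(0)` and the assignment is a no-op, so the common path is unaffected.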

5 comments on commit d00c11f

@patperry

Why do you insist on encode = FALSE? This code returns 1 for get_extent("\n"). Is that what you want?
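
To see what the question is getting at, the two settings can be compared directly. This is a hedged sketch that assumes the utf8 package is installed; the variable names are illustrative. As far as the utf8 documentation describes it, `encode = TRUE` measures the escaped form that `utf8_encode()` would print, while `encode = FALSE` measures the original string and may report NA for non-printable input, which is the case the `nchar()` fallback in `get_extent()` then handles.

```r
x <- "\n"

# Width of the original string; may be NA for control characters.
w_raw <- utf8::utf8_width(x, encode = FALSE)

# Width of the escaped representation that would actually be printed.
w_enc <- utf8::utf8_width(x, encode = TRUE)

# Base-R fallback used by get_extent() when the width above is NA.
w_fallback <- nchar(x, type = "width")

print(c(raw = w_raw, encoded = w_enc, fallback = w_fallback))
```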

@patperry

By the way, make sure you are using the latest version of utf8 (1.1.0). I fixed a bug in utf8_width() in the C locale and added more precise control over the assumed output capabilities via the utf8 argument to utf8_width().

@krlmlr (Member, Author) commented on d00c11f Nov 27, 2017

x should never be "\n" here, or anything un-printable. But I'm not sure what happens on Windows.

CRAN was seeing check errors for pillar on Windows, and I needed a quick fix. I need to take a closer look to understand how utf8_width() could return NA. Anyway, for this routine I don't really care: a bad width "only" messes up the output, but an NA width just breaks everything.

@patperry

Non-ASCII strings get NA width in the C locale unless the utf8 argument is TRUE. With v1.1.0, you can call utf8_width(x, utf8 = TRUE) to ignore the current locale when computing the width. This might fix your NA issues.
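
Applied to the function from this commit, the suggestion might look like the sketch below. It assumes utf8 >= 1.1.0 and crayon are installed; `get_extent_utf8` is a hypothetical name chosen here, and keeping the `nchar()` fallback as a safety net is an assumption of the sketch, not something the thread settles.

```r
# Hypothetical variant of get_extent() that ignores the current locale.
get_extent_utf8 <- function(x) {
  x <- crayon::strip_style(x)
  # utf8 = TRUE: assume UTF-8 output capabilities regardless of locale.
  width <- utf8::utf8_width(x, encode = FALSE, utf8 = TRUE)
  # Keep the fallback in case a width is still NA (e.g. control characters).
  is_na <- which(is.na(width))
  width[is_na] <- nchar(x[is_na], type = "width")
  width
}

get_extent_utf8(c("abc", "de"))
get_extent_utf8("\u904b\u6c23")
```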

@krlmlr (Member, Author) commented on d00c11f Nov 27, 2017

Got it, thanks.
