the query string supports multibyte chars #314

Closed
wants to merge 5 commits
@@ -22,13 +22,20 @@ parseQS <- function(qs){
parts <- strsplit(qs, "&", fixed = TRUE)[[1]]
kv <- strsplit(parts, "=", fixed = TRUE)
kv <- kv[sapply(kv, length) == 2] # Ignore incompletes
if (length(kv) == 0L) return(list())

keys <- sapply(kv, "[[", 1)
keys <- unname(sapply(keys, URLdecode))
Encoding(keys) <- "UTF-8"

wch Dec 7, 2018

Collaborator

@schloerke Perhaps this should give a warning instead if keys is non-ASCII. I believe that the encoding will be unknown iff the text contains only ASCII characters, but I'm not 100% sure.

shrektan Dec 8, 2018

Author Contributor

Yes, it's unknown for ASCII-only strings, but setting it to UTF-8 causes no damage - it will stay unknown.
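
For reference, a quick sketch of that behaviour in a plain R session (base R semantics of Encoding<-; not part of this PR):

x <- "abc"              # ASCII-only string
Encoding(x) <- "UTF-8"  # R never marks pure-ASCII strings with an encoding
Encoding(x)             # "unknown"

y <- "\u4e2d\u6587"     # non-ASCII (Chinese) string
Encoding(y) <- "UTF-8"
Encoding(y)             # "UTF-8"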

wch Dec 8, 2018

Collaborator

From my previous comment:

we think that it's not even possible to call a function with argument names that can't be represented in the R session's native encoding

So if you have a function with a Chinese argument name and you run the code on a Windows machine with a Chinese locale, you will not encounter this problem. You could also run the same code on a Linux machine and not have the problem. However, if you run the exact same code on a Windows machine with a non-Chinese locale, then you will end up with code that doesn't work and error messages that are very confusing.

My idea here is to warn people when they try to use non-ASCII characters so that they don't end up in that situation.
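
A rough sketch of what such a check could look like right after the keys are decoded (hypothetical code, not part of this PR; is_non_ascii is an illustrative name):

# Hypothetical warning for non-ASCII parameter names after URL decoding:
# these only match R argument names reliably when the session's native
# encoding can represent them.
is_non_ascii <- vapply(
  keys,
  function(k) any(charToRaw(k) > as.raw(0x7f)),
  logical(1)
)
if (any(is_non_ascii)) {
  warning(
    "Query string contains non-ASCII parameter names: ",
    paste(keys[is_non_ascii], collapse = ", "),
    ". They may not match function argument names on all platforms/locales."
  )
}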

# The query string (after URL decoding) is usually UTF-8,
# but this is not always true. For simplicity, we'll just
# enforce that it has to be UTF-8, and not support other encodings.
# We also need to ensure that R understands that it's UTF-8.

vals <- sapply(kv, "[[", 2)
vals[is.na(vals)] <- ""
vals <- unname(sapply(vals, URLdecode))
Encoding(vals) <- "UTF-8" # The reason is the same as above

ret <- as.list(vals)
names(ret) <- keys
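
For illustration, this is what the patched parseQS() returns for a percent-encoded UTF-8 query string (the same input exercised by the new parseQS() test below):

out <- parseQS("%E5%8F%82%E6%95%B01=%E4%B8%AD%E6%96%87")  # "参数1=中文", URL-encoded
names(out)            # "参数1"
Encoding(names(out))  # "UTF-8"
out[[1L]]             # "中文"
Encoding(out[[1L]])   # "UTF-8"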
@@ -0,0 +1,5 @@
#* @get /msg
#* @post /msg
function(param1, param2) {
paste(param1, param2, sep = "-")
}
@@ -0,0 +1,16 @@
context("test-query-with-multibytes")

test_that("Support multi-bytes queries", {
r <- plumber$new(test_path("files/query-with-multibytes.R"))
res <- PlumberResponse$new()

req <- make_req("GET", "/msg", "?param1=%E4%B8%AD%E6%96%87&param2=%E4%BD%A0%E5%A5%BD")
out <- r$serve(req, res)$body
expect_equal(Encoding(out), "UTF-8")
expect_identical(charToRaw(out), charToRaw(jsonlite::toJSON("\u4e2d\u6587-\u4f60\u597d")))

req <- make_req("POST", "/msg", "?param1=%E4%B8%AD%E6%96%87&param2=%E4%BD%A0%E5%A5%BD")
out <- r$serve(req, res)$body
expect_equal(Encoding(out), "UTF-8")
expect_identical(charToRaw(out), charToRaw(jsonlite::toJSON("\u4e2d\u6587-\u4f60\u597d")))
})
@@ -23,4 +23,18 @@ test_that("incomplete query strings are ignored", {

test_that("query strings with duplicates are made into vectors", {
expect_equal(parseQS("a=1&a=2&a=3&a=4"), list(a=c("1", "2", "3", "4")))
})

test_that("parseQS() will mark UTF-8 explicitly", {
out <- parseQS("%E5%8F%82%E6%95%B01=%E4%B8%AD%E6%96%87")
expect_equal(Encoding(names(out)), "UTF-8")
expect_identical(
charToRaw(names(out)),
as.raw(c(0xe5, 0x8f, 0x82, 0xe6, 0x95, 0xb0, 0x31))
)
expect_identical(
charToRaw(out[[1L]]),
as.raw(c(0xe4, 0xb8, 0xad, 0xe6, 0x96, 0x87))
)
expect_equal(Encoding(out[[1L]]), "UTF-8")
})