content() throws error when Content-Type response header has multiple parameters #362

Closed
walkerjeffd opened this Issue Apr 22, 2016 · 0 comments

walkerjeffd commented Apr 22, 2016

Hi! I think I found a bug: when a response has a content-type header with multiple parameters (e.g. "text/xml;charset=utf-8;foo=bar"), content(response) throws a somewhat cryptic error. I don't think it's common to have more than one parameter (usually just charset), but as far as I know multiple parameters are not illegal.

Example:

> library(httr)
> URL <- 'http://cida.usgs.gov/noreast-sos/simple?request=GetObservation&featureID=MD-BC-BC-05&offering=RAW&observedProperty=WATER&beginPosition=2010-08-01T00:00:00&endPosition=2010-08-02T00:00:00'
> res <- GET(URL)
> content(res)
Error in mat[, seq_len(n), drop = FALSE] : incorrect number of dimensions
In addition: Warning messages:
1: In mapply(FUN = f, ..., SIMPLIFY = FALSE) :
  longer argument not a multiple of length of shorter
2: In mapply(FUN = f, ..., SIMPLIFY = FALSE) :
  longer argument not a multiple of length of shorter

The traceback and subsequent debugging indicate it comes from parsing the content-type header.

> traceback()
5: str_split_fixed(pieces[-1], "=", 2)
4: parse_media(type)
3: parseability(type)
2: as %||% parseability(type)
1: content(res)

For this URL, the content-type has a MIME type (text/xml) followed by TWO parameters:

> res$headers$`content-type`
[1] "text/xml; subtype=gml/3.1.1;charset=UTF-8"

parse_media() only works when there is one parameter in the content-type:

> httr::parse_media("text/xml;charset=UTF-8")               # OK
$complete
[1] "text/xml"

$type
[1] "text"

$subtype
[1] "xml"

$params
$params$charset
[1] "utf-8"


> httr::parse_media("text/xml;charset=UTF-8;foo=bar")      # FAILS
Error in mat[, seq_len(n), drop = FALSE] : incorrect number of dimensions
In addition: Warning messages:
1: In mapply(FUN = f, ..., SIMPLIFY = FALSE) :
  longer argument not a multiple of length of shorter
2: In mapply(FUN = f, ..., SIMPLIFY = FALSE) :
  longer argument not a multiple of length of shorter

I ultimately figured out that this happens because the version of str_split_fixed() included within httr is not vectorized (as indicated by the comment at the top of str.R). parse_media() removes the MIME type, then tries to split each of the two parameters at "=". That fails because httr's str_split_fixed() expects its input to be a character vector of length 1. In other words:

> httr:::str_split_fixed(c("charset=UTF-8"), "=", 2)               # OK
     [,1]      [,2]   
[1,] "charset" "UTF-8"
> httr:::str_split_fixed(c("charset=UTF-8", "foo=bar"), "=", 2)    # FAILS
Error in mat[, seq_len(n), drop = FALSE] : incorrect number of dimensions
In addition: Warning messages:
1: In mapply(FUN = f, ..., SIMPLIFY = FALSE) :
  longer argument not a multiple of length of shorter
2: In mapply(FUN = f, ..., SIMPLIFY = FALSE) :
  longer argument not a multiple of length of shorter

The stringr version works though:

> stringr:::str_split_fixed(c("charset=UTF-8", "foo=bar"), "=", 2)    # OK
     [,1]      [,2]   
[1,] "charset" "UTF-8"
[2,] "foo"     "bar"  

Not sure if it's easier to vectorize the version of str_split_fixed() in httr, or to have parse_media() run a loop over the vector of parameters (in case there is more than 1).
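
To illustrate the first option, here is a minimal base-R sketch of a vectorized split at the first "=". The name split_first() is hypothetical and this is not httr's actual code; it assumes a literal (non-regex) separator and only ever produces two columns.

```r
# Hypothetical helper, not part of httr: split each element of `string`
# at the first occurrence of the literal `pattern`.
# Returns a character matrix with one row per input element and two columns.
split_first <- function(string, pattern) {
  # Position of the first match in each element (-1 where there is none)
  start <- as.integer(regexpr(pattern, string, fixed = TRUE))
  found <- start > 0

  # Text before the separator; the whole string if no separator was found
  left <- ifelse(found, substr(string, 1, start - 1), string)
  # Text after the separator; empty if no separator was found
  right <- ifelse(found,
                  substr(string, start + nchar(pattern), nchar(string)),
                  "")

  matrix(as.character(c(left, right)), ncol = 2)
}

# Both parameters are split in one call
params <- split_first(c("subtype=gml/3.1.1", "charset=utf-8"), "=")

# Turn the matrix into a named list of parameter values
param_list <- setNames(as.list(params[, 2]), params[, 1])
```

Something along these lines could replace the non-vectorized call inside parse_media(), though I have not checked how it interacts with the rest of str.R.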

For now, I'm just manually changing the response header to only one parameter after calling GET(), which fixes the problem.
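
In case it helps anyone else, the workaround can be sketched roughly like this. keep_charset_only() is a hypothetical helper name, and it assumes parameter values never contain a quoted ";".

```r
# Hypothetical helper: reduce a content-type value to its MIME type plus
# the charset parameter (if there is one), dropping all other parameters.
# Assumes no quoted parameter value contains a ";".
keep_charset_only <- function(content_type) {
  pieces <- trimws(strsplit(content_type, ";", fixed = TRUE)[[1]])
  charset <- grep("^charset=", pieces[-1], ignore.case = TRUE, value = TRUE)
  paste(c(pieces[1], charset), collapse = ";")
}

fixed <- keep_charset_only("text/xml; subtype=gml/3.1.1;charset=UTF-8")

# Applied to a real response (needs httr and network access):
# res <- GET(URL)
# res$headers$`content-type` <- keep_charset_only(res$headers$`content-type`)
# content(res)
```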

Using httr v1.1.0

Thanks!
