Skip to content

Allow custom content type for JSON body#190

Merged
hadley merged 36 commits intor-lib:mainfrom
mgirlich:req_body_json-type
Sep 6, 2023
Merged

Allow custom content type for JSON body#190
hadley merged 36 commits intor-lib:mainfrom
mgirlich:req_body_json-type

Conversation

@mgirlich
Copy link
Collaborator

@mgirlich mgirlich commented Jan 9, 2023

Fixes #189. Fixes #284.

This PR doesn't support parameters in the media type. They could be added but I never encountered them and I'm not sure the extra complexity is worth it.
For req_body_form() it only allows application/x-www-form-urlencoded. I'm happy to add support for the more general form similar to JSON but I'm not sure it's worth it.

@mgirlich
Copy link
Collaborator Author

I thought it's worth to properly refactor this into some helpers for the content type.

@hadley
Copy link
Member

hadley commented Jan 10, 2023

Can you take a look at the CI failures please?

@mgirlich
Copy link
Collaborator Author

Ah, it seems the names of capture groups are only properly supported for recent R versions. Fixed now 😄

R/resp-body.R Outdated
check_response(resp)
check_installed("jsonlite")
check_content_type(resp,
check_resp_content_type(resp,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can the suffix arguments go away now?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't needed as the suffix can be determined from the types argument.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that only coincidentally works for json and xml, but isn't generally true: https://www.iana.org/assignments/media-type-structured-suffix/media-type-structured-suffix.xhtml.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a counter example there. Maybe I misunderstood.
There is also an extensive list of media types.

Do you think any of these cases should be treated differently?

devtools::load_all("~/GitHub/httr2/")
#> ℹ Loading httr2

# anything with suffix `+cbor` should be valid
check_content_type("application/cbor", "application/cbor")
#> NULL
check_content_type("application/ace+cbor", "application/cbor")
#> NULL
check_content_type("application/ace", "application/cbor")
#> Error:
#> ! Unexpected content type 'application/ace'
#> ℹ Expecting 'application/cbor' or 'application/<subtype>+cbor'

#> Backtrace:
#>     ▆
#>  1. └─httr2:::check_content_type("application/ace", "application/cbor")
#>  2.   └─rlang::abort(...) at httr2/R/content-type.R:68:2

# if `valid_type` has <subtype>+<suffix> the content type must have the same structure
check_content_type("application/ace+cbor", "application/ace+cbor")
#> NULL
check_content_type("application/ace", "application/ace+cbor")
#> Error:
#> ! Unexpected content type 'application/ace'
#> ℹ Expecting 'application/ace+cbor'

#> Backtrace:
#>     ▆
#>  1. └─httr2:::check_content_type("application/ace", "application/ace+cbor")
#>  2.   └─rlang::abort(...) at httr2/R/content-type.R:68:2
check_content_type("application/cbor", "application/ace+cbor")
#> Error:
#> ! Unexpected content type 'application/cbor'
#> ℹ Expecting 'application/ace+cbor'

#> Backtrace:
#>     ▆
#>  1. └─httr2:::check_content_type("application/cbor", "application/ace+cbor")
#>  2.   └─rlang::abort(...) at httr2/R/content-type.R:68:2

Created on 2023-01-18 with reprex v2.0.2

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can see from reading the spec, there's no relationship between the subtype and the suffix, except that there happens to be both subtypes and suffixes for xml and json.

IOTW this is incorrect because there's no +plain suffix:

check_content_type("application/cbor", "text/plain")
#> Error:
#> ! Unexpected content type 'application/cbor'
#> ℹ Expecting 'text/plain' or 'text/<subtype>+plain'

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure I can follow. As far as I understand there are two cases:

  1. content type has no suffix: e.g. application/json or application/xml. The subtype json resp. xml tells you that the body is in JSON resp. XML format and can therefore be parsed accordingly.
  2. content type has a suffix: e.g. application/vnd.api+json has the suffix json. This tells you that the content is in JSON format. The subtype vnd.api tells you that the JSON has a specific structure.

This is also what Wikipedia says about this:

Suffix is an augmentation to the media type definition to additionally specify the underlying structure of that media type, allowing for generic processing based on that structure and independent of the exact type's particular semantics. Media types that make use of a named structured syntax should use the appropriate IANA registered "+"suffix for that structured syntax when they are registered.

and this is how I understood section 4.2.8. Structured Syntax Name Suffixes and 4.11. Fragment Identifier Requirements of RFC6838 (though I find that a bit difficult to understand).

Therefore I would argue that check_content_type("application/vnd.api+json", "application/json") should work (the actual content type is more specific than the demanded one) but check_content_type("application/json", "application/vnd.api+json") should not work (the actual content type is less specific than the demanded one).

Does this make sense to you or do you see other issues here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're pre-supposing a relationship between type and suffix that doesn't exist, and it would be better to keep suffix as a separate argument. In practice, I don't think this will affect how the code works, but I don't like writing a function that implies a structure that I don't believe to be true.

But I don't think I'm doing a good job of explaining exactly what I mean here, so would you mind just trusting me and switching back to an explicit suffix argument? I think it's a pretty simple change.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to go back to suffix. But I'm not quite sure what the interface should like like in the end. I think there are three types of questions on the content type:

  • Is the content in JSON format?
  • Is the content type application/json?
  • Is the content type application/vnd.api+json?
Is the content in JSON format? Is the content type application/json? Is the content type application/vnd.api+json?
application/cbor no no no
example/json yes no no
application/json yes yes no
application/test+json yes yes no
application/vnd.api+json yes yes yes

Do you agree that these three questions all make sense? And what do you think the interface should look like then?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let take a stab at this PR and give you something to look at.

@jonthegeek
Copy link
Contributor

It looks like the only remaining issue in this is a slight conflict in NEWS.md.


# Must set header afterwards
# Respect existing Content-Type if set
type_idx <- match("content-type", tolower(names(req$headers)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed the logic a bit here — now if you manually override with the content-type header, we don't check it. I think this is ok, and gives the user an escape hatch if they're having problems. But generally we expect people to set the content type via the appropriate req_body_* function, which will verify that the type is as expected.

#'
#' @param resp A response object.
#' @param valid_types A character vector of valid content types.
#' @param valid_suffix A string given an "structured media type" suffix.
Copy link
Member

@hadley hadley Aug 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This keeps the type and suffixes orthogonal in a way I think matches the spec.

@hadley
Copy link
Member

hadley commented Aug 26, 2023

@mgirlich let me know what you think of this approach.

hadley added 7 commits August 27, 2023 13:50
#Conflicts:
#	NEWS.md
#	R/resp-body.R
#	tests/testthat/_snaps/req-body.md
#	tests/testthat/_snaps/req-error.md
#	tests/testthat/_snaps/resp-body.md
@mgirlich
Copy link
Collaborator Author

mgirlich commented Sep 6, 2023

I find it a bit confusing that the following works:

check_content_type(
  "audio/abc+json",
  valid_types = "application/json",
  valid_suffix = "json"
)

If we want to allow this then valid_types should be an optional argument.

Another really confusing example is

check_content_type("application/test+json", "application/test+json")

This would suggest that we should validate the valid_types argument or that maybe the interface isn't yet fully right.

# ```
stopifnot(length(x) == 1)
regex <- "^(?<type>application|audio|font|example|image|message|model|multipart|text|video)/(?<subtype>(?:(?:vnd|prs|x)\\.)?(?:[^+;])+)(?:\\+(?<suffix>(?:[^;])+))?(?:;(?<parameters>(?:.)+))?$"
if (!grepl(regex, x, perl = TRUE)) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this should be an error instead? And we might just have a special handling when the content type is NULL or the empty string?

@hadley
Copy link
Member

hadley commented Sep 6, 2023

Oh yeah, we can definitely make valid_types optional. I don't think it's worth validating it; but I can document it and fix that test.

@hadley hadley merged commit 9c60bd2 into r-lib:main Sep 6, 2023
@mgirlich mgirlich deleted the req_body_json-type branch September 7, 2023 08:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

check_content_type() should correctly error when response has no content_type but suffix is checked. Allow custom content type for JSON body

3 participants