Define fragment identifiers for application/yaml #38

Merged
merged 6 commits into from
May 12, 2022
48 changes: 47 additions & 1 deletion draft-ietf-httpapi-yaml-mediatypes.md
@@ -101,6 +101,52 @@ The terms "content", "content negotiation", "resource",
and "user agent"
in this document are to be interpreted as in {{!SEMANTICS=I-D.ietf-httpbis-semantics}}.

The terms "fragment" and "fragment identifier"
in this document are to be interpreted as in {{!URI=RFC3986}}.

The terms "node", "anchor" and "named anchor"
in this document are to be interpreted as in [YAML].

## Fragment identification {#application-yaml-fragment}

This section describes how to use
named anchors (see Section 3.2.2.2 of [YAML])
Collaborator

Using named anchors because they can be defined even when no alias is defined.

Collaborator Author

Sure, this works for now. The intent of the original language was to account for an expected expansion of alias functionality in the YAML spec, which would allow, given a document like this:

- &foo 
  bar:
    - 1
    - 2
    - 42

for an alias *foo/bar/2 to point at the value 42.

By referring directly to anchors here, such a later change to the YAML spec would not be reflected in the mediatype's fragment id.

To be clear, pathlike aliases are not yet valid and there's no fixed schedule for when we might get a YAML 1.3 spec out, so defining the mediatype according to current reality is an entirely valid thing to do.

Collaborator

@eemeli thanks for the insight! Some questions, then:

  1. is / a valid character for a named anchor? If / is used as a pathlike separator, isn't using / in named anchors problematic?
  2. since keys can contain non-string characters, how can I address a pathlike alias node such as *foo/bar/1 or *fizz/buzz/baz?
- &foo
  bar:
    1: "integer"
    "1": "string"
- baz: *foo/bar/1
- &fizz
  "buzz/baz": "a"
  "buzz":
    "baz": "b"
- roc: *fizz/buzz/baz

If we want to achieve publication quickly, I think that using "named anchors" is easier. It is always possible to amend the media type registration and the corresponding fragment identifiers by interacting directly with IANA.

It is ok to spend some time trying to use alias nodes, provided that:

  1. we need to specify that the fragment identifier "should be interpreted as an alias node": this is because a named anchor might not be referenced by an alias node;
  2. since / is a valid key character, we need to encode it properly, as is done in JSON Pointer. This is probably valid independently of the fragment identifier;
  3. for the sake of interoperability, I suggest at least having an idea of how to handle the above YAML document with the following fragments:
  • file.yaml#foo
  • file.yaml#foo/bar
  • file.yaml#foo/bar/1

Collaborator Author (@eemeli, Apr 26, 2022)

  1. is / a valid character for a named anchor? If / is used as a pathlike separator, isn't using / in named anchors problematic?

Yes, it's a valid character, and yes, it's potentially problematic. Not fatally so, though, as preferentially matching the longest substring allows for a deterministic and pretty sensible resolution. I have a prototype implementation of how this could work here: eemeli/yaml#380.

  2. since keys can contain non-string characters, how can I address a pathlike alias node such as *foo/bar/1 or *fizz/buzz/baz?

Badly. The resolution algorithm can end up pretty straightforward, but with degenerate cases like this that'll mean resolving one of the possible nodes while making the other one unaddressable via a pathlike alias. But it's possible to attach an anchor to a node, which circumvents the problem. That's also the solution for addressing e.g. nodes that are in a mapping and have a non-scalar key like { [ foo, bar ]: value }.

If we want to achieve publication quickly, I think that using "named anchors" is easier. It is always possible to amend the media type registration and the corresponding fragment identifiers by interacting directly with IANA.

Yeah, that's why I said referring to anchors directly should be ok for now. They'll need to continue working in the future as well, and any changes should just make expressions that currently fail potentially start resolving, rather than changing the meaning of anything that's currently valid.
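The longest-match idea discussed in this thread can be illustrated with a small Python sketch. This is only one possible reading of the comments, not the algorithm in eemeli/yaml#380 and not anything defined by the YAML spec. The names `resolve_pathlike` and `step` are hypothetical, anchors are modelled as a plain dict from anchor name to an already-parsed node, and the choice to try string keys before integer keys is an assumption.

```python
from collections.abc import Mapping, Sequence


def step(node, seg):
    """Descend one path segment into a mapping or sequence (hypothetical helper)."""
    is_index = seg.isascii() and seg.isdigit()
    if isinstance(node, Mapping):
        # Assumption: a string key wins over an integer key with the same
        # spelling, so the integer-keyed node becomes unaddressable by path.
        if seg in node:
            return node[seg]
        if is_index and int(seg) in node:
            return node[int(seg)]
        raise KeyError(seg)
    if isinstance(node, Sequence) and not isinstance(node, str) and is_index:
        return node[int(seg)]
    raise KeyError(seg)


def resolve_pathlike(alias, anchors):
    """Resolve 'name/seg/...' by preferring the longest matching anchor name."""
    segments = alias.split("/")
    # Try "a/b/c" as an anchor name first, then "a/b", then "a".
    for cut in range(len(segments), 0, -1):
        name = "/".join(segments[:cut])
        if name in anchors:
            node = anchors[name]
            for seg in segments[cut:]:
                node = step(node, seg)
            return node
    raise KeyError(alias)
```

With the document from the first comment, `resolve_pathlike("foo/bar/2", {"foo": {"bar": [1, 2, 42]}})` walks from the anchor `foo` to the third sequence item. When both `1` and `"1"` are keys of the same mapping, this sketch reaches only the string key, which matches the "degenerate case" described above.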

as fragment identifiers to designate nodes.

A YAML named anchor can be represented in a URI fragment identifier
by encoding it into octets using UTF-8 {{!UTF-8=RFC3629}},
while percent-encoding those characters not allowed by the fragment rule
in {{Section 3.5 of URI}}.
Comment on lines +116 to +119
Collaborator

Lines taken from the JSON Pointer RFC.

Collaborator

shouldn't that read "referenced" instead of "represented"?

Collaborator

I wrote "represent" because the serialization differs from YAML's.
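The encoding rule in the draft text under review (UTF-8 octets, then percent-encoding of anything the RFC 3986 fragment rule does not allow) can be sketched in Python with the standard library. The function names `anchor_to_fragment` and `fragment_to_anchor` are hypothetical, and the safe-character set is derived from the `fragment` rule as `pchar / "/" / "?"`; treat it as an illustration rather than a normative mapping.

```python
from urllib.parse import quote, unquote

# Characters the RFC 3986 fragment rule allows unescaped, in addition to
# ALPHA / DIGIT / "-" / "." / "_" / "~" (which quote() never escapes):
# sub-delims, ":", "@", "/" and "?".
FRAGMENT_SAFE = "!$&'()*+,;=:@/?"


def anchor_to_fragment(anchor: str) -> str:
    """Encode a YAML anchor name as UTF-8 and percent-encode as needed."""
    return quote(anchor, safe=FRAGMENT_SAFE, encoding="utf-8")


def fragment_to_anchor(fragment: str) -> str:
    """Reverse the mapping: percent-decode and interpret the octets as UTF-8."""
    return unquote(fragment, encoding="utf-8")
```

An anchor made only of unreserved characters, such as `foo`, maps to itself, while a name containing `#` or non-ASCII characters is percent-encoded.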


If multiple nodes would match a fragment identifier,
the first such match is selected.

Users concerned with interoperability of fragment identifiers:
Collaborator

I am not sure whether anchor names allow all possible UTF-32 characters, so here we suggest an interoperable behavior.

pyyaml for example only supports [a-zA-Z0-9\-_]+ for anchor names; I didn't test other implementations.

Collaborator

The above snippet looks odd. We're working on a media type registration. The above text seems to define/describe behavior that hopefully is well-defined for the format, and if it's not, then that's too bad, but it's nothing that a media type registration should attempt to change.

Collaborator Author

According to the YAML spec, the allowed characters in ns-anchor-name are literally everything up to \x10FFFF except for:

  • C0 and C1 control codes, though the Next Line character \x85 is allowed
  • \x20 | ',' | '[' | ']' | '{' | '}'
  • Surrogates [\xD800-\xDFFF]
  • the BOM character \xFEFF

That range is rather silly, and as @ioggstream noted, not supported by all implementations. Sticking to [\w-]+ is indeed recommended for interoperability.
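To make the gap between the two character sets concrete, here is a hedged Python sketch. The ranges follow my reading of the YAML 1.2 productions (`ns-anchor-char` as `ns-char` minus the flow indicators), so they are an assumption and should be checked against the spec before being relied on. The function names are hypothetical, and the interoperable pattern is the ASCII reading of `[\w-]+`.

```python
import re

FLOW_INDICATORS = set(",[]{}")

# ASCII-only reading of the recommended [\w-]+ subset.
INTEROPERABLE = re.compile(r"[A-Za-z0-9_-]+")


def is_anchor_char(ch: str) -> bool:
    """True if ch looks permitted in an anchor name (assumed YAML 1.2 ranges)."""
    cp = ord(ch)
    if ch in FLOW_INDICATORS or cp == 0xFEFF:  # flow indicators and the BOM
        return False
    return (
        0x21 <= cp <= 0x7E            # printable ASCII without space
        or cp == 0x85                 # Next Line
        or 0xA0 <= cp <= 0xD7FF       # stops before surrogates D800-DFFF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_spec_anchor_name(name: str) -> bool:
    return len(name) > 0 and all(is_anchor_char(c) for c in name)


def is_interoperable_anchor_name(name: str) -> bool:
    return INTEROPERABLE.fullmatch(name) is not None
```

Under these assumptions a name like `foo/bar` passes the spec-level check but fails the interoperable one, which is the situation the thread is concerned about.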

Collaborator

@eemeli

  • do you think it's worth reducing the possible values of ns-anchor-name in a future YAML revision? @dret's comment is relevant.
  • what happens if I have something like *foo/bar/baz?
- &foo
  "bar/baz": "a"
  "bar":
    "baz": "b"

For example, JSON Pointer encodes them in a special way.

Collaborator Author

Yes, reducing the valid space of anchor names is definitely planned. I even wrote up a proposal for it (yaml/yaml-spec#64), but the spec update progress has been a bit stop-and-go-ish.

Collaborator

@eemeli it would be a major improvement :)


- SHOULD limit named anchors to a set of characters
that do not require encoding
to be expressed as URI fragment identifiers:
this is always possible since named anchors are a serialization
detail;
- SHOULD NOT use a named anchor that matches multiple nodes.
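The two recommendations above could be checked mechanically. The Python sketch below assumes the anchor names have already been collected from a document in order (how they are extracted is out of scope here) and uses a hypothetical helper name, `lint_anchor_names`; it flags names that would need percent-encoding in a fragment and names defined more than once.

```python
from collections import Counter
from urllib.parse import quote

# Characters the RFC 3986 fragment rule allows unescaped besides the
# unreserved set, which quote() never escapes.
FRAGMENT_SAFE = "!$&'()*+,;=:@/?"


def lint_anchor_names(names):
    """Return warnings for anchor names that may hurt fragment interoperability."""
    warnings = []
    for name, count in Counter(names).items():
        if quote(name, safe=FRAGMENT_SAFE) != name:
            warnings.append(f"{name!r}: needs percent-encoding in a fragment")
        if count > 1:
            warnings.append(f"{name!r}: defined {count} times")
    return warnings
```

An empty result means every name can be used verbatim as a fragment identifier and identifies at most one node.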

In the example resource below, the URL `file.yaml#foo`
references the anchor `foo` pointing to the node with value `scalar`;
whereas
the URL `file.yaml#bar` references the anchor `bar` pointing to the node
with value `[ some, sequence, items ]`.

~~~ example
%YAML 1.2
---
Collaborator

There's a bug in kramdown: it seems it can't process --- in code blocks. I kludged it by adding a space to each line. Alternatively, we could remove the %YAML header.

one: &foo scalar
two: &bar
- some
- sequence
- items
~~~
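As a rough illustration of the resolution described for this example, the Python sketch below looks up a URL's fragment among anchored nodes and returns the first match. The anchored nodes are listed by hand in document order rather than produced by a YAML parser, and `resolve_fragment` is a hypothetical name, so this shows the selection rule only.

```python
from urllib.parse import unquote

# Anchored nodes of the example resource, written out by hand in document
# order as (anchor name, node value) pairs.
ANCHORED_NODES = [
    ("foo", "scalar"),
    ("bar", ["some", "sequence", "items"]),
]


def resolve_fragment(url, anchored_nodes):
    """Return the first node whose anchor matches the URL's fragment."""
    _, sep, fragment = url.partition("#")
    if not sep:
        raise ValueError("URL has no fragment identifier")
    name = unquote(fragment)  # undo percent-encoding, octets read as UTF-8
    for anchor, node in anchored_nodes:
        if anchor == name:
            return node  # first match in document order wins
    raise LookupError(f"no node is anchored as {name!r}")
```

Calling `resolve_fragment("file.yaml#bar", ANCHORED_NODES)` yields the sequence node, and a repeated anchor name resolves to whichever node appears first in the list.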


# Media Type registrations

This section describes the information required to register
@@ -138,7 +184,7 @@ Applications that use this media type:
: HTTP

Fragment identifier considerations:
- : None
+ : see {{application-yaml-fragment}}

Additional information:
