YAML-LD canonicalization (c14n) #43

Open
VladimirAlexiev opened this issue Jul 4, 2022 · 9 comments
Labels
UCR Issue on Use Case/Recommendation

@VladimirAlexiev
Contributor

VladimirAlexiev commented Jul 4, 2022

As an information architect.
I want no variation in YAML format for the same semantic content.
So that I can easily compare or sign YAML.

Canonicalization (also called c14n or normalization) is useful for the following use cases:

  • meaningful diff
  • signed content. As Manu puts it "to ensure that different expressions result in the same hash"
  • TODO: what more?
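
To illustrate the signing use case, here is a minimal sketch. It assumes JSON as a stand-in for YAML (the Python standard library has no YAML parser), and `canonical_bytes` is a hypothetical helper, not JCS or any existing YAML c14n scheme. It only shows why hashing raw text is fragile while hashing one fixed serialization is stable.

```python
import hashlib
import json


def canonical_bytes(data):
    # Assumed canonical style: sorted keys, fixed separators, UTF-8 output.
    return json.dumps(
        data, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def digest(raw: bytes) -> str:
    return hashlib.sha256(raw).hexdigest()


# Two expressions of the same content: key order and whitespace differ.
text_a = '{"name": "Alice", "age": 30}'
text_b = '{\n  "age": 30,\n  "name": "Alice"\n}'

# Hashing the text as written gives two different digests.
raw_differs = digest(text_a.encode("utf-8")) != digest(text_b.encode("utf-8"))

# Hashing the canonical serialization of the parsed data gives one digest.
canon_a = canonical_bytes(json.loads(text_a))
canon_b = canonical_bytes(json.loads(text_b))
canonical_matches = digest(canon_a) == digest(canon_b)
```

The same canonical text also gives a meaningful diff: two files that differ only cosmetically compare as identical once both are rewritten in the fixed style.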

Prior art:

NOTE THAT this UCR is quite the opposite of #42. So if we cater to both:

  • we should define a YAML c14n style (cosmetic controls) to produce canonical YAML
  • we should state that using different YAML styles is not recommended for the above "canonicalized" use cases
@VladimirAlexiev VladimirAlexiev added the UCR Issue on Use Case/Recommendation label Jul 4, 2022
This was referenced Jul 4, 2022
@ioggstream
Contributor

ioggstream commented Jul 4, 2022

@VladimirAlexiev I am not sure that YAML does not already provide something like that. I am not sure that's the most readable form, but did you ask the YAML folks?

e.g. for scalars, there's https://yaml.org/spec/1.2.2/#canonical-form
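
As a rough sketch of what a scalar canonical form means in practice, the snippet below normalizes integers to plain decimal notation. The three input patterns (decimal, `0o` octal, `0x` hexadecimal) follow my reading of the YAML 1.2.2 core schema and should be treated as an assumption; `canonical_int` is a hypothetical helper, not part of any YAML library.

```python
import re

# Assumed core-schema integer notations and their bases.
_INT_PATTERNS = [
    (re.compile(r"[-+]?[0-9]+"), 10),
    (re.compile(r"0o[0-7]+"), 8),
    (re.compile(r"0x[0-9a-fA-F]+"), 16),
]


def canonical_int(scalar: str) -> str:
    """Return the decimal canonical form of an integer scalar."""
    for pattern, base in _INT_PATTERNS:
        if pattern.fullmatch(scalar):
            # Drop the "0o" / "0x" prefix before converting.
            digits = scalar if base == 10 else scalar[2:]
            # str(int(...)) removes "+", leading zeros, and "-0".
            return str(int(digits, base))
    raise ValueError(f"not a core-schema integer: {scalar!r}")


hex_form = canonical_int("0x1F")
octal_form = canonical_int("0o17")
padded_form = canonical_int("+012")
```

This covers a single scalar type only; it says nothing about collections, key order, or anchors, which is where the open questions in this issue lie.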

@VladimirAlexiev
Contributor Author

@ioggstream Added your point above.
Whom should we ask? If you know such people, could you tag them here?
I googled for "yaml canonicalization, yaml c14n, yaml normalization" and came up with only 1 hit.

@ioggstream
Contributor

ioggstream commented Jul 4, 2022

@VladimirAlexiev YAML repo or https://app.element.io/#/room/#chat:yaml.io

After a brief investigation, I understood there's no easy fix for that - at least if we want to include aliases/anchors.

@gkellogg gkellogg added this to UCR in YAML-LD Jul 4, 2022
@gkellogg
Member

gkellogg commented Jul 4, 2022

The only use for YAML C14N I can see would be for a hypothetical YAML Literal (similar to JSON Literal). And as such, that would seem to be a spec to reference, not add to YAML-LD.

As for standardizing the serialization of YAML-LD itself, I would be a 👎 on that, as it should not be necessary for conveying semantic meaning. Granted that people will want to create pretty YAML-LD output, but controls for that should be pass-through (IMO) and not required for interoperability.

@ioggstream
Contributor

YAML C14N ... a spec to reference

👍 I think that the c14n discussion can be managed in the YAML community (e.g. via element). I think there's some interest there. Note that c14n and readability might be different goals.

@VladimirAlexiev I suggest filing an issue in the YAML repo so that if they come up with a solution we could reference it.

@VladimirAlexiev VladimirAlexiev changed the title YAML canonicalization YAML-LD canonicalization (C14n) Jul 7, 2022
@VladimirAlexiev VladimirAlexiev changed the title YAML-LD canonicalization (C14n) YAML-LD canonicalization (c14n) Jul 7, 2022
@VladimirAlexiev
Contributor Author

@gkellogg

The only use for YAML C14N I can see would be for a hypothetical YAML Literal

The main use of JSON-LD c14n is for crypto signing and verifiable credentials of whole JSON-LD files.
@OR13 do you see a case for using YAML-LD for verifiable credentials?

(negative vote) as it should not be necessary for conveying semantic meaning

Gregg, I don't understand your position: are you also against https://json-ld.github.io/rdf-dataset-canonicalization/spec/ and JSON Canonicalization Scheme (JCS)? Aren't the 2 use cases listed enough?

pretty YAML-LD output: controls for that should be pass-through (IMO) and not required for interoperability.

Of course they are not required. But:

  • Defining semantic terms for such controls is IMHO fair game, because YAML is largely about readability, thus formatting
  • Using a fixed set of controls to achieve c14n is important for cases where you want a reproducible/predictable serialization

@ioggstream

no easy fix ... for aliases/anchors.

Alias names cannot be preserved. But c14n can generate predictable aliases to achieve identical serialization.
https://json-ld.github.io/rdf-dataset-canonicalization/spec/ (URGNA) does that for blank nodes, which is a lot more difficult, since graph isomorphism has no known polynomial-time algorithm.
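
A minimal sketch of the "predictable aliases" idea, under several assumptions: the document is already loaded as nested Python dicts and lists with string keys, a node that is reachable more than once is the same Python object, and the output is YAML flow style. The function `canonical_flow` and the `a1`, `a2`, ... naming scheme are hypothetical, chosen only for illustration.

```python
import json


def canonical_flow(root):
    """Serialize nested dicts/lists as flow-style YAML with generated anchors."""
    seen = {}  # id(node) -> how many times the node is reached

    def count(node):
        if isinstance(node, (dict, list)):
            seen[id(node)] = seen.get(id(node), 0) + 1
            if seen[id(node)] > 1:
                return  # already walked; this also stops on cycles
            if isinstance(node, dict):
                children = [node[k] for k in sorted(node)]
            else:
                children = node
            for child in children:
                count(child)

    names = {}  # id(node) -> generated anchor name

    def emit(node):
        if not isinstance(node, (dict, list)):
            # JSON notation is valid YAML for the strings and integers used here.
            return json.dumps(node)
        prefix = ""
        if seen[id(node)] > 1:
            if id(node) in names:
                return "*" + names[id(node)]  # later visit: emit an alias
            # First visit: name the anchor by order of appearance.
            names[id(node)] = f"a{len(names) + 1}"
            prefix = "&" + names[id(node)] + " "
        if isinstance(node, dict):
            items = [f"{json.dumps(k)}: {emit(node[k])}" for k in sorted(node)]
            return prefix + "{" + ", ".join(items) + "}"
        return prefix + "[" + ", ".join([emit(c) for c in node]) + "]"

    count(root)
    return emit(root)


# Same structure built twice, with keys inserted in a different order.
shared = {"city": "Sofia"}
doc_1 = {"home": shared, "work": shared}
other = {"city": "Sofia"}
doc_2 = {"work": other, "home": other}

text_1 = canonical_flow(doc_1)
text_2 = canonical_flow(doc_2)
```

Because anchor names depend only on the sorted traversal order, the original alias names are lost, but both documents serialize to the same text. Unlike blank node labeling, the traversal order here is fixed by the tree, so no isomorphism search is needed.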

file an issue in the YAML repo

Posted yaml/yaml-spec#289, added some more info, and referenced this issue.

@gkellogg
Member

gkellogg commented Jul 7, 2022

@gkellogg

The only use for YAML C14N I can see would be for a hypothetical YAML Literal

The main use of JSON-LD c14n is for crypto signing and verifiable credentials of whole JSON-LD files. @OR13 do you see a case for using YAML-LD for verifiable credentials?

IIRC, VC uses RDF Dataset Canonicalization, which does not rely on JSON C14N (other than for JSON Literals) because of these issues, other than for JWT. Are you proposing something congruent to JWT for YAML? I would favor sticking with the LD-friendly RDF C14N.

(negative vote) as it should not be necessary for conveying semantic meaning

Gregg, I don't understand your position: are you also against https://json-ld.github.io/rdf-dataset-canonicalization/spec/ and JSON Canonicalization Scheme (JCS)? Aren't the 2 use cases listed enough?

pretty YAML-LD output: controls for that should be pass-through (IMO) and not required for interoperability.

Of course they are not required. But:

  • Defining semantic terms for such controls is IMHO fair game, because YAML is largely about readability, thus formatting
  • Using a fixed set of controls to achieve c14n is important for cases where you want a reproducible/predictable serialization

I would support some descriptive way of passing formatting options to a YAML serializer, but think it may be difficult to standardize on that, unless YAML normatively defines this, in which case we should just reference that, along with the ability to pass such controls on.

I'm wary of defining a WebIDL API which is YAML-LD specific (although we may describe updates to any existing API methods to manage YAML serialization/deserialization).

@OR13

OR13 commented Jul 8, 2022

I do see a use case for YAML-LD for both VCs and DIDs...
I have worked on YAML-LD type things for DIDs...

For example:

https://github.com/transmute-industries/did-core/blob/main/packages/did-yaml/src/__fixtures__/did-yaml/example-2.yml

Per the poor decisions of the DID WG, I have stripped @context from the yaml example, so all the terms are not defined...

In some future version there would be both:
application/did+ld+yaml and application/did+yaml

... and their only difference would be an @context or similar...

based on the convention set in DID Core v1... which was plagued by a complete lack of understanding with respect to JSON-LD.

@VladimirAlexiev
Contributor Author

@gkellogg

Formatting... YAML normatively defines

I just answered @TallTed's similar comment in #44 (comment).

  • Sure, if YAML standardizes a set of formatting terms, we should adopt them
  • But if they don't, maybe we should?
  • I see no problem if the set of terms is extensible

IIRC, VC uses RDF Dataset Canonicalization, which does not rely on JSON C14N

You have a point: just like round-tripping can be seen at several different levels, so can c14n.
RDF is the fundamental level, which we can use as a baseline (reference standard), default case, or even fallback.

I assume RDF c14n and JSON c14n are compatible (conformant to each other)?
Has anyone explored that?
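
One way to probe this is a toy comparison. Everything in the sketch is an assumption made for illustration: `json_canonical` is a simple sorted-key serialization (not JCS), and `toy_triples` is a hypothetical stand-in for JSON-LD expansion that handles only flat documents with one subject, string values, and no blank nodes. It suggests that the two levels answer different questions: documents that differ at the JSON level can still be identical at the RDF level.

```python
import json


def json_canonical(doc):
    # Syntax-level canonical form: sorted keys, fixed separators.
    return json.dumps(doc, sort_keys=True, separators=(",", ":"))


def toy_triples(doc):
    """Toy replacement for JSON-LD to RDF conversion (flat documents only)."""
    context = doc.get("@context", {})
    subject = doc["@id"]
    lines = []
    for key, value in doc.items():
        if key.startswith("@"):
            continue  # skip keywords such as @context and @id
        predicate = context.get(key, key)  # map a term to its IRI if defined
        lines.append(f"<{subject}> <{predicate}> {json.dumps(value)} .")
    return sorted(lines)  # sorted statements act as the data-level canonical form


# The same statement, once with a context-defined term, once with a full IRI.
doc_terms = {
    "@context": {"name": "http://schema.org/name"},
    "@id": "http://example.org/alice",
    "name": "Alice",
}
doc_iris = {
    "@id": "http://example.org/alice",
    "http://schema.org/name": "Alice",
}

same_json_form = json_canonical(doc_terms) == json_canonical(doc_iris)
same_rdf_form = toy_triples(doc_terms) == toy_triples(doc_iris)
```

Here `same_json_form` is false while `same_rdf_form` is true, so a signature over a JSON-level canonical form would not survive a change of context that leaves the RDF unchanged.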
