Document Glossarist YAML v2 format using YAML Schema #27

Open
ronaldtse opened this issue Aug 9, 2023 · 16 comments

@ronaldtse
Member

From glossarist/glossarist-ruby#76 (comment)

We need to document the Glossarist YAML v2 format in this repository using YAML/JSON Schema.

@ronaldtse added the documentation label Aug 9, 2023
@HassanAkbar
Member

@ronaldtse I am not sure about the structure of the Glossarist v2 format. To understand it better I looked at metaversestandards-glossary, and I have a few questions about it.

  • From what I understand in V2 we removed the localizations from the concept files and moved them to their respective files and assigned an ID; other than that, the keys and structure are almost the same. Is this correct, or are there structural changes as well that I have missed?

  • I noticed that the keys are in camel case, e.g. in concept/01fa30d3-4e4b-4142-b68a-c299b55b3fb8:

    id: 01fa30d3-4e4b-4142-b68a-c299b55b3fb8
    data:
      identifier: '88'
      localizedConcepts:
        eng: 2b959537-c600-41d1-aeb7-7233c35d30eb
    status: valid
    dateAccepted: 2023-08-04T11:33:09.535Z

    do we need to support camel case or snake case or both in glossarist?
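
For illustration, one way a reader could accept both spellings is to normalise keys to snake_case after parsing. This is a minimal sketch, not how glossarist-ruby works: the helper names `to_snake_case` and `normalize_keys` are hypothetical, the dict mirrors the concept file quoted above, and `dateAccepted` is kept as a plain string for simplicity.

```python
import re


def to_snake_case(key: str) -> str:
    """Convert a camelCase key such as 'dateAccepted' to 'date_accepted'."""
    # Insert an underscore before each capital that follows a lowercase
    # letter or digit, then lowercase the whole key.
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", key).lower()


def normalize_keys(node):
    """Recursively rewrite string mapping keys to snake_case; values are untouched."""
    if isinstance(node, dict):
        return {
            (to_snake_case(k) if isinstance(k, str) else k): normalize_keys(v)
            for k, v in node.items()
        }
    if isinstance(node, list):
        return [normalize_keys(item) for item in node]
    return node


# The concept file quoted above, as a parser might hand it to us.
concept = {
    "id": "01fa30d3-4e4b-4142-b68a-c299b55b3fb8",
    "data": {
        "identifier": "88",
        "localizedConcepts": {"eng": "2b959537-c600-41d1-aeb7-7233c35d30eb"},
    },
    "status": "valid",
    "dateAccepted": "2023-08-04T11:33:09.535Z",
}

normalized = normalize_keys(concept)
```

Keys that are already snake_case pass through unchanged, so files in either style end up with the same shape. Note that the language codes under `localizedConcepts` are also mapping keys and go through the same function; lowercase codes such as `eng` are unaffected.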

@ronaldtse
Member Author

From what I understand in V2 we removed the localizations from the concept files and moved them to their respective files [...]

Removed from the files and moved them to "their respective files"? What did you mean?

do we need to support camel case or snake case or both in glossarist?

I actually prefer snake case for YAML keys. @ribose-jeffreylau @strogonoff are you okay with this?

@HassanAkbar
Member

Removed from the files and moved them to "their respective files"? What did you mean?

@ronaldtse I mean that, as in the example below, everything under data comes from the current Glossarist model; it has just been moved to a separate localized-concept file with a UUID, and the concept file gets a reference to that file.

id: 00250c70-121c-40d3-8230-8b87380dd1ae
data:
  language_code: eng
  terms:
    - normative_status: preferred
      type: expression
      designation: time
  definition:
    - content: monotonically increasing value generated by a node
  notes: []
  examples: []
  authoritativeSource:
    - link: >-
        https://www.web3d.org/specifications/X3Dv4Draft/ISO-IEC19775-1v4-IS.proof/Part01/glossary.html#Time
status: valid
dateAccepted: 2023-08-04T11:33:09.535Z

@ronaldtse
Member Author

In v2, every localized concept should be in a separate file, not a single file that contains multiple localized concepts.
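
A rough sketch of what this split means for a consumer: the concept file only holds UUID references, and each reference is resolved to its own localized-concept file. Everything here is an assumption made for illustration: the `concept/<uuid>` and `localized-concept/<uuid>` keys stand in for file paths, the function name `resolve_localizations` is hypothetical, and the localized concept's content is a placeholder rather than real data.

```python
# Stand-in for the dataset: parsed file contents keyed by a HYPOTHETICAL path.
# In a real dataset each entry would be one YAML file read with a YAML parser.
FILES = {
    "concept/01fa30d3-4e4b-4142-b68a-c299b55b3fb8": {
        "id": "01fa30d3-4e4b-4142-b68a-c299b55b3fb8",
        "data": {
            "identifier": "88",
            "localizedConcepts": {"eng": "2b959537-c600-41d1-aeb7-7233c35d30eb"},
        },
    },
    "localized-concept/2b959537-c600-41d1-aeb7-7233c35d30eb": {
        "id": "2b959537-c600-41d1-aeb7-7233c35d30eb",
        # Placeholder content, invented for this sketch.
        "data": {"language_code": "eng", "terms": [{"designation": "example term"}]},
    },
}


def resolve_localizations(files, concept_id):
    """Return {language code: localized concept} for one concept."""
    concept = files[f"concept/{concept_id}"]
    refs = concept["data"]["localizedConcepts"]
    resolved = {}
    for lang, uuid in refs.items():
        localized = files.get(f"localized-concept/{uuid}")
        if localized is None:
            # A dangling reference is reported rather than silently skipped.
            raise KeyError(
                f"concept {concept_id}: no localized concept {uuid} for '{lang}'"
            )
        resolved[lang] = localized
    return resolved


localized = resolve_localizations(FILES, "01fa30d3-4e4b-4142-b68a-c299b55b3fb8")
```

The point of the sketch is only the indirection: one concept, one file per language, joined by UUID.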

@ribose-jeffreylau

@ronaldtse To me, normal YAML keys are snake_case.

@HassanAkbar
Member

HassanAkbar commented Aug 16, 2023

In v2, every localized concept should be in a separate file

@ronaldtse And will the structure of each separate file be almost the same as in the v1 model?

@strogonoff

I am a bit alarmed by this discussion.

If we are describing the schema how it effectively is right now, we ought to describe the schema how it effectively is right now. We can then take that as version 1.0 or 0.1 and evolve from there: implement schema version support in consumers and evolve data structures.

If we are not describing the schema how it is but designing the new schema here, we will 1) waste time on consensus and 2) end up with a schema that doesn’t match the data, so then someone will have to make sure all data sources are updated to the new schema, all implementations are updated to the new schema, etc., so depending on other ongoing projects it may easily be months before we can begin establishing some sort of cadence.

@strogonoff

If the previous comment was ambiguous: let’s describe the schema as it effectively is now and not debate what it should be; that process is potentially infinite and should take place in the context of schema versioning.

@ronaldtse
Member Author

@strogonoff: @HassanAkbar and I are describing how this structure has been implemented in the latest instance of ISO 10303-2, whose implementation is already integrated into Metanorma. We are attempting to reach consensus here with the Glossarist implementation.

@ronaldtse
Member Author

@strogonoff please note that the Glossarist YAML format is already used in Geolexica and Metanorma today.

@strogonoff

strogonoff commented Aug 16, 2023

@ronaldtse Exactly. Let’s document what it is in its current state, i.e. snake or camel case as they are used now. Then we can work on making it better version by version.

@ronaldtse
Member Author

To be fair, glossarist-ruby is already somewhat flexible in what YAML schema it reads -- it supports the old Geolexica format, and also supports the newer ISO 10303-2 format, and the new format used in the Metaverse Glossary. So this is why there is some confusion on what is the "proper" YAML format.

@HassanAkbar can you please come up with the YAML Schema for the ISO 10303-2 format and then we can discuss it in detail? Please put that in a PR so we can all comment by line... thanks.

@HassanAkbar
Member

HassanAkbar commented Aug 16, 2023

@ronaldtse Skimming through the data sources, I could not find the new version in ISO 10303-2, e.g. concept-3.1.1.1. I did find that isotc211-glossary uses the new Glossarist format, e.g. 0002e0ac-f74e-5ae0-9b58-f459c7d60cfa.

I’m currently going through the documents in detail and will update you once I am done.

@strogonoff

To be fair, glossarist-ruby is already somewhat flexible in what YAML schema it reads -- it supports the old Geolexica format, and also supports the newer ISO 10303-2 format, and the new format used in the Metaverse Glossary. So this is why there is some confusion on what is the "proper" YAML format.

This doesn’t look like an obstacle if we are documenting the schema as it is currently used. The schema would simply document those fuzzy instances as they are now in data. We can subsequently mark any undesired duplications as deprecated and clarify semantics as we iterate.

@strogonoff

strogonoff commented Aug 17, 2023

One last note: I would advise against YAML schema. It seems to build on top of JSON schema draft 4 (since then drafts 5, 6 and 7, 2019-09 and 2020-12 have come out), introduces specific incompatibilities with JSON schema (e.g., propertyOrder, which seems to have been dropped from discussions on JSON schema vocabulary), and is published ad hoc rather than being backed by an Internet Draft, for example (though I may be wrong here).

JSON schema is compatible with YAML as is, not only because JSON is a subset of YAML but also because validators operate on runtime representations of data anyway. In this sense, JSON schema is a bit of a misnomer, since validation typically takes place against runtime JavaScript objects (or Python dictionaries, etc.), and how they are obtained (whether from YAML or JSON) is orthogonal.
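
To make that point concrete, here is a small sketch in which the same check runs on an object parsed from JSON text and on an equal object written the way a YAML parser would return it. It is a toy stand-in, not a JSON schema validator: `check` is a hypothetical helper that understands only `type`, `required` and `properties`, and the schema fragment, including which properties it marks as required, is invented for illustration.

```python
import json

# HYPOTHETICAL schema fragment, expressed as a plain dict.
CONCEPT_SCHEMA = {
    "type": "object",
    "required": ["id", "data", "status"],
    "properties": {
        "id": {"type": "string"},
        "status": {"type": "string"},
        "data": {
            "type": "object",
            "required": ["identifier"],
            "properties": {"identifier": {"type": "string"}},
        },
    },
}

_TYPES = {"object": dict, "array": list, "string": str}


def check(instance, schema, path="$"):
    """Return a list of error strings; supports only type/required/properties."""
    expected = schema.get("type")
    if expected and not isinstance(instance, _TYPES[expected]):
        return [f"{path}: expected {expected}"]
    errors = []
    if isinstance(instance, dict):
        for name in schema.get("required", []):
            if name not in instance:
                errors.append(f"{path}: missing required property '{name}'")
        for name, sub in schema.get("properties", {}).items():
            if name in instance:
                errors.extend(check(instance[name], sub, f"{path}.{name}"))
    return errors


# Obtained from JSON text.
from_json = json.loads(
    '{"id": "01fa30d3-4e4b-4142-b68a-c299b55b3fb8",'
    ' "data": {"identifier": "88"}, "status": "valid"}'
)

# The same mapping as a YAML parser would hand it over (written out by hand
# here so the sketch needs no third-party YAML library).
from_yaml = {
    "id": "01fa30d3-4e4b-4142-b68a-c299b55b3fb8",
    "data": {"identifier": "88"},
    "status": "valid",
}

json_errors = check(from_json, CONCEPT_SCHEMA)
yaml_errors = check(from_yaml, CONCEPT_SCHEMA)
```

Both calls see an ordinary dictionary and return the same result; the checker never learns which serialization the data came from. A real setup would swap `check` for an actual JSON schema validator library.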

Other than that, I don’t have many opinions. As mentioned on Zulip, an update to how Glossarist data is represented (ditching universal concepts and some other changes) is likely coming in Glossarist format v3 (the current version with universal and localized concepts is v2), but I don’t see why we shouldn’t document the schema as is while the next version is being fleshed out.

@ronaldtse
Member Author

YAML Schema is very well adopted in industry, e.g.

This is the very first time I've heard of Glossarist format v3, so please help explain what the intended changes are before we move ahead with that.
