Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate docs and schemas from common schema definition format #315

Merged
merged 16 commits into from Oct 6, 2022

Conversation

magnusbaeck
Copy link
Member

@magnusbaeck magnusbaeck commented Sep 1, 2022

Applicable Issues

Fixes #282

Description of the Change

A new schema definition format that's a superset of the normal JSON schema format is introduced and all existing schemas and documentation files are merged into files in this new file format, stored in the "definitions" directory.

At the same time, the definitions of the meta member and event links are extracted to their own definition files and referenced from the event files. This both allows reuse but also makes it trivial for scripts like SDK generators to understand the structure better. Since both meta and links have evolved over time we've reconstructed different versions of those subschemas.

The schema files aren't affected at all by these changes, i.e. running generate_schemas.py on a definition file will produce a byte-identical schema file. This isn't quite true for the .md documentation; some minor changes are introduced:

  • Whitespace and other things that doesn't affect the rendering.
  • Some link types are reordered so they're in alphabetical order.
  • Previously some key/value pairs listed for a member ("Type", "Required", "Description", etc) were present but empty. Now only non-empty values cause the key to get included at all.
  • Members below meta.security.integrityProtection and a few other members for other events are now listed in a slightly different order (namely the one from the JSON schema).

To make sure the definition files don't diverge from the generated files we have an additional job in the GitHub Actions workflow that's run for all pushes and PRs. It runs the generation scripts and verifies that we won't get any dirty files in the workspace and no untracked files. See https://github.com/magnusbaeck/eiffel/actions/runs/3007678382 and https://github.com/magnusbaeck/eiffel/actions/runs/3007462169 for examples of what violations look like.

Alternate Designs

A few other options were discussed in the associated issue.

Benefits

The current representation of event schemas and documentation is an obstacle for protocol maintainers and others who want to process the information with a program. Making this change would greatly improve the situation.

Possible Drawbacks

None, I think.

Sign-off

Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I have the right to submit it under the open source license indicated in the file; or

(b) The contribution is based upon previous work that, to the best of my knowledge, is covered under an appropriate open source license and I have the right under that license to submit that work with modifications, whether created in whole or in part by me, under the same open source license (unless I am permitted to submit under a different license), as indicated in the file; or

(c) The contribution was provided directly to me by some other person who certified (a), (b) or (c) and I have not modified it.

(d) I understand and agree that this project and the contribution are public and that a record of the contribution (including all personal information I submit with it, including my sign-off) is maintained indefinitely and may be redistributed consistent with this project or the open source license(s) involved.

Signed-off-by: Magnus Bäck <magnus.back@axis.com>

A new schema definition format that's a superset of the normal JSON
schema format is introduced and all existing schemas and documentation
files are merged into files in this new file format, stored in the
"definitions" directory.

At the same time, the definitions of the meta member and event links are
extracted to their own definition files and referenced from the event
files. This both allows reuse but also makes it trivial for scripts like
SDK generators to understand the structure better. Since both meta and
links have evolved over time we've reconstructed different versions of
those subschemas.

The schema files aren't affected at all by these changes, e.g. running
generate_schema.py on a definition file will produce a byte-identical
schema file. This isn't quite true for the .md documentation; some minor
changes are introduced:

    - Whitespace and other things that doesn't affect the rendering.
    - Some link types are reordered so they're in alphabetical order.
    - Previously some key/value pairs listed for a member ("Type",
      "Required", "Description", etc) were present but empty. Now only
      non-empty values cause the key to get included at all.
    - Members below meta.security.integrityProtection and a few other
      members for other events are now listed in a slightly different
      order (namely the one from the JSON schema).
@e-backmark-ericsson
Copy link
Member

e-backmark-ericsson commented Sep 1, 2022

Sorry for raising this concern now, I'm either forgetting some good argumentation we've had or I've understood the consequences of this change too late.

With this solution we still need to create new files for a new version of each event. I do believe the reason for having individual json schemas for all historical event version stored in the repo was to make it easy to reference those schemas for all historical versions through one single branch of the repo. Now that we move from having the json schemas as the source of truth, and instead generate them for each new release (or each new version of the events?), do we still need this or could we let the yaml files be the same and change them for each new version instead? It would make reviewing of protocol changes so much easier if we don't create a new file for each new version.

@magnusbaeck
Copy link
Member Author

It seems like orthogonal questions:

  • Should the schema files and documentation be hand-edited or generated from a single source?
  • Should new versions result in new files or changes in existing files?

Schema files need to be regenerated and committed along with schema the definition file so this PR doesn't change the structure of the repo and how it's consumed.

@e-backmark-ericsson
Copy link
Member

Well the first question is what this PR and its related issue is all about, so that is of no concern I believe.
The second question has a follow up question hidden in it - how do we make sure the generated json files are aligned with the commited yaml files? If we don't at the same time introduce a new structure, for example by not checking in those json schemas but instead generate them at merge of a PR, we instead need to add a sanity test to verify that the manually generated json schemas match the commited yaml files. It won't be sustainable to review such differences manually for each change to the protocol.
And I believe that the individual json schemas per version could still be generated even if we don't create new yaml files for each version, by reading old commits from the repo, or?

@e-backmark-ericsson
Copy link
Member

As long as the validation of the generated json files match the yaml files is part of the Github action checking any protocol PRs, we don't need to introduce any more improvements through this PR.
In the long run we might want to generate an external documentation site for our events, and then we will have a lot of new requirements to deal with there.

@magnusbaeck magnusbaeck marked this pull request as ready for review September 7, 2022 13:06
@magnusbaeck magnusbaeck requested a review from a team as a code owner September 7, 2022 13:06
@z-sztrom
Copy link
Contributor

z-sztrom commented Sep 8, 2022

Works fine for me. Both schema and documentation generated properly.

@z-sztrom
Copy link
Contributor

z-sztrom commented Sep 8, 2022

I cannot review the changes. No enough permissions?

@magnusbaeck
Copy link
Member Author

Weird, you should be able to review the PR (but since you're not a maintainer any approval you make won't count towards any minimum number of approvals).

Copy link
Contributor

@z-sztrom z-sztrom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally review accessible for me... Nice work! Generated schemas equal to the schemas I modified manually some time ago.

@magnusbaeck
Copy link
Member Author

One thing I don't think we've discussed is what to do about changes to the YAML files that don't affect the generated schemas, like documentation changes and changes in the links. For example, #287 adds a new link type to a few events types which in that PR amounts to a mere documentation change, but when merging that change into this PR I can only assume that we need to issue new event versions (with a minor version bump)? Those schemas would currently be identical to the preceding versions.

@magnusbaeck
Copy link
Member Author

No, wait. When merging the PRECURSOR branch (or master, after that PR is merged) into this PR branch the current events should be regarded as settled law and we obviously shouldn't issue new event versions after the fact, but what should we do the next time someone adds a link type or makes a similar non-trivial change to a schema definition file that doesn't affect the schema?

@magnusbaeck
Copy link
Member Author

This PR is now up to date with @e-backmark-ericsson's recently merged PRECURSOR change so it's ready for a review. Let me know if there would be any point in a "group review" where we go through things together.

@e-backmark-ericsson
Copy link
Member

No, wait. When merging the PRECURSOR branch (or master, after that PR is merged) into this PR branch the current events should be regarded as settled law and we obviously shouldn't issue new event versions after the fact, but what should we do the next time someone adds a link type or makes a similar non-trivial change to a schema definition file that doesn't affect the schema?

I'd say that the minor version of the event should be stepped if there is a link-only update to it, even though we (currently) don't check such things in the schemas. The reason for not having that as part of the schemas is that when the schemas were originally created there were no known means to validate that a list of dictionaries actually had certain keys in their dictionaries. I believe that more recent json schema version can handle that, and as soon as we add such restrictions to the schemas they will need to be updated also for link-only updates in the events.

Copy link
Member

@e-backmark-ericsson e-backmark-ericsson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I think this PR looks very good. Job well done! I have a few requests for changes though. Most of them are easily fixed.

Regarding the comments on the copyright headers - I believe the correct way to handle those is to actually move the copyright headers out of the generated md files and instead provide them as comments in the yml files. There should really bee no need to make them part of the generated md files by adding them as a key with underscore to the yml file structure. The actual copyright should hold for the file that is manually edited (the yml file) and not for the generated files.

eiffel-syntax-and-usage/event-schemas.md Show resolved Hide resolved
eiffel-syntax-and-usage/event-schemas.md Outdated Show resolved Hide resolved
eiffel-syntax-and-usage/event-schemas.md Outdated Show resolved Hide resolved
eiffel-syntax-and-usage/event-schemas.md Show resolved Hide resolved
event_docs.md.j2 Outdated Show resolved Hide resolved
eiffel-vocabulary/EiffelIssueDefinedEvent.md Outdated Show resolved Hide resolved
definitions/EiffelIssueDefinedEvent/1.0.0.yml Outdated Show resolved Hide resolved
definitions/EiffelIssueDefinedEvent/2.0.0.yml Outdated Show resolved Hide resolved
definitions/EiffelIssueDefinedEvent/3.0.0.yml Outdated Show resolved Hide resolved
definitions/EiffelIssueDefinedEvent/3.1.0.yml Outdated Show resolved Hide resolved
@magnusbaeck
Copy link
Member Author

I'd say that the minor version of the event should be stepped if there is a link-only update to it, even though we (currently) don't check such things in the schemas.

Yeah, I agree. I'll update eiffel-syntax-and-usage/versioning.md to document this.

Regarding the comments on the copyright headers - I believe the correct way to handle those is to actually move the copyright headers out of the generated md files and instead provide them as comments in the yml files. There should really bee no need to make them part of the generated md files by adding them as a key with underscore to the yml file structure. The actual copyright should hold for the file that is manually edited (the yml file) and not for the generated files.

Yep. There was no good reason for omitting the copyright notice from the YAML files. We could generate a file header for the .md files that states the origin of the file, something like this:

<!--
This file was generated from ../definitions/EiffelCompositionDefinedEvent/3.2.0.yml.
See that file for a copyright notice.
-->

Copy link
Member

@e-backmark-ericsson e-backmark-ericsson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more whitespaces missing

eiffel-vocabulary/EiffelIssueDefinedEvent.md Outdated Show resolved Hide resolved
eiffel-vocabulary/EiffelIssueDefinedEvent.md Outdated Show resolved Hide resolved
eiffel-vocabulary/EiffelIssueDefinedEvent.md Outdated Show resolved Hide resolved
@magnusbaeck
Copy link
Member Author

Yeah, I agree. I'll update eiffel-syntax-and-usage/versioning.md to document this.

Done in 0fef421.

Yep. There was no good reason for omitting the copyright notice from the YAML files. We could generate a file header for the .md files that states the origin of the file, something like this:

Done in 28abb25.

Copy link
Member

@e-backmark-ericsson e-backmark-ericsson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@magnusbaeck magnusbaeck merged commit 4da1d54 into eiffel-community:master Oct 6, 2022
@magnusbaeck magnusbaeck deleted the new-schema-def branch October 6, 2022 15:18
@magnusbaeck
Copy link
Member Author

@z-sztrom, this PR has been merged now so feel free to rebase and upload your schema URL PR whenever you have time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Documentation and schemas should be generated from a common source
4 participants