Describe syntax with regex. #1229

ioggstream · 2022-05-16T07:55:27Z

I suggest:

describing the syntax using a regexp;
removing non-actionable informative content, since it distracts the reader from the normative part (it can be moved to a FAQ section).

Note

The spec already references ECMA regexps.

* Change URIs to IRIs where appropriate

gregsdennis · 2022-05-16T08:23:33Z

If we change this I think I'd prefer ABNF instead, as it's more common in specifications for describing syntax.

awwright · 2022-05-16T19:33:04Z

What purpose does this achieve? Do you think this paragraph is actually confusing or could be improved? Could you elaborate on your experience a bit?

Like @gregsdennis points out, if the rationale is to make the description machine-readable, ABNF would be preferred. While we technically could use the ECMA RegExp syntax (they describe the same thing), the ABNF is more accessible to the standards community at large; and this usage of the ECMA syntax is too ambiguous (it lacks the slashes that quote the RegExp, and it lacks start and end anchors that are implicit in ABNF, but not in RegExp... we would want to write an actual line of ECMAScript).

And the implementation guidance should be left in-line; explanations like this are critical for helping readers make connections between related concepts before they move on to the next section. In this case, readers will make the connection that the syntax is the same as in XML; even if they haven't read the XML spec, they're likely to have seen this pattern before.

karenetheridge · 2022-05-16T19:37:39Z

This regex is not correct.

it must be anchored with ^ and $
"any number" can be zero, so + should be *
literal hyphen characters must be at the start or the end of the bracketed character class, so they are not interpreted as part of a range.

Therefore: ^[A-Za-z_][A-Za-z0-9._-]*$
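One way to see why each of the three corrections matters is to run the corrected pattern next to variants that each omit exactly one fix. This is a minimal Python sketch. The three flawed variants and the `accepts` helper are hypothetical illustrations, not the pattern originally submitted in the PR. It uses `re.search`, which succeeds if the pattern matches anywhere in the string. ECMAScript's `RegExp.prototype.test` and JSON Schema's `pattern` keyword behave the same way, so only explicit `^` and `$` force a whole-string match. The character-class syntax used here means the same thing in Python and ECMA-262.

```python
import re

# Corrected pattern: anchored, "*" (zero or more), hyphen placed last.
CORRECTED = r"^[A-Za-z_][A-Za-z0-9._-]*$"

# Hypothetical flawed variants, each missing exactly one of the three fixes.
UNANCHORED = r"[A-Za-z_][A-Za-z0-9._-]*"
PLUS_NOT_STAR = r"^[A-Za-z_][A-Za-z0-9._-]+$"
HYPHEN_AS_RANGE = r"^[A-Za-z_][A-Za-z0-9.-_]*$"  # ".-_" is the range U+002E..U+005F


def accepts(pattern: str, name: str) -> bool:
    """True if the pattern matches anywhere in name (search, not fullmatch)."""
    # Caveat: Python's "$" also matches just before a final "\n", which the
    # ECMA-262 "$" (without the m flag) does not; use re.fullmatch for strict checks.
    return re.search(pattern, name) is not None


for name in ["foo", "a", "1foo", "a b", "a-b", "a/b"]:
    print(name, [accepts(p, name) for p in
                 (CORRECTED, UNANCHORED, PLUS_NOT_STAR, HYPHEN_AS_RANGE)])
```

Without anchors, `"1foo"` and `"a b"` are accepted because a valid substring exists. With `+`, a one-character name such as `"a"` is rejected. With the hyphen mid-class, `"a/b"` is accepted and `"a-b"` is rejected.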

ioggstream · 2022-05-17T21:00:07Z

I don't have preferences between abnf and regexp.
Here and elsewhere in this document, both are ok as long as it's not in prose :)

we would want to write an actual line of ECMAScript

+1 Since this spec doesn't use ABNF, I used ECMA to avoid adding normative references.

About the reference to XML, I don't think it helps much, since the regexp is easy.

In general I find this document very long: as a result, readers skip it and look for secondary sources instead :)

handrews · 2022-05-30T21:43:50Z

@ioggstream the point of the reference to XML is that the W3C Best Practices for Fragment Identifiers and Media Type Definitions (which is also cited elsewhere in JSON Schema Core) references the XML NCName production as part of their best practices for plain name frag ids (see the large quote below for details).

There is more going on in this section than just how to represent the syntax (I agree with @gregsdennis and @awwright that ABNF is preferable). The point of this part of the spec is to tie JSON Schema's plain name fragments to the larger standards ecosystem of such things. That could be done more clearly and concisely, but it should not be removed.

Plain names are a common type of fragid structure. A plain name fragid is a fragid that is used to identify a named structure within a document, such as one identified by an @id attribute in HTML, a @xml:id attribute in XML or the name of a function within a Python program. These fragids are opaque to processors and as such they do not normally include punctuation characters, though this depends on the language: in XML, for example, they usually match the NCName production from XML Namespaces [XML-NAMES11] which means they can contain hyphens (-) and periods (.).

Plain name fragids are usually created by human authors but may also be generated by applications. They provide a good method of identifying content that is equivalent across content-negotiated variants of a document, for example paragraphs of text in French and Chinese that contain the same semantic content. Plain name fragids that do not identify a portion of a document are frequently used in Semantic Web applications as a way of providing an identifier for something described by the document.

Best Practice 3: Reserve Plain Name Fragids

If the media type includes structures that can be given local names or identifiers, plain name fragids should be reserved for addressing those structures.

ioggstream · 2022-06-01T14:57:34Z

... e WC3 Best Practices for Fragment Identifiers and Media Type Definitions (which is also cited elsewhere in JSON Schema Core) references the XML NCName

Ok, it wasn't clear to me when reading the section, and I think it should be clarified e.g. in a referenced Appendix (e.g. JSON Schema and XML Schema ...) so that the reader can better identify normative parts.

handrews · 2022-06-01T19:22:52Z

@ioggstream I don't think there's any need for an appendix here. All this section needs to do is:

xref the Best Practices document to explain why this syntax (XML's NCName) is used (not currently done in this section)
xref the XML Namespaces spec for the normative NCName production (currently done, just not clear why or whether it's normative)
Reproduce the NCName ABNF in a simplified form (without the layers of other ABNF that are irrelevant for JSON Schema) for informative convenience (currently done in prose, and presented as normative)
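As an illustration of the third point, a simplified ASCII-only rule could be written in RFC 5234 ABNF and checked with code whose structure mirrors the rule. Both the ABNF line in the comment and the `is_plain_name` function below are a hypothetical sketch based on the ASCII subset described in the current prose (letter or underscore, then letters, digits, hyphens, underscores, and periods). They are not text from the spec or from XML Namespaces. `ALPHA` and `DIGIT` are the RFC 5234 core rules.

```python
import string

# Hypothetical simplified rule (ASCII-only subset of NCName):
#   plain-name = ( ALPHA / "_" ) *( ALPHA / DIGIT / "-" / "." / "_" )
START_CHARS = frozenset(string.ascii_letters + "_")
NAME_CHARS = START_CHARS | frozenset(string.digits + "-.")


def is_plain_name(s: str) -> bool:
    """Return True if s matches the simplified plain-name rule above."""
    # ABNF rules match the whole input, so no explicit anchoring is needed;
    # the empty string fails because one start character is required.
    if not s or s[0] not in START_CHARS:
        return False
    return all(c in NAME_CHARS for c in s[1:])
```

Because ABNF is implicitly whole-string, the anchoring and `+`/`*` questions raised about the regex do not arise in this form.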

awwright · 2022-06-02T18:51:09Z

(not currently done in this section)

We do discuss it elsewhere though, in https://json-schema.org/draft/2020-12/json-schema-core.html#rfc.section.5.

just not clear why or whether it's normative

XML technically supports Unicode characters and for ease of getting the first draft out we opted not to support that. We can change that now.

Even if XML only specified the same ASCII set that we do, it might still be easier to explain "use this pattern ... which is selected to match other popular syntaxes including XML" rather than "refer to the XML definition."
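If the syntax were widened to Unicode as suggested, the check could follow the XML character classes directly. The sketch below is hypothetical and hedged. It takes the `NameStartChar` and `NameChar` ranges from XML 1.0 (Fifth Edition) and drops `:` because an NCName is a Name without colons. Verify the ranges against the XML Namespaces text before relying on them. `is_ncname` and the table names are illustrative only.

```python
# Hypothetical sketch of a Unicode-aware NCName check.
# (lo, hi) code point ranges from XML 1.0 (Fifth Edition) NameStartChar,
# with ":" removed because NCName excludes colons.
NCNAME_START = (
    (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A),  # A-Z, "_", a-z
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
)
# Extra NameChar ranges, allowed only after the first character.
NCNAME_EXTRA = (
    (0x2D, 0x2E),  # "-", "."
    (0x30, 0x39),  # 0-9
    (0xB7, 0xB7), (0x300, 0x36F), (0x203F, 0x2040),
)


def _in_ranges(cp, ranges):
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_ncname(s: str) -> bool:
    """Return True if s matches the Unicode NCName character classes above."""
    if not s or not _in_ranges(ord(s[0]), NCNAME_START):
        return False
    return all(_in_ranges(ord(c), NCNAME_START) or _in_ranges(ord(c), NCNAME_EXTRA)
               for c in s[1:])
```

Every name accepted by the current ASCII-only rule is also accepted here, so this would be a strict widening rather than a change in meaning for existing schemas.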

handrews · 2022-06-02T21:29:22Z

@awwright

We do discuss it elsewhere though

yes, that's why I said "in this section" 🙂

Even if XML only specified the same ASCII set that we do, it might still be easier to explain "use this pattern ... which is selected to match other popular syntaxes including XML" rather than "refer to the XML definition."

Again, the important reference is the W3C Best Practices document, specifically the section on plain name fragments. It just so happens that that spec references the XML Namespaces specification, so there is a connection there. But the motivation here is not "align with XML." It is "align with W3C Best Practices regarding fragments." (I think it would be fine to expand that to Unicode because it's 2022 FFS.)

The reference to XML Namespaces' "NCName" production only makes sense in the context of "We are following the W3C Best Practices for plain name fragments, which suggest using the NCName production from the XML Namespaces spec." (tidied up for clarity regarding what is normative, of course).

The reason to reference the XML Fragment spec is to make it clear what we are trying to do, so that if the ABNF in our spec does not match, someone will be able to tell that that was an unintentional error and not an intentional deviation.

handrews · 2022-06-02T21:30:50Z

And given the number of times various people have messed with the plain name syntax description, keeping that external reference (both of them, really) as a correctness check feels valuable to me.

jdesrosiers · 2022-07-08T15:29:48Z

The draft-next branch has been merged and is now closed. The merge target for this PR has been changed to main. Here are the recommended steps to get your branch rebased properly.

Make sure your remote for the json-schema-org/json-schema-spec repo is up-to-date. (Example: git fetch upstream).
Rebase your commits onto main. (Example: git rebase --onto upstream/main abcd123~1 (replace abcd123 with the commit hash of the first commit in your PR)).
Force push the rebased branch to your fork. (Example: git push --force origin my-branch).

gregsdennis · 2022-08-01T04:55:31Z

I think for this particular change, I would prefer leaving the text but using a pattern (which uses regex) in the meta-schema. If the syntax really must be specified in some common format, I'd still prefer ABNF over regex, but I think it's overkill.

handrews · 2022-08-14T16:37:57Z

@ioggstream this PR doesn't seem to have a consensus around what is needed, and the change in branching has made it difficult to review. If you would still like something done here, could you please file an issue summarizing the options that have come up in this discussion so that we can debate it properly?

Since no one has spoken up in favor of a regex in the text, I'm going to go ahead and close this. If, after discussion in an issue, we agree that we should use a regex for this, it can be re-submitted.

jdesrosiers and others added 10 commits June 4, 2021 10:58

Allow contains to apply to objects as well as arrays

f5f6dec

Move contains to "other" applicator section

408ec65

Clarify that "length" applies to objects as well as arrays

40669d8

Add change log for bhutton-next

b5c4108

Add to change log "contains" applies to objects

1cec0c2

Document new branching process (json-schema-org#1115)

1ac6282

Add Code of Conduct badge

5f783b8

Add "contains" to keywords that effect "unevaluatedProperties"

13a58c7

Support for IRI references (json-schema-org#1137)

685d84e

* Change URIs to IRIs where appropriate

Describe syntax with regex.

2337ccf

fix suggested by Karen

19e0938

jdesrosiers changed the base branch from draft-next to main July 8, 2022 15:29

handrews closed this Aug 14, 2022

8 participants