Is there a definitive version of a type array? #22

Zegnat · 2018-02-25T20:15:20Z

Say we have this HTML snippet:

<div class="h-entry h-cite h-entry"></div>

What would you expect the following step in the parsing spec to return?

type: [array of microformat "h-*" type(s) on the element],

The PHP parser gives us the array in alphabetical order:

"type": [
  "h-cite",
  "h-entry",
  "h-entry"
]

While the Go and Python parsers stick to the order as given:

"type": [
  "h-entry",
  "h-cite",
  "h-entry"
]

In addition, I would like to ask whether people expect this array to contain unique classes only. Is there any use in returning [ "h-entry", "h-entry" ]?

Or maybe none of this needs to be defined and the answer to the question in the topic is just “an unordered list of classes starting in h-”.


kartikprabhu · 2018-03-11T17:19:13Z

I don't see any use for this to be defined with this precision. I always have thought of it as “an unordered list of classes starting in h-”

Zegnat · 2018-03-14T08:33:13Z

> I always have thought of it as “an unordered list of classes starting in h-”

Maybe that would be a better description then? The way it specifies “"h-*" type(s)” rather than e.g. classes made me think it meant unique values, but I may be the only one who read it that way. (Which in itself was a reason for me to open this issue.)

There is also the question of what you mean by “classes”. Do you mean anything in the class attribute on the HTML element, or everything in the DOM classList?

The DOM classList property will only list unique classes in source order and is very specific about that. This is because classList returns a DOMTokenList, which in turn is an ordered set created by parsing the element’s class attribute. (My investment in the DOM spec may be another reason why I thought unique items would make sense.)

Here is a quick comparison between using the DOM method or doing your own string manipulation on the class attribute:

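// Using the DOM: classList is an ordered set, so duplicates are already gone.
// Assumes `element` is the <div> from the snippet at the top of this issue.
let output = []
for (let value of element.classList) {
  if (value.startsWith('h-')) {
    output.push(value)
  }
}
// output is now [ "h-entry", "h-cite" ]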

// Splitting the class attribute by hand on ASCII whitespace: duplicates survive.
let output = []
for (let value of element.getAttribute('class').split(/[\x09\x0A\x0C\x0D\x20]+/)) {
  if (value.startsWith('h-')) {
    output.push(value)
  }
}
// output is now [ "h-entry", "h-cite", "h-entry" ]

I feel like following the DOM specification and having the HTML parser handle parsing the attribute into a token list would be a good move for microformats. You are going to have to dive into the specification to figure out how to split the class attribute value anyway.

Following the spec also gives us ordered lists that should be the same between implementations. WHATWG specifically calls out interoperability as a reason for using ordered lists as often as possible (from the ordered set link above):

> Almost all cases on the web platform require an ordered set, instead of an unordered one, since interoperability requires that any developer-exposed enumeration of the set’s contents be consistent between browsers. In those cases where order is not important, we still use ordered sets; implementations can optimize based on the fact that the order is not observable.
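For parsers that do not have a DOM classList available, the ordered-set behaviour described above can be approximated by hand. This is only a rough sketch: the function names parseOrderedSet and rootTypes are made up for illustration, and the simple 'h-' prefix check mirrors the snippets in this comment rather than any stricter class-name rule.

```javascript
// Hypothetical helper: split a class attribute value on ASCII whitespace
// and keep only the first occurrence of each token, in source order.
function parseOrderedSet(value) {
  const tokens = value.split(/[\t\n\f\r ]+/).filter((token) => token !== '')
  // A JavaScript Set iterates in insertion order, so this drops later
  // duplicates while preserving the order of first appearance.
  return [...new Set(tokens)]
}

// Hypothetical helper: the 'h-' prefixed classes of one element.
// A missing class attribute (null/undefined) is treated as empty.
function rootTypes(classAttribute) {
  return parseOrderedSet(classAttribute ?? '').filter((token) => token.startsWith('h-'))
}

console.log(rootTypes('h-entry h-cite h-entry'))
// [ 'h-entry', 'h-cite' ]
```

With this approach the result matches what the classList loop gives, without depending on a browser.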

kartikprabhu · 2018-03-14T15:14:43Z

Again not really sure this is relevant in practice.

Zegnat · 2018-03-14T15:27:42Z

> Again not really sure this is relevant in practice.

It probably isn’t relevant for parsers. But it is relevant for tests and things like a JSON schema for validating microformats in JSON (e.g. for Micropub). If the spec does not define what the type collection looks like, how are you going to know whether it is valid in the first place?

kartikprabhu · 2018-03-14T15:30:22Z

Maybe people who understand validating mf2 can give more input. I am not sure that validating mf2 outputs from parsers makes much sense.

Zegnat · 2018-03-19T18:58:55Z

As came up today, per the latest JSON spec RFC 8259:

> An array is an ordered sequence of zero or more values.

As opposed to:

> An object is an unordered collection of zero or more name/value pairs, […]

If we were to strictly compare the JSON output of different parsers, keeping array order intact, the outputs would conflict with each other.
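To illustrate the conflict, here is a minimal sketch using the two type arrays from the opening post. The helper names sameSequence and sameMultiset are hypothetical, and the order-insensitive check is just one possible way a test suite could paper over the difference.

```javascript
// The two type arrays from the opening post.
const phpTypes = ['h-cite', 'h-entry', 'h-entry']
const goTypes = ['h-entry', 'h-cite', 'h-entry']

// Strict comparison: same length and same value at every index.
function sameSequence(a, b) {
  return a.length === b.length && a.every((value, index) => value === b[index])
}

// Order-insensitive comparison: sort copies of both arrays first.
// The default sort compares strings by UTF-16 code units.
function sameMultiset(a, b) {
  return sameSequence([...a].sort(), [...b].sort())
}

console.log(sameSequence(phpTypes, goTypes)) // false
console.log(sameMultiset(phpTypes, goTypes)) // true
```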

Zegnat · 2018-03-20T09:40:30Z

This issue is mostly superseded by #29 and #30. The first addresses order, the latter duplicates. Unlike this issue, they are about all microformats arrays rather than just type.

Zegnat · 2018-03-21T07:45:03Z

#29 and #30 have been closed and now define exactly how a type array should be returned: unique items, sorted alphabetically.
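As a rough sketch of what "unique items, sorted alphabetically" could look like in code: the function name normalizeTypes is hypothetical, and using JavaScript's default sort (UTF-16 code unit order) as the meaning of "alphabetically" is my assumption, which should be equivalent for plain ASCII class names.

```javascript
// Hypothetical helper: turn a list of 'h-' classes into the agreed shape,
// with duplicates removed and the remaining items sorted.
function normalizeTypes(types) {
  // Set drops duplicates; sort() is applied to the fresh copy, so the
  // caller's array is left untouched.
  return [...new Set(types)].sort()
}

console.log(normalizeTypes(['h-entry', 'h-cite', 'h-entry']))
// [ 'h-cite', 'h-entry' ]
```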

Zegnat mentioned this issue Mar 20, 2018

Define the order of items any time an array is used in the parsed output. #29

Closed

Zegnat closed this as completed Mar 21, 2018
