Is there a definitive version of a type array? #22

Closed
Zegnat opened this issue Feb 25, 2018 · 8 comments

@Zegnat
Member

Zegnat commented Feb 25, 2018

Say we have this HTML snippet:

<div class="h-entry h-cite h-entry"></div>

What would you expect the following step in the parsing spec to return?

  • type: [array of microformat "h-*" type(s) on the element],

The PHP parser gives us the array in alphabetical order:

"type": [
  "h-cite",
  "h-entry",
  "h-entry"
]

While the Go and Python parsers stick to the order as given:

"type": [
  "h-entry",
  "h-cite",
  "h-entry"
]

In addition, I want to ask whether people expect this array to contain unique classes only. Is there any use in returning [ "h-entry", "h-entry" ]?

Or maybe none of this needs to be defined and the answer to the question in the topic is just “an unordered list of classes starting in h-”.

@kartikprabhu
Member

I don't see any use for this to be defined with this precision. I always have thought of it as “an unordered list of classes starting in h-”

@Zegnat
Member Author

Zegnat commented Mar 14, 2018

I always have thought of it as “an unordered list of classes starting in h-”

Maybe that would be a better description then? The way it specifies “"h-*" type(s)” rather than e.g. classes made me think it meant unique values, but I may be the only one who read it that way. (Which in itself was a reason for me to open this issue.)

There is also the question of what you mean by “classes”. Do you mean anything in the class attribute on the HTML element, or everything in the DOM classList?

The DOM classList property lists only unique classes, in source order, and the specification is very specific about that. This is because classList returns a DOMTokenList, which in turn is an ordered set created by parsing the element’s class attribute. (My investment in the DOM spec may be another reason why I thought unique items would make sense.)

Here is a quick comparison between using the DOM method or doing your own string manipulation on the class attribute:

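// Using the DOM: classList is an ordered set, so duplicates are dropped.
const fromClassList = []
for (const value of element.classList) {
  if (value.startsWith('h-')) {
    fromClassList.push(value)
  }
}
// fromClassList contains [ "h-entry", "h-cite" ]

// Splitting the class attribute on ASCII whitespace yourself: duplicates are kept.
const fromAttribute = []
for (const value of (element.getAttribute('class') || '').split(/[\x09\x0A\x0C\x0D\x20]+/)) {
  if (value.startsWith('h-')) {
    fromAttribute.push(value)
  }
}
// fromAttribute contains [ "h-entry", "h-cite", "h-entry" ]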

I feel like following the DOM specification and having the HTML parser handle parsing the attribute into a token list would be a good move for microformats. You are going to have to dive into the specification to figure out how to split the class attribute value anyway.

Following the spec also gives us ordered lists that should be the same between implementations. WHATWG specifically calls out interoperability as a reason for using ordered lists as often as possible (from the ordered set link above):

Almost all cases on the web platform require an ordered set, instead of an unordered one, since interoperability requires that any developer-exposed enumeration of the set’s contents be consistent between browsers. In those cases where order is not important, we still use ordered sets; implementations can optimize based on the fact that the order is not observable.
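
For parsers that run without a DOM, the ordered-set behaviour can be approximated by hand. The following is only a minimal sketch: `parseOrderedSet` and `microformatTypes` are hypothetical names made up for illustration, not functions from any existing microformats parser, and the whitespace handling assumes the same ASCII whitespace characters used in the comparison earlier in this comment.

```javascript
// Sketch of an "ordered set" parse of a class attribute value, without a DOM.
// parseOrderedSet and microformatTypes are hypothetical helper names.
function parseOrderedSet(value) {
  const tokens = []
  // Split on ASCII whitespace: tab, LF, FF, CR, space.
  for (const token of (value || '').split(/[\x09\x0A\x0C\x0D\x20]+/)) {
    // Skip empty strings (from leading/trailing whitespace) and tokens already seen,
    // so the result keeps source order but holds each token only once.
    if (token !== '' && !tokens.includes(token)) {
      tokens.push(token)
    }
  }
  return tokens
}

function microformatTypes(classValue) {
  // Keep only the tokens that look like microformats root class names.
  return parseOrderedSet(classValue).filter((token) => token.startsWith('h-'))
}

console.log(microformatTypes('h-entry h-cite h-entry'))
// [ 'h-entry', 'h-cite' ]
```

The linear `includes` check keeps the sketch short; class lists are small enough that this should not matter in practice.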

@kartikprabhu
Member

Again not really sure this is relevant in practice.

@Zegnat
Member Author

Zegnat commented Mar 14, 2018

Again not really sure this is relevant in practice.

It probably isn’t relevant for parsers. But it is relevant for tests and things like a JSON schema for validating microformats in JSON (e.g. for Micropub). If the spec does not define what the type collection looks like, how are you going to know whether it is valid in the first place?

@kartikprabhu
Member

Maybe people who understand validating mf2 can give more input. I am not sure that validating mf2 outputs from parsers makes much sense.

@Zegnat
Member Author

Zegnat commented Mar 19, 2018

As came up today, per the latest JSON spec, RFC 8259:

An array is an ordered sequence of zero or more values.

As opposed to:

An object is an unordered collection of zero or more name/value pairs, […]

If we were to strictly compare the JSON output of different parsers, keeping array order intact, the outputs would conflict with each other.
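
To make the conflict concrete, here is a rough sketch of how a test suite might compare the two type arrays from the opening comment. `sameInOrder` and `sameIgnoringOrder` are hypothetical helpers written for this illustration; they are not taken from any existing test suite.

```javascript
// Hypothetical comparison helpers for a parser test suite.
function sameInOrder(a, b) {
  // Strict comparison: same length and the same value at every index.
  return a.length === b.length && a.every((value, index) => value === b[index])
}

function sameIgnoringOrder(a, b) {
  // Compare sorted copies so the input arrays are not mutated.
  return sameInOrder([...a].sort(), [...b].sort())
}

// The two outputs quoted at the top of this issue.
const phpOutput = ['h-cite', 'h-entry', 'h-entry']
const goOutput = ['h-entry', 'h-cite', 'h-entry']

console.log(sameInOrder(phpOutput, goOutput))       // false
console.log(sameIgnoringOrder(phpOutput, goOutput)) // true
```

A strict comparison reports the two parsers as disagreeing even though they found the same classes, which is why the order needs to be pinned down somewhere.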

@Zegnat
Member Author

Zegnat commented Mar 20, 2018

This issue is mostly superseded by #29 and #30. The former addresses order, the latter duplicates. Unlike this issue, they are about all microformats arrays rather than just type.

@Zegnat
Member Author

Zegnat commented Mar 21, 2018

#29 and #30 have been closed and now define exactly how a type array should be returned: unique items, sorted alphabetically.
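
As a rough illustration of that rule, the sketch below normalizes a type array and checks whether an array is already in that form. `normalizeTypes` and `isNormalizedTypeArray` are hypothetical names, and reading "alphabetically" as JavaScript's default string sort (UTF-16 code unit order) is an assumption made here, not something stated in this thread.

```javascript
// Hypothetical helpers; "alphabetical" is assumed to mean the default
// JavaScript string sort (UTF-16 code unit order).
function normalizeTypes(types) {
  // A Set drops duplicates; sort() then orders the remaining strings.
  return [...new Set(types)].sort()
}

function isNormalizedTypeArray(types) {
  return Array.isArray(types) &&
    // Every item must be a string starting with "h-".
    types.every((type) => typeof type === 'string' && type.startsWith('h-')) &&
    // Strictly increasing order implies both "sorted" and "unique".
    types.every((type, index) => index === 0 || types[index - 1] < type)
}

console.log(normalizeTypes(['h-entry', 'h-cite', 'h-entry']))
// [ 'h-cite', 'h-entry' ]
console.log(isNormalizedTypeArray(['h-entry', 'h-cite', 'h-entry'])) // false
console.log(isNormalizedTypeArray(['h-cite', 'h-entry']))            // true
```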

@Zegnat closed this as completed Mar 21, 2018