parse HTML id attribute #44

Closed
sknebel opened this Issue Nov 18, 2018 · 6 comments

sknebel commented Nov 18, 2018

In a few places, being able to consume the HTML id attribute would be useful.

use cases

  1. To consume fragment links that identify the relevant microformats object.
  2. To follow pages with multiple feeds: a consumer needs to find the same feed again, while the page author should stay free to move elements around on the page.

output format

I'd propose a new 'id' attribute on the microformats object (not a property)
i.e.

<div class="h-feed" id="updates">
<a class="u-author h-card" href="https://example.com">Max Mustermann</a>
<li class="h-entry">[...]</li>
[...]
</div>

would produce output like

{
    "items": [
        {
            "type": [ "h-feed"],
            "id": "updates",      <------------------
            "properties": {
                "author": ...
            },
            "children": [
                {
                    "type": [
                        "h-entry"
                    ],
                    ...
}

This format should be completely backwards compatible.
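
For use case 1, a consumer could resolve a fragment link against the parsed output roughly as follows. This is only a sketch: `find_by_id` is a hypothetical helper, not part of any parser, and it assumes the `id` key lands on the object exactly as in the example output above.

```python
from typing import Optional


def find_by_id(parsed: dict, fragment: str) -> Optional[dict]:
    """Return the first microformats object whose "id" equals the fragment."""
    target = fragment.lstrip("#")
    if not target:
        return None
    # Depth-first walk in document order over items, children and
    # microformats embedded in property values.
    stack = list(reversed(parsed.get("items", [])))
    while stack:
        item = stack.pop()
        if item.get("id") == target:
            return item
        nested = list(item.get("children", []))
        for values in item.get("properties", {}).values():
            # Only embedded microformats carry a "type" key.
            nested.extend(v for v in values if isinstance(v, dict) and "type" in v)
        stack.extend(reversed(nested))
    return None


parsed = {
    "items": [
        {
            "type": ["h-feed"],
            "id": "updates",
            "properties": {
                "author": [
                    {"type": ["h-card"], "properties": {"name": ["Max Mustermann"]}}
                ]
            },
            "children": [{"type": ["h-entry"], "properties": {}}],
        }
    ]
}

feed = find_by_id(parsed, "#updates")
```

Consumers that don't know about `id` just never look at the key, which is why the extra field shouldn't break anything.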

imply uid?

In the discussion in IRC and in microformats/php-mf2#206, it was also proposed to automatically imply a uid property based on the document URL and the id as a fragment.

I don't think this is a good idea for a few reasons:

  • I'm not confident that this won't interact weirdly with concepts like authorship, representative h-card, etc. uid seems fairly core to the identity of an object, and I'd prefer leaving it to the author.
  • For the feed use case, the uid would reflect the URL of the resulting document, which isn't necessarily desirable if redirects are involved. Feed consumers should follow HTTP 302/307 redirects, but not remember the URLs they lead to. As such, the correct thing to remember is not the URL of the resulting document plus a fragment, but the URL the redirect was found at plus the fragment. The parser cannot construct this, since it isn't aware of that URL.
  • EDIT: also, implying the uid could be a problem if the author later adds one, e.g. because they added a dedicated page for the feed that didn't exist before.
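
To illustrate the redirect point, here is a rough sketch of how a feed consumer might pick the URL to remember. Everything in it is an assumption for illustration: `url_to_remember` is a made-up helper, the redirect chain is passed in as plain `(status, location)` pairs with absolute locations, and treating 301/308 as "update the stored URL" is my own choice, not something proposed in this issue.

```python
from urllib.parse import urldefrag

# Assumption: only permanent redirects update what the consumer stores.
PERMANENT = {301, 308}


def url_to_remember(subscribed_url: str, hops: list[tuple[int, str]]) -> str:
    """Return the URL (plus fragment) a consumer should store for a feed.

    hops lists (status, location) for each redirect seen while fetching,
    in order; locations are assumed to be absolute URLs.
    """
    base, fragment = urldefrag(subscribed_url)
    for status, location in hops:
        if status not in PERMANENT:
            # Temporary redirect (e.g. 302/307): keep the URL it was found at.
            break
        base = urldefrag(location)[0]
    return f"{base}#{fragment}" if fragment else base


remembered = url_to_remember(
    "https://example.com/feeds#updates",
    [(301, "https://example.com/home"), (302, "https://cdn.example.net/home-v2")],
)
```

The parser only ever sees the final document URL (the `cdn.example.net` one here), so an implied uid built from it would point at a URL the consumer shouldn't store.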

sknebel commented Nov 18, 2018

spec change proposal

Extend http://microformats.org/wiki/microformats2-parsing#parse_a_document_for_microformats with the new last bullet point:

  • else if found, start parsing a new microformat
    • keep track of whether the root class name(s) was from backcompat
    • create a new { } structure with:
      • type: [array of unique microformat "h-*" type(s) on the element sorted alphabetically],
      • properties: { } - to be filled in when that element itself is parsed for microformats properties
      • if the element has a non-empty HTML id attribute:
        id: string value of the HTML id attribute of the element

EDIT: text clarified that id has to be non-empty (it being empty isn't valid HTML anyways).
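
In code, the extended step might look something like the sketch below. It is not taken from any existing parser: `new_microformat` is a hypothetical function, the element is reduced to a list of class names plus an attribute dict, the `h-` prefix check is a simplification of real root class name matching, and backcompat tracking is left out.

```python
def new_microformat(classes: list[str], attrs: dict[str, str]) -> dict:
    """Create the structure for a newly found microformat root element."""
    # Unique "h-*" types, sorted alphabetically (simplified prefix check).
    types = sorted({c for c in classes if c.startswith("h-")})
    item = {"type": types, "properties": {}}
    # Only add "id" when the attribute is present and non-empty.
    element_id = attrs.get("id", "")
    if element_id:
        item["id"] = element_id
    return item


with_id = new_microformat(["h-feed", "sidebar"], {"id": "updates"})
without_id = new_microformat(["h-entry", "h-card", "h-card"], {"id": ""})
```

Leaving the key out entirely when there is no usable id keeps the output for existing pages identical to what parsers produce today.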

gRegorLove commented Nov 19, 2018

Sounds like good reasoning and a reasonable spec update. I'm in favor and can implement in php-mf2 pretty easily.

dshanske commented Dec 24, 2018

As a user of the php-mf2 parser in my Parse This library, I would find this useful.

jalcine commented Dec 24, 2018

This could help out quite a bit with the Elixir implementation of Microformats2. I do see the potential issue with using u-uid and have been opting to use u-uid in Koype but this would make things more explicit (which is better).

dshanske commented Dec 31, 2018

I changed my post-processing of the parser output to take the id, now available in the PHP-MF2 master branch, and use it to create a URL with a fragment for each feed, which lets me enumerate the feeds individually. That will help me parse them as individual elements should someone request a specific feed.
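
A post-processing step along those lines could look roughly like this in Python. It is a guess at the general shape, not the actual Parse This code: `feed_urls` is a hypothetical name, it assumes parser output with the new `id` key, and it only looks at top-level h-feed items.

```python
from urllib.parse import quote, urldefrag


def feed_urls(parsed: dict, page_url: str) -> dict[str, dict]:
    """Map a URL with fragment to each top-level h-feed that has an id."""
    base = urldefrag(page_url)[0]
    feeds = {}
    for item in parsed.get("items", []):
        if "h-feed" in item.get("type", []) and item.get("id"):
            # Percent-encode the id so it is safe to use as a fragment.
            feeds[f"{base}#{quote(item['id'])}"] = item
    return feeds


parsed = {
    "items": [
        {"type": ["h-feed"], "id": "updates", "properties": {}},
        {"type": ["h-feed"], "id": "photos", "properties": {}},
        {"type": ["h-feed"], "properties": {}},  # no id, so not addressable
    ]
}

feeds = feed_urls(parsed, "https://example.com/")
```

A request for a specific feed can then be answered by looking up its URL in the returned mapping.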

tantek commented Dec 31, 2018

Resolution: proposal accepted.

No objections in above discussion, and positive opinions (👍) from a few implementors on the proposal.

Proposal implementations in the mf2py and phpmf2 parsers, plus verification by https://github.com/dshanske that the phpmf2 implementation satisfies the use case for the issue, are sufficient to demonstrate implementability and utility, all as noted/linked in the issue thread.

Editing specification accordingly.

(Originally published at: http://tantek.com/2018/364/t3/)

tantek changed the title from "parse HTML id= property" to "parse HTML id attribute" on Dec 31, 2018

tantek closed this on Dec 31, 2018
