should dt-* parsing do date and time parsing for all values? #12

Open
tantek opened this issue Sep 22, 2017 · 9 comments

Comments

@tantek
Member

tantek commented Sep 22, 2017

Currently, in http://microformats.org/wiki/microformats2-parsing#parsing_a_dt-_property, special date and time parsing is only done as part of step 1, for VCP handling.

The proposal is to move the "date and time parsing rules" mentioned in step 1 (extracting them from VCP and inlining them into mf2 parsing) so that they run after all value retrieval is done, just before a value is returned.

This would be a larger fix that should also incorporate accepting the proposals in issues #4 and #8.

I don't have a specific real world example for this particular proposal, thus the issue title is a question. All feedback welcome, and especially real world examples that would be helped by this beyond the smaller fixes noted in #4 and #8.

Feedback explicitly requested from: @sknebel @gRegorLove @Zegnat. Thanks!

@tantek
Member Author

tantek commented Sep 22, 2017

We can also leave this open longer, and just move forward with #4 and/or #8 until we have more evidence or consensus one way or the other.

@Zegnat
Member

Zegnat commented Sep 23, 2017

My answer to the question in the title would be Yes.

I feel like dt-* handling should describe how a string gets turned into a datetime stamp, no matter where the string is coming from (textContent, attribute, VCP, …). I also think this would give parsers an easier job.

As I wrote in #8 and on IRC (emphasis added that is implicit to this issue):

is there some way we can generalise a vcp-to-string algo for dt-* and then generalise a string-to-valid-timestamp algo that works on the string value of the dt-*, then it no longer matters if that string value was obtained through regular parsing or through vcp.
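
To make the two-stage idea concrete, here is a rough Python sketch. Everything in it is an assumption made for illustration: the dict-shaped `element` (with hypothetical keys `vcp`, `attr`, and `text`), the function names, and the toy normalizer are not taken from the spec or from any existing parser, and joining VCP parts with a space is a deliberate simplification.

```python
def raw_dt_string(element):
    """Stage 1: pick the raw string, wherever it lives.

    `element` is a hypothetical dict with optional keys "vcp" (list of
    value-class-pattern strings), "attr" (an attribute value such as
    datetime or title), and "text" (the text content).
    """
    if element.get("vcp"):
        # Simplification: real VCP handling is more involved than a join.
        return " ".join(element["vcp"])
    if element.get("attr"):
        return element["attr"]
    return element.get("text", "").strip()


def parse_dt(element, normalize):
    """Stage 2: apply the same normalizer regardless of the source."""
    return normalize(raw_dt_string(element))


def t_to_space(value):
    """Toy normalizer: only swaps the first 'T' for a space."""
    return value.replace("T", " ", 1)


sources = [
    {"vcp": ["2022-07-05T17:30"]},
    {"attr": "2022-07-05T17:30"},
    {"text": " 2022-07-05T17:30 "},
]
results = [parse_dt(source, t_to_space) for source in sources]
```

With this split, all three sources end up with the same string, because the normalizer never needs to know how the value was retrieved.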

@gRegorLove
Member

Here is a real-world example we ran into today on http://indieweb.org/events.

<span class="h-event vevent">
	<span class="dt-start dtstart">
		<span class="value" title="August 1, 2018">2018-08-01</span>
		<span class="value" title="20:30">20:30<span style="display: none;">-5:00</span></span>
	</span><span class="dt-end dtend">22:00<span style="display: none;">-5:00</span></span> (-5:00 <abbr>UTC</abbr>):
<span class="p-content">An informal online get together for people new to blogging, building websites, or using IndieWeb plugins on WordPress.</span>
</span>

php-mf2 parse:

{
    "items": [
        {
            "type": [
                "h-event"
            ],
            "properties": {
                "content": [
                    "An informal online get together for people new to blogging, building websites, or using IndieWeb plugins on WordPress."
                ],
                "start": [
                    "2018-08-01 20:30-0500"
                ],
                "end": [
                    "22:00-5:00-0500"
                ]
            }
        }
    ],
    "rels": {},
    "rel-urls": {},
    "debug": {
        "package": "https://packagist.org/packages/mf2/mf2",
        "source": "https://github.com/indieweb/php-mf2",
        "version": "v0.4.5",
        "note": [
            "This output was generated from the php-mf2 library available at https://github.com/indieweb/php-mf2",
            "Please file any issues with the parser at https://github.com/indieweb/php-mf2/issues",
            "Using the Masterminds HTML5 parser"
        ]
    }
}

mf2py parse:

{
    "rels": {}, 
    "items": [
        {
            "type": [
                "h-event"
            ], 
            "properties": {
                "content": [
                    "An informal online get together for people new to blogging, building websites, or using IndieWeb plugins on WordPress."
                ], 
                "start": [
                    "2018-08-01"
                ], 
                "end": [
                    "22:00-5:00"
                ]
            }
        }
    ], 
    "rel-urls": {}, 
    "debug": {
        "source": "https://github.com/microformats/mf2py", 
        "version": "1.1.1", 
        "markup parser": "html5lib", 
        "description": "mf2py - microformats2 parser for python"
    }
}

@sknebel
Member

sknebel commented Nov 19, 2018

I guess this makes sense. VCP and the HTML rules for the datetime attribute of the <time> element are probably good starting points of syntax to accept, with the latter maybe being the output format too?

@jalcine

jalcine commented Jun 28, 2022

After having microformats/tests#29 confirmed and resolved, the lack of this being in the standard is the only thing preventing the Rust parser from being fully compliant, thus enabling this: #12 (comment)

(Originally published at: https://jacky.wtf/2022/6/yy8Z)

@gRegorLove
Member

I found some more edge cases that this spec update should cover:

  • if the value has a specific ISO8601 date, time, and timezone, use those and stop looking for "value" elements.
<div class="h-event">
  <span class="dt-start">
    <span class="value">2022-07-05T17:30-08:00</span>
  </span>
</div>

This "value" is used as-is, with no normalization to replace the "T" or to remove the colon in the timezone offset:

{
 "items": [
  {
   "type": [
    "h-event"
   ], 
   "properties": {
    "start": [
     "2022-07-05T17:30-08:00"
    ], 
    "name": [
     "2022-07-05T17:30-08:00"
    ]
   }
  }
 ]
}

Similarly for:

  • if the value has both a specific ISO8601 date and time, use those
<div class="h-event">
  <span class="dt-start">
    <span class="value">2022-07-05T17:30</span>
  </span>
</div>

{
 "items": [
  {
   "type": [
    "h-event"
   ], 
   "properties": {
    "start": [
     "2022-07-05T17:30"
    ], 
    "name": [
     "2022-07-05T17:30"
    ]
   }
  }
 ]
}
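
Both edge cases come down to "take the first date, time, and timezone seen, and stop once all three are known". Here is a minimal Python sketch of that collection step, assuming the text of each "value" element has already been extracted into a list. The regular expressions and the name `collect_vcp_parts` are illustrative assumptions, not the spec's grammar or any parser's API, and the parts are returned without normalization.

```python
import re

# Hypothetical patterns for this sketch only.
TZ = r"(?P<tz>Z|[+-]\d{1,2}(?::?\d{2})?)"
TIME = r"(?P<time>\d{2}:\d{2}(?::\d{2})?)"
FULL_RE = re.compile(r"^(?P<date>\d{4}-\d{2}-\d{2})[T ]" + TIME + TZ + r"?$")
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")
TIME_RE = re.compile(r"^" + TIME + TZ + r"?$")
TZ_RE = re.compile(r"^" + TZ + r"$")


def collect_vcp_parts(values):
    """Return (date, time, tz), keeping the first value found for each."""
    date = time = tz = None
    for raw in values:
        text = raw.strip()
        full = FULL_RE.match(text)
        if full and date is None and time is None:
            # A full datetime supplies date and time (and maybe tz) at once.
            date, time = full["date"], full["time"]
            tz = tz or full["tz"]
        elif DATE_RE.match(text):
            date = date or text
        elif (m := TIME_RE.match(text)):
            if time is None:
                time = m["time"]
                tz = tz or m["tz"]
        elif TZ_RE.match(text):
            tz = tz or text
        if date and time and tz:
            break  # everything is known; later "value" elements are ignored
    return date, time, tz


with_tz = collect_vcp_parts(["2022-07-05T17:30-08:00", "2030-01-01"])
without_tz = collect_vcp_parts(["2022-07-05T17:30"])
split_parts = collect_vcp_parts(["2018-08-01", "20:30-5:00"])
```

In this sketch the second value in `with_tz` is never examined, and `without_tz` comes back with no timezone, so a parser could keep looking for one in later "value" elements.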

@jalcine

jalcine commented Oct 19, 2023

I mentioned before that this was an upstream blocker to getting the Rust library fully compatible. That has changed, but normalization would simplify parsing (and testing) date values, so I'm voting in favor of it, and I'm curious to hear whether anyone else is in favor as well.

(Originally published at: https://jacky.wtf/2023/10/evyZ)

@JKingweb

JKingweb commented Oct 19, 2023

I'm also in favour of normalizing date values everywhere, be it VCP or not. Parsers already have to perform normalization sometimes, so it adds no appreciable complexity to parsers, while simplifying things for consumers of the output of parsers.

My own parser already does this by default, for what it's worth.

@JKingweb

While we're at it, it might be worthwhile to drop the colon (:) from time zones and transform Z to +0000, so that downstream consumers only have to deal with five formats in the JSON:

  • YYYY-MM-DD
  • YYYY-MM-DD hh:mm
  • YYYY-MM-DD hh:mm±0000
  • YYYY-MM-DD hh:mm:ss
  • YYYY-MM-DD hh:mm:ss±0000

It's a pretty straightforward application of Postel's law, with no information lost, and no new formats added.
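
For illustration, here is a small Python sketch of what such a normalization step could look like. The regular expression, the function names, and the choice to pass unrecognised input through unchanged are all assumptions made for this sketch, not an agreed algorithm.

```python
import re

# Hypothetical input grammar: date, optional time, optional timezone.
DT_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})"
    r"(?:[T ](?P<time>\d{2}:\d{2}(?::\d{2})?)"
    r"(?P<tz>Z|[+-]\d{1,2}(?::?\d{2})?)?)?$"
)


def normalize_tz(tz):
    """Turn Z, -5:00, -08:00, +01 and similar into the ±hhmm form."""
    if tz == "Z":
        return "+0000"
    sign, digits = tz[0], tz[1:].replace(":", "")
    if len(digits) <= 2:
        hours, minutes = digits, "00"  # hours only, e.g. "+01"
    else:
        hours, minutes = digits[:-2], digits[-2:]
    return f"{sign}{int(hours):02d}{minutes}"


def normalize_datetime(value):
    """Map a recognised value onto one of the five formats listed above."""
    m = DT_RE.match(value.strip())
    if m is None:
        return value  # assumption: unrecognised input is left untouched
    out = m["date"]
    if m["time"]:
        out += " " + m["time"]
    if m["tz"]:
        out += normalize_tz(m["tz"])
    return out


examples = [
    normalize_datetime("2022-07-05T17:30-08:00"),
    normalize_datetime("2022-07-05T17:30Z"),
    normalize_datetime("2018-08-01 20:30-5:00"),
]
```

Under these assumptions the three examples come out as "2022-07-05 17:30-0800", "2022-07-05 17:30+0000", and "2018-08-01 20:30-0500".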
