p-name breaks on empty text #22

Closed
notenoughneon opened this Issue Jun 25, 2015 · 5 comments


notenoughneon commented Jun 25, 2015

The example below is not parsing correctly. I would expect the entry "name" to be the empty string. Adding any non-whitespace text to the e-content causes it to revert to expected behavior.

<!DOCTYPE html>
<html lang="en">
<head>
</head>
<body>
    <div class="h-entry">
        <a href="http://this.site/photo" class="u-url"></a>
        <div class="e-content p-name"><img src="photo.jpg" class="u-photo"/></div>

        Some extraneous text

        <div class="h-cite">
            <a href="http://someother.site/like" class="u-url"></a>
            <a href="http://this.site/photo" class="u-like-of"></a>
            <div class="e-content p-name">liked this</div>
        </div>
    </div>
</body>
</html>

{ items: 
   [ { type: [ 'h-entry' ],
       properties: 
        { url: [ 'http://this.site/photo' ],
          content: [ { value: '', html: '<img src="photo.jpg" class="u-photo" />' } ],
          photo: [ 'photo.jpg' ],
          name: [ 'Some extraneous text\r\n\r\n        \r\n            \r\n            \r\n            liked this' ] },
       children: 
        [ { value: 'liked this',
            type: [ 'h-cite' ],
            properties: 
             { url: [ 'http://someother.site/like' ],
               'like-of': [ 'http://this.site/photo' ],
               content: [ { value: 'liked this', html: 'liked this' } ],
               name: [ 'liked this' ] } } ] } ],
  rels: {},
  'rel-urls': {} }
glennjones Jun 25, 2015

Owner

Hi Emma

The parsing rules I am following here are:

If a property (p-name) is empty, do not add it to the output. In this case "empty" means containing no non-whitespace text. As far as I know there is no guidance on how to handle "empty" properties in the microformats parsing rules, so I followed the convention of JSON APIs of not returning "empty" properties.

A side effect of this is that the "implied rules" for p-name come into play. These rules try to fill properties like p-name automatically when there is no defined value. In your example, the implied name is the full text content of the parent h-entry.
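
The two behaviours in question can be sketched as follows. This is only an illustration of the rules as described in this comment, not the parser's actual code; the function names and the sample text are hypothetical.

```javascript
// Hypothetical helper names; this is not the parser's real code.

// True when a string contains no non-whitespace characters.
function isEmptyText(text) {
  return text.trim().length === 0;
}

// Behaviour described above: an empty explicit p-name is dropped,
// so the implied-name rule fills it from the parent's text content.
function nameDroppingEmpty(explicitName, parentText) {
  if (explicitName !== undefined && !isEmptyText(explicitName)) {
    return explicitName;
  }
  return parentText.trim(); // implied name
}

// Alternative: any author-supplied p-name wins, even when it is empty.
function nameKeepingEmpty(explicitName, parentText) {
  if (explicitName !== undefined) {
    return explicitName.trim();
  }
  return parentText.trim(); // implied name only when no p-name exists
}

// Sample parent text, loosely modelled on the h-entry in the report.
const parentText = '\n  Some extraneous text\n\n  liked this\n';

console.log(JSON.stringify(nameDroppingEmpty('', parentText)));
console.log(JSON.stringify(nameKeepingEmpty('', parentText)));
```

With an empty explicit name, the first function falls back to the parent's text while the second returns the empty string, which is the difference being reported in this issue.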

I can see why the resulting different outputs would seem a little unexpected.

This is not really a bug, but a valid question about how the parsing rules should work:

  • Should I return empty properties if the author of the HTML has added the property p-name to an element with no text content?
  • Should I execute the "implied rules" where there is an author-defined "empty" property?

I am going to have to post this issue to the microformats IRC channel and see if we can define the rules a bit more clearly for your use case. Once we have an agreed approach I can update the parser.

Personally, I would recommend that any author of microformats always add a p-name with some text to every h-*. Also, with my parser, try setting the options to {'textFormat': 'normalised'}; you may find the resulting text more useful.
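
As a rough idea of what normalised text could look like for the name in the report, here is a minimal sketch. It assumes that "normalised" means collapsing each run of whitespace to a single space; the parser's own option may behave differently, and `normaliseWhitespace` is a hypothetical helper, not part of the parser.

```javascript
// Assumption: "normalised" collapses whitespace runs to one space.
// This helper is illustrative and is not the parser's implementation.
function normaliseWhitespace(text) {
  return text.replace(/\s+/g, ' ').trim();
}

// The implied name from the reported output.
const impliedName =
  'Some extraneous text\r\n\r\n        \r\n            \r\n            \r\n            liked this';

console.log(normaliseWhitespace(impliedName));
// prints "Some extraneous text liked this"
```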

notenoughneon Jun 30, 2015

Thanks for clarifying. It sounds like the implied property rule breaks the "note type algorithm" and recommended practice of including a p-name in notes, if the note happens to be a photo with no text. Should this be documented on http://microformats.org/wiki/microformats2-parsing-issues?
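
To illustrate the concern, here is a simplified sketch of the kind of name-versus-content check that note detection relies on. The exact algorithm is an assumption here (treat a post as a note when its name is empty or is a prefix of its content, otherwise as an article), and `guessPostType` is a hypothetical helper.

```javascript
// Collapse whitespace runs and trim; treats missing values as ''.
function collapse(text) {
  return (text || '').replace(/\s+/g, ' ').trim();
}

// Simplified, hypothetical check: a post counts as a note when its name
// is empty or is just a prefix of its content; otherwise as an article.
function guessPostType(properties) {
  const name = collapse((properties.name || [])[0]);
  const firstContent = (properties.content || [])[0];
  const content = collapse(firstContent && firstContent.value);
  if (name === '' || content.startsWith(name)) {
    return 'note';
  }
  return 'article';
}

// Properties as currently parsed: the implied name is unrelated to the content.
const withImpliedName = {
  content: [{ value: '', html: '<img src="photo.jpg" class="u-photo" />' }],
  name: ['Some extraneous text\r\n\r\n   liked this'],
};

// Properties as expected: the explicit empty p-name is kept.
const withEmptyName = {
  content: [{ value: '', html: '<img src="photo.jpg" class="u-photo" />' }],
  name: [''],
};

console.log(guessPostType(withImpliedName)); // prints "article"
console.log(guessPostType(withEmptyName));   // prints "note"
```

Under this simplified check, the photo-only post is misclassified as an article once the implied name is filled in from the surrounding text.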

glennjones Jul 3, 2015

Owner

I have added this problem to http://microformats.org/wiki/microformats2-parsing-issues. Sorry it's taken a little while, but I wanted to go through my code carefully to make sure it was not an issue with my parser.

Please feel free to add your own view on how this should be dealt with to the wiki page.

kylewm Jul 3, 2015

Interesting question! mf2py and php-mf2 will both happily include empty string values in the output and not generate an implied name. Added comments/votes to the wiki.

glennjones Sep 8, 2015

Owner

Hi Emma, it's been a while, but your view of what the output should be was agreed and I have now updated all my JavaScript parser code. You can try out the HTML from your example at http://glennjones.net/tools/microformats and the p-name should now be returned as an empty string.

@glennjones glennjones closed this Sep 8, 2015
