image alt text is lost during parsing #2

Open
aaronpk opened this Issue Jul 12, 2016 · 33 comments

Member

aaronpk commented Jul 12, 2016

This example illustrates the loss of image alt text during microformats parsing.

<div class="h-entry">
  <p class="p-name e-content">Hello World</p>
  <img class="u-photo" src="globe.gif" alt="spinning globe animation">
</div>

{
  "type": [
    "h-entry"
  ],
  "properties": {
    "name": [
      "Hello World"
    ],
    "photo": [
      "http://example.com/globe.gif"
    ],
    "content": [
      {
        "html": "Hello World",
        "value": "Hello World"
      }
    ]
  }
}

This will occur any time the <img> tag appears outside of other microformats properties.

This means it's impossible for a consumer of the parsed h-entry to reconstruct a representation of the post that includes the alt text.

This is blocking w3c/Micropub#34

voxpelli Jul 12, 2016

This sounds similar to the language parsing brainstorm at: http://microformats.org/wiki/microformats2-parsing-brainstorming#Parse_language_information

lang, like alt, is additional data that needs to be carried through, although lang is the more complex case since it is inherited while alt is not.

Continuing the brainstorming around how to include languages, one could imagine:

<img class="u-photo" src="globe.gif" alt="spinning globe animation" lang="en">

Parsed as:

{
  "photo": [
    {
      "value": "http://example.com/globe.gif",
      "alt": "spinning globe animation",
      "lang": "en"
    }
  ]
}

All implementations should already expect that a property may be an object rather than a string, and should use the value of that object in that case, so adding an additional alt key would be fully backwards compatible.

It is also important for parsing libraries to distinguish between an empty alt and an unspecified alt, as the two have significantly different meanings.

I know @glennjones has already implemented experimental lang parsing: glennjones/microformat-shiv#22

And @gRegorLove made a lang PR for php-mf2 parser: microformats/php-mf2#97

So there's some experience to build upon there.
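
To illustrate the empty-versus-unspecified distinction mentioned above, here is a minimal sketch using only Python's standard `html.parser`. The class name `ImgAltCollector` is made up for this example and is not taken from any microformats parser; the sketch only shows that the two cases can be told apart at the attribute level.

```python
from html.parser import HTMLParser


class ImgAltCollector(HTMLParser):
    """Collects (src, alt) pairs; alt is None when the attribute is absent."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        if "alt" in attr_map:
            # html.parser reports a valueless `alt` as None; treat it as empty.
            alt = attr_map["alt"] or ""
        else:
            # Attribute not present at all: keep that distinct from "".
            alt = None
        self.images.append((attr_map.get("src"), alt))


collector = ImgAltCollector()
collector.feed(
    '<img src="a.gif">'
    '<img src="b.gif" alt="">'
    '<img src="c.gif" alt="spinning globe animation">'
)
collector.close()
print(collector.images)
```

The first image yields `None` (no alt specified), the second yields an empty string, and only the third carries actual text.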

voxpelli Jul 12, 2016

After some discussion I'm not as sure anymore on the similarity in parsing – could be that this is rather a special case of fallback content for embedded content: https://html.spec.whatwg.org/multipage/dom.html#fallback-content

One should maybe consider <video>, <audio>, <object> and other embeddable content in addition to <img> when solving this.

The resulting value from whatever parsing one ends up with could probably still be represented similarly to what has been suggested for lang: as an alt, fallback, or similarly named key on an object constructed like those for e-* type properties.

kevinmarks Jul 12, 2016

Member

To preserve alt text (and indeed all accessibility markup) you can use e-content.

tantek Jul 18, 2016

Member

First, I think "can use e-content" is not solving the problem, but rather "kicking the can down the road". It is not a solution to the problem of parsing alt text, but instead a way of deferring responsibility for parsing alt text to every microformats JSON consuming application. That is unreasonable, since the reason such an application is using microformats JSON in the first place is that it does not want to have to parse the HTML. Thus saying "just parse the HTML from e-content" (which is essentially what "you can use e-content ... To preserve alt text (and indeed all accessibility markup)" is saying) ignores the very incentives of the microformats JSON consuming application.

Second, lang and alt are similar in that they are both extra information on the element, but the resemblance stops there. "lang" is both rarely used (in comparison to "alt"), and can often be auto-implied from the content, whereas "alt" can nearly never be implied, and is thus more important to solve. That being said, if a solution for "alt" works for "lang", that would be a nice side effect (but it's not a "must have").
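
To make the first point concrete, this is roughly the work every consumer would have to repeat if alt text were only available through e-content. It is a sketch, not code from any existing consumer: the helper `alts_from_e_content` and the hand-written `entry` dictionary are assumptions for illustration, and the e-content "value" key is left out because only "html" matters here.

```python
from html.parser import HTMLParser


class _AltExtractor(HTMLParser):
    """Maps img src to alt text for images found in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.alts = {}

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "img" and attr_map.get("src") is not None and "alt" in attr_map:
            self.alts[attr_map["src"]] = attr_map["alt"] or ""


def alts_from_e_content(entry):
    """Re-parse each e-content "html" string of a parsed h-entry for alt text."""
    extractor = _AltExtractor()
    for content in entry["properties"].get("content", []):
        if isinstance(content, dict):
            extractor.feed(content.get("html", ""))
    extractor.close()
    return extractor.alts


# Hand-written stand-in for parsed mf2 JSON, not real parser output.
entry = {
    "type": ["h-entry"],
    "properties": {
        "content": [
            {"html": 'Hello World <img src="globe.gif" alt="spinning globe animation">'}
        ]
    },
}
print(alts_from_e_content(entry))
```

Note that the consumer ends up running an HTML parser again, which is exactly what consuming the JSON was supposed to avoid.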

tantek Jul 18, 2016

Member

I'm not sure how much to brainstorm in a GH issue and how much to recommend a specific course of action. It feels weird to brainstorm in a threaded medium (GitHub issue), which is the opposite of what you want (collaborative iteration in-place on a brainstorm). @aaronpk suggested a hybrid approach of collaborative iterative brainstorming on the wiki.

Here is a start on some specific ideas for approaches (and problems therein):
http://microformats.org/wiki/microformats2-parsing-brainstorming#Parse_img_alt

bear Aug 1, 2016

The change as described in the brainstorm conversation here:
http://microformats.org/wiki/microformats2-parsing-brainstorming#Parse_img_alt

Any implementation of this change would (should) be paired with a major version number change to give consumers a chance to adjust their consuming code.

aaronpk Aug 1, 2016

Member

Of the current options in the brainstorming section, everyone who has commented there agrees on the following:

  • If a u-* property is parsed on an element with a non-empty 'alt' attribute, then:
    • Create a structure similar to the e-content nested structure that provides the "value" as the URL, and an "alt" as the text alternative.

The original example I gave would end up looking like this:

<div class="h-entry">
  <p class="p-name e-content">Hello World</p>
  <img class="u-photo" src="globe.gif" alt="spinning globe animation">
</div>

{
  "type": [
    "h-entry"
  ],
  "properties": {
    "name": [
      "Hello World"
    ],
    "photo": [
      {
        "value": "http://example.com/globe.gif",
        "alt": "spinning globe animation"
      }
    ],
    "content": [
      {
        "html": "Hello World",
        "value": "Hello World"
      }
    ]
  }
}

kartikprabhu Sep 6, 2016

Member

If there is no non-empty alt attribute, should the original parsed format be used?

Secondly, does this not in some way conflict with the use of "value" in e-* type parsing where value is a plaintext representation and html is the actual representation?

tantek Sep 15, 2016

Member

@kartikprabhu wrote:

If there is no non-empty alt attribute

Then existing behavior.

does this not in some way conflict with the use of "value" in e-* type parsing where value is a plaintext representation and html is the actual representation?

I don't see what you are talking about. Can you provide a code example that demonstrates this conflict?

kartikprabhu Sep 16, 2016

Member

@tantek Consider the following example

<div class="h-entry">
  <p class="p-name e-content"><span>Hello World</span></p>
  <img class="u-photo" src="globe.gif" alt="spinning globe animation">
</div>

which under the new rules would give the parsed mf2 as

{
  "type": [
    "h-entry"
  ],
  "properties": {
    "name": [
      "Hello World"
    ],
    "photo": [
      {
        "value": "http://example.com/globe.gif",
        "alt": "spinning globe animation"
      }
    ],
    "content": [
      {
        "html": "<span>Hello World</span>",
        "value": "Hello World"
      }
    ]
  }
}

From the above one can see that for e-content the plain-text alternative is in value, but for u-photo value is not the plain-text alternative: it is the URL, while alt gives the plain text.

aaronpk Sep 16, 2016

Member

I remember @notenoughneon built a system that uses HTML files with Microformats as a data store: PURR. I'd love to get her feedback on whether this new data structure would cause any problems with that model.

gRegorLove Sep 16, 2016

Member

That's interesting, @kartikprabhu. I had not really thought of it as an alternative, but more of a default. For content I think the default makes sense as plaintext. For photo I think the default makes sense as a URL. Consumers can then delve into properties like alt if they want more information.

aaronpk Sep 16, 2016

Member

My understanding of the parsing rules was that value is supposed to be what the property would have been if it were not an object. So for content, p-content results in a plaintext value, but e-content turns it into an object where value is the plaintext and html is the special parsed version. It follows that for images, u-photo typically results in a single string value (the URL), and if there is alt text, it becomes an object whose value holds that same plain string.

Basically as a consumer, you can always use the value in value as a fallback if you don't understand the object as a whole.
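
A consumer following that reading of value could be sketched as below. The helper names `property_value` and `photo_with_alt` are hypothetical and not part of any mf2 library; the point is only that code which already falls back to value keeps working when photo becomes an object.

```python
def property_value(item):
    """Return the plain value of a property item, whether it is a string or an object."""
    if isinstance(item, dict):
        return item.get("value")
    return item


def photo_with_alt(item):
    """Return (url, alt); alt is None when the parser emitted a bare string."""
    if isinstance(item, dict):
        return item.get("value"), item.get("alt")
    return item, None


old_style = "http://example.com/globe.gif"
new_style = {"value": "http://example.com/globe.gif", "alt": "spinning globe animation"}

# A consumer that only understands value sees the same URL either way.
print(property_value(old_style))
print(property_value(new_style))

# A consumer that understands the object can also read the alt text.
print(photo_with_alt(old_style))
print(photo_with_alt(new_style))
```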

kartikprabhu Sep 16, 2016

Member

@gRegorLove @aaronpk good points. I guess I was thinking of value in a different way. If @aaronpk's interpretation of value is documented somewhere then my objection is resolved.

tantek Mar 7, 2017

Member

It sounds like we have a fairly good consensus around a particular proposal, and any apparent conflicts have been explained or resolved. Would someone like to take a crack at suggested minimal spec edits to implement the proposal?

tantek Mar 7, 2017

Member

Re: @voxpelli point / question / counterproposal for "fallback": this isn't about "fallback"; this is about capturing what the author authored, specifically on the element with the microformats property name being parsed.

re: audio & video - they don't do content based fallback, their contents are only for older browsers that have no support for those elements at all.

re: object - it's a different case entirely since its contents allow rich markup. if you want an object's contents, you can already get them with an "e-*" property on the object.

if there are others with specific use-cases, we can address them as necessary.

voxpelli Mar 7, 2017

@tantek I'm not really against the solution, it was after all what I proposed initially.

The discussion I referenced above, but failed to link, was this one: https://chat.indieweb.org/microformats/2016-07-12#t1468345415448000

After having "considered the difference" there, I concluded that the difference between lang and alt is that lang is a global attribute, while alt is the img-specific implementation of fallback content: "content that is to be used when the external resource cannot be used".

It specifically says the following in that spec about alt on img:

the value of the alt attribute provides equivalent content for those who cannot process images or who have image loading disabled (i.e. it is the img element's fallback content)

So fallback content is still about what the author has authored: if the author has given specific fallback content, then that fallback content should be forwarded. We are talking about the same thing.

In practice it probably makes sense to use alt as the name.

I still do wonder though why it wouldn't work to just say that a u-* that has specified fallback content should include that fallback content as an alt? So that the following two should result in the same parsed result:

<img class="u-photo" src="foo.svg" alt="A pink flower" />
<object class="u-photo" data="foo.svg">A pink flower</object>

And actually even this:

<object class="u-photo" data="foo.svg"><img src="foo.png" alt="A pink flower" /></object>

Don't they all convey the very same thing from the perspective of HTML?

tantek Mar 7, 2017

Member

It makes sense to use "alt" as the name because it's a 1:1 mapping of the value of the alt attribute.

<object class="u-photo" data="foo.svg">A pink flower</object>

Is an artificial example, not real world, you would just use an img.

<object class="u-photo" data="foo.svg"><img src="foo.png" alt="A pink flower" /></object>

Would be properly marked up by putting u-photo on both photos provided:

<object class="u-photo" data="foo.svg"><img class="u-photo" src="foo.png" alt="A pink flower" /></object>

which would then provide the alt for the second photo.

voxpelli Mar 7, 2017

I'm okay with just doing the img alt parsing as it makes for a simpler mf2 parsing spec. I still don't fully understand the criticism in regards to the alt text not being fallback content, but let's leave that.

(The object tag linking to an SVG is not an artificial example but one usually brought up as one of the major ways to include SVG. See e.g.: https://css-tricks.com/using-svg/#article-header-id-11)

snarfed Jul 10, 2017

Member

i currently have a use case, snarfed/bridgy#756, that's blocked on this. the composite object "photo": [{"value": ..., "alt": ...}] approach works for me!

gRegorLove Mar 11, 2018

Member

Would someone like to take a crack at suggested minimal spec edits to implement the proposal?

On http://microformats.org/wiki/index.php?title=microformats2-parsing&oldid=66695#parsing_a_u-_property

Replace:

  • else if img.u-x[src] or audio.u-x[src] or video.u-x[src] or source.u-x[src], then get the src attribute

With:

  • else if img.u-x[src][alt]:not([alt=""])
    • return a dictionary with two keys:
      • value: the src attribute of the img
      • alt: the alt attribute of the img
  • else if img.u-x[src] or audio.u-x[src] or video.u-x[src] or source.u-x[src], then get the src attribute

kartikprabhu Mar 11, 2018

Member

Absence of [alt] is different from [alt=""] https://developer.mozilla.org/en-US/docs/Web/HTML/Element/img#Omitting%20this%20attribute (fragmentioned URL)

So I suggest the following modification to @gRegorLove's suggestion

  • else if img.u-x[src][alt]
    • return a dictionary with two keys:
      • value: the src attribute of the img
      • alt: the alt attribute of the img
  • else if img.u-x[src] or audio.u-x[src] or video.u-x[src] or source.u-x[src], then get the src attribute
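
The two variants of the rule can be sketched side by side. This is an illustration using Python's standard `html.parser`, not the actual code of mf2py or php-mf2; `parse_u_img`, `UPhotoParser`, `parse_photos` and the `keep_empty_alt` flag are names invented for the sketch. It only looks at explicit u-photo on img, and it leaves the src unresolved.

```python
from html.parser import HTMLParser


def parse_u_img(attrs, keep_empty_alt=True):
    """Apply the proposed img rule to one element's attribute dict.

    keep_empty_alt=True  -> img.u-x[src][alt] variant (alt="" is kept)
    keep_empty_alt=False -> img.u-x[src][alt]:not([alt=""]) variant
    """
    src = attrs["src"]
    if "alt" not in attrs:
        return src  # existing behaviour: just the src string
    alt = attrs["alt"] or ""  # a valueless alt is reported as None
    if alt == "" and not keep_empty_alt:
        return src
    return {"value": src, "alt": alt}


class UPhotoParser(HTMLParser):
    """Collects u-photo values from img elements only (sketch)."""

    def __init__(self, keep_empty_alt=True):
        super().__init__()
        self.keep_empty_alt = keep_empty_alt
        self.photos = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "img" and "u-photo" in classes and attr_map.get("src") is not None:
            self.photos.append(parse_u_img(attr_map, self.keep_empty_alt))


def parse_photos(html, keep_empty_alt=True):
    parser = UPhotoParser(keep_empty_alt)
    parser.feed(html)
    parser.close()
    return parser.photos


print(parse_photos('<img class="u-photo" src="globe.gif">'))
print(parse_photos('<img class="u-photo" src="globe.gif" alt="">'))
print(parse_photos('<img class="u-photo" src="globe.gif" alt="">', keep_empty_alt=False))
```

With the `[alt]` variant an explicitly empty alt survives into the output as an empty string, whereas the `:not([alt=""])` variant makes it indistinguishable from a missing attribute.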

gRegorLove Mar 11, 2018

Member

LGTM. Think my only addition now is to ensure the src attribute in the dictionary gets normalized to an absolute URL:

  • value: get the src attribute of the img and use the normalized absolute URL of it, following the containing document's language's rules for resolving relative URLs (e.g. in HTML, use the current URL context as determined by the page, and first <base> element, if any).
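
The normalization step might look like the following, assuming the parser already knows the document URL and the href of the first <base> element (if any). `resolve_src` is a hypothetical helper; `urllib.parse.urljoin` is the standard library function doing the actual resolution.

```python
from urllib.parse import urljoin


def resolve_src(src, document_url, base_href=None):
    """Resolve an img src to an absolute URL.

    The first <base href>, if present, is itself resolved against the
    document URL and then used as the base for the src.
    """
    base = urljoin(document_url, base_href) if base_href else document_url
    return urljoin(base, src)


print(resolve_src("globe.gif", "http://example.com/"))
print(resolve_src("globe.gif", "http://example.com/post/1"))
print(resolve_src("globe.gif", "http://example.com/post/1", base_href="/img/"))
```

The resolved string would then be stored under value in the dictionary, alongside the untouched alt text.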

kartikprabhu May 27, 2018

Member

As a proof of concept, this has been implemented in an experimental version of mf2py for explicit u-photo parsing.

Example 0

<div class="h-entry">
   <p class="p-name e-content"><span>Hello World</span></p>
   <img class="u-photo" src="globe.gif">
</div>

has h-entry.properties.photo as

[
    "globe.gif"
]

Example 1

<div class="h-entry">
   <p class="p-name e-content"><span>Hello World</span></p>
   <img class="u-photo" src="globe.gif" alt="spinning globe animation">
</div>

has h-entry.properties.photo as

[
    {
        "alt": "spinning globe animation",
        "value": "globe.gif"
    }
]

Example 2

<div class="h-entry">
   <p class="p-name e-content"><span>Hello World</span></p>
   <img class="u-photo" src="globe.gif" alt="">
</div>

has h-entry.properties.photo as

[
    {
        "alt": "",
        "value": "globe.gif"
    }
]

Zegnat May 27, 2018

Member

as proof of concept, this has been implemented in experimental version of mf2py for explicit u-photo parsing.

Is there a specific reason why this change shouldn’t also be applied to implied photos? Haven’t seen this mentioned in the discussion yet, but if a spec edit is coming up, this might be worth addressing? It wasn’t too long ago implied properties were updated to better match the parsing algo of their explicit counterparts.

@kartikprabhu

kartikprabhu May 27, 2018

Member

@Zegnat I don't see any reason not to apply this to implied photo too. However, this was not discussed, so I left it out.

Also, currently it is only for a u-photo and not any u-*. Can easily update once we reach some sort of consensus.

@Zegnat

Zegnat May 27, 2018

Member

I don’t see any reason not to apply this to implied photo too.

Neither do I, but I didn’t want to assume as I haven’t been part of the conversation.

I think we should try not to introduce too many differences between implied and explicit properties, by which I mean that if I add the u-photo class explicitly I should not see my output change if it was previously picked up as an implied photo. If the u- parsing step for images gets changed, I would like to see the exact same change mirrored for implied photo.

@tantek

tantek May 27, 2018

Member

I would disagree with applying this only to explicit u-photo, I think that would result in a surprise to web authors. The simpler model is to handle "alt" for u-photo regardless of whether it is implicit or explicit.

In addition, why shouldn’t it apply to any use of u-* with an img?

E.g. "u-featured" on an img should also pick up any alt attribute.

In short, I’d rather NOT go through multiple proposal/consensus/prototype/changes to get "alt" to work properly. I’d rather we figure out how "alt" should work and change the parsing spec once to handle it.

Note the issue name "image alt text is lost during parsing" is not specific to u-photo. Let’s fix this for any use of any image (img) tags in the parsing spec.

(Originally published at: http://tantek.com/2018/147/t1/)

@kartikprabhu

kartikprabhu May 28, 2018

Member

Here are the proposed changes to the spec to account for the alt attribute.

Add a new section 1.5 with title "parse an img element for src and alt" with the steps

  • if img[alt]
    • return a new {} structure with
      • value: the src attribute of the img as a normalized absolute URL, following the containing document's language's rules for resolving relative URLs (e.g. in HTML, use the current URL context as determined by the page, and first <base> element, if any).
      • alt: the alt attribute of the img
  • else
    • return the src attribute as a normalized absolute URL, following the containing document's language's rules for resolving relative URLs (e.g. in HTML, use the current URL context as determined by the page, and first <base> element, if any).
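
The steps above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the mf2py implementation: the function name `parse_img_src_alt`, the plain-dict representation of the img element's attributes, and the `base_url` parameter (standing in for the document's URL context after any <base> element has been applied) are all hypothetical.

```python
from urllib.parse import urljoin


def parse_img_src_alt(attrs, base_url):
    """Sketch of "parse an img element for src and alt".

    attrs    -- hypothetical dict of the img element's attributes
    base_url -- hypothetical: the document's already-resolved base URL
    """
    # urljoin resolves a relative src against the base URL; any further
    # URL normalization a real parser performs is out of scope here.
    src = urljoin(base_url, attrs["src"])
    if "alt" in attrs:
        # img[alt] matches on presence, so alt="" still yields a structure.
        return {"value": src, "alt": attrs["alt"]}
    # No alt attribute at all: return the plain URL string.
    return src


# Usage with the markup from the examples earlier in the thread.
print(parse_img_src_alt({"src": "globe.gif"}, "http://example.com/"))
print(parse_img_src_alt({"src": "globe.gif", "alt": ""}, "http://example.com/"))
```

Keying on the presence of alt rather than on a non-empty value keeps an explicitly empty alt (a decorative image) distinguishable from a missing one, which matches Example 2 earlier in the thread.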

in http://microformats.org/wiki/microformats2-parsing#parsing_a_u-_property break the second step into the following

  • if img.u-x[src] return the result of "parse an img element for src and alt" (see Sec.1.5)
  • else if audio.u-x[src] or video.u-x[src] or source.u-x[src], then get the src attribute

in http://microformats.org/wiki/microformats2-parsing#parsing_for_implied_properties for implied photo, change step 1 to

  • if img.h-x[src], then use the result of "parse an img element for src and alt" (see Sec.1.5)

step 3 to

  • else if .h-x>img[src]:only-of-type:not[.h-*] then use the result of "parse an img element for src and alt" (see Sec.1.5) for that img

step 5 to

  • else if .h-x>:only-child:not[.h-*]>img[src]:only-of-type:not[.h-*], then use the result of "parse an img element for src and alt" (see Sec.1.5) for that img
@kartikprabhu

kartikprabhu May 29, 2018

Member

experimental mf2py now implements the above algorithm under the flag img_with_alt. Feel free to try it out at https://kartikprabhu.com/connection/mfparser

cc: @snarfed

@snarfed

snarfed May 30, 2018

Member

woo, can't wait to try it!

@sknebel

sknebel Sep 17, 2018

Member

This has been in mf2py for a while now, and used by granary/bridgy. @snarfed, any feedback on it?

For reference, here's the granary diff:
snarfed/granary@05a7818

I noticed the need for the type check in snarfed/granary@05a7818#diff-7c6b8da7f499d633036e0bcdd9819a95R445 since only images with alt get a nested structure - would consuming be easier if all images were in an object?
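
For illustration, the kind of type check meant here might look like the sketch below on the consuming side. It is not the granary code: `photo_url_and_alt` is a hypothetical helper name, and it assumes each photo value is either a plain URL string or a structure with "value" and "alt" keys, as in the examples earlier in this thread.

```python
def photo_url_and_alt(photo):
    """Return (url, alt) for one parsed photo value.

    Hypothetical helper: accepts either a plain URL string (img without
    an alt attribute) or a {"value": ..., "alt": ...} structure.
    """
    if isinstance(photo, dict):
        # Nested structure: the img had an alt attribute (possibly empty).
        return photo.get("value"), photo.get("alt")
    # Plain string: no alt attribute was present, so alt is None.
    return photo, None


# Usage: normalize a mixed list of photo values before rendering.
photos = ["globe.gif", {"alt": "spinning globe animation", "value": "globe.gif"}]
for url, alt in map(photo_url_and_alt, photos):
    print(url, alt)
```

If every image were wrapped in an object, the isinstance branch would go away, at the cost of changing the output for images that have no alt attribute.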

@snarfed

snarfed Sep 17, 2018

Member

thanks for the nudge @sknebel! yup, granary and bridgy are using this feature happily. details in snarfed/bridgy#756. here's a recent example of a bridgy publish to twitter with alt text:

fine by me to close this issue if you all want!
