Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: drop img src (and alt) from implied p-name #35

Closed
snarfed opened this issue May 31, 2018 · 10 comments
Closed

proposal: drop img src (and alt) from implied p-name #35

snarfed opened this issue May 31, 2018 · 10 comments

Comments

@snarfed
Copy link
Member

snarfed commented May 31, 2018

right now, img src and alt values often get included in implied p-names (spec; added in #17).

names are usually expected to be relatively short and human readable, but image URLs are often long, not very human-readable, and add little to no value. aaronpk/XRay#53 is related and has details. also, as a downstream user myself, this recently surprised me when it was added to mf2py 1.1.0, and is undesirable enough that i can't realistically upgrade bridgy or granary to it with this behavior.

alt text is usually human readable, but often fairly disconnected and out of context from the rest of the implied name, which results in similarly poor quality text. #16 is related and has details.

a brief straw poll of #microformats this morning found many people who agree, and no one who wanted to keep src URLs or alt values in implied p-name. should we consider dropping that part of the spec?

as an example, this html:

<p class="h-card">
  My Name
  <img src="http://xyz" />
</p>

currently gets parsed to:

{
  "type": ["h-card"],
  "properties": {
    "name": ["My Name\n   http://xyz"],
    "photo": ["http://xyz"],
  }
}
@sknebel
Copy link
Member

sknebel commented May 31, 2018

Agreed. I believe it was a mistake on my part to mention it in #17 (presumably I just looked at the name parsing), and nobody caught that this expanded the scope of the issue.

@Zegnat
Copy link
Member

Zegnat commented May 31, 2018

To summarise the previous change: we normalised text parsing to use the exact same wording for p-*, e-*’s value, and implied name. Originally to make sure <script> and <style> tags would get correctly dropped everywhere, but in the end also included mf2’s <img> replacement step. Parsers interpret @snarfed’s example HTML the same when it does and does not include a p-name class wrapping the whole thing:

<p class="h-card"><span class="p-name">
  My Name
  <img src="http://xyz" />
</span></p>

I previously thought this was fine and would limit the amount of “surprises” for people when expanding from implied to explicit parsing, etc. But seeing how there now seem to be multiple objects in the wild that introduce breakage specifically on implied names, it will probably correct to change this.

As @snarfed said, it doesn’t look like anyone is against this. I would propose one of the following spec changes:

  1. Remove the line that says to replace <img> elements completely from the textContent sub-step on implied names.

  2. Change the line to still allow replacing it with the matching alt attribute, but never its URL. Leaving only:

    replacing any nested <img> elements with their alt attribute, if present;

@Zegnat
Copy link
Member

Zegnat commented May 31, 2018

So I wasn’t convinced about dropping alt completely. Also to keep support for something like:

<a class="h-card" href="/">
  <img src="photo.jpg" alt="Zegnat">
</a>

But it turns out that the alt will always be used if the <img> is the only content of an element. This leaves only cases where part of the name is expressed by an image while another part is expressed by text. And I am not seeing the use-case for that.

Thanks @sknebel for walking my thoughts through this. 10 minutes of IRC chat log, starting here. I am now also fully on board for dropping the line about replacing <img> completely.

@snarfed
Copy link
Member Author

snarfed commented May 31, 2018

there now seem to be multiple objects in the wild that introduce breakage specifically on implied names

to clarify, i'm against including src (and probably alt) in both explicit and implicit p-name. i don't want and couldn't use an mf2 parser that includes the URL in the parsed name for your example HTML with explicit p-name either.

if that means i should change the issue title, or otherwise changes the request, let me know. happy to discuss more!

@Zegnat
Copy link
Member

Zegnat commented May 31, 2018

to clarify, i'm against including src (and probably alt) in both explicit and implicit p-name.

For some context, reversing <img> replacement in implied names changes a spec addition made earlier this year and only removes a part of it that could be labelled accidental. I have no issues with doing that right now.

Dropping <img> from all explicit p-* parsing reverses a May 2013 spec revision. I am really surprised that this behaviour wasn’t already fully implemented in all parsers.

For explicit mark-up, I agree with @sknebel on IRC, spin that to another separate discussion. To quote:

  • the bigger discussion, check how that relates to aaronpks issue and/or make a new one

  • because that changes a lot

@sknebel
Copy link
Member

sknebel commented Jul 4, 2018

Apparently this was discussed at the IndieWebSummit. Could someone please look up the conclusion and document it here?

(@Zegnat has volunteered to listen to the session recording and do this)

@gRegorLove
Copy link
Member

We agreed to @Zegnat's proposal from May 31: #35 (comment)

2+ implementor votes and mf2py implements this now, so I think we're good to update the spec.

https://youtu.be/0OUy7jjUN5k?t=2h3m35s

@sknebel
Copy link
Member

sknebel commented Jul 6, 2018

Martijn clearly proposed two solutions, and from the change in mf2py the second one, which only partially reverts the change @snarfed has an issue with, got picked. Why that and not remove the entire line? Did people find a reason why keeping the alt case was important?

I'm happy to accept it if it is consensus, but I'd like to understand why this one was picked, and in the youtube video it seems only to be said "Accept Martijn's proposal", not which one.

@gRegorLove
Copy link
Member

gRegorLove commented Jul 6, 2018

Oops, I should have linked the video earlier. I've transcribed some of it below for easier reference. We discussed and agreed on the second proposal.

Topic starts https://youtu.be/0OUy7jjUN5k?t=2h1m20s, then at 2:02:23:

Tantek: so we're just going to use the alt attribute then and not the source
Ryan: for implied
Tantek: only for implied?
Ryan: Martijn's proposal was only for implied
[Brief tangential discussion about implied p-name]
Tantek: I think that proposed resolution is fine. It's a conservative change, so if we want to open a separate issue for getting rid of src in general, but let's solve the particular implied problem immediately

So I believe the agreed update is:

Under implied p-name parsing rules, replace:

replacing any nested <img> elements with their alt attribute, if present; otherwise their src attribute, if present, adding a space at the beginning and end, resolving the URL if it’s relative;

With:

replacing any nested <img> elements with their alt attribute, if present;

@gRegorLove
Copy link
Member

Updated in spec with this revision: http://microformats.org/wiki/index.php?title=microformats2-parsing&oldid=66871

This issue can be closed now.

@snarfed snarfed closed this as completed Aug 8, 2018
snarfed added a commit to snarfed/bridgy-fed that referenced this issue Oct 14, 2018
problem is that explicit p-name includes `<img>` `src` value, but implicit doesn't. background in microformats/microformats2-parsing#35 etc.

```
======================================================================
FAIL: test_activitypub_create_author_only_url (tests.test_webmention.WebmentionTest)
Mf2 author property is just a URL. We should run full authorship.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/ryan/src/bridgy-fed/local/lib/python2.7/site-packages/mock/mock.py", line 1305, in patched
    return func(*args, **keywargs)
  File "tests/test_webmention.py", line 383, in test_activitypub_create_author_only_url
    self.assert_equals(repost_as2, kwargs['json'])
  File "/Users/ryan/src/bridgy-fed/local/lib/python2.7/site-packages/oauth_dropins/webutil/testutil.py", line 182, in assert_equals
    %s""" % (msg, ''.join(e.args), expected, actual))
AssertionError: None: [actor] [name] u'Ms. \u2615 Baz' != u'Ms. \u2615 Baz  http://orig/pic'
- Ms. \u2615 Baz
+ Ms. \u2615 Baz  http://orig/pic

Expected value:
{u'@context': u'https://www.w3.org/ns/activitystreams',
 u'actor': {u'icon': ({u'type': u'Image', u'url': u'http://orig/pic'},),
            u'id': u'http://localhost/orig',
            u'image': ({u'type': u'Image', u'url': u'http://orig/pic'},),
            u'name': u'Ms. \u2615 Baz',
            u'preferredUsername': u'orig',
            u'type': u'Person',
            u'url': u'http://orig'},
 u'cc': [u'https://www.w3.org/ns/activitystreams#Public',
         u'http://orig/author',
         u'http://orig/recipient',
         u'http://orig/bystander'],
 u'id': u'http://a/repost',
 u'name': u'reposted!',
 u'object': u'tag:orig,2017:as2',
 u'type': u'Announce',
 u'url': u'http://a/repost'}
Actual value:
{u'@context': u'https://www.w3.org/ns/activitystreams',
 u'actor': {u'icon': [{u'type': u'Image', u'url': u'http://orig/pic'}],
            u'id': u'http://localhost/orig',
            u'image': [{u'type': u'Image', u'url': u'http://orig/pic'}],
            u'name': u'Ms. \u2615 Baz  http://orig/pic',
            u'preferredUsername': u'orig',
            u'type': u'Person',
            u'url': u'http://orig'},
 u'cc': [u'https://www.w3.org/ns/activitystreams#Public',
         u'http://orig/author',
         u'http://orig/recipient',
         u'http://orig/bystander'],
 u'id': u'http://a/repost',
 u'name': u'reposted!',
 u'object': u'tag:orig,2017:as2',
 u'type': u'Announce',
 u'url': u'http://a/repost'}
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants