Bridgy Publish to Twitter mishandles whitespace cr by removing it #530

Closed
tantek opened this issue Oct 28, 2015 · 9 comments

@tantek
Contributor

tantek commented Oct 28, 2015

https://twitter.com/t/status/659200761427980288 was published by Bridgy Publish from http://tantek.com/2015/300/t1/social-web-session-w3c-tpac2015 and as you can see from the first hyperlink on the tweet:

Note the "/Social Web" - that should be "/ Social Web", and there's a return (cr) in the original that somehow Bridgy Publish removed, and thus errantly appended "Social" to the end of that IG URL

Bridgy Publish should be respecting whitespace by collapsing/keeping it as a single space at minimum, but ideally it should be preserving and passing along ALL whitespace from the original, since Twitter actually supports whitespace formatting!

https://indiewebcamp.com/note#Whitespace
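A minimal sketch of the "preserve whitespace" behavior asked for above, assuming the original post marks its line breaks with `<br>` elements. The helper name `html_to_text_keep_breaks` and the sample markup are illustrative assumptions, not Bridgy's actual code.

```python
import re

# Illustrative markup (an assumption): a link, two line breaks, then text.
sample = (
    '<a href="https://instagram.com/p/9XVBIRA9cj/">'
    'https://instagram.com/p/9XVBIRA9cj/</a>'
    '<br/><br/>Social Web session'
)

def html_to_text_keep_breaks(html):
    """Hypothetical helper: turn each <br> into a newline, then drop other tags."""
    # Convert <br>, <br/>, and <br class="..."/> into real newlines first.
    text = re.sub(r'<br\b[^>]*>', '\n', html, flags=re.IGNORECASE)
    # Remove every remaining tag without touching the text between tags.
    return re.sub(r'<[^>]+>', '', text)

print(html_to_text_keep_breaks(sample))
```

Because the breaks become real newline characters before the tags are removed, the URL and the following word end up on separate lines instead of being joined.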

@snarfed
Owner

snarfed commented Oct 28, 2015

👍 thanks for the report!

@tantek
Contributor Author

tantek commented Oct 28, 2015

Note that the Bridgy Publish POSSE copy to Facebook does not have these problems, and more to the point does preserve the line breaks!
https://www.facebook.com/photo.php?fbid=10101948228396473
Thus hopefully Bridgy Publish to Twitter can "just" be fixed to be like Bridgy Publish to Facebook!

@snarfed
Owner

snarfed commented Oct 28, 2015

huh, interesting. that's a good point. i definitely don't understand what's going on then. odd.

@kylewm
Contributor

kylewm commented Oct 28, 2015

This looks to be a problem with post type detection. Bridgy thinks this is an article so it generates the publish content from the AS "displayName", which has html tags stripped.

(Unfortunately, this bug was hidden because real linebreaks are preserved, so if the original content had linebreaks and <br>'s, then it would appear to work properly)

    {
      "object": {
        "updated": "2015-10-27T19:48:00-0700", 
        "displayName": "https://instagram.com/p/9XVBIRA9cj/Social Web session @W3C #TPAC2015 in Sapporo, Hokkaido, Japan.", 
        "author": {
          "url": "http://tantek.com/", 
          "image": {
            "url": "http://tantek.com/logo.jpg"
          }, 
          "displayName": "Tantek \u00c7elik", 
          "objectType": "person"
        }, 
        "url": "http://tantek.com/2015/300/t1/social-web-session-w3c-tpac2015", 
        "image": {
          "url": "https://igcdn-photos-b-a.akamaihd.net/hphotos-ak-xaf1/t51.2885-15/e35/12145332_1662314194043465_2009449288_n.jpg"
        }, 
        "content": "<a class=\"auto-link figure\" href=\"https://igcdn-photos-b-a.akamaihd.net/hphotos-ak-xaf1/t51.2885-15/e35/12145332_1662314194043465_2009449288_n.jpg\"><img alt=\"a photo\" class=\"auto-embed u-photo\" src=\"https://igcdn-photos-b-a.akamaihd.net/hphotos-ak-xaf1/t51.2885-15/e35/12145332_1662314194043465_2009449288_n.jpg\"/></a> <a class=\"auto-link\" href=\"https://instagram.com/p/9XVBIRA9cj/\">https://instagram.com/p/9XVBIRA9cj/</a><br class=\"auto-break\"/><br class=\"auto-break\"/>Social Web session <a class=\"auto-link h-x-username\" href=\"https://twitter.com/W3C\">@W3C</a> #TPAC2015 in Sapporo, Hokkaido, Japan.", 
        "published": "2015-10-27T19:48:00-0700", 
        "id": "http://tantek.com/2015/300/t1/social-web-session-w3c-tpac2015", 
        "objectType": "article"
      }
    }
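The effect described above can be reproduced with a small sketch. It assumes the tags are stripped by deleting them outright; `strip_tags_naive` is a hypothetical helper for illustration, not the function Bridgy or granary actually uses, and `content` is an abbreviated form of the "content" value in the JSON.

```python
import re

# Abbreviated from the "content" field above (image link and attributes trimmed).
content = (
    '<a href="https://instagram.com/p/9XVBIRA9cj/">'
    'https://instagram.com/p/9XVBIRA9cj/</a>'
    '<br class="auto-break"/><br class="auto-break"/>'
    'Social Web session'
)

def strip_tags_naive(html):
    """Hypothetical helper: delete every tag and put nothing in its place."""
    return re.sub(r'<[^>]+>', '', html)

print(strip_tags_naive(content))
# -> https://instagram.com/p/9XVBIRA9cj/Social Web session

# A real newline next to the <br> survives, which is why the bug stayed hidden.
print(repr(strip_tags_naive('line one<br/>\nline two')))
# -> 'line one\nline two'
```

With only `<br>` elements between the URL and the text, nothing is left to separate them once the tags are gone, which matches the "displayName" value shown in the JSON.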

kylewm self-assigned this Oct 29, 2015

kylewm referenced this issue in snarfed/granary Oct 30, 2015:

    mostly this is useful for better determination of article vs. note.
    h-as-* explicit class types can override the discovered post type
    but in general this change makes them less important/necessary

snarfed added the "now" label Nov 2, 2015

@snarfed
Owner

snarfed commented Nov 2, 2015

two possible solutions, off the top of my head:

  • switch to mf2util's post type detection
  • when we strip html tags, replace each one with a space, then collapse spaces

thoughts?
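
The second option above could look roughly like the sketch below. `strip_tags_with_spaces` is a hypothetical name, and the regex-based tag matching is an assumption made for brevity rather than how granary parses HTML.

```python
import re

def strip_tags_with_spaces(html):
    """Hypothetical helper: replace each tag with a space, then collapse spaces."""
    # Every tag becomes a single space so adjacent text can never fuse together.
    text = re.sub(r'<[^>]+>', ' ', html)
    # Collapse each run of whitespace (including newlines) into one space.
    return re.sub(r'\s+', ' ', text).strip()

content = (
    '<a href="https://instagram.com/p/9XVBIRA9cj/">'
    'https://instagram.com/p/9XVBIRA9cj/</a>'
    '<br class="auto-break"/><br class="auto-break"/>'
    'Social Web session'
)

print(strip_tags_with_spaces(content))
# -> https://instagram.com/p/9XVBIRA9cj/ Social Web session
```

Two trade-offs of this approach: line breaks are reduced to single spaces rather than preserved, and inline markup in the middle of a word gains a space (for example `un<b>break</b>able` becomes `un break able`).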

@kylewm
Contributor

kylewm commented Nov 2, 2015

Oh actually this was fixed in snarfed/granary@fe6addf .. I hadn't closed yet because I still wanted to add a test for this exact post

@snarfed
Owner

snarfed commented Nov 2, 2015

thanks @kylewm! confirmed manually too, the fix works in a preview of tantek's post:

[Screenshot, 2015-11-02 8:45:28 am]

@tantek
Contributor Author

tantek commented Nov 7, 2015

Confirmed with live test: https://twitter.com/t/status/663016908623511552 published with Bridgy Publish from http://tantek.com/2015/311/t3/indiewebcamp-mit-the-album

@kylewm
Contributor

kylewm commented Nov 9, 2015

👍 thanks t!
