Granary is generating "quotation-of" from content in the tweet that is not a quotation #155

Closed
aaronpk opened this Issue Jul 14, 2018 · 8 comments

aaronpk commented Jul 14, 2018

This tweet is an example: https://twitter.com/oktadev/status/1018179594367782913

I'm guessing it's picking up on the fact that the tweet ends in a URL. But there is no additional information about that URL available, so the information in the quotation-of property ends up being not useful.

I think it should generate the quotation-of property only when there is an actual quoted tweet.

Twitter doesn't include the quoted tweet URL in the parent tweet text anymore either: https://twittercommunity.com/t/updating-how-urls-are-rendered-in-the-quote-tweet-payload/105473

Here's the HTML Granary is generating for the tweet above:

<article class="h-entry">
  <span class="p-uid">tag:twitter.com:1018179594367782913</span>
  
  <time class="dt-published" datetime="2018-07-14T17:05:06+00:00">2018-07-14T17:05:06+00:00</time>
  
  <span class="p-author h-card">
    <data class="p-uid" value="tag:twitter.com:oktadev"></data>
<data class="p-numeric-id" value="786323471877836800"></data>
    <a class="p-name u-url" href="http://developer.okta.com">OktaDev</a>
<a class="u-url" href="https://developer.okta.com/blog"></a>
<a class="u-url" href="https://devforum.okta.com"></a>
    <span class="p-nickname">oktadev</span>
    <img class="u-photo" src="https://pbs.twimg.com/profile_images/1006555705082384384/izi1LTo4.jpg" alt="" />
  </span>

  <a class="u-url" href="https://twitter.com/oktadev/status/1018179594367782913">https://twitter.com/oktadev/status/1018179594367782913</a>
  <div class="e-content p-name">
  
  Three Developer Tools I'm Thankful For <a href="https://developer.okta.com/blog/2017/11/22/three-developer-tools-im-thankful-for">developer.okta.com/blog/2017/11/2…</a>
  </div>





<article class="u-quotation-of h-cite">
  <span class="p-uid"></span>
  
  
  

  <a class="p-name u-url" href="https://developer.okta.com/blog/2017/11/22/three-developer-tools-im-thankful-for">developer.okta.com/blog/2017/11/2…</a>
  <div class="">
  
  
  </div>

</article>
</article>

Owner

snarfed commented Jul 14, 2018

makes sense! thanks for filing.

this is actually a bit opaque even on Twitter itself. you're right, only trailing tweet urls become quote tweets, but trailing web urls sometimes become similar quote-like cards if they have card markup...but that occasionally fails too, maybe due to timed out fetches etc.

regardless, this is all academic since granary doesn't fetch the url and generate its own card/preview. will fix.

snarfed added the now label Jul 14, 2018

aaronpk commented Jul 14, 2018

It's pretty explicit in the API. Twitter doesn't treat the "cards" like they do quote tweets, so you should be able to just look for the quoted_status property and only generate the quotation-of then. Here's how XRay does it.
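
A minimal sketch of the check described above, assuming the tweet is a dict parsed from the Twitter API JSON. Only the `quoted_status` field name comes from this thread; the helper name `quotation_of` and the shape of the returned h-cite are hypothetical, and this is not granary's or XRay's actual code.

```python
def quotation_of(tweet):
    """Return a minimal h-cite dict for an explicit quote tweet, else None.

    Hypothetical helper for illustration only.
    """
    quoted = tweet.get("quoted_status")
    if not quoted:
        # Per the thread, a trailing web URL (card or not) does not
        # come with a quoted_status object, so nothing is emitted.
        return None
    user = quoted.get("user", {}).get("screen_name", "")
    id_str = quoted.get("id_str", "")
    return {
        "type": ["h-cite"],
        "properties": {
            "url": ["https://twitter.com/%s/status/%s" % (user, id_str)],
            "content": [quoted.get("full_text") or quoted.get("text", "")],
        },
    }


# Made-up sample tweets: one with a trailing link, one explicit quote tweet.
plain = {"id_str": "1", "full_text": "link at the end https://example.com/post"}
quote = {
    "id_str": "3",
    "full_text": "my comment",
    "quoted_status": {
        "id_str": "2",
        "user": {"screen_name": "someone"},
        "full_text": "original tweet",
    },
}

print(quotation_of(plain))
print(quotation_of(quote)["properties"]["url"])
```

With this shape, a tweet that merely ends in a link produces no quotation-of at all, which is the behavior requested in this issue.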

snarfed commented Aug 14, 2018

wow, this has been a rabbit hole. ok. here we go, mostly for my own record...

i'm on board with limiting quotation-of to explicit quote tweets. i'd ideally also like to do something better with trailing URLs that twitter renders as cards. for example, this tweet:

[image: screenshot of the tweet]

...has a trailing URL in its content, https://www.brainstuffshow.com/podcasts/what-is-the-museum-of-broken-relationships.htm. twitter promotes that URL to a card and hides it in the rendered content...but afaict there's no way to tell that explicitly from the API object. notably, display_text_range includes the trailing URL.

{
  "id_str" : "1021255310013427712",
  "in_reply_to_status_id_str" : null,
  "is_quote_status" : false,
  "full_text" : "\"For as long as there's been love, there's been heartbreak and pain. But perhaps...it's in the liminal space between love and loss that we find our shared humanity, and discover our capacity for empathy.\"\n- Brain Stuff, \"The Museum of Broken Relationships\" https://t.co/VNm0LYmXJO",
  "display_text_range" : [0, 280],
  "entities" : {
    "urls" : [{
      "url" : "https://t.co/VNm0LYmXJO",
      "indices" : [257, 280],
      "display_url" : "brainstuffshow.com/podcasts/what-…",
      "expanded_url" : "https://www.brainstuffshow.com/podcasts/what-is-the-museum-of-broken-relationships.htm"
    }],
  },
  "..."
}

tweets with trailing URLs that don't become cards, like your example here, https://twitter.com/oktadev/status/1018179594367782913, look the exact same in the API. display_text_range similarly includes the trailing URL.

(i could have sworn the twitter API put trailing card URLs outside display_text_range at some point during the extended tweet (etc) rollout, but i haven't found any record of that, so i'm probably misremembering.)
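
To make the observation concrete, here is a small sketch that checks whether the last URL entity falls inside `display_text_range`, using the values from the API object above. The helper name `trailing_url_in_display_range` is hypothetical and the tweet dict is trimmed to the fields the check needs.

```python
def trailing_url_in_display_range(tweet):
    """True if the last URL entity lies entirely within display_text_range."""
    urls = tweet.get("entities", {}).get("urls", [])
    if not urls:
        return False
    start, end = tweet["display_text_range"]
    url_start, url_end = urls[-1]["indices"]
    return start <= url_start and url_end <= end


# Fields copied from the API object quoted in this comment.
tweet = {
    "id_str": "1021255310013427712",
    "is_quote_status": False,
    "display_text_range": [0, 280],
    "entities": {
        "urls": [{
            "url": "https://t.co/VNm0LYmXJO",
            "indices": [257, 280],
        }],
    },
}

print(trailing_url_in_display_range(tweet))  # True
```

Since the URL is inside the range for this card tweet, and per the comment above the same is true for the non-card example, this check alone can't separate the two cases.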

snarfed commented Aug 14, 2018

anyway. i dropped the quotation-of property from non-quote-tweet URLs this morning. feel free to try.

however, @aaronpk i suspect you were hoping to lose the URL citation block at the bottom entirely, and not just that property?

snarfed added a commit that referenced this issue Aug 14, 2018

aaronpk commented Aug 14, 2018

not sure what you mean by the URL citation block at the bottom. The way that Monocle renders posts is it includes the quoted post in the UI if quotation-of is present. That ends up incorrectly looking like a QT in cases like this.

[image: screenshot 2018-08-14 11 10 26]

snarfed commented Aug 14, 2018

discussed in #indieweb-dev. tldr, "this is fine," probably. 😆

gRegorLove commented Aug 17, 2018

I just noticed this, testing out my notes feed. For this post, granary generates:

<p><span class="p-summary">Twitter officially welcomes bigotry now.</span></p>

<blockquote class="u-quotation-of h-cite">
<p class="p-content">“We welcome everyone to express themselves on our service. Sometimes these expressions may be offensive, controversial, and/or bigoted. We prohibit targeted behavior that harasses, threatens, or uses fear to silence others and take action when they violate our policies.”</p>

<p>— <a class="p-author h-card" href="https://safety.twitter.com">@TwitterSafety</a>'s <a class="u-url" href="https://twitter.com/TwitterSafety/status/1026979628475248640">tweet</a></p>
</blockquote>

<p>I think I’m done letting them have my content and my clicks. I have feeds on my homepage for articles and one on my <a href="/notes/">notes page</a> that you can subscribe to in a feed reader. I am thinking about setting up an email newsletter if people want to subscribe and keep up with my posts that way. More details to come.</p>

<blockquote>
<a class="p-name u-url" href="https://safety.twitter.com">@TwitterSafety</a>: “We welcome everyone to express themselves on our service. Sometimes these expressions may be offensive, controversial, and/or bigoted. We prohibit targeted behavior that harasses, threatens, or uses fear to silence others and take action when they violate our policies.”
</blockquote>

It's entirely possible I'm mis-using quotation-of since the post isn't primarily a quotation? Not a big issue for me, but it's another data point.

snarfed commented Aug 17, 2018

@gRegorLove thanks! i think that's a bit different. the entire quotation is inside your e-content, which is why it's duplicated: http://mf2.pin13.net/mf2/?url=https%3A%2F%2Fgregorlove.com%2F2018%2F08%2Ftwitter-officially-welcomes-bigotry-now%2F

detecting that kind of duplication is harder than it seems, so granary currently only tries for name, not other properties. feel free to file a(nother) feature request though!

snarfed added a commit that referenced this issue Aug 26, 2018

mf2: if a tag has startIndex/length, don't emit an mf2 child for it
gets rid of the "URL citation block at the bottom" mentioned in #155 (comment)
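
A rough sketch of the rule in that commit message, not the actual commit. It assumes an ActivityStreams-style object with a `tags` list and treats a tag as inline if it carries either `startIndex` or `length` (the message doesn't say whether both are required). The function name `tags_for_mf2_children` and the sample data are hypothetical.

```python
def tags_for_mf2_children(obj):
    """Return the tags that would still be rendered as separate mf2 children.

    Assumption: a tag with startIndex/length is already linked inline in
    the content, so it is skipped here.
    """
    return [
        tag for tag in obj.get("tags", [])
        if "startIndex" not in tag and "length" not in tag
    ]


# Illustrative object; the index values are made up for the example.
obj = {
    "content": "some text example.com/post",
    "tags": [
        {"objectType": "article", "url": "https://example.com/post",
         "startIndex": 10, "length": 16},
        {"objectType": "person", "url": "https://example.com/someone"},
    ],
}

for tag in tags_for_mf2_children(obj):
    print(tag["objectType"], tag["url"])
```

Under these assumptions the inline link no longer produces a separate block after the content, while tags without position information are still emitted.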

snarfed closed this Aug 26, 2018
