distinguish POSSE posts vs non-POSSE mentions and handle accordingly #51

Closed
snarfed opened this Issue Jan 31, 2014 · 51 comments

8 participants
@snarfed
Owner

snarfed commented Jan 31, 2014

this would be nice for catching when other people post a link to your post in a silo.

i did this for a while in mid 2012, before bridgy's re-release with webmentions. i stopped because the POSSEd posts showed up as comments on the original posts, and i kept that decision in the re-release because i didn't see enough people using rel-syndication links, which meant i couldn't prevent the same thing happening to them.

on the other hand, we've been thinking more about de-duping and similar issues recently, and @tantek proposed that this kind of noise might help motivate people to make their mention handling smarter. worth a thought.

Owner

snarfed commented Jan 31, 2014

concretely, these would only differ from current webmentions in that they wouldn't have an in-reply-to, since they truly are "mentions."

@snarfed snarfed changed the title from send webmentions for original POSSE silo posts to send webmentions for posts as well as responses Apr 14, 2014

Owner

snarfed commented Apr 14, 2014

two possible approaches for distinguishing the original author's POSSEd posts:

  • don't bother. ideally, webmention handlers would detect them and filter them out, or whatever they want. (@tantek advocates this.)
  • omit original silo posts from the author, but not from other people.

both are reasonable, and this would be a good feature. promoting to now.

Owner

snarfed commented Aug 26, 2014

lots of discussion about this on IRC today.

summary: when a tweet links to a post, but isn't the official POSSE tweet of that post, responses are backfed and rendered as if they were responses to the original post. two examples. some people like this somewhat (e.g. @snarfed, @kevinmarks, maybe @kylewm); others don't (@aaronpk, @tantek).

it's hard to prevent this. @tantek correctly notes that we can use rel=me to identify the original author, and only treat their tweets as POSSE candidates. that's a good step.

however, the common case is that the original author later links to their post from a different (non-POSSE) tweet. we could use u-syndication and permashortcitations to distinguish that from the original POSSE tweet, but both of those have low adoption rates among bridgy users, so we'd end up muzzling the majority of responses, which i don't want to do.

@kevinmarks suggests that we use time as a heuristic. if the author links to their post over 24h after it's originally posted, don't consider that a POSSE. definitely a good idea!

(i'd re-emphasize that this is all tradeoffs. given real world usage, i don't see a single best answer so far, and leaving the current behavior is on the table. good to hash through options though!)

@snarfed snarfed changed the title from send webmentions for posts as well as responses to send webmentions for (non-POSSE) posts as well as responses Aug 26, 2014

@snarfed snarfed added now and removed now labels Aug 26, 2014

@snarfed snarfed removed the later label Sep 4, 2014

Owner

snarfed commented Jan 8, 2015

current proposal from @tantek in IRC today: only consider a link to be the original copy if it's on a domain in the user's silo profile. sounds ok to me, we could consider implementing it.

Collaborator

kylewm commented Apr 13, 2015

In case it's useful, here's an example where Bridgy is being overly aggressive in assuming a tweet is the POSSE copy of an original.

here's the original: https://adactio.com/journal/8710
here's a tweet from someone else (another bridgy user) linking to the original: https://twitter.com/jgarber/status/587245857034133504

and then a bunch of RT's of that tweet are backfed to the original as if they are RTs of the original. e.g., https://brid-gy.appspot.com/repost/twitter/jgarber/587245857034133504/587680705938907136

Owner

snarfed commented Apr 13, 2015

thanks @kylewm!

one way to mitigate: when the post's domain isn't one of the tweet author's domains, demote to u-mention.

Owner

snarfed commented Aug 28, 2015

some new thoughts from #452:

here's a concrete example. i recently tweeted this:

My silly privacy antics landed me in a @vice @Motherboard article on prepaid credit cards. Fun, mildly embarrassing. http://motherboard.vice.com/read/the-simple-trick-ashley-madisons-users-could-have-used-to-protect-themselves

with this new feature, we'd attempt to send a webmention with this tweet as the source and the motherboard.vice.com link as the target. of course, the source wouldn't actually be the twitter.com permalink, it'd be the bridgy proxy URL that renders the tweet as mf2.

one interesting question is whether to consider this part of "listen" or "publish." ie should we start doing this when you sign up for backfeed? or only when you enable publish? it's not clear to me which one it belongs to. i'm leaning toward listen (backfeed), but not sure.

also, a catch: POSSE/PESOSed silo posts would end up sending multiple wms, one from the original post and one from each silo post, so the target would end up showing duplicates. bridgy already causes this for POSSEd comments/likes/reposts, though, so it's not a new problem, and we've pretty much agreed that it's the recipient's job to use syndication links, etc to de-dupe.
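
as a rough sketch of the webmention described above (not bridgy's actual code): a webmention is a form-encoded POST of `source` and `target` to the target's webmention endpoint. the endpoint and the proxy URL below are made-up placeholders, and endpoint discovery is skipped entirely.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_webmention_request(endpoint, source, target):
    """Build (but do not send) a webmention POST request.

    The body carries exactly two form-encoded parameters: source and target.
    """
    body = urlencode({"source": source, "target": target}).encode("ascii")
    return Request(
        endpoint,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Hypothetical values: the endpoint would normally be discovered from the
# target page, and the source is a placeholder for the proxy URL that
# renders the tweet as mf2.
req = build_webmention_request(
    "https://example.com/webmention",
    "https://bridgy-proxy.example/post/twitter/USERNAME/TWEET_ID",
    "http://motherboard.vice.com/read/the-simple-trick-ashley-madisons-users-could-have-used-to-protect-themselves",
)
```

sending it would just be `urllib.request.urlopen(req)`; the sketch stops short of that so it has no network side effects.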

Owner

snarfed commented Aug 28, 2015

an idea for expanding this: search silos for any posts, from anyone, that link to the user's domain(s), and send wms for them too. these are effectively mentions.

silo support for this is mixed:

moved this to #456

Owner

snarfed commented Aug 29, 2015

added the full set of OPD heuristics to the IWC wiki. the important part for implementing is:

When considering a backlink in a silo post, use most or all of these heuristics to determine whether it's a POSSE:

  • The backlink must be at or near the end. (Allow e.g. a close paren after the link.)
  • The backlink must point to one of the user's domains, as determined by rel-me and links in their silo profile.
  • The silo post must be published within 24h of the original post.
  • New: compare the silo post's text and the original post's name, summary, and/or content, taking prefixes if they're meaningfully longer. (If the silo post has an ellipsis at or near the end, that's a strong hint to use a prefix.) The edit distance should be below a certain threshold, disregarding common differences like @-usernames in silo posts vs human names in original posts (e.g. this OP vs this POSSE).

current plan is to skip the last one due to complexity. i think the first three get us 80-95% of the value.
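
a minimal sketch of the first three heuristics, assuming hypothetical helper names and a made-up `slack` of three trailing characters for "at or near the end". this is an illustration of the list above, not the code that ended up in bridgy.

```python
from datetime import datetime, timedelta
from urllib.parse import urlparse


def link_near_end(text, link, slack=3):
    """True if `link` ends within `slack` characters of the end of `text`."""
    idx = text.rfind(link)
    if idx < 0:
        return False
    return len(text) - (idx + len(link)) <= slack


def on_user_domain(link, user_domains):
    """True if the link's host is one of the user's domains or a subdomain."""
    host = (urlparse(link).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in user_domains)


def within_window(silo_published, original_published, window=timedelta(hours=24)):
    """True if the two publish times are at most `window` apart."""
    return abs(silo_published - original_published) <= window


def looks_like_posse(text, link, user_domains, silo_published, original_published):
    """Apply all three checks; every one must pass."""
    return (
        link_near_end(text, link)
        and on_user_domain(link, user_domains)
        and within_window(silo_published, original_published)
    )
```

requiring all three is the strictest reading of "most or all"; a looser combination would accept some subset instead.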

Owner

snarfed commented Sep 1, 2015

reorganizing this slightly. this issue will cover implementing the algorithm above for determining whether a silo post is a POSSE. if it is, we won't send a wm from it to the original post, but we will send its responses. if it isn't a POSSE, we'll send wms to each link in its text (and attachments, etc), as mentions, but we won't send wms for its responses anywhere.

@kylewm @tantek @kevinmarks @aaronpk @kartikprabhu i know this has been controversial for a while now. does that sound like the ideal behavior?

i'm opening a new issue for the feature to search all silo posts for links to users' sites and send mentions for those: #456

@snarfed snarfed changed the title from send webmentions for (non-POSSE) posts as well as responses to distinguish POSSE posts vs non-POSSE mentions and handle accordingly Sep 1, 2015

@snarfed snarfed added the now label Sep 1, 2015

kevinmarks Sep 2, 2015

Not sure that is ideal - the pattern I get currently is that I quote an old post, my link to it is assumed to be POSSE, and so it isn't shown, but replies are. If it shows my non-POSSE link, the follow-ups are often interesting too, with that context.

Owner

snarfed commented Sep 2, 2015

@kevinmarks thanks for reviewing, and good point! ok, so for non-POSSE mentions, we backfeed replies, but not likes or reposts. sound good?

Owner

snarfed commented Sep 2, 2015

@kevinmarks on second thought, comparing to pure indieweb behavior...if i include a link in a post, I'd send a mention to it, but i wouldn't also send wms to it for each comment i get on my post, nor would i expect the commenters to send wms directly from their comment posts, since they're not replying to or mentioning that link. so... maybe we shouldn't backfeed replies to mentions after all?

Collaborator

kylewm commented Sep 2, 2015

I agree with that last bit -- Instead of backfeeding only the responses to a mention, it should only backfeed the mention itself. Replies to a mention are not replies to the original.

Unfortunately that means it matters even more that Bridgy guess correctly that something is a mention rather than a syndication (or err on the side of assuming syndication unless proven otherwise)... @snarfed in particular often rewords the silo copy so that I don't think edit distance would find them very similar at all, even though all the same information is contained (e.g. https://snarfed.org/2015-08-26_15313).

Collaborator

kylewm commented Sep 2, 2015

I suppose u-syndication could always represent a stronger claim on the posse copy. If publishers are having trouble with Bridgy classifying their dissimilar posts as mentions, they could start publishing u-syndication links.

Owner

snarfed commented Sep 2, 2015

right! syndication links override all of this. and as kevin mentioned in our initial IRC discussion, occasional false positives for high edit distances can probably be forgiven. deleting an occasional unwanted comment here and there generally shouldn't be too hard.

Collaborator

kylewm commented Sep 2, 2015

occasional false positives for high edit distances can probably be forgiven

if we adopt the convention of not backfeeding replies-to-mentions, though, a false positive (a true posse copy that bridgy thinks is a mere mention) would mean losing all replies to that post :(

maybe that's a good argument in favor of backfeeding replies to mentions

kartikprabhu Sep 2, 2015

How does this interact with "salmentions"? https://indiewebcamp.com/salmention Are salmentions to be sent only for reply posts? not mentions, likes etc...?

Owner

snarfed commented Sep 2, 2015

@kartikprabhu good timing! #458 may be relevant to your interests. :P

short answer: bridgy already kinda does its own salmentions for silo posts, and i'm not sure we have a concrete use case yet where bridgy responses need to interoperate with real salmentions. i'd love to see one!

kartikprabhu Sep 2, 2015

But this issue seems like salmention from silo via bridgy. If someone only mentions a link on Twitter shouldn't the replies be sent to the link too like salmentions?

Owner

snarfed commented Sep 2, 2015

i don't know if they should. we don't really have a concrete spec for expected salmention behavior afaik (@kylewm @acegiak @dissolve correct me if i'm wrong). https://indiewebcamp.com/comment-propagation only talks about direct replies/comments, not mentions, but it's not clear if that's intentional.

we're working hard enough here just getting the POSSE-or-mention? logic right and agreeing on the expected behavior in each case. i'm inclined to punt discussion of salmention interop to #458 or elsewhere, if that's ok.

kartikprabhu Sep 2, 2015

Of course. They seemed to be related and so I brought this up.

Owner

snarfed commented Sep 4, 2015

i'm hoping to start working on this over the weekend. i know people have felt strongly about this, so i'd love to hear more of you weigh in on #51 (comment) before i start, even if it's just "sounds good" or "not so sure, let's discuss more first." thanks in advance!

kartikprabhu Sep 4, 2015

@snarfed looks good enough to try out and see if additional issues crop up

Owner

snarfed commented Sep 11, 2015

current estimate of cost of implementing this, across all commits here and many others that didn't get attached: >1kloc. whee!

@snarfed snarfed self-assigned this Sep 11, 2015

snarfed added a commit that referenced this issue Sep 12, 2015

bump app version for OPD changes (#469, #51)
also rename test due to review feedback

snarfed added a commit to snarfed/granary that referenced this issue Sep 12, 2015

Merge pull request #37 from snarfed/opd2
second half of big OPD refactoring for snarfed/bridgy#51

snarfed added a commit that referenced this issue Sep 12, 2015

Merge pull request #469 from snarfed/opd2
second half of big OPD refactoring for #51

Owner

snarfed commented Sep 12, 2015

just fyi all, the first pass at this is running in prod. the two key changes are that we only interpret links as original posts if they're on one of the user's domain(s), and we only backfeed likes/reposts/rsvps to original posts, not mentions.

please let me know if you see anything that seems wrong!
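
a minimal sketch of those two rules, with hypothetical names (`classify_link`, `should_backfeed`). it assumes replies are still backfed for both kinds of link, which the comment above doesn't spell out.

```python
from urllib.parse import urlparse

# Response types that are only backfed to original posts, per the comment above.
ORIGINAL_ONLY = {"like", "repost", "rsvp"}


def classify_link(link, user_domains):
    """Treat a link as an original post only if it is on one of the user's domains."""
    host = (urlparse(link).hostname or "").lower()
    on_domain = any(host == d or host.endswith("." + d) for d in user_domains)
    return "original" if on_domain else "mention"


def should_backfeed(response_type, link_kind):
    """Decide whether a silo response is backfed to the linked URL."""
    if response_type in ORIGINAL_ONLY:
        return link_kind == "original"
    # Assumption: anything else (e.g. replies) is backfed either way.
    return True
```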

Owner

snarfed commented Sep 13, 2015

@armingrewe reported in #470 that this is making him miss some backfeed, since he POSSEs from a number of different web sites and doesn't have all of them in his silo profiles. may be one real world counterexample to the domain check.

armingrewe Sep 13, 2015

Thing is, certainly for Twitter you can only have one site in your profile, not sure how I could add the others?
For Google+ I've re-linked the account, it seems to have picked up the other sites now. Fingers crossed that will fix it.

voxpelli Sep 13, 2015

For Twitter one would probably have to look up the rel-me-links of the linked-to profile and include them as if they were linked to directly by Twitter (ideally maybe resolve the entire identity graph, but that would require something like relspider which would be a first practical use for such a graph in the community – not even IndieAuth uses it yet – and as such pretty experimental).

As the Twitter account claims to have the same identity as the webpage that has the rel-me links any link there can safely be assumed to also be a claimed identity of the Twitter account (although of course not the reverse – the Twitter account can not be safely assumed to be a claimed identity of those pages unless they somehow have a verified chain back to it by eg. linking back with rel-me to the original site or by themselves linking to the Twitter account)

Owner

snarfed commented Sep 13, 2015

@armingrewe we actually pull urls from all text in twitter profiles, including the description, so you can put others there. same with other silos.

@voxpelli good points about rel-me links! we don't currently look at them, but we definitely could.
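
for illustration, pulling domains out of all profile text could look roughly like this. the regex and the `domains_from_profile` name are assumptions for the sketch, not bridgy's real extraction code, and it only catches URLs that include an http(s) scheme.

```python
import re
from urllib.parse import urlparse

# Deliberately simple pattern: a scheme followed by non-space, non-quote characters.
URL_RE = re.compile(r"https?://[^\s<>\"')]+")


def domains_from_profile(*fields):
    """Collect lowercased hostnames from every URL found in the given profile fields."""
    domains = set()
    for field in fields:
        for url in URL_RE.findall(field or ""):
            # Drop trailing punctuation that belongs to the sentence, not the URL.
            host = urlparse(url.rstrip(".,;")).hostname
            if host:
                domains.add(host)
    return domains


found = domains_from_profile(
    "https://snarfed.org/",  # the profile's single "website" field
    "I also write at http://example.com/blog, and https://Other.example.org.",  # description
)
```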

armingrewe Sep 13, 2015

@snarfed ah, thanks for that, hadn't realised that. Updated and relinked my profile, I'll watch out if that works now.

armingrewe Sep 14, 2015

Just to confirm, as far as I can tell the Twitter and G+ mentions are now flowing through again. On the blog with the most activity I usually post in my morning (UK, ~6:30 GMT/BST) and the majority of mentions come over the next few hours. All fine so far.

Owner

snarfed commented Sep 14, 2015

thanks for the update @armingrewe! glad to hear it.

btw Facebook should work in general too, but I know you mentioned it hasn't for you. feel free to post details if you want!

armingrewe Sep 14, 2015

Facebook was fine all the time ;-) There might be something where bridgy isn't picking up something when I post via WordPress, but I need to look at that before I can be sure if there's an issue.

@kylewm

kylewm Sep 15, 2015

Collaborator

few random thoughts...

Another possible heuristic: have we already seen a POSSE for this post on this service? if so, it's more likely that subsequent links are mentions. It's not that strong a criterion, because many people will tweet links to the same piece throughout the day (e.g. Dave Winer), and of course tweets are deleted and reposted as edits.

It's much more costly to incorrectly identify a POSSE copy as a mention, i.e. no backfeed for that post. So the threshold for qualifying as a POSSE copy should probably be way lower, maybe matching some subset of the criteria, like off the top of my head:

* any two of the first three
* any one of the first three + lower than 50% edit distance
* lower than 30% edit distance

It's very difficult to correctly categorize the "Kevin tweets a link to his post within 24h" case without throwing out a lot of legitimate POSSEs. In the specific case on the wiki, we could say it looks like he is tweeting at someone but the original isn't in-reply-to anything...wonder if that applies more generally to self-mentions.
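
A rough sketch of how the three rules above could combine with the four heuristics from the wiki. Everything here is an assumption for illustration, not bridgy's actual code: the `Candidate` fields, the `looks_like_posse` name, the 20-character cutoff for "near the end", and reading "edit distance" as Levenshtein distance divided by the length of the longer text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from urllib.parse import urlparse


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = curr
    return prev[-1]


def edit_ratio(a: str, b: str) -> float:
    """Edit distance as a fraction of the longer text (0.0 means identical)."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


@dataclass
class Candidate:
    """Hypothetical container for one silo post and the post it links to."""
    silo_text: str          # text of the silo post, including the link
    link: str               # the link found in the silo post
    silo_time: datetime
    original_text: str      # text of the linked post
    original_time: datetime


def looks_like_posse(c: Candidate, user_domain: str) -> bool:
    # Heuristic 1: the link points at the user's own domain (or a subdomain).
    host = urlparse(c.link).hostname or ""
    on_domain = host == user_domain or host.endswith("." + user_domain)

    # Heuristic 2: the two posts were published within 24h of each other.
    within_24h = abs(c.silo_time - c.original_time) <= timedelta(hours=24)

    # Heuristic 3: the link sits near the end of the silo post.
    # Assumed cutoff: at most 20 characters follow the link.
    pos = c.silo_text.rfind(c.link)
    near_end = pos >= 0 and len(c.silo_text) - (pos + len(c.link)) <= 20

    # Heuristic 4: text similarity, comparing the silo text minus the link.
    silo_only = c.silo_text.replace(c.link, "").strip()
    ratio = edit_ratio(silo_only, c.original_text.strip())

    hits = sum([on_domain, within_24h, near_end])
    return hits >= 2 or (hits >= 1 and ratio < 0.5) or ratio < 0.3
```

With these rules, a same-day tweet linking to the user's own domain already counts as a POSSE copy on the first rule alone, which is exactly why the "Kevin tweets a link to his post within 24h" case stays hard.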

@snarfed

snarfed Sep 15, 2015

Owner

thanks @kylewm! interesting idea to record inferred POSSE links and check them later. kind of an extension of the way we already store syndication links. and you're right, the standard way to handle a complicated inference like this based on heuristics is to combine them with weights into a score... and that in this case, false negatives hurt much more than false positives. (I've always described bridgy as deliberately "promiscuous." :P)

I'm already second guessing all this added complexity, though, and it looks like the domain check is comfortably the strongest so far, so I'm kind of leaning toward just that. meh.
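
For comparison, a minimal sketch of the weighted-score approach next to the domain-only check. The weights, the threshold, and the function names are made up for illustration; nothing in this thread settles on concrete values. Integer weights are used only to keep the arithmetic exact.

```python
from typing import Mapping

# Hypothetical weights: the domain check is weighted highest because it has
# been the strongest signal so far. The other values are placeholders.
WEIGHTS = {
    "user_domain": 5,
    "within_24h": 2,
    "near_end": 1,
    "similar_text": 2,
}

# Deliberately low threshold: wrongly treating a POSSE copy as a mention
# (no backfeed for that post) costs more than the opposite mistake.
THRESHOLD = 3


def posse_score(signals: Mapping[str, bool]) -> int:
    """Sum the weights of every heuristic that fired."""
    return sum(weight for name, weight in WEIGHTS.items() if signals.get(name))


def is_posse_weighted(signals: Mapping[str, bool]) -> bool:
    """Weighted variant: any combination reaching the threshold qualifies."""
    return posse_score(signals) >= THRESHOLD


def is_posse_domain_only(signals: Mapping[str, bool]) -> bool:
    """Simplified variant: only the user's-domain heuristic matters."""
    return bool(signals.get("user_domain"))
```

The two variants disagree only when the link is off the user's domain but the weaker signals add up, e.g. `{"within_24h": True, "near_end": True}` passes the weighted check and fails the domain-only one.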

@kylewm

kylewm Sep 15, 2015

Collaborator

> I'm already second guessing all this added complexity, though, and it looks like the domain check is comfortably the strongest so far, so I'm kind of leaning toward just that. meh.

I would support that too. Fight that sunk cost fallacy!

@tinokremer

tinokremer Sep 22, 2015

I'm not sure if it's this issue. I came here when searching for the "No post links found" message in this repository. For me Bridgy behaves a bit oddly. I have posted my links as usual to Google+ (manually from my Known instance) and the favorites are fed back to my site as normal, but the replies are not; they show the message "No post links found". I checked my Google+ profile and https://stream.tinokremer.nl is mentioned. On my own Known instance, my Google+ profile is mentioned too and IndieAuth sees it as normal.

I'm puzzled why Bridgy cannot see post links, can you shed light on that @snarfed ?

(image: 2015-09-22_184303)

@snarfed

snarfed Sep 23, 2015

Owner

@tinokremer sorry for the trouble! you're right, it probably is due to this. current status: trying to track down the memory leak in #456 (comment), which is blocking further fixes here. wish me luck!

@tinokremer

tinokremer Sep 23, 2015

Memory leaks are the hardest issues to solve and I'm a C# .Net developer. The reference system and garbage collector clean up most of my mess. Good luck indeed!

snarfed added a commit to snarfed/granary that referenced this issue Sep 23, 2015

snarfed added a commit to snarfed/granary that referenced this issue Sep 24, 2015

add include_redirect_sources kwarg to Source.original_post_discovery()
matches same kwarg in bridgy's original_post_discovery.discover(). for snarfed/bridgy#51, snarfed/bridgy#485

snarfed added a commit that referenced this issue Sep 24, 2015

on redirects, only include final URLs in webmention targets, not initial ones

uses new include_redirect_sources kwarg in Source.original_post_discovery(). for #51, #485
@snarfed

snarfed Sep 26, 2015

Owner

tentatively closing. this has been running in prod and stable for a few days. I'm sure there are more bugs left to fix, but we can open new issues for them.
