GDPR: Prevent saving data from crawled web mentions #168

yatil · 2018-05-06T09:03:43Z

When services like Brid.gy get webmentions from silos, the users there have no idea that their comments or likes, along with their name and profile picture, are shown on the article site. This is problematic for privacy reasons.

In the spirit of GDPR and avoiding collecting and storing data in the first place, we should anonymize the data before it is stored in the database. Once the webmention is verified and comes from bridgy, it should result in a comment like “Someone liked this on twitter.com” with no identifiable information of the person who liked it.
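A minimal sketch of what "anonymize before storing" could look like, written in Python for illustration only. The field names (`source`, `target`, `url`, `type`, `author_name`) and the `anonymize_mention` helper are assumptions made for this example, not the plugin's actual schema or API.

```python
from urllib.parse import urlparse

# Hypothetical mapping from mention type to the verb used in the summary.
VERBS = {"like": "liked", "repost": "reposted", "reply": "replied to"}


def anonymize_mention(mention: dict) -> dict:
    """Return only the non-identifying parts of a verified mention.

    Assumes `url` (if present) is the canonical URL of the original
    post on the silo, and `source` is the URL that sent the webmention.
    """
    origin = mention.get("url") or mention["source"]
    host = urlparse(origin).hostname or "an unknown site"
    verb = VERBS.get(mention.get("type", ""), "mentioned")
    # Author name, profile URL, photo and content are never copied over;
    # only the host is kept, because the full URL may contain a username.
    return {
        "target": mention["target"],
        "type": mention.get("type", "mention"),
        "origin_host": host,
        "summary": f"Someone {verb} this on {host}",
    }


example = {
    "source": "https://brid.gy/like/twitter/someuser/1/2",
    "url": "https://twitter.com/someone/status/1",
    "target": "https://example.com/my-post",
    "type": "like",
    "author_name": "Jane Doe",
}
print(anonymize_mention(example)["summary"])  # Someone liked this on twitter.com
```

The point of the design is that the identifying fields are dropped before anything reaches the database, rather than being hidden at display time.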

pfefferle · 2018-05-06T09:37:42Z

Sure, but we shouldn't limit it to brid.gy. A general way to handle this would be to check whether the referrer is on a different domain than the source URL. If so, it should be anonymized.
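One possible reading of that check, sketched in Python: compare the host of the webmention's source URL with the host of the canonical URL found in the parsed entry. Both the interpretation of "referrer" and the `is_proxied` helper are assumptions for illustration.

```python
from urllib.parse import urlparse


def _host(url: str) -> str:
    """Lowercased hostname without a leading "www." (empty if missing)."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def is_proxied(source_url: str, canonical_url: str) -> bool:
    """True when the mention was sent from a different domain than the
    one the original post lives on, e.g. a backfeed service."""
    return _host(source_url) != _host(canonical_url)


# A Bridgy-style backfeed: source and original post are on different hosts.
print(is_proxied("https://brid.gy/like/twitter/a/1/2",
                 "https://twitter.com/a/status/1"))
# A direct mention from the author's own site.
print(is_proxied("https://www.yatil.de/post", "https://yatil.de/post"))
```

This simple host comparison treats subdomains as different domains; whether that is desirable would need to be decided separately.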

pfefferle · 2018-05-06T09:38:31Z

I haven't implemented something like that, because I am not sure if it is better to disable brid.gy completely.

yatil · 2018-05-06T09:41:08Z

I agree with not limiting it to brid.gy – I would really like to keep the ability to see that a blog post got traction without knowing exactly who responded, but it might just be overhead. Another idea would be to add something to the Webmention spec that lets the mention specify how much data the receiver is allowed to save about it. I tried (badly) to summarize my thoughts here: w3c/webmention#96

pfefferle · 2018-05-06T09:45:02Z

Sure, but the problem is that the spec says almost nothing about parsing or handling the Webmention. The whole microformats part is discussed under the umbrella of the IndieWeb community.

pfefferle · 2018-05-06T09:48:23Z

That's why I split the functionality into two plugins. If you use only the Webmention plugin (which simply implements the Webmention spec), you get exactly what you expect: "This Article was mentioned on twitter.com". The Semantic-Linkbacks plugin tries to make it human-readable, following the IndieWeb principles.

yatil · 2018-05-06T09:59:35Z

Ah, that makes sense!

pfefferle · 2018-05-06T10:00:50Z

Perhaps we should also rethink brid.gy. Why should I, as a site owner, register at brid.gy to get likes to my tweets? Why not build a service where twitter users can register to send pings to sites they like or tweet? Something like what flattr tried some years ago: https://blog.flattr.com/2013/04/twitter-is-forcing-us-to-drop-users-ability-to-flattr-creators-by-favoriting-their-tweets/

pfefferle · 2018-05-06T10:06:16Z

@yatil but if you deactivate the Semantic Linkbacks plugin, the "classic Webmentions" also look like that: "This Article was mentioned on yatil.de"

pfefferle · 2018-05-06T10:19:40Z

Additional information: https://sebastiangreger.net/2018/05/indieweb-privacy-challenge-webmentions-backfeeds-gdpr/

dshanske · 2018-05-06T11:40:16Z

I will assist in this, but I would not use it.

snarfed · 2018-05-06T17:48:09Z

> Why should I, as a site owner, register at brid.gy to get likes to my tweets? Why not build a service where twitter users can register to send pings to sites they like or tweet?

bridgy does this too! it backfeeds responses to your tweets, but it also tries to send outgoing webmentions to every link you put in your tweets.

> A general way to handle this would be to check whether the referrer is on a different domain than the source URL. If so, it should be anonymized.

do we have other common examples of this? I'd be reluctant to generalize just yet if bridgy is the only common one so far.

pfefferle · 2018-05-06T18:26:56Z

we have manual Webmentions using the comment-forms or the endpoint forms...

pfefferle · 2018-05-06T18:29:49Z

> bridgy does this too! it backfeeds responses to your tweets, but it also tries to send outgoing webmentions to every link you put in your tweets.

wasn't aware of that, thanks for the info!

snarfed · 2018-05-06T22:42:05Z

honestly, i think the indieweb community has been way overthinking GDPR. we've been wringing our hands about it a ton, but i really don't expect it will affect us very much at all. in practice, almost all of what we're doing here is tiny personal web sites, which realistically will never get sued or anything similar.

at the risk of stereotyping: engineers and technical types like us like to think about laws because they're big fine grained complicated sets of rules, which we're comfortable with and attracted to...but that's not usually a good use of our time. i think the vast majority of our time spent on GDPR would be better spent on UX etc instead.

for bridgy specifically, here's our current thinking about its compliance: https://brid.gy/about#gdpr

(also, i'm just talking about laws specifically here, not ethics. ethics is worth thinking about! eg sgreger's great post, and the discussion in its comments. legal compliance though, meh. not a high priority for us in practice.)

dshanske · 2018-05-07T00:05:32Z

I lean with @snarfed on this. It's why all of my suggestions involve UI and documentation. You don't want to use X... turn it off.

yatil · 2018-05-07T07:43:37Z

As a tech freelancer, none of my websites is considered “tiny personal” by the GDPR law. There is already an industry in Germany that sends Abmahnungen for the smallest of violations.

dshanske · 2018-05-07T11:36:17Z

@yatil That is why it becomes a settings issue. I am fine with adding settings to allow you to adjust the granularity of the response. But for those who take a different view for whatever reason, they should have the option as well.

yatil · 2018-05-07T11:48:05Z

Sure +1 for a setting :-)

dshanske · 2018-05-07T11:49:51Z

@yatil I've stopped my other plans to help build these settings for all Indieweb plugins because I know it is a community concern to at least do some of this before May 25th.

armingrewe · 2018-05-07T12:00:38Z

I'm with @dshanske, it should be a setting, not a mandatory thing forced on everyone.

yatil · 2018-05-07T12:03:17Z

I did not want to suggest it would be mandatory for everyone, sorry if it came across like that.

dshanske · 2018-05-07T12:42:58Z

I didn't think you did, but wanted to define it

snarfed · 2018-06-01T22:54:18Z

fwiw, this still seems to potentially apply to all webmentions, not just bridgy or other proxies.

for example, if a site links to you, but it doesn't send a wm, someone else can still send a wm, and that site's author name and picture will end up in your responses. if they don't want that, due to GDPR or whatever, the concern here still applies.

so if we want an "anonymize" option, ok, but we probably want to make it global.
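A rough sketch of how such a global setting could behave, in Python for illustration. The `Detail` levels, the field names and `apply_setting` are hypothetical; the thread does not define them.

```python
from enum import Enum


class Detail(Enum):
    """Hypothetical site-wide setting for how much of a mention is stored."""
    FULL = "full"            # keep author name, URL, photo and content
    ANONYMOUS = "anonymous"  # drop everything that identifies the author


# Assumed names of the identifying fields on a parsed mention.
PERSONAL_FIELDS = ("author_name", "author_url", "author_photo", "content")


def apply_setting(mention: dict, detail: Detail) -> dict:
    """Apply the setting to every mention, whether it arrived directly
    or through a proxy such as Bridgy."""
    if detail is Detail.FULL:
        return dict(mention)
    return {k: v for k, v in mention.items() if k not in PERSONAL_FIELDS}


mention = {
    "source": "https://example.org/a-post",
    "target": "https://example.com/my-post",
    "type": "reply",
    "author_name": "Jane Doe",
    "content": "Nice article!",
}
print(apply_setting(mention, Detail.ANONYMOUS))
```

Note that this sketch still stores the source URL, which on a personal site can itself identify the author; a stricter level could keep only the host.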

metbril · 2018-08-20T05:33:06Z

Why bother at all? If someone posts a like, reply or retweet publicly, he/she cannot prevent it from being sent/copied etc. all over the net. This should already be in the privacy statement of the original service, just like WordPress does for its native comments. I wouldn't use a fully anonymized plugin. It defeats the purpose of webmention for me: social interaction. An option to enable it would be fine, however.

yatil · 2018-08-20T05:53:33Z

@metbril Because GDPR does not allow it. If you save the data, then you're responsible for it, no matter if the user posted it publicly or not. It is personal information you keep and the person saving the data is responsible for informing the user that it is collected (which does not happen with 99% of the websites, so one cannot expect the user to know) and for giving them options to remove the data from your website. Some might argue that this needs explicit consent before saving any data (I don’t think so). OR you can avoid trouble by anonymizing the data immediately.

metbril · 2018-08-20T07:12:34Z

@yatil

> the person saving the data is responsible for informing the user that it is collected (which does not happen with 99% of the websites, so one cannot expect the user to know)

Would this imply that, for example, Google is in violation of GDPR by indexing the Twitter website and storing personally identifiable information along the way?

pfefferle · 2018-08-20T07:53:15Z

Not a violation, but they have to be transparent about what they save, and they have to give you the option to request the information they hold about you and the option to delete it.

The other aspect of Google is that they only "cite" information... We build a context and send texts to other pages, where they can be commented on in another context.

pfefferle · 2018-08-20T07:54:36Z

It is not about good/bad or correct/wrong, it's more about transparency, so that the user has a choice.

yatil · 2018-08-20T10:22:12Z

@metbril No, because when you delete a tweet and Google reindexes the page, the search result will also be deleted. The same goes for putting your profile in private mode. There’s nothing that would reflect that change via webmentions. (If Google continued to show the entry, they would be in violation.)

pfefferle · 2018-08-20T11:18:11Z

@yatil kind of... If you delete a post, the Webmention plugin also sends a delete request... But in the end it depends on the other party to support the deletion...
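For context, a minimal receiver-side sketch of what "supporting the deletion" could mean. As far as I understand the Webmention spec, a receiver that is re-notified about a known source/target pair should remove the stored mention when the source now returns 410 Gone or no longer links to the target. The in-memory `store` and the `process_update` helper are assumptions for this example, and the fetch of the source is left out so the logic stays self-contained.

```python
def process_update(store: dict, source: str, target: str,
                   status: int, body: str) -> str:
    """Handle a (re-)sent webmention given the result of fetching `source`.

    `status` and `body` are the HTTP status code and response body the
    receiver got when it re-fetched the source URL.
    """
    key = (source, target)
    gone = status == 410
    link_removed = status == 200 and target not in body
    if gone or link_removed:
        # The post was deleted or no longer mentions us: drop stored data.
        store.pop(key, None)
        return "deleted"
    if status == 200:
        store[key] = body
        return "stored"
    # Other statuses (e.g. temporary errors) leave existing data untouched.
    return "ignored"


store = {}
src, tgt = "https://example.org/reply", "https://example.com/my-post"
print(process_update(store, src, tgt, 200, f'<a href="{tgt}">nice</a>'))
print(process_update(store, src, tgt, 410, ""))
```

A real receiver would parse the HTML for an actual link rather than using a substring check.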

github-actions · 2024-07-23T01:53:41Z

This issue is stale because it has been open 365 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions bot added the Stale label Jul 23, 2024

github-actions bot closed this as not planned Jul 31, 2024
