GDPR: Prevent saving data from crawled web mentions #168

Closed
yatil opened this issue May 6, 2018 · 31 comments

@yatil

yatil commented May 6, 2018

When services like Brid.gy get webmentions from silos, the users there have no idea that their comments or likes, along with their name and profile picture, are shown on the article site. This is problematic for privacy reasons.

In the spirit of GDPR and avoiding collecting and storing data in the first place, we should anonymize the data before it is stored in the database. Once the webmention is verified and comes from bridgy, it should result in a comment like “Someone liked this on twitter.com” with no identifiable information of the person who liked it.
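To make the idea concrete, here is a minimal Python sketch of anonymizing a parsed mention before it is stored. It is not the plugin's code: the field names (`url`, `author_name`, `author_photo` and so on) and the `anonymize` helper are hypothetical, and it assumes the parsed mention carries the URL of the original silo post in `url`.

```python
from urllib.parse import urlparse

# Hypothetical mapping from mention type to the verb used in the stored text.
VERBS = {"like": "liked", "repost": "reposted", "reply": "replied to"}


def anonymize(mention: dict) -> dict:
    """Return a storable copy with no author data, only type and origin host."""
    # Assumption: "url" is the original post (e.g. on twitter.com);
    # fall back to the webmention source if it is missing.
    origin = mention.get("url") or mention["source"]
    host = urlparse(origin).hostname or "an unknown site"
    kind = mention.get("type", "mention")
    verb = VERBS.get(kind, "mentioned")
    return {
        "type": kind,
        "target": mention["target"],
        "origin_host": host,
        "text": f"Someone {verb} this on {host}",
    }


example = {
    "source": "https://brid.gy/like/twitter/example/1/2",
    "url": "https://twitter.com/example/status/1",
    "target": "https://example.com/post",
    "type": "like",
    "author_name": "Jane Doe",
    "author_photo": "https://example.com/jane.jpg",
}
print(anonymize(example)["text"])
```

Because only the type, target and origin host are copied over, the name and picture never reach the database in this sketch.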

@pfefferle
Owner

Sure, but we shouldn't limit it to brid.gy. A general way to handle this would be to check whether the referrer is on a different domain than the source URL. If so, it should be anonymized.
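
One possible reading of that domain check, as a rough Python sketch: compare the host of the webmention source with the host of the post it claims to represent. The function names and the `origin_url` parameter are hypothetical; which two URLs should actually be compared is an assumption here, not something this thread settles.

```python
from urllib.parse import urlparse


def host_of(url: str) -> str:
    """Lower-cased host of a URL without a leading "www.", or "" if there is none."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def should_anonymize(source_url: str, origin_url: str) -> bool:
    """True when the mention arrives via a different domain than the original post."""
    source_host, origin_host = host_of(source_url), host_of(origin_url)
    # If either host cannot be determined, anonymize to be on the safe side.
    if not source_host or not origin_host:
        return True
    return source_host != origin_host


print(should_anonymize("https://brid.gy/like/twitter/a/1/2", "https://twitter.com/a/status/1"))
print(should_anonymize("https://www.example.org/post", "https://example.org/post"))
```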

@pfefferle
Owner

I haven't implemented something like that, because I am not sure if it is better to disable brid.gy completely.

@yatil
Author

yatil commented May 6, 2018

I agree with not limiting it to brid.gy – I would totally like to keep the functionality of knowing that a blog post got traction without knowing exactly who it is, but it might just be overhead. Another idea would be to add something to the Webmention spec that allows the mention to specify the level of data that you are allowed to save about the Webmention. I tried (badly) to summarize my thoughts here: w3c/webmention#96

@pfefferle
Owner

Sure, but the problem is that the spec says almost nothing about parsing or handling the Webmention at all. The whole microformats part is discussed under the umbrella of the IndieWeb community.

@pfefferle
Owner

pfefferle commented May 6, 2018

That's why I split the functionality into two plugins. If you are using only the Webmention plugin (which simply implements the Webmention spec), you get exactly what you expect: "This Article was mentioned on twitter.com". The Semantic-Linkbacks plugin tries to make it human-readable following the IndieWeb principles.

@yatil
Author

yatil commented May 6, 2018

Ah, that makes sense!

@pfefferle
Owner

pfefferle commented May 6, 2018

Perhaps we should also rethink brid.gy. Why should I, as a site owner, register at brid.gy to get likes to my tweets? Why not build a service where Twitter users can register to send pings to the sites they like or tweet? Something like what Flattr tried some years ago: https://blog.flattr.com/2013/04/twitter-is-forcing-us-to-drop-users-ability-to-flattr-creators-by-favoriting-their-tweets/

@pfefferle
Owner

@yatil but if you deactivate the Semantic Linkbacks plugin, the "classic Webmentions" also look like that: "This Article was mentioned on yatil.de"

@pfefferle
Owner

@dshanske
Collaborator

dshanske commented May 6, 2018

I will assist in this, but I would not use it.

@snarfed
Contributor

snarfed commented May 6, 2018

Why should I, as a site owner, register at brid.gy to get likes to my tweets? Why not build a service where Twitter users can register to send pings to the sites they like or tweet?

bridgy does this too! it backfeeds responses to your tweets, but it also tries to send outgoing webmentions to every link you put in your tweets.

A general way to handle this would be to check whether the referrer is on a different domain than the source URL. If so, it should be anonymized.

do we have other common examples of this? I'd be reluctant to generalize just yet if bridgy is the only common one so far.

@pfefferle
Owner

we have manual Webmentions using the comment-forms or the endpoint forms...

@pfefferle
Owner

bridgy does this too! it backfeeds responses to your tweets, but it also tries to send outgoing webmentions to every link you put in your tweets.

wasn't aware of that, thanks for the info!

@snarfed
Contributor

snarfed commented May 6, 2018

honestly, i think the indieweb community has been way overthinking GDPR. we've been wringing our hands about it a ton, but i really don't expect it will affect us very much at all. in practice, almost all of what we're doing here is tiny personal web sites, which realistically will never get sued or anything similar.

at the risk of stereotyping: engineers and technical types like us like to think about laws because they're big fine grained complicated sets of rules, which we're comfortable with and attracted to...but that's not usually a good use of our time. i think the vast majority of our time spent on GDPR would be better spent on UX etc instead.

for bridgy specifically, here's our current thinking about its compliance: https://brid.gy/about#gdpr

(also, i'm just talking about laws specifically here, not ethics. ethics is worth thinking about! eg sgreger's great post, and the discussion in its comments. legal compliance though, meh. not a high priority for us in practice.)

@dshanske
Collaborator

dshanske commented May 7, 2018

I lean with @snarfed on this. It's why all of my suggestions involve UI and documentation. You don't want to use X... turn it off.

@yatil
Author

yatil commented May 7, 2018

As a tech freelancer, none of my websites is considered “tiny personal” by the GDPR law. There is already an industry in Germany that sends Abmahnungen for the smallest of violations.

@dshanske
Collaborator

dshanske commented May 7, 2018

@yatil That is why it becomes a settings issue. I am fine with adding settings to allow you to adjust the granularity of the response. But for those who take a different view for whatever reason, they should have the option as well.
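
As a rough illustration of what such a granularity setting could look like, here is a small Python sketch. The three levels, the `Detail` enum, and the field names are all hypothetical; they are not existing plugin options.

```python
from enum import Enum


class Detail(Enum):
    """Hypothetical levels of detail a site owner could choose to store."""
    FULL = "full"            # everything the parser found
    NAME_ONLY = "name_only"  # keep the author's name, drop photo and profile URL
    ANONYMOUS = "anonymous"  # drop every author field and the content


# Fields removed at each level (field names are assumptions for this sketch).
_DROPPED = {
    Detail.FULL: set(),
    Detail.NAME_ONLY: {"author_url", "author_photo"},
    Detail.ANONYMOUS: {"author_name", "author_url", "author_photo", "content"},
}


def apply_detail(mention: dict, level: Detail) -> dict:
    """Return a copy of the mention containing only the fields allowed at this level."""
    dropped = _DROPPED[level]
    return {key: value for key, value in mention.items() if key not in dropped}


mention = {
    "type": "reply",
    "source": "https://example.org/reply",
    "author_name": "Jane Doe",
    "author_photo": "https://example.org/jane.jpg",
    "content": "Nice post!",
}
print(sorted(apply_detail(mention, Detail.ANONYMOUS)))
```

Applying the filter before anything is written means a stricter setting never stores the dropped fields at all, rather than hiding them at display time.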

@yatil
Author

yatil commented May 7, 2018

Sure +1 for a setting :-)

@dshanske
Collaborator

dshanske commented May 7, 2018

@yatil I've stopped my other plans to help build these settings for all Indieweb plugins because I know it is a community concern to at least do some of this before May 25th.

@armingrewe

I'm with @dshanske, it should be a setting, not a mandatory thing forced on everyone.

@yatil
Author

yatil commented May 7, 2018

I did not want to suggest it would be mandatory for everyone, sorry if it came across like that.

@dshanske
Collaborator

dshanske commented May 7, 2018

I didn't think you did, but wanted to define it

@snarfed
Contributor

snarfed commented Jun 1, 2018

fwiw, this still seems to potentially apply to all webmentions, not just bridgy or other proxies.

for example, if a site links to you, but it doesn't send a wm, someone else can still send a wm, and that site's author name, picture, will end up in your responses. if they don't want that, due to GDPR or whatever, the concern here still applies.

so if we want an "anonymize" option, ok, but we probably want to make it global.

@metbril

metbril commented Aug 20, 2018

Why bother at all? If someone posts a like, reply or retweet publicly, he/she cannot prevent this from being sent/copied etc. all over the net. This should already be in the privacy statement of the original service. Just like WordPress does for its native comments. I wouldn't use a fully anonymized plugin. It defeats the purpose of webmention for me, social interaction. An option to enable it would be fine however.

@yatil
Author

yatil commented Aug 20, 2018

@metbril Because GDPR does not allow it. If you save the data, then you're responsible for it, no matter if the user posted it publicly or not. It is personal information you keep and the person saving the data is responsible for informing the user that it is collected (which does not happen with 99% of the websites, so one cannot expect the user to know) and for giving them options to remove the data from your website. Some might argue that this needs explicit consent before saving any data (I don’t think so). OR you can avoid trouble by anonymizing the data immediately.

@metbril

metbril commented Aug 20, 2018

@yatil

the person saving the data is responsible for informing the user that it is collected (which does not happen with 99% of the websites, so one cannot expect the user to know)

Would this imply that, for example, Google is in violation of GDPR by indexing the Twitter website and storing personally identifiable information on the way?

@pfefferle
Owner

Not a violation, but they have to be transparent about what they save, and they have to give you the option to request the information they hold about you and the option to have it deleted.

The other aspect of Google is that they only "cite" information... We build a context and send texts to other pages, where they can be commented on in another context.

@pfefferle
Owner

It is not about good/bad or correct/wrong, it's more about transparency, so that the user has a choice.

@yatil
Author

yatil commented Aug 20, 2018

@metbril No, because when you delete a tweet and Google reindexes the page, the search result will also be deleted. Same when you put your profile in private mode. There’s nothing that would reflect that change via webmentions. (If Google continued to show the entry, they would be in violation.)

@pfefferle
Owner

@yatil kind of... If you delete a post, the Webmention plugin also sends a delete request... But in the end it depends on the other party to support the deletion...
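
For context, a hedged Python sketch of how a receiver might honor such a delete. As I understand the Webmention spec, when a mention is re-sent for an existing source and target, the receiver re-fetches the source and should delete (or mark as deleted) the stored mention if the source now returns 410 Gone or no longer links to the target. The `next_action` helper and the plain substring check are simplifications made up for this example, not the plugin's logic.

```python
def next_action(status_code: int, body: str, target_url: str) -> str:
    """Decide what to do with a stored mention after re-fetching its source.

    Returns "delete", "update", or "keep".
    """
    if status_code == 410:
        # The source post was deleted and replaced by a tombstone.
        return "delete"
    if status_code == 200:
        # Naive check: a real receiver would parse the HTML and compare links.
        return "update" if target_url in body else "delete"
    # Any other status (e.g. a temporary server error): leave the data as it is.
    return "keep"


target = "https://example.com/post"
print(next_action(410, "", target))
print(next_action(200, '<a href="https://example.com/post">post</a>', target))
```

This only works if the receiving site implements it, which is the dependency on "the other party" mentioned above.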


This issue is stale because it has been open 365 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Jul 23, 2024
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jul 31, 2024

6 participants