Gatsby collects personal usage data by default #12922

Closed
doolittle opened this issue Mar 28, 2019 · 7 comments

@doolittle
doolittle commented Mar 28, 2019

Description

Developer usage tracking should be opt-in, not opt-out.

#12758 adds on-by-default usage tracking to Gatsby. Any developer who updates their build is enrolled in tracking without taking any further action.

The CLI does notify the developer with an easily overlooked note:

[Screenshot of the CLI notification, 2019-03-28 12:39:46]

The notification, while clearly worded, is quickly subsumed by dozens of lines of logs and easily overlooked.

The Telemetry documentation notes that it collects personal information such as: unique machine id, hardware specifications, timestamp, working directory, and command run. The Telemetry code is here.

Steps to reproduce

  • install gatsby v2.3.3
  • run gatsby new foo

Expected result

  • CLI prompts user to enable anonymous usage collection.
  • Operation halts until user makes a choice.
  • If user simply presses Enter to move forward, anonymous usage collection is disabled (i.e. the default is not to collect information)
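
The default-off behaviour described in the list above could be sketched roughly as follows. This is only an illustration under stated assumptions: `resolveTelemetryConsent` and the `Consent` type are hypothetical names, not part of Gatsby's CLI or of gatsby-telemetry, and the sketch covers only how an answer to the prompt would be interpreted, not the prompt itself.

```typescript
// Hypothetical helper for illustration only; not part of Gatsby or gatsby-telemetry.
type Consent = "enabled" | "disabled";

// Interpret the answer typed at an opt-in prompt.
// Only an explicit "y" or "yes" (any case, surrounding spaces ignored) enables collection.
// Everything else, including a bare Enter (empty string), leaves collection disabled.
function resolveTelemetryConsent(answer: string): Consent {
  const normalized = answer.trim().toLowerCase();
  return normalized === "y" || normalized === "yes" ? "enabled" : "disabled";
}
```

With this shape, pressing Enter without typing anything resolves to "disabled", so collection only starts after a deliberate choice.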

Actual result

  • CLI notifies user that anonymous collection is enabled
  • Executes command, logging many lines of output, usually enough to push the data collection notification off screen
  • User is opted in to data collection by default, and may not be aware of it.

Environment

Any.

Why it's important

On-by-default usage tracking is surprising and unexpected for open source software.

It is undeniably useful to collect usage information for product development, and those that want to contribute to product development by virtue of their usage should be able to opt-into that feature.

@DSchau
Contributor

DSchau commented Mar 28, 2019

Hi Brad!

Thanks for opening this--we appreciate you taking the time to do so!

This feature was discussed for a few weeks in the RFC. Everyone can see how we've implemented it in this PR if interested.

That being said--let me address your specific comments and ideas. There are definitely areas we can learn from here, so thank you!

The notification, while clearly worded, is quickly subsumed by dozens of lines of logs and easily overlooked.

Agreed on the easily overlooked nature. I've opened an issue here to make it more obvious.

On-by-default usage tracking is surprising and unexpected for open source software.

Gatsby telemetry data is opt-out (like other open-source tools, e.g. VSCode, Homebrew, etc.) because we are using this anonymized, non-traceable data to improve our product and your experience with using Gatsby.

We've tried to roll out this telemetry data in as transparent a way as possible, and so we've also detailed the type of data we are collecting, why we're collecting data, and of course, how to opt out, if you so choose.

I appreciate your concern here--more than you know!--so thanks for opening this.

We'll happily discuss this further if there are more actionable ways we can improve the clarity and communication strategy of this feature.

Thank you!

DSchau closed this as completed Mar 28, 2019
@doolittle
Author

While VSCode also has opt-out, rather than opt-in, data collection, their notification persists until the user clicks out of it. While that is not opt-in, it at least requires deliberate action.

VSCode and Homebrew are also software installed by deliberate, individual action (VSCode by download, Homebrew by curl script) whereas Gatsby, distributed via npm, can be installed (or updated) without deliberate action by the individual being tracked.

I appreciate the RFC process but, like what I would assume to be the overwhelming majority of your developer audience, I first heard of data tracking in Gatsby through the small notification (which I happened to not overlook) after a teammate updated our package.json.

From your response, it seems that Gatsby, Inc. would like to normalize opt-out data tracking.

Can you provide examples of popular npm packages (i.e. software normally installed by developers via package.json) that do similar opt-out data tracking?

@Yomguithereal
I agree that this kind of data collection should really be opt-in rather than opt-out. I also fully agree with the following from @doolittle:

It is undeniably useful to collect usage information for product development, and those that want to contribute to product development by virtue of their usage should be able to opt-into that feature.

I know that you are on the edge of what can be allowed by GDPR here, but my institution cannot take this kind of risk, and it needlessly complicates our ops code to make sure your telemetry cannot run in our production environments.

I'll choose to trust @doolittle on this statement:

The Telemetry documentation notes that it collects personal information such as: unique machine id, hardware specifications, timestamp, working directory, and command run. The Telemetry code is here.

and if so, this is a bit distressing from an operational security standpoint.

Sorry for the negativity, thanks for your outstanding work on Gatsby, and tell me if I can be of any help.

@j127
Contributor

j127 commented May 27, 2019

This should definitely be opt-in only. VS Code is not a good model. Microsoft's privacy policy is appalling, and I've removed vscode from my computer because of it. This kind of tracking behavior should not be normalized.

Edit: today is my first time installing Gatsby.

@Bradcomp
I just ran into this last night. I have used Gatsby for work before, and was going to use it to set up my blog. I was shocked when the notification about telemetry popped up, and promptly uninstalled and started to do a little research into the background of the change, which led me here.

As we know from many years of security vulnerabilities, the definition of 'anonymized, non-traceable data' changes over time. Due to our inability to predict the future, this means there is at least some risk whenever a company is collecting user data.

This update doesn't appear to be something being requested by the users, but instead is being driven by the organization. Most OS projects use issues and PRs to measure how people are interacting with the software. While this is certainly not ideal, it works, and it respects the users - both by not sending their data to remote servers, and by allowing them the opportunity to participate in the shaping of the project directly.

I don't use VSCode, but I do use Homebrew, with analytics turned off. Looking at the page you linked, there are a number of differences from the Gatsby rollout, in addition to the very valid comments made by @doolittle above:

  • Homebrew is entirely run by volunteers
  • Homebrew doesn't collect the data directly, but sends it to a third party (Google Analytics)
  • Relatedly, Google Analytics doesn't give access to the unique identifiers
  • Homebrew appears to collect less data, though I am not sure on this point
  • Homebrew developers don't have access to the raw data
  • Homebrew publishes the analytics it collects publicly
  • Homebrew publishes their data retention policy for the telemetry data
  • Homebrew has a link to their analytics data on their homepage

All of these increase my trust that Homebrew is trying to do right, and recognizes the risk they are introducing to their users. Even so, I suspect Homebrew is in the minority when it comes to most volunteer OS projects, and that no telemetry, or opt-in telemetry, is the norm.

We'll happily discuss this further if there are more actionable ways we can improve the clarity and communication strategy of this feature.

Ideally the answer to this is, of course, opt-in telemetry. This appears to be off the table unfortunately, given the existing discussion. As for transparency, it looks like there was no blog post about the rollout, nor am I able to navigate to the telemetry page without searching for it; the link appears to be well hidden. Further, the Privacy Policy has not been updated since the rollout. There wasn't even a major version bump! Fixing these would improve the communication strategy moving forward, but some things - like the version bump - are water under the bridge. In addition, phrases like "The access to the raw data is highly controlled, and we cannot identify individual users from the dataset" are unverifiable by the users and rely on an implicit trust of the company handling our data.

I would imagine it's tricky to build a company around an open source project, and I get the need to figure out what the users want. I think Gatsby is a great tool, and I appreciate the work that has gone into it. It was certainly my first choice for creating a static site up until I got the notification. I would like to be able to use GatsbyJS - the open source project - without having to deal with Gatsby - the private company. Without that ability I would prefer to just find a more user-friendly alternative.

Thanks for your time.

@lostpebble
Installed Gatsby today. The message about tracking popped up for literally 3 seconds, I could read the first couple bits and then it disappeared completely and was replaced by all the other logging.

To not have that message actually remain in the console so that people can actually read it is a bit ridiculous and even comes off as malicious. I hope that is not the case and this was completely unintended and this is a bug in the latest version. But still, how many people have unknowingly installed Gatsby without being aware of any tracking going on at all? I feel lucky that I was actually focused on the console during that time and actually caught it before it disappeared. I then searched on Google and found this issue.

You have a responsibility to your users to very clearly let them know about tracking or you could face serious repercussions, especially in EU states. In fact, for many things concerning privacy / tracking, it's becoming illegal to opt users in by default and put the onus on them to purposefully opt out.

This is not a good look. I will give the benefit of the doubt and say that perhaps this is all new and you set things up naively believing it to be in the best interest of users. But you have been informed by multiple users now of this issue, so please rectify it.

You actually don't even give users an easy way to opt-out at the moment. The only way is to dig into documentation and find the config settings, which I've now done.

By far the best way to handle this would be a simple question upon installing Gatsby / your first site:

Gatsby contains a telemetry feature that collects anonymous usage
information that is used to help improve Gatsby for all users.

Do you want to enable telemetry? (Y/N):

@polarathene
Contributor

Chiming in with an additional gripe. It is likely a bug and may have been fixed since my last update of the CLI, but I just looked into why my system was being sluggish and found 37 node processes, each using a constant 1% of CPU, all from gatsby-telemetry/send.js. Presumably something went wrong in that code each time I ran a gatsby command that triggered the telemetry to run, and the process didn't exit properly.

As a user, there's a lot of CLI output sometimes, and I don't recall seeing anything about telemetry in my projects while running gatsby commands. I just checked and see nothing in develop, build, serve, info. Apparently I should be seeing something about telemetry in the output? (Perhaps the error hanging those node processes is why I don't.)
