This repository has been archived by the owner on Feb 22, 2022. It is now read-only.

Allowing community charts to collect analytics by default #4697

Closed
bacongobbler opened this issue Apr 4, 2018 · 14 comments
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@bacongobbler
Member

bacongobbler commented Apr 4, 2018

Is this a BUG REPORT or FEATURE REQUEST? (choose one): neither; it's a discussion point for how we should handle vendors asking to collect analytics by default in their charts.

Some background context can be found in these issues:

NOTE: this discussion is not about how the Helm core maintainers or the Helm Charts maintainers collect analytics, but to allow the charts themselves to collect analytics for the vendor, enabled by default.

@prydonius
Member

From #4450:

It sounds like we need to have a deeper discussion around the use of tracking in charts. I suggest we have this discussion at the next Office Hours (next Tues 10th April @ 9am PT) cc @kubernetes/charts-maintainers.

@tamalsaha
Collaborator

Sorry, I can't attend the Zoom meeting. Please have this discussion in GitHub issues or on the mailing list. That will allow anyone else to follow and respond to this issue at their own pace.

Regarding what analytics data we collect, this is the actual logic: https://github.com/appscode/stash/blob/master/pkg/cmds/root.go#L36

Say my CLI has a command mycli get pods. We compute a client_id by taking the MD5 hash of the cluster's master IP. Then we send an event to GA where category=myapp, action=get/apps, version=git-version. Obviously GA can see the IP address the event comes from, but as far as I know that info is not available to us in raw form.
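A minimal sketch of the scheme described above, for readers who want to see what is and is not sent. This is not the stash code linked earlier. The parameter names (`v`, `tid`, `cid`, `t`, `ec`, `ea`, `el`) are those of the Google Analytics Measurement Protocol v1; carrying the version in the event label (`el`) is an assumption, and the tracking ID, IP address, and helper names are placeholders. The sketch only builds the payload and sends nothing.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"net/url"
)

// clientID derives an identifier from the cluster's master IP by
// hex-encoding its MD5 digest. MD5 is used only because the comment
// above names it; it obscures the IP but is not a strong anonymizer.
func clientID(masterIP string) string {
	sum := md5.Sum([]byte(masterIP))
	return hex.EncodeToString(sum[:])
}

// eventParams builds the form parameters for one analytics event.
// Putting the version into "el" (event label) is an assumption made
// for this sketch, not something the thread specifies.
func eventParams(trackingID, masterIP, category, action, version string) url.Values {
	v := url.Values{}
	v.Set("v", "1")                  // protocol version
	v.Set("tid", trackingID)         // vendor's tracking ID (placeholder)
	v.Set("cid", clientID(masterIP)) // hashed cluster identifier
	v.Set("t", "event")              // hit type
	v.Set("ec", category)            // event category
	v.Set("ea", action)              // event action
	v.Set("el", version)             // event label (assumed home for version)
	return v
}

func main() {
	// All values here are placeholders.
	p := eventParams("UA-XXXXX-Y", "10.0.0.1", "myapp", "get/apps", "git-version")
	fmt.Println(p.Encode())
}
```

Note that nothing in the payload carries the raw master IP; only its hash is sent as `cid`. The source IP of the HTTP request is still visible to the receiving service, as mentioned above.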

Obviously Helm/Charts has to decide the more general question of analytics collection. I am happy to answer any further questions about this.

@bacongobbler
Member Author

I think the discussion so far has been:

  1. we want to have a discussion face-to-face, as the conversation is more suited to a face-to-face discussion
  2. we'll circle the thoughts and ideas back here to loop in the community/anyone who was unable to attend so everyone's on the same page

The plan isn't to make a final call about the issue but to have a deeper discussion around it for those that are interested. Everyone else who cannot make the call can happily put in their 2 cents here and follow at their own pace.

@mattfarina
Contributor

There are a few things to know. First, we are already capturing information about every request to get a chart from the stable and incubator repos. @viglesiasce enabled this some time ago.

This led us to a problem where we couldn't tell what data came from where. For example, was it kubeapps pulling the charts, which it does periodically, or some other place? Are there CI systems in the loop pulling things regularly? This is what led us to add user agent strings to helm and kubeapps.
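For anyone unfamiliar with the mechanism, here is a rough sketch of how a Go client can identify itself when pulling from a repo. It is not the actual helm or kubeapps change; the function names and the user agent value are made up for illustration. A Go client that does not set the header at all typically shows up as `Go-http-client/1.1`, which is one source of generic "Go" entries in request logs.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// fetchWithUserAgent issues a GET request that identifies the caller
// through the User-Agent header.
func fetchWithUserAgent(client *http.Client, url, userAgent string) (*http.Response, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("User-Agent", userAgent)
	return client.Do(req)
}

// observedUserAgent starts a throwaway local server, fetches from it,
// and reports the User-Agent string the server saw.
func observedUserAgent(userAgent string) (string, error) {
	seen := make(chan string, 1)
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		seen <- r.UserAgent()
	}))
	defer srv.Close()

	resp, err := fetchWithUserAgent(srv.Client(), srv.URL+"/index.yaml", userAgent)
	if err != nil {
		return "", err
	}
	resp.Body.Close()
	return <-seen, nil
}

func main() {
	// "example-chart-client/0.1.0" is a hypothetical client name.
	ua, err := observedUserAgent("example-chart-client/0.1.0")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println("server saw:", ua)
}
```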

Scrolling through the user agent strings we still see mostly Go and axios user agents. @prydonius do you know anything about the axios user agent?

It would be useful to identify the kinds of data we want given the variety of sources.

Note, while I have access to view the tables of data (in bigquery) I can't query them or share. I don't have those permissions.

@mattfarina
Contributor

I see, there is a tangential question on the analytics I'd previously worked with. We should talk face to face. I don't have an opinion at this moment and need to go off and form one myself.

@tamalsaha
Collaborator

tamalsaha commented Apr 4, 2018

Thanks @mattfarina . Since I won't be able to make the zoom meeting, I am going to put my thoughts here.

To me this is akin to how app developers collect analytics on the Apple App Store or Google Play. I am no authority on those marketplaces, but this is what I understand. I see the app store provider as a facilitator between app developers and app users. So I think stable charts can say that, to publish charts here, an app publisher must clearly tell users how analytics is collected, and if a user opts out, analytics must not be collected. In fact, stable charts might say that app publishers must have an "analytics policy", like the Homebrew project has, or like a code of conduct. But whether the default is opt-in or opt-out is a "contract" between the app developer and their users, and as a facilitator stable charts has no business dictating the setting there.

I would also point out how "advertiser id" concept works in Google play : https://support.google.com/googleplay/android-developer/answer/6048248?hl=en

Edit: It could even be a field in Chart.yaml, like "privacy-policy".

@viglesiasce
Contributor

I don't think we should be in the business of policing whether a chart developer collects data or not. Implicitly any hosted service is collecting data from its users and we host some of those. We also have the spartakus chart in our repo whose explicit goal is to collect data.

From a "least surprise" perspective, we can recommend that they add a warning to the NOTES.txt to let their users know that it has been enabled.

@timstoop
Contributor

timstoop commented Apr 5, 2018

To be perfectly clear, my problem is not with gathering metrics from an application. I'm well aware of how helpful that can be and how benign it usually is. My problem is with enabling it by default. I'd rather see a setup where the end user is simply forced to decide whether to send the metrics or not, instead of defaulting to yes. The issue is not gathering metrics, it's "what are sane defaults". And I don't consider "you can gather my metrics" a good default.

That said, I'd rather just have a stance from the project on this, so we can move on. I'm not much of a privacy advocate, as long as we make it perfectly clear to the end user what to expect from the charts that we (as a project) curate.

@bacongobbler
Member Author

bacongobbler commented Apr 10, 2018

Totally agree with @viglesiasce's point about not being in the business of policing what chart developers want to do with their charts. Regardless of my feelings on the subject, it should be up to the vendor to make that choice for themselves.

However, I do believe there is a need to give vendors a way to expose this information back to the user, either in NOTES.txt or otherwise. I feel very strongly about this, especially in today's world, where there are constantly growing concerns about how companies handle PII and about what users can do to protect themselves from those companies collecting their data. To @tamalsaha's point, this is also how applications on the Android app store expose this information: before hitting "Install", it's quite clear to the user that data is being collected (and from which sensors, e.g. the camera), and there are in-app prompts to enable certain sensors (in a sense, do you want this app to collect data from your camera or not).

One problem I mentioned to @prydonius last week with this information being present in NOTES.txt is that the notes will not be displayed if the chart was included as a subchart. @timstoop's approach sounds good to me, however that requires extra work on our end to co-ordinate the effort to prompt for a y/N answer before installing a chart. That being said, this feature might also be useful for other cases (like accepting Minecraft's EULA, for example).

@tamalsaha
Collaborator

tamalsaha commented Apr 11, 2018

Was this discussed at the meeting today? What's the verdict?

@prydonius
Member

It wasn't discussed at the office hours, but I will bring it up at next Tuesday's call, unless we can come to some consensus before then.

I'm also in agreement with @viglesiasce, but the issue @bacongobbler brings up with NOTES.txt is a problem. Unfortunately I'm not sure if there's a good way around that right now. We could at least try to ensure that any chart depending on a chart known to collect data adds a note to its own NOTES.txt so it does get seen, but I imagine that will be hard to keep track of.

@omkensey
Contributor

omkensey commented Apr 17, 2018

I would like to see something like what CoreOS did, where all metrics-gathering apps use the same value flag with the same syntax to allow or disallow metrics. It would be hard to actually police, but something that says "if your chart is for an application that gathers metrics, it must allow the user to turn them all off by setting this global value to false" would at least help keep honest people honest. We could even say that charts for apps that gather metrics that can't be turned off must not install the app if the designated flag is set to false (similar to what the Minecraft chart does if the EULA-acceptance value is not set to true).
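The rule proposed above can be written down as a small decision function. This is only a sketch under assumptions: no such shared flag exists in the charts repo, and `planInstall`, `allowMetrics`, and `canDisable` are hypothetical names chosen for illustration. In a real chart the check would more likely live in the templates, the way the Minecraft chart handles its EULA value.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrMetricsNotAllowed is returned when the user has disallowed metrics
// but the application has no way to switch them off.
var ErrMetricsNotAllowed = errors.New("metrics are disallowed, but this application cannot turn them off")

// Plan describes what an installer should do for one chart.
type Plan struct {
	Install       bool // whether to install the application at all
	EnableMetrics bool // whether the application may send metrics
}

// planInstall applies the proposed rule. allowMetrics is the value of the
// hypothetical shared flag; canDisable reports whether the application
// supports turning all of its metrics off.
func planInstall(allowMetrics, canDisable bool) (Plan, error) {
	switch {
	case allowMetrics:
		// The user consented: install with metrics enabled.
		return Plan{Install: true, EnableMetrics: true}, nil
	case canDisable:
		// The user declined and the app can honor that.
		return Plan{Install: true, EnableMetrics: false}, nil
	default:
		// The user declined and the app cannot honor it: refuse to install.
		return Plan{}, ErrMetricsNotAllowed
	}
}

func main() {
	for _, c := range []struct{ allow, canDisable bool }{
		{true, true}, {false, true}, {false, false},
	} {
		plan, err := planInstall(c.allow, c.canDisable)
		fmt.Printf("allow=%v canDisable=%v -> %+v err=%v\n", c.allow, c.canDisable, plan, err)
	}
}
```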

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 16, 2018
@stale

stale bot commented Aug 8, 2018

This issue is being automatically closed due to inactivity.

@stale stale bot closed this as completed Aug 8, 2018

9 participants