Proposal: Include very basic tracking of usage by default #55

aronchick · 2017-12-20T20:58:55Z

Using something like Spartakus (https://github.com/kubernetes-incubator/spartakus), send information about the Kubeflow deployment to a central server once per day. It should be completely anonymous, with zero PII: just how many components are deployed and how many pods are running, plus a unique identifier to track deployments that last for more than one day.

We should also enable opting out with a single flag, something like --report-metrics=false during ksonnet deployment.
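
A minimal sketch of what one daily report could carry under this proposal, assuming the counts are gathered elsewhere (for example with kubectl) and passed in, and that a daily scheduler such as a Kubernetes CronJob calls it (scheduling is not shown). `build_report`, the JSON field names and the `REPORT_METRICS` variable are all hypothetical; the variable only stands in for the proposed `--report-metrics=false` flag, which is not an existing ksonnet option.

```sh
# Hypothetical sketch: the payload of one anonymous, once-per-day usage report.
# It carries only aggregate counts and a random deployment id; no user names,
# hostnames, IPs or other identifying fields.

# build_report <deployment_id> <component_count> <pod_count>
# Prints one JSON line, or nothing when the user has opted out.
# REPORT_METRICS is a hypothetical variable standing in for the proposed
# --report-metrics=false flag; reporting stays on unless it is "false".
build_report() {
  local id="$1" components="$2" pods="$3"
  if [ "${REPORT_METRICS:-true}" = "false" ]; then
    return 0  # opted out: emit nothing
  fi
  printf '{"deploymentID":"%s","components":%d,"pods":%d}\n' \
    "$id" "$components" "$pods"
}

# Example with made-up values.
build_report "3f2a9c" 4 12
```

Defaulting `REPORT_METRICS` to `true` mirrors the opt-out model proposed here; an opt-in variant would simply flip that default.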

jlewi · 2017-12-20T23:15:54Z

This looks pretty easy to set up.

1. Set up a GKE cluster running the collector.
2. Add the volunteer component to our ksonnet core package with an option to disable it.

jlewi · 2018-02-15T00:16:58Z

Created the project kubeflow.org/kubeflow-usage

Create the cluster

gcloud container clusters create --project=kubeflow-usage reporting --zone=us-central1-c

Reserve a static IP

gcloud compute --project=kubeflow-usage addresses create stats-collector --global

jlewi · 2018-02-15T00:18:34Z

Created a DNS record to associate stats-collector.kubeflow.org with the static IP.

erikerlandson · 2018-02-22T23:20:41Z

I feel obligated to mention that modern ML technology (irony!) has demonstrated the ability to infer PII from patterns in data that have no literal PII in them. To be clear, when I look at the information currently broadcast by spartakus, I can't off the top of my head imagine a scenario for how that would happen here. OTOH that's what ML is good at, exploiting patterns humans can't directly perceive.

And yes, users can opt out :)

elmiko · 2018-02-22T23:23:27Z

> And yes, users can opt out :)

+1

erikerlandson · 2018-02-22T23:25:44Z

Is there a writeup anywhere that gives examples of the various stats that spartakus will collect, and how we plan to use those to improve Kubeflow roadmapping?

jlewi · 2018-02-23T00:18:37Z

@erikerlandson https://github.com/kubernetes-incubator/spartakus describes the basic metrics collected; these are all generic K8s metrics that aren't Kubeflow specific.

So I think the immediate use for these metrics is to let contributors to Kubeflow demonstrate impact and justify further investment.

I think the next step would be to collect more specific Kubeflow metrics to see which components are being used.

erikerlandson · 2018-02-23T17:45:14Z

@jlewi so iiuc, the idea is to demonstrate that Kubeflow is being used in the wild? As in "our metrics show that xxx Kubeflow clusters are reporting in, and here is a plot of Kubeflow cluster reports over time"

If I'm reading the report definitions right, it's reporting total resources available on nodes in a cluster. Like "here is a node that has 1TB of RAM" as opposed to "here is a pod using 200MB of RAM"

aronchick · 2018-02-23T18:31:07Z

Correct. Knowing which clouds are in use and the total resources makes a huge difference to how we prioritize (e.g. "Hey, did you see there are a bunch of 20-node clusters running on OpenShift? Are we supporting everything properly?")

jlewi · 2018-02-24T00:30:32Z

@erikerlandson An obvious metric to track would be deployments of different versions of Kubeflow. This will help us make informed decisions about breaking changes and how much effort to spend supporting older versions.
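
Counting deployments per Kubeflow version is a small aggregation once reports carry a version. A minimal sketch, assuming hypothetical report lines of the form `<deployment_id> <kubeflow_version>` (neither the field nor the format comes from spartakus); the unique deployment id is what keeps a long-running deployment from being counted once per daily report.

```sh
# versions_summary
# Reads "<deployment_id> <kubeflow_version>" lines on stdin (a hypothetical
# report format) and prints "<version> <number of distinct deployments>".
versions_summary() {
  # sort -u drops repeated daily reports from the same deployment, so each
  # deployment is counted once per version it reported.
  sort -u | awk '{ count[$2]++ } END { for (v in count) print v, count[v] }' | sort
}

printf '%s\n' 'a v0.1.0' 'a v0.1.0' 'b v0.1.0' 'c v0.2.0' | versions_summary
# prints:
# v0.1.0 2
# v0.2.0 1
```

A deployment that upgrades shows up under both versions, which is arguably what you want when judging how long older versions stay in use.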

mattf · 2018-02-24T00:31:58Z

three things should be present for something like this to work.

(0) data donated to the community, for the benefit of the community, needs to be available to the community. for instance, data readily available to anyone going to kubeflow.org.
(1) there should be a clear value proposition for the community. for instance, being able to connect with others who are using similar projects or are in similar locations, or clear use of the data for improvement of the project, which may take some time to demonstrate.
(2) it should be opt-in.

the first two go to establishing the social contract.

the last is my personal position, and i'm usually mollified by a strong social contract, clear indication that the data is collected, and a trivial opt-out option.

jlewi · 2018-02-24T01:11:54Z

100% on board with the first 2. One of the main reasons we want to collect this data is to build trust in Kubeflow by showing that companies/individuals investing in Kubeflow are extending their reach.

I'm strongly in favor of starting with opt out and seeing what users think. We're still in alpha/experimental so I think that's very reasonable.

If we're opt out, we'll get much higher participation just because it's the default option.

aronchick · 2018-02-25T04:56:14Z

+1, 100% agree about the first two: this should absolutely be available and build trust.

I think we're saying the same thing on #3. Specifically, Matthew has said (and I support this) that we have a strong social contract and a trivial opt-out.

Trivial opt out is done (just one command, and it's gone). What does a clear social contract look like?

mattf · 2018-02-26T19:38:25Z

the social contract is embodied in doing (0) and (1).

gsunner · 2018-02-27T10:43:50Z

I agree that trust and transparency should be the main goal.

We are also looking to get some basic usage tracking on our project seldon-core using spartakus.
We also have the same issue of whether to have the usage tracking on by default with an easy opt-out.

As we are in the process of integrating Seldon and Kubeflow, we would also want to take advantage of any global flag for an 'opt-out' of all tracking.

Also, since you are proposing to share the collected data with the community, we may not need to collect the same data ourselves, as long as usage data for Kubeflow-related components such as Seldon is also available.

jlewi · 2018-02-27T15:01:35Z

It seems like the consensus is that collecting metrics is a good thing.

Let's start with ~~opt in~~ opt out and see what users say. If people would strongly prefer opt out we can change.

@gsunner My hope is that in follow-on PRs we can include additional metadata to break down usage by component.

Does someone want to approve the actual PR?

aronchick · 2018-02-27T17:05:41Z

On the record, I would prefer opt-out (e.g. you have to execute a command to NOT report). We really are collecting almost nothing, and it's SO trivial to turn off (one command!). But, as always, this is a community decision.

jlewi · 2018-02-27T17:54:51Z

@aronchick That was a typo on my part. I agree with you about making it opt out by default.

mattf · 2018-02-28T13:04:39Z

opt-in is my personal view.

i agree that opt-out is a reasonable starting point for the community, especially if we make it clear we're collecting, make it clear how to opt out, share the data with the community, and demonstrate ways we use the data to benefit the community.

i don't think all those things must, or even can, be done before proceeding.

let's proceed in good faith.

the kubeflow-discuss post has given this heightened attention for a week now. i propose this be on the agenda for the next community meeting and give until the following day for comments before proceeding w/ opt-out.

aronchick · 2018-02-28T17:37:27Z

Works for me! Adding to the agenda.

jlewi · 2018-03-01T03:41:05Z

@mattf Sounds good. I've updated the PR to make it opt in for now and updated the instructions to include the commands to opt in (and make it clear you can skip them).

elmiko · 2018-03-01T11:36:02Z

> opt-in is my personal view.

same for me, thanks for updating the PR @jlewi

Use the Kubernetes reporting tool (spartakus) to report anonymous statistics about Kubeflow usage, such as basic cluster stats. This is optional, and we're making it opt in for now. One current limitation is that there's no easy way to give each Kubeflow deployment a unique, random id, so it will be hard to distinguish different deployments; users can manually assign a unique id. We could potentially modify spartakus (or the Docker image) to generate a random id. The one downside of this is that the id would be regenerated if the pod restarts. Related to #55
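
One way around the restart downside is to generate the random id once and persist it outside the container. A minimal sketch, assuming the id file sits on storage that outlives the pod (for example a PersistentVolumeClaim mount); `deployment_id` and the file path are hypothetical and not part of the PR.

```sh
# deployment_id <id_file>
# Prints a random deployment id, generating it only on the first call.
# If <id_file> lives on storage that outlives the pod (an assumption, e.g.
# a PersistentVolumeClaim mount), the id survives pod restarts instead of
# being regenerated.
deployment_id() {
  local f="$1"
  if [ ! -s "$f" ]; then
    # 16 random bytes rendered as 32 lowercase hex characters.
    od -An -N16 -tx1 /dev/urandom | tr -d ' \n' > "$f"
  fi
  cat "$f"
}
```

For example, `deployment_id /data/kubeflow-deployment-id` would keep returning the same value as long as `/data` (a hypothetical mount path) is backed by a persistent volume; the id carries no information about the cluster itself.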

jlewi · 2018-03-05T20:12:53Z

PR has been submitted with opt in.

I have created a group, data-analysts@kubeflow.org, to give access to the data in BigQuery to folks preparing reports. I've given access to @chrisheecho, who's been doing some of our data analysis and who I'm going to ask to prepare some initial reports.

I can share access with other folks who will be working on preparing reports for the community.

I'll also open up an issue on whether we should make the raw data open to all.

inc0 · 2018-03-06T16:40:28Z

As I said in the meeting, even opt-in is iffy for me. This can be a security risk, and the damage from one can be hard to recover from. Another concern is the usefulness of this data: we can see the scale of the clusters people use, but how much of it is Kubeflow? We could add a footnote saying that if you're willing to run spartakus, here's our endpoint, and thank you :)

I'd rather create a questionnaire (a Google Doc?) that we can modify, with open questions tailored to actually improving our project. If we ask for scale brackets rather than the exact number of nodes, it's easier to convince operators to share this info, etc.
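
The scale-bracket idea amounts to coarsening a number before anyone shares it, so the exact cluster size never leaves the operator. A minimal sketch; `size_bracket` and the bracket boundaries are hypothetical choices for illustration only.

```sh
# size_bracket <node_count>
# Maps an exact node count (assumed >= 1) to a coarse bracket.
# The boundaries below are illustrative, not an agreed convention.
size_bracket() {
  local n="$1"
  if [ "$n" -le 5 ]; then
    echo "1-5"
  elif [ "$n" -le 20 ]; then
    echo "6-20"
  elif [ "$n" -le 100 ]; then
    echo "21-100"
  else
    echo "101+"
  fi
}

size_bracket 12   # prints 6-20
```

The same coarsening could be applied to automated reports, not only to a questionnaire.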

mhausenblas · 2018-03-06T16:48:37Z

I'm for opt-in (with a very clear, strong, red-blinking notice at install time). While a questionnaire like the one suggested by @inc0 sounds nice, I believe the point is automation, so I don't think the folks who want the data for planning (or whatever other reasons) would prefer that option, understandably so.

mhausenblas · 2018-03-06T20:10:45Z

After having reviewed the kubernetes-incubator/spartakus source code now I do have a question: given that it has a hard dependency on BigQuery, how are folks supposed to use this behind a firewall, in an on-premises setup?

Don't get me wrong, I love and admire BigQuery (heck, a long time ago I even contributed to Apache Drill, the open source version of the underlying engine, Dremel), but I really wouldn't know how to explain to someone who wanted to set up Kubeflow in a stand-alone fashion that in order to do so she needs a BigQuery account and can't really use Kubeflow "off-line" with telemetry enabled. Please tell me I'm missing something obvious here?

mhausenblas · 2018-03-07T15:48:33Z

I asked around a bit, and Tim (@thockin) confirmed (https://twitter.com/thockin/status/971409137690034176) that Spartakus is a PoC. Since we've apparently decided to adopt it, I think it would make sense to do it properly ;)

I've reached out to Tim to see how I can get involved so that if we have needs (for example, my interest for on-prem deployments is to allow for alternative back-ends) we can meet them in a timely fashion. WDYT @jlewi @aronchick?

aronchick · 2018-03-07T17:09:42Z

+1 on automation. Sadly, response rates are VERY low for even great surveys, and most customers may not even be aware of how much it's being used.

I'm happy to strip out ANYTHING that feels PII-ish (even vaguely so), but we really need more information about how KF is being used, and given the precedent in the open source community (e.g. PopCon, https://pypi.python.org/pypi/python-popcon/1.5.1), we felt that something this minimal would be a good fit. I'm happy to provide a service (and open source the service) for further 1-way hashing of the info if that's helpful, and we'll be happy to contribute Google security team review/auditing.

In re: on-prem, part of the idea is that we're able to track how this is being used even on-prem. The fact that it uses a centralized logging system (BQ) is a feature, not a bug, because otherwise how would we aggregate? Because opting out is SO trivial, we're hoping that it doesn't cause any issues.

I think I might be missing the point in re: using KF offline - did you mean you think that users would like to aggregate all the KF deployed across their enterprise in an offline way? What an interesting (and exciting) proposition! I love the idea of exploring that.
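
The "1-way hashing" offered above can be sketched in a few lines. A minimal sketch, assuming a salt held by whoever runs the hashing step; `hash_id` and the `salt:identifier` input format are hypothetical and not part of spartakus.

```sh
# hash_id <salt> <identifier>
# One-way hashes an identifier so stored reports only contain an opaque
# 64-character hex digest. The salt is assumed to be kept secret by whoever
# runs the hashing step; it is not part of spartakus.
hash_id() {
  printf '%s:%s' "$1" "$2" | sha256sum | cut -d' ' -f1
}

hash_id "example-salt" "deployment-1234"
```

Hashing only protects values that are hard to guess: without a secret salt, the hash of a small number such as a node count can be reversed by hashing every candidate, so the salt must stay out of the published data.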

mhausenblas · 2018-03-08T10:45:44Z

Thanks @aronchick.

> In re: on-prem, part of the idea is that we're able to track how this is being used even on-prem. The fact that it uses a centralized logging system (BQ) is a feature, not a bug, because otherwise how would we aggregate? Because opting out is SO trivial, we're hoping that it doesn't cause any issues.

Yes, I get that and I hope you remember that we actually decided on an opt-in policy ;)

> I think I might be missing the point in re: using KF offline - did you mean you think that users would like to aggregate all the KF deployed across their enterprise in an offline way? What an interesting (and exciting) proposition! I love the idea of exploring that.

That is exactly what I mean; apologies for not being able to communicate that better. We're all guilty of a bit of tunnel vision, since we live in a bubble where we take the tools in our org for granted. But trust me, I've been in enough situations with users/customers that went like: "what do you mean, technology X is hard-wired and can't be replaced?" Not gonna use/buy it …

FWIW, I'm in touch with @thockin concerning Spartakus. I'll raise issues there and see how I can help with refactoring and extending the pluggable backend support, with the goal of having a reliable component we can ship with Kubeflow. Hope that makes sense?

aronchick · 2018-03-09T23:57:45Z

Makes perfect sense - LMK how to help!

mhausenblas · 2018-03-10T11:22:45Z

@aronchick For now I think we should be good, thanks. I'm trying to get involved in Spartakus to ensure that it's a stable and reliable component for our needs. For starters I'm focusing on improving the docs (see kubernetes-retired/spartakus#31), and then we'll see how merciful Mr @thockin is with my refactoring PRs ;)

jlewi · 2018-03-19T01:08:23Z

@mhausenblas The spartakus collector defines an interface that abstracts away the database. So if someone wanted to support a DB other than BigQuery it should be pretty straightforward.
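
The idea of a collector that talks to storage only through an abstract interface can be illustrated with a small shell analogue: the caller uses one entry point, and a registry decides which backend receives the record. This is not spartakus's actual interface (its real backend is BigQuery); `store_record`, the two toy backends and the `STORE_FILE` variable are hypothetical.

```sh
# Hypothetical sketch of a pluggable storage backend. The collector only calls
# store_record; which backend receives the record is a runtime choice.
# These names are illustrative and do not mirror spartakus's real interface.

# Backend 1: print the record (handy for testing or piping elsewhere).
store_stdout() {
  printf '%s\n' "$1"
}

# Backend 2: append the record to the file named by STORE_FILE (hypothetical);
# an on-prem database writer could be registered the same way.
store_file() {
  if [ -z "${STORE_FILE:-}" ]; then
    echo "STORE_FILE must be set" >&2
    return 1
  fi
  printf '%s\n' "$1" >> "$STORE_FILE"
}

# store_record <backend> <record>: dispatch to a registered backend.
store_record() {
  case "$1" in
    stdout|file) "store_$1" "$2" ;;
    *) echo "unknown backend: $1" >&2; return 1 ;;
  esac
}

store_record stdout '{"deploymentID":"abc","pods":5}'
```

Adding a backend means writing one function and registering its name, without touching callers; this is the property that would let an on-prem install swap out BigQuery.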

jlewi · 2018-03-19T01:11:13Z

Per the discussion in this thread, we are now collecting metrics opt-in. This is described in our instructions: https://github.com/kubeflow/kubeflow#steps

So I'm closing this issue.

@mhausenblas, thanks for chipping in on spartakus; that will be very useful.

jlewi added this to the Kubecon Europe milestone Jan 29, 2018

jlewi self-assigned this Jan 29, 2018

This was referenced Jan 30, 2018

Deploy spartakus collector kubeflow/reporting#2 (merged)

Use spartakus to report usage metrics #175 (merged)

ukclivecox mentioned this issue Feb 19, 2018

Add usage metrics collector SeldonIO/seldon-core#99 (closed)

jlewi mentioned this issue Mar 5, 2018

Initial report for spartakus metrics #351 (closed)

jlewi added the priority/p2 label Mar 7, 2018

jlewi closed this as completed Mar 19, 2018

kimwnasptd pushed a commit to arrikto/kubeflow that referenced this issue Mar 5, 2019

Fix the github org file. (kubeflow#55)

5ff52a9

yanniszark pushed a commit to arrikto/kubeflow that referenced this issue Feb 15, 2021

main.go: Fix style (kubeflow#55)

eb61c8e
