Proposal: Include very basic tracking of usage by default #55
Comments
This looks pretty easy to set up:
- Created the project kubeflow.org/kubeflow-usage
- Create the cluster
- Reserve a static IP
- Created a DNS record to associate
I feel obligated to mention that modern ML technology (irony!) has demonstrated the ability to infer PII from patterns in data that have no literal PII in them. To be clear, when I look at the information currently broadcast by spartakus, I can't off the top of my head imagine a scenario for how that would happen here. OTOH that's what ML is good at, exploiting patterns humans can't directly perceive. And yes, users can opt out :)
+1
Is there a writeup anywhere that gives examples of the various stats that spartakus will collect, and how we plan to use those to improve Kubeflow roadmapping?
@erikerlandson https://github.com/kubernetes-incubator/spartakus describes the basic metrics collected; these are all generic K8s metrics that aren't Kubeflow specific. So I think the immediate use for these metrics is so that contributors to Kubeflow can demonstrate impact and justify further investment. I think the next step would be to collect more specific Kubeflow metrics to see which components are being used.
@jlewi so iiuc, the idea is to demonstrate that Kubeflow is being used in the wild? As in "our metrics show that xxx Kubeflow clusters are reporting in, and here is a plot of Kubeflow cluster reports over time" If I'm reading the report definitions right, it's reporting total resources available on nodes in a cluster. Like "here is a node that has 1TB of RAM" as opposed to "here is a pod using 200MB of RAM"
Correct - getting clouds in use and total resources makes a huge difference to how we prioritize (e.g. Hey, did you see there are a bunch of 20 node clusters running on OpenShift? Are we supporting everything properly?)
@erikerlandson An obvious metric to track would be deployments of different versions of Kubeflow. This will help us make informed decisions about breaking changes and how much effort to spend supporting older versions.
three things should be present for something like this to work.
the first two go to the social contract established. the last is my personal position, and i'm usually mollified by a strong social contract: clear indication that the data is collected, and a trivial opt-out option.
100% on board with the first 2. One of the main reasons we want to collect this data is to build trust in Kubeflow by showing that companies/individuals investing in Kubeflow are extending their reach. I'm strongly in favor of starting with opt out and seeing what users think. We're still in alpha/experimental so I think that's very reasonable. If we're opt out we'll get much higher participation just because it's the default option.
+1 with 100% about the first two - this should absolutely be available and build trust. I think we're saying the same thing on #3 - specifically, Matthew has said (which I support), that we have a strong social contract and trivial opt out. Trivial opt out is done (just one command, and it's gone). What does a clear social contract look like?
the social contract is embodied in doing (0) and (1).
I agree that trust and transparency should be the main goal. We are also looking to get some basic usage tracking on our project seldon-core using spartakus. As we are in the process of integrating Seldon and Kubeflow, we would also want to take advantage of any global flag for an 'opt-out' of all tracking. Also, as you are proposing to share collected data with the community, we may not need to collect the same data as long as usage of Kubeflow related components such as Seldon is also available.
It seems like the consensus is that collecting metrics is a good thing. Let's start with opt in and see what users say; if people would strongly prefer opt out we can change. @gsunner My hope is that in follow-on PRs we can include additional metadata to break down usage by component. Does someone want to approve the actual PR?
On the record, I would prefer opt-out (e.g. you have to execute a command to NOT report). We really are collecting almost nothing, and it's SO trivial to turn off (one command!). But, as always, this is a community decision.
@aronchick That was a typo on my part. I agree with you about making it opt out by default.
opt-in is my personal view.
i agree that opt-out is a reasonable starting point for the community, especially if we make it clear we're collecting, make it clear how to opt out, share the data with the community, and demonstrate ways we use the data to benefit the community.
i don't think all those things must, or even can, be done before proceeding. let's proceed in good faith.
the kubeflow-discuss post has given this heightened attention for a week now. i propose this be on the agenda for the next community meeting and give until the following day for comments before proceeding w/ opt-out.
Works for me! Adding to the agenda.
@mattf Sounds good. I've updated the PR to make it opt in for now and updated the instructions to include the commands to opt in (and make it clear you can skip them).
same for me, thanks for updating the PR @jlewi
Use the Kubernetes reporting tool (spartakus) to report anonymous statistics about Kubeflow usage, such as basic cluster stats. This is optional and we're making it opt in for now. One current limitation is that there's no easy way to give each Kubeflow deployment a unique, random id, so it will be hard to distinguish different deployments. Users can manually assign a unique id. We could potentially modify spartakus (or the Docker image) to generate a random id. The one downside of this is that the id would be regenerated if the pod restarts. Related to #55
PR has been submitted with opt in. I have created a group so I can share access with other folks who will be working on preparing reports for the community. I'll also open up an issue on whether we should make the raw data open to all.
As I said in the meeting, even opt-in is iffy for me. This can be a security risk, and damage from these can be hard to recover from. Another concern is the usefulness of this data: we can see the scale of clusters people use, but how much of it is Kubeflow? We can add a footnote that if you're willing to run spartakus, here's our endpoint, and thank you :) I'd rather create a Google Doc (?) questionnaire that we can modify and ask open questions tailored to actually improve our project. If we use scale brackets rather than number of nodes, that's easier to convince operators to share this info, etc.
I'm for opt-in (with a very clear, strong red-blink notice at install time) and while a questionnaire like the one suggested by @inc0 sounds nice, I believe the point is automation, so I don't think the folks who want the data for planning or whatever reasons would prefer that option (understandably so).
After having reviewed the kubernetes-incubator/spartakus source code now I do have a question: given that it has a hard dependency on BigQuery, how are folks supposed to use this behind a firewall, in an on-premises setup? Don't get me wrong I love and admire BigQuery—heck, a long time ago I even contributed to the open source version of the underlying engine called Dremel, that is, Apache Drill—but I really wouldn't know how I'd explain to someone who wanted to set up Kubeflow in a stand-alone fashion that in order to do so she needs a BigQuery account and can't really use Kubeflow "off-line" with telemetry enabled. Please tell me I'm missing something obvious here?
I asked around a bit and Tim confirmed Spartakus is a PoC and so I think, since we've apparently decided to adopt it, it would make sense to do it properly ;) I've reached out to Tim to see how I can get involved so that if we have needs (for example, my interest for on-prem deployments is to allow for alternative back-ends) we can meet them in a timely fashion. WDYT @jlewi @aronchick? |
+1 on automation. Sadly, response rates are VERY low for even great surveys, and most customers may not even be aware of how much it's being used.
I'm happy to strip out ANYTHING that feels PII-ish (even vaguely so), but we really need more information about how KF is being used, and given the precedent in the open source community (e.g. PopCon, https://pypi.python.org/pypi/python-popcon/1.5.1), we felt like something that was this minimal would be a good fit. I'm happy to provide a service (and open source the service) for further 1-way hashing the info if that's helpful, and we'll be happy to contribute Google security team review/auditing.
In re: on-prem, part of the idea is that we're able to track how this is being used even on-prem. The fact that it uses a centralized logging system (BQ) is a feature, not a bug, because otherwise how would we aggregate? Because opting out is SO trivial, we're hoping that it doesn't cause any issues.
I think I might be missing the point in re: using KF offline. Did you mean you think that users would like to aggregate all the KF deployed across their enterprise in an offline way? What an interesting (and exciting) proposition! I love the idea of exploring that.
Thanks @aronchick.
Yes, I get that and I hope you remember that we actually decided on an opt-in policy ;)
That is exactly what I mean, apologies for not being able to communicate that better. We're all guilty of having a bit of a tunnel vision as we're living in a bubble where we take the tools in our org for granted, but you can trust me, I've been in enough situations with users/customers that went like: "what do you mean, technology X is hard-wired and can't be replaced?" not gonna use/buy it … FWIW, I'm in touch with @thockin concerning Spartakus, will raise issues there and see how I can help in refactoring and extending the plug-able backend stuff with the goal to have a reliable component we can ship with Kubeflow. Hope that makes sense?
Makes perfect sense - LMK how to help!
@aronchick for now I think we should be good, thanks. I'm trying to get involved in Spartakus to ensure that it's a stable and reliable component for our needs; for starters I'm focusing on improving the docs, see kubernetes-retired/spartakus#31, and then we'll see how merciful Mr @thockin is with my refactoring PRs ;)
@mhausenblas The spartakus collector defines an interface that abstracts away the database. So if someone wanted to support a DB other than BigQuery it should be pretty straightforward.
Per the discussion in this thread, we are now collecting metrics opt-in. This is described in our instructions, so I'm closing this issue. @mhausenblas thanks for chipping in on spartakus; that will be very useful.
Using something like Spartakus (https://github.com/kubernetes-incubator/spartakus), ping information about the Kubeflow deployment back to a central server once per day. It should be absolutely anonymous, with zero PII: just how many components are deployed and how many pods are running, with a unique identifier to track deployments that last for more than one day.
We should also enable opting out with a single flag, something like --report-metrics=false during ksonnet deployment.