report metrics from third-party resource #10

philips · 2016-11-18T06:15:14Z

It would be nice for people to report additional data without changing spartakus source code. I propose that we create a new third-party resource type called "SpartakusData" that can be used to report additional arbitrary data.

It would look something like this:

apiVersion: "k8s.io/v1"
kind: "SpartakusData"
metadata:
  name: "kube-dns"
spec:
  queries: 10040
  version: "v20"

This would be reported out in the payload like:

{
    "clusterID": "2f9c93d3-156c-47aa-8802-578ffca9b50e",
    ...
    "data": {
        "kube-dns": {"queries": 10040, "version": "v20"}
    }
}
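
A minimal sketch of that mapping, written in Python purely for illustration (Spartakus itself is written in Go). The function name merge_spartakus_data is hypothetical; the only things taken from the proposal are the "SpartakusData" kind, the use of metadata.name as the key, and the "data" payload field.

```python
import json


def merge_spartakus_data(payload, resources):
    """Return a copy of payload with each resource's spec stored under data[<metadata.name>]."""
    merged = dict(payload)
    data = dict(merged.get("data", {}))
    for res in resources:
        if res.get("kind") != "SpartakusData":
            continue  # ignore unrelated resource kinds
        data[res["metadata"]["name"]] = res.get("spec", {})
    merged["data"] = data
    return merged


# The resource from the example above, as it would look once parsed.
resource = {
    "apiVersion": "k8s.io/v1",
    "kind": "SpartakusData",
    "metadata": {"name": "kube-dns"},
    "spec": {"queries": 10040, "version": "v20"},
}
payload = {"clusterID": "2f9c93d3-156c-47aa-8802-578ffca9b50e"}

report = merge_spartakus_data(payload, [resource])
print(json.dumps(report, indent=2))
```

The input payload is copied rather than mutated, so the core report stays unchanged when no such resources exist.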

Example use cases:

Report cloud-provider-specific details, such as the number of persistent disks allocated
Allow add-ons to report usage data, e.g. how many DNS queries kube-dns has served
Track install-specific metrics, such as other component versions

cc @squat @sym3tri

thockin · 2016-11-18T06:18:23Z

That's a little finer than what I intended with this. I was more looking
at things like "how many pods per namespace" and "how many containers per
pod" and "how long do pods run" and those sorts of things. I see utility
in arbitrary other stuff, I guess, but it's sort of hard to consume without
a fixed schema...

philips · 2016-11-18T16:11:21Z

@thockin well, obviously we would ask people providing these statistics to use a defined schema, likely checked in here.

Essentially this is so that in Tectonic we don't have to fork Spartakus to add a few additional metrics specific to that product.

thockin · 2016-11-18T16:19:28Z

I'm not AGAINST things like this, but there are things that a commercial
offering might want to know that are outside the bounds of what a free and
open offering should touch. I'd like any new data we add to come with
justification of how it will be processed.

philips · 2016-11-18T16:22:44Z

@thockin totally, I agree with that. We can just fork the code base to add what we need, but it would be easier to participate if there were an extension point. Another option would be a second collector in the spartakus pod with a shared mountpoint.

philips · 2016-11-22T17:56:40Z

@thockin Any thoughts on this? We are happy to add the extension point as we need it.

What if we added a whitelist as a flag? So you would have to say: --extensions=spartakus-extension.k8s.io/v1/namespaces/kube-system/spartakusextensions/kube-addons

Then spartakus would essentially do kubectl get -n kube-system spartakusextensions kube-addons and dump that into the extensions field.
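
To make the shape of that flag value concrete, here is a rough sketch in Python (illustration only) that splits one whitelist entry into its parts and builds the equivalent kubectl arguments. The group/version/namespaces/... layout is read off the example value above; the helper names and the comma-separated list format are assumptions, not an existing Spartakus flag.

```python
from typing import NamedTuple


class ExtensionRef(NamedTuple):
    group: str
    version: str
    namespace: str
    resource: str
    name: str


def parse_extension_ref(value):
    """Parse '<group>/<version>/namespaces/<namespace>/<resource>/<name>'."""
    parts = value.split("/")
    if len(parts) != 6 or parts[2] != "namespaces" or not all(parts):
        raise ValueError("malformed extension reference: %r" % value)
    group, version, _, namespace, resource, name = parts
    return ExtensionRef(group, version, namespace, resource, name)


def parse_extensions_flag(flag_value):
    """Assumption: several entries may be given, separated by commas."""
    return [parse_extension_ref(v) for v in flag_value.split(",") if v]


def kubectl_args(ref):
    """Arguments for the lookup described above."""
    return ["kubectl", "get", "-n", ref.namespace, ref.resource, ref.name]


ref = parse_extension_ref(
    "spartakus-extension.k8s.io/v1/namespaces/kube-system/spartakusextensions/kube-addons"
)
print(" ".join(kubectl_args(ref)))
```

Rejecting malformed entries up front keeps the whitelist strict: anything that does not name exactly one object is an error rather than a wildcard.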

Otherwise we can just fork the code to add the couple of metrics we need :(

thockin · 2016-11-29T06:08:10Z

Couple clarifications:

Do you intend this to be pushed to your own collector instance or to mine? To push to my collector is much more complicated because I am storing it all in BigQuery which wants a schema.

If we read from an arbitrary other location and package that extra data as a JSON object in a list of extensions or something, is that sufficient?

sym3tri · 2016-11-29T19:46:42Z

> Do you intend this to be pushed to your own collector instance or to mine?

Our own.

> If we read from an arbitrary other location and package that extra data as a JSON object in a list of extensions or something, is that sufficient?

For most things, yes, as long as we can optionally make some of the "extension fields" top-level fields in the DB table, because some will be identifiers and/or frequently used in queries.

Our current thinking:

modify the emitter to supplement payloads with additional arbitrary fields (via some json from some known location in cluster)
modify the collector to gracefully handle these extension fields (e.g. ignore any fields not configured in a white-list for the schema, or optionally embed all the extra json into a single additional field)

Desired end state:

Users of Spartakus today would require no additional config changes for basic usage.
All additional data emission/collection is opt-in via flags (on both sides).
Supplemental data generation is the responsibility of separate services, which populate to a known location (TPR?).
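
A rough sketch of the collector-side handling described above, in Python for illustration only. Whitelisted extension fields are promoted to top-level columns, and everything else is either dropped or embedded as one JSON string. The function name, the embed_rest switch, the "extensions" column name, and the "persistentDisks" field are all hypothetical.

```python
import json


def apply_extension_whitelist(payload, extensions, whitelist, embed_rest=False):
    """Build a DB row from the core payload plus opt-in extension fields."""
    row = dict(payload)
    rest = {}
    for key, value in extensions.items():
        # Promote only whitelisted keys, and never overwrite a core field.
        if key in whitelist and key not in payload:
            row[key] = value
        else:
            rest[key] = value
    if embed_rest and rest:
        # Everything outside the schema goes into a single extra string field.
        row["extensions"] = json.dumps(rest, sort_keys=True)
    return row


core = {"clusterID": "2f9c93d3-156c-47aa-8802-578ffca9b50e"}
extra = {"persistentDisks": 12, "queries": 10040}

print(apply_extension_whitelist(core, extra, {"persistentDisks"}))
print(apply_extension_whitelist(core, extra, {"persistentDisks"}, embed_rest=True))
```

With an empty whitelist and embed_rest left off, the row equals the core payload, which matches the goal that existing users need no config changes.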

thockin · 2016-11-29T20:58:50Z

>> If we read from an arbitrary other location and package that extra data as a JSON object in a list of extensions or something, is that sufficient?
>
> For most things, yes, as long as we can optionally make some of the "extension fields" top-level fields in the DB table. b/c some will be identifiers and/or frequently used in queries.

blech. Tradeoff complexity of merging top-level structs with a bit of extra typing in queries...

> Our current thinking:
>
> modify the emitter to supplement payloads with additional arbitrary fields (via some json from somewhere in cluster)
> modify the collector to gracefully handle these extension fields (e.g. ignore any fields not configured in a white-list for the schema, or optionally embed all the extra json into a single additional field)
>
> Desired end state:
>
> Users of Spartakus today would require no additional config changes for basic usage.
> All additional data emission/collection is opt-in via flags.
> Supplemental data generation is the responsibility of separate services, which populate to a known location (TPR?).

TPR seems heavyweight for this unless you need the data elsewhere. Why not just run a sidecar that dumps to a file in a shared volume? The volunteer reads the latest state from that file and reports it as extra JSON.
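
A minimal sketch of that sidecar handoff, in Python for illustration only. The sidecar writes its state atomically so the reader never sees a half-written file, and the reporting side falls back to the plain payload when the file is missing or invalid. The function names, the file name, and the "extensions" key are assumptions.

```python
import json
import os
import tempfile


def write_sidecar_state(path, state):
    """Sidecar side: write to a temp file, then rename it over the target."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)  # atomic within one filesystem


def read_sidecar_state(path):
    """Reporter side: return the latest state, or None if unavailable."""
    try:
        with open(path, encoding="utf-8") as f:
            state = json.load(f)
    except (OSError, ValueError):  # missing file, or not valid JSON
        return None
    return state if isinstance(state, dict) else None


def build_report(payload, path):
    """Attach the sidecar state as extra JSON when it exists."""
    report = dict(payload)
    state = read_sidecar_state(path)
    if state is not None:
        report["extensions"] = state
    return report


with tempfile.TemporaryDirectory() as shared_volume:
    state_file = os.path.join(shared_volume, "extensions.json")
    write_sidecar_state(state_file, {"kube-dns": {"queries": 10040}})
    print(build_report({"clusterID": "example"}, state_file))
```

Treating a missing or broken file as "no extra data" keeps the basic report working even if the sidecar is not deployed.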

philips · 2016-11-29T21:07:56Z

Sidecar sgtm

sym3tri · 2016-11-30T02:17:45Z

> blech. Tradeoff complexity of merging top-level structs with a bit of extra typing in queries...

I'm not very familiar with BigQuery, but if embedded fields can be referenced easily then we could forgo this.

sym3tri · 2016-11-30T02:19:23Z

Sidecar does sound like a good approach.

squat · 2016-11-30T05:11:26Z

Sidecar sounds good, but I think one big extra JSON string field will not be very useful. Please correct me if I'm wrong but I think since that field can be any arbitrary JSON string, BigQuery does not understand its schema and cannot search its nested fields. If we wanted to search, the best we could do would be to do string comparisons on the one big JSON dump. A better approach IMO would be to have a repeated record field called extensions akin to the repeated capacity field for nodes. This field could have two nested fields, name and value. If you want any of the extensions to be a JSON string then it is still possible, but it allows for better querying. So a payload might look like:

{
    "version": "v1.0.0",
    "timestamp": "867530909031976",
    "clusterID": "2f9c93d3-156c-47aa-8802-578ffca9b50e",
    "masterVersion": "v1.3.5",
    "nodes": [...],
    "extensions": [
        {
            "name": "foo",
            "value": "bar"
        },
        ...
    ]
}
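
A small sketch of how arbitrary extension data could be flattened into that repeated name/value shape, in Python for illustration only. Non-string values are JSON-encoded so that every value has the same type; that encoding choice and the function name are assumptions, not something decided in this thread.

```python
import json


def to_extension_records(extensions):
    """Turn a flat dict into a sorted list of {"name", "value"} records."""
    records = []
    for name in sorted(extensions):
        value = extensions[name]
        if not isinstance(value, str):
            # Keep every value a string so the column has a single type.
            value = json.dumps(value, sort_keys=True)
        records.append({"name": name, "value": value})
    return records


records = to_extension_records({"foo": "bar", "queries": 10040})
print(json.dumps({"extensions": records}, indent=4))
```

Sorting by name keeps the output stable between reports, which makes payloads easier to compare.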

squat added a commit to squat/spartakus that referenced this issue Jan 11, 2017

database: ensure cloudProvider is reported in bigquery (kubernetes-re…

b827dfe

…tired#10)

squat mentioned this issue Feb 7, 2017

Add extensions reporting #19

Merged

thockin closed this as completed in #19 Mar 7, 2017
