scrape,api: provide per-target metric metadata #4183
Conversation
LGTM. We agreed to add a LIST method to dump the metadata for all metrics, as an optimization for the startup case. The current implementation makes it easy to add it later, so no worries.
scrape/scrape.go (outdated)

	// metaEntry holds meta information about a metric.
	type metaEntry struct {
		lastIter uint64
Can you document this field? It's not obvious to me.
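A documented version of the struct might look like the sketch below. Only `lastIter` appears in the quoted diff; the `typ` and `help` fields are assumptions based on the surrounding discussion, and `MetricType` stands in for `textparse.MetricType`.

```go
package main

import "fmt"

// MetricType stands in for textparse.MetricType (a string alias in Prometheus).
type MetricType string

// metaEntry holds meta information about a metric.
type metaEntry struct {
	// lastIter is the last scrape iteration in which this entry was seen;
	// it lets the cache garbage-collect metadata for vanished metrics.
	lastIter uint64
	typ      MetricType // from the # TYPE comment
	help     string     // from the # HELP comment
}

func main() {
	e := metaEntry{lastIter: 7, typ: "counter", help: "Total requests served."}
	fmt.Printf("%s: %s\n", e.typ, e.help)
}
```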
scrape/scrape.go (outdated)

	}
	c.metaMtx.Lock()
	for m, e := range c.metadata {
		if c.iter-e.lastIter > 10 {
Any comments on how 10 was picked?
I want a reasonable amount of buffer time where we can still catch metadata after a metric has officially disappeared. With normal scrape intervals, 10 will give us 3 to 10 minutes.
I'd do even more personally. But I think @brian-brazil advocated for being rather strict about it.
I was talking mostly about targets going away. Metrics should ~never go away once they are exposed on a target, so this isn't buying you anything.
I misunderstood you then. My initial version did not have any garbage collection but I added this based on your comment.
Now that we have it, it probably makes sense to just keep it. Metrics should never disappear, but this will help if clients do something funky.
It's inconsistent with the others, which is a bit weird. Needs a comment at the very least.
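The garbage-collection pass under discussion boils down to something like this sketch (the function name and signature are illustrative, not the actual implementation):

```go
package main

import "fmt"

type metaEntry struct {
	lastIter uint64 // last scrape iteration the entry was seen
}

// gcMetadata deletes entries not seen for more than limit iterations.
// With the threshold of 10 discussed above and typical scrape intervals,
// stale metadata survives roughly 3 to 10 minutes after its metric vanishes.
func gcMetadata(metadata map[string]metaEntry, iter, limit uint64) {
	for m, e := range metadata {
		if iter-e.lastIter > limit {
			delete(metadata, m)
		}
	}
}

func main() {
	md := map[string]metaEntry{
		"fresh": {lastIter: 95}, // seen 5 iterations ago: kept
		"stale": {lastIter: 80}, // seen 20 iterations ago: dropped
	}
	gcMetadata(md, 100, 10)
	fmt.Println(len(md)) // prints 1
}
```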
scrape/scrape.go (outdated)

	c.metaMtx.Unlock()
}

func (c *scrapeCache) getMetadata(metric string) (textparse.MetricType, string, bool) {
What's your position on creating a struct type for the metadata, instead of passing MetricType/string separately? If we add a field at some point, having a type will make it easier to change the code.
We'll probably need that anyway for a list API. So that'll likely happen.
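A bundled type could look like the sketch below. A `MetricMetadata` type does appear in later diffs in this PR, but the exact field set here is guessed from the discussion:

```go
package main

import "fmt"

type MetricType string

// MetricMetadata bundles the metadata fields so that adding a field later
// only changes the struct, not every accessor signature.
type MetricMetadata struct {
	Metric string
	Type   MetricType
	Help   string
}

// getMetadata sketches the struct-returning variant of the accessor,
// using a plain map in place of the scrape cache.
func getMetadata(cache map[string]MetricMetadata, metric string) (MetricMetadata, bool) {
	m, ok := cache[metric]
	return m, ok
}

func main() {
	cache := map[string]MetricMetadata{
		"http_requests_total": {Metric: "http_requests_total", Type: "counter", Help: "Total HTTP requests."},
	}
	if m, ok := getMetadata(cache, "http_requests_total"); ok {
		fmt.Println(m.Type, m.Help)
	}
}
```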
	// seriesCur and seriesPrev store the labels of series that were seen
	// in the current and previous scrape.
	// We hold two maps and swap them out to save allocations.
	seriesCur  map[uint64]labels.Labels
	seriesPrev map[uint64]labels.Labels

	metaMtx sync.Mutex
An RWMutex could prove beneficial here.
The hold time is very short everywhere. Based on past experience in those scenarios, the RWMutex performs notably worse there, especially since >99% of the time we will need the write lock.
It's also unlikely we'll have more than one reader, in which case all callers are mutually exclusive anyway.
scrape/target.go (outdated)

	if t.metadataFunc == nil {
		return textparse.MetricTypeUnknown, "", false
	}
	return t.metadataFunc(metric)
You probably want to either unlock before calling this function or wrap with RLock(), or concurrency for this API will effectively be limited to 1.
If you unlock, consider switching the RWMutex to a regular Mutex or an atomic pointer, which are lighter options for this use.
Never mind, the concurrency limit is for the target, not the API, so probably not important. I think it's still cleaner to either call RLock or switch to Mutex.
Yes, that makes sense.
In a bigger test with 1000 mysql_exporter targets, we are looking at 5-7% memory usage increase. CPU remains unchanged. That still seems fairly good. In practice I'd expect it to be a bit less if anything. All scraped test series are constant, artificially lowering the baseline memory usage. There's also an unusually high amount of single-series metrics.
Just a quick look. I'm not sure how this works naming/API-wise with listing metadata.
web/api/v1/api.go (outdated)

	if s := r.FormValue("limit"); s != "" {
		var err error
		if limit, err = strconv.Atoi(s); err != nil {
			return nil, &apiError{errorBadData, fmt.Errorf("limit must be a number")}
double space before number
pkg/textparse/parse.go (outdated)

@@ -237,6 +237,7 @@ const (
type MetricType string

const (
	MetricTypeUnknown = ""
This should have untyped as a value (or unknown if you want to future-proof for openmetrics).
Yes, missed untyped initially but have it added locally as yet another one. Untyped is a known unknown; I wanted MetricTypeUnknown for the unknown unknown.
Within the current format, untyped is that; there is no "unknown".
Fair, fine with using a boolean in places where that doesn't quite cut it.
What cases are you thinking of?
With the new changes it's no longer relevant I suppose.
web/api/v1/api.go (outdated)

		}
	}
	if metric == "" {
		return nil, &apiError{errorBadData, errors.New("matcher must specify metric name")}
I don't think this should be required, you should be able to dump all metadata for one target.
Yes, was just about to add that as a "list" implementation that's still per-target.
After some consideration, a global list would generate massive amounts of data and seems too dangerous.
Yea, this requires some thought but it'd still be very useful to get an (arbitrarily) deduped list.
I'd opt for providing the metadata first and waiting for feature requests. Better than providing something now with guessed semantics and risky performance implications.
web/api/v1/api.go (outdated)

	}

	var metric string
	for i, m := range matchers {
A comment that this is extracting/removing the metric name would be useful
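The loop being commented on presumably pulls the `__name__` matcher out of the list; a sketch of that idea with a simplified matcher type (not the actual PR code):

```go
package main

import "fmt"

// matcher is a simplified stand-in for labels.Matcher.
type matcher struct {
	Name, Value string
}

// splitMetricName extracts and removes the metric name from the matcher
// list, returning the remaining (target-label) matchers.
func splitMetricName(matchers []matcher) (string, []matcher) {
	var metric string
	var rest []matcher
	for _, m := range matchers {
		if m.Name == "__name__" {
			metric = m.Value
			continue
		}
		rest = append(rest, m)
	}
	return metric, rest
}

func main() {
	metric, rest := splitMetricName([]matcher{
		{Name: "__name__", Value: "go_goroutines"},
		{Name: "job", Value: "prometheus"},
	})
	fmt.Println(metric, len(rest)) // prints go_goroutines 1
}
```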
	}
}

func (c *scrapeCache) iterDone() {
-	// refCache and lsetCache may grow over time through series churn
+	// All caches may grow over time through series churn
I thought we'd fixed that?
I merely changed the comment since it was a bit outdated. This just points out why we are doing the garbage collection below.
Force-pushed from 5d704f0 to d418e31.
Listing functionality and endpoint documentation added.
Force-pushed from 2fad6f7 to 91b6e86.
Just curious, what is the use case for this metadata?
Force-pushed from 86e2c85 to 76a4a46.
docs/querying/api.md (outdated)

URL query parameters:

- `match=<series_selector>`: A selector that matches targets by label matchers. If a metric name is provided, only metadata for that metric name is returned.
You should be clear about what the label matchers are matching against
I'm wondering now if we should split the metric name from the target labels, URL-parameter-wise; it's a bit unclean to have them as one.
Fine with me. `metric` and `match_target` ok? `target_matcher` would work too but be inconsistent with our other endpoints with match params.
sgtm
docs/querying/api.md (outdated)

@@ -363,6 +363,85 @@ $ curl http://localhost:9090/api/v1/targets
}
```

## Target metadata

The following endpoint returns metadata about metrics currently scraped by targets.
This should be marked experimental for now
docs/querying/api.md (outdated)

}
```

The following example returns metadata for all metrics for the target with
for all the targets
scrape/scrape.go (outdated)

	defer c.metaMtx.Unlock()

	for m, e := range c.metadata {
		res = append(res, MetricMetadata{
Can we pre-initialise this to be the right size?
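Pre-sizing the result slice while holding the lock is straightforward; a minimal sketch with a stand-in map in place of the cache:

```go
package main

import "fmt"

type MetricMetadata struct {
	Metric string
}

func collect(metadata map[string]MetricMetadata) []MetricMetadata {
	// Pre-initialise to the right capacity so append never reallocates
	// while the mutex is held.
	res := make([]MetricMetadata, 0, len(metadata))
	for _, e := range metadata {
		res = append(res, e)
	}
	return res
}

func main() {
	md := map[string]MetricMetadata{"a": {Metric: "a"}, "b": {Metric: "b"}}
	res := collect(md)
	fmt.Println(len(res), cap(res)) // prints 2 2
}
```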
	sl.cache.setType(p.Type())
	continue
case textparse.EntryHelp:
	sl.cache.setHelp(p.Help())
Where is the Help getting unescaped? I didn't see it in the lexing code.
This is problematic since we'd allocate a new string. If we store it unescaped, we'd need to unescape all help strings on each scrape and do the same. If we store escaped and unescaped version, our memory usage for this feature doubles, since it's virtually all in help strings.
We could only do it on demand when reading the help strings via the API – that will also come with heavy allocs, but is certainly better.
Only for everything to get escaped all over again when we return it as JSON...
Any ideas on how to handle this all?
There's also no generic unescape function AFAICS, so we have to do the replacements ourselves. Is there a full list of escaped symbols we have to cover?
https://golang.org/pkg/strconv/#Unquote
We've already got code for this for label values, might be worth seeing how it works. The vast majority of strings won't need escaping.
For help strings it's only \ and \n.
Yes, I realized it shortly after I posted. The docs don't quite give it away; the docs for the symmetric `Quote` are clearer. It expects actual quotes at beginning and end though, which means we need to allocate yet another transient string to add those.
Why would `\t` not be allowed?
Ah right, we did the whole thing for label values already. The notable difference is that it's called only on first sight of a series there. We'd be doing it on every scrape – so if there's escaped stuff in there, the impact is notably bigger.
Tabs can be represented by themselves, so no need to escape them.
It seems rather unusual. I don't think I've encountered anything in the wild yet that opted to only escape some things but not others. And I'm relatively sure it's going to throw off more common escape handling (`strconv.Unquote` will just error, for example) – after all, this parser is not the only consumer.
I guess that raises questions about all the other escape sequences that our current handling in label values doesn't cover either.
If that's the case then those parsers are buggy. \t for example should end up as a backslash followed by a t.
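A sketch of help-string unescaping along those lines, handling only `\\` and `\n` as discussed and leaving any other backslash sequence untouched (a hypothetical helper, not code from this PR):

```go
package main

import (
	"fmt"
	"strings"
)

// unescapeHelp resolves only the two escape sequences the exposition
// format defines for help text: \\ and \n. Anything else (e.g. \t) is
// kept verbatim as a backslash followed by the character.
func unescapeHelp(s string) string {
	if !strings.Contains(s, `\`) {
		return s // common case: no escapes, no allocation
	}
	var b strings.Builder
	b.Grow(len(s))
	for i := 0; i < len(s); i++ {
		if s[i] == '\\' && i+1 < len(s) {
			switch s[i+1] {
			case 'n':
				b.WriteByte('\n')
				i++
				continue
			case '\\':
				b.WriteByte('\\')
				i++
				continue
			}
		}
		b.WriteByte(s[i])
	}
	return b.String()
}

func main() {
	fmt.Printf("%q\n", unescapeHelp(`line one\nline two`)) // becomes a real newline
	fmt.Printf("%q\n", unescapeHelp(`keep \t literal`))    // backslash-t preserved
}
```

The early `strings.Contains` return matches the observation above that the vast majority of strings won't need escaping.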
Force-pushed from ae96914 to 0d6739e.
This adds a per-target cache of scraped metadata. The metadata is only available for the lifecycle of the attached target. An API endpoint allows to select metadata by metric name and a label selection of targets. Signed-off-by: Fabian Reinartz <freinartz@google.com>
👍
scrape/scrape.go (outdated)

// metaEntry holds meta information about a metric.
type metaEntry struct {
	lastIter uint64 // last scrape iteration the entry was observed
Comment format is off here
Per my (so far short-running) test I somewhat surprisingly cannot detect any difference in memory or CPU utilization. (edit: okay, let's not jump to conclusions, the test input is probably heavily skewed.)
I'll throw in an API-level test once we have agreement on it. It's kinda tedious due to some tech debt.
API example:
I think providing metric and target label matchers through a single series matcher is quite elegant and aligns well with our other endpoints. Possibly we could ignore matchers on labels a target doesn't have – this is a bit one-off semantics but would allow clients to just pass entire series. Since they cannot know which labels are target labels, this seems useful in practice.
Otherwise, just passing instance and job matchers seems relatively safe.
@brian-brazil @jkohen