Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scrape,api: provide per-target metric metadata #4183

Merged
merged 3 commits into from Jun 6, 2018
Merged

scrape,api: provide per-target metric metadata #4183

merged 3 commits into from Jun 6, 2018

Conversation

fabxc
Copy link
Contributor

@fabxc fabxc commented May 22, 2018

This adds a per-target cache of scraped metadata. The metadata is only
available for the lifecycle of the attached target. An API endpoint allows
to select metadata by metric name and a label selection of targets.

Per my (so far short-running) test I somewhat surprisingly cannot detect any difference in memory or CPU utilization. (edit: okay, let's not jump to conclusions, the test input is probably heavily skewed.)

I'll throw in an API-level test once we have agreement on it. It's kinda tedious due to some tech debt.

API example:

curl -G http://localhost:9091/api/v1/targets/metadata \
    --data-urlencode 'match=go_goroutines{job="prometheus"}' \
    --data-urlencode 'limit=3' | jq
{
  "status": "success",
  "data": [
    {
      "target": {
        "instance": "127.0.0.1:9090",
        "job": "prometheus"
      },
      "type": "gauge",
      "help": "Number of goroutines that currently exist."
    },
    {
      "target": {
        "instance": "127.0.0.1:9091",
        "job": "prometheus"
      },
      "type": "gauge",
      "help": "Number of goroutines that currently exist."
    },
    {
      "target": {
        "instance": "localhost:9090",
        "job": "prometheus"
      },
      "type": "gauge",
      "help": "Number of goroutines that currently exist."
    }
  ]
}

I think providing metric and target label matchers through a single series matcher is quite elegant and aligns well with our other endpoints. Possibly we could ignore matchers on labels a target doesn't have – this is a bit one-off semantics but would allow clients to just pass entire series. Since they cannot know which labels are target labels, this seems useful in practice.
Otherwise, just passing instance and job matchers seems relatively safe.

@brian-brazil @jkohen

Copy link
Contributor

@jkohen jkohen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. We agreed to add a LIST method to dump the metadata for all metrics, as an optimization for the startup case. The current implementation makes it easy to add it later, so no worries.

scrape/scrape.go Outdated

// metaEntry holds meta information about a metric.
type metaEntry struct {
lastIter uint64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you document this field? It's not obvious to me.

scrape/scrape.go Outdated
}
c.metaMtx.Lock()
for m, e := range c.metadata {
if c.iter-e.lastIter > 10 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any comments on how 10 was picked?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want a reasonable amount of buffer time where we can still catch metadata after a metric has officially disappeared. With normal scrape intervals, 10 will give us 3 to 10 minutes.
I'd do even more personally. But I think @brian-brazil advocated for being rather strict about it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was talking mostly about targets going away. Metrics should ~never go away once they are exposed on a target, so this isn't buying you anything.

Copy link
Contributor Author

@fabxc fabxc May 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I misunderstood you then. My initial version did not have any garbage collection but I added this based on your comment.
Now that we have it, it probably makes sense to just keep it. Metrics should never disappear, but this will help if clients do something funky.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's inconsistent with the the others, which is a bit weird. Needs a comment at the very least.

scrape/scrape.go Outdated
c.metaMtx.Unlock()
}

func (c *scrapeCache) getMetadata(metric string) (textparse.MetricType, string, bool) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's your position on creating a struct type for the metadata, instead of passing MetricType/string separately? If we add a field at some point, having a type will make it easier to change the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll probably need that anyway for a list API. So that'll likely happen.


// seriesCur and seriesPrev store the labels of series that were seen
// in the current and previous scrape.
// We hold two maps and swap them out to save allocations.
seriesCur map[uint64]labels.Labels
seriesPrev map[uint64]labels.Labels

metaMtx sync.Mutex
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An RWMutex could prove beneficial here.

Copy link
Contributor Author

@fabxc fabxc May 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hold time is very short everywhere. Based on past experience in those scenarios, the RWMutex performs notably worse then. Especially since >99% of the time we will need the write lock.

It's also unlikely we'll have more than one reader. In which case all callers are mutually exclusive anyway.

scrape/target.go Outdated
if t.metadataFunc == nil {
return textparse.MetricTypeUnknown, "", false
}
return t.metadataFunc(metric)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably want to either unlock before calling this function or wrap with RLock(), or concurrency for this API will effectively be limited to 1.

If you unlock, consider switching the RWMutex to a regular Mutex or an atomic pointer, which are lighter options for this use.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind, the concurrency limit is for the target, not the API, so probably not important. I think it's still cleaner to either call RLock or switch to Mutex.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that makes sense.

@fabxc
Copy link
Contributor Author

fabxc commented May 22, 2018

In a bigger test with 1000 mysql_exporter targets, we are looking at 5-7% memory usage increase. CPU remains unchanged. That still seems fairly good.

In practice I'd expect it to be a bit less if anything. All scraped test series are constant, artificially lowering the baseline memory usage. There's also an unusually high amount of single-series metrics.

Copy link
Contributor

@brian-brazil brian-brazil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a quick look. I'm not sure how this works naming/api wise with listing metadata.

if s := r.FormValue("limit"); s != "" {
var err error
if limit, err = strconv.Atoi(s); err != nil {
return nil, &apiError{errorBadData, fmt.Errorf("limit must be a number")}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double space before number

@@ -237,6 +237,7 @@ const (
type MetricType string

const (
MetricTypeUnknown = ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should have untyped as a value (or unknown if you want to future-proof for openmetrics).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, missed untyped intitially but have this added locally as yet another one. Untyped is known and unknown. I wanted MetricTypeUnknown for unknown unknown.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Within the current format untyped is that, there is no "unknown".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair, fine with using a boolean in places where that doesn't quite cut it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What cases are you thinking of?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the new changes it's no longer relevant I suppose.

scrape/scrape.go Outdated
}
c.metaMtx.Lock()
for m, e := range c.metadata {
if c.iter-e.lastIter > 10 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was talking mostly about targets going away. Metrics should ~never go away once they are exposed on a target, so this isn't buying you anything.

}
}
if metric == "" {
return nil, &apiError{errorBadData, errors.New("matcher must specify metric name")}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this should be required, you should be able to dump all metadata for one target.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, was just about to add that as a "list" implementation that's still per-target.
After some consideration, a global list will generate massive amounts of data and seems to dangerous.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, this requires some thought but it'd still be very useful to get an (arbitrarily) deduped list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd opt for providing the metadata first and waiting for feature requests. Better than providing something now with guessed semantics and risky performance implications.

}

var metric string
for i, m := range matchers {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment that this is extracting/removing the metric name would be useful

}
}

func (c *scrapeCache) iterDone() {
// refCache and lsetCache may grow over time through series churn
// All caches may grow over time through series churn
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we'd fixed that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I merely changed the comment since it was a bit outdated. This just points out why we are doing the garbage collection below.

@fabxc fabxc force-pushed the metastore branch 2 times, most recently from 5d704f0 to d418e31 Compare May 24, 2018 15:26
@fabxc
Copy link
Contributor Author

fabxc commented May 24, 2018

Listing functionality and endpoint documentation added.

@fabxc fabxc force-pushed the metastore branch 2 times, most recently from 2fad6f7 to 91b6e86 Compare May 24, 2018 17:20
@krasi-georgiev
Copy link
Contributor

just curious what is the use case for this metadata?


URL query parameters:

- `match=<series_selector>`: A selector that matches targets by label matchers. If a metric name is provided, only metadata for that metric name is returned.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be clear about what the label matchers are matching against

I'm wondering now if we should split the metric name from the target labels URL parameter wise, it's a bit unclean to have them as one

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine with me. metric and match_target ok? target_matcher would work too but be inconsistent with our other endpoints with match params.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm

@@ -363,6 +363,85 @@ $ curl http://localhost:9090/api/v1/targets
}
```

## Target metadata

The following endpoint returns metadata about metrics currently scraped by targets.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be marked experimental for now

}
```

The following example returns metadata for all metrics for the target with
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for all the targets

scrape/scrape.go Outdated
defer c.metaMtx.Unlock()

for m, e := range c.metadata {
res = append(res, MetricMetadata{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we pre-initilise this to be the right size?

sl.cache.setType(p.Type())
continue
case textparse.EntryHelp:
sl.cache.setHelp(p.Help())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the Help getting unescaped? I didn't see it in the lexing code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is problematic since we'd allocate a new string. If we store it unescaped, we'd need to unescape all help strings on each scrape and do the same. If we store escaped and unescaped version, our memory usage for this feature doubles, since it's virtually all in help strings.

We could only do it on demand when reading the help strings via the API – that will also come with heavy allocs, but is certainly better.
Only for everything to get escaped all over again when we return it as JSON...

Any ideas on how to handle this all?

There's also no generic unescape function AFAICS. So we have to do replacements ourselves. Is there a full list of escaped symbols we've to cover?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://golang.org/pkg/strconv/#Unquote

We've already got code for this for label values, might be worth seeing how it works. The vast majority of strings won't need escaping.

For help strings it's only \ and \n.

Copy link
Contributor Author

@fabxc fabxc Jun 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I realized it shortly after I posted. The docs don't quite give it away, the docs for the symmetric Quote are clearer. It expects actual quotes at beginning and end though, which means we need to allocate yet another transient string to add those.

Why would \t not be allowed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, we did the whole thing for label values already. The notable difference is that it's called only on first sight a series there. We'd be doing it on every scrape – so if there's escaped stuff in there, the impact is notably bigger.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tabs can be represented by themselves, so no need to escape them.

Copy link
Contributor Author

@fabxc fabxc Jun 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems rather unusual. I don't think I've encountered anything in the wild yet the opted to only escape some things but not others. And I'm relatively sure it's gonna throw off more common escape handling (strconv.Unquote will just error for example) – after all this parser is not the only consumer.

I guess that raises questions about all other escape sequences that our current handling in label values doesn't cover either.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's the case then those parsers are buggy. \t for example should end up as a backslash followed by a t.

@fabxc fabxc changed the base branch from scrapemeta to master June 5, 2018 10:08
@fabxc fabxc force-pushed the metastore branch 3 times, most recently from ae96914 to 0d6739e Compare June 6, 2018 09:21
Fabian Reinartz added 2 commits June 6, 2018 05:56
This adds a per-target cache of scraped metadata. The metadata is only
available for the lifecycle of the attached target. An API endpoint allows
to select metadata by metric name and a label selection of targets.

Signed-off-by: Fabian Reinartz <freinartz@google.com>
Signed-off-by: Fabian Reinartz <freinartz@google.com>
Copy link
Contributor

@brian-brazil brian-brazil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

scrape/scrape.go Outdated

// metaEntry holds meta information about a metric.
type metaEntry struct {
lastIter uint64 // last scrape iteration the entry was observed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment format is off here

sl.cache.setType(p.Type())
continue
case textparse.EntryHelp:
sl.cache.setHelp(p.Help())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's the case then those parsers are buggy. \t for example should end up as a backslash followed by a t.

Signed-off-by: Fabian Reinartz <freinartz@google.com>
@jacobsy
Copy link

jacobsy commented Oct 25, 2018

Method Not Allowed
b 0g cqayvd b3n_5278vv9

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants