-
Notifications
You must be signed in to change notification settings - Fork 519
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve Asset querying by allowing precise times to be passed to Prometheus queries #1127
Conversation
The problem I'm running into here is that this change actually breaks the |
queryLocalStorageBytes := fmt.Sprintf(`avg_over_time(sum(container_fs_limit_bytes{device!="tmpfs", id="/"}) by (instance, %s)[%s:%dm])`, env.GetPromClusterLabel(), durStr, minsPerResolution) | ||
queryLocalActiveMins := fmt.Sprintf(`count(node_total_hourly_cost) by (%s, node)[%s:%dm]`, env.GetPromClusterLabel(), durStr, minsPerResolution) | ||
|
||
resChPVCost := ctx.QueryAtTime(queryPVCost, t) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can now query at a time, t, which is nice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extremely nice!
pkg/prom/query.go
Outdated
if !t.IsZero() { | ||
// TODO remove log | ||
log.Infof("[Prom] time=%s query=%s", strconv.FormatInt(t.Unix(), 10), query) | ||
q.Set("time", strconv.FormatInt(t.Unix(), 10)) | ||
} else { | ||
q.Set("time", time.Now().UTC().Format(time.RFC3339)) | ||
// for non-range queries, we set the timestamp for the query to time-offset | ||
// this is a special use case that's typically only used when our primary | ||
// prom db has delayed insertion (thanos, cortex, etc...) | ||
if promQueryOffset != 0 && ctx.name != AllocationContextName { | ||
q.Set("time", time.Now().Add(-promQueryOffset).UTC().Format(time.RFC3339)) | ||
} else { | ||
q.Set("time", time.Now().UTC().Format(time.RFC3339)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise, here, want to be very careful @mbolt35
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I love these changes, but I am bit concerned that this will break the intention behind the hosted hacks (which I'm still not certain why allocation breaks).
Nothing outside of very specific circumstances should we actually set the prom query offset, so while I would love to discontinue the offset hackery, I think the only way to keep it consistent is the add the -promQueryOffset to the t time.Time.
We can discuss this more if you have more specific concerns, but if you can imagine every prod deployment of our product having promQueryOffset=0, that'd be great for now. I need to investigate remote write as a solution for hosted, but until I can do that, we need to be able to query at a static offset
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cluster_helpers_test.go
long ago flew from my mind, so I'm not sure that I can offer much insight, but I'd be happy to walk through it with you. At a glance, I'd guess it could be related to the nodeMap[id].Minutes = ...
change.
// prom db has delayed insertion (thanos, cortex, etc...) | ||
if promQueryOffset != 0 && ctx.name != AllocationContextName { | ||
q.Set("time", time.Now().Add(-promQueryOffset).UTC().Format(time.RFC3339)) | ||
if !t.IsZero() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not make t
a pointer or have a separate method (like you're doing with QueryAtTime
) if its optional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think its okay to do this (when is anyone actually going to try to run a query for January 1, year 1?) but it feels awkward to my Go-sense 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But nil pointers are scary, too! It's just a zero value :) But I see what you mean, of course -- I'll try it out and see what feels better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ended up liking time.Time
more, actually. (You can use time.Now()
and you don't have to worry about nil
pointers and you don't have to worry about mutating the value.) But we'll see how it ages! It'd be a quick change to make.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, while Michael and Bolt have pointed out improvement, this PR accomplishes the task that it is scoped for.
… disk queries; remove 1m addition to end for nodes and disks
093423e
to
ea4705e
Compare
…9 to legal time parameters for Prometheus and Thanos querying
Branches off #1104
What does this PR change?
How does this PR impact users? (This is the kind of thing that goes in release notes!)
Links to Issues or ZD tickets this PR addresses or fixes
How was this PR tested?
Manually, using logs to inspect Prometheus queries, resultant minutes running, and warning logs
Warnings like these entirely disappear
Minutes running for 1d and 1h queries now look full