Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sanitize window duration in ComputeClusterCosts #1074

Merged
merged 1 commit into from
Feb 28, 2022

Conversation

nikovacevic
Copy link
Contributor

@nikovacevic nikovacevic commented Feb 24, 2022

What does this PR change?

  • Fixes a Prometheus error being triggered by ComputeClusterCosts, which is a legacy function, but still exposed by two endpoints (i.e. non-ETL aggregated cost model, cluster costs)

How does this PR impact users? (This is the kind of thing that goes in release notes!)

Fixes a bug. (See ticket below.)

Links to Issues or ZD tickets this PR addresses or fixes

How was this PR tested?

$ kcl | grep ComputeClusterCosts
I0224 20:03:12.791651       1 log.go:47] [Info] [ComputeClusterCosts]: 10m0s -> 10m
I0224 20:04:33.527239       1 log.go:47] [Info] [ComputeClusterCosts]: 10m0.2s -> 10m

Before

Worked for any number of minutes that Prometheus can handle, but too small of a window is not going to be accurate at a resolution of 5m:
Screenshot from 2022-02-24 12-55-41

Fails with Prometheus error for mal-formed window:
Screenshot from 2022-02-24 12-56-05

After

For windows < 10m (even weird ones) simply stop and return an error:
Screenshot from 2022-02-24 13-04-05
Screenshot from 2022-02-24 13-04-24

For windows > 10m with weird formulations, round to a Prometheus-valid window and keep rollin:
Screenshot from 2022-02-24 13-04-36

@nikovacevic nikovacevic added next release This PR/issue is expected to be merged/addressed in the next release bug Something isn't working labels Feb 25, 2022
Copy link
Contributor

@mbolt35 mbolt35 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Were we really passing a time.Duration to fmt.Sprintf() in a query? 😨

@nikovacevic
Copy link
Contributor Author

@mbolt35 You know it, brother 💪

@nikovacevic nikovacevic merged commit 95e57c0 into develop Feb 28, 2022
@nikovacevic nikovacevic deleted the niko/fix-compute-cluster-costs branch February 28, 2022 22:45
@nikovacevic nikovacevic removed next release This PR/issue is expected to be merged/addressed in the next release needs cherry pick labels Mar 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Strange duration value passed to prometheus query
3 participants