Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prometheus: Fix $__rate_interval calculation #77234

Merged
merged 21 commits into from
Nov 13, 2023

Conversation

itsmylife
Copy link
Contributor

@itsmylife itsmylife commented Oct 26, 2023

Background Information

$__rate_interval calculation is explained here in detail.
Shortly it is max($__interval + Scrape interval, 4 * Scrape interval) where Scrape interval is the “Min step” setting (also known as query*interval, a setting per PromQL query) if any is set. Otherwise, Grafana uses the Prometheus data source’s “Scrape interval” setting.
Why we need that in the first place is explained here in this beautiful blog post.
$__interval calculation is explained here.

For all data sources, we have a Min Interval value. The default value of this is 15s or you can set it differently in data source settings. To be able to interpolate $__interval we also send intervalMs. This is always the millisecond value of Interval in the panel below.

Last 2 Days time range
image

In the Prometheus query editor, we also have Min Step for queries.
image

Users might want to override Min Interval or Min Step.

Min Interval setting has precedence over the scrape interval as set in the data source.
Min Step setting has precedence over the Min Interval.

Backend receives that as interval. This can be seen in Query Inspector.

So let's use the values below as example.

Data source Scrape Interval 30s
Time Range Last 2 Days
Interval (calculated Automatically) 1m
Min Interval (Overriden) 100s
Min Step (Overriden) 150s

image

Based on the documentation let's put the values in the equation and find the $__rate_interval value.

max($__interval + query.interval, 4 * query.interval)
max(       60   +  150 ,          4 * 150)          

So this should give the value as 600s. And the step value we send to Prometheus should be 150s => 2m30s. Let's see it in the query inspector:
image

As you can see something is not quite right. So this PR is fixing it. Let's switch to this branch and run our query again.
image

Who is this feature for?

Prometheus users

Special notes for your reviewer:

  • I updated the variable names since they were quite confusing
  • I put some comments

Please check that:

  • It works as expected from a user's perspective.
  • If this is a pre-GA feature, it is behind a feature toggle.
  • The docs are updated, and if this is a notable improvement, it's added to our What's New doc.

@itsmylife
Copy link
Contributor Author

Additional information:
In Grafana v7.2 it was working as it was defined in the documentation

v7.2

Scrape Interval   30s
Min Interval 	    100s
Query Min Step  150s

Query rate(go_gc_duration_seconds_count[$__rate_interval])

Browser Mode 
http://localhost:9090/api/v1/query_range?query=rate(
	go_gc_duration_seconds_count[600s]
)
	&start=1697635050
	&end=1697635350
	&step=150


Server Mode
api/datasources/proxy/1734/api/v1/query_range?query=rate(
	go_gc_duration_seconds_count[600s]
)
	&start=1697635350
	&end=1697635650
	&step=150

@itsmylife
Copy link
Contributor Author

image
image

@itsmylife
Copy link
Contributor Author

itsmylife commented Nov 6, 2023

@gouthamve @beorn7 @davkal Can I get feedback, please? I provided a chart and tested various scenarios. I hope those will give some more insight into this proposed change.

@beorn7
Copy link

beorn7 commented Nov 7, 2023

I still think that taking min_step as an override for the scrape interval in the equation for $__rate_interval is the right thing to do. I still don't know what the rationale was to behind dropping that behavior. Based on @ivanahuckova's comment above, I think we can assume that there was no rationale behind it because it happened by accident.

So we should re-instate the old "right" behavior, which this PR seems to do.

There is the special case where $__rate_interval is used as setting for min_step, which appeared to be impossible in Grafana 7.2. I don't know what the user tries to express by doing so, so I cannot really tell what the right behavior should be. I would assume it is best to not let the min_step setting influence the $__rate_interval calculation in this case (to avoid the circular dependency), which this PR seems to implement. (And I am puzzled how the current behavior in the 2nd to last row is happening…)

@itsmylife
Copy link
Contributor Author

I would assume it is best to not let the min_step setting influence the $__rate_interval calculation in this case (to avoid the circular dependency), which this PR seems to implement. (And I am puzzled how the current behavior in the 2nd to last row is happening…)

@beorn7 If min_step is set, it influences. Because min_step is being sent to the backend as interval in the request.
If it is set as $__rate_interval then it first calculates the $__rate_interval and then applies it. See https://github.com/grafana/grafana/pull/77234/files#diff-29066064bdaa8cc50768e60d57f98047c5a4db367865e629c6c7768241a31e16R242-R249

@gouthamve
Copy link
Member

Hrm, I see. So we are using min step as an override for the scrape interval. While I don't think we should use the option like that, I also don't see a better alternative.

Lets go ahead with this PR. Sorry for the confusion.

Copy link
Member

@ivanahuckova ivanahuckova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@itsmylife itsmylife merged commit b607a4e into main Nov 13, 2023
14 checks passed
@itsmylife itsmylife deleted the ismail/fix-prom-rate-interval-calculation branch November 13, 2023 09:13
@bhks
Copy link

bhks commented Nov 13, 2023

We would appreciate if the author /someone familiar with the code backport this to 9.4.x ?

@aangelisc aangelisc modified the milestones: 10.3.x, 10.2.3 Dec 21, 2023
@tolzhabayev tolzhabayev added backport v10.2.x product-approved Pull requests that are approved by product/managers and are allowed to be backported and removed no-backport Skip backport of PR labels Mar 7, 2024
Copy link
Contributor

The backport to v10.2.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new branch
git switch --create backport-77234-to-v10.2.x origin/v10.2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x b607a4e0a2ee2a9b368c6ebcf4021249768b347e

When the conflicts are resolved, stage and commit the changes:

git add . && git cherry-pick --continue

If you have the GitHub CLI installed:

# Push the branch to GitHub:
git push --set-upstream origin backport-77234-to-v10.2.x
# Create the PR body template
PR_BODY=$(gh pr view 77234 --json body --template 'Backport b607a4e0a2ee2a9b368c6ebcf4021249768b347e from #77234{{ "\n\n---\n\n" }}{{ index . "body" }}')
# Create the PR on GitHub
echo "${PR_BODY}" | gh pr create --title "[v10.2.x] Prometheus: Fix $__rate_interval calculation" --body-file - --label "type/bug" --label "datasource/Prometheus" --label "area/backend" --label "add to changelog" --label "product-approved" --label "backport" --base v10.2.x --milestone 10.2.x --web

Or, if you don't have the GitHub CLI installed (we recommend you install it!):

# Push the branch to GitHub:
git push --set-upstream origin backport-77234-to-v10.2.x

# Create a pull request where the `base` branch is `v10.2.x` and the `compare`/`head` branch is `backport-77234-to-v10.2.x`.

# Remove the local backport branch
git switch main
git branch -D backport-77234-to-v10.2.x

@grafana-delivery-bot grafana-delivery-bot bot added the backport-failed Failed to generate backport PR. Please resolve conflicts and create one manually. label Mar 7, 2024
tolzhabayev pushed a commit that referenced this pull request Mar 7, 2024
* Remove unused param

* simple unit test

* rename

* rename

* add some comments

* Update values

* refactor

* rename

* always calculate rate interval

* fix unit tests

* Fix indentation

* linter fix

* update test

* Fixing issues with the calculation

* new test

* fix $__interval interpolation

* fix test

* add comment

(cherry picked from commit b607a4e)
tolzhabayev added a commit that referenced this pull request Mar 7, 2024
Prometheus: Fix $__rate_interval calculation (#77234)

* Remove unused param

* simple unit test

* rename

* rename

* add some comments

* Update values

* refactor

* rename

* always calculate rate interval

* fix unit tests

* Fix indentation

* linter fix

* update test

* Fixing issues with the calculation

* new test

* fix $__interval interpolation

* fix test

* add comment

(cherry picked from commit b607a4e)

Co-authored-by: ismail simsek <ismailsimsek09@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
add to changelog area/backend backport v10.2.x backport-failed Failed to generate backport PR. Please resolve conflicts and create one manually. datasource/Prometheus product-approved Pull requests that are approved by product/managers and are allowed to be backported type/bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants