[Prometheus] Update prometheus histogram #36537

constanca-m · 2023-09-08T09:21:26Z

Important: All documents from the tests commited were the result of running go test -data, since the tests were causing this to fail.

What

The issue is explained in #36317.

To fix it, we just need to change the function histogramMetricName to have support for the correct suffixes:

The problem was occurring because we had no distinction between *_bucket metrics and all the others, so they were being treated as the same.

How to test this PR locally

Clone this branch.
Build metricbeat binary and run it.

Related issues

Closes #36317.

Results

I am running a spring application with the following metrics:

If we go to Discover, we can check that the documents are all there now, since the histogram is correct:

elasticmachine · 2023-09-08T09:28:41Z

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS

Expand to view the summary

Build stats

Start Time: 2023-09-22T09:03:34.776+0000
Duration: 84 min 11 sec

Test stats 🧪

Test	Results
Failed	0
Passed	4956
Skipped	1051
Total	6007

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

/test : Re-trigger the build.
/package : Generate the packages and run the E2E tests.
/beats-tester : Run the installation tests with beats-tester.
run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

mergify · 2023-09-08T09:29:41Z

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @constanca-m? 🙏.
For such, you'll need to label your PR with:

The upcoming major version of the Elastic Stack
The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

StephanErb · 2023-09-08T19:21:14Z

Given the discussion and linked tickets in #36317, I am wondering if we should drop all the quantile metrics and just ingest the histogram.

While you managed to ingest both now, you get lot of additional and expensive time series without much added value.

ChrsMark

Thank's @constanca-m for dealing with this.

I have some comments about the implementation details but the main one is that we will need to ensure the fix with unit tests.

Also since unit-tests might not be enough because of the importance of the this specific code-base we would need to execute some manual testing here once this is merged.
@gizas @tommyers-elastic @bturquet could we mark this one for manual testing somehow? We used to do it in the past and helped a lot. The idea is that some random member of the team (preferably not involved with the PR) would execute some manual tests to ensure that the wide range of modules affected are not broken.

metricbeat/helper/prometheus/textparse.go

ChrsMark · 2023-09-21T07:51:08Z

metricbeat/helper/prometheus/textparse.go

-			name = name[:len(name)-7]
-		}
+		name = strings.TrimSuffix(name, suffixGSum)
+	case isBucket(name):


Do we have any unit-tests that cover these changes? It's hard to understand if something might break here in any corner case or whatever?

If not would you mind adding some tests? Otherwise we can exclude these changes from the patch if not needed for the fix. Improving code should always go along with adding tests ensuring that functionality is not broken :).

I added one more test for this exact function. Do you think that is enough?

I think that for the new functionality the added test is fine but what about the existing functionality that is touched here? How we can be sure that the code which is changed (improved) is not broken?
I'm talking mostly about the changes like the one at https://github.com/elastic/beats/pull/36537/files#diff-d66f15900f5676672ddf353458a9373cf8b1ab8e237c473191e4a64ae93cbba2L407-R416.

So the point here is that if we want to improve the code we should ensure that we don't break it. So to my mind we should either:

ensure that we have tests for these lines that still work

if we don't have tests then add them along with the corresponding code changes.

if we cannot add tests maybe it's safer to avoid any code changes that might be prone to errors

Could you verify the above cases please?

We never had tests for those cases. Do you mean adding a test case for every type (histogram, histogram gauge etc)? I only changed it to strings.TrimSuffix because then we know what is happening instead of defining the lenght of the suffix and then subtracting like we were doing.

Do you mean adding a test case for every type (histogram, histogram gauge etc)?

Ideally yes since we touch these specific lines of code. Otherwise the improvement seems to be incomplete to me and potentially prone to errors :).
Maybe that's out of the scope of this PR but the generic idea is what I mention.

I see. So should I try to add new tests for this PR or should we leave it to a different one?

That would be nice since we are on this. Otherwise I doubt it will happen at any time soon.
Another option would be to exclude the specific changes/improvements that are not related to this specific fix and add them later in a follow-up along with tests. That would unblock this one and ensures the quality as well. It's up to you to decide.

metricbeat/helper/prometheus/textparse.go

tetianakravchenko

Please add a changelog record

metricbeat/helper/prometheus/textparse.go

ChrsMark · 2023-09-22T07:42:46Z

metricbeat/helper/prometheus/textparse.go

-			name = name[:len(name)-7]
-		}
+		name = strings.TrimSuffix(name, suffixGSum)
+	case isBucket(name):


I think that for the new functionality the added test is fine but what about the existing functionality that is touched here? How we can be sure that the code which is changed (improved) is not broken?
I'm talking mostly about the changes like the one at https://github.com/elastic/beats/pull/36537/files#diff-d66f15900f5676672ddf353458a9373cf8b1ab8e237c473191e4a64ae93cbba2L407-R416.

So the point here is that if we want to improve the code we should ensure that we don't break it. So to my mind we should either:

ensure that we have tests for these lines that still work

if we don't have tests then add them along with the corresponding code changes.

if we cannot add tests maybe it's safer to avoid any code changes that might be prone to errors

Could you verify the above cases please?

metricbeat/helper/prometheus/textparse.go

CHANGELOG-developer.next.asciidoc

Co-authored-by: Chris Mark <chrismarkou92@gmail.com>

ChrsMark

Changes lgtm. Let's make sure that we take care of the tests either here or in a follow-up based on #36537 (comment).

Also we should consider of introducing manual testing phase for this and similar PRs that touch "core" functionality (specially when unit testing is not sufficient). @constanca-m could you please pair with @gizas and/or @tommyers-elastic on this?

constanca-m · 2023-09-22T13:39:34Z

Yes @ChrsMark , I am also working on adding some tests to the other types in this PR.

constanca-m · 2023-09-22T14:13:00Z

Actually, I will open a new PR because I already found a small error, and I don't want new changes to be added here as they are unrelated to the histogram.

gizas · 2023-09-25T07:11:45Z

@constanca-m please link here the additional PRs for the tests and the additional fixes

constanca-m · 2023-09-25T11:09:38Z

The PR with the tests is this one: #36669.

* Update prometheus histogram. * Update tests. * Fix error. * Revert change on expected files. * Add missing comments. * Add unit test. * Add unit test. * Run mage check. * Update changelog. * Add comment. * Update CHANGELOG-developer.next.asciidoc Co-authored-by: Chris Mark <chrismarkou92@gmail.com> --------- Co-authored-by: Chris Mark <chrismarkou92@gmail.com>

Update prometheus histogram.

cfce8fb

constanca-m added the Team:Cloudnative-Monitoring Label for the Cloud Native Monitoring team label Sep 8, 2023

constanca-m requested a review from a team September 8, 2023 09:21

constanca-m requested a review from a team as a code owner September 8, 2023 09:21

constanca-m self-assigned this Sep 8, 2023

constanca-m requested review from belimawr and fearful-symmetry September 8, 2023 09:21

botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Sep 8, 2023

Merge branch 'main' into fix-prometheus-histogram

7625f1d

constanca-m added the backport-skip Skip notification from the automated backport with mergify label Sep 8, 2023

Update tests.

ad87224

constanca-m closed this Sep 8, 2023

constanca-m reopened this Sep 8, 2023

constanca-m added 2 commits September 8, 2023 12:54

Fix error.

58b6842

Revert change on expected files.

4708d81

pierrehilbert added the Team:Elastic-Agent Label for the Agent team label Sep 8, 2023

constanca-m mentioned this pull request Sep 8, 2023

[Prometheus histogram] Histograms are being computed using wrong metrics #36317

Closed

belimawr approved these changes Sep 13, 2023

View reviewed changes

ChrsMark requested changes Sep 21, 2023

View reviewed changes

ChrsMark requested a review from tetianakravchenko September 21, 2023 07:58

ChrsMark added needs tests test-plan Add this PR to be manual test plan labels Sep 21, 2023

constanca-m and others added 2 commits September 21, 2023 10:03

Merge branch 'main' into fix-prometheus-histogram

dd3f323

Add missing comments.

c27e8e6

tetianakravchenko reviewed Sep 21, 2023

View reviewed changes

metricbeat/helper/prometheus/textparse.go Outdated Show resolved Hide resolved

metricbeat/helper/prometheus/textparse.go Show resolved Hide resolved

Add unit test.

9e050a1

constanca-m added 2 commits September 21, 2023 16:08

Add unit test.

561dfd7

Run mage check.

31ece69

ChrsMark reviewed Sep 22, 2023

View reviewed changes

constanca-m added 2 commits September 22, 2023 10:22

Update changelog.

3113763

Add comment.

c474d64

ChrsMark reviewed Sep 22, 2023

View reviewed changes

CHANGELOG-developer.next.asciidoc Outdated Show resolved Hide resolved

constanca-m and others added 2 commits September 22, 2023 11:02

Update CHANGELOG-developer.next.asciidoc

ad81af3

Co-authored-by: Chris Mark <chrismarkou92@gmail.com>

Merge branch 'main' into fix-prometheus-histogram

a41d066

ChrsMark approved these changes Sep 22, 2023

View reviewed changes

constanca-m merged commit 23246fe into elastic:main Sep 22, 2023
6 of 8 checks passed

constanca-m deleted the fix-prometheus-histogram branch September 22, 2023 14:13

constanca-m mentioned this pull request Sep 25, 2023

[Prometheus] Add unit tests for parsing metric families #36669

Merged

12 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Prometheus] Update prometheus histogram #36537

[Prometheus] Update prometheus histogram #36537

constanca-m commented Sep 8, 2023 •

edited

Loading

elasticmachine commented Sep 8, 2023 •

edited by jenkins-beats-ci bot

Loading

Build stats

Test stats 🧪

mergify bot commented Sep 8, 2023

StephanErb commented Sep 8, 2023

ChrsMark left a comment

ChrsMark Sep 21, 2023

constanca-m Sep 21, 2023

ChrsMark Sep 22, 2023

constanca-m Sep 22, 2023 •

edited

Loading

ChrsMark Sep 22, 2023

constanca-m Sep 22, 2023

ChrsMark Sep 22, 2023

tetianakravchenko left a comment

ChrsMark Sep 22, 2023

ChrsMark left a comment

constanca-m commented Sep 22, 2023

constanca-m commented Sep 22, 2023

gizas commented Sep 25, 2023

constanca-m commented Sep 25, 2023

[Prometheus] Update prometheus histogram #36537

[Prometheus] Update prometheus histogram #36537

Conversation

constanca-m commented Sep 8, 2023 • edited Loading

What

How to test this PR locally

Related issues

Results

elasticmachine commented Sep 8, 2023 • edited by jenkins-beats-ci bot Loading

💚 Build Succeeded

Build stats

Test stats 🧪

💚 Flaky test report

🤖 GitHub comments

mergify bot commented Sep 8, 2023

StephanErb commented Sep 8, 2023

ChrsMark left a comment

Choose a reason for hiding this comment

ChrsMark Sep 21, 2023

Choose a reason for hiding this comment

constanca-m Sep 21, 2023

Choose a reason for hiding this comment

ChrsMark Sep 22, 2023

Choose a reason for hiding this comment

constanca-m Sep 22, 2023 • edited Loading

Choose a reason for hiding this comment

ChrsMark Sep 22, 2023

Choose a reason for hiding this comment

constanca-m Sep 22, 2023

Choose a reason for hiding this comment

ChrsMark Sep 22, 2023

Choose a reason for hiding this comment

tetianakravchenko left a comment

Choose a reason for hiding this comment

ChrsMark Sep 22, 2023

Choose a reason for hiding this comment

ChrsMark left a comment

Choose a reason for hiding this comment

constanca-m commented Sep 22, 2023

constanca-m commented Sep 22, 2023

gizas commented Sep 25, 2023

constanca-m commented Sep 25, 2023

constanca-m commented Sep 8, 2023 •

edited

Loading

elasticmachine commented Sep 8, 2023 •

edited by jenkins-beats-ci bot

Loading

constanca-m Sep 22, 2023 •

edited

Loading