Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alerting: Attach hash of instance labels to state history log lines #65968

Merged
merged 4 commits into from
Apr 19, 2023

Conversation

alexweav
Copy link
Contributor

@alexweav alexweav commented Apr 4, 2023

What is this feature?

Adds a new field in the log line, instanceID, which will be unique based on the alert instance for which the log line is attached. If two lines are associated with the same alert instance, they will have the same instance ID.

Why do we need this feature?

This allows for new types of LogQL queries on the data - queries are now able to group by alert instance which allows for new types of analysis.

Which issue(s) does this PR fix?:

n/a

Special notes for your reviewer:

Please check that:

  • It works as expected from a user's perspective.
  • If this is a pre-GA feature, it is behind a feature toggle.
  • The docs are updated, and if this is a notable improvement, it's added to our What's New doc.

@alexweav alexweav added this to the 10.0.0 milestone Apr 4, 2023
@alexweav alexweav requested a review from a team April 4, 2023 21:37
@@ -2,6 +2,7 @@ package historian

import (
"context"
"crypto/md5"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the algorithm is mostly irrelevant, can we use one of the SHA families instead? Those are supported in the browser's webcrypto API and might allow us to use the same fingerprinting on the front-end for areas where we don't have access to data from the Alert State History API.

Copy link
Contributor Author

@alexweav alexweav Apr 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check @JohnnyQQQQ's comment: #65968 (comment)

I'm thinking of instead just directly lifting prometheus's hash function. It's just a basic FNV hash and we can just use its exact same base and offset. This gives us some stronger alignment with prometheus, and the UI can also easily implement this and get the same values - WDYT? There's no real reason to use any crypto hash here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up going this route. The hash function we used is implemented here: https://github.com/prometheus/common/blob/main/model/fnv.go
And it serializes the label set using: https://github.com/prometheus/common/blob/2f04d2ec94b96df44aa4f8a60ae9fb29d8512636/model/signature.go#L33

This should translate pretty easily, it's faster, and more universal.

Copy link
Member

@JohnnyQQQQ JohnnyQQQQ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, one nit.

@@ -291,6 +295,7 @@ type lokiEntry struct {
Values *simplejson.Json `json:"values"`
DashboardUID string `json:"dashboardUID"`
PanelID int64 `json:"panelID"`
InstanceID string `json:"instanceID"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Prometheus it's called fingerprint. Perhaps we want to pick that up, so the naming is kind of the same, as it's the same thing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also pick up the implementation

Copy link
Contributor

@gotjosh gotjosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code LGTM, but please see my comment.

@@ -242,14 +243,17 @@ func statesToStream(rule history_model.RuleMeta, states []state.StateTransition,
continue
}

sanitizedLabels := removePrivateLabels(state.Labels)
instFingerprint, _ := json.Marshal(sanitizedLabels)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is correct.

There's no sorting of the labels so so the same labels might produce different results.

On top of that, why do we need the json marshalling? (which is very expensive - probably more than the hashing itself).

I'd replace the current approach with:

md5.Sum([]byte(sanitizedLabels.String()))

Which a) would remove the need for JSON marshalling and then have a guaranteed order.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The json package sorts the map keys automatically - this is actually why I originally chose json, because it's stable. String() felt more unreliable to me.

I instead decided to lift the exact same fingerprint calculation that Prometheus implements. It's a super simple FNV hash with no intermediate step, so it's faster than all other suggestions - and it also sorts keys, and has the side effect of producing the same fingerprint across alertmanagers 😄 . There was no need for a crypto hash here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lifting FNV is something I can def get behind - historically, JSON computations are pretty expensive in Go and I'd to shy away from it when possible.

Out of curiosity though, why does String() feels unreliable to you? In my eyes, this looks pretty standard.

Copy link
Contributor

@gotjosh gotjosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@alexweav alexweav force-pushed the alexweav/instance-labels-hash branch from d15eed8 to 1cdd0bc Compare April 19, 2023 18:16
@@ -263,7 +264,8 @@ func statesToStream(rule history_model.RuleMeta, states []state.StateTransition,
Condition: rule.Condition,
DashboardUID: rule.DashboardUID,
PanelID: rule.PanelID,
InstanceLabels: removePrivateLabels(state.Labels),
Fingerprint: labelFingerprint(sanitizedLabels),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

state contains cacheID, why don't you use it?

Copy link
Contributor

@yuri-tceretian yuri-tceretian Apr 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alternative, state has a method GetAlertInstanceKey that returns result that contains a hash of the labels. The only difference is that it include private labels. Why do you remove private labels?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments above, I think it's a much more portable solution to match with Prometheus's fingerprint calculation.

State history has always removed private labels for a very long time, even back when it wrote only annotations. The fingerprint should definitely be based off the labels that actually get written - otherwise, the result is confusing and won't align with fingerprints the UI might be calculating.

Whether we strip private labels at all is another issue, making this behavior change everywhere is something we can consider - but such a change should be made separately from this PR. In this scope of this PR, the fingerprint should follow the labels we actually store.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand, private labels contain information such as alert rule UID, by removing this label and calculating a hash based on only public ones you may encounter a situation when there are two rules that do not have any labels and produce the exact same set of labels would end up with exact same fingerprints.

@alexweav alexweav merged commit 3634079 into main Apr 19, 2023
@alexweav alexweav deleted the alexweav/instance-labels-hash branch April 19, 2023 19:22
mdvictor pushed a commit that referenced this pull request Apr 21, 2023
…65968)

* Add instanceID which is hash of labels

* Rename field to fingerprint

* Move to prometheus style signature

* Appease linter
@zerok zerok modified the milestones: 10.0.0, 10.0.0-preview May 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

None yet

6 participants