
pkg/ingester: handle labels mapping to the same fast fingerprint. #1247

Merged: 8 commits into master from issue-898-out-of-order on Dec 5, 2019

Conversation

@pstibrany (Member) commented Nov 12, 2019

What this PR does / why we need it:
This PR fixes a problem in the ingester where different label sets map to the same fast fingerprint, which then causes out-of-order errors.

Which issue(s) this PR fixes:
Fixes #898

Special notes for your reviewer:
Uses slightly adapted fpMapper code from Cortex.

Instead of using []client.LabelAdapter, this PR switches the ingester stream to labels.Labels. It also avoids conversion via client.FromLabelAdaptersToLabels, which breaks the contract of labels.Labels because it doesn't check that the input is sorted (cortexproject/cortex#1813). Note that the fpMapper code relies on labels.Labels being sorted in its equality check.
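For context, here is a rough, self-contained sketch of the idea behind such a mapper. This is not the Cortex/Loki fpMapper; all type and function names are made up for illustration. The first label set seen for a raw fingerprint keeps it, and any different label set that collides on the same raw fingerprint is handed a synthetic replacement fingerprint instead.

```go
package main

import (
	"fmt"
	"sync"

	"github.com/prometheus/prometheus/pkg/labels"
)

// fpMapperSketch is an illustrative, simplified collision mapper. It is NOT
// the fpMapper code used in this PR, just the idea behind it.
type fpMapperSketch struct {
	mtx      sync.Mutex
	owners   map[uint64]labels.Labels     // rawFP -> label set that owns it
	mappings map[uint64]map[string]uint64 // rawFP -> labels.String() -> mapped FP
	next     uint64                       // source of synthetic fingerprints
}

func newFpMapperSketch() *fpMapperSketch {
	return &fpMapperSketch{
		owners:   map[uint64]labels.Labels{},
		mappings: map[uint64]map[string]uint64{},
		next:     1 << 62, // arbitrary: keeps synthetic FPs easy to spot in this sketch
	}
}

// mapFP returns a fingerprint that is unique per label set, even when two
// different label sets collide on the same raw fingerprint. lbls must be
// sorted, which is why labels.Labels and its sortedness guarantee matter.
func (m *fpMapperSketch) mapFP(rawFP uint64, lbls labels.Labels) uint64 {
	m.mtx.Lock()
	defer m.mtx.Unlock()

	owner, ok := m.owners[rawFP]
	if !ok {
		// First label set seen for this fingerprint keeps it.
		m.owners[rawFP] = lbls
		return rawFP
	}
	if labels.Equal(owner, lbls) {
		return rawFP
	}

	// Collision: hand out (or reuse) a synthetic fingerprint for this label set.
	byLabels, ok := m.mappings[rawFP]
	if !ok {
		byLabels = map[string]uint64{}
		m.mappings[rawFP] = byLabels
	}
	if fp, ok := byLabels[lbls.String()]; ok {
		return fp
	}
	m.next++
	byLabels[lbls.String()] = m.next
	return m.next
}

func main() {
	m := newFpMapperSketch()
	foo := labels.FromStrings("app", "foo")
	bar := labels.FromStrings("app", "bar")
	// Pretend both label sets hashed to the same raw fingerprint 42.
	fmt.Println(m.mapFP(42, foo)) // 42
	fmt.Println(m.mapFP(42, bar)) // synthetic fingerprint, != 42
	fmt.Println(m.mapFP(42, foo)) // 42 again
}
```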

This PR doesn't address the similar problem present in the promtail Client.

Checklist

  • Documentation added (there is no documentation change for this)
  • Tests updated

Uses slightly adapted fpMapper code from Cortex.

Fixes issue #898

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

@cyriltovena (Contributor) left a comment


This looks awesome. Can you quickly verify races manually with go test -race on the new test once you have added a goroutine? I just want to make sure we don't add new races; we don't have anything in place to verify this, and this part of the code is quite prone to races.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
To be run with -race.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
@pstibrany (Member, Author) commented:

I've added a slightly bigger test for concurrent pushes. Perhaps it can be used as a basis for a benchmark. I think the global (per-user) i.streamsMtx isn't helping much with concurrency; something to improve in the future.
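Roughly, a concurrent-push test looks like the following. This is a self-contained sketch, not the actual test added in this PR; fakeInstance and its fields are stand-ins for the ingester's per-user streams map and i.streamsMtx.

```go
package ingester

import (
	"fmt"
	"sync"
	"testing"
)

// fakeInstance mimics a per-user streams map guarded by a single mutex,
// similar in spirit to i.streamsMtx. It is NOT the real ingester code,
// just enough structure to exercise concurrent pushes under -race.
type fakeInstance struct {
	streamsMtx sync.Mutex
	streams    map[string][]string
}

func (i *fakeInstance) push(lbls, line string) {
	i.streamsMtx.Lock()
	defer i.streamsMtx.Unlock()
	i.streams[lbls] = append(i.streams[lbls], line)
}

// Run with: go test -race ./...
func TestConcurrentPushes(t *testing.T) {
	inst := &fakeInstance{streams: map[string][]string{}}

	const (
		concurrency    = 10
		linesPerStream = 100
	)

	var wg sync.WaitGroup
	for c := 0; c < concurrency; c++ {
		wg.Add(1)
		go func(c int) {
			defer wg.Done()
			lbls := fmt.Sprintf(`{app="test-%d"}`, c)
			for n := 0; n < linesPerStream; n++ {
				inst.push(lbls, fmt.Sprintf("line %d", n))
			}
		}(c)
	}
	wg.Wait()

	for lbls, lines := range inst.streams {
		if len(lines) != linesPerStream {
			t.Fatalf("stream %s has %d lines, want %d", lbls, len(lines), linesPerStream)
		}
	}
}
```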

@pstibrany (Member, Author) commented:

@sandlis pointed out an issue with this code: different ingesters may pass different fingerprints for the same series to the chunk storage code.

Mapped fingerprint is used in streams map.
Original ("rawFP") is preserved when saving chunks.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
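Roughly, the approach described in these commits looks like this (an illustrative sketch only; field names are made up, and as discussed below, reusing rawFP for chunk IDs is reverted later in this PR):

```go
package ingester

import (
	"github.com/prometheus/common/model"
	"github.com/prometheus/prometheus/pkg/labels"
)

// Illustrative sketch only; field names are made up. The stream is keyed in
// the per-user streams map by the mapped, collision-free fingerprint, while
// the raw fingerprint computed from the labels is kept around. At this point
// in the PR the raw fingerprint was still used when building chunk IDs; that
// part is reverted later in the discussion below.
type stream struct {
	fp     model.Fingerprint // mapped fingerprint: key in the streams map
	rawFP  model.Fingerprint // original fingerprint computed from the labels
	labels labels.Labels
}
```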
@pstibrany (Member, Author) commented Nov 13, 2019

I've addressed the problem where mapping a series to a different fingerprint would also be reflected in the chunk name, which is not what we want. Fixed now.

Thanks @sandlis for spotting this!

@pstibrany (Member, Author) commented:

I now have my doubts about whether using the original fingerprint for the chunk ID is a good idea. By deduping chunks on original fingerprints, we may accidentally drop valid data, as we already know there was a fingerprint collision. I cannot find the code doing this deduping in Loki, though. In Cortex, it's here: https://github.com/cortexproject/cortex/blob/77a09cc7c953ce5bc4938f042ccb588503ce8eb5/pkg/chunk/chunk_store_utils.go#L55

and with my extra fix, it would definitely drop a chunk from a different series than expected.
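To make the concern concrete, here is a small, self-contained sketch (not the Cortex or Loki code) of why write-dedupe keyed on a fingerprint that is known to collide can drop a chunk from a different series:

```go
package main

import "fmt"

// chunkRef stands in for a chunk reference. In a real store the dedupe key
// would include more than the fingerprint; this sketch only shows why keying
// dedupe on a fingerprint that is known to collide can drop a valid chunk.
type chunkRef struct {
	fingerprint uint64
	series      string
}

// dedupeByFingerprint keeps only the first chunk seen per fingerprint,
// loosely mimicking a write-dedupe step keyed on the fingerprint.
func dedupeByFingerprint(chunks []chunkRef) []chunkRef {
	seen := map[uint64]struct{}{}
	var out []chunkRef
	for _, c := range chunks {
		if _, ok := seen[c.fingerprint]; ok {
			continue // dropped, even though it may belong to a different series
		}
		seen[c.fingerprint] = struct{}{}
		out = append(out, c)
	}
	return out
}

func main() {
	// Two different series that collide on the same raw fingerprint.
	chunks := []chunkRef{
		{fingerprint: 42, series: `{app="foo"}`},
		{fingerprint: 42, series: `{app="bar"}`},
	}
	fmt.Println(dedupeByFingerprint(chunks)) // only the {app="foo"} chunk survives
}
```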

Preserving raw fingerprints could lead to data loss when
doing chunk deduplication while storing chunks. We don't want
that.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
@cyriltovena (Contributor) commented:

> I now have my doubts about whether using the original fingerprint for the chunk ID is a good idea. […] it would definitely drop a chunk from a different series than expected.

In pkg/storage we don't filter same-fingerprint chunks in Loki. We dedupe entries, though.

    for _, label := range chunk.Metric {
        if label.Value == "" {
            return fmt.Errorf("Chunk has blank label %q", label.Name)
        }
    }

    // remove __name__ label
    if chunk.Metric.Has("__name__") {
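For completeness, a tiny self-contained sketch of what deduping entries (rather than whole chunks) means here; names are made up and this is not the Loki query-path code:

```go
package main

import (
	"fmt"
	"time"
)

type entry struct {
	ts   time.Time
	line string
}

// dedupeEntries assumes entries are already sorted by timestamp and drops
// consecutive duplicates (same timestamp and same line). This is the general
// idea behind deduping individual entries rather than whole chunks.
func dedupeEntries(entries []entry) []entry {
	var out []entry
	for _, e := range entries {
		if len(out) > 0 && out[len(out)-1].ts.Equal(e.ts) && out[len(out)-1].line == e.line {
			continue
		}
		out = append(out, e)
	}
	return out
}

func main() {
	t0 := time.Unix(0, 0)
	entries := []entry{
		{t0, "hello"},
		{t0, "hello"}, // duplicate, e.g. read from two overlapping chunks
		{t0.Add(time.Second), "world"},
	}
	fmt.Println(len(dedupeEntries(entries))) // 2
}
```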

@cyriltovena (Contributor) left a comment

LGTM

@cyriltovena merged commit e0ec61f into master on Dec 5, 2019
@cyriltovena deleted the issue-898-out-of-order branch on December 5, 2019 at 17:57

Successfully merging this pull request may close these issues.

Out-of-order errors with log streams from multiple services despite unique labels