Update to Mimir-Prometheus 0db77b4c7 #3682

bboreham · 2022-12-08T16:39:24Z

Lots of interfaces change because of Sparse Histograms.
This PR doesn't make Mimir actually handle Sparse Histograms, but it lets us update to a newer Prometheus.
Possibly in line with upstream v2.40.2?

Much of the code in this PR was copied from the sparsehistograms branch.

Co-authored-by: Levi Harrison git@leviharrison.dev
Co-authored-by: Jeanette Tan jeanette.tan@grafana.com

Checklist

Tests updated
NA Documentation added
CHANGELOG.md updated.

`tsdb/wal` was renamed to `tsdb/wlog`. Several other libraries were bumped by this, including prometheus/client-go and prometheus/exporter-toolkit.

Required to build against prometheus/exporter-toolkit.

Copied from the sparsehistograms branch. Co-authored-by: Levi Harrison <git@leviharrison.dev> Co-authored-by: Jeanette Tan <jeanette.tan@grafana.com>

They are no longer identical in memory layout. This puts up the cost of query sharding slightly.

colega

I have reviewed the non-vendored changes. Mostly LGTM, with some minor comments.

My main concern is that we're taking some assumptions on values being floats that could translate into bugs once histograms are implemented.

I wonder if the people implementing histograms will just have to revisit every single usage of chunkenc.ValFloat, or we could hint them by having our own mimirchunk.ValFloatTODO = chunkenc.ValFloat so they would pay special attention to that ValFloatTODO. This especially corresponds to every place where we do Next() == chunkenc.ValFloat and Seek(...) == chunkenc.ValFloat comparisons.

I'll try to take at least a quick look at the vendored changes before approving this.

colega · 2022-12-14T13:35:26Z

pkg/frontend/querymiddleware/querysharding.go

 			res = append(res, SampleStream{
 				Labels:  mimirpb.FromLabelsToLabelAdapters(series.Metric),
-				Samples: mimirpb.FromPointsToSamples(series.Points),
+				Samples: fromPointsToSamples(series.Points),


Since SampleStream is a local struct, can we change it in a way that it would still support the casting instead of creating a new slice of samples here?

probably yes, but needs some work to align the mimir.proto definition with promql.Point definition.

Made an issue: #3738

Going back to @colega original suggestion, I think adding a dummy pointer into the structure might make it possible to keep the original cast for cases when all samples are floats - so until we officially support native histograms basically.

pkg/frontend/querymiddleware/querysharding_test.go

colega · 2022-12-14T13:38:26Z

pkg/frontend/querymiddleware/querysharding_test_utils_test.go

@@ -271,7 +272,7 @@ func TestNewMockShardedqueryable(t *testing.T) {
 				seriesCt++
 				iter := set.At().Iterator()
 				samples := 0
-				for iter.Next() {
+				for iter.Next() == chunkenc.ValFloat {


I would say this should be:

Suggested change

for iter.Next() == chunkenc.ValFloat {

for iter.Next() != chunkenc.ValNone {

We don't have histogram samples now, but this way the code should work once we start having them, wdyt?
(This is not a blocker comment)

I'm also adding an assert that specifies that this mock function is generating float values

colega · 2022-12-14T13:40:38Z

pkg/frontend/querymiddleware/sharded_queryable_test.go

@@ -450,7 +451,7 @@ func seriesSetToSampleStreams(set storage.SeriesSet) ([]SampleStream, error) {
 		stream := SampleStream{Labels: mimirpb.FromLabelsToLabelAdapters(set.At().Labels())}

 		it := set.At().Iterator()
-		for it.Next() {
+		for it.Next() == chunkenc.ValFloat {


Similar here, I'd get the type of next, and maybe fail the test if we find one of those values? That will save debugging time for the engineers implementing histograms and fixing these tests.

changing, plus adding error return if series contains non float val

colega · 2022-12-14T13:43:22Z

pkg/ingester/ingester.go

+		for it.Next() != chunkenc.ValNone {
 			t, v := it.At()


We should fail if we get a native histogram here.

Suggested change

for it.Next() != chunkenc.ValNone {

t, v := it.At()

for valType := it.Next(); valType != chunkenc.ValNone; valType = it.Next() {

if valType != chunkenc.ValFloat {

return 0, 0, fmt.Errorf("unsupported value type: %v", valType)

t, v := it.At()

yes, adding this code

colega · 2022-12-14T14:19:06Z

pkg/querier/batch/stream.go

+		if !ensureBatchType(valueTypes) {
+			ensureBatchType(valueTypes)
 		}


I think this should do the same? Also it seems that this is the only usage of ensureBatchType so the return value isn't considered.

Suggested change

if !ensureBatchType(valueTypes) {

ensureBatchType(valueTypes)

}

ensureBatchType(valueTypes)

It's a bit more complicated. There's a special case where upon noticing that the Batch type is wrong, we need to close the current Batch (call nextBatch) and then check again that the Batch is of the correct type. So basically run the algorithm twice, but on the second try we can be sure that we're not in the middle of a Batch. I'll try to make this nicer.

pkg/querier/iterators/chunk_merge_iterator.go

pkg/storage/chunk/prometheus_chunk.go

colega · 2022-12-14T14:38:04Z

pkg/storage/chunk/prometheus_chunk.go

+			batch.Timestamps[j] = t
+			batch.FloatHistogramValues[j] = fh
+		}
+	}


Should we have a default: panic case here?

yes, adding

colega

I've also reviewed the non-prometheus vendored deps (I assume prometheus was reviewed when it was brought to mimir-prometheus, plus it's just an insane number of changes).

Changes LGTM modulo the previously mentioned comments, but unblocking with an approval.

zenador · 2022-12-15T12:13:06Z

@colega Thank you for your comments! These changes were picked from a branch that is still WIP and far from being ready for review, and many of these changes are just temporary placeholders to get things to work for now and will definitely be improved before the PR is marked ready, e.g. we do things like for iter.Next() == chunkenc.ValFloat instead iter.Next() != chunkenc.ValNone to ignore histograms until we explicitly add support for them, and we can search for these instances to know what needs to be fixed. Currently (on that branch) we don't make things fail if we get a native histogram, because the plan is to support them before merge, but if we somehow need to merge that branch early then we'll add errors for the unsupported parts. For this PR it should be fine, because we won't get native histograms as the relevant flags are not enabled. Most of the code will need to be revisited for native histograms, so we're trying to increase test coverage and fix failing tests. Anyway, I'll keep your comments here as reference and ensure that they are updated on that branch eventually (some will be more tricky than others) before it's merged, and will resolve them here when it's fixed over there (if github allows that on a merged PR), and some have already been changed on the other branch.

pkg/ingester/ingester_test.go

pkg/querier/batch/batch_test.go

pkg/querier/distributor_queryable_test.go

pkg/querier/iterators/chunk_merge_iterator_test.go

pkg/querier/remote_read.go

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

pracucci · 2022-12-20T14:30:37Z

pkg/querier/batch/stream.go

-	// if the current batch being appended is full.
-	checkForFullBatch := func() {
+	// Step to the next Batch in the result, create it if it does not exist
+	nextBatch := func(valueTypes chunkenc.ValueType) {


Suggested change

nextBatch := func(valueTypes chunkenc.ValueType) {

nextBatch := func(valueType chunkenc.ValueType) {

pracucci · 2022-12-20T14:31:34Z

pkg/querier/batch/stream.go

@@ -113,3 +175,42 @@ func mergeStreams(left, right batchStream, result batchStream, size int) batchSt
 	result.reset()
 	return result
 }
+
+func createBatch(valueTypes chunkenc.ValueType) (batch chunk.Batch) {


Suggested change

func createBatch(valueTypes chunkenc.ValueType) (batch chunk.Batch) {

func createBatch(valueType chunkenc.ValueType) (batch chunk.Batch) {

pracucci · 2022-12-20T14:31:42Z

pkg/querier/batch/stream.go

+}
+
+// Call on uninitialized Batch to allocate the correct valueType array
+func initBatchType(batch *chunk.Batch, valueTypes chunkenc.ValueType) {


Suggested change

func initBatchType(batch *chunk.Batch, valueTypes chunkenc.ValueType) {

func initBatchType(batch *chunk.Batch, valueType chunkenc.ValueType) {

pracucci · 2022-12-20T14:31:48Z

pkg/querier/batch/stream.go

+	}
+}
+
+func populate(b *chunk.Batch, s batchStream, valueTypes chunkenc.ValueType) {


Suggested change

func populate(b *chunk.Batch, s batchStream, valueTypes chunkenc.ValueType) {

func populate(b *chunk.Batch, s batchStream, valueType chunkenc.ValueType) {

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

pracucci · 2022-12-20T14:40:09Z

pkg/querier/batch/stream.go

+				right.next()
+			} else {
+				if (rt == chunkenc.ValHistogram || rt == chunkenc.ValFloatHistogram) && lt == chunkenc.ValFloat {
+					// Prefer historgrams over floats. Take left side if both has histograms.


The comment "Take left side if both has histograms." should be moved inside the else branch.

pracucci · 2022-12-20T14:42:20Z

pkg/querier/batch/stream.go

+
+// mergeStreams merges streams of Batches of the same series over time.
+// Samples are simply merged by time when they are the same type (float/histogram/...), with the left stream taking precedence if the timestamps are equal.
+// When sample are different type, batches are not merged. In case of equal timestamps, histograms take precedence since they have more information.
 func mergeStreams(left, right batchStream, result batchStream, size int) batchStream {


Do we have a benchmark here to check if all the function calls indirection introduce a performance penalty? This is an hot path for us.

pracucci · 2022-12-20T14:43:09Z

pkg/querier/batch/stream.go

+		for {
+			if t := bs.hasNext(); t != chunkenc.ValNone {


Isn't easier this way?

Suggested change

for {

if t := bs.hasNext(); t != chunkenc.ValNone {

for t := bs.hasNext(); t != chunkenc.ValNone; t = bs.hasNext() {

pracucci · 2022-12-20T14:44:20Z

pkg/querier/batch/stream.go

+	}
+
+	// Ensure that the batch we're writing to is not full and of the right type.
+	ensureBatch := func(valueTypes chunkenc.ValueType) {


Suggested change

ensureBatch := func(valueTypes chunkenc.ValueType) {

ensureBatch := func(valueType chunkenc.ValueType) {

pracucci · 2022-12-20T14:44:29Z

pkg/querier/batch/stream.go

+		b = &result[resultLen-1]
+	}
+
+	ensureBatchType := func(valueTypes chunkenc.ValueType) {


Suggested change

ensureBatchType := func(valueTypes chunkenc.ValueType) {

ensureBatchType := func(valueType chunkenc.ValueType) {

pracucci · 2022-12-20T14:59:55Z

pkg/querier/batch/stream.go

 func mergeStreams(left, right batchStream, result batchStream, size int) batchStream {
+
 	// Reset the Index and Length of existing batches.
 	for i := range result {


Shouldn't we reset the value type as well? I think we should have a unit test covering the case result is reused and data types are different.

pracucci · 2022-12-20T15:05:48Z

pkg/querier/batch/stream.go

+			// then the one provided.
+			result = append(result, createBatch(valueTypes))
+		}
+		b = &result[resultLen-1]


Why don't we directly set the b type here to valueTypes? The function nextBatch() looks inconsistent to me, because the input valueTypes is honored only if a new batch is created, but not reused.

pracucci · 2022-12-20T15:08:28Z

pkg/querier/batch/stream.go

-				result = append(result, chunk.Batch{})
-			}
-			b = &result[resultLen-1]
+			nextBatch(valueTypes)


Assuming the suggestion I left in nextBatch() is correct, then here you can return after calling nextBatch(). If we do return, then I would remove ensureBatchType() function at all and inline it below here, given it's just used here.

pracucci · 2022-12-20T15:27:17Z

I quickly checked the vendored changes and LGTM. Prometheus changes correctness not checked (too many changes), but assuming logic is correct then I didn't see any change that requires missing changes on Mimir side (excluding support for native histograms, which we're working on...).

Error out if iterator is asked to merge non float chunks: New/Seek merge Otherwise handle non floats normally Next and nonOverlaping Seek Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

The new code had issues and unknown performance. Mitigate risk by not handling histograms in Batches and restore the current way of working with additional checks to avoid histograms. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

This works out fine since the caller checks the ensuing batch.Length, thus propagating the emptiness of the iterator. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

Be explicit in asserts instead. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

pracucci

Good job! This simplified version looks definitely safer to merge to me. I left few last comments.

In few places you will find a suggestion about adding a "TODO comment". I'm not a big fan of TODO comments in the code, but I've seen it's a pattern you already adopted (in this PR there are already few TODOs). Up to you if you want to address these comments. My main fear is that we may forget about it in few places when we'll add native histogram support, but maybe it's a not justified fear, so I just defer to you.

pracucci · 2022-12-22T13:26:33Z

pkg/querier/block_test.go

@@ -482,7 +483,7 @@ func Benchmark_blockQuerierSeriesSet_iteration(b *testing.B) {
 		set := blockQuerierSeriesSet{series: series}

 		for set.Next() {
-			for t := set.At().Iterator(); t.Next(); {
+			for t := set.At().Iterator(); t.Next() == chunkenc.ValFloat; {


Shouldn't be != none?

pracucci · 2022-12-22T13:29:20Z

tools/tsdb-chunks/main.go

-				ts, val := it.At()
-				if ts < minTS {
-					minTS = ts
+			for valType := it.Next(); it.Err() == nil && valType != chunkenc.ValNone; valType = it.Next() {


[nit] By contract, Next() must return ValNone on error.

Suggested change

for valType := it.Next(); it.Err() == nil && valType != chunkenc.ValNone; valType = it.Next() {

for valType := it.Next(); valType != chunkenc.ValNone; valType = it.Next() {

i thought of this, but wasn't sure, fixed

pracucci · 2022-12-22T13:29:32Z

tools/tsdb-index-health/main.go

@@ -338,8 +339,13 @@ func verifyChunks(l log.Logger, cr *chunks.Reader, lset labels.Labels, chks []ch
 		prevTs := int64(-1)

 		it := ch.Iterator(nil)
-		for it.Err() == nil && it.Next() {
+		for valType := it.Next(); it.Err() == nil && valType != chunkenc.ValNone; valType = it.Next() {


[nit] By contract, Next() must return ValNone on error.

Suggested change

for valType := it.Next(); it.Err() == nil && valType != chunkenc.ValNone; valType = it.Next() {

for valType := it.Next(); valType != chunkenc.ValNone; valType = it.Next() {

pracucci · 2022-12-22T13:29:51Z

tools/tsdb-print-chunk/main.go

-		for it.Err() == nil && it.Next() {
-			ts, val := it.At()
-			fmt.Printf("%g\t%d (%s)\n", val, ts, timestamp.Time(ts).UTC().Format(time.RFC3339Nano))
+		for valType := it.Next(); it.Err() == nil && valType != chunkenc.ValNone; valType = it.Next() {


[nit] By contract, Next() must return ValNone on error.

Suggested change

for valType := it.Next(); it.Err() == nil && valType != chunkenc.ValNone; valType = it.Next() {

for valType := it.Next(); valType != chunkenc.ValNone; valType = it.Next() {

pkg/storage/chunk/prometheus_chunk.go

pkg/querier/iterators/chunk_merge_iterator.go

pkg/querier/batch/batch.go

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

pracucci

3..2..1.. LGTM! I just left a final nit. 🙇

pracucci · 2022-12-22T14:58:14Z

pkg/querier/iterators/chunk_merge_iterator.go

 	c.h = c.h[:0]

+	if c.currErr != nil {


I think should be moved above the line c.h = c.h[:0]

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

bboreham force-pushed the prometheus-0db77b4c7 branch from 14b9fb9 to 47b2810 Compare December 8, 2022 17:59

bboreham changed the title ~~Update to Mimir-Prometheus 0db77b4c7 with Sparse Histograms~~ Update to Mimir-Prometheus 0db77b4c7 Dec 8, 2022

bboreham force-pushed the prometheus-0db77b4c7 branch from 47b2810 to 021ce71 Compare December 8, 2022 18:14

bboreham and others added 5 commits December 14, 2022 12:13

Update mimir-prometheus to latest main 0db77b4c76da

273c50a

`tsdb/wal` was renamed to `tsdb/wlog`. Several other libraries were bumped by this, including prometheus/client-go and prometheus/exporter-toolkit.

go mod vendor

0cccf6c

Update prometheus/alertmanager to latest 1b202b91bf26

93c0c65

Required to build against prometheus/exporter-toolkit.

Cope with Prometheus interface changes for Sparse Histograms

f577f05

Copied from the sparsehistograms branch. Co-authored-by: Levi Harrison <git@leviharrison.dev> Co-authored-by: Jeanette Tan <jeanette.tan@grafana.com>

Stop casting Prometheus points to Mimir samples

4de050e

They are no longer identical in memory layout. This puts up the cost of query sharding slightly.

bboreham force-pushed the prometheus-0db77b4c7 branch from fd1f1ef to 4de050e Compare December 14, 2022 12:21

bboreham marked this pull request as ready for review December 14, 2022 12:22

bboreham requested a review from a team as a code owner December 14, 2022 12:22

colega reviewed Dec 14, 2022

View reviewed changes

colega approved these changes Dec 14, 2022

View reviewed changes