fix(background cache): increment queue size exactly once (#11776)

**What this PR does / why we need it**: While looking at the metrics from one of our cells, we noticed that the queue size is set to a very high value even when the length is pretty much 0. Background cache is incrementing the queue size twice for each enqueued key. This pr removes the additional increment call. `TestBackgroundSizeLimit` might be flaky because of the [writeBackLoop](https://github.com/grafana/loki/blob/8eb09c78c842b61f6619fb3755a43b180536b761/pkg/storage/chunk/cache/background.go#L193) which dequeues from the channel and reduces the queue size concurrently. To make the test predictable, I have set the `WriteBackGoroutines` to 0. **Which issue(s) this PR fixes**: Fixes #<issue number> **Special notes for your reviewer**: **Checklist** - [x] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (**required**) - [ ] Documentation added - [x] Tests updated - [ ] `CHANGELOG.md` updated - [ ] If the change is worth mentioning in the release notes, add `add-to-release-notes` label - [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/setup/upgrade/_index.md` - [ ] For Helm chart changes bump the Helm chart version in `production/helm/loki/Chart.yaml` and update `production/helm/loki/CHANGELOG.md` and `production/helm/loki/README.md`. [Example PR](d10549e) - [ ] If the change is deprecating or removing a configuration option, update the `deprecated-config.yaml` and `deleted-config.yaml` files respectively in the `tools/deprecated-config-checker` directory. [Example PR](0d4416a) (cherry picked from commit 5c8fd52)
grafana · Jan 25, 2024 · 59f2262 · 59f2262
1 parent 126e6d3
commit 59f2262
Show file tree

Hide file tree

Showing 4 changed files with 71 additions and 6 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -8,6 +8,67 @@
 * [11612](https://github.com/grafana/loki/pull/11612) **dannykopping** Ruler: Fixed another data race in tenant config.
 * [11654](https://github.com/grafana/loki/pull/11654) **dannykopping** Cache: atomically check background cache size limit correctly.
 * [11657](https://github.com/grafana/loki/pull/11657) **ashwanthgoli** Log results cache: compose empty response based on the request being served to avoid returning incorrect limit or direction.
+* [11587](https://github.com/grafana/loki/pull/11587) **trevorwhitney** Fix semantics of label parsing logic of metrics and logs queries. Both only parse the first label if multiple extractions into the same label are requested.
+* [11776](https://github.com/grafana/loki/pull/11776) **ashwanthgoli** Background Cache: Fixes a bug that is causing the background queue size to be incremented twice for each enqueued item.
+
+##### Changes
+
+* [11490](https://github.com/grafana/loki/pull/11490) **andresperezl**: Helm: Use `/ingester/shutdown` for `preStop` hook in write pods.
+* [10366](https://github.com/grafana/loki/pull/10366) **shantanualsi** Upgrade thanos objstore, dskit and other modules
+* [10451](https://github.com/grafana/loki/pull/10451) **shantanualsi** Upgrade thanos `objstore`
+* [10814](https://github.com/grafana/loki/pull/10814) **shantanualsi,kaviraj** Upgrade prometheus to v0.47.1 and dskit
+* [10959](https://github.com/grafana/loki/pull/10959) **slim-bean** introduce a backoff wait on subquery retries.
+* [11121](https://github.com/grafana/loki/pull/11121) **periklis** Ensure all lifecycler cfgs ref a valid IPv6 addr and port combination
+* [10650](https://github.com/grafana/loki/pull/10650) **matthewpi** Ensure the frontend uses a valid IPv6 addr and port combination
+* [11665](https://github.com/grafana/loki/pull/11665) **salvacorts** Deprecate and flip `-legacy-read-mode` flag to `false` by default.
+
+#### Promtail
+
+* [10752](https://github.com/grafana/loki/pull/10752) **gonzalesraul**: structured_metadata: enable structured_metadata convert labels
+* [11511](https://github.com/grafana/loki/pull/11511) **kavirajk**: chore(promtail): Improve default configuration that is shipped with rpm/deb packages to avoid possible high CPU utilisation if there are lots of files inside `/var/log`.
+
+##### Enhancements
+
+* [10416](https://github.com/grafana/loki/pull/10416) **lpugoy**: Lambda-Promtail: Add support for WAF logs in S3
+* [10301](https://github.com/grafana/loki/pull/10301) **wildum**: users can now define `additional_fields` in cloudflare configuration.
+* [10755](https://github.com/grafana/loki/pull/10755) **hainenber**: Lambda-Promtail: Add support for dropping labels passed via env var
+
+##### Changes
+
+* [10677](https://github.com/grafana/loki/pull/10677) **chaudum** Remove deprecated `stream_lag_labels` setting from both the `options` and `client` configuration sections.
+* [10689](https://github.com/grafana/loki/pull/10689) **dylanguedes**: Ingester: Make jitter to be 20% of flush check period instead of 1%.
+* [11420](https://github.com/grafana/loki/pull/11420) **zry98**: Show a clearer reason in "disable watchConfig" log message when server is disabled.
+
+##### Fixes
+
+* [10631](https://github.com/grafana/loki/pull/10631) **thampiotr**: Fix race condition in cleaning up metrics when stopping to tail files.
+* [10798](https://github.com/grafana/loki/pull/10798) **hainenber**: Fix agent panicking after reloaded due to duplicate metric collector registration.
+* [10848](https://github.com/grafana/loki/pull/10848) **rgroothuijsen**: Correctly parse list of drop stage sources from YAML.
+
+#### LogCLI
+
+#### Mixins
+
+* [11087](https://github.com/grafana/loki/pull/11087) **JoaoBraveCoding**: Adds structured metadata panels for ingested data
+* [11637](https://github.com/grafana/loki/pull/11637) **JoaoBraveCoding**: Add route to write Distributor Latency dashboard
+
+#### Fixes
+
+#### FluentD
+
+#### Jsonnet
+
+* [11312](https://github.com/grafana/loki/pull/11312) **sentoz**: Loki ksonnet: Do not generate configMap for consul if you are using memberlist
+
+* [11020](https://github.com/grafana/loki/pull/11020) **ashwanthgoli**: Loki ksonnet: Do not generate table-manager manifests if shipper store is in-use.
+
+* [10784](https://github.com/grafana/loki/pull/10894) **slim-bean** Update index gateway client to use a headless service.
+
+* [10542](https://github.com/grafana/loki/pull/10542) **chaudum**: Remove legacy deployment mode for ingester (Deployment, without WAL) and instead always run them as StatefulSet.
+
+## 2.9.2 (2023-10-16)
+
+### All Changes
 
 ##### Security
 

diff --git a/pkg/storage/chunk/cache/background.go b/pkg/storage/chunk/cache/background.go
@@ -142,7 +142,6 @@ func (c *backgroundCache) Store(ctx context.Context, keys []string, bufs [][]byt
 
 		select {
 		case c.bgWrites <- bgWrite:
-			c.size.Add(int64(size))
 			c.queueBytes.Set(float64(c.size.Load()))
 			c.queueLength.Add(float64(num))
 		default:

diff --git a/pkg/storage/chunk/cache/background_extra_test.go b/pkg/storage/chunk/cache/background_extra_test.go
@@ -5,3 +5,8 @@ func Flush(c Cache) {
 	close(b.bgWrites)
 	b.wg.Wait()
 }
+
+func QueueSize(c Cache) int64 {
+	b := c.(*backgroundCache)
+	return b.size.Load()
+}
diff --git a/pkg/storage/chunk/cache/background_test.go b/pkg/storage/chunk/cache/background_test.go
@@ -47,7 +47,7 @@ func TestBackgroundSizeLimit(t *testing.T) {
 	require.NoError(t, err)
 
 	c := cache.NewBackground("mock", cache.BackgroundConfig{
-		WriteBackGoroutines: 1,
+		WriteBackGoroutines: 0,
 		WriteBackBuffer:     100,
 		WriteBackSizeLimit:  flagext.ByteSize(limit),
 	}, cache.NewMockCache(), nil)
@@ -63,10 +63,10 @@ func TestBackgroundSizeLimit(t *testing.T) {
 
 	// store the first 10KB
 	require.NoError(t, c.Store(ctx, []string{firstKey}, [][]byte{first}))
+	require.Equal(t, cache.QueueSize(c), int64(10e3))
+
 	// second key will not be stored because it will exceed the 15KB limit
 	require.NoError(t, c.Store(ctx, []string{secondKey}, [][]byte{second}))
-	cache.Flush(c)
-
-	found, _, _, _ := c.Fetch(ctx, []string{firstKey, secondKey})
-	require.Equal(t, []string{firstKey}, found)
+	require.Equal(t, cache.QueueSize(c), int64(10e3))
+	c.Stop()
 }