Skip to content

Commit

Permalink
fix(background cache): increment queue size exactly once (#11776)
Browse files Browse the repository at this point in the history
**What this PR does / why we need it**:
While looking at the metrics from one of our cells, we noticed that the
queue size is set to a very high value even when the length is pretty
much 0.

Background cache is incrementing the queue size twice for each enqueued
key. This pr removes the additional increment call.

`TestBackgroundSizeLimit` might be flaky because of the
[writeBackLoop](https://github.com/grafana/loki/blob/8eb09c78c842b61f6619fb3755a43b180536b761/pkg/storage/chunk/cache/background.go#L193)
which dequeues from the channel and reduces the queue size concurrently.
To make the test predictable, I have set the `WriteBackGoroutines` to 0.

**Which issue(s) this PR fixes**:
Fixes #<issue number>

**Special notes for your reviewer**:

**Checklist**
- [x] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [ ] Documentation added
- [x] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add
`add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/setup/upgrade/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in
`production/helm/loki/Chart.yaml` and update
`production/helm/loki/CHANGELOG.md` and
`production/helm/loki/README.md`. [Example
PR](d10549e)
- [ ] If the change is deprecating or removing a configuration option,
update the `deprecated-config.yaml` and `deleted-config.yaml` files
respectively in the `tools/deprecated-config-checker` directory.
[Example
PR](0d4416a)

(cherry picked from commit 5c8fd52)
  • Loading branch information
ashwanthgoli authored and grafana-delivery-bot[bot] committed Jan 25, 2024
1 parent 126e6d3 commit 59f2262
Show file tree
Hide file tree
Showing 4 changed files with 71 additions and 6 deletions.
61 changes: 61 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,67 @@
* [11612](https://github.com/grafana/loki/pull/11612) **dannykopping** Ruler: Fixed another data race in tenant config.
* [11654](https://github.com/grafana/loki/pull/11654) **dannykopping** Cache: atomically check background cache size limit correctly.
* [11657](https://github.com/grafana/loki/pull/11657) **ashwanthgoli** Log results cache: compose empty response based on the request being served to avoid returning incorrect limit or direction.
* [11587](https://github.com/grafana/loki/pull/11587) **trevorwhitney** Fix semantics of label parsing logic of metrics and logs queries. Both only parse the first label if multiple extractions into the same label are requested.
* [11776](https://github.com/grafana/loki/pull/11776) **ashwanthgoli** Background Cache: Fixes a bug that is causing the background queue size to be incremented twice for each enqueued item.

##### Changes

* [11490](https://github.com/grafana/loki/pull/11490) **andresperezl**: Helm: Use `/ingester/shutdown` for `preStop` hook in write pods.
* [10366](https://github.com/grafana/loki/pull/10366) **shantanualsi** Upgrade thanos objstore, dskit and other modules
* [10451](https://github.com/grafana/loki/pull/10451) **shantanualsi** Upgrade thanos `objstore`
* [10814](https://github.com/grafana/loki/pull/10814) **shantanualsi,kaviraj** Upgrade prometheus to v0.47.1 and dskit
* [10959](https://github.com/grafana/loki/pull/10959) **slim-bean** introduce a backoff wait on subquery retries.
* [11121](https://github.com/grafana/loki/pull/11121) **periklis** Ensure all lifecycler cfgs ref a valid IPv6 addr and port combination
* [10650](https://github.com/grafana/loki/pull/10650) **matthewpi** Ensure the frontend uses a valid IPv6 addr and port combination
* [11665](https://github.com/grafana/loki/pull/11665) **salvacorts** Deprecate and flip `-legacy-read-mode` flag to `false` by default.

#### Promtail

* [10752](https://github.com/grafana/loki/pull/10752) **gonzalesraul**: structured_metadata: enable structured_metadata convert labels
* [11511](https://github.com/grafana/loki/pull/11511) **kavirajk**: chore(promtail): Improve default configuration that is shipped with rpm/deb packages to avoid possible high CPU utilisation if there are lots of files inside `/var/log`.

##### Enhancements

* [10416](https://github.com/grafana/loki/pull/10416) **lpugoy**: Lambda-Promtail: Add support for WAF logs in S3
* [10301](https://github.com/grafana/loki/pull/10301) **wildum**: users can now define `additional_fields` in cloudflare configuration.
* [10755](https://github.com/grafana/loki/pull/10755) **hainenber**: Lambda-Promtail: Add support for dropping labels passed via env var

##### Changes

* [10677](https://github.com/grafana/loki/pull/10677) **chaudum** Remove deprecated `stream_lag_labels` setting from both the `options` and `client` configuration sections.
* [10689](https://github.com/grafana/loki/pull/10689) **dylanguedes**: Ingester: Make jitter to be 20% of flush check period instead of 1%.
* [11420](https://github.com/grafana/loki/pull/11420) **zry98**: Show a clearer reason in "disable watchConfig" log message when server is disabled.

##### Fixes

* [10631](https://github.com/grafana/loki/pull/10631) **thampiotr**: Fix race condition in cleaning up metrics when stopping to tail files.
* [10798](https://github.com/grafana/loki/pull/10798) **hainenber**: Fix agent panicking after reloaded due to duplicate metric collector registration.
* [10848](https://github.com/grafana/loki/pull/10848) **rgroothuijsen**: Correctly parse list of drop stage sources from YAML.

#### LogCLI

#### Mixins

* [11087](https://github.com/grafana/loki/pull/11087) **JoaoBraveCoding**: Adds structured metadata panels for ingested data
* [11637](https://github.com/grafana/loki/pull/11637) **JoaoBraveCoding**: Add route to write Distributor Latency dashboard

#### Fixes

#### FluentD

#### Jsonnet

* [11312](https://github.com/grafana/loki/pull/11312) **sentoz**: Loki ksonnet: Do not generate configMap for consul if you are using memberlist

* [11020](https://github.com/grafana/loki/pull/11020) **ashwanthgoli**: Loki ksonnet: Do not generate table-manager manifests if shipper store is in-use.

* [10784](https://github.com/grafana/loki/pull/10894) **slim-bean** Update index gateway client to use a headless service.

* [10542](https://github.com/grafana/loki/pull/10542) **chaudum**: Remove legacy deployment mode for ingester (Deployment, without WAL) and instead always run them as StatefulSet.

## 2.9.2 (2023-10-16)

### All Changes

##### Security

Expand Down
1 change: 0 additions & 1 deletion pkg/storage/chunk/cache/background.go
Original file line number Diff line number Diff line change
Expand Up @@ -142,7 +142,6 @@ func (c *backgroundCache) Store(ctx context.Context, keys []string, bufs [][]byt

select {
case c.bgWrites <- bgWrite:
c.size.Add(int64(size))
c.queueBytes.Set(float64(c.size.Load()))
c.queueLength.Add(float64(num))
default:
Expand Down
5 changes: 5 additions & 0 deletions pkg/storage/chunk/cache/background_extra_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -5,3 +5,8 @@ func Flush(c Cache) {
close(b.bgWrites)
b.wg.Wait()
}

func QueueSize(c Cache) int64 {
b := c.(*backgroundCache)
return b.size.Load()
}
10 changes: 5 additions & 5 deletions pkg/storage/chunk/cache/background_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -47,7 +47,7 @@ func TestBackgroundSizeLimit(t *testing.T) {
require.NoError(t, err)

c := cache.NewBackground("mock", cache.BackgroundConfig{
WriteBackGoroutines: 1,
WriteBackGoroutines: 0,
WriteBackBuffer: 100,
WriteBackSizeLimit: flagext.ByteSize(limit),
}, cache.NewMockCache(), nil)
Expand All @@ -63,10 +63,10 @@ func TestBackgroundSizeLimit(t *testing.T) {

// store the first 10KB
require.NoError(t, c.Store(ctx, []string{firstKey}, [][]byte{first}))
require.Equal(t, cache.QueueSize(c), int64(10e3))

// second key will not be stored because it will exceed the 15KB limit
require.NoError(t, c.Store(ctx, []string{secondKey}, [][]byte{second}))
cache.Flush(c)

found, _, _, _ := c.Fetch(ctx, []string{firstKey, secondKey})
require.Equal(t, []string{firstKey}, found)
require.Equal(t, cache.QueueSize(c), int64(10e3))
c.Stop()
}

0 comments on commit 59f2262

Please sign in to comment.