compact: add metric thanos_compactor_iterations_total (thanos-io#1733)
* compact: add metric thanos_compactor_iterations_total

Add a counter metric, thanos_compactor_iterations_total, that is
incremented by 1 every time an iteration completes successfully. This is
needed when --wait is specified: the Compactor then runs continuously
and could die, and we need to be able to alert on that.

One option would be to alert on a restart of the container, but that is
not very flexible: a restart might still be OK as long as the Compactor
finishes its job in time. However, it is currently impossible to know
whether it did.

Add this metric so that users can add alerts like:

```
rate(thanos_compactor_iterations_total[1d]) == 0
FOR 3d
```

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* CHANGELOG: add entry

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* compact: simplify wait check

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* cmd: thanos: compact: remove wait check

Register the metric unconditionally: if the Compactor is run as a batch
job, the metric does not matter either way.

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* CHANGELOG: add period

Add a period at the end of an item in the CHANGELOG to keep it uniform.

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>
Signed-off-by: Aleksey Sin <asin@ozon.ru>
GiedriusS authored and Aleksey Sin committed Nov 26, 2019
1 parent bc3c50d commit 0e8ae5f
Showing 2 changed files with 7 additions and 0 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -18,6 +18,7 @@ We use *breaking* word for marking changes that are not backward compatible (rel
 - [#1573](https://github.com/thanos-io/thanos/pull/1573) `AliYun OSS` object storage, see [documents](docs/storage.md#aliyun-oss) for further information.
 - [#1680](https://github.com/thanos-io/thanos/pull/1680) Add a new `--http-grace-period` CLI option to components which serve HTTP to set how long to wait until HTTP Server shuts down.
 - [#1712](https://github.com/thanos-io/thanos/pull/1712) Rename flag on bucket web component from `--listen` to `--http-address` to match other components.
+- [#1733](https://github.com/thanos-io/thanos/pull/1733) New metric `thanos_compactor_iterations_total` on Thanos Compactor which shows the number of successful iterations.

 ### Fixed

6 changes: 6 additions & 0 deletions cmd/thanos/compact.go
@@ -168,10 +168,15 @@ func runCompact(
 		Name: "thanos_compactor_retries_total",
 		Help: "Total number of retries after retriable compactor error",
 	})
+	iterations := prometheus.NewCounter(prometheus.CounterOpts{
+		Name: "thanos_compactor_iterations_total",
+		Help: "Total number of iterations that were executed successfully",
+	})
 	halted.Set(0)

 	reg.MustRegister(halted)
 	reg.MustRegister(retried)
+	reg.MustRegister(iterations)

 	downsampleMetrics := newDownsampleMetrics(reg)

@@ -313,6 +318,7 @@ func runCompact(
 		return runutil.Repeat(5*time.Minute, ctx.Done(), func() error {
 			err := f()
 			if err == nil {
+				iterations.Inc()
 				return nil
 			}

