Skip to content

Conversation

periklis
Copy link

@periklis periklis commented Feb 1, 2024

Refs:

salvacorts and others added 27 commits January 26, 2024 13:51
**What this PR does / why we need it**:
This PR modifies the metrics of the bloom compactor to:
- Add new histogram metrics to track the time spent to process
compaction cycles, tenants and jobs.
- Replace `_succeeded` and `_failed` metrics with a new `_completed`
metric with a status label.

**Which issue(s) this PR fixes**:
Fixes #<issue number>

**Special notes for your reviewer**:

**Checklist**
- [ ] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [ ] Documentation added
- [ ] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add
`add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/setup/upgrade/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in
`production/helm/loki/Chart.yaml` and update
`production/helm/loki/CHANGELOG.md` and
`production/helm/loki/README.md`. [Example
PR](grafana@d10549e)
- [ ] If the change is deprecating or removing a configuration option,
update the `deprecated-config.yaml` and `deleted-config.yaml` files
respectively in the `tools/deprecated-config-checker` directory.
[Example
PR](grafana@0d4416a)
**What this PR does / why we need it**:

Adding the stage back after removing it in
grafana#11785 to fix the test failing in
main.
**What this PR does / why we need it**:

We ported this fix a few days ago to the Agent via this
[PR](grafana/agent#6201) but the added test is
often failing in our drone pipeline.

I believe that there are two reasons why this test is flaky:
- the test expects that the 5 last lines of logs are the expected ones
but it seems that other lines might be logged as well
(https://drone.grafana.net/grafana/agent/16045/4/2)
- we saw that the simulated container did not always have enough time to
stop before calling the tgt.startIfNotRunning() to restart it.
(https://drone.grafana.net/grafana/agent/16077/4/2)

Fix:
- the test now uses "assert.EventuallyWithT" and checks that within 5
seconds the expected log lines will be amongst the logs without
duplicate.
- the test now stops the simulated container before restarting it

**Checklist**
- [x] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)

---------

Co-authored-by: Michel Hollands <42814411+MichelHollands@users.noreply.github.com>
…fana#11793)

This PR refactors a bunch of code to separate state+I/O related
complexity from logic so we can test and extend it more easily. It also
adds more logic for the bloom generator to use.
…#11524)

Co-authored-by: Periklis Tsirakidis <periklis@redhat.com>
**What this PR does / why we need it**:
Misspelling in top level key name, promtail accept docker_sd_config**s**
but not docker_sd_config

**Checklist**
- [x] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [ ] Documentation added
- [ ] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add
`add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/setup/upgrade/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in
`production/helm/loki/Chart.yaml` and update
`production/helm/loki/CHANGELOG.md` and
`production/helm/loki/README.md`. [Example
PR](grafana@d10549e)
- [ ] If the change is deprecating or removing a configuration option,
update the `deprecated-config.yaml` and `deleted-config.yaml` files
respectively in the `tools/deprecated-config-checker` directory.
[Example
PR](grafana@0d4416a)

Co-authored-by: J Stickler <julie.stickler@grafana.com>
…rafana#11576)

**What this PR does / why we need it**:

**Which issue(s) this PR fixes**:
Fixes grafana#11572 

**Special notes for your reviewer**:

**Checklist**
- [x] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [ ] Documentation added
- [ ] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add
`add-to-release-notes` label
- [x] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/setup/upgrade/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in
`production/helm/loki/Chart.yaml` and update
`production/helm/loki/CHANGELOG.md` and
`production/helm/loki/README.md`. [Example
PR](grafana@d10549e)
- [ ] If the change is deprecating or removing a configuration option,
update the `deprecated-config.yaml` and `deleted-config.yaml` files
respectively in the `tools/deprecated-config-checker` directory.
[Example
PR](grafana@0d4416a)

---------

Signed-off-by: Sheikh-Abubaker <sheikhabubaker761@gmail.com>
Co-authored-by: Michel Hollands <42814411+MichelHollands@users.noreply.github.com>
**What this PR does / why we need it**:
Following recent loki `v2.9.4` and GEL `v1.8.6` patch releases, updates
Helm chart docs and yaml.

Loki release drone creates is the [PR as
expected](https://github.com/grafana/loki/pull/11769/files), but there
seems to be a problem with the GEL drone generating the docs -which is
looked at separately.
Instead I manually updated them in one go.
…ssing them to the FusedQuerier (grafana#11809)

**What this PR does / why we need it**:

The `requestIterator` (`v1.Iterator[v1.Request]`) can be built directly
from the task and does not require any unnecessary merging of tasks,
since this is done by the `v1.FusedQuerier` already.

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
This PR refactors the bloom gateway workers so that tasks that have been
enqueued by requests do not end up locking the results channel and
therefore the worker, in case the request was cancelled (`context
cancelled`) or timed out (`context deadline exceeded`).
It also handles errors from the shipper in a way that they are returned
to the waiting request asap so it can return and does not need to wait
for all tasks to finish.

This PR also fixes the worker shutdown in a way that it now gracefully
stops and continues to work off the remaining tasks from the queue.

---------

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
…te struct (grafana#11812)

**What this PR does / why we need it**:

The processor executes a set of tasks.
It cleanly separates I/O from logic and therefore is more testable than the current worker implementation.

https://github.com/grafana/loki/blob/9602214abf5f0b016f1cad90e921d1e4d969856c/pkg/bloomgateway/processor.go#L33-L36
    
**Note**:

The processor is not used yet in the worker.

**Certain parts must be refactored when grafana#11810 is merged.**

---------

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
**What this PR does / why we need it**:
Previously we relied only on the index of the tenant's queue to read the
requests from the same tenant's queue.
However, as long as the queue is aggressively used by the consumers in
parallel, there are a few edge cases when knowing the index of last used
tenant's queue is not enough to guarantee that we dequeue items from
exactly the same tenant's queue.

**Special notes for your reviewer**:

**Checklist**
- [ ] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [ ] Documentation added
- [x] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add
`add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/setup/upgrade/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in
`production/helm/loki/Chart.yaml` and update
`production/helm/loki/CHANGELOG.md` and
`production/helm/loki/README.md`. [Example
PR](grafana@d10549e)
- [ ] If the change is deprecating or removing a configuration option,
update the `deprecated-config.yaml` and `deleted-config.yaml` files
respectively in the `tools/deprecated-config-checker` directory.
[Example
PR](grafana@0d4416a)

---------

Signed-off-by: Vladyslav Diachenko <vlad.diachenko@grafana.com>
This PR continues work refactoring the bloom compactor pkg and adds a
bunch of utilities for keyspace manipulation/mgmt. A lot of the bloom
block logic is contained here as a skeleton.

This code is not complete, but is fine to merge as we don't use this yet
and should be easier to review before it gets too large.
**What this PR does / why we need it**:

**Which issue(s) this PR fixes**:
Fixes #<issue number>

**Special notes for your reviewer**:

**Checklist**
- [ ] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [ ] Documentation added
- [ ] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add
`add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/setup/upgrade/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in
`production/helm/loki/Chart.yaml` and update
`production/helm/loki/CHANGELOG.md` and
`production/helm/loki/README.md`. [Example
PR](grafana@d10549e)
- [ ] If the change is deprecating or removing a configuration option,
update the `deprecated-config.yaml` and `deleted-config.yaml` files
respectively in the `tools/deprecated-config-checker` directory.
[Example
PR](grafana@0d4416a)

---------

Co-authored-by: J Stickler <julie.stickler@grafana.com>
…ana#11825)

**What this PR does / why we need it**:
Modify our code to not emit the `GetChunks` and the
`IndexClient.GetChunkRefs` spans.
We don't worse our observability by getting rid of them because their
data is redundant with parent spans. Example:
- If an specific index-gateway or ingester struggles, the parent span
`/logproto.Querier/GetChunkIDs` will report it just fine
- If an specific query causes trouble the `query.Exec` span will show
the query name, the `start` and `end`, etc. Also, its children will show
the slower parts of the query (in case we have a weak link)

**Which issue(s) this PR fixes**:
N/A
…na#11827)

Currently, the distributor latency panel uses a "=" in the code as
opposed to a "=~", causing the panel to render as empty. This is to fix
that issue.

**What this PR does / why we need it**:

**Which issue(s) this PR fixes**:
Fixes #<issue number>

**Special notes for your reviewer**:

**Checklist**
- [ ] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [ ] Documentation added
- [ ] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add
`add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/setup/upgrade/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in
`production/helm/loki/Chart.yaml` and update
`production/helm/loki/CHANGELOG.md` and
`production/helm/loki/README.md`. [Example
PR](grafana@d10549e)
- [ ] If the change is deprecating or removing a configuration option,
update the `deprecated-config.yaml` and `deleted-config.yaml` files
respectively in the `tools/deprecated-config-checker` directory.
[Example
PR](grafana@0d4416a)
Adding fixing lake of annotations support for `read` StatefulSet

**What this PR does / why we need it**:
Currently there is no use for `read.annotations` at the
`statefulset-read.yaml` file.
The key exists on `values.yaml` file, with no use.

**Which issue(s) this PR fixes**:
Fixes grafana#11688 

**Special notes for your reviewer**:

**Checklist**
- [x] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [ ] Documentation added
- [ ] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add
`add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/setup/upgrade/_index.md`
- [x] For Helm chart changes bump the Helm chart version in
`production/helm/loki/Chart.yaml` and update
`production/helm/loki/CHANGELOG.md` and
`production/helm/loki/README.md`. [Example
PR](grafana@d10549e)
- [ ] If the change is deprecating or removing a configuration option,
update the `deprecated-config.yaml` and `deleted-config.yaml` files
respectively in the `tools/deprecated-config-checker` directory.
[Example
PR](grafana@0d4416a)

---------

Co-authored-by: Michel Hollands <42814411+MichelHollands@users.noreply.github.com>
**What this PR does / why we need it**:

This PR changes the structure of how the bloom client

* The `BloomStore` is the top component interface which implements the
`Store` and the `Client` interfaces. The store holds a store entry for
each schema period.
* The `bloomStoreEntry` implements the `Store` and `Client` interfaces.
It holds a bloom client for a single schema period. Additionally, the
store entry also exposes a fetcher for metas (and in the future for
blocks), which is responsible for getting data from cache or storage (if
not available in cache).
* The `BloomClient` implement the `Client` interface. The bloom client
uses an object client for the schema period of the bloom client.
* The `Fetcher` can fetch and cache metas.

This structure is very similar to what we use for the chunk store.

**Note**

Before implementing `FetchBlocks()` in the `BloomStore`, I want to
implement the new `FingerprintBounds` type also in the shipper and bloom
gateway code to be consistent across components, as well as to make use
of the new utilities exposed by them.

The store implementation probably needs some specific test cases around
using the correct store entry for different schema period configs. The
code however is mostly taken from the chunk store implementation.

---------

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
…ined (grafana#11837)

**What this PR does / why we need it**:
When no resource attributes are defined, check for `service.name`
resource attributes fail with `nil pointer dereference`. This PR fixes
the issue by checking the bool returned by the `Get` method on resource
attributes.

**Checklist**
- [x] Tests updated
for v10 or greater, default is 16

refer to
https://github.com/grafana/loki/blob/1002ba00dff58ed588987169c0d3a0ddac2d022b/pkg/storage/config/schema_config.go#L336

**What this PR does / why we need it**:

**Which issue(s) this PR fixes**:
Fixes #<issue number>

**Special notes for your reviewer**:

**Checklist**
- [ ] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [ ] Documentation added
- [ ] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add
`add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/setup/upgrade/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in
`production/helm/loki/Chart.yaml` and update
`production/helm/loki/CHANGELOG.md` and
`production/helm/loki/README.md`. [Example
PR](grafana@d10549e)
- [ ] If the change is deprecating or removing a configuration option,
update the `deprecated-config.yaml` and `deleted-config.yaml` files
respectively in the `tools/deprecated-config-checker` directory.
[Example
PR](grafana@0d4416a)

---------

Co-authored-by: J Stickler <julie.stickler@grafana.com>
Functionality & testware for building bloom block plans.
…Bounds` (grafana#11839)

The latter struct has more utility functions to compare and operate on
bounds.

---------

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
* Rewrote description to highlight primary use case
* Added defaults for how much structured metadata can be added to log
lines
* Rewrote example queries

**What this PR does / why we need it**:

**Which issue(s) this PR fixes**:
Fixes #<issue number>

**Special notes for your reviewer**:

**Checklist**
- [ ] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [ ] Documentation added
- [ ] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add
`add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/setup/upgrade/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in
`production/helm/loki/Chart.yaml` and update
`production/helm/loki/CHANGELOG.md` and
`production/helm/loki/README.md`. [Example
PR](grafana@d10549e)
- [ ] If the change is deprecating or removing a configuration option,
update the `deprecated-config.yaml` and `deleted-config.yaml` files
respectively in the `tools/deprecated-config-checker` directory.
[Example
PR](grafana@0d4416a)
Prepares `bloomcompactor` pkg for integration with new logic, primarily
removes/thins out existing code.
grafana#11841)

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
@periklis periklis requested a review from xperimental February 1, 2024 09:44
@periklis periklis self-assigned this Feb 1, 2024
@openshift-ci openshift-ci bot requested a review from jcantrill February 1, 2024 09:44
@openshift-ci openshift-ci bot requested a review from syedriko February 1, 2024 09:44
Copy link

openshift-ci bot commented Feb 1, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: periklis

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 1, 2024
Copy link

openshift-ci bot commented Feb 1, 2024

@periklis: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/images ac46465 link true /test images
ci/prow/test-operator ac46465 link true /test test-operator
ci/prow/ci-index-loki-operator-bundle ac46465 link true /test ci-index-loki-operator-bundle

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@periklis periklis closed this Feb 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.