Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update from upstream repository #217

Closed
wants to merge 30 commits into from
Closed

Conversation

periklis
Copy link

@periklis periklis commented Dec 6, 2023

paul1r and others added 29 commits November 28, 2023 09:31
**What this PR does / why we need it**:
Removes the stubbed out metrics from the Bloom Tokenizer, and puts in
metrics that were being captured and deemed useful during initial POC
testing.

**Which issue(s) this PR fixes**:
Fixes #<issue number>

**Special notes for your reviewer**:

**Checklist**
- [ ] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [ ] Documentation added
- [ ] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add
`add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/setup/upgrade/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in
`production/helm/loki/Chart.yaml` and update
`production/helm/loki/CHANGELOG.md` and
`production/helm/loki/README.md`. [Example
PR](d10549e)
- [ ] If the change is deprecating or removing a configuration option,
update the `deprecated-config.yaml` and `deleted-config.yaml` files
respectively in the `tools/deprecated-config-checker` directory.
[Example
PR](0d4416a)
**What this PR does / why we need it**:

Add swift to common config via helm.

**Which issue(s) this PR fixes**:

No issue

**Special notes for your reviewer**:

**Checklist**
- [x] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [x] Documentation added
- [ ] Tests updated
- [x] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add
`add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/setup/upgrade/_index.md`
- [x] For Helm chart changes bump the Helm chart version in
`production/helm/loki/Chart.yaml` and update
`production/helm/loki/CHANGELOG.md` and
`production/helm/loki/README.md`. [Example
PR](d10549e)
- [ ] If the change is deprecating or removing a configuration option,
update the `deprecated-config.yaml` and `deleted-config.yaml` files
respectively in the `tools/deprecated-config-checker` directory.
[Example
PR](0d4416a)
**What this PR does / why we need it**:

The main module of Loki already uses features of Go 1.21 (`cmp` module,
`context.WithTimeoutCause`), so it can't be built with Go versions lower
than 1.21 anymore. The module file should reflect this.

The `go.sum` file was updated automatically by the Go tool `go mod tidy`
after updating the version.

**Special notes for your reviewer**:

Go 1.21 module files introduce a new `toolchain` attribute. I have set
this to the currently available latest version.
---------

Co-authored-by: Periklis Tsirakidis <periklis@redhat.com>
**What this PR does / why we need it**:

This PR adds a missing `String()` call for the `or` statement in line
filters. It also fixes a bug where `or` statements were not applied in
conjunction with an IPLineFilter expression.

**Which issue(s) this PR fixes**:
Fixes #<issue number>

**Special notes for your reviewer**:

**Checklist**
- [ ] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [ ] Documentation added
- [ ] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add
`add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/setup/upgrade/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in
`production/helm/loki/Chart.yaml` and update
`production/helm/loki/CHANGELOG.md` and
`production/helm/loki/README.md`. [Example
PR](d10549e)
- [ ] If the change is deprecating or removing a configuration option,
update the `deprecated-config.yaml` and `deleted-config.yaml` files
respectively in the `tools/deprecated-config-checker` directory.
[Example
PR](0d4416a)

---------

Co-authored-by: Travis Patterson <travis.patterson@grafana.com>
**What this PR does / why we need it**:
A check for `len(queryRangeMiddleware) > 0` is always true so we can
remove it.

**Checklist**
- [ ] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [ ] Documentation added
- [ ] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add
`add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/setup/upgrade/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in
`production/helm/loki/Chart.yaml` and update
`production/helm/loki/CHANGELOG.md` and
`production/helm/loki/README.md`. [Example
PR](d10549e)
- [ ] If the change is deprecating or removing a configuration option,
update the `deprecated-config.yaml` and `deleted-config.yaml` files
respectively in the `tools/deprecated-config-checker` directory.
[Example
PR](0d4416a)
**What this PR does / why we need it**:
Unmarshalling for `IndexPeriodicTableConfig` is not working as expected
since it contains an inline field that implements `yaml.Unmarshaler`
interface. Configuring `path_prefix` is resulting in a error because it
tries to Unmarshal to the parent inline field directly.

Fix this by implementing `yaml.Unmarshaler` interface for
`IndexPeriodicTableConfig`

related to: go-yaml/yaml#125

```
field path_prefix not found in type struct { Prefix string "yaml:\"prefix\""; Period model.Duration "yaml:\"period\""; Tags config.Tags "yaml:\"tags\"" }
```

**Which issue(s) this PR fixes**:
Fixes #<issue number>

**Special notes for your reviewer**:

**Checklist**
- [x] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [ ] Documentation added
- [x] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add
`add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/setup/upgrade/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in
`production/helm/loki/Chart.yaml` and update
`production/helm/loki/CHANGELOG.md` and
`production/helm/loki/README.md`. [Example
PR](d10549e)
- [ ] If the change is deprecating or removing a configuration option,
update the `deprecated-config.yaml` and `deleted-config.yaml` files
respectively in the `tools/deprecated-config-checker` directory.
[Example
PR](0d4416a)
**What this PR does / why we need it**:

This PR addresses some TODOs in the Bloom-Gateway component to reduce
allocations by pooling.

**Note for reviewers:**
- I'm not really sure about the max size for the preallocated pools.
Used 16K for all of them but that's probably too much for pools such as
the `responsesPool`, `addressesPool`... Feel free to leave a comment in
the PR with an estimate on the size you think it would make sense to set
as the max.
The `blockDownloader` does not wait until all of its worker goroutines are finished. This leads to a flaky test, see #11347

To properly fix the behaviour, the `stop()` function should wait until all workers are done.

Closes #11347

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
The queue already exposes the functionality to track connected consumers
(workers), so there is no need to have callback that is only used in
tests.

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
…1308)

This redacts the basic auth credentials from the configured URL that is
displayed when fluent-bit is started, if these are included. Go's
built-in redaction only hides the password, turning a url like
`http://user:pass@example.com` into `http://user:xxxxx@example.com`.

Fixes #10514
…g_map (#11312)

The jsonnet lib should not generate a ConfigMap for consul if you are using memberlist.

Fixes #11311
**Adds the extraContainers to the write pod**:

**Which issue(s) this PR fixes**:
This fix will allow my team to add a pods that makes modification to the
incoming logs before sending them to loki with minimal network transfer.

**Special notes for your reviewer**:
NONE

**Checklist**
- [x] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [x] Documentation added
- [x] Tests updated
- [x] `CHANGELOG.md` updated
- [x] If the change is worth mentioning in the release notes, add
`add-to-release-notes` label
- [x] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/setup/upgrade/_index.md`
- [x] For Helm chart changes bump the Helm chart version in
`production/helm/loki/Chart.yaml` and update
`production/helm/loki/CHANGELOG.md` and
`production/helm/loki/README.md`. [Example
PR](d10549e)
- [x] If the change is deprecating or removing a configuration option,
update the `deprecated-config.yaml` and `deleted-config.yaml` files
respectively in the `tools/deprecated-config-checker` directory.
[Example
PR](0d4416a)

---------

Co-authored-by: Michel Hollands <42814411+MichelHollands@users.noreply.github.com>
**What this PR does / why we need it**:
The existing `Job` abstraction has 1 stream fingerprint and compactor
was making a 1bloom and 1 block at the end of compaction.
This PR changes the `Job` struct to have a list of stream fingerprints.
At the end of compaction n streams are iterated over to create n blooms
that are appended to 1 block.
Stream fingerprints are added to the Job after checking if they are in
owned by that compactor's ring, so it is still benefiting from sharding
strategy. Resulting block will only contain series blooms that belong to
1 tsdb table.

relates to grafana/loki-private#856
**Which issue(s) this PR fixes**:
Fixes #<issue number>

**Special notes for your reviewer**:
Logic to control block size will be added in a follow up PR.
**What this PR does / why we need it**:

The main work of this PR is to improve the performance of the
reassemble() function. By changing the assignment of the 'cur' variable
to not do addition, and incrementing the 'pos' variable separately, the
benchmarks improved as follows:

BenchmarkTokens/threeSkip1Chunk/v2-10  	   88864	     13487 ns/op
BenchmarkTokens/threeSkip1ChunkAt2/v2At2-10 98607 12256 ns/op

(This was validated with multiple runs, on multiple size inputs)

The remaining changes in the PR are just to add some benchmarking for
PopulateSeriesWithBloom, in case we need to do other comparisons for
future improvements.
…apacity (#11284)

**What this PR does / why we need it**:
Adds a new config `frontend.max-query-capacity` that allows users to
configure what portion of the the available querier replicas can be used
by a tenant. `max_query_capacity` is the corresponding YAML option that
can be configured in limits or runtime overrides.

For example, setting this to 0.5 would allow a tenant to use half of the
available queriers.

This complements the existing `frontend.max-queriers-per-tenant`. When
both are configured, the smaller value of the resulting querier replica
count is considered:
```
min(frontend.max-queriers-per-tenant, ceil(querier_replicas * frontend.max-query-capacity))
```
*All* queriers will handle requests for a tenant if neither limits are
applied.

**Which issue(s) this PR fixes**:
Fixes #<issue number>

**Special notes for your reviewer**:
noticed that we don't pass down the shuffle sharding limits for frontend
(only using it with schedulers)

https://github.com/grafana/loki/blob/26f097162a856db48ecbd16bef2f0b750029855b/pkg/loki/modules.go#L895
but the
[docs](https://github.com/grafana/loki/blob/26f097162a856db48ecbd16bef2f0b750029855b/pkg/validation/limits.go#L276)
mention that`frontend.max-queriers-per-tenant` applies to frontend as
well.
```
This option only works with queriers connecting to the query-frontend / query-scheduler, not when using downstream URL.
```

**Checklist**
- [x] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [x] Documentation added
- [x] Tests updated
- [x] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add
`add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/setup/upgrade/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in
`production/helm/loki/Chart.yaml` and update
`production/helm/loki/CHANGELOG.md` and
`production/helm/loki/README.md`. [Example
PR](d10549e)
- [ ] If the change is deprecating or removing a configuration option,
update the `deprecated-config.yaml` and `deleted-config.yaml` files
respectively in the `tools/deprecated-config-checker` directory.
[Example
PR](0d4416a)

---------

Co-authored-by: J Stickler <julie.stickler@grafana.com>
Co-authored-by: Danny Kopping <danny.kopping@grafana.com>
**What this PR does / why we need it**:
Adds an error return value to PopulateSeriesWithBloom

**Which issue(s) this PR fixes**:
Fixes #<issue number>

**Special notes for your reviewer**:

**Checklist**
- [ ] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [ ] Documentation added
- [ ] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add
`add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/setup/upgrade/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in
`production/helm/loki/Chart.yaml` and update
`production/helm/loki/CHANGELOG.md` and
`production/helm/loki/README.md`. [Example
PR](d10549e)
- [ ] If the change is deprecating or removing a configuration option,
update the `deprecated-config.yaml` and `deleted-config.yaml` files
respectively in the `tools/deprecated-config-checker` directory.
[Example
PR](0d4416a)

---------

Co-authored-by: Christian Haudum <christian.haudum@gmail.com>
…1360)

At current point in time, the worker calls
`w.store.GetBlockQueriersForBlockRefs()` to obtain a list of block
queriers sorted by fingerprint in ascending order.
However, the shipper used by the store to fetch blocks by its refs,
calls a callback function when a requested block is available. Instead
of waiting for all callbacks to be called and then returning a single
list of block queriers, the store now exposes a new function `ForEach()`
that also accepts a callback that is passed to the underlying shipper.

This allows to process blocks in the worker as soon as they become
available. Some block may be served from cache, some may need to be
downloaded. That way, we reduce waiting time.

---------

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
**What this PR does / why we need it**:

This extracts the results cache from `queryrangebase` into its own pkg
so we can reuse it in other components such as the bloom-gateway without
having to import `queryrangebase`.

- Most of the logic inside
`pkg/querier/queryrange/queryrangebase/results_cache.go` now lives in
`pkg/storage/chunk/cache/results_cache/cache.go`.
- Some of the tests in
`pkg/querier/queryrange/queryrangebase/results_cache.go` are moved into
pkg/storage/chunk/cache/results_cache/cache_test.go.
- Note that here we don't have access to the types we use in
`queryrangebase` so we created a new set of mock request/response types
to test with.
…gerprints to server instances (#11336)

Instead of iterating over all fingerprints and determine their respective instance from the ring individually, this new approach takes calculates the token ranges from the instances from the ring and matches and partitions the fingerprints based on that using a binary search to determine membership to a token range.

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
This fixes an issue with the build system when the Go version is updated
in the build image and go.mod relies on the new features of the Go
version.

See failed CI run https://drone.grafana.net/grafana/loki/31777/28/2

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
With multiple buckets configured on a single AWS/S3 store, each object
is stored on one of the buckets by hashing its object key in
`bucketFromKey()`. That is sufficient for `GetObject()` and
`PutObject()`.

`List()` needs to list each bucket and combine the results. It appends
all object keys into a `storageObjects` slice. That works because each
key uniquely exists in only one of the buckets.

But that is not the case for the common prefixes; a common prefix may
exist in multiple buckets. This PR removes duplicates from the common
prefixes as gathered from the multiple buckets in `List()`.

Without this fix, we find that a repeated common prefix leads to a table
being compacted multiple times concurrently which is not safe. E.g. this
error results when the compactor concurrently tries to compact a table
and a 'losing' execution finds that its clean-up has already been done:

`level=error caller=compactor.go:128 table-name=loki_index_tsdb_19683
msg="failed to remove downloaded index file"
path=/var/loki/compactor/loki_index_tsdb_19683/1700667008-loki-write-0-1700660468255353690.tsdb
err="remove
/var/loki/compactor/loki_index_tsdb_19683/1700667008-loki-write-0-1700660468255353690.tsdb:
no such file or directory"
`
**What this PR does / why we need it**:

**Which issue(s) this PR fixes**:
Fixes #<issue number>

**Special notes for your reviewer**:

**Checklist**
- [ ] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [ ] Documentation added
- [ ] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add
`add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/setup/upgrade/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in
`production/helm/loki/Chart.yaml` and update
`production/helm/loki/CHANGELOG.md` and
`production/helm/loki/README.md`. [Example
PR](d10549e)
- [ ] If the change is deprecating or removing a configuration option,
update the `deprecated-config.yaml` and `deleted-config.yaml` files
respectively in the `tools/deprecated-config-checker` directory.
[Example
PR](0d4416a)

---------

Co-authored-by: J Stickler <julie.stickler@grafana.com>
My previous fix (part of #9672) was wrong - reverting the change

Co-authored-by: J Stickler <julie.stickler@grafana.com>
**What this PR does / why we need it**:

Since chunks don't necessarily have the same duration, we cannot assume
that the last item of a slice sorted by `From` time also has the
greatest `Through` time.

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
**What this PR does / why we need it**:

This PR adds caching to the bloom-gateway client. It uses the result
cache from #11343.

Here's how we:
- Merge responses: group all chunks by FP and remove duplicated chunk
checksums.
- Extract responses based on time span: For all chunks in each FP, add
to the extracted response only the chunks that overlaps with the desired
start and end time.
**What this PR does / why we need it**:

Adds the `__aws_log_type` label to AWS CloudWatch logs in
`lambda-promtail`.

**Which issue(s) this PR fixes**:

N/A

**Special notes for your reviewer**:

AWS S3 & Kinesis log types already have this label. The
`lambda-promtail` documentation
[here](https://github.com/grafana/loki/blob/main/docs/sources/send-data/lambda-promtail/_index.md?plain=1#L154)
suggests that CloudWatch logs has this label added as well, but in
practice it does not AFAICT.

**Checklist**
- [x] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [ ] Documentation added
- [x] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add
`add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/setup/upgrade/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in
`production/helm/loki/Chart.yaml` and update
`production/helm/loki/CHANGELOG.md` and
`production/helm/loki/README.md`. [Example
PR](d10549e)
- [ ] If the change is deprecating or removing a configuration option,
update the `deprecated-config.yaml` and `deleted-config.yaml` files
respectively in the `tools/deprecated-config-checker` directory.
[Example
PR](0d4416a)
@periklis periklis self-assigned this Dec 6, 2023
Copy link

openshift-ci bot commented Dec 6, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: periklis

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 6, 2023
**What this PR does / why we need it**:
We would like to know how long a querier worker is idle to understand if
workstealing would have an impact. The original metric was too noisy and
its cardinality was too high. Instead, we are going to log the wait
time.

**Checklist**
- [ ] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [ ] Documentation added
- [ ] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add
`add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/setup/upgrade/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in
`production/helm/loki/Chart.yaml` and update
`production/helm/loki/CHANGELOG.md` and
`production/helm/loki/README.md`. [Example
PR](d10549e)
- [ ] If the change is deprecating or removing a configuration option,
update the `deprecated-config.yaml` and `deleted-config.yaml` files
respectively in the `tools/deprecated-config-checker` directory.
[Example
PR](0d4416a)

---------

Co-authored-by: Danny Kopping <dannykopping@gmail.com>
Copy link

openshift-ci bot commented Dec 6, 2023

@periklis: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/images 5b8d0e6 link true /test images
ci/prow/ci-index-loki-operator-bundle 5b8d0e6 link true /test ci-index-loki-operator-bundle
ci/prow/test-operator 5b8d0e6 link true /test test-operator

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@periklis
Copy link
Author

periklis commented Dec 7, 2023

Replaced by #218

@periklis periklis closed this Dec 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet