Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update from upstream repository #253

Merged
merged 25 commits into from
Jan 22, 2024

Conversation

periklis
Copy link

@periklis periklis commented Jan 22, 2024

kavirajk and others added 25 commits January 14, 2024 15:10
…na#11511)

Related issue: grafana#11398

This minimal config scrape only single file thus not overloading the
systems as described in the issue
**What this PR does / why we need it**:
Adds `max_metadata_cache_freshness` to limit the metadata requests that
get cached. When configured, only metadata requests with end time before
`now - max_metadata_cache_freshness` are cacheable.

_reason for setting the default to 24h?_
metric results cache can [extract samples for the desired time range
from an
extent](https://github.com/grafana/loki/blob/b6e64e1ef1fb2a2155661c815d0198e147579c8e/pkg/querier/queryrange/queryrangebase/results_cache.go#L78)
since the samples are associated with a timestamp. But the same is not
true for metadata caching, it is not possible to extract a subset of
labels/series from a cached extent. As a result, we could return
inaccurate results, more that what was requested. for ex: returning
results from an entire 1h extent for a 5m query

Setting `max_metadata_cache_freshness` to 24h should help us avoid
caching recent data. For anything older, we would report cached metadata
results at a granularity controlled by
`split_metadata_queries_by_interval`

**Which issue(s) this PR fixes**:
Fixes #<issue number>

**Special notes for your reviewer**:

**Checklist**
- [x] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [x] Documentation added
- [x] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add
`add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/setup/upgrade/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in
`production/helm/loki/Chart.yaml` and update
`production/helm/loki/CHANGELOG.md` and
`production/helm/loki/README.md`. [Example
PR](grafana@d10549e)
- [ ] If the change is deprecating or removing a configuration option,
update the `deprecated-config.yaml` and `deleted-config.yaml` files
respectively in the `tools/deprecated-config-checker` directory.
[Example
PR](grafana@0d4416a)
**What this PR does / why we need it**:
Removed the name of a departed employee from the maintainers list.

**Which issue(s) this PR fixes**:
Fixes #<issue number>

**Special notes for your reviewer**:

**Checklist**
- [ ] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [ ] Documentation added
- [ ] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add
`add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/setup/upgrade/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in
`production/helm/loki/Chart.yaml` and update
`production/helm/loki/CHANGELOG.md` and
`production/helm/loki/README.md`. [Example
PR](grafana@d10549e)
- [ ] If the change is deprecating or removing a configuration option,
update the `deprecated-config.yaml` and `deleted-config.yaml` files
respectively in the `tools/deprecated-config-checker` directory.
[Example
PR](grafana@0d4416a)

Co-authored-by: J Stickler <julie.stickler@grafana.com>
)

**What this PR does / why we need it**:

A data race introduced in grafana#11587 was
caught in the [backport to
k184](grafana#11668). This removes the
shared state of a single global `noParserHints` in favor of creating an
empty `Hint` object for each label builder, since the `Hints` is keeping
state of `extracted` and `requiredLabels`.
…rafana#11677)

For better observability of the bloom gateway, this PR adds two
additional metrics that expose the amount of chunk refs pre and post
filtering. This can be used to calculate the filter ratio of the
gateways.

The PR also adds a metric that observes the latency of the actual
processing time of bloom filters within the worker.

---------

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
**What this PR does / why we need it**:

Currently using the Loki Chart we can only either enable\disable ALL
alert rules.
For specific environments and use-cases sometimes not all alert Rules
are useful to have enabled.

With this PR change, we can cleanly and through the Chart values disable
specific Alerts.
)

**What this PR does / why we need it**:

We originally added this limit from fuzz testing and realizing there
should be some maximum limit to an allowed query size. The original
limit was 5120 based on some internet searching and a best estimate of
what a reasonable limit would be. We have seen use cases with queries
containing a lot of filter expressions or long expanded variable names
where this limit was too small. Apparently the spec does not specify a
limit, and more internet searching suggests almost all browsers will
handle 100k+ length urls without issue
Some limit here still seems prudent however, so the new limit is now
128k.

**Which issue(s) this PR fixes**:
Fixes #<issue number>

**Special notes for your reviewer**:

**Checklist**
- [ ] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [ ] Documentation added
- [ ] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add
`add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/setup/upgrade/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in
`production/helm/loki/Chart.yaml` and update
`production/helm/loki/CHANGELOG.md` and
`production/helm/loki/README.md`. [Example
PR](grafana@d10549e)
- [ ] If the change is deprecating or removing a configuration option,
update the `deprecated-config.yaml` and `deleted-config.yaml` files
respectively in the `tools/deprecated-config-checker` directory.
[Example
PR](grafana@0d4416a)

---------

Signed-off-by: Edward Welch <edward.welch@grafana.com>
… keys (grafana#11679)

**What this PR does / why we need it**:
Follow up from grafana#11535 (specifically from [this
thread](grafana#11535 (comment))),
this PR modifies the results cache implementation to use the same
interval for generating cache keys.
**What this PR does / why we need it**:
`validate-example-configs` step is flaky and fails with the following
error: `[Text file busy](bash: line 1: ./cmd/loki/loki: Text file busy)`

looks like the recently added `validate-dev-cluster-config` runs in
parallel. since both steps run loki target first, they often end up
updating the binary when the other step is executing it.

this pr fixes this by running these steps sequentially

**Which issue(s) this PR fixes**:
Fixes #<issue number>

**Special notes for your reviewer**:

**Checklist**
- [x] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [ ] Documentation added
- [ ] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add
`add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/setup/upgrade/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in
`production/helm/loki/Chart.yaml` and update
`production/helm/loki/CHANGELOG.md` and
`production/helm/loki/README.md`. [Example
PR](grafana@d10549e)
- [ ] If the change is deprecating or removing a configuration option,
update the `deprecated-config.yaml` and `deleted-config.yaml` files
respectively in the `tools/deprecated-config-checker` directory.
[Example
PR](grafana@0d4416a)
…afana#11678)

Co-authored-by: Robert Jacob <rojacob@redhat.com>
Co-authored-by: Robert Jacob <xperimental@solidproject.de>
…ge (grafana#11692)

**What this PR does / why we need it**:
Modify our fluentbit go client to not throw away error messages for
invalid configuration values. Instead, wrap them with our existing
descriptive messages.
…ve queries (grafana#11703)

This PR ensures we retain previous TSDB heads on ingesters for one cycle
(15m) in order to serve index queries while waiting for index-gws or
queriers to sync it from storage. This bug hadn't surfaced yet (for us)
because we've historically used ~6m `chunk-retain` periods which masked
this problem (chunks were kept around so we didn't see read gaps).
Fixing this bug should allow us to avoid retaining chunks post-flush in
ingester memory in TSDB since it doesn't utilize the index-cache (which
was the other reason for retaining chunks in memory).

This is very attractive because it allows us to reduce memory pressure
in ingesters as well as reduces the data they query since now ingesters
will only return the chunk references. This allows queriers to download
chunks directly rather than ingesters iterating the chunk data for
flushed chunks themselves.

Also does some refactoring to make testing validation easier.
…_fp (grafana#11712)

While debugging some unrelated code, I found that `Seek` was improperly
implemented. It should return the first value greater or equal than the
target. Previously it would cause iterators to return different values
when reset.
This fixes various issues with docker_sd on promtail. Mostly related to
duplicate logs being send to loki from promtail.

Root case of issue is that positions file is updated after process
function, but in memory field `since` of `Target` struct is not updated.
The flaky test is caused by a race condition caused by concurrent writes
of a counter that is used to track how often a method was called.

Fixed by converting into an atomic integer.

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
**What this PR does / why we need it**:

The bloom store was initially thought as a translation layer between the
bloom gateway and the bloom shipper to be able to simplify the API of
the shipper.

However, it turned out that the store calls were only a 1-to-1 mapping
of the shipper calls and therefore adding unnecessary complexity and
call stack.

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
**What this PR does / why we need it**:

Fixes the flaky `TestMultiTenantQuery` integration test.


Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
…e it properly (grafana#11716)

**What this PR does / why we need it**:
We have a linter in place in our internal deployment tooling to catch
the usage of Unicode quotes because it has caused some issues due to
shells/tooling not recognizing them. The linter has caught one such
instance in the jsonnet in the Loki repo, which is fixed in this PR.
…afana#11499)

**What this PR does / why we need it**:

Adding feature present in mimir, specifically
grafana/mimir#5030.

Adds a config option to the query-frontend to specify a list of request
headers to include in query logs.

For example, setting
-frontend.log-query-request-headers="X-Grafana-Org-Id" and sending a
query with X-Grafana-Org-Id:1 results in query log lines that include
header_x_grafana_org_id=1.

**Which issue(s) this PR fixes**:
Fixes grafana#11422
…hema upgrade (grafana#11513)

Co-authored-by: Robert Jacob <rojacob@redhat.com>
Co-authored-by: Periklis Tsirakidis <periklis@redhat.com>
…ple blocks (grafana#11725)

Adds a few tests to aid in debugging behavior & add some more coverage.
Notably tests the `MergeBuilder` sourcing blocks with varying degrees of
duplicate data.
**What this PR does / why we need it**: Loki Gateway pod crashes on
systems that only have IPv4 enabled since PR
grafana#9510

**Which issue(s) this PR fixes**:
Fixes grafana#9681

**Special notes for your reviewer**:

**Checklist**
- [x] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [x] Documentation added
- [ ] Tests updated
- [x] `CHANGELOG.md` updated
- [x] If the change is worth mentioning in the release notes, add
`add-to-release-notes` label
- [x] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/setup/upgrade/_index.md`
- [x] For Helm chart changes bump the Helm chart version in
`production/helm/loki/Chart.yaml` and update
`production/helm/loki/CHANGELOG.md` and
`production/helm/loki/README.md`. [Example
PR](grafana@d10549e)
- [x] If the change is deprecating or removing a configuration option,
update the `deprecated-config.yaml` and `deleted-config.yaml` files
respectively in the `tools/deprecated-config-checker` directory.
[Example
PR](grafana@0d4416a)

---------

Signed-off-by: Pieter van der Giessen <pieter@pionative.com>
Co-authored-by: Michel Hollands <42814411+MichelHollands@users.noreply.github.com>
@periklis periklis self-assigned this Jan 22, 2024
Copy link

openshift-ci bot commented Jan 22, 2024

@periklis: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@periklis periklis merged commit cf44380 into openshift:main Jan 22, 2024
6 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet