Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: mixin generation when cluster label is changed #12613

Merged
merged 5 commits into from
Apr 19, 2024

Conversation

QuentinBisson
Copy link
Contributor

What this PR does / why we need it:

This PR fixes mixin generation when the per_cluster_label value is changed

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • Title matches the required conventional commits format, see here
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

@QuentinBisson QuentinBisson requested a review from a team as a code owner April 15, 2024 10:02
@QuentinBisson QuentinBisson changed the title Fix mixin generation when cluster label is changed fix: mixin generation when cluster label is changed Apr 15, 2024
@QuentinBisson QuentinBisson force-pushed the fix-cluster-label branch 3 times, most recently from c92564a to eb1a6e4 Compare April 15, 2024 11:46
@QuentinBisson
Copy link
Contributor Author

@QuentinBisson
Copy link
Contributor Author

@MichelHollands could you take a look at this?

Signed-off-by: QuentinBisson <quentin@giantswarm.io>
@QuentinBisson
Copy link
Contributor Author

@JStickler Do you know who could help review this PR?

Copy link
Contributor

@cstyan cstyan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's enough changes here that we would want to test deploying these dashboards ourselves with your changes to check that nothing is broken, I don't have time to do that myself this week but I know @shantanualsi has been doing some dashboard work so maybe he can test them out?

Comment on lines 64 to 66
message: std.strReplace(|||
{{ $labels.cluster }} {{ $labels.namespace }} has had {{ printf "%.0f" $value }} compactors running for more than 5m. Only one compactor should run at a time.
|||,
|||, 'cluster', $._config.per_cluster_label),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this will work as written, it will look specifically for cluster in the string when in reality the values for the labels would be something like prod-<region> and loki-prod-#, so we would have prod-<region> loki-prod-# has had <some value> compactors running for more than 5m as the message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my tests here, it actually replaces the label correctly in the rule https://github.com/QuentinBisson/loki-upstream/pull/1/files#diff-147d57f875c8b29ffd1bfb7026b4d16da82db9df6b9a2d7dfb1cb82d18fa50a4R36.

I tried to use the % syntax with %s but it did not work due to the printf "%.0f" $value

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah right, the jsonnet strReplace is happening earlier than I thought

is that still not incorrect though? in your code within config.libsonnet the cluster_label is still cluster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah okay! looks correct then

Comment on lines +64 to +79
// Replace the recording rules cluster label with the per-cluster label
then std.strReplace(
// Replace the cluster label for equality matchers with the per-cluster label
std.strReplace(
// Replace the cluster label for regex matchers with the per-cluster label
std.strReplace(
expr,
'cluster=~"$cluster"',
$._config.per_cluster_label + '=~"$cluster"'
),
'cluster="$cluster"',
$._config.per_cluster_label + '="$cluster"'
),
'cluster_',
$._config.per_cluster_label + '_'
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just noting that having to do something like this is yet another reason for us to try and get rid of these giant copy/paste json dashboards at some point

@@ -9,10 +9,9 @@ local utils = import 'mixin-utils/utils.libsonnet';
local cfg = self,

showMultiCluster:: true,
clusterLabel:: $._config.per_cluster_label,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we removing this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed it because it is unused

@@ -31,10 +31,9 @@ local utils = import 'mixin-utils/utils.libsonnet';
local cfg = self,

showMultiCluster:: true,
clusterLabel:: $._config.per_cluster_label,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here and others

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, I did not see it being used

@QuentinBisson
Copy link
Contributor Author

Also @cstyan and @shantanualsi this #12567 might be easier to review as it only fixes the compactor pod matcher which is currently broken

@QuentinBisson
Copy link
Contributor Author

And thanks for the review :)

for: 5m
labels:
severity: warning
- name: loki_alerts
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these indentation changes intentional or from your editor? can you please keep the original indentation so as to keep this diff clean? thank you!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those happen when we generate the alerts from the mixins so I can realign them sure but they probably hsould not have been indented at all :D

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you want me to do?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the confusion here was caused by differences in jsonnet versions that were generating the compiled mixin, that's my fault. I added the github action to check the jsonnet linting and compiled mixins, which is using a different version of jsonnet within the build image. So unfortunately make loki-mixin and make BUILD_IN_CONTAINER=false loki-mixin will get you different results at the moment. I'm working on resolving that today.

@shantanualsi
Copy link
Contributor

shantanualsi commented Apr 19, 2024

@QuentinBisson I'll try and spend some time testing these changes today on our dashboards. Meanwhile, when you get some time, can you resolve the conflicts on this PR?
Thank you!

@pull-request-size pull-request-size bot added size/M and removed size/L labels Apr 19, 2024
@QuentinBisson
Copy link
Contributor Author

@shantanualsi thanks for reviewing those :) I fixed the conflicts and the list of changes is a lot less :D

@cstyan
Copy link
Contributor

cstyan commented Apr 19, 2024

Had a chance to test this out locally, no obvious breakages so lets merge and we can deal with any issues via follow up PRs

@cstyan cstyan merged commit 1ba7a30 into grafana:main Apr 19, 2024
15 checks passed
@QuentinBisson
Copy link
Contributor Author

Thank you for the merge :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants