Add store gateway consistency check errors to errors catalog #2150

Merged
pracucci merged 6 commits into main from zenador/err-cat-store-gateway-consis-check
Jun 24, 2022

Conversation

zenador
Contributor

@zenador zenador commented Jun 20, 2022

What this PR does

Following what we've done in #2066, this PR adds common store gateway consistency check errors to the errors catalog.

Which issue(s) this PR fixes or relates to

N/A

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@zenador zenador added the type/docs Improvements or additions to documentation label Jun 20, 2022
@zenador zenador force-pushed the zenador/err-cat-store-gateway-consis-check branch from 862457e to ab3b727 Compare June 20, 2022 16:05
@zenador zenador requested a review from pracucci June 20, 2022 16:12
Comment on lines 1442 to 1444
- At query time the querier and ruler determine how old a bucket index is based on the time it was last updated by the compactor.
- If the age is older than the maximum stale period configured via `-blocks-storage.bucket-store.bucket-index.max-stale-period`, the query fails.
- This circuit breaker ensures queriers and rulers do not return any partial query results due to a stale view over the long-term storage.
Contributor

Suggested change
- At query time the querier and ruler determine how old a bucket index is based on the time it was last updated by the compactor.
- If the age is older than the maximum stale period configured via `-blocks-storage.bucket-store.bucket-index.max-stale-period`, the query fails.
- This circuit breaker ensures queriers and rulers do not return any partial query results due to a stale view over the long-term storage.
- At query time, the querier and the ruler determine how old a bucket index is based on the time that it was last updated by the compactor.
- If the age is older than the maximum stale period that is configured via `-blocks-storage.bucket-store.bucket-index.max-stale-period`, the query fails.
- This circuit breaker ensures that the queriers and rulers do not return any partial query results, due to a stale view that spans long-term storage.

Not 100% sure what "over" is trying to convey here. Do you mean, ", due to a stale overview of the entire long-term storage."?

Contributor Author

Applied your suggestion except for the last part; please check that it makes sense.

@@ -112,3 +114,7 @@ func (f *BucketIndexBlocksFinder) GetBlocks(ctx context.Context, userID string,

	return blocks, matchingDeletionMarks, nil
}

func newBucketIndexTooOldError(updatedAt time.Time, maxStalePeriod time.Duration) error {
	return errors.New(globalerror.BucketIndexTooOld.MessageWithLimitConfig(fmt.Sprintf("bucket index is too old. It was last updated at %s which exceeds the maximum allowed staleness period of %v", updatedAt.UTC().Format(time.RFC3339Nano), maxStalePeriod), tsdb.BucketIndexConfigPrefix+tsdb.MaxStalePeriodFlag))
Contributor

Suggested change
return errors.New(globalerror.BucketIndexTooOld.MessageWithLimitConfig(fmt.Sprintf("bucket index is too old. It was last updated at %s which exceeds the maximum allowed staleness period of %v", updatedAt.UTC().Format(time.RFC3339Nano), maxStalePeriod), tsdb.BucketIndexConfigPrefix+tsdb.MaxStalePeriodFlag))
return errors.New(globalerror.BucketIndexTooOld.MessageWithLimitConfig(fmt.Sprintf("The bucket index is too old. It was last updated at %s, which exceeds the maximum allowed staleness period of %v", updatedAt.UTC().Format(time.RFC3339Nano), maxStalePeriod), tsdb.BucketIndexConfigPrefix+tsdb.MaxStalePeriodFlag))

Contributor

Nit before applying this: by convention, Go error strings start with a lowercase letter.

If we're going to break that rule (which may make sense since we're building end-user facing messages here, not just internal wrappable errors), we should break it consistently across all errors.

Contributor Author

Applied except for the initial capital. If we decide to change that, let's do it in a separate PR since it would affect many errors.

Contributor

Thank you, @colega. Consistency wins in most cases. Thank you for the pushback. Good call, @zenador.

Contributor

No separate PR needed. Let’s keep this as is per @colega’s rationale.

}{
"newBucketIndexTooOldError": {
err: newBucketIndexTooOldError(time.Unix(1000000000, 0), time.Hour),
msg: `bucket index is too old. It was last updated at 2001-09-09T01:46:40Z which exceeds the maximum allowed staleness period of 1h0m0s (err-mimir-bucket-index-too-old). You can adjust the related per-tenant limit by configuring -blocks-storage.bucket-store.bucket-index.max-stale-period, or by contacting your service administrator.`,
Contributor

@osg-grafana osg-grafana Jun 20, 2022

Suggested change
msg: `bucket index is too old. It was last updated at 2001-09-09T01:46:40Z which exceeds the maximum allowed staleness period of 1h0m0s (err-mimir-bucket-index-too-old). You can adjust the related per-tenant limit by configuring -blocks-storage.bucket-store.bucket-index.max-stale-period, or by contacting your service administrator.`,
msg: "The bucket index is too old. It was last updated at 2001-09-09T01:46:40Z, which exceeds the maximum allowed staleness period of 1h0m0s (err-mimir-bucket-index-too-old). To adjust the related per-tenant limit, configure `-blocks-storage.bucket-store.bucket-index.max-stale-period`, or contact your service administrator.",

Changed from a backtick-quoted (raw) string to double quotes; please make sure I did not break things.

Contributor Author

Applied the first part. The second part of this message is part of a template, so if we decide to change that, let's do it in a separate PR since it would affect many errors.

}{
"newStoreConsistencyCheckFailedError": {
err: newStoreConsistencyCheckFailedError([]ulid.ULID{ulid.MustNew(1, nil)}),
msg: `consistency check failed because some blocks were not queried (err-mimir-store-consistency-check-failed). The blocks are: 00000000010000000000000000`,
Contributor

Suggested change
msg: `consistency check failed because some blocks were not queried (err-mimir-store-consistency-check-failed). The blocks are: 00000000010000000000000000`,
msg: `The consistency check failed because some blocks were not queried (err-mimir-store-consistency-check-failed). The blocks are: 00000000010000000000000000`,

Contributor Author

Applied except for the initial capital.
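
The consistency check message in this thread puts the error ID before the list of blocks. A small sketch of that layout follows; it takes block IDs as plain strings instead of `ulid.ULID` values to stay dependency-free, and both the function name `consistencyCheckFailedError` and the space separator between multiple IDs are assumptions that the excerpt does not confirm.

```go
package main

import (
	"fmt"
	"strings"
)

// consistencyCheckFailedError is a hypothetical stand-in for the function
// under review. It lists the blocks that were not queried after the error ID.
func consistencyCheckFailedError(blockIDs []string) error {
	return fmt.Errorf("consistency check failed because some blocks were not queried (err-mimir-store-consistency-check-failed). The blocks are: %s",
		strings.Join(blockIDs, " "))
}

func main() {
	// Same single block ID as in the test case quoted above.
	fmt.Println(consistencyCheckFailedError([]string{"00000000010000000000000000"}))
}
```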

Contributor

@osg-grafana osg-grafana left a comment

Unblocking with feedback; please have a look so things can be clearer.

@osg-grafana osg-grafana changed the title Add store gateway consistency check errors to errors catalogue Add store gateway consistency check errors to errors catalog Jun 20, 2022
Contributor

@jhesketh jhesketh left a comment

The change(s) look good to me, but I agree with the other reviewer comments.

zenador and others added 2 commits June 22, 2022 13:30
Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>
pkg/storage/tsdb/config.go: outdated, resolved
pkg/querier/blocks_store_queryable.go: outdated, resolved
pkg/querier/blocks_finder_bucket_index.go: outdated, resolved
docs/sources/operators-guide/mimir-runbooks/_index.md: outdated, resolved
docs/sources/operators-guide/mimir-runbooks/_index.md: outdated, resolved
zenador and others added 2 commits June 23, 2022 17:35
Co-authored-by: Marco Pracucci <marco@pracucci.com>
Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>
@zenador zenador force-pushed the zenador/err-cat-store-gateway-consis-check branch from db27a98 to 7e39f62 Compare June 23, 2022 10:12
@zenador
Contributor Author

zenador commented Jun 23, 2022

@osg-grafana Please check the latest changes when you're free, thank you!

Collaborator

@pracucci pracucci left a comment

Great job, LGTM! I just left a couple of final nits.

docs/sources/operators-guide/mimir-runbooks/_index.md: outdated, resolved
docs/sources/operators-guide/mimir-runbooks/_index.md: outdated, resolved
@osg-grafana
Contributor

"Ship it!" :)

Co-authored-by: Marco Pracucci <marco@pracucci.com>
@pracucci pracucci enabled auto-merge (squash) June 24, 2022 08:52
@pracucci pracucci merged commit 5a78ece into main Jun 24, 2022
@pracucci pracucci deleted the zenador/err-cat-store-gateway-consis-check branch June 24, 2022 09:20
masonmei pushed a commit to udmire/mimir that referenced this pull request Jul 11, 2022
…#2150)

* Add store gateway consistency check errors to errors catalogue

* Apply suggestions from code review

Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>

* Manually apply code review changes

* Apply suggestions from code review

Co-authored-by: Marco Pracucci <marco@pracucci.com>
Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>

* Update based on changes from code review

* Apply suggestions from code review

Co-authored-by: Marco Pracucci <marco@pracucci.com>

Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>
Co-authored-by: Marco Pracucci <marco@pracucci.com>
Labels
type/docs Improvements or additions to documentation
5 participants