storage controller: log hygiene & better error type #7508

Merged
jcsp merged 5 commits into main from jcsp/storcon-shutdown-errors
Apr 26, 2024

Conversation

jcsp
Contributor

@jcsp jcsp commented Apr 25, 2024

These are testability/logging improvements spun off from #7475

  • Don't log warnings for shutdown errors in compute hook
  • Revise logging around heartbeats and reconcile_all so that we aren't emitting such a large volume of INFO messages under normal, quiet conditions.
  • Clean up the last_error of TenantShard to hold a ReconcileError instead of a String, and use that properly typed error to suppress reconciler cancel errors during reconcile_all_now. This is important for tests that iteratively call that, as otherwise they would get 500 errors when some reconciler in flight was cancelled (perhaps due to a state change on the tenant shard starting a new reconciler).
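The third point can be illustrated with a minimal sketch. This is not the storage controller's actual code: the `ReconcileError` variants, the `TenantShard` fields, and the `reportable_errors` helper are all hypothetical stand-ins, chosen only to show why a typed `last_error` makes it possible to skip cancellations, which a plain `String` could only do by string matching.

```rust
use std::fmt;

/// Hypothetical stand-in for the typed reconcile error (variants are assumed).
#[derive(Debug, Clone, PartialEq, Eq)]
enum ReconcileError {
    /// The reconciler was cancelled, e.g. because a newer one was started.
    Cancelled,
    /// Any other failure, carrying a human-readable description.
    Other(String),
}

impl fmt::Display for ReconcileError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ReconcileError::Cancelled => write!(f, "Cancelled"),
            ReconcileError::Other(msg) => write!(f, "{msg}"),
        }
    }
}

/// Hypothetical, heavily simplified shard record: only the fields needed here.
struct TenantShard {
    id: String,
    last_error: Option<ReconcileError>,
}

/// Collect the errors that should fail a reconcile_all_now-style call.
/// Cancellations are skipped, because a cancelled reconciler usually means
/// another one has taken over rather than that something went wrong.
fn reportable_errors(shards: &[TenantShard]) -> Vec<(String, ReconcileError)> {
    shards
        .iter()
        .filter_map(|shard| match &shard.last_error {
            None | Some(ReconcileError::Cancelled) => None,
            Some(err) => Some((shard.id.clone(), err.clone())),
        })
        .collect()
}

fn main() {
    let shards = vec![
        TenantShard { id: "shard-0".into(), last_error: None },
        TenantShard { id: "shard-1".into(), last_error: Some(ReconcileError::Cancelled) },
        TenantShard { id: "shard-2".into(), last_error: Some(ReconcileError::Other("timeout".into())) },
    ];
    for (id, err) in reportable_errors(&shards) {
        println!("Reconcile error on shard {id}: {err}");
    }
}
```

With a `String` in `last_error`, the same filter would have to compare message text, which breaks as soon as the wording changes.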

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat commit message to not include the above checklist

@jcsp jcsp added a/tech_debt Area: related to tech debt c/storage/controller Component: Storage Controller labels Apr 25, 2024
@jcsp jcsp requested a review from VladLazar April 25, 2024 07:06
@jcsp jcsp requested a review from a team as a code owner April 25, 2024 07:06
@jcsp jcsp changed the title from "Jcsp/storcon shutdown errors" to "storage controller: log hygiene & better error type" Apr 25, 2024
github-actions bot commented Apr 25, 2024

2790 tests run: 2670 passed, 0 failed, 120 skipped (full report)


Flaky tests (4)

Postgres 16

  • test_compute_pageserver_connection_stress: debug
  • test_pageserver_metrics_removed_after_detach: debug

Postgres 15

  • test_partial_evict_tenant[relative_equal]: release

Postgres 14

  • test_timeline_size_quota_on_startup: release

Code coverage* (full report)

  • functions: 28.1% (6484 of 23066 functions)
  • lines: 46.9% (46073 of 98228 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
caad0a9 at 2024-04-26T08:27:44.710Z :recycle:

Contributor

@VladLazar VladLazar left a comment

Looks good, but it made a test unhappy

storage_controller/src/service.rs (outdated, resolved)
@jcsp jcsp requested a review from VladLazar April 25, 2024 16:58
@jcsp jcsp enabled auto-merge (squash) April 25, 2024 17:29
@jcsp jcsp merged commit d63185f into main Apr 26, 2024
53 checks passed
@jcsp jcsp deleted the jcsp/storcon-shutdown-errors branch April 26, 2024 08:16
jcsp added a commit that referenced this pull request Apr 30, 2024
## Problem

This test became flaky recently with failures like:
```
AssertionError: Log errors on storage_controller: (129, '2024-04-29T16:41:03.591506Z ERROR request{method=PUT path=/control/v1/tenant/b38c0447fbdbcf4e1c023f00b0f7c221/shard_split request_id=34df4975-2ef3-4ed8-b167-2956650e365c}: Error processing HTTP request: InternalServerError(Reconcile error on shard b38c0447fbdbcf4e1c023f00b0f7c221-0002: Cancelled\n')
```

Likely due to #7508 changing how errors are reported from Reconcilers.

## Summary of changes

- Tolerate `Reconcile error.*Cancelled` log errors
Labels
a/tech_debt Area: related to tech debt c/storage/controller Component: Storage Controller
2 participants