fix: remote timeline client shutdown trips circuit breaker #8495

Merged 9 commits on Jul 25, 2024

@problame problame commented Jul 24, 2024

Before this PR

1. The circuit breaker would trip on CompactionError::Shutdown. That's wrong; we want to ignore those cases.
2. Remote timeline client shutdown would not be mapped to CompactionError::Shutdown in all circumstances.

We observed this in staging, see https://neondb.slack.com/archives/C033RQ5SPDH/p1721829745384449

This PR fixes (1) with a simple match statement, and (2) by switching a bunch of anyhow usage over to distinguished errors that ultimately get mapped to CompactionError::Shutdown.
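
A minimal sketch of fix (1), not the actual pageserver code: the `CircuitBreaker` type, its methods, and `record_outcome` are hypothetical stand-ins, and the shutdown variant is spelled `ShuttingDown` as in the commit title further down. It only illustrates the idea of a match that leaves the breaker untouched on shutdown.

```rust
/// Hypothetical stand-in for the real CompactionError.
#[derive(Debug)]
enum CompactionError {
    /// The timeline or its remote client is shutting down; not a real failure.
    ShuttingDown,
    /// Any other failure; this one should count against the breaker.
    Other(String),
}

/// Hypothetical, simplified breaker: "broken" after `threshold` consecutive failures.
struct CircuitBreaker {
    failures: u32,
    threshold: u32,
}

impl CircuitBreaker {
    fn new(threshold: u32) -> Self {
        Self { failures: 0, threshold }
    }
    fn fail(&mut self) {
        self.failures += 1;
    }
    fn success(&mut self) {
        self.failures = 0;
    }
    fn is_broken(&self) -> bool {
        self.failures >= self.threshold
    }
}

/// Feed one compaction result into the breaker.
fn record_outcome(breaker: &mut CircuitBreaker, result: &Result<(), CompactionError>) {
    match result {
        Ok(()) => breaker.success(),
        // Shutdown is expected during teardown: neither a success nor a failure.
        Err(CompactionError::ShuttingDown) => {}
        Err(CompactionError::Other(_)) => breaker.fail(),
    }
}
```

Because the match is exhaustive, adding a new error variant later forces a decision about whether it should trip the breaker.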

I removed the implicit #[from] conversion from anyhow::Error to CompactionError::Other to discover all the places that were mapping remote timeline client shutdown to anyhow::Error.

In my opinion #[from] is an antipattern and we should avoid it, especially for anyhow::Error. If some callee is going to return anyhow, the very least the caller should do is acknowledge, through a map_err(MyError::Other), that they're conflating different failure reasons.
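
To illustrate the point, here is a small sketch using only the standard library. `OpaqueError` stands in for anyhow::Error, and `QueueShutdown`, `schedule_upload`, `read_layer`, and `compact` are hypothetical names, not functions from this PR. With no `From` impl on the error enum, a bare `?` on an opaque error does not compile, so every conflation has to be spelled out with `map_err`.

```rust
use std::error::Error;
use std::fmt;

/// Stand-in for anyhow::Error so the sketch needs no external crates.
type OpaqueError = Box<dyn Error + Send + Sync>;

/// Hypothetical distinguished error: the upload queue has been shut down.
#[derive(Debug)]
struct QueueShutdown;

impl fmt::Display for QueueShutdown {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "upload queue is shut down")
    }
}

impl Error for QueueShutdown {}

#[derive(Debug)]
enum CompactionError {
    ShuttingDown,
    Other(OpaqueError),
}
// Deliberately no `impl From<OpaqueError> for CompactionError`.

/// Hypothetical callee that reports shutdown with its own error type.
fn schedule_upload(shutting_down: bool) -> Result<(), QueueShutdown> {
    if shutting_down { Err(QueueShutdown) } else { Ok(()) }
}

/// Hypothetical callee that still returns an opaque error.
fn read_layer(ok: bool) -> Result<u64, OpaqueError> {
    if ok { Ok(42) } else { Err("layer read failed".into()) }
}

fn compact(shutting_down: bool, layer_ok: bool) -> Result<u64, CompactionError> {
    // The distinguished error maps to the shutdown variant, visibly.
    schedule_upload(shutting_down).map_err(|_: QueueShutdown| CompactionError::ShuttingDown)?;
    // The opaque error is lumped into Other, and the call site says so.
    let n = read_layer(layer_ok).map_err(CompactionError::Other)?;
    Ok(n)
}
```

The cost is a little more typing at each call site; the benefit is that a shutdown can no longer slip into `Other` unnoticed.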

The price to pay is one .expect() where previously we would bail into an anyhow error.
We can simplify because the rationale from #5880 doesn't apply anymore: VirtualFile doesn't have transient failures.

Private DM link with Joonas: https://neondb.slack.com/archives/D049K7HJ9JM/p1721836424615799
… to CompactionError::ShuttingDown

This PR is the product of removing the implicit #[from] conversion from anyhow => Other
downloaded.get() can be a simple map to CompactionError::Other, no need for that refactoring

This reverts commit e017aca.

3126 tests run: 3005 passed, 0 failed, 121 skipped (full report)


Code coverage* (full report)

  • functions: 32.7% (7001 of 21410 functions)
  • lines: 50.1% (55675 of 111059 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
6148081 at 2024-07-24T17:54:42.518Z :recycle:

@problame problame marked this pull request as ready for review July 25, 2024 08:24
@problame problame requested a review from a team as a code owner July 25, 2024 08:24
@problame problame requested review from VladLazar and jcsp and removed request for VladLazar July 25, 2024 08:24

jcsp commented Jul 25, 2024

> In my opinion #[from] is an antipattern and we should avoid it, especially for anyhow::Error

+100

@problame problame enabled auto-merge (squash) July 25, 2024 08:43
@problame problame merged commit a1256b2 into main Jul 25, 2024
66 checks passed
@problame problame deleted the problame/uploadqueue-shutdown-trips-circuit-breaker branch July 25, 2024 08:44
Comment on lines 771 to 773
    return Err(LoadError::Io(
        anyhow::Error::new(e).context("open layer file"),
    ));

This could be LoadError::Open, second one could be LoadError::Read, third could be LoadError::Corruption, none need an anyhow...
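
One way the suggestion above might look, as a rough sketch only: the variant payloads, the `load_layer` function, and the "empty file" corruption check are assumptions made for illustration and are not taken from the PR's diff.

```rust
use std::fmt;
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Hypothetical shape of LoadError with one variant per failure reason.
#[derive(Debug)]
enum LoadError {
    /// Opening the layer file failed.
    Open(io::Error),
    /// Reading from the opened file failed.
    Read(io::Error),
    /// The bytes were read but do not form a valid layer.
    Corruption(String),
}

impl fmt::Display for LoadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LoadError::Open(e) => write!(f, "open layer file: {e}"),
            LoadError::Read(e) => write!(f, "read layer file: {e}"),
            LoadError::Corruption(msg) => write!(f, "corrupt layer file: {msg}"),
        }
    }
}

impl std::error::Error for LoadError {}

/// Hypothetical loader; each step maps its failure to its own variant.
fn load_layer(path: &Path) -> Result<Vec<u8>, LoadError> {
    let mut file = File::open(path).map_err(LoadError::Open)?;
    let mut buf = Vec::new();
    file.read_to_end(&mut buf).map_err(LoadError::Read)?;
    // Placeholder validity check, assumed for the sketch.
    if buf.is_empty() {
        return Err(LoadError::Corruption("empty layer file".to_string()));
    }
    Ok(buf)
}
```

The context that `.context("open layer file")` used to attach moves into the variant name and its `Display` text, so callers can match on the failure reason instead of parsing a message.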
