Check if CRDB is telling us to retry the txn #1683

Conversation
Match against `DieselDatabaseErrorKind::SerializationFailure` and check if it's a case where a CRDB transaction didn't fail outright but needs to be retried. This can happen when there's too much database contention, and it requires clients to retry the request. Return HTTP 429 in this case to signal to clients that a retry is required.

Closes https://github.com/oxidecomputer/remote-access-preview/issues/23
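For reference, a minimal sketch of the kind of check described above, assuming Diesel's `DatabaseErrorKind::SerializationFailure` variant and CockroachDB's convention of prefixing client-retryable errors with "restart transaction"; the helper name is hypothetical:

```rust
use diesel::result::{DatabaseErrorKind, Error as DieselError};

/// Hypothetical helper: returns true when CockroachDB reports a
/// serialization failure (SQLSTATE 40001) that it expects the client
/// to retry, as opposed to a transaction that failed outright.
fn is_crdb_retryable(err: &DieselError) -> bool {
    match err {
        DieselError::DatabaseError(DatabaseErrorKind::SerializationFailure, info) => {
            // CockroachDB prefixes retryable errors with "restart transaction".
            info.message().starts_with("restart transaction")
        }
        _ => false,
    }
}
```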
429 doesn't seem quite right.

Admittedly it's hard to know what to do here. This is basically a spurious error, not the client's fault. But it's hard to retry it automatically -- it seems like it might depend on the specific operation we're doing. Maybe a 500 or 503 is the best default? Looking at oxidecomputer/remote-access-preview#23, I see we hit this in practice.
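For illustration, a hedged sketch of what the 503 alternative could look like at the HTTP layer, built directly on the `http` crate rather than Omicron's actual error types: a `Retry-After` header tells clients the failure is transient and retryable, without blaming them (429) or hiding the cause (plain 500).

```rust
use http::{Response, StatusCode};

// Illustrative only: surface a CRDB "please retry" error as a 503 with
// a Retry-After hint. The status says "transient server-side problem";
// the header tells well-behaved clients when to try again.
fn serialization_retry_response() -> Response<String> {
    Response::builder()
        .status(StatusCode::SERVICE_UNAVAILABLE)
        .header("Retry-After", "1")
        .body("transaction conflict; please retry the request".to_string())
        .expect("static response parts are valid")
}
```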
Do we know which record was experiencing contention? Maybe we can rework the queries to avoid it? CC @smklein -- at some point we were talking about using batched transactions; it sounds like it might address this problem as well.
I believe the big `region_allocate` transaction is the problem here. Sending all the statements as a single batch is something to try, but I don't immediately see how we can take that arbitrary transaction function and do that.
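For reference, the shape of the problem: an interactive transaction is an arbitrary closure whose later statements branch on earlier results, so there is no static statement list to ship as a batch. A hypothetical sketch against Diesel 2.x's `Connection::transaction` (the table and logic are invented for illustration, not Omicron's real schema):

```rust
use diesel::prelude::*;

diesel::table! {
    dataset (id) {
        id -> Int8,
        size_used -> Int8,
    }
}

/// Hypothetical interactive transaction: the UPDATE's target depends on
/// the SELECT's result, so the statements can't all be known up front.
fn allocate(conn: &mut PgConnection, bytes: i64) -> diesel::QueryResult<()> {
    conn.transaction(|conn| {
        use self::dataset::dsl::*;

        // Round-trip 1: choose a dataset; the decision lives client-side.
        let target: i64 = dataset.select(id).order(size_used.asc()).first(conn)?;

        // Round-trip 2: this UPDATE depends on what we just read, so it
        // can't be batched ahead of the transaction.
        diesel::update(dataset.filter(id.eq(target)))
            .set(size_used.eq(size_used + bytes))
            .execute(conn)?;
        Ok(())
    })
}
```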
That makes sense. Are you able to tell which query or record we're hitting this with? I know we don't have an alternative to interactive transactions in Omicron today. Nevertheless, this looks like another serious consequence of using them.

Retrying doesn't feel like a great long-term solution. It seems to replace a transient error (the retry error) with a latency bubble. As load increases and either Nexus or database latency increases, the window for a conflict gets longer. This increases the probability of a conflict, which results in more retries and more load -- it's a feedback loop. I don't know if this is likely in this scenario, but similar behaviors brought down Joyent's Manta several times.

We could reduce the likelihood of a conflict by reworking the function a bit. For example, it sequentially fetches each dataset and updates it, then does the same with zpools. We could do all the fetches at once and save maybe half of the round-trips? We could also introduce some randomness into the dataset selection so that not every concurrent attempt to create a disk tries to use (and update) the same dataset records. I'm not sure how much either of these will help.

Even without solving the interactive transaction problem, I bet we could improve this significantly by moving more of the logic into SQL using a CTE. (A CTE is sort of a side-door way to create a batched transaction, in that if you want to do X, then Y, then Z, you can always express X, then Y, then Z as named subqueries of a single statement.) Or we could bite the bullet and invest in an interface for batched transactions. I think @smklein talked about this at some point, but I don't remember if there's an issue for it?

Whether using a CTE or a batched transaction, there could be a sizable improvement in provisioning time. If each database round-trip is even 50ms, doing it in one go could shave off hundreds of milliseconds per disk per instance provisioned.
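A hypothetical sketch of the CTE idea (the schema and constants are illustrative, not Omicron's real `dataset` table): each `WITH` clause names an intermediate result that later clauses can reference, so the fetch-then-update sequence collapses into a single statement. It also folds in the randomized-selection idea from above.

```rust
use diesel::query_builder::SqlQuery;
use diesel::sql_query;

/// Hypothetical CTE: pick a few random candidate datasets (the
/// randomness spreads concurrent allocations across rows, reducing
/// contention), then update their accounting -- all in one statement
/// instead of sequential fetch/update round-trips.
fn allocate_regions_query() -> SqlQuery {
    sql_query(
        "WITH candidates AS (
             SELECT id FROM dataset
             WHERE size_used + $1 <= size_total
             ORDER BY random()
             LIMIT 3
         ),
         updated AS (
             UPDATE dataset
             SET size_used = size_used + $1
             WHERE id IN (SELECT id FROM candidates)
             RETURNING id
         )
         SELECT id FROM updated",
    )
}
```

When executed, `$1` would be bound to the requested size; the design point is that the CTE's named subqueries play the role the interactive transaction currently assigns to Rust variables.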
Related: #973
I agree with @davepacheco; my inclination is either: (1) make this a CTE, or (2) make it possible for this to be a batched transaction. However, for this to be usable as a batched transaction, all the statements must be usable without depending on the results of previous operations: https://www.cockroachlabs.com/docs/stable/transactions.html#batched-statements. I'm not sure this is viable here -- the operations are roughly: fetch each candidate dataset, update it, then do the same with the zpools. These operations are highly dependent on each other -- so even if we added batched transactions, I don't know if this would be a viable candidate.
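For contrast, a hedged sketch of a batched transaction using the `postgres` crate's `batch_execute` (the statements themselves are hypothetical): the whole batch ships in one round-trip, which is precisely why no statement can depend on the results of an earlier one.

```rust
use postgres::{Client, Error};

// Hypothetical statements: all are fixed up front, so they can be sent
// as one batch in a single round-trip. Nothing here can branch on the
// result of an earlier statement -- the limitation discussed above.
fn batched_allocation(client: &mut Client) -> Result<(), Error> {
    client.batch_execute(
        "BEGIN;
         UPDATE dataset SET size_used = size_used + 1024 WHERE id = 1;
         UPDATE zpool SET size_used = size_used + 1024 WHERE id = 2;
         COMMIT;",
    )
}
```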
Yeah. I'm not sure it's impossible, but a CTE may be easier because it lets you give names to those intermediate things and then use them in subsequent statements.
Closing this because of #1688.