New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
wip: automatic retries for transient failures #267
Conversation
return new Promise((resolve) => setTimeout(() => resolve(), ms)) | ||
} | ||
|
||
const RetryableErrorCodes = new Set([503, 504, 522, 523, 524]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments on what the code means? I think some of the semantics are CF-specific
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think 504 Gateway Timeout, 522/523/524(Cloudflare specific) are fine to retry on.
503 means that PostgreSQL is unreachable for postgrest, retrying in that case could be handled in another PR since PostgREST provides a Retry-After header. Later on 503 will also signal when postgrest's pool is overloaded.
@@ -88,6 +94,35 @@ export abstract class PostgrestBuilder<T> implements PromiseLike<PostgrestRespon | |||
return this | |||
} | |||
|
|||
retryingFetch(url: string, opts: any, retries: number, retryDelayMs: number): Promise<Response> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make it private?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make it mirror fetch:
retryingFetch(url: string, opts: any, retries: number, retryDelayMs: number): Promise<Response> { | |
retryingFetch(input: RequestInfo, init?: RequestInit, retries: number, retryDelayMs: number): Promise<Response> { |
return res.then(onfulfilled, onrejected) | ||
} | ||
|
||
processResponse = async <TResult1 = PostgrestResponse<T>>(res: Response): Promise<PostgrestResponse<T>> => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make it private
|
||
app.get('/temporary503failure', async function(request, reply) { | ||
if (count_503 < 2) { | ||
count_503++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
npm run format
if (!['GET', 'HEAD'].includes(this.method)) { | ||
return Promise.resolve(this.fetch(url, opts)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good call. However, how should we handle PATCH requests that time out as on supabase/supabase#6813?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@darora Perhaps for this type of cases, we can increase the Cloudflare timeout?
Enterprise customers can increase the 524 timeout up to 6000 seconds using the proxy_read_timeout API endpoint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If PostgREST treats PATCH (and DELETE) as semantically idempotent it should be fine, but maybe safer to do it only for reads for now just to limit the impact, and note it as a TODO.
The linked issue seems odd though - I would've expected res.ok
to handle it so that error
is not null/undefined. I may need to repro it first
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If PostgREST treats PATCH (and DELETE) as semantically idempotent it should be fin
@soedirgo They are idempotent if we stick to the definition in rfc7231:
A request method is considered "idempotent" if the intended EFFECT ON THE SERVER of multiple identical requests with that method is the same as the effect for a single such request.
It knows that repeating the request will have the same intended effect, even if the original request succeeded, though the response might differ.
The last paragraph applies(response might be different) when using Prefer: return=representation
, so a DELETE would return contents the first time but an empty array the next time is tried on the same resource. Not sure if would be safe to retry in those cases, as it could cause application errors?
The way around that, would be not forcing the Prefer: return=representation
(which IIRC, is also needed for RLS policies to work as intended without return=minimal
) on postgrest-js
, and then retry only for cases which do not specify that header.
PATCH should be safe to retry on, even with Prefer: return=representation
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
retry only for cases which do not specify that header
Yup sounds good
PATCH should be safe to retry on, even with Prefer: return=representation
Wouldn't it return different rows? I'm assuming return=representation returns the old rows
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it still may not be idempotent even with return=minimal
, because you might have a trigger that itself isn't idempotent (extreme example: on DELETE charge a Stripe payment w/ pg_net)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it return different rows? I'm assuming return=representation returns the old rows
Hm, return=representation
returns the new updated row. Or do you mean the row would have been modified by another PATCH request?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return=representation returns the new updated row
Ah I see, you can ignore that part
An update here, PostgREST main branch has a References: |
No description provided.