
Change table queue to in-memory queue and add callbacks #62

Open
steve-chavez opened this issue Feb 15, 2022 · 11 comments
Labels
enhancement New feature or request

Comments

@steve-chavez
Member

Reasons

  • For a request to finish, an INSERT must be done on the _http_response table, which reduces throughput. There are cases where the client doesn't care about the response, so it doesn't need to be persisted. For example, function hooks don't do anything with the response.
  • As discussed in feat: libuv #50 (comment), we don't have to ensure delivery and do retries since we're just an HTTP client, so we don't really need to persist the requests into http_request_queue.
  • The http_request_queue table can grow big with many requests and cause production issues (internal link).
  • Users create FKs to our private tables (see "Handle user referencing private tables" #44), making the bg worker crash when trying to do the TTL.

Proposal

Drop the _http_response and http_request_queue tables and instead use an in-memory queue, plus add two callbacks:

create or replace function net.http_get(
   url text,
-- ..,
   success_cb text default '',
   error_cb text default ''
)
returns void -- no id is returned now
as $$
-- ...
$$ language sql;

Which can be used like:

select net.http_get(
  url := 'http://domain.com'
, success_cb := $$ insert into my_table values ($1, $2, $3) $$ -- $1=status, $2=headers, $3=body
, error_cb := $$ do $_$begin raise exception 'Failed request on %: "%"', $1, $2; end$_$; $$ -- $1=url, $2=error_message
);

Pros

  • The callbacks are optional, so we don't have to start a transaction for each request if there are none provided. Throughput should be greatly increased from this.
  • The error_cb can also be an insert/update on a table, so the request can be retried if needed.
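The retry-on-error idea from the last bullet could look like this (a sketch against the proposed API; `failed_requests` is a hypothetical user table):

```sql
-- hypothetical user table collecting failed requests for later retry
create table failed_requests (
  url           text,
  error_message text,
  failed_at     timestamptz default now()
);

select net.http_get(
  url := 'http://domain.com'
, error_cb := $$ insert into failed_requests(url, error_message) values ($1, $2) $$
);

-- a periodic job (e.g. pg_cron) can then re-issue the requests
-- recorded in failed_requests and delete the rows that succeed
```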

@olirice @soedirgo WDYT?

@steve-chavez steve-chavez added the enhancement New feature or request label Feb 15, 2022
@soedirgo
Member

Ohh, so that's what you meant by callbacks. Nice! Also like that this makes the impl a lot simpler and more akin to typical HTTP clients.

How would you go about implementing the in-memory queue? IIRC, when I looked into it, passing info from the client process to the bgworker process was nontrivial.

@olirice
Collaborator

olirice commented Feb 15, 2022

that looks great!

Is it correct that all http status codes (e.g. including 500) would still go through the success_cb and the error_cb applies to stuff like timeouts?


returns void -- no id is returned now

I think it would still be good to return a uuid and pass that through to the callbacks i.e.

select net.http_get(
  ...
, success_cb := $$ -- $1=status, $2=headers, $3=body, $4=request_id $$
, error_cb := $$ -- $1=url, $2=error_message, $3=request_id $$
); -- result is uuid (request_id)

so if people are firing off a large number of requests, or want to associate a request with a response, they'll have the tools they need


I know pg_cron does a similar inline SQL approach, but AFAIK they don't have to deal with passing parameters. Are we sure there's a good way to do that?

if not, a backup solution might be

create or replace function net.http_get(
-- ..,
   success_cb regproc default null,
   error_cb regproc default null
)
...
as $$
-- ...
$$ language sql;

where the function signature matches the callback requirements

@soedirgo
Member

I know pg_cron does a similar inline sql approach but afaik they don't have to deal with passing parameters. Are we sure theres a good way to do that?

Should be possible with the USING clause, though we should probably document that the callbacks are limited to plpgsql command strings.
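A sketch of how the worker side could bind the response values into the user's command string with plpgsql's EXECUTE ... USING (illustrative only; the table name and literal values here are placeholders, not the actual implementation):

```sql
-- run by the background worker after a request completes;
-- the string passed to EXECUTE is the user-supplied success_cb,
-- and $1/$2/$3 in it are bound positionally via the USING clause
do $_$
begin
  execute 'insert into my_table values ($1, $2, $3)'
  using 200,                                        -- $1 = status
        '{"Content-Type": "text/html"}'::jsonb,     -- $2 = headers
        '<html>...</html>';                         -- $3 = body
end
$_$;
```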

@steve-chavez
Member Author

How would go about implementing the in-memory queue?

There's an example in postgres/src/test/modules/test_shm_mq for how to use the shared memory queue.

Is it correct that all http status codes (e.g. including 500) would still go through the success_cb and the error_cb applies to stuff like timeouts?

Yes, correct. Things like "unable to resolve host" as well.

so if people are firing off a large number of requests, or want to associate a request with a response they'll have the tools they need

I think we should leave that to be done externally, users can pass an id on the callback like:

select net.http_get(
  -- url, headers, body..
, success_cb := format($$insert into responses values(%s, $1, $2, $3)$$, id)
);

And this id can come from a trigger on a table or maybe from a sequence used in a wrapper function. With that they can still associate a request/response if they want.
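The wrapper-function-plus-sequence idea could be sketched like so (hypothetical names throughout; assumes the proposed net.http_get signature, and note the outer function body uses a distinct dollar-quote tag so the inner $$ callback string nests cleanly):

```sql
create sequence request_id_seq;

-- hypothetical wrapper: tags each request with a sequence-generated id
-- so the response row in a user table can be correlated later
create function my_http_get(_url text) returns bigint as $fn$
declare
  _id bigint := nextval('request_id_seq');
begin
  perform net.http_get(
    url := _url
  , success_cb := format($$insert into responses values(%s, $1, $2, $3)$$, _id)
  );
  return _id;  -- caller keeps this id to look up the response
end
$fn$ language plpgsql;
```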

I know pg_cron does a similar inline sql approach but afaik they don't have to deal with passing parameters. Are we sure theres a good way to do that?

Yes, SPI_execute_with_args should take care of that.

I've also thought about supporting something like insert into tbl($status, $headers, $body) for better DX, but that would require parsing and we'd lose performance. So I've kept it simple at this stage.

@olirice
Collaborator

olirice commented Feb 15, 2022

Should be possible with the USING clause

cool! didn't know about that

I think we should leave that to be done externally, users can pass an id on the callback like:

select net.http_get(
  -- url, headers, body..
, success_cb := format($$insert into responses values(%s, $1, $2, $3)$$, id)
);

And this id can come from a trigger on a table or maybe from a sequence used in a wrapper function. With that they can still associate a request/response if they want.

ah nice, that works

or maybe from a sequence used in a wrapper function. With that they can still associate a request/response if they want.

good point

@FelixZY

FelixZY commented Aug 7, 2022

I'm thinking that moving to an in-memory queue would allow for HTTP requests during PostgREST GET requests as well?

During GET requests, PostgREST uses a read-only transaction, during which INSERTs and other modifications are not allowed. I'm currently having trouble making HTTP requests as part of my policy for performing SELECTs on some tables due to this.

@steve-chavez
Member Author

I'm thinking that moving to an in-memory queue would allow for http requests during postgrest GET requests as well?

Yes, since an http_get wouldn't need an INSERT on the net.http_request_queue anymore.

@steve-chavez
Member Author

steve-chavez commented Aug 30, 2022

We would lose observability if we go this route, currently customers can do:

-- requests with fail/success HTTP status codes
select count(*), status_code from net._http_response group by status_code;
 count | status_code
-------+-------------
   448 |         429
   821 |         500
  1182 |         408
  2567 |         200

-- rate limiting error
select content from net._http_response where status_code = 429 limit 1;
                          content
------------------------------------------------------------
 <html><body><h1>429 Too Many Requests</h1>                +
 You have sent too many requests in a given amount of time.+
 </body></html>

Which is great for debugging and finding out why a webhook receiver might be failing.

To increase throughput, we could make the net tables UNLOGGED and remove indexes and PKs from them (this would also solve #44).
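The UNLOGGED change is a one-liner per table (sketch; ALTER TABLE ... SET UNLOGGED is available since PostgreSQL 9.5):

```sql
-- skip WAL for the queue/response tables to cut write overhead;
-- their contents are lost on a crash, which is acceptable for
-- transient request/response data
alter table net.http_request_queue set unlogged;
alter table net._http_response set unlogged;
```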


An alternative could be logging the failed http requests. Then the user would have to search the pg logs.

@soedirgo
Member

+1 for just logging errors, in general making it as stateless as possible should help with stability.

@steve-chavez
Member Author

Yeah, agree. Also forgot about #49, observability would be better with logging.

@bigbear-ryan
Copy link

Any plans to implement this new feature in the near future?
