
Change table queue to in-memory queue and add callbacks #62

Open
steve-chavez opened this issue Feb 15, 2022 · 11 comments
Labels
enhancement New feature or request

Comments

@steve-chavez
Member

Reasons

  • For a request to finish, an INSERT must be done on the _http_response table, which reduces throughput. There are cases where the client doesn't care about the response, so it doesn't need to be persisted. For example, function hooks don't do anything with the response.
  • As discussed in feat: libuv #50 (comment), we don't have to ensure delivery and do retries since we're just an HTTP client, so we don't really need to persist the requests into http_request_queue.
  • The http_request_queue table can grow big with many requests and cause production issues (internal link).
  • Users create FKs to our private tables (see "Handle user referencing private tables" #44), making the bg worker crash when trying to do the TTL.

Proposal

Drop the _http_response and http_request_queue tables and instead use an in-memory queue, plus add two callbacks:

create or replace function net.http_get(
   url text,
-- ..,
   success_cb text default '',
   error_cb text default ''
)
returns void -- no id is returned now
as $$
-- ...
$$ language sql;

Which can be used like:

select net.http_get(
  url := 'http://domain.com'
, success_cb := $$ insert into my_table values ($1, $2, $3) $$ -- $1=status, $2=headers, $3=body
, error_cb := $$ do $_$begin raise exception 'Failed request on %: "%"', $1, $2; end$_$; $$ -- $1=url, $2=error_message
);

Pros

  • The callbacks are optional, so we don't have to start a transaction for each request if there are none provided. Throughput should be greatly increased from this.
  • The error_cb can also be an insert/update on a table, so the request can be retried if needed.
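The retry-on-error idea from the last bullet could look like this (a sketch against the proposed API; `failed_requests` is a hypothetical user table):

```sql
-- hypothetical user table collecting failed requests for later retry
create table failed_requests (
  url           text,
  error_message text,
  failed_at     timestamptz default now()
);

select net.http_get(
  url := 'http://domain.com'
, error_cb := $$ insert into failed_requests(url, error_message) values ($1, $2) $$
);

-- a periodic job (e.g. pg_cron) can then re-issue the requests
-- recorded in failed_requests and delete the rows that succeed
```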

@olirice @soedirgo WDYT?

@steve-chavez steve-chavez added the enhancement New feature or request label Feb 15, 2022
@soedirgo
Member

Ohh, so that's what you meant by callbacks. Nice! Also like that this makes the impl a lot simpler and more akin to typical HTTP clients.

How would you go about implementing the in-memory queue? IIRC, when I looked into it, passing info from the client process to the bgworker process was nontrivial.

@olirice
Collaborator

olirice commented Feb 15, 2022

that looks great!

Is it correct that all http status codes (e.g. including 500) would still go through the success_cb and the error_cb applies to stuff like timeouts?


returns void -- no id is returned now

I think it would still be good to return a uuid and pass that through to the callbacks i.e.

select net.http_get(
  ...
, success_cb := $$ -- $1=status, $2=headers, $3=body, $4=request_id $$
, error_cb := $$ -- $1=url, $2=error_message, $3=request_id $$
); -- result is uuid (request_id)

so if people are firing off a large number of requests, or want to associate a request with a response, they'll have the tools they need


I know pg_cron does a similar inline SQL approach, but AFAIK they don't have to deal with passing parameters. Are we sure there's a good way to do that?

if not, a backup solution might be

create or replace function net.http_get(
-- ..,
   success_cb regproc default null,
   error_cb regproc default null
)
...
as $$
-- ...
$$ language sql;

where the function signature matches the callback requirements

@soedirgo
Member

I know pg_cron does a similar inline sql approach but afaik they don't have to deal with passing parameters. Are we sure theres a good way to do that?

Should be possible with the USING clause, though we should probably document that the callbacks are limited to plpgsql command strings.
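A sketch of how the worker side could bind the response values into the user's command string with plpgsql's EXECUTE ... USING (illustrative only; the table name and literal values here are placeholders, not the actual implementation):

```sql
-- run by the background worker after a request completes;
-- the string passed to EXECUTE is the user-supplied success_cb,
-- and $1/$2/$3 in it are bound positionally via the USING clause
do $_$
begin
  execute 'insert into my_table values ($1, $2, $3)'
  using 200,                                        -- $1 = status
        '{"Content-Type": "text/html"}'::jsonb,     -- $2 = headers
        '<html>...</html>';                         -- $3 = body
end
$_$;
```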

@steve-chavez
Member Author

How would go about implementing the in-memory queue?

There's an example in postgres/src/test/modules/test_shm_mq for how to use the shared memory queue.

Is it correct that all http status codes (e.g. including 500) would still go through the success_cb and the error_cb applies to stuff like timeouts?

Yes, correct. Things like "unable to resolve host" as well.

so if people are firing off a large number of requests, or want to associate a request with a response they'll have the tools they need

I think we should leave that to be done externally, users can pass an id on the callback like:

select net.http_get(
  -- url, headers, body..
, success_cb := format($$insert into responses values(%s, $1, $2, $3)$$, id)
);

And this id can come from a trigger on a table or maybe from a sequence used in a wrapper function. With that they can still associate a request/response if they want.
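The wrapper-function-plus-sequence idea could be sketched like so (hypothetical names throughout; assumes the proposed net.http_get signature, and note the outer function body uses a distinct dollar-quote tag so the inner $$ callback string nests cleanly):

```sql
create sequence request_id_seq;

-- hypothetical wrapper: tags each request with a sequence-generated id
-- so the response row in a user table can be correlated later
create function my_http_get(_url text) returns bigint as $fn$
declare
  _id bigint := nextval('request_id_seq');
begin
  perform net.http_get(
    url := _url
  , success_cb := format($$insert into responses values(%s, $1, $2, $3)$$, _id)
  );
  return _id;  -- caller keeps this id to look up the response
end
$fn$ language plpgsql;
```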

I know pg_cron does a similar inline sql approach but afaik they don't have to deal with passing parameters. Are we sure theres a good way to do that?

Yes, SPI_execute_with_args should take care of that.

I've also thought about supporting something like insert into tbl($status, $headers, $body) for better DX, but that would require parsing and we'd lose performance. So I've kept it simple at this stage.

@olirice
Collaborator

olirice commented Feb 15, 2022

Should be possible with the USING clause

cool! didn't know about that

I think we should leave that to be done externally, users can pass an id on the callback like:

select net.http_get(
  -- url, headers, body..
, success_cb := format($$insert into responses values(%s, $1, $2, $3)$$, id)
);

And this id can come from a trigger on a table or maybe from a sequence used in a wrapper function. With that they can still associate a request/response if they want.

ah nice, that works

or maybe from a sequence used in a wrapper function. With that they can still associate a request/response if they want.

good point

@FelixZY

FelixZY commented Aug 7, 2022

I'm thinking that moving to an in-memory queue would allow for HTTP requests during PostgREST GET requests as well?

During GET requests, PostgREST uses a read-only transaction, during which INSERTs and other modifications are not allowed. I'm currently having trouble making HTTP requests as part of my policy for performing SELECTs on some tables due to this.

@steve-chavez
Member Author

I'm thinking that moving to an in-memory queue would allow for http requests during postgrest GET requests as well?

Yes, since an http_get wouldn't need an INSERT on the net.http_request_queue anymore.

@steve-chavez
Member Author

steve-chavez commented Aug 30, 2022

We would lose observability if we go this route, currently customers can do:

-- requests with fail/success HTTP status codes
select count(*), status_code from net._http_response group by status_code;
 count | status_code
-------+-------------
   448 |         429
   821 |         500
  1182 |         408
  2567 |         200

-- rate limiting error
select content from net._http_response where status_code = 429 limit 1;
                          content
------------------------------------------------------------
 <html><body><h1>429 Too Many Requests</h1>                +
 You have sent too many requests in a given amount of time.+
 </body></html>

Which is great for debugging and finding out why a webhook receiver might be failing.

To increase throughput, we could make the net tables UNLOGGED and remove indexes and PKs from them (this would also solve #44).
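The UNLOGGED change is a one-liner per table (sketch; ALTER TABLE ... SET UNLOGGED is available since PostgreSQL 9.5):

```sql
-- skip WAL for the queue/response tables to cut write overhead;
-- their contents are lost on a crash, which is acceptable for
-- transient request/response data
alter table net.http_request_queue set unlogged;
alter table net._http_response set unlogged;
```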


An alternative could be logging the failed http requests. Then the user would have to search the pg logs.

@soedirgo
Member

+1 for just logging errors, in general making it as stateless as possible should help with stability.

@steve-chavez
Member Author

Yeah, agree. Also forgot about #49, observability would be better with logging.

@bigbear-ryan
Copy link

Any plans to implement this new feature in the near future?
