experiment / request for feedback: lazy bytestring variant of 'unknown' #146

robx · 2022-06-16T09:58:09Z

Postgrest happens to pass a potentially large JSON value from a lazy byte string to hasql via unknown. By using unknownLazy as introduced by this PR, we can save a copy, reducing memory usage significantly in some scenarios.

Filing this PR mostly to get your input on whether you think (something like) unknownLazy would be a reasonable addition, and/or whether you have some other idea of how hasql might help here.

(It's entirely possible that we should be making deeper changes on the Postgrest side. This is just scratching the surface, but it seems like a potentially nice and easy win.)

Reference: PostgREST/postgrest#2333 (comment)

nikita-volkov · 2022-06-16T10:43:59Z

What exactly is binding you to a lazy bytestring on the Postgrest side? If it's the builder, then there's a plethora of strict builders which are more efficient; if it's the JSON construction, then again, there are packages like "jsonifier" which are more efficient as well.

In my experience, I have yet to see a problem where lazy bytestring or lazy text is not inferior to the alternative solutions. I know that other people have come to a similar conclusion as well.

robx · 2022-06-16T11:22:11Z

> What exactly is binding you to a lazy bytestring on the Postgrest side? If it's the builder, then there's a plethora of strict builders which are more efficient; if it's the JSON construction, then again, there are packages like "jsonifier" which are more efficient as well.
>
> In my experience, I have yet to see a problem where lazy bytestring or lazy text is not inferior to the alternative solutions. I know that other people have come to a similar conclusion as well.

Thanks for your reply! My understanding on the Postgrest side is still limited -- I suspect there may well be no particularly good reason to be using lazy bytestrings, and I'll definitely experiment with switching out the type fully.

The concrete context is that, overall, we're validating and copying JSON data from an HTTP request body to Postgres, without re-encoding. The body is "naturally" chunked -- presently, it is read fully using WAI's strictRequestBody, which reasonably returns a lazy byte string. Ultimately/ideally, we'll probably want to stream the body, and the optimal solution might well involve bypassing hasql's encoder entirely. But this is for me to figure out :).
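
To make "naturally chunked" concrete, here is a minimal sketch using only the bytestring package. The three-chunk body is a made-up stand-in for what a request reader might hand over, not actual WAI output. It shows that a lazy bytestring is a sequence of strict chunks, and that flattening a multi-chunk value means copying into one contiguous buffer.

```haskell
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as BL

-- Hypothetical stand-in for a request body that arrived in three chunks.
chunkedBody :: BL.ByteString
chunkedBody =
  BL.fromChunks (map BC.pack ["{\"name\":", "\"hasql\",", "\"ok\":true}"])

main :: IO ()
main = do
  -- A lazy ByteString is a list of strict chunks under the hood.
  print (length (BL.toChunks chunkedBody))
  -- For a multi-chunk value, toStrict allocates one contiguous buffer
  -- and copies every chunk into it.
  let strictBody = BL.toStrict chunkedBody
  BC.putStrLn strictBody
```

The strict value is what an encoder taking a strict bytestring needs, so with several chunks the copy in `toStrict` happens before the encoder ever sees the data.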

nikita-volkov · 2022-06-16T11:44:14Z

BTW, implementing this won't have the effect you want, because the Haskell binding to libpq expects a strict bytestring, which in turn is because libpq itself (the C lib) expects a pointer to the data for the whole param. So the lazy bytestring will get converted to a strict one down the road either way.

robx · 2022-06-16T15:12:44Z

> BTW, implementing this won't have the effect you want, because the Haskell binding to libpq expects a strict bytestring, which in turn is because libpq itself (the C lib) expects a pointer to the data for the whole param. So the lazy bytestring will get converted to a strict one down the road either way.

Right, the best we can do is to copy the body directly to the buffer that's passed to libpq (unless we bypass libpq and write postgres protocol directly...). But we could ideally copy directly from the request to that buffer. Right now, we read to lazy bytestring (copy 1), convert that to strict (copy 2), pass it through hasql/postgresql-binary/bytestring-strict-builder (copy 3), convert it to a C-string to pass to libpq (copy 4).

My change here avoids "copy 2". I suppose another way to avoid it would be for bytestring-strict-builder to be smart enough to avoid the copy if it's a single bytestring -- does that make sense, and would you think that's a preferable change?
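
A rough sketch of the "avoid the copy if it's a single bytestring" idea, written against plain lists of strict chunks. `flattenChunks` is a hypothetical helper for illustration only; it is not bytestring-strict-builder's API and says nothing about how that library is implemented.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC

-- | Hypothetical helper: flatten chunks into one strict ByteString,
-- returning a lone chunk as-is instead of copying it into a new buffer.
flattenChunks :: [B.ByteString] -> B.ByteString
flattenChunks chunks =
  case filter (not . B.null) chunks of
    []       -> B.empty
    [single] -> single            -- reuse the existing buffer, no copy
    several  -> B.concat several  -- one allocation, every chunk copied once

main :: IO ()
main = do
  -- Single chunk: the input value is returned unchanged.
  BC.putStrLn (flattenChunks [BC.pack "{\"a\":1}"])
  -- Several chunks: a copy into one buffer is unavoidable.
  BC.putStrLn (flattenChunks (map BC.pack ["{\"a\":", "1}"]))
```

The fast path only helps when the payload is the sole piece being built; as soon as anything is prepended or appended, the output needs a fresh contiguous buffer anyway.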

nikita-volkov · 2022-06-16T15:36:32Z

> another way to avoid it would be for bytestring-strict-builder to be smart enough to avoid the copy if it's a single bytestring -- does that make sense, and would you think that's a preferable change?

It does. I'll think about it.

robx · 2022-06-30T10:36:10Z

Just to give a small status update on this:

- It seems largely incidental that Postgrest uses unknown to encode the body. I'm considering switching it to use jsonBytes instead (specifically because that allows binary encoding, which doesn't require zero termination when passing to libpq). So this PR as such does seem like a bit of an arbitrary change -- there's as much reason to add unknownLazy as there is jsonBytesLazy (and the latter is rather what I need now).
- By now I do think that wanting to encode a lazy bytestring is reasonable: the HTTP request body is naturally read chunk by chunk.

(Here's a hacky commit that adds jsonBytesLazy: robx@54b5e48.)
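
For readers unfamiliar with the zero-termination point, here is a small sketch of the general distinction using only the bytestring package and base. It illustrates the two ways of handing bytes to C code and is not the actual libpq binding code; the function names are made up for this example.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.ByteString.Unsafe (unsafeUseAsCStringLen)
import Foreign.C.String (peekCString, peekCStringLen)

-- | NUL-terminated route: useAsCString copies the bytes into a
-- temporary buffer and appends a trailing zero byte.
viaTerminatedCopy :: B.ByteString -> IO String
viaTerminatedCopy payload = B.useAsCString payload peekCString

-- | Pointer-plus-length route: the callback sees the ByteString's own
-- buffer with an explicit length, so no terminator and no copy is needed.
viaPointerAndLength :: B.ByteString -> IO String
viaPointerAndLength payload = unsafeUseAsCStringLen payload peekCStringLen

main :: IO ()
main = do
  let payload = BC.pack "{\"a\":1}"
  copied <- viaTerminatedCopy payload
  direct <- viaPointerAndLength payload
  -- Both routes should observe the same bytes.
  print (copied == direct)
```

The pointer-plus-length route is only safe while the callback neither mutates the buffer nor keeps the pointer after returning, which is why the function carries the "unsafe" prefix.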

robx · 2022-08-10T13:22:52Z

Closing in favour of #149.

unknown variants

433428a

This was referenced Jun 16, 2022

experiment: Avoid copy when building from a single strict bytestring nikita-volkov/bytestring-strict-builder#11

Closed

nix: update bytestring-strict-builder to 0.4.5.6 PostgREST/postgrest#2336

Closed

This was referenced Jun 29, 2022

Hasql.Encoders allows encoding an "array of unknown", which doesn't (can't?) work #147

Open

avoid copy when encoding request body PostgREST/postgrest#2349

Merged

robx mentioned this pull request Jul 26, 2022

Provide a generic encoding escape hatch #149

Closed

robx closed this Aug 10, 2022
