Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

experiment / request for feedback: lazy bytestring variant of 'unknown' #146

Closed
wants to merge 1 commit into from

Conversation

robx
Copy link

@robx robx commented Jun 16, 2022

Postgrest happens to pass a potentially large JSON value from a lazy byte string to hasql via unknown. By using unknownLazy as introduced by this PR, we can save a copy, reducing memory usage significantly in some scenarios.

Filing this PR mostly to get your input on whether you think whether (something like) unknownLazy would be a reasonable addition, and/or you have some other idea how hasql might help here.

(It's entirely possible that we should be making deeper changes on the Postgrest side, this is just scratching the surface, but seems like a potentially nice and easy win.)

Reference: PostgREST/postgrest#2333 (comment)

@nikita-volkov
Copy link
Owner

What exactly is binding you to a lazy bytestring on the Postgrest side? If it's the builder, then there's a plethora of strict builders which are more efficient, if it's the JSON construction, then again, there are packages like "jsonifier" which are more efficient as well.

In my experience I haven't seen a problem where lazy bytestring or lazy text are not inferior to alternative solutions. I know that other people grow to a similar conclusion as well.

@robx
Copy link
Author

robx commented Jun 16, 2022

What exactly is binding you to a lazy bytestring on the Postgrest side? If it's the builder, then there's a plethora of strict builders which are more efficient, if it's the JSON construction, then again, there are packages like "jsonifier" which are more efficient as well.

In my experience I haven't seen a problem where lazy bytestring or lazy text are not inferior to alternative solutions. I know that other people grow to a similar conclusion as well.

Thanks for your reply! My understanding on the Postgrest side is still limited -- I suspect there may well be no particularly good reason to be using lazy bytestrings, I'll definitely experiment with switching out the type fully.

The concrete context is that over all, we're validating and copying JSON data from an HTTP request body to postgres, without reencoding. The body is "naturally" chunked -- presently, it is read fully using WAI's strictRequestBody, which reasonably returns a lazy byte string. Ultimately/ideally, we'll probably want to stream the body, and the optimal solution might well involve bypassing hasql's encoder entirely. But this is for me to figure out :).

@nikita-volkov
Copy link
Owner

nikita-volkov commented Jun 16, 2022

BTW, implementing this won't have the effect you want. Because the haskell binding to libpq expects a strict bytestring, which is because the libpq itself (the C lib) expects a ptr to data for the whole param. So the lazy bytestring will get converted to strict down the road either way.

@robx
Copy link
Author

robx commented Jun 16, 2022

BTW, implementing this won't have the effect you want. Because the haskell binding to libpq expects a strict bytestring, which is because the libpq itself (the C lib) expects a ptr to data for the whole param. So the lazy bytestring will get converted to strict down the road either way.

Right, the best we can do is to copy the body directly to the buffer that's passed to libpq (unless we bypass libpq and write postgres protocol directly...). But we could ideally copy directly from the request to that buffer. Right now, we read to lazy bytestring (copy 1), convert that to strict (copy 2), pass it through hasql/postgresql-binary/bytestring-strict-builder (copy 3), convert it to a C-string to pass to libpq (copy 4).

My change here avoids "copy 2". I suppose another way to avoid it would be for bytestring-strict-builder to be smart enough to avoid the copy if it's a single bytestring -- does that make sense, and would you think that's a preferable change?

@nikita-volkov
Copy link
Owner

another way to avoid it would be for bytestring-strict-builder to be smart enough to avoid the copy if it's a single bytestring -- does that make sense, and would you think that's a preferable change?

It does. I'll think about it

@robx
Copy link
Author

robx commented Jun 30, 2022

Just to give a small status update on this:

  • It seems largely incidental that Postgrest uses unknown to encode the body. I'm considering switching it to use jsonBytes instead (specifically because that allows binary encoding, which doesn't require zero termination when passing to libpq). So this PR as such does seem like a bit of an arbitrary change -- there's as much reason to add unknownLazy as there is jsonBytesLazy (and the latter is rather what I need now).
  • I do think by now that the fact that we want to encode a lazy bytestring is reasonable. This HTTP request body is naturally read chunk by chunk.

(Here's a hacky commit that adds jsonBytesLazy: robx@54b5e48.)

@robx
Copy link
Author

robx commented Aug 10, 2022

Closing in favour of #149.

@robx robx closed this Aug 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants