New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for setBinaryStream with unknown length #220

Closed
wants to merge 2 commits into
base: master
from

Conversation

Projects
None yet
4 participants
@alexismeneses
Contributor

alexismeneses commented Nov 23, 2014

Proposed fix for #213

By protocol design, it's actually impossible to send a BYTEA to the backend without knowing its size beforehand.

So in this fix, when an inputStream without length is used as a prepared statement parameter using setBinaryStream(), the given stream is consumed immediately and buffered either in memory or on disk.
This way, the size is known and the data is read from the buffer instead.

The toggle between memory and disk is set at 50kb (see MAX_MEMORY_CACHE_BYTES), any opinion about that is welcome.

Also, the on-disk buffer is done using regular Java tempfile, deleted when the end of stream is reached. I wonder if it can be an issue in some situations/environments?

@ringerc

This comment has been minimized.

Member

ringerc commented Dec 1, 2014

I've raised this as an issue for the v4 protocol and will add it to the protocol TODO if there's general agreement that it'd be a good change to make.

In the mean time I think that locally buffering on the client is the only choice we really have.

MAX_MEMORY_CACHE_BYTES should be MAX_MEMORY_BUFFER_BYTES. It isn't a cache.

Otherwise, I think it looks good from a general reading. I'm a bit concerned about how it'll work under a SecurityManager, but I think PgJDBC in general is likely to already have plenty of issues in a SecurityManager context.

I don't think there are any BC implications here, since your code only gets hit for code paths that'd previously fail with a not-implemented exception.

If you don't mind renaming that constant then I'll merge this.

@ringerc

This comment has been minimized.

Member

ringerc commented Dec 1, 2014

Submitted comment before I finished writing. See edits above.

@ringerc ringerc added the enhancement label Dec 1, 2014

@alexismeneses

This comment has been minimized.

Contributor

alexismeneses commented Dec 1, 2014

I did the constant name change as you've suggested.

Agreed that it could be a good improvement to plan for V4 protocol. I look forward to hear what will be decided on that point.

@sehrope

This comment has been minimized.

Contributor

sehrope commented Dec 1, 2014

I don't like the idea of the driver creating and managing temp files. If it needs to be done it should be explicit.

I find it hard to believe that you'd be saving data to a BLOB in the database without knowing the size in advance. I'm guessing what you really have is BLOB data coming from a stream with a known length.

For example, if you're dealing with a file upload in an HTTP request, the data is usually represented as a stream but it also indicates the size of the data as part of the request. Ditto for the more basic case of reading a file from the local file system (obviously you already know the length because you're the one opening it).

Rather than have this handle the buffering internally why not have a separate class, say PG that can be set via setObject(...) that includes both the InputStream and the size of the stream? You could then add special handling for that object type when binding parameters to send the appropriate length and read from the stream (and say throw an exception if the number of bytes read doesn't match).

If the length is really unknown then any code that deals with buffering it to disk to figure out it's size would live outside the driver. It also pushes out the magic 50KB memory buffer size out of the driver so that app's can pick whatever size they feel is appropriate.

The only down sides I see to this are that it's not JDBC-spec compatible and a client app that uses it would be Postgres-specific. I honestly don't care too much about the former (JDBC has plenty of non-sensical issues already) and would prefer something that lets me write better software with Postgres rather than kludging things behind the scenes to be DB-agnostic.

@ringerc

This comment has been minimized.

Member

ringerc commented Dec 2, 2014

I don't like it either, but as you said, it's an issue with the JDBC interface its self.

Sensible clients will and should use setBlob(InputStream, long) or setBlob(Blob) instead, but it appears that at least some code in the wild uses the length-less interface. It'd be useful to support it for those clients.

I don't like creating tempfiles either. However, I also know we don't have a ton of choice when someone uses this API. I guess arguably we should just buffer the whole lot in RAM, and if it doesn't fit, tell the client to use a better interface.

It might be safer to default to buffering in RAM, and to offer a JDBC parameter that permits the client to specify that tempfiles should be used above a certain threshold and maybe override the tempfile path. That way nobody will get surprised by the driver making tempfiles as they'll have to explicitly enable it, but the interface will still work for smaller inputs.

@sehrope

This comment has been minimized.

Contributor

sehrope commented Dec 2, 2014

I like the idea of implementing the setBlob(..., InputStream, long). It's a perfect match for the use case I describe (stream input with a known length).

I still don't like the idea of buffering things (in RAM or otherwise) but agree that a configurable RAM buffer with an error message is better than overflowing to temp files. At the very least the latter would need a way to indicate where the temp file should be created. You can use the default temp directory but on a lot of systems that's a relatively small RAM disk and this wouldn't work for anything sizable (which is the whole point right?).

@alexismeneses: Does that cover your use case as well? i.e. is the length of the BLOB known in advance with the data presented as a stream or do you really have no idea the size of the input?

@alexismeneses

This comment has been minimized.

Contributor

alexismeneses commented Dec 2, 2014

@sehrope: setBlob setBinaryStream() with length is already implemented. The point of this PR is to implement also the length-less variant because it seems that there are frameworks using it (see linked issue).

That's something I understand, because stacking up some frameworks in apps, they are sometimes too far from http-request to be able to retrieve length and unfortunately JDK lacks an InputStream-with-length class that would allow to forward it stack to stack.

Also agree that we could add a parameter that could be the maximum size to buffer into RAM before switching to tempfiles, with a default of 0 meaning unlimited. I think I'll first try to find time to finish the centralization of all parameters into constants before doing it.

[edit: changed setBlob to setBinaryStream() at the beginning of the comment.]

@davecramer

This comment has been minimized.

Member

davecramer commented Dec 26, 2014

@alexismeneses do you have time to see why the merge is failing ?

@davecramer

This comment has been minimized.

Member

davecramer commented Dec 26, 2014

fixed merge errors with #237

@davecramer davecramer closed this Dec 26, 2014

@alexismeneses

This comment has been minimized.

Contributor

alexismeneses commented Dec 29, 2014

@davecramer is fixing the merge conflict still relevant? (I've seen you closed both #220 and #237 but it seems unmerged though).

Theorically taking the whole build.xml from master branch should just be ok (the only conflicting change is switching junit to version 4.11 which has already been merged into the master thanks to another PR)

@davecramer

This comment has been minimized.

Member

davecramer commented Dec 29, 2014

Thanks for catching this. I seem to remember waiting for Travis-CI to
finish another and must have been distracted

Dave Cramer

On 29 December 2014 at 08:03, Alexis Meneses notifications@github.com
wrote:

@davecramer https://github.com/davecramer is fixing the merge conflict
still relevant? (I've seen you closed both #220
#220 and #237
#237 but it seems unmerged though).

Theorically taking the whole build.xml from master branch should just be
ok (the only conflicting change is switching junit to version 4.11 which
has already been merged into the master thanks to another PR)


Reply to this email directly or view it on GitHub
#220 (comment).

vlsi added a commit that referenced this pull request Nov 24, 2016

feat: add support for PreparedStatement.setCharacterStream(int, Reader)
In pgjdbc 9.4 (1211) and earlier, the PreparedStatement method
setCharacterStream(int, Reader) throws a not implemented
exception. The setCharacterStream(int, Reader, int) method does not
throw an exception, but it materializes the Reader as a String and
then uses the same code path as setString(int, String). Materializing
the readers can cause memory usage issues (OutOfMemoryError) when
working with large text values.
This patch implements the setCharacterStream(int, Reader) method
using in memory buffers for small streams and temp files for large
streams. To do so, it uses code added in pull request #220 that was
used to implement setBinaryStream(int, InputStream). This patch
adds a utility class to convert the Reader's UTF-16 character stream
into a UTF-8 binary stream.
The setCharacterStream(int, Reader) method implemented in this patch
supports both simple and extended query modes. However, in simple
mode, it materializes the reader as a String just like
setCharacterStream(int, Reader, int).

fixes #671
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment