
Poor Blob performance over high latency connection #2957

Closed
wegendt-bosch opened this issue Sep 28, 2023 · 18 comments

Comments

@wegendt-bosch

Describe the issue
The PreparedStatement class has poor performance for storing Blobs as oids over high latency network connections, since it uses a fixed buffer size of 4 KB.

Driver Version?
latest

Java Version?
any

OS Version?
Win 10

PostgreSQL Version?
latest

To Reproduce
Steps to reproduce the behaviour:
Create a Blob and store it as an oid over a network with high latency (e.g. over the internet).

Expected behaviour
Transfer rates approaching network speed.

And what actually happens
Transfers are far slower, limited by the fixed 4 KB buffer on high-latency connections.

Possible fix:
Make PgPreparedStatement::createBlob use a configurable buffer size, via e.g. org/postgresql/driverconfig.properties. Suggested name: binaryTransferBufferSize.
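For illustration, a minimal sketch of how the suggested setting might be resolved, assuming it were read from a property source with the current 4 KB as the default (binaryTransferBufferSize is only a proposed name, not an existing driver property):

```java
import java.util.Properties;

// Sketch only: resolve the proposed buffer size, falling back to the
// current hard-coded 4 KB when the property is absent.
static int resolveBinaryTransferBufferSize(Properties props) {
  return Integer.parseInt(props.getProperty("binaryTransferBufferSize", "4096"));
}
```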

@wegendt-bosch wegendt-bosch changed the title Poor Blob performance over high latency Poor Blob performance over high latency connection Sep 28, 2023
@davecramer
Member

Care to create a PR ?

@wegendt-bosch
Author

I need to jump through some hoops since I'm at a corporation, so it will take some time, but if it hasn't been done by then, I definitely will!

@vlsi
Member

vlsi commented Nov 17, 2023

@wegendt-bosch could you please clarify how you use createBlob?
Do you pass the limit parameter? What is the type of the InputStream you pass?
I guess we could use Math.min(64*1024, length) for the buffer size, and it might be enough.

I am not fond of adding yet another configuration source, so if we consider making blob buffer size configurable, we should probably use connection properties for that.

What is the source of the InputStream? Is it FileInputStream or something else?


Yet another direction for reducing the latency would be pipelining the requests. As createBlob is a single JDBC call, it could pipeline lowrite requests, and there's no need for the driver to wait for the response after each lowrite request. The driver could send a bunch of data and eventually monitor the progress.

@vlsi
Member

vlsi commented Nov 17, 2023

One more question: do you have any benchmarks for the buffer size in your environment?
What is the sweet spot, and what if we just use your value rather than the hard-coded 4096?

@wegendt-bosch
Author

Hi,

I don't call the method myself; I use Hibernate and that calls it. Therefore I can't use connection properties, since Hibernate does not forward those AFAIK.

Using a buffer size of 128 KB is about 10 times faster, but I'm not sure you want to reserve that much RAM for transfers on local networks; this really is just for high-ping environments.

The source of the input stream is a Hibernate-internal stream. If you want to pursue a different direction, you should probably do it instead of me; I'm just showing you the solution I'm using in a fork until this bug is resolved :)

@wegendt-bosch
Author

wegendt-bosch commented Nov 20, 2023

Math.min(64*1024, length)

You probably mean Math.max, right?

@vlsi
Member

vlsi commented Nov 20, 2023

You probably mean Math.max, right?

I mean we should:

  1. Use blob.length if the blob is small. For instance, if the blob is 100 bytes, we should not create a 256 KiB buffer.
  2. Use a limited amount of memory. For instance, if the blob is 100 megabytes, we should probably use a fixed buffer of 64 or 256 KiB; otherwise we might run into OutOfMemoryError.

That is why I suggested min
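For instance, a minimal sketch of the sizing rule being discussed (the 64 KiB cap is illustrative, not a decided value):

```java
// Sketch only: never allocate more than the blob needs, but cap the buffer
// so a huge blob cannot cause an OutOfMemoryError.
static int blobCopyBufferSize(long blobLength) {
  return (int) Math.min(64 * 1024, Math.max(blobLength, 1));
}
```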


The source of the input stream is a Hibernate-internal stream. If you want to pursue a different direction, you should probably do it instead of me; I'm just showing you the solution I'm using in a fork until this bug is resolved :)

A slightly different idea might be storing the InputStream as-is, and then reading that stream later, somewhere like in:

```java
private static int copyStream(InputStream inputStream, OutputStream outputStream, int limit)
    throws IOException {
  int totalLength = 0;
  byte[] buffer = new byte[2048];
  int readLength = inputStream.read(buffer);
  while (readLength > 0) {
    totalLength += readLength;
    outputStream.write(buffer, 0, readLength);
    if (totalLength >= limit) {
      return -1;
    }
    readLength = inputStream.read(buffer);
  }
  return totalLength;
}
```

Do you know the exact "hibernate-internal stream", or can you suggest a reproducer?
If the "hibernate-internal stream" is something like ByteArrayInputStream, then we could probably borrow the byte array from it (with contents!) rather than read it byte-by-byte.

@wegendt-bosch
Author

In Hibernate's case, it is an org.hibernate.engine.jdbc.internal.BinaryStreamImpl extends ByteArrayInputStream, but since you are maintaining a library I'd assume you'd want to solve the general case too, so at least have performant handling for arbitrary streams, and as an extension optimize specific cases too (which would then be an extension of my proposed changes).

I don't see the difference between the copyStream method and the one in createBlob; the blocking/non-performant call is the one to outputStream.write with small buffers, no?

I'll add the Math.min when allocating the buffer, to handle small blobs :)

@wegendt-bosch
Author

wegendt-bosch commented Nov 20, 2023

To reproduce with Hibernate, create an entity holding a blob:

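A minimal sketch of such an entity (the JPA annotations and class name are assumptions standing in for the original screenshot; the field name matches theBlob used below):

```java
import java.sql.Blob;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Lob;

@Entity
public class BlobHolder {
  @Id
  @GeneratedValue
  Long id;

  // Mapped as a large object; with PostgreSQL this is stored as an oid.
  @Lob
  Blob theBlob;
}
```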

and assign a blob using some hibernate blob creator, e.g.:

blobHolder.theBlob = NonContextualLobCreator.INSTANCE.createBlob(inputStream.readAllBytes())

then store it in postgres:

session.save(blobHolder)

@vlsi
Member

vlsi commented Nov 20, 2023

org.hibernate.engine.jdbc.internal.BinaryStreamImpl extends ByteArrayInputStream, but since you are maintaining a library I'd assume you'd want to solve the general case

Right you are. However, we can still consider adding an optimization like if (inputStream extends ByteArrayInputStream) { skip buffering, and just reuse the stream as if it was already buffered }. I think it would keep the semantics, and it would avoid excessive copying of the data between buffers. It might be quite common for users to pass ByteArrayInputStream instances, so the optimization might be helpful for the generic case, not just Hibernate.

@wegendt-bosch
Author

Hm, it seems that ByteArrayInputStream does not expose its array; I'd guess this is because it can also be a sliced view of the array, i.e. it has the members offset, pos and count.

@vlsi
Member

vlsi commented Nov 21, 2023

Hm, it seems that ByteArrayInputStream does not expose its array

It exposes java.io.ByteArrayInputStream#transferTo(OutputStream out).
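A minimal sketch of that idea, assuming Java 9+ where InputStream.transferTo(OutputStream) exists (the method name writeBlobData is illustrative, not driver code):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Sketch only: when the caller already holds the whole blob in memory
// (e.g. a ByteArrayInputStream), hand it to the output in one call and
// skip the intermediate copy loop.
static void writeBlobData(InputStream in, OutputStream out) throws IOException {
  if (in instanceof ByteArrayInputStream) {
    in.transferTo(out); // Java 9+; writes the remaining bytes directly
  } else {
    byte[] buffer = new byte[64 * 1024];
    int read;
    while ((read = in.read(buffer)) > 0) {
      out.write(buffer, 0, read);
    }
  }
}
```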

@wegendt-bosch
Author

ah sure, I can use that

@wegendt-bosch
Author

The compiler says transferTo is available since Java 9; I think you want to keep Java 8 compatibility...?

@vlsi
Member

vlsi commented Nov 21, 2023

Compiler says, transferTo is available since Java9, I think you want to keep Java8 compatibility

We would like to keep Java 8 compatibility indeed.
However, we can probably go for reflection for "pre Java 9" and interface call for Java 9+.
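A minimal sketch of that approach (illustrative only, not the driver's actual code): look the method up once via reflection, use it when present, and fall back to a plain copy loop on Java 8.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

final class StreamCopy {
  // Looked up once: InputStream.transferTo(OutputStream) exists on Java 9+ only.
  private static final Method TRANSFER_TO = lookupTransferTo();

  private static Method lookupTransferTo() {
    try {
      return InputStream.class.getMethod("transferTo", OutputStream.class);
    } catch (NoSuchMethodException e) {
      return null; // Java 8: the method is not present
    }
  }

  static void copy(InputStream in, OutputStream out) throws IOException {
    if (TRANSFER_TO != null) {
      try {
        TRANSFER_TO.invoke(in, out); // equivalent to in.transferTo(out) on Java 9+
        return;
      } catch (InvocationTargetException e) {
        // transferTo itself failed: surface its IOException rather than retrying
        if (e.getCause() instanceof IOException) {
          throw (IOException) e.getCause();
        }
        throw new IOException(e.getCause());
      } catch (IllegalAccessException e) {
        // unexpected for a public JDK method; fall back to the manual loop below
      }
    }
    byte[] buffer = new byte[8 * 1024];
    int read;
    while ((read = in.read(buffer)) > 0) {
      out.write(buffer, 0, read);
    }
  }
}
```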

@vlsi
Member

vlsi commented Nov 21, 2023

The ByteArrayInputStream.buf field is protected, so we don't even need to use reflection to get its value. We could craft a ByteBufferInputStreamBufferExposer extends ByteArrayInputStream which could expose transferTo.

@wegendt-bosch
Author

I don't think you can extend existing objects; I think for Java 8, reflection is the only way you could achieve this.

@wegendt-bosch
Author

wegendt-bosch commented Dec 8, 2023

Fixed by #3044
