Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add caller push of binary data #953

Closed
wants to merge 4 commits into from

Conversation

tomdcc
Copy link
Contributor

@tomdcc tomdcc commented Sep 21, 2017

Allows a caller to push binary data to an output stream rather than
having to provide a byte array or input stream. This gives the caller
more control over buffering strategy and allows explicit cleanup of
off-heap buffers or other non-garbage-collected resources.

Originally discussed on-list here.

@codecov-io
Copy link

@codecov-io codecov-io commented Sep 21, 2017

Codecov Report

Merging #953 into master will increase coverage by 0.08%.
The diff coverage is 87.71%.

@@             Coverage Diff              @@
##             master     #953      +/-   ##
============================================
+ Coverage     68.54%   68.62%   +0.08%     
- Complexity     3876     3891      +15     
============================================
  Files           178      180       +2     
  Lines         16192    16223      +31     
  Branches       2642     2632      -10     
============================================
+ Hits          11098    11133      +35     
+ Misses         3864     3862       -2     
+ Partials       1230     1228       -2

@tomdcc
Copy link
Contributor Author

@tomdcc tomdcc commented Sep 21, 2017

FWIW the only failing test now is unrelated to these changes

@Override
public void writeTo(OutputStream stream) throws IOException {
// this _does_ involve some copying to a temporary buffer, but that's unavoidable
// as OutputStream itself only accepts single bytes or heap allocated byte arrays
Copy link
Member

@vlsi vlsi Sep 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that mean we should focus on transferTo(WritableByteChannel target) kind of API instead of sticking with `OutputStream?

Copy link
Contributor Author

@tomdcc tomdcc Sep 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered just trying to support someone setting a ByteBuffer as a parameter. But it has some issues.

Firstly, our internal socket isn't an NIO-based socket AFAICS, so we don't have a native writable channel to pass a byte buffer to. So we'd be reduced to doing what this class does and use a Channels wrapper, but if you look at the implementation it's just copying bytes into an intermediate buffer, exactly the same as we'd be doing if someone passes an InputStream. See comment further down for why that's not ideal.

I didn't want to go so far as to change our underlying socket implementation, but if / when we do that I think we could just accept a ByteBuffer as a parameter directly and then probably this interface would just be legacy.

private void insertStream(ByteStreamWriter writer) throws Exception {
PreparedStatement updatePS = con.prepareStatement(TestUtil.insertSQL("images", "img", "?"));
try {
updatePS.setObject(1, writer);
Copy link
Member

@vlsi vlsi Sep 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically speaking, JDBC has a standard API for this: java.sql.PreparedStatement#setBinaryStream(int, java.io.InputStream, long)

Have you considered something like implements InputStream, ByteStreamWriter?

Copy link
Contributor Author

@tomdcc tomdcc Sep 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue with this interface is that it gives the caller no control over buffering strategy and makes cleanup harder. Passing in an InputStream means that the driver will always copy the bytes to a fixed-length buffer( currently 8k) and then copy them out, see PgStream.sendStream(). The caller then has to infer that it's safe to clean up the underlying resource by waiting for the expected number of bytes to be read.

In contrast the writer interface lets the caller write data straight to the BufferedOutputStream that wraps the socket. If it's writing more data than the buffered stream has then that buffer will get flushed and the data will be copied over by the underlying native write implementation in SocketOutputStream in a single go.

The caller also knows when the write has finished (since they're in control) and can clean up any off-heap resources immediately.

Copy link
Contributor Author

@tomdcc tomdcc Sep 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reading, I think that you're suggesting that the ByteStreamWriter should also implement InputStream as a means to pass it in. Is that right? The thing is, it isn't an InputStream and the JDBC API has a perfectly valid method to pass proprietary objects through: setObject() :). Can you explain why you don't like that approach?

@vlsi
Copy link
Member

@vlsi vlsi commented Sep 25, 2017

@tomdcc Does it make sense to have an abstraction over OutputStream for writeTo(OutputStream stream) API in order to have control over that and evolve properly?

@tomdcc
Copy link
Contributor Author

@tomdcc tomdcc commented Sep 26, 2017

@vlsi What would that look like? If it's not an OutputStream then it'll have to have a very similar interface, won't it?

Or are you suggesting that we could use that interface to allow writing to other channel types in the future, e.g.:

interface ByteTarget {
  // current implementation
  OutputStream getOutputStream();

  // some future implementation when we've got a NIO socket to use
  GatheringByteChannel getByteChannel();
}

interface public interface ByteStreamWriter {
  int getLength();
  void writeTo(ByteTarget target) throws IOException;
}

@davecramer
Copy link
Member

@davecramer davecramer commented Sep 26, 2017

@tomdcc the challenge is that once we publish a public API we have to live with it for the foreseeable future. By abstracting it we can future proof it a bit hopefully.

@tomdcc
Copy link
Contributor Author

@tomdcc tomdcc commented Sep 27, 2017

@davecramer Understood, thanks. @vlsi Is the above interface the kind of thing that you were thinking of?

@tomdcc
Copy link
Contributor Author

@tomdcc tomdcc commented Oct 5, 2017

@davecramer
Copy link
Member

@davecramer davecramer commented Oct 5, 2017

@tomdcc I was kind of hoping more people would comment on this. Alas this doesn't seem to happen.
I'm OK with this interface.

@vlsi

tomdcc added 2 commits Oct 4, 2018
Allows a caller to push binary data to an output stream rather than
having to provide a byte array or input stream. This gives the caller
more control over buffering strategy and allows explicit cleanup of
off-heap buffers or other non-garbage-collected resources.
@tomdcc
Copy link
Contributor Author

@tomdcc tomdcc commented Oct 4, 2018

Sorry this sat for so long as our internal priorities shifted and I never got to giving it more time. I've now rebased it and introduced the above interface, as discussed.

Ping @vlsi @davecramer

@paplorinc
Copy link
Contributor

@paplorinc paplorinc commented Dec 23, 2019

Rebased in #1659

davecramer pushed a commit that referenced this issue Dec 26, 2019
* feat: add caller push of binary data

Allows a caller to push binary data to an output stream rather than
having to provide a byte array or input stream. This gives the caller
more control over buffering strategy and allows explicit cleanup of
off-heap buffers or other non-garbage-collected resources.

* feat: Introduce ByteStreamTarget

* Fix rename in PGStream

* Change copyright dates of added files to 2020

Co-authored-by: Tom Dunstan <tomdcc@users.noreply.github.com>
@paplorinc
Copy link
Contributor

@paplorinc paplorinc commented Dec 27, 2019

@davecramer
Copy link
Member

@davecramer davecramer commented Dec 27, 2019

closed with #1659

@davecramer davecramer closed this Dec 27, 2019
@tomdcc tomdcc deleted the byte-stream-writer branch Jan 6, 2020
davecramer pushed a commit to davecramer/pgjdbc that referenced this issue Jul 5, 2021
)

* feat: add caller push of binary data

Allows a caller to push binary data to an output stream rather than
having to provide a byte array or input stream. This gives the caller
more control over buffering strategy and allows explicit cleanup of
off-heap buffers or other non-garbage-collected resources.

* feat: Introduce ByteStreamTarget

* Fix rename in PGStream

* Change copyright dates of added files to 2020

Co-authored-by: Tom Dunstan <tomdcc@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants