Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimise MySqlDataReader.GetStream #592

Closed
bgrainger opened this issue Nov 27, 2018 · 6 comments
Closed

Optimise MySqlDataReader.GetStream #592

bgrainger opened this issue Nov 27, 2018 · 6 comments

Comments

@bgrainger
Copy link
Member

bgrainger commented Nov 27, 2018

The default implementation of DbDataReader.GetStream creates a new MemoryStream and copies the data returned by GetBytes into it, in a loop.

A better implementation would be to create a read-only MemoryStream directly on the backing store for the row.

This does raise a question about the lifetime of the Stream, though: its backing memory would be overwritten as soon as DbDataReader.Read is called again. Is there any documentation on whether consumers can use the returned Stream after calling Read again, or after closing the DbDataReader? (Since, with Connector/NET, these behaviours are implicitly allowed due to the way the Stream is created, there may well be existing code that relies on it.)

@roji
Copy link

roji commented Nov 28, 2018

@bgrainger I really don't think the Stream is supposed to live beyond the row, so once DbDataReader.Read() is called I think it's perfectly OK to dispose it. I can't find any specific information on this in the docs, but in any case this is the Npgsql behavior.

Note that part of the idea is for this not to copy data needlessly, so creating a new byte[] seems like a bad idea... Assuming you have the data buffered somewhere in your driver, ideally you would return a stream over that - a MemoryStream should be fine I think.

@roji
Copy link

roji commented Nov 28, 2018

Reading your description again, I may have misunderstood (I don't know anything about your internals). Assuming that you have the entire column contiguously in memory - which should be the case unless CommandBehavior.SequentialAccess was specified - then you can simply return a MemoryStream over that; you'll get a good implementation of Read(Span<byte>) for free.

Obviously if you don't have the entire column buffered in memory things become more complicated.

@bgrainger
Copy link
Member Author

then you can simply return a MemoryStream over that

Right, of course. I had forgotten about the MemoryStream(byte[], int, int, bool) constructor, which would be perfectly suitable here.

@bgrainger
Copy link
Member Author

I really don't think the Stream is supposed to live beyond the row, so once DbDataReader.Read() is called I think it's perfectly OK to dispose it.

@roji I completely agree; that's what I would expect from the API. However, this isn't documented, and since the existing implementation (in DbDataReader.GetStream) returns a Stream that is safe to use even after the DbDataReader is disposed, it seems plausible that consumers are relying on that. OTOH, MySqlDataReader.GetStream in Connector/NET doesn't actually work so that eases any compatibility concerns I had.

@bgrainger bgrainger self-assigned this Nov 28, 2018
@roji
Copy link

roji commented Nov 28, 2018

I really don't think the Stream is supposed to live beyond the row, so once DbDataReader.Read() is called I think it's perfectly OK to dispose it.

@roji I completely agree; that's what I would expect from the API. However, this isn't documented, and since the existing implementation (in DbDataReader.GetStream) returns a Stream that is safe to use even after the DbDataReader is disposed, it seems plausible that consumers are relying on that. OTOH, MySqlDataReader.GetStream in Connector/NET doesn't actually work so that eases any compatibility concerns I had.

Yeah, unfortunately not a lot is documented in ADO.NET, but that's also a reflection of the fact that different providers really do behavior differently, sometimes for valid reasons (i.e. because the database/protocol really works differently).

For GetStream(), if the stream returned is valid after Read(), that means the column is fully buffered in memory, either because the entire resultset is buffered (which could result in big memory usage) or that a copy is made... Neither seems to make much sense to me. A good quick check would be to see how SQL Client behaves - unfortunately I don't have access to my dev machine at the moment to check... But of course if your aim is to be compatible with the Oracle provider then its lack of support is good news for you :)

@bgrainger
Copy link
Member Author

Fixed in 0.48.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

2 participants