
Make timeouts aggregated #1041

Closed
roji opened this issue Apr 22, 2016 · 31 comments
Comments

@roji
Member

roji commented Apr 22, 2016

Following discussion in #689 (comment), it seems to make sense to make our CommandTimeout behave like SqlClient's (see docs).

This means that all socket read time is aggregated for a single ADO.NET API call (e.g. Read()). Currently we simply set the socket's ReceiveTimeout to the CommandTimeout, which means that if multiple socket reads happen within the same Read(), the timeout is reset each time. It seems to make sense to aggregate the CommandTimeout across all of Read()'s network reads instead.
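The aggregated behavior described above can be sketched outside of .NET. Here is a minimal, hypothetical Python illustration (not Npgsql code): one timeout budget is shared by every socket read a single call performs, so each successive read gets only the remaining time.

```python
import time

def read_until(n_bytes, recv, total_timeout):
    # Hypothetical sketch of the aggregated behavior: one deadline is
    # shared by however many reads are needed to assemble n_bytes,
    # instead of resetting the full timeout on every read.
    deadline = time.monotonic() + total_timeout
    buf = b""
    while len(buf) < n_bytes:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("aggregated command timeout exceeded")
        # each read only gets the shrinking remainder of the budget
        buf += recv(n_bytes - len(buf), timeout=remaining)
    return buf
```

Under the non-aggregated behavior described above (setting the socket's ReceiveTimeout), each `recv` call would instead get the full `total_timeout` again.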

Comments from anyone?

@roji roji self-assigned this Apr 22, 2016
@roji roji added this to the 3.1 milestone Apr 22, 2016
@roji
Member Author

roji commented Apr 28, 2016

Note: check the situation with the connection timeout as well. Are we doing it right?

@roji roji modified the milestones: 3.2, 3.1 May 5, 2016
@roji roji modified the milestone: 3.2 Sep 24, 2016
@realityexists

Should CommandTimeout be the time per read or the time for all reads? I can see a few scenarios:

  1. The user just wants an overall timeout for reading all results. Obviously the SqlClient behaviour is simpler here, but the user could easily implement an overall timeout themselves - just start a Stopwatch and check the time elapsed after each Read().
  2. The user wants a timeout for the first result, but is then happy to take as long as they need reading the results. An overall timeout doesn't work for this, so the current Npgsql behaviour is better.
  3. The user wants a timeout for the first result and a different (usually much shorter) timeout for each subsequent result. Current Npgsql behaviour is better for this, too, as long as it doesn't cache the value of CommandTimeout, so user code could change it after ExecuteReader has returned (and possibly even before each Read).

Compatibility with SqlClient is good, but backwards compatibility with Npgsql 3.1 is better. Combined with the current behaviour being more flexible, I'd say leave it like this.
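The Stopwatch approach from scenario 1 above can be sketched as follows; this is a hypothetical Python illustration of the caller-side overall timeout (`read_row` stands in for `Read()`), not Npgsql code:

```python
import time

def read_all_rows(read_row, overall_timeout):
    # Caller-side overall timeout, as suggested in scenario 1:
    # start a stopwatch, then check elapsed time after each read.
    start = time.monotonic()
    rows = []
    while (row := read_row()) is not None:
        rows.append(row)
        if time.monotonic() - start > overall_timeout:
            raise TimeoutError("overall timeout exceeded after a read")
    return rows
```

Note that this only checks the clock between reads; a single read that blocks forever is still only bounded by the driver's own per-call timeout.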

@roji
Member Author

roji commented Oct 31, 2016

The user just wants an overall timeout for reading all results. Obviously the SqlClient behaviour is simpler here, but the user could easily implement an overall timeout themselves - just start a Stopwatch and check the time elapsed after each Read().

If by "reading all results" you mean all result rows of a command, then this isn't what SqlClient does (or what this issue was proposing to do) - SqlClient applies timeouts to a single ADO.NET method invocation (e.g. Read()), and only aggregates the I/O time spent on that invocation for the purpose of timeouts.

As you say, if the user wants a client-side timeout for reading all results they can implement it themselves.

The user want a timeout for the first result, but then is happy to take as long as they need to reading the results. An overall timeout doesn't work for this, so the current Npgsql behaviour is better.

I definitely don't view this as a very valid/interesting scenario - one of the key points of timeouts is to avoid network issues which would block your program forever. I understand where you're coming from - you're thinking about long-running (or deadlocked) queries rather than network issues, but the latter must be addressed.

One more point - it's not necessarily correct to assume that PostgreSQL takes a long time upfront processing your query, and then spits results back quickly. AFAIK, depending on exactly what you're doing, it may return the first result very quickly but take a long time afterwards.

The user wants a timeout for the first result and a different (usually much shorter) timeout for each subsequent result. Current Npgsql behaviour is better for this, too, as long as it doesn't cache the value of CommandTimeout, so user code could change it after ExecuteReader has returned (and possibly even before each Read).

Even if this issue is implemented and we start aggregating I/O time, this would still be possible as aggregation only happens within a single ADO.NET method call (as I said above). So you would be free to play around with CommandTimeout between calls.

To summarize, this issue actually doesn't change behavior in a very significant way, since it only aggregates within a single method call.

@realityexists

Ah, OK, ignore the above comment then - I misunderstood how it works.

@baal2000
Contributor

+1 for a standard, provider-agnostic behavior.

@roji
Member Author

roji commented Nov 25, 2018

Note that as I wrote in #2239, there's nothing standard about aggregating socket read time - it just happens to be the SqlClient behavior (i.e. not documented on DbCommand.CommandTimeout). I'm not sure what other providers do.

In addition, it isn't very clear whether it really is the desirable behavior for all scenarios. Both behaviors have their pros and cons.

@roji roji changed the title Make frontend timeouts aggregated Make timeouts aggregated Oct 1, 2020
@roji
Member Author

roji commented Oct 1, 2020

@vonzshik you may be interested in this. I am really not sure we should actually implement this - there are pros and cons of both the aggregate and non-aggregate timeout behaviors. But as you've been working on and thinking about timeouts recently, it would be good to have your thoughts.

@roji
Member Author

roji commented Oct 1, 2020

Also /cc @YohDeadfall and @Brar. Since we're redoing the timeout/cancellation story for 5.0, it may be a good idea to at least think about this a bit.

@roji
Member Author

roji commented Oct 1, 2020

Note that I'm not completely sold on the general idea of a cumulative timeout.

The two main uses I see for the timeout in general are to cancel a never-ending query (deadlocked, or simply taking too long because of perf issues), and to handle network partitions; in both those cases, no data arrives at all. As long as data does keep arriving for a query, I'm not sure users would want us to cancel that query just because the total time spent reading exceeded something.

@vonzshik
Contributor

vonzshik commented Oct 1, 2020

As long as data does keep arriving for a query, I'm not sure users would want us to cancel that query just because the total time spent reading exceeded something.

That is the only case I see right now when our current implementation doesn't work very well. The data is trickling down very, very slowly, but not slowly enough for the timeout. If we have a 30-second timeout and 5 TCP reads per DataReader.Read, it might take 145 seconds (29 seconds per TCP read) until something is returned to the user.
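The arithmetic above is easy to check: with per-read (non-aggregated) timeouts, each individual TCP read can take just under the full timeout without ever triggering it.

```python
# Worst case under per-read (non-aggregated) timeouts, per the
# numbers above: each TCP read can finish just inside the timeout.
timeout_s = 30
reads_per_call = 5
worst_case = reads_per_call * (timeout_s - 1)  # each read takes 29 s
print(worst_case)  # → 145 seconds before anything reaches the user
```

With an aggregated timeout, the same call would instead throw after 30 seconds total.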

@roji
Member Author

roji commented Oct 1, 2020

@vonzshik yes, that's the argument in favor of applying CommandTimeout cumulatively (to all TCP reads in Read). But I'm still not convinced users really want to cancel a "trickling" query - we've never really received any feedback on this mechanism; the main use of CommandTimeout is definitely for cases where no data comes at all.

@vonzshik
Contributor

vonzshik commented Oct 1, 2020

As long as that corner case is documented, I see no reason to make timeouts aggregated.

@roji
Member Author

roji commented Oct 1, 2020

I've added a note in npgsql/doc#47 re documentation. Let's see if @YohDeadfall and/or @Brar have other thoughts on aggregation, otherwise I think it's fine to leave this open in the backlog for users to comment on it.

@Brar
Member

Brar commented Oct 1, 2020

it just happens to be the SqlClient behavior (i.e. not documented on DbCommand.CommandTimeout). I'm not sure what other providers do.

If there is some sort of standard or guidance on the ADO.NET level I'd generally vote to adhere to it but I'm not convinced that everything SqlClient does should automatically be assumed as the standard.

But I'm still not convinced users really want to cancel a "trickling" query

Well, they could be. Who knows?
OTOH, if they want to, I'd consider it unsupported on the sync API and really easy on the async API.
Just create your own CancellationTokenSource, set your timeout on it, and pass its CancellationToken to NpgsqlDataReader.ReadAsync() in every iteration.
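That caller-side pattern can be illustrated with an asyncio analogue; in this hypothetical Python sketch (`read_async` stands in for NpgsqlDataReader.ReadAsync, and is not a real Npgsql API), the caller owns one deadline and applies the remaining budget to every read:

```python
import asyncio

async def consume(read_async, overall_timeout):
    # Analogue of the CancellationTokenSource pattern: one deadline,
    # owned by the caller, enforced on every per-row read.
    loop = asyncio.get_running_loop()
    deadline = loop.time() + overall_timeout
    rows = []
    while True:
        remaining = deadline - loop.time()
        # raises asyncio.TimeoutError once the shared budget is spent
        row = await asyncio.wait_for(read_async(), timeout=remaining)
        if row is None:
            return rows
        rows.append(row)
```

As with the CancellationToken approach, this only gives a client-side cutoff; whether the server-side query is cancelled gracefully is a separate question.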

As I understand it, they'd currently break the connection in case of a cancellation as we currently don't support graceful (server side) cancellation on NpgsqlDataReader.ReadAsync() but we could probably change that in the future.

@vonzshik
Contributor

vonzshik commented Oct 1, 2020

As I understand it, they'd currently break the connection in case of a cancellation as we currently don't support graceful (server side) cancellation on NpgsqlDataReader.ReadAsync() but we could probably change that in the future.

Right now, it could be emulated with NpgsqlCommand.Cancel. But with #3191, it also should be possible by cancelling the cancellation token.

@vonzshik
Contributor

vonzshik commented Oct 1, 2020

But NpgsqlDataReader.ReadAsync() is a curious case, as most of the time it returns some data from the buffer and doesn't do any IO.

@Brar
Member

Brar commented Oct 1, 2020

But NpgsqlDataReader.ReadAsync() is a curious case, as most of the time it returns some data from the buffer and doesn't do any IO.

Are you sure?
IIRC we buffer (at most, only in non-sequential mode) one row at a time, so ReadAsync() is supposed to do IO on every request.

@vonzshik
Contributor

vonzshik commented Oct 1, 2020

But NpgsqlDataReader.ReadAsync() is a curious case, as most of the time it returns some data from the buffer and doesn't do any IO.

Are you sure?
IIRC we buffer (at most, only in non-sequential mode) one row at a time, so ReadAsync() is supposed to do IO on every request.

For a query "SELECT * from generate_series(1, 1000000)", it takes around 542 reads (where we do FastRead) until there is some IO.

@Brar
Member

Brar commented Oct 1, 2020

For a query "SELECT * from generate_series(1, 1000000)", it takes around 542 reads (where we do FastRead) until there is some IO.

Ah, yes.
We buffer as much as possible (as much as we get sent from the server in one request and fits in our buffer) but at least one complete row in non-sequential mode.
Thanks!

@Brar
Member

Brar commented Oct 1, 2020

Anyway, as our consumers shouldn't make any assumptions about when ReadAsync() does IO (and for big rows it could happen once per row), they should probably assume it happens on every call (as opposed to GetFieldValue<T>() et al., where you can rely on the fact that no IO happens in non-sequential mode).

@bgrainger
Contributor

FWIW, MySqlConnector tries to implement the same behaviour (aggregate timeouts) as SqlClient. MySQL sends each row in a result set as a single "MySQL packet" (which may be bundled up into a single TCP packet). So unless the rows are huge, the data is usually already there, or retrieved with a single Receive from the Socket. If the rows are very large, multiple socket reads will be required, and the timeout applies to the total of all those operations, until the full row is buffered in memory. (MySqlConnector doesn't support "streaming" with SequentialAccess.)

I think this makes the most sense from a user perspective: if CommandTimeout is 30, MySqlDataReader.Read() will either return true, false, or throw an exception within 30 seconds, no matter how many network reads (0, 1, or many) are required.

@roji
Member Author

roji commented Oct 3, 2020

Re ReadAsync, note that we check the cancellation token up-front regardless of whether we already have a buffered row or not, so @Brar's note above is correct - users can always manage a cancellation token themselves and pass it to all operations.

But the main question here is what we think the driver behavior should actually be - I'm honestly not sure. Ignoring the SqlClient behavior for second, CommandTimeout could mean "total time spent reading", or it could mean "time spent reading without receiving any data". My personal feeling/assumption is that most/all users really need the 2nd (see #1041 (comment)), but implementing the second may also be fine (most users are unlikely to care about the difference). @bgrainger any thoughts on why aggregate timeouts are better?

However, if we do decide to implement this, it's a very good idea to do it for 5.0, along with all the rest of the timeout changes (otherwise we'll have another significant behavioral change later). So I'll move this issue to 5.0 for now - we can either implement it or close it.

@roji roji modified the milestones: Backlog, 5.0.0 Oct 3, 2020
@vonzshik
Contributor

vonzshik commented Oct 3, 2020

I personally prefer our current implementation, as it's actually more flexible. If you don't have any problems with the network, you won't see any difference. If you do have them, we've also got you here.

@roji
Member Author

roji commented Oct 3, 2020

If you don't have any problems with the network, you won't see any difference. If you do have them, we've also got you here.

You're right that if there are network problems, there's no difference.

But if there aren't network problems, I do think there's the question of "trickling" queries, where results come in little by little. Does an average user want to cancel such a query after X seconds, even though data is still coming? We have no data here to answer the question, but my gut tells me the average user would not want to cancel it. And if they do, as @Brar pointed out they can always manage a cancellation token themselves for the duration of the operation.

@vonzshik
Contributor

vonzshik commented Oct 3, 2020

But if there aren't network problems, I do think there's the question of "trickling" queries, where results come in little by little.

Isn't that the definition of a network issue? AFAIK, this isn't PG's behavior.

You're right that if there are network problems, there's no difference.

I'm talking about the case when a query takes too long to complete on the PG side. We will still be stuck waiting for a response from PG, and eventually fail with a timeout.

@roji
Member Author

roji commented Oct 3, 2020

Depending on exactly what the query does, I'm pretty sure it's possible to receive a single row, then wait, then receive another row, then wait, etc. - with perfect network conditions. See Robert Haas's answer here.

I think this is the only relevant scenario for discussion on whether to aggregate or not.

@vonzshik
Contributor

vonzshik commented Oct 3, 2020

Depending on exactly what the query does, I'm pretty sure it's possible to receive a single row, then wait, then receive another row, then wait, etc. - with perfect network conditions. See Robert Haas's answer here.

Wouldn't it happen during DataReader.Read?

@roji
Member Author

roji commented Oct 3, 2020

Yes, that's a very good point. That probably means that within a single Read (or any other single operation), it's not important whether we aggregate or not. The distinction would have been important had we applied CommandTimeout to the entire command, but we don't do that.

I'd like to wait for @bgrainger's input on this - I am interested in whether things are somehow different with MySQL. But otherwise it seems fine to close this.

@bgrainger
Contributor

CommandTimeout could mean "total time spent reading", or it could mean "time spent reading without receiving any data". My personal feeling/assumption is that most/all users really need the 2nd (see #1041 (comment)), but implementing the second may also be fine (most users are unlikely to care about the difference).

"the 2nd ... the second may also" is confusing; not sure which one you meant.

That probably means that within a single Read (or any other single operation), it's not important whether we aggregate or not. The distinction would have been important had we applied CommandTimeout to the entire command, but we don't do that.

I agree with this.

It's hard for me to speak in broad generalities since there are so many varieties of MySQL Server (MySQL, MariaDB, Amazon RDS, Aurora, etc.; v5.6, v5.7, v8.0) but what I've observed is that MySQL tends to take a long time to compute the first row in a result set (this is time in ExecuteReader) then constantly streams rows back. There may be multiple calls to Socket.Receive to read the packets that make up the resultset header, but during Read, assuming the rows are small enough, I would expect each one to be retrieved with one call to Receive.

Unfortunately I don't have any hard data, but I suspect that it generally doesn't matter whether timeouts are aggregated for a single call to ExecuteReader or Read. Perhaps there are pathological cases where Socket.Receive returns just one byte every time, and it's necessary to use aggregated timeouts to prevent extremely long operations. Maybe the SqlClient team chose that approach out of an abundance of caution?

@bgrainger
Contributor

MySqlConnector does implement the aggregate timeout approach but IIRC it's more out of worry that that might be necessary in extreme situations, rather than knowing of particular cases where it's necessary for correctness.

@roji
Member Author

roji commented Oct 3, 2020

@bgrainger thanks for your input.

what I've observed is that MySQL tends to take a long time to compute the first row in a result set (this is time in ExecuteReader) then constantly streams rows back.

On the PG side, we do know that rows may have long delays in between them (#1041 (comment)). However, as @vonzshik correctly pointed out, aggregated timeouts only matter if we think there are going to be delays within the data for a single row, which really does seem very unlikely.

I'm going to close this issue as not needed, until we get some evidence that says otherwise.

@roji roji closed this as completed Oct 3, 2020
@roji roji removed this from the 5.0.0 milestone Oct 3, 2020
@roji roji removed the enhancement label Oct 3, 2020
@roji roji removed their assignment Oct 3, 2020