Make timeouts aggregated #1041
Comments
Note: check the situation with the connection timeout as well; are we doing it right?
Should
Compatibility with SqlClient is good, but backwards compatibility with Npgsql 3.1 is better. Combined with the current behaviour being more flexible, I'd say leave it like this.
If by "reading all results" you mean all result rows of a command, then this isn't what SqlClient does (or what this issue was proposing to do) - SqlClient applies timeouts to a single ADO.NET method invocation (e.g. As you say, if the user wants a client-side timeout for reading all results they can implement it themselves.
I definitely don't view this as a very valid/interesting scenario - one of the key points of timeouts is to avoid network issues which would block your program forever. I understand where you're coming from - you're thinking about long-running (or deadlocked) queries rather than network issues, but the latter must be addressed too. One more point - it's not necessarily correct to assume that PostgreSQL takes a long time upfront processing your query and then spits results back quickly. AFAIK, depending on exactly what you're doing, it may return the first result very quickly but take a long time afterwards.
Even if this issue is implemented and we start aggregating I/O time, this would still be possible, as aggregation only happens within a single ADO.NET method call (as I said above). So you would be free to play around with CommandTimeout between calls. To summarize, this issue actually doesn't change behavior in a very significant way, since it only aggregates within a single method call.
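The "aggregate within a single method call" idea can be sketched as a shared deadline that every socket read draws from. This is an illustrative Python model, not Npgsql code; the helper name and structure are my own:

```python
import socket
import time

def read_exactly(sock: socket.socket, nbytes: int, timeout: float) -> bytes:
    """Read exactly nbytes, charging every socket read against one shared
    deadline, so the timeout covers the whole call rather than resetting
    on each individual read. Illustrative sketch only."""
    deadline = time.monotonic() + timeout
    chunks, remaining = [], nbytes
    while remaining > 0:
        budget = deadline - time.monotonic()
        if budget <= 0:
            raise TimeoutError("timeout exceeded across aggregated reads")
        sock.settimeout(budget)  # shrinking budget, not a fresh full timeout
        try:
            data = sock.recv(remaining)
        except socket.timeout:
            raise TimeoutError("timeout exceeded across aggregated reads")
        if not data:
            raise ConnectionError("connection closed by peer")
        chunks.append(data)
        remaining -= len(data)
    return b"".join(chunks)
```

Contrast this with the resetting behaviour, where each read would set the socket timeout to the full configured value, letting a slow sender restart the clock indefinitely.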
Ah, OK, ignore the above comment then - I misunderstood how it works.
+1 for a standard, provider-agnostic behavior.
Note that as I wrote in #2239, there's nothing standard about aggregating socket read time; it just happens to be the SqlClient behavior, and isn't clearly documented as such. In addition, it isn't very clear whether it really is the desirable behavior for all scenarios. Both behaviors have their pros and cons.
@vonzshik you may be interested in this. I am really not sure we should actually implement this - there are pros and cons of both the aggregate and non-aggregate timeout behaviors. But as you've been working and thinking about timeouts recently, it would be good to have your thoughts.
Also /cc @YohDeadfall and @Brar. Since we're redoing the timeout/cancellation story for 5.0, it may be a good idea to at least think about this a bit. |
Note that I'm not completely sold on the general idea of a cumulative timeout. The two main uses I see for the timeout in general are to cancel a never-ending query (deadlocked, or simply taking too long because of perf issues), and to handle network partitions; in both those cases, no data arrives at all. As long as data does keep arriving for a query, I'm not sure users would want us to cancel that query just because the total time spent reading exceeded something.
That is the only case I see right now where our current implementation doesn't work very well. The data is trickling down very, very slowly, but not slowly enough to trip the timeout. If we have a 30-second timeout and multiple TCP reads per operation, each completing just under the timeout, the total wait can far exceed 30 seconds.
@vonzshik yes, that's the argument in favor of applying CommandTimeout cumulatively (to all TCP reads in Read). But I'm still not convinced users really want to cancel a "trickling" query - we've never really received any feedback on this mechanism; the main use of CommandTimeout is definitely for cases where no data comes at all.
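To put numbers on the trickling corner case (a hypothetical model, not driver code): with a resetting per-read timeout, a response split into many chunks that each arrive just under the timeout never fails, yet the total wait dwarfs the configured value.

```python
from typing import List, Optional

def total_wait(per_read_timeout: float, chunk_gaps: List[float]) -> Optional[float]:
    """Model a resetting per-read timeout: each gap between chunks is
    checked against the full timeout on its own. Returns the total wait,
    or None if any single gap trips the timeout."""
    total = 0.0
    for gap in chunk_gaps:
        if gap >= per_read_timeout:
            return None  # this individual read times out
        total += gap
    return total

# A 30-second timeout, with ten chunks arriving 29 seconds apart:
print(total_wait(30.0, [29.0] * 10))  # 290.0 seconds, nearly ten times the "timeout"
```

An aggregate timeout would instead fail this read after 30 seconds total, which is the whole point of the proposal.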
As long as that corner case is documented, I see no reason to make timeouts aggregated.
I've added a note in npgsql/doc#47 re documentation. Let's see if @YohDeadfall and/or @Brar have other thoughts on aggregation, otherwise I think it's fine to leave this open in the backlog for users to comment on it.
If there is some sort of standard or guidance on the ADO.NET level I'd generally vote to adhere to it, but I'm not convinced that everything SqlClient does qualifies as such a standard.
Well, they could be. Who knows? As I understand it, they'd currently break the connection in case of a cancellation, as we currently don't support graceful (server-side) cancellation on …
Right now, it could be emulated with …
But …
Are you sure?
For a query …
Ah, yes.
Anyways, as our consumers shouldn't make any assumptions when …
FWIW, MySqlConnector tries to implement the same behaviour (aggregate timeouts) as SqlClient. MySQL sends each row in a result set as a single "MySQL packet" (which may be bundled into a single TCP packet). So unless the rows are huge, the data is usually already there, or retrieved with a single Receive from the Socket. If the rows are very large, multiple socket reads will be required, and the timeout applies to the total of all those operations, until the full row is buffered in memory. (MySqlConnector doesn't support "streaming" with ….) I think this makes the most sense from a user perspective: if …
Re ReadAsync, note that we check the cancellation token up-front regardless of whether we already have a buffered row, so @Brar's note above is correct - users can always manage a cancellation token themselves and pass it to all operations. But the main question here is what we think the driver behavior should actually be - I'm honestly not sure. Ignoring the SqlClient behavior for a second, CommandTimeout could mean "total time spent reading", or it could mean "time spent reading without receiving any data". My personal feeling/assumption is that most/all users really need the 2nd (see #1041 (comment)), but implementing the second may also be fine (most users are unlikely to care about the difference). @bgrainger any thoughts on why aggregate timeouts are better? However, if we do decide to implement this, it would be very good to do it for 5.0, along with all the rest of the timeout changes (otherwise we'll have another significant behavioral change later). So I'll move this issue to 5.0 for now - we can either implement or close it.
I personally prefer our current implementation, as it's actually more flexible. If you don't have any problems with the network, you won't see any difference. If you do have them, we've got you covered here too.
You're right that if there are network problems, there's no difference. But if there aren't network problems, I do think there's the question of "trickling" queries, where results come in little by little. Does an average user want to cancel such a query after X seconds, even though data is still coming? We have no data here to answer the question, but my gut tells me the average user would not want to cancel it. And if they do, as @Brar pointed out they can always manage a cancellation token themselves for the duration of the operation.
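The "manage a cancellation token yourself" option means imposing one end-to-end limit outside the driver: in C# that would be a CancellationTokenSource with a delay passed to all the async calls. A Python asyncio analogue of the same pattern (function names here are illustrative, not a real driver API):

```python
import asyncio

async def fetch_all_rows(row_source, overall_timeout: float) -> list:
    """Apply one client-side deadline to consuming an entire result set,
    independent of any per-read timeout the driver applies internally."""
    async def consume():
        return [row async for row in row_source]
    return await asyncio.wait_for(consume(), timeout=overall_timeout)

async def trickling_rows():
    # Simulates rows arriving one by one with small delays.
    for i in range(3):
        await asyncio.sleep(0.01)
        yield i

print(asyncio.run(fetch_all_rows(trickling_rows(), overall_timeout=1.0)))  # [0, 1, 2]
```

If the overall deadline expires while rows are still trickling in, the whole operation is cancelled, which is exactly the end-to-end semantics the driver's CommandTimeout deliberately does not provide.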
Isn't that the definition of network issues? AFAIK, this isn't PG's behavior.
I'm talking about the case when a query takes too long to complete on the PG side. We will still be stuck, waiting for a response from PG, and eventually fail with a timeout.
Depending on exactly what the query does, I'm pretty sure it's possible to receive a single row, then wait, then receive another row, then wait, etc. - with perfect network conditions. See Robert Haas's answer here. I think this is the only relevant scenario for discussion on whether to aggregate or not.
Wouldn't it happen during …
Yes, that's a very good point. That probably means that within a single Read (or any other single operation), it's not important whether we aggregate or not. The distinction would have been important had we applied CommandTimeout to the entire command, but we don't do that. I'd like to wait for @bgrainger's input on this - am interested if things are somehow different with MySQL. But otherwise it seems fine to close this.
"the 2nd ... the second may also" is confusing; not sure which one you meant.
I agree with this. It's hard for me to speak in broad generalities since there are so many varieties of MySQL Server (MySQL, MariaDB, Amazon RDS, Aurora, etc.; v5.6, v5.7, v8.0), but what I've observed is that MySQL tends to take a long time to compute the first row in a result set (this is time spent in …). Unfortunately I don't have any hard data, but I suspect that it generally doesn't matter whether timeouts are aggregated for a single call to …
MySqlConnector does implement the aggregate timeout approach, but IIRC it's more out of worry that it might be necessary in extreme situations than from knowing of particular cases where it's necessary for correctness.
@bgrainger thanks for your input.
On the PG side, we do know that rows may have long delays in between them (#1041 (comment)). However, as @vonzshik correctly pointed out, aggregated timeouts only matter if we think there are going to be delays within the data for a single row, which really does seem very unlikely. I'm going to close this issue as not needed, until we get some evidence that says otherwise.
Following discussion in #689 (comment), it seems to make sense to make our CommandTimeout behave like SqlClient's (see docs).
This means that all socket read time is aggregated for a single ADO.NET API call (e.g. Read()). Currently we simply set the socket's ReceiveTimeout to the CommandTimeout, which means that if multiple socket reads happen within the same Read(), the timeout is reset each time. It seems to make sense to aggregate the CommandTimeout across all of Read()'s network reads instead.
Comments from anyone?