
Optimize implementation of Postgresql async notifications #270

Closed
franciscojunior opened this issue Jul 9, 2014 · 36 comments

@franciscojunior (Member)

We were discussing async support in Npgsql in #121 (comment) and the issue of real-time notifications came up.

@roji asked me to create another issue so we can discuss how we should handle it.

Other places where attempts to fix this problem appeared (for reference): #61 , #46 , #107

@Emill (Contributor) commented Jul 16, 2014

My idea:

Use BeginRead with a zero length buffer as argument on the socket together with a callback. As @roji supposed in his async "proof of concept" code, that will wait until more data is available and then run the callback. In the callback, the notification message can be read and posted to the user's event handler. This event is preferably run on a new worker task in the thread pool, so the async callback can return immediately. Then a new BeginRead must be started again, to receive new notifications.

When the user runs a query, we don't want to listen anymore for notifications, so abort the pending read operation using a CancellationToken. Some locking (or queuing) must also be added since the callback may at the moment be reading a notification message and this must be completed before we can continue. The case when the callback has just started but the reading of the notification message has not begun yet, and at the same time the user starts a query, must also be handled correctly. Since a notification message can be received any time, it must still be handled correctly while a query runs, while processing backend responses, as it does today.

When the user's query has run and completed, a BeginRead must be started again to receive new notifications. The "using" pattern with IDisposable can take care of stopping/restarting notifications, just like today. This works for the sync methods, but when async is later implemented, operations should preferably be queued if there is an ongoing read of notification messages. Those operations should be run when the reading of the notification messages has completed, i.e. BeginGetReader or whatever should not block just because we wait for the processing of notification messages to complete before returning.

That way we won't allocate a thread that mostly does nothing.
I think running the event on a separate worker thread is good because then Npgsql can continue its work whatever the user does in the event.
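The flow described above might be sketched roughly like this (a hypothetical illustration, not Npgsql code; the `NotificationListener` and `NotificationReceived` names are made up, and a zero-byte receive is used so that no buffer is locked while waiting):

```csharp
using System;
using System.Net.Sockets;
using System.Threading.Tasks;

class NotificationListener
{
    static readonly byte[] ZeroBuffer = new byte[0];

    // Raised when data is available; the handler is expected to read
    // the pending notification message from the socket.
    public event Action<Socket> NotificationReceived;

    public void Start(Socket socket)
    {
        // A zero-byte receive consumes nothing; its callback fires once
        // data is available on the socket.
        socket.BeginReceive(ZeroBuffer, 0, 0, SocketFlags.None,
                            OnDataAvailable, socket);
    }

    void OnDataAvailable(IAsyncResult ar)
    {
        var socket = (Socket)ar.AsyncState;
        try
        {
            socket.EndReceive(ar);
        }
        catch (Exception)
        {
            return; // socket was closed; stop listening
        }

        // Raise the user's event on a worker task so this I/O callback
        // returns immediately, then re-arm the zero-byte receive.
        Task.Run(() =>
        {
            NotificationReceived?.Invoke(socket);
            if (socket.Connected)
                Start(socket);
        });
    }
}
```

Running the handler on a worker task keeps the I/O completion callback short, matching the idea that Npgsql can continue its work whatever the user does in the event.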

@roji (Member) commented Jul 17, 2014

Thanks for all the really intelligent thoughts @Emill, it's really great to have you helping out on npgsql. Here are some comments on the above.

I'd agree with your proposal, except that as far as I know, cancelling an async operation isn't possible before .NET 4. CancellationToken is something that's introduced along with Task, and the old Begin/End methods don't support anything resembling cancellation.

However, that may not be a big problem. If we use a zero-length BeginRead, then it's no big deal if it continues even after the user starts a query. When the async notification callback is invoked, we can simply test the state of the connector. If it is currently handling a query, we do nothing, otherwise we read and fire the event. When the query completes we relaunch the zero-length BeginRead.

One thing though... The zero-length read is something I haven't found serious documentation for, and having Begin/End reads like this interspersed with query code makes me uneasy. It may be safer to simply implement a single, global async notification thread which uses Select() to wait on all connections, and checks the state as described above to avoid interfering with queries. That would be a 100% safe way to implement this.

@Emill (Contributor) commented Jul 17, 2014

However, that may not be a big problem. If we use a zero-length BeginRead, then it's no big deal if it continues even after the user starts a query. When the async notification callback is invoked, we can simply test the state of the connector. If it is currently handling a query, we do nothing, otherwise we read and fire the event. When the query completes we relaunch the zero-length BeginRead.

Yes, that is a good idea.

I found http://int64.org/2009/05/15/tips-for-efficient-io/

An undocumented feature of WSARecv is that you can request a 0-byte receive, which will complete when data has arrived. Then you can issue another WSARecv or use non-blocking I/O to pull out whatever is available. This lets us get notified when data can be received without actually locking memory.

BeginRead internally uses WSARecv. I don't know his sources though.

I also found http://stackoverflow.com/questions/4988168/wsarecv-and-wsabuf-questions which cites http://www.microsoft.com/mspress/books/sampchap/5726a.aspx#124, which is a source from Microsoft.

You post a zero byte read on all the connections and then post a real read when the zero byte read completes and you know that you have data available.
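That pattern can be demonstrated on a loopback socket pair. This is an illustrative sketch, not Npgsql code (`ZeroByteReadDemo` and the "NOTIFY" payload are made up): post the zero-byte receive, and once it completes, a real receive is guaranteed not to block.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Text;

class ZeroByteReadDemo
{
    public static string Run()
    {
        var listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        int port = ((IPEndPoint)listener.LocalEndpoint).Port;

        using (var client = new TcpClient("127.0.0.1", port))
        using (var server = listener.AcceptTcpClient())
        {
            listener.Stop();
            Socket s = client.Client;

            // Post the zero-byte receive first...
            IAsyncResult ar = s.BeginReceive(new byte[0], 0, 0,
                                             SocketFlags.None, null, null);

            // ...then have the "server" side send a notification.
            byte[] payload = Encoding.ASCII.GetBytes("NOTIFY");
            server.GetStream().Write(payload, 0, payload.Length);

            // The zero-byte receive completes once data has arrived,
            // without locking or consuming any buffer memory.
            s.EndReceive(ar);

            // Now a real read pulls out whatever is available.
            var buf = new byte[64];
            int n = s.Receive(buf);
            return Encoding.ASCII.GetString(buf, 0, n);
        }
    }
}
```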

Regarding

It may be safer to simply implement a single, global async notification thread which uses Select() to wait on all connections, and checks the state as described above to avoid interfering with queries. That would be a 100% safe way to implement this.

That will work for a small number of connections, but if there are 1000 connections, Select() must traverse all those 1000 connections, which might degrade performance. I don't know how serious this is though. That means when something happens on one connection, the Select must be restarted with all 1000 connections. As it is now, or with zero-read-async, only one listener must be set up.

@roji (Member) commented Jul 17, 2014

@Emill, good points. I think Select() itself doesn't traverse all the connections (i.e. the actual implementation in the kernel is efficient), but it's true we'd have to manage the lists and traverse them in .NET.

I think it's a matter of what kind of load we expect to manage. If we expect async notifications all the time on a large number of connections (and I think that's the case), ReadAsync may be the more efficient choice. On the other hand, if not a lot of async notifications actually happen, then even the Select() overhead doesn't matter so much anymore... Plus there's the "unknown" matter of how safe it is to use the undocumented 0-length reads.

I'd say we hold off with the implementation of this feature until the async work in #121 is complete, since the two have some interaction. Then we can make a decision between 0-length ReadAsync or Select()... Sounds good?

@Emill (Contributor) commented Jul 17, 2014

@Emill, good points. I think Select() itself doesn't traverse all the connections (i.e. the actual implementation in the kernel is efficient), but it's true we'd have to manage the lists and traverse them in .NET.

Both Mono and .NET traverse all connections.

I think it's a matter of what kind of load we expect to manage. If we expect async notifications all the time on a large number of connections (and I think that's the case), ReadAsync may be the more efficient choice. On the other hand, if not a lot of async notifications actually happen, then even the Select() overhead doesn't matter so much anymore... Plus there's the "unknown" matter of how safe it is to use the undocumented 0-length reads.

Yes. The Select will return when there is data on any connection, not only async notifications. Also, it must be stopped and restarted during a query.

I'd say we hold off with the implementation of this feature until the async work in #121 is complete, since the two have some interaction. Then we can make a decision between 0-length ReadAsync or Select()... Sounds good?

Ok. Just remember to make the async code ready for whatever choice is made later ;)

@roji roji added refactor and removed research labels Aug 31, 2014
@roji roji added this to the 3.0 milestone Aug 31, 2014
@roji roji self-assigned this Aug 31, 2014
@roji roji changed the title How to implement Npgsql realtime notifications aka sync notifications Optimize implementation of Postgresql async notifications Aug 31, 2014
@roji roji mentioned this issue Oct 7, 2014
@janslavsky

Hello,

I am also using the NOTIFY/LISTEN event mechanism of the PostgreSQL DB. I have been facing this problem for more than 1.5 years. Is there any solution which I can test on my database? If multiple events are fired, it sometimes takes a long time to receive them on the client side.

Thanks for help
Jan

@roji (Member) commented Oct 10, 2014

@janslavsky, there is no solution yet, but a lot of work that has gone into implementing async should help here. I'll probably be working on the notification optimization sometime soon, we definitely won't release 3.0 without it.

But can you please describe exactly what problems you're encountering with the current implementation?

@janslavsky

Hello roji,

thanks for the information. Here is a description of my use case. I am using an after update/insert trigger on my table to fire NOTIFY events to subscribed clients, where I want to handle the fact that some IDs were inserted or updated. The clients are C#-based applications using Npgsql version 2.1.1.0. I have a problem when I do a fast update of multiple entries in the database: the events delivered to the clients are delayed. It looks like Npgsql is collecting them for some "unknown" time period and then sends them. Can I configure it somehow to fire events to the clients one by one, immediately? Basically, I use this mechanism to avoid timers and loops checking for changes in some code tables. I want to query the DB just in time, when it is really necessary (something was changed or inserted); then I cache just the modified or newly inserted data.

Thanks for your support.

@roji (Member) commented Oct 13, 2014

@janslavsky, thanks for the info. I'll be looking at the notification implementation soon, and I'll keep your scenario in mind. I don't think there should be any reason for Npgsql to "buffer" events rather than firing them as they come in.

One question though: are your "subscribers" simply sitting around and doing nothing on the database? Or are they actively executing queries on the database during the time they're supposed to receive these events? If you're doing database operations there's a strong likelihood the "buffering" effect comes from there - try to see if the same happens with a database-idle case.

@franciscojunior (Member, Author)

This delay is caused precisely because the notification thread isn't able to process many notifications sent at the same time, due to the buffering of the buffered network stream and the way the notification thread checks whether there are notifications to be handled.

The notifications are then delivered only when the user sends another query to the server or when a new notification is sent.

@janslavsky , please, try this patch in a 2.2.1 source code base and see if it works for you:
franciscojunior@9a966e3?diff=unified

With this patch you should get all the notifications delivered correctly.

I hope it helps.

@janslavsky

@franciscojunior Thanks for patch, I am going to test it.
@roji I have 2 separate services. One service does some changes on the database once a month. The second service holds data from the database in a memory cache and provides them to clients, so this service does not go to the database if it is not necessary. But when the first service inserts or updates some data, a NOTIFY event is fired to the second service, which reads the whole row entity and adds/updates/deletes it in the memory cache. I basically use this scenario when I do not want to have another dependency between the services. The only dependency between them is the database. I also do not want to query the DB when it is not really necessary.

@ALL: I like your work very much. Npgsql saved me a lot of coding hours 👍

@roji (Member) commented Oct 15, 2014

OK, so your event consumer is in general database-idle. This really shouldn't be a problem; I'll work on async notifications very soon and post here.

@janslavsky

@franciscojunior I tested your patch, but this patch works only if SyncNotification is set to true. I cannot have SyncNotification set to true because it blocks other operations via Npgsql. Where is the code which is firing events for async notifications? Because your patch is active only if the notification thread is added in NpgsqlConnection.cs -> Open():

    if (SyncNotification)
    {
        connector.AddNotificationThread();
    }

But what is firing NOTIFY events from the database to subscribers in case SyncNotification is set to false?
Thanks for the information
Thanks for information

@roji (Member) commented Oct 17, 2014

@janslavsky, if SyncNotification is set to false, then notifications are only received and emitted as part of query processing. In other words, if you have a reader and read some query's result, and there is a notification message in it, it will get fired.

Note that there's a good chance that work on this issue (the optimization) will change some of this behavior.

@janslavsky

@roji thanks for the clarification. Then I'll continue doing a fake query every 5 seconds to minimize this behaviour. My opinion is that events should always be delivered to clients when they are fired on the DB, without any delay; that is the basic idea of events.

@franciscojunior (Member, Author)

@franciscojunior I tested your patch, but this patch works only if SyncNotification is set to true. I cannot have SyncNotification set to true because it blocks other operations via Npgsql. Where is the code which is firing events for async notifications? Because your patch is active only if the notification thread is added in NpgsqlConnection.cs -> Open():

    if (SyncNotification)
    {
        connector.AddNotificationThread();
    }

But what is firing NOTIFY events from the database to subscribers in case SyncNotification is set to false?
Thanks for the information

The code which fires the notification is the following:
https://github.com/npgsql/npgsql/blob/master/Npgsql/NpgsqlConnector.cs#L973

As @roji said, if you aren't using the SyncNotification, your notifications will be sent only when you get some data from the server.

@roji thanks for the clarification. Then I'll continue doing a fake query every 5 seconds to minimize this behaviour. My opinion is that events should always be delivered to clients when they are fired on the DB, without any delay; that is the basic idea of events.

That's what SyncNotification is for. If you specify it, Npgsql will send you the notifications as they are generated by the server. The SyncNotification thread only runs when your client isn't sending any query to the server, so it won't block you for long. The only time you will be blocked is if you try to run a query while a notification is being processed, but that would be for a very short time (assuming, of course, that you don't spend too much time in your notification callback).

So, I think using SyncNotification is worthwhile in your case. Together with the patch I sent you, you will get all your notifications as they are created.

I hope it helps.

roji added a commit that referenced this issue Oct 23, 2014
Which is currently incompatible with async reader
execution.

Relates to #270
Relates to #386
@Emill (Contributor) commented Nov 6, 2014

I just read the first post I did in this thread and I have updated my idea.
First, it seems like there is no option to cancel an async read operation, except for closing the whole socket.
We must support the fact that both sync and async code could interfere with the notifications.
Today we use Monitor.Enter (which is what the compiler inserts for lock (...) { ... }) when the notification thread should be blocked. But that doesn't work for async, since the exit of the lock could occur on a different thread, while Monitor requires the same thread to enter and exit. Also, Monitor.Enter blocks, which we don't want with async.

An alternative for this is to use the SemaphoreSlim class (http://msdn.microsoft.com/en-us/library/system.threading.semaphoreslim(v=vs.110).aspx) which seems to be the preferred way to use locks in async code. I think we can't get away from locking since the thread that reads and processes a notification must complete before we can start a new query.

What about something like this:

When the connector opens:
Create a semaphore with initial count 1.
Call ReadAsync with buffer length 0 (we just want to be notified when there is data available) and set a continuation function handler (see below).

When we want to do some work (for example run a query):
Acquire the semaphore.
Do all the work. If we encounter a notification here, take care of it and raise an event to the user.
When we are done, finally, release the semaphore.

Continuation function handler:
If there was an error, the socket has been closed, so just return.
Acquire the semaphore.
While there is data available to read from the NpgsqlBufferStream (or the underlying stream, check with .DataAvailable): the messages that are read should be notification messages, so raise an event to the user for each such message. Note that we should use the already allocated NpgsqlBufferStream here, since no other thread uses it.
Release the semaphore.
Call ReadAsync with length 0 again on the buffer and the same continuation function.

I think this will avoid all race conditions.

If the user hasn't enabled SyncNotifications, we don't have to allocate a semaphore or start async read operation.

If the user has enabled SyncNotifications, this approach will not use an extra active thread for the notifications. SemaphoreSlim also supports async waiting, so no thread will be active while the code is waiting to acquire the semaphore. This also avoids the problem that Socket.Select traverses the whole list of sockets (which is an O(n^2) operation if there are n sockets and we have one message to receive from each socket).

Note that calling ReadAsync with size 0 on the NpgsqlBufferStream should not complete until there is at least one byte that can be read immediately from the stream, i.e. it should either return immediately if there is at least one byte in the buffer, or call ReadAsync with size 0 on the underlying stream. Or just create a separate method to achieve the same functionality, like WaitForDataAsync().

We can also, when we want to trigger an event, run it on a worker thread, so it won't block the rest of the logic in Npgsql.
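The coordination proposed above can be sketched with SemaphoreSlim as an async-compatible lock (initial count 1). `ConnectorSketch` and its method names are hypothetical, and the query/notification bodies are stubbed out:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class ConnectorSketch
{
    // Initial count 1: acts as an async-friendly mutex. Unlike Monitor,
    // the release may happen on a different thread than the acquire,
    // and WaitAsync does not block a thread while waiting.
    readonly SemaphoreSlim _lock = new SemaphoreSlim(1, 1);

    public async Task<string> ExecuteQueryAsync(string sql)
    {
        await _lock.WaitAsync();
        try
        {
            // ... write the query and read responses here; any
            // NotificationResponse encountered during processing is
            // dispatched to the user's event handler, as today.
            return "result of " + sql;
        }
        finally
        {
            _lock.Release();
        }
    }

    // Continuation for the zero-length read: runs whenever data
    // arrives outside a query.
    public async Task OnDataAvailableAsync()
    {
        await _lock.WaitAsync();
        try
        {
            // while (DataAvailable) { read a notification message and
            //   raise the user's event on a worker task }
        }
        finally
        {
            _lock.Release();
        }
        // ... re-arm the zero-length ReadAsync here.
    }
}
```

The semaphore guarantees that a query can't start while a notification message is being read, which is exactly the race the proposal is trying to close.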

Comments?

@roji (Member) commented Nov 7, 2014

First, it seems like there is no option to cancel an async read operation, except for closing the whole socket.

Actually, since we moved to .NET 4.0 we can now use the cancellation token to cancel the async read, as you originally suggested. This may not be so important though (see below).

We must support the fact that both sync and async code could interfere with the notifications.
Today we use Monitor.Enter (which is what the compiler inserts for lock (...) { ... }) when the notification thread should be blocked. But that doesn't work for async, since the exit of the lock could occur on a different thread, while Monitor requires the same thread to enter and exit. Also, Monitor.Enter blocks, which we don't want with async.

An alternative for this is to use the SemaphoreSlim class (http://msdn.microsoft.com/en-us/library/system.threading.semaphoreslim(v=vs.110).aspx) which seems to be the preferred way to use locks in async code. I think we can't get away from locking since the thread that reads and processes a notification must complete before we can start a new query.

You're right, I was thinking along those lines too. Your proposed flow also looks good to me.

When we start doing a new query, it may still be a good idea to cancel the async notification read with a cancellation token - to be sure that that read is gone - and only then go on to write the query. I'm not sure what happens if there are two reads pending on the same socket, but it seems like a risky situation - what happens if both of them are released... We can study this and see how everything behaves (this kind of thing could be OS-dependent), but it's probably safer and cleaner to cancel the async read before actually sending any query data.

Note that calling ReadAsync with size 0 on the NpgsqlBufferStream should not complete until there is at least one byte that can be read immediately from the stream, i.e. it should either return immediately if there is at least one byte in the buffer, or call ReadAsync with size 0 on the underlying stream. Or just create a separate method to achieve the same functionality, like WaitForDataAsync().

With the NpgsqlBufferedStream I'm currently working on, there will of course have to be an EnsureAsync(). The nice thing about Ensure is that it blocks us (synchronously or asynchronously) without actually consuming the I/O. So to wait for async notifications we should be able to just do EnsureAsync(5) as usual (to read the message type and length).

We can also, when we want to trigger an event, run it on a worker thread, so it won't block the rest of the logic in Npgsql.

Makes sense to me.

One last comment: I'd like to rename the SyncNotifications connstring param to AsyncNotifications - after all that's what Postgresql calls them... We can leave SyncNotifications as a deprecated synonym as well.

@Emill (Contributor) commented Nov 7, 2014

See
http://referencesource.microsoft.com/#mscorlib/system/io/stream.cs,e224b4bec8748849,references.
Cancellation tokens do not work.

I tested to have two pending async read operations. It seemed to work...

Ensure(5) won't work, since it might then run at the same time as another read in the main thread, which would screw up the destination buffer. I think we have to do a read of size 0. That will never mess up any buffer if there are two pending reads.

@martinsramko

hi guys,
I was playing around with the SyncNotification patch from @franciscojunior.

If I open a separate connection to the server and listen for notifications, it works up to the point where multiple notifications get triggered. Some of the NOTIFY events got 'stuck' on the server and were not delivered until I performed a dummy command on the same connection. However, that dummy request sometimes triggers "An unhandled exception of type 'System.NotSupportedException'" at System.IO.BufferedStream.WriteByte(Byte value).

Therefore, instead of performing a dummy command, I open a separate connection to the server and perform 'notify , 'ping'. That command flushed the 'stuck' notifications from the server back to the client.

The only thing left was to detect the socket being closed by the server. Due to the fact that I was not performing any operation on the listening connection, I never got a notification that an exception had occurred.

I've added a new event 'NotificationPollError' to NpgsqlConnector, in NpgsqlContextHolder.ProcessServerMessages(), as:

    ....
    catch (IOException ex)
    {
        this.connector._notificationException = ex;
        this.connector.FireNotificationPollError(ex);
    }

The same event was added to NpgsqlConnection, and it's bound to the connector event.

So,

  • Do you see any drawback of this approach?
  • What about the connection pool, regarding this sitting connection?

@roji (Member) commented Nov 28, 2014

@martinsramko, we're in the process of a pretty major overhaul for Npgsql 3.0, which will have to include a rewrite of how notifications get handled - the plan is to eliminate the extra thread needed for each connection that wants to do notifications, as is currently the case, and also permit .NET 4.5 async support (with which the current locking approach is incompatible).

You can follow the discussion on implementation above. The modification you're proposing continues with the current 2.2 implementation, so it's a bit less relevant...

@akontsevich

Thanks, developers, for your effort. However, when do you plan to release Npgsql 3.0? We have been experiencing problems when running many queries simultaneously in a heavy-load environment for about 1.5 years. And the problem is not only in notifications: #252 (comment)

P.S. Another question: is there some temporary workaround available, like connection string settings, polling settings, etc.?

@roji (Member) commented Dec 5, 2014

@akontsevich, I feel your pain. We still have a lot of work for 3.0 so it unfortunately won't be ready any time soon, and it's difficult to go hunting for a synchronization bug in a version that is no longer relevant... If you could somehow narrow the bug down and provide a somewhat reliable repro it would make our life much easier.

@akontsevich

@roji what do you mean, no longer relevant? We now use the latest Npgsql version, 2.2.3. I mentioned the bug here: #252 (comment). If I enable a debug build of Npgsql with line numbers, will that help you to discover the problem?

@roji (Member) commented Dec 5, 2014

I'm sorry, I expressed myself badly... I mean that a fix for this problem on 2.2.3 will no longer be relevant for 3.0, because that code is getting replaced... 2.2.3 is definitely the relevant version.

Debug/line numbers won't help much, because I can tell from the stack where the exception is thrown. The difficult thing is to understand how the problem gets triggered - what is the exact usage scenario.

Note that from the stack you sent it seems like a query is being sent while incoming data still hasn't been read. This could be an internal Npgsql issue, but it may also be a usage issue; can you check that you never access the same connection from two threads at the same time, in any way?

@akontsevich

@roji Quite possible. It is about our web service. We call it very frequently, and ServiceStack seems to allocate a separate thread for each call. However, the service object uses the same data access object, which contains the same connection as well. If this is a usage issue, how can I solve it in this case? Create/open a new connection for every service call?

@franciscojunior (Member, Author)

If I open a separate connection to the server and listen for notifications, it works up to the point where multiple notifications get triggered. Some of the NOTIFY events got 'stuck' on the server and were not delivered until I performed a dummy command on the same connection. However, that dummy request sometimes triggers "An unhandled exception of type 'System.NotSupportedException'" at System.IO.BufferedStream.WriteByte(Byte value).

That means Npgsql read the notification from the wire but for some reason couldn't deliver it to the client. So the notification is inside a buffer used by the network stream. Or worse, my patch wasn't able to cover all cases to replace the BufferedStream. If you are using my patch and you are still getting this problem, this means my patch didn't work for all situations :(

The only thing left was to detect the socket being closed by the server. Due to the fact that I was not performing any operation on the listening connection, I never got a notification that an exception had occurred.

Hmmmmm, nice catch. Indeed the notification thread was never supposed to deliver exceptions as it would only handle notifications. I think this is a bug in Npgsql or maybe a design oversight.

I've added a new event 'NotificationPollError' to NpgsqlConnector, in NpgsqlContextHolder.ProcessServerMessages(), as:

    ....
    catch (IOException ex)
    {
        this.connector._notificationException = ex;
        this.connector.FireNotificationPollError(ex);
    }

The same event was added to NpgsqlConnection, and it's bound to the connector event.

I'd need to check this approach, but I think it would provide a way for the user to be notified of problems which occur in the notification thread. The problem is that you could end up receiving two exceptions: one in the notification thread as an event and another in the main thread as a thrown exception.

What about the connection pool, regarding this sitting connection?

The connection pool will see this spare connection you are using just like any other connection.

@roji (Member) commented Dec 5, 2014

@akontsevich, I don't know about your specific web application, but in general every thread allocated by the web framework gets its own connection for the lifetime of the request. At the end of the request the connection is closed, which normally means returned to the connection pool. The web framework (e.g. ServiceStack) should never create a situation where several concurrent request threads are somehow allocated the same database connection - that would mean it's seriously broken.

Try to see if your application code itself handles connections or threads in an explicit way, e.g. keeping a reference to a connection that could be reused by another request in some way. If you can provide any sort of a reproducible error, we can definitely work from there...

@akontsevich

@roji the problem is in my logging - it uses a separate connection from the service. If I have a single connection for it, I get the above error; if I open a connection for every log record (which is not good), then I run into another Npgsql error - trying to execute a command on a closed connection (it seems the 2nd log query tries to execute a command while the previous one is closing it). Adding a lock to my data access object does not help: it seems Npgsql requires some time to actually open/close a connection.

@roji (Member) commented Dec 5, 2014

@akontsevich, that seems to make sense, I'm pretty sure it's a usage issue on your side and not an Npgsql bug.

First, if you don't open a connection for every log record, I think that means you're using the same database connection from two concurrent threads. The moment two log records happen to be emitted at the same time - boom.

You should definitely open a connection for every log record, and have connection pooling on (the default) - this means you don't really connect. I don't know about your particular log setup, but you should definitely concentrate on the second error - executing a command on a closed connection. Make sure connection pooling isn't turned off in your connection string (the Pooling parameter should not be set to false).
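The connection-per-operation pattern recommended above might look like this. This is a hedged sketch: the connection string, the `log` table, and the `LogWriter` class are placeholders for your environment, and it requires a reachable PostgreSQL server; `NpgsqlConnection`, `NpgsqlCommand`, and `AddWithValue` are real Npgsql APIs.

```csharp
using Npgsql;

class LogWriter
{
    // Hypothetical connection string; Pooling=true is the default anyway.
    const string ConnString =
        "Host=localhost;Database=mydb;Username=me;Pooling=true";

    public static void WriteLogRecord(string message)
    {
        using (var conn = new NpgsqlConnection(ConnString))
        {
            // With pooling on, Open() normally just rents an existing
            // physical connection rather than establishing a new one.
            conn.Open();
            using (var cmd = new NpgsqlCommand(
                "INSERT INTO log (message) VALUES (@msg)", conn))
            {
                cmd.Parameters.AddWithValue("msg", message);
                cmd.ExecuteNonQuery();
            }
        } // Dispose() returns the connection to the pool
    }
}
```

Because each call uses its own pooled connection, two log records emitted concurrently never touch the same NpgsqlConnection instance.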

@drusellers

Looking forward to this support. :)

@roji (Member) commented Jan 21, 2015

Note @Emill's comment in #453 about DataAvailable not working reliably with Microsoft SslStream

@roji (Member) commented Mar 29, 2015

@Emill, don't forget we still need to find a solution for SemaphoreSlim which is only available in .NET 4.5

@roji roji modified the milestones: 3.0, 3.0-beta1 Mar 30, 2015
@roji (Member) commented May 24, 2015

Closing this as all the work has been done and we're probably going to support .NET 4.5 only

@roji roji closed this as completed May 24, 2015
@drusellers

DOPE

@roji roji modified the milestones: 3.0-beta1, 3.0 Aug 7, 2015
7 participants