-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clustering spends 1/3 of its time waiting for DB to check for cancellation #369
Comments
The links you provided give many uncool suggestions along with good ones. Yes, it's reasonable to solve this specific issue (polling a boolean field) I don't know how far we should generalize, though. Async only works if we don't care about state--but we usually do. On the Also: timings on dev and prod can vary tons. Enjoy life,
|
Yep, It's Complicated. I don't mean to suggest that we use adbcj, just that it shows the idioms around this. Re server, there is certainly no point in having more concurrent threads than there are DB connections in the connection pool. Any request past that should simply block. But it's easy to imagine that e.g. the two threads in the server (I believe this is how many we have now?) are both waiting on writes and therefore the server cannot respond to a read which may be on an entirely different table, e.g. client wants to update progress. Or the client might want to view a route which doesn't even hit the DB, such as the main page. In other words, if implemented well, this would be roughly equivalent to running more server processes. I don't see how the DB would be able to handle an additional server process yet not able to handle a server with more threads. |
I suggest we adopt a messaging system and let job cancellations be one of the messages. Then we won't have to worry about polling the db. |
Indeed Jonas, that is the best answer. |
We have other issues on Pivotal that target this problem. And the polling doesn't eat up 1/3 of the time any more. |
Timings on my dev machine, importing drones set:
Current version:
18:02:13.567 INFO WORKER - Clustered documents, time: 31.445 seconds
18:03:13.333 INFO WORKER - Clustered documents, time: 28.492 seconds
18:06:19.625 INFO WORKER - Clustered documents, time: 28.352 seconds
18:08:20.393 INFO WORKER - Clustered documents, time: 30.624 seconds
With cancellation check (j.state == CANCELLED) commented out:
18:13:46.790 INFO WORKER - Clustered documents, time: 25.431 seconds
18:14:44.100 INFO WORKER - Clustered documents, time: 22.024 seconds
18:15:52.829 INFO WORKER - Clustered documents, time: 21.495 seconds
18:18:19.440 INFO WORKER - Clustered documents, time: 21.642 seconds
To confirm, I'm also seeing this in VisualVM where we spend as much time in org.postgresql.core.VisibleBufferedInputStream.readMore() as we do in org.overviewproject.clustering.KMeansDocumentOps.distance()
We should be reading the cancel state on another thread while the clustering is happening. Could be done using actors to wrap DB requests, but not really necessary. See e.g. http://stackoverflow.com/questions/4087696/is-asynchronous-jdbc-call-possible and http://www.gamlor.info/wordpress/2012/05/async-sql-and-akka/
The text was updated successfully, but these errors were encountered: