-
Notifications
You must be signed in to change notification settings - Fork 826
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
timeout on lengthy postgis queries #632
Comments
[springmeyer] great patch. planning to take a look at this soon for inclusion in mapnik2 |
[springmeyer] I wonder whether it also might be beneficial to have a timeout control on mapnik.render() which could generically track any layer query no matter the datasource, and return early if the timeout was exceeded. |
@tokumine - I know you had played a bit with this patch. Did it ultimately work for you? Should we apply as is? |
It applied cleanly and worked when I last touched it, but that was a long time ago now. We decided to go with a global timeout for simplicity in the end (though may come back to per query timeouts at some point). I'd probably want to reapply to trunk and test again to be sure. I'm going to the London OSM hackday next saturday, so could make a test harness for it then if it's needed? |
Cool, thanks for the followup. A test harness would be awesome, but anytime. |
@tokumine - fyi: I added a very basic test runner for postgis that can be added to: https://github.com/mapnik/mapnik/blob/master/tests/python_tests/postgis_test.py |
Index: include/mapnik/datasource.hpp
===================================================================
--- include/mapnik/datasource.hpp (revision 2688)
+++ include/mapnik/datasource.hpp (working copy)
@@ -65,6 +65,21 @@
}
};
+class MAPNIK_DECL datasource_query_cancelled_exception : public datasource_exception
+{
+private:
+ std::string message_;
+public:
+ datasource_query_cancelled_exception(const std::string& message=std::string("no reason"))
+ :message_(message) {}
+
+ ~datasource_query_cancelled_exception() throw() {}
+ virtual const char* what() const throw()
+ {
+ return message_.c_str();
+ }
+};
+
class MAPNIK_DECL datasource : private boost::noncopyable
{
public:
Index: plugins/input/postgis/connection.hpp
===================================================================
--- plugins/input/postgis/connection.hpp (revision 2688)
+++ plugins/input/postgis/connection.hpp (working copy)
@@ -41,10 +41,12 @@
PGconn *conn_;
int cursorId;
bool closed_;
+ int statement_timeout_;
public:
Connection(std::string const& connection_str)
:cursorId(0),
- closed_(false)
+ closed_(false),
+ statement_timeout_(0)
{
conn_=PQconnectdb(connection_str.c_str());
if (PQstatus(conn_) != CONNECTION_OK)
@@ -93,7 +95,13 @@
}
s << "\nFull sql was: '" << sql << "'\n";
- }
+ }
+ if (result) {
+ /* Detect whether the error was due to statement_timeout or pg_cancel_backend() */
+ const char *val = PQresultErrorField(result, PG_DIAG_SQLSTATE);
+ if (val && (strcmp(val,"57014") == 0)) /* SQLSTATE ERRCODE_QUERY_CANCELED */
+ throw mapnik::datasource_query_cancelled_exception( s.str() );
+ }
throw mapnik::datasource_exception( s.str() );
}
@@ -129,6 +137,21 @@
return s.str();
}
+ int statement_timeout(int statement_timeout)
+ {
+ /* Sets the statement_timeout on this current connection and returns the previous timeout value */
+ int old_timeout = statement_timeout_;
+ if (statement_timeout_ != statement_timeout)
+ {
+ statement_timeout_ = statement_timeout;
+ std::ostringstream s;
+ s << "SET statement_timeout= " << statement_timeout_ << ";";
+ if (!execute(s.str()))
+ throw mapnik::datasource_exception( "PSQL Error: Setting statement_timeout failed." );
+ }
+ return old_timeout;
+ }
+
~Connection()
{
if (!closed_)
Index: plugins/input/postgis/postgis.cpp
===================================================================
--- plugins/input/postgis/postgis.cpp (revision 2688)
+++ plugins/input/postgis/postgis.cpp (working copy)
@@ -83,6 +83,7 @@
scale_denom_token_("!scale_denominator!"),
persist_connection_(*params_.get<mapnik::boolean>("persist_connection",true)),
extent_from_subquery_(*params_.get<mapnik::boolean>("extent_from_subquery",false)),
+ statement_timeout_(*params_.get<int>("statement_timeout",0)),
// params below are for testing purposes only (will likely be removed at any time)
force2d_(*params_.get<mapnik::boolean>("force_2d",false)),
st_(*params_.get<mapnik::boolean>("st_prefix",false))
@@ -377,6 +378,8 @@
boost::shared_ptr<IResultSet> postgis_datasource::get_resultset(boost::shared_ptr<Connection> const &conn, const std::string &sql) const
{
+ conn->statement_timeout(statement_timeout_);
+
if (cursor_fetch_size_ > 0)
{
// cursor
Index: plugins/input/postgis/postgis.hpp
===================================================================
--- plugins/input/postgis/postgis.hpp (revision 2688)
+++ plugins/input/postgis/postgis.hpp (working copy)
@@ -74,6 +74,7 @@
const std::string scale_denom_token_;
bool persist_connection_;
bool extent_from_subquery_;
+ int statement_timeout_;
// params below are for testing purposes only (will likely be removed at any time)
bool force2d_;
bool st_;
|
subscribe |
To add to the discussion, the "global timeout" approach mentioned by @tokumine has the limit that does NOT affect pgbouncer reconnect attempts, which by default keep queries in a queue until success, so before any backend can enforce the "statement_timeout". For this reason a mapnik-implemented timeout is very needed. Another reason is that mapnik-enforced timeout would be mapnik-specific, despite the global timeout for the user mapnik connects as. I'm going to try the patch next. |
Sorry, after eyereading the patch I see it would not help with my case, as the timeout is still implemented using "SET STATEMENT_TIMEOUT". Instead what I need is something that aborts outside of the database, and fully within the mapnik process. |
So I think the options available are:
The first one should use less resources... |
+1 to trying the async libpq interface. |
I have some work in progress, but the failing testsuite is making it harder to test (#2181) Could anyone check that for me ? |
As the query execution function is in an .hpp file (connection.hpp), I'm afraid any change to the implementation would require recompilation of all exiting clients, in case they inlined the old version. Also, I found there's already an "async" interface in connection.hpp, and is also already used, altought not always (get_resultset uses sync or async based on haveing received a context). |
NOTE: I'm cointnuing my work-in-progress in the 2.2.x branch, because there are more failures coming out with using 2.3.x from cartodb/Windshaft which I don't want to distract the work, for now. Not sure if you want a ticket for those failures (compatibility, most likely) right now. |
I hit a wall with my work with handling of query cancelation. Timing out leaves the connection in a "busy" state which makes next attempts to send a query fail with a "another query currently in progress" message, while cancelling a query on a pgbouncer-hold connection would again block defeating the purpose of this work. I guess it'll take some instance variable to remember the "pending" state, which btw is also available already in the 2.3.x branch... |
Ok I've resorted to close the connection on timeout. It seems to be effective for the pgbouncer case. |
Hello @strk, have you tried PQgetCancel / PQcancel ( http://www.postgresql.org/docs/8.4/static/libpq-cancel.html ) instead of closing the connection ? |
@abonnasseau I've tried those, but as the problem is an "hanging" connection, those commands fail to complete, unless we get back to an async mode and timeout the timeout handler... |
@abonnasseau the test I'm performing involves pgbouncer with its default configuration and shutting down its backend. pgbouncer puts any incoming request "on hold" till next reconnection attempt (15 seconds later). My goal is having mapnik early return with an error rather than hanging forever. |
Here's the version for 2.3.x, but it doesn't enforce the timeout in the explicitly-async interfaces yet: strk@bd61d7e |
BTW, I see that also getAsyncResult is calling close() on error, only after clearAsyncResult, but at that point the former would not be needed at all, right ? |
More advancement in #2255 -- the code still does not enforce the timeout in all code paths, specifically it's not in the getAsyncResult. |
@strk - just checking in here. What are the next steps? |
Merging #2255, I'd think. But I won't have time before the second week of September for acting on your feedback there. |
Rendering requests generated by the 'export' tab on the OSM web site sometimes cause Mapnik to issue queries which run for over 10 minutes. When this happens I'd like to be able to automatically cancel these and return an error.
The patch attached is my first attempt at implementing this in mapnik2. It allows a timeout parameter to be added to the postgis datasource parameters:
1000
The timeout is specified in miliseconds which matches the underlying Postgres "SET statement_timeout=..." request, ie. 1000 is a one second timeout.
If the timeout triggers then the code throws "mapnik::datasource_query_cancelled_exception".
The patch seems to work OK but may need a bit more work. Ideally I'd like some way to set the timeout after an XML style has been loaded so I can use the same XML file for the main rendering daemon and the export tab, but only enable the timeout for the export rendering.
The text was updated successfully, but these errors were encountered: