Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

timeout on lengthy postgis queries #632

Open
artemp opened this issue Oct 11, 2011 · 25 comments
Open

timeout on lengthy postgis queries #632

artemp opened this issue Oct 11, 2011 · 25 comments

Comments

@artemp
Copy link
Member

artemp commented Oct 11, 2011

Rendering requests generated by the 'export' tab on the OSM web site sometimes cause Mapnik to issue queries which run for over 10 minutes. When this happens I'd like to be able to automatically cancel these and return an error.

The patch attached is my first attempt at implementing this in mapnik2. It allows a timeout parameter to be added to the postgis datasource parameters:

1000

The timeout is specified in miliseconds which matches the underlying Postgres "SET statement_timeout=..." request, ie. 1000 is a one second timeout.

If the timeout triggers then the code throws "mapnik::datasource_query_cancelled_exception".

The patch seems to work OK but may need a bit more work. Ideally I'd like some way to set the timeout after an XML style has been loaded so I can use the same XML file for the main rendering daemon and the export tab, but only enable the timeout for the export rendering.

@artemp
Copy link
Member Author

artemp commented Oct 11, 2011

[springmeyer] great patch. planning to take a look at this soon for inclusion in mapnik2

@artemp
Copy link
Member Author

artemp commented Oct 11, 2011

[springmeyer] I wonder whether it also might be beneficial to have a timeout control on mapnik.render() which could generically track any layer query no matter the datasource, and return early if the timeout was exceeded.

@springmeyer
Copy link
Member

@tokumine - I know you had played a bit with this patch. Did it ultimately work for you? Should we apply as is?

@tokumine
Copy link

It applied cleanly and worked when I last touched it, but that was a long time ago now. We decided to go with a global timeout for simplicity in the end (though may come back to per query timeouts at some point). I'd probably want to reapply to trunk and test again to be sure.

I'm going to the London OSM hackday next saturday, so could make a test harness for it then if it's needed?

@springmeyer
Copy link
Member

Cool, thanks for the followup. A test harness would be awesome, but anytime.

@springmeyer
Copy link
Member

@tokumine - fyi: I added a very basic test runner for postgis that can be added to: https://github.com/mapnik/mapnik/blob/master/tests/python_tests/postgis_test.py

@springmeyer
Copy link
Member

Index: include/mapnik/datasource.hpp
===================================================================
--- include/mapnik/datasource.hpp   (revision 2688)
+++ include/mapnik/datasource.hpp   (working copy)
@@ -65,6 +65,21 @@
     }
 };

+class MAPNIK_DECL datasource_query_cancelled_exception : public datasource_exception
+{
+private:
+    std::string message_;
+public:
+    datasource_query_cancelled_exception(const std::string& message=std::string("no reason"))
+        :message_(message) {}
+
+    ~datasource_query_cancelled_exception() throw() {}
+    virtual const char* what() const throw()
+    {
+        return message_.c_str();
+    }
+};
+
 class MAPNIK_DECL datasource : private boost::noncopyable
 {
 public:        
Index: plugins/input/postgis/connection.hpp
===================================================================
--- plugins/input/postgis/connection.hpp    (revision 2688)
+++ plugins/input/postgis/connection.hpp    (working copy)
@@ -41,10 +41,12 @@
       PGconn *conn_;
       int cursorId;
       bool closed_;
+      int statement_timeout_;
    public:
       Connection(std::string const& connection_str)
          :cursorId(0),
-         closed_(false)
+           closed_(false),
+           statement_timeout_(0)
       {
          conn_=PQconnectdb(connection_str.c_str());
          if (PQstatus(conn_) != CONNECTION_OK)
@@ -93,7 +95,13 @@
                  }

                  s << "\nFull sql was: '" <<  sql << "'\n";
-             } 
+             }
+             if (result) {
+                 /* Detect whether the error was due to statement_timeout or pg_cancel_backend() */
+                 const char *val = PQresultErrorField(result, PG_DIAG_SQLSTATE);
+                 if (val && (strcmp(val,"57014") == 0)) /* SQLSTATE ERRCODE_QUERY_CANCELED */
+                     throw mapnik::datasource_query_cancelled_exception( s.str() );
+             }
              throw mapnik::datasource_exception( s.str() );
          }

@@ -129,6 +137,21 @@
           return s.str();
       }

+      int statement_timeout(int statement_timeout)
+      {
+          /* Sets the statement_timeout on this current connection and returns the previous timeout value */
+          int old_timeout = statement_timeout_;
+          if (statement_timeout_ != statement_timeout) 
+          {
+              statement_timeout_ = statement_timeout;
+              std::ostringstream s;
+              s << "SET statement_timeout= " << statement_timeout_ << ";";
+              if (!execute(s.str()))
+                throw mapnik::datasource_exception( "PSQL Error: Setting statement_timeout failed." );
+          }
+          return old_timeout;
+      }
+
       ~Connection()
       {
          if (!closed_)
Index: plugins/input/postgis/postgis.cpp
===================================================================
--- plugins/input/postgis/postgis.cpp   (revision 2688)
+++ plugins/input/postgis/postgis.cpp   (working copy)
@@ -83,6 +83,7 @@
       scale_denom_token_("!scale_denominator!"),
       persist_connection_(*params_.get<mapnik::boolean>("persist_connection",true)),
       extent_from_subquery_(*params_.get<mapnik::boolean>("extent_from_subquery",false)),
+      statement_timeout_(*params_.get<int>("statement_timeout",0)),
       // params below are for testing purposes only (will likely be removed at any time)
       force2d_(*params_.get<mapnik::boolean>("force_2d",false)),
       st_(*params_.get<mapnik::boolean>("st_prefix",false))
@@ -377,6 +378,8 @@

 boost::shared_ptr<IResultSet> postgis_datasource::get_resultset(boost::shared_ptr<Connection> const &conn, const std::string &sql) const
 {
+    conn->statement_timeout(statement_timeout_);
+
     if (cursor_fetch_size_ > 0)
     {
         // cursor
Index: plugins/input/postgis/postgis.hpp
===================================================================
--- plugins/input/postgis/postgis.hpp   (revision 2688)
+++ plugins/input/postgis/postgis.hpp   (working copy)
@@ -74,6 +74,7 @@
       const std::string scale_denom_token_;
       bool persist_connection_;
       bool extent_from_subquery_;
+      int statement_timeout_;
       // params below are for testing purposes only (will likely be removed at any time)
       bool force2d_;
       bool st_;

@strk
Copy link
Contributor

strk commented Mar 5, 2014

subscribe

@strk
Copy link
Contributor

strk commented Mar 6, 2014

To add to the discussion, the "global timeout" approach mentioned by @tokumine has the limit that does NOT affect pgbouncer reconnect attempts, which by default keep queries in a queue until success, so before any backend can enforce the "statement_timeout". For this reason a mapnik-implemented timeout is very needed.

Another reason is that mapnik-enforced timeout would be mapnik-specific, despite the global timeout for the user mapnik connects as. I'm going to try the patch next.

@strk
Copy link
Contributor

strk commented Mar 6, 2014

Sorry, after eyereading the patch I see it would not help with my case, as the timeout is still implemented using "SET STATEMENT_TIMEOUT". Instead what I need is something that aborts outside of the database, and fully within the mapnik process.

@strk
Copy link
Contributor

strk commented Mar 6, 2014

So I think the options available are:

The first one should use less resources...

@springmeyer
Copy link
Member

+1 to trying the async libpq interface.

@strk
Copy link
Contributor

strk commented Mar 10, 2014

I have some work in progress, but the failing testsuite is making it harder to test (#2181) Could anyone check that for me ?

@strk
Copy link
Contributor

strk commented Mar 10, 2014

As the query execution function is in an .hpp file (connection.hpp), I'm afraid any change to the implementation would require recompilation of all exiting clients, in case they inlined the old version.

Also, I found there's already an "async" interface in connection.hpp, and is also already used, altought not always (get_resultset uses sync or async based on haveing received a context).

@strk
Copy link
Contributor

strk commented Mar 10, 2014

NOTE: I'm cointnuing my work-in-progress in the 2.2.x branch, because there are more failures coming out with using 2.3.x from cartodb/Windshaft which I don't want to distract the work, for now. Not sure if you want a ticket for those failures (compatibility, most likely) right now.

@strk
Copy link
Contributor

strk commented Mar 10, 2014

I hit a wall with my work with handling of query cancelation. Timing out leaves the connection in a "busy" state which makes next attempts to send a query fail with a "another query currently in progress" message, while cancelling a query on a pgbouncer-hold connection would again block defeating the purpose of this work.

I guess it'll take some instance variable to remember the "pending" state, which btw is also available already in the 2.3.x branch...

@strk
Copy link
Contributor

strk commented Mar 10, 2014

Ok I've resorted to close the connection on timeout. It seems to be effective for the pgbouncer case.
If you want to test it, a version with hard-coded timeout of 4 seconds is here: strk@1041522

@lexman
Copy link
Contributor

lexman commented Mar 10, 2014

Hello @strk, have you tried PQgetCancel / PQcancel ( http://www.postgresql.org/docs/8.4/static/libpq-cancel.html ) instead of closing the connection ?

@strk
Copy link
Contributor

strk commented Mar 10, 2014

@abonnasseau I've tried those, but as the problem is an "hanging" connection, those commands fail to complete, unless we get back to an async mode and timeout the timeout handler...

@strk
Copy link
Contributor

strk commented Mar 10, 2014

@abonnasseau the test I'm performing involves pgbouncer with its default configuration and shutting down its backend. pgbouncer puts any incoming request "on hold" till next reconnection attempt (15 seconds later). My goal is having mapnik early return with an error rather than hanging forever.

@strk
Copy link
Contributor

strk commented Mar 10, 2014

Here's the version for 2.3.x, but it doesn't enforce the timeout in the explicitly-async interfaces yet: strk@bd61d7e

@strk
Copy link
Contributor

strk commented Mar 10, 2014

BTW, I see that also getAsyncResult is calling close() on error, only after clearAsyncResult, but at that point the former would not be needed at all, right ?

@strk
Copy link
Contributor

strk commented May 26, 2014

More advancement in #2255 -- the code still does not enforce the timeout in all code paths, specifically it's not in the getAsyncResult.

@springmeyer
Copy link
Member

@strk - just checking in here. What are the next steps?

@strk
Copy link
Contributor

strk commented Aug 19, 2014

Merging #2255, I'd think. But I won't have time before the second week of September for acting on your feedback there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants