Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Loading…

Improve flexibility and reliability of PostGIS SQL/Query handling #260

Closed
artemp opened this Issue · 16 comments

1 participant

@artemp
Owner

The postgis input plugin uses function postgis_datasource::table_from_sql to obtain the table name used in a select query. This function searches for the last occurrence of "from" in the query, and takes the next word as the table name.

This method fails, if, after the real "from", the query contains another from, even if that is part of a word (e.g. ST_GeomFromText function). It might also fail (or yields questionable results), if the query joins two or more tables, or has sub-selects (although handling this might require larger changes and thus be a different topic).

The attached patch works in my case, by checking for the last occurrence of " from " (ie. with spaces around). But it might need further improvement, because cases are possible, where the from is not delimited by spaces.

If unhandled, the layers containing the described queries are not shown at all (silently ignored).

@artemp
Owner

[artem] Good catch! Yes, current logic doesn't work with all "select .." variants, but your patch is an improvement, thanks!
Applied in r1028 and pushed to 0.7.0 for further refinements.

@artemp
Owner

[kunitoki] I think we should stop a bit and think this a little better. The '''table_from_sql''' function is replicated among all the database accessing input plugins: what about writing a wicked one that should be shared among all the plugins and put in a ''utils'' namespace inside the mapnik trunk instead of copy and paste the code among plugins ?

@artemp
Owner

[springmeyer] cc'ed rcoup as he mentioned interest in refactoring the 'table' param

@artemp
Owner

[artem] Replying to [comment:3 kunitoki]:

I think we should stop a bit and think this a little better. The '''table_from_sql''' function is replicated among all the database accessing input plugins: what about writing a wicked one that should be shared among all the plugins and put in a ''utils'' namespace inside the mapnik trunk instead of copy and paste the code among plugins ?

I agree, duplication is bad. We need to figure out how comprehensive this function should be. It might be a good idea to implement a proper (grammar based) parser for SQL "SELECT" statement.

@artemp
Owner

[artem] Replying to [comment:5 artem]:

Replying to [comment:3 kunitoki]:

I think we should stop a bit and think this a little better. The '''table_from_sql''' function is replicated among all the database accessing input plugins: what about writing a wicked one that should be shared among all the plugins and put in a ''utils'' namespace inside the mapnik trunk instead of copy and paste the code among plugins ?

I agree, duplication is bad. We need to figure out how comprehensive this function should be. It might be a good idea to implement a proper (grammar based) parser for SQL "SELECT" statement.

[http://savage.net.au/SQL/ SQL BNF grammar]

@artemp
Owner

[springmeyer] pushing this discussion to 0.6.2

@artemp
Owner

[matt_c] This is indeed a major design flaw. It has caused me hard-to-debug problems on almost every SQL statement for every layer I've tried to add. I have found mapnik difficult to use because of this and had to resort to lots of head scratching and map2nik.py debugging.

Layer content simply does not appear if the table detection from SQL fails. Running nik2img.py to debug I see that mapnik is generating this SQL "SELECT AsBinary?("") ..." instead of "SELECT AsBinary?("columnName")".

I regularly use sub-selects in my WHERE criteria and the word "from" more than once, perfectly valid SQL and this flaky table-detection always prevents the layer from appearing..

Writing a proper (grammar based) SQL parser would ideally include some or all dialect-specific extensions to allow SQL to be written for all database targets. Furthermore grammer-based parsing is not a trivial task, so IMHO a good workaround is probably the best solution for now.

In all of my use cases, the geometry is a real column (for speed) in a real table. It is the selection criteria that can be non-trivial and varied (often including table joins and sub-select queries).

In cases like this, searching for the last 'from' keyword is worse than searching for the first 'from' keyword. If it was searching for the first 'from', it would work in the above use cases (but still wouldn't be right, ofc :)

Can I suggest that, at least until we parse the SQL properly, a new config option be added to complement 'geometry_field': called 'geometry_table'. This would allow workaround when simple table detection from the SQL query is not viable / gonna fail.

If geometry_table is specified, it needn't search the SQL for a table at all..

@artemp
Owner

[matt_c] I've prepared a minor patch to add the explicit 'geometry_table' setting.
I'll upload the patch once I have validated it.

@artemp
Owner

[springmeyer] additionally, whomever takes this on see the patch from Hans:

http://lists.berlios.de/pipermail/mapnik-devel/2009-July/000937.html

{{{
#!diff
--- mapnik-0.6.1/plugins/input/postgis/postgis.cpp 2009-07-14 11:08:55.000000000 +0200
+++ mapnik-0.6.1-patched/plugins/input/postgis/postgis.cpp 2009-07-17 11:39:44.000000000 +0200
@@ -82,6 +82,9 @@

multiple_geometries_ = *params_.getmapnik::boolean("multiple_geometries",false);

  • schema_name_ = *params.getstd::string("schema_name","");
  • table_name_ = *params.getstd::string("table_name","");
    +
    boost::optionalstd::string ext = params_.getstd::string("extent");
    if (ext)
    {
    @@ -130,25 +133,32 @@

     desc_.set_encoding(conn->client_encoding());
    
  •     std::string table_name=table_from_sql(table_);
    
  • std::string schema_name="";
  • std::string::size_type idx=table_name.find_last_of('.');
  • if (table_name_.length()==0)
  • {
  • table_name_=table_from_sql(table_);
  • std::string::size_type idx=table_name_.find_last_of('.'); if (idx!=std::string::npos) {
  • schema_name=table_name.substr(0,idx);
  • table_name=table_name.substr(idx+1);
  • if (schema_name_.length()==0)
  • {
  • schema_name_=table_name_.substr(0,idx);
  • };
  • table_name_ =table_name_.substr(idx+1); } else {
  • table_name=table_name.substr(0);
  • table_name_=table_name_.substr(0); }
  • }; +
  •     geometryColumn_=geometry_field_;
    
     std::ostringstream s;
     s << "select f_geometry_column,srid,type from ";
    
  • s << GEOMETRY_COLUMNS <<" where f_table_name='" << table_name<<"'";
  •     s << GEOMETRY_COLUMNS <<" where f_table_name='" << table_name_<<"'";
    
  •     if (schema_name.length() > 0)
    
  • s <<" and f_table_schema='"<< schema_name <<"'";
  • if (schema_name_.length() > 0)
  •        s <<" and f_table_schema='"<< schema_name_ <<"'";
    
     if (geometry_field_.length() > 0)
        s << " and f_geometry_column = '" << geometry_field_ << "'";
    

    @@ -361,21 +371,20 @@
    {
    PoolGuard,shared_ptr > > guard(conn,pool);
    std::ostringstream s;

  •     std::string table_name = table_from_sql(table_);
     boost::optional<std::string> estimate_extent = params_.get<std::string>("estimate_extent");
    
     if (estimate_extent && *estimate_extent == "true")
     {
        s << "select xmin(ext),ymin(ext),xmax(ext),ymax(ext)"
          << " from (select estimated_extent('" 
    
  • << table_name <<"','"
  • << table_name_ <<"','" << geometryColumn_ << "') as ext) as tmp"; } else { s << "select xmin(ext),ymin(ext),xmax(ext),ymax(ext)" << " from (select extent(" <<geometryColumn_<< ") as ext from "
  • << table_name << ") as tmp";
  •          << table_name_ << ") as tmp";
     }
    
     shared_ptr<ResultSet> rs=conn->executeQuery(s.str());
    

    --- mapnik-0.6.1/plugins/input/postgis/postgis.hpp 2009-07-14 11:08:55.000000000 +0200
    +++ mapnik-0.6.1-patched/plugins/input/postgis/postgis.hpp 2009-07-17 11:40:42.000000000 +0200
    @@ -1,3 +1,4 @@
    +
    /*****************************************************************************

    • This file is part of Mapnik (c++ mapping toolkit) @@ -60,6 +61,8 @@ const std::string geometry_field_; const int cursor_fetch_size_; const int row_limit_;
  • std::string schema_name_;
  • std::string table_name_; std::string geometryColumn_; int type_; int srid_; }}}
@artemp
Owner

[matt_c] I've attached a patch, here's a good fix for now that I've been using for a while.

The patch adds an optional geometry_table config option to complement geometry_field. If specified, Mapnik won't do a dumb search on the SQL specified in 'table' for the table name, so complex SQL queries can be used that contain multiple 'from' statements, without any trouble.

The patch only addressese the PostGIS datasource for now.
The sqllite datasource has a similar problem.

The patch is backwards-compatible, it will not change the current behaviour until geometry_table is specified.

@artemp
Owner

[springmeyer] Matt,

Thanks for the patch. I'll try to look over this soon.

Also note the fixes applied for #376.

@artemp
Owner

[springmeyer] See also #426, likely a duplicate of this ticket.

@artemp
Owner

[springmeyer] Thanks Numenor, Hans, and matt_c. I've combined the offerings of matt_c and hans's patches, along with the discussion at #426, and come up with the solution (extra parameters) added in r1473, notably 'geometry_table' to ensure that Mapnik does not need try to parse the table out of the subselect. Also, in cases where the layer extent can be hardcoded, 'geometry_field' and 'srid' can alternately be supplied to avoid the need for the table parsing as well.

Closing this ticket as I have tested against a variety of example queries people have offered in this ticket, on the mailing list, and that I have generated by rendering graphics from django models and querysets. Ideally we'd have a postgres testing framework in place so you could verify from yourself but that was not a task I could accomplish given other priorities.

Please open further tickets if other issues arise or you have additional feedback.

Cheers,

Dane

@artemp
Owner

[springmeyer] schema support was the one missing piece to the fixed mentioned above, now addressed in #500 and r1631

@artemp artemp closed this
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.