Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve flexibility and reliability of PostGIS SQL/Query handling #260

Closed
artemp opened this issue Oct 11, 2011 · 16 comments
Closed

Improve flexibility and reliability of PostGIS SQL/Query handling #260

artemp opened this issue Oct 11, 2011 · 16 comments

Comments

@artemp
Copy link
Member

artemp commented Oct 11, 2011

The postgis input plugin uses function postgis_datasource::table_from_sql to obtain the table name used in a select query. This function searches for the last occurrence of "from" in the query, and takes the next word as the table name.

This method fails, if, after the real "from", the query contains another from, even if that is part of a word (e.g. ST_GeomFromText function). It might also fail (or yields questionable results), if the query joins two or more tables, or has sub-selects (although handling this might require larger changes and thus be a different topic).

The attached patch works in my case, by checking for the last occurrence of " from " (ie. with spaces around). But it might need further improvement, because cases are possible, where the from is not delimited by spaces.

If unhandled, the layers containing the described queries are not shown at all (silently ignored).

@artemp
Copy link
Member Author

artemp commented Oct 11, 2011

[artem] Good catch! Yes, current logic doesn't work with all "select .." variants, but your patch is an improvement, thanks!
Applied in r1028 and pushed to 0.7.0 for further refinements.

@artemp
Copy link
Member Author

artemp commented Oct 11, 2011

[kunitoki] I think we should stop a bit and think this a little better. The '''table_from_sql''' function is replicated among all the database accessing input plugins: what about writing a wicked one that should be shared among all the plugins and put in a ''utils'' namespace inside the mapnik trunk instead of copy and paste the code among plugins ?

@artemp
Copy link
Member Author

artemp commented Oct 11, 2011

[springmeyer] cc'ed rcoup as he mentioned interest in refactoring the 'table' param

@artemp
Copy link
Member Author

artemp commented Oct 11, 2011

[artem] Replying to [comment:3 kunitoki]:

I think we should stop a bit and think this a little better. The '''table_from_sql''' function is replicated among all the database accessing input plugins: what about writing a wicked one that should be shared among all the plugins and put in a ''utils'' namespace inside the mapnik trunk instead of copy and paste the code among plugins ?

I agree, duplication is bad. We need to figure out how comprehensive this function should be. It might be a good idea to implement a proper (grammar based) parser for SQL "SELECT" statement.

@artemp
Copy link
Member Author

artemp commented Oct 11, 2011

[artem] Replying to [comment:5 artem]:

Replying to [comment:3 kunitoki]:

I think we should stop a bit and think this a little better. The '''table_from_sql''' function is replicated among all the database accessing input plugins: what about writing a wicked one that should be shared among all the plugins and put in a ''utils'' namespace inside the mapnik trunk instead of copy and paste the code among plugins ?

I agree, duplication is bad. We need to figure out how comprehensive this function should be. It might be a good idea to implement a proper (grammar based) parser for SQL "SELECT" statement.

[http://savage.net.au/SQL/ SQL BNF grammar]

@artemp
Copy link
Member Author

artemp commented Oct 11, 2011

[springmeyer] pushing this discussion to 0.6.2

@artemp
Copy link
Member Author

artemp commented Oct 11, 2011

[matt_c] This is indeed a major design flaw. It has caused me hard-to-debug problems on almost every SQL statement for every layer I've tried to add. I have found mapnik difficult to use because of this and had to resort to lots of head scratching and map2nik.py debugging.

Layer content simply does not appear if the table detection from SQL fails. Running nik2img.py to debug I see that mapnik is generating this SQL "SELECT AsBinary?("") ..." instead of "SELECT AsBinary?("columnName")".

I regularly use sub-selects in my WHERE criteria and the word "from" more than once, perfectly valid SQL and this flaky table-detection always prevents the layer from appearing..

Writing a proper (grammar based) SQL parser would ideally include some or all dialect-specific extensions to allow SQL to be written for all database targets. Furthermore grammer-based parsing is not a trivial task, so IMHO a good workaround is probably the best solution for now.

In all of my use cases, the geometry is a real column (for speed) in a real table. It is the selection criteria that can be non-trivial and varied (often including table joins and sub-select queries).

In cases like this, searching for the last 'from' keyword is worse than searching for the first 'from' keyword. If it was searching for the first 'from', it would work in the above use cases (but still wouldn't be right, ofc :)

Can I suggest that, at least until we parse the SQL properly, a new config option be added to complement 'geometry_field': called 'geometry_table'. This would allow workaround when simple table detection from the SQL query is not viable / gonna fail.

If geometry_table is specified, it needn't search the SQL for a table at all..

@artemp
Copy link
Member Author

artemp commented Oct 11, 2011

@artemp
Copy link
Member Author

artemp commented Oct 11, 2011

@artemp
Copy link
Member Author

artemp commented Oct 11, 2011

[matt_c] I've prepared a minor patch to add the explicit 'geometry_table' setting.
I'll upload the patch once I have validated it.

@artemp
Copy link
Member Author

artemp commented Oct 11, 2011

[springmeyer] additionally, whomever takes this on see the patch from Hans:

http://lists.berlios.de/pipermail/mapnik-devel/2009-July/000937.html

{{{
#!diff
--- mapnik-0.6.1/plugins/input/postgis/postgis.cpp 2009-07-14 11:08:55.000000000 +0200
+++ mapnik-0.6.1-patched/plugins/input/postgis/postgis.cpp 2009-07-17 11:39:44.000000000 +0200
@@ -82,6 +82,9 @@

multiple_geometries_ = *params_.getmapnik::boolean("multiple_geometries",false);

  • schema_name_ = *params.getstd::string("schema_name","");
  • table_name_ = *params.getstd::string("table_name","");

boost::optionalstd::string ext = params_.getstd::string("extent");
if (ext)
{
@@ -130,25 +133,32 @@

 desc_.set_encoding(conn->client_encoding());
  •     std::string table_name=table_from_sql(table_);
    
  •     std::string schema_name="";
    
  •     std::string::size_type idx=table_name.find_last_of('.');
    
  •     if (table_name_.length()==0)
    
  •     {
    
  •       table_name_=table_from_sql(table_);
    
  •       std::string::size_type idx=table_name_.find_last_of('.');
     if (idx!=std::string::npos)
     {
    
  •        schema_name=table_name.substr(0,idx);
    
  •        table_name=table_name.substr(idx+1);
    
  •          if (schema_name_.length()==0)
    
  •          {
    
  •            schema_name_=table_name_.substr(0,idx);
    
  •          };
    
  •          table_name_ =table_name_.substr(idx+1);
     }
     else
     {
    
  •        table_name=table_name.substr(0);
    
  •          table_name_=table_name_.substr(0);
     }
    
  •     };
    
  •     geometryColumn_=geometry_field_;
    
     std::ostringstream s;
     s << "select f_geometry_column,srid,type from ";
    
  •     s << GEOMETRY_COLUMNS <<" where f_table_name='" << table_name<<"'";
    
  •     s << GEOMETRY_COLUMNS <<" where f_table_name='" << table_name_<<"'";
    
  •     if (schema_name.length() > 0)
    
  •        s <<" and f_table_schema='"<< schema_name <<"'";
    
  •     if (schema_name_.length() > 0)
    
  •        s <<" and f_table_schema='"<< schema_name_ <<"'";
    
     if (geometry_field_.length() > 0)
        s << " and f_geometry_column = '" << geometry_field_ << "'";
    

    @@ -361,21 +371,20 @@
    {
    PoolGuard<shared_ptr,shared_ptr<Pool<Connection,ConnectionCreator> > > guard(conn,pool);
    std::ostringstream s;

  •     std::string table_name = table_from_sql(table_);
     boost::optional<std::string> estimate_extent = params_.get<std::string>("estimate_extent");
    
     if (estimate_extent && *estimate_extent == "true")
     {
        s << "select xmin(ext),ymin(ext),xmax(ext),ymax(ext)"
          << " from (select estimated_extent('" 
    
  •          << table_name <<"','" 
    
  •          << table_name_ <<"','" 
          << geometryColumn_ << "') as ext) as tmp";
     }
     else 
     {
        s << "select xmin(ext),ymin(ext),xmax(ext),ymax(ext)"
          << " from (select extent(" <<geometryColumn_<< ") as ext from " 
    
  •          << table_name << ") as tmp";
    
  •          << table_name_ << ") as tmp";
     }
    
     shared_ptr<ResultSet> rs=conn->executeQuery(s.str());
    

    --- mapnik-0.6.1/plugins/input/postgis/postgis.hpp 2009-07-14 11:08:55.000000000 +0200
    +++ mapnik-0.6.1-patched/plugins/input/postgis/postgis.hpp 2009-07-17 11:40:42.000000000 +0200
    @@ -1,3 +1,4 @@
    +
    /*****************************************************************************

    • This file is part of Mapnik (c++ mapping toolkit)
      @@ -60,6 +61,8 @@
      const std::string geometry_field_;
      const int cursor_fetch_size_;
      const int row_limit_;
  •  std::string schema_name_;
    
  •  std::string table_name_;
    

    std::string geometryColumn_;
    int type_;
    int srid_;
    }}}

@artemp
Copy link
Member Author

artemp commented Oct 11, 2011

[matt_c] I've attached a patch, here's a good fix for now that I've been using for a while.

The patch adds an optional geometry_table config option to complement geometry_field. If specified, Mapnik won't do a dumb search on the SQL specified in 'table' for the table name, so complex SQL queries can be used that contain multiple 'from' statements, without any trouble.

The patch only addressese the PostGIS datasource for now.
The sqllite datasource has a similar problem.

The patch is backwards-compatible, it will not change the current behaviour until geometry_table is specified.

@artemp
Copy link
Member Author

artemp commented Oct 11, 2011

[springmeyer] Matt,

Thanks for the patch. I'll try to look over this soon.

Also note the fixes applied for #376.

@artemp
Copy link
Member Author

artemp commented Oct 11, 2011

[springmeyer] See also #426, likely a duplicate of this ticket.

@artemp
Copy link
Member Author

artemp commented Oct 11, 2011

[springmeyer] Thanks Numenor, Hans, and matt_c. I've combined the offerings of matt_c and hans's patches, along with the discussion at #426, and come up with the solution (extra parameters) added in r1473, notably 'geometry_table' to ensure that Mapnik does not need try to parse the table out of the subselect. Also, in cases where the layer extent can be hardcoded, 'geometry_field' and 'srid' can alternately be supplied to avoid the need for the table parsing as well.

Closing this ticket as I have tested against a variety of example queries people have offered in this ticket, on the mailing list, and that I have generated by rendering graphics from django models and querysets. Ideally we'd have a postgres testing framework in place so you could verify from yourself but that was not a task I could accomplish given other priorities.

Please open further tickets if other issues arise or you have additional feedback.

Cheers,

Dane

@artemp
Copy link
Member Author

artemp commented Oct 11, 2011

[springmeyer] schema support was the one missing piece to the fixed mentioned above, now addressed in #500 and r1631

@artemp artemp closed this as completed Oct 11, 2011
springmeyer pushed a commit that referenced this issue May 23, 2014
springmeyer pushed a commit that referenced this issue May 23, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant