-
Notifications
You must be signed in to change notification settings - Fork 293
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Run SQL query on OGR layers #834
Comments
Nice idea! PR welcome. |
This issue is very similar to hypertidy/vapour#34. Maybe it would be of use here? |
There's a more complete working example in a branch of my fork here: https://github.com/barryrowlingson/sf/tree/execute_sql - I've no idea of the quality so I'm not sure I should hit the PR button. I can do if it helps, but I don't really know what I'm doing with C++... |
Looks OK; I wonder whether the Also, would it be possible to do selections based on geometries? Maybe not, or at least not portably, since GDAL on MacOS X does not compile against GEOS (where dynamic linking not allowed by CRAN, static not made possible by GDAL devs); @rsbivand ? Nevertheless, it might be nice-to-have on platforms that do support it? |
You might not want to select all columns from an OGR data source. I was following the I also think my patch needs to release the result set from the ExecuteSQL call, according to the documentation. https://www.gdal.org/classGDALDataset.html#ab2c2b105b8f76a279e6a53b9b4a182e0 |
In my experience it's better to scan the extents and attributes, and short circuit on a limited number of features or on attribute values - ExecuteSQL is less efficient than that (for OGRSQL). I don't want all this loaded in this one mega package, but I'm happy to share lessons from vapour for those of you that do. Some layer names cause tricky escaping problems, hence efforts to read FID values directly - I just got stuck on 64-bit int issues for a complete treatment. |
Here's my suggestion for sf, let the user take responsibility for what they get from:
That covers a lot of desrable customization IMO, and might help trigger greater PR input. All those ideas are implemented in vapour at least to a proof of concept level. I want them as an independent API, so I'm figuring it out independently. |
Can st_sfc create a vector of bbox? Probably not, is there anything wrong with that? I'll try |
Nonstarter, raw vectors of bounds much better. NVM, I'll work on the other stuff. |
demo(nc, ask = FALSE)
st_as_sfc(st_bbox(nc)) |
Let's go @barryrowlingson ! Any blocker here? I finally got DBI/dbplyr support to work so I'm keen to have this in sf proper |
I'm planning to release a new sf version soon. Could one of you come up with a PR to do this, preferably with one or two tests? |
I'm close, didn't want to stomp on @barryrowlingson here. There's an issue with the geometry column though, because (I think) ## gets no 'geom' so no geometry column
st_read("nc.gpkg", query = "SELECT NAME FROM 'nc.gpkg'") whereas ## gets 'geometry column
st_read("nc.shp", query = "SELECT NAME FROM nc") I guess I'll push the PR so it can be investigated |
Go for it Michael, I rate your competency above mine for this.
…On Wed, Sep 26, 2018 at 3:14 PM, Michael Sumner ***@***.***> wrote:
I'm close, didn't want to stomp on @barryrowlingson
<https://github.com/barryrowlingson> here.
There's an issue with the geometry column though, because (I think)
CPL_read_ogr is getting the geom from the fields, rather than with
poFeature->GetGeometryRef. I haven't tracked down if it can be made to
work with GPKG, with for example
## gets no 'geom' so no geometry column
st_read("nc.gpkg", query = "SELECT NAME FROM 'nc.gpkg'")
whereas
## gets 'geometry column
st_read("nc.shp", query = "SELECT NAME FROM nc")
I guess I'll push the PR so it can be investigated
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#834 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AA2QlLroF2IrdVyudrt4lKR09h7HIZ8aks5ue4uygaJpZM4WT6he>
.
|
Thanks so much for the fast action!! |
No worries, I'm definitely worried about this though - did you notice in the PR comment? gpkg_layer <- "nc.gpkg"
## we specify geom
nc_gpkg_sql = st_read(system.file("gpkg/nc.gpkg", package = "sf"),
query = sprintf("SELECT NAME, SID74, FIPS, geom FROM '%s' WHERE BIR74 > 20000", gpkg_layer))
## no geom, no sf (shp doesn't need the select geom)
nc_gpkg_no_geom = st_read(system.file("gpkg/nc.gpkg", package = "sf"),
query = sprintf("SELECT NAME, SID74, FIPS FROM '%s' WHERE BIR74 > 20000", gpkg_layer))
|
Yes. Can we use > basename("gpkg/nc.gpkg")
[1] "nc.gpkg" |
I think that's peculiar to GPKG, usually it's basename without extension, or arbitrary, multiple, and unrelated to the filename. The layer name doesn't change when the actual geopackage file name is changed, so it must be baked in on creation. |
Ah: sometimes (shape) we need no quotes around layer, other times (gpkg) we do. Is that it? Yes, for gpkg layer name is baked into the gpkg file, not derived from its name. |
I though single quotes might be a safe default, but no - double quotes seem to be: ## fails
st_read(system.file("shape/nc.shp", package="sf"),
query = "SELECT NAME, SID74, FIPS FROM 'nc' WHERE BIR74 > 20000")
# ok
st_read(system.file("shape/nc.shp", package="sf"),
query = "SELECT NAME, SID74, FIPS FROM \"nc\" WHERE BIR74 > 20000")
# ok
st_read(system.file("gpkg/nc.gpkg", package="sf"),
query = "SELECT NAME, SID74, FIPS, geom FROM 'nc.gpkg' WHERE BIR74 > 20000") That's definitely driver dependent sometimes, i.e. Access, Manifold and SQL Server afaik will take square brackets |
This works with the PostGIS driver: st_read("PG:dbname=postgis", "sids",
query = "SELECT NAME, BIR74, FIPS, wkb_geometry FROM \"sids\" WHERE BIR74 > 20000") |
So, we settle with double quotes? I guess this has to go into the docs, as users supply a full |
Yes I don't think it's worth wrapping here, there is just too many possibilities, and DBI has facilities for escaping. The geom-as-field is the major problem, because while it's easy enough to discover interactively 'select * from layer limit 1' that won't work for DBI. But, I think it's not for right now because it will need pretty big changes. |
An SO questioner: https://stackoverflow.com/questions/52086882/issue-with-query-in-st-read/52102296#52102296 wanted to filter shapefiles with an SQL query, as you can do in
ogr2ogr
for example, but thequery
parameter is only valid for DBIObject reads inst_read
.I answered by showing how to build a VRT file with the SQL query, which works fine but is a bit of a pain.
Could gdal_read.cpp use the
ExecuteSQL
method on an optional passed SQL query string? I experimented and had success with putting:on line 225, which successfully reduced my read data from 6000 features to 707. I'm not sure I completely understand what the ExecuteSQL method does, or what the returned OGRLayer is (see https://www.gdal.org/classGDALDataset.html#a5b65948b1e15fa63e96c0640eb6c5d7c) or what the spatial filter or dialect parameters can do (I left them NULL) but its a proof-of-concept. Real code would have to pass the query string through, changing the parameters in the C++ and the R and so on, and my Rcpp is inadequate to get this all right, so I just hardcoded one SQL test string.
I can't see any discussion here about implementing this already, apologies if this has already been excluded as a possible enhancement.
B
The text was updated successfully, but these errors were encountered: