Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

st_write arguments #53

Closed
Nowosad opened this issue Nov 6, 2016 · 22 comments
Closed

st_write arguments #53

Nowosad opened this issue Nov 6, 2016 · 22 comments

Comments

@Nowosad
Copy link
Contributor

Nowosad commented Nov 6, 2016

Similar to #39, but about st_write().
Now, st_read() accepts filename, but to write a file using st_write there is a need to specify layer argument:

st_write(nc, "nc.shp")
Error in CPL_write_ogr(obj, dsn, layer, driver, as.character(dataset_options),  : 
  argument "layer" is missing, with no default

Is it possible to add this new functionality also for st_write()?

@edzer
Copy link
Member

edzer commented Nov 6, 2016

I now give the default value basename(dsn) to argument layer, so that

st_write(nc, "nc.shp")

writes layer nc.shp (which the driver molds into nc anyway, looking at ogrinfo nc.shp), but

st_write(nc, "nc.gpkg", , "GPKG")

writes a real nc.gpkg layer into the file nc.gpkg. This is just a default, and pretty pragmatic. See 560322a

@Nowosad
Copy link
Contributor Author

Nowosad commented Nov 6, 2016

Right now it works well for shapefiles, but if you try to use it for different formats - you always get shapefile as a result:

> st_write(nc, "nc.gpkg")
Writing layer nc.gpkg to data source nc.gpkg using driver "ESRI Shapefile"
features:       100
fields:         14
geometry type:  MULTIPOLYGON
> st_write(nc, "nc.sqlite")
Writing layer nc.sqlite to data source nc.sqlite using driver "ESRI Shapefile"
features:       100
fields:         14
geometry type:  MULTIPOLYGON

@edzer
Copy link
Member

edzer commented Nov 6, 2016

I see your point - sth like drive = guess_driver(dsn), and look at the extension. I agree that that could be more useful than shapefiles all day long. I guess we could guess them from

  • kml
  • kmz
  • shp
  • gpkg
  • sqlite
  • osm
  • nc
  • geojson
  • gpx
  • dxf
  • svg
  • xls
  • xlsx
  • ods
  • jml
  • e00

Who's willing to wite a guess_driver function?

@Nowosad
Copy link
Contributor Author

Nowosad commented Nov 6, 2016

Do you think about something like:

dsn <- "nc.shp"
guess_driver <- function(dsn){
        extension <- tools::file_ext(dsn)
        if (extension=="shp"){
                driver <- "ESRI Shapefile"
        }

}

If yes - I could try to do that.

@edzer
Copy link
Member

edzer commented Nov 6, 2016

I'd suggest using switch - plenty of examples in sf - insteaf of many ifs, but yes, please go ahead.

@mpadge
Copy link
Contributor

mpadge commented Nov 6, 2016

Rather than switching between 190 drivers, wouldn't it be better to store the table of OGR vector format short names and just grep the file extension? Anomalies could then be dealt with as they arise (which they likely will regardless, for example with ".shp"). This could also avoid ambiguities:

> sf::st_drivers()$name [ grep ("sql", sf::st_drivers()$name, ignore.case=FALSE) ]
[1] SQLite       MSSQLSpatial PostgreSQL

A simple if could be used to demand that cases with multiple grep values specify the desired driver precisely.

@Nowosad
Copy link
Contributor Author

Nowosad commented Nov 6, 2016

@mpadge that's a good idea, but do you know any source of table with short names and extensions of vector formats?

@Nowosad
Copy link
Contributor Author

Nowosad commented Nov 6, 2016

Prototype of guess_driver() function:

library('tibble')
extensions <- tribble(
        ~ext, ~name,
        'kml', 'KML',
        'kmz', 'LIBKML',
        'shp', 'ESRI Shapefile',
        'gpkg', 'GPKG',
        'sqlite', 'SQLite',
        'osm', 'OSM',
        'nc', 'netCDF',
        'geojson', 'GeoJSON',
        'gpx', 'GPX',
        'dxf', 'DXF',
        'svg', 'SVG',
        'xls', 'XLS',
        'xlsx', 'XLSX',
        'ods', 'ODS',
        'jml', 'JML',
        'e00', 'AVCE00'
)
guess_driver <- function(dsn){
        extension <- tools::file_ext(dsn)
        if(extension=="") stop("File extension missing")
        driver <- extensions$name[grep(extension, extensions$ext, ignore.case=FALSE)]
        if(length(driver)==0) stop("Cannot guess a proper driver")
        return(driver)
}

guess_driver('nc.shp')
guess_driver('nc')
guess_driver('nc.exe')
guess_driver('nc.gpkg')

Of course, using this implementation there is a need for storing extensions data.frame somewhere. Moreover, this data.frame should also use sf::st_drivers to check if driver exist.

What do you think?

@mpadge
Copy link
Contributor

mpadge commented Nov 6, 2016

You should only need to specify the exceptions and leave the rest to grepping st_drivers():

sub_from <- c ("shp", "nc") # list of exceptions (needs more entries)
sub_to <- c ("ESRI Shapefile", "netCDF")
driver <- grep (tools::file_ext (dsn), sub_from, ignore.case=TRUE)
if (!is.null (driver))
    driver <- sub_to [driver]
else
    driver <- sf::st_drivers()$name [ grep (ext, sf::st_drivers()$name, ignore.case=FALSE) ] 
if (length (driver) == 0) stop ("Cannot guess an appropriate driver")
else if (length (driver) > 1) stop ("Please specify desired driver: ", driver)

This is also non-tibbled to minimise dependencies as noted by Edzer here. The lists of sub_from and sub_to could simply be hard coded because they'll likely contain only

> length (which (nchar (drivers) > 3))
[1] 49

from 190 entries, or one quarter of all drivers.

@Nowosad
Copy link
Contributor Author

Nowosad commented Nov 6, 2016

I'm not sure if driver <- sf::st_drivers()$name [ grep (ext, sf::st_drivers()$name, ignore.case=FALSE) ] is a good idea, because st_drivers()$name is a mix of names and extensions.
I agree that vectors are better than tibble in this case. My new approach:

file_ext <- c("kml", "kmz", "shp", "gpkg", "sqlite", "osm", "nc", "geojson",
              "gpx", "dxf", "svg", "xls", "xlsx", "ods", "jml", "e00")
file_drv <- c("KML", "LIBKML", "ESRI Shapefile", "GPKG", "SQLite", "OSM", "netCDF", "GeoJSON",
              "GPX", "DXF", "SVG", "XLS", "XLSX", "ODS", "JML", "AVCE00")

guess_driver <- function(dsn){
        extension <- tools::file_ext(dsn)
        if(extension=="") stop("Cannot guess an appropriate driver - file extension missing")
        driver <- file_drv[grep(extension, file_ext, ignore.case=TRUE)]
        if (length(driver) == 0){
                stop ("Cannot guess an appropriate driver")
        } else if (length(driver) > 1){
                stop ("Please specify desired driver: ", driver)
        } else if (!(driver %in% sf::st_drivers()$name)){
                stop ("The chosen GDAL driver is missing")
        } else{
                return(driver)
        }
}

guess_driver('nc.shp')
guess_driver('nc')
guess_driver('nc.exe')
guess_driver('nc.gpkg')
guess_driver('nc.kmz')

@edzer
Copy link
Member

edzer commented Nov 7, 2016

I agree; we also don't know in advance which drivers might be present. I think this is a good proposal.

@mpadge
Copy link
Contributor

mpadge commented Nov 7, 2016

Here is the exhaustive list of all OGR vector drivers with write support (ordered according to appearance in the OGR table). These driver support writing to files, with this list ignoring a number of database drivers which first require connections to be established, so anyone using them may be presumed to know what they want anyway.

number file extension Driver (short) name found with grep?
1 .e00 AVCD00
2 bna BNA +
3 dxf DXF +
4 csv CSV +
5 csw CSW +
6 gdb OpenFileGDB +
7 shp ESRI Shapefile
8 geojson GeoJSON +
9 gxt Geoconcept
10 rss GeoRSS +
11 gml GML +
12 gmt GMT +
13 gps GPSBabel
14 gpx GPX +
15 gtm GPSTrackMaker
16 ili "Interlis 1" and "Interlis 2"
17 jml JML +
18 kml KML +
19 nc NetCDF
20 ods ODS +
21 xls XLSX +
22 xlsx XLSX +
23 vdv VDV +
24 map WAsP

Using my previous grep (file_ext, st::sf_drivers ()$name, ...) would thus require hard-coding only 8 of all 24 possibilities.

Notes:

  1. Not all drivers support writing (including KMZ, OSM, XLS, ... and any others in the previous list of @Nowosad but not found here)
  2. LIBKML does not appear in sf::st_drivers()
  3. The MapInfo File driver can also write, potentially for mif, mid, tab, dat, map, id, and ind file extensions (with map also being used by the WAsP driver)
  4. Both SQLite and PostgreSQL could probably be hard-coded?

TODO prior to implementing guess_driver: Try writing with a file extension that greps a driver that does not support writing and see what happens. Now who wants to implement it in a PR? @Nowosad? Or Edzer?

@edzer
Copy link
Member

edzer commented Nov 7, 2016

This speaks against using grep, in favour of @Nowosad 's proposal.

Related: @Nowosad noticed in #54 that LIBKML was not on his machine. Does GDAL prefer LIBKML (based on google's lib) over KML when it comes to reading KML?

@mpadge
Copy link
Contributor

mpadge commented Nov 7, 2016

Attempt to write to driver without write support:

Creation of dataset name.ext failed
Error in eval(substitute(expr), envir, enclos) : Creation failed.
In addition: Warning message:
In CPL_write_ogr(obj, dsn, layer, driver, as.character(dataset_options),  :
   GDAL Error 6: GDALDriver::Create() ... no create method implemented for this format.

That seems an entirely informative error message, so no risks there.

Regarding LIBKML: http://gdal.org/drv_libkml.html states that: "Note that if you build and include this LIBKML driver, it will become the default reader of KML for ogr, overriding the previous KML driver. You can still specify either KML or LIBKML as the output driver via the command line." (And yeah, my previous comments simply reflect me not having libkml-dev installed prior to GDAL compile.)

But wouldnt' grepping still be better, because the table of extension-driver matches would only need 8 entires instead of 24? (Or 9 entires including disambiguating KML from LIBKML) Note also that the above column of found with grep indicates umbiguous 1-1 matches. And the above error would arise whenever grep found some other inappropriate driver, for example for entirely unknown file extensions, and this seems pretty failsafe.

@edzer
Copy link
Member

edzer commented Nov 7, 2016

This is bikeshedding. With a fixed table (or two vectors), we see, and fix what we do. We can then deselect those not present or without write support, using st_drivers().

@rsbivand
Copy link
Member

rsbivand commented Nov 7, 2016

Note that checking for write capability needs to be done dynamically, as different GDAL installs will have different external dependencies and versions of those dependencies. I don't feel that guessing file extensions is smart at all, users should state the driver the want to use explicitly, then they will not be caught out.

@mpadge
Copy link
Contributor

mpadge commented Nov 7, 2016

This version of guess_driver() can not fail because it only matches guesses to the list of installed drivers returned by the GDALDriverManager. If they're not installed they won't be called. I do think being able to guess drivers makes life easier because the GDAL driver names are not always obvious, while file extensions generally are.

guess_driver <- function(dsn){
    ext_map <- matrix (c (
                       "e00",   "AVCE00",
                       "shp",   "ESRI Shapefile",
                       "gxt",   "Geoconcept",
                       "gps",   "GPSBabel",
                       "gtm",   "GPSTrackMaker",   
                       "nc",    "NetCDF",
                       "map",   "WAsP"),
                       nrow=2)
    extension <- tools::file_ext(dsn)
    if(extension=="") 
        stop("Cannot guess an appropriate driver - file extension missing")
    driver <- ext_map [2, grep (extension, ext_map [1,], ignore.case=TRUE)]
    drivers <- sf::st_drivers()$name # will list vailable drivers only
    if (length (driver) > 0)
    {
        if (!driver %in% drivers)
            driver <- NULL
    } else
    {
        driver <- drivers [ grep (extension, drivers, ignore.case=TRUE) ]
    }

    if (length(driver) == 0){
        stop ("Cannot guess an appropriate driver: Please specify")
    } else if (length(driver) > 1){
        stop (paste ("Please specify desired driver: ", driver))
    } else{
        return(driver)
    }
}

The function could then be implemented by modifying st_write() simply by:

  1. Removing default value for driver argument
  2. Inserting on penultimate line prior to calling CPL_write_ogr:
if (missing (driver)) driver <- guess_driver (driver)

@mpadge
Copy link
Contributor

mpadge commented Nov 7, 2016

And actually I'd suggest that appropriate drivers should be 'guessed' even when they are specified because:

  1. This will avoid inadvertently requesting a driver that is not installed - fulfilling the dynamic checking task suggested by @rsbivand in a way that st_write() currently does not; and
  2. Driver names as passed to GDAL/OGR are case sensitive and often very camel-like; guessing by grepping with ignore.case=TRUE will match regardless of case.

The final line above thus might be better as

if (!missing (driver))
    drivers <- sf::st_drivers()$name
    gdal_driver <- drivers [grep (driver, drivers, ignore.case=TRUE)]
    if (length (gdal_driver) == 0) stop (paste ("Driver", driver, "not available"))
    else if (length (gdal_driver) > 1) stop (paste ("Please specify driver:", gdal_driver))
else
    gdal_driver <- guess_driver (dsn)

and then passing gdal_driver to CPL_write_ogr

@edzer edzer closed this as completed in 26439f8 Nov 7, 2016
@edzer
Copy link
Member

edzer commented Nov 7, 2016

It's not that I don't like PRs, but I think my fix does the right thing. Please comment, or add by (PR) extension/driver combinations.

@mpadge : your

> guess_driver("nc.g")
Error in guess_driver("nc.g") : 
  Please specify desired driver:  GeoconceptPlease specify desired driver:  GPSBabelPlease specify desired driver:  GPSTrackMaker
In addition: Warning message:
In if (!driver %in% drivers) driver <- NULL :
  the condition has length > 1 and only the first element will be used

@Nowosad
Copy link
Contributor Author

Nowosad commented Nov 7, 2016

@edzer what's the purpose of adding "osm", "OSM", # NO WRITE to the code if it is not possible to write them?

@mpadge
Copy link
Contributor

mpadge commented Nov 7, 2016

@Nowosad Error is more informative that way:

> st_write (stuff, "file.osm")
Writing layer `junk.osm' to data source `junk.osm' using driver `OSM'
Creation of dataset junk.osm failed.
Error in eval(substitute(expr), envir, enclos) : Creation failed.
In addition: Warning messages:
1: In guess_driver(dsn) :
    guess_driver OSM is available but reports it will not allow writing
2: In CPL_write_ogr(obj, dsn, layer, driver, as.character(dataset_options),  :
    GDAL Error 6: GDALDriver::Create() ... no create method implemented for this format.

> st_write (stuff, "file.osm", driver="OSM")
# ... differs at
In addition: Warning message:
In CPL_write_ogr(obj, dsn, layer, driver, as.character(dataset_options),  :
    GDAL Error 6: GDALDriver::Create() ... no create method implemented for this format.

edzer added a commit that referenced this issue Nov 7, 2016
Signed-off-by: Edzer Pebesma <edzer.pebesma@uni-muenster.de>
@edzer
Copy link
Member

edzer commented Nov 7, 2016

@Nowosad I removed this; it was just to remind myself to write tests for drivers that don't write. Anyway, there's little you can test here, with drivers evolving and all platforms supporting sth different.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants