Request for additional geospatial libraries ... #213

Open
dlebauer opened this Issue Jan 25, 2017 · 75 comments

8 participants

@dlebauer

@dlebauer commented on Tue Jan 24 2017

If I take the ropensci Docker image and add geospatial libraries (and dependencies) for packages such as rLiDAR, rgl, geomorph, ncdf4, rgdal, etc., would these additions be useful in this ropensci Docker repository, or would they be of interest within rocker-org as a separate repository with geospatial additions?

@dlebauer dlebauer referenced this issue in rocker-org/ropensci Jan 25, 2017
Closed

Request for additional geospatial libraries ... #6

@cboettig
Contributor
cboettig commented Jan 25, 2017 edited

@dlebauer I think a rocker/geospatial image along those lines would be pretty compelling. Can I suggest you make a PR against rocker-org/rocker-versioned that adds a geospatial directory with such a Dockerfile (and, if interested, Dockerfiles for specific R versions; see the Makefile setup in one of the other dirs, like verse)? I'd think the image might build off of tidyverse (or verse? but the TeX stuff is probably not needed?) rather than ropensci.

@rkrug
rkrug commented Jan 25, 2017

@dlebauer @cboettig Yes - a rocker/geospatial image would be really useful. Is there an open issue or mailing list where this can be discussed (which programs, which versions, ...)? I would be interested in helping to set it up.

@cboettig
Contributor

@dlebauer @rkrug Okay, I put together a draft version of this in rocker-versioned, see:

Note the Dockerfile builds the gdal libs from source, since the sf package needs a newer version (>= 2) than can be found in debian:jessie. I don't know the spatial suite very well, so I've just added the packages marked as core in the Spatial task view as a starting point. Please feel free to add new packages in a PR, and we'll figure out what other external lib dependencies, if any, we need to add as well. Feel free to continue comments in this thread, or we can move to an issue in rocker-versioned. Thanks for the input!

@dlebauer

Thanks!

Before submitting a PR, are there guidelines for narrow vs. comprehensive scope?

I was thinking rLiDAR, rgl, geomorph, ncdf4, rgdal.
I think I learned of the first few from @lwasser's NEON tutorials.

Also PostgreSQL + PostGIS. dplyr provides an awesome interface, but RPostgreSQL is older.

Suggestions @serbinsh @lwasser @sckott?

@nuest
nuest commented Jan 26, 2017

On narrow vs. comprehensive: may I suggest that the maintainers of the CRAN Task Views Spatial (@rsbivand) and SpatioTemporal (@edzer) take a look at the list of included packages?

@rkrug
rkrug commented Jan 26, 2017 edited

@dlebauer @nuest @cboettig @rsbivand @edzer

The most comprehensive approach would be to install all packages in the task view:

RUN r -e "install.packages('ctv')" \
    && r -l "ctv" -e "install.views('Spatial')"

But I would suggest having more than one image, i.e. images for different applications. For example, if I use GRASS and R, I normally don't need PostgreSQL. If I do remote sensing, I need different packages than for point pattern analysis.

So my suggestion would be four or five images. From my perspective, these would be:

  1. Base package - for analysis of spatial data in R but does not include any "special" deb packages like GRASS or postgresql - but obviously gdal, proj4, ...; only base packages installed. This could be the one the others are based on.
  2. Task View - which includes all components to run all task view packages
  3. GRASS - includes all components to interface R and GRASS and run R as well as GRASS
  4. remote sensing - no idea what is needed there
@rsbivand

Let's not complicate things before the foundations are firm. The proposed listing does not start by installing key dependencies for GDAL, in particular SQLite. Without SQLite, the GDAL GPKG driver will be absent.

So first establish which GDAL external dependencies to meet. My GDAL configure runs with:

--with-odbc --with-geos=yes --with-curl --with-spatialite=no --with-sqlite=yes

but it may also be helpful to accommodate HDF4, HDF5 and NetCDF. Since GRASS needs GDAL too, getting GDAL right first may be crucial. Also see the GDAL github test rig.

@edzer
edzer commented Jan 26, 2017

Agreed - the setdiff between gdal drivers in ppa:ubuntugis-unstable and in this docker image is

> setdiff(n, n0)
 [1] "BAG"          "CartoDB"      "DODS"         "EPSILON"      "Geomedia"    
 [6] "GMT"          "GPKG"         "HDF4"         "HDF4Image"    "HDF5"        
[11] "HDF5Image"    "Interlis 1"   "Interlis 2"   "JP2OpenJPEG"  "LIBKML"      
[16] "MBTiles"      "MSSQLSpatial" "MySQL"        "NAS"          "netCDF"      
[21] "ODBC"         "OGDI"         "OGR_DODS"     "OGR_OGDI"     "OSM"         
[26] "PGeo"         "Rasterlite"   "SQLite"       "VFK"          "Walk"        
[31] "WEBP"         "XLS"  

That doesn't mean we need everything, but as Roger suggests, there are a lot of useful things in this list.

@rkrug
rkrug commented Jan 26, 2017

I just looked at the Homebrew recipe for gdal2, which is possibly a good starting point as it effectively covers all options. The first ones should be satisfied:

  depends_on "libpng"
  depends_on "jpeg"
  depends_on "giflib"
  depends_on "libtiff"
  depends_on "libgeotiff"
  depends_on "proj"
  depends_on "geos"

  depends_on "sqlite" # To ensure compatibility with SpatiaLite.
  depends_on "freexl"
  depends_on "libspatialite"

  depends_on "postgresql" => :optional
  depends_on "mysql" => :optional

  depends_on "homebrew/science/armadillo" if build.with? "armadillo"

These ones depend on certain options:

  if build.with? "libkml"
    depends_on "autoconf" => :build
    depends_on "automake" => :build
    depends_on "libtool" => :build
  end

  if build.with? "complete"
    # Raster libraries
    depends_on "homebrew/science/netcdf" # Also brings in HDF5
    depends_on "jasper"
    depends_on "webp"
    depends_on "homebrew/science/cfitsio"
    depends_on "epsilon"
    depends_on "libdap"
    depends_on "libxml2"
    depends_on "openjpeg"

    # Vector libraries
    depends_on "unixodbc" # OS X version is not complete enough
    depends_on "xerces-c"

    # Other libraries
    depends_on "xz" # get liblzma compression algorithm library from XZutils
    depends_on "poppler"
    depends_on "podofo"
    depends_on "json-c"
  end

  depends_on :java => ["1.7+", :optional, :build]

  if build.with? "swig-java"
    depends_on "ant" => :build
    depends_on "swig" => :build
  end
@rkrug
rkrug commented Jan 26, 2017 edited

@rsbivand @edzer I have just submitted a pull request which should address the gdal compilation questions.

@cboettig
Contributor

Thanks everyone for the feedback.

I agree with @rsbivand that getting the appropriate foundation in the gdal configuration options is probably the first step. I'm not much of a user of the gdal-based stack, so community input is all the more essential here. Though we're aiming for images with an intentional set of libraries rather than a kitchen sink, it's probably best to err on the side of inclusivity, particularly for configurations where there is no easy way to add that functionality later if it's not compiled in from the start.

@edzer Can you show me how you are running that setdiff command?

@rkrug after a closer look, I'm not sure that the brew-inspired configuration options are really the best starting place, and would love to hear others weigh in. According to http://trac.osgeo.org/gdal/wiki/BuildingOnUnix it looks like many of the basic image libraries are pre-packaged (libz, libtiff, libgeotiff, libpng, libgif, and libjpeg), though maybe it's worth using native shared libs all the same. It would seem more natural to grab the debian build-deps rather than copy configuration options from brew, so I've pushed that into the draft geospatial image so folks can take a look at that.

This still gives a slightly different list of supported formats than I get from the Debian repos' version of libgdal-dev. On the rocker/geospatial image I see:

# gdal-config --formats
gxf gtiff hfa aigrid aaigrid ceos ceos2 iso8211 xpm sdts raw dted mem jdem envisat elas fit vrt usgsdem l1b nitf bmp airsar rs2 ilwis rmf leveller sgi srtmhgt idrisi gsg ingr ers jaxapalsar dimap gff cosar pds adrg coasp tsx terragen blx msgn til r northwood saga xyz hf2 kmlsuperoverlay ctg e00grid zmap ngsgeoid iris map cals safe sentinel2 mrf wcs wms plmosaic wmts grib bsb jpeg2000 netcdf hdf4 ogdi gif jpeg png pcraster pcidsk rik ozi pdf rasterlite mbtiles postgisraster arg

while the prebuilt version shows:

gdal-config --formats
gxf gtiff hfa aigrid aaigrid ceos ceos2 iso8211 xpm sdts raw dted mem jdem envisat elas fit vrt usgsdem l1b nitf bmp pcidsk airsar rs2 ilwis rmf leveller sgi srtmhgt idrisi gsg ingr ers jaxapalsar dimap gff cosar pds adrg coasp tsx terragen blx msgn til r northwood saga xyz hf2 kmlsuperoverlay ctg e00grid zmap ngsgeoid iris map webp epsilon wcs wms dods grib bsb jpeg2000 netcdf hdf5 hdf4 ogdi gif jpeg png pcraster rik ozi pdf rasterlite mbtiles postgisraster arg

(e.g. it includes hdf5), so I'm not quite sure why the lists are different. Anyway, I'd like more input on which configuration options we should include and how to test for them. It sounds like SQLite, HDF5, NetCDF, and maybe GRASS would be worth having built-in support for out of the box, though probably not everything included in ppa:ubuntugis-unstable?
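One possible way to "test for them", sketched here as a hedged POSIX shell helper: after the build, compare the output of `gdal-config --formats` against a short list of formats the image should guarantee, and let the Docker build fail if any are missing. The function name `missing_formats` and the example format list are illustrative assumptions, not part of the image. The lists above appear to contain raster format names only, so a vector driver such as SQLite or GPKG would need a separate check against `ogrinfo --formats`.

```shell
# Hypothetical helper: missing_formats AVAILABLE REQUIRED...
#   AVAILABLE - whitespace-separated format names, e.g. "$(gdal-config --formats)"
#   REQUIRED  - the format names the image must support
# Prints each required format not found in AVAILABLE (one per line) and
# returns 1 if anything is missing, 0 otherwise.
missing_formats() {
  # Turn tabs/newlines into spaces and pad both ends, so every name is
  # surrounded by spaces and "hdf4" cannot match inside "hdf4image".
  available=" $(printf '%s' "$1" | tr '\t\n' '  ') "
  shift
  status=0
  for fmt in "$@"; do
    case "$available" in
      *" $fmt "*) ;;                        # found as a whole word
      *) printf '%s\n' "$fmt"; status=1 ;;  # not compiled in
    esac
  done
  return "$status"
}

# Possible use inside a Dockerfile RUN step (a non-zero exit fails the build):
#   missing_formats "$(gdal-config --formats)" hdf4 hdf5 netcdf
```

Keeping the required list short and explicit makes the build fail loudly when a library such as HDF5 is silently not detected, instead of producing an image that quietly lacks the driver.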

@rkrug
rkrug commented Jan 27, 2017

@cboettig I agree with your assessments. Below find some comments

internal versus external libraries: I only used the homebrew configuration as it has a good set of formats supported - I did not consider the internal versus external libraries and as I see it, there is no compelling reason not to use the internal libraries, so these options can be skipped.

configuration gdal: @rsbivand I think we should aim for a complete set of formats in gdal. The other option would be to use a "most used subset" and to provide a script which would install gdal (same version) with the complete set of formats; although I don't know if rgdal needs to be re-installed afterwards?
I think the following config options should be added to support additional formats (which would be in line with the homebrew build):

    --with-sqlite3 \
    --with-freexl \
    --with-spatialite \
    --with-odbc \
    --with-curl \

I don't know about these as I have never used the GRASS plugin.

    --without-grass \
    --without-libgrass \

dependencies libgdal-dev: As you said, this is gdal 1.x in jessie - is this correct? If so, I don't know whether it really installs the dependencies of gdal 2.x.

The GDAL raster formats and vector formats pages list the supported formats and whether they are compiled in by default.

@rsbivand
rsbivand commented Jan 27, 2017 edited

I would not see spatialite as necessary - in my experience it is often fragile, and it pulls in geos and freexl, which aren't needed when going via sqlite3. I do see expat and xerces-c as essential to get all the GML etc. formats - they are not built in. GEOJSON is declared as built in. I don't think any plugins are helpful, as you get circular dependencies (for GRASS you have to install GDAL w/o GRASS drivers, then GRASS, then GDAL again pointing to the installed GRASS to get the dependencies).

@rkrug
rkrug commented Jan 27, 2017

@rsbivand yes - JSON is built in. spatialite: I like the idea - is it superseded by sqlite3? I don't think so?

I will add the expat and xerces-c libraries and options.

@rsbivand

No, spatialite seems to want to be a full GIS, with sqlite3 and GEOS among its dependencies. For most purposes, keeping the dependencies separate is very sensible, that is, do not let GDAL or any of its dependencies depend on GEOS (some of the wilder ones use the GEOS C++ API, which is liable to frequent breakage). So I never build GDAL with spatialite, because the data handling spatialite tries to offer is better handled separately, after reading. sf now links separately to GDAL and GEOS to avoid cross-dependency issues. sqlite3 is essential, as are expat and xerces-c, but spatialite for me causes dependency problems that can be hard to debug.

@rkrug
rkrug commented Jan 27, 2017 edited

Thanks for the clarification - then we should leave spatialite out.

expat and xerces-c: I have added libexpat1-dev and libxerces-c-dev to the dependencies - is that all that needs to be done to enable the other formats? I will push it to https://github.com/rkrug/rocker-versioned/blob/geospatial-gdal/geospatial/Dockerfile (branch geospatial-gdal) once my build has finished successfully, and it will be included in the pull request.

@rkrug
rkrug commented Jan 27, 2017

hdf4 / hdf5: it seems that we can have either 4 or 5? Is this true?

@rsbivand

It's very muddled; I've seen NASA HDF4 picked up as NetCDF when the NetCDF/HDF5 drivers are available, then failing on read. For this situation, I guess that NetCDF will also provide HDF5, but I have never understood the logic. Maybe @mdsumner knows?

@rkrug
rkrug commented Jan 27, 2017

I now have netcdf and hdf4 installed, but hdf5 is not picked up during configure:

checking for H5Fopen in -lhdf5... no

despite libhdf5-dev being installed.
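When a library is installed but configure still reports it as missing, it can help to pull every negative probe out of the configure output in one go rather than scrolling through it. A minimal sketch, assuming configure's output has been saved to a file; the function name `failed_checks` and the file name `configure.out` are hypothetical. The compiler/linker error behind each "no" is normally recorded in `config.log`, which is the place to look next.

```shell
# Hypothetical helper: read configure output on stdin and print only the
# "checking ... no" lines, i.e. the probes that did not find what they looked for.
failed_checks() {
  # grep exits 1 when nothing matches; treat "no failed probes" as success.
  grep -E '^checking .*\.\.\. no$' || true
}

# Possible use:
#   ./configure 2>&1 | tee configure.out
#   failed_checks < configure.out
```

Many "no" results are expected for optional features, so the output is a list to scan for the libraries you meant to enable (such as the `-lhdf5` probe above), not a list of errors.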

@edzer
edzer commented Jan 27, 2017

It seems you need to install libhdf4-alt-dev rather than libhdf4-dev, see here.

@edzer
edzer commented Jan 27, 2017

My gdal-config from ubuntugis-unstable has both 4 and 5: gxf gtiff hfa aigrid aaigrid ceos ceos2 iso8211 xpm sdts raw dted mem jdem envisat elas fit vrt usgsdem l1b nitf bmp airsar rs2 ilwis rmf leveller sgi srtmhgt idrisi gsg ingr ers jaxapalsar dimap gff cosar pds adrg coasp tsx terragen blx msgn til r northwood saga xyz hf2 kmlsuperoverlay ctg e00grid zmap ngsgeoid iris map cals safe sentinel2 mrf webp epsilon wcs wms plmosaic wmts dods grib bsb openjpeg netcdf hdf5 hdf4 ogdi gif jpeg png pcraster pcidsk rik ozi pdf rasterlite mbtiles postgisraster arg

@mdsumner
mdsumner commented Jan 27, 2017 edited

@rsbivand NetCDF4 is implemented with HDF5, it's essentially a simplification of HDF5 to behave like the "classic" NetCDF format. HDF4 is just a weird old format, HDF5 is all-powerful with near-database-level-generality. (KEA is another format implemented in HDF5).

It can be extremely confusing. E.g. since raster will use ncdf4 in favour of GDAL-NetCDF (even when it's available in rgdal), you can't make it go via GDAL without manipulating the R package library used. You sometimes really need to know what pathway your read has gone through, and sometimes it's a simple tweak or just a different package that will make it work.

HDF4 and HDF5 are radically different, there's really no relationship apart from "hierarchical", but NetCDF4 and NetCDF3 are pretty compatible (despite being completely different). The main changes in NetCDF4 come down to internal compression, chunking (i.e. tiling), compound types (data.frame-like structures), and groups (a data-set-collection file-abstraction).

NASA's ocean colour site has recently transitioned from all being HDF4 to nearly all being in NetCDF4 (some of the L0 and L1 archive is still HDF4 I believe). It's a good source of files for detailed comparisons.

(There are build-level options for NetCDF4 like parallelization, OpenDAP/Thredds/DODS capability that might need consideration for rocker, and this obviously complicates the options for GDAL. Things like HTTP and dods can clash over ambiguity around the source of a URL, but I think the DRIVER:dsn syntax in GDAL avoids those problems nowadays).

@rkrug
rkrug commented Jan 27, 2017

@edzer

It seems you need to install libhdf4-alt-dev rather than libhdf4-dev, see here.

OK - I changed this - but hdf4 was enabled before as well...

I have now the following formats: gxf gtiff hfa aigrid aaigrid ceos ceos2 iso8211 xpm sdts raw dted mem jdem envisat elas fit vrt usgsdem l1b nitf bmp airsar rs2 ilwis rmf leveller sgi srtmhgt idrisi gsg ingr ers jaxapalsar dimap gff cosar pds adrg coasp tsx terragen blx msgn til r northwood saga xyz hf2 kmlsuperoverlay ctg e00grid zmap ngsgeoid iris map cals safe sentinel2 mrf wcs wms plmosaic wmts grib bsb netcdf hdf4 gif jpeg png pcraster pcidsk rik ozi pdf rasterlite mbtiles postgisraster arg

@mdsumner So what would be the best to have in gdal? And what do we need for hdf5 in addition to libhdf5-dev?

I just pushed the new version to https://github.com/rkrug/rocker-versioned/blob/geospatial-gdal/geospatial/Dockerfile (branch geospatial-gdal).

@mdsumner
mdsumner commented Jan 27, 2017 edited

Only zlib 1.2 and szlib 2.0 afaik, but possibly libhdf5-serial-dev as per https://github.com/rouault/gdal_coverage/blob/trunk_with_coverage/.travis.yml

(I was almost certain one of the rstudio rocker images already included HDF5 a few months ago).

ubuntu-gis-unstable must include it since my install scripts don't mention HDF5 explicitly any more, but I usually include that. I don't tend to explore this much any more but can help if needed.

@mdsumner

There is some amazingly good information in this thread, I'm a latecomer catching up a bit.

@mdsumner
mdsumner commented Jan 27, 2017 edited

@dlebauer lidR and rlas should be considered over rLiDAR, especially for LAS read/write

@dlebauer

@mdsumner I'll take your advice and give those a try.

@mdsumner

Raster isn't much good without ncdf4, and while I think it should be in the list alongside rgdal, another option should include rgdal but not ncdf4.

The raster package (with ncdf4) does a much better job at higher-level abstraction over multi-dimensional NetCDF variables than GDAL does, but raster+rgdal can be used to read a much wider range of NetCDF files. ncdf4 and rgdal and raster happily co-exist, but rgdal-for-NetCDF-via-raster will be masked if ncdf4 is present.

Clear as mud

@rkrug
rkrug commented Jan 27, 2017

@mdsumner I now have

  libhdf5-dev
  libhdf5-serial-dev
  zlib1g-dev

as dependencies for hdf5, but it is still not enabled.

I get the following message in the configure output:

checking for H5Fopen in -lhdf5... no
@rkrug
rkrug commented Jan 27, 2017

I have just asked on the gdal-dev list about hdf5 on Lennie.

@rkrug
rkrug commented Jan 27, 2017

HDF5 is in - there is a shortcoming in the HDF5 detection in the 2.1 branch, but it has been fixed in trunk. See the email from Even Rouault at http://permalink.gmane.org/gmane.comp.gis.gdal.devel/44474

@rkrug
rkrug commented Jan 28, 2017

Thanks @cboettig for merging the pull request.

Now we have to decide whether any additional software / libraries are needed before we can finally move on to the R packages.

@mdsumner

thanks @rkrug, glad you got this - sorry I wasn't much help! This will be really useful

@rkrug
rkrug commented Jan 28, 2017

@mdsumner Thanks - but your input was very useful for understanding things. I am sure that you can now give tons of input on additional software and packages.

@edzer
edzer commented Jan 28, 2017

@mdsumner you mentioned DODS, which afaics is not among the drivers now supported in this image. Is this critical? I can't find which external dependencies it needs, but it is in ubuntugis-unstable.

MySQL does not seem to be supported by gdal on Debian.

@mdsumner

Yes, having DODS is definitely helpful for pointing GDAL at NetCDF Thredds servers.

It might clash with the general HTTP driver and that should be checked. This is something that's not been used a lot from what I can see, different servers have different problems and arranging the right stack is not easy. There's a lot of amazing data available though!

Reading from Thredds (DODS/OpenDAP) is also straightforward with ncdf4 (and so via raster), and this should all coexist happily I believe as long as we use NetCDF:[address] and DODS:[address] for GDAL itself.

raster::raster() and ncdf4::nc_open() can take the Thredds address directly. This example only works with ncdf4 but I know others can work with GDAL.

library(ncdf4)
u <- "http://tds.hycom.org/thredds/dodsC/GLBu0.08/expt_91.1/ts3z"
nc <- nc_open(u)
nc
# File http://tds.hycom.org/thredds/dodsC/GLBu0.08/expt_91.1/ts3z (NC_FORMAT_CLASSIC):
#   
#   3 variables (excluding dimension variables):
#   double tau[time]   
# long_name: Tau
# units: hours since analysis
# time_origin: 2016-04-18 00:00:00
# NAVO_code: 56
# short water_temp[lon,lat,depth,time]   
# long_name: Water Temperature
# standard_name: sea_water_temperature
# units: degC
# _FillValue: -30000
# missing_value: -30000
# scale_factor: 0.00100000004749745
# add_offset: 20
# NAVO_code: 15
# short salinity[lon,lat,depth,time]   
# long_name: Salinity
# standard_name: sea_water_salinity
# units: psu
# _FillValue: -30000
# ...
# 

# nope
#library(raster)
#r <- brick(u, varname = "salinity")
#system(sprintf("gdalinfo %s", u))
@edzer
edzer commented Jan 28, 2017

@cboettig I did the setdiff rather clumsily: on one machine I did

library(sf)
d = st_drivers("") # get all
dump("d")

then I copied dumpdata.R to the other machine (container), sourced dumpdata.R in an R session, got the local list of drivers there the same way, saved it to d0, and then ran setdiff(d, d0).
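The same comparison can also be done without moving R objects between machines: write one driver name per line to a text file on each machine (by whatever means is convenient), copy both files to one place, and take the set difference in the shell. A sketch; the function name `setdiff_lines` and the file names are hypothetical.

```shell
# Hypothetical helper: setdiff_lines A B
# Prints the distinct lines of file A that do not occur in file B, sorted,
# i.e. the shell analogue of R's setdiff(a, b) for one-name-per-line files.
setdiff_lines() {
  # -F: fixed strings, -x: whole-line matches only (so "HDF5" is not removed
  # by "HDF5Image"), -v: keep the non-matching lines, -f: patterns from file B.
  grep -Fvx -f "$2" "$1" | sort -u
}

# Possible use, after copying both lists to one machine:
#   setdiff_lines drivers-ubuntugis.txt drivers-container.txt
```

Whole-line matching matters here because several GDAL driver names are prefixes of others (HDF4/HDF4Image, HDF5/HDF5Image).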

@rkrug
rkrug commented Jan 31, 2017

@edzer @mdsumner @cboettig Submitted pull request rocker-org/rocker-versioned#19 for DODS support. I don't know about any clashes, though.

@rkrug
rkrug commented Jan 31, 2017

@cboettig @eddelbuettel What is actually the incentive for putting everything into one RUN statement? For debugging it is much easier to have multiple, and the final result should be the same. Just wondering.

@eddelbuettel
Member

Each RUN statement creates one Docker layer, so the result is not the same.

Having more Docker layers than needed is discouraged by the Docker documentation.

@rkrug
rkrug commented Jan 31, 2017

@eddelbuettel OK - makes sense. Found it here.

@rkrug
rkrug commented Feb 2, 2017

@rsbivand @edzer @mdsumner @dlebauer @cboettig

As we now have a gdal version which seems to cover most of the needs, we should go to the next step:

  1. which R packages should be installed to have a useful and general r-spatial Docker image?
  2. is any additional software needed for this, and
  3. should / can we split these into different specialised Docker images (ideas would be GRASS, R, QGIS for rqgis, postgresql, ... - but which ones are really needed?)

Any suggestions? Ideas? Alternatives?

@mdsumner
mdsumner commented Feb 2, 2017

@rkrug thanks for the heads up. The key ones I want are sf, sp/rgdal, rgeos, ncdf4, proj4, and rhdf5 (Bioconductor), all relying on key system libs; then raster, maptools, lidR, rlas.

Do we have libssl-dev? I always find I need that so that download options are available for various providers.

I'll see if I can organize the packages by those usage groups you've listed, but I think those groups you have make sense.

@rkrug
rkrug commented Feb 2, 2017

@mdsumner OK - will submit a pull request soon for these packages and the library.

@rkrug
rkrug commented Feb 2, 2017

@mdsumner @cboettig I am leaving rhdf5 out at the moment as I have absolutely no experience with Bioconductor and how it should be used in these rocker images.

@cboettig
Contributor
cboettig commented Feb 2, 2017

@rkrug For bioconductor packages, just list the bioconductor repository in an argument to install2.r, like so: https://github.com/rocker-org/rocker-versioned/blob/master/tidyverse/Dockerfile#L12

(Ideally the version-specific images should use the corresponding version-specific release repo, but the current release is fine for now. We can edit the Makefile later so that it swaps in the version-specific Bioconductor URL for the 3.3.2 and 3.3.1 release tags.)

The list sounds good to me so far.

@dlebauer
dlebauer commented Feb 2, 2017

Thanks everyone for putting this together! I am planning to use it in a workshop next week ...

I would find Postgres + PostGIS useful but not essential since these can run on another server or image.

@rkrug
rkrug commented Feb 3, 2017

@dlebauer There were discussions here about running two servers in one Docker container, which is not that easily possible if you want to access them from the outside. The Docker way is to have single-purpose / single-server containers, i.e. a second container for the Postgres + PostGIS server, and then to make them communicate (no idea how...).

Good luck with the workshop and please give feedback.

@edzer
edzer commented Feb 3, 2017

I think the goal of many simple single-purpose Docker containers is ease of administration. The goal I would see here is reproducible science. If you need two Docker containers to get R + PostGIS running, the ease of setting them up and administering them doesn't outweigh the complexity of using them, IMO.

With gdal installed from source, we may have to install PostGIS from source too, as is also done here.

@rkrug
rkrug commented Feb 3, 2017

@edzer exactly - reproducible research but also ease in teaching / workshops.

@cboettig We could overcome this problem by basing the image not on one that contains an exposed RStudio Server, but on a "normal" R image, r-ver:latest. Then the port of the PostgreSQL server in the image could be exposed - or am I misunderstanding you?

@rkrug
rkrug commented Feb 3, 2017

@edzer I forgot one aspect of why I am keen on this: it makes it much easier to work cross-platform in projects and get consistent results.

@rkrug rkrug added a commit to rkrug/rocker-versioned that referenced this issue Feb 3, 2017
@rkrug rkrug Add packages and library as suggested by mdsumner ac24e3c
@rkrug rkrug referenced this issue in rocker-org/rocker-versioned Feb 3, 2017
Merged

Geospatial base #20

@cboettig cboettig added a commit to rocker-org/rocker-versioned that referenced this issue Feb 6, 2017
@rkrug @cboettig rkrug + cboettig Geospatial base (#20)
* Sort libraries, options and R packages alphabetically

As suggested by https://docs.docker.com/engine/userguide/eng-image/dockerfile_best-practices/#sort-multi-line-arguments

* Add packages and library as suggested by mdsumner

see rocker-org/rocker#213 (comment)
736d1df
@cboettig cboettig added a commit to rocker-org/rocker-versioned that referenced this issue Feb 7, 2017
@rkrug @cboettig rkrug + cboettig Fix dependencies (#21)
* Sort libraries, options and R packages alphabetically

As suggested by https://docs.docker.com/engine/userguide/eng-image/dockerfile_best-practices/#sort-multi-line-arguments

* Add packages and library as suggested by mdsumner

see rocker-org/rocker#213 (comment)

* Fix missing libraries
11c5536
@rkrug
rkrug commented Feb 8, 2017

@edzer @mdsumner @rsbivand We have now the packages as suggested by @mdsumner. Are there any other suggestions of important packages which should be installed? Please see the Dockerfile for a list of packages installed.

Then there is the question of Postgres, PostGIS and GRASS. I would suggest creating separate images for those, as for PostGIS we would have to run the server inside and make it accessible from the outside?

@edzer
edzer commented Feb 8, 2017 edited

Important packages missing: sf, spatstat, geoR and RandomFields.

sf was broken with GEOS 3.4.x, this has been fixed now.

@rsbivand
rsbivand commented Feb 8, 2017

Apparently spatstat, geoR and RandomFields are commented out because they suggest tcltk or suggest packages importing tcltk. Of course, this doesn't mean that they use tcltk themselves unprotected - I think those dropping these packages need to check for themselves that the Suggests are properly protected in examples and tests, so that the absence of interactive tcltk does not terminate anything.

Maybe other packages should be in the CRAN Task View core list?

@rkrug
rkrug commented Feb 8, 2017

OK - I have added sf and RandomFields in my branch geospatial-base, but as @rsbivand points out, geoR needs R to be compiled with Tcl/Tk support. To do this, we have to compile R ourselves for this image. I couldn't download spatstat as there seems to be a problem - but I think it also needs R compiled with tcltk support.

@cboettig
Contributor
cboettig commented Feb 8, 2017

I can modify the R compile to enable opt-in tcltk support like I've done for X11 rocker-org/rocker-versioned#13 .

Re postgres support, do you just need gdal built with postgres dev libraries? As far as having a postgresql server installed, as others have said on this list already, the Docker way is really to pull that in through a linked postgresql image. With a short docker-compose file this is hardly more complicated than having it built into the container, and it's much more flexible. For instance, some users will prefer to use a postgres database that already exists and may not be in a container anyway, may prefer the postgres server to run on a different machine, may want to link a different postgres server version, or may want finer control over the postgres database location (e.g. in the container vs. as a linked volume). All these issues are well worked out following the standard Docker model of linking database containers when necessary, so I think we should stick with that.

@rkrug
rkrug commented Feb 8, 2017

@cboettig I think to enable tcltk would be a good idea and make installation of these remaining CRAN Task View Spatial core packages possible - @rsbivand @edzer @mdsumner any comments?

I must say I have never used PostgreSQL - and therefore I have no idea about the requirements. But I think the main requirement is to build gdal with PostgreSQL support - and that should be relatively straightforward. @rsbivand @edzer @mdsumner Any suggestions?

Also, I am just starting to swim in the Docker world and have never dived deeper into networking different Docker containers and docker-compose. But I agree that the Docker way is to have single-purpose containers which are linked - so let's stick to this paradigm and leave a PostgreSQL server out.

@edzer
edzer commented Feb 8, 2017

For the gdal PostGIS driver, I believe you only need libpq-dev installed.

Header files and static library for compiling C programs to link with the libpq library in order to communicate
with a PostgreSQL database backend.
@cboettig
Contributor
cboettig commented Feb 8, 2017

Right. libpq-dev should already be in there from upstream: https://github.com/rocker-org/rocker-versioned/blob/master/tidyverse/Dockerfile#L9

@edzer
edzer commented Feb 8, 2017

Ah, sorry, the PostGIS client driver is already there:

root@838f157a4184:/# ogrinfo --formats | grep -i post
  PostgreSQL -vector- (rw+): PostgreSQL/PostGIS
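For scripted checks, the driver short names can be pulled out of this listing rather than grepping the whole text, so that an exact-name match cannot be confused by a partial hit. A sketch that assumes the line layout shown above (`  Name -type- (flags): Long name`), which `gdalinfo --formats` seems to share; `ogr_driver_names` is a hypothetical name.

```shell
# Hypothetical helper: read `ogrinfo --formats` output on stdin and print the
# driver short names (the part before " -vector- (", " -raster,vector- (", ...).
# Names may contain spaces, e.g. "Interlis 1", so the whole prefix is kept.
ogr_driver_names() {
  sed -n 's/^[[:space:]]*\(.*\) -[a-z,]*- (.*/\1/p'
}

# Possible use: confirm the PostgreSQL/PostGIS client driver is compiled in.
#   ogrinfo --formats | ogr_driver_names | grep -qx PostgreSQL && echo "PostGIS driver present"
```

Lines that do not follow the `Name -type- (flags):` pattern, such as a header line, are simply skipped.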
@cboettig
Contributor
cboettig commented Feb 8, 2017

@rkrug okay, r-ver now builds R with tcl/tk support (once the build finishes on hub). You'll just need to add libtcl8.5 and libtk8.5 to your apt-get install list and then geoR should install no problem.

@rkrug
rkrug commented Feb 9, 2017

@cboettig Thanks - I will try it out and report back.

@edzer
edzer commented Feb 10, 2017

I tried installing with libtcl8.5 and libtk8.5, but failed; spatstat suggests rpanel, and this breaks with:

  error: Tcl/Tk support is not available on this system
Error : package ‘tcltk’ could not be loaded
ERROR: lazy loading failed for package ‘rpanel’

so it seems that R has been built without tcltk support. I see three options:

  1. make sure we have an R installed with tcltk support
  2. install packages without all their Suggests: dependencies
  3. drop those that have tcltk in suggests: (spatstat, geoR, RandomFields)

For 1 and 2, I don't know how to do this.

@rsbivand
  1. is the only sensible way to proceed, but means that the affected packages must use if(require()) or if(requireNamespace()) properly. They could be persuaded to do this, I think. I'll see if I can check this with an R without tcltk.
@edzer
edzer commented Feb 10, 2017

I don't think this requires changes to the packages, but to the install script. Packages in Suggests: are not loaded when the package itself is loaded.

@rsbivand

No, but if the protocol runs R CMD check, it may fail if absent or non-capable packages are not guarded by if() on require() or requireNamespace(). So far there is trouble with geoR, which spills over to RandomFields; spatstat runs R CMD check cleanly on a system built without tcltk. There the tcltk package is a stub and throws an error:

> capabilities("tcltk")
tcltk 
FALSE 
> library(tcltk)
Error: package or namespace load failed for ‘tcltk’:
 .onLoad failed in loadNamespace() for 'tcltk', details:
  call: fun(libname, pkgname)
  error: Tcl/Tk support is not available on this system
In addition: Warning message:
S3 methods ‘as.character.tclObj’, ‘as.character.tclVar’, ‘as.double.tclObj’, ‘as.integer.tclObj’, ‘as.logical.tclObj’, ‘as.raw.tclObj’, ‘print.tclObj’, ‘[[.tclArray’, ‘[[<-.tclArray’, ‘$.tclArray’, ‘$<-.tclArray’, ‘names.tclArray’, ‘names<-.tclArray’, ‘length.tclArray’, ‘length<-.tclArray’, ‘tclObj.tclVar’, ‘tclObj<-.tclVar’, ‘tclvalue.default’, ‘tclvalue.tclObj’, ‘tclvalue.tclVar’, ‘tclvalue<-.default’, ‘tclvalue<-.tclVar’, ‘close.tkProgressBar’ were declared in NAMESPACE but not found 

so probably geoR needs to test capabilities("tcltk").

@rsbivand

This patch fixes geoR; when applied, RandomFields and geoR both pass R CMD check on the system without tcltk:

geoR.patch.txt

and spatstat did so anyway.

@edzer
edzer commented Feb 10, 2017

Building the docker image doesn't run any R CMD checks.

@edzer
edzer commented Feb 10, 2017

Running the install script with --deps FALSE for those packages with tcltk in Suggests: should solve the problem here; I'm trying that now.

@rsbivand

OK, though I think that packages should themselves be able to handle missing capabilities, so this step should not be necessary. Could you consider passing on the patch to Paulo? Having import(tcltk) in NAMESPACE when it is declared in Suggests: is wrong, but apparently doesn't trigger a warning on check.

@cboettig
Contributor

geoR installs for me on r-ver with the Tcl/Tk libs, so maybe the build cascade didn't finish. I'll take a look later today.

@edzer
edzer commented Feb 10, 2017

OK, in this PR everything works without tcltk, up to geoR; geoR would need Roger's patch.

@cboettig
Contributor

I've pushed a commit along the same lines as @edzer's PR in which all the packages previously commented out due to the tcl/tk dependency are back in. @rkrug I just noticed you had --dep TRUE, which installs the full Suggests lists for all of those packages (of course, required Depends/Imports are still installed automatically; this is the same behavior as the deps argument to install.packages()). Having all the Suggests pulls in a lot of tangential stuff, so I've dropped it; I think you're already explicitly naming all the main packages.

Let me know if this looks good; it should be in the current rocker/geospatial image now.

@edzer
edzer commented Feb 11, 2017

Sorry, I got lost. Could you pls tell me where to find this commit?

@cboettig
Contributor

It's rocker-org/rocker-versioned@33b2d7c. Also, feel free to send your PRs directly against the rocker-versioned repo; it might be less roundabout than sending them to @rkrug, who then sends them to me.
