Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speeding up projected maps in geoviews? #230

Closed
rsignell-usgs opened this issue Sep 19, 2018 · 18 comments
Closed

Speeding up projected maps in geoviews? #230

rsignell-usgs opened this issue Sep 19, 2018 · 18 comments

Comments

@rsignell-usgs
Copy link
Contributor

I'm visualizing public Landsat data with hvplot.xarray but I think this is likely a geoviews issue.

It takes about 3 seconds to render an image when not using the crs:
2018-09-19_17-11-28

It takes about 3 minutes to render an image when using the crs:
2018-09-19_17-12-26

Without crs it's using my dask cluster, but with crs it does not use the cluster.

This is the code I'm using:

import rasterio 
import xarray as xr
import hvplot.xarray
from dask.distributed import Client, progress
import cartopy.crs as ccrs

client = Client()

image_url = 'https://storage.googleapis.com/gcp-public-data-landsat/LC08/01/047/027/LC08_L1TP_047027_20130421_20170310_01_T1/LC08_L1TP_047027_20130421_20170310_01_T1_B4.TIF'

da = xr.open_rasterio(image_url, chunks={'band': 1, 'x': 2048, 'y': 2048})

band1 = da.sel(band=1).persist()

display(band1.hvplot(rasterize=True, width=600, height=400, cmap='viridis'))

crs = ccrs.UTM(zone='10n')
display(band1.hvplot(crs=crs, rasterize=True, width=600, height=400, cmap='viridis'))

Here is the notebook.

Is this expected?

Is there anything I can do to speed up visualization with crs specified?

@rsignell-usgs rsignell-usgs changed the title Does crs work with Dask? Speeding up projected maps in geoviews? Sep 26, 2018
@rsignell-usgs
Copy link
Contributor Author

rsignell-usgs commented Nov 1, 2018

I heard offline from @dharhas that this is a Cartopy issue. @pelson, is there any way I could speed this up?

@philippjfr
Copy link
Member

Apologies for not replying sooner, I do think this is down to cartopy but it's also possible that I'm using a suboptimal codepath. To project images we are currently calling cartopy.img_transform.warp_array, which I think is the bottleneck here.

@jbednar
Copy link
Member

jbednar commented Nov 1, 2018

Can the data in the image be projected ahead of time, before displaying it?

@philippjfr
Copy link
Member

philippjfr commented Nov 1, 2018

Yes, gv.project will do that for you but that's still 3 minutes and therefore way too slow.

@rsignell-usgs
Copy link
Contributor Author

So I'm guessing gv.project does not use Dask either, right?

@philippjfr
Copy link
Member

That's right, it mostly just calls the appropriate cartopy transform depending on the input type. I've played around parallelizing this with dask with very mixed results.

@rsignell-usgs
Copy link
Contributor Author

rsignell-usgs commented Nov 2, 2018

Just as an experiment, I tried downloading the tif image locally and using gdalwarp to warp to a regular lon/lat grid. It took 2 seconds to download the image and 3 seconds to warp onto a regular lon/lat grid, so it seems that there must be something very inefficient going on to make:

crs = ccrs.UTM(zone='10n')
display(band1.hvplot(crs=crs, rasterize=True, width=600, height=400, cmap='viridis'))

take 3 minutes.

Here's the gdalwarp procedure if you want to recreate:

jovyan@jupyter-rsignell-2dusgs:~/Landsat$ time wget https://storage.googleapis.com/gcp-public-data-landsat/LC08/01/047/027/LC08_L1TP_047027_20130421_20170310_01_T1/LC08_L1TP_047027_20130421_20170310_01_T1_B4.TIF

LC08_L1TP_047027_20130421 100%[=====================================>]  77.16M  37.4MB/s    in 2.1s

2018-11-02 13:09:55 (37.4 MB/s) - ‘LC08_L1TP_047027_20130421_20170310_01_T1_B4.TIF’ saved [80907084/80907084]

real    0m2.217s
user    0m0.096s
sys     0m0.312s
jovyan@jupyter-rsignell-2dusgs:~/Landsat$ time gdalwarp -t_srs EPSG:4326 LC08_L1TP_047027_20130421_20170310_01_T1_B4.TIF lonlat.tif
Creating output file that is 9183P x 5981L.
Processing input file LC08_L1TP_047027_20130421_20170310_01_T1_B4.TIF.
0...10...20...30...40...50...60...70...80...90...100 - done.

real    0m3.102s
user    0m2.852s
sys     0m0.248s

Even if I try the more time consuming -r regrid options in gdalwarp (the default is nearest), I can't get anything to take longer than 15 seconds:

jovyan@jupyter-rsignell-2dusgs:~/Landsat$ time gdalwarp -t_srs EPSG:4326 -r lanczos LC08_L1TP_047027_20130421_20170310_01_T1_B4.TIF lonlat.tif
Creating output file that is 9183P x 5981L.
Processing input file LC08_L1TP_047027_20130421_20170310_01_T1_B4.TIF.
0...10...20...30...40...50...60...70...80...90...100 - done.

real    0m15.343s
user    0m15.016s
sys     0m0.324s

@pelson or @scottyhq, who would have insight here?

@philippjfr
Copy link
Member

Thanks, for your experiments, I'll play around with this now. The equivalent operation using geoviews which ends up making a call to cartopy.img_transform.warp_array is this:

import geoviews as gv
import cartopy.crs as ccrs
img = gv.load_tiff('/Users/philippjfr/Downloads/LC08_L1TP_047027_20130421_20170310_01_T1_B4.TIF')
projected = gv.project(img, projection=ccrs.PlateCarree())

@philippjfr
Copy link
Member

Here's the output of profiling the project operation with %%prun:

         15066 function calls (14817 primitive calls) in 242.420 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000  242.420  242.420 {built-in method builtins.exec}
        1    0.000    0.000  242.420  242.420 <string>:2(<module>)
        1    0.000    0.000  242.420  242.420 parameterized.py:2416(__new__)
      2/1    0.000    0.000  242.419  242.419 operation.py:146(__call__)
      2/1    0.000    0.000  242.419  242.419 operation.py:113(_apply)
        1    0.000    0.000  242.419  242.419 projection.py:396(_process)
        6    0.000    0.000  242.416   40.403 dimension.py:713(map)
        1    0.143    0.143  242.414  242.414 projection.py:292(_process)
        1    1.119    1.119  242.132  242.132 img_transform.py:122(warp_array)
        1   21.111   21.111  240.060  240.060 img_transform.py:220(regrid)
        1  115.320  115.320  115.320  115.320 {method 'query' of 'scipy.spatial.ckdtree.cKDTree' objects}
        4   98.791   24.698   98.791   24.698 {method 'transform_points' of 'cartopy._crs.CRS' objects}

@rsignell-usgs
Copy link
Contributor Author

Wow @philippjfr , looks like we have some room for improvement! Great!
cc @josef-ebd

@rsignell-usgs
Copy link
Contributor Author

@philippjfr this Cartopy PR from @QuLogic looks promising!

@philippjfr
Copy link
Member

Definitely some solid improvement there, but not enough to make it very usable:

Using scipy's kdtree:

        1  115.320  115.320  115.320  115.320 {method 'query' of 'scipy.spatial.ckdtree.cKDTree' objects}

using pykdtree:

        1   53.287   53.287   53.287   53.287 {method 'query' of 'pykdtree.kdtree.KDTree' objects}

@philippjfr
Copy link
Member

Actually, after calling rasterize it's pretty usable, no reason to project the whole image afterall.

@philippjfr
Copy link
Member

philippjfr commented Nov 2, 2018

Okay, I'm now wondering whether hvplot is doing something quite suboptimal, using geoviews directly things are pretty quick:

import geoviews as gv
from holoviews.operation.datashader import rasterize
gv.extension('bokeh')

tiff = gv.load_tiff('/Users/philippjfr/Downloads/LC08_L1TP_047027_20130421_20170310_01_T1_B4.TIF')
rasterize(tiff)

It seems like hvplot is projecting before rasterizing, which would explain the slow speed.

@philippjfr
Copy link
Member

philippjfr commented Nov 2, 2018

Okay to summarize, the main issue here was that hvplot always projects all the data before it applies rasterize/datashade options. This makes sense in some cases because it means the data does not have to be reprojected every time you zoom or pan, but it can also result in a huge initial overhead like this case. My suggestion is to add an explicit project=True/False flag to hvplot and disable to automatic projection.

The secondary problem is that projecting images using cartopy is pretty slow, which is substantially improved using pykdtree instead of scipy's kdtree.

The timings now (with and without the new pykdtree implementation in cartopy) are as follows:

  1. Projecting the whole image ahead of time with pykdtree: initialization time ~200 seconds, time to rerender when zooming/panning ~180 ms
  2. Projecting the whole image ahead of time with scipy kdtree: initialization time ~240 seconds, time to rerender when zooming/panning ~180 ms
  3. Projecting after regridding with pykdtree: initialization time ~1 second, time to rerender when zooming/panning ~1 second
  4. Projecting after regridding with scipy kdtree: initialization time ~1.8 second, time to rerender when zooming/panning ~1.8 second

Once I've added the project=True/False option to hvplot the user can choose the tradeoff between a very long initialization time and faster zooming/panning.

@philippjfr
Copy link
Member

I've opened holoviz/hvplot#106 to add the project=True/False argument to hvplot.

@rsignell-usgs
Copy link
Contributor Author

So roughly a 100 fold speedup? I think we will notice that! 😜
Fantastic!

@rsignell-usgs
Copy link
Contributor Author

Here's my example with the latest release, just to show it's working in seconds now!

2018-11-15_16-43-35

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants