# Datashading Sentinel-2 raster satellite imagery

This notebook applies the datashading procedure that James A. Bednar presented for Landsat data (https://anaconda.org/jbednar/landsat/notebook) to Sentinel-2 MSI data.

In [71]:
import numpy as np
import xarray as xr
import holoviews as hv
import geoviews as gv
import datashader as ds
import cartopy.crs as ccrs

from holoviews.operation.datashader import regrid, shade
from bokeh.tile_providers import STAMEN_TONER

hv.extension('bokeh', width=80)

## Load Sentinel-2 Data

The Sentinel-2 MultiSpectral Instrument (MSI) measures 13 spectral bands, each revealing different types of information:

In [72]:
import pandas as pd
band_info = pd.DataFrame([
        ("01", "Coastal Aerosol",       " 20",    0.443,  "60",),
        ("02", "Blue",                  " 65",    0.490,  "10",),
        ("03", "Green",                 " 35",    0.560,  "10",),
        ("04", "Red",                   " 30",    0.665,  "10",),
        ("05", "Vegetation Red Edge",   " 15",    0.705,  "20",),
        ("06", "Vegetation Red Edge",   " 15",    0.740,  "20",),
        ("07", "Vegetation Red Edge",   " 20",    0.783,  "20",),
        ("08", "NIR",                   "115",    0.842,  "10",),
        ("8A", "Narrow NIR",            " 20",    0.865,  "20",),
        ("09", "Water vapour",          " 20",    0.945,  "60",),
        ("10", "SWIR - Cirrus",         " 20",    1.375,  "60",),
        ("11", "SWIR",                  " 90",    1.610,  "20",),
        ("12", "SWIR",                  "180",    2.190,  "20",)],
    columns=['Band', 'Name', 'Bandwidth (nm)', 'Nominal Wavelength (µm)', 'Resolution (m)'])
band_info

,Band,Name,Bandwidth (nm),Nominal Wavelength (µm),Resolution (m)
0,01,Coastal Aerosol,20,0.443,60
1,02,Blue,65,0.49,10
2,03,Green,35,0.56,10
3,04,Red,30,0.665,10
4,05,Vegetation Red Edge,15,0.705,20
5,06,Vegetation Red Edge,15,0.74,20
6,07,Vegetation Red Edge,20,0.783,20
7,08,NIR,115,0.842,10
8,8A,Narrow NIR,20,0.865,20
9,09,Water vapour,20,0.945,60
10,10,SWIR - Cirrus,20,1.375,60
11,11,SWIR,90,1.61,20
12,12,SWIR,180,2.19,20


In [73]:
file_path = 'C:/Users/Administrator/Documents/S2A_MSIL1C_20170531T100031_N0205_R122_T33TTG_20170531T100536.SAFE/GRANULE/L1C_T33TTG_A010129_20170531T100536/IMG_DATA/T33TTG_20170531T100031_B%s.jp2'
bands = list(band_info.Band)
bands = [xr.open_rasterio(file_path%band).load()[0] for band in bands]


The deep copy defined in the following cell is not strictly necessary, but it is a convenient way to preserve the original data while manipulating the bands.

In [74]:
from copy import deepcopy

def deepcopy_xarr(xarr):
    """
    Deep copy for xarray that makes sure coords and attrs
    are properly deep-copied.
    With xarray's normal copy method, mutating
    xarr.coords[coord].data also mutated the copy,
    and vice versa.

    Parameters
    ----------
    xarr: DataArray

    Returns
    -------
    xcopy: DataArray
        Deep copy of xarr
    """
    xcopy = xarr.copy()

    for dim in xcopy.coords:
        xcopy.coords[dim].data = np.copy(xcopy.coords[dim].data)
    xcopy.attrs = deepcopy(xcopy.attrs)
    for attr in xcopy.attrs:
        xcopy.attrs[attr] = deepcopy(xcopy.attrs[attr])
    return xcopy

# Keep untouched copies of all bands before their coordinates are modified.
bandsOrig = [deepcopy_xarr(band) for band in bands]
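
Why such a copy matters can be seen in a plain-Python analogy. The sketch below is not xarray code: the dictionary `original` is a hypothetical stand-in for an object that holds coordinate arrays, and it only illustrates how a shallow copy keeps sharing the underlying NumPy buffers while a deep copy does not.

```python
from copy import copy, deepcopy

import numpy as np

# Hypothetical stand-in for an object that holds coordinate arrays.
original = {"x": np.array([0.0, 10.0, 20.0])}

shallow = copy(original)    # new dict, but the same array objects
deep = deepcopy(original)   # new dict and new arrays

# Mutating the arrays in place shows which copies share memory.
shallow["x"][0] = -1.0      # also changes original["x"]
deep["x"][1] = 99.0         # leaves original["x"] untouched

shares_shallow = np.shares_memory(original["x"], shallow["x"])
shares_deep = np.shares_memory(original["x"], deep["x"])
```

After running this, `original["x"]` has picked up the change made through the shallow copy but not the one made through the deep copy.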

To overlay the image on the map tiles, we need to transform the coordinates into EPSG:3785, a Web Mercator projection (since superseded by EPSG:3857):

In [75]:
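from pyproj import Proj, transform

# The crs attribute is assumed to be a string of the form '+init=epsg:NNNNN';
# the slice [6:] strips the '+init=' prefix.
inProj = Proj(init=bandsOrig[0].crs[6:])
outProj = Proj(init='epsg:3785')

for i in range(len(bands)):
    xDummy = np.array(bandsOrig[i].coords['x'])
    yDummy = np.array(bandsOrig[i].coords['y'])

    # transform() pairs x[k] with y[k]; reusing the results as independent
    # axes is an approximation that assumes x and y have equal length.
    xC, yC = transform(inProj, outProj, xDummy, yDummy)
    bands[i].coords['x'] = xC
    bands[i].coords['y'] = yC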
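
For orientation, the target projection itself is simple to write down. The following is a minimal sketch of the forward spherical Mercator formula, starting from longitude and latitude in degrees rather than from the UTM coordinates that pyproj handles above. The function name `lonlat_to_web_mercator` and the constant name are hypothetical, and the sketch is not a replacement for pyproj.

```python
import numpy as np

EARTH_RADIUS_M = 6378137.0  # sphere radius used by spherical (Web) Mercator


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to spherical Mercator metres."""
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    x = EARTH_RADIUS_M * lon
    y = EARTH_RADIUS_M * np.log(np.tan(np.pi / 4.0 + lat / 2.0))
    return x, y


x0, y0 = lonlat_to_web_mercator(0.0, 0.0)       # origin of the projection
x180, _ = lonlat_to_web_mercator(180.0, 0.0)    # eastern edge of the map
```

The x coordinate grows linearly with longitude, while y stretches increasingly towards the poles, which is why coordinates in this system are metres rather than degrees.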


## Rendering Sentinel-2 data as images

The bands measured by Sentinel-2 include wavelengths covering the visible spectrum, but also other ranges, and so it's possible to visualize this data in many different ways, either in true color (using the visible spectrum directly) or in false color (usually showing other bands).  Some examples are shown in the sections below.

### Just one band

Using datashader's default histogram-equalized colormapping, the full range of data is visible in the plot:

In [76]:
%opts Overlay [width=600 height=600]
tiles = gv.WMTS(STAMEN_TONER)
tiles * shade(regrid(hv.Image(bands[1])), cmap=['blue', 'white']).redim(x='Longitude', y='Latitude')

You will usually want to zoom in, which will re-rasterize the image if you are in a live notebook, and then re-equalize the colormap to show all the detail available.  If you are on a static copy of the notebook, only the original resolution at which the image was rendered will be available, but zooming will still update the map tiles to whatever resolution is requested.
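
The idea behind histogram equalization can be illustrated with a small NumPy sketch. This is a simplified, rank-based version written for illustration only, not datashader's actual implementation; the function name `equalize` is hypothetical.

```python
import numpy as np


def equalize(values):
    """Map each value to its empirical CDF position in (0, 1]."""
    flat = np.asarray(values, dtype=float).ravel()
    # Count how often each distinct value occurs and remember where it came from.
    _, inverse, counts = np.unique(flat, return_inverse=True, return_counts=True)
    cdf = np.cumsum(counts) / flat.size
    return cdf[inverse].reshape(np.shape(values))


# A strongly skewed sample: three small values and one large outlier.
skewed = np.array([[1.0, 2.0], [3.0, 1000.0]])

linear = (skewed - skewed.min()) / (skewed.max() - skewed.min())
equalized = equalize(skewed)
```

With linear scaling the three small values are squeezed into the bottom of the range, whereas the equalized values are spread evenly, so each of them gets a distinguishable color.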

The plots below use a different type of colormap processing, implemented as a custom transfer function:

In [142]:
from datashader.utils import ngjit

nodata = 1  # pixel values at or below this are treated as missing

@ngjit
def normalize_data(agg):
    # Linearly rescale the array to the range 0..255
    x_min = agg.min()
    x_max = agg.max()
    range_x = x_max - x_min
    norm = (agg - x_min) / range_x
    out = norm * 255.0
    return out

def combine_bands(r, g, b):
    # Subsample the finer bands so that all three match the coarsest resolution
    size_min = min(r.shape[0], g.shape[0], b.shape[0])
    scaler = int(r.shape[0]/size_min)
    scaleg = int(g.shape[0]/size_min)
    scaleb = int(b.shape[0]/size_min)

    xs, ys = r['x'][::scaler], r['y'][::scaler]
    r, g, b = r[::scaler, ::scaler], g[::scaleg, ::scaleg], b[::scaleb, ::scaleb]
    r, g, b = [ds.utils.orient_array(img) for img in (r, g, b)]
    # Alpha channel: transparent where the red band is NaN or nodata
    a = (np.where(np.logical_or(np.isnan(r), r <= nodata), 0, 255)).astype(np.uint8)
    r = (normalize_data(r)).astype(np.uint8)
    g = (normalize_data(g)).astype(np.uint8)
    b = (normalize_data(b)).astype(np.uint8)
    return hv.RGB((xs, ys[::-1], r, g, b, a), vdims=list('RGBA'))
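
The core steps of `combine_bands` can be tried out on small arrays without any of the plotting libraries. The sketch below uses plain NumPy and hypothetical helper names (`match_resolution`, `to_uint8`); it assumes square arrays whose sizes are integer multiples of the coarsest one, as is the case for the 10 m, 20 m and 60 m bands.

```python
import numpy as np


def match_resolution(*arrays):
    """Subsample square arrays by integer strides down to the coarsest one."""
    size_min = min(a.shape[0] for a in arrays)
    return [a[:: a.shape[0] // size_min, :: a.shape[0] // size_min] for a in arrays]


def to_uint8(arr):
    """Linearly rescale an array to 0..255 (assumes max > min)."""
    arr = arr.astype(float)
    return ((arr - arr.min()) / (arr.max() - arr.min()) * 255.0).astype(np.uint8)


fine = np.arange(36, dtype=float).reshape(6, 6)    # stands in for a 10 m band
coarse = np.arange(9, dtype=float).reshape(3, 3)   # stands in for a 20 m band

fine_sub, coarse_sub = match_resolution(fine, coarse)
scaled = to_uint8(coarse_sub)
# Alpha mask with the same nodata threshold of 1 used in the notebook.
alpha = np.where(coarse_sub <= 1, 0, 255).astype(np.uint8)
```

Striding simply drops pixels rather than averaging them, which is cheap but discards some of the information in the finer bands.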

### True Color

With the `combine_bands` function we can put the Red, Green and Blue bands into one image to obtain a true-color image. Values above a threshold are clipped first; otherwise a few very bright pixels, such as clouds, would dominate the normalization and leave the rest of the image dark.

In [136]:
lim = 5e3
true_color = combine_bands(
    np.clip(bands[3], 0, lim),
    np.clip(bands[2], 0, lim),
    np.clip(bands[1], 0, lim),
).relabel("True Color (R=Red, G=Green, B=Blue)")
tiles * regrid(true_color)
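
The effect of clipping before normalizing can be checked on a handful of numbers. The values below are hypothetical and chosen only to mimic mostly dark land pixels plus one very bright pixel; the threshold matches the `lim` of 5e3 used above.

```python
import numpy as np


def normalize(arr):
    """Min-max scale to the range 0..1 (assumes max > min)."""
    return (arr - arr.min()) / (arr.max() - arr.min())


# Hypothetical pixel values: four dark pixels and one bright outlier.
pixels = np.array([500.0, 1000.0, 1500.0, 2000.0, 20000.0])

unclipped = normalize(pixels)
clipped = normalize(np.clip(pixels, 0, 5e3))
```

Without clipping, the four dark pixels all end up below 10% of the output range; with clipping they are spread over roughly the lower third, while the bright pixel saturates at the maximum.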


Again, the raster data will only refresh to a new resolution if you are running a live notebook, because that data is not actually present in the web page; it's held in a separate Python server.

Of course, other band combinations may be visualized:

In [143]:
%%opts RGB Curve [width=350 height=350 xaxis=None yaxis=None] {+framewise}

combos = pd.DataFrame([
    (4,3,2,"True color",""),
    (13,12,4,"Urban","False color"),
    (12,9,2,"Agriculture",""),
    (12,8,2,"Agriculture 11 8 2","RGB 11 8 2"),
    (12,8,3,"RGB 11 8 3",""),
    (13,12,2,"Geology","RGB 12 11 2"),
    (4,3,1,"Bathymetric","RGB 4 3 1"),
    (13,12,9,"Penetration","Atmospheric Penetration"),
    (13,8,4,"Shortwave Infrared","RGB 12 8 4"),
    (13,12,4,"SWIR","")],
    columns=['R', 'G', 'B', 'Name', 'Description']).set_index(["Name"])
combos

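def combo(name):
    # R, G, B are 1-based positions in `bands`, not band names:
    # 8A occupies position 9, so 12 and 13 refer to bands 11 and 12.
    c = combos.loc[name]
    return regrid(combine_bands(bands[c.R-1], bands[c.G-1], bands[c.B-1])).relabel(name)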

(combo("Urban") + combo("True color") + combo("Penetration") + combo("SWIR")).cols(2)

All the ways of combining aggregates supported by [xarray](http://xarray.pydata.org) are available for these channels, so it is simple to build custom visualizations highlighting any combination of bands that reveals something of interest.
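
One common example of such a combination is a normalized difference index such as NDVI, computed from the NIR and Red bands (bands 08 and 04, both at 10 m). The sketch below uses plain NumPy arrays with made-up values as stand-ins for the bands; the same elementwise arithmetic applies to xarray DataArrays. The function name `normalized_difference` is hypothetical. The inputs are cast to float first, because subtracting unsigned integer arrays can wrap around.

```python
import numpy as np


def normalized_difference(a, b):
    """Return (a - b) / (a + b), with NaN where the denominator is zero."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    total = a + b
    out = np.full(total.shape, np.nan)
    # Only divide where the denominator is non-zero; other cells stay NaN.
    np.divide(a - b, total, out=out, where=total != 0)
    return out


# Made-up values standing in for the NIR and Red bands.
nir = np.array([[4000.0, 3000.0], [0.0, 1000.0]])
red = np.array([[1000.0, 3000.0], [0.0, 3000.0]])

ndvi = normalized_difference(nir, red)
```

The result lies between -1 and 1 and could be displayed as a single band in the same way as the one-band example earlier.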

### Revealing the spectrum

The above plots all map some of the measured data into the R,G,B channels of an image, showing all spatial locations but only a restricted set of wavelengths. Alternatively, you could sample across all the measured wavelength bands to show the full spectrum at any given location:

In [125]:
%%opts Curve [width=800 height=300 logx=True]

band_map = hv.HoloMap({i: hv.Image(band) for i, band in enumerate(bands)})

def spectrum(x, y):
    try:
        spectrum_vals = band_map.sample(x=x, y=y)['z']
        point = gv.Points([(x, y)], crs=ccrs.GOOGLE_MERCATOR)
        point = gv.operation.project_points(point, projection=ccrs.PlateCarree())
        label = 'Lon: %.3f, Lat: %.3f' % tuple(point.array()[0])
    except Exception:
        # No valid pointer position (e.g. x and y are None): fall back to the image centre
        dummy = band_map[0]
        x, y = dummy.bounds.centroid()
        point = gv.Points([(x, y)], crs=ccrs.GOOGLE_MERCATOR)
        point = gv.operation.project_points(point, projection=ccrs.PlateCarree())
        spectrum_vals = band_map.sample(x=x, y=y)['z']
        label = 'Lon: %.3f, Lat: %.3f' % tuple(point.array()[0])
    
    return hv.Curve((band_info['Nominal Wavelength (µm)'].values, spectrum_vals), label=label,
                    kdims=['Wavelength (µm)'], vdims=['Reflectance']).sort()

spectrum(x=None, y=None)
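
Sampling one value per band at a given location amounts to a nearest-pixel lookup in each band's own grid, since the bands have different resolutions. The following NumPy sketch shows that lookup under simplified assumptions: the two bands, their coordinates and the helper `sample_nearest` are all hypothetical, and HoloViews' own `sample` method may behave differently in detail.

```python
import numpy as np


def sample_nearest(xs, ys, data, x, y):
    """Return the value of the pixel whose centre is nearest to (x, y)."""
    col = np.abs(np.asarray(xs) - x).argmin()
    row = np.abs(np.asarray(ys) - y).argmin()
    return data[row, col]


# Two hypothetical bands covering the same 40 m x 40 m area at 10 m and 20 m.
band_fine = {"xs": np.array([5.0, 15.0, 25.0, 35.0]),
             "ys": np.array([35.0, 25.0, 15.0, 5.0]),
             "data": np.arange(16).reshape(4, 4)}
band_coarse = {"xs": np.array([10.0, 30.0]),
               "ys": np.array([30.0, 10.0]),
               "data": np.array([[100, 200], [300, 400]])}

# One value per band at the same map location gives the spectrum at that point.
spectrum_at_point = [sample_nearest(b["xs"], b["ys"], b["data"], x=26.0, y=12.0)
                     for b in (band_fine, band_coarse)]
```

Each band is indexed through its own coordinate arrays, so no resampling to a common grid is needed before sampling.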

We can now combine these two approaches to explore the full multispectral information at any location in the true-color image, updating the curve whenever you hover over an area of the image:

In [127]:
%%opts Curve RGB [width=450 height=450] Curve [logx=True]

tap = hv.streams.PointerXY(source=true_color)
spectrum_curve = hv.DynamicMap(spectrum, streams=[tap]).redim.range(Reflectance=(0, lim))

tiles * regrid(true_color) + spectrum_curve

(Of course, just as for the raster data resolution, the plot on the right will update only in a live notebook session, because it needs to run Python code for each mouse pointer position.)

As you can see, even though datashader is not a GIS, it offers a flexible, high-performance way to explore GIS data when combined with HoloViews, GeoViews, and Bokeh.