# Generating geomedian composites

These composites are for use in collecting training data.


### Load packages

In [None]:
%matplotlib inline

import numpy as np
import geopandas as gpd
import datacube
from odc.algo import to_f32, from_float, xr_geomedian
from datacube.utils import geometry
import warnings
warnings.filterwarnings("ignore")

import sys
sys.path.append('../Scripts')
from deafrica_datahandling import load_ard, mostcommon_crs
from deafrica_plotting import rgb, display_map
from deafrica_dask import create_local_dask_cluster
from deafrica_spatialtools import xr_rasterize
from deafrica_classificationtools import HiddenPrints

### Set up a dask cluster

In [None]:
create_local_dask_cluster(aws_unsigned=True)

### Connect to the datacube

In [None]:
dc = datacube.Datacube(app='Geomedian_composites')

## Load Landsat 8 data from the datacube

Here we load a time series of cloud-masked Landsat 8 satellite images through the datacube API using the [load_ard](https://github.com/GeoscienceAustralia/dea-notebooks/blob/develop/Frequently_used_code/Using_load_ard.ipynb) function.
This gives us some data to work with. To limit computation and memory use, this example loads only three optical bands (red, green, blue).
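As a rough illustration of why the band count matters, the sketch below estimates the in-memory size of one time step of one spatial chunk. The 2000 x 2000 chunk size matches the `dask_chunks` used later in this notebook; the helper name `chunk_megabytes` is hypothetical, and the 2-byte integer storage type is an assumption, so check the actual dtype of the loaded product.

```python
def chunk_megabytes(n_bands, chunk_x=2000, chunk_y=2000, bytes_per_value=2):
    """Approximate size, in MB (10**6 bytes), of one time step of one chunk."""
    return n_bands * chunk_x * chunk_y * bytes_per_value / 1e6


# Three bands stored as 2-byte integers (assumed storage type)
three_bands_int = chunk_megabytes(3)

# The same three bands after conversion to float32 (4 bytes per value)
three_bands_f32 = chunk_megabytes(3, bytes_per_value=4)

# A hypothetical six-band load in float32, for comparison
six_bands_f32 = chunk_megabytes(6, bytes_per_value=4)

print(three_bands_int, three_bands_f32, six_bands_f32)
```

The estimate scales linearly with the number of bands, so every extra band adds the same amount of memory to each chunk that Dask has to hold.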

In [None]:
lat, lon = -33.0, 20.233
buffer_lon = 2.5
buffer_lat = 1.9


In [None]:
# display_map(x=(lon-buffer_lon, lon+buffer_lon), y=(lat+buffer_lat, lat-buffer_lat))

In [None]:
# Create a reusable query
query = {
    'time': '2018',
    'measurements': ['green',
                     'red',
                     'blue'],
    'resolution': (-30, 30),
    'group_by': 'solar_day'
}

In [None]:
gdf = gpd.read_file('data/Southern.shp')

# build a datacube Geometry from the first polygon, treating its coordinates as WGS84 (EPSG:4326)
geom = geometry.Geometry(
    gdf.geometry.values[0].__geo_interface__, geometry.CRS(
        'epsg:4326'))

#print(geom)    
q = {"geopolygon": geom,
     'dask_chunks': {'x':2000, 'y':2000, 'time':1}}

# merge polygon query with user supplied query params
query.update(q)

# Load available data
ds = dc.load(product='ga_ls8c_gm_2_annual',
             output_crs='EPSG:6933',
             align=(15, 15),
             **query)

# #mask dataset
# mask = xr_rasterize(gdf.iloc[[0]], ds)
# ds = ds.where(mask)

# Print output data
print(ds)

## Generate a geomedian

Generating a geomedian composite combines all the observations in our `xarray.Dataset` into a single, complete (or near complete) image representing the geometric median of the time period. This process requires three steps:

1. Before we calculate the geomedian, we first need to prepare the dataset by scaling the surface reflectance values from their original range to `0-1` (DE Africa's Landsat Collection 1 archive is scaled from `0-10,000`, Landsat Collection 2 (not yet available) is scaled from `1-65,455`).  This will ensure numerical stability during the computation.

2. Call the `xr_geomedian` function. The parameters in the `xr_geomedian` code block do not usually require changing. The `eps` parameter sets the convergence tolerance, and therefore how many iterations are run; a good default is `1e-7`. After calling the function, we then run `.compute()`, which triggers the computation.

3. Finally, we will convert the result back to the original scaling range (set by `sr_max_value` in the code below).
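To make the three steps concrete, here is a minimal NumPy sketch that computes a geometric median for a single pixel using Weiszfeld's iteration. It is an illustration only, not the algorithm that `xr_geomedian` runs internally. The function name `geometric_median`, the sample reflectance values, the 0-10,000 scale, and the stopping rule are all assumptions made for this example. Here `eps` acts as a convergence tolerance: iteration stops once the estimate moves by less than `eps`.

```python
import numpy as np


def geometric_median(obs, eps=1e-7, max_iter=1000):
    """Weiszfeld iteration for the geometric median of the rows of `obs`.

    obs: array of shape (n_observations, n_bands), ideally scaled to 0-1.
    """
    obs = np.asarray(obs, dtype="float64")
    estimate = obs.mean(axis=0)  # start from the band-wise mean
    for _ in range(max_iter):
        dist = np.linalg.norm(obs - estimate, axis=1)
        dist = np.maximum(dist, 1e-12)  # guard against division by zero
        weights = 1.0 / dist
        updated = (obs * weights[:, None]).sum(axis=0) / weights.sum()
        if np.linalg.norm(updated - estimate) < eps:
            return updated  # moved less than eps: treat as converged
        estimate = updated
    return estimate


# Hypothetical red, green, blue values for one pixel on five dates,
# stored on a 0-10,000 scale; the last row mimics an unmasked cloud.
sr_max_value = 10000
pixel = np.array([
    [900, 1000, 800],
    [950, 1050, 820],
    [880, 990, 790],
    [920, 1020, 810],
    [8000, 8200, 8500],
])

scaled = pixel / sr_max_value          # step 1: scale to 0-1
gm_scaled = geometric_median(scaled)   # step 2: geometric median
gm = gm_scaled * sr_max_value          # step 3: back to the original scale
mean = pixel.mean(axis=0)              # band-wise mean, for comparison

print("geometric median:", gm)
print("mean:            ", mean)
```

In this toy case the geometric median stays close to the four clear observations, while the mean is pulled strongly toward the bright outlier. That robustness is the reason a geomedian is preferred over a simple mean for compositing.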

In [None]:
sr_max_value = 65455                 # maximum value for SR in the loaded product
scale, offset = (1/sr_max_value, 0)  # differs per product, aim for 0-1 values in float32

# scale the values using the to_f32 utility function
ds_scaled = to_f32(ds,
                   scale=scale,
                   offset=offset)

In [None]:
#generate a geomedian
geomedian = xr_geomedian(ds_scaled, 
                         num_threads=1,  # disable internal threading, dask will run several concurrently
                         eps=1e-7,  
                         nocheck=True)   # disable checks inside library that use too much ram


### Run the computation

The `.compute()` method triggers the computation of everything we've set up above. This will take a few minutes to run; view the Dask dashboard to check progress.

In [None]:
%%time
geomedian = geomedian.compute()

In [None]:
%%time
#convert SR scaling values back to original values
geomedian = from_float(geomedian, 
                       dtype='float32', 
                       nodata=np.nan, 
                       scale=1/scale, 
                       offset=-offset/scale)
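The `scale` and `offset` arguments passed to `from_float` above are chosen to invert the earlier scaling. The sketch below works through the arithmetic, assuming both functions apply a linear mapping of the form `x * scale + offset` (an assumption about `odc.algo` that is worth checking against its documentation). The helper `linear` and the sample value are hypothetical.

```python
def linear(x, scale, offset):
    """Apply the assumed linear mapping x * scale + offset."""
    return x * scale + offset


sr_max_value = 65455
scale, offset = 1 / sr_max_value, 0

# Forward step: integer surface reflectance -> roughly 0-1
original = 12345
scaled = linear(original, scale, offset)

# Inverse step: the same parameters that are passed to from_float
inv_scale = 1 / scale
inv_offset = -offset / scale
restored = linear(scaled, inv_scale, inv_offset)

print(scaled, restored)
```

With `offset = 0` the inverse reduces to multiplying by `sr_max_value`; the `-offset/scale` term only matters for products that use a non-zero offset.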

If we print the result, we can see that the `time` dimension has been removed and we are left with a single image that represents the geometric median of all the satellite images in our initial time series.

In [None]:
print(geomedian)
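The same effect can be seen on a small NumPy array. This sketch uses `np.nanmedian` as a simple stand-in for the geomedian (it is not the same statistic) purely to show the time axis collapsing and masked gaps being filled. The array shape and values are made up.

```python
import numpy as np

# Hypothetical single-band stack with shape (time, y, x).
# NaN marks pixels that were masked out (for example, by cloud).
stack = np.array([
    [[0.10, np.nan], [0.30, 0.40]],
    [[0.12, 0.22], [np.nan, 0.41]],
    [[np.nan, 0.20], [0.31, np.nan]],
    [[0.11, 0.21], [0.29, 0.39]],
])

# Reduce over the time axis, ignoring masked observations
composite = np.nanmedian(stack, axis=0)

print(stack.shape, "->", composite.shape)
print(composite)
```

Every pixel has at least one valid observation in this example, so the composite contains no gaps even though each individual time step does.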

In [None]:
# from datacube.helpers import write_geotiff
# write_geotiff('geomedian_SA_2018_LS8.tif', ds.squeeze())

In [None]:
ds = ds.compute()


In [None]:
from datacube.utils.cog import write_cog
write_cog(geo_im=ds.squeeze().to_array(), 
          fname='geomedian_SA_2018_LS8_.tif',  
          overwrite=True)