# Creating a Median Composite with Dask

We will query a STAC catalog for Sentinel-2 imagery and create a monthly median composite that is largely free of clouds. The computation runs in parallel on a local Dask cluster.

## Setup and Data Download

The following blocks of code will install the required packages and download the datasets to your Colab environment.

In [None]:
%%capture
if 'google.colab' in str(get_ipython()):
    !pip install pystac-client
    !apt install libspatialindex-dev
    !pip install fiona shapely pyproj rtree
    !pip install geopandas folium stackstac rioxarray mapclassify

In [None]:
import json
import geopandas as gpd
from shapely.geometry import mapping
import pandas as pd
import pystac_client
import os
import folium
from folium import Figure
import stackstac
import rioxarray
import matplotlib.pyplot as plt
import mapclassify
import dask

In [None]:
from dask.distributed import Client, progress
client = Client()  # set up local cluster on the machine
client

In [None]:
data_folder = 'data'
output_folder = 'output'

if not os.path.exists(data_folder):
    os.mkdir(data_folder)
if not os.path.exists(output_folder):
    os.mkdir(output_folder)

In [None]:
def download(url):
    filename = os.path.join(data_folder, os.path.basename(url))
    if not os.path.exists(filename):
        from urllib.request import urlretrieve
        local, _ = urlretrieve(url, filename)
        print('Downloaded ' + local)

download('https://github.com/spatialthoughts/python-tutorials/raw/main/data/' +
         'bangalore.geojson')

## Procedure

Let's use the Element84 search endpoint to look for items from the `sentinel-2-c1-l2a` collection on AWS.

In [None]:
catalog = pystac_client.Client.open('https://earth-search.aws.element84.com/v1')

In [None]:
aoi_file = 'bangalore.geojson'
aoi_filepath = os.path.join(data_folder, aoi_file)
aoi = gpd.read_file(aoi_filepath)

In [None]:
geometry = aoi.unary_union
geometry_geojson = json.dumps(mapping(geometry))

We search for imagery collected within the date range and intersecting the AOI geometry. Additionally, we add filters to select scenes with less than 30% cloud cover that fall within a specific MGRS grid square.

In [None]:
year = 2023
month = 4
time_range = f'{year}-{month:02}'

search = catalog.search(
    collections=['sentinel-2-c1-l2a'],
    intersects=geometry_geojson,
    datetime=time_range,
    query={
        'eo:cloud_cover': {'lt': 30},
        'mgrs:grid_square': {'eq': 'GQ'},
    },
)
items = search.item_collection()
len(items)
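
For reference, `pystac-client` expands a year-month string such as `2023-04` into an interval covering the whole month. The sketch below builds the equivalent explicit interval by hand, which may be handy when the range you need is not a calendar month. The helper name `month_interval` is hypothetical and not part of any library.

```python
import calendar


def month_interval(year, month):
    """Return an explicit 'start/end' datetime interval for one calendar month."""
    # monthrange returns (weekday of first day, number of days in month)
    last_day = calendar.monthrange(year, month)[1]
    start = f'{year}-{month:02}-01T00:00:00Z'
    end = f'{year}-{month:02}-{last_day:02}T23:59:59Z'
    return f'{start}/{end}'


print(month_interval(2023, 4))
```

The returned string can be passed as the `datetime` argument of `catalog.search()` in place of `time_range`.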

In [None]:
stack = stackstac.stack(items, resolution=10)
stack

Clip the stack to the AOI and select the red, green, and blue bands.

In [None]:
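# Reproject the AOI to the CRS of the stack before clipping
geometry = aoi.to_crs(stack.rio.crs).geometry
clipped = stack.rio.clip(geometry)
# Keep only the bands needed for an RGB composite
subset = clipped.sel(band=['red', 'green', 'blue'])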
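
Selecting only the bands you need keeps the amount of data Dask has to read and hold in memory small. As a rough back-of-the-envelope sketch, the size of a stack is the product of its dimensions and the bytes per value (`stackstac.stack` defaults to `float64`, i.e. 8 bytes). The function name and the dimensions below are made-up illustrative values, not the actual shape of the stack in this notebook.

```python
def stack_size_gib(n_times, n_bands, height, width, bytes_per_value=8):
    """Approximate in-memory size of a (time, band, y, x) array in GiB."""
    n_values = n_times * n_bands * height * width
    return n_values * bytes_per_value / 1024**3


# Hypothetical dimensions chosen only for illustration
all_bands = stack_size_gib(n_times=6, n_bands=15, height=4000, width=4000)
rgb_only = stack_size_gib(n_times=6, n_bands=3, height=4000, width=4000)
print(f'All bands: {all_bands:.2f} GiB, RGB only: {rgb_only:.2f} GiB')
```

For the real numbers, inspect the array summary that the notebook displays for `stack` and `subset`.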

In [None]:
median = subset.median(dim='time')
median
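
To see why a median over the `time` dimension suppresses clouds, consider a toy example in plain Python. A cloudy observation is a bright outlier, and the median ignores it as long as most observations of that pixel are clear. NaN values (the default nodata fill used by `stackstac`) are skipped, mirroring the default NaN-skipping behaviour of xarray's `median` for float data. The function `median_composite` and the reflectance values are hypothetical; in this notebook the real computation is done by `subset.median(dim='time')`.

```python
import math
from statistics import median


def median_composite(scenes):
    """Per-pixel median over a list of equally sized 2-D grids, skipping NaN."""
    n_rows, n_cols = len(scenes[0]), len(scenes[0][0])
    composite = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            # Collect the valid observations of this pixel across all scenes
            values = [s[r][c] for s in scenes if not math.isnan(s[r][c])]
            row.append(median(values) if values else math.nan)
        composite.append(row)
    return composite


nan = math.nan
# Three hypothetical 2x2 scenes of surface reflectance
scenes = [
    [[0.10, 0.12], [0.11, 0.13]],
    [[0.90, 0.12], [0.11, nan]],  # cloud at (0, 0), nodata at (1, 1)
    [[0.11, 0.13], [0.12, 0.14]],
]
composite = median_composite(scenes)
print(composite[0][0])  # the bright cloud value 0.90 is not selected
```

A mean would be pulled upwards by the cloudy value, which is why the median is the more common choice for simple composites.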

In [None]:
%time median = median.compute()

In [None]:
output_file = f'median_{year}_{month:02}.tif'
output_path = os.path.join(output_folder, output_file)
median.rio.to_raster(output_path, driver='COG')
print(f'Wrote {output_file}')