# Sentinel-2


<div class="alert-info">

### Overview
    
* **teaching:** 30 minutes
* **exercises:** 0
* **questions:**
    * How can I find, analyze, and visualize Sentinel-2 satellite imagery for an area of interest using Python?
    
</div>


This notebook focuses on accessing public datasets on AWS for a target area affected by Cyclone Kenneth (2019-04-25). Read more about this event and its impact at the [Humanitarian OpenStreetMap Team website](https://tasks.hotosm.org/project/5977). The bounding box we will work with covers the island of Ngazidja (Grande Comore), including the capital [city of Moroni](https://en.wikipedia.org/wiki/Moroni,_Comoros), in the Union of the Comoros, a sovereign archipelago nation in the Indian Ocean.

We will examine raster images from the [Sentinel-2 instrument](https://sentinel.esa.int/web/sentinel/missions/sentinel-2) operated by the European Space Agency. Sentinel-2 is an electro-optical imager whose bands differ slightly from those of Landsat-8. For more information about Sentinel-2, check out the [comprehensive user guide](https://sentinel.esa.int/web/sentinel/user-guides/sentinel-2-msi).

## Table of contents

1. [**Sat-search**](#Sat-search)
    1. [**STAC Data Model**](#STAC-Metadata)
    1. [**STAC with geopandas**](#STAC-with-geopandas)
1. [**Holoviz visualization**](#Holoviz)
1. [**Rasterio and xarray**](#Rasterio-and-xarray)

In [None]:
# Import libraries
import geopandas as gpd
import pandas as pd

import satsearch
from satstac import Items

import holoviews as hv
import hvplot.xarray
import hvplot.pandas
import geoviews as gv

import ipywidgets
import datetime

from ipywidgets import interact
from IPython.display import display, Image

import json
from cartopy import crs as ccrs

import rasterio
import rasterio.mask
from rasterio.session import AWSSession
import xarray as xr

import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm
%matplotlib inline

## Sat-search 

[Sat-search](https://github.com/sat-utils/sat-search) is open-source software designed to make it easy to discover public imagery on AWS. It relies on metadata organized as [SpatioTemporal Asset Catalogs (STAC)](https://stacspec.org/) to filter scenes. We will use it to search for Sentinel-2 data covering our area of interest.

In [None]:
# Set up our bounding box in [west, south, east, north] order (the GeoJSON/STAC convention)
bbox = [43.16, -11.96, 43.54, -11.32]
west, south, east, north = bbox
bbox_ctr = [0.5*(north+south), 0.5*(west+east)]

In [None]:
# bbox as a python list is great for use in python, but we can instead save to a more interoperable format (GeoJSON)
# Here is a great website for creating and visualizing geojson on a map: http://geojson.io
aoi = { "type": "Polygon", 
    "coordinates": [[[west, south], [west, north], [east, north], [east, south], [west, south]]]
}
# pretty print formatting
#print(json.dumps(aoi, sort_keys=False, indent=2))

# save to file for future use
with open('aoi-5977.geojson', 'w') as f:
    json.dump(aoi, f)
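
The polygon construction above can be wrapped in a small helper. The following is a minimal sketch, not part of the original notebook: the function names `bbox_to_polygon` and `bbox_center` are hypothetical, and the bounds are passed as explicit arguments so the vertex order does not depend on how the list was unpacked.

```python
import json

def bbox_to_polygon(west, south, east, north):
    """Return a GeoJSON Polygon dict for a lon/lat bounding box (hypothetical helper)."""
    ring = [
        [west, south],
        [west, north],
        [east, north],
        [east, south],
        [west, south],  # repeat the first vertex to close the ring
    ]
    return {"type": "Polygon", "coordinates": [ring]}

def bbox_center(west, south, east, north):
    """Return the [lat, lon] midpoint of the box."""
    return [0.5 * (south + north), 0.5 * (west + east)]

polygon = bbox_to_polygon(43.16, -11.96, 43.54, -11.32)
center = bbox_center(43.16, -11.96, 43.54, -11.32)
print(json.dumps(polygon))
print(center)
```

GeoJSON requires the first and last vertex of a ring to be identical, which is why the ring has five points for a four-sided box.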

In [None]:
# Load the saved GeoJSON into a GeoPandas GeoDataFrame
# Other packages that understand GeoJSON can now read this file too
gfa = gpd.read_file('aoi-5977.geojson')
gfa

### STAC Metadata

STAC metadata is organized hierarchically: catalogs can contain other catalogs, collections, and items. The actual paths or URLs to images are stored as "assets" of an item. The following cells illustrate this data model using the [sat-stac library](https://github.com/sat-utils/sat-stac).
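
To make the item/asset relationship concrete, here is a minimal sketch that uses a plain Python dictionary instead of sat-stac objects. Every value below (ids, dates, `example.com` URLs) is a made-up placeholder, and `asset_hrefs` is a hypothetical helper; real items carry many more fields.

```python
# A hand-written, hypothetical STAC-like item; all values are placeholders.
example_item = {
    "type": "Feature",
    "id": "example-item",
    "collection": "example-collection",
    "geometry": {"type": "Point", "coordinates": [43.35, -11.64]},
    "properties": {"datetime": "2019-04-15T00:00:00Z", "eo:cloud_cover": 5.0},
    "assets": {
        "thumbnail": {"href": "https://example.com/preview.jpg"},
        "B04": {"href": "https://example.com/B04.jp2"},
    },
}

def asset_hrefs(stac_item):
    """Map each asset name to the URL where the file itself lives."""
    return {name: asset["href"] for name, asset in stac_item["assets"].items()}

hrefs = asset_hrefs(example_item)
for name, href in hrefs.items():
    print(name, href)
```

The item describes *where and when* (geometry, properties), while each asset points to *one file* belonging to that scene.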

In [None]:
# Get results for bbox and time range
# Sentinel-2 data is available in many locations https://www.usgs.gov/centers/eros/science/usgs-eros-archive-sentinel-2
# https://sentinel.esa.int/web/sentinel/sentinel-data-access
# https://gdal.org/drivers/raster/sentinel2.html

# remember we are searching data on AWS:
#https://registry.opendata.aws/sentinel-2/ 

results = satsearch.Search(bbox=bbox, datetime='2019-02-01/2019-06-01')
print('%s items' % results.found())
items = results.items()
print('%s collections:' % len(items._collections))
print(items._collections)

In [None]:
# If you are unfamiliar with one of these satellites, we can look at stored metadata
col = items._collections[0]

print('Title:', col.title)
print('Collection Version:', col.version)
print('Keywords: ', col.keywords)
print('License:', col.license)
print('Providers:', col.providers)
print('Extent', col.extent)

In [None]:
# We can delve deeper to see what kind of metadata is available at the scene level
for key in col.properties:
    if key == 'eo:bands':
        # print one band description per line
        for band_meta in col[key]:
            print(band_meta)
    else:
        print('%s: %s' % (key, col[key]))

In [None]:
# Select one item and look up the URL of a single full-resolution band
item = items[5]
band = 'red'
print(item.assets.keys())
print(item.assets_by_common_name.keys())
print(item.asset('thumbnail')['href'])
url = item.asset(band)['href']
print(url)

In [None]:
# Slightly different search syntax
properties = [] # additional filters
results = satsearch.Search.search(collection='sentinel-2-l1c', 
                        datetime='2019-02-01/2019-06-01',
                        bbox=bbox, 
                        sort=['<datetime'], #earliest scene first
                        property=properties)
print('%s items' % results.found())

In [None]:
# We might want to narrow the results further with additional filters
properties.extend(["eo:cloud_cover<10"])

results = satsearch.Search.search(collection='sentinel-2-l1c', 
                        datetime='2019-02-01/2019-06-01',
                        bbox=bbox, 
                        sort=['<datetime'], #earliest scene first
                        property=properties)
print('%s items' % results.found())
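
To illustrate what a property filter such as `"eo:cloud_cover<10"` expresses, the sketch below applies the same kind of expression to plain dictionaries on the client side. This is not how sat-search evaluates filters (the search is handled by the remote API); `parse_filter`, `matches`, and the sample scenes are all hypothetical.

```python
import operator
import re

# Comparison operators recognised by this sketch
_OPS = {"<=": operator.le, ">=": operator.ge,
        "<": operator.lt, ">": operator.gt, "=": operator.eq}
# Two-character operators are listed first so "<=" is not read as "<"
_PATTERN = re.compile(r"^\s*([\w:]+)\s*(<=|>=|<|>|=)\s*(.+?)\s*$")

def parse_filter(expr):
    """Split 'key<value' style text into (key, comparison function, value)."""
    match = _PATTERN.match(expr)
    if match is None:
        raise ValueError(f"cannot parse filter: {expr!r}")
    key, op, raw = match.groups()
    try:
        value = float(raw)   # numeric comparison when possible
    except ValueError:
        value = raw          # otherwise compare as text
    return key, _OPS[op], value

def matches(properties, filters):
    """True if the properties dict satisfies every filter expression."""
    for expr in filters:
        key, compare, value = parse_filter(expr)
        if key not in properties or not compare(properties[key], value):
            return False
    return True

# Made-up scene properties for demonstration
scenes = [
    {"id": "a", "eo:cloud_cover": 3.2},
    {"id": "b", "eo:cloud_cover": 45.0},
    {"id": "c", "eo:cloud_cover": 8.0},
]
selected = [s["id"] for s in scenes if matches(s, ["eo:cloud_cover<10"])]
print(selected)
```

Filtering on the server, as the search above does, is preferable in practice because only the matching metadata is sent over the network.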

### STAC with geopandas

The geopandas library provides a nice data model for organizing and visualizing STAC catalogs.

In [None]:
items = results.items()
items.save('items-sentinel2.json')
#items = Items.load('items-sentinel2.json')
#items.bbox()

In [None]:
# Assets correspond to actual images related to a STAC metadata item
#items[0].assets

In [None]:
# Use pandas to better display python dictionaries!
pd.DataFrame(items[0].assets).T.reset_index()

In [None]:
# Put items into a geopandas geodataframe with image footprints
gfs = gpd.read_file('items-sentinel2.json')
gfs = gfs.sort_values('datetime').reset_index(drop=True)
print('records:', len(gfs))
gfs.head()

In [None]:
gfs.iloc[0]['eo:bands']

In [None]:
import ast
band_info = pd.DataFrame(ast.literal_eval(gfs.iloc[0]['eo:bands']))
band_info
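
The `ast.literal_eval` call is needed because the nested band list comes back from the file as a single text string rather than as Python objects. A minimal sketch of that conversion, using a shortened, hypothetical cell value:

```python
import ast

# Hypothetical cell value: a list of band descriptions stored as text
cell = "[{'name': 'B04', 'common_name': 'red'}, {'name': 'B08', 'common_name': 'nir'}]"

# literal_eval only parses Python literals (lists, dicts, numbers, strings);
# unlike eval, it does not execute arbitrary code
bands = ast.literal_eval(cell)

by_common_name = {band['common_name']: band['name'] for band in bands}
print(type(cell).__name__, '->', type(bands).__name__)
print(by_common_name)
```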

## Holoviz

[Holoviz](https://holoviz.org/) is a set of Python visualization libraries that simplify interactive visualizations of data in a web-browser. We'll use several of these libraries including hvplot and geoviews to visualize both vector data (such as image footprints) and raster data (actual raster values). 

<div class="alert-warning">

#### Note 
    
Note the toolbars on the right-hand side of these plots. We are using a library called Bokeh, which provides interactive widgets for zooming and panning around maps.
</div>

In [None]:
# Plot search AOI and frames on a map using Holoviz libraries
cols = gfs[['id', 'sentinel:latitude_band', 'sentinel:grid_square', 'geometry']]

footprints = cols.hvplot(geo=True, line_color='k', alpha=0.1, title='Sentinel-2')
# Use a new name so the `aoi` GeoJSON dictionary defined earlier is not overwritten
aoi_outline = gfa.hvplot(geo=True, line_color='b', fill_color=None)
tiles = gv.tile_sources.CartoEco.options(width=700, height=500)
labels = gv.tile_sources.StamenLabels.options(level='annotation')
tiles * footprints * aoi_outline * labels

## ipywidgets

[ipywidgets](https://ipywidgets.readthedocs.io/en/latest/) provides another convenient approach to custom visualizations. The function below allows us to browse through the thumbnails for a group of images (specifically, a single Sentinel-2 grid square and latitude band).

In [None]:
# Browse thumbnails for one grid square (low cloud cover, 2019-03-01 to 2019-09-01)
properties =  ["sentinel:latitude_band=L",
               "sentinel:grid_square=LN",
               "eo:cloud_cover<10"] 
results = satsearch.Search.search(collection='sentinel-2-l1c',
                        bbox=bbox, 
                        datetime='2019-03-01/2019-09-01',
                        sort=['<datetime'], #earliest scene first
                        property=properties)
print('%s items' % results.found())
items = results.items()
items.save('my-sentinel2-archive.json')

In [None]:
def browse_images(items):
    n = len(items)

    def view_image(i=0):
        item = items[i]
        print(f"id={item.id}\tdate={item.datetime}\tcloud%={item['eo:cloud_cover']}")
        display(Image(item.asset('thumbnail')['href']))
    
    interact(view_image, i=(0,n-1))

In [None]:
# Right click on image below and select 'new view for output'
browse_images(items)

## Rasterio and xarray

Thumbnails are great for quickly looking through imagery, but to actually load full resolution data from a particular Sentinel-2 band we'll use rasterio and xarray libraries.

In [None]:
# Our geodataframe provides a nice mapping from frame ID to URLs for each band
gf = gpd.read_file('items-sentinel2.json')
# Exercise: turn this into a function

frameid = 'S2B_38LLN_20190415_0'
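
The frame ID packs several pieces of metadata into one string. The sketch below splits it apart, assuming the pattern `platform_zone+band+square_date_sequence` that the ID above appears to follow; the regular expression and the `parse_frameid` helper are hypothetical and may not cover every ID in the catalog.

```python
import re
from datetime import datetime

# Assumed layout: S2A/S2B, UTM zone, latitude band, grid square, date, sequence number
ID_PATTERN = re.compile(
    r"^(?P<platform>S2[AB])_"
    r"(?P<utm_zone>\d{1,2})(?P<latitude_band>[A-Z])(?P<grid_square>[A-Z]{2})_"
    r"(?P<date>\d{8})_(?P<sequence>\d+)$"
)

def parse_frameid(frame_id):
    """Split a frame ID into its named components (hypothetical helper)."""
    match = ID_PATTERN.match(frame_id)
    if match is None:
        raise ValueError(f"unrecognised id: {frame_id!r}")
    parts = match.groupdict()
    parts["utm_zone"] = int(parts["utm_zone"])
    parts["date"] = datetime.strptime(parts["date"], "%Y%m%d").date()
    parts["sequence"] = int(parts["sequence"])
    return parts

parts = parse_frameid('S2B_38LLN_20190415_0')
for name, value in parts.items():
    print(f"{name}: {value}")
```

The latitude band and grid square recovered here correspond to the `sentinel:latitude_band` and `sentinel:grid_square` properties used as search filters earlier.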

### AWS credentials for requester-pays buckets

<div class="alert-warning">

#### Note 
    
* [Sentinel-2 data on AWS](https://registry.opendata.aws/sentinel-2) is stored in the **eu-central-1** region in a "requester pays" bucket. So unlike Landsat-8, which currently has open public access, we need to set up AWS authentication to pull full resolution images.
    
* Be aware that our hub is running in us-east-1, so this data is being moved from Europe and we are paying for it! If you wanted to work with a lot of Sentinel-2 data, you should operate in the eu-central-1 region. [Blog post about this](https://medium.com/@_VincentS_/do-you-really-want-people-using-your-data-ec94cd94dc3f).
</div>

In [None]:
# Configure requester-pays access with rasterio
env = rasterio.Env(AWSSession(region_name='eu-central-1', 
                              requester_pays=True),
                  )

In [None]:
def url2s3(url, bucket='sentinel-s2-l1c'):
    """ convert public url to s3 path """
    key = url.split('amazonaws.com')[1]
    return f's3://{bucket}{key}'
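
As a variation on the conversion above, the following sketch derives the bucket name from the URL itself using the standard library instead of passing it as a default argument. It assumes a virtual-hosted-style URL of the form `https://<bucket>.s3.amazonaws.com/<key>`; the example URL is hypothetical and `url_to_s3` is not part of the original notebook.

```python
from urllib.parse import urlparse

def url_to_s3(http_url):
    """Convert a virtual-hosted-style S3 HTTPS URL to an s3:// path."""
    parsed = urlparse(http_url)
    bucket = parsed.netloc.split('.')[0]  # text before the first dot is the bucket name
    return f's3://{bucket}{parsed.path}'

# Hypothetical URL following the layout assumed above
example_url = 'https://sentinel-s2-l1c.s3.amazonaws.com/tiles/38/L/LN/2019/4/15/0/B04.jp2'
s3_example = url_to_s3(example_url)
print(s3_example)
```

The `s3://` form matters here because requester-pays access goes through authenticated S3 requests rather than anonymous HTTPS downloads.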

In [None]:
s3path = url2s3(url)

In [None]:
with env:
    with rasterio.open(s3path) as src:
        print(src.profile)

In [None]:
with env:
    with rasterio.open(s3path) as src:
        width = src.width
        blockx = src.profile['blockxsize']
        blocky = src.profile['blockysize']
        #print(src.profile)
        xchunk = int(width/blockx)*blockx
        ychunk = blocky
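        # NOTE: xr.open_rasterio is deprecated and has been removed from recent xarray releases;
        # with a newer xarray, rioxarray.open_rasterio is the usual replacement
        da = xr.open_rasterio(src, chunks={'band': 1, 'x': xchunk, 'y': ychunk})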
da
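
The chunk sizes above are chosen so that each chunk spans whole internal blocks of the file, which avoids reading the same block for two different chunks. A worked sketch of that arithmetic, with assumed values (a 10980-pixel-wide image and 1024-pixel blocks; check `src.profile` for the real numbers):

```python
def aligned_chunk(size, block):
    """Largest multiple of `block` that fits in `size` (at least one block)."""
    return max(size // block, 1) * block

# Assumed values for illustration only
demo_width, demo_blockx, demo_blocky = 10980, 1024, 1024

demo_xchunk = aligned_chunk(demo_width, demo_blockx)  # full rows of blocks
demo_ychunk = demo_blocky                             # one block tall
leftover = demo_width - demo_xchunk                   # columns in the final, smaller chunk

print(demo_xchunk, demo_ychunk, leftover)
```

Unlike the expression `int(width/blockx)*blockx` in the cell above, this helper never returns zero when the image is narrower than one block.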

In [None]:
# NOTE: different EPSG code compared to Landsat: epsg 32738
#https://spatialreference.org/ref/epsg/wgs-84-utm-zone-38s/
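
The EPSG code noted above follows from the location of the scene. As a rough sketch, the standard UTM zone can be computed from longitude, and WGS 84 / UTM codes are 326xx in the northern hemisphere and 327xx in the southern. The helper `utm_epsg` is hypothetical and ignores the special zones around Norway and Svalbard.

```python
import math

def utm_epsg(lon, lat):
    """EPSG code of the standard WGS 84 / UTM zone containing (lon, lat)."""
    zone = int(math.floor((lon + 180) / 6)) + 1   # zones are 6 degrees wide, starting at 180W
    base = 32600 if lat >= 0 else 32700           # north vs. south of the equator
    return base + zone

# Approximate center of the area of interest
epsg = utm_epsg(43.35, -11.64)
print(epsg)
```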

In [None]:
# This will pull raster data over the network. If operating in the same AWS region, it should be very fast!

img = da.hvplot.image(rasterize=True, logz=True, width=700, height=500, cmap='reds', title=f'{item.id} ({band})')

img 