# Skimming large amounts of Sentinel-2 data
In this notebook, we show how `geolt.skim` and `geolt.summarize` can help prepare and visualize large sets of data. We load a large set of Sentinel-2 imagery over Austin, TX and quickly retrieve statistics about its features to assist with downstream processing. We use the following process:
1. Initialize a Dask cluster and load an `xarray Dataset` from a public S3 bucket.
2. `skim` the dataset for a quick set of summary statistics.
3. `summarize` features within the dataset to assist with feature engineering and downstream use.

In [1]:
from porcupine import skim
import pandas as pd
import xarray
import dask.distributed
from odc.stac import configure_rio, stac_load
from pystac_client import Client
import folium
import folium.plugins
import geopandas as gpd
import shapely.geometry
from IPython.display import HTML, display

client = dask.distributed.Client()
configure_rio(cloud_defaults=True, aws={"aws_unsigned": True}, client=client)
display(client)

Client
Connection method: Cluster object
Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:8787/status
Status: running
Using processes: True
Workers: 4
Total threads: 8
Total memory: 16.00 GiB

Scheduler
Comm: tcp://127.0.0.1:61773
Dashboard: http://127.0.0.1:8787/status
Started: Just now

Workers
Comm,Dashboard,Nanny,Total threads,Memory,Local directory
tcp://127.0.0.1:61788,http://127.0.0.1:61789/status,tcp://127.0.0.1:61776,2,4.00 GiB,/var/folders/x3/gcqk_pm125n_d0fl67fn2fjw0000gn/T/dask-worker-space/worker-vnpdzowm
tcp://127.0.0.1:61794,http://127.0.0.1:61795/status,tcp://127.0.0.1:61777,2,4.00 GiB,/var/folders/x3/gcqk_pm125n_d0fl67fn2fjw0000gn/T/dask-worker-space/worker-0zzonlh3
tcp://127.0.0.1:61797,http://127.0.0.1:61798/status,tcp://127.0.0.1:61778,2,4.00 GiB,/var/folders/x3/gcqk_pm125n_d0fl67fn2fjw0000gn/T/dask-worker-space/worker-b9g9ylaz
tcp://127.0.0.1:61791,http://127.0.0.1:61792/status,tcp://127.0.0.1:61779,2,4.00 GiB,/var/folders/x3/gcqk_pm125n_d0fl67fn2fjw0000gn/T/dask-worker-space/worker-jf_tt56n


## Load Dataset from STAC catalog
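The following cell loads an `xarray Dataset` containing Sentinel-2 imagery from a STAC catalog. We center the query on Austin, TX and search for imagery from July 2022 inside a bounding box that extends roughly 100 km from the center in each direction. The imagery is loaded lazily at a coarse resolution of 320 m per pixel (10 m times an overview factor of 2**5), so nothing is read until we compute.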
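The bounding box is built by converting kilometres to degrees with the rule of thumb that one degree spans about 111 km. That figure fits latitude well, but a degree of longitude shrinks with the cosine of the latitude, so the same offset in degrees covers less ground east to west. The sketch below illustrates both variants. `bbox_around` is a hypothetical helper, not part of any library used in this notebook, and it assumes a spherical Earth and the 111 km approximation.

```python
import math

KM_PER_DEG = 111.0  # approximate length of one degree of latitude (assumption)


def bbox_around(lon, lat, half_width_km, correct_longitude=False):
    """Return (min_lon, min_lat, max_lon, max_lat) around a center point.

    With correct_longitude=False this matches the simple arithmetic used in
    the notebook cell. With True, the longitude offset is widened by
    1 / cos(latitude) so the box spans about half_width_km east to west too.
    """
    dlat = half_width_km / KM_PER_DEG
    dlon = dlat
    if correct_longitude:
        dlon = half_width_km / (KM_PER_DEG * math.cos(math.radians(lat)))
    return (lon - dlon, lat - dlat, lon + dlon, lat + dlat)


simple = bbox_around(-97.744, 30.266, 100)
corrected = bbox_around(-97.744, 30.266, 100, correct_longitude=True)
print("simple:   ", simple)
print("corrected:", corrected)
```

For a STAC search the uncorrected box is usually adequate, because it only needs to intersect the scenes of interest; the correction matters more when the box itself defines the analysis extent.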

In [2]:
config = {
    "sentinel-s2-l2a-cogs": {
        "assets": {
            "*": {"data_type": "uint16", "nodata": 0},
            "SCL": {"data_type": "uint8", "nodata": 0},
            "visual": {"data_type": "uint8", "nodata": 0},
        },
        "aliases": {"red": "B04", "green": "B03", "blue": "B02"},
    },
    "*": {"warnings": "ignore"},
}

km2deg = 1.0 / 111
x, y = (-97.744, 30.266)
r = 100 * km2deg
bbox = (x - r, y - r, x + r, y + r)

catalog = Client.open("https://earth-search.aws.element84.com/v0")

query = catalog.search(
    collections=["sentinel-s2-l2a-cogs"], datetime="2022-07-01/2022-07-31", limit=100, bbox=bbox
)

items = list(query.get_items())

# Convert STAC items into a GeoJSON FeatureCollection
stac_json = query.get_all_items_as_dict()

# Since we will plot it on a map we need to use `EPSG:3857` projection
crs = "epsg:3857"
zoom = 2**5 # overview level 5

data = stac_load(
    items,
    crs=crs,
    resolution=10*zoom,
    chunks={},
    groupby="solar_day",
    stac_cfg=config,
)
data

Dataset summary: 16 data variables, each a lazily loaded array of shape (12, 1142, 1137) split into 12 chunks of shape (1, 1142, 1137). The first and last variables are uint8; the 14 between them are uint16.

Variables,Type,Array bytes,Chunk bytes,Shape,Chunk shape,Tasks,Chunks
2,uint8,14.86 MiB,1.24 MiB,"(12, 1142, 1137)","(1, 1142, 1137)",98,12
14,uint16,29.72 MiB,2.48 MiB,"(12, 1142, 1137)","(1, 1142, 1137)",98,12
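The sizes reported in the output above follow directly from the array shape and the element width, which gives a quick way to estimate memory needs before calling `.compute()`. The sketch below reproduces the figures with plain arithmetic. The variable counts (2 `uint8` and 14 `uint16`) are read off the output above, and `nbytes` is a hypothetical helper written for this illustration.

```python
import math

ITEMSIZE = {"uint8": 1, "uint16": 2}  # bytes per element
MIB = 1024 ** 2


def nbytes(shape, dtype):
    """Uncompressed in-memory size of an array with this shape and dtype."""
    return math.prod(shape) * ITEMSIZE[dtype]


shape = (12, 1142, 1137)  # full array: 12 time steps
chunk = (1, 1142, 1137)   # one chunk: a single time step

for dtype in ("uint8", "uint16"):
    array_mib = nbytes(shape, dtype) / MIB
    chunk_mib = nbytes(chunk, dtype) / MIB
    print(f"{dtype}: array {array_mib:.2f} MiB, chunk {chunk_mib:.2f} MiB")

# Whole dataset, assuming 2 uint8 and 14 uint16 variables as in the output
total_mib = (2 * nbytes(shape, "uint8") + 14 * nbytes(shape, "uint16")) / MIB
print(f"total: {total_mib:.2f} MiB")
```

At this resolution the full dataset is well under the 4 GiB available to each worker, which is why computing a single time slice locally is cheap.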


## Perform a local skim of the data for one time slice

In [3]:
small_data = data.sel(time='2022-07-03').compute()
df = skim(small_data)
df

index,variables,data_types,NaNs,mean,std,maximums,minimums
0,visual,uint8,False,83.206896,85.090139,255,0
1,B01,uint16,False,694.111545,1057.890177,12671,0
2,B02,uint16,False,715.116638,1019.690505,11615,0
3,B03,uint16,False,831.311709,1033.465046,10787,0
4,B04,uint16,False,897.211163,1064.686861,10284,0
5,B05,uint16,False,1144.481014,1258.273113,11399,0
6,B06,uint16,False,1453.429606,1447.058901,10691,0
7,B07,uint16,False,1585.919234,1543.21547,10361,0
8,B08,uint16,False,1592.04866,1549.807719,10519,0
9,B8A,uint16,False,1737.786114,1662.727114,10352,0
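The columns in this table suggest a set of per-variable reductions: data type, presence of NaNs, mean, standard deviation, maximum, and minimum. The sketch below shows how such a table could be assembled from an `xarray Dataset` with NumPy and pandas. It is an illustration only, not the library's implementation; `skim_sketch` and the small random `demo` dataset are hypothetical.

```python
import numpy as np
import pandas as pd
import xarray as xr


def skim_sketch(ds: xr.Dataset) -> pd.DataFrame:
    """Build one row of summary statistics per data variable."""
    rows = []
    for name, da in ds.data_vars.items():
        values = da.values  # loads the variable into memory
        # Integer arrays cannot hold NaN, so only check floating-point data
        has_nan = (
            bool(np.isnan(values).any())
            if np.issubdtype(values.dtype, np.floating)
            else False
        )
        rows.append(
            {
                "variables": name,
                "data_types": str(values.dtype),
                "NaNs": has_nan,
                "mean": float(values.mean()),
                "std": float(values.std()),
                "maximums": values.max(),
                "minimums": values.min(),
            }
        )
    return pd.DataFrame(rows)


# Small synthetic dataset standing in for one time slice (made-up values)
rng = np.random.default_rng(0)
demo = xr.Dataset(
    {
        "B04": (("y", "x"), rng.integers(0, 10000, size=(4, 5), dtype=np.uint16)),
        "visual": (("y", "x"), rng.integers(0, 256, size=(4, 5), dtype=np.uint8)),
    }
)
summary = skim_sketch(demo)
print(summary)
```

Because the load configuration sets `nodata` to 0, a minimum of 0 may reflect fill pixels rather than measured values; masking those pixels before reducing would likely change the mean and standard deviation.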
