## Overview

In this section, we will take a single Sentinel-2 L1C scene downloaded from the [Copernicus Browser](https://dataspace.copernicus.eu/) and learn how to read it using Xarray, visualize it, and compute spectral indices.

## Setup and Data Download

The following code blocks install the required packages and download the dataset to your Colab environment.

In [3]:
%%capture
if 'google.colab' in str(get_ipython()):
  !pip install rioxarray

In [4]:
import os
import matplotlib.pyplot as plt
import xarray as xr
import rioxarray as rxr
import zipfile


In [5]:
data_folder = 'data'
output_folder = 'output'

# Create the folders if they do not already exist
os.makedirs(data_folder, exist_ok=True)
os.makedirs(output_folder, exist_ok=True)

In [6]:
def download(url):
    filename = os.path.join(data_folder, os.path.basename(url))
    if not os.path.exists(filename):
        from urllib.request import urlretrieve
        local, _ = urlretrieve(url, filename)
        print('Downloaded ' + local)

filename = 'S2A_MSIL1C_20230212T050931_N0509_R019_T43PGQ_20230212T065641.SAFE.zip'
data_url = 'https://storage.googleapis.com/spatialthoughts-public-data/s2/'

download(data_url + filename)

## Data Pre-Processing

We first extract the zip archive and then build an Xarray DataArray from the individual band images.

In [7]:
s2_filepath = os.path.join(data_folder, filename)

with zipfile.ZipFile(s2_filepath) as zf:
  zf.extractall(data_folder)
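
Before or after extracting, it can help to confirm where the image files sit inside the archive. The following is a minimal sketch, not part of the original notebook: the helper name `list_jp2_members` is hypothetical, and it assumes the archive follows the `GRANULE/{SCENE_ID}/IMG_DATA/` layout described in this section.

```python
import os
import zipfile

def list_jp2_members(zip_path):
    """Return sorted archive member names that are .jp2 files under an IMG_DATA folder."""
    with zipfile.ZipFile(zip_path) as zf:
        return sorted(
            name for name in zf.namelist()
            if '/IMG_DATA/' in name and name.endswith('.jp2')
        )

# Assumed location of the downloaded scene; adjust to your own path
zip_path = os.path.join(
    'data',
    'S2A_MSIL1C_20230212T050931_N0509_R019_T43PGQ_20230212T065641.SAFE.zip')

# Only list the contents if the file has actually been downloaded
if os.path.exists(zip_path):
    for member in list_jp2_members(zip_path):
        print(member)
```

This reads only the archive's table of contents, so it is quick even for a large scene.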

Sentinel-2 images come as individual JPEG2000 rasters for each band. The image files are located in the `GRANULE/{SCENE_ID}/IMG_DATA/` subfolder. We find the files and read them using `rioxarray`.

In [24]:
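import glob

# The archive is assumed to extract to a folder named like the zip file without '.zip'
s2_folder = s2_filepath[:-4]

bands = []

# glob returns files in arbitrary order, so sort for a reproducible band order
pattern = os.path.join(s2_folder, 'GRANULE', '*', 'IMG_DATA', '*B*.jp2')
for filepath in sorted(glob.glob(pattern)):
  band = rxr.open_rasterio(filepath, chunks={'x':2048, 'y':2048})
  # Use a separate variable so the zip `filename` defined earlier is not overwritten
  band_filename = os.path.basename(filepath)
  # Extract the part of the filename containing band name such as 'B01'
  name = os.path.splitext(band_filename)[0].split('_')[-1]
  band.name = name
  band = band.assign_coords(band=[name])
  bands.append(band)

# Stack the per-band arrays along the 'band' dimension
scene = xr.concat(bands, dim='band')
scene.name = 'S2'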
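
The band-name extraction used in the loop can be checked in isolation, without opening any rasters. The sketch below is illustrative only: `band_name` is a hypothetical helper, and the example paths are made up to follow the `{TILE}_{DATETIME}_{BAND}.jp2` pattern that this scene's files are assumed to use.

```python
import os

def band_name(filepath):
    """Return the trailing band token (e.g. 'B04') from a band image path."""
    stem = os.path.splitext(os.path.basename(filepath))[0]
    return stem.split('_')[-1]

# Hypothetical paths for illustration; real paths come from glob
paths = [
    'IMG_DATA/T43PGQ_20230212T050931_B8A.jp2',
    'IMG_DATA/T43PGQ_20230212T050931_B02.jp2',
    'IMG_DATA/T43PGQ_20230212T050931_B11.jp2',
]

# Sorting the names gives a stable order regardless of how the paths were listed
names = sorted(band_name(p) for p in paths)
print(names)  # ['B02', 'B11', 'B8A']
```

Because the band labels end up as coordinate values on the `band` dimension, bands can then be selected by label rather than by position.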

In [25]:
scene.sel(band=['B04', 'B03', 'B02'])

| | Array | Chunk |
|---|---|---|
| Bytes | 3.03 GiB | 36.00 MiB |
| Shape | (3, 16470, 16470) | (1, 3072, 3072) |
| Dask graph | 192 chunks in 116 graph layers | |
| Data type | float32 numpy.ndarray | |
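
The 16470 × 16470 shape reported above is larger than a single 10 m band. A likely explanation, offered here as an assumption rather than something the notebook states, is that Sentinel-2 bands come at 10 m, 20 m and 60 m resolution, and `xr.concat` with its default outer join takes the union of their differing `x`/`y` coordinates, filling the gaps with NaN (which would also account for the `float32` data type). The arithmetic can be sketched in plain Python, assuming a tile 109,800 m across and coordinates located at pixel centers:

```python
# Assumed tile width in metres (10980 pixels at 10 m resolution)
TILE_SIZE_M = 109_800

def pixel_centers(resolution_m):
    """Pixel-center offsets (in metres from the tile edge) along one axis."""
    n_pixels = TILE_SIZE_M // resolution_m
    half = resolution_m // 2
    return {half + i * resolution_m for i in range(n_pixels)}

# One set of coordinates per native resolution
centers = {res: pixel_centers(res) for res in (10, 20, 60)}

# An outer join keeps every coordinate that appears in any band
union = set().union(*centers.values())

print({res: len(c) for res, c in centers.items()})  # {10: 10980, 20: 5490, 60: 1830}
print(len(union))  # 16470
```

Under these assumptions, the 60 m centers all coincide with 20 m centers, while the 10 m centers never do, so the union holds 10980 + 5490 = 16470 coordinates per axis.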
