# **Sentinel-2 Pre-Processing**

ML-based Local Climate Zone (LCZ) classification requires all input datasets to be homogenized in terms of extent, spatial resolution and projection. This notebook prepares the Sentinel-2 data downloaded in the 01_Data_Aquisition notebook for ML model training. The steps are:

1. Project Setup
2. Merge tiles from each band
3. Clip merged tiles to extent of study area
4. Resample to 30 m resolution, as recommended by Absaraori et al. 2024, and stack the bands into a single GeoTIFF


### **1. Project Setup**

#### 1.1 Import Libraries

In [2]:
%load_ext autoreload
%autoreload 

import sys
import os

# Add the module's parent directory to sys.path
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)


## Import local LCZ Classification libraries
from lcz_classification.config import *
from lcz_classification.util import merge_rasters, resample_da, clip_raster

## Import required libraries
import rioxarray as rio
import pandas as pd
import rasterio as r
from shapely.geometry import box
import geopandas as gpd
import xarray as xr


#### 1.2 Setup Paths

In [3]:

## PROJECT CONFIGURATION ##
# ===============================================================================================================================================================

STUDY_AREA_GDF=gpd.read_file(STUDY_AREA_FP) # Read study area bounds as GeoDataFrame

CRS=STUDY_AREA_GDF.estimate_utm_crs() # Retrieve local UTM CRS based on bounds of study area

STUDY_AREA_GDF=STUDY_AREA_GDF.to_crs(CRS) # Reproject study area GeoDataFrame to project CRS

BBOX=box(*list(STUDY_AREA_GDF.total_bounds)) # Create Bounding Box Polygon
BBOX= BBOX.buffer(15, join_style=2) # Create 15 meter buffer for bounding box
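
Because the bounding box is an axis-aligned rectangle, a buffer with `join_style=2` (mitre joins) keeps the corners square, so its effect is the same as pushing every edge outward by the buffer distance. The sketch below reproduces that arithmetic in plain Python. `expand_bounds` is a hypothetical helper used only for illustration (it is not part of `lcz_classification`), and the coordinates are made up.

```python
def expand_bounds(bounds, distance):
    """Grow (minx, miny, maxx, maxy) outward by `distance` on every side."""
    minx, miny, maxx, maxy = bounds
    return (minx - distance, miny - distance, maxx + distance, maxy + distance)

# Made-up UTM bounds in metres (assumption, not the real study area)
study_bounds = (500000.0, 4800000.0, 530000.0, 4820000.0)

# Same effect as box(*study_bounds).buffer(15, join_style=2).bounds
buffered = expand_bounds(study_bounds, 15)
print(buffered)
```

A 15 m margin is half of a 30 m cell, which presumably guards against losing edge pixels once the rasters are resampled.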

### **2. Merge Tiles from Each Band**
    
This section reads the band files from each scene in the Sentinel-2 data directory and mosaics them into a single raster per band using the merge_rasters() function. The merged band rasters are then clipped, in section 3, to the buffered bounding box of the study area created in section 1.2.

#### 2.1. Prepare DataFrame of Available Scenes

In [4]:


sent2_dict=dict() # Create an empty dictionary

# Retrieve Sentinel-2 scene names from the SENT2_DIR directory
sent2_scenes = [f"{SENT2_DIR}/{scene}" for scene in os.listdir(SENT2_DIR) if ".geojson" not in scene] # Prepare scene directory paths, skipping GeoJSON files

# Create a Pandas DataFrame of the available Sentinel-2 Scenes
sent2_dfs=list()
for scene_path in sent2_scenes:
    
    scene_df=pd.DataFrame(
        data=dict(
            band = [x.split(".")[0] for x in os.listdir(scene_path)],
            file_path=[f"{scene_path}/{x}" for x in os.listdir(scene_path)]

        )
    )
    scene_id=scene_path.split("/")[-1]
    scene_df["scene"] = scene_id
    scene_df["date"] = scene_id.split("_")[2]
    sent2_dfs.append(scene_df)

# Create a single dataframe with pd.concat(), this results in a single data frame with required metadata to filter and read the desired tiles for the next steps.
sent2_df=pd.concat(sent2_dfs)
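
The `date` column relies on the scene folder name following the standard Sentinel-2 product naming, in which the third underscore-separated field is the sensing start time. A minimal sketch of the per-scene table, assuming a made-up scene ID and made-up file names:

```python
import pandas as pd

# Hypothetical scene folder and band files (illustrative only)
scene_path = "data/raw/sentinel2/S2A_MSIL2A_20230615T160901_N0509_R140_T17TPJ_20230615T215431"
band_files = ["B02.tif", "B03.tif", "B8A.tif"]

scene_id = scene_path.split("/")[-1]
scene_df = pd.DataFrame(
    {
        "band": [f.split(".")[0] for f in band_files],           # "B02.tif" -> "B02"
        "file_path": [f"{scene_path}/{f}" for f in band_files],  # full path per band
    }
)
scene_df["scene"] = scene_id
scene_df["date"] = scene_id.split("_")[2]  # sensing start time field

print(scene_df[["band", "date"]])
```

If a folder in the data directory does not follow this naming pattern, the `split("_")[2]` lookup will either fail or pick up the wrong field, so it is worth checking the `date` column before using it in output file names.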



#### 2.2. Merge Tiles from All Scenes into Single Band Rasters

In [5]:

# Group scene DataFrame by band and iterate over each (band name, band rows) pair.
sent2_grouped=sent2_df.groupby("band")
for band, band_df in sent2_grouped:
    raster_paths=band_df.file_path.tolist() # Get GeoTIFF file paths of all tiles under this band as a list

    out_path=f"{SENT2_MERGED_DIR}/{band}.tif"  # Configure output file path for merged band raster

    if os.path.exists(out_path):
        print(f"Already exists: {out_path}")
    else:
        merge_rasters(raster_paths,out_path) # Merge all band tiles into a single raster, pass raster file paths as a list
        print(f"Exported merged raster for {band}.tif")
    

Already exists: ../data/processed/sentinel2/merged/B02.tif
Already exists: ../data/processed/sentinel2/merged/B03.tif
Already exists: ../data/processed/sentinel2/merged/B04.tif
Already exists: ../data/processed/sentinel2/merged/B05.tif
Already exists: ../data/processed/sentinel2/merged/B06.tif
Already exists: ../data/processed/sentinel2/merged/B07.tif
Already exists: ../data/processed/sentinel2/merged/B11.tif
Already exists: ../data/processed/sentinel2/merged/B12.tif
Already exists: ../data/processed/sentinel2/merged/B8A.tif
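
Iterating over a pandas `groupby` object yields `(key, sub-DataFrame)` pairs, so each pass of the loop above receives a band name together with the rows for that band. The sketch below uses a made-up scene inventory to show the file lists that would be handed to the merge step:

```python
import pandas as pd

# Made-up scene inventory: two scenes, two bands each
sent2_df = pd.DataFrame(
    {
        "band": ["B02", "B03", "B02", "B03"],
        "file_path": [
            "scene_a/B02.tif", "scene_a/B03.tif",
            "scene_b/B02.tif", "scene_b/B03.tif",
        ],
    }
)

# One list of tile paths per band, keyed by band name
paths_by_band = {
    band: group["file_path"].tolist()
    for band, group in sent2_df.groupby("band")
}
print(paths_by_band)
```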


### **3. Clip Band Tiles**

In [6]:

band_tiles_fp = [f"{SENT2_MERGED_DIR}/{band}" for band in os.listdir(SENT2_MERGED_DIR)] # prepare file paths of merged band rasters

# Iterate over each band raster
for band_tile_fp in band_tiles_fp:

    clipped_path=f"{SENT2_CLIPPED_DIR}/{os.path.basename(band_tile_fp).replace('.tif','_clipped.tif')}" # Configure output path of clipped raster (per band)
    if os.path.exists(clipped_path):
        print(f"Already exists: {clipped_path}")
    else:
        # Clip raster using clip_raster() method
        clip_raster(raster_path=band_tile_fp,
                    bbox=BBOX,
                    out_path=clipped_path
                    )
   


Already exists: ../data/processed/sentinel2/clipped/B02_clipped.tif
Already exists: ../data/processed/sentinel2/clipped/B03_clipped.tif
Already exists: ../data/processed/sentinel2/clipped/B04_clipped.tif
Already exists: ../data/processed/sentinel2/clipped/B05_clipped.tif
Already exists: ../data/processed/sentinel2/clipped/B06_clipped.tif
Already exists: ../data/processed/sentinel2/clipped/B07_clipped.tif
Already exists: ../data/processed/sentinel2/clipped/B11_clipped.tif
Already exists: ../data/processed/sentinel2/clipped/B12_clipped.tif
Already exists: ../data/processed/sentinel2/clipped/B8A_clipped.tif
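
Conceptually, clipping a north-up raster to a bounding box amounts to converting the box's map coordinates into a row/column window. The sketch below shows that arithmetic on a synthetic NumPy array. It is not the implementation of `clip_raster()`, and `window_from_bounds` is a hypothetical helper introduced for illustration.

```python
import math
import numpy as np

def window_from_bounds(bounds, origin_x, origin_y, res):
    """Return (row_start, row_stop, col_start, col_stop) covering `bounds`.

    Assumes a north-up grid whose top-left corner is (origin_x, origin_y)
    and whose square cells are `res` map units wide. The window is not
    clamped to the array extent.
    """
    minx, miny, maxx, maxy = bounds
    col_start = math.floor((minx - origin_x) / res)
    col_stop = math.ceil((maxx - origin_x) / res)
    row_start = math.floor((origin_y - maxy) / res)  # rows count down from the top
    row_stop = math.ceil((origin_y - miny) / res)
    return row_start, row_stop, col_start, col_stop

# Synthetic 100 x 100 raster with 20 m cells, top-left corner at (0, 2000)
raster = np.arange(100 * 100).reshape(100, 100)
r0, r1, c0, c1 = window_from_bounds((385, 985, 815, 1415), 0, 2000, 20)
clipped = raster[r0:r1, c0:c1]
print(clipped.shape)
```

Rounding the start down and the stop up keeps every cell that the box touches, so the clipped raster is never smaller than the requested extent.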


### **4. Resample to 30 m and Stack into a Single GeoTIFF**

#### 4.1 Resample Band Rasters to 30 m 

In [7]:

band_tiles_fp = [f"{SENT2_CLIPPED_DIR}/{band}" for band in os.listdir(SENT2_CLIPPED_DIR)] # Prepare file paths of clipped band rasters
band_tiles=[rio.open_rasterio(band_tile_fp) for band_tile_fp in band_tiles_fp] # Read all band tiles into a list of xarray DataArrays

ref=band_tiles.pop(0) # Set first band as reference for resampling other rasters 
ref=resample_da(ref,CELL_RESOLUTION) # Resample reference raster to 30m

# Match all remaining band tiles to the reference grid - 30 m is recommended for this project
band_tiles_resampled=[band_tile.rio.reproject_match(ref) for band_tile in band_tiles]
band_tiles_resampled.insert(0,ref)

Resampling input raster to 30 m resolution
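
Resampling from a 20 m grid to a 30 m grid means deciding, for each 30 m cell, which source cell to sample. A nearest-neighbour sketch in NumPy is shown below for intuition only. The resampling method used inside `resample_da()` is not shown in this notebook, and `resample_nearest` is a hypothetical helper that assumes both grids share the same top-left corner.

```python
import numpy as np

def resample_nearest(data, src_res, dst_res):
    """Nearest-neighbour resample of a 2D array to a new cell size."""
    rows, cols = data.shape
    dst_rows = int(np.ceil(rows * src_res / dst_res))
    dst_cols = int(np.ceil(cols * src_res / dst_res))
    # Centre of each target cell, expressed in source-cell index units
    row_idx = np.floor((np.arange(dst_rows) + 0.5) * dst_res / src_res).astype(int)
    col_idx = np.floor((np.arange(dst_cols) + 0.5) * dst_res / src_res).astype(int)
    # Guard against target cells that overhang the source edge
    row_idx = np.clip(row_idx, 0, rows - 1)
    col_idx = np.clip(col_idx, 0, cols - 1)
    return data[np.ix_(row_idx, col_idx)]

band_20m = np.arange(36).reshape(6, 6)  # 6 x 6 cells at 20 m = 120 m x 120 m
band_30m = resample_nearest(band_20m, src_res=20, dst_res=30)
print(band_30m.shape)
```

Matching every other band to one resampled reference, as the cell above does with `reproject_match()`, guarantees that all bands end up on an identical grid and can be stacked without further alignment.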


#### 4.2. Create Multi-band Raster with All Bands


In [8]:
# Stack all resampled bands into a single dataset
band_names=[x.split("/")[-1][:3] for x in band_tiles_fp] # Band names from file names, e.g. "B02"
s2=xr.concat(band_tiles_resampled, dim="band")
s2=s2.assign_coords(band=band_names) # Update band names
s2.attrs["bands"]=band_names
s2=s2.rio.reproject(dst_crs=CRS) # Reproject to project CRS - local UTM zone derived from GeoDataFrame.estimate_utm_crs()
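
The band labels are derived from file names in the order the files were listed, and `os.listdir()` returns entries in arbitrary order. A small sketch, using made-up paths, of deriving the labels after sorting the paths so that the band order of the stacked raster is reproducible:

```python
import os

# Made-up clipped band paths in an arbitrary listing order
band_tiles_fp = [
    "clipped/B8A_clipped.tif",
    "clipped/B02_clipped.tif",
    "clipped/B11_clipped.tif",
]

band_tiles_fp = sorted(band_tiles_fp)  # deterministic order across machines
band_names = [os.path.basename(fp)[:3] for fp in band_tiles_fp]  # first 3 characters, e.g. "B02"
print(band_names)
```

Whatever order is used, the labels stay correct as long as they are built from the same list that the rasters were read from, which is the case in this notebook.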

#### 4.3. Write Multi-band Raster to File

In [9]:
out_path=f"{SENT2_RESAMPLED_DIR}/s2_{sent2_df.date.iloc[0]}_{CELL_RESOLUTION}m.tif" # Configure output path of multi-band raster

s2.rio.to_raster(out_path) # Write to GeoTIFF
print("Exported multiband sentinel-2 data")

Exported multiband sentinel-2 data
