# Notes

This notebook is an example of a preprocessing pipeline for satellite images with Leaf Area Index (LAI) data.
It is not meant to be run as a script, but to serve as a reference for how to use the preprocessing functions.

**IMPORTANT:** This notebook assumes that a series of RAS and RHD files in the VISTA format has been stored and *unpacked*, containing:
- The Leaf Area Index (LAI) values as a tensor of images over time.
- The scene classification layer (SCL) of the above images.
- Information on the dates and coordinates of the images.

The VISTA format is not publicly available, but the functions in this notebook can be used as a reference for how to preprocess satellite images in general.

*Example scenario*: We have downloaded the RAS and RHD files containing the LAI values for a Sentinel-2 tile (~12k by 12k pixels) covering 2020, and we want to preprocess this data for field segmentation.

*Author*: Jens d'Hondt (TU Eindhoven)

```python
# General imports
import os
import datetime as dt
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import glob
import sys
from eolearn.core import EOPatch, FeatureType, OverwritePermission
```

# 0 Unpacking the data (optional)

This step is only needed if the VISTA zips containing the RAS and RHD files with LAI values have not yet been unpacked into a series of `.npy` files named by the date of the image (e.g. `2020_1_1.npy`, `2020_1_2.npy`, ...).
The unpacking can be done with the `unpack_vista` function from `pipeline.src.preprocessing.vista_preprocessing`, as shown below.

**Assumptions**:
- One has downloaded the VISTA zips and stored them in a folder `$DATADIR`.
- The zip is named `LAI.zip` and contains a series of RAS and RHD files named by the tile ID, the band, and the date, month or year of the data (e.g. `30TYQ_LAI_2022.RHD`, `30TYQ_LAI_2022.RAS`, ...).
- Each RAS and RHD file can contain multiple images, each with a different date, month or year.
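The naming convention above can be checked before unpacking. The sketch below uses hypothetical helpers (not part of the VISTA code) and assumes filenames follow `<tile_id>_<band>_<period>.<RAS|RHD>` with a purely numeric period, as in `30TYQ_LAI_2022.RHD`; it splits a name into its parts and groups the RAS and RHD files that belong together.

```python
import re
from pathlib import Path

# Hypothetical pattern inferred from the example names above:
# <tile_id>_<band>_<period>.<RAS|RHD>, with <period> a numeric year/month/date
VISTA_NAME = re.compile(
    r"^(?P<tile>[0-9A-Z]+)_(?P<band>[A-Z0-9]+)_(?P<period>\d+)\.(?P<ext>RAS|RHD)$"
)

def parse_vista_name(filename: str) -> dict:
    """Split a VISTA RAS/RHD filename into tile ID, band, period and extension."""
    match = VISTA_NAME.match(Path(filename).name)
    if match is None:
        raise ValueError(f"Unexpected VISTA filename: {filename}")
    return match.groupdict()

def pair_ras_rhd(filenames) -> dict:
    """Group the RAS and RHD files that share tile, band and period."""
    pairs = {}
    for name in filenames:
        info = parse_vista_name(name)
        key = (info["tile"], info["band"], info["period"])
        pairs.setdefault(key, {})[info["ext"]] = name
    return pairs
```

A name that does not match the assumed pattern raises a `ValueError` instead of being silently skipped.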

**Process**:
The function `unpack_vista` will do the following:
1. Unzip the zip files in `$DATADIR` for each specified band.
2. For each unzipped RAS and RHD file, extract the images and store them in a series of .npy files named by the date of the individual image.
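A minimal sketch of these two steps, using only the standard `zipfile` module and NumPy. Decoding the RAS/RHD content is specific to the non-public VISTA format and is not shown: the hypothetical `save_images_by_date` helper assumes the images have already been decoded into a `(T, H, W)` array with one date per image. The skip-if-already-unzipped check resembles the message printed by `unpack_vista` below, but the actual implementation may differ.

```python
import os
import zipfile

import numpy as np

def unzip_band(datadir: str, band: str) -> str:
    """Extract <datadir>/<band>.zip into <datadir>/<band>/ unless that folder exists."""
    target = os.path.join(datadir, band)
    if os.path.isdir(target):
        print(f"Band {band} already unzipped")
        return target
    with zipfile.ZipFile(os.path.join(datadir, f"{band}.zip")) as zf:
        zf.extractall(target)
    return target

def save_images_by_date(stack: np.ndarray, dates, outdir: str) -> list:
    """Save each (H, W) slice of a (T, H, W) stack as <year>_<month>_<day>.npy."""
    if stack.shape[0] != len(dates):
        raise ValueError("Need exactly one date per image")
    os.makedirs(outdir, exist_ok=True)
    paths = []
    for image, date in zip(stack, dates):
        # Unpadded names such as 2020_1_1.npy, following the example naming above
        path = os.path.join(outdir, f"{date.year}_{date.month}_{date.day}.npy")
        np.save(path, image)
        paths.append(path)
    return paths
```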

```python
from pipeline.src.preprocessing.vista_preprocessing import unpack_vista

bands = ['LAI']

# DATADIR = "FILL IN PATH HERE"
DATADIR = "/home/jens/ownCloud/Documents/3.Werk/0.TUe_Research/0.STELAR/VISTA/VISTA_code/data/segmentation_example"

unpack_vista(DATADIR, bands)
```

```
Band LAI already unzipped
```
# 1 LAI to CSV

In this step, we append the LAI values of a single tile image to CSV files holding the LAI time series of each pixel or each field (the fields are extracted in the segmentation pipeline).

The values for collections of pixels and/or fields are stored in CSV files with pixel/field IDs as columns and dates as rows, so that a new date can be appended as a new row.
The pixel values are partitioned by *patchlets*, which are subsets of the full image, to keep the individual CSV files small.
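With dates as rows, adding a new image amounts to writing one extra line at the end of each CSV file. A minimal pandas sketch of such an append; the helper `append_date_row` is hypothetical and not the actual `lai_to_csv_*_append` implementation, and it assumes the first CSV column holds the dates.

```python
import os

import pandas as pd

def append_date_row(csv_path: str, date: str, values: dict) -> None:
    """Append one date (a row) of values keyed by pixel/field ID (the columns)."""
    # Column names are read back from CSV as strings, so normalise the keys
    values = {str(key): value for key, value in values.items()}
    row = pd.DataFrame([values], index=pd.Index([date], name="date"))
    if os.path.exists(csv_path):
        # Align with the existing header; IDs missing from `values` become NaN,
        # IDs not in the header are dropped
        header = pd.read_csv(csv_path, nrows=0, index_col=0).columns
        row = row.reindex(columns=header)
        row.to_csv(csv_path, mode="a", header=False)
    else:
        row.to_csv(csv_path)
```

Only the header is read back on each append, so the cost of adding a date does not grow with the number of dates already stored.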

**Assumptions**:
- One has unpacked the image and stored it as a `.npy` file named by the date of the image (e.g. `2020_07_12.npy`).
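Since the date is only encoded in the filename, it has to be parsed from there. A small standard-library sketch; the helper name is hypothetical. Python's `strptime` accepts both zero-padded and unpadded numbers for `%m` and `%d`, so the same format handles `2020_07_12.npy` as well as the `2020_1_1.npy` style used in step 0.

```python
import datetime as dt
from pathlib import Path

def date_from_npy_path(npy_path: str) -> dt.date:
    """Parse the image date from a filename such as 2020_07_12.npy or 2020_1_1.npy."""
    # Path.stem drops the folder and the .npy suffix
    return dt.datetime.strptime(Path(npy_path).stem, "%Y_%m_%d").date()
```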

## 1.1 LAI to CSV: pixels

**Process**:
The function `lai_to_csv_px_append` will do the following:

1. Break up the image into a series of patchlets.
2. Add the LAI value of each pixel to the CSV file of its patchlet.
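The patchlet split can be sketched with NumPy and pandas as follows. The square grid, the local pixel coordinates and the `<x>_<y>` column naming are assumptions chosen to resemble the `0_0, 1_0, 2_0, ...` columns in the output below; the example run reports 81 patchlets, which would match `n_splits=9`, but the actual function may partition differently. Each resulting one-row DataFrame would then be appended to its own CSV file (e.g. `patchlet_0_0.csv`).

```python
import numpy as np
import pandas as pd

def split_into_patchlets(image: np.ndarray, n_splits: int) -> dict:
    """Cut an (H, W) image into an n_splits x n_splits grid of one-row DataFrames.

    Keys are (i, j) grid positions; each DataFrame has one column per pixel,
    named "<x>_<y>" with local coordinates inside the patchlet (hypothetical naming).
    """
    patchlets = {}
    # array_split tolerates image sizes that are not divisible by n_splits
    row_blocks = np.array_split(np.arange(image.shape[0]), n_splits)
    col_blocks = np.array_split(np.arange(image.shape[1]), n_splits)
    for i, rows in enumerate(row_blocks):
        for j, cols in enumerate(col_blocks):
            patch = image[np.ix_(rows, cols)]
            ys, xs = np.indices(patch.shape)
            # Row-major flattening: x varies fastest, giving 0_0, 1_0, 2_0, ...
            names = [f"{x}_{y}" for y, x in zip(ys.ravel(), xs.ravel())]
            patchlets[(i, j)] = pd.DataFrame([patch.ravel()], columns=names)
    return patchlets
```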

```python
from src.preprocessing.timeseries import lai_to_csv_px_append

DATADIR = "/home/jens/ownCloud/Documents/3.Werk/0.TUe_Research/0.STELAR/VISTA/VISTA_code/data/segmentation_example"
npy_path = os.path.join(DATADIR, 'LAI', '2020_07_12.npy')
outdir = os.path.join(DATADIR, 'lai_px_timeseries')

lai_to_csv_px_append(npy_path, outdir)
```

```
Saving timeseries for patchlet 81/81
```

```python
# Check the CSV file of the first patchlet
csv_path = os.path.join(outdir, "patchlet_0_0.csv")

df = pd.read_csv(csv_path, usecols=np.arange(10))

df.tail()
```

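| Unnamed: 0 | index | 0_0 | 1_0 | 2_0 | 3_0 | 4_0 | 5_0 | 6_0 | 7_0 | 8_0 |
|---|---|---|---|---|---|---|---|---|---|---|
| 246 | 2022-12-06 | -910 | -910 | -910 | -910 | -910 | -910 | -910 | -910 | -910 |
| 247 | 2022-12-11 | 3660 | 3657 | 3842 | 3685 | 3629 | 4241 | 4255 | 4122 | 2869 |
| 248 | 2022-12-16 | 2949 | 3017 | 3142 | 3211 | 3225 | 3882 | 3635 | 3353 | 2198 |
| 249 | 2022-12-21 | -910 | -910 | -910 | -910 | -910 | -910 | -910 | -910 | -910 |
| 250 | 2020-07-12 | 4436 | 5735 | 6061 | 6114 | 6046 | 5814 | 5953 | 5793 | 5987 |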
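Because each new date is appended at the end, rows are stored in insertion order rather than chronological order: in the output above, 2020-07-12 comes after 2022-12-21. A minimal sketch for loading such a file sorted by date; the helper is hypothetical, and since the name of the date column depends on how the CSV was written, it is passed explicitly (assumed here to be `index`, as in the pixel output above).

```python
import pandas as pd

def load_timeseries_sorted(csv_source, date_col: str) -> pd.DataFrame:
    """Read a wide time-series CSV, index it by date and sort rows chronologically."""
    df = pd.read_csv(csv_source)
    # Drop leftover "Unnamed: ..." index columns, unless that column holds the dates
    keep = [c for c in df.columns if c == date_col or not str(c).startswith("Unnamed")]
    df = df[keep].copy()
    df[date_col] = pd.to_datetime(df[date_col])
    return df.set_index(date_col).sort_index()
```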


## 1.2 LAI to CSV: field values

**Process**:
The function `lai_to_csv_field_append` will do the following:

1. Temporarily save the npy file as an EOPatch (needed for masking with the field shapes).
2. Temporarily save the npy file as a TIFF (needed for masking with the field shapes).
3. Load the TIFF.
4. For each field:
    1. Mask the LAI values with the field shape.
    2. Take the median of the masked LAI values for each date.
    3. Append the field's median LAI values to its column in the field time-series CSV file.
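The per-field aggregation of step 4 can be sketched without the EOPatch/TIFF round trip, under a simplifying assumption: the field shapes have already been rasterized into an integer label image aligned with the LAI image, with `-1` outside any field. The helper name and the optional `nodata` parameter are hypothetical; set `nodata` to whatever fill value your data uses.

```python
import numpy as np
import pandas as pd

def field_medians(lai: np.ndarray, field_ids: np.ndarray, nodata=None) -> pd.Series:
    """Median LAI per field for one date.

    lai:       (H, W) LAI image for a single date.
    field_ids: (H, W) integer raster with one field ID per pixel, -1 outside fields
               (hypothetical stand-in for masking with the field polygons).
    nodata:    optional value treated as missing.
    """
    values = lai.astype(float).ravel()
    if nodata is not None:
        values[values == nodata] = np.nan
    labels = field_ids.ravel()
    inside = labels >= 0
    # groupby().median() skips NaN; a field with no valid pixels gets NaN
    return pd.Series(values[inside]).groupby(labels[inside]).median()
```

The resulting Series (field ID to median) is one row of the field CSV; fields without valid pixels show up as missing values.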

```python
from src.preprocessing.timeseries import lai_to_csv_field_append

DATADIR = "/home/jens/ownCloud/Documents/3.Werk/0.TUe_Research/0.STELAR/VISTA/VISTA_code/data/segmentation_example"
npy_path = os.path.join(DATADIR, 'LAI', '2020_07_12.npy')
bbox_path = os.path.join(DATADIR, "LAI", 'bbox.pkl')
fields_path = os.path.join(DATADIR, 'fields.gpkg')

lai_to_csv_field_append(npy_path, bbox_path, fields_path, outdir=DATADIR)
```

```
Temporarily saving npy as eopatch
Processing eopatch 1/1
3. Masking tiff and saving timeseries

100%|██████████| 83198/83198 [02:56<00:00, 471.78it/s]

Done
Removing temporarily saved eopatch
```


```python
# Read the field CSV to check
path = os.path.join(DATADIR, "lai_field_timeseries.csv")

df = pd.read_csv(path, usecols=np.arange(0, 10))

df.tail()
```

| Unnamed: 0.1 | Unnamed: 0 | 0 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 261 | 2020-03-26 | 1922.0 | 518.0 | 350.5 | 240.0 | 159.5 | 3356.0 | 2272.0 | 1444.0 | 3186.0 |
| 262 | 2020-03-31 | 1636.0 | 660.0 | 460.5 | 426.0 | 243.5 | 3076.0 | 1760.0 | 1308.0 | 2892.0 |
| 263 | 2020-04-03 | 1534.0 | 360.0 | 308.0 | 182.0 | 136.5 | 3148.0 | 2092.0 | 1351.0 | 3064.0 |
| 264 | 2020-04-05 | 1477.0 | 232.0 | 246.5 | 187.0 | 126.5 | 2968.0 | 2027.0 | 1284.0 | 2916.0 |
| 265 | 2020-07-12 | 2968.0 | NaN | 2756.0 | 787.5 | 253.0 | NaN | 873.0 | 1709.0 | 3244.0 |
