# **Producing tweening datasets**
This notebook describes how to create a dataset for tweening (temporal gap filling). The tweening dataset consists of hourly stacked full-tile inputs and is <u>created from the data you have downloaded and preprocessed</u> using the [download](./4_download_data.ipynb) and [preprocess](./5_preprocess_data.ipynb) workflows.

Please ensure the data directories set below contain valid HLS and hourly ERA5 datasets.

Before running the cells, activate the environment created in the [Getting Started Notebook](./1_getting_started.ipynb).

In [1]:
# Import libraries
import os
import glob
import rasterio
from tqdm import tqdm  # tqdm(...) is called directly as a progress bar below
from utils.tweening import *


#### 1. Set the data parameters for the single-tile acquisition to build a tweening dataset for

In [35]:
processed_data_path = "../data/processed_data/"   # path to processed HLS bands and LST targets
city_iso = "johannesburg_zaf"
hls_date = "20211103"   # choose a date from the data you have preprocessed
tile_id = "T35JNM"
tweening_period = 7     # number of days to tween over

# path to save stacked hourly inputs
data_directory = "../data/processed_tweening_data/"
os.makedirs(data_directory, exist_ok=True)  # create the directory (and any parents) if missing

#### 2. Stack the HLS bands and replicate the stack for every hour of the selected tweening period

In [36]:
# Gather the HLS bands for the chosen date and tile_id

band_string = os.path.join(processed_data_path, "hls-bands", f"{city_iso}.{tile_id}.{hls_date}*.tif")
hls_bands = sorted(glob.glob(band_string))

# Gather the corresponding LST file

lst_string = os.path.join(processed_data_path, "target-lst", f"{city_iso}.{tile_id}.{hls_date}*.tif")
lst_band = glob.glob(lst_string)
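
If no file matches either pattern, `glob.glob` returns an empty list and the later `lst_band[0]` lookup fails with an `IndexError` that does not name the missing data. A minimal sketch of an early check is shown below. The helper `require_matches` and the demo file name are hypothetical and not part of `utils.tweening`; the demo runs against a temporary directory rather than the real data folders.

```python
import glob
import os
import tempfile


def require_matches(pattern, expected=None):
    """Return sorted glob matches; raise if none (or not `expected` many) are found.

    Hypothetical helper for illustration, not part of utils.tweening.
    """
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError(f"No files match pattern: {pattern}")
    if expected is not None and len(matches) != expected:
        raise ValueError(
            f"Expected {expected} file(s) for {pattern}, found {len(matches)}"
        )
    return matches


# Demonstration against a throwaway directory with one placeholder file
with tempfile.TemporaryDirectory() as tmp:
    demo_name = "johannesburg_zaf.T35JNM.20211103.demo.tif"  # made-up file name
    open(os.path.join(tmp, demo_name), "w").close()
    demo_pattern = os.path.join(tmp, "johannesburg_zaf.T35JNM.20211103*.tif")
    found = require_matches(demo_pattern, expected=1)
    found_names = [os.path.basename(path) for path in found]

print(found_names)
```

In the notebook, the same check could wrap `band_string` and `lst_string`, with `expected=1` for the LST pattern, since the next cell assumes a single LST file.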

In [37]:
# Process the LST band to obtain the target grid for the inputs

lst_array, grid_out, meta, crs = process_target_band(lst_band[0])  # [0] assumes there is exactly one LST file

In [38]:
# Extract arrays from the individual HLS bands

processed_bands = process_hls_bands(hls_bands)

In [39]:
# Stack these arrays

stacked = stacking(processed_bands, grid_out, crs)

In [40]:
# Write out the first stacked geotiff

file_name = city_iso + "." + tile_id + "." + hls_date + ".T000000.input_file.tif"
save_file = os.path.join(data_directory, file_name)
stacked.rio.to_raster(save_file, driver="COG", dtype="float32")

In [None]:
# Duplicate the stacked file for every hour in the tweening period

duplicate_hls_bands(save_file, hls_date, tweening_period, data_directory, city_iso, tile_id)
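
To make the scale of this step concrete, the sketch below enumerates one file name per hour of the tweening period. It is an illustration under stated assumptions, not the implementation of `duplicate_hls_bands`: it assumes the period starts at midnight on `hls_date` and runs forward, and that hourly files reuse the `<city>.<tile>.<YYYYMMDD>.T<HHMMSS>.input_file.tif` pattern of the first stacked file. The helper `hourly_input_names` is hypothetical.

```python
from datetime import datetime, timedelta


def hourly_input_names(city_iso, tile_id, hls_date, tweening_period):
    """List one input file name per hour of the tweening period.

    Assumption: the period starts at 00:00 on hls_date and runs forward;
    the real duplicate_hls_bands may choose a different window or naming.
    """
    start = datetime.strptime(hls_date, "%Y%m%d")
    names = []
    for hour in range(tweening_period * 24):
        stamp = start + timedelta(hours=hour)
        names.append(
            f"{city_iso}.{tile_id}.{stamp:%Y%m%d}.T{stamp:%H%M%S}.input_file.tif"
        )
    return names


names = hourly_input_names("johannesburg_zaf", "T35JNM", "20211103", 7)
print(len(names))   # 7 days * 24 hours = 168
print(names[0])     # johannesburg_zaf.T35JNM.20211103.T000000.input_file.tif
print(names[-1])    # johannesburg_zaf.T35JNM.20211109.T230000.input_file.tif
```

With `tweening_period = 7`, this gives 168 hourly inputs, which matches the total shown by the progress bar in step 3.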

#### 3. Add hourly ERA5 `2m_temperature` data to every input file

In [32]:
# Gather input files

files_to_update = glob.glob(os.path.join(data_directory, "*.input_file.tif"))

In [33]:
# Gather ERA5 files for city

era5_dir = "../data/downloaded_data/era5/"
era5_cities, all_era5_inputs = files_extractor(era5_dir)
era5_city = filter_city(city_name=city_iso, lst=all_era5_inputs)

# Reduce the ERA5 data by filtering for the year under consideration

era5_city_year = filter_year(year=hls_date[0:4], files=era5_city)

In [34]:
# Update each input file with the ERA5 bands

with tqdm(total=len(files_to_update)) as pbar:
    for file in files_to_update:
        add_era5_stack(file, city_iso, grid_out, crs, era5_city_year)
        # remove the HLS-only stacked input
        os.remove(file)
        pbar.update(1)

  0%|          | 0/168 [00:00<?, ?it/s]

#### 4. You can now run inference with the granite-geospatial-land-surface-temperature model on these hourly stacked tiles

Refer to the [Getting Started Notebook](./1_getting_started.ipynb) and the [Introduction to LST Tweening](./2_introduction_to_LST_Tweening.ipynb) notebook for details.