## Average Evaporation calculation
This simple notebook takes a Kriging-interpolated NetCDF file and uses it to calculate the evaporation averaged over both time and space. This approach has previously been shown to give acceptably accurate estimates of the actual evaporation in certain areas. For more information, please refer to our report.
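Written out, the quantity computed below is a plain, equally weighted mean over every retained timestep and grid cell. The notation is assumed here and does not come from the report: $E_{t,i}$ is the interpolated evaporation at timestep $t$ and grid cell $i$, $T'$ is the set of timesteps left after cleaning, and $N$ is the number of grid cells.

$$
\bar{E} = \frac{1}{|T'|\,N} \sum_{t \in T'} \sum_{i=1}^{N} E_{t,i}
= \frac{1}{N} \sum_{i=1}^{N} \left( \frac{1}{|T'|} \sum_{t \in T'} E_{t,i} \right)
$$

Because every retained timestep covers the same $N$ cells, the single overall mean equals averaging each cell in time first and then averaging over space. Grid cells are weighted equally (no area weighting).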

In [1]:
import numpy as np
import pandas as pd
import glob
import os.path
import sys
import pathlib
import platform 
import xarray as xr

In [11]:
# If errors occur here, please refer to the README file or to utils/file_imports.py.

cwd = pathlib.Path().resolve()
src = cwd.parent
data = src.parent.parent.parent
root = src.parent
OS_type = platform.system()
sys.path.append(str(src))
sys.path.append(str(root))
from utils.file_imports import *


data_paths = file_paths(root, TAHMO = True)
raw_files = data_paths[2]  # third entry: the interpolated_TAHMO directory, which holds the Kriging output

The first entry is pointing to /Users/matskerver/Documents/data_tana/TAHMO/raw_TAHMO, the second one to /Users/matskerver/Documents/data_tana/TAHMO/processed_TAHMO and the third one to /Users/matskerver/Documents/data_tana/TAHMO/interpolated_TAHMO. Animations will be put located in /Users/matskerver/Documents/data_tana/TAHMO/results


In [9]:
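# Function calculating the mean of the entire dataset after incorrect values are removed.
# The mean is taken over all dimensions (time and space) of the 'evap' variable of the dataset passed in.

def calculate_evaporation_average(ds):
    average_evap = ds['evap'].mean().values
    return average_evap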
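As a quick check on what `calculate_evaporation_average` computes, the sketch below builds a small synthetic dataset (made-up values; the spatial dimension names `lat` and `lon` are assumptions, since the Kriging file's dimension names are not shown here) and compares the single all-dimension mean with averaging over time then space, and over space then time. With no missing values and the same grid at every timestep, all three agree.

```python
import numpy as np
import xarray as xr

# Synthetic data: 4 timesteps on a 2 x 3 grid (values are made up; 'lat'/'lon' names are assumed)
rng = np.random.default_rng(0)
ds = xr.Dataset(
    {"evap": (("time", "lat", "lon"), rng.uniform(0.0, 1.0, size=(4, 2, 3)))},
    coords={"time": np.arange(4), "lat": [0.0, 0.5], "lon": [36.0, 36.5, 37.0]},
)

def calculate_evaporation_average(ds):
    # Mean over every dimension of 'evap' at once (same call as in the notebook)
    return ds["evap"].mean().values

overall = calculate_evaporation_average(ds)
# Average each cell over time, then average those cell means over space
time_then_space = ds["evap"].mean(dim="time").mean(dim=["lat", "lon"]).values
# Average each timestep over space, then average those means over time
space_then_time = ds["evap"].mean(dim=["lat", "lon"]).mean(dim="time").values

print(float(overall), float(time_then_space), float(space_then_time))
```

Note that xarray's `mean` skips NaN by default for float data, so leftover NaN values would not raise an error; they would silently give some cells fewer samples than others. Removing incomplete timesteps first keeps the three averages equivalent.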

In [10]:
# This cell cleans up the Kriging interpolation data, as some NaN values were still present in the TAHMO data.
# Because the evaporation is averaged over time and space, not every timestep is needed. This cell therefore
# removes every timestep containing NaN or negative values and then averages the timesteps that remain.


# Open the Kriging NetCDF file and define the necessary variables. Maximum NetCDF file size: 4 GB.
netcdf_file = 'kriging_results_evap.nc'
ds = xr.open_dataset(os.path.join(raw_files, netcdf_file))

variable = 'evap'
cleaning = True

while cleaning: 
    
    time_indices = range(ds.sizes['time'])  # ds.sizes replaces the deprecated ds.dims['time'] lookup
    yes = 0
    no = 0
    timestamps = []
    
    for time_index in time_indices: #We loop through all the timesteps and check them individually
        selected_data = ds.isel(time=time_index)
        data_array = selected_data[variable]

        # Both negative and NaN values indicate potential problems in the data and are detected here
        nan_check = data_array.isnull()
        negative_check = data_array < 0
        has_nan = nan_check.any()
        has_negative = negative_check.any()

        # If any anomaly is detected, the timestep is added to the list of timesteps to be removed
        if has_nan.values or has_negative.values:
            timestamps.append(time_index)
            yes += 1
        else: 
            no += 1
    
    if yes == 0:
        # When no more potential issues are detected in the file, we can proceed to calculate the average value
        cleaning = False
        print('No negative values detected, proceeding to average evaporation value')
        average_evap = calculate_evaporation_average(ds)
        print(f'Completed. The average evaporation data for this dataset is {average_evap}mm/day')
        break
        
        
    print(f'The used dataset contained {yes} timestamps with potential invalid data.')
    print(f'It contained {no} timestamps where no anomalies were detected.')
    print('Proceeding to remove these values...')

    # A mask is created selecting all the timesteps that contain neither NaN nor negative values. These are kept
    # and all other timesteps are removed from the dataset.
    mask = ~np.isin(range(ds.sizes['time']), timestamps)
    ds_trim = ds.isel(time=mask)
    ds = ds_trim

    print(f'Removed {yes} values correctly, checking new dataset...')

The used dataset contained 525 timestamps with potential invalid data.
It contained 1666 timestamps where no anomalies were detected.
Proceeding to remove these values...
Removed 525 values correctly, checking new dataset...
No negative values detected, proceeding to average evaporation value
Completed. The average evaporation data for this dataset is 0.41639290443606647mm/day
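The loop above checks timesteps one at a time and then re-checks the trimmed dataset. Since a single pass already flags every bad timestep, the same cleaning can be expressed as one vectorized step. This is a hedged sketch rather than the notebook's method: the helper name `drop_invalid_timesteps` is hypothetical, and it assumes every non-`time` dimension of `evap` is spatial.

```python
import numpy as np
import xarray as xr

def drop_invalid_timesteps(ds, variable="evap", time_dim="time"):
    """Hypothetical helper: drop every timestep where `variable` has a NaN or negative value."""
    da = ds[variable]
    # True wherever a single grid value is NaN or negative (NaN < 0 evaluates to False)
    invalid = da.isnull() | (da < 0)
    # A timestep is flagged if any cell in it is invalid (assumes all other dims are spatial)
    spatial_dims = [d for d in da.dims if d != time_dim]
    bad_step = invalid.any(dim=spatial_dims).values
    keep = np.flatnonzero(~bad_step)
    return ds.isel({time_dim: keep}), int(bad_step.sum())

# Small made-up example: timestep 1 has a NaN, timestep 3 a negative value
values = np.ones((5, 2, 2))
values[1, 0, 0] = np.nan
values[3, 1, 1] = -0.2
demo = xr.Dataset({"evap": (("time", "lat", "lon"), values)}, coords={"time": np.arange(5)})

clean, n_removed = drop_invalid_timesteps(demo)
print(n_removed, clean.sizes["time"], clean["time"].values)
# 2 3 [0 2 4]
```

On the real data this would be used as `ds_clean, n_removed = drop_invalid_timesteps(ds)` followed by `calculate_evaporation_average(ds_clean)`.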


In this test case, the value can be copied directly into the dataset-compiling script. In a final application, the result could instead be saved as a NetCDF or .csv file. Alternatively, the function can be moved into a separate script and imported into another notebook.
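Saving the result to a .csv file could look like the sketch below, which uses pandas (already imported in this notebook). The helper name `save_average`, the output file name and the column names are hypothetical; the value is the one printed above. For NetCDF output, xarray's `Dataset.to_netcdf` would be the equivalent, but it requires the netCDF4 or scipy backend to be installed.

```python
import os
import tempfile

import pandas as pd

def save_average(value, path, source="kriging_results_evap.nc"):
    """Hypothetical helper: write one averaged evaporation value (mm/day) to a one-row CSV file."""
    df = pd.DataFrame({"source_file": [source], "mean_evap_mm_per_day": [float(value)]})
    df.to_csv(path, index=False)
    return path

# Write to a temporary directory so the example leaves no files behind in the project
out_dir = tempfile.mkdtemp()
out_path = save_average(0.41639290443606647, os.path.join(out_dir, "average_evap.csv"))
print(pd.read_csv(out_path))
```

A downstream script can then read the value back with `pd.read_csv` instead of having it copied by hand.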