# Cross Validation for IDW Interpolation 
## Task 2A (continuous)

This notebook contains Python code that runs cross validation (CV) of Inverse Distance Weighting (IDW) interpolation in the arcpy environment for four water quality parameters:
- Dissolved oxygen (DO_mgl)
- Salinity (Sal_ppt)
- Turbidity (Turb_ntu)
- Temperature (T_c)

The analysis is run separately for each of five water bodies:
- Guana Tolomato Matanzas (GTM)
- Estero Bay (EB)
- Charlotte Harbor (CH)
- Biscayne Bay (BB)
- Big Bend Seagrasses (BBS)

**Tasks:**  

- Task 2A Calculate the RMSE and Mean Error (ME) for IDW results using both continuous and discrete data

- **Task 2B Calculate the RMSE and Mean Error (ME) for IDW results using continuous data.**


Time periods (seasons) covering one year before and one year after each storm event, used for the Task 2A tests.
<br>
<div style="text-align: left;">
    <img src="misc/TimePeriods.png" style="display: block; margin-left: 0; margin-right: auto; width: 600px;"/>
</div>

Summary of IDW and RK Accuracy Assessments.
<br>
<div style="text-align: left;">
    <img src="misc/Table3.png" style="display: block; margin-left: 0; margin-right: auto; width: 600px;"/>
</div>

**Contents:**
* [1. Data Preprocessing](#reg_preprocessing)
    * [1.1 Subsetting Data](#reg_subset)
    * [1.2 Preview Dataset](#reg_preview)
    * [1.3 Fill Unique ID](#reg_id)
* [2. Create Shapefile](#reg_create_shp)
* [3. Cross Validation for IDW](#reg_cv_idw)

In [1]:
import pandas as pd
import numpy as np
import arcpy
from arcpy.sa import *
import os
import math

import importlib
import sys
#path = r'M:\2024\WQ\Spring\IDW\git\misc'
path = r'E:\Projects\SEACAR_WQ_2024\git\misc'
sys.path.insert(0, path)
import IDW
importlib.reload(IDW)

# Define a dedicated scratch folder so parallel threads do not overwrite each other's files
arcpy.env.scratchWorkspace = r"E:\Projects\SEACAR_WQ_2024\scratch/IDW_con"

# 1. Data Preprocessing <a class="anchor" id="reg_preprocessing"></a>

In [2]:
gis_path = r'E:\Projects\SEACAR_WQ_2024/GIS_Data/'
dfCon = pd.read_csv(gis_path + 'OEAT_Continuous_WQ-2024-Jan-16.csv', low_memory=False)

## 1.1 Subsetting Data <a class="anchor" id="reg_subset"></a>

### Keep only samples taken between 09:00 and 17:00 each day

In [3]:
dfCon['SampleDate'] = pd.to_datetime(dfCon['SampleDate'], format='%b %d %Y %I:%M%p')

In [4]:
# Keep samples whose time of day falls between 09:00 and 17:00 (both inclusive)
start_time = '09:00'
end_time = '17:00'

dfConTime = dfCon[dfCon['SampleDate'].dt.time.between(pd.to_datetime(start_time).time(), pd.to_datetime(end_time).time())]
dfConTime.head()

Unnamed: 0,RowID,ProgramID,ParameterName,ParameterUnits,ProgramLocationID,ActivityType,SampleDate,Year,Month,RelativeDepth,ResultValue,Latitude_DD,Longitude_DD,ManagedAreaName,AreaID,SEACAR_QAQCFlagCode,WaterBody,WbodyAcronym,Season
0,88286023,474,Water Temperature,Degrees C,EB04,,2022-09-08 09:15:00,2022,9,bottom,31.3,26.449685,-81.871465,Estero Bay Aquatic Preserve,14,6Q,Estero Bay,EB,Summer
2,88291057,474,Water Temperature,Degrees C,EB01,,2022-08-30 13:00:00,2022,8,bottom,31.0,26.4349,-81.9114,Estero Bay Aquatic Preserve,14,6Q,Estero Bay,EB,Summer
3,88294267,474,Water Temperature,Degrees C,EB01,,2022-08-29 15:45:00,2022,8,bottom,31.9,26.4349,-81.9114,Estero Bay Aquatic Preserve,14,6Q,Estero Bay,EB,Summer
5,88302636,474,Water Temperature,Degrees C,EB01,,2022-09-09 15:15:00,2022,9,bottom,30.9,26.4349,-81.9114,Estero Bay Aquatic Preserve,14,6Q,Estero Bay,EB,Summer
6,88302639,474,Water Temperature,Degrees C,EB01,,2022-09-10 14:15:00,2022,9,bottom,30.5,26.4349,-81.9114,Estero Bay Aquatic Preserve,14,6Q,Estero Bay,EB,Summer
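
The same time-of-day filter can also be written with `DataFrame.between_time`, which works on a datetime index. The following is a minimal sketch on made-up timestamps, not the notebook's data; only the `SampleDate` and `ResultValue` column names are taken from the notebook.

```python
import pandas as pd

# Synthetic samples; the values are invented for illustration.
df = pd.DataFrame({
    "SampleDate": pd.to_datetime([
        "2022-09-08 08:45", "2022-09-08 09:00", "2022-09-08 13:30",
        "2022-09-08 17:00", "2022-09-08 17:15",
    ]),
    "ResultValue": [30.1, 30.4, 31.3, 31.0, 30.8],
})

# between_time filters on the index, so index by timestamp first.
# Both endpoints are included by default, like Series.between.
daytime = (
    df.set_index("SampleDate")
      .between_time("09:00", "17:00")
      .reset_index()
)

kept_times = daytime["SampleDate"].dt.strftime("%H:%M").tolist()
print(kept_times)
```

Samples at exactly 09:00 and 17:00 are kept, while 08:45 and 17:15 are dropped.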


### Average observations at the same location within each year and season (continuous data)

In [5]:
dfCon_Mean = dfConTime.groupby(['WaterBody','ParameterName','ParameterUnits', 'Year','Season','Latitude_DD','Longitude_DD','WbodyAcronym'])["ResultValue"].agg("mean").reset_index()
dfCon_Mean

Unnamed: 0,WaterBody,ParameterName,ParameterUnits,Year,Season,Latitude_DD,Longitude_DD,WbodyAcronym,ResultValue
0,Big Bend Seagrasses,Dissolved Oxygen,mg/L,2015,Fall,29.287817,-83.166083,BBS,5.849359
1,Big Bend Seagrasses,Dissolved Oxygen,mg/L,2015,Fall,29.813933,-83.628917,BBS,6.660736
2,Big Bend Seagrasses,Dissolved Oxygen,mg/L,2015,Spring,29.101817,-83.076467,BBS,7.408284
3,Big Bend Seagrasses,Dissolved Oxygen,mg/L,2015,Spring,29.287817,-83.166083,BBS,6.454961
4,Big Bend Seagrasses,Dissolved Oxygen,mg/L,2015,Spring,29.813933,-83.628917,BBS,7.590892
...,...,...,...,...,...,...,...,...,...
1668,Guana Tolomato Matanzas,Water Temperature,Degrees C,2022,Spring,30.050857,-81.367465,GTM,23.298184
1669,Guana Tolomato Matanzas,Water Temperature,Degrees C,2022,Winter,29.667071,-81.257403,GTM,19.384242
1670,Guana Tolomato Matanzas,Water Temperature,Degrees C,2022,Winter,29.737041,-81.245953,GTM,19.891515
1671,Guana Tolomato Matanzas,Water Temperature,Degrees C,2022,Winter,29.868851,-81.307428,GTM,20.147273
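
To make the aggregation concrete, here is a small sketch on invented observations. It uses a shortened key list for brevity; the column names follow the notebook, the numbers do not come from the dataset.

```python
import pandas as pd

# Four invented salinity readings at two stations.
obs = pd.DataFrame({
    "WbodyAcronym": ["EB", "EB", "EB", "EB"],
    "ParameterName": ["Salinity"] * 4,
    "Year": [2022, 2022, 2022, 2022],
    "Season": ["Summer", "Summer", "Summer", "Fall"],
    "Latitude_DD": [26.4349, 26.4349, 26.449685, 26.4349],
    "Longitude_DD": [-81.9114, -81.9114, -81.871465, -81.9114],
    "ResultValue": [30.0, 32.0, 28.0, 25.0],
})

keys = ["WbodyAcronym", "ParameterName", "Year", "Season",
        "Latitude_DD", "Longitude_DD"]

# One row per station, year and season, holding the mean reading.
station_means = obs.groupby(keys)["ResultValue"].mean().reset_index()
print(station_means)
```

The two summer readings at the first station collapse into a single point with value 31.0, so each station contributes at most one point per parameter, year and season to the interpolation. Note that `groupby` drops rows with a missing value in any key column by default.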


### Define abbreviations for water body and parameter names

In [6]:
area_shortnames = {
    'Guana Tolomato Matanzas': 'GTM',
    'Estero Bay': 'EB',
    'Charlotte Harbor': 'CH',
    'Biscayne Bay': 'BB',
    'Big Bend Seagrasses':'BBS'
}

param_shortnames = {
    'Salinity': 'Sal_ppt',
    'Total Nitrogen': 'TN_mgl',
    'Dissolved Oxygen': 'DO_mgl',
    'Turbidity':'Turb_ntu',
    'Secchi Depth':'Secc_m',
    'Water Temperature':'T_c'
}
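
The abbreviations appear in the shapefile names printed later in this notebook, such as `SHP_GTM_Sal_ppt_2015_Fall.shp`. As a rough illustration of how the two dictionaries could combine into such a name, consider the sketch below. `shp_name` is a hypothetical helper written for this example; the actual naming logic lives in the `IDW` module and may differ.

```python
area_shortnames = {
    'Guana Tolomato Matanzas': 'GTM',
    'Estero Bay': 'EB',
    'Charlotte Harbor': 'CH',
    'Biscayne Bay': 'BB',
    'Big Bend Seagrasses': 'BBS',
}

param_shortnames = {
    'Salinity': 'Sal_ppt',
    'Total Nitrogen': 'TN_mgl',
    'Dissolved Oxygen': 'DO_mgl',
    'Turbidity': 'Turb_ntu',
    'Secchi Depth': 'Secc_m',
    'Water Temperature': 'T_c',
}

def shp_name(waterbody, parameter, year, season):
    """Hypothetical helper: build SHP_<area>_<param>_<year>_<season>.shp."""
    area = area_shortnames[waterbody]      # raises KeyError for unknown names
    param = param_shortnames[parameter]
    return f"SHP_{area}_{param}_{year}_{season}.shp"

print(shp_name('Guana Tolomato Matanzas', 'Salinity', 2015, 'Fall'))
```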

### Define the barrier files

In [7]:
barrier_folder = os.path.join(gis_path, 'Barriers')
barrier_folder

barriers = []
for file in os.listdir(barrier_folder):
    if file.endswith(".shp"):
        barriers.append(os.path.join(barrier_folder, file))

for barrier in barriers:
    print(barrier)

E:\Projects\SEACAR_WQ_2024/GIS_Data/Barriers\BBS_Barriers.shp
E:\Projects\SEACAR_WQ_2024/GIS_Data/Barriers\BB_Barriers.shp
E:\Projects\SEACAR_WQ_2024/GIS_Data/Barriers\CH_Barriers.shp
E:\Projects\SEACAR_WQ_2024/GIS_Data/Barriers\EB_Barriers.shp
E:\Projects\SEACAR_WQ_2024/GIS_Data/Barriers\GTM_Barriers.shp
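
Each barrier file is named after a water body acronym, so a lookup by acronym is one plausible way to pair a water body with its barrier. The sketch below assumes the `<acronym>_Barriers.shp` pattern seen in the listing above; `barrier_for` is a hypothetical helper, not a function from the `IDW` module.

```python
import ntpath  # parses Windows-style paths on any operating system

def barrier_for(acronym, barrier_paths):
    """Return the path named '<acronym>_Barriers.shp', or None if absent."""
    target = f"{acronym}_Barriers.shp".lower()
    for path in barrier_paths:
        if ntpath.basename(path).lower() == target:
            return path
    return None

# Paths copied from the listing above.
paths = [
    r"E:\Projects\SEACAR_WQ_2024/GIS_Data/Barriers\BBS_Barriers.shp",
    r"E:\Projects\SEACAR_WQ_2024/GIS_Data/Barriers\BB_Barriers.shp",
    r"E:\Projects\SEACAR_WQ_2024/GIS_Data/Barriers\CH_Barriers.shp",
]

print(barrier_for("BB", paths))
```

Matching the whole file name rather than a prefix matters here: a check such as `startswith("BB")` would also match `BBS_Barriers.shp`.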


### Define waterbody boundary for spatial extent and masking

In [8]:
waterbody_extent = os.path.join(gis_path, 'OEAT_Waterbody_Boundaries', 'OEAT_Waterbody_Boundary.shp')

unique_waterbodies = []
with arcpy.da.SearchCursor(waterbody_extent, ['WaterbodyA']) as cursor:
    for row in cursor:
        unique_waterbodies.append(row[0])

print("Unique Waterbodies:", unique_waterbodies)

Unique Waterbodies: ['BBS', 'BB', 'CH', 'EB', 'GTM']


### Define the study periods and parameters of interest

In [9]:
seasons_con = pd.read_csv(gis_path + 'Seasons_con.csv', low_memory=False)

### Define output folders

In [10]:
shpCon_folder = gis_path + r"shapefiles_Con"
idwCon_folder = gis_path + r"idw_Con"

## 1.2 Preview Dataset <a class="anchor" id="reg_preview"></a>

In [11]:
dfCon_Mean

Unnamed: 0,WaterBody,ParameterName,ParameterUnits,Year,Season,Latitude_DD,Longitude_DD,WbodyAcronym,ResultValue
0,Big Bend Seagrasses,Dissolved Oxygen,mg/L,2015,Fall,29.287817,-83.166083,BBS,5.849359
1,Big Bend Seagrasses,Dissolved Oxygen,mg/L,2015,Fall,29.813933,-83.628917,BBS,6.660736
2,Big Bend Seagrasses,Dissolved Oxygen,mg/L,2015,Spring,29.101817,-83.076467,BBS,7.408284
3,Big Bend Seagrasses,Dissolved Oxygen,mg/L,2015,Spring,29.287817,-83.166083,BBS,6.454961
4,Big Bend Seagrasses,Dissolved Oxygen,mg/L,2015,Spring,29.813933,-83.628917,BBS,7.590892
...,...,...,...,...,...,...,...,...,...
1668,Guana Tolomato Matanzas,Water Temperature,Degrees C,2022,Spring,30.050857,-81.367465,GTM,23.298184
1669,Guana Tolomato Matanzas,Water Temperature,Degrees C,2022,Winter,29.667071,-81.257403,GTM,19.384242
1670,Guana Tolomato Matanzas,Water Temperature,Degrees C,2022,Winter,29.737041,-81.245953,GTM,19.891515
1671,Guana Tolomato Matanzas,Water Temperature,Degrees C,2022,Winter,29.868851,-81.307428,GTM,20.147273


## 1.3 Fill NaN RowID values with unique IDs (the IDW function requires a unique ID per row) <a class="anchor" id="reg_id"></a>

In [12]:
IDW.fill_nan_rowids(dfCon_Mean, 'RowID')
dfCon_Mean

Unnamed: 0,WaterBody,ParameterName,ParameterUnits,Year,Season,Latitude_DD,Longitude_DD,WbodyAcronym,ResultValue,RowID
0,Big Bend Seagrasses,Dissolved Oxygen,mg/L,2015,Fall,29.287817,-83.166083,BBS,5.849359,1
1,Big Bend Seagrasses,Dissolved Oxygen,mg/L,2015,Fall,29.813933,-83.628917,BBS,6.660736,2
2,Big Bend Seagrasses,Dissolved Oxygen,mg/L,2015,Spring,29.101817,-83.076467,BBS,7.408284,3
3,Big Bend Seagrasses,Dissolved Oxygen,mg/L,2015,Spring,29.287817,-83.166083,BBS,6.454961,4
4,Big Bend Seagrasses,Dissolved Oxygen,mg/L,2015,Spring,29.813933,-83.628917,BBS,7.590892,5
...,...,...,...,...,...,...,...,...,...,...
1668,Guana Tolomato Matanzas,Water Temperature,Degrees C,2022,Spring,30.050857,-81.367465,GTM,23.298184,1669
1669,Guana Tolomato Matanzas,Water Temperature,Degrees C,2022,Winter,29.667071,-81.257403,GTM,19.384242,1670
1670,Guana Tolomato Matanzas,Water Temperature,Degrees C,2022,Winter,29.737041,-81.245953,GTM,19.891515,1671
1671,Guana Tolomato Matanzas,Water Temperature,Degrees C,2022,Winter,29.868851,-81.307428,GTM,20.147273,1672
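
The body of `IDW.fill_nan_rowids` is not shown in this notebook. Judging only from the output above, where the new `RowID` column runs from 1 to 1672, it behaves roughly like the sketch below. `fill_missing_ids` is a hypothetical stand-in, and its handling of existing IDs is an assumption.

```python
import numpy as np
import pandas as pd

def fill_missing_ids(df, col="RowID"):
    """Hypothetical stand-in: give every row without an ID a new unique integer."""
    if col not in df.columns:
        df[col] = np.nan                      # column absent: all IDs are missing
    missing = df[col].isna()
    if missing.any():
        # Continue numbering after the largest existing ID (assumption).
        start = 0 if missing.all() else int(df[col].max())
        df.loc[missing, col] = np.arange(start + 1, start + 1 + int(missing.sum()))
    df[col] = df[col].astype("int64")
    return df

no_ids = fill_missing_ids(pd.DataFrame({"ResultValue": [5.8, 6.7, 7.4]}))
some_ids = fill_missing_ids(pd.DataFrame({"RowID": [10, np.nan, 7]}))
print(no_ids["RowID"].tolist(), some_ids["RowID"].tolist())
```

Numbering new IDs after the current maximum keeps them from colliding with IDs that already exist.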


# 2. Create Shapefile <a class="anchor" id="reg_create_shp"></a>

In [13]:
IDW.delete_all_files(shpCon_folder)

In [14]:
# Merge the periods of interest with the station coordinates (latitude and longitude) and mean values
seasons_con_coord = IDW.merge_with_lat_long(seasons_con, dfCon_Mean)
seasons_con_coord

Unnamed: 0,WaterBody,Year,Season,Parameter,Filename,NumDataPoints,RMSE,ME,Latitude_DD,Longitude_DD,RowID,ResultValue
0,Guana Tolomato Matanzas,2015,Fall,Salinity,,,,,29.667071,-81.257403,1304,17.855211
1,Guana Tolomato Matanzas,2015,Fall,Salinity,,,,,29.737041,-81.245953,1305,33.437808
2,Guana Tolomato Matanzas,2015,Fall,Salinity,,,,,29.868851,-81.307428,1306,32.107294
3,Guana Tolomato Matanzas,2015,Fall,Salinity,,,,,30.050857,-81.367465,1307,21.881559
4,Guana Tolomato Matanzas,2015,Winter,Salinity,,,,,29.667071,-81.257403,1316,21.540676
...,...,...,...,...,...,...,...,...,...,...,...,...
494,Big Bend Seagrasses,2021,Spring,Water Temperature,,,,,29.647203,-83.421196,105,23.379975
495,Big Bend Seagrasses,2021,Summer,Water Temperature,,,,,29.647203,-83.421196,106,28.912870
496,Big Bend Seagrasses,2021,Fall,Water Temperature,,,,,29.647203,-83.421196,104,23.467407
497,Big Bend Seagrasses,2021,Winter,Water Temperature,,,,,29.647203,-83.421196,107,16.404781
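
`IDW.merge_with_lat_long` is defined outside this notebook. Based on the columns of the result above, it appears to join the study periods to the station means on water body, year, season and parameter. A minimal sketch of such a join follows, on invented rows and assuming an inner join; the real function may use a different join type or key handling.

```python
import pandas as pd

# Invented study periods (one row per period and parameter).
periods = pd.DataFrame({
    "WaterBody": ["Estero Bay", "Estero Bay"],
    "Year": [2017, 2017],
    "Season": ["Fall", "Spring"],
    "Parameter": ["Salinity", "Salinity"],
})

# Invented station means (one row per station, period and parameter).
stations = pd.DataFrame({
    "WaterBody": ["Estero Bay"] * 3,
    "ParameterName": ["Salinity"] * 3,
    "Year": [2017, 2017, 2016],
    "Season": ["Fall", "Fall", "Fall"],
    "Latitude_DD": [26.4349, 26.449685, 26.4349],
    "Longitude_DD": [-81.9114, -81.871465, -81.9114],
    "RowID": [1, 2, 3],
    "ResultValue": [30.5, 28.2, 29.9],
})

# Align the parameter column name, then keep only stations in a listed period.
merged = periods.merge(
    stations.rename(columns={"ParameterName": "Parameter"}),
    on=["WaterBody", "Year", "Season", "Parameter"],
    how="inner",
)
print(merged)
```

With an inner join, a period that has no stations (Spring 2017 here) produces no rows, and a station outside every listed period (the 2016 row) is dropped.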


In [15]:
IDW.create_shp_season(seasons_con_coord, shpCon_folder)

Number of data rows for BBS, DO_mgl, 2020, Fall: 1
Shapefile for BBS: DO_mgl for year 2020 and season Fall has been saved as SHP_BBS_DO_mgl_2020_Fall.shp
Number of data rows for BBS, Sal_ppt, 2020, Fall: 1
Shapefile for BBS: Sal_ppt for year 2020 and season Fall has been saved as SHP_BBS_Sal_ppt_2020_Fall.shp
Number of data rows for BBS, Turb_ntu, 2020, Fall: 1
Shapefile for BBS: Turb_ntu for year 2020 and season Fall has been saved as SHP_BBS_Turb_ntu_2020_Fall.shp
Number of data rows for BBS, T_c, 2020, Fall: 1
Shapefile for BBS: T_c for year 2020 and season Fall has been saved as SHP_BBS_T_c_2020_Fall.shp
Number of data rows for BBS, DO_mgl, 2020, Summer: 1
Shapefile for BBS: DO_mgl for year 2020 and season Summer has been saved as SHP_BBS_DO_mgl_2020_Summer.shp
Number of data rows for BBS, Sal_ppt, 2020, Summer: 1
Shapefile for BBS: Sal_ppt for year 2020 and season Summer has been saved as SHP_BBS_Sal_ppt_2020_Summer.shp
Number of data rows for BBS, Turb_ntu, 2020, Summer: 1
Shapef

Shapefile for BB: DO_mgl for year 2022 and season Summer has been saved as SHP_BB_DO_mgl_2022_Summer.shp
Number of data rows for BB, Sal_ppt, 2022, Summer: 6
Shapefile for BB: Sal_ppt for year 2022 and season Summer has been saved as SHP_BB_Sal_ppt_2022_Summer.shp
Number of data rows for BB, Turb_ntu, 2022, Summer: 6
Shapefile for BB: Turb_ntu for year 2022 and season Summer has been saved as SHP_BB_Turb_ntu_2022_Summer.shp
Number of data rows for BB, T_c, 2022, Summer: 6
Shapefile for BB: T_c for year 2022 and season Summer has been saved as SHP_BB_T_c_2022_Summer.shp
Number of data rows for BB, DO_mgl, 2022, Winter: 6
Shapefile for BB: DO_mgl for year 2022 and season Winter has been saved as SHP_BB_DO_mgl_2022_Winter.shp
Number of data rows for BB, Sal_ppt, 2022, Winter: 6
Shapefile for BB: Sal_ppt for year 2022 and season Winter has been saved as SHP_BB_Sal_ppt_2022_Winter.shp
Number of data rows for BB, Turb_ntu, 2022, Winter: 6
Shapefile for BB: Turb_ntu for year 2022 and season W

Shapefile for EB: T_c for year 2016 and season Winter has been saved as SHP_EB_T_c_2016_Winter.shp
Number of data rows for EB, DO_mgl, 2017, Fall: 3
Shapefile for EB: DO_mgl for year 2017 and season Fall has been saved as SHP_EB_DO_mgl_2017_Fall.shp
Number of data rows for EB, Sal_ppt, 2017, Fall: 3
Shapefile for EB: Sal_ppt for year 2017 and season Fall has been saved as SHP_EB_Sal_ppt_2017_Fall.shp
Number of data rows for EB, Turb_ntu, 2017, Fall: 3
Shapefile for EB: Turb_ntu for year 2017 and season Fall has been saved as SHP_EB_Turb_ntu_2017_Fall.shp
Number of data rows for EB, T_c, 2017, Fall: 3
Shapefile for EB: T_c for year 2017 and season Fall has been saved as SHP_EB_T_c_2017_Fall.shp
Number of data rows for EB, DO_mgl, 2017, Spring: 3
Shapefile for EB: DO_mgl for year 2017 and season Spring has been saved as SHP_EB_DO_mgl_2017_Spring.shp
Number of data rows for EB, Sal_ppt, 2017, Spring: 3
Shapefile for EB: Sal_ppt for year 2017 and season Spring has been saved as SHP_EB_Sal_

# 3. Cross Validation for IDW <a class="anchor" id="reg_cv_idw"></a>

In [17]:
# Empty the IDW output folder
IDW.delete_all_files(idwCon_folder)

In [None]:
# Files with fewer than 3 data points are skipped (no IDW is calculated)
IDW.idw_interpolation(seasons_con, shpCon_folder, idwCon_folder, waterbody_extent, barrier_folder)

Processing file: SHP_GTM_Sal_ppt_2015_Fall.shp
Calculated RMSE: 11.465092111629327, ME: 2.617005565025001 for file: SHP_GTM_Sal_ppt_2015_Fall.shp
File SHP_GTM_Sal_ppt_2015_Fall.shp has completed 4 cross-validation iterations.
Processing file: SHP_GTM_Sal_ppt_2015_Winter.shp
Calculated RMSE: 7.5095149853850325, ME: 1.3438841223750018 for file: SHP_GTM_Sal_ppt_2015_Winter.shp
File SHP_GTM_Sal_ppt_2015_Winter.shp has completed 4 cross-validation iterations.
Processing file: SHP_GTM_Sal_ppt_2016_Spring.shp
Calculated RMSE: 9.094061000437073, ME: 1.153808037100001 for file: SHP_GTM_Sal_ppt_2016_Spring.shp
File SHP_GTM_Sal_ppt_2016_Spring.shp has completed 4 cross-validation iterations.
Processing file: SHP_GTM_Sal_ppt_2016_Summer.shp
Calculated RMSE: 8.826177273157874, ME: 1.0447028336500015 for file: SHP_GTM_Sal_ppt_2016_Summer.shp
File SHP_GTM_Sal_ppt_2016_Summer.shp has completed 4 cross-validation iterations.
Processing file: SHP_GTM_Sal_ppt_2016_Fall.shp
Calculated RMSE: 8.657996312221

Calculated RMSE: 0.6689741796726012, ME: 0.17515553363000014 for file: SHP_GTM_DO_mgl_2015_Fall.shp
File SHP_GTM_DO_mgl_2015_Fall.shp has completed 4 cross-validation iterations.
Processing file: SHP_GTM_DO_mgl_2015_Winter.shp
Calculated RMSE: 0.40068754276758173, ME: 0.05577908064999981 for file: SHP_GTM_DO_mgl_2015_Winter.shp
File SHP_GTM_DO_mgl_2015_Winter.shp has completed 4 cross-validation iterations.
Processing file: SHP_GTM_DO_mgl_2016_Spring.shp
Calculated RMSE: 0.8733855458661026, ME: 0.16818793239750018 for file: SHP_GTM_DO_mgl_2016_Spring.shp
File SHP_GTM_DO_mgl_2016_Spring.shp has completed 4 cross-validation iterations.
Processing file: SHP_GTM_DO_mgl_2016_Summer.shp
Calculated RMSE: 1.1126308680093586, ME: 0.22377996639499997 for file: SHP_GTM_DO_mgl_2016_Summer.shp
File SHP_GTM_DO_mgl_2016_Summer.shp has completed 4 cross-validation iterations.
Processing file: SHP_GTM_DO_mgl_2016_Fall.shp
Calculated RMSE: 0.8978191664124523, ME: 0.19235122455249987 for file: SHP_GTM_DO
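
The log reports one cross-validation iteration per data point (4 iterations for 4 stations), which suggests a leave-one-out scheme: each station is withheld in turn, predicted from the remaining stations, and the prediction errors are summarized as RMSE and ME. The sketch below illustrates that idea in plain NumPy. It is not the `IDW.idw_interpolation` implementation: it ignores barriers and the water body mask, uses straight-line distances, and assumes a power of 2 and an error sign of predicted minus observed.

```python
import numpy as np

def idw_predict(xy_known, z_known, xy_target, power=2.0):
    """Inverse-distance-weighted estimate at one target location."""
    d = np.linalg.norm(xy_known - xy_target, axis=1)
    if np.any(d == 0):                 # target coincides with a known point
        return float(z_known[d == 0].mean())
    w = 1.0 / d ** power
    return float(np.sum(w * z_known) / np.sum(w))

def loocv_idw(xy, z, power=2.0, min_points=3):
    """Leave-one-out CV; returns (rmse, me, n) or None if too few points."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    n = len(z)
    if n < min_points:                 # mirrors the skip rule in the cell above
        return None
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i       # withhold point i
        errors[i] = idw_predict(xy[keep], z[keep], xy[i], power) - z[i]
    rmse = float(np.sqrt(np.mean(errors ** 2)))
    me = float(np.mean(errors))        # assumed sign: predicted minus observed
    return rmse, me, n

# Three invented collinear stations.
result = loocv_idw([[0, 0], [1, 0], [2, 0]], [0.0, 1.0, 2.0])
print(result)
```

RMSE measures the typical size of the errors, while ME measures bias: a value near zero means over- and under-predictions cancel out. With only a handful of stations per file, both statistics rest on very few errors and should be read with caution.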