# Cross Validation for IDW Interpolation

This notebook contains Python code that runs cross validation (CV) of Inverse Distance Weighting (IDW) interpolation in an arcpy environment for six water quality parameters: Dissolved Oxygen (DO_mgl), Salinity (Sal_ppt), Turbidity (Turb_ntu), Water Temperature (T_c), Secchi Depth (Secc_m), and Total Nitrogen (TN_mgl).

The analysis is run separately for each of five managed areas: Guana Tolomato Matanzas (GTM), Estero Bay (EB), Charlotte Harbor (CH), Biscayne Bay (BB), and Big Bend Seagrasses (BBS).

Tasks:  

• Calculate the root mean square error (RMSE) and mean error (ME) of the IDW results using both continuous and discrete data points.
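
For reference, the two error metrics can be sketched in plain Python. This is a minimal illustration, not the code in `IDW.py`: the function name `cv_errors` is hypothetical, and the sign convention (predicted minus observed) is an assumption, since the notebook does not state which way the mean error is computed.

```python
import math

def cv_errors(observed, predicted):
    """Return (rmse, mean_error) for paired observed/predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    # Assumed sign convention: error = predicted - observed
    errors = [p - o for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    me = sum(errors) / len(errors)
    return rmse, me

# Toy values for illustration only (not from the SEACAR data)
rmse, me = cv_errors([6.0, 7.0, 8.0], [6.5, 6.5, 9.0])
print(f"RMSE: {rmse:.3f}, ME: {me:.3f}")
```

RMSE measures the typical size of the prediction error, while ME indicates bias: under the assumed convention, a positive ME means the interpolation tends to overestimate.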

* [1. Data Preprocess](#preprocessing)
* [2. Create Shapefile](#reg_create_shp)
* [3. Cross Validation for IDW](#reg_cv_idw)
    * [3.1 Guana Tolomato Matanzas](#reg_result_idw_gtm)
    * [3.2 Estero Bay](#reg_result_idw_eb)
    * [3.3 Charlotte Harbor](#reg_result_idw_ch)
    * [3.4 Biscayne Bay](#reg_result_idw_bbay)
    * [3.5 Big Bend Seagrasses](#reg_result_idw_bb)

In [28]:
import pandas as pd
import arcpy
from arcpy.sa import *
import os
import math

import importlib
import sys
path = r'../git/misc'
sys.path.insert(0, path)
import IDW
importlib.reload(IDW)

<module 'IDW' from 'E:\\Projects\\SEACAR_WQ_2024\\git\\../git/misc\\IDW.py'>

# 1. Data Preprocessing <a class="anchor" id="preprocessing"></a>

In [30]:
# Define water quality files
# gis_path = r'E:/Spring_2024/WQ/Spring/IDW/GIS_Data/'
gis_path = r'E:/Projects/SEACAR_WQ_2024/GIS_Data/'


dfDis_orig = pd.read_csv(gis_path + "OEAT_Discrete_WQ-2023-Dec-12.csv", low_memory=False)
dfCon_orig = pd.read_csv(gis_path + "OEAT_Continuous_WQ-2023-Dec-12.csv", low_memory=False)

# Combine discrete and continuous file
dfAll_orig = pd.concat([dfDis_orig, dfCon_orig], ignore_index=True)
dfAll_orig.to_csv(gis_path + "OEAT_All_WQ-2023-Dec-12.csv", index=False)

In [31]:
area_shortnames = {
    'Guana Tolomato Matanzas': 'GTM',
    'Estero Bay': 'EB',
    'Charlotte Harbor': 'CH',
    'Biscayne Bay': 'BB',
    'Big Bend Seagrasses':'BBS'
}

param_shortnames = {
    'Salinity': 'Sal_ppt',
    'Total Nitrogen': 'TN_mgl',
    'Dissolved Oxygen': 'DO_mgl',
    'Turbidity':'Turb_ntu',
    'Secchi Depth':'Secc_m',
    'Water Temperature':'T_c'
}
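
The short names above appear in the shapefile names printed later in this notebook (for example `SHP_GTM_DO_mgl_2023_Spring.shp`). The sketch below shows how such a name could be assembled from the two dictionaries. The helper `shp_name` is hypothetical, and the pattern is inferred from the printed output rather than taken from `IDW.py`.

```python
# Subsets of the notebook's dictionaries, repeated so the sketch is self-contained
area_shortnames = {'Guana Tolomato Matanzas': 'GTM', 'Charlotte Harbor': 'CH'}
param_shortnames = {'Dissolved Oxygen': 'DO_mgl', 'Secchi Depth': 'Secc_m'}

def shp_name(area, parameter, year, season):
    """Build a shapefile name following the pattern seen in the output (assumed)."""
    return f"SHP_{area_shortnames[area]}_{param_shortnames[parameter]}_{year}_{season}.shp"

name = shp_name('Guana Tolomato Matanzas', 'Dissolved Oxygen', 2023, 'Spring')
print(name)
```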

In [32]:
# Collect the barrier shapefiles
barrier_folder = os.path.join(gis_path, 'Barriers')

barriers = []
for file in os.listdir(barrier_folder):
    if file.endswith(".shp"):
        barriers.append(os.path.join(barrier_folder, file))

for barrier in barriers:
    print(barrier)

E:/Projects/SEACAR_WQ_2024/GIS_Data/Barriers\BBS_Barriers.shp
E:/Projects/SEACAR_WQ_2024/GIS_Data/Barriers\BB_Barriers.shp
E:/Projects/SEACAR_WQ_2024/GIS_Data/Barriers\CH_Barriers.shp
E:/Projects/SEACAR_WQ_2024/GIS_Data/Barriers\EB_Barriers.shp
E:/Projects/SEACAR_WQ_2024/GIS_Data/Barriers\GTM_Barriers.shp
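
Because each barrier file is named after an area acronym, the files could also be indexed by acronym. The sketch below is one way to do that with `pathlib`. The helper `barriers_by_area` is hypothetical, and the demo runs against empty placeholder files in a temporary folder, not the real barrier shapefiles.

```python
import tempfile
from pathlib import Path

def barriers_by_area(folder):
    """Map an area acronym (e.g. 'GTM') to its barrier shapefile path."""
    mapping = {}
    for shp in sorted(Path(folder).glob("*_Barriers.shp")):
        # 'GTM_Barriers' -> 'GTM'
        acronym = shp.stem[: -len("_Barriers")]
        mapping[acronym] = shp
    return mapping

# Demo with empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    for file_name in ("GTM_Barriers.shp", "EB_Barriers.shp", "notes.txt"):
        (Path(tmp) / file_name).touch()
    demo_keys = sorted(barriers_by_area(tmp))
    print(demo_keys)
```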


In [33]:
# Define waterbody boundary for spatial extent and mask
waterbody_extent = os.path.join(gis_path, 'OEAT_Waterbody_Boundaries', 'OEAT_Waterbody_Boundary.shp')

unique_waterbodies = []
with arcpy.da.SearchCursor(waterbody_extent, ['WaterbodyA']) as cursor:
    for row in cursor:
        unique_waterbodies.append(row[0])

print("Unique Waterbodies:", unique_waterbodies)

Unique Waterbodies: ['BBS', 'BB', 'CH', 'EB', 'GTM']


In [34]:
dfAll_orig

Unnamed: 0,RowID,ProgramID,ParameterName,ParameterUnits,ProgramLocationID,ActivityType,SampleDate,Year,Month,RelativeDepth,ResultValue,Latitude_DD,Longitude_DD,ManagedAreaName,AreaID,SEACAR_QAQCFlagCode,WaterBody,WbodyAcronym,Season
0,5062527,5014,Salinity,ppt,GTMMKNUT,Field,Aug 3 2017 11:24AM,2017,8,Surface,0.34,30.160736,-81.360278,Guana Tolomato Matanzas National Estuarine Res...,20,6Q,Guana Tolomato Matanzas,GTMNERR,Summer
1,5062528,5014,Salinity,ppt,GTMMKNUT,Field,Sep 20 2017 9:06AM,2017,9,Surface,0.32,30.160736,-81.360278,Guana Tolomato Matanzas National Estuarine Res...,20,6Q,Guana Tolomato Matanzas,GTMNERR,Fall
2,5062529,5014,Secchi Depth,m,GTMMKNUT,Field,Nov 2 2017 1:11PM,2017,11,Surface,1.20,30.160736,-81.360278,Guana Tolomato Matanzas National Estuarine Res...,20,6Q,Guana Tolomato Matanzas,GTMNERR,Fall
3,5062606,5014,Salinity,ppt,GTMMKNUT,Field,Oct 18 2017 12:52PM,2017,10,Surface,0.34,30.160736,-81.360278,Guana Tolomato Matanzas National Estuarine Res...,20,6Q,Guana Tolomato Matanzas,GTMNERR,Fall
4,5062607,5014,Salinity,ppt,GTMMKNUT,Field,Apr 24 2018 10:56AM,2018,4,Surface,0.33,30.160736,-81.360278,Guana Tolomato Matanzas National Estuarine Res...,20,6Q,Guana Tolomato Matanzas,GTMNERR,Spring
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4828313,137022711,5077,Water Temperature,Degrees C,BBCWA4,,Apr 2 2022 1:45PM,2022,4,bottom,26.80,25.758020,-80.167690,Biscayne Bay Aquatic Preserve,6,6Q,Biscayne Bay,BBAP,Spring
4828314,137022825,5077,Turbidity,NTU,BBJT71,,Jun 13 2020 8:30AM,2020,6,bottom,3.00,25.821730,-80.151250,Biscayne Bay Aquatic Preserve,6,6Q,Biscayne Bay,BBAP,Spring
4828315,137022872,5077,Salinity,ppt,BBLR03,,Apr 23 2022 9:30AM,2022,4,bottom,29.40,25.846841,-80.182861,Biscayne Bay Aquatic Preserve,6,6Q,Biscayne Bay,BBAP,Spring
4828316,137022885,5077,Turbidity,NTU,BBLR03,,May 7 2021 5:45PM,2021,5,bottom,2.00,25.846841,-80.182861,Biscayne Bay Aquatic Preserve,6,6Q,Biscayne Bay,BBAP,Spring


In [35]:
# Define output folders
shp_folder = gis_path + r"shapefiles"

idw_folder = gis_path + r"idw"

In [36]:
unique_years = dfAll_orig['Year'].unique()
unique_years.sort()
print("Unique years in 'Year' column:", unique_years)

Unique years in 'Year' column: [2015 2016 2017 2018 2019 2020 2021 2022 2023]


In [37]:
unique_params = dfAll_orig['ParameterName'].unique()
unique_params.sort()
print("Unique Parameter in 'ParameterName' column:", unique_params)

Unique Parameter in 'ParameterName' column: ['Dissolved Oxygen' 'Salinity' 'Secchi Depth' 'Total Nitrogen' 'Turbidity'
 'Water Temperature']


In [38]:
unique_seasons = dfAll_orig['Season'].unique()
unique_seasons.sort()
print("Unique Season in 'Season' column:", unique_seasons)

Unique Season in 'Season' column: ['Fall' 'Spring' 'Summer' 'Winter']


# 2. Create Shapefile <a class="anchor" id="reg_create_shp"></a>

In [39]:
# Set input parameters
waterbody_names = [
    'Guana Tolomato Matanzas',
    'Estero Bay',
    'Charlotte Harbor',
    'Biscayne Bay',
    'Big Bend Seagrasses'
]

#parameter_names = ['Dissolved Oxygen']
parameter_names = ['Dissolved Oxygen', 'Salinity', 'Secchi Depth', 'Total Nitrogen', 'Turbidity', 'Water Temperature']

# Years to process (only 2023 in this run)
years = [2023]
#seasons = ['Spring']
seasons = ['Fall', 'Spring', 'Summer', 'Winter']

In [40]:
# Empty the shapefile folder
IDW.delete_all_files(shp_folder)

In [41]:
# Create shapefiles by waterbody, parameter, year, and season
# Print number of data points in each shapefile
IDW.create_shp_season(dfAll_orig, waterbody_names, parameter_names, years, seasons, shp_folder)

No data found for area: GTM, parameter: Dissolved Oxygen, year: 2023, and season: Fall
No data found for area: GTM, parameter: Salinity, year: 2023, and season: Fall
No data found for area: GTM, parameter: Secchi Depth, year: 2023, and season: Fall
No data found for area: GTM, parameter: Total Nitrogen, year: 2023, and season: Fall
No data found for area: GTM, parameter: Turbidity, year: 2023, and season: Fall
No data found for area: GTM, parameter: Water Temperature, year: 2023, and season: Fall
Number of data rows for GTM, Dissolved Oxygen, 2023, Spring: 18
Shapefile for GTM: DO_mgl for year 2023 and season Spring has been saved as SHP_GTM_DO_mgl_2023_Spring.shp
Number of data rows for GTM, Salinity, 2023, Spring: 18
Shapefile for GTM: Sal_ppt for year 2023 and season Spring has been saved as SHP_GTM_Sal_ppt_2023_Spring.shp
Number of data rows for GTM, Secchi Depth, 2023, Spring: 11
Shapefile for GTM: Secc_m for year 2023 and season Spring has been saved as SHP_GTM_Secc_m_2023_Spring

Number of data rows for CH, Secchi Depth, 2023, Summer: 13
Shapefile for CH: Secc_m for year 2023 and season Summer has been saved as SHP_CH_Secc_m_2023_Summer.shp
Number of data rows for CH, Total Nitrogen, 2023, Summer: 33
Shapefile for CH: TN_mgl for year 2023 and season Summer has been saved as SHP_CH_TN_mgl_2023_Summer.shp
Number of data rows for CH, Turbidity, 2023, Summer: 36
Shapefile for CH: Turb_ntu for year 2023 and season Summer has been saved as SHP_CH_Turb_ntu_2023_Summer.shp
Number of data rows for CH, Water Temperature, 2023, Summer: 50
Shapefile for CH: T_c for year 2023 and season Summer has been saved as SHP_CH_T_c_2023_Summer.shp
Number of data rows for CH, Dissolved Oxygen, 2023, Winter: 70
Shapefile for CH: DO_mgl for year 2023 and season Winter has been saved as SHP_CH_DO_mgl_2023_Winter.shp
Number of data rows for CH, Salinity, 2023, Winter: 51
Shapefile for CH: Sal_ppt for year 2023 and season Winter has been saved as SHP_CH_Sal_ppt_2023_Winter.shp
Number of da
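
The log above reports one row count per combination of waterbody, parameter, year, and season. A minimal pandas sketch of that kind of subsetting follows. It uses column names from the `dfAll_orig` preview, but the helper `select_subset` is hypothetical, the rows are toy data, and the assumption that filtering uses the `WaterBody` column is not confirmed by the notebook. The actual logic lives in `IDW.create_shp_season`.

```python
import pandas as pd

# Toy rows for illustration only
df = pd.DataFrame({
    "WaterBody": ["Guana Tolomato Matanzas"] * 3 + ["Estero Bay"],
    "ParameterName": ["Salinity", "Salinity", "Turbidity", "Salinity"],
    "Year": [2023, 2022, 2023, 2023],
    "Season": ["Spring", "Spring", "Spring", "Spring"],
    "ResultValue": [30.1, 29.4, 3.0, 33.2],
})

def select_subset(df, waterbody, parameter, year, season):
    """Return the rows matching one waterbody/parameter/year/season combination."""
    mask = (
        (df["WaterBody"] == waterbody)
        & (df["ParameterName"] == parameter)
        & (df["Year"] == year)
        & (df["Season"] == season)
    )
    return df.loc[mask]

subset = select_subset(df, "Guana Tolomato Matanzas", "Salinity", 2023, "Spring")
print(f"Number of data rows: {len(subset)}")
```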

# 3. Cross Validation for IDW <a class="anchor" id="reg_cv_idw"></a>

## 3.1 Guana Tolomato Matanzas <a class="anchor" id="reg_result_idw_gtm"></a>

In [42]:
# Empty the IDW output folder
IDW.delete_all_files(idw_folder)

In [43]:
waterbody_gtm = ['Guana Tolomato Matanzas']
# IDW is skipped for any shapefile with fewer than 3 data points
IDW.idw_interpolation(shp_folder, idw_folder, parameter_names, waterbody_gtm, waterbody_extent, years, seasons, barrier_folder)

Processing file: SHP_GTM_DO_mgl_2023_Spring.shp
Calculated RMSE: 1.90977869490684, ME: 0.4097752117647057 for file: SHP_GTM_DO_mgl_2023_Spring.shp
File SHP_GTM_DO_mgl_2023_Spring.shp has completed 18 cross-validation iterations.
Processing file: SHP_GTM_DO_mgl_2023_Summer.shp
Calculated RMSE: 0.4842527157809326, ME: -0.04840464117647052 for file: SHP_GTM_DO_mgl_2023_Summer.shp
File SHP_GTM_DO_mgl_2023_Summer.shp has completed 17 cross-validation iterations.
Processing file: SHP_GTM_Sal_ppt_2023_Spring.shp
Calculated RMSE: 0.5594887899146133, ME: -0.04790094117647068 for file: SHP_GTM_Sal_ppt_2023_Spring.shp
File SHP_GTM_Sal_ppt_2023_Spring.shp has completed 18 cross-validation iterations.
Processing file: SHP_GTM_Sal_ppt_2023_Summer.shp
Calculated RMSE: 4.591693872902942, ME: 0.24784371752941137 for file: SHP_GTM_Sal_ppt_2023_Summer.shp
File SHP_GTM_Sal_ppt_2023_Summer.shp has completed 17 cross-validation iterations.
Processing file: SHP_GTM_Secc_m_2023_Spring.shp
Calculated RMSE: 0.2
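
The iteration counts in the log match the number of points in each shapefile, which is consistent with leave-one-out cross validation: each point is withheld in turn and predicted from the remaining points. The NumPy sketch below illustrates that idea under several assumptions. It uses plain Euclidean IDW with a power of 2 and no barriers, takes the error as predicted minus observed, and uses hypothetical function names. It does not reproduce `IDW.idw_interpolation`, which runs in arcpy with barrier files.

```python
import numpy as np

def idw_predict(xy_known, values, xy_target, power=2.0):
    """Inverse-distance-weighted estimate at a single target point."""
    dist = np.linalg.norm(xy_known - xy_target, axis=1)
    if np.any(dist == 0):
        # Target coincides with a sample: return that sample's value
        return float(values[dist == 0][0])
    weights = 1.0 / dist ** power
    return float(np.sum(weights * values) / np.sum(weights))

def loo_cv(xy, values, power=2.0):
    """Leave-one-out CV: predict each point from all the other points."""
    n = len(values)
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # drop the i-th point
        pred = idw_predict(xy[keep], values[keep], xy[i], power)
        errors[i] = pred - values[i]  # assumed sign: predicted - observed
    rmse = float(np.sqrt(np.mean(errors ** 2)))
    me = float(np.mean(errors))
    return rmse, me, n

# Toy coordinates and values for illustration only
xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
values = np.array([5.0, 6.0, 7.0, 8.0])
rmse, me, iterations = loo_cv(xy, values)
print(f"RMSE: {rmse:.4f}, ME: {me:.4f}, iterations: {iterations}")
```

With barriers, the straight-line distance would be replaced by a path that cannot cross land, so results from this sketch are not comparable to the values logged above.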

## 3.2 Estero Bay  <a class="anchor" id="reg_result_idw_eb"></a>

In [None]:
waterbody_eb = ['Estero Bay']
IDW.idw_interpolation(shp_folder, idw_folder, parameter_names, waterbody_eb, waterbody_extent, years, seasons, barrier_folder)

Processing file: SHP_EB_DO_mgl_2023_Spring.shp


## 3.3 Charlotte Harbor <a class="anchor" id="reg_result_idw_ch"></a>

In [None]:
waterbody_ch = ['Charlotte Harbor']
IDW.idw_interpolation(shp_folder, idw_folder, parameter_names, waterbody_ch, waterbody_extent, years, seasons, barrier_folder)

## 3.4 Biscayne Bay <a class="anchor" id="reg_result_idw_bbay"></a>

In [None]:
waterbody_bb = ['Biscayne Bay']
IDW.idw_interpolation(shp_folder, idw_folder, parameter_names, waterbody_bb, waterbody_extent, years, seasons, barrier_folder)

## 3.5 Big Bend Seagrasses <a class="anchor" id="reg_result_idw_bb"></a>

In [None]:
waterbody_bbs = ['Big Bend Seagrasses']
IDW.idw_interpolation(shp_folder, idw_folder, parameter_names, waterbody_bbs, waterbody_extent, years, seasons, barrier_folder)