# Cross Validation for IDW & RK Interpolation 
## Task 2 (continuous & discrete) for wet & dry seasons

This document includes Python codes that conduct cross validation (CV) for Inverse Distance Weighting (IDW) Interpolation and RK on water quality parameters, including 6 water quality parameters in arcpy environment:
- Dissolved oxygen (DO_mgl)
- Salinity (Sal_ppt)
- Turbidity (Turb_ntu)
- Temperature (T_c)
- Secchi (Secc_m)
- Total Nitrogen (TN_mgl) 

The analysis is conducted in the separate water bodies:
- Guana Tolomato Matanzas (GTM)
- Estero Bay (EB)
- Charlotte Harbor (CH)
- Biscayne Bay (BB)
- Big Bend Seagrasses (BBS)

**Tasks:**  

- **Task 2.1 Calculate the RMSE and Mean Error (ME) for IDW & RK results using both continuous and discrete data**

- Task 2.2 Calculate the RMSE and Mean Error (ME) for IDW results using continuous data.

<br>
<div style="text-align: left;">
    <img src="../misc/TwoSeasons.png" style="display: block; margin-left: 0; margin-right: auto; width: 850px;"/>
</div>


**Contents:**
* [1. Data Preprocess](#reg_preprocessing)
    * [1.1 Load csv files](#reg_subset)
    * [1.2 Subsetting data](#reg_preview)
    * [1.3 Define the wet and dry seasons](#reg_wetndry)
    * [1.4 Calculating average values](#reg_average)
    * [1.5 Convert coordinate system](#reg_coordinate)
* [2. Prepare for batch interpolation](#reg_batch)
    * [2.1 Preset abbreviation](#reg_preset)
    * [2.2 Define the barrier files](#reg_barrier)
    * [2.3 Define waterbody boundary](#reg_boundary)
    * [2.4 Load the table of study periods,  parameters, and seasons](#reg_study)
    * [2.5 Define output folders](#reg_output)
    * [2.6 Fill NaN RowID with unique ID](#reg_id)
* [3. Create Shapefiles](#reg_create_shp)
* [4. Cross Validation for IDW](#reg_cv_idw)
* [5. RK Interpolation](#reg_rk)

## 1. Loading packages

In [1]:
import pandas as pd
import numpy as np
import arcpy
from arcpy.sa import *
import os
import math
import csv

import importlib
import sys
# path = r'C:/Users/cong1/WQ/IDW/git/misc'
path = r'E:\Projects\SEACAR_WQ_2024\git\misc'

sys.path.insert(0, path)
import idw_rk
importlib.reload(idw_rk)

import pyproj

# define scratch folder to avoid overwritting from parallel threats
arcpy.env.scratchWorkspace = r"E:\Projects\SEACAR_WQ_2024\scratch/IDW_2s"

arcpy.env.overwriteOutput = True

## 1. Data Preprocessing <a class="anchor" id="reg_preprocessing"></a>
### 1.1 Load csv files

In [2]:
gis_path = r'E:/Projects/SEACAR_WQ_2024/GIS_Data/'

dfDis = pd.read_csv(gis_path + 'OEAT_Discrete_WQ-2024-May-06.csv', low_memory=False)
dfCon = pd.read_csv(gis_path + 'OEAT_Continuous_WQ-2024-Feb-21.csv', low_memory=False)

## 1.2 Subsetting Data <a class="anchor" id="reg_subset"></a>
### Selecting data from 8 am to 18 pm (daytime)

In [3]:
# Convert string to datetime
dfCon['SampleDate'] = pd.to_datetime(dfCon['SampleDate'], format='%Y-%m-%d %H:%M:%S.%f')
dfDis['SampleDate'] = pd.to_datetime(dfDis['SampleDate'], format='%Y-%m-%d %H:%M:%S.%f')

# Include date from 8:00 am to 18:00 pm
start_time = '08:00'
end_time = '18:00'

dfCon = dfCon[dfCon['SampleDate'].dt.time.between(pd.to_datetime(start_time).time(), pd.to_datetime(end_time).time())]

dfAll = pd.concat([dfDis, dfCon], ignore_index=True)

dfAll.head()

Unnamed: 0,RowID,ProgramID,ParameterName,ParameterUnits,ProgramLocationID,ActivityType,SampleDate,Year,Month,RelativeDepth,ResultValue,Latitude_DD,Longitude_DD,ManagedAreaName,AreaID,SEACAR_QAQCFlagCode,WaterBody,WbodyAcronym,Season
0,1,69,Secchi Depth,m,CKM2017100405,Field,2017-10-06,2017,10,,0.3,29.3221,-83.129866,Big Bend Seagrasses Aquatic Preserve,5,6Q,Big Bend Seagrasses,BBS,Fall
1,2,69,Secchi Depth,m,CKM2017080401,Field,2017-08-08,2017,8,Surface,0.6,29.145966,-83.07225,Big Bend Seagrasses Aquatic Preserve,5,6Q/9Q,Big Bend Seagrasses,BBS,Summer
2,3,69,Secchi Depth,m,CKM2017060703,Field,2017-06-19,2017,6,Surface,0.4,29.294516,-83.155316,Big Bend Seagrasses Aquatic Preserve,5,9Q/6Q,Big Bend Seagrasses,BBS,Summer
3,4,69,Secchi Depth,m,CKM2017060202,Field,2017-06-06,2017,6,Surface,0.4,29.1404,-83.01705,Big Bend Seagrasses Aquatic Preserve,5,6Q/9Q,Big Bend Seagrasses,BBS,Spring
4,5,69,Secchi Depth,m,CKM2017110804,Field,2017-11-14,2017,11,Surface,0.4,29.269566,-83.107283,Big Bend Seagrasses Aquatic Preserve,5,9Q/6Q,Big Bend Seagrasses,BBS,Fall


### 1.3 Define the wet and dry seasons<a class="anchor" id="reg_wetndry"></a>

In [4]:
# Load the table of wet and dry seasons definitions
seasons2 = pd.read_csv(gis_path + 'season_def/2 seasons.csv', low_memory=False)
seasons2

Unnamed: 0,WaterBody,Start Year,Start Month,Start Day,End Year,End Month,End Day,Start Date,End Date,Seasons
0,Charlotte Harbor,2017,5,1,2017,10,31,5/1/2017,10/31/2017,Wet
1,Charlotte Harbor,2017,11,1,2018,4,30,11/1/2017,4/30/2018,Dry
2,Big Bend Seagrasses,2021,5,1,2021,10,31,5/1/2021,10/31/2021,Wet
3,Big Bend Seagrasses,2021,11,1,2022,4,30,11/1/2021,4/30/2022,Dry
4,Estero Bay,2017,5,1,2017,10,31,5/1/2017,10/31/2017,Wet
5,Estero Bay,2017,11,1,2018,4,30,11/1/2017,4/30/2018,Dry
6,Guana Tolomato Matanzas,2016,5,1,2016,10,31,5/1/2016,10/31/2016,Wet
7,Guana Tolomato Matanzas,2016,11,1,2017,4,30,11/1/2016,4/30/2017,Dry
8,Biscayne Bay,2022,5,1,2022,10,31,5/1/2022,10/31/2022,Wet
9,Biscayne Bay,2022,11,1,2023,4,30,11/1/2022,4/30/2023,Dry


In [5]:
# Function to filters the year range
filtered_dfAllTime = idw_rk.filter_data(dfAll, seasons2)
filtered_dfAllTime.head()

Unnamed: 0,RowID,ProgramID,ParameterName,ParameterUnits,ProgramLocationID,ActivityType,SampleDate,Year,Month,RelativeDepth,ResultValue,Latitude_DD,Longitude_DD,ManagedAreaName,AreaID,SEACAR_QAQCFlagCode,WaterBody,WbodyAcronym,Season
0,458,69,Secchi Depth,m,CHM2017091705,Field,2017-09-22,2017,9,Surface,1.0,26.59845,-82.129733,Pine Island Sound Aquatic Preserve,34,9Q/6Q,Charlotte Harbor,CH,Fall
1,458,69,Secchi Depth,m,CHM2017091705,Field,2017-09-22,2017,9,Surface,1.0,26.59845,-82.129733,Pine Island Sound Aquatic Preserve,34,9Q/6Q,Charlotte Harbor,CH,Fall
2,599,69,Secchi Depth,m,CHM2017111102,Field,2017-11-01,2017,11,Surface,0.3,26.840333,-82.269916,Gasparilla Sound-Charlotte Harbor Aquatic Pres...,18,6Q/9Q,Charlotte Harbor,CH,Fall
3,599,69,Secchi Depth,m,CHM2017111102,Field,2017-11-01,2017,11,Surface,0.3,26.840333,-82.269916,Gasparilla Sound-Charlotte Harbor Aquatic Pres...,18,6Q/9Q,Charlotte Harbor,CH,Fall
4,638,69,Secchi Depth,m,CHM2017070702,Field,2017-07-12,2017,7,Surface,0.4,26.79385,-82.153633,Cape Haze Aquatic Preserve,9,6Q/9Q,Charlotte Harbor,CH,Summer


In [6]:
# Check the filtered results
unique_years = filtered_dfAllTime.groupby('WaterBody')['Year'].unique().reset_index()
unique_years.columns = ['WaterBody', 'Unique Years']
unique_years

Unnamed: 0,WaterBody,Unique Years
0,Big Bend Seagrasses,"[2021, 2022]"
1,Biscayne Bay,"[2022, 2023]"
2,Charlotte Harbor,"[2017, 2018]"
3,Estero Bay,"[2018, 2017]"
4,Guana Tolomato Matanzas,"[2016, 2017]"


In [7]:
# Assign wet and dry season
updated_dfAllTime = idw_rk.assign_seasons(filtered_dfAllTime, seasons2)
updated_dfAllTime.head()

Unnamed: 0,RowID,ProgramID,ParameterName,ParameterUnits,ProgramLocationID,ActivityType,SampleDate,Year,Month,RelativeDepth,...,Longitude_DD,ManagedAreaName,AreaID,SEACAR_QAQCFlagCode,WaterBody,WbodyAcronym,Season,Start Date,End Date,Seasons
0,458,69,Secchi Depth,m,CHM2017091705,Field,2017-09-22,2017,9,Surface,...,-82.129733,Pine Island Sound Aquatic Preserve,34,9Q/6Q,Charlotte Harbor,CH,Fall,2017-05-01,2017-10-31,Wet
2,458,69,Secchi Depth,m,CHM2017091705,Field,2017-09-22,2017,9,Surface,...,-82.129733,Pine Island Sound Aquatic Preserve,34,9Q/6Q,Charlotte Harbor,CH,Fall,2017-05-01,2017-10-31,Wet
5,599,69,Secchi Depth,m,CHM2017111102,Field,2017-11-01,2017,11,Surface,...,-82.269916,Gasparilla Sound-Charlotte Harbor Aquatic Pres...,18,6Q/9Q,Charlotte Harbor,CH,Fall,2017-11-01,2018-04-30,Dry
7,599,69,Secchi Depth,m,CHM2017111102,Field,2017-11-01,2017,11,Surface,...,-82.269916,Gasparilla Sound-Charlotte Harbor Aquatic Pres...,18,6Q/9Q,Charlotte Harbor,CH,Fall,2017-11-01,2018-04-30,Dry
8,638,69,Secchi Depth,m,CHM2017070702,Field,2017-07-12,2017,7,Surface,...,-82.153633,Cape Haze Aquatic Preserve,9,6Q/9Q,Charlotte Harbor,CH,Summer,2017-05-01,2017-10-31,Wet


### 1.4 Calculating average values at unique observation points<a class="anchor" id="reg_average"></a>

In [8]:
dfAll_Mean = updated_dfAllTime.groupby(['WaterBody','ParameterName','ParameterUnits','Seasons','Latitude_DD','Longitude_DD','WbodyAcronym'])["ResultValue"].agg("mean").reset_index()
dfAll = dfAll_Mean

### 1.5 Convert coordinate system to EPSG: 3086<a class="anchor" id="reg_coordinate"></a>

In [9]:
# Define the EPSG codes for source (EPSG:4326) and target (EPSG:3086) coordinate systems
source_epsg = 'EPSG:4326'
target_epsg = 'EPSG:3086'

# Create a PyProj Transformer for the conversion
transformer = pyproj.Transformer.from_crs(source_epsg, target_epsg, always_xy=True)

# Define a function to apply the transformation to each row of the DataFrame
def transform_coordinates(row):
    x, y = transformer.transform(row['Longitude_DD'], row['Latitude_DD'])
    return pd.Series({'x': x, 'y': y})

# Apply the transformation function to the DataFrame and create new columns for the converted coordinates
dfAll[['x', 'y']] = dfAll.apply(transform_coordinates, axis=1)

#### Save aggregated data to csv file

In [10]:
dfAll.to_csv(gis_path + 'OEAT_2Seasons_All_WQ-2024-May-02.csv', index=False)

## 2. Prepare for batch interpolation<a class="anchor" id="reg_batch"></a>
### 2.1 Preset abbreviation for waterbody and parameter name<a class="anchor" id="reg_preset"></a>

In [11]:
area_shortnames = {
    'Guana Tolomato Matanzas': 'GTM',
    'Estero Bay': 'EB',
    'Charlotte Harbor': 'CH',
    'Biscayne Bay': 'BB',
    'Big Bend Seagrasses':'BBS'
}

param_shortnames = {
    'Salinity': 'Sal_ppt',
    'Total Nitrogen': 'TN_mgl',
    'Dissolved Oxygen': 'DO_mgl',
    'Turbidity':'Turb_ntu',
    'Secchi Depth':'Secc_m',
    'Water Temperature':'T_c'
}

covariates_dict = {
    "GTM":"LDI",
    "EB":"bathymetry+LDI+popden",
    "CH":"bathymetry+LDI+popden+water_flow_wet",
    "BB":"bathymetry+LDI+popden",
    "BBS":"bathymetry+LDI"
}

### 2.2 Load the barrier files<a class="anchor" id="reg_barrier"></a>

In [12]:
barrier_folder = os.path.join(gis_path, 'Barriers')
barrier_folder

barriers = []
for file in os.listdir(barrier_folder):
    if file.endswith(".shp"):
        barriers.append(os.path.join(barrier_folder, file))

for barrier in barriers:
    print(barrier)

E:/Projects/SEACAR_WQ_2024/GIS_Data/Barriers\BBS_Barriers.shp
E:/Projects/SEACAR_WQ_2024/GIS_Data/Barriers\BB_Barriers.shp
E:/Projects/SEACAR_WQ_2024/GIS_Data/Barriers\CH_Barriers.shp
E:/Projects/SEACAR_WQ_2024/GIS_Data/Barriers\EB_Barriers.shp
E:/Projects/SEACAR_WQ_2024/GIS_Data/Barriers\GTM_Barriers.shp


### 2.3 Define waterbody boundary for spatial extent and masking<a class="anchor" id="reg_boundary"></a>

In [13]:
waterbody_extent = os.path.join(gis_path, 'OEAT_Waterbody_Boundaries', 'OEAT_Waterbody_Boundary.shp')

unique_waterbodies = []
with arcpy.da.SearchCursor(waterbody_extent, ['WaterbodyA']) as cursor:
    for row in cursor:
        unique_waterbodies.append(row[0])

print("Unique Waterbodies:", unique_waterbodies)

Unique Waterbodies: ['BBS', 'BB', 'CH', 'EB', 'GTM']


### 2.4 Load the table of study periods,  parameters, and seasons<a class="anchor" id="reg_study"></a>

In [14]:
seasons_all = pd.read_csv(gis_path + 'season_def/TwoSeasons_all.csv', low_memory=False)
seasons_all.head(10)

Unnamed: 0,WaterBody,Start Year,End Year,Seasons,Parameter,Filename,NumDataPoints,RMSE,ME
0,Charlotte Harbor,2017,2017,Wet,Total Nitrogen,,,,
1,Charlotte Harbor,2017,2018,Dry,Total Nitrogen,,,,
2,Big Bend Seagrasses,2021,2021,Wet,Total Nitrogen,,,,
3,Big Bend Seagrasses,2021,2022,Dry,Total Nitrogen,,,,
4,Estero Bay,2017,2017,Wet,Total Nitrogen,,,,
5,Estero Bay,2017,2018,Dry,Total Nitrogen,,,,
6,Guana Tolomato Matanzas,2016,2016,Wet,Total Nitrogen,,,,
7,Guana Tolomato Matanzas,2016,2017,Dry,Total Nitrogen,,,,
8,Biscayne Bay,2022,2022,Wet,Total Nitrogen,,,,
9,Biscayne Bay,2022,2023,Dry,Total Nitrogen,,,,


### 2.5 Define output folders<a class="anchor" id="reg_output"></a>

In [15]:
# shpAll_folder = gis_path + r"shapefiles_2seasons" 
# idwAll_folder = gis_path + r"raster_idw_2seasons"

shpAll_folder = gis_path + r"shapefiles/TwoSeasons_All" 
idwAll_folder = gis_path + r"raster_output/TwoSeasons_IDW_All"

# Preview dataset
dfAll

Unnamed: 0,WaterBody,ParameterName,ParameterUnits,Seasons,Latitude_DD,Longitude_DD,WbodyAcronym,ResultValue,x,y
0,Big Bend Seagrasses,Dissolved Oxygen,mg/L,Dry,29.008300,-82.825250,BBS,8.245000,514236.629541,556316.261436
1,Big Bend Seagrasses,Dissolved Oxygen,mg/L,Dry,29.080840,-83.053090,BBS,6.630000,492019.336544,564180.798088
2,Big Bend Seagrasses,Dissolved Oxygen,mg/L,Dry,29.085830,-83.068570,BBS,6.480000,490510.856107,564723.446382
3,Big Bend Seagrasses,Dissolved Oxygen,mg/L,Dry,29.095770,-83.028650,BBS,6.930000,494381.272368,565857.139054
4,Big Bend Seagrasses,Dissolved Oxygen,mg/L,Dry,29.095800,-83.069580,BBS,6.720000,490404.358820,565829.803709
...,...,...,...,...,...,...,...,...,...,...
7111,Guana Tolomato Matanzas,Water Temperature,Degrees C,Wet,30.053833,-81.375833,GTM,3.100000,652697.269107,674545.694392
7112,Guana Tolomato Matanzas,Water Temperature,Degrees C,Wet,30.061167,-81.356167,GTM,3.110000,654573.328167,675399.981470
7113,Guana Tolomato Matanzas,Water Temperature,Degrees C,Wet,30.062167,-81.370333,GTM,3.100000,653207.093339,675481.792443
7114,Guana Tolomato Matanzas,Water Temperature,Degrees C,Wet,30.062183,-81.369175,GTM,27.594455,653318.539140,675486.024073


### 2.6 Fill NaN RowID with unique ID (IDW function needs unique ID) <a class="anchor" id="reg_id"></a>

In [16]:
idw_rk.fill_nan_rowids(dfAll, 'RowID')

# Keep RowID as integer
dfAll['RowID'] = dfAll['RowID'].astype(int)
dfAll

Unnamed: 0,WaterBody,ParameterName,ParameterUnits,Seasons,Latitude_DD,Longitude_DD,WbodyAcronym,ResultValue,x,y,RowID
0,Big Bend Seagrasses,Dissolved Oxygen,mg/L,Dry,29.008300,-82.825250,BBS,8.245000,514236.629541,556316.261436,1
1,Big Bend Seagrasses,Dissolved Oxygen,mg/L,Dry,29.080840,-83.053090,BBS,6.630000,492019.336544,564180.798088,2
2,Big Bend Seagrasses,Dissolved Oxygen,mg/L,Dry,29.085830,-83.068570,BBS,6.480000,490510.856107,564723.446382,3
3,Big Bend Seagrasses,Dissolved Oxygen,mg/L,Dry,29.095770,-83.028650,BBS,6.930000,494381.272368,565857.139054,4
4,Big Bend Seagrasses,Dissolved Oxygen,mg/L,Dry,29.095800,-83.069580,BBS,6.720000,490404.358820,565829.803709,5
...,...,...,...,...,...,...,...,...,...,...,...
7111,Guana Tolomato Matanzas,Water Temperature,Degrees C,Wet,30.053833,-81.375833,GTM,3.100000,652697.269107,674545.694392,7112
7112,Guana Tolomato Matanzas,Water Temperature,Degrees C,Wet,30.061167,-81.356167,GTM,3.110000,654573.328167,675399.981470,7113
7113,Guana Tolomato Matanzas,Water Temperature,Degrees C,Wet,30.062167,-81.370333,GTM,3.100000,653207.093339,675481.792443,7114
7114,Guana Tolomato Matanzas,Water Temperature,Degrees C,Wet,30.062183,-81.369175,GTM,27.594455,653318.539140,675486.024073,7115


## 3. Create Shapefiles <a class="anchor" id="reg_create_shp"></a>

In [17]:
# Merge interested with latitude and longitude columns
seasons_all_coord = idw_rk.merge_with_lat_long_new(seasons_all, dfAll, "Seasons")
seasons_all_coord

Unnamed: 0,WaterBody,Start Year,End Year,Seasons,Parameter,Filename,NumDataPoints,RMSE,ME,x,y,RowID,ResultValue
0,Charlotte Harbor,2017,2017,Wet,Total Nitrogen,,,,,591484.839075,272735.047067,4666,0.870000
1,Charlotte Harbor,2017,2017,Wet,Total Nitrogen,,,,,589588.413455,274006.691366,4667,0.850000
2,Charlotte Harbor,2017,2017,Wet,Total Nitrogen,,,,,591623.331894,274259.212418,4668,0.940000
3,Charlotte Harbor,2017,2017,Wet,Total Nitrogen,,,,,589465.802362,275500.164081,4669,0.780000
4,Charlotte Harbor,2017,2017,Wet,Total Nitrogen,,,,,588236.919040,276249.476798,4670,0.830000
...,...,...,...,...,...,...,...,...,...,...,...,...,...
7113,Biscayne Bay,2022,2023,Dry,Water Temperature,,,,,784697.376005,213473.546119,1343,25.176294
7114,Biscayne Bay,2022,2023,Dry,Water Temperature,,,,,785872.039658,216879.318151,1344,25.423244
7115,Biscayne Bay,2022,2023,Dry,Water Temperature,,,,,787466.428428,218699.961879,1345,25.730076
7116,Biscayne Bay,2022,2023,Dry,Water Temperature,,,,,784434.233710,221871.479681,1346,25.338482


In [47]:
idw_rk.create_shp_season_new(seasons_all_coord, "Seasons", shpAll_folder, start_year_included=True)

Number of data rows for BBS, DO_mgl, 2021, Dry: 141
Shapefile for BBS, DO_mgl for 2021 and season Dry has been saved as SHP_BBS_DO_mgl_2021_Dry.shp
Number of data rows for BBS, Sal_ppt, 2021, Dry: 132
Shapefile for BBS, Sal_ppt for 2021 and season Dry has been saved as SHP_BBS_Sal_ppt_2021_Dry.shp
Number of data rows for BBS, Secc_m, 2021, Dry: 32
Shapefile for BBS, Secc_m for 2021 and season Dry has been saved as SHP_BBS_Secc_m_2021_Dry.shp
Number of data rows for BBS, TN_mgl, 2021, Dry: 41
Shapefile for BBS, TN_mgl for 2021 and season Dry has been saved as SHP_BBS_TN_mgl_2021_Dry.shp
Number of data rows for BBS, Turb_ntu, 2021, Dry: 42
Shapefile for BBS, Turb_ntu for 2021 and season Dry has been saved as SHP_BBS_Turb_ntu_2021_Dry.shp
Number of data rows for BBS, T_c, 2021, Dry: 141
Shapefile for BBS, T_c for 2021 and season Dry has been saved as SHP_BBS_T_c_2021_Dry.shp
Number of data rows for BBS, DO_mgl, 2021, Wet: 51
Shapefile for BBS, DO_mgl for 2021 and season Wet has been saved

Shapefile for GTM, TN_mgl for 2016 and season Wet has been saved as SHP_GTM_TN_mgl_2016_Wet.shp
Number of data rows for GTM, Turb_ntu, 2016, Wet: 9
Shapefile for GTM, Turb_ntu for 2016 and season Wet has been saved as SHP_GTM_Turb_ntu_2016_Wet.shp
Number of data rows for GTM, T_c, 2016, Wet: 74
Shapefile for GTM, T_c for 2016 and season Wet has been saved as SHP_GTM_T_c_2016_Wet.shp


## 4. Cross Validation for IDW <a class="anchor" id="reg_cv_idw"></a>

In [18]:
# Empty the shapefile folder
# idw_rk.delete_all_files(idwAll_folder)

In [32]:
# Select a section of table to process
seasons_slct = seasons_all.iloc[:]
seasons_slct.drop(seasons_slct[seasons_slct['WaterBody'] != 'Charlotte Harbor'].index, inplace=True)
seasons_slct = seasons_slct.reset_index()
seasons_slct

Unnamed: 0,index,WaterBody,Start Year,End Year,Seasons,Parameter,Filename,NumDataPoints,RMSE,ME,covariates
0,0,Charlotte Harbor,2017,2017,Wet,Total Nitrogen,,,,,default_covariate
1,1,Charlotte Harbor,2017,2018,Dry,Total Nitrogen,,,,,default_covariate
2,10,Charlotte Harbor,2017,2017,Wet,Salinity,,,,,default_covariate
3,11,Charlotte Harbor,2017,2018,Dry,Salinity,,,,,default_covariate
4,20,Charlotte Harbor,2017,2017,Wet,Dissolved Oxygen,,,,,default_covariate
5,21,Charlotte Harbor,2017,2018,Dry,Dissolved Oxygen,,,,,default_covariate
6,30,Charlotte Harbor,2017,2017,Wet,Turbidity,,,,,default_covariate
7,31,Charlotte Harbor,2017,2018,Dry,Turbidity,,,,,default_covariate
8,40,Charlotte Harbor,2017,2017,Wet,Secchi Depth,,,,,default_covariate
9,41,Charlotte Harbor,2017,2018,Dry,Secchi Depth,,,,,default_covariate


In [33]:
importlib.reload(idw_rk)

# If the number of data points is less than 3，skipping calculate IDW
idw_rk.idw_interpolation_sampled(seasons_slct, shpAll_folder, idwAll_folder, waterbody_extent, barrier_folder, "Seasons", percentage = 10)

Processing file: SHP_CH_TN_mgl_2017_Wet.shp
File SHP_CH_TN_mgl_2017_Wet.shp has completed 91 cross-validation iterations using 100% samples.
Processing file: SHP_CH_TN_mgl_2017_Dry.shp
File SHP_CH_TN_mgl_2017_Dry.shp has completed 91 cross-validation iterations using 100% samples.
Processing file: SHP_CH_Sal_ppt_2017_Wet.shp
File SHP_CH_Sal_ppt_2017_Wet.shp has completed 77 cross-validation iterations using 10% samples.
Processing file: SHP_CH_Sal_ppt_2017_Dry.shp
File SHP_CH_Sal_ppt_2017_Dry.shp has completed 30 cross-validation iterations using 10% samples.
Processing file: SHP_CH_DO_mgl_2017_Wet.shp
File SHP_CH_DO_mgl_2017_Wet.shp has completed 83 cross-validation iterations using 10% samples.
Processing file: SHP_CH_DO_mgl_2017_Dry.shp
File SHP_CH_DO_mgl_2017_Dry.shp has completed 35 cross-validation iterations using 10% samples.
Processing file: SHP_CH_Turb_ntu_2017_Wet.shp
File SHP_CH_Turb_ntu_2017_Wet.shp has completed 92 cross-validation iterations using 100% samples.
Processin

## 5. RK Interpolation<a class="anchor" id="reg_rk"></a>

### Define output folder

In [20]:
# out_raster_folder = gis_path + r"rk_folder/TwoSeasons_RK_all/"
# out_ga_folder     = gis_path + r"rk_folder/ga_output_rk_2s/"
# diagnostic_folder = gis_path + r"rk_folder/diagnostic_rk_2s/"
# std_error_folder  = gis_path + r"rk_folder/std_error_pred_2s/std_error_rk/"

out_raster_folder = gis_path + r"raster_output/TwoSeasons_RK_all/"
out_ga_folder     = gis_path + r"ga_output_rk/"
diagnostic_folder = gis_path + r"diagnostic_rk/"
std_error_folder  = gis_path + r"std_error_pred/std_error_rk_4s/"

# Clean existing files in folders
# idw_rk.delete_all_files(out_raster_folder)
# idw_rk.delete_all_files(out_ga_folder)
# idw_rk.delete_all_files(diagnostic_folder)
# idw_rk.delete_all_files(std_error_folder)

In [21]:
seasons_all['covariates'] = seasons_all['WaterBody'].apply(lambda x: covariates_dict.get(x, 'default_covariate'))

rk_csv = gis_path + "rk_2s.csv" 
seasons_all.to_csv(rk_csv, index=False, encoding='utf-8-sig')

In [23]:
with open(gis_path + "rk_2s.csv", 'w', newline='') as csvfile:
    csv_writer = csv.writer(csvfile)
    # Determine if year should be included in the output based on a condition
    start_year_included = True

    # Write the header line based on whether the year is included
    cols = list(seasons_all.columns)
    if not start_year_included:
        cols.remove('Start Year')
        cols.remove('End Year')
    csv_writer.writerow(cols)
    
    for i in seasons_all.index:
        s_time = time.time()
        process, rmse, me, count, file_loc = idw_rk.rk_interpolation_new(
            method="rk",
            radius=50000,
            folder_path=gis_path,
            waterbody=area_shortnames[seasons_all.iloc[i]["WaterBody"]],
            parameter=param_shortnames[seasons_all.iloc[i]["Parameter"]],
            year=seasons_all.iloc[i]["Start Year"],
            season=seasons_all.iloc[i]['Seasons'],
            covariates=covariates_dict[area_shortnames[seasons_all.iloc[i]["WaterBody"]]],
            out_raster_folder=out_raster_folder,
            out_ga_folder=out_ga_folder,
            std_error_folder=std_error_folder,
            diagnostic_folder=diagnostic_folder,
            shapefile_folder_name=shpAll_folder,
            start_year_included=start_year_included  # Pass the variable to the function
        )
        e_time = time.time()

        # Write data row, conditionally include year based on the setting
        data_row = [
            seasons_all.iloc[i]["WaterBody"], 
            seasons_all.iloc[i]['Seasons'],
            seasons_all.iloc[i]["Parameter"],
            file_loc, count, rmse, me,
            covariates_dict[area_shortnames[seasons_all.iloc[i]["WaterBody"]]]
        ]
        if start_year_included:
            data_row.insert(1, seasons_all.iloc[i]["Start Year"])
            data_row.insert(2, seasons_all.iloc[i]["End Year"])

        print(f"{int(e_time - s_time)} seconds elapsed for processing {count} points in {i}th row: RMSE: {rmse}, ME: {me}, file exported to {file_loc}")
        csv_writer.writerow(data_row)
        if i % 10 == 0:
            csvfile.flush()  # Flush the csv file every 10 rows.

Processing file: SHP_CH_TN_mgl_2017_Wet.shp
--- Time lapse: 1738.9533660411835 seconds ---
1739 seconds elapsed for processing 91 points in 0th row: RMSE: 0.518503292409, ME: -0.00548043251879, file exported to E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_output/TwoSeasons_RK_all/CH_TN_mgl_2017_Wet_RK.tif
Processing file: SHP_CH_TN_mgl_2017_Dry.shp
--- Time lapse: 2704.7894995212555 seconds ---
2705 seconds elapsed for processing 91 points in 1th row: RMSE: 0.146277379864, ME: 0.000458913001099, file exported to E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_output/TwoSeasons_RK_all/CH_TN_mgl_2017_Dry_RK.tif
Processing file: SHP_BBS_TN_mgl_2021_Wet.shp
--- Time lapse: 2548.784058570862 seconds ---
2548 seconds elapsed for processing 39 points in 2th row: RMSE: 0.177156170228, ME: 0.000178653472476, file exported to E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_output/TwoSeasons_RK_all/BBS_TN_mgl_2021_Wet_RK.tif
Processing file: SHP_BBS_TN_mgl_2021_Dry.shp
--- Time lapse: 2806.0260152816772 seco

Processing file: SHP_BB_DO_mgl_2022_Wet.shp
--- Time lapse: 829.6033291816711 seconds ---
829 seconds elapsed for processing 71 points in 28th row: RMSE: 0.721138352723, ME: -0.0118049358089, file exported to E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_output/TwoSeasons_RK_all/BB_DO_mgl_2022_Wet_RK.tif
Processing file: SHP_BB_DO_mgl_2022_Dry.shp
--- Time lapse: 817.4514186382294 seconds ---
817 seconds elapsed for processing 70 points in 29th row: RMSE: 0.721687128102, ME: 0.0106723490885, file exported to E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_output/TwoSeasons_RK_all/BB_DO_mgl_2022_Dry_RK.tif
Processing file: SHP_CH_Turb_ntu_2017_Wet.shp
--- Time lapse: 2719.5245490074158 seconds ---
2719 seconds elapsed for processing 92 points in 30th row: RMSE: 3.58760698325, ME: 0.0390725781281, file exported to E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_output/TwoSeasons_RK_all/CH_Turb_ntu_2017_Wet_RK.tif
Processing file: SHP_CH_Turb_ntu_2017_Dry.shp
--- Time lapse: 2663.27903008461 seconds -

Processing file: SHP_GTM_T_c_2016_Dry.shp
--- Time lapse: 132.5718114376068 seconds ---
132 seconds elapsed for processing 21 points in 57th row: RMSE: 0.476977303124, ME: 0.0243198442045, file exported to E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_output/TwoSeasons_RK_all/GTM_T_c_2016_Dry_RK.tif
Processing file: SHP_BB_T_c_2022_Wet.shp
--- Time lapse: 819.4482908248901 seconds ---
819 seconds elapsed for processing 73 points in 58th row: RMSE: 0.827363749789, ME: 0.0249062772863, file exported to E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_output/TwoSeasons_RK_all/BB_T_c_2022_Wet_RK.tif
Processing file: SHP_BB_T_c_2022_Dry.shp
--- Time lapse: 805.2543451786041 seconds ---
805 seconds elapsed for processing 72 points in 59th row: RMSE: 0.573863624811, ME: 0.011426440778, file exported to E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_output/TwoSeasons_RK_all/BB_T_c_2022_Dry_RK.tif
