# Cross Validation for IDW Interpolation 
## Task 2 (continuous & discrete) for wet & dry seasons

This notebook contains Python code that runs cross validation (CV) of Inverse Distance Weighting (IDW) interpolation in the arcpy environment for six water quality parameters:
- Dissolved oxygen (DO_mgl)
- Salinity (Sal_ppt)
- Turbidity (Turb_ntu)
- Temperature (T_c)
- Secchi (Secc_m)
- Total Nitrogen (TN_mgl) 

The analysis is conducted separately for five water bodies:
- Guana Tolomato Matanzas (GTM)
- Estero Bay (EB)
- Charlotte Harbor (CH)
- Biscayne Bay (BB)
- Big Bend Seagrasses (BBS)

**Tasks:**  

- **Task 2.1 Calculate the RMSE and Mean Error (ME) for IDW results using both continuous and discrete data**

- Task 2.2 Calculate the RMSE and Mean Error (ME) for IDW results using continuous data.
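
Both metrics in these tasks are computed from paired observed and predicted values. The sketch below shows one common definition; the function name `rmse_and_me` and the sample numbers are illustrative, and the sign convention (predicted minus observed) is an assumption, since the convention used inside `idw_rk` is not shown in this notebook.

```python
import math

def rmse_and_me(observed, predicted):
    """Return (RMSE, ME) for paired observed and predicted values.

    Assumes error = predicted - observed; a positive ME then means
    the interpolation overestimates on average.
    """
    errors = [p - o for o, p in zip(observed, predicted)]
    n = len(errors)
    me = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return rmse, me

# Illustrative values only (not taken from the dataset)
observed = [7.5, 8.1, 6.9, 7.2]
predicted = [7.8, 7.9, 7.1, 7.0]
rmse, me = rmse_and_me(observed, predicted)
print(f"RMSE: {rmse:.4f}, ME: {me:.4f}")
```

RMSE penalizes large errors regardless of sign, while ME shows whether the errors cancel out or lean in one direction, so the two are usually read together.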

<br>
<div style="text-align: left;">
    <img src="../misc/TwoSeasons.png" style="display: block; margin-left: 0; margin-right: auto; width: 850px;"/>
</div>


**Contents:**
* [1. Data Preprocess](#reg_preprocessing)
    * [1.1 Load csv files](#reg_preprocessing)
    * [1.2 Subsetting data](#reg_subset)
    * [1.3 Define the wet and dry seasons](#reg_wetndry)
    * [1.4 Calculating average values](#reg_average)
    * [1.5 Convert coordinate system](#reg_coordinate)
* [2. Prepare for batch interpolation](#reg_batch)
    * [2.1 Preset abbreviation](#reg_preset)
    * [2.2 Define the barrier files](#reg_barrier)
    * [2.3 Define waterbody boundary](#reg_boundary)
    * [2.4 Load the table of study periods, parameters, and seasons](#reg_study)
    * [2.5 Define output folders](#reg_output)
    * [2.6 Fill NaN RowID with unique ID](#reg_id)
* [3. Create Shapefiles](#reg_create_shp)
* [4. Cross Validation for IDW](#reg_cv_idw)

## Loading packages

In [5]:
import pandas as pd
import numpy as np
import arcpy
from arcpy.sa import *
import os
import math

import importlib
import sys
# path = r'C:/Users/cong1/WQ/IDW/git/misc'
path = r'E:\Projects\SEACAR_WQ_2024\git\misc'

sys.path.insert(0, path)
import idw_rk
importlib.reload(idw_rk)

import pyproj

# Define a scratch folder to avoid overwriting by parallel threads
arcpy.env.scratchWorkspace = r"E:\Projects\SEACAR_WQ_2024\scratch/IDW_2s"

## 1. Data Preprocessing <a class="anchor" id="reg_preprocessing"></a>
### 1.1 Load csv files

In [7]:
gis_path = r'E:/Projects/SEACAR_WQ_2024/GIS_Data/'

dfDis = pd.read_csv(gis_path + 'OEAT_Discrete_WQ-2024-May-06.csv', low_memory=False)
dfCon = pd.read_csv(gis_path + 'OEAT_Continuous_WQ-2024-Feb-21.csv', low_memory=False)

dfAll = pd.concat([dfDis, dfCon], ignore_index=True)

### 1.2 Subsetting Data <a class="anchor" id="reg_subset"></a>
#### Selecting data from 9:00 to 17:00 (daytime)

In [8]:
dfAll.head()

Unnamed: 0,RowID,ProgramID,ParameterName,ParameterUnits,ProgramLocationID,ActivityType,SampleDate,Year,Month,RelativeDepth,ResultValue,Latitude_DD,Longitude_DD,ManagedAreaName,AreaID,SEACAR_QAQCFlagCode,WaterBody,WbodyAcronym,Season
0,1,69,Secchi Depth,m,CKM2017100405,Field,2017-10-06 00:00:00.000,2017,10,,0.3,29.3221,-83.129866,Big Bend Seagrasses Aquatic Preserve,5,6Q,Big Bend Seagrasses,BBS,Fall
1,2,69,Secchi Depth,m,CKM2017080401,Field,2017-08-08 00:00:00.000,2017,8,Surface,0.6,29.145966,-83.07225,Big Bend Seagrasses Aquatic Preserve,5,6Q/9Q,Big Bend Seagrasses,BBS,Summer
2,3,69,Secchi Depth,m,CKM2017060703,Field,2017-06-19 00:00:00.000,2017,6,Surface,0.4,29.294516,-83.155316,Big Bend Seagrasses Aquatic Preserve,5,9Q/6Q,Big Bend Seagrasses,BBS,Summer
3,4,69,Secchi Depth,m,CKM2017060202,Field,2017-06-06 00:00:00.000,2017,6,Surface,0.4,29.1404,-83.01705,Big Bend Seagrasses Aquatic Preserve,5,6Q/9Q,Big Bend Seagrasses,BBS,Spring
4,5,69,Secchi Depth,m,CKM2017110804,Field,2017-11-14 00:00:00.000,2017,11,Surface,0.4,29.269566,-83.107283,Big Bend Seagrasses Aquatic Preserve,5,9Q/6Q,Big Bend Seagrasses,BBS,Fall


In [9]:
# Convert string to datetime
dfAll['SampleDate'] = pd.to_datetime(dfAll['SampleDate'], format='%Y-%m-%d %H:%M:%S.%f')

# Keep samples taken between 09:00 and 17:00 (both ends inclusive)
start_time = '09:00'
end_time = '17:00'

dfAllTime = dfAll[dfAll['SampleDate'].dt.time.between(pd.to_datetime(start_time).time(), pd.to_datetime(end_time).time())]
dfAllTime.head()

Unnamed: 0,RowID,ProgramID,ParameterName,ParameterUnits,ProgramLocationID,ActivityType,SampleDate,Year,Month,RelativeDepth,ResultValue,Latitude_DD,Longitude_DD,ManagedAreaName,AreaID,SEACAR_QAQCFlagCode,WaterBody,WbodyAcronym,Season
27759,27760,4058,Turbidity,NTU,45,Field,2017-05-18 09:05:00,2017,5,Surface,7.9,25.767,-80.137,Biscayne Bay Aquatic Preserve,6,7Q/11Q,Biscayne Bay,BB,Spring
27760,27761,4058,Dissolved Oxygen,mg/L,22,Field,2018-01-11 12:47:00,2018,1,Surface,6.27,25.7942,-80.1628,Biscayne Bay Aquatic Preserve,6,11Q/6Q,Biscayne Bay,BB,Winter
27761,27762,4058,Salinity,ppt,36,Field,2018-01-11 11:40:00,2018,1,Surface,30.9,25.7825,-80.1682,Biscayne Bay Aquatic Preserve,6,11Q/6Q,Biscayne Bay,BB,Winter
27762,27763,4058,Salinity,ppt,30,Field,2017-12-19 12:25:00,2017,12,Surface,29.8,25.7876,-80.1576,Biscayne Bay Aquatic Preserve,6,11Q/6Q,Biscayne Bay,BB,Winter
27763,27764,4058,Dissolved Oxygen,mg/L,31,Field,2017-02-23 11:25:00,2017,2,Surface,3.82,25.788,-80.1627,Biscayne Bay Aquatic Preserve,6,6Q/11Q,Biscayne Bay,BB,Winter
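
The daytime filter above relies on `Series.between`, which includes both endpoints by default. The plain-Python sketch below illustrates the same rule on a few timestamps; the helper name `in_daytime_window` is hypothetical. It also shows why records stored with a date only (time `00:00:00`, like the first rows of `dfAll`) drop out of the filtered table.

```python
from datetime import datetime, time

def in_daytime_window(timestamp, start=time(9, 0), end=time(17, 0)):
    """True when the time of day lies in [start, end], both ends inclusive."""
    return start <= timestamp.time() <= end

samples = [
    datetime(2017, 5, 18, 9, 5),   # inside the window
    datetime(2017, 10, 6, 0, 0),   # date-only record, parsed as midnight
    datetime(2018, 1, 11, 17, 0),  # exactly on the upper bound
    datetime(2018, 1, 11, 17, 1),  # one minute too late
]

kept = [ts for ts in samples if in_daytime_window(ts)]
for ts in kept:
    print(ts)
```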


In [10]:
# unique_waterbodies = dfAllTime['WaterBody'].unique()
# print(unique_waterbodies)

### 1.3 Define the wet and dry seasons<a class="anchor" id="reg_wetndry"></a>

In [11]:
# Load the table of wet and dry seasons definitions
seasons2 = pd.read_csv(gis_path + 'season_def/2 seasons.csv', low_memory=False)
seasons2

Unnamed: 0,WaterBody,Start Year,Start Month,Start Day,End Year,End Month,End Day,Start Date,End Date,Seasons
0,Charlotte Harbor,2017,5,1,2017,10,31,5/1/2017,10/31/2017,Wet
1,Charlotte Harbor,2017,11,1,2018,4,30,11/1/2017,4/30/2018,Dry
2,Big Bend Seagrasses,2021,5,1,2021,10,31,5/1/2021,10/31/2021,Wet
3,Big Bend Seagrasses,2021,11,1,2022,4,30,11/1/2021,4/30/2022,Dry
4,Estero Bay,2017,5,1,2017,10,31,5/1/2017,10/31/2017,Wet
5,Estero Bay,2017,11,1,2018,4,30,11/1/2017,4/30/2018,Dry
6,Guana Tolomato Matanzas,2016,5,1,2016,10,31,5/1/2016,10/31/2016,Wet
7,Guana Tolomato Matanzas,2016,11,1,2017,4,30,11/1/2016,4/30/2017,Dry
8,Biscayne Bay,2022,5,1,2022,10,31,5/1/2022,10/31/2022,Wet
9,Biscayne Bay,2022,11,1,2023,4,30,11/1/2022,4/30/2023,Dry


In [12]:
# Filter the data to the study years of each waterbody
filtered_dfAllTime = idw_rk.filter_data(dfAllTime, seasons2)
filtered_dfAllTime.head()

Unnamed: 0,RowID,ProgramID,ParameterName,ParameterUnits,ProgramLocationID,ActivityType,SampleDate,Year,Month,RelativeDepth,ResultValue,Latitude_DD,Longitude_DD,ManagedAreaName,AreaID,SEACAR_QAQCFlagCode,WaterBody,WbodyAcronym,Season
0,30930,4054,Water Temperature,Degrees C,WIN_21FLWQA_GTMSSNUT,Field,2016-07-14 10:22:00,2016,7,,31.4,29.868842,-81.307443,Guana Tolomato Matanzas National Estuarine Res...,20,6Q,Guana Tolomato Matanzas,GTM,Summer
1,30930,4054,Water Temperature,Degrees C,WIN_21FLWQA_GTMSSNUT,Field,2016-07-14 10:22:00,2016,7,,31.4,29.868842,-81.307443,Guana Tolomato Matanzas National Estuarine Res...,20,6Q,Guana Tolomato Matanzas,GTM,Summer
2,30933,4054,Dissolved Oxygen,mg/L,WIN_21FLWQA_GTMPCNUT,Field,2016-03-07 14:30:00,2016,3,Surface,6.92,29.667541,-81.257755,Guana Tolomato Matanzas National Estuarine Res...,20,9Q/6Q,Guana Tolomato Matanzas,GTM,Spring
3,30933,4054,Dissolved Oxygen,mg/L,WIN_21FLWQA_GTMPCNUT,Field,2016-03-07 14:30:00,2016,3,Surface,6.92,29.667541,-81.257755,Guana Tolomato Matanzas National Estuarine Res...,20,9Q/6Q,Guana Tolomato Matanzas,GTM,Spring
4,30934,4054,Dissolved Oxygen,mg/L,WIN_21FLWQA_GTMPCNUT,Field,2016-08-14 09:45:00,2016,8,Surface,2.72,29.667541,-81.257755,Guana Tolomato Matanzas National Estuarine Res...,20,9Q/6Q,Guana Tolomato Matanzas,GTM,Summer


In [13]:
# Check the filtered results
unique_years = filtered_dfAllTime.groupby('WaterBody')['Year'].unique().reset_index()
unique_years.columns = ['WaterBody', 'Unique Years']
unique_years

Unnamed: 0,WaterBody,Unique Years
0,Big Bend Seagrasses,"[2022, 2021]"
1,Biscayne Bay,"[2022, 2023]"
2,Charlotte Harbor,"[2018, 2017]"
3,Estero Bay,"[2018, 2017]"
4,Guana Tolomato Matanzas,"[2016, 2017]"


In [14]:
# Assign wet and dry season
updated_dfAllTime = idw_rk.assign_seasons(filtered_dfAllTime, seasons2)
updated_dfAllTime.head()

Unnamed: 0,RowID,ProgramID,ParameterName,ParameterUnits,ProgramLocationID,ActivityType,SampleDate,Year,Month,RelativeDepth,...,Longitude_DD,ManagedAreaName,AreaID,SEACAR_QAQCFlagCode,WaterBody,WbodyAcronym,Season,Start Date,End Date,Seasons
0,30930,4054,Water Temperature,Degrees C,WIN_21FLWQA_GTMSSNUT,Field,2016-07-14 10:22:00,2016,7,,...,-81.307443,Guana Tolomato Matanzas National Estuarine Res...,20,6Q,Guana Tolomato Matanzas,GTM,Summer,2016-05-01,2016-10-31,Wet
2,30930,4054,Water Temperature,Degrees C,WIN_21FLWQA_GTMSSNUT,Field,2016-07-14 10:22:00,2016,7,,...,-81.307443,Guana Tolomato Matanzas National Estuarine Res...,20,6Q,Guana Tolomato Matanzas,GTM,Summer,2016-05-01,2016-10-31,Wet
8,30934,4054,Dissolved Oxygen,mg/L,WIN_21FLWQA_GTMPCNUT,Field,2016-08-14 09:45:00,2016,8,Surface,...,-81.257755,Guana Tolomato Matanzas National Estuarine Res...,20,9Q/6Q,Guana Tolomato Matanzas,GTM,Summer,2016-05-01,2016-10-31,Wet
10,30934,4054,Dissolved Oxygen,mg/L,WIN_21FLWQA_GTMPCNUT,Field,2016-08-14 09:45:00,2016,8,Surface,...,-81.257755,Guana Tolomato Matanzas National Estuarine Res...,20,9Q/6Q,Guana Tolomato Matanzas,GTM,Summer,2016-05-01,2016-10-31,Wet
12,30937,4054,Dissolved Oxygen,mg/L,WIN_21FLWQA_GTMPCNUT,Field,2016-08-15 10:45:00,2016,8,Surface,...,-81.257755,Guana Tolomato Matanzas National Estuarine Res...,20,9Q/6Q,Guana Tolomato Matanzas,GTM,Summer,2016-05-01,2016-10-31,Wet
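
The source of `idw_rk.assign_seasons` is not shown here, so the following is only a sketch of the idea under stated assumptions: each sample is labeled with the season whose start and end dates bracket its date, and samples outside every window receive no label. The function name `assign_season` and the `season_windows` structure are hypothetical; the dates are copied from the season definition table for one waterbody.

```python
from datetime import date

# Wet/dry windows for one waterbody, copied from the '2 seasons.csv' table
season_windows = {
    "Guana Tolomato Matanzas": [
        (date(2016, 5, 1), date(2016, 10, 31), "Wet"),
        (date(2016, 11, 1), date(2017, 4, 30), "Dry"),
    ],
}

def assign_season(waterbody, sample_date):
    """Return 'Wet' or 'Dry' when the date falls in a window, else None."""
    for start, end, label in season_windows.get(waterbody, []):
        if start <= sample_date <= end:
            return label
    return None

print(assign_season("Guana Tolomato Matanzas", date(2016, 7, 14)))
print(assign_season("Guana Tolomato Matanzas", date(2017, 1, 15)))
print(assign_season("Guana Tolomato Matanzas", date(2016, 3, 7)))
```

Under this rule a sample from March 2016 falls before the first window, which is consistent with such rows being absent from the table above.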


### 1.4 Calculating average values at unique observation points<a class="anchor" id="reg_average"></a>

In [15]:
dfAll_Mean = updated_dfAllTime.groupby(['WaterBody','ParameterName','ParameterUnits','Seasons','Latitude_DD','Longitude_DD','WbodyAcronym'])["ResultValue"].agg("mean").reset_index()
dfAll = dfAll_Mean
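
The aggregation above collapses repeated measurements at the same location into one mean value per waterbody, parameter, season, and coordinate pair. A minimal plain-Python sketch of the same grouping logic follows; the records are illustrative (only a subset of the grouping columns is used, and the values are not real results).

```python
from collections import defaultdict

# (WaterBody, ParameterName, Seasons, Latitude_DD, Longitude_DD, ResultValue)
# Illustrative records: two samples at one station, one at another
records = [
    ("Guana Tolomato Matanzas", "Dissolved Oxygen", "Wet", 29.667541, -81.257755, 2.72),
    ("Guana Tolomato Matanzas", "Dissolved Oxygen", "Wet", 29.667541, -81.257755, 3.10),
    ("Guana Tolomato Matanzas", "Dissolved Oxygen", "Wet", 29.868842, -81.307443, 5.00),
]

# Collect values under a key made of every field except the measured value
groups = defaultdict(list)
for *key, value in records:
    groups[tuple(key)].append(value)

# One mean per unique observation point
means = {key: sum(values) / len(values) for key, values in groups.items()}
for key, mean in means.items():
    print(key, round(mean, 3))
```

Because the coordinates are part of the key, two stations that differ only in the last decimal places remain separate points, as with the two nearly identical GTM locations in the later preview of `dfAll`.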

### 1.5 Convert coordinate system to EPSG: 3086<a class="anchor" id="reg_coordinate"></a>

In [16]:
# Define the EPSG codes for source (EPSG:4326) and target (EPSG:3086) coordinate systems
source_epsg = 'EPSG:4326'
target_epsg = 'EPSG:3086'

# Create a PyProj Transformer for the conversion
transformer = pyproj.Transformer.from_crs(source_epsg, target_epsg, always_xy=True)

# Define a function to apply the transformation to each row of the DataFrame
def transform_coordinates(row):
    x, y = transformer.transform(row['Longitude_DD'], row['Latitude_DD'])
    return pd.Series({'x': x, 'y': y})

# Apply the transformation function to the DataFrame and create new columns for the converted coordinates
dfAll[['x', 'y']] = dfAll.apply(transform_coordinates, axis=1)

#### Save aggregated data to csv file

In [17]:
dfAll.to_csv(gis_path + 'OEAT_2Seasons_All_WQ-2024-May-2.csv', index=False)

## 2. Prepare for batch interpolation<a class="anchor" id="reg_batch"></a>
### 2.1 Preset abbreviation for waterbody and parameter name<a class="anchor" id="reg_preset"></a>

In [18]:
area_shortnames = {
    'Guana Tolomato Matanzas': 'GTM',
    'Estero Bay': 'EB',
    'Charlotte Harbor': 'CH',
    'Biscayne Bay': 'BB',
    'Big Bend Seagrasses':'BBS'
}

param_shortnames = {
    'Salinity': 'Sal_ppt',
    'Total Nitrogen': 'TN_mgl',
    'Dissolved Oxygen': 'DO_mgl',
    'Turbidity':'Turb_ntu',
    'Secchi Depth':'Secc_m',
    'Water Temperature':'T_c'
}

### 2.2 Load the barrier files<a class="anchor" id="reg_barrier"></a>

In [19]:
barrier_folder = os.path.join(gis_path, 'Barriers')
barrier_folder

barriers = []
for file in os.listdir(barrier_folder):
    if file.endswith(".shp"):
        barriers.append(os.path.join(barrier_folder, file))

for barrier in barriers:
    print(barrier)

E:/Projects/SEACAR_WQ_2024/GIS_Data/Barriers\BBS_Barriers.shp
E:/Projects/SEACAR_WQ_2024/GIS_Data/Barriers\BB_Barriers.shp
E:/Projects/SEACAR_WQ_2024/GIS_Data/Barriers\CH_Barriers.shp
E:/Projects/SEACAR_WQ_2024/GIS_Data/Barriers\EB_Barriers.shp
E:/Projects/SEACAR_WQ_2024/GIS_Data/Barriers\GTM_Barriers.shp
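
The barrier files follow an `<acronym>_Barriers.shp` naming pattern. How `idw_rk` selects the barrier for a waterbody is not shown, so the helper below (`barrier_for`, a hypothetical name) is only a sketch of one safe way to do the lookup. It matches the whole file name, because a prefix test on `BB` would also match `BBS_Barriers.shp`.

```python
import ntpath  # Windows path rules; handles both '/' and '\\' on any platform

barrier_paths = [
    r"E:/Projects/SEACAR_WQ_2024/GIS_Data/Barriers\BBS_Barriers.shp",
    r"E:/Projects/SEACAR_WQ_2024/GIS_Data/Barriers\BB_Barriers.shp",
    r"E:/Projects/SEACAR_WQ_2024/GIS_Data/Barriers\CH_Barriers.shp",
    r"E:/Projects/SEACAR_WQ_2024/GIS_Data/Barriers\EB_Barriers.shp",
    r"E:/Projects/SEACAR_WQ_2024/GIS_Data/Barriers\GTM_Barriers.shp",
]

def barrier_for(acronym, paths):
    """Return the barrier shapefile for a waterbody acronym, or None."""
    expected = f"{acronym}_Barriers.shp"
    for path in paths:
        if ntpath.basename(path) == expected:
            return path
    return None

print(barrier_for("BB", barrier_paths))
print(barrier_for("BBS", barrier_paths))
```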


### 2.3 Define waterbody boundary for spatial extent and masking<a class="anchor" id="reg_boundary"></a>

In [20]:
waterbody_extent = os.path.join(gis_path, 'OEAT_Waterbody_Boundaries', 'OEAT_Waterbody_Boundary.shp')

unique_waterbodies = []
with arcpy.da.SearchCursor(waterbody_extent, ['WaterbodyA']) as cursor:
    for row in cursor:
        unique_waterbodies.append(row[0])

print("Unique Waterbodies:", unique_waterbodies)

Unique Waterbodies: ['BBS', 'BB', 'CH', 'EB', 'GTM']


### 2.4 Load the table of study periods, parameters, and seasons<a class="anchor" id="reg_study"></a>

In [21]:
seasons_all = pd.read_csv(gis_path + 'season_def/WetDry_all.csv', low_memory=False)
seasons_all.head(10)

Unnamed: 0,WaterBody,Start Year,End Year,Seasons,Parameter,Filename,NumDataPoints,RMSE,ME
0,Charlotte Harbor,2017,2017,Wet,Total Nitrogen,,,,
1,Charlotte Harbor,2017,2018,Dry,Total Nitrogen,,,,
2,Big Bend Seagrasses,2021,2021,Wet,Total Nitrogen,,,,
3,Big Bend Seagrasses,2021,2022,Dry,Total Nitrogen,,,,
4,Estero Bay,2017,2017,Wet,Total Nitrogen,,,,
5,Estero Bay,2017,2018,Dry,Total Nitrogen,,,,
6,Guana Tolomato Matanzas,2016,2016,Wet,Total Nitrogen,,,,
7,Guana Tolomato Matanzas,2016,2017,Dry,Total Nitrogen,,,,
8,Biscayne Bay,2022,2022,Wet,Total Nitrogen,,,,
9,Biscayne Bay,2022,2023,Dry,Total Nitrogen,,,,


### 2.5 Define output folders<a class="anchor" id="reg_output"></a>

In [22]:
shpAll_folder = gis_path + r"shapefiles/TwoSeasons_shapefiles_All" 
idwAll_folder = gis_path + r"raster_output/TwoSeasons_IDW_All"

# Preview dataset
dfAll

Unnamed: 0,WaterBody,ParameterName,ParameterUnits,Seasons,Latitude_DD,Longitude_DD,WbodyAcronym,ResultValue,x,y
0,Big Bend Seagrasses,Dissolved Oxygen,mg/L,Dry,29.008300,-82.825250,BBS,8.245000,514236.421562,556316.395208
1,Big Bend Seagrasses,Dissolved Oxygen,mg/L,Dry,29.108333,-82.858333,BBS,7.500000,510916.530547,567394.103382
2,Big Bend Seagrasses,Dissolved Oxygen,mg/L,Dry,29.125000,-82.841666,BBS,7.623333,512518.355037,569259.880703
3,Big Bend Seagrasses,Dissolved Oxygen,mg/L,Dry,29.149500,-83.079500,BBS,8.925000,489395.665621,571785.712589
4,Big Bend Seagrasses,Dissolved Oxygen,mg/L,Dry,29.161167,-83.047333,BBS,8.125000,492509.553281,573104.900093
...,...,...,...,...,...,...,...,...,...,...
1492,Guana Tolomato Matanzas,Water Temperature,Degrees C,Wet,29.737041,-81.245953,GTM,27.940735,665987.248566,639659.363097
1493,Guana Tolomato Matanzas,Water Temperature,Degrees C,Wet,29.868842,-81.307443,GTM,27.783333,659730.009429,654156.987464
1494,Guana Tolomato Matanzas,Water Temperature,Degrees C,Wet,29.868851,-81.307428,GTM,28.227289,659731.434296,654158.019057
1495,Guana Tolomato Matanzas,Water Temperature,Degrees C,Wet,30.050800,-81.367500,GTM,29.250000,653506.502810,674226.335251


### 2.6 Fill NaN RowID with unique ID (IDW function needs unique ID) <a class="anchor" id="reg_id"></a>

In [23]:
idw_rk.fill_nan_rowids(dfAll, 'RowID')

# Keep RowID as integer
dfAll['RowID'] = dfAll['RowID'].astype(int)
dfAll

Unnamed: 0,WaterBody,ParameterName,ParameterUnits,Seasons,Latitude_DD,Longitude_DD,WbodyAcronym,ResultValue,x,y,RowID
0,Big Bend Seagrasses,Dissolved Oxygen,mg/L,Dry,29.008300,-82.825250,BBS,8.245000,514236.421562,556316.395208,1
1,Big Bend Seagrasses,Dissolved Oxygen,mg/L,Dry,29.108333,-82.858333,BBS,7.500000,510916.530547,567394.103382,2
2,Big Bend Seagrasses,Dissolved Oxygen,mg/L,Dry,29.125000,-82.841666,BBS,7.623333,512518.355037,569259.880703,3
3,Big Bend Seagrasses,Dissolved Oxygen,mg/L,Dry,29.149500,-83.079500,BBS,8.925000,489395.665621,571785.712589,4
4,Big Bend Seagrasses,Dissolved Oxygen,mg/L,Dry,29.161167,-83.047333,BBS,8.125000,492509.553281,573104.900093,5
...,...,...,...,...,...,...,...,...,...,...,...
1492,Guana Tolomato Matanzas,Water Temperature,Degrees C,Wet,29.737041,-81.245953,GTM,27.940735,665987.248566,639659.363097,1493
1493,Guana Tolomato Matanzas,Water Temperature,Degrees C,Wet,29.868842,-81.307443,GTM,27.783333,659730.009429,654156.987464,1494
1494,Guana Tolomato Matanzas,Water Temperature,Degrees C,Wet,29.868851,-81.307428,GTM,28.227289,659731.434296,654158.019057,1495
1495,Guana Tolomato Matanzas,Water Temperature,Degrees C,Wet,30.050800,-81.367500,GTM,29.250000,653506.502810,674226.335251,1496
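
The body of `idw_rk.fill_nan_rowids` is not included in this notebook. As a rough illustration of the requirement (every point needs a unique integer ID), the hypothetical helper below keeps existing IDs and gives each missing entry the next unused integer. The real function may number rows differently.

```python
import math

def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def fill_missing_ids(ids):
    """Return integer IDs, assigning new unique values where one is missing."""
    existing = [int(v) for v in ids if not is_missing(v)]
    next_id = max(existing, default=0) + 1
    filled = []
    for v in ids:
        if is_missing(v):
            filled.append(next_id)
            next_id += 1
        else:
            filled.append(int(v))
    return filled

filled_ids = fill_missing_ids([10.0, float("nan"), 12.0, None])
print(filled_ids)  # [10, 13, 12, 14]
```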


## 3. Create Shapefiles <a class="anchor" id="reg_create_shp"></a>

In [24]:
# Merge the study-period table with the coordinates, RowID, and values of the observation points
seasons_all_coord = idw_rk.merge_with_lat_long_new(seasons_all, dfAll, "Seasons")
seasons_all_coord

Unnamed: 0,WaterBody,Start Year,End Year,Seasons,Parameter,Filename,NumDataPoints,RMSE,ME,x,y,RowID,ResultValue
0,Charlotte Harbor,2017,2017,Wet,Total Nitrogen,,,,,,,,
1,Charlotte Harbor,2017,2018,Dry,Total Nitrogen,,,,,591037.221033,273012.265703,1140,0.560000
2,Charlotte Harbor,2017,2018,Dry,Total Nitrogen,,,,,592008.752819,274733.690037,1141,0.760000
3,Charlotte Harbor,2017,2018,Dry,Total Nitrogen,,,,,591481.098118,275247.216619,1142,0.640000
4,Charlotte Harbor,2017,2018,Dry,Total Nitrogen,,,,,589338.150366,275567.884388,1143,0.705000
...,...,...,...,...,...,...,...,...,...,...,...,...,...
1503,Biscayne Bay,2022,2023,Dry,Water Temperature,,,,,784697.277475,213473.642774,1020,25.176294
1504,Biscayne Bay,2022,2023,Dry,Water Temperature,,,,,785871.933987,216879.419467,1021,25.423244
1505,Biscayne Bay,2022,2023,Dry,Water Temperature,,,,,787466.319703,218700.069505,1022,25.730076
1506,Biscayne Bay,2022,2023,Dry,Water Temperature,,,,,784434.115885,221871.575426,1023,25.338482


In [25]:
idw_rk.create_shp_season_new(seasons_all_coord, "Seasons", shpAll_folder, start_year_included=True)

Number of data rows for BBS, DO_mgl, 2021, Dry: 42
Shapefile for BBS, DO_mgl for 2021 and season Dry has been saved as SHP_BBS_DO_mgl_2021_Dry.shp
Number of data rows for BBS, Sal_ppt, 2021, Dry: 33
Shapefile for BBS, Sal_ppt for 2021 and season Dry has been saved as SHP_BBS_Sal_ppt_2021_Dry.shp
Number of data rows for BBS, Secc_m, 2021, Dry: 32
Shapefile for BBS, Secc_m for 2021 and season Dry has been saved as SHP_BBS_Secc_m_2021_Dry.shp
Number of data rows for BBS, TN_mgl, 2021, Dry: 41
Shapefile for BBS, TN_mgl for 2021 and season Dry has been saved as SHP_BBS_TN_mgl_2021_Dry.shp
Number of data rows for BBS, Turb_ntu, 2021, Dry: 42
Shapefile for BBS, Turb_ntu for 2021 and season Dry has been saved as SHP_BBS_Turb_ntu_2021_Dry.shp
Number of data rows for BBS, T_c, 2021, Dry: 42
Shapefile for BBS, T_c for 2021 and season Dry has been saved as SHP_BBS_T_c_2021_Dry.shp
Number of data rows for BBS, DO_mgl, 2021, Wet: 46
Shapefile for BBS, DO_mgl for 2021 and season Wet has been saved as
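
The log above suggests the shapefiles are named `SHP_<waterbody>_<parameter>_<start year>_<season>.shp`. The sketch below rebuilds such a name from the abbreviation dictionaries defined in section 2.1; the function `shapefile_name` is hypothetical and only mirrors the pattern seen in the output, not the code inside `idw_rk.create_shp_season_new`.

```python
area_shortnames = {
    'Guana Tolomato Matanzas': 'GTM',
    'Estero Bay': 'EB',
    'Charlotte Harbor': 'CH',
    'Biscayne Bay': 'BB',
    'Big Bend Seagrasses': 'BBS',
}

param_shortnames = {
    'Salinity': 'Sal_ppt',
    'Total Nitrogen': 'TN_mgl',
    'Dissolved Oxygen': 'DO_mgl',
    'Turbidity': 'Turb_ntu',
    'Secchi Depth': 'Secc_m',
    'Water Temperature': 'T_c',
}

def shapefile_name(waterbody, parameter, start_year, season):
    """Build a shapefile name following the pattern seen in the log."""
    area = area_shortnames[waterbody]
    param = param_shortnames[parameter]
    return f"SHP_{area}_{param}_{start_year}_{season}.shp"

name = shapefile_name('Big Bend Seagrasses', 'Dissolved Oxygen', 2021, 'Dry')
print(name)  # SHP_BBS_DO_mgl_2021_Dry.shp
```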

In [26]:
seasons_all_coord.head()

Unnamed: 0,WaterBody,Start Year,End Year,Seasons,Parameter,Filename,NumDataPoints,RMSE,ME,x,y,RowID,ResultValue
0,Charlotte Harbor,2017,2017,Wet,Total Nitrogen,,,,,,,,
1,Charlotte Harbor,2017,2018,Dry,Total Nitrogen,,,,,591037.221033,273012.265703,1140.0,0.56
2,Charlotte Harbor,2017,2018,Dry,Total Nitrogen,,,,,592008.752819,274733.690037,1141.0,0.76
3,Charlotte Harbor,2017,2018,Dry,Total Nitrogen,,,,,591481.098118,275247.216619,1142.0,0.64
4,Charlotte Harbor,2017,2018,Dry,Total Nitrogen,,,,,589338.150366,275567.884388,1143.0,0.705


## 4. Cross Validation for IDW <a class="anchor" id="reg_cv_idw"></a>

In [27]:
# Empty the IDW output folder
idw_rk.delete_all_files(idwAll_folder)

In [28]:
# Select a section of table to process
seasons_slct = seasons_all.iloc[:]
seasons_slct.head()

Unnamed: 0,WaterBody,Start Year,End Year,Seasons,Parameter,Filename,NumDataPoints,RMSE,ME
0,Charlotte Harbor,2017,2017,Wet,Total Nitrogen,,,,
1,Charlotte Harbor,2017,2018,Dry,Total Nitrogen,,,,
2,Big Bend Seagrasses,2021,2021,Wet,Total Nitrogen,,,,
3,Big Bend Seagrasses,2021,2022,Dry,Total Nitrogen,,,,
4,Estero Bay,2017,2017,Wet,Total Nitrogen,,,,


In [None]:
# Run cross validation for IDW; IDW is skipped when there are fewer than 3 data points
idw_rk.idw_interpolation_new(seasons_slct, shpAll_folder, idwAll_folder, waterbody_extent, barrier_folder, "Seasons")

Shapefile not found for: SHP_CH_TN_mgl_2017_Wet.shp
Processing file: SHP_CH_TN_mgl_2017_Dry.shp
File SHP_CH_TN_mgl_2017_Dry.shp has completed 52 cross-validation iterations.
Processing file: SHP_BBS_TN_mgl_2021_Wet.shp
File SHP_BBS_TN_mgl_2021_Wet.shp has completed 38 cross-validation iterations.
Processing file: SHP_BBS_TN_mgl_2021_Dry.shp
File SHP_BBS_TN_mgl_2021_Dry.shp has completed 41 cross-validation iterations.
Shapefile not found for: SHP_EB_TN_mgl_2017_Wet.shp
Processing file: SHP_EB_TN_mgl_2017_Dry.shp
File SHP_EB_TN_mgl_2017_Dry.shp has completed 37 cross-validation iterations.
Shapefile not found for: SHP_GTM_TN_mgl_2016_Wet.shp
Shapefile not found for: SHP_GTM_TN_mgl_2016_Dry.shp
Processing file: SHP_BB_TN_mgl_2022_Wet.shp
File SHP_BB_TN_mgl_2022_Wet.shp has completed 59 cross-validation iterations.
Processing file: SHP_BB_TN_mgl_2022_Dry.shp
File SHP_BB_TN_mgl_2022_Dry.shp has completed 60 cross-validation iterations.
Processing file: SHP_CH_Sal_ppt_2017_Wet.shp
File SHP_
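
The iteration counts in the log are in line with leave-one-out cross validation, where each point is removed in turn and predicted from the remaining points. The implementation inside `idw_rk.idw_interpolation_new` is not shown, so the pure-Python sketch below is an assumption about the general technique only: it uses a power of 2, no search radius, and ignores barriers and the waterbody mask. The function names and sample points are illustrative.

```python
import math

def idw_predict(x, y, points, power=2):
    """Inverse-distance-weighted estimate at (x, y) from (px, py, value) points."""
    numerator = denominator = 0.0
    for px, py, value in points:
        distance = math.hypot(x - px, y - py)
        if distance == 0:
            return value  # exact hit on a known point
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator

def loo_cv(points, power=2):
    """Leave-one-out CV; returns (RMSE, ME, number of iterations)."""
    errors = []
    for i, (x, y, observed) in enumerate(points):
        others = points[:i] + points[i + 1:]
        errors.append(idw_predict(x, y, others, power) - observed)
    n = len(errors)
    me = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return rmse, me, n

# Illustrative points (x, y, value); not taken from the dataset
points = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (2.0, 0.0, 3.0)]
rmse, me, n = loo_cv(points)
print(f"{n} iterations, RMSE: {rmse:.4f}, ME: {me:.4f}")
```

In this small example the two end points are pulled toward the middle value by equal and opposite amounts, so ME is zero while RMSE is not, which shows why both metrics are reported.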

## 5. RK Interpolation

Define the output folders and remove any existing files in them.

In [None]:
out_raster_floder = gis_path + r"raster_output/TwoSeasons_RK_All"
out_ga_folder     = gis_path + r"ga_output_rk/"
diagnostic_folder = gis_path + r"diagnostic_rk/"
std_error_folder  = gis_path + r"std_error_pred/std_error_rk_2s/"

# Clean existing files in folders
idw_rk.delete_all_files(out_raster_floder)
idw_rk.delete_all_files(out_ga_folder)
idw_rk.delete_all_files(diagnostic_folder)
idw_rk.delete_all_files(std_error_folder)

In [None]:
import csv
import time

# NOTE: covariates_dict (waterbody acronym -> covariates) is not defined in this
# notebook and is assumed to exist before this cell runs.

# Write the output to a csv file
with open(gis_path + "rk_2s.csv", 'w', newline='') as csvfile:
    csv_writer = csv.writer(csvfile)
    # Write the header line
    cols = list(seasons_all.columns)
    cols.append('covariates')
    csv_writer.writerow(cols)

    for i in seasons_all.index:
        row = seasons_all.loc[i]
        waterbody = area_shortnames[row["WaterBody"]]
        s_time = time.time()
        # seasons_all has no 'Year' or 'Season' column. 'Start Year' is assumed to be
        # the year rk_interpolation expects (as in the shapefile names), with 'Seasons'.
        process, rmse, me, count, file_loc = idw_rk.rk_interpolation(method = "rk",
                                           radius = 10000,
                                           folder_path = gis_path,
                                           waterbody = waterbody,
                                           parameter = param_shortnames[row["Parameter"]],
                                           year      = row["Start Year"],
                                           season    = row["Seasons"],
                                           covariates= covariates_dict[waterbody],
                                           out_raster_folder = out_raster_floder,
                                           out_ga_folder     = out_ga_folder,
                                           std_error_folder  = std_error_folder,
                                           diagnostic_folder = diagnostic_folder)
        e_time = time.time()

        print(f"{int(e_time-s_time)} seconds elapsed for processing {count} points in {i}th row: RMSE: {rmse}, ME: {me}, file exported to {file_loc}")
        # Keep the values in the same order as the header line
        csv_writer.writerow([row["WaterBody"],
                             row["Start Year"],
                             row["End Year"],
                             row["Seasons"],
                             row["Parameter"],
                             file_loc, count, rmse, me,
                             covariates_dict[waterbody]])
        if i % 10 == 0: csvfile.flush() # flush the csv file every 10 rows
#         seasons_all['RMSE'][i:i+1] = rmse
#         seasons_all['ME'][i:i+1] = me
#         seasons_all['NumDataPoints'][i:i+1] = count
#         seasons_all['Filename'][i:i+1] = file_loc
#     seasons_all.to_csv(gis_path+"result_RK_all.csv")