# Cross Validation for IDW & RK Interpolation 
## Task 2 (continuous & discrete) for four seasons

This notebook contains Python code that runs cross validation (CV) for Inverse Distance Weighting (IDW) interpolation and Regression Kriging (RK) in the arcpy environment. It covers six water quality parameters:
- Dissolved oxygen (DO_mgl)
- Salinity (Sal_ppt)
- Turbidity (Turb_ntu)
- Temperature (T_c)
- Secchi (Secc_m)
- Total Nitrogen (TN_mgl) 

The analysis is conducted separately for each of the following water bodies:
- Guana Tolomato Matanzas (GTM)
- Estero Bay (EB)
- Charlotte Harbor (CH)
- Biscayne Bay (BB)
- Big Bend Seagrasses (BBS)

**Tasks:**  

**Calculate the RMSE and Mean Error (ME) for IDW and RK results, incorporating both continuous and discrete data across four seasons (spring, summer, fall, and winter).**


<br>
<div style="text-align: left;">
    <img src="../misc/FourSeasons.png" style="display: block; margin-left: 0; margin-right: auto; width: 900px;"/>
</div>
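As a reference for the two metrics named in the task, the sketch below computes RMSE and ME from paired observed and predicted values. The function name `rmse_and_me` and the sign convention (predicted minus observed) are assumptions made for illustration; the `idw_rk` helpers used later may define the error the other way round.

```python
import math

def rmse_and_me(observed, predicted):
    """Return (RMSE, ME) for paired observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    # Assumed sign convention: error = predicted - observed
    errors = [p - o for o, p in zip(observed, predicted)]
    me = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return rmse, me

# Hypothetical values, for illustration only
rmse, me = rmse_and_me([1.0, 2.0, 3.0], [2.0, 2.0, 4.0])
print(f"RMSE = {rmse:.4f}, ME = {me:.4f}")
```

ME keeps the sign of the errors, so it indicates systematic over- or under-prediction, while RMSE measures the overall error magnitude.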


**Contents:**
* [1. Data Preprocess](#reg_preprocessing)
    * [1.1 Load csv files](#reg_preprocessing)
    * [1.2 Subsetting data](#reg_subset)
    * [1.3 Filter the data](#reg_studied)
    * [1.4 Calculating average values](#reg_average)
    * [1.5 Convert coordinate system](#reg_coordinate)
* [2. Prepare for batch interpolation](#reg_batch)
    * [2.1 Preset abbreviation](#reg_preset)
    * [2.2 Define the barrier files](#reg_barrier)
    * [2.3 Define waterbody boundary](#reg_boundary)
    * [2.4 Load the table of study periods,  parameters, and seasons](#reg_study)
    * [2.5 Define output folders](#reg_output)
    * [2.6 Fill NaN RowID with unique ID](#reg_id)
* [3. Create Shapefiles](#reg_create_shp)
* [4. Cross Validation for IDW](#reg_cv_idw)
* [5. RK Interpolation](#reg_rk)

## Loading packages

In [15]:
import pandas as pd
import numpy as np
import arcpy
from arcpy.sa import *
import os
import math
import csv

import importlib
import sys
# path = r'C:/Users/cong1/WQ/IDW/git/misc'
path = r'E:\Projects\SEACAR_WQ_2024\git\misc'

sys.path.insert(0, path)
import idw_rk
importlib.reload(idw_rk)

import pyproj

# Define a scratch folder to avoid overwriting by parallel threads
arcpy.env.scratchWorkspace = r"E:\Projects\SEACAR_WQ_2024\scratch/IDW_4s"

arcpy.env.overwriteOutput = True

## 1. Data Preprocessing <a class="anchor" id="reg_preprocessing"></a>
### 1.1 Load csv files

In [2]:
gis_path = r'E:/Projects/SEACAR_WQ_2024/GIS_Data/'

dfDis = pd.read_csv(gis_path + 'OEAT_Discrete_WQ-2024-May-06.csv', low_memory=False)
dfCon = pd.read_csv(gis_path + 'OEAT_Continuous_WQ-2024-Feb-21.csv', low_memory=False)

dfAll = pd.concat([dfDis, dfCon], ignore_index=True)

### 1.2 Subsetting Data <a class="anchor" id="reg_subset"></a>
#### Selecting data from 08:00 to 18:00 (daytime)

In [3]:
# Convert string to datetime
dfCon['SampleDate'] = pd.to_datetime(dfCon['SampleDate'], format='%Y-%m-%d %H:%M:%S.%f')
dfDis['SampleDate'] = pd.to_datetime(dfDis['SampleDate'], format='%Y-%m-%d %H:%M:%S.%f')

# Keep continuous samples taken between 08:00 and 18:00 (inclusive); discrete samples are not filtered
start_time = '08:00'
end_time = '18:00'

dfCon = dfCon[dfCon['SampleDate'].dt.time.between(pd.to_datetime(start_time).time(), pd.to_datetime(end_time).time())]

dfAll = pd.concat([dfDis, dfCon], ignore_index=True)

dfAll.head()

Unnamed: 0,RowID,ProgramID,ParameterName,ParameterUnits,ProgramLocationID,ActivityType,SampleDate,Year,Month,RelativeDepth,ResultValue,Latitude_DD,Longitude_DD,ManagedAreaName,AreaID,SEACAR_QAQCFlagCode,WaterBody,WbodyAcronym,Season
0,1,69,Secchi Depth,m,CKM2017100405,Field,2017-10-06,2017,10,,0.3,29.3221,-83.129866,Big Bend Seagrasses Aquatic Preserve,5,6Q,Big Bend Seagrasses,BBS,Fall
1,2,69,Secchi Depth,m,CKM2017080401,Field,2017-08-08,2017,8,Surface,0.6,29.145966,-83.07225,Big Bend Seagrasses Aquatic Preserve,5,6Q/9Q,Big Bend Seagrasses,BBS,Summer
2,3,69,Secchi Depth,m,CKM2017060703,Field,2017-06-19,2017,6,Surface,0.4,29.294516,-83.155316,Big Bend Seagrasses Aquatic Preserve,5,9Q/6Q,Big Bend Seagrasses,BBS,Summer
3,4,69,Secchi Depth,m,CKM2017060202,Field,2017-06-06,2017,6,Surface,0.4,29.1404,-83.01705,Big Bend Seagrasses Aquatic Preserve,5,6Q/9Q,Big Bend Seagrasses,BBS,Spring
4,5,69,Secchi Depth,m,CKM2017110804,Field,2017-11-14,2017,11,Surface,0.4,29.269566,-83.107283,Big Bend Seagrasses Aquatic Preserve,5,9Q/6Q,Big Bend Seagrasses,BBS,Fall
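The daytime filter above relies on `Series.between`, which includes both endpoints by default. The same rule can be sketched with the standard library alone; the helper name `is_daytime` and the sample timestamps are hypothetical.

```python
from datetime import datetime, time

def is_daytime(timestamp, start=time(8, 0), end=time(18, 0)):
    """True if the time of day lies in [start, end], both ends inclusive."""
    return start <= timestamp.time() <= end

# Hypothetical timestamps around the window edges
samples = [
    datetime(2017, 6, 1, 7, 59),
    datetime(2017, 6, 1, 8, 0),
    datetime(2017, 6, 1, 18, 0),
    datetime(2017, 6, 1, 18, 0, 1),
]
daytime_samples = [ts for ts in samples if is_daytime(ts)]
print(daytime_samples)
```

Only the two samples taken exactly at 08:00 and 18:00 pass; the samples one minute before and one second after the window are dropped.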


### 1.3 Filter the data<a class="anchor" id="reg_studied"></a>

In [4]:
# Load the table of four seasons definitions
seasons4 = pd.read_csv(gis_path + 'season_def/4 seasons.csv', low_memory=False)
seasons4

Unnamed: 0,WaterBody,SeasonNum,Season,Start Year,Start Month,Start Day,End Year,End Month,End Day,Start Date,End Date
0,Charlotte Harbor,1,Spring,2017,2,28,2017,6,11,2/28/2017,6/11/2017
1,Charlotte Harbor,2,Summer,2017,6,12,2017,9,11,6/12/2017,9/11/2017
2,Charlotte Harbor,3,Fall,2017,9,12,2017,11,28,9/12/2017,11/28/2017
3,Charlotte Harbor,4,Winter,2017,11,29,2018,2,27,11/29/2017,2/27/2018
4,Big Bend Seagrasses,1,Spring,2021,3,3,2021,6,7,3/3/2021,6/7/2021
5,Big Bend Seagrasses,2,Summer,2021,6,8,2021,9,7,6/8/2021,9/7/2021
6,Big Bend Seagrasses,3,Fall,2021,9,8,2021,12,2,9/8/2021,12/2/2021
7,Big Bend Seagrasses,4,Winter,2021,12,3,2022,3,2,12/3/2021,3/2/2022


In [5]:
# Keep only the records that fall within the season date ranges defined for each water body
selected_dfAllTime = idw_rk.filter_by_date_range(dfAll, seasons4)
selected_dfAllTime.head()

Unnamed: 0,RowID,ProgramID,ParameterName,ParameterUnits,ProgramLocationID,ActivityType,SampleDate,Year,Month,RelativeDepth,...,Latitude_DD,Longitude_DD,ManagedAreaName,AreaID,SEACAR_QAQCFlagCode,WaterBody,WbodyAcronym,Season,Start Date,End Date
1830,458,69,Secchi Depth,m,CHM2017091705,Field,2017-09-22,2017,9,Surface,...,26.59845,-82.129733,Pine Island Sound Aquatic Preserve,34,9Q/6Q,Charlotte Harbor,CH,Fall,2017-09-12,2017-11-28
2394,599,69,Secchi Depth,m,CHM2017111102,Field,2017-11-01,2017,11,Surface,...,26.840333,-82.269916,Gasparilla Sound-Charlotte Harbor Aquatic Pres...,18,6Q/9Q,Charlotte Harbor,CH,Fall,2017-09-12,2017-11-28
2549,638,69,Secchi Depth,m,CHM2017070702,Field,2017-07-12,2017,7,Surface,...,26.79385,-82.153633,Cape Haze Aquatic Preserve,9,6Q/9Q,Charlotte Harbor,CH,Summer,2017-06-12,2017-09-11
2582,646,69,Secchi Depth,m,CHM2017100707,Field,2017-10-17,2017,10,,...,26.775283,-82.1343,Gasparilla Sound-Charlotte Harbor Aquatic Pres...,18,6Q,Charlotte Harbor,CH,Fall,2017-09-12,2017-11-28
2586,647,69,Secchi Depth,m,CHM2017110607,Field,2017-11-14,2017,11,,...,26.884983,-82.09405,Gasparilla Sound-Charlotte Harbor Aquatic Pres...,18,6Q,Charlotte Harbor,CH,Fall,2017-09-12,2017-11-28
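The implementation of `idw_rk.filter_by_date_range` is not shown in this notebook. A minimal sketch of the idea, assuming that a record is kept when its sample date lies inside one of its water body's season windows (bounds inclusive), could look as follows. The function name `match_season` is hypothetical; the windows are copied from the Charlotte Harbor rows of the season table.

```python
from datetime import date

# Season windows copied from the Charlotte Harbor rows of the season table
SEASON_WINDOWS = [
    ("Charlotte Harbor", "Spring", date(2017, 2, 28), date(2017, 6, 11)),
    ("Charlotte Harbor", "Summer", date(2017, 6, 12), date(2017, 9, 11)),
    ("Charlotte Harbor", "Fall", date(2017, 9, 12), date(2017, 11, 28)),
    ("Charlotte Harbor", "Winter", date(2017, 11, 29), date(2018, 2, 27)),
]

def match_season(waterbody, sample_date, windows=SEASON_WINDOWS):
    """Return the season whose window contains sample_date, else None."""
    for wb, season, start, end in windows:
        # Assumption: both window bounds are inclusive
        if wb == waterbody and start <= sample_date <= end:
            return season
    return None

print(match_season("Charlotte Harbor", date(2017, 9, 22)))
print(match_season("Charlotte Harbor", date(2016, 1, 1)))
```

Records for which the lookup returns `None` fall outside every study period and would be discarded.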


### 1.4 Calculating average values at unique observation points<a class="anchor" id="reg_average"></a>

In [6]:
dfAll_Mean = selected_dfAllTime.groupby(['WaterBody','ParameterName','ParameterUnits','Season','Latitude_DD','Longitude_DD','WbodyAcronym'])["ResultValue"].agg("mean").reset_index()
dfAll = dfAll_Mean
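The `groupby` call collapses repeated measurements at the same location, parameter, and season into one mean value. A dependency-free sketch of the same aggregation is shown below; the helper name `mean_by_key` and the sample records are hypothetical, and only a subset of the grouping columns is used to keep the example short.

```python
from collections import defaultdict

def mean_by_key(records, key_fields, value_field="ResultValue"):
    """Average value_field over all records that share the same key fields."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[f] for f in key_fields)].append(rec[value_field])
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}

# Hypothetical records: two samples at one station, one at another
records = [
    {"Season": "Fall", "Latitude_DD": 29.0, "Longitude_DD": -82.8, "ResultValue": 6.0},
    {"Season": "Fall", "Latitude_DD": 29.0, "Longitude_DD": -82.8, "ResultValue": 7.0},
    {"Season": "Fall", "Latitude_DD": 29.1, "Longitude_DD": -82.9, "ResultValue": 5.0},
]
means = mean_by_key(records, ["Season", "Latitude_DD", "Longitude_DD"])
print(means)
```

Averaging first ensures each interpolation point appears once per season, which matters because IDW assigns a weight to every input point.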

### 1.5 Convert coordinate system to EPSG: 3086<a class="anchor" id="reg_coordinate"></a>

In [7]:
# Define the EPSG codes for source (EPSG:4326) and target (EPSG:3086) coordinate systems
source_epsg = 'EPSG:4326'
target_epsg = 'EPSG:3086'

# Create a PyProj Transformer for the conversion
transformer = pyproj.Transformer.from_crs(source_epsg, target_epsg, always_xy=True)

# Define a function to apply the transformation to each row of the DataFrame
def transform_coordinates(row):
    x, y = transformer.transform(row['Longitude_DD'], row['Latitude_DD'])
    return pd.Series({'x': x, 'y': y})

# Apply the transformation function to the DataFrame and create new columns for the converted coordinates
dfAll[['x', 'y']] = dfAll.apply(transform_coordinates, axis=1)

#### Save aggregated data to csv file

In [8]:
dfAll.to_csv(gis_path + 'OEAT_4Seasons_All_WQ-2024-May-02.csv', index=False)

## 2. Prepare for batch interpolation<a class="anchor" id="reg_batch"></a>
### 2.1 Preset abbreviation for waterbody and parameter name<a class="anchor" id="reg_preset"></a>

In [9]:
area_shortnames = {
    'Guana Tolomato Matanzas': 'GTM',
    'Estero Bay': 'EB',
    'Charlotte Harbor': 'CH',
    'Biscayne Bay': 'BB',
    'Big Bend Seagrasses':'BBS'
}

param_shortnames = {
    'Salinity': 'Sal_ppt',
    'Total Nitrogen': 'TN_mgl',
    'Dissolved Oxygen': 'DO_mgl',
    'Turbidity':'Turb_ntu',
    'Secchi Depth':'Secc_m',
    'Water Temperature':'T_c'
}

covariates_dict = {
    "GTM":"LDI",
    "EB":"bathymetry+LDI+popden",
    "CH":"bathymetry+LDI+popden+water_flow_wet",
    "BB":"bathymetry+LDI+popden",
    "BBS":"bathymetry+LDI"
}
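The abbreviations feed the file names used throughout the batch steps. The sketch below builds a shapefile name from a full water body and parameter name; the pattern `SHP_<area>_<param>_<year>_<season>.shp` is inferred from the log messages printed in Section 3, and the helper name `shapefile_name` is hypothetical. Only two entries of each dictionary are repeated here to keep the block self-contained.

```python
# Subsets of the dictionaries defined above, repeated for self-containment
area_shortnames = {"Big Bend Seagrasses": "BBS", "Charlotte Harbor": "CH"}
param_shortnames = {"Dissolved Oxygen": "DO_mgl", "Total Nitrogen": "TN_mgl"}

def shapefile_name(waterbody, parameter, year, season):
    """Build a shapefile name following the pattern seen in the log output."""
    area = area_shortnames[waterbody]    # raises KeyError for unknown names
    param = param_shortnames[parameter]
    return f"SHP_{area}_{param}_{year}_{season}.shp"

print(shapefile_name("Big Bend Seagrasses", "Dissolved Oxygen", 2021, "Fall"))
```

Using plain dictionary indexing instead of `.get` makes a misspelled water body or parameter fail immediately rather than produce a file name containing `None`.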

### 2.2 Define the barrier files<a class="anchor" id="reg_barrier"></a>

In [10]:
barrier_folder = os.path.join(gis_path, 'Barriers')
barrier_folder

barriers = []
for file in os.listdir(barrier_folder):
    if file.endswith(".shp"):
        barriers.append(os.path.join(barrier_folder, file))

for barrier in barriers:
    print(barrier)

E:/Projects/SEACAR_WQ_2024/GIS_Data/Barriers\BBS_Barriers.shp
E:/Projects/SEACAR_WQ_2024/GIS_Data/Barriers\BB_Barriers.shp
E:/Projects/SEACAR_WQ_2024/GIS_Data/Barriers\CH_Barriers.shp
E:/Projects/SEACAR_WQ_2024/GIS_Data/Barriers\EB_Barriers.shp
E:/Projects/SEACAR_WQ_2024/GIS_Data/Barriers\GTM_Barriers.shp
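Because each barrier file is named after its water body acronym, the list can be turned into a lookup table. This is a sketch under that naming assumption; the helper name `barriers_by_waterbody` is hypothetical, and `ntpath` is used because the printed paths mix forward slashes and backslashes.

```python
import ntpath

def barriers_by_waterbody(paths, suffix="_Barriers.shp"):
    """Map each water body acronym to its barrier shapefile path."""
    mapping = {}
    for path in paths:
        # ntpath.basename splits on both "/" and "\" on any platform
        name = ntpath.basename(path)
        if name.endswith(suffix):
            mapping[name[: -len(suffix)]] = path
    return mapping

paths = [
    "E:/Projects/SEACAR_WQ_2024/GIS_Data/Barriers\\BBS_Barriers.shp",
    "E:/Projects/SEACAR_WQ_2024/GIS_Data/Barriers\\BB_Barriers.shp",
]
lookup = barriers_by_waterbody(paths)
print(sorted(lookup))
```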


### 2.3 Define waterbody boundary for spatial extent and masking<a class="anchor" id="reg_boundary"></a>

In [11]:
waterbody_extent = os.path.join(gis_path, 'OEAT_Waterbody_Boundaries', 'OEAT_Waterbody_Boundary.shp')

unique_waterbodies = []
with arcpy.da.SearchCursor(waterbody_extent, ['WaterbodyA']) as cursor:
    for row in cursor:
        unique_waterbodies.append(row[0])

print("Unique Waterbodies:", unique_waterbodies)

Unique Waterbodies: ['BBS', 'BB', 'CH', 'EB', 'GTM']


### 2.4 Load the table of study periods,  parameters, and seasons<a class="anchor" id="reg_study"></a>

In [12]:
seasons_all = pd.read_csv(gis_path + 'season_def/FourSeasons_all.csv', low_memory=False)
seasons_all

Unnamed: 0,WaterBody,Start Year,End Year,Season,Parameter,Filename,NumDataPoints,RMSE,ME
0,Charlotte Harbor,2017,2017,Spring,Total Nitrogen,,,,
1,Charlotte Harbor,2017,2017,Summer,Total Nitrogen,,,,
2,Charlotte Harbor,2017,2017,Fall,Total Nitrogen,,,,
3,Charlotte Harbor,2017,2018,Winter,Total Nitrogen,,,,
4,Charlotte Harbor,2017,2017,Spring,Salinity,,,,
5,Charlotte Harbor,2017,2017,Summer,Salinity,,,,
6,Charlotte Harbor,2017,2017,Fall,Salinity,,,,
7,Charlotte Harbor,2017,2018,Winter,Salinity,,,,
8,Charlotte Harbor,2017,2017,Spring,Dissolved Oxygen,,,,
9,Charlotte Harbor,2017,2017,Summer,Dissolved Oxygen,,,,


### 2.5 Define output folders<a class="anchor" id="reg_output"></a>

In [13]:
# shpAll_folder = gis_path + r"shapefiles_2seasons" 
# idwAll_folder = gis_path + r"raster_idw_2seasons"

shpAll_folder = gis_path + r"shapefiles/FourSeasons_All" 
idwAll_folder = gis_path + r"raster_output/FourSeasons_IDW_All"

# Preview dataset
dfAll

Unnamed: 0,WaterBody,ParameterName,ParameterUnits,Season,Latitude_DD,Longitude_DD,WbodyAcronym,ResultValue,x,y
0,Big Bend Seagrasses,Dissolved Oxygen,mg/L,Fall,29.008300,-82.825250,BBS,6.873333,514236.629541,556316.261436
1,Big Bend Seagrasses,Dissolved Oxygen,mg/L,Fall,29.125000,-82.841666,BBS,7.180000,512518.602025,569259.744247
2,Big Bend Seagrasses,Dissolved Oxygen,mg/L,Fall,29.149500,-83.079500,BBS,7.225000,489395.986664,571785.532572
3,Big Bend Seagrasses,Dissolved Oxygen,mg/L,Fall,29.161167,-83.047333,BBS,7.110000,492509.872329,573104.729928
4,Big Bend Seagrasses,Dissolved Oxygen,mg/L,Fall,29.162167,-82.810500,BBS,6.675000,515506.111607,573415.523716
...,...,...,...,...,...,...,...,...,...,...
7141,Charlotte Harbor,Water Temperature,Degrees C,Winter,26.956683,-82.073533,CH,23.700000,590890.442311,329432.468353
7142,Charlotte Harbor,Water Temperature,Degrees C,Winter,26.957016,-82.178716,CH,16.900000,580468.265764,329311.057959
7143,Charlotte Harbor,Water Temperature,Degrees C,Winter,26.958566,-82.178250,CH,16.500000,580511.895379,329483.813243
7144,Charlotte Harbor,Water Temperature,Degrees C,Winter,26.960560,-82.112840,CH,19.800000,586989.296420,329802.643699


### 2.6 Fill NaN RowID with unique IDs (required by the IDW function) <a class="anchor" id="reg_id"></a>

In [14]:
idw_rk.fill_nan_rowids(dfAll, 'RowID')

# Keep RowID as integer
dfAll['RowID'] = dfAll['RowID'].astype(int)
dfAll.head()

Unnamed: 0,WaterBody,ParameterName,ParameterUnits,Season,Latitude_DD,Longitude_DD,WbodyAcronym,ResultValue,x,y,RowID
0,Big Bend Seagrasses,Dissolved Oxygen,mg/L,Fall,29.0083,-82.82525,BBS,6.873333,514236.629541,556316.261436,1
1,Big Bend Seagrasses,Dissolved Oxygen,mg/L,Fall,29.125,-82.841666,BBS,7.18,512518.602025,569259.744247,2
2,Big Bend Seagrasses,Dissolved Oxygen,mg/L,Fall,29.1495,-83.0795,BBS,7.225,489395.986664,571785.532572,3
3,Big Bend Seagrasses,Dissolved Oxygen,mg/L,Fall,29.161167,-83.047333,BBS,7.11,492509.872329,573104.729928,4
4,Big Bend Seagrasses,Dissolved Oxygen,mg/L,Fall,29.162167,-82.8105,BBS,6.675,515506.111607,573415.523716,5
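The body of `idw_rk.fill_nan_rowids` is not shown here. One way such a helper could work, assuming missing IDs are replaced with the smallest positive integers not already in use, is sketched below. The function name `fill_missing_ids` is hypothetical.

```python
import math

def fill_missing_ids(ids):
    """Replace None/NaN entries with unused positive integers."""
    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    used = {int(v) for v in ids if not is_missing(v)}
    next_id = 1
    filled = []
    for v in ids:
        if is_missing(v):
            # Advance to the next integer that no row uses yet
            while next_id in used:
                next_id += 1
            used.add(next_id)
            filled.append(next_id)
        else:
            filled.append(int(v))
    return filled

print(fill_missing_ids([3, float("nan"), None, 1]))
```

Existing IDs are left unchanged, so rows that already carry a `RowID` can still be traced back to the source tables.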


## 3. Create Shapefiles <a class="anchor" id="reg_create_shp"></a>

In [16]:
# Merge the study-period table with the coordinate, RowID, and ResultValue columns
seasons_all_coord = idw_rk.merge_with_lat_long_new(seasons_all, dfAll, "Season")
seasons_all_coord

Unnamed: 0,WaterBody,Start Year,End Year,Season,Parameter,Filename,NumDataPoints,RMSE,ME,x,y,RowID,ResultValue
0,Charlotte Harbor,2017,2017,Spring,Total Nitrogen,,,,,591484.839075,272735.047067,5267,0.870
1,Charlotte Harbor,2017,2017,Spring,Total Nitrogen,,,,,589465.802362,275500.164081,5268,0.780
2,Charlotte Harbor,2017,2017,Spring,Total Nitrogen,,,,,582451.183526,278043.792487,5269,0.594
3,Charlotte Harbor,2017,2017,Spring,Total Nitrogen,,,,,584823.590495,278587.753297,5270,0.690
4,Charlotte Harbor,2017,2017,Spring,Total Nitrogen,,,,,585413.995613,280195.521648,5271,0.780
...,...,...,...,...,...,...,...,...,...,...,...,...,...
7141,Big Bend Seagrasses,2021,2022,Winter,Water Temperature,,,,,371688.784902,691953.259259,1194,21.100
7142,Big Bend Seagrasses,2021,2022,Winter,Water Temperature,,,,,371015.252225,692080.963200,1195,20.700
7143,Big Bend Seagrasses,2021,2022,Winter,Water Temperature,,,,,401894.709931,699334.789795,1196,19.200
7144,Big Bend Seagrasses,2021,2022,Winter,Water Temperature,,,,,401457.471573,702258.561096,1197,20.100


In [23]:
idw_rk.create_shp_season_new(seasons_all_coord, "Season", shpAll_folder, start_year_included=True)

Number of data rows for BBS, DO_mgl, 2021, Fall: 39
Shapefile for BBS, DO_mgl for 2021 and season Fall has been saved as SHP_BBS_DO_mgl_2021_Fall.shp
Number of data rows for BBS, Sal_ppt, 2021, Fall: 30
Shapefile for BBS, Sal_ppt for 2021 and season Fall has been saved as SHP_BBS_Sal_ppt_2021_Fall.shp
Number of data rows for BBS, Secc_m, 2021, Fall: 31
Shapefile for BBS, Secc_m for 2021 and season Fall has been saved as SHP_BBS_Secc_m_2021_Fall.shp
Number of data rows for BBS, TN_mgl, 2021, Fall: 34
Shapefile for BBS, TN_mgl for 2021 and season Fall has been saved as SHP_BBS_TN_mgl_2021_Fall.shp
Number of data rows for BBS, Turb_ntu, 2021, Fall: 35
Shapefile for BBS, Turb_ntu for 2021 and season Fall has been saved as SHP_BBS_Turb_ntu_2021_Fall.shp
Number of data rows for BBS, T_c, 2021, Fall: 39
Shapefile for BBS, T_c for 2021 and season Fall has been saved as SHP_BBS_T_c_2021_Fall.shp
Number of data rows for BBS, DO_mgl, 2021, Spring: 48
Shapefile for BBS, DO_mgl for 2021 and season 

## 4. Cross Validation for IDW <a class="anchor" id="reg_cv_idw"></a>

In [24]:
# Empty the IDW output folder (uncomment to run)
# idw_rk.delete_all_files(idwAll_folder)

In [20]:
# Select a section of table to process
seasons_slct = seasons_all.iloc[:]
# seasons_slct.drop(seasons_slct[seasons_slct['WaterBody'] == 'Charlotte Harbor'].index, inplace=True)
# seasons_slct = seasons_slct.reset_index()

In [21]:
seasons_slct

Unnamed: 0,WaterBody,Start Year,End Year,Season,Parameter,Filename,NumDataPoints,RMSE,ME
0,Charlotte Harbor,2017,2017,Spring,Total Nitrogen,,,,
1,Charlotte Harbor,2017,2017,Summer,Total Nitrogen,,,,
2,Charlotte Harbor,2017,2017,Fall,Total Nitrogen,,,,
3,Charlotte Harbor,2017,2018,Winter,Total Nitrogen,,,,
4,Charlotte Harbor,2017,2017,Spring,Salinity,,,,
5,Charlotte Harbor,2017,2017,Summer,Salinity,,,,
6,Charlotte Harbor,2017,2017,Fall,Salinity,,,,
7,Charlotte Harbor,2017,2018,Winter,Salinity,,,,
8,Charlotte Harbor,2017,2017,Spring,Dissolved Oxygen,,,,
9,Charlotte Harbor,2017,2017,Summer,Dissolved Oxygen,,,,


In [27]:
# IDW is skipped for shapefiles with fewer than 3 data points
idw_rk.idw_interpolation_new(seasons_slct, shpAll_folder, idwAll_folder, waterbody_extent, barrier_folder, "Season", include_start_year=True)

Processing file: SHP_BBS_TN_mgl_2021_Spring.shp
File SHP_BBS_TN_mgl_2021_Spring.shp has completed 37 cross-validation iterations.
Processing file: SHP_BBS_TN_mgl_2021_Summer.shp
File SHP_BBS_TN_mgl_2021_Summer.shp has completed 33 cross-validation iterations.
Processing file: SHP_BBS_TN_mgl_2021_Fall.shp
File SHP_BBS_TN_mgl_2021_Fall.shp has completed 34 cross-validation iterations.
Processing file: SHP_BBS_TN_mgl_2021_Winter.shp
File SHP_BBS_TN_mgl_2021_Winter.shp has completed 33 cross-validation iterations.
Processing file: SHP_BBS_Sal_ppt_2021_Spring.shp
File SHP_BBS_Sal_ppt_2021_Spring.shp has completed 37 cross-validation iterations.
Processing file: SHP_BBS_Sal_ppt_2021_Summer.shp
File SHP_BBS_Sal_ppt_2021_Summer.shp has completed 39 cross-validation iterations.
Processing file: SHP_BBS_Sal_ppt_2021_Fall.shp
File SHP_BBS_Sal_ppt_2021_Fall.shp has completed 30 cross-validation iterations.
Processing file: SHP_BBS_Sal_ppt_2021_Winter.shp
File SHP_BBS_Sal_ppt_2021_Winter.shp has co
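The log reports one cross-validation iteration per data point, which is consistent with leave-one-out cross validation. The sketch below illustrates that procedure with a plain IDW predictor. It is not the arcpy-based routine in `idw_rk`: it ignores barriers and search neighborhoods, the power of 2 is an assumed default, and the names `idw_predict` and `loocv_idw` are hypothetical. The minimum of 3 points follows the comment in the cell above.

```python
import math

def idw_predict(x, y, samples, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from (x, y, value) samples."""
    num = den = 0.0
    for sx, sy, value in samples:
        dist = math.hypot(x - sx, y - sy)
        if dist == 0.0:
            return value  # coincident sample: return its value directly
        weight = 1.0 / dist ** power
        num += weight * value
        den += weight
    return num / den

def loocv_idw(points, power=2.0, min_points=3):
    """Leave-one-out CV; returns (RMSE, ME, n) or None if too few points."""
    if len(points) < min_points:
        return None
    errors = []
    for i, (x, y, observed) in enumerate(points):
        rest = points[:i] + points[i + 1:]  # hold out point i
        errors.append(idw_predict(x, y, rest, power) - observed)
    n = len(errors)
    me = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return rmse, me, n

# Hypothetical points on a line with linearly increasing values
points = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (2.0, 0.0, 3.0)]
print(loocv_idw(points))
```

In this example the two end points are over- and under-predicted by the same amount, so ME is close to zero while RMSE is not, which shows why both metrics are reported.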

## 5. RK Interpolation<a class="anchor" id="reg_rk"></a>

### Define output folder

In [30]:
# out_raster_folder = gis_path + r"rk_folder/FourSeasons_RK_all/"
# out_ga_folder     = gis_path + r"rk_folder/ga_output_rk_4s/"
# diagnostic_folder = gis_path + r"rk_folder/diagnostic_rk_4s/"
# std_error_folder  = gis_path + r"rk_folder/std_error_pred_4s/std_error_rk_4s/"

out_raster_folder = gis_path + r"raster_output/FourSeasons_RK_all/"
out_ga_folder     = gis_path + r"ga_output_rk/"
diagnostic_folder = gis_path + r"diagnostic_rk"
std_error_folder  = gis_path + r"std_error_pred/std_error_rk_4s/"

# Clean existing files in folders
idw_rk.delete_all_files(out_raster_folder)
idw_rk.delete_all_files(out_ga_folder)
idw_rk.delete_all_files(diagnostic_folder)
idw_rk.delete_all_files(std_error_folder)

In [31]:
# covariates_dict is keyed by acronym, so map the full water body name to its acronym first
seasons_all['covariates'] = seasons_all['WaterBody'].apply(
    lambda x: covariates_dict.get(area_shortnames.get(x), 'default_covariate'))

rk_csv = gis_path + "rk_4s.csv"
seasons_all.to_csv(rk_csv, index=False, encoding='utf-8-sig')
seasons_all.head()

Unnamed: 0,WaterBody,Start Year,End Year,Season,Parameter,Filename,NumDataPoints,RMSE,ME,covariates
0,Charlotte Harbor,2017,2017,Spring,Total Nitrogen,,,,,bathymetry+LDI+popden+water_flow_wet
1,Charlotte Harbor,2017,2017,Summer,Total Nitrogen,,,,,bathymetry+LDI+popden+water_flow_wet
2,Charlotte Harbor,2017,2017,Fall,Total Nitrogen,,,,,bathymetry+LDI+popden+water_flow_wet
3,Charlotte Harbor,2017,2018,Winter,Total Nitrogen,,,,,bathymetry+LDI+popden+water_flow_wet
4,Charlotte Harbor,2017,2017,Spring,Salinity,,,,,bathymetry+LDI+popden+water_flow_wet
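`covariates_dict` is keyed by acronym, while the `WaterBody` column holds full names, so a lookup has to pass through `area_shortnames` first. The sketch below shows that two-step lookup and splits the result into individual covariate names. Treating `+` as the separator between covariates is an assumption based on the dictionary values, and the helper name `covariates_for` is hypothetical.

```python
# Subsets of the dictionaries from Section 2.1, repeated for self-containment
area_shortnames = {"Charlotte Harbor": "CH", "Big Bend Seagrasses": "BBS"}
covariates_dict = {
    "CH": "bathymetry+LDI+popden+water_flow_wet",
    "BBS": "bathymetry+LDI",
}

def covariates_for(waterbody_name):
    """Return the list of covariate names for a full water body name."""
    acronym = area_shortnames.get(waterbody_name)
    spec = covariates_dict.get(acronym)
    # Assumption: covariate names are joined with "+"
    return spec.split("+") if spec else []

print(covariates_for("Charlotte Harbor"))
print(covariates_for("Unknown Bay"))
```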


In [None]:
import time  # needed for the per-row timing below

with open(gis_path + "rk_4s.csv", 'w', newline='') as csvfile:
    csv_writer = csv.writer(csvfile)
    # Determine if year should be included in the output based on a condition
    start_year_included = True

    # Write the header line based on whether the year is included
    cols = list(seasons_all.columns)
    if not start_year_included:
        cols.remove('Start Year')
        cols.remove('End Year')
    csv_writer.writerow(cols)
    
    for i in seasons_all.index:
        s_time = time.time()
        process, rmse, me, count, file_loc = idw_rk.rk_interpolation_new(
            method="rk",
            radius=10000,
            folder_path=gis_path,
            waterbody=area_shortnames[seasons_all.iloc[i]["WaterBody"]],
            parameter=param_shortnames[seasons_all.iloc[i]["Parameter"]],
            year=seasons_all.iloc[i]["Start Year"],
            season=seasons_all.iloc[i]['Season'],
            covariates=covariates_dict[area_shortnames[seasons_all.iloc[i]["WaterBody"]]],
            out_raster_folder=out_raster_folder,
            out_ga_folder=out_ga_folder,
            std_error_folder=std_error_folder,
            diagnostic_folder=diagnostic_folder,
            shapefile_folder_name=gis_path+r"\shapefiles\FourSeasons_All",
            start_year_included=start_year_included  # Pass the variable to the function
        )
        e_time = time.time()

        # Write data row, conditionally include year based on the setting
        data_row = [
            seasons_all.iloc[i]["WaterBody"], 
            seasons_all.iloc[i]['Season'],
            seasons_all.iloc[i]["Parameter"],
            file_loc, count, rmse, me,
            covariates_dict[area_shortnames[seasons_all.iloc[i]["WaterBody"]]]
        ]
        if start_year_included:
            data_row.insert(1, seasons_all.iloc[i]["Start Year"])
            data_row.insert(2, seasons_all.iloc[i]["End Year"])

        print(f"{int(e_time - s_time)} seconds elapsed for processing {count} points in {i}th row: RMSE: {rmse}, ME: {me}, file exported to {file_loc}")
        csv_writer.writerow(data_row)
        if i % 10 == 0:
            csvfile.flush()  # Flush the csv file every 10 rows.

Processing file: SHP_CH_TN_mgl_2017_Spring.shp
--- Time lapse: 1037.6216125488281 seconds ---
1038 seconds elapsed for processing 0 points in 0th row: RMSE: 0.282670778246, ME: 0.00631307578716, file exported to E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_output/FourSeasons_RK_all/CH_TN_mgl_2017_Spring_RK.tif
Processing file: SHP_CH_TN_mgl_2017_Summer.shp
--- Time lapse: 787.626387834549 seconds ---
788 seconds elapsed for processing 0 points in 1th row: RMSE: 0.111242486399, ME: -0.000411451659123, file exported to E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_output/FourSeasons_RK_all/CH_TN_mgl_2017_Summer_RK.tif
Processing file: SHP_CH_TN_mgl_2017_Fall.shp
--- Time lapse: 966.3231992721558 seconds ---
966 seconds elapsed for processing 0 points in 2th row: RMSE: 0.194217073213, ME: -0.00301999152275, file exported to E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_output/FourSeasons_RK_all/CH_TN_mgl_2017_Fall_RK.tif
Processing file: SHP_CH_TN_mgl_2017_Winter.shp
--- Time lapse: 1072.78054332