# Cross Validation for IDW Interpolation 
## Task 2A (IDW for continuous data only)

This notebook contains Python code that runs cross validation (CV) for Inverse Distance Weighting (IDW) interpolation in the arcpy environment. It covers six water quality parameters:
- Dissolved oxygen (DO_mgl)
- Salinity (Sal_ppt)
- Turbidity (Turb_ntu)
- Temperature (T_c)
- Secchi (Secc_m)
- Total Nitrogen (TN_mgl) 

The analysis is run separately for each of the following water bodies:
- Guana Tolomato Matanzas (GTM)
- Estero Bay (EB)
- Charlotte Harbor (CH)
- Biscayne Bay (BB)
- Big Bend Seagrasses (BBS)

**Tasks:**  

- **Task 2A Calculate the RMSE and Mean Error (ME) for IDW results using both continuous and discrete data**

- Task 2B Calculate the RMSE and Mean Error (ME) for IDW results using continuous data.
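
The two accuracy measures named in the tasks can be sketched in plain Python. This is a minimal illustration, not the code used by `idw_rk`: the function names and sample values are made up, and the sign convention (predicted minus observed) is an assumption, since the notebook does not state which one it uses.

```python
import math

def mean_error(observed, predicted):
    """Mean Error (ME): average residual; its sign indicates bias.
    Assumed convention: residual = predicted - observed."""
    residuals = [p - o for o, p in zip(observed, predicted)]
    return sum(residuals) / len(residuals)

def rmse(observed, predicted):
    """Root Mean Square Error: square root of the mean squared residual."""
    squared = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Made-up values, for illustration only
observed = [6.0, 7.0, 8.0, 5.0]
predicted = [6.5, 6.5, 8.5, 5.5]

me_value = mean_error(observed, predicted)
rmse_value = rmse(observed, predicted)
```

ME can be close to zero even when individual errors are large, because positive and negative residuals cancel. RMSE does not cancel, which is why the two are usually reported together.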

Time periods (seasons) one year before and one year after the storm event, as used in the Task 2A tests.
<br>
<div style="text-align: left;">
    <img src="misc/TimePeriods.png" style="display: block; margin-left: 0; margin-right: auto; width: 600px;"/>
</div>

Summary of IDW and RK Accuracy Assessments.
<br>
<div style="text-align: left;">
    <img src="misc/Table3.png" style="display: block; margin-left: 0; margin-right: auto; width: 600px;"/>
</div>

**Contents:**
* [1. Data Preprocess](#reg_preprocessing)
    * [1.1 Subsetting Dataset](#reg_subset)
    * [1.2 Preview Dataset](#reg_preview)
    * [1.3 Fill Unique ID](#reg_id)
* [2. Create Shapefile](#reg_create_shp)
* [3. Cross Validation for IDW](#reg_cv_idw)

# Loading packages

In [2]:
import pandas as pd
import numpy as np
import arcpy
from arcpy.sa import *
import os
import math

import importlib
import sys
# path = r'C:/Users/cong1/WQ/IDW/git/misc'
path = r'E:\Projects\SEACAR_WQ_2024\git\misc'

sys.path.insert(0, path)
import idw_rk
importlib.reload(idw_rk)

import pyproj

# Define a scratch folder to avoid overwriting by parallel threads
arcpy.env.scratchWorkspace = r"E:\Projects\SEACAR_WQ_2024\scratch/IDW_all"

# 1. Data Preprocessing <a class="anchor" id="reg_preprocessing"></a>

## 1.1 Load csv files

In [None]:
gis_path = r'E:\Projects\SEACAR_WQ_2024/GIS_Data/'
dfCon = pd.read_csv(gis_path + 'OEAT_Continuous_WQ-2024-Jan-16.csv', low_memory=False)

## 1.2 Subsetting Data <a class="anchor" id="reg_subset"></a>

### Selecting data from 09:00 to 17:00 (daytime)

In [None]:
dfCon['SampleDate'] = pd.to_datetime(dfCon['SampleDate'], format='%b %d %Y %I:%M%p')

In [None]:
# Keep samples taken between 09:00 and 17:00 (both bounds inclusive)
start_time = '09:00'
end_time = '17:00'

dfConTime = dfCon[dfCon['SampleDate'].dt.time.between(pd.to_datetime(start_time).time(), pd.to_datetime(end_time).time())]
dfConTime.head()
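
The filter above relies on `Series.between`, which includes both bounds by default. A minimal sketch on made-up timestamps shows which rows survive. The sample values are invented, and `datetime.time` objects are used here instead of parsing the bounds with `pd.to_datetime`.

```python
from datetime import time

import pandas as pd

# Made-up timestamps in the same text format the notebook parses
samples = pd.DataFrame({
    "SampleDate": ["Jan 05 2023 08:59AM", "Jan 05 2023 09:00AM",
                   "Jan 05 2023 12:30PM", "Jan 05 2023 05:00PM",
                   "Jan 05 2023 05:01PM"],
    "ResultValue": [6.1, 6.3, 6.8, 7.0, 6.9],
})
samples["SampleDate"] = pd.to_datetime(samples["SampleDate"], format="%b %d %Y %I:%M%p")

# Series.between is inclusive on both ends by default
is_daytime = samples["SampleDate"].dt.time.between(time(9, 0), time(17, 0))
daytime = samples[is_daytime]
kept_values = daytime["ResultValue"].tolist()
```

Samples stamped exactly 09:00 and 17:00 are kept, while 08:59 and 17:01 are dropped.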

## 1.3 Calculating average values at unique observation points

In [None]:
dfCon_Mean = dfConTime.groupby(['WaterBody','ParameterName','ParameterUnits', 'Year','Season','Latitude_DD','Longitude_DD','WbodyAcronym'])["ResultValue"].agg("mean").reset_index()
dfCon_Mean
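
The averaging step can be illustrated on a toy table. This is only a sketch: the readings are made up and use a subset of the grouping columns from the cell above.

```python
import pandas as pd

# Made-up readings: two at one station, one at another
readings = pd.DataFrame({
    "ParameterName": ["Salinity", "Salinity", "Salinity"],
    "Year": [2023, 2023, 2023],
    "Season": ["Winter", "Winter", "Winter"],
    "Latitude_DD": [29.90, 29.90, 30.05],
    "Longitude_DD": [-81.30, -81.30, -81.35],
    "ResultValue": [30.0, 32.0, 28.0],
})

# One output row per unique combination of the key columns
keys = ["ParameterName", "Year", "Season", "Latitude_DD", "Longitude_DD"]
station_means = readings.groupby(keys)["ResultValue"].mean().reset_index()
mean_values = station_means["ResultValue"].tolist()
```

Note that `groupby` drops rows with a missing value in any key column by default (`dropna=True`), so records lacking a season or coordinate would not appear in the averaged table.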

## 1.4 Convert coordinate system to EPSG:3086

In [None]:
# Define the EPSG codes for source (EPSG:4326) and target (EPSG:3086) coordinate systems
source_epsg = 'EPSG:4326'
target_epsg = 'EPSG:3086'

# Create a PyProj Transformer for the conversion
transformer = pyproj.Transformer.from_crs(source_epsg, target_epsg, always_xy=True)

# Transform all coordinates in one vectorized call and store them as new columns
dfCon_Mean['x'], dfCon_Mean['y'] = transformer.transform(
    dfCon_Mean['Longitude_DD'].to_numpy(), dfCon_Mean['Latitude_DD'].to_numpy()
)

## 2. Prepare for batch interpolation
### 2.1 Preset abbreviation for waterbody and parameter name

In [None]:
area_shortnames = {
    'Guana Tolomato Matanzas': 'GTM',
    'Estero Bay': 'EB',
    'Charlotte Harbor': 'CH',
    'Biscayne Bay': 'BB',
    'Big Bend Seagrasses':'BBS'
}

param_shortnames = {
    'Salinity': 'Sal_ppt',
    'Total Nitrogen': 'TN_mgl',
    'Dissolved Oxygen': 'DO_mgl',
    'Turbidity':'Turb_ntu',
    'Secchi Depth':'Secc_m',
    'Water Temperature':'T_c'
}

### 2.2 Define the barrier files

In [None]:
barrier_folder = os.path.join(gis_path, 'Barriers')
barrier_folder

barriers = []
for file in os.listdir(barrier_folder):
    if file.endswith(".shp"):
        barriers.append(os.path.join(barrier_folder, file))

for barrier in barriers:
    print(barrier)

### 2.3 Define waterbody boundary for spatial extent and masking

In [None]:
waterbody_extent = os.path.join(gis_path, 'OEAT_Waterbody_Boundaries', 'OEAT_Waterbody_Boundary.shp')

unique_waterbodies = []
with arcpy.da.SearchCursor(waterbody_extent, ['WaterbodyA']) as cursor:
    for row in cursor:
        unique_waterbodies.append(row[0])

print("Unique Waterbodies:", unique_waterbodies)

### 2.4 Load the table of study periods, parameters, and seasons

In [None]:
seasons_con = pd.read_csv(gis_path + 'Seasons_con.csv', low_memory=False)

### 2.5 Define output folders

In [None]:
shpCon_folder = gis_path + r"shapefiles_Con"
idwCon_folder = gis_path + r"idw_Con"

# Preview data
dfCon_Mean

### 2.6 Fill NaN RowID values with unique IDs (the IDW function requires a unique ID per point) <a class="anchor" id="reg_id"></a>

In [None]:
idw_rk.fill_nan_rowids(dfCon_Mean, 'RowID')
dfCon_Mean
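
The internals of `idw_rk.fill_nan_rowids` are not shown in this notebook. As a rough idea of what filling missing IDs can look like, here is a hypothetical stand-in. The function name `fill_missing_ids`, its numbering scheme, and the sample table are all assumptions made for illustration.

```python
import pandas as pd

def fill_missing_ids(df, id_column):
    """Hypothetical stand-in: assign a new unique integer to every row lacking an ID."""
    if id_column not in df.columns:
        df[id_column] = float("nan")  # column absent: treat every ID as missing
    ids = pd.to_numeric(df[id_column], errors="coerce").copy()
    missing = ids.isna()
    # Continue numbering after the largest existing ID (start at 1 if none exist)
    next_id = 1 if missing.all() else int(ids.max()) + 1
    ids.loc[missing] = list(range(next_id, next_id + int(missing.sum())))
    df[id_column] = ids.astype("int64")
    return df

# Made-up table with two missing IDs
demo = pd.DataFrame({"RowID": [10, None, 12, None], "ResultValue": [1.0, 2.0, 3.0, 4.0]})
demo = fill_missing_ids(demo, "RowID")
row_ids = demo["RowID"].tolist()
```

This sketch only fills gaps. It does not repair IDs that were already duplicated in the input.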

# 3. Create Shapefiles <a class="anchor" id="reg_create_shp"></a>

In [None]:
# Empty the shapefile folder
idw_rk.delete_all_files(shpCon_folder)

# Merge the seasons table with the latitude and longitude columns
seasons_con_coord = idw_rk.merge_with_lat_long(seasons_con, dfCon_Mean)
seasons_con_coord

In [None]:
idw_rk.create_shp_season(seasons_con_coord, shpCon_folder)

# 4. Cross Validation for IDW <a class="anchor" id="reg_cv_idw"></a>

In [None]:
# Empty the IDW output folder
idw_rk.delete_all_files(idwCon_folder)

In [None]:
# Select a subset of the table to process (rows 122 onward)
seasons_slct = seasons_con.iloc[122:]

In [None]:
# Run IDW; cases with fewer than 3 data points are skipped
idw_rk.idw_interpolation(seasons_con, shpCon_folder, idwCon_folder, waterbody_extent, barrier_folder)
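
For readers without access to `idw_rk`, the idea behind cross validating IDW can be sketched in plain Python as a leave-one-out loop: each station is withheld in turn, predicted from the remaining stations, and the prediction errors are summarized as ME and RMSE. This is an illustrative sketch only. It is not the arcpy-based procedure the notebook runs, it ignores barriers and waterbody masks, and the function names, the power of 2, the sign convention (predicted minus observed), and the station values are all assumptions.

```python
import math

def idw_predict(x, y, points, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from (px, py, value) tuples."""
    weighted_sum = 0.0
    weight_total = 0.0
    for px, py, value in points:
        distance = math.hypot(x - px, y - py)
        if distance == 0.0:
            return value  # coincident point: use its value directly
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

def leave_one_out_cv(points, power=2.0):
    """Return (ME, RMSE) of predicted minus observed over all held-out points."""
    errors = []
    for i, (x, y, observed) in enumerate(points):
        others = points[:i] + points[i + 1:]  # withhold station i
        errors.append(idw_predict(x, y, others, power) - observed)
    me = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return me, rmse

# Made-up stations: x and y in metres, value in arbitrary units
stations = [(0.0, 0.0, 10.0), (100.0, 0.0, 12.0),
            (0.0, 100.0, 14.0), (100.0, 100.0, 16.0)]
me, rmse = leave_one_out_cv(stations)
```

In this symmetric toy layout the over- and under-predictions cancel, so ME is essentially zero while RMSE is not.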