# RK Interpolation

This document contains Python code that performs Regression Kriging (RK) interpolation for each waterbody: Guana Tolomato Matanzas (GTM), Estero Bay (EB), Charlotte Harbor (CH), Biscayne Bay (BB), and Big Bend Seagrasses (BBS).

The analysis is run separately for each managed parameter, Dissolved Oxygen (DO_mgl), Salinity (Sal_ppt), Turbidity (Turb_ntu), Water Temperature (T_c), Secchi Depth (Secc_m), and Total Nitrogen (TN_mgl), in the arcpy environment.

* [1. Data Preprocess](#preprocessing)
* [2. Generate Shapefiles](#reg_create_shp)
* [3. Regression Kriging for All Stations](#rk_all)

## Load packages

In [1]:
import pandas as pd
import numpy  as np
import arcpy
from arcpy.sa import *
import os, time, math, importlib, sys
path = r'E:\Projects\SEACAR_WQ_2024\git\misc'
sys.path.insert(0, path)
import idw_rk
# !conda install conda-forge::pyproj
import pyproj,csv

importlib.reload(idw_rk)

import warnings
warnings.filterwarnings('ignore')

# Define a scratch folder to avoid overwriting by parallel threads
arcpy.env.scratchWorkspace = r"E:\Projects\SEACAR_WQ_2024\scratch/RK_all"
arcpy.env.overwriteOutput = True

# 1. Preprocessing <a class="anchor" id="preprocessing"></a>

## 1.1 Load csv files

In [2]:
gis_path = r'E:/Projects/SEACAR_WQ_2024/GIS_Data/'

dfCon = pd.read_csv(gis_path + 'OEAT_Continuous_WQ-2024-Feb-21.csv', low_memory=False)
dfDis = pd.read_csv(gis_path + 'OEAT_Discrete_WQ-2024-May-06.csv', low_memory=False)


## 1.2 Select continuous data sampled between 08:00 and 18:00

In [3]:
# Convert string to datetime
dfDis['SampleDate'] = pd.to_datetime(dfDis['SampleDate'], format='%Y-%m-%d %H:%M:%S.%f')
dfCon['SampleDate'] = pd.to_datetime(dfCon['SampleDate'], format='%Y-%m-%d %H:%M:%S.%f')


# Keep samples taken between 08:00 and 18:00 (both inclusive)
start_time = '08:00'
end_time = '18:00'

dfConTime = dfCon[dfCon['SampleDate'].dt.time.between(pd.to_datetime(start_time).time(), pd.to_datetime(end_time).time())]

# Concatenate time-filtered continuous and discrete data
dfAll = pd.concat([dfDis, dfConTime], ignore_index=True)
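
The time-of-day filter above can be checked on a few synthetic timestamps. The sketch below uses made-up values (not SEACAR data) and shows that `Series.between` keeps both boundary times, 08:00 and 18:00, because it is inclusive on both ends by default.

```python
from datetime import time

import pandas as pd

# Synthetic timestamps (not SEACAR data), including both window boundaries
toy = pd.DataFrame({
    "SampleDate": pd.to_datetime([
        "2020-06-01 07:59:00",
        "2020-06-01 08:00:00",
        "2020-06-01 12:30:00",
        "2020-06-01 18:00:00",
        "2020-06-01 18:01:00",
    ]),
    "ResultValue": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Compare only the time-of-day part; between() is inclusive on both ends
in_window = toy["SampleDate"].dt.time.between(time(8, 0), time(18, 0))
kept = toy[in_window]
print(kept["ResultValue"].tolist())
```

Only the 07:59 and 18:01 samples are dropped in this toy case.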

## 1.3 Calculating average values at unique observation points

In [4]:
dfAll_Mean = dfAll.groupby(['WaterBody','ParameterName','ParameterUnits', 'Year','Season','Latitude_DD','Longitude_DD','WbodyAcronym'])["ResultValue"].agg("mean").reset_index()
dfAll = dfAll_Mean
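
As a small illustration of the averaging step, the sketch below groups a made-up table by a reduced set of key columns (the notebook groups by more columns) and averages repeated measurements taken at the same point.

```python
import pandas as pd

# Made-up observations: the first two rows share the same location and season
toy = pd.DataFrame({
    "Year": [2020, 2020, 2020],
    "Season": ["Fall", "Fall", "Fall"],
    "Latitude_DD": [29.0, 29.0, 29.5],
    "Longitude_DD": [-83.0, -83.0, -83.2],
    "ResultValue": [6.0, 7.0, 5.0],
})

keys = ["Year", "Season", "Latitude_DD", "Longitude_DD"]

# One row per unique key combination, holding the mean ResultValue
toy_mean = toy.groupby(keys)["ResultValue"].agg("mean").reset_index()
print(toy_mean)
```

Note that `groupby` drops rows whose key columns contain NaN by default, so a record with a missing `Season` would not appear in the averaged table unless `dropna=False` is passed.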

## 1.4 Convert coordinate system to EPSG: 3086

In [5]:
# Define the EPSG codes for source (EPSG:4326) and target (EPSG:3086) coordinate systems
source_epsg = 'EPSG:4326'
target_epsg = 'EPSG:3086'

# Create a PyProj Transformer for the conversion
transformer = pyproj.Transformer.from_crs(source_epsg, target_epsg, always_xy=True)

# Define a function to apply the transformation to each row of the DataFrame
def transform_coordinates(row):
    x, y = transformer.transform(row['Longitude_DD'], row['Latitude_DD'])
    return pd.Series({'x': x, 'y': y})

# Apply the transformation function to the DataFrame and create new columns for the converted coordinates
dfAll[['x', 'y']] = dfAll.apply(transform_coordinates, axis=1)

#### Save aggregated data to csv file

In [6]:
dfAll.to_csv(gis_path + 'OEAT_All_WQ-2024-May-06.csv', index=False)

## 2. Prepare for batch interpolation
### 2.1 Preset abbreviation for waterbody and parameter name

In [7]:
area_shortnames = {
    'Guana Tolomato Matanzas': 'GTM',
    'Estero Bay': 'EB',
    'Charlotte Harbor': 'CH',
    'Biscayne Bay': 'BB',
    'Big Bend Seagrasses':'BBS'
}

param_shortnames = {
    'Salinity': 'Sal_ppt',
    'Total Nitrogen': 'TN_mgl',
    'Dissolved Oxygen': 'DO_mgl',
    'Turbidity':'Turb_ntu',
    'Secchi Depth':'Secc_m',
    'Water Temperature':'T_c'
}

# Set input parameters
waterbody_names = [
    'Guana Tolomato Matanzas',
    'Estero Bay',
    'Charlotte Harbor',
    'Biscayne Bay',
    'Big Bend Seagrasses'
]

covariates_dict = {
    "GTM":"LDI",
    "EB":"bathymetry+LDI+popden",
    "CH":"bathymetry+LDI+popden+water_flow_wet",
    "BB":"bathymetry+LDI+popden",
    "BBS":"bathymetry+LDI"
}

parameter_names = ['Dissolved Oxygen', 'Salinity', 'Secchi Depth', 'Total Nitrogen', 'Turbidity', 'Water Temperature']
# years = unique_years
seasons = ['Fall', 'Spring', 'Summer', 'Winter']
# shp_folder = gis_path + r"shapefiles_All"
shp_folder = gis_path + r"shapefiles"
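
The short names defined above appear in the file names printed later in this notebook, for example `SHP_BBS_DO_mgl_2020_Fall.shp` for shapefiles and `GTM_TN_mgl_2015_Fall_RK.tif` for rasters. The helper names `shp_name` and `raster_name` below are hypothetical, written only to illustrate that pattern; the actual naming happens inside `idw_rk`, which is not shown here.

```python
# Subsets of the notebook's lookup tables, repeated so the sketch is self-contained
area_shortnames = {"Big Bend Seagrasses": "BBS", "Guana Tolomato Matanzas": "GTM"}
param_shortnames = {"Dissolved Oxygen": "DO_mgl", "Total Nitrogen": "TN_mgl"}


def shp_name(waterbody, parameter, year, season):
    """Hypothetical helper: shapefile name in the pattern seen in the logs."""
    return f"SHP_{area_shortnames[waterbody]}_{param_shortnames[parameter]}_{year}_{season}.shp"


def raster_name(waterbody, parameter, year, season):
    """Hypothetical helper: RK raster name in the pattern seen in the logs."""
    return f"{area_shortnames[waterbody]}_{param_shortnames[parameter]}_{year}_{season}_RK.tif"


print(shp_name("Big Bend Seagrasses", "Dissolved Oxygen", 2020, "Fall"))
print(raster_name("Guana Tolomato Matanzas", "Total Nitrogen", 2015, "Fall"))
```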

### 2.3 Load the table of study periods, parameters, and seasons

In [23]:
seasons_all = pd.read_csv(gis_path + 'Seasons_all.csv', low_memory=False)

### 2.4 Define output folders


In [9]:
shpAll_folder = gis_path + r"shapefiles/shapefiles_All" 

# Preview dataset
dfAll

Unnamed: 0,WaterBody,ParameterName,ParameterUnits,Year,Season,Latitude_DD,Longitude_DD,WbodyAcronym,ResultValue,x,y
0,Big Bend Seagrasses,Dissolved Oxygen,mg/L,2015,Fall,29.008300,-82.825250,BBS,6.350000,514236.421551,556316.396318
1,Big Bend Seagrasses,Dissolved Oxygen,mg/L,2015,Fall,29.036716,-83.129066,BBS,6.200000,484670.524231,559226.858975
2,Big Bend Seagrasses,Dissolved Oxygen,mg/L,2015,Fall,29.046916,-83.033200,BBS,7.100000,493981.422006,560428.927830
3,Big Bend Seagrasses,Dissolved Oxygen,mg/L,2015,Fall,29.054833,-82.758666,BBS,6.500000,520659.055377,561546.969670
4,Big Bend Seagrasses,Dissolved Oxygen,mg/L,2015,Fall,29.056800,-83.059133,BBS,6.000000,491452.167490,561506.993606
...,...,...,...,...,...,...,...,...,...,...,...
77793,Guana Tolomato Matanzas,Water Temperature,Degrees C,2023,Summer,30.025360,-81.370918,GTM,29.150000,653237.586095,671395.945419
77794,Guana Tolomato Matanzas,Water Temperature,Degrees C,2023,Summer,30.026440,-81.369403,GTM,29.500000,653380.952596,671518.961956
77795,Guana Tolomato Matanzas,Water Temperature,Degrees C,2023,Summer,30.033611,-81.353027,GTM,29.766667,654940.976043,672348.548670
77796,Guana Tolomato Matanzas,Water Temperature,Degrees C,2023,Summer,30.050338,-81.371008,GTM,29.675000,653169.830710,674167.856772


### 2.6 Fill NaN RowID values with unique IDs (the IDW function needs a unique ID) <a class="anchor" id="reg_id"></a>

In [10]:
idw_rk.fill_nan_rowids(dfAll, 'RowID')

# Keep RowID as integer
dfAll['RowID'] = dfAll['RowID'].astype(int)
dfAll

Unnamed: 0,WaterBody,ParameterName,ParameterUnits,Year,Season,Latitude_DD,Longitude_DD,WbodyAcronym,ResultValue,x,y,RowID
0,Big Bend Seagrasses,Dissolved Oxygen,mg/L,2015,Fall,29.008300,-82.825250,BBS,6.350000,514236.421551,556316.396318,1
1,Big Bend Seagrasses,Dissolved Oxygen,mg/L,2015,Fall,29.036716,-83.129066,BBS,6.200000,484670.524231,559226.858975,2
2,Big Bend Seagrasses,Dissolved Oxygen,mg/L,2015,Fall,29.046916,-83.033200,BBS,7.100000,493981.422006,560428.927830,3
3,Big Bend Seagrasses,Dissolved Oxygen,mg/L,2015,Fall,29.054833,-82.758666,BBS,6.500000,520659.055377,561546.969670,4
4,Big Bend Seagrasses,Dissolved Oxygen,mg/L,2015,Fall,29.056800,-83.059133,BBS,6.000000,491452.167490,561506.993606,5
...,...,...,...,...,...,...,...,...,...,...,...,...
77793,Guana Tolomato Matanzas,Water Temperature,Degrees C,2023,Summer,30.025360,-81.370918,GTM,29.150000,653237.586095,671395.945419,77794
77794,Guana Tolomato Matanzas,Water Temperature,Degrees C,2023,Summer,30.026440,-81.369403,GTM,29.500000,653380.952596,671518.961956,77795
77795,Guana Tolomato Matanzas,Water Temperature,Degrees C,2023,Summer,30.033611,-81.353027,GTM,29.766667,654940.976043,672348.548670,77796
77796,Guana Tolomato Matanzas,Water Temperature,Degrees C,2023,Summer,30.050338,-81.371008,GTM,29.675000,653169.830710,674167.856772,77797
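
The implementation of `idw_rk.fill_nan_rowids` is not shown in this notebook. The sketch below is a hypothetical stand-in (`fill_missing_ids` is an invented name) that assumes missing IDs are filled with integers continuing after the largest existing ID, which is one simple way to keep the column unique.

```python
import numpy as np
import pandas as pd


def fill_missing_ids(df, col):
    """Hypothetical stand-in: fill NaN IDs with new unique integers (in place)."""
    if col not in df.columns:
        df[col] = np.nan
    missing = df[col].isna()
    # Continue numbering after the largest existing ID (or start at 1)
    start = 0 if missing.all() else int(df[col].max())
    df.loc[missing, col] = np.arange(start + 1, start + 1 + missing.sum())
    return df


# Made-up table with two missing IDs
toy = pd.DataFrame({"ResultValue": [6.35, 6.2, 7.1], "RowID": [np.nan, 10, np.nan]})
fill_missing_ids(toy, "RowID")
toy["RowID"] = toy["RowID"].astype(int)
print(toy["RowID"].tolist())
```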


# 3. Create Shapefiles <a class="anchor" id="reg_create_shp"></a>

In [11]:
# Empty the shapefile folder
# idw_rk.delete_all_files(shpAll_folder)

In [12]:
# Merge the study-period table with the coordinate, RowID, and value columns
seasons_all_coord = idw_rk.merge_with_lat_long(seasons_all, dfAll)

In [13]:
seasons_all_coord.head()

Unnamed: 0,WaterBody,Year,Season,Parameter,Filename,NumDataPoints,RMSE,ME,x,y,RowID,ResultValue
0,Guana Tolomato Matanzas,2015,Fall,Total Nitrogen,,0,,,669975.848287,626752.656623,75890,0.212233
1,Guana Tolomato Matanzas,2015,Fall,Total Nitrogen,,0,,,662275.840738,630059.18747,75891,1.0345
2,Guana Tolomato Matanzas,2015,Fall,Total Nitrogen,,0,,,667035.271306,631036.021679,75892,1.3665
3,Guana Tolomato Matanzas,2015,Fall,Total Nitrogen,,0,,,668862.259531,631692.835328,75893,0.192567
4,Guana Tolomato Matanzas,2015,Fall,Total Nitrogen,,0,,,665055.970903,631868.535738,75894,0.862


### Create Shapefiles

In [80]:
# Clean the shapefile folder if necessary
# idw_rk.delete_all_files(shpAll_folder)
# Print number of data points in each shapefile
idw_rk.create_shp_season(seasons_all_coord, shpAll_folder)

Number of data rows for BBS, DO_mgl, 2020, Fall: 31
Shapefile for BBS: DO_mgl for year 2020 and season Fall has been saved as SHP_BBS_DO_mgl_2020_Fall.shp
Number of data rows for BBS, Sal_ppt, 2020, Fall: 26
Shapefile for BBS: Sal_ppt for year 2020 and season Fall has been saved as SHP_BBS_Sal_ppt_2020_Fall.shp
Number of data rows for BBS, Secc_m, 2020, Fall: 30
Shapefile for BBS: Secc_m for year 2020 and season Fall has been saved as SHP_BBS_Secc_m_2020_Fall.shp
Number of data rows for BBS, TN_mgl, 2020, Fall: 24
Shapefile for BBS: TN_mgl for year 2020 and season Fall has been saved as SHP_BBS_TN_mgl_2020_Fall.shp
Number of data rows for BBS, Turb_ntu, 2020, Fall: 31
Shapefile for BBS: Turb_ntu for year 2020 and season Fall has been saved as SHP_BBS_Turb_ntu_2020_Fall.shp
Number of data rows for BBS, T_c, 2020, Fall: 31
Shapefile for BBS: T_c for year 2020 and season Fall has been saved as SHP_BBS_T_c_2020_Fall.shp
Number of data rows for BBS, DO_mgl, 2020, Summer: 30
Shapefile for BB

Shapefile for BB: Turb_ntu for year 2021 and season Fall has been saved as SHP_BB_Turb_ntu_2021_Fall.shp
Number of data rows for BB, T_c, 2021, Fall: 86
Shapefile for BB: T_c for year 2021 and season Fall has been saved as SHP_BB_T_c_2021_Fall.shp
Number of data rows for BB, DO_mgl, 2021, Summer: 88
Shapefile for BB: DO_mgl for year 2021 and season Summer has been saved as SHP_BB_DO_mgl_2021_Summer.shp
Number of data rows for BB, Sal_ppt, 2021, Summer: 66
Shapefile for BB: Sal_ppt for year 2021 and season Summer has been saved as SHP_BB_Sal_ppt_2021_Summer.shp
Number of data rows for BB, Secc_m, 2021, Summer: 1
Shapefile for BB: Secc_m for year 2021 and season Summer has been saved as SHP_BB_Secc_m_2021_Summer.shp
Number of data rows for BB, TN_mgl, 2021, Summer: 81
Shapefile for BB: TN_mgl for year 2021 and season Summer has been saved as SHP_BB_TN_mgl_2021_Summer.shp
Number of data rows for BB, Turb_ntu, 2021, Summer: 66
Shapefile for BB: Turb_ntu for year 2021 and season Summer has 

Shapefile for CH: T_c for year 2016 and season Summer has been saved as SHP_CH_T_c_2016_Summer.shp
Number of data rows for CH, DO_mgl, 2016, Winter: 393
Shapefile for CH: DO_mgl for year 2016 and season Winter has been saved as SHP_CH_DO_mgl_2016_Winter.shp
Number of data rows for CH, Sal_ppt, 2016, Winter: 401
Shapefile for CH: Sal_ppt for year 2016 and season Winter has been saved as SHP_CH_Sal_ppt_2016_Winter.shp
Number of data rows for CH, Secc_m, 2016, Winter: 342
Shapefile for CH: Secc_m for year 2016 and season Winter has been saved as SHP_CH_Secc_m_2016_Winter.shp
Number of data rows for CH, TN_mgl, 2016, Winter: 55
Shapefile for CH: TN_mgl for year 2016 and season Winter has been saved as SHP_CH_TN_mgl_2016_Winter.shp
Number of data rows for CH, Turb_ntu, 2016, Winter: 51
Shapefile for CH: Turb_ntu for year 2016 and season Winter has been saved as SHP_CH_Turb_ntu_2016_Winter.shp
Number of data rows for CH, T_c, 2016, Winter: 429
Shapefile for CH: T_c for year 2016 and season W

Shapefile for EB: Turb_ntu for year 2016 and season Winter has been saved as SHP_EB_Turb_ntu_2016_Winter.shp
Number of data rows for EB, T_c, 2016, Winter: 52


ExecuteError: ERROR 000464: Cannot get exclusive schema lock.  Either being edited or in use by another application or service.
Failed to execute (AddSpatialIndex).


Error occurred: ERROR 000582: Error occurred during execution.

Number of data rows for EB, DO_mgl, 2017, Fall: 45
Shapefile for EB: DO_mgl for year 2017 and season Fall has been saved as SHP_EB_DO_mgl_2017_Fall.shp
Number of data rows for EB, Sal_ppt, 2017, Fall: 15
Shapefile for EB: Sal_ppt for year 2017 and season Fall has been saved as SHP_EB_Sal_ppt_2017_Fall.shp
Number of data rows for EB, Secc_m, 2017, Fall: 4
Shapefile for EB: Secc_m for year 2017 and season Fall has been saved as SHP_EB_Secc_m_2017_Fall.shp
Number of data rows for EB, TN_mgl, 2017, Fall: 35
Shapefile for EB: TN_mgl for year 2017 and season Fall has been saved as SHP_EB_TN_mgl_2017_Fall.shp
Number of data rows for EB, Turb_ntu, 2017, Fall: 39
Shapefile for EB: Turb_ntu for year 2017 and season Fall has been saved as SHP_EB_Turb_ntu_2017_Fall.shp
Number of data rows for EB, T_c, 2017, Fall: 45
Shapefile for EB: T_c for year 2017 and season Fall has been saved as SHP_EB_T_c_2017_Fall.shp
Number of data rows for E

Shapefile for GTM: T_c for year 2016 and season Spring has been saved as SHP_GTM_T_c_2016_Spring.shp
Number of data rows for GTM, DO_mgl, 2016, Summer: 78
Shapefile for GTM: DO_mgl for year 2016 and season Summer has been saved as SHP_GTM_DO_mgl_2016_Summer.shp
Number of data rows for GTM, Sal_ppt, 2016, Summer: 76
Shapefile for GTM: Sal_ppt for year 2016 and season Summer has been saved as SHP_GTM_Sal_ppt_2016_Summer.shp
No valid data found for area: GTM, parameter: Secc_m, year: 2016, and season: Summer
Number of data rows for GTM, TN_mgl, 2016, Summer: 5
Shapefile for GTM: TN_mgl for year 2016 and season Summer has been saved as SHP_GTM_TN_mgl_2016_Summer.shp
Number of data rows for GTM, Turb_ntu, 2016, Summer: 9
Shapefile for GTM: Turb_ntu for year 2016 and season Summer has been saved as SHP_GTM_Turb_ntu_2016_Summer.shp
Number of data rows for GTM, T_c, 2016, Summer: 45
Shapefile for GTM: T_c for year 2016 and season Summer has been saved as SHP_GTM_T_c_2016_Summer.shp
Number of d


# 3. Regression Kriging for both continuous and discrete data <a class="anchor" id="rk_all"></a>

## Loop for all parameters


### Clean the output folder

In [14]:
out_raster_floder = gis_path + r"raster_output/rk_All/"
out_ga_folder     = gis_path + r"ga_output_rk/"
diagnostic_folder = gis_path + r"diagnostic_rk/"
std_error_folder  = gis_path + r"std_error_pred/std_error_rk_2s/"

# Clean existing files in folders
# idw_rk.delete_all_files(out_raster_floder)
# idw_rk.delete_all_files(out_ga_folder)
# idw_rk.delete_all_files(diagnostic_folder)
# idw_rk.delete_all_files(std_error_folder)

In [25]:
seasons_all = seasons_all[seasons_all['WaterBody']!= 'Big Bend Seagrasses']

In [26]:
importlib.reload(idw_rk)

# Write the output in a csv file
with open(gis_path+"rk_results_temp.csv", 'w', newline='') as csvfile:
    csv_writer = csv.writer(csvfile)
    # Write the header line
    cols = list(seasons_all.columns)
    cols.append('covariates')
    csv_writer.writerow(cols)
    
    for i in seasons_all.index:
        s_time =time.time() 
        process,rmse,me,count,file_loc = idw_rk.rk_interpolation(method = "rk",
                                           radius = 10000,
                                           folder_path = gis_path,
                                           shp_path = shpAll_folder,
                                           waterbody = area_shortnames[seasons_all.loc[i]["WaterBody"]],
                                           parameter = param_shortnames[seasons_all.loc[i]["Parameter"]],
                                           year      = seasons_all.loc[i]["Year"],
                                           season    = seasons_all.loc[i]['Season'],
                                           covariates= covariates_dict[area_shortnames[seasons_all.loc[i]["WaterBody"]]],
                                           out_raster_folder = out_raster_floder,
                                           out_ga_folder     = out_ga_folder,
                                           std_error_folder  = std_error_folder,                  
                                           diagnostic_folder = diagnostic_folder)
        e_time =time.time()

        print(f"{int(e_time-s_time)} seconds elapsed for processing {count} points in {i}th row: RMSE: {rmse}, ME: {me}, file exported to {file_loc}")
        csv_writer.writerow([seasons_all.loc[i]["WaterBody"], 
                             seasons_all.loc[i]["Year"],
                             seasons_all.loc[i]['Season'],
                             seasons_all.loc[i]["Parameter"],
                             file_loc, count, rmse, me,
                             covariates_dict[area_shortnames[seasons_all.loc[i]["WaterBody"]]]])
        if i%10 == 0: csvfile.flush() # flush the csv file whenever the row label is a multiple of 10
#         seasons_all['RMSE'][i:i+1] = rmse
#         seasons_all['ME'][i:i+1] = me
#         seasons_all['NumDataPoints'][i:i+1] = count
#         seasons_all['Filename'][i:i+1] = file_loc
#     seasons_all.to_csv(gis_path+"result_RK_all.csv")

Processing file: SHP_GTM_TN_mgl_2015_Fall.shp
--- Time lapse: 61.736483097076416 seconds ---
62 seconds elapsed for processing 10 points in 0th row: RMSE: 0.44906412833, ME: 0.0557400542326, file exported to E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_output/rk_All/GTM_TN_mgl_2015_Fall_RK.tif
TN_mgl in 2015 Winter caused an error:
ERROR 040039: Not enough data to compute method.
Failed to execute (EBKRegressionPrediction).

0 seconds elapsed for processing 6 points in 1th row: RMSE: nan, ME: nan, file exported to nan
TN_mgl in 2016 Spring caused an error:
ERROR 040039: Not enough data to compute method.
Failed to execute (EBKRegressionPrediction).

0 seconds elapsed for processing 5 points in 2th row: RMSE: nan, ME: nan, file exported to nan
TN_mgl in 2016 Summer caused an error:
ERROR 040039: Not enough data to compute method.
Failed to execute (EBKRegressionPrediction).

1 seconds elapsed for processing 5 points in 3th row: RMSE: nan, ME: nan, file exported to nan
TN_mgl in 2016 Fall 

Processing file: SHP_BB_TN_mgl_2022_Fall.shp
--- Time lapse: 222.91591572761536 seconds ---
222 seconds elapsed for processing 59 points in 29th row: RMSE: 0.163460759484, ME: -0.00745043172939, file exported to E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_output/rk_All/BB_TN_mgl_2022_Fall_RK.tif
Processing file: SHP_BB_TN_mgl_2022_Winter.shp
--- Time lapse: 325.83646035194397 seconds ---
325 seconds elapsed for processing 59 points in 30th row: RMSE: 0.109330580966, ME: 3.53813703276e-05, file exported to E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_output/rk_All/BB_TN_mgl_2022_Winter_RK.tif
Processing file: SHP_BB_TN_mgl_2023_Spring.shp
--- Time lapse: 329.25987815856934 seconds ---
329 seconds elapsed for processing 65 points in 31th row: RMSE: 0.117502694627, ME: -0.00442517741538, file exported to E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_output/rk_All/BB_TN_mgl_2023_Spring_RK.tif
Processing file: SHP_GTM_Sal_ppt_2015_Fall.shp
--- Time lapse: 143.75197386741638 seconds ---
143 second

Processing file: SHP_BB_Sal_ppt_2021_Fall.shp
--- Time lapse: 232.11251068115234 seconds ---
232 seconds elapsed for processing 66 points in 65th row: RMSE: 4.56409444488, ME: -0.0850729997401, file exported to E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_output/rk_All/BB_Sal_ppt_2021_Fall_RK.tif
Processing file: SHP_BB_Sal_ppt_2021_Winter.shp
--- Time lapse: 320.42043828964233 seconds ---
320 seconds elapsed for processing 66 points in 66th row: RMSE: 6.03256089796, ME: 0.038386862525, file exported to E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_output/rk_All/BB_Sal_ppt_2021_Winter_RK.tif
Processing file: SHP_BB_Sal_ppt_2022_Spring.shp
--- Time lapse: 161.29373621940613 seconds ---
161 seconds elapsed for processing 52 points in 67th row: RMSE: 5.01371555608, ME: 0.294147473036, file exported to E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_output/rk_All/BB_Sal_ppt_2022_Spring_RK.tif
Processing file: SHP_BB_Sal_ppt_2022_Summer.shp
--- Time lapse: 145.5575590133667 seconds ---
145 seconds el

Processing file: SHP_CH_DO_mgl_2017_Fall.shp
--- Time lapse: 6482.056753396988 seconds ---
6482 seconds elapsed for processing 445 points in 101th row: RMSE: 2.00364047873, ME: 0.0471114169257, file exported to E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_output/rk_All/CH_DO_mgl_2017_Fall_RK.tif
Processing file: SHP_CH_DO_mgl_2017_Winter.shp
--- Time lapse: 5238.452392339706 seconds ---
5238 seconds elapsed for processing 375 points in 102th row: RMSE: 1.25127377496, ME: 0.00497869820688, file exported to E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_output/rk_All/CH_DO_mgl_2017_Winter_RK.tif
Processing file: SHP_CH_DO_mgl_2018_Spring.shp
--- Time lapse: 1266.0542414188385 seconds ---
1266 seconds elapsed for processing 71 points in 103th row: RMSE: 0.813343412755, ME: -0.0439816160111, file exported to E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_output/rk_All/CH_DO_mgl_2018_Spring_RK.tif
Processing file: SHP_BB_DO_mgl_2021_Summer.shp
--- Time lapse: 458.8295202255249 seconds ---
458 seconds

Processing file: SHP_CH_Turb_ntu_2016_Winter.shp
--- Time lapse: 887.9938464164734 seconds ---
888 seconds elapsed for processing 51 points in 138th row: RMSE: 3.58708976869, ME: 0.240021766715, file exported to E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_output/rk_All/CH_Turb_ntu_2016_Winter_RK.tif
Processing file: SHP_CH_Turb_ntu_2017_Spring.shp
--- Time lapse: 952.5229353904724 seconds ---
952 seconds elapsed for processing 57 points in 139th row: RMSE: 4.11964558758, ME: -0.00168547595085, file exported to E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_output/rk_All/CH_Turb_ntu_2017_Spring_RK.tif
Processing file: SHP_CH_Turb_ntu_2017_Summer.shp
--- Time lapse: 798.6594116687775 seconds ---
798 seconds elapsed for processing 42 points in 140th row: RMSE: 2.76503839865, ME: 0.196666977974, file exported to E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_output/rk_All/CH_Turb_ntu_2017_Summer_RK.tif
Processing file: SHP_CH_Turb_ntu_2017_Fall.shp
--- Time lapse: 874.0147352218628 seconds ---
874 

Processing file: SHP_CH_Secc_m_2016_Winter.shp
--- Time lapse: 5298.427699565887 seconds ---
5298 seconds elapsed for processing 342 points in 178th row: RMSE: 0.411268430022, ME: 0.000213688471797, file exported to E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_output/rk_All/CH_Secc_m_2016_Winter_RK.tif
Processing file: SHP_CH_Secc_m_2017_Spring.shp
--- Time lapse: 5239.873451471329 seconds ---
5240 seconds elapsed for processing 351 points in 179th row: RMSE: 0.441179751868, ME: -0.0166103280277, file exported to E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_output/rk_All/CH_Secc_m_2017_Spring_RK.tif
Processing file: SHP_CH_Secc_m_2017_Summer.shp
--- Time lapse: 5605.274613380432 seconds ---
5605 seconds elapsed for processing 339 points in 180th row: RMSE: 0.325121251156, ME: -0.000487827425113, file exported to E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_output/rk_All/CH_Secc_m_2017_Summer_RK.tif
Processing file: SHP_CH_Secc_m_2017_Fall.shp
--- Time lapse: 5319.992224693298 seconds ---
532

Processing file: SHP_CH_T_c_2016_Fall.shp
--- Time lapse: 6437.612372159958 seconds ---
6437 seconds elapsed for processing 510 points in 217th row: RMSE: 2.81238055145, ME: -0.106348110797, file exported to E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_output/rk_All/CH_T_c_2016_Fall_RK.tif
Processing file: SHP_CH_T_c_2016_Winter.shp
--- Time lapse: 7011.407678604126 seconds ---
7011 seconds elapsed for processing 429 points in 218th row: RMSE: 3.01953717431, ME: -0.0968352592513, file exported to E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_output/rk_All/CH_T_c_2016_Winter_RK.tif
Processing file: SHP_CH_T_c_2017_Spring.shp
--- Time lapse: 6873.923624753952 seconds ---
6874 seconds elapsed for processing 440 points in 219th row: RMSE: 2.72615264734, ME: -0.110811130155, file exported to E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_output/rk_All/CH_T_c_2017_Spring_RK.tif
Processing file: SHP_CH_T_c_2017_Summer.shp
--- Time lapse: 6946.810841083527 seconds ---
6946 seconds elapsed for processin
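
The log lines above report a root mean square error (RMSE) and a mean error (ME) for each run. How `idw_rk.rk_interpolation` obtains them is not shown in this notebook; the sketch below only illustrates the standard definitions on made-up numbers, assuming the error is defined as predicted minus observed.

```python
import math


def rmse_and_me(observed, predicted):
    """Return (RMSE, ME), taking each error as predicted minus observed (an assumption)."""
    errors = [p - o for o, p in zip(observed, predicted)]
    me = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return rmse, me


# Made-up values: one over-prediction and one under-prediction cancel in ME
observed = [1.0, 2.0, 3.0]
predicted = [1.5, 1.5, 3.0]
rmse, me = rmse_and_me(observed, predicted)
print(f"RMSE: {rmse:.4f}, ME: {me:.4f}")
```

An ME near zero therefore indicates little overall bias, while the RMSE still reflects the size of the individual errors.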

### Select abnormal results: a row is treated as abnormal if its RMSE is below -10000

In [14]:
error_row = pd.read_csv(gis_path+"rk_results_temp.csv")
error_row_new = error_row.loc[error_row["RMSE"]<-10000]
#error_row_new = error_row_new.drop("Unnamed: 0",axis=1)

error_row_new.to_csv(gis_path+"temp_error.csv")
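
The threshold test can be illustrated in plain Python on made-up rows. The value `-1.797693e+308` matches, to the printed precision, the most negative finite double (`-sys.float_info.max`), which suggests a placeholder rather than a real error statistic. Rows whose RMSE is NaN (runs that failed, for instance with "Not enough data") are not selected by this comparison, because any comparison with NaN is false.

```python
import math
import sys

# Made-up result rows: a placeholder value, a normal value, and a failed run
rows = [
    {"Season": "Summer", "RMSE": -1.797693e+308},
    {"Season": "Fall", "RMSE": 0.163460759484},
    {"Season": "Winter", "RMSE": float("nan")},
]

# Same condition as the notebook: RMSE < -10000
abnormal = [r for r in rows if r["RMSE"] < -10000]
print([r["Season"] for r in abnormal])

# The placeholder is the most negative finite double, up to display rounding
print(math.isclose(rows[0]["RMSE"], -sys.float_info.max, rel_tol=1e-6))
```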

### Rerun the abnormal rows with the smoothing radius increased from 10 km to 50 km

In [18]:
error_row_new

Unnamed: 0,WaterBody,Year,Season,Parameter,Filename,NumDataPoints,RMSE,ME,covariates
32,Big Bend Seagrasses,2020,Summer,Total Nitrogen,E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_out...,21,-1.797693e+308,-1.797693e+308,bathymetry+LDI
33,Big Bend Seagrasses,2020,Fall,Total Nitrogen,E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_out...,24,-1.797693e+308,-1.797693e+308,bathymetry+LDI
34,Big Bend Seagrasses,2020,Winter,Total Nitrogen,E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_out...,34,-1.797693e+308,-1.797693e+308,bathymetry+LDI
35,Big Bend Seagrasses,2021,Spring,Total Nitrogen,E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_out...,37,-1.797693e+308,-1.797693e+308,bathymetry+LDI
36,Big Bend Seagrasses,2021,Summer,Total Nitrogen,E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_out...,33,-1.797693e+308,-1.797693e+308,bathymetry+LDI
37,Big Bend Seagrasses,2021,Fall,Total Nitrogen,E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_out...,34,-1.797693e+308,-1.797693e+308,bathymetry+LDI
38,Big Bend Seagrasses,2021,Winter,Total Nitrogen,E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_out...,31,-1.797693e+308,-1.797693e+308,bathymetry+LDI
39,Big Bend Seagrasses,2022,Spring,Total Nitrogen,E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_out...,25,-1.797693e+308,-1.797693e+308,bathymetry+LDI
72,Big Bend Seagrasses,2020,Summer,Salinity,E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_out...,24,-1.797693e+308,-1.797693e+308,bathymetry+LDI
73,Big Bend Seagrasses,2020,Fall,Salinity,E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_out...,26,-1.797693e+308,-1.797693e+308,bathymetry+LDI


In [19]:
# Write the output in a csv file
with open(gis_path+"temp_error.csv", 'w', newline='') as csvfile:
    csv_writer = csv.writer(csvfile)
    # Write the header line
    cols = list(error_row_new.columns)
    cols.append('covariates')
    csv_writer.writerow(cols)
    
    for i in error_row_new.index:
        s_time =time.time() 
        process,rmse,me,count,file_loc = idw_rk.rk_interpolation(method = "rk",
                                         radius = 50000,
                                         folder_path = shpAll_folder,
                                         covariate_path = gis_path + 'covariates',
                                         waterbody_path = gis_path + "managed_area_boundary/",
                                         waterbody = area_shortnames[error_row_new.loc[i]["WaterBody"]],
                                         parameter = param_shortnames[error_row_new.loc[i]["Parameter"]],
                                         year      = error_row_new.loc[i]["Year"],
                                         season    = error_row_new.loc[i]['Season'],
                                         covariates= covariates_dict[area_shortnames[error_row_new.loc[i]["WaterBody"]]],
                                         out_raster_folder = out_raster_floder,
                                         out_ga_folder     = out_ga_folder,
                                         std_error_folder  = std_error_folder,
                                         diagnostic_folder = diagnostic_folder)
        e_time =time.time()
        print(f"{int(e_time-s_time)} seconds elapsed for processing {count} points in {i}th row: RMSE: {rmse}, ME: {me}, file exported to {file_loc}")
        csv_writer.writerow([error_row_new.loc[i]["WaterBody"], 
                             error_row_new.loc[i]["Year"],
                             error_row_new.loc[i]['Season'],
                             error_row_new.loc[i]["Parameter"],
                             file_loc, count, rmse, me,
                             covariates_dict[area_shortnames[error_row_new.loc[i]["WaterBody"]]]])
        if i%10 == 0: csvfile.flush() # flush the csv file whenever the row label is a multiple of 10
#         seasons_all['RMSE'][i:i+1] = rmse
#         seasons_all['ME'][i:i+1] = me
#         seasons_all['NumDataPoints'][i:i+1] = count
#         seasons_all['Filename'][i:i+1] = file_loc
#     seasons_all.to_csv(gis_path+"result_RK_all.csv")

Processing file: SHP_BBS_TN_mgl_2020_Summer.shp
--- Time lapse: 1243.7007274627686 seconds ---
1243 seconds elapsed for processing 21 points in 32th row: RMSE: 0.36269660957, ME: -0.00546533503078, file exported to E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_output/rk_All/BBS_TN_mgl_2020_Summer_RK.tif
Processing file: SHP_BBS_TN_mgl_2020_Fall.shp
--- Time lapse: 1412.3664746284485 seconds ---
1412 seconds elapsed for processing 24 points in 33th row: RMSE: 0.119020045865, ME: -0.00873171146534, file exported to E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_output/rk_All/BBS_TN_mgl_2020_Fall_RK.tif
Processing file: SHP_BBS_TN_mgl_2020_Winter.shp
--- Time lapse: 2343.838243484497 seconds ---
2343 seconds elapsed for processing 34 points in 34th row: RMSE: 0.202855833315, ME: 0.00810422364266, file exported to E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_output/rk_All/BBS_TN_mgl_2020_Winter_RK.tif
Processing file: SHP_BBS_TN_mgl_2021_Spring.shp
--- Time lapse: 2365.6148660182953 seconds ---
236

Processing file: SHP_BBS_Turb_ntu_2021_Summer.shp
--- Time lapse: 1846.7193732261658 seconds ---
1846 seconds elapsed for processing 37 points in 156th row: RMSE: 4.24610556119, ME: 0.126021608593, file exported to E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_output/rk_All/BBS_Turb_ntu_2021_Summer_RK.tif
Processing file: SHP_BBS_Turb_ntu_2021_Fall.shp
--- Time lapse: 2188.2067263126373 seconds ---
2188 seconds elapsed for processing 35 points in 157th row: RMSE: 2.11561020138, ME: 0.0563472780595, file exported to E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_output/rk_All/BBS_Turb_ntu_2021_Fall_RK.tif
Processing file: SHP_BBS_Turb_ntu_2021_Winter.shp
--- Time lapse: 2288.771324634552 seconds ---
2288 seconds elapsed for processing 35 points in 158th row: RMSE: 2.40376761844, ME: 0.11179622214, file exported to E:/Projects/SEACAR_WQ_2024/GIS_Data/raster_output/rk_All/BBS_Turb_ntu_2021_Winter_RK.tif
Processing file: SHP_BBS_Turb_ntu_2022_Spring.shp
--- Time lapse: 1913.753386259079 seconds -
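
Both loops above write one CSV row per run and flush periodically, so that partial results survive if a long run is interrupted. In those loops the flush condition uses the DataFrame index label `i`, which may have gaps after filtering. The sketch below is an alternative under that assumption: it counts written rows with `enumerate` instead, using made-up rows and a temporary file.

```python
import csv
import os
import tempfile

# Made-up results; None stands for a run that produced no RMSE
rows = [
    ("GTM", 2015, "Fall", 0.45),
    ("GTM", 2015, "Winter", None),
    ("BB", 2022, "Fall", 0.16),
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "results_demo.csv")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["WaterBody", "Year", "Season", "RMSE"])
        for n, row in enumerate(rows, start=1):
            writer.writerow(row)
            if n % 2 == 0:  # flush after every 2 written rows
                f.flush()
    # Read the file back to confirm what was written
    with open(path, newline="") as f:
        written = list(csv.reader(f))

print(written)
```

The `csv` module writes `None` as an empty field, so failed runs show up as blank cells.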

### Merge the revised results into original results

In [81]:
# Specify the paths of the CSV files to merge (they can be removed afterwards)
file1 = gis_path+"rk_results_temp.csv"
file2 = gis_path+"temp_error.csv"

original_file = pd.read_csv(file1)
error_row_file= pd.read_csv(file2)

In [84]:
error_index = original_file[original_file["RMSE"]<-10000].index

abnormal_drop = original_file.drop(index=error_index)

In [85]:
new_file = pd.concat([abnormal_drop,error_row_file])
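# 'covariates.1' is the name pandas gives the second 'covariates' header in temp_error.csv
new_file = new_file.drop(columns=['covariates.1'], errors='ignore')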
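
The replace-and-append logic of this merge can be checked on a toy table. The sketch below uses made-up values and verifies that, after dropping the abnormal rows and appending the reruns, each (WaterBody, Year, Season, Parameter) combination appears only once.

```python
import pandas as pd

keys = ["WaterBody", "Year", "Season", "Parameter"]

# Made-up first-pass results; the second row holds the placeholder RMSE
original = pd.DataFrame({
    "WaterBody": ["A", "A", "B"],
    "Year": [2020, 2020, 2021],
    "Season": ["Fall", "Winter", "Fall"],
    "Parameter": ["Salinity"] * 3,
    "RMSE": [0.5, -1.797693e+308, 0.2],
})

# Made-up rerun result for the abnormal row
rerun = pd.DataFrame({
    "WaterBody": ["A"], "Year": [2020], "Season": ["Winter"],
    "Parameter": ["Salinity"], "RMSE": [0.4],
})

# Drop abnormal rows, then append the reruns
kept = original[~(original["RMSE"] < -10000)]
merged = pd.concat([kept, rerun], ignore_index=True)

# Each study period should now appear exactly once
print(merged.duplicated(subset=keys).any())
print(merged["RMSE"].tolist())
```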

### Export the final results

In [87]:
new_file.to_csv(gis_path+"rk_results.csv")

#os.remove(file1)
#os.remove(file2)

### Join the IDW results for all data with the IDW results for continuous data

In [36]:
# Define result folder
result_folder = "E:/Projects/SEACAR_WQ_2024/result/result_v3/"

# Read RK results
new_file = pd.read_csv(result_folder+"rk_results.csv")

# Read the IDW results for all samples and for continuous samples
df_idw_all = pd.read_csv(result_folder + "idw_all.csv") # read results of IDW of all samples
df_idw_con = pd.read_csv(gis_path + "raster_output/idw_Con/updated_idw.csv") # read results of IDW of continuous samples

# join idw_con with idw_all
df_idw = pd.merge(df_idw_all, df_idw_con, on=['WaterBody', 'Year', 'Season', 'Parameter'], how='left', suffixes=('_idw_all','_idw_con'))


# rename and remove certain columns
df_idw = df_idw.drop(columns = ['Filename_idw_con']).rename(columns={'Filename_idw_all':'Filename',
                                                                     'NumDataPoints_idw_all':'NumDataPoints_all',
                                                                    'NumDataPoints_idw_con':'NumDataPoints_con'})


# convert number of data points to integer and remove NaN
df_idw['NumDataPoints_con'] = df_idw['NumDataPoints_con'].fillna(0).astype(int)
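
The effect of the left join and its suffixes can be seen on a toy pair of tables. The values below are made up; the sketch shows that columns present in both tables receive the `_idw_all` and `_idw_con` suffixes, and that periods without continuous data get NaN, which is then turned into a count of 0.

```python
import pandas as pd

keys = ["WaterBody", "Year", "Season", "Parameter"]

# Made-up IDW results for all samples
idw_all = pd.DataFrame({
    "WaterBody": ["GTM", "CH"], "Year": [2015, 2017],
    "Season": ["Fall", "Spring"], "Parameter": ["Total Nitrogen"] * 2,
    "NumDataPoints": [8, 40], "RMSE": [0.6, 0.4],
})

# Made-up IDW results for continuous samples (only one period available)
idw_con = pd.DataFrame({
    "WaterBody": ["CH"], "Year": [2017], "Season": ["Spring"],
    "Parameter": ["Total Nitrogen"], "NumDataPoints": [3], "RMSE": [0.35],
})

# Left join keeps every row of idw_all; overlapping columns get suffixes
joined = pd.merge(idw_all, idw_con, on=keys, how="left",
                  suffixes=("_idw_all", "_idw_con"))

# Periods with no continuous data have NaN counts; store them as 0
joined["NumDataPoints_idw_con"] = joined["NumDataPoints_idw_con"].fillna(0).astype(int)
print(joined.columns.tolist())
print(joined["NumDataPoints_idw_con"].tolist())
```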

### Join IDW and RK results

In [38]:
df_all = pd.merge(df_idw, new_file, on=['WaterBody', 'Year', 'Season', 'Parameter'], how='left', suffixes=('','_rk'))
df_all = df_all.drop(columns = ['Filename_rk','NumDataPoints']).rename(columns={'Filename_idw':'Filename',
                                                                                   'NumDataPoints_idw':'NumDataPoints',
                                                                                  'RMSE':'RMSE_rk',
                                                                                  'ME':'ME_rk'})
df_all.to_csv(result_folder+"rk_idw_comp.csv")

df_all

Unnamed: 0.1,WaterBody,Year,Season,Parameter,Filename,NumDataPoints_all,RMSE_idw_all,ME_idw_all,NumDataPoints_con,RMSE_idw_con,ME_idw_con,Unnamed: 0,RMSE_rk,ME_rk,covariates
0,Guana Tolomato Matanzas,2015,Fall,Total Nitrogen,SHP_GTM_TN_mgl_2015_Fall.shp,10,0.561377,0.118043,0,,,0,0.449064,0.055740,LDI
1,Guana Tolomato Matanzas,2015,Winter,Total Nitrogen,SHP_GTM_TN_mgl_2015_Winter.shp,6,0.140398,-0.037123,0,,,1,,,LDI
2,Guana Tolomato Matanzas,2016,Spring,Total Nitrogen,SHP_GTM_TN_mgl_2016_Spring.shp,5,0.147875,-0.037670,0,,,2,,,LDI
3,Guana Tolomato Matanzas,2016,Summer,Total Nitrogen,SHP_GTM_TN_mgl_2016_Summer.shp,5,0.096097,-0.026495,0,,,3,,,LDI
4,Guana Tolomato Matanzas,2016,Fall,Total Nitrogen,SHP_GTM_TN_mgl_2016_Fall.shp,5,0.163046,-0.044804,0,,,4,,,LDI
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
235,Charlotte Harbor,2017,Spring,Total Nitrogen,SHP_CH_TN_mgl_2017_Spring.shp,57,0.400803,0.004530,0,,,19,0.281052,0.004211,bathymetry+LDI+popden+water_flow_wet
236,Charlotte Harbor,2017,Summer,Total Nitrogen,SHP_CH_TN_mgl_2017_Summer.shp,42,0.149936,-0.002792,0,,,20,0.111250,-0.000281,bathymetry+LDI+popden+water_flow_wet
237,Charlotte Harbor,2017,Fall,Total Nitrogen,SHP_CH_TN_mgl_2017_Fall.shp,51,0.219190,0.004765,0,,,21,0.194828,0.000415,bathymetry+LDI+popden+water_flow_wet
238,Charlotte Harbor,2017,Winter,Total Nitrogen,SHP_CH_TN_mgl_2017_Winter.shp,61,0.160795,0.005508,0,,,22,0.124463,0.002064,bathymetry+LDI+popden+water_flow_wet
