# Cross Validation for IDW Interpolation 
## Task 2B (continuous)

This notebook contains Python code that runs cross validation (CV) of Inverse Distance Weighting (IDW) interpolation in the arcpy environment for four water quality parameters:
- Dissolved oxygen (DO_mgl)
- Salinity (Sal_ppt)
- Turbidity (Turb_ntu)
- Temperature (T_c)

The analysis is run separately for each of five water bodies:
- Guana Tolomato Matanzas (GTM)
- Estero Bay (EB)
- Charlotte Harbor (CH)
- Biscayne Bay (BB)
- Big Bend Seagrasses (BBS)

**Tasks:**  

- Task 2A Calculate the RMSE and Mean Error (ME) for IDW results using both continuous and discrete data

- **Task 2B Calculate the RMSE and Mean Error (ME) for IDW results using continuous data.**
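
For reference, the two metrics are commonly defined as below, where $z_i$ is the observed value at point $i$, $\hat{z}_i$ is the IDW prediction at that point when it is withheld from the interpolation, and $n$ is the number of points. The sign convention (predicted minus observed) is an assumption; the helper module `idw_rk` is not shown in this notebook and may define ME with the opposite sign.

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{z}_i - z_i\right)^2},
\qquad
\mathrm{ME} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{z}_i - z_i\right)
$$

Under this convention a positive ME means the interpolation overestimates on average, and RMSE is always at least as large as the absolute value of ME.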


**Monthly: The SEACAR team would like the USF team to define their months as six 30-day
increments prior to the storm day, and then six 30-day increments following the storm day.** 

**Weekly: The SEACAR team would like the USF team to define their weeks as 26 7-day
increments prior to the storm day, and then 26 7-day increments following the storm day** 
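
The two definitions above can be sketched as a small window generator. This is only an illustration: the notebook itself reads the windows from an Excel workbook, the storm day used here is hypothetical, and the choice that the storm day opens the first post-storm window (with exclusive end dates) is an assumption.

```python
from datetime import date, timedelta

def storm_windows(storm_day, days_per_period, periods_each_side):
    """Return (period, start, end) tuples around storm_day; `end` is exclusive.

    Assumption: the storm day is the first day of the first post-storm window.
    """
    step = timedelta(days=days_per_period)
    first_start = storm_day - periods_each_side * step
    windows = []
    for k in range(2 * periods_each_side):
        start = first_start + k * step
        windows.append((k + 1, start, start + step))
    return windows

# Hypothetical storm day, for illustration only
storm = date(2016, 10, 8)
monthly = storm_windows(storm, days_per_period=30, periods_each_side=6)
weekly = storm_windows(storm, days_per_period=7, periods_each_side=26)

for period, start, end in monthly[:3]:
    print(period, start, end)
```

With these settings the monthly list has 12 windows covering 180 days on each side of the storm day, and the weekly list has 52 windows covering 182 days on each side.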

**Contents:**
* [1. Data Preprocessing](#reg_preprocessing)
    * [1.1 Subsetting Data](#reg_subset)
    * [1.2 Preview Dataset](#reg_preview)
    * [1.3 Fill Unique ID](#reg_id)
* [2. Create Shapefile](#reg_create_shp)
* [3. Cross Validation for IDW](#reg_cv_idw)
* [4. Monthly Results](#month)

In [6]:
import pandas as pd
import numpy as np
import arcpy
from arcpy.sa import *
import os
import math
import warnings

import importlib
import sys
#path = r'M:\2024\WQ\Spring\IDW\git\misc'
path = r'F:\SEACAR_WQ_2024\git\misc'
sys.path.insert(0, path)
import idw_rk
importlib.reload(idw_rk)

import pyproj

# Define a scratch folder to avoid overwriting by parallel threads
arcpy.env.scratchWorkspace = r"F:\SEACAR_WQ_2024\scratch/IDW_con"
warnings.filterwarnings('ignore')

# 1. Data Preprocessing <a class="anchor" id="reg_preprocessing"></a>

In [2]:
gis_path = r'F:\SEACAR_WQ_2024/GIS_Data/'
dfCon = pd.read_csv(gis_path + 'OEAT_Continuous_WQ-2024-Jan-16.csv', low_memory=False)

### Define the output folders. These are the folder settings for the weekly results.

In [3]:
shpCon_folder = gis_path + r"shapefiles_Con/week"
idwCon_folder = gis_path + r"idw_Con/week"

## 1.1 Subsetting Data <a class="anchor" id="reg_subset"></a>

### Keep only samples taken between 09:00 and 17:00 each day

In [4]:
area_shortnames = {
    'Guana Tolomato Matanzas': 'GTM',
    'Estero Bay': 'EB',
    'Charlotte Harbor': 'CH',
    'Biscayne Bay': 'BB',
    'Big Bend Seagrasses':'BBS'
}

param_shortnames = {
    'Salinity': 'Sal_ppt',
    'Total Nitrogen': 'TN_mgl',
    'Dissolved Oxygen': 'DO_mgl',
    'Turbidity':'Turb_ntu',
    'Secchi Depth':'Secc_m',
    'Water Temperature':'T_c'
}

In [5]:
dfCon['SampleDate'] = pd.to_datetime(dfCon['SampleDate'], format='%b %d %Y %I:%M%p')

In [7]:
# Keep samples taken between 09:00 and 17:00 (both ends inclusive)
start_time = '09:00'
end_time = '17:00'

# .copy() avoids SettingWithCopyWarning when columns are added to the subset later;
# SampleDate is already datetime64 (converted in the previous cell)
dfConTime = dfCon[dfCon['SampleDate'].dt.time.between(pd.to_datetime(start_time).time(), pd.to_datetime(end_time).time())].copy()
dfConTime.head()

Unnamed: 0,RowID,ProgramID,ParameterName,ParameterUnits,ProgramLocationID,ActivityType,SampleDate,Year,Month,RelativeDepth,ResultValue,Latitude_DD,Longitude_DD,ManagedAreaName,AreaID,SEACAR_QAQCFlagCode,WaterBody,WbodyAcronym,Season
0,88286023,474,Water Temperature,Degrees C,EB04,,2022-09-08 09:15:00,2022,9,bottom,31.3,26.449685,-81.871465,Estero Bay Aquatic Preserve,14,6Q,Estero Bay,EB,Summer
2,88291057,474,Water Temperature,Degrees C,EB01,,2022-08-30 13:00:00,2022,8,bottom,31.0,26.4349,-81.9114,Estero Bay Aquatic Preserve,14,6Q,Estero Bay,EB,Summer
3,88294267,474,Water Temperature,Degrees C,EB01,,2022-08-29 15:45:00,2022,8,bottom,31.9,26.4349,-81.9114,Estero Bay Aquatic Preserve,14,6Q,Estero Bay,EB,Summer
5,88302636,474,Water Temperature,Degrees C,EB01,,2022-09-09 15:15:00,2022,9,bottom,30.9,26.4349,-81.9114,Estero Bay Aquatic Preserve,14,6Q,Estero Bay,EB,Summer
6,88302639,474,Water Temperature,Degrees C,EB01,,2022-09-10 14:15:00,2022,9,bottom,30.5,26.4349,-81.9114,Estero Bay Aquatic Preserve,14,6Q,Estero Bay,EB,Summer
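
The filter above relies on `Series.between`, which includes both endpoints by default, so a sample stamped exactly 17:00 is kept while one stamped 17:15 is dropped. A standard-library sketch of the same rule, using made-up timestamps rather than project data:

```python
from datetime import datetime, time

def in_daytime_window(timestamp, start=time(9, 0), end=time(17, 0)):
    """True when the time of day lies in [start, end], both ends included."""
    return start <= timestamp.time() <= end

# Made-up timestamps, for illustration only
samples = [
    datetime(2022, 9, 8, 8, 45),
    datetime(2022, 9, 8, 9, 0),
    datetime(2022, 8, 30, 13, 0),
    datetime(2022, 8, 29, 17, 0),
    datetime(2022, 8, 29, 17, 15),
]
kept = [s for s in samples if in_daytime_window(s)]
print([s.strftime("%H:%M") for s in kept])  # ['09:00', '13:00', '17:00']
```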


### Select the data using the 52-week definitions in the Excel file

In [8]:
area_ab = ["GTM","EB","CH","BB","BBS"]
period_type = [" 52 week"," Month"]
dfConTime["Period"] = ""
def select_data_period(df,area,period):
    sheet_name = str(area) + str(period)
    df_period_table = pd.read_excel(gis_path + "All_Waterbodies_Season_Month_Week_Definitions.xlsx",sheet_name=sheet_name)
    df_select_area = df[df["WbodyAcronym"]==str(area)]
    df_period_table['Start Date'] = pd.to_datetime(df_period_table['Start Date'])
    df_period_table['End Date']   = pd.to_datetime(df_period_table['End Date'])
    sub_dfs = []

    for index, row in df_period_table.iterrows():
        start_date = row['Start Date']
        end_date   = row['End Date']
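        # Half-open window [start_date, end_date); .copy() so the Period assignment does not write to a view
        sub_df = df_select_area[(df_select_area['SampleDate'] >= start_date) & (df_select_area['SampleDate'] < end_date)].copy()
        sub_df['Period'] = row["Week"]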
        sub_dfs.append(sub_df)
        
    df_period = pd.concat(sub_dfs,ignore_index=True)
    return df_period
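
`select_data_period` treats each window as half-open: `SampleDate >= start_date` and `SampleDate < end_date`. If a sheet lists the last calendar day of a period as its end date (the BBS rows shown later in this notebook, such as 2021-01-06 to 2021-01-12 followed by 2021-01-13, look like this), daytime samples on that last day would match no window. The sketch below illustrates the effect with hypothetical windows; whether the workbook intends inclusive end dates is an assumption that should be checked against the file.

```python
from datetime import datetime, timedelta

# Hypothetical windows as (period, start, end), mimicking a sheet whose
# end dates are the last calendar day of each period.
windows = [
    (1, datetime(2021, 1, 6), datetime(2021, 1, 12)),
    (2, datetime(2021, 1, 13), datetime(2021, 1, 19)),
]

def assign_period(sample_time, windows, end_inclusive_day=False):
    """Return the period whose window contains sample_time, or None."""
    for period, start, end in windows:
        # With end_inclusive_day, the whole end date counts as part of the period.
        upper = end + timedelta(days=1) if end_inclusive_day else end
        if start <= sample_time < upper:
            return period
    return None

sample = datetime(2021, 1, 12, 10, 30)  # daytime sample on an end date
print(assign_period(sample, windows))                          # None
print(assign_period(sample, windows, end_inclusive_day=True))  # 1
```

Sheets whose end date equals the next start date (as the GTM rows appear to) are not affected, because the half-open rule already covers every day exactly once.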

### Generate the aggregated mean value for weekly data.

In [9]:
sel_week_temp = []
for each in area_ab:
    df_week_temp = select_data_period(dfConTime,str(each)," 52 week")
    df_week_temp_group = df_week_temp.groupby(['WaterBody','ParameterName','ParameterUnits',
                                          'Year','Season','Latitude_DD','Longitude_DD','WbodyAcronym',"Period"])["ResultValue"].agg("mean").reset_index()
    sel_week_temp.append(df_week_temp_group)
df_week_select_Mean = pd.concat(sel_week_temp,ignore_index=True)

In [10]:
# Define the EPSG codes for source (EPSG:4326) and target (EPSG:3086) coordinate systems
source_epsg = 'EPSG:4326'
target_epsg = 'EPSG:3086'

# Create a PyProj Transformer for the conversion
transformer = pyproj.Transformer.from_crs(source_epsg, target_epsg, always_xy=True)

# Define a function to apply the transformation to each row of the DataFrame
def transform_coordinates(row):
    x, y = transformer.transform(row['Longitude_DD'], row['Latitude_DD'])
    return pd.Series({'x': x, 'y': y})

# Apply the transformation function to the DataFrame and create new columns for the converted coordinates
df_week_select_Mean[['x', 'y']] = df_week_select_Mean.apply(transform_coordinates, axis=1)

### Define the barrier files

In [11]:
barrier_folder = os.path.join(gis_path, 'Barriers')
barrier_folder

barriers = []
for file in os.listdir(barrier_folder):
    if file.endswith(".shp"):
        barriers.append(os.path.join(barrier_folder, file))

for barrier in barriers:
    print(barrier)

F:\SEACAR_WQ_2024/GIS_Data/Barriers\BBS_Barriers.shp
F:\SEACAR_WQ_2024/GIS_Data/Barriers\BB_Barriers.shp
F:\SEACAR_WQ_2024/GIS_Data/Barriers\CH_Barriers.shp
F:\SEACAR_WQ_2024/GIS_Data/Barriers\EB_Barriers.shp
F:\SEACAR_WQ_2024/GIS_Data/Barriers\GTM_Barriers.shp
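
The barrier files follow the naming pattern `<acronym>_Barriers.shp`, so a lookup keyed by water body acronym can be built from the folder listing. The sketch below runs against a temporary folder with empty placeholder files; the helper name `barriers_by_waterbody` is hypothetical and is not part of `idw_rk`, which receives the folder path directly.

```python
import os
import tempfile

def barriers_by_waterbody(folder, suffix="_Barriers.shp"):
    """Map each water body acronym to the path of its barrier shapefile."""
    mapping = {}
    for name in sorted(os.listdir(folder)):
        if name.endswith(suffix):
            mapping[name[: -len(suffix)]] = os.path.join(folder, name)
    return mapping

# Placeholder files in a temporary folder, for illustration only
with tempfile.TemporaryDirectory() as demo_folder:
    for acronym in ["BBS", "BB", "CH", "EB", "GTM"]:
        open(os.path.join(demo_folder, f"{acronym}_Barriers.shp"), "w").close()
    # Sidecar files such as .dbf are ignored by the suffix check
    open(os.path.join(demo_folder, "BBS_Barriers.dbf"), "w").close()
    lookup = barriers_by_waterbody(demo_folder)

print(sorted(lookup))
```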


### Define waterbody boundary for spatial extent and masking

In [12]:
waterbody_extent = os.path.join(gis_path, 'OEAT_Waterbody_Boundaries', 'OEAT_Waterbody_Boundary.shp')

unique_waterbodies = []
with arcpy.da.SearchCursor(waterbody_extent, ['WaterbodyA']) as cursor:
    for row in cursor:
        unique_waterbodies.append(row[0])

print("Unique Waterbodies:", unique_waterbodies)

Unique Waterbodies: ['BBS', 'BB', 'CH', 'EB', 'GTM']


### Define and generate weekly table.

In [13]:
waterBody = ['Big Bend Seagrasses', 'Biscayne Bay', 'Charlotte Harbor', 'Estero Bay', 'Guana Tolomato Matanzas']
parameter = ['Dissolved Oxygen', 'Salinity', 'Turbidity', 'Water Temperature']
waterBody_list = []
parameter_list= []
week_list = []
for i in waterBody:
    for j in parameter:
        for k in range(1,53):
            waterBody_list.append(i)
            parameter_list.append(j)
            week_list.append(k)
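
The three nested loops above enumerate every (water body, parameter, week) combination. The same rows can be produced with `itertools.product`; this is an equivalent sketch, not a change to the notebook's logic, and the variable names are illustrative.

```python
from itertools import product

water_bodies = ['Big Bend Seagrasses', 'Biscayne Bay', 'Charlotte Harbor',
                'Estero Bay', 'Guana Tolomato Matanzas']
parameters = ['Dissolved Oxygen', 'Salinity', 'Turbidity', 'Water Temperature']
periods = range(1, 53)  # weeks 1..52

# product() varies the last argument fastest, matching the nested loop order
rows = [
    {"WaterBody": wb, "Parameter": p, "Period": k}
    for wb, p, k in product(water_bodies, parameters, periods)
]

print(len(rows))  # 5 * 4 * 52 = 1040
print(rows[0])
print(rows[-1])
```

Passing `rows` to `pd.DataFrame` would give the same three columns as the table built in the next cell.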

In [14]:
df_week_table = pd.DataFrame({
    "WaterBody":waterBody_list,
    "Parameter":parameter_list,
    "Period":week_list
})
df_week_table["Filename"] = ""
df_week_table["NumDataPoints"] = ""
df_week_table["RMSE"] = ""
df_week_table["ME"] = ""
df_week_table["WbodyAcronym"] = df_week_table["WaterBody"].map(area_shortnames)

In [15]:
name_ab = df_week_table["WbodyAcronym"].unique()
para_list = df_week_table["Parameter"].unique()
dfs_sub = []
for each in name_ab:
    sheet_name_use = str(each) + " 52 week"
    df_period_table_used = pd.read_excel(gis_path + "All_Waterbodies_Season_Month_Week_Definitions.xlsx",sheet_name=sheet_name_use)
    startDate = list(df_period_table_used["Start Date"])
    endDate   = list(df_period_table_used["End Date"])
    for para in para_list:
        df_temp_use = df_week_table[(df_week_table["WbodyAcronym"]==str(each))&(df_week_table["Parameter"]==str(para))]
        df_temp_use["startDate"] = startDate
        df_temp_use["endDate"]   = endDate
        dfs_sub.append(df_temp_use)
df_week_table = pd.concat(dfs_sub)
df_week_table.to_csv(gis_path + 'week_table.csv')

In [16]:
df_week_table

Unnamed: 0,WaterBody,Parameter,Period,Filename,NumDataPoints,RMSE,ME,WbodyAcronym,startDate,endDate
0,Big Bend Seagrasses,Dissolved Oxygen,1,,,,,BBS,2021-01-06,2021-01-12
1,Big Bend Seagrasses,Dissolved Oxygen,2,,,,,BBS,2021-01-13,2021-01-19
2,Big Bend Seagrasses,Dissolved Oxygen,3,,,,,BBS,2021-01-20,2021-01-26
3,Big Bend Seagrasses,Dissolved Oxygen,4,,,,,BBS,2021-01-27,2021-02-02
4,Big Bend Seagrasses,Dissolved Oxygen,5,,,,,BBS,2021-02-03,2021-02-09
...,...,...,...,...,...,...,...,...,...,...
1035,Guana Tolomato Matanzas,Water Temperature,48,,,,,GTM,2017-03-04,2017-03-11
1036,Guana Tolomato Matanzas,Water Temperature,49,,,,,GTM,2017-03-11,2017-03-18
1037,Guana Tolomato Matanzas,Water Temperature,50,,,,,GTM,2017-03-18,2017-03-25
1038,Guana Tolomato Matanzas,Water Temperature,51,,,,,GTM,2017-03-25,2017-04-01


## 1.2 Preview Dataset <a class="anchor" id="reg_preview"></a>

In [17]:
df_week_select_Mean.head()

Unnamed: 0,WaterBody,ParameterName,ParameterUnits,Year,Season,Latitude_DD,Longitude_DD,WbodyAcronym,Period,ResultValue,x,y
0,Guana Tolomato Matanzas,Dissolved Oxygen,mg/L,2016,Fall,29.667071,-81.257403,GTM,23,5.566667,665054.340859,631868.366218
1,Guana Tolomato Matanzas,Dissolved Oxygen,mg/L,2016,Fall,29.667071,-81.257403,GTM,24,5.106926,665054.340859,631868.366218
2,Guana Tolomato Matanzas,Dissolved Oxygen,mg/L,2016,Fall,29.667071,-81.257403,GTM,25,4.68658,665054.340859,631868.366218
3,Guana Tolomato Matanzas,Dissolved Oxygen,mg/L,2016,Fall,29.667071,-81.257403,GTM,26,5.511688,665054.340859,631868.366218
4,Guana Tolomato Matanzas,Dissolved Oxygen,mg/L,2016,Fall,29.667071,-81.257403,GTM,27,2.891775,665054.340859,631868.366218


## 1.3 Fill NaN RowID values with unique IDs (the IDW function requires a unique ID per point) <a class="anchor" id="reg_id"></a>

In [18]:
idw_rk.fill_nan_rowids(df_week_select_Mean, 'RowID')

df_week_select_Mean['RowID'] = df_week_select_Mean['RowID'].astype(int)
df_week_select_Mean

Unnamed: 0,WaterBody,ParameterName,ParameterUnits,Year,Season,Latitude_DD,Longitude_DD,WbodyAcronym,Period,ResultValue,x,y,RowID
0,Guana Tolomato Matanzas,Dissolved Oxygen,mg/L,2016,Fall,29.667071,-81.257403,GTM,23,5.566667,665054.340859,631868.366218,1
1,Guana Tolomato Matanzas,Dissolved Oxygen,mg/L,2016,Fall,29.667071,-81.257403,GTM,24,5.106926,665054.340859,631868.366218,2
2,Guana Tolomato Matanzas,Dissolved Oxygen,mg/L,2016,Fall,29.667071,-81.257403,GTM,25,4.686580,665054.340859,631868.366218,3
3,Guana Tolomato Matanzas,Dissolved Oxygen,mg/L,2016,Fall,29.667071,-81.257403,GTM,26,5.511688,665054.340859,631868.366218,4
4,Guana Tolomato Matanzas,Dissolved Oxygen,mg/L,2016,Fall,29.667071,-81.257403,GTM,27,2.891775,665054.340859,631868.366218,5
...,...,...,...,...,...,...,...,...,...,...,...,...,...
3221,Big Bend Seagrasses,Water Temperature,Degrees C,2021,Winter,29.647203,-83.421196,BBS,49,20.032828,455952.242067,626846.365120,3222
3222,Big Bend Seagrasses,Water Temperature,Degrees C,2021,Winter,29.647203,-83.421196,BBS,50,20.496970,455952.242067,626846.365120,3223
3223,Big Bend Seagrasses,Water Temperature,Degrees C,2021,Winter,29.647203,-83.421196,BBS,51,16.403535,455952.242067,626846.365120,3224
3224,Big Bend Seagrasses,Water Temperature,Degrees C,2021,Winter,29.647203,-83.421196,BBS,52,21.236364,455952.242067,626846.365120,3225


# 2. Create Shapefile <a class="anchor" id="reg_create_shp"></a>

In [19]:
idw_rk.delete_all_files(shpCon_folder)

In [20]:
# Merge the period table with the projected coordinates (x, y), RowID, and ResultValue
seasons_con_coord = idw_rk.merge_with_lat_long1(df_week_table, df_week_select_Mean)

In [21]:
seasons_con_coord

Unnamed: 0,WaterBody,Parameter,Period,Filename,NumDataPoints,RMSE,ME,WbodyAcronym,startDate,endDate,x,y,RowID,ResultValue
0,Big Bend Seagrasses,Dissolved Oxygen,1,,,,,BBS,2021-01-06,2021-01-12,455952.242067,626846.365120,3055,8.345960
1,Big Bend Seagrasses,Dissolved Oxygen,2,,,,,BBS,2021-01-13,2021-01-19,455952.242067,626846.365120,3056,8.507576
2,Big Bend Seagrasses,Dissolved Oxygen,3,,,,,BBS,2021-01-20,2021-01-26,455952.242067,626846.365120,3057,8.120202
3,Big Bend Seagrasses,Dissolved Oxygen,4,,,,,BBS,2021-01-27,2021-02-02,455952.242067,626846.365120,3058,7.933838
4,Big Bend Seagrasses,Dissolved Oxygen,5,,,,,BBS,2021-02-03,2021-02-09,455952.242067,626846.365120,3059,8.726263
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3292,Guana Tolomato Matanzas,Water Temperature,51,,,,,GTM,2017-03-25,2017-04-01,653509.737698,674232.732990,747,22.775325
3293,Guana Tolomato Matanzas,Water Temperature,52,,,,,GTM,2017-04-01,2017-04-08,665054.340859,631868.366218,733,24.606061
3294,Guana Tolomato Matanzas,Water Temperature,52,,,,,GTM,2017-04-01,2017-04-08,665987.248566,639659.363097,738,21.770996
3295,Guana Tolomato Matanzas,Water Temperature,52,,,,,GTM,2017-04-01,2017-04-08,659731.434296,654158.019057,743,22.544156


In [23]:
idw_rk.create_shp_season1(seasons_con_coord, shpCon_folder)

# 3. Cross Validation for IDW <a class="anchor" id="reg_cv_idw"></a>

In [239]:
# Empty the IDW output folder
idw_rk.delete_all_files(idwCon_folder)

In [275]:
df_week_table

Unnamed: 0,WaterBody,Parameter,Period,Filename,NumDataPoints,RMSE,ME,WbodyAcronym,startDate,endDate
0,Big Bend Seagrasses,Dissolved Oxygen,1,,,,,BBS,2021-01-06,2021-01-12
1,Big Bend Seagrasses,Dissolved Oxygen,2,,,,,BBS,2021-01-13,2021-01-19
2,Big Bend Seagrasses,Dissolved Oxygen,3,,,,,BBS,2021-01-20,2021-01-26
3,Big Bend Seagrasses,Dissolved Oxygen,4,,,,,BBS,2021-01-27,2021-02-02
4,Big Bend Seagrasses,Dissolved Oxygen,5,,,,,BBS,2021-02-03,2021-02-09
...,...,...,...,...,...,...,...,...,...,...
1035,Guana Tolomato Matanzas,Water Temperature,48,,,,,GTM,2017-03-04,2017-03-11
1036,Guana Tolomato Matanzas,Water Temperature,49,,,,,GTM,2017-03-11,2017-03-18
1037,Guana Tolomato Matanzas,Water Temperature,50,,,,,GTM,2017-03-18,2017-03-25
1038,Guana Tolomato Matanzas,Water Temperature,51,,,,,GTM,2017-03-25,2017-04-01


In [276]:
# IDW is skipped when a shapefile has fewer than 3 data points
idw_rk.idw_interpolation1(df_week_table, shpCon_folder, idwCon_folder, waterbody_extent, barrier_folder)
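
The implementation of `idw_rk.idw_interpolation1` is not shown in this notebook. The log messages later on (for example, 5 cross-validation iterations for a shapefile with 5 rows) are consistent with leave-one-out cross validation, where each point is withheld in turn and predicted from the others. The following is a minimal standard-library sketch of that idea, not the project's arcpy-based method: it ignores barriers and the waterbody mask, assumes a power of 2, defines the error as predicted minus observed, and uses made-up points.

```python
import math

def idw_predict(known, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from (x, y, value) tuples."""
    num = 0.0
    den = 0.0
    for kx, ky, value in known:
        dist = math.hypot(kx - x, ky - y)
        if dist == 0.0:
            return value  # coincident point: use its value directly
        weight = 1.0 / dist ** power
        num += weight * value
        den += weight
    return num / den

def loocv_idw(points, power=2.0, min_points=3):
    """Leave-one-out CV; returns (rmse, me, n), or None if too few points."""
    if len(points) < min_points:
        return None
    errors = []
    for i, (x, y, observed) in enumerate(points):
        others = points[:i] + points[i + 1:]  # withhold point i
        errors.append(idw_predict(others, x, y, power) - observed)
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    me = sum(errors) / n
    return rmse, me, n

# Made-up coordinates (metres) and values, for illustration only
points = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (2.0, 0.0, 3.0)]
rmse, me, n = loocv_idw(points)
print(n, round(rmse, 4), round(me, 4))
```

In this symmetric example the over- and under-predictions at the two end points cancel, so ME is close to zero while RMSE is not. This is why the two metrics are reported together.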

# 4. Monthly Results <a class="anchor" id="month"></a>

### Set the folder paths for the monthly results

In [24]:
shpCon_folder = gis_path + r"shapefiles_Con/month"
idwCon_folder = gis_path + r"idw_Con/month"

In [25]:
area_ab = ["GTM","EB","CH","BB","BBS"]
period_type = [" 52 week"," Month"]
dfConTime["Period"] = ""
def select_data_period1(df,area,period):
    sheet_name = str(area) + str(period)
    df_period_table = pd.read_excel(gis_path + "All_Waterbodies_Season_Month_Week_Definitions.xlsx",sheet_name=sheet_name)
    df_select_area = df[df["WbodyAcronym"]==str(area)]
    df_period_table['Start Date'] = pd.to_datetime(df_period_table['Start Date'])
    df_period_table['End Date']   = pd.to_datetime(df_period_table['End Date'])
    sub_dfs = []

    for index, row in df_period_table.iterrows():
        start_date = row['Start Date']
        end_date   = row['End Date']
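        # Half-open window [start_date, end_date); .copy() so the Period assignment does not write to a view
        sub_df = df_select_area[(df_select_area['SampleDate'] >= start_date) & (df_select_area['SampleDate'] < end_date)].copy()
        sub_df['Period'] = row["Month"]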
        sub_dfs.append(sub_df)
        
    df_period = pd.concat(sub_dfs,ignore_index=True)
    return df_period

In [26]:
sel_month_temp = []
for each in area_ab:
    df_month_temp = select_data_period1(dfConTime,str(each)," Month")
    df_month_temp_group = df_month_temp.groupby(['WaterBody','ParameterName','ParameterUnits',
                                          'Year','Season','Latitude_DD','Longitude_DD','WbodyAcronym',"Period"])["ResultValue"].agg("mean").reset_index()
    sel_month_temp.append(df_month_temp_group)
df_month_select_Mean = pd.concat(sel_month_temp,ignore_index=True)

In [27]:
# Define the EPSG codes for source (EPSG:4326) and target (EPSG:3086) coordinate systems
source_epsg = 'EPSG:4326'
target_epsg = 'EPSG:3086'

# Create a PyProj Transformer for the conversion
transformer = pyproj.Transformer.from_crs(source_epsg, target_epsg, always_xy=True)

# Define a function to apply the transformation to each row of the DataFrame
def transform_coordinates(row):
    x, y = transformer.transform(row['Longitude_DD'], row['Latitude_DD'])
    return pd.Series({'x': x, 'y': y})

# Apply the transformation function to the DataFrame and create new columns for the converted coordinates
df_month_select_Mean[['x', 'y']] = df_month_select_Mean.apply(transform_coordinates, axis=1)

In [28]:
waterBody = ['Big Bend Seagrasses', 'Biscayne Bay', 'Charlotte Harbor', 'Estero Bay', 'Guana Tolomato Matanzas']
parameter = ['Dissolved Oxygen', 'Salinity', 'Turbidity', 'Water Temperature']
waterBody_list = []
parameter_list= []
month_list = []
for i in waterBody:
    for j in parameter:
        for k in range(1,13):
            waterBody_list.append(i)
            parameter_list.append(j)
            month_list.append(k)

In [29]:
df_month_table = pd.DataFrame({
    "WaterBody":waterBody_list,
    "Parameter":parameter_list,
    "Period":month_list
})
df_month_table["Filename"] = ""
df_month_table["NumDataPoints"] = ""
df_month_table["RMSE"] = ""
df_month_table["ME"] = ""
df_month_table["WbodyAcronym"] = df_month_table["WaterBody"].map(area_shortnames)

In [30]:
name_ab = df_month_table["WbodyAcronym"].unique()
para_list = df_month_table["Parameter"].unique()
dfs_sub = []
for each in name_ab:
    sheet_name_use = str(each) + " Month"
    df_period_table_used = pd.read_excel(gis_path + "All_Waterbodies_Season_Month_Week_Definitions.xlsx",sheet_name=sheet_name_use)
    startDate = list(df_period_table_used["Start Date"])
    endDate   = list(df_period_table_used["End Date"])
    for para in para_list:
        df_temp_use = df_month_table[(df_month_table["WbodyAcronym"]==str(each))&(df_month_table["Parameter"]==str(para))]
        df_temp_use["startDate"] = startDate
        df_temp_use["endDate"]   = endDate
        dfs_sub.append(df_temp_use)
df_month_table = pd.concat(dfs_sub)
df_month_table.to_csv(gis_path + 'month_table.csv')

In [31]:
df_month_table

Unnamed: 0,WaterBody,Parameter,Period,Filename,NumDataPoints,RMSE,ME,WbodyAcronym,startDate,endDate
0,Big Bend Seagrasses,Dissolved Oxygen,1,,,,,BBS,2021-01-08,2021-02-06
1,Big Bend Seagrasses,Dissolved Oxygen,2,,,,,BBS,2021-02-07,2021-03-08
2,Big Bend Seagrasses,Dissolved Oxygen,3,,,,,BBS,2021-03-09,2021-04-07
3,Big Bend Seagrasses,Dissolved Oxygen,4,,,,,BBS,2021-04-08,2021-05-07
4,Big Bend Seagrasses,Dissolved Oxygen,5,,,,,BBS,2021-05-08,2021-06-06
...,...,...,...,...,...,...,...,...,...,...
235,Guana Tolomato Matanzas,Water Temperature,8,,,,,GTM,2016-11-07,2016-12-07
236,Guana Tolomato Matanzas,Water Temperature,9,,,,,GTM,2016-12-07,2017-01-06
237,Guana Tolomato Matanzas,Water Temperature,10,,,,,GTM,2017-01-06,2017-02-05
238,Guana Tolomato Matanzas,Water Temperature,11,,,,,GTM,2017-02-05,2017-03-07


In [32]:
idw_rk.fill_nan_rowids(df_month_select_Mean, 'RowID')

df_month_select_Mean['RowID'] = df_month_select_Mean['RowID'].astype(int)
df_month_select_Mean

Unnamed: 0,WaterBody,ParameterName,ParameterUnits,Year,Season,Latitude_DD,Longitude_DD,WbodyAcronym,Period,ResultValue,x,y,RowID
0,Guana Tolomato Matanzas,Dissolved Oxygen,mg/L,2016,Fall,29.667071,-81.257403,GTM,6,5.142161,665054.340859,631868.366218,1
1,Guana Tolomato Matanzas,Dissolved Oxygen,mg/L,2016,Fall,29.667071,-81.257403,GTM,7,3.728834,665054.340859,631868.366218,2
2,Guana Tolomato Matanzas,Dissolved Oxygen,mg/L,2016,Fall,29.667071,-81.257403,GTM,8,6.632493,665054.340859,631868.366218,3
3,Guana Tolomato Matanzas,Dissolved Oxygen,mg/L,2016,Fall,29.737041,-81.245953,GTM,6,5.606851,665987.248566,639659.363097,4
4,Guana Tolomato Matanzas,Dissolved Oxygen,mg/L,2016,Fall,29.737041,-81.245953,GTM,7,6.354141,665987.248566,639659.363097,5
...,...,...,...,...,...,...,...,...,...,...,...,...,...
989,Big Bend Seagrasses,Water Temperature,Degrees C,2021,Summer,29.647203,-83.421196,BBS,9,29.325253,455952.242067,626846.365120,990
990,Big Bend Seagrasses,Water Temperature,Degrees C,2021,Winter,29.647203,-83.421196,BBS,1,13.373354,455952.242067,626846.365120,991
991,Big Bend Seagrasses,Water Temperature,Degrees C,2021,Winter,29.647203,-83.421196,BBS,2,17.077020,455952.242067,626846.365120,992
992,Big Bend Seagrasses,Water Temperature,Degrees C,2021,Winter,29.647203,-83.421196,BBS,12,19.022835,455952.242067,626846.365120,993


In [33]:
# Merge the period table with the projected coordinates (x, y), RowID, and ResultValue
seasons_con_coord = idw_rk.merge_with_lat_long1(df_month_table, df_month_select_Mean)
seasons_con_coord

Unnamed: 0,WaterBody,Parameter,Period,Filename,NumDataPoints,RMSE,ME,WbodyAcronym,startDate,endDate,x,y,RowID,ResultValue
0,Big Bend Seagrasses,Dissolved Oxygen,1,,,,,BBS,2021-01-08,2021-02-06,455952.242067,626846.365120,944,8.356844
1,Big Bend Seagrasses,Dissolved Oxygen,2,,,,,BBS,2021-02-07,2021-03-08,455952.242067,626846.365120,936,7.433333
2,Big Bend Seagrasses,Dissolved Oxygen,2,,,,,BBS,2021-02-07,2021-03-08,455952.242067,626846.365120,945,7.730177
3,Big Bend Seagrasses,Dissolved Oxygen,3,,,,,BBS,2021-03-09,2021-04-07,455952.242067,626846.365120,937,7.490700
4,Big Bend Seagrasses,Dissolved Oxygen,4,,,,,BBS,2021-04-08,2021-05-07,455952.242067,626846.365120,938,6.859352
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
997,Guana Tolomato Matanzas,Water Temperature,11,,,,,GTM,2017-02-05,2017-03-07,653509.737698,674232.732990,234,18.666667
998,Guana Tolomato Matanzas,Water Temperature,12,,,,,GTM,2017-03-07,2017-04-06,665054.340859,631868.366218,229,21.517071
999,Guana Tolomato Matanzas,Water Temperature,12,,,,,GTM,2017-03-07,2017-04-06,665987.248566,639659.363097,231,20.061616
1000,Guana Tolomato Matanzas,Water Temperature,12,,,,,GTM,2017-03-07,2017-04-06,659731.434296,654158.019057,233,20.107685


In [34]:
idw_rk.create_shp_season1(seasons_con_coord, shpCon_folder)

Number of data rows for BBS, DO_mgl, 1: 1
Shapefile for BBS: DO_mgl for period 1 has been saved as SHP_BBS_DO_mgl_1.shp
Number of data rows for BBS, DO_mgl, 2: 2
Shapefile for BBS: DO_mgl for period 2 has been saved as SHP_BBS_DO_mgl_2.shp
Number of data rows for BBS, DO_mgl, 3: 1
Shapefile for BBS: DO_mgl for period 3 has been saved as SHP_BBS_DO_mgl_3.shp
Number of data rows for BBS, DO_mgl, 4: 1
Shapefile for BBS: DO_mgl for period 4 has been saved as SHP_BBS_DO_mgl_4.shp
Number of data rows for BBS, DO_mgl, 5: 1
Shapefile for BBS: DO_mgl for period 5 has been saved as SHP_BBS_DO_mgl_5.shp
Number of data rows for BBS, DO_mgl, 6: 1
Shapefile for BBS: DO_mgl for period 6 has been saved as SHP_BBS_DO_mgl_6.shp
Number of data rows for BBS, DO_mgl, 7: 1
Shapefile for BBS: DO_mgl for period 7 has been saved as SHP_BBS_DO_mgl_7.shp
Number of data rows for BBS, DO_mgl, 8: 1
Shapefile for BBS: DO_mgl for period 8 has been saved as SHP_BBS_DO_mgl_8.shp
Number of data rows for BBS, DO_mgl, 9: 

Shapefile for BB: Sal_ppt for period 10 has been saved as SHP_BB_Sal_ppt_10.shp
Number of data rows for BB, Sal_ppt, 11: 1
Shapefile for BB: Sal_ppt for period 11 has been saved as SHP_BB_Sal_ppt_11.shp
No valid data found for area: BB, parameter: Sal_ppt, period: 12
Number of data rows for BB, Turb_ntu, 1: 6
Shapefile for BB: Turb_ntu for period 1 has been saved as SHP_BB_Turb_ntu_1.shp
Number of data rows for BB, Turb_ntu, 2: 6
Shapefile for BB: Turb_ntu for period 2 has been saved as SHP_BB_Turb_ntu_2.shp
Number of data rows for BB, Turb_ntu, 3: 6
Shapefile for BB: Turb_ntu for period 3 has been saved as SHP_BB_Turb_ntu_3.shp
Number of data rows for BB, Turb_ntu, 4: 12
Shapefile for BB: Turb_ntu for period 4 has been saved as SHP_BB_Turb_ntu_4.shp
Number of data rows for BB, Turb_ntu, 5: 5
Shapefile for BB: Turb_ntu for period 5 has been saved as SHP_BB_Turb_ntu_5.shp
Number of data rows for BB, Turb_ntu, 6: 6
Shapefile for BB: Turb_ntu for period 6 has been saved as SHP_BB_Turb_ntu

Shapefile for CH: T_c for period 9 has been saved as SHP_CH_T_c_9.shp
Number of data rows for CH, T_c, 10: 6
Shapefile for CH: T_c for period 10 has been saved as SHP_CH_T_c_10.shp
Number of data rows for CH, T_c, 11: 3
Shapefile for CH: T_c for period 11 has been saved as SHP_CH_T_c_11.shp
Number of data rows for CH, T_c, 12: 6
Shapefile for CH: T_c for period 12 has been saved as SHP_CH_T_c_12.shp
Number of data rows for EB, DO_mgl, 1: 3
Shapefile for EB: DO_mgl for period 1 has been saved as SHP_EB_DO_mgl_1.shp
Number of data rows for EB, DO_mgl, 2: 3
Shapefile for EB: DO_mgl for period 2 has been saved as SHP_EB_DO_mgl_2.shp
Number of data rows for EB, DO_mgl, 3: 3
Shapefile for EB: DO_mgl for period 3 has been saved as SHP_EB_DO_mgl_3.shp
Number of data rows for EB, DO_mgl, 4: 5
Shapefile for EB: DO_mgl for period 4 has been saved as SHP_EB_DO_mgl_4.shp
Number of data rows for EB, DO_mgl, 5: 3
Shapefile for EB: DO_mgl for period 5 has been saved as SHP_EB_DO_mgl_5.shp
Number of da

Shapefile for GTM: Sal_ppt for period 7 has been saved as SHP_GTM_Sal_ppt_7.shp
Number of data rows for GTM, Sal_ppt, 8: 8
Shapefile for GTM: Sal_ppt for period 8 has been saved as SHP_GTM_Sal_ppt_8.shp
Number of data rows for GTM, Sal_ppt, 9: 8
Shapefile for GTM: Sal_ppt for period 9 has been saved as SHP_GTM_Sal_ppt_9.shp
No valid data found for area: GTM, parameter: Sal_ppt, period: 10
Number of data rows for GTM, Sal_ppt, 11: 3
Shapefile for GTM: Sal_ppt for period 11 has been saved as SHP_GTM_Sal_ppt_11.shp
Number of data rows for GTM, Sal_ppt, 12: 4
Shapefile for GTM: Sal_ppt for period 12 has been saved as SHP_GTM_Sal_ppt_12.shp
Number of data rows for GTM, Turb_ntu, 1: 4
Shapefile for GTM: Turb_ntu for period 1 has been saved as SHP_GTM_Turb_ntu_1.shp
Number of data rows for GTM, Turb_ntu, 2: 4
Shapefile for GTM: Turb_ntu for period 2 has been saved as SHP_GTM_Turb_ntu_2.shp
Number of data rows for GTM, Turb_ntu, 3: 8
Shapefile for GTM: Turb_ntu for period 3 has been saved as S

In [35]:
# IDW is skipped when a shapefile has fewer than 3 data points
idw_rk.idw_interpolation1(df_month_table, shpCon_folder, idwCon_folder, waterbody_extent, barrier_folder)

Processing file: SHP_BBS_DO_mgl_1.shp
Not enough data for IDW interpolation in SHP_BBS_DO_mgl_1.shp, skipping
Processing file: SHP_BBS_DO_mgl_2.shp
Not enough data for IDW interpolation in SHP_BBS_DO_mgl_2.shp, skipping
Processing file: SHP_BBS_DO_mgl_3.shp
Not enough data for IDW interpolation in SHP_BBS_DO_mgl_3.shp, skipping
Processing file: SHP_BBS_DO_mgl_4.shp
Not enough data for IDW interpolation in SHP_BBS_DO_mgl_4.shp, skipping
Processing file: SHP_BBS_DO_mgl_5.shp
Not enough data for IDW interpolation in SHP_BBS_DO_mgl_5.shp, skipping
Processing file: SHP_BBS_DO_mgl_6.shp
Not enough data for IDW interpolation in SHP_BBS_DO_mgl_6.shp, skipping
Processing file: SHP_BBS_DO_mgl_7.shp
Not enough data for IDW interpolation in SHP_BBS_DO_mgl_7.shp, skipping
Processing file: SHP_BBS_DO_mgl_8.shp
Not enough data for IDW interpolation in SHP_BBS_DO_mgl_8.shp, skipping
Processing file: SHP_BBS_DO_mgl_9.shp
Not enough data for IDW interpolation in SHP_BBS_DO_mgl_9.shp, skipping
Processing

Processing file: SHP_BB_Turb_ntu_5.shp
File SHP_BB_Turb_ntu_5.shp has completed 5 cross-validation iterations.
Processing file: SHP_BB_Turb_ntu_6.shp
File SHP_BB_Turb_ntu_6.shp has completed 6 cross-validation iterations.
Processing file: SHP_BB_Turb_ntu_7.shp
File SHP_BB_Turb_ntu_7.shp has completed 12 cross-validation iterations.
Processing file: SHP_BB_Turb_ntu_8.shp
File SHP_BB_Turb_ntu_8.shp has completed 6 cross-validation iterations.
Processing file: SHP_BB_Turb_ntu_9.shp
File SHP_BB_Turb_ntu_9.shp has completed 12 cross-validation iterations.
Processing file: SHP_BB_Turb_ntu_10.shp
File SHP_BB_Turb_ntu_10.shp has completed 6 cross-validation iterations.
Processing file: SHP_BB_Turb_ntu_11.shp
Not enough data for IDW interpolation in SHP_BB_Turb_ntu_11.shp, skipping
Shapefile not found for: SHP_BB_Turb_ntu_12.shp
Processing file: SHP_BB_T_c_1.shp
File SHP_BB_T_c_1.shp has completed 6 cross-validation iterations.
Processing file: SHP_BB_T_c_2.shp
File SHP_BB_T_c_2.shp has complet

KeyboardInterrupt: 