# IDW Interpolation for weekly intervals
## Task 2B

This notebook contains Python code that runs cross-validation (CV) of Inverse Distance Weighting (IDW) interpolation in the arcpy environment for four water quality parameters:
- Dissolved oxygen (DO_mgl)
- Salinity (Sal_ppt)
- Turbidity (Turb_ntu)
- Temperature (T_c)

The analysis is conducted separately for each of the following water bodies:
- Guana Tolomato Matanzas (GTM)
- Estero Bay (EB)
- Charlotte Harbor (CH)
- Biscayne Bay (BB)
- Big Bend Seagrasses (BBS)

**Tasks:**  

- Task 2A Calculate the RMSE and Mean Error (ME) for IDW results using both continuous and discrete data

- **Task 2B Calculate the RMSE and Mean Error (ME) for IDW results using continuous data.**
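
The two error measures named in the tasks can be sketched in plain Python as follows. This is a minimal illustration, not the code in `idw_rk`: the function names `rmse` and `mean_error` are hypothetical, the sample values are made up, and the sign convention for ME (predicted minus observed) is an assumption.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between paired observations and predictions."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mean_error(observed, predicted):
    """Mean error (bias), assumed here to be predicted minus observed."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)

# Hypothetical values, for illustration only
obs = [5.0, 4.0, 6.0]
pred = [5.5, 3.5, 6.0]
rmse_value = rmse(obs, pred)
me_value = mean_error(obs, pred)
```

RMSE penalizes large misses regardless of direction, while ME can be close to zero when over- and under-predictions cancel, so the two are usually read together.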


**Monthly: The SEACAR team would like the USF team to define their months as six 30-day
increments prior to the storm day, and then six 30-day increments following the storm day.** 

**Weekly: The SEACAR team would like the USF team to define their weeks as 26 7-day
increments prior to the storm day, and then 26 7-day increments following the storm day** 
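
The period definitions above can be sketched as a small helper that builds consecutive fixed-length windows around a storm day. In this notebook the actual windows are read from an Excel file, so this is only an illustration. The function name `storm_windows`, the storm date, and the choice to start the first post-storm window on the storm day itself are all assumptions.

```python
from datetime import date, timedelta

def storm_windows(storm_day, n_before, n_after, step_days):
    """Return (period, start, end) tuples for half-open [start, end) windows."""
    step = timedelta(days=step_days)
    first_start = storm_day - n_before * step
    windows = []
    for i in range(n_before + n_after):
        start = first_start + i * step
        windows.append((i + 1, start, start + step))
    return windows

storm_day = date(2020, 6, 1)                   # hypothetical storm date
weekly = storm_windows(storm_day, 26, 26, 7)   # 52 seven-day periods
monthly = storm_windows(storm_day, 6, 6, 30)   # 12 thirty-day periods
```

With this layout, period 27 of the weekly series and period 7 of the monthly series both begin on the storm day.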

**Contents:**
* [1. Data Preprocess](#reg_preprocessing)
    * [1.1 Subsetting Dataset](#reg_subset)
    * [1.2 Preview Dataset](#reg_preview)
    * [1.3 Fill Unique ID](#reg_id)
* [2. Create Shapefile](#reg_create_shp)
* [3. Cross Validation for IDW](#reg_cv_idw)
* [4. Monthly Results](#month)

In [3]:
import pandas as pd
import numpy as np
import arcpy
from arcpy.sa import *
import os
import math
import warnings

import importlib
import sys
#path = r'F:\SEACAR_WQ_2024\git\misc'
path = r'E:\Projects\SEACAR_WQ_2024\git\misc'

sys.path.insert(0, path)
import idw_rk
# Reload so that edits to idw_rk.py are picked up without restarting the kernel
importlib.reload(idw_rk)

import pyproj

# Define a scratch folder to avoid overwriting files from parallel threads

arcpy.env.scratchWorkspace = r"E:\Projects\SEACAR_WQ_2024\scratch/IDW_week"

warnings.filterwarnings('ignore')
arcpy.env.overwriteOutput = True

## 1. Data Preprocessing <a class="anchor" id="reg_preprocessing"></a>
### 1.1 Load CSV files

In [4]:
#gis_path = r'F:\SEACAR_WQ_2024/GIS_Data/'
gis_path = r'E:\Projects\SEACAR_WQ_2024/GIS_Data/'
dfCon = pd.read_csv(gis_path + 'OEAT_Continuous_WQ-2024-Feb-21.csv', low_memory=False)

### Define the output folders (weekly settings)

In [5]:
# shpCon_folder = gis_path + r"shapefiles_Con/week"
# idwCon_folder = gis_path + r"idw_Con/week"

shpCon_folder = gis_path + r"shapefiles/IDW_Week"
idwCon_folder = gis_path + r"raster_output/IDW_Week"

### 1.2 Subsetting Data <a class="anchor" id="reg_subset"></a>

#### Keep only samples taken between 08:00 and 18:00

In [6]:
dfCon['SampleDate'] = pd.to_datetime(dfCon['SampleDate'], format='%Y-%m-%d %H:%M:%S.%f')

# Keep samples with a time of day from 08:00 to 18:00 (both bounds inclusive)
start_time = '08:00'
end_time = '18:00'

# .copy() avoids SettingWithCopyWarning when columns are added to dfConTime later
dfConTime = dfCon[dfCon['SampleDate'].dt.time.between(pd.to_datetime(start_time).time(), pd.to_datetime(end_time).time())].copy()
dfConTime.head()

Unnamed: 0,RowID,ProgramID,ParameterName,ParameterUnits,ProgramLocationID,ActivityType,SampleDate,Year,Month,RelativeDepth,ResultValue,Latitude_DD,Longitude_DD,ManagedAreaName,AreaID,SEACAR_QAQCFlagCode,WaterBody,WbodyAcronym,Season
0,1,512,Water Temperature,Degrees C,CHWW1,,2021-11-25 11:45:00,2021,11,bottom,19.2,26.8325,-82.14805,Cape Haze Aquatic Preserve,9,6Q,Charlotte Harbor,CH,Fall
1,2,512,Water Temperature,Degrees C,CHWW1,,2021-11-13 16:15:00,2021,11,bottom,22.3,26.8325,-82.14805,Cape Haze Aquatic Preserve,9,6Q,Charlotte Harbor,CH,Fall
2,3,512,Water Temperature,Degrees C,CHWW1,,2021-10-26 14:15:00,2021,10,bottom,28.3,26.8325,-82.14805,Cape Haze Aquatic Preserve,9,6Q,Charlotte Harbor,CH,Fall
3,4,512,Water Temperature,Degrees C,CHWW1,,2022-09-16 12:45:00,2022,9,bottom,29.7,26.8325,-82.14805,Cape Haze Aquatic Preserve,9,6Q,Charlotte Harbor,CH,Fall
4,5,512,Water Temperature,Degrees C,CHWW1,,2022-09-20 10:30:00,2022,9,bottom,29.3,26.8325,-82.14805,Cape Haze Aquatic Preserve,9,6Q,Charlotte Harbor,CH,Fall
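
Because `Series.between` includes both endpoints by default, a sample stamped exactly 18:00:00 is kept while one at 18:15 is dropped. The sketch below reproduces that rule with the standard library only; the helper name `in_daytime` and the timestamps are hypothetical.

```python
from datetime import datetime, time

START = time(8, 0)
END = time(18, 0)

def in_daytime(ts):
    """True when the time of day lies in the closed interval [08:00, 18:00]."""
    return START <= ts.time() <= END

# Hypothetical timestamps around both boundaries
samples = [
    datetime(2021, 11, 25, 7, 45),
    datetime(2021, 11, 25, 8, 0),
    datetime(2021, 11, 25, 18, 0),
    datetime(2021, 11, 25, 18, 15),
]
kept = [ts for ts in samples if in_daytime(ts)]
```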


### 1.3 Select data using the 52-week definitions in the Excel file

In [7]:
area_ab = ["GTM","EB","CH","BB","BBS"]
period_type = [" 52 week"," Month"]
dfConTime["Period"] = ""
def select_data_period(df,area,period):
    sheet_name = str(area) + str(period)
    df_period_table = pd.read_excel(gis_path + "All_Waterbodies_Season_Month_Week_Definitions.xlsx",sheet_name=sheet_name)
    df_select_area = df[df["WbodyAcronym"]==str(area)]
    df_period_table['Start Date'] = pd.to_datetime(df_period_table['Start Date'])
    df_period_table['End Date']   = pd.to_datetime(df_period_table['End Date'])
    sub_dfs = []

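    for index, row in df_period_table.iterrows():
        start_date = row['Start Date']
        end_date   = row['End Date']
        # Half-open window: start_date <= SampleDate < end_date
        # .copy() avoids SettingWithCopyWarning when the Period column is assigned
        sub_df = df_select_area[(df_select_area['SampleDate'] >= start_date) & (df_select_area['SampleDate'] < end_date)].copy()
        sub_df['Period'] = row["Week"]
        sub_dfs.append(sub_df)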
        
    df_period = pd.concat(sub_dfs,ignore_index=True)
    return df_period
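
The selection above uses a half-open window, so a sample taken during the day listed as `End Date` is excluded, because `End Date` parses to midnight at the start of that day. Whether the end dates in the Excel sheets are meant to be inclusive is not stated here. The weekly table later in this notebook lists Big Bend Seagrasses week 1 as 2021-01-06 to 2021-01-12 with week 2 starting on 2021-01-13, which suggests that at least some sheets use inclusive end dates. The sketch below contrasts the two interpretations; the helper names and the sample timestamp are hypothetical.

```python
from datetime import datetime, timedelta

def in_half_open(ts, start, end):
    """start <= ts < end, the rule used by select_data_period."""
    return start <= ts < end

def in_closed_day(ts, start, end):
    """Treat the end date as inclusive by extending the window one day."""
    return start <= ts < end + timedelta(days=1)

start = datetime(2021, 1, 6)
end = datetime(2021, 1, 12)
sample = datetime(2021, 1, 12, 11, 45)  # hypothetical sample on the end date

dropped_by_half_open = not in_half_open(sample, start, end)
kept_by_closed_day = in_closed_day(sample, start, end)
```

If the end dates are inclusive, comparing against `end_date + pd.Timedelta(days=1)` would be one way to keep such samples; if consecutive rows share a boundary date, the half-open rule already assigns each sample to exactly one period.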

### 1.4 Generate the aggregated mean value for weekly data.

In [8]:
sel_week_temp = []
for each in area_ab:
    df_week_temp = select_data_period(dfConTime,str(each)," 52 week")
    df_week_temp_group = df_week_temp.groupby(['WaterBody','ParameterName','ParameterUnits',
                                               'Latitude_DD','Longitude_DD','WbodyAcronym',"Period"])["ResultValue"].agg("mean").reset_index()
    sel_week_temp.append(df_week_temp_group)
df_week_select_Mean = pd.concat(sel_week_temp,ignore_index=True)
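
The `groupby` above reduces all readings from one station (identified by its latitude and longitude) for one parameter and one period to a single mean value. A minimal sketch of the same reduction in plain Python, using made-up records and a hypothetical station label in place of the coordinate pair:

```python
from collections import defaultdict

# (station, parameter, period, value): hypothetical readings
records = [
    ("A", "Salinity", 1, 30.0),
    ("A", "Salinity", 1, 32.0),
    ("A", "Salinity", 2, 31.0),
    ("B", "Salinity", 1, 28.0),
]

# Accumulate a running sum and count per (station, parameter, period) key
sums = defaultdict(lambda: [0.0, 0])
for station, parameter, period, value in records:
    acc = sums[(station, parameter, period)]
    acc[0] += value
    acc[1] += 1

means = {key: total / count for key, (total, count) in sums.items()}
```

Each key in `means` corresponds to one row of `df_week_select_Mean`, which is why the number of data points per shapefile equals the number of stations reporting in that period.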

In [9]:
df_week_select_Mean.shape

(3236, 8)

In [10]:
df_week_select_Mean.head(50)

Unnamed: 0,WaterBody,ParameterName,ParameterUnits,Latitude_DD,Longitude_DD,WbodyAcronym,Period,ResultValue
0,Guana Tolomato Matanzas,Dissolved Oxygen,mg/L,29.667071,-81.257403,GTM,1,5.624042
1,Guana Tolomato Matanzas,Dissolved Oxygen,mg/L,29.667071,-81.257403,GTM,2,4.712892
2,Guana Tolomato Matanzas,Dissolved Oxygen,mg/L,29.667071,-81.257403,GTM,3,4.316725
3,Guana Tolomato Matanzas,Dissolved Oxygen,mg/L,29.667071,-81.257403,GTM,4,4.729592
4,Guana Tolomato Matanzas,Dissolved Oxygen,mg/L,29.667071,-81.257403,GTM,5,5.155749
5,Guana Tolomato Matanzas,Dissolved Oxygen,mg/L,29.667071,-81.257403,GTM,6,3.796167
6,Guana Tolomato Matanzas,Dissolved Oxygen,mg/L,29.667071,-81.257403,GTM,7,4.29338
7,Guana Tolomato Matanzas,Dissolved Oxygen,mg/L,29.667071,-81.257403,GTM,8,4.409059
8,Guana Tolomato Matanzas,Dissolved Oxygen,mg/L,29.667071,-81.257403,GTM,9,4.025436
9,Guana Tolomato Matanzas,Dissolved Oxygen,mg/L,29.667071,-81.257403,GTM,10,3.111498


### 1.5 Convert coordinate system to EPSG:3086 <a class="anchor" id="reg_coordinate"></a>


In [11]:
# Define the EPSG codes for source (EPSG:4326) and target (EPSG:3086) coordinate systems
source_epsg = 'EPSG:4326'
target_epsg = 'EPSG:3086'

# Create a PyProj Transformer for the conversion
transformer = pyproj.Transformer.from_crs(source_epsg, target_epsg, always_xy=True)

# Define a function to apply the transformation to each row of the DataFrame
def transform_coordinates(row):
    x, y = transformer.transform(row['Longitude_DD'], row['Latitude_DD'])
    return pd.Series({'x': x, 'y': y})

# Apply the transformation function to the DataFrame and create new columns for the converted coordinates
df_week_select_Mean[['x', 'y']] = df_week_select_Mean.apply(transform_coordinates, axis=1)

## 2. Prepare for batch interpolation<a class="anchor" id="reg_batch"></a>

In [12]:
area_shortnames = {
    'Guana Tolomato Matanzas': 'GTM',
    'Estero Bay': 'EB',
    'Charlotte Harbor': 'CH',
    'Biscayne Bay': 'BB',
    'Big Bend Seagrasses':'BBS'
}

param_shortnames = {
    'Salinity': 'Sal_ppt',
    'Total Nitrogen': 'TN_mgl',
    'Dissolved Oxygen': 'DO_mgl',
    'Turbidity':'Turb_ntu',
    'Secchi Depth':'Secc_m',
    'Water Temperature':'T_c'
}

### 2.2 Define the barrier files

In [13]:
barrier_folder = os.path.join(gis_path, 'Barriers')
barrier_folder

barriers = []
for file in os.listdir(barrier_folder):
    if file.endswith(".shp"):
        barriers.append(os.path.join(barrier_folder, file))

for barrier in barriers:
    print(barrier)

E:\Projects\SEACAR_WQ_2024/GIS_Data/Barriers\BBS_Barriers.shp
E:\Projects\SEACAR_WQ_2024/GIS_Data/Barriers\BB_Barriers.shp
E:\Projects\SEACAR_WQ_2024/GIS_Data/Barriers\CH_Barriers.shp
E:\Projects\SEACAR_WQ_2024/GIS_Data/Barriers\EB_Barriers.shp
E:\Projects\SEACAR_WQ_2024/GIS_Data/Barriers\GTM_Barriers.shp
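
The listing above follows the pattern `<acronym>_Barriers.shp`. How `idw_rk` picks the barrier file for a given waterbody is not shown in this notebook, so the helper below is only a sketch (the name `barrier_for` and the relative paths are hypothetical). It matches the whole file name rather than a prefix, because `BB` is itself a prefix of `BBS`.

```python
import os

def barrier_for(acronym, barrier_paths):
    """Return the path named '<acronym>_Barriers.shp', or None if absent."""
    wanted = f"{acronym}_Barriers.shp".lower()
    for path in barrier_paths:
        if os.path.basename(path).lower() == wanted:
            return path
    return None

# Hypothetical relative paths mirroring the naming pattern printed above
paths = [
    "Barriers/BBS_Barriers.shp",
    "Barriers/BB_Barriers.shp",
    "Barriers/CH_Barriers.shp",
]
bb_barrier = barrier_for("BB", paths)
missing = barrier_for("GTM", paths)
```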


### 2.3 Define waterbody boundary for spatial extent and masking

In [14]:
waterbody_extent = os.path.join(gis_path, 'OEAT_Waterbody_Boundaries', 'OEAT_Waterbody_Boundary.shp')

unique_waterbodies = []
with arcpy.da.SearchCursor(waterbody_extent, ['WaterbodyA']) as cursor:
    for row in cursor:
        unique_waterbodies.append(row[0])

print("Unique Waterbodies:", unique_waterbodies)

Unique Waterbodies: ['BBS', 'BB', 'CH', 'EB', 'GTM']


### 2.4 Define and generate weekly table.

In [15]:
waterBody = ['Big Bend Seagrasses', 'Biscayne Bay', 'Charlotte Harbor', 'Estero Bay', 'Guana Tolomato Matanzas']
parameter = ['Dissolved Oxygen', 'Salinity', 'Turbidity', 'Water Temperature']
waterBody_list = []
parameter_list= []
week_list = []
for i in waterBody:
    for j in parameter:
        for k in range(1,53):
            waterBody_list.append(i)
            parameter_list.append(j)
            week_list.append(k)
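
The three nested loops build every combination of waterbody, parameter, and week. As an alternative sketch, `itertools.product` yields the same combinations in the same order; the variable names `water_bodies`, `parameters`, and `rows` are illustrative.

```python
from itertools import product

water_bodies = ['Big Bend Seagrasses', 'Biscayne Bay', 'Charlotte Harbor',
                'Estero Bay', 'Guana Tolomato Matanzas']
parameters = ['Dissolved Oxygen', 'Salinity', 'Turbidity', 'Water Temperature']

# One record per (waterbody, parameter, week) combination, weeks 1..52
rows = [
    {"WaterBody": w, "Parameter": p, "Period": k}
    for w, p, k in product(water_bodies, parameters, range(1, 53))
]
```

Passing `rows` to `pd.DataFrame` would give the same three columns as the table built in the next cell.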

In [16]:
df_week_table = pd.DataFrame({
    "WaterBody":waterBody_list,
    "Parameter":parameter_list,
    "Period":week_list
})
df_week_table["Filename"] = ""
df_week_table["NumDataPoints"] = ""
df_week_table["RMSE"] = ""
df_week_table["ME"] = ""
df_week_table["WbodyAcronym"] = df_week_table["WaterBody"].map(area_shortnames)

In [17]:
name_ab = df_week_table["WbodyAcronym"].unique()
para_list = df_week_table["Parameter"].unique()
dfs_sub = []
for each in name_ab:
    sheet_name_use = str(each) + " 52 week"
    df_period_table_used = pd.read_excel(gis_path + "All_Waterbodies_Season_Month_Week_Definitions.xlsx",sheet_name=sheet_name_use)
    startDate = list(df_period_table_used["Start Date"])
    endDate   = list(df_period_table_used["End Date"])
    for para in para_list:
        df_temp_use = df_week_table[(df_week_table["WbodyAcronym"]==str(each))&(df_week_table["Parameter"]==str(para))]
        df_temp_use["startDate"] = startDate
        df_temp_use["endDate"]   = endDate
        dfs_sub.append(df_temp_use)
df_week_table = pd.concat(dfs_sub)
df_week_table["Year"] = df_week_table["startDate"].dt.year
df_week_table.to_csv(gis_path + 'week_table.csv')

### 1.2 Preview Dataset <a class="anchor" id="reg_preview"></a>

In [18]:
df_week_select_Mean.head(50)

Unnamed: 0,WaterBody,ParameterName,ParameterUnits,Latitude_DD,Longitude_DD,WbodyAcronym,Period,ResultValue,x,y
0,Guana Tolomato Matanzas,Dissolved Oxygen,mg/L,29.667071,-81.257403,GTM,1,5.624042,inf,inf
1,Guana Tolomato Matanzas,Dissolved Oxygen,mg/L,29.667071,-81.257403,GTM,2,4.712892,665054.6,631868.2
2,Guana Tolomato Matanzas,Dissolved Oxygen,mg/L,29.667071,-81.257403,GTM,3,4.316725,665054.6,631868.2
3,Guana Tolomato Matanzas,Dissolved Oxygen,mg/L,29.667071,-81.257403,GTM,4,4.729592,665054.6,631868.2
4,Guana Tolomato Matanzas,Dissolved Oxygen,mg/L,29.667071,-81.257403,GTM,5,5.155749,665054.6,631868.2
5,Guana Tolomato Matanzas,Dissolved Oxygen,mg/L,29.667071,-81.257403,GTM,6,3.796167,665054.6,631868.2
6,Guana Tolomato Matanzas,Dissolved Oxygen,mg/L,29.667071,-81.257403,GTM,7,4.29338,665054.6,631868.2
7,Guana Tolomato Matanzas,Dissolved Oxygen,mg/L,29.667071,-81.257403,GTM,8,4.409059,665054.6,631868.2
8,Guana Tolomato Matanzas,Dissolved Oxygen,mg/L,29.667071,-81.257403,GTM,9,4.025436,665054.6,631868.2
9,Guana Tolomato Matanzas,Dissolved Oxygen,mg/L,29.667071,-81.257403,GTM,10,3.111498,665054.6,631868.2
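
Row 0 of the preview shows `inf` for both `x` and `y`, although the same latitude and longitude produce finite coordinates in the following rows. The cause is not established in this notebook. A simple check for non-finite coordinates before writing shapefiles could look like the sketch below; the helper name `non_finite_rows` is hypothetical, and the coordinates are copied from the preview.

```python
import math

def non_finite_rows(coords):
    """Return indices of (x, y) pairs where either value is inf or NaN."""
    return [
        i for i, (x, y) in enumerate(coords)
        if not (math.isfinite(x) and math.isfinite(y))
    ]

# First three coordinate pairs from the preview above
coords = [
    (float("inf"), float("inf")),
    (665054.6, 631868.2),
    (665054.6, 631868.2),
]
bad = non_finite_rows(coords)
```

Any index reported this way marks a row whose transformation should be rerun or whose point should be excluded before interpolation.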


### 1.3 Fill NaN RowID values with unique IDs (the IDW function requires a unique ID) <a class="anchor" id="reg_id"></a>

In [20]:
idw_rk.fill_nan_rowids(df_week_select_Mean, 'RowID')

df_week_select_Mean['RowID'] = df_week_select_Mean['RowID'].astype(int)
df_week_select_Mean

Unnamed: 0,WaterBody,ParameterName,ParameterUnits,Latitude_DD,Longitude_DD,WbodyAcronym,Period,ResultValue,x,y,RowID
0,Guana Tolomato Matanzas,Dissolved Oxygen,mg/L,29.667071,-81.257403,GTM,1,5.624042,inf,inf,1
1,Guana Tolomato Matanzas,Dissolved Oxygen,mg/L,29.667071,-81.257403,GTM,2,4.712892,6.650546e+05,6.318682e+05,2
2,Guana Tolomato Matanzas,Dissolved Oxygen,mg/L,29.667071,-81.257403,GTM,3,4.316725,6.650546e+05,6.318682e+05,3
3,Guana Tolomato Matanzas,Dissolved Oxygen,mg/L,29.667071,-81.257403,GTM,4,4.729592,6.650546e+05,6.318682e+05,4
4,Guana Tolomato Matanzas,Dissolved Oxygen,mg/L,29.667071,-81.257403,GTM,5,5.155749,6.650546e+05,6.318682e+05,5
...,...,...,...,...,...,...,...,...,...,...,...
3231,Big Bend Seagrasses,Water Temperature,Degrees C,29.647203,-83.421196,BBS,48,16.659756,4.559525e+05,6.268460e+05,3232
3232,Big Bend Seagrasses,Water Temperature,Degrees C,29.647203,-83.421196,BBS,49,20.009756,4.559525e+05,6.268460e+05,3233
3233,Big Bend Seagrasses,Water Temperature,Degrees C,29.647203,-83.421196,BBS,50,20.522764,4.559525e+05,6.268460e+05,3234
3234,Big Bend Seagrasses,Water Temperature,Degrees C,29.647203,-83.421196,BBS,51,16.418699,4.559525e+05,6.268460e+05,3235


## 2. Create Shapefile <a class="anchor" id="reg_create_shp"></a>

In [21]:
# Merge the weekly table with the projected coordinates (x, y), RowID and ResultValue
seasons_con_coord = idw_rk.merge_with_lat_long1(df_week_table, df_week_select_Mean)

seasons_con_coord

Unnamed: 0,WaterBody,Parameter,Period,Filename,NumDataPoints,RMSE,ME,WbodyAcronym,startDate,endDate,Year,x,y,RowID,ResultValue
0,Big Bend Seagrasses,Dissolved Oxygen,1,,,,,BBS,2021-01-06,2021-01-12,2021,455952.502950,626846.023588,3036,8.334146
1,Big Bend Seagrasses,Dissolved Oxygen,2,,,,,BBS,2021-01-13,2021-01-19,2021,455952.502950,626846.023588,3037,8.486992
2,Big Bend Seagrasses,Dissolved Oxygen,3,,,,,BBS,2021-01-20,2021-01-26,2021,455952.502950,626846.023588,3038,8.124390
3,Big Bend Seagrasses,Dissolved Oxygen,4,,,,,BBS,2021-01-27,2021-02-02,2021,455952.502950,626846.023588,3039,7.903659
4,Big Bend Seagrasses,Dissolved Oxygen,5,,,,,BBS,2021-02-03,2021-02-09,2021,455952.502950,626846.023588,3040,8.734146
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3270,Guana Tolomato Matanzas,Water Temperature,51,,,,,GTM,2017-03-25,2017-04-01,2017,653510.005839,674232.563906,687,22.810801
3271,Guana Tolomato Matanzas,Water Temperature,52,,,,,GTM,2017-04-01,2017-04-08,2017,665054.640076,631868.177645,556,24.600348
3272,Guana Tolomato Matanzas,Water Temperature,52,,,,,GTM,2017-04-01,2017-04-08,2017,665987.547779,639659.183554,600,21.855401
3273,Guana Tolomato Matanzas,Water Temperature,52,,,,,GTM,2017-04-01,2017-04-08,2017,659731.716683,654157.848803,644,22.620557


In [None]:
idw_rk.create_shp_season1(seasons_con_coord, shpCon_folder)

## 3. Run IDW for weeks <a class="anchor" id="reg_cv_idw"></a>

In [30]:
# Empty the IDW output folder
# idw_rk.delete_all_files(idwCon_folder)

df_week_table[0:665]

Unnamed: 0,WaterBody,Parameter,Period,Filename,NumDataPoints,RMSE,ME,WbodyAcronym,startDate,endDate,Year
0,Big Bend Seagrasses,Dissolved Oxygen,1,SHP_BBS_DO_mgl_2021_1.shp,1,,,BBS,2021-01-06,2021-01-12,2021
1,Big Bend Seagrasses,Dissolved Oxygen,2,SHP_BBS_DO_mgl_2021_2.shp,1,,,BBS,2021-01-13,2021-01-19,2021
2,Big Bend Seagrasses,Dissolved Oxygen,3,SHP_BBS_DO_mgl_2021_3.shp,1,,,BBS,2021-01-20,2021-01-26,2021
3,Big Bend Seagrasses,Dissolved Oxygen,4,SHP_BBS_DO_mgl_2021_4.shp,1,,,BBS,2021-01-27,2021-02-02,2021
4,Big Bend Seagrasses,Dissolved Oxygen,5,SHP_BBS_DO_mgl_2021_5.shp,1,,,BBS,2021-02-03,2021-02-09,2021
...,...,...,...,...,...,...,...,...,...,...,...
660,Estero Bay,Dissolved Oxygen,37,SHP_EB_DO_mgl_2017_37.shp,3,0.219432,-0.012975,EB,2017-11-19,2017-11-25,2017
661,Estero Bay,Dissolved Oxygen,38,SHP_EB_DO_mgl_2017_38.shp,2,,,EB,2017-11-26,2017-12-02,2017
662,Estero Bay,Dissolved Oxygen,39,SHP_EB_DO_mgl_2017_39.shp,2,,,EB,2017-12-03,2017-12-09,2017
663,Estero Bay,Dissolved Oxygen,40,SHP_EB_DO_mgl_2017_40.shp,3,0.605878,-0.059142,EB,2017-12-10,2017-12-16,2017
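
The `Filename` column follows the pattern `SHP_<waterbody>_<parameter>_<year>_<period>.shp`, built from the two short-name dictionaries defined earlier. The code of `create_shp_season1` is not shown in this notebook, so the helper below is only a sketch of that pattern inferred from the table; the name `shp_filename` is hypothetical.

```python
area_shortnames = {
    'Guana Tolomato Matanzas': 'GTM',
    'Estero Bay': 'EB',
    'Charlotte Harbor': 'CH',
    'Biscayne Bay': 'BB',
    'Big Bend Seagrasses': 'BBS',
}
param_shortnames = {
    'Salinity': 'Sal_ppt',
    'Dissolved Oxygen': 'DO_mgl',
    'Turbidity': 'Turb_ntu',
    'Water Temperature': 'T_c',
}

def shp_filename(water_body, parameter, year, period):
    """Build a shapefile name following the pattern seen in the Filename column."""
    area = area_shortnames[water_body]
    param = param_shortnames[parameter]
    return f"SHP_{area}_{param}_{year}_{period}.shp"

example = shp_filename('Estero Bay', 'Dissolved Oxygen', 2017, 37)
```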


In [None]:
# Skip the IDW calculation when a file has fewer than 3 data points
idw_rk.idw_interpolation1(df_week_table[665:], shpCon_folder, idwCon_folder, waterbody_extent, barrier_folder)

Processing file: SHP_EB_DO_mgl_2017_42.shp
E:\Projects\SEACAR_WQ_2024/GIS_Data/shapefiles/IDW_Week\SHP_EB_DO_mgl_2017_42.shp
3
File SHP_EB_DO_mgl_2017_42.shp has completed 3 cross-validation iterations.
Processing file: SHP_EB_DO_mgl_2017_43.shp
E:\Projects\SEACAR_WQ_2024/GIS_Data/shapefiles/IDW_Week\SHP_EB_DO_mgl_2017_43.shp
3
File SHP_EB_DO_mgl_2017_43.shp has completed 3 cross-validation iterations.
Processing file: SHP_EB_DO_mgl_2018_44.shp
E:\Projects\SEACAR_WQ_2024/GIS_Data/shapefiles/IDW_Week\SHP_EB_DO_mgl_2018_44.shp
3
File SHP_EB_DO_mgl_2018_44.shp has completed 3 cross-validation iterations.
Processing file: SHP_EB_DO_mgl_2018_45.shp
E:\Projects\SEACAR_WQ_2024/GIS_Data/shapefiles/IDW_Week\SHP_EB_DO_mgl_2018_45.shp
3
File SHP_EB_DO_mgl_2018_45.shp has completed 3 cross-validation iterations.
Processing file: SHP_EB_DO_mgl_2018_46.shp
E:\Projects\SEACAR_WQ_2024/GIS_Data/shapefiles/IDW_Week\SHP_EB_DO_mgl_2018_46.shp
3
File SHP_EB_DO_mgl_2018_46.shp has completed 3 cross-validati

E:\Projects\SEACAR_WQ_2024/GIS_Data/shapefiles/IDW_Week\SHP_EB_Sal_ppt_2017_30.shp
3
File SHP_EB_Sal_ppt_2017_30.shp has completed 3 cross-validation iterations.
Processing file: SHP_EB_Sal_ppt_2017_31.shp
E:\Projects\SEACAR_WQ_2024/GIS_Data/shapefiles/IDW_Week\SHP_EB_Sal_ppt_2017_31.shp
3
File SHP_EB_Sal_ppt_2017_31.shp has completed 3 cross-validation iterations.
Processing file: SHP_EB_Sal_ppt_2017_32.shp
E:\Projects\SEACAR_WQ_2024/GIS_Data/shapefiles/IDW_Week\SHP_EB_Sal_ppt_2017_32.shp
2
Not enough data for IDW interpolation in SHP_EB_Sal_ppt_2017_32.shp, skipping
Processing file: SHP_EB_Sal_ppt_2017_33.shp
E:\Projects\SEACAR_WQ_2024/GIS_Data/shapefiles/IDW_Week\SHP_EB_Sal_ppt_2017_33.shp
2
Not enough data for IDW interpolation in SHP_EB_Sal_ppt_2017_33.shp, skipping
Processing file: SHP_EB_Sal_ppt_2017_34.shp
E:\Projects\SEACAR_WQ_2024/GIS_Data/shapefiles/IDW_Week\SHP_EB_Sal_ppt_2017_34.shp
3
File SHP_EB_Sal_ppt_2017_34.shp has completed 3 cross-validation iterations.
Processing f

Processing file: SHP_EB_Turb_ntu_2017_18.shp
E:\Projects\SEACAR_WQ_2024/GIS_Data/shapefiles/IDW_Week\SHP_EB_Turb_ntu_2017_18.shp
2
Not enough data for IDW interpolation in SHP_EB_Turb_ntu_2017_18.shp, skipping
Processing file: SHP_EB_Turb_ntu_2017_19.shp
E:\Projects\SEACAR_WQ_2024/GIS_Data/shapefiles/IDW_Week\SHP_EB_Turb_ntu_2017_19.shp
2
Not enough data for IDW interpolation in SHP_EB_Turb_ntu_2017_19.shp, skipping
Processing file: SHP_EB_Turb_ntu_2017_20.shp
E:\Projects\SEACAR_WQ_2024/GIS_Data/shapefiles/IDW_Week\SHP_EB_Turb_ntu_2017_20.shp
2
Not enough data for IDW interpolation in SHP_EB_Turb_ntu_2017_20.shp, skipping
Processing file: SHP_EB_Turb_ntu_2017_21.shp
E:\Projects\SEACAR_WQ_2024/GIS_Data/shapefiles/IDW_Week\SHP_EB_Turb_ntu_2017_21.shp
2
Not enough data for IDW interpolation in SHP_EB_Turb_ntu_2017_21.shp, skipping
Processing file: SHP_EB_Turb_ntu_2017_22.shp
E:\Projects\SEACAR_WQ_2024/GIS_Data/shapefiles/IDW_Week\SHP_EB_Turb_ntu_2017_22.shp
2
Not enough data for IDW inter
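
The log reports one cross-validation iteration per data point and skips files with fewer than three points, which is consistent with leave-one-out cross-validation. The implementation inside `idw_interpolation1` is not shown here and runs in arcpy with barrier files, so the following is only a sketch of the general idea under stated assumptions: plain IDW with power 2, no barriers, errors taken as predicted minus observed, and made-up station values. The names `idw_predict` and `loocv_idw` are hypothetical.

```python
import math

def idw_predict(x, y, points, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from (px, py, value) points."""
    num = 0.0
    den = 0.0
    for px, py, value in points:
        dist = math.hypot(x - px, y - py)
        if dist == 0.0:
            return value  # coincident point: use its value directly
        weight = 1.0 / dist ** power
        num += weight * value
        den += weight
    return num / den

def loocv_idw(points, power=2.0, min_points=3):
    """Leave-one-out CV; returns (rmse, me), or None when there are too few points."""
    if len(points) < min_points:
        return None
    errors = []
    for i, (x, y, observed) in enumerate(points):
        others = points[:i] + points[i + 1:]
        errors.append(idw_predict(x, y, others, power) - observed)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    me = sum(errors) / len(errors)
    return rmse, me

# Hypothetical stations: (x in metres, y in metres, value)
stations = [(0.0, 0.0, 4.0), (1000.0, 0.0, 5.0), (0.0, 1000.0, 6.0)]
result = loocv_idw(stations)
too_few = loocv_idw(stations[:2])
```

With only three stations, each prediction rests on two neighbours, so the resulting RMSE and ME are rough indicators rather than stable estimates.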

## 4. Monthly Results <a class="anchor" id="month"></a>

### Set the folder paths for the monthly results

In [None]:
shpCon_folder = gis_path + r"shapefiles/IDW_Month"
idwCon_folder = gis_path + r"raster_output/IDW_Month"

# shpCon_folder = gis_path + r"shapefiles_Con/month"
# idwCon_folder = gis_path + r"idw_Con/month"

In [None]:
area_ab = ["GTM","EB","CH","BB","BBS"]
period_type = [" 52 week"," Month"]
dfConTime["Period"] = ""
def select_data_period1(df,area,period):
    sheet_name = str(area) + str(period)
    df_period_table = pd.read_excel(gis_path + "All_Waterbodies_Season_Month_Week_Definitions.xlsx",sheet_name=sheet_name)
    df_select_area = df[df["WbodyAcronym"]==str(area)]
    df_period_table['Start Date'] = pd.to_datetime(df_period_table['Start Date'])
    df_period_table['End Date']   = pd.to_datetime(df_period_table['End Date'])
    sub_dfs = []

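    for index, row in df_period_table.iterrows():
        start_date = row['Start Date']
        end_date   = row['End Date']
        # Half-open window: start_date <= SampleDate < end_date
        # .copy() avoids SettingWithCopyWarning when the Period column is assigned
        sub_df = df_select_area[(df_select_area['SampleDate'] >= start_date) & (df_select_area['SampleDate'] < end_date)].copy()
        sub_df['Period'] = row["Month"]
        sub_dfs.append(sub_df)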
        
    df_period = pd.concat(sub_dfs,ignore_index=True)
    return df_period

In [None]:
sel_month_temp = []
for each in area_ab:
    df_month_temp = select_data_period1(dfConTime,str(each)," Month")
    df_month_temp_group = df_month_temp.groupby(['WaterBody','ParameterName','ParameterUnits',
                                          'Latitude_DD','Longitude_DD','WbodyAcronym',"Period"])["ResultValue"].agg("mean").reset_index()
    sel_month_temp.append(df_month_temp_group)
df_month_select_Mean = pd.concat(sel_month_temp,ignore_index=True)

In [None]:
# Define the EPSG codes for source (EPSG:4326) and target (EPSG:3086) coordinate systems
source_epsg = 'EPSG:4326'
target_epsg = 'EPSG:3086'

# Create a PyProj Transformer for the conversion
transformer = pyproj.Transformer.from_crs(source_epsg, target_epsg, always_xy=True)

# Define a function to apply the transformation to each row of the DataFrame
def transform_coordinates(row):
    x, y = transformer.transform(row['Longitude_DD'], row['Latitude_DD'])
    return pd.Series({'x': x, 'y': y})

# Apply the transformation function to the DataFrame and create new columns for the converted coordinates
df_month_select_Mean[['x', 'y']] = df_month_select_Mean.apply(transform_coordinates, axis=1)

In [None]:
df_month_select_Mean

In [None]:
df_month_select_Mean[(df_month_select_Mean['WbodyAcronym'] == 'GTM') & 
(df_month_select_Mean['ParameterName'] == 'Turbidity') & 
(df_month_select_Mean['Period'] == 9)]

In [None]:
waterBody = ['Big Bend Seagrasses', 'Biscayne Bay', 'Charlotte Harbor', 'Estero Bay', 'Guana Tolomato Matanzas']
parameter = ['Dissolved Oxygen', 'Salinity', 'Turbidity', 'Water Temperature']
waterBody_list = []
parameter_list= []
month_list = []
for i in waterBody:
    for j in parameter:
        for k in range(1,13):
            waterBody_list.append(i)
            parameter_list.append(j)
            month_list.append(k)

In [None]:
df_month_table = pd.DataFrame({
    "WaterBody":waterBody_list,
    "Parameter":parameter_list,
    "Period":month_list
})
df_month_table["Filename"] = ""
df_month_table["NumDataPoints"] = ""
df_month_table["RMSE"] = ""
df_month_table["ME"] = ""
df_month_table["WbodyAcronym"] = df_month_table["WaterBody"].map(area_shortnames)

In [None]:
name_ab = df_month_table["WbodyAcronym"].unique()
para_list = df_month_table["Parameter"].unique()
dfs_sub = []
for each in name_ab:
    sheet_name_use = str(each) + " Month"
    df_period_table_used = pd.read_excel(gis_path + "All_Waterbodies_Season_Month_Week_Definitions.xlsx",sheet_name=sheet_name_use)
    startDate = list(df_period_table_used["Start Date"])
    endDate   = list(df_period_table_used["End Date"])
    for para in para_list:
        df_temp_use = df_month_table[(df_month_table["WbodyAcronym"]==str(each))&(df_month_table["Parameter"]==str(para))]
        df_temp_use["startDate"] = startDate
        df_temp_use["endDate"]   = endDate
        dfs_sub.append(df_temp_use)
df_month_table = pd.concat(dfs_sub)
df_month_table["Year"] = df_month_table["startDate"].dt.year
df_month_table.to_csv(gis_path + 'month_table.csv')

In [None]:
idw_rk.fill_nan_rowids(df_month_select_Mean, 'RowID')

df_month_select_Mean['RowID'] = df_month_select_Mean['RowID'].astype(int)
df_month_select_Mean

In [None]:
# idw_rk.delete_all_files(shpCon_folder)

In [None]:
# Merge the monthly table with the projected coordinates (x, y), RowID and ResultValue
seasons_con_coord = idw_rk.merge_with_lat_long1(df_month_table, df_month_select_Mean)
seasons_con_coord

In [None]:
# seasons_con_coord[(seasons_con_coord['WbodyAcronym'] == 'GTM') & 
# (seasons_con_coord['Parameter'] == 'Turbidity') & (seasons_con_coord['Period'] == 9)]

In [None]:
idw_rk.create_shp_season1(seasons_con_coord, shpCon_folder)

In [None]:
# Empty the IDW output folder
# idw_rk.delete_all_files(idwCon_folder)

In [None]:
importlib.reload(idw_rk)


# Skip the IDW calculation when a file has fewer than 3 data points
idw_rk.idw_interpolation1(df_month_table, shpCon_folder, idwCon_folder, waterbody_extent, barrier_folder)