# Carbon System Data Analysis

---
### Purpose
The purpose of this notebook is to analyze and validate the performance of the Sunburst Sensors, LLC. SAMI-pH (PHSEN) and SAMI-pCO$_{2}$ (PCO2W) seawater measurements, and of the Pro-Oceanus pCO$_{2}$ (PCO2A) sensor measurements, at the Global Irminger Array. The analysis runs deployment by deployment and site by site. It compares the instrument data with carbon system measurements (e.g. pH and calculated pCO$_{2}$) from discrete water samples. These samples were collected by Niskin bottle casts during the deployment and recovery of the instrumentation as part of mooring maintenance.

In [None]:
import os, sys, gc
import json
import yaml
import numpy as np
import pandas as pd
import xarray as xr
import warnings
warnings.filterwarnings("ignore")

In [None]:
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
# Import the OOI M2M tool
sys.path.append("/home/andrew/Documents/OOI-CGSN/ooinet/ooinet/")
from m2m import M2M

In [None]:
sys.path.append("../../OS2022/OS2022/")
from utils import *

#### Set OOINet API access
To access and download data from OOINet, you need an OOINet API username and access token. Both can be found on your profile after logging in to OOINet. For security, your username and access token should NOT be stored in this notebook/Python script. Instead, store them in a YAML file named user_info.yaml under the keys `apiname` and `apikey`, and point the path in the cell below to that file.

In [None]:
# Import user info for connecting to OOINet via M2M
with open("../../../../QAQC_Sandbox/user_info.yaml") as f:
    userinfo = yaml.safe_load(f)
username = userinfo["apiname"]
token = userinfo["apikey"]

#### Connect to OOINet

In [None]:
OOINet = M2M(username, token)

---
## Discrete Bottle Data

#### Irminger Sea
First, load the bottle data from the Irminger Sea Array into a pandas dataframe. This dataset was prepared using the **```Bottle_Data```** Jupyter notebook. Once the data is loaded, then we'll do some datatype conversions, such as converting the Dates and Times into pandas datetime objects, as well as replacing the fill value of ```-9999999``` with NaNs. Finally, we'll drop the data from the table that doesn't have any DIC, Alkalinity, or pH data to get the relevant carbon system data we'll need for comparisons.
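None of the cells in this section replace the `-9999999` fill value explicitly, so it is presumably handled upstream. If the bottle file still contains fill values, the replacement can be done with pandas. The following is a minimal sketch. It assumes the fill value is stored numerically as `-9999999`. The helper `replace_fill_values` and the example values are hypothetical.

```python
import numpy as np
import pandas as pd

FILL_VALUE = -9999999  # fill value quoted in the text above

def replace_fill_values(df: pd.DataFrame, fill_value=FILL_VALUE) -> pd.DataFrame:
    """Return a copy of df with every cell equal to fill_value set to NaN."""
    return df.replace(fill_value, np.nan)

# Hypothetical example table with one fill value per column
example = pd.DataFrame({
    "Discrete pH [Total scale]": [7.95, -9999999, 8.01],
    "Discrete DIC [umol/kg]": [-9999999.0, 2150.2, 2148.7],
})
cleaned = replace_fill_values(example)
print(cleaned)
```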

We'll also want to check the associated bottle and discrete sample flags. If the bottle data has been processed using the **```Bottle_Data```** notebook, then the flags for the discrete and CTD data parameters have been converted into a simplified scheme (the numbering follows the QARTOD flag convention):
* 1 = good data
* 2 = no evaluation
* 3 = questionable data
* 4 = bad data
* 9 = missing/no data
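The flags can be turned into a keep/discard mask with pandas. The following is a minimal sketch that keeps only good (1) and questionable (3) values, as this notebook does. The helper `apply_flags` and the example values are hypothetical.

```python
import numpy as np
import pandas as pd

# Flag meanings as listed above
FLAG_LABELS = {1: "good", 2: "no evaluation", 3: "questionable", 4: "bad", 9: "missing"}
KEEP_FLAGS = [1, 3]  # keep good and questionable data

def apply_flags(values: pd.Series, flags: pd.Series, keep=KEEP_FLAGS) -> pd.Series:
    """Return values with every sample whose flag is not in `keep` set to NaN."""
    return values.where(flags.isin(keep))

# Hypothetical example: five pH samples, one with each flag
ph = pd.Series([7.98, 7.97, 8.02, 7.50, np.nan])
ph_flags = pd.Series([1, 2, 3, 4, 9])
filtered = apply_flags(ph, ph_flags)
print(filtered.tolist())
print(ph_flags.map(FLAG_LABELS).tolist())
```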

First, we'll load some functions for interacting with the bottle data from the **```bottle_utils.py```** module.

In [None]:
from bottle_utils import *

In [None]:
bottleData = pd.read_csv("../data/water_sampling/Irminger_Bottle_Data.csv")
bottleData

In [None]:
def convert_times(x):
    """Parse a time string into a pandas Timestamp; leave other values (e.g. NaN) unchanged."""
    if isinstance(x, str):
        x = pd.to_datetime(x, utc=False)
    return x

In [None]:
bottleData["Start Time [UTC]"] = bottleData["Start Time [UTC]"].apply(convert_times)
bottleData["CTD Bottle Closure Time [UTC]"] = bottleData["CTD Bottle Closure Time [UTC]"].apply(convert_times)
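`convert_times` parses one cell at a time and leaves non-string cells, such as `NaN`, untouched. A vectorized alternative is `pd.to_datetime` with `errors="coerce"`. It parses the whole column at once and turns missing or unparseable entries into `NaT`. This is offered as a sketch rather than a drop-in replacement, and the example column is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical example column mixing time strings and a missing value
times = pd.Series(["2019-08-10 14:32:00", np.nan, "2020-06-20 09:05:30"])

# Parse the whole column at once; missing or unparseable entries become NaT
parsed = pd.to_datetime(times, errors="coerce")
print(parsed)
```

Unlike `apply(convert_times)`, this returns a proper `datetime64` column. That keeps later time arithmetic vectorized.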

With the flags converted to this simpler scheme, we can now filter the data to keep only values flagged as good (1) or questionable (3). Values with any other flag (no evaluation, bad, or missing) are replaced with NaNs so that they can be dropped in the next step.

In [None]:
# ====================================================
# Irminger Sea data: keep only values flagged good (1) or questionable (3)
flag_columns = {
    "Discrete Alkalinity [umol/kg]": "Discrete Alkalinity Flag",
    "Discrete DIC [umol/kg]": "Discrete DIC Flag",
    "Discrete pH [Total scale]": "Discrete pH Flag",
}
for data_col, flag_col in flag_columns.items():
    # Rows whose flag is anything other than 1 or 3 (including missing flags)
    mask = ~bottleData[flag_col].isin([1, 3])
    # Blank only those rows; replacing by value would also blank identical good values
    bottleData.loc[mask, data_col] = np.nan

Drop the rows in which all three carbon system parameters (alkalinity, DIC, and pH) are NaN, since those samples contain no relevant data.
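The `how="all"` argument matters here. A bottle is kept if *any* of alkalinity, DIC, or pH survived the flag filtering, whereas `how="any"` would keep only bottles with all three. The following illustration uses hypothetical values.

```python
import numpy as np
import pandas as pd

carbon_columns = ["Discrete Alkalinity [umol/kg]", "Discrete DIC [umol/kg]", "Discrete pH [Total scale]"]

# Hypothetical bottles: all three values, only pH, and nothing
bottles = pd.DataFrame(
    [[2300.1, 2150.4, 8.01],
     [np.nan, np.nan, 7.99],
     [np.nan, np.nan, np.nan]],
    columns=carbon_columns,
)

keep_any = bottles.dropna(subset=carbon_columns, how="all")  # drops only the empty row
keep_all = bottles.dropna(subset=carbon_columns, how="any")  # keeps only complete rows
print(len(keep_any), len(keep_all))
```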

In [None]:
carbon_columns = ["Discrete Alkalinity [umol/kg]", "Discrete DIC [umol/kg]", "Discrete pH [Total scale]"]
carbonData = bottleData.dropna(subset=carbon_columns, how="all")

---
## PCO2A

First, we examine the **```PCO2A```** dataset from the Irminger Surface Mooring Buoy. The **```PCO2A```** is the Pro-Oceanus CO2-Pro ATM instrument. It samples both the surface ocean and atmospheric pCO<sub>2</sub>. There is one **```PCO2A```** at the Irminger Array.

Import some custom plotting functions for this notebook

In [None]:
sys.path.append("../../")
from OS2022.OS2022 import plotting

Load the dataset which has been prepared using the **```Identify_and_Download_Data```** notebook

In [None]:
pco2a = xr.open_dataset("../data/GI01SUMO-SBD12-04-PCO2AA000_combined.nc", chunks="auto")
pco2a

In [None]:
# Plot the data as a function of deployments
fig, ax = plotting.plot_variable(pco2a, "partial_pressure_co2_ssw")
ax.set_title("GI01SUMO-SBD12-04-PCO2AA000", fontsize=16)

In [None]:
os.makedirs("../results/GI01SUMO-SBD12-04-PCO2AA000/", exist_ok=True)
fig.savefig("../results/GI01SUMO-SBD12-04-PCO2AA000/time_series.png")

### Annotations
Next, request the annotations for the **```PCO2A```** dataset

In [None]:
annotations = OOINet.get_annotations("GI01SUMO-SBD12-04-PCO2AA000")
annotations

#### Add Annotation Information to dataset
With the annotations, use the `add_annotation_qc_flag` method of the **```OOINet```** (M2M) object to add them to the dataset as the `rollup_annotations_qc_results` variable.

In [None]:
pco2a = OOINet.add_annotation_qc_flag(pco2a, annotations)
pco2a

Plot the data with the annotations highlighted

In [None]:
# Plot the data as a function of deployments
fig, ax = plotting.plot_variable(pco2a, "partial_pressure_co2_ssw")

# Plot figure which highlights data points with a corresponding annotation
ax.set_alpha(0.7)
first_legend = ax.get_legend()
first_legend.set_bbox_to_anchor((1, 0.7))
ax.add_artist(first_legend)

s, = ax.plot(pco2a.time.where(pco2a.rollup_annotations_qc_results == 9), 
             pco2a.partial_pressure_co2_ssw.where(pco2a.rollup_annotations_qc_results == 9), 
             marker="o", linestyle="", color="yellow", label="Annotation")

ax.legend(handles=[s], edgecolor="black", loc="lower left", bbox_to_anchor=(1, 0.2), fontsize=12)
ax.set_title("GI01SUMO-SBD12-04-PCO2AA000", fontsize=16)

In [None]:
# fig.savefig("../results/GI01SUMO-SBD12-04-PCO2AA000/time_series_with_annotations.png")

#### Filter Annotations
Finally, we can remove the data flagged by the annotations from the dataset. However, for the purposes of this analysis, I chose to *not* remove the flagged data.

In [None]:
# pco2a = pco2a.where(pco2a.rollup_annotations_qc_results != 9, drop=True)
# pco2a

In [None]:
# fig, ax = plotting.plot_variable(pco2a, "partial_pressure_co2_ssw")

#### Match Bottle Data with PCO2A Data
Now, we want to identify the bottle data that correspond to the **```PCO2A```** dataset and match them to the **```PCO2A```** data. First, we request the deployment information for the **```PCO2A```** in order to get the latitude, longitude, and depth of the instrument. Then, we filter the bottle data for samples collected within 5 km horizontally and 15 m vertically of the **```PCO2A```**. Finally, we generate a table comparing the bottle data with the **```PCO2A```** data. We do this by iterating over each row of the identified bottle data. For each bottle sample, we average the **```PCO2A```** pCO<sub>2</sub> measured within ±7 days of the bottle sample to get the instrument pCO<sub>2</sub>.
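The exact distance calculation inside `findSamples` (from `bottle_utils.py`) is not reproduced here. A hedged sketch of the matching criterion selects samples within a horizontal radius and a vertical tolerance. It computes great-circle distance with the haversine formula and assumes a spherical Earth of radius 6371 km. The column names `Latitude`, `Longitude`, `Depth`, the function `find_nearby_samples`, and the example casts are hypothetical.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical-Earth assumption)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def find_nearby_samples(samples, site, depth, max_km, max_dz):
    """Rows of `samples` within max_km horizontally and max_dz metres vertically of site."""
    lat, lon = site
    dist = haversine_km(samples["Latitude"], samples["Longitude"], lat, lon)
    dz = (samples["Depth"] - depth).abs()
    return samples[(dist <= max_km) & (dz <= max_dz)]

# Hypothetical casts and site; only the first cast is both close and shallow enough
casts = pd.DataFrame({
    "Latitude": [59.93, 59.95, 60.20],
    "Longitude": [-39.47, -39.50, -39.47],
    "Depth": [2.0, 30.0, 2.0],
})
nearby = find_nearby_samples(casts, (59.93, -39.47), 1.0, 5, 15)
print(nearby)
```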

In [None]:
deployments = OOINet.get_deployments("GI01SUMO-SBD12-04-PCO2AA000")
deployments

In [None]:
lat, lon, depth = deployments["latitude"].mean(), deployments["longitude"].mean(), deployments["depth"].mean()
lat, lon, depth

Identify the bottle samples collected near the PCO2A instrument we are looking at. Here I use a maximum horizontal distance of 5 kilometers and a depth threshold of +/- 15 meters.

In [None]:
pco2a_bottles = findSamples(carbonData, (lat, lon), depth, 5, 15)
pco2a_bottles

In [None]:
fig, ax = plotting.plot_variable(pco2a, "partial_pressure_co2_ssw")
ax.set_alpha(0.7)
first_legend = ax.get_legend()
first_legend.set_bbox_to_anchor((1, 0.7))
ax.add_artist(first_legend)
xmin, xmax = ax.get_xlim()

s, = ax.plot(pco2a_bottles.dropna(subset=["Calculated pCO2 [uatm]"])["Start Time [UTC]"], 
             pco2a_bottles.dropna(subset=["Calculated pCO2 [uatm]"])["Calculated pCO2 [uatm]"], 
             marker="o", color="yellow", linestyle="", markersize=12, markeredgecolor = "black",
             label="Bottle\npCO$_{2}$")

ax.legend(handles=[s], edgecolor="black", loc="lower left", bbox_to_anchor=(1, 0.2), fontsize=12)
ax.set_xlim(xmin, xmax)
ax.set_title("Apex Surface Mooring PCO2A", fontsize=20, weight="bold")
plt.xticks(fontsize=16, weight="bold")
plt.xlabel("")
plt.yticks(fontsize=16, weight="bold")
ylabel = ax.get_ylabel()
plt.ylabel(ylabel, fontsize=20, weight="bold")

In [None]:
ax.get_legend_handles_labels()[0]

In [None]:
fig.savefig("../results/GI01SUMO-SBD12-04-PCO2AA000/time_series_with_discrete_bottles.png", facecolor="white", transparent=False)

Create a time-based filter to get the instrument data within ±7 days of each bottle time stamp, so that we can make a direct comparison.
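The comparison loop uses `xarray.Dataset.where` with a ±7-day window and computes statistics separately per deployment. The same windowed statistics can be sketched with pandas on a time-indexed series. xarray's `std` defaults to `ddof=0` while pandas defaults to `ddof=1`, so the sketch passes `ddof=0` to match. The function `window_stats` and the synthetic hourly series are hypothetical.

```python
import numpy as np
import pandas as pd

def window_stats(series: pd.Series, t: pd.Timestamp, half_width=pd.Timedelta(days=7)):
    """Mean, population standard deviation, and count of `series` within t +/- half_width."""
    window = series.loc[t - half_width : t + half_width].dropna()
    return window.mean(), window.std(ddof=0), len(window)

# Hypothetical hourly pCO2 record over 30 days, rising linearly from 380 to 410 uatm
index = pd.date_range("2021-06-01", periods=30 * 24, freq=pd.Timedelta(hours=1))
pco2 = pd.Series(np.linspace(380.0, 410.0, index.size), index=index)

mean, std, n = window_stats(pco2, pd.Timestamp("2021-06-15 12:00"))
print(round(mean, 2), round(std, 2), n)
```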

In [None]:
# Half-width of the time window used to average the instrument data around each bottle
dt = pd.Timedelta(days=7)
rows = []

for ind in pco2a_bottles.index:
    # Get the (timezone-naive) time stamp of the bottle
    t = pco2a_bottles["Start Time [UTC]"].loc[ind].tz_localize(None)

    # Get the bottle pCO2 and the mean of the paired CTD sensors
    bottle = {
        "time": t,
        "bottleCO2": pco2a_bottles["Calculated pCO2 [uatm]"].loc[ind],
        "bottleT": pco2a_bottles[["CTD Temperature 1 [deg C]", "CTD Temperature 2 [deg C]"]].loc[ind].mean(skipna=True),
        "bottleS": pco2a_bottles[["CTD Salinity 1 [psu]", "CTD Salinity 2 [psu]"]].loc[ind].mean(skipna=True),
        "bottleP": pco2a_bottles["CTD Pressure [db]"].loc[ind],
    }

    # Find the instrument data within +/- dt of the bottle
    subset = pco2a.where((pco2a.time >= t - dt) & (pco2a.time <= t + dt), drop=True)
    deps = np.unique(subset.deployment)

    if len(deps) == 0:
        # No instrument data in the window: keep the bottle with NaN instrument values
        rows.append({**bottle, "deploymentNumber": np.nan})
        continue

    # Compute statistics separately for each deployment that falls within the window
    for depNum in deps:
        dep_subset = subset.where(subset.deployment == depNum, drop=True)
        rows.append({
            **bottle,
            "deploymentNumber": depNum,
            "CO2_avg": dep_subset.partial_pressure_co2_ssw.mean().compute().item(),
            "CO2_std": dep_subset.partial_pressure_co2_ssw.std().compute().item(),
            "T": dep_subset.sea_surface_temperature.mean().compute().item(),
            "S": dep_subset.met_salsurf.mean().compute().item(),
        })

# Empty placeholder row for deployment 2 (presumably so that deployment 2 still
# gets a legend entry and color in the comparison plot)
rows.append({"deploymentNumber": 2})

# Build the table (pandas >= 2.0 no longer provides DataFrame.append)
pco2a_comparison = pd.DataFrame(rows, columns=["deploymentNumber", "time", "bottleCO2", "bottleT", "bottleS",
                                               "bottleP", "CO2_avg", "CO2_std", "T", "S"])

# Sort the results by deployment
pco2a_comparison = pco2a_comparison.sort_values(by="deploymentNumber")
pco2a_comparison

Plot the matched data

In [None]:
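# Keep only rows matched to a deployment; copy so the column casts below don't modify a view
df = pco2a_comparison.dropna(subset=["deploymentNumber"]).copy()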
df["bottleCO2"] = df["bottleCO2"].astype(float)
df["CO2_avg"] = df["CO2_avg"].astype(float)
df["CO2_std"] = df["CO2_std"].astype(float)
#df = df[df["deploymentNumber"] != 3]

fig, ax = plt.subplots(figsize=(8,8))

# Assign one tab10 color per deployment and build matching legend patches
levels, categories = pd.factorize(df['deploymentNumber'])
colors = [plt.cm.tab10(i) for i in levels] # using the "tab10" colormap
handles = [matplotlib.patches.Patch(color=plt.cm.tab10(i), label=c) for i, c in enumerate(categories)]
for h in handles:
    h.set_edgecolor("black")

# Plot the data
ax.scatter(df["bottleCO2"], df["CO2_avg"], c=colors, s=80, edgecolors="black")
ax.set_xlabel(xlabel="Bottle pCO$_{2}$", fontsize=20, weight="bold")
ax.set_ylabel(ylabel="PCO2A pCO$_{2}$", fontsize=20, weight="bold")
ax.legend(handles=handles, title='Deployment', fontsize=12, edgecolor="black")
ax.grid()

first_legend = ax.get_legend()
ax.add_artist(first_legend)

# Add vertical error bars spanning +/- 2 standard deviations of the instrument data
for i, c in enumerate(categories):
    x = df["bottleCO2"][df["deploymentNumber"] == c]
    ymin = df["CO2_avg"][df["deploymentNumber"] == c] - 2*df["CO2_std"][df["deploymentNumber"] == c]
    ymax = df["CO2_avg"][df["deploymentNumber"] == c] + 2*df["CO2_std"][df["deploymentNumber"] == c]
    ax.vlines(x, ymin, ymax, colors = plt.cm.tab10(i))

#ax.set_ylim((250, 500))
#ax.set_xlim((350, 450))
# Add in a 1:1 line
x = np.arange(280, 405, 5)
y = np.arange(280, 405, 5)
s, = ax.plot(x, y, color = "black", alpha=0.7, label="1:1 Line")
ax.legend(handles=[s], edgecolor="black", loc="lower right", fontsize=12)
ax.set_title("Bottle vs. Instrument Comparison", fontsize=20, weight="bold")
plt.yticks(fontsize=12, weight="bold")
plt.xticks(fontsize=12, weight="bold")

In [None]:
fig.savefig("../results/GI01SUMO-SBD12-04-PCO2AA000/instrument_vs_bottle_co2.png", facecolor="white", transparent=False)

#### Time series analysis
The snippet below runs a basic stationarity test (the augmented Dickey-Fuller test) on a given time series. Experience with many OOI carbon system datasets leads me to suggest an ARIMA(1,1,0) model as a good starting point for analysis.

In [None]:
from statsmodels.tsa.stattools import adfuller

def adfuller_test(series):
    """Run an augmented Dickey-Fuller test on a time series and print the results."""
    # adfuller cannot handle NaNs, so drop them first
    values = np.asarray(series, dtype=float)
    result = adfuller(values[~np.isnan(values)])
    labels = ['ADF Test Statistic', 'p-value', '#Lags Used', 'Number of Observations Used']
    for value, label in zip(result, labels):
        print(label + ' : ' + str(value))
    if result[1] <= 0.05:
        print("p-value <= 0.05: reject the null hypothesis (H0) of a unit root; the time series is likely stationary.")
    else:
        print("p-value > 0.05: weak evidence against the null hypothesis; the time series likely has a unit root and is non-stationary.")

In [None]:
subset = pco2a.where(pco2a.deployment == 3, drop=True)
subset

In [None]:
adfuller_test(subset["partial_pressure_co2_ssw"])

In [None]:
# The older statsmodels.tsa.arima_model.ARIMA has been removed from recent statsmodels releases
from statsmodels.tsa.arima.model import ARIMA
import statsmodels.api as sm

In [None]:
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

In [None]:
# ACF and PACF of the first-differenced pCO2 (NaNs dropped) to help choose the ARIMA orders
dpco2 = subset.partial_pressure_co2_ssw.diff(dim="time").dropna(dim="time")
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8))
plot_acf(dpco2, lags=40, ax=ax1)
plot_pacf(dpco2, lags=40, ax=ax2)
plt.show()
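An ARIMA(1,1,0) model states that the first difference $\Delta x_t = x_t - x_{t-1}$ follows an AR(1) process, $\Delta x_t = c + \phi\,\Delta x_{t-1} + \varepsilon_t$. The following minimal numpy sketch estimates $c$ and $\phi$ by conditional least squares on the differenced series. It is a simplified stand-in for illustration, not the maximum-likelihood fit that statsmodels' `ARIMA` performs. The function `fit_arima_110` and the synthetic data are hypothetical.

```python
import numpy as np

def fit_arima_110(x):
    """Conditional least-squares fit of ARIMA(1,1,0): dx_t = c + phi * dx_{t-1} + e_t.

    Returns (c, phi, sigma), where sigma is the residual standard deviation.
    """
    x = np.asarray(x, dtype=float)
    x = x[~np.isnan(x)]          # drop gaps (treats the record as contiguous)
    dx = np.diff(x)              # first difference: the "I(1)" part of the model
    y, lag = dx[1:], dx[:-1]     # regress dx_t on dx_{t-1}
    design = np.column_stack([np.ones_like(lag), lag])
    (c, phi), *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - (c + phi * lag)
    return c, phi, resid.std(ddof=2)

# Synthetic series whose first difference is AR(1) with phi = 0.6 and unit noise
rng = np.random.default_rng(42)
dx = np.zeros(5000)
for t in range(1, dx.size):
    dx[t] = 0.6 * dx[t - 1] + rng.normal(scale=1.0)
x = 400.0 + np.cumsum(dx)

c, phi, sigma = fit_arima_110(x)
print(f"c = {c:.3f}, phi = {phi:.3f}, sigma = {sigma:.3f}")
```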

We can estimate the noise term from the first differences of the data, which gives an estimate of the noise standard deviation.
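One common form of a first-difference estimate assumes the record is a slowly varying signal plus independent (white) noise with standard deviation $\sigma$. Differencing consecutive samples nearly cancels the signal, and the difference of two independent noise terms has variance $2\sigma^2$. Hence $\sigma \approx \mathrm{std}(\Delta x)/\sqrt{2}$. The following is a sketch under that assumption, and the function name is hypothetical.

```python
import numpy as np

def first_difference_noise(x):
    """Estimate the white-noise standard deviation as std(diff(x)) / sqrt(2)."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)
    dx = dx[~np.isnan(dx)]  # ignore differences that touch a gap
    return dx.std(ddof=1) / np.sqrt(2.0)

# Synthetic record: slow signal plus noise with sigma = 2
rng = np.random.default_rng(0)
t = np.arange(10_000)
signal = 400.0 + 20.0 * np.sin(2 * np.pi * t / 10_000)
x = signal + rng.normal(scale=2.0, size=t.size)

noise_sigma = first_difference_noise(x)
print(round(noise_sigma, 2))
```

If the signal changes appreciably between samples, or the noise is autocorrelated, this estimate is biased, so it works best on densely sampled records.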

---
## Instrument Datasets

Now, repeat for the remaining PHSEN and PCO2W datasets outlined in the tables in the introduction. This section does not step through the processing in as much detail as above; by now you should have a good idea of the processing steps.

In [None]:
from OS2022.OS2022 import process_pco2w, process_phsen

In [None]:
# Set the reference designator
refdes = "GI01SUMO-RII11-02-PCO2WC051"

# Load the PCO2W
pco2w_40m = xr.open_dataset("../data/GI01SUMO-RII11-02-PCO2WC051_combined.nc", chunks="auto")

# Request the annotations
annotations = OOINet.get_annotations("GI01SUMO-RII11-02-PCO2WC051")

# Add the annotations to the dataset
pco2w_40m = OOINet.add_annotation_qc_flag(pco2w_40m, annotations)

# Remove the data flagged as "questionable" from the annotations
# pco2w_40m = pco2w_40m.where(pco2w_40m.rollup_annotations_qc_results != 9, drop=True)

# Plot the first look
fig, ax = plotting.plot_variable(pco2w_40m, "pco2_seawater")
ax.set_title(refdes, fontsize=16)

Save the figure

In [None]:
if not os.path.exists(f"../results/{refdes}/"):
    os.makedirs(f"../results/{refdes}/")

fig.savefig(f"../results/{refdes}/time_series.png")

#### Add in the CTD Data

In [None]:
ctdbpp_40m = xr.open_dataset("../data/GI01SUMO-RII11-02-CTDBPP031_combined.nc", chunks="auto")

In [None]:
practical_salinity = ctdbpp_40m["practical_salinity"].interp_like(pco2w_40m)
seawater_temperature = ctdbpp_40m["ctdbp_seawater_temperature"].interp_like(pco2w_40m)
seawater_pressure = ctdbpp_40m["ctdbp_seawater_pressure"].interp_like(pco2w_40m)
# Interpolate onto the PCO2W time stamps, like the other CTD variables
temp = ctdbpp_40m["temp"].interp_like(pco2w_40m)

# Add to the pco2w
pco2w_40m["practical_salinity"] = practical_salinity
pco2w_40m["seawater_temperature"] = seawater_temperature
pco2w_40m["seawater_pressure"] = seawater_pressure
pco2w_40m["temp"] = temp
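`interp_like` linearly interpolates each CTD variable onto the PCO2W time stamps (xarray's default is `method="linear"`), so every pCO<sub>2</sub> sample gets a co-located temperature, salinity, and pressure. The following numpy sketch does the same on synthetic time stamps. It returns NaN outside the CTD record, as xarray does by default, instead of `np.interp`'s default of repeating the edge value. The names and values are hypothetical.

```python
import numpy as np

def _to_seconds(times):
    """Convert datetime64 values to float seconds since the epoch."""
    return np.asarray(times, dtype="datetime64[ns]").astype("int64") / 1e9

def interp_to_times(target_times, source_times, source_values):
    """Linearly interpolate source_values onto target_times; NaN outside the source range."""
    return np.interp(_to_seconds(target_times), _to_seconds(source_times),
                     np.asarray(source_values, dtype=float), left=np.nan, right=np.nan)

# Hypothetical CTD samples every 15 minutes and PCO2W samples in between (and after)
ctd_times = np.array(["2021-06-01T00:00", "2021-06-01T00:15", "2021-06-01T00:30"], dtype="datetime64[ns]")
ctd_temp = np.array([5.0, 5.3, 5.6])
pco2w_times = np.array(["2021-06-01T00:05", "2021-06-01T00:20", "2021-06-01T01:00"], dtype="datetime64[ns]")

ctd_on_pco2w = interp_to_times(pco2w_times, ctd_times, ctd_temp)
print(ctd_on_pco2w)
```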

#### Quality Checks
Next, add in the quality checks for the data
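The actual tests and thresholds in `process_pco2w.quality_checks` are not reproduced here. As a hedged illustration of the kind of check that can produce the suspect (3) and bad (4) flags plotted below, here is a QARTOD-style gross range test. The function name and thresholds are hypothetical placeholders, not values used by `process_pco2w`.

```python
import numpy as np

def gross_range_flags(values, fail_span, suspect_span):
    """QARTOD-style gross range test.

    Returns 1 (pass), 3 (suspect: outside suspect_span), 4 (fail: outside fail_span),
    or 9 (missing) for each value.
    """
    values = np.asarray(values, dtype=float)
    flags = np.ones(values.shape, dtype=int)
    flags[(values < suspect_span[0]) | (values > suspect_span[1])] = 3
    flags[(values < fail_span[0]) | (values > fail_span[1])] = 4
    flags[np.isnan(values)] = 9
    return flags

# Hypothetical pCO2 values and thresholds in uatm, for illustration only
pco2 = [405.0, 620.0, 1500.0, np.nan, 150.0]
flags = gross_range_flags(pco2, fail_span=(200, 1000), suspect_span=(250, 600))
print(flags)
```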

In [None]:
qc_flags = process_pco2w.quality_checks(pco2w_40m)
pco2w_40m["qc_flags"] = qc_flags

In [None]:
fig, ax = plotting.plot_variable(pco2w_40m, "pco2_seawater")

# Plot figure which highlights data points flagged suspect (3) or bad (4) by the quality checks
ax.set_alpha(0.7)
first_legend = ax.get_legend()
first_legend.set_bbox_to_anchor((1, 0.7))
ax.add_artist(first_legend)

s1, = ax.plot(pco2w_40m.time.where(pco2w_40m.qc_flags == 3), 
             pco2w_40m.pco2_seawater.where(pco2w_40m.qc_flags == 3), 
             marker="s", linestyle="", color="yellow", label="Suspect")
s2, = ax.plot(pco2w_40m.time.where(pco2w_40m.qc_flags == 4), 
             pco2w_40m.pco2_seawater.where(pco2w_40m.qc_flags == 4), 
             marker="s", linestyle="", color="tab:red", label="Bad")

ax.legend(handles=[s1, s2], edgecolor="black", loc="lower left", bbox_to_anchor=(1, 0.2), fontsize=12)

In [None]:
if not os.path.exists(f"../results/{refdes}/"):
    os.makedirs(f"../results/{refdes}/")

fig.savefig(f"../results/{refdes}/time_series_with_quality_flags.png")

Find the associated bottle samples with this dataset:

In [None]:
deployments = OOINet.get_deployments("GI01SUMO-RII11-02-PCO2WC051")

lat, lon, depth = deployments["latitude"].mean(), deployments["longitude"].mean(), deployments["depth"].mean()
lat, lon, depth

In [None]:
pco2w_40m_bottles = findSamples(carbonData, (lat, lon), depth, 5, 20)

In [None]:
fig, ax = plotting.plot_variable(pco2w_40m, "pco2_seawater")
ax.set_alpha(0.7)
first_legend = ax.get_legend()
first_legend.set_bbox_to_anchor((1, 0.7))
ax.add_artist(first_legend)
xmin, xmax = ax.get_xlim()

s, = ax.plot(pco2w_40m_bottles.dropna(subset=["Calculated pCO2 [uatm]"])["Start Time [UTC]"], 
             pco2w_40m_bottles.dropna(subset=["Calculated pCO2 [uatm]"])["Calculated pCO2 [uatm]"], 
             marker="o", color="yellow", linestyle="", markersize=12, markeredgecolor = "black",
             label="Bottle pCO$_{2}$")

ax.legend(handles=[s], edgecolor="black", loc="lower left", bbox_to_anchor=(1, 0.2), fontsize=12)
ax.set_xlim(xmin, xmax)

In [None]:
if not os.path.exists(f"../results/{refdes}/"):
    os.makedirs(f"../results/{refdes}/")

fig.savefig(f"../results/{refdes}/time_series_with_discrete_bottles.png")

In [None]:
# Half-width of the time window used to average the instrument data around each bottle
dt = pd.Timedelta(days=7)
rows = []

for ind in pco2w_40m_bottles.index:
    # Get the (timezone-naive) time stamp of the bottle
    t = pco2w_40m_bottles["Start Time [UTC]"].loc[ind].tz_localize(None)

    # Get the bottle pCO2 and the mean of the paired CTD sensors
    bottle = {
        "time": t,
        "bottleCO2": pco2w_40m_bottles["Calculated pCO2 [uatm]"].loc[ind],
        "bottleT": pco2w_40m_bottles[["CTD Temperature 1 [deg C]", "CTD Temperature 2 [deg C]"]].loc[ind].mean(skipna=True),
        "bottleS": pco2w_40m_bottles[["CTD Salinity 1 [psu]", "CTD Salinity 2 [psu]"]].loc[ind].mean(skipna=True),
        "bottleP": pco2w_40m_bottles["CTD Pressure [db]"].loc[ind],
    }

    # Find the instrument data within +/- dt of the bottle
    subset = pco2w_40m.where((pco2w_40m.time >= t - dt) & (pco2w_40m.time <= t + dt), drop=True)
    deps = np.unique(subset.deployment)

    if len(deps) == 0:
        # No instrument data in the window: keep the bottle with NaN instrument values
        rows.append({**bottle, "deploymentNumber": np.nan})
        continue

    # Compute statistics separately for each deployment that falls within the window
    for depNum in deps:
        dep_subset = subset.where(subset.deployment == depNum, drop=True)
        rows.append({
            **bottle,
            "deploymentNumber": depNum,
            "CO2_avg": dep_subset.pco2_seawater.mean(skipna=True).compute().item(),
            "CO2_std": dep_subset.pco2_seawater.std(skipna=True).compute().item(),
            "T": dep_subset.seawater_temperature.mean(skipna=True).compute().item(),
            "S": dep_subset.practical_salinity.mean(skipna=True).compute().item(),
            "P": dep_subset.seawater_pressure.mean(skipna=True).compute().item(),
        })

# Build the table (pandas >= 2.0 no longer provides DataFrame.append)
pco2w_40m_comparison = pd.DataFrame(rows, columns=["deploymentNumber", "time", "CO2_avg", "CO2_std", "T", "S", "P",
                                                   "bottleCO2", "bottleT", "bottleS", "bottleP"])

# Sort the results by deployment
pco2w_40m_comparison = pco2w_40m_comparison.sort_values(by="deploymentNumber")
pco2w_40m_comparison

In [None]:
# Drop deployment 2 data (the rows with index labels 2 and 4 in the table above)
pco2w_40m_comparison = pco2w_40m_comparison.drop(labels=[2, 4])
pco2w_40m_comparison
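`DataFrame.drop(labels=[2, 4])` removes rows by *index label*, not by deployment number. It therefore drops deployment 2 only if those labels hold deployment 2 rows, and the labels depend on the order in which the bottles were added. Filtering on the column value does not depend on that order. In the following illustration with hypothetical rows, the two approaches agree because labels 2 and 4 hold the deployment 2 rows.

```python
import pandas as pd

comparison = pd.DataFrame({
    "deploymentNumber": [3, 1, 2, 1, 2],
    "bottleCO2": [390.0, 402.5, 371.2, 398.0, 365.4],
})

by_label = comparison.drop(labels=[2, 4])                   # rows whose index label is 2 or 4
by_value = comparison[comparison["deploymentNumber"] != 2]  # rows from any deployment except 2
print(by_label.index.tolist(), by_value.index.tolist())
```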

In [None]:
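# Keep only rows matched to a deployment; copy so the column casts below don't modify a view
df = pco2w_40m_comparison.dropna(subset=["deploymentNumber"]).copy()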
df["bottleCO2"] = df["bottleCO2"].astype(float)
df["CO2_avg"] = df["CO2_avg"].astype(float)
df["CO2_std"] = df["CO2_std"].astype(float)
#df = df[df["deploymentNumber"] != 3]

fig, ax = plt.subplots(figsize=(8,8))

# Assign one tab10 color per deployment and build matching legend patches
levels, categories = pd.factorize(df['deploymentNumber'])
colors = [plt.cm.tab10(i) for i in levels] # using the "tab10" colormap
handles = [matplotlib.patches.Patch(color=plt.cm.tab10(i), label=c) for i, c in enumerate(categories)]
for h in handles:
    h.set_edgecolor("black")

# Plot the data
ax.scatter(df["bottleCO2"], df["CO2_avg"], c=colors, s=80, edgecolors="black")
ax.set_xlabel(xlabel="Bottle pCO$_{2}$", fontsize=14)
ax.set_ylabel(ylabel="PCO2W pCO$_{2}$", fontsize=14)
ax.legend(handles=handles, title='Deployment', fontsize=12, edgecolor="black")
ax.grid()

# Add vertical error bars spanning +/- 2 standard deviations of the instrument data
for i, c in enumerate(categories):
    x = df["bottleCO2"][df["deploymentNumber"] == c]
    ymin = df["CO2_avg"][df["deploymentNumber"] == c] - 2*df["CO2_std"][df["deploymentNumber"] == c]
    ymax = df["CO2_avg"][df["deploymentNumber"] == c] + 2*df["CO2_std"][df["deploymentNumber"] == c]
    ax.vlines(x, ymin, ymax, colors = plt.cm.tab10(i))

ax.set_ylim((280, 455))
ax.set_xlim((280, 455))
# Add in a 1:1 line
x = np.arange(280, 455, 5)
y = np.arange(280, 455, 5)
ax.plot(x, y, color = "black", alpha=0.7)
ax.set_title(refdes + " pCO$_{2}$ Comparison", fontsize=16)

In [None]:
fig.savefig(f"../results/{refdes}/instrument_vs_bottle_co2.png")

#### PCO2W 80 m

In [None]:
refdes = "GI01SUMO-RII11-02-PCO2WC052"

pco2w_80m = xr.open_dataset("../data/GI01SUMO-RII11-02-PCO2WC052_combined.nc", chunks="auto")

annotations = OOINet.get_annotations("GI01SUMO-RII11-02-PCO2WC052")

pco2w_80m = OOINet.add_annotation_qc_flag(pco2w_80m, annotations)

#pco2w_80m = pco2w_80m.where(pco2w_80m.rollup_annotations_qc_results != 9, drop=True)

fig, ax = plotting.plot_variable(pco2w_80m, "pco2_seawater")


In [None]:
if not os.path.exists(f"../results/{refdes}/"):
    os.makedirs(f"../results/{refdes}/")

fig.savefig(f"../results/{refdes}/time_series.png")

#### Add the CTD Data

In [None]:
ctdbpp_80m = xr.open_dataset("../data/GI01SUMO-RII11-02-CTDBPP032_combined.nc", chunks="auto")

In [None]:
practical_salinity = ctdbpp_80m["practical_salinity"].interp_like(pco2w_80m)
seawater_temperature = ctdbpp_80m["ctdbp_seawater_temperature"].interp_like(pco2w_80m)
seawater_pressure = ctdbpp_80m["ctdbp_seawater_pressure"].interp_like(pco2w_80m)

# Add to the pco2w
pco2w_80m["practical_salinity"] = practical_salinity
pco2w_80m["seawater_temperature"] = seawater_temperature
pco2w_80m["seawater_pressure"] = seawater_pressure

#### Quality Checks
Next, add in the quality checks for the data

In [None]:
qc_flags = process_pco2w.quality_checks(pco2w_80m)
pco2w_80m["qc_flags"] = qc_flags

In [None]:
fig, ax = plotting.plot_variable(pco2w_80m, "pco2_seawater")

# Plot figure which highlights data points flagged suspect (3) or bad (4) by the quality checks
ax.set_alpha(0.7)
first_legend = ax.get_legend()
first_legend.set_bbox_to_anchor((1, 0.7))
ax.add_artist(first_legend)

s1, = ax.plot(pco2w_80m.time.where(pco2w_80m.qc_flags == 3), 
             pco2w_80m.pco2_seawater.where(pco2w_80m.qc_flags == 3), 
             marker="s", linestyle="", color="yellow", label="Suspect")
s2, = ax.plot(pco2w_80m.time.where(pco2w_80m.qc_flags == 4), 
             pco2w_80m.pco2_seawater.where(pco2w_80m.qc_flags == 4), 
             marker="s", linestyle="", color="tab:red", label="Bad")

ax.legend(handles=[s1, s2], edgecolor="black", loc="lower left", bbox_to_anchor=(1, 0.2), fontsize=12)
ax.set_title("GI01SUMO-RII11-02-PCO2WC052", fontsize=16)
#ax.set_ylim((200, 600))

In [None]:
if not os.path.exists(f"../results/{refdes}/"):
    os.makedirs(f"../results/{refdes}/")

fig.savefig(f"../results/{refdes}/time_series_with_quality_flags.png")

Optionally, drop the bad data while keeping the "questionable" data. This step is commented out below, so all data are currently retained.

In [None]:
#pco2w_80m = pco2w_80m.where(pco2w_80m.qc_flags != 4, drop=True)

Find the associated bottle data

In [None]:
deployments = OOINet.get_deployments("GI01SUMO-RII11-02-PCO2WC052")

lat, lon, depth = deployments["latitude"].mean(), deployments["longitude"].mean(), deployments["depth"].mean()

lat, lon, depth

In [None]:
pco2w_80m_bottles = findSamples(carbonData, (lat, lon), depth, 5, 20)
pco2w_80m_bottles

In [None]:
fig, ax = plotting.plot_variable(pco2w_80m, "pco2_seawater")
ax.set_alpha(0.7)
first_legend = ax.get_legend()
first_legend.set_bbox_to_anchor((1, 0.7))
ax.add_artist(first_legend)
xmin, xmax = ax.get_xlim()

s, = ax.plot(pco2w_80m_bottles.dropna(subset=["Calculated pCO2 [uatm]"])["Start Time [UTC]"], 
             pco2w_80m_bottles.dropna(subset=["Calculated pCO2 [uatm]"])["Calculated pCO2 [uatm]"], 
             marker="o", color="yellow", linestyle="", markersize=12, markeredgecolor = "black",
             label="Bottle pCO$_{2}$")

ax.legend(handles=[s], edgecolor="black", loc="lower left", bbox_to_anchor=(1, 0.2), fontsize=12)
ax.set_xlim(xmin, xmax)

In [None]:
if not os.path.exists(f"../results/{refdes}/"):
    os.makedirs(f"../results/{refdes}/")

fig.savefig(f"../results/{refdes}/time_series_with_discrete_bottles.png")

In [None]:
pco2w_80m_comparison = pd.DataFrame(columns=["deploymentNumber", "time", "CO2_avg", "CO2_std", "T", "S", "P",
                                            "bottleCO2", "bottleT", "bottleS", "bottleP"])

for ind in pco2w_80m_bottles.index:
    # Get the time stamps of a bottle
    t = pco2w_80m_bottles["Start Time [UTC]"].loc[ind]
    t = t.tz_localize(None)

    # Get the bottle pCO2a
    bottleCO2 = pco2w_80m_bottles["Calculated pCO2 [uatm]"].loc[ind]
    bottleT = pco2w_80m_bottles[["CTD Temperature 1 [deg C]", "CTD Temperature 2 [deg C]"]].loc[ind].mean(skipna=True)
    bottleS = pco2w_80m_bottles[["CTD Salinity 1 [psu]", "CTD Salinity 2 [psu]"]].loc[ind].mean(skipna=True)
    bottleP = pco2w_80m_bottles["CTD Pressure [db]"].loc[ind]

    # Create a time window to look for data
    dt = pd.Timedelta(days=7)
    tmin = t - dt
    tmax = t + dt

    # Find the data within the time series
    subset = pco2w_80m.where((pco2w_80m.time >= tmin) & (pco2w_80m.time <= tmax), drop=True)
    # Check for unique deployments
    deps = np.unique(subset.deployment)
    # If more than one deployment, split it based on deployments
    if len(deps) > 1:
        for depNum in deps:
            dep_subset = subset.where(subset.deployment == depNum, drop=True)
            pco2_avg = dep_subset.pco2_seawater.mean().compute().values
            pco2_std = dep_subset.pco2_seawater.std().compute().values
            pco2_temp = dep_subset.seawater_temperature.mean().compute().values
            pco2_sal = dep_subset.practical_salinity.mean().compute().values
            pco2_pres = dep_subset.seawater_pressure.mean().compute().values
            pco2w_80m_comparison = pco2w_80m_comparison.append({
                    "deploymentNumber": depNum,
                    "time": t,
                    "CO2_avg": pco2_avg,
                    "CO2_std": pco2_std,
                    "T": pco2_temp,
                    "S": pco2_sal,
                    "P": pco2_pres,
                    "bottleCO2": bottleCO2,
                    "bottleT": bottleT,
                    "bottleS": bottleS,
                    "bottleP": bottleP,
                }, ignore_index=True)
    elif len(deps) == 1:
        pco2_avg = subset.pco2_seawater.mean().compute().values
        pco2_std = subset.pco2_seawater.std().compute().values
        pco2_temp = subset.seawater_temperature.mean().compute().values
        pco2_sal = subset.practical_salinity.mean().compute().values
        pco2_pres = subset.seawater_pressure.mean().compute().values
        depNum = deps[0]
        pco2w_80m_comparison = pco2w_80m_comparison.append({
                "deploymentNumber": depNum,
                "time": t,
                "CO2_avg": pco2_avg,
                "CO2_std": pco2_std,
                "T": pco2_temp,
                "S": pco2_sal,
                "P": pco2_pres,
                "bottleCO2": bottleCO2,
                "bottleT": bottleT,
                "bottleS": bottleS,
                "bottleP": bottleP,
            }, ignore_index=True)
    else:
        # Now put in NaNs for the missing results data
        pco2w_80m_comparison = pco2w_80m_comparison.append({
                "deploymentNumber": None,
                "time": t,
                "CO2_avg": None,
                "CO2_std": None,
                "T": None,
                "S": None,
                "P": None,
                "bottleCO2": bottleCO2,
                "bottleT": bottleT,
                "bottleS": bottleS,
                "bottleP": bottleP,
            }, ignore_index=True)

# Save the results
pco2w_80m_comparison = pco2w_80m_comparison.sort_values(by="deploymentNumber")
pco2w_80m_comparison
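
The matching logic above can be sketched in plain pandas. The loop averages the instrument record within ±7 days of each bottle and splits the average by deployment when the window straddles a mooring turnaround. This is a minimal sketch, not the notebook's own method. It assumes the instrument data have already been converted to a `pandas.DataFrame` with hypothetical `time`, `deployment` and value columns (for example via `Dataset.to_dataframe()`). It also collects rows in a list instead of calling `DataFrame.append`, which was removed in pandas 2.0. The standard deviation uses `ddof=0` to match xarray's default.

```python
import numpy as np
import pandas as pd


def match_bottle(instrument, bottle_time, value_col, window=pd.Timedelta(days=7)):
    """Summarize instrument data within +/- window of one bottle sample.

    Returns one dict per deployment present in the window, or a single
    dict of NaNs if no instrument data fall inside the window.
    """
    # Drop the time zone so the bottle time compares with naive instrument times
    if bottle_time.tzinfo is not None:
        bottle_time = bottle_time.tz_localize(None)
    in_window = instrument["time"].between(bottle_time - window, bottle_time + window)
    subset = instrument.loc[in_window]
    if subset.empty:
        return [{"deploymentNumber": np.nan, "time": bottle_time,
                 "avg": np.nan, "std": np.nan, "n": 0}]
    rows = []
    # groupby splits the window when it straddles two deployments
    for dep, group in subset.groupby("deployment"):
        rows.append({"deploymentNumber": dep, "time": bottle_time,
                     "avg": group[value_col].mean(),
                     "std": group[value_col].std(ddof=0),
                     "n": int(group[value_col].count())})
    return rows


# Synthetic example: 10 daily points, deployment 7 then deployment 8
instrument = pd.DataFrame({
    "time": pd.date_range("2021-08-01", periods=10, freq="D"),
    "deployment": [7] * 5 + [8] * 5,
    "pco2_seawater": np.arange(400.0, 410.0),
})
rows = match_bottle(instrument, pd.Timestamp("2021-08-05", tz="UTC"),
                    "pco2_seawater", window=pd.Timedelta(days=2))
comparison = pd.DataFrame(rows)
print(comparison)
```

Building the table once from a list of dicts also avoids copying the whole DataFrame on every bottle, which is what repeated `append` calls do.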

In [None]:
pco2w_80m_comparison.loc[0,"deploymentNumber"] = 3
pco2w_80m_comparison.loc[1,"deploymentNumber"] = 4
pco2w_80m_comparison

In [None]:
# Work on a copy so the dtype conversions below do not trigger SettingWithCopyWarning
df = pco2w_80m_comparison.dropna(subset=["deploymentNumber"]).copy()
df["bottleCO2"] = df["bottleCO2"].astype(float)
df["CO2_avg"] = df["CO2_avg"].astype(float)
df["CO2_std"] = df["CO2_std"].astype(float)
#df = df[df["deploymentNumber"] != 3]

fig, ax = plt.subplots(figsize=(8,8))

# Assign one tab10 color per deployment and build matching legend handles
levels, categories = pd.factorize(df['deploymentNumber'])
colors = [plt.cm.tab10(i) for i in levels] # using the "tab10" colormap
handles = [matplotlib.patches.Patch(color=plt.cm.tab10(i), label=c) for i, c in enumerate(categories)]
for h in handles:
    h.set_edgecolor("black")

# Plot the data
ax.scatter(df["bottleCO2"], df["CO2_avg"], c=colors, s=80, edgecolors="black")
ax.set_xlabel(xlabel="Bottle pCO$_{2}$", fontsize=14)
ax.set_ylabel(ylabel="PCO2W pCO$_{2}$", fontsize=14)
ax.legend(handles=handles, title='Deployment', fontsize=12, edgecolor="black")
ax.grid()

# Add in vertical error bars
for i, c in enumerate(categories):
    x = df["bottleCO2"][df["deploymentNumber"] == c]
    ymin = df["CO2_avg"][df["deploymentNumber"] == c] - 2*df["CO2_std"][df["deploymentNumber"] == c]
    ymax = df["CO2_avg"][df["deploymentNumber"] == c] + 2*df["CO2_std"][df["deploymentNumber"] == c]
    ax.vlines(x, ymin, ymax, colors = plt.cm.tab10(i))

ax.set_ylim((280, 500))
ax.set_xlim((280, 500))
# Add in a 1:1 line
x = np.arange(280, 505, 5)
y = np.arange(280, 505, 5)
ax.plot(x, y, color = "black", alpha=0.7)
ax.set_title(refdes + " pCO$_{2}$ Comparison", fontsize=16)

In [None]:
fig.savefig(f"../results/{refdes}/instrument_vs_bottle_co2.png")

#### PCO2W 130m

In [None]:
refdes = "GI01SUMO-RII11-02-PCO2WC053"

pco2w_130m = xr.open_dataset(f"../data/{refdes}_combined.nc", chunks="auto")

annotations = OOINet.get_annotations(refdes)

pco2w_130m = OOINet.add_annotation_qc_flag(pco2w_130m, annotations)

#pco2w_130m = pco2w_130m.where(pco2w_130m.rollup_annotations_qc_results != 9, drop=True)

fig, ax = plotting.plot_variable(pco2w_130m, "pco2_seawater")

In [None]:
if not os.path.exists(f"../results/{refdes}/"):
    os.makedirs(f"../results/{refdes}/")

fig.savefig(f"../results/{refdes}/time_series.png")

#### Add CTD Data

In [None]:
ctdbpp_130m = xr.open_dataset("../data/GI01SUMO-RII11-02-CTDBPP033_combined.nc", chunks="auto")

In [None]:
practical_salinity = ctdbpp_130m["practical_salinity"].interp_like(pco2w_130m)
seawater_temperature = ctdbpp_130m["ctdbp_seawater_temperature"].interp_like(pco2w_130m)
seawater_pressure = ctdbpp_130m["ctdbp_seawater_pressure"].interp_like(pco2w_130m)

# Add to the pco2w
pco2w_130m["practical_salinity"] = practical_salinity
pco2w_130m["seawater_temperature"] = seawater_temperature
pco2w_130m["seawater_pressure"] = seawater_pressure
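
`interp_like` linearly interpolates the CTD record onto the PCO2W timestamps. By default, it returns NaN for times outside the CTD record. The sketch below reproduces that behavior with NumPy alone, which can be handy for spot-checking the merged values. `interp_to_times` is a hypothetical helper. It assumes both time arrays are sorted in ascending order, as `np.interp` requires.

```python
import numpy as np


def interp_to_times(target_times, source_times, source_values):
    """Linearly interpolate source_values (on source_times) onto target_times.

    np.interp needs numeric x values, so datetimes are converted to integer
    nanoseconds relative to the first source time (keeps float precision).
    Targets outside the source time range become NaN, like xarray's default
    for interp_like (np.interp alone would hold the end values).
    """
    src_ns = np.asarray(source_times, dtype="datetime64[ns]").astype("int64")
    tgt_ns = np.asarray(target_times, dtype="datetime64[ns]").astype("int64")
    base = src_ns[0]
    t_src = (src_ns - base).astype(float)
    t_tgt = (tgt_ns - base).astype(float)
    out = np.interp(t_tgt, t_src, np.asarray(source_values, dtype=float))
    out[(t_tgt < t_src[0]) | (t_tgt > t_src[-1])] = np.nan
    return out


# Hypothetical hourly CTD temperatures and three PCO2W sample times
ctd_times = np.array(["2021-08-01T00:00", "2021-08-01T01:00", "2021-08-01T02:00"],
                     dtype="datetime64[ns]")
ctd_temp = np.array([5.0, 6.0, 8.0])
pco2w_times = np.array(["2021-08-01T00:30", "2021-08-01T01:30", "2021-08-01T03:00"],
                       dtype="datetime64[ns]")
temp_on_pco2w = interp_to_times(pco2w_times, ctd_times, ctd_temp)
print(temp_on_pco2w)
```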

#### Quality Checks
Next, add the quality checks to the data.

In [None]:
qc_flags = process_pco2w.quality_checks(pco2w_130m)
pco2w_130m["qc_flags"] = qc_flags

In [None]:
fig, ax = plotting.plot_variable(pco2w_130m, "pco2_seawater")

# Plot figure which highlights data points with a corresponding annotation
ax.set_alpha(0.7)
first_legend = ax.get_legend()
first_legend.set_bbox_to_anchor((1, 0.7))
ax.add_artist(first_legend)

s1, = ax.plot(pco2w_130m.time.where(pco2w_130m.qc_flags == 3), 
             pco2w_130m.pco2_seawater.where(pco2w_130m.qc_flags == 3), 
             marker="s", linestyle="", color="yellow", label="Suspect")
s2, = ax.plot(pco2w_130m.time.where(pco2w_130m.qc_flags == 4), 
             pco2w_130m.pco2_seawater.where(pco2w_130m.qc_flags == 4), 
             marker="s", linestyle="", color="tab:red", label="Bad")

ax.legend(handles=[s1, s2], edgecolor="black", loc="lower left", bbox_to_anchor=(1, 0.2), fontsize=12)
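
To complement the plot with numbers, the fraction of points in each flag category can be tallied directly. This sketch assumes the convention the plot uses (3 = suspect, 4 = bad) and ignores NaN flags when computing the fractions. `summarize_flags` is a hypothetical helper that could be called, for example, as `summarize_flags(pco2w_130m["qc_flags"].values)`.

```python
import numpy as np


def summarize_flags(flags):
    """Fraction of non-NaN points flagged suspect (3) and bad (4)."""
    flags = np.asarray(flags, dtype=float)
    flags = flags[~np.isnan(flags)]  # ignore missing flags
    n = flags.size
    if n == 0:
        return {"n": 0, "suspect": np.nan, "bad": np.nan}
    return {"n": int(n),
            "suspect": float(np.mean(flags == 3)),
            "bad": float(np.mean(flags == 4))}


# Made-up flags for illustration
summary = summarize_flags([1, 1, 3, 4, 1, 1, 1, 3, np.nan])
print(summary)
```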

In [None]:
if not os.path.exists(f"../results/{refdes}/"):
    os.makedirs(f"../results/{refdes}/")

fig.savefig(f"../results/{refdes}/time_series_with_quality_flags.png")

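Optionally drop the bad data (flag 4) while keeping the suspect (flag 3) data. The filter in the next cell is currently commented out, so all data are retained for the 130 m comparison.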

In [None]:
#pco2w_130m = pco2w_130m.where(pco2w_130m.qc_flags != 4, drop=True)

In [None]:
deployments = OOINet.get_deployments(refdes)

lat, lon, depth = deployments["latitude"].mean(), deployments["longitude"].mean(), deployments["depth"].mean()
lat, lon, depth

In [None]:
pco2w_130m_bottles = findSamples(carbonData, (lat, lon), depth, 5, 20)
pco2w_130m_bottles["CTD Pressure [db]"]

In [None]:
fig, ax = plotting.plot_variable(pco2w_130m, "pco2_seawater")
ax.set_alpha(0.7)
first_legend = ax.get_legend()
first_legend.set_bbox_to_anchor((1, 0.7))
ax.add_artist(first_legend)
xmin, xmax = ax.get_xlim()

s, = ax.plot(pco2w_130m_bottles.dropna(subset=["Calculated pCO2 [uatm]"])["Start Time [UTC]"], 
             pco2w_130m_bottles.dropna(subset=["Calculated pCO2 [uatm]"])["Calculated pCO2 [uatm]"], 
             marker="o", color="yellow", linestyle="", markersize=12, markeredgecolor = "black",
             label="Bottle\npCO$_{2}$")

ax.legend(handles=[s], edgecolor="black", loc="lower left", bbox_to_anchor=(1, 0.2), fontsize=12)
ax.set_xlim(xmin, xmax)
ax.set_title(refdes, fontsize=16)

In [None]:
ax

In [None]:
if not os.path.exists(f"../results/{refdes}/"):
    os.makedirs(f"../results/{refdes}/")

fig.savefig(f"../results/{refdes}/time_series_with_discrete_bottles.png")

In [None]:
pco2w_130m_comparison = pd.DataFrame(columns=["deploymentNumber", "time", "CO2_avg", "CO2_std", "T", "S", "P",
                                            "bottleCO2", "bottleT", "bottleS", "bottleP"])

for ind in pco2w_130m_bottles.index:
    # Get the time stamps of a bottle
    t = pco2w_130m_bottles["Start Time [UTC]"].loc[ind]
    t = t.tz_localize(None)

    # Get the bottle pCO2, the mean of the two CTD T and S sensors, and pressure
    bottleCO2 = pco2w_130m_bottles["Calculated pCO2 [uatm]"].loc[ind]
    bottleT = pco2w_130m_bottles[["CTD Temperature 1 [deg C]", "CTD Temperature 2 [deg C]"]].loc[ind].mean(skipna=True)
    bottleS = pco2w_130m_bottles[["CTD Salinity 1 [psu]", "CTD Salinity 2 [psu]"]].loc[ind].mean(skipna=True)
    bottleP = pco2w_130m_bottles["CTD Pressure [db]"].loc[ind]

    # Create a time window to look for data
    dt = pd.Timedelta(days=7)
    tmin = t - dt
    tmax = t + dt

    # Find the data within the time series
    subset = pco2w_130m.where((pco2w_130m.time >= tmin) & (pco2w_130m.time <= tmax), drop=True)
    # Check for unique deployments
    deps = np.unique(subset.deployment)
    # If more than one deployment, split it based on deployments
    if len(deps) > 1:
        for depNum in deps:
            dep_subset = subset.where(subset.deployment == depNum, drop=True)
            pco2_avg = dep_subset.pco2_seawater.mean().compute().values
            pco2_std = dep_subset.pco2_seawater.std().compute().values
            pco2_temp = dep_subset.seawater_temperature.mean().compute().values
            pco2_sal = dep_subset.practical_salinity.mean().compute().values
            pco2_pres = dep_subset.seawater_pressure.mean().compute().values
            pco2w_130m_comparison = pco2w_130m_comparison.append({
                    "deploymentNumber": depNum,
                    "time": t,
                    "CO2_avg": pco2_avg,
                    "CO2_std": pco2_std,
                    "T": pco2_temp,
                    "S": pco2_sal,
                    "P": pco2_pres,
                    "bottleCO2": bottleCO2,
                    "bottleT": bottleT,
                    "bottleS": bottleS,
                    "bottleP": bottleP,
                }, ignore_index=True)
    elif len(deps) == 1:
        pco2_avg = subset.pco2_seawater.mean().compute().values
        pco2_std = subset.pco2_seawater.std().compute().values
        pco2_temp = subset.seawater_temperature.mean().compute().values
        pco2_sal = subset.practical_salinity.mean().compute().values
        pco2_pres = subset.seawater_pressure.mean().compute().values
        depNum = deps[0]
        pco2w_130m_comparison = pco2w_130m_comparison.append({
                "deploymentNumber": depNum,
                "time": t,
                "CO2_avg": pco2_avg,
                "CO2_std": pco2_std,
                "T": pco2_temp,
                "S": pco2_sal,
                "P": pco2_pres,
                "bottleCO2": bottleCO2,
                "bottleT": bottleT,
                "bottleS": bottleS,
                "bottleP": bottleP,
            }, ignore_index=True)
    else:
        # Now put in NaNs for the missing results data
        pco2w_130m_comparison = pco2w_130m_comparison.append({
                "deploymentNumber": None,
                "time": t,
                "CO2_avg": None,
                "CO2_std": None,
                "T": None,
                "S": None,
                "P": None,
                "bottleCO2": bottleCO2,
                "bottleT": bottleT,
                "bottleS": bottleS,
                "bottleP": bottleP,
            }, ignore_index=True)

# Display the results
pco2w_130m_comparison
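
Each loop strips the time zone from the bottle timestamp with `tz_localize(None)` before building the search window. The loop assumes the bottle times are timezone-aware UTC timestamps while the instrument times are timezone-naive, and pandas will not order-compare the two. A minimal illustration with made-up times:

```python
import pandas as pd

bottle_time = pd.Timestamp("2021-08-05 14:30", tz="UTC")  # timezone-aware
instrument_times = pd.Series(pd.to_datetime(
    ["2021-08-05 00:00", "2021-08-05 12:00", "2021-08-06 00:00"]))  # timezone-naive

try:
    instrument_times >= bottle_time
    mixed_comparison_ok = True
except TypeError:
    # pandas will not order-compare tz-naive and tz-aware datetimes
    mixed_comparison_ok = False

naive_time = bottle_time.tz_localize(None)  # same wall-clock time, UTC label dropped
after_bottle = instrument_times >= naive_time
print(mixed_comparison_ok, after_bottle.tolist())
```

Because the bottle times are in UTC, dropping the label keeps the same instant, provided the instrument times are also UTC.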

In [None]:
df = pco2w_130m_comparison.dropna(subset=["deploymentNumber"]).copy()
df["bottleCO2"] = df["bottleCO2"].astype(float)
df["CO2_avg"] = df["CO2_avg"].astype(float)
df["CO2_std"] = df["CO2_std"].astype(float)
#df = df[df["deploymentNumber"] != 3]

fig, ax = plt.subplots(figsize=(8,8))

# Assign one tab10 color per deployment and build matching legend handles
levels, categories = pd.factorize(df['deploymentNumber'])
colors = [plt.cm.tab10(i) for i in levels] # using the "tab10" colormap
handles = [matplotlib.patches.Patch(color=plt.cm.tab10(i), label=c) for i, c in enumerate(categories)]
for h in handles:
    h.set_edgecolor("black")

# Plot the data
ax.scatter(df["bottleCO2"], df["CO2_avg"], c=colors, s=80, edgecolors="black")
ax.set_xlabel(xlabel="Bottle pCO$_{2}$", fontsize=14)
ax.set_ylabel(ylabel="PCO2W pCO$_{2}$", fontsize=14)
ax.legend(handles=handles, title='Deployment', fontsize=12, edgecolor="black")
ax.grid()

# Add in vertical error bars
for i, c in enumerate(categories):
    x = df["bottleCO2"][df["deploymentNumber"] == c]
    ymin = df["CO2_avg"][df["deploymentNumber"] == c] - 2*df["CO2_std"][df["deploymentNumber"] == c]
    ymax = df["CO2_avg"][df["deploymentNumber"] == c] + 2*df["CO2_std"][df["deploymentNumber"] == c]
    ax.vlines(x, ymin, ymax, colors = plt.cm.tab10(i))

ax.set_ylim((250, 455))
ax.set_xlim((250, 455))
# Add in a 1:1 line
x = np.arange(250, 455, 5)
y = np.arange(250, 455, 5)
ax.plot(x, y, color = "black", alpha=0.7)
ax.set_title(refdes + " pCO$_{2}$ Comparison", fontsize=16)

### All PCO2Ws

In [None]:
def calculate_ylims(ds, param):
    yavg = ds[param].mean(skipna=True).values
    ystd = ds[param].std(skipna=True).values
    ymed = np.nanmedian(ds[param])
    # Need to check for way out-of-bounds values by comparison with median
    if ystd > ymed:
        yavg = ymed
        ystd = ymed*0.2
    ymin = yavg - 4*ystd
    ymax = yavg + 4*ystd
    return yavg, ystd, ymin, ymax
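
`calculate_ylims` centers the y-axis on the mean and spans ±4 standard deviations. When a few wild outliers inflate the standard deviation beyond the median itself, it instead falls back to the median, with a spread of 20 % of the median. The NumPy version below mirrors that logic for plain arrays so the fallback can be checked in isolation. `calculate_ylims_np` is a hypothetical name, and `np.nanstd` uses `ddof=0` like xarray's `std`.

```python
import numpy as np


def calculate_ylims_np(values, n_std=4, fallback_frac=0.2):
    """NumPy mirror of calculate_ylims: mean +/- n_std * std, with a median fallback."""
    values = np.asarray(values, dtype=float)
    yavg = np.nanmean(values)
    ystd = np.nanstd(values)  # ddof=0, like xarray's std
    ymed = np.nanmedian(values)
    # A std larger than the median signals wild outliers: center on the median instead
    if ystd > ymed:
        yavg, ystd = ymed, ymed * fallback_frac
    return yavg, ystd, yavg - n_std * ystd, yavg + n_std * ystd


clean = np.array([390.0, 400.0, 410.0])
with_spike = np.concatenate([np.full(99, 400.0), [1.0e7]])  # one absurd value
print(calculate_ylims_np(clean))
print(calculate_ylims_np(with_spike))
```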

In [None]:
calculate_ylims(pco2w_40m, "pco2_seawater")

In [None]:
fig, ax = plt.subplots(nrows=3, ncols=1, figsize=(15, 18), sharex=True)

param = "pco2_seawater"

# Plot the 40 m PCO2W
# -------------------
# Calculate the figure bounds
yavg, ystd, ymin, ymax = calculate_ylims(pco2w_40m, param)

# Generate the plot figure
s = pco2w_40m.plot.scatter("time", param, ax=ax[0], hue="deployment", hue_style="discrete")

# Add the deployments
ax[0].legend(edgecolor="black", loc="center left", bbox_to_anchor=(1, 0.5), fontsize=14, title="Deployments")
deployments = np.unique(pco2w_40m["deployment"])
for depNum in deployments:
    dt = pco2w_40m.where(pco2w_40m["deployment"] == depNum, drop=True)["time"].min()
    ax[0].vlines(dt.values, yavg-4*ystd, yavg+4*ystd)
    ax[0].text(dt.values, yavg-3*ystd, str(int(depNum)), fontsize=14, weight="bold")
xmin, xmax = ax[0].get_xlim()

# Add the bottle samples
b, = ax[0].plot(pco2w_40m_bottles.dropna(subset=["Calculated pCO2 [uatm]"])["Start Time [UTC]"], 
             pco2w_40m_bottles.dropna(subset=["Calculated pCO2 [uatm]"])["Calculated pCO2 [uatm]"], 
             marker="o", color="yellow", linestyle="", markersize=12, markeredgecolor = "black",
             label="Bottle\npCO$_{2}$")
#ax[0].legend(handles=[b], edgecolor="black", loc="lower left", bbox_to_anchor=(1, 0.2), fontsize=12)


# Set the ylims
ax[0].set_xlim(xmin, xmax)
ax[0].set_ylim(ymin, ymax)
ax[0].grid()
ax[0].set_title("PCO2W: 40 m", fontsize=20, weight="bold")
ax[0].set_ylabel("pCO$_{2}$ Seawater\n[uatm]", fontsize=20, weight="bold")
# Set the yticklabels
for tick in ax[0].yaxis.get_major_ticks():
    tick.label1.set_fontsize(12)
    tick.label1.set_fontweight("bold")

# Plot the 80 m PCO2W
# -------------------
# Calculate the figure bounds
yavg, ystd, ymin, ymax = calculate_ylims(pco2w_80m, param)

# Generate the plot figure
s = pco2w_80m.plot.scatter("time", param, ax=ax[1], hue="deployment", hue_style="discrete")

# Add the deployments
ax[1].legend(edgecolor="black", loc="center left", bbox_to_anchor=(1, 0.5), fontsize=14, title="Deployments")
deployments = np.unique(pco2w_80m["deployment"])
for depNum in deployments:
    dt = pco2w_80m.where(pco2w_80m["deployment"] == depNum, drop=True)["time"].min()
    ax[1].vlines(dt.values, yavg-4*ystd, yavg+4*ystd)
    ax[1].text(dt.values, yavg-3*ystd, str(int(depNum)), fontsize=14, weight="bold")
xmin, xmax = ax[1].get_xlim()

# Add the bottle samples
b, = ax[1].plot(pco2w_80m_bottles.dropna(subset=["Calculated pCO2 [uatm]"])["Start Time [UTC]"], 
             pco2w_80m_bottles.dropna(subset=["Calculated pCO2 [uatm]"])["Calculated pCO2 [uatm]"], 
             marker="o", color="yellow", linestyle="", markersize=12, markeredgecolor = "black",
             label="Bottle\npCO$_{2}$")
#ax[1].legend(handles=[b], edgecolor="black", loc="lower left", bbox_to_anchor=(1, 0.2), fontsize=12)


# Set the ylims
ax[1].set_xlim(xmin, xmax)
ax[1].set_ylim(ymin, ymax)
ax[1].grid()
ax[1].set_title("PCO2W: 80 m", fontsize=20, weight="bold")
ax[1].set_ylabel("pCO$_{2}$ Seawater\n[uatm]", fontsize=20, weight="bold")

for tick in ax[1].yaxis.get_major_ticks():
    tick.label1.set_fontsize(12)
    tick.label1.set_fontweight("bold")
    
# Plot the 130 m PCO2W
# -------------------
# Calculate the figure bounds
yavg, ystd, ymin, ymax = calculate_ylims(pco2w_130m, param)

# Generate the plot figure
s = pco2w_130m.plot.scatter("time", param, ax=ax[2], hue="deployment", hue_style="discrete")

# Add the deployments
ax[2].legend(edgecolor="black", loc="center left", bbox_to_anchor=(1, 0.5), fontsize=14, title="Deployments")
deployments = np.unique(pco2w_130m["deployment"])
for depNum in deployments:
    dt = pco2w_130m.where(pco2w_130m["deployment"] == depNum, drop=True)["time"].min()
    ax[2].vlines(dt.values, yavg-4*ystd, yavg+4*ystd)
    ax[2].text(dt.values, yavg-3*ystd, str(int(depNum)), fontsize=14, weight="bold")
xmin, xmax = ax[2].get_xlim()

# Add the bottle samples
b, = ax[2].plot(pco2w_130m_bottles.dropna(subset=["Calculated pCO2 [uatm]"])["Start Time [UTC]"], 
             pco2w_130m_bottles.dropna(subset=["Calculated pCO2 [uatm]"])["Calculated pCO2 [uatm]"], 
             marker="o", color="yellow", linestyle="", markersize=12, markeredgecolor = "black",
             label="Bottle\npCO$_{2}$")
#ax[2].legend(handles=[b], edgecolor="black", loc="lower left", bbox_to_anchor=(1, 0.2), fontsize=12)

ax[2].set_xlim(xmin, xmax)
# Set the ylims
ax[2].set_ylim(ymin, ymax)
ax[2].grid()
ax[2].set_title("PCO2W: 130 m", fontsize=20, weight="bold")
ax[2].set_ylabel("pCO$_{2}$ Seawater\n[uatm]", fontsize=20, weight="bold")

for tick in ax[2].yaxis.get_major_ticks():
    tick.label1.set_fontsize(12)
    tick.label1.set_fontweight("bold")
    
for tick in ax[2].xaxis.get_major_ticks():
    tick.label1.set_fontsize(12)
    tick.label1.set_fontweight("bold")


fig.autofmt_xdate()
plt.xlabel("")


In [None]:
fig.savefig("../results/all_pco2w_timeseries_with_bottle_data.png", facecolor="white", transparent=False)

In [None]:
fig, ax = plt.subplots(nrows=3, ncols=1, figsize=(15, 18), sharex=True)

param = "pco2_seawater"

# Plot the 40 m PCO2W
# -------------------
# Calculate the figure bounds
yavg, ystd, ymin, ymax = calculate_ylims(pco2w_40m, param)

# Generate the plot figure
s = pco2w_40m.plot.scatter("time", param, ax=ax[0], hue="deployment", hue_style="discrete")

# Add the deployments
ax[0].legend(edgecolor="black", loc="center left", bbox_to_anchor=(1, 0.5), fontsize=14, title="Deployments")
deployments = np.unique(pco2w_40m["deployment"])
for depNum in deployments:
    dt = pco2w_40m.where(pco2w_40m["deployment"] == depNum, drop=True)["time"].min()
    ax[0].vlines(dt.values, yavg-4*ystd, yavg+4*ystd)
    ax[0].text(dt.values, yavg-3*ystd, str(int(depNum)), fontsize=14, weight="bold")

# Set the ylims
ax[0].set_ylim(ymin, ymax)
ax[0].grid()
ax[0].set_title("PCO2W: 40 m", fontsize=16, weight="bold")
ax[0].set_ylabel("pCO$_{2}$ Seawater\n[uatm]", fontsize=16, weight="bold")
#plt.yticks(fontsize=12, weight="bold")


# Plot the 80 m PCO2W
# -------------------
# Calculate the figure bounds
yavg, ystd, ymin, ymax = calculate_ylims(pco2w_80m, param)

# Generate the plot figure
s = pco2w_80m.plot.scatter("time", param, ax=ax[1], hue="deployment", hue_style="discrete")

# Add the deployments
ax[1].legend(edgecolor="black", loc="center left", bbox_to_anchor=(1, 0.5), fontsize=14, title="Deployments")
deployments = np.unique(pco2w_80m["deployment"])
for depNum in deployments:
    dt = pco2w_80m.where(pco2w_80m["deployment"] == depNum, drop=True)["time"].min()
    ax[1].vlines(dt.values, yavg-4*ystd, yavg+4*ystd)
    ax[1].text(dt.values, yavg-3*ystd, str(int(depNum)), fontsize=14, weight="bold")

# Set the ylims
ax[1].set_ylim(ymin, ymax)
ax[1].grid()
ax[1].set_title("PCO2W: 80 m", fontsize=16, weight="bold")
ax[1].set_ylabel("pCO$_{2}$ Seawater\n[uatm]", fontsize=16, weight="bold")
#plt.yticks(fontsize=12, weight="bold")

# Plot the 130 m PCO2W
# -------------------
# Calculate the figure bounds
yavg, ystd, ymin, ymax = calculate_ylims(pco2w_130m, param)

# Generate the plot figure
s = pco2w_130m.plot.scatter("time", param, ax=ax[2], hue="deployment", hue_style="discrete")

# Add the deployments
ax[2].legend(edgecolor="black", loc="center left", bbox_to_anchor=(1, 0.5), fontsize=14, title="Deployments")
deployments = np.unique(pco2w_130m["deployment"])
for depNum in deployments:
    dt = pco2w_130m.where(pco2w_130m["deployment"] == depNum, drop=True)["time"].min()
    ax[2].vlines(dt.values, yavg-4*ystd, yavg+4*ystd)
    ax[2].text(dt.values, yavg-3*ystd, str(int(depNum)), fontsize=14, weight="bold")

# Set the ylims
ax[2].set_ylim(ymin, ymax)
ax[2].grid()
ax[2].set_title("PCO2W: 130 m", fontsize=16, weight="bold")
ax[2].set_ylabel("pCO$_{2}$ Seawater\n[uatm]", fontsize=16, weight="bold")
ax[2].xaxis.get_label().set_fontsize(16)
plt.xticks(fontsize=12, weight="bold")
#plt.yticks(fontsize=12, weight="bold")

fig.autofmt_xdate()

In [None]:
fig.savefig("../results/all_pco2w_timeseries.png", facecolor="white", transparent=False)

In [None]:
pco2w_40m.attrs

In [None]:
pco2w_40m_comparison["instrument"] = "40m"
pco2w_80m_comparison["instrument"] = "80m"
pco2w_130m_comparison["instrument"] = "130m"

# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
pco2w_comparison = pd.concat([pco2w_40m_comparison, pco2w_80m_comparison, pco2w_130m_comparison])
pco2w_comparison.to_excel("../results/PCO2W_comparison.xlsx", index=False)
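
The three comparison tables share the same columns, so `pd.concat` can also build the `instrument` label from keys instead of assigning the column by hand. A sketch with small made-up tables:

```python
import pandas as pd

# Made-up stand-ins for the per-instrument comparison tables
tables = {
    "40m": pd.DataFrame({"deploymentNumber": [7], "CO2_avg": [410.0], "bottleCO2": [405.0]}),
    "80m": pd.DataFrame({"deploymentNumber": [7, 8], "CO2_avg": [420.0, 415.0],
                         "bottleCO2": [418.0, 411.0]}),
}

# The keys become an outer index level named "instrument", which is then moved into a column
combined = (
    pd.concat(list(tables.values()), keys=list(tables.keys()), names=["instrument", "row"])
    .reset_index(level="instrument")
    .reset_index(drop=True)
)
print(combined)
```

Passing `keys` explicitly keeps the instrument order the same as the list of tables.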

Plot the instrument-versus-bottle comparison for all three PCO2Ws, dropping rows without a matched deployment (NaN).

In [None]:
import seaborn as sns
import matplotlib.patches

In [None]:
df = pco2w_comparison.dropna(subset=["deploymentNumber"]).copy()
df["bottleCO2"] = df["bottleCO2"].astype(float)
df["CO2_avg"] = df["CO2_avg"].astype(float)
df["CO2_std"] = df["CO2_std"].astype(float)
#df = df[df["deploymentNumber"] != 3]

fig, ax = plt.subplots(figsize=(8,8))

# Assign one tab10 color per instrument and build matching legend handles
levels, categories = pd.factorize(df['instrument'])
colors = [plt.cm.tab10(i) for i in levels] # using the "tab10" colormap
handles = [matplotlib.patches.Patch(color=plt.cm.tab10(i), label=c) for i, c in enumerate(categories)]
for h in handles:
    h.set_edgecolor("black")

# Plot the data
ax.scatter(df["bottleCO2"], df["CO2_avg"], c=colors, s=80, edgecolors="black")
ax.set_xlabel(xlabel="Bottle pCO$_{2}$", fontsize=20, weight="bold")
ax.set_ylabel(ylabel="PCO2W pCO$_{2}$", fontsize=20, weight="bold")
ax.legend(handles=handles, title='Instrument', fontsize=12, edgecolor="black")
ax.grid()

first_legend = ax.get_legend()
ax.add_artist(first_legend)

# Add in vertical error bars
for i, c in enumerate(categories):
    x = df["bottleCO2"][df["instrument"] == c]
    ymin = df["CO2_avg"][df["instrument"] == c] - 2*df["CO2_std"][df["instrument"] == c]
    ymax = df["CO2_avg"][df["instrument"] == c] + 2*df["CO2_std"][df["instrument"] == c]
    ax.vlines(x, ymin, ymax, colors = plt.cm.tab10(i))

ax.set_ylim((250, 500))
ax.set_xlim((250, 500))
# Add in a 1:1 line
x = np.arange(200, 505, 5)
y = np.arange(200, 505, 5)
s, = ax.plot(x, y, color = "black", alpha=0.7, label="1:1 Line")
ax.legend(handles=[s], edgecolor="black", loc="lower right", fontsize=12)

for tick in ax.xaxis.get_major_ticks():
    tick.label1.set_fontsize(12)
    tick.label1.set_fontweight("bold")
    
for tick in ax.yaxis.get_major_ticks():
    tick.label1.set_fontsize(12)
    tick.label1.set_fontweight("bold")
    
#ax.legend(handles=[s], edgecolor="black", loc="lower right", fontsize=12)

In [None]:
fig.savefig("../results/pco2w_comparison.png", facecolor="white", transparent=False)

### PHSEN 20 m

In [None]:
from OS2022.OS2022 import process_phsen

In [None]:
refdes = "GI01SUMO-RII11-02-PHSENE041"

phsen_20m = xr.open_dataset(f"../data/{refdes}_combined.nc", chunks="auto")

annotations = OOINet.get_annotations(refdes)

phsen_20m = OOINet.add_annotation_qc_flag(phsen_20m, annotations)

#phsen_20m = phsen_20m.where(phsen_20m.rollup_annotations_qc_results != 9, drop=True)

fig, ax = plotting.plot_variable(phsen_20m, "seawater_ph")
ax.set_title(refdes, fontsize=16)

In [None]:
if not os.path.exists(f"../results/{refdes}"):
    os.makedirs(f"../results/{refdes}")
    
fig.savefig(f"../results/{refdes}/time_series.png")

#### CTD Data

In [None]:
ctdmo_20m = xr.open_dataset("../data/GI01SUMO-RII11-02-CTDMOQ011_combined.nc", chunks="auto")

seawater_temperature = ctdmo_20m["ctdmo_seawater_temperature"].interp_like(phsen_20m)
seawater_pressure = ctdmo_20m["ctdmo_seawater_pressure"].interp_like(phsen_20m)

# Add to the PHSEN dataset
phsen_20m["seawater_temperature"] = seawater_temperature
phsen_20m["seawater_pressure"] = seawater_pressure

#### Quality Checks
Next, add the quality checks to the data.

In [None]:
qc_flags = process_phsen.quality_checks(phsen_20m)
phsen_20m["qc_flags"] = qc_flags

In [None]:
fig, ax = plotting.plot_variable(phsen_20m, "seawater_ph")

# Plot figure which highlights data points with a corresponding annotation
ax.set_alpha(0.7)
first_legend = ax.get_legend()
first_legend.set_bbox_to_anchor((1, 0.7))
ax.add_artist(first_legend)

s1, = ax.plot(phsen_20m.time.where(phsen_20m.qc_flags == 3), 
             phsen_20m.seawater_ph.where(phsen_20m.qc_flags == 3), 
             marker="s", linestyle="", color="yellow", label="Suspect")
s2, = ax.plot(phsen_20m.time.where(phsen_20m.qc_flags == 4), 
             phsen_20m.seawater_ph.where(phsen_20m.qc_flags == 4), 
             marker="s", linestyle="", color="tab:red", label="Bad")

ax.legend(handles=[s1, s2], edgecolor="black", loc="lower left", bbox_to_anchor=(1, 0.2), fontsize=12)
ax.set_title(refdes, fontsize=16)

In [None]:
if not os.path.exists(f"../results/{refdes}"):
    os.makedirs(f"../results/{refdes}")
    
fig.savefig(f"../results/{refdes}/time_series_with_quality_flags.png")

In [None]:
# Drop the bad data
phsen_20m = phsen_20m.where(phsen_20m.qc_flags != 4, drop=True)
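
`where(qc_flags != 4, drop=True)` removes only the points whose flag equals 4. Suspect points (flag 3) are kept. Points with a missing (NaN) flag are also kept, because `NaN != 4` evaluates to `True`. The NumPy equivalent with made-up values makes this explicit:

```python
import numpy as np

# Made-up pH values and QC flags; NaN marks a point with no flag
flags = np.array([1.0, 3.0, 4.0, np.nan, 1.0])
ph = np.array([8.05, 8.07, 7.20, 8.06, 8.08])

keep = flags != 4  # NaN != 4 is True, so the unflagged point survives
ph_kept = ph[keep]
print(keep.tolist(), ph_kept)
```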

In [None]:
deployments = OOINet.get_deployments(refdes)

lat, lon, depth = deployments["latitude"].mean(), deployments["longitude"].mean(), deployments["depth"].mean()

lat, lon, depth
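
The mean deployment position and depth are then passed to `findSamples` from the project's `utils` module, whose internals are not shown here. As a purely hypothetical illustration of that kind of search, the sketch below keeps bottles within a great-circle radius of the site and within a depth tolerance of the instrument. Two details are assumptions, not taken from `findSamples`: the column names `lat`, `lon` and `depth`, and reading the arguments `5` and `20` as a radius in km and a depth tolerance in m.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def find_nearby_samples(bottles, site, depth, radius_km, depth_tol_m):
    """Hypothetical filter: bottles within radius_km of site and depth_tol_m of depth."""
    lat0, lon0 = site
    dist = haversine_km(bottles["lat"], bottles["lon"], lat0, lon0)
    near_depth = (bottles["depth"] - depth).abs() <= depth_tol_m
    return bottles.loc[(dist <= radius_km) & near_depth]


# Made-up casts: the second is too deep, the third is about 111 km away
bottles = pd.DataFrame({"lat": [60.0, 60.0, 61.0],
                        "lon": [-39.0, -39.0, -39.0],
                        "depth": [40.0, 200.0, 40.0]})
nearby = find_nearby_samples(bottles, (60.0, -39.0), 45.0, radius_km=5, depth_tol_m=20)
print(nearby)
```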

In [None]:
phsen_20m_bottles = findSamples(carbonData, (lat, lon), depth, 5, 20)
phsen_20m_bottles

In [None]:
fig, ax = plotting.plot_variable(phsen_20m, "seawater_ph")
ax.set_alpha(0.7)
first_legend = ax.get_legend()
first_legend.set_bbox_to_anchor((1, 0.7))
ax.add_artist(first_legend)
xmin, xmax = ax.get_xlim()

s, = ax.plot(phsen_20m_bottles.dropna(subset=["Calculated pH"])["Start Time [UTC]"], 
             phsen_20m_bottles.dropna(subset=["Calculated pH"])["Calculated pH"], 
             marker="o", color="yellow", linestyle="", markersize=12, markeredgecolor = "black",
             label="Bottle\npH")

ax.legend(handles=[s], edgecolor="black", loc="lower left", bbox_to_anchor=(1, 0.2), fontsize=12)
ax.set_xlim(xmin, xmax)
ax.set_title(refdes, fontsize=16)

In [None]:
if not os.path.exists(f"../results/{refdes}/"):
    os.makedirs(f"../results/{refdes}/")
    
fig.savefig(f"../results/{refdes}/time_series_with_discrete_bottles.png")

In [None]:
phsen_20m_comparison = pd.DataFrame(columns=["deploymentNumber", "time", "PH_avg", "PH_std", "T", "S", "P",
                                            "bottlePH", "bottleT", "bottleS", "bottleP"])

for ind in phsen_20m_bottles.index:
    # Get the time stamps of a bottle
    t = phsen_20m_bottles["Start Time [UTC]"].loc[ind]
    t = t.tz_localize(None)

    # Get the bottle pH, the mean of the two CTD T and S sensors, and pressure
    bottlePH = phsen_20m_bottles["Calculated pH"].loc[ind]
    bottleT = phsen_20m_bottles[["CTD Temperature 1 [deg C]", "CTD Temperature 2 [deg C]"]].loc[ind].mean(skipna=True)
    bottleS = phsen_20m_bottles[["CTD Salinity 1 [psu]", "CTD Salinity 2 [psu]"]].loc[ind].mean(skipna=True)
    bottleP = phsen_20m_bottles["CTD Pressure [db]"].loc[ind]

    # Create a time window to look for data
    dt = pd.Timedelta(days=7)
    tmin = t - dt
    tmax = t + dt

    # Find the data within the time series
    subset = phsen_20m.where((phsen_20m.time >= tmin) & (phsen_20m.time <= tmax), drop=True)
    # Check for unique deployments
    deps = np.unique(subset.deployment)
    # If more than one deployment, split it based on deployments
    if len(deps) > 1:
        for depNum in deps:
            dep_subset = subset.where(subset.deployment == depNum, drop=True)
            pco2_avg = dep_subset.seawater_ph.mean(skipna=True).compute().values
            pco2_std = dep_subset.seawater_ph.std(skipna=True).compute().values
            pco2_temp = dep_subset.seawater_temperature.mean().compute().values
            pco2_sal = dep_subset.practical_salinity.mean().compute().values
            pco2_pres = dep_subset.seawater_pressure.mean().compute().values
            phsen_20m_comparison = phsen_20m_comparison.append({
                    "deploymentNumber": depNum,
                    "time": t,
                    "PH_avg": pco2_avg,
                    "PH_std": pco2_std,
                    "T": pco2_temp,
                    "S": pco2_sal,
                    "P": pco2_pres,
                    "bottlePH": bottlePH,
                    "bottleT": bottleT,
                    "bottleS": bottleS,
                    "bottleP": bottleP,
                }, ignore_index=True)
    elif len(deps) == 1:
        pco2_avg = subset.seawater_ph.mean(skipna=True).compute().values
        pco2_std = subset.seawater_ph.std(skipna=True).compute().values
        pco2_temp = subset.seawater_temperature.mean().compute().values
        pco2_sal = subset.practical_salinity.mean().compute().values
        pco2_pres = subset.seawater_pressure.mean().compute().values
        depNum = deps[0]
        phsen_20m_comparison = phsen_20m_comparison.append({
                "deploymentNumber": depNum,
                "time": t,
                "PH_avg": pco2_avg,
                "PH_std": pco2_std,
                "T": pco2_temp,
                "S": pco2_sal,
                "P": pco2_pres,
                "bottlePH": bottlePH,
                "bottleT": bottleT,
                "bottleS": bottleS,
                "bottleP": bottleP,
            }, ignore_index=True)
    else:
        # Now put in NaNs for the missing results data
        phsen_20m_comparison = phsen_20m_comparison.append({
                "deploymentNumber": None,
                "time": t,
                "PH_avg": None,
                "PH_std": None,
                "T": None,
                "S": None,
                "P": None,
                "bottlePH": bottlePH,
                "bottleT": bottleT,
                "bottleS": bottleS,
                "bottleP": bottleP,
            }, ignore_index=True)

# Display the results
phsen_20m_comparison

In [None]:
import matplotlib

In [None]:
df = phsen_20m_comparison.dropna(subset=["deploymentNumber"]).copy()
df["bottlePH"] = df["bottlePH"].astype(float)
df["PH_avg"] = df["PH_avg"].astype(float)
df["PH_std"] = df["PH_std"].astype(float)
#df = df[df["deploymentNumber"] != 3]

fig, ax = plt.subplots(figsize=(8,8))

# Assign one tab10 color per deployment and build matching legend handles
levels, categories = pd.factorize(df['deploymentNumber'])
colors = [plt.cm.tab10(i) for i in levels] # using the "tab10" colormap
handles = [matplotlib.patches.Patch(color=plt.cm.tab10(i), label=c) for i, c in enumerate(categories)]
for h in handles:
    h.set_edgecolor("black")

# Plot the data
ax.scatter(df["bottlePH"], df["PH_avg"], c=colors, s=80, edgecolors="black")
ax.set_xlabel(xlabel="Bottle pH", fontsize=14)
ax.set_ylabel(ylabel="PHSEN pH", fontsize=14)
ax.legend(handles=handles, title='Deployment', fontsize=12, edgecolor="black")
ax.grid()

# Add in vertical error bars
for i, c in enumerate(categories):
    x = df["bottlePH"][df["deploymentNumber"] == c]
    ymin = df["PH_avg"][df["deploymentNumber"] == c] - 2*df["PH_std"][df["deploymentNumber"] == c]
    ymax = df["PH_avg"][df["deploymentNumber"] == c] + 2*df["PH_std"][df["deploymentNumber"] == c]
    ax.vlines(x, ymin, ymax, colors = plt.cm.tab10(i))

ax.set_ylim((8.05, 8.15))
ax.set_xlim((8.05, 8.15))
# Add in a 1:1 line
x = np.arange(8, 8.205, 0.05)
y = np.arange(8, 8.205, 0.05)
ax.plot(x, y, color = "black", alpha=0.7)

In [None]:
fig.savefig(f"../results/{refdes}/instrument_vs_bottle_comparison.png")

### PHSEN SUMO 100m

In [None]:
refdes = "GI01SUMO-RII11-02-PHSENE042"

phsen_100m = xr.open_dataset(f"../data/{refdes}_combined.nc", chunks="auto")

annotations = OOINet.get_annotations(refdes)

phsen_100m = OOINet.add_annotation_qc_flag(phsen_100m, annotations)

#phsen_100m = phsen_100m.where(phsen_100m.rollup_annotations_qc_results != 9, drop=True)

fig, ax = plotting.plot_variable(phsen_100m, "seawater_ph")
ax.set_title(refdes, fontsize=16)

In [None]:
if not os.path.exists(f"../results/{refdes}"):
    os.makedirs(f"../results/{refdes}")
    
fig.savefig(f"../results/{refdes}/time_series.png")

#### CTD Data

In [None]:
ctdmo_100m = xr.open_dataset("../data/GI01SUMO-RII11-02-CTDMOQ013_combined.nc", chunks="auto")

seawater_temperature = ctdmo_100m["ctdmo_seawater_temperature"].interp_like(phsen_100m)
seawater_pressure = ctdmo_100m["ctdmo_seawater_pressure"].interp_like(phsen_100m)

# Add to the PHSEN dataset
phsen_100m["seawater_temperature"] = seawater_temperature
phsen_100m["seawater_pressure"] = seawater_pressure

#### Quality Checks
Next, run the quality checks on the data. Points flagged 3 are suspect and points flagged 4 are bad.

In [None]:
qc_flags = process_phsen.quality_checks(phsen_100m)
phsen_100m["qc_flags"] = qc_flags
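
The internals of `process_phsen.quality_checks` are not shown in this notebook. Purely as an illustration, the sketch below assumes QARTOD-style flag values (1 = pass, 3 = suspect, 4 = fail, 9 = missing), which agree with the "Suspect" (3) and "Bad" (4) labels used in the plot that follows, and applies a simple gross-range test. The function name and both pH ranges are hypothetical placeholders, not the thresholds used by `process_phsen`.

```python
import numpy as np

def gross_range_flags(ph, fail_range=(7.0, 9.0), suspect_range=(7.6, 8.4)):
    """Hypothetical QARTOD-style gross range test for seawater pH.

    Returns 1 (pass), 3 (suspect), 4 (fail) or 9 (missing) for each value.
    The ranges are illustrative assumptions, not OOI or PHSEN thresholds.
    """
    ph = np.asarray(ph, dtype=float)
    flags = np.ones(ph.shape, dtype=int)
    # Outside the suspect range -> suspect
    flags[(ph < suspect_range[0]) | (ph > suspect_range[1])] = 3
    # Outside the plausible (fail) range -> bad; overrides suspect
    flags[(ph < fail_range[0]) | (ph > fail_range[1])] = 4
    # Missing values get their own flag
    flags[np.isnan(ph)] = 9
    return flags

print(gross_range_flags([8.05, 7.5, 9.3, np.nan]))  # -> [1 3 4 9]
```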

In [None]:
fig, ax = plotting.plot_variable(phsen_100m, "seawater_ph")

# Highlight data points flagged as suspect (3) or bad (4) by the quality checks
ax.set_alpha(0.7)
first_legend = ax.get_legend()
first_legend.set_bbox_to_anchor((1, 0.7))
ax.add_artist(first_legend)

s1, = ax.plot(phsen_100m.time.where(phsen_100m.qc_flags == 3), 
             phsen_100m.seawater_ph.where(phsen_100m.qc_flags == 3), 
             marker="s", linestyle="", color="yellow", label="Suspect")
s2, = ax.plot(phsen_100m.time.where(phsen_100m.qc_flags == 4), 
             phsen_100m.seawater_ph.where(phsen_100m.qc_flags == 4), 
             marker="s", linestyle="", color="tab:red", label="Bad")

ax.legend(handles=[s1, s2], edgecolor="black", loc="lower left", bbox_to_anchor=(1, 0.2), fontsize=12)
ax.set_title(refdes, fontsize=16)

In [None]:
if not os.path.exists(f"../results/{refdes}"):
    os.makedirs(f"../results/{refdes}")
    
fig.savefig(f"../results/{refdes}/time_series_with_quality_flags.png")

In [None]:
# Drop the bad data
phsen_100m = phsen_100m.where(phsen_100m.qc_flags != 4, drop=True)

In [None]:
deployments = OOINet.get_deployments(refdes)

lat, lon, depth = deployments["latitude"].mean(), deployments["longitude"].mean(), deployments["depth"].mean()

In [None]:
phsen_100m_bottles = findSamples(carbonData, (lat, lon), depth, 5, 20)
phsen_100m_bottles

In [None]:
fig, ax = plotting.plot_variable(phsen_100m, "seawater_ph")
ax.set_alpha(0.7)
first_legend = ax.get_legend()
first_legend.set_bbox_to_anchor((1, 0.7))
ax.add_artist(first_legend)
xmin, xmax = ax.get_xlim()

s, = ax.plot(phsen_100m_bottles.dropna(subset=["Calculated pH"])["Start Time [UTC]"], 
             phsen_100m_bottles.dropna(subset=["Calculated pH"])["Calculated pH"], 
             marker="o", color="yellow", linestyle="", markersize=12, markeredgecolor = "black",
             label="Bottle pH")

ax.legend(handles=[s], edgecolor="black", loc="lower left", bbox_to_anchor=(1, 0.2), fontsize=12)
ax.set_xlim(xmin, xmax)
ax.set_title(refdes, fontsize=16)

In [None]:
if not os.path.exists(f"../results/{refdes}"):
    os.makedirs(f"../results/{refdes}")
    
fig.savefig(f"../results/{refdes}/time_series_with_discrete_bottles.png")

In [None]:
phsen_100m_records = []

for ind in phsen_100m_bottles.index:
    # Get the time stamp of the bottle (drop the timezone to match the instrument data)
    t = phsen_100m_bottles["Start Time [UTC]"].loc[ind].tz_localize(None)

    # Get the bottle pH and the mean of the two CTD sensors at the bottle trip
    bottle = {
        "time": t,
        "bottlePH": phsen_100m_bottles["Calculated pH"].loc[ind],
        "bottleT": phsen_100m_bottles[["CTD Temperature 1 [deg C]", "CTD Temperature 2 [deg C]"]].loc[ind].mean(skipna=True),
        "bottleS": phsen_100m_bottles[["CTD Salinity 1 [psu]", "CTD Salinity 2 [psu]"]].loc[ind].mean(skipna=True),
        "bottleP": phsen_100m_bottles["CTD Pressure [db]"].loc[ind],
    }

    # Find the instrument data within +/- 7 days of the bottle sample
    dt = pd.Timedelta(days=7)
    subset = phsen_100m.where((phsen_100m.time >= t - dt) & (phsen_100m.time <= t + dt), drop=True)

    # Summarize each deployment in the window separately (e.g., at a mooring turnover)
    deps = np.unique(subset.deployment)
    for depNum in deps:
        dep_subset = subset.where(subset.deployment == depNum, drop=True)
        phsen_100m_records.append({
            "deploymentNumber": depNum,
            "PH_avg": float(dep_subset.seawater_ph.mean(skipna=True).compute()),
            "PH_std": float(dep_subset.seawater_ph.std(skipna=True).compute()),
            "T": float(dep_subset.seawater_temperature.mean().compute()),
            "S": float(dep_subset.practical_salinity.mean().compute()),
            "P": float(dep_subset.seawater_pressure.mean().compute()),
            **bottle,
        })
    if len(deps) == 0:
        # No instrument data near this bottle: keep the bottle with NaN instrument values
        phsen_100m_records.append({"deploymentNumber": np.nan, "PH_avg": np.nan, "PH_std": np.nan,
                                   "T": np.nan, "S": np.nan, "P": np.nan, **bottle})

# Build the comparison table (DataFrame.append was removed in pandas 2.0)
phsen_100m_comparison = pd.DataFrame(phsen_100m_records,
                                     columns=["deploymentNumber", "time", "PH_avg", "PH_std", "T", "S", "P",
                                              "bottlePH", "bottleT", "bottleS", "bottleP"])
phsen_100m_comparison

In [None]:
df = phsen_100m_comparison.dropna(subset=["deploymentNumber"]).copy()
df["bottlePH"] = df["bottlePH"].astype(float)
df["PH_avg"] = df["PH_avg"].astype(float)
df["PH_std"] = df["PH_std"].astype(float)
#df = df[df["deploymentNumber"] != 3]

fig, ax = plt.subplots(figsize=(8,8))

# Assign each deployment a color from the tab10 colormap
levels, categories = pd.factorize(df['deploymentNumber'])
colors = [plt.cm.tab10(i) for i in levels] # using the "tab10" colormap
handles = [matplotlib.patches.Patch(color=plt.cm.tab10(i), label=c) for i, c in enumerate(categories)]
for h in handles:
    h.set_edgecolor("black")

# Plot the data
ax.scatter(df["bottlePH"], df["PH_avg"], c=colors, s=80, edgecolors="black")
ax.set_xlabel(xlabel="Bottle pH", fontsize=14)
ax.set_ylabel(ylabel="PHSEN pH", fontsize=14)
ax.legend(handles=handles, title='Deployment', fontsize=12, edgecolor="black")
ax.grid()

# Add in vertical error bars
for i, c in enumerate(categories):
    x = df["bottlePH"][df["deploymentNumber"] == c]
    ymin = df["PH_avg"][df["deploymentNumber"] == c] - 2*df["PH_std"][df["deploymentNumber"] == c]
    ymax = df["PH_avg"][df["deploymentNumber"] == c] + 2*df["PH_std"][df["deploymentNumber"] == c]
    ax.vlines(x, ymin, ymax, colors = plt.cm.tab10(i))

ax.set_ylim((8, 8.10))
ax.set_xlim((8, 8.10))
# Add in a 1:1 line
x = np.arange(8, 8.2, 0.05)
y = np.arange(8, 8.2, 0.05)
ax.plot(x, y, color = "black", alpha=0.7)

In [None]:
fig.savefig(f"../results/{refdes}/instrument_vs_bottle_comparison.png")

### FLMA PHSEN

In [None]:
import process_pco2w, process_phsen

In [None]:
refdes = "GI03FLMA-RIS01-04-PHSENF000"

phsen_flma = xr.open_dataset("../data/GI03FLMA-RIS01-04-PHSENF000_combined.nc", chunks="auto")
#phsen_flma = process_phsen.phsen_instrument(phsen_flma)

annotations = OOINet.get_annotations(refdes)

phsen_flma = OOINet.add_annotation_qc_flag(phsen_flma, annotations)

#phsen_flma = phsen_flma.where(phsen_flma.rollup_annotations_qc_results != 9, drop=True)

fig, ax = plotting.plot_variable(phsen_flma, "seawater_ph")
ax.set_title(refdes, fontsize=16)

In [None]:
if not os.path.exists(f"../results/{refdes}"):
    os.makedirs(f"../results/{refdes}")
    
fig.savefig(f"../results/{refdes}/time_series.png")

#### CTD Data
Add the seawater temperature and pressure from the co-located CTD, interpolated onto the PHSEN timestamps (salinity is already included in the PHSEN data stream).

In [None]:
ctdmo_flma = xr.open_dataset("../data/GI03FLMA-RIM01-02-CTDMOG040_combined.nc", chunks="auto")

seawater_temperature = ctdmo_flma["ctdmo_seawater_temperature"].interp_like(phsen_flma)
seawater_pressure = ctdmo_flma["ctdmo_seawater_pressure"].interp_like(phsen_flma)

# Add to the PHSEN dataset
phsen_flma["seawater_temperature"] = seawater_temperature
phsen_flma["seawater_pressure"] = seawater_pressure
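
`interp_like` linearly interpolates the CTD variables onto the PHSEN timestamps (linear is xarray's default method, and points outside the CTD time range come back as NaN). A NumPy-only sketch of the same idea, using made-up sample times and temperatures, converts the `datetime64` times to seconds and calls `np.interp`. Because `np.interp` would otherwise hold the end values, the sketch sets `left` and `right` to NaN to mimic the xarray behavior. The helper name `interp_to_times` is hypothetical.

```python
import numpy as np

# Made-up CTD samples (15-minute spacing) and PHSEN sample times
ctd_time = np.array(["2020-01-01T00:00", "2020-01-01T00:15", "2020-01-01T00:30"], dtype="datetime64[s]")
ctd_temp = np.array([5.0, 5.2, 5.6])
phsen_time = np.array(["2020-01-01T00:05", "2020-01-01T00:30", "2020-01-01T01:00"], dtype="datetime64[s]")

def interp_to_times(target_time, source_time, source_values):
    """Linearly interpolate source_values onto target_time; NaN outside the source range."""
    # Convert datetimes to integer seconds since the epoch so np.interp can use them
    x = target_time.astype("datetime64[s]").astype(np.int64)
    xp = source_time.astype("datetime64[s]").astype(np.int64)
    return np.interp(x, xp, source_values, left=np.nan, right=np.nan)

print(interp_to_times(phsen_time, ctd_time, ctd_temp))
```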

#### Quality Checks
Next, run the quality checks on the data. Points flagged 3 are suspect and points flagged 4 are bad.

In [None]:
qc_flags = process_phsen.quality_checks(phsen_flma)
phsen_flma["qc_flags"] = qc_flags

In [None]:
fig, ax = plotting.plot_variable(phsen_flma, "seawater_ph")

# Highlight data points flagged as suspect (3) or bad (4) by the quality checks
ax.set_alpha(0.7)
first_legend = ax.get_legend()
first_legend.set_bbox_to_anchor((1, 0.7))
ax.add_artist(first_legend)

s1, = ax.plot(phsen_flma.time.where(phsen_flma.qc_flags == 3), 
             phsen_flma.seawater_ph.where(phsen_flma.qc_flags == 3), 
             marker="s", linestyle="", color="yellow", label="Suspect")
s2, = ax.plot(phsen_flma.time.where(phsen_flma.qc_flags == 4), 
             phsen_flma.seawater_ph.where(phsen_flma.qc_flags == 4), 
             marker="s", linestyle="", color="tab:red", label="Bad")

ax.legend(handles=[s1, s2], edgecolor="black", loc="lower left", bbox_to_anchor=(1, 0.2), fontsize=12)
ax.set_title(refdes, fontsize=16)

In [None]:
if not os.path.exists(f"../results/{refdes}"):
    os.makedirs(f"../results/{refdes}")
    
fig.savefig(f"../results/{refdes}/time_series_with_quality_flags.png")

Drop the data flagged as bad (QC flag 4); suspect data are kept.

In [None]:
phsen_flma = phsen_flma.where(phsen_flma.qc_flags != 4, drop=True)
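
With `drop=True`, `where` removes the time steps where the condition is False instead of leaving NaN placeholders, so for a 1-D time series this step is equivalent to boolean indexing. A minimal NumPy sketch with made-up values:

```python
import numpy as np

# Made-up pH values and their QC flags
ph = np.array([8.05, 8.07, 9.50, 8.06])
qc_flags = np.array([1, 3, 4, 1])

# Keep everything that is not flagged bad (4); suspect (3) data are retained
keep = qc_flags != 4
ph_clean = ph[keep]
print(ph_clean)
```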

#### Match Bottle Data

In [None]:
deployments = OOINet.get_deployments(refdes)

lat, lon, depth = deployments["latitude"].mean(), deployments["longitude"].mean(), deployments["depth"].mean()
lat, lon, depth

In [None]:
phsen_flma_bottles = findSamples(carbonData, (lat, lon), depth, 5, 20)
phsen_flma_bottles
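
`findSamples` comes from the `utils` module imported at the top of the notebook, and its source is not reproduced here, so this notebook does not state what the positional arguments `5` and `20` mean. As a hedged illustration only, the sketch below selects bottles within a horizontal radius of the mean deployment position (using the haversine great-circle distance) and within a depth tolerance of the deployment depth. The function names, the `radius_km`/`depth_tol_m` parameters, and the sample coordinates are all hypothetical.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    r_earth = 6371.0  # mean Earth radius [km]
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r_earth * math.asin(math.sqrt(a))

def select_nearby_samples(samples, site, depth, radius_km, depth_tol_m):
    """Hypothetical bottle filter: keep samples near the site position and depth.

    samples: list of dicts with "lat", "lon" and "depth" keys.
    """
    lat0, lon0 = site
    return [
        s for s in samples
        if haversine_km(lat0, lon0, s["lat"], s["lon"]) <= radius_km
        and abs(s["depth"] - depth) <= depth_tol_m
    ]

# Made-up bottle positions around a made-up site
bottles = [
    {"lat": 59.93, "lon": -39.47, "depth": 25.0},   # close in position and depth
    {"lat": 59.93, "lon": -39.47, "depth": 500.0},  # right position, wrong depth
    {"lat": 60.50, "lon": -39.47, "depth": 30.0},   # roughly 63 km north of the site
]
print(select_nearby_samples(bottles, (59.93, -39.47), 30.0, radius_km=5, depth_tol_m=20))
```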

In [None]:
phsen_flma.attrs["id"]

In [None]:
refdes = "GI03FLMA-RIS01-04-PHSENF000"
fig, ax = plotting.plot_variable(phsen_flma, "seawater_ph")
ax.set_alpha(0.7)
first_legend = ax.get_legend()
first_legend.set_bbox_to_anchor((1, 0.7))
ax.add_artist(first_legend)
xmin, xmax = ax.get_xlim()

s, = ax.plot(phsen_flma_bottles.dropna(subset=["Calculated pH"])["Start Time [UTC]"], 
             phsen_flma_bottles.dropna(subset=["Calculated pH"])["Calculated pH"], 
             marker="o", color="yellow", linestyle="", markersize=12, markeredgecolor = "black",
             label="Bottle\npH")

ax.legend(handles=[s], edgecolor="black", loc="lower left", bbox_to_anchor=(1, 0.2), fontsize=12)
ax.set_xlim(xmin, xmax)
ax.set_title(refdes, fontsize=16, weight="bold")
ylabel = ax.yaxis.get_label_text()
ax.set_ylabel(ylabel, fontsize=16, weight="bold")
ax.set_xlabel("")


for tick in ax.xaxis.get_major_ticks():
    tick.label1.set_fontsize(12)
    tick.label1.set_fontweight("bold")
    
for tick in ax.yaxis.get_major_ticks():
    tick.label1.set_fontsize(12)
    tick.label1.set_fontweight("bold")

In [None]:
if not os.path.exists(f"../results/{refdes}"):
    os.makedirs(f"../results/{refdes}")
    
fig.savefig(f"../results/{refdes}/time_series_with_discrete_bottles.png", facecolor="white", transparent=False)

#### Bottle vs. Instrument Comparison
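
Each bottle is paired with the PHSEN data recorded within ±7 days of the bottle time, and those data are summarized separately for each deployment, so a bottle taken during a mooring turnover (when the old and new instruments overlap) yields one row per deployment. A stripped-down, NumPy-only sketch of that matching logic follows; the function name, the times (in days), and the pH values are made up for illustration.

```python
import numpy as np

def window_stats(t_bottle, times, values, deployments, half_width=7.0):
    """Mean and std of values within +/- half_width of t_bottle, split by deployment.

    Returns {deployment: (mean, std)}; an empty dict means no data in the window.
    The std uses ddof=0, the same default as xarray's DataArray.std().
    """
    in_window = np.abs(times - t_bottle) <= half_width
    stats = {}
    for dep in np.unique(deployments[in_window]):
        sel = in_window & (deployments == dep)
        stats[int(dep)] = (float(np.nanmean(values[sel])), float(np.nanstd(values[sel])))
    return stats

# Made-up example: deployment 5 ends and deployment 6 begins around day 10
times = np.array([0.0, 4.0, 8.0, 9.0, 11.0, 12.0, 30.0])
ph = np.array([8.05, 8.06, 8.07, 8.09, 8.10, 8.12, 8.00])
deps = np.array([5, 5, 5, 5, 6, 6, 6])

print(window_stats(10.0, times, ph, deps))
```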

In [None]:
phsen_flma_records = []

for ind in phsen_flma_bottles.index:
    # Get the time stamp of the bottle (drop the timezone to match the instrument data)
    t = phsen_flma_bottles["Start Time [UTC]"].loc[ind].tz_localize(None)

    # Get the bottle pH and the mean of the two CTD sensors at the bottle trip
    bottle = {
        "time": t,
        "bottlePH": phsen_flma_bottles["Calculated pH"].loc[ind],
        "bottleT": phsen_flma_bottles[["CTD Temperature 1 [deg C]", "CTD Temperature 2 [deg C]"]].loc[ind].mean(skipna=True),
        "bottleS": phsen_flma_bottles[["CTD Salinity 1 [psu]", "CTD Salinity 2 [psu]"]].loc[ind].mean(skipna=True),
        "bottleP": phsen_flma_bottles["CTD Pressure [db]"].loc[ind],
    }

    # Find the instrument data within +/- 7 days of the bottle sample
    dt = pd.Timedelta(days=7)
    subset = phsen_flma.where((phsen_flma.time >= t - dt) & (phsen_flma.time <= t + dt), drop=True)

    # Summarize each deployment in the window separately (e.g., at a mooring turnover)
    deps = np.unique(subset.deployment)
    for depNum in deps:
        dep_subset = subset.where(subset.deployment == depNum, drop=True)
        phsen_flma_records.append({
            "deploymentNumber": depNum,
            "PH_avg": float(dep_subset.seawater_ph.mean(skipna=True).compute()),
            "PH_std": float(dep_subset.seawater_ph.std(skipna=True).compute()),
            "T": float(dep_subset.seawater_temperature.mean().compute()),
            "S": float(dep_subset.practical_salinity.mean().compute()),
            "P": float(dep_subset.seawater_pressure.mean().compute()),
            **bottle,
        })
    if len(deps) == 0:
        # No instrument data near this bottle: keep the bottle with NaN instrument values
        phsen_flma_records.append({"deploymentNumber": np.nan, "PH_avg": np.nan, "PH_std": np.nan,
                                   "T": np.nan, "S": np.nan, "P": np.nan, **bottle})

# Build the comparison table (DataFrame.append was removed in pandas 2.0)
phsen_flma_comparison = pd.DataFrame(phsen_flma_records,
                                     columns=["deploymentNumber", "time", "PH_avg", "PH_std", "T", "S", "P",
                                              "bottlePH", "bottleT", "bottleS", "bottleP"])
phsen_flma_comparison

Plot the comparison
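
In the comparison plot each point is colored by deployment: `pd.factorize` numbers the unique deployments in order of first appearance, and that number indexes the `tab10` colormap. The vertical bars span the instrument mean ± 2 standard deviations. A pure-Python sketch of those two steps, with made-up numbers and hypothetical helper names:

```python
def factorize(values):
    """Pure-Python equivalent of pd.factorize(values) with sort=False.

    Returns integer codes and the unique values in order of first appearance.
    NaN handling (code -1 in pandas) is omitted; NaN deployments are dropped first.
    """
    uniques, codes, lookup = [], [], {}
    for v in values:
        if v not in lookup:
            lookup[v] = len(uniques)
            uniques.append(v)
        codes.append(lookup[v])
    return codes, uniques

def error_bar_bounds(ph_avg, ph_std, k=2.0):
    """Lower and upper ends of the vertical error bars: mean -/+ k standard deviations."""
    return [(m - k * s, m + k * s) for m, s in zip(ph_avg, ph_std)]

codes, deployments = factorize([7.0, 7.0, 8.0, 6.0, 8.0])
print(codes, deployments)
print(error_bar_bounds([8.08, 8.10], [0.01, 0.005]))
```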

In [None]:
df = phsen_flma_comparison.dropna(subset=["deploymentNumber"]).copy()
df["bottlePH"] = df["bottlePH"].astype(float)
df["PH_avg"] = df["PH_avg"].astype(float)
df["PH_std"] = df["PH_std"].astype(float)
#df = df[df["deploymentNumber"] != 3]

fig, ax = plt.subplots(figsize=(8,8))

# Assign each deployment a color from the tab10 colormap
levels, categories = pd.factorize(df['deploymentNumber'])
colors = [plt.cm.tab10(i) for i in levels] # using the "tab10" colormap
handles = [matplotlib.patches.Patch(color=plt.cm.tab10(i), label=c) for i, c in enumerate(categories)]
for h in handles:
    h.set_edgecolor("black")

# Plot the data
ax.scatter(df["bottlePH"], df["PH_avg"], c=colors, s=80, edgecolors="black")
ax.set_xlabel(xlabel="Bottle pH", fontsize=14)
ax.set_ylabel(ylabel="PHSEN pH", fontsize=14)
ax.legend(handles=handles, title='Deployment', fontsize=12, edgecolor="black")
ax.grid()

# Add in vertical error bars
for i, c in enumerate(categories):
    x = df["bottlePH"][df["deploymentNumber"] == c]
    ymin = df["PH_avg"][df["deploymentNumber"] == c] - 2*df["PH_std"][df["deploymentNumber"] == c]
    ymax = df["PH_avg"][df["deploymentNumber"] == c] + 2*df["PH_std"][df["deploymentNumber"] == c]
    ax.vlines(x, ymin, ymax, colors = plt.cm.tab10(i))

#ax.set_ylim((250, 500))
#ax.set_xlim((350, 450))
# Add in a 1:1 line
x = np.arange(8, 8.2, 0.05)
y = np.arange(8, 8.2, 0.05)
ax.plot(x, y, color = "black", alpha=0.7)

In [None]:
fig.savefig(f"../results/{refdes}/instrument_vs_bottle_comparison.png")

### FLMB

In [None]:
import process_pco2w, process_phsen

In [None]:
refdes = "GI03FLMB-RIS01-04-PHSENF000"

phsen_flmb = xr.open_dataset("../data/GI03FLMB-RIS01-04-PHSENF000/GI03FLMB-RIS01-04-PHSENF000-recovered_inst-phsen_abcdef_instrument.nc", chunks="auto")
phsen_flmb = process_phsen.phsen_instrument(phsen_flmb)

annotations = OOINet.get_annotations(refdes)

phsen_flmb = OOINet.add_annotation_qc_flag(phsen_flmb, annotations)

#phsen_flmb = phsen_flmb.where(phsen_flmb.rollup_annotations_qc_results != 9, drop=True)

fig, ax = plotting.plot_variable(phsen_flmb, "seawater_ph")
ax.set_title(refdes, fontsize=16)

In [None]:
phsen_flmb

In [None]:
if not os.path.exists(f"../results/{refdes}"):
    os.makedirs(f"../results/{refdes}")
    
fig.savefig(f"../results/{refdes}/time_series.png")

#### CTD Data
Add the seawater temperature and pressure from the co-located CTD, interpolated onto the PHSEN timestamps (salinity is already included in the PHSEN data stream).

In [None]:
ctdmo_flmb = xr.open_dataset("../data/GI03FLMB-RIM01-02-CTDMOG060_combined.nc", chunks="auto")

seawater_temperature = ctdmo_flmb["ctdmo_seawater_temperature"].interp_like(phsen_flmb)
seawater_pressure = ctdmo_flmb["ctdmo_seawater_pressure"].interp_like(phsen_flmb)

# Add to the PHSEN dataset
phsen_flmb["seawater_temperature"] = seawater_temperature
phsen_flmb["seawater_pressure"] = seawater_pressure

#### Quality Checks
Next, run the quality checks on the data. Points flagged 3 are suspect and points flagged 4 are bad.

In [None]:
qc_flags = process_phsen.quality_checks(phsen_flmb)
phsen_flmb["qc_flags"] = qc_flags

In [None]:
fig, ax = plotting.plot_variable(phsen_flmb, "seawater_ph")

# Highlight data points flagged as suspect (3) or bad (4) by the quality checks
ax.set_alpha(0.7)
first_legend = ax.get_legend()
first_legend.set_bbox_to_anchor((1, 0.7))
ax.add_artist(first_legend)

s1, = ax.plot(phsen_flmb.time.where(phsen_flmb.qc_flags == 3), 
             phsen_flmb.seawater_ph.where(phsen_flmb.qc_flags == 3), 
             marker="s", linestyle="", color="yellow", label="Suspect")
s2, = ax.plot(phsen_flmb.time.where(phsen_flmb.qc_flags == 4), 
             phsen_flmb.seawater_ph.where(phsen_flmb.qc_flags == 4), 
             marker="s", linestyle="", color="tab:red", label="Bad")

ax.legend(handles=[s1, s2], edgecolor="black", loc="lower left", bbox_to_anchor=(1, 0.2), fontsize=12)
ax.set_title(refdes, fontsize=16)

In [None]:
if not os.path.exists(f"../results/{refdes}"):
    os.makedirs(f"../results/{refdes}")
    
fig.savefig(f"../results/{refdes}/time_series_with_quality_flags.png")

Drop the data flagged as bad (QC flag 4); suspect data are kept.

In [None]:
phsen_flmb = phsen_flmb.where(phsen_flmb.qc_flags != 4, drop=True)

#### Match Bottle Data

In [None]:
deployments = OOINet.get_deployments(refdes)

lat, lon, depth = deployments["latitude"].mean(), deployments["longitude"].mean(), deployments["depth"].mean()
lat, lon, depth

In [None]:
phsen_flmb_bottles = findSamples(carbonData, (lat, lon), depth, 5, 20)
phsen_flmb_bottles

In [None]:
fig, ax = plotting.plot_variable(phsen_flmb, "seawater_ph")
ax.set_alpha(0.7)
first_legend = ax.get_legend()
first_legend.set_bbox_to_anchor((1, 0.7))
ax.add_artist(first_legend)
xmin, xmax = ax.get_xlim()

s, = ax.plot(phsen_flmb_bottles.dropna(subset=["Calculated pH"])["Start Time [UTC]"], 
             phsen_flmb_bottles.dropna(subset=["Calculated pH"])["Calculated pH"], 
             marker="o", color="yellow", linestyle="", markersize=12, markeredgecolor = "black",
             label="Bottle pH")

ax.legend(handles=[s], edgecolor="black", loc="lower left", bbox_to_anchor=(1, 0.2), fontsize=12)
ax.set_xlim(xmin, xmax)
ax.set_title(refdes, fontsize=16)

In [None]:
if not os.path.exists(f"../results/{refdes}"):
    os.makedirs(f"../results/{refdes}")
    
fig.savefig(f"../results/{refdes}/time_series_with_discrete_bottles.jpg", format="jpg")

#### Bottle vs. Instrument Comparison

In [None]:
phsen_flmb_records = []

for ind in phsen_flmb_bottles.index:
    # Get the time stamp of the bottle (drop the timezone to match the instrument data)
    t = phsen_flmb_bottles["Start Time [UTC]"].loc[ind].tz_localize(None)

    # Get the bottle pH and the mean of the two CTD sensors at the bottle trip
    bottle = {
        "time": t,
        "bottlePH": phsen_flmb_bottles["Calculated pH"].loc[ind],
        "bottleT": phsen_flmb_bottles[["CTD Temperature 1 [deg C]", "CTD Temperature 2 [deg C]"]].loc[ind].mean(skipna=True),
        "bottleS": phsen_flmb_bottles[["CTD Salinity 1 [psu]", "CTD Salinity 2 [psu]"]].loc[ind].mean(skipna=True),
        "bottleP": phsen_flmb_bottles["CTD Pressure [db]"].loc[ind],
    }

    # Find the instrument data within +/- 7 days of the bottle sample
    dt = pd.Timedelta(days=7)
    subset = phsen_flmb.where((phsen_flmb.time >= t - dt) & (phsen_flmb.time <= t + dt), drop=True)

    # Summarize each deployment in the window separately (e.g., at a mooring turnover)
    deps = np.unique(subset.deployment)
    for depNum in deps:
        dep_subset = subset.where(subset.deployment == depNum, drop=True)
        phsen_flmb_records.append({
            "deploymentNumber": depNum,
            "PH_avg": float(dep_subset.seawater_ph.mean(skipna=True).compute()),
            "PH_std": float(dep_subset.seawater_ph.std(skipna=True).compute()),
            "T": float(dep_subset.seawater_temperature.mean().compute()),
            "S": float(dep_subset.practical_salinity.mean().compute()),
            "P": float(dep_subset.seawater_pressure.mean().compute()),
            **bottle,
        })
    if len(deps) == 0:
        # No instrument data near this bottle: keep the bottle with NaN instrument values
        phsen_flmb_records.append({"deploymentNumber": np.nan, "PH_avg": np.nan, "PH_std": np.nan,
                                   "T": np.nan, "S": np.nan, "P": np.nan, **bottle})

# Build the comparison table (DataFrame.append was removed in pandas 2.0)
phsen_flmb_comparison = pd.DataFrame(phsen_flmb_records,
                                     columns=["deploymentNumber", "time", "PH_avg", "PH_std", "T", "S", "P",
                                              "bottlePH", "bottleT", "bottleS", "bottleP"])
phsen_flmb_comparison

Plot the comparison

In [None]:
df = phsen_flmb_comparison.dropna(subset=["deploymentNumber"]).copy()
df["bottlePH"] = df["bottlePH"].astype(float)
df["PH_avg"] = df["PH_avg"].astype(float)
df["PH_std"] = df["PH_std"].astype(float)
#df = df[df["deploymentNumber"] != 3]

fig, ax = plt.subplots(figsize=(8,8))

# Assign each deployment a color from the tab10 colormap
levels, categories = pd.factorize(df['deploymentNumber'])
colors = [plt.cm.tab10(i) for i in levels] # using the "tab10" colormap
handles = [matplotlib.patches.Patch(color=plt.cm.tab10(i), label=c) for i, c in enumerate(categories)]
for h in handles:
    h.set_edgecolor("black")

# Plot the data
ax.scatter(df["bottlePH"], df["PH_avg"], c=colors, s=80, edgecolors="black")
ax.set_xlabel(xlabel="Bottle pH", fontsize=14)
ax.set_ylabel(ylabel="PHSEN pH", fontsize=14)
ax.legend(handles=handles, title='Deployment', fontsize=12, edgecolor="black")
ax.grid()

# Add in vertical error bars
for i, c in enumerate(categories):
    x = df["bottlePH"][df["deploymentNumber"] == c]
    ymin = df["PH_avg"][df["deploymentNumber"] == c] - 2*df["PH_std"][df["deploymentNumber"] == c]
    ymax = df["PH_avg"][df["deploymentNumber"] == c] + 2*df["PH_std"][df["deploymentNumber"] == c]
    ax.vlines(x, ymin, ymax, colors = plt.cm.tab10(i))

ax.set_ylim((8, 8.15))
ax.set_xlim((8, 8.15))
# Add in a 1:1 line
x = np.arange(8, 8.205, 0.05)
y = np.arange(8, 8.205, 0.05)
ax.plot(x, y, color = "black", alpha=0.7)

In [None]:
fig.savefig(f"../results/{refdes}/instrument_vs_bottle_comparison.png")

### PHSEN Comparison
Now, combine all of the PHSEN instruments and compare them with one another.

In [None]:
fig, ax = plt.subplots(nrows=2, ncols=2, figsize=(20, 16), sharex=True, sharey=True)

param = "seawater_ph"

# Plot the 20 m SUMO PHSEN
# ------------------------
# Calculate the figure bounds
yavg, ystd, ymin, ymax = calculate_ylims(phsen_20m, param)

# Generate the plot figure
s = phsen_20m.plot.scatter("time", param, ax=ax[0][0], hue="deployment", hue_style="discrete")

# Add the deployments
ax[0][0].legend(edgecolor="black", loc="center left", bbox_to_anchor=(1, 0.5), fontsize=14, title="Deployments")
deployments = np.unique(phsen_20m["deployment"])
for depNum in deployments:
    dt = phsen_20m.where(phsen_20m["deployment"] == depNum, drop=True)["time"].min()
    ax[0][0].vlines(dt.values, yavg-4*ystd, yavg+4*ystd)
    ax[0][0].text(dt.values, yavg-1.5*ystd, str(int(depNum)), fontsize=14, weight="bold")
xmin, xmax = ax[0][0].get_xlim()

# Add the bottle samples
b, = ax[0][0].plot(phsen_20m_bottles.dropna(subset=["Calculated pH"])["Start Time [UTC]"], 
             phsen_20m_bottles.dropna(subset=["Calculated pH"])["Calculated pH"], 
             marker="o", color="yellow", linestyle="", markersize=12, markeredgecolor = "black",
             label="Bottle\npH")
#ax[0].legend(handles=[b], edgecolor="black", loc="lower left", bbox_to_anchor=(1, 0.2), fontsize=12)


# Set the axis limits
ax[0][0].set_xlim(xmin, xmax)
ax[0][0].set_ylim(ymin, ymax)
ax[0][0].grid()
ax[0][0].set_title("PHSEN: SUMO @ 20 m", fontsize=20, weight="bold")
ax[0][0].set_ylabel("pH Seawater\n[Total scale]", fontsize=20, weight="bold")
#plt.yticks(fontsize=12, weight="bold")

for tick in ax[0][0].yaxis.get_major_ticks():
    tick.label1.set_fontsize(12)
    tick.label1.set_fontweight("bold")


# Plot the 100 m SUMO PHSEN
# ------------------------
# Calculate the figure bounds
yavg, ystd, ymin, ymax = calculate_ylims(phsen_100m, param)

# Generate the plot figure
s = phsen_100m.plot.scatter("time", param, ax=ax[1][0], hue="deployment", hue_style="discrete")

# Add the deployments
ax[1][0].legend(edgecolor="black", loc="center left", bbox_to_anchor=(1, 0.5), fontsize=14, title="Deployments")
deployments = np.unique(phsen_100m["deployment"])
for depNum in deployments:
    dt = phsen_100m.where(phsen_100m["deployment"] == depNum, drop=True)["time"].min()
    ax[1][0].vlines(dt.values, yavg-4*ystd, yavg+4*ystd)
    ax[1][0].text(dt.values, yavg-1.25*ystd, str(int(depNum)), fontsize=14, weight="bold")
xmin, xmax = ax[1][0].get_xlim()

# Add the bottle samples
b, = ax[1][0].plot(phsen_100m_bottles.dropna(subset=["Calculated pH"])["Start Time [UTC]"], 
             phsen_100m_bottles.dropna(subset=["Calculated pH"])["Calculated pH"], 
             marker="o", color="yellow", linestyle="", markersize=12, markeredgecolor = "black",
             label="Bottle\npH")
#ax[0].legend(handles=[b], edgecolor="black", loc="lower left", bbox_to_anchor=(1, 0.2), fontsize=12)
for tick in ax[1][0].xaxis.get_major_ticks():
    tick.label1.set_fontsize(12)
    tick.label1.set_fontweight("bold")
    
for tick in ax[1][0].yaxis.get_major_ticks():
    tick.label1.set_fontsize(12)
    tick.label1.set_fontweight("bold")

# Set the axis limits
ax[1][0].set_xlim(xmin, xmax)
ax[1][0].set_ylim(ymin, ymax)
ax[1][0].grid()
ax[1][0].set_title("PHSEN: SUMO @ 100 m", fontsize=20, weight="bold")
ax[1][0].set_ylabel("pH Seawater\n[Total scale]", fontsize=20, weight="bold")
#plt.yticks(fontsize=12, weight="bold")

# Plot the FLMA PHSEN
# ------------------------
# Calculate the figure bounds
yavg, ystd, ymin, ymax = calculate_ylims(phsen_flma, param)

# Generate the plot figure
s = phsen_flma.plot.scatter("time", param, ax=ax[0][1], hue="deployment", hue_style="discrete")

# Add the deployments
ax[0][1].legend(edgecolor="black", loc="center left", bbox_to_anchor=(1, 0.5), fontsize=14, title="Deployments")
deployments = np.unique(phsen_flma["deployment"])
for depNum in deployments:
    dt = phsen_flma.where(phsen_flma["deployment"] == depNum, drop=True)["time"].min()
    ax[0][1].vlines(dt.values, yavg-4*ystd, yavg+4*ystd)
    ax[0][1].text(dt.values, yavg-2.8*ystd, str(int(depNum)), fontsize=14, weight="bold")
xmin, xmax = ax[0][1].get_xlim()

# Add the bottle samples
b, = ax[0][1].plot(phsen_flma_bottles.dropna(subset=["Calculated pH"])["Start Time [UTC]"], 
             phsen_flma_bottles.dropna(subset=["Calculated pH"])["Calculated pH"], 
             marker="o", color="yellow", linestyle="", markersize=12, markeredgecolor = "black",
             label="Bottle\npH")
#ax[0].legend(handles=[b], edgecolor="black", loc="lower left", bbox_to_anchor=(1, 0.2), fontsize=12)


# Set the axis limits
ax[0][1].set_xlim(xmin, xmax)
ax[0][1].set_ylim(ymin, ymax)
ax[0][1].grid()
ax[0][1].set_title("PHSEN: FLMA @ 30 m", fontsize=20, weight="bold")
ax[0][1].set_ylabel(None)


#plt.yticks(fontsize=12, weight="bold")

# Plot the FLMB PHSEN
# ------------------------
# Calculate the figure bounds
yavg, ystd, ymin, ymax = calculate_ylims(phsen_flmb, param)

# Generate the plot figure
s = phsen_flmb.plot.scatter("time", param, ax=ax[1][1], hue="deployment", hue_style="discrete")

# Add the deployments
ax[1][1].legend(edgecolor="black", loc="center left", bbox_to_anchor=(1, 0.5), fontsize=14, title="Deployments")
deployments = np.unique(phsen_flmb["deployment"])
for depNum in deployments:
    dt = phsen_flmb.where(phsen_flmb["deployment"] == depNum, drop=True)["time"].min()
    ax[1][1].vlines(dt.values, yavg-4*ystd, yavg+4*ystd)
    ax[1][1].text(dt.values, yavg-3.3*ystd, str(int(depNum)), fontsize=14, weight="bold")
xmin, xmax = ax[1][1].get_xlim()

# Add the bottle samples
b, = ax[1][1].plot(phsen_flmb_bottles.dropna(subset=["Calculated pH"])["Start Time [UTC]"], 
             phsen_flmb_bottles.dropna(subset=["Calculated pH"])["Calculated pH"], 
             marker="o", color="yellow", linestyle="", markersize=12, markeredgecolor = "black",
             label="Bottle\npH")
#ax[0].legend(handles=[b], edgecolor="black", loc="lower left", bbox_to_anchor=(1, 0.2), fontsize=12)


# Restore the x-limits captured before the bottle samples were added, then set the y-limits
ax[1][1].set_xlim(xmin, xmax)
ax[1][1].set_ylim(ymin, ymax)
ax[1][1].grid()
ax[1][1].set_title("PHSEN: FLMB @ 30 m", fontsize=20, weight="bold")
ax[1][1].set_ylabel(None)

for tick in ax[1][1].xaxis.get_major_ticks():
    tick.label1.set_fontsize(12)
    tick.label1.set_fontweight("bold")

fig.autofmt_xdate()
plt.xlabel("")

In [None]:
fig.savefig("../results/PHSEN_time_series_all.png", facecolor="white", transparent=False)
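The deployment markers in the time-series plots sit at the first timestamp of each deployment. The plotting code finds each one by masking the dataset once per deployment with `where(..., drop=True)`. A single grouped reduction returns all the start times in one pass. The sketch below assumes the time and deployment values can be pulled out as plain arrays, for example `phsen_flma["time"].values` and `phsen_flma["deployment"].values`. The helper name `deployment_start_times` is hypothetical.

```python
import pandas as pd


def deployment_start_times(times, deployments):
    """Return a Series holding the earliest timestamp of each deployment number.

    Hypothetical helper: `times` is any array-like of datetimes and
    `deployments` is the deployment number recorded at each timestamp.
    """
    frame = pd.DataFrame({"time": pd.to_datetime(times), "deployment": deployments})
    # One grouped reduction instead of one masked copy of the data per deployment
    return frame.groupby("deployment")["time"].min()


# Small synthetic example: three deployments spread over six days
starts = deployment_start_times(
    pd.date_range("2020-01-01", periods=6, freq="D"),
    [1, 1, 1, 2, 2, 3],
)
print(starts)
```

The plotting loops could then iterate over `starts.items()`. Drawing each marker with `ax.axvline(t0)` would make it span the full axis height rather than the fixed `yavg ± 4*ystd` range.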

In [None]:
phsen_20m_comparison["instrument"] = "sumo_20m"
phsen_100m_comparison["instrument"] = "sumo_100m"
phsen_flma_comparison["instrument"] = "flma"
phsen_flmb_comparison["instrument"] = "flmb"
# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent
phsen_comparison = pd.concat([phsen_20m_comparison, phsen_100m_comparison,
                              phsen_flma_comparison, phsen_flmb_comparison])
phsen_comparison.to_excel("../results/PHSEN_comparison.xlsx", index=False)

In [None]:
df = phsen_comparison.dropna(subset=["deploymentNumber"]).copy()  # explicit copy avoids SettingWithCopyWarning on the casts below
df["bottlePH"] = df["bottlePH"].astype(float)
df["PH_avg"] = df["PH_avg"].astype(float)
df["PH_std"] = df["PH_std"].astype(float)
#df = df[df["deploymentNumber"] != 3]

fig, ax = plt.subplots(figsize=(8,8))

# Assign each instrument a color from the tab10 colormap and build matching legend patches
levels, categories = pd.factorize(df['instrument'])
colors = [plt.cm.tab10(i) for i in levels] # using the "tab10" colormap
handles = [matplotlib.patches.Patch(color=plt.cm.tab10(i), label=c) for i, c in enumerate(categories)]
for h in handles:
    h.set_edgecolor("black")

# Plot the data
ax.scatter(df["bottlePH"], df["PH_avg"], c=colors, s=80, edgecolors="black")
ax.set_xlabel(xlabel="Bottle pH", fontsize=16, weight="bold")
ax.set_ylabel(ylabel="PHSEN pH", fontsize=16, weight="bold")
ax.legend(handles=handles, title='Instrument', fontsize=12, edgecolor="black", loc="upper right")
ax.grid()

first_legend = ax.get_legend()
ax.add_artist(first_legend)

# Add vertical error bars spanning ±2 standard deviations of the PHSEN pH
for i, c in enumerate(categories):
    x = df["bottlePH"][df["instrument"] == c]
    ymin = df["PH_avg"][df["instrument"] == c] - 2*df["PH_std"][df["instrument"] == c]
    ymax = df["PH_avg"][df["instrument"] == c] + 2*df["PH_std"][df["instrument"] == c]
    ax.vlines(x, ymin, ymax, colors = plt.cm.tab10(i))

ax.set_ylim((8, 8.2))
ax.set_xlim((8, 8.2))
# Add in a 1:1 line
x = np.arange(8, 8.205, 0.05)
y = np.arange(8, 8.205, 0.05)
s, = ax.plot(x, y, color = "black", alpha=0.7, label="1:1 line")
ax.legend(handles=[s], edgecolor="black", loc="lower right", fontsize=12)


for tick in ax.xaxis.get_major_ticks():
    tick.label1.set_fontsize(12)
    tick.label1.set_fontweight("bold")
    
for tick in ax.yaxis.get_major_ticks():
    tick.label1.set_fontsize(12)
    tick.label1.set_fontweight("bold")
    


In [None]:
fig.savefig("../results/phsen_comparison.png", facecolor="white", transparent=False)
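The 1:1 plot shows the sensor-bottle agreement visually. A per-instrument table of sensor-minus-bottle offsets would put numbers on it. This is a minimal sketch that assumes the comparison table has the `instrument`, `PH_avg`, and `bottlePH` columns used above. The function name `offset_summary` is illustrative, and so is the choice of mean offset and RMSD as summary statistics. Neither is part of the original analysis.

```python
import numpy as np
import pandas as pd


def offset_summary(df, sensor_col="PH_avg", bottle_col="bottlePH", group_col="instrument"):
    """Hypothetical helper: per-group statistics of (sensor pH - bottle pH)."""
    work = df[[group_col, sensor_col, bottle_col]].dropna().copy()
    work["offset"] = work[sensor_col].astype(float) - work[bottle_col].astype(float)
    grouped = work["offset"].groupby(work[group_col])
    # Root-mean-square difference: sqrt of the mean squared offset in each group
    rmsd = np.sqrt((work["offset"] ** 2).groupby(work[group_col]).mean())
    return pd.DataFrame({
        "mean_offset": grouped.mean(),  # bias of the sensor relative to bottles
        "std_offset": grouped.std(),    # NaN when a group has a single pair
        "rmsd": rmsd,
        "n": grouped.size(),
    })


# Synthetic example values, not results from this dataset
example = pd.DataFrame({
    "instrument": ["a", "a", "b"],
    "PH_avg": [8.05, 8.07, 8.10],
    "bottlePH": [8.04, 8.05, 8.12],
})
print(offset_summary(example))
```

Applied to the `df` built for the 1:1 plot, `offset_summary(df)` would give one row per instrument.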

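#### FLMA Deployment 3
Compare the FLMA PHSEN time series from Deployment 3 against the discrete bottle pH samples. The point-to-point (first) differences of the sensor record are used to estimate its short-term variability. That variability is plotted as an envelope of ±2 times the standard deviation of the differences around each measurement.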

In [None]:
flma_dep3 = phsen_flma.where(phsen_flma.deployment == 3, drop=True)
bottles3 = phsen_flma_comparison[phsen_flma_comparison["deploymentNumber"]==3]

In [None]:
bottles3

In [None]:
dif3 = flma_dep3.seawater_ph.diff(dim="time").compute()
dif3

In [None]:
dif3_std = dif3.std().values
dif3_std
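Differencing removes slow changes such as drift or a seasonal signal. The spread of `dif3` therefore mostly reflects sample-to-sample variability. One caveat is worth keeping in mind. Suppose that variability behaves like independent (white) noise with standard deviation σ. The difference of two such samples then has standard deviation √2·σ, so the standard deviation of the differences overstates the per-sample noise by about 41%. The sketch below assumes uncorrelated noise and divides by √2. `noise_from_differences` is a hypothetical helper, and its `ddof=0` matches the default of the xarray `.std()` call above.

```python
import numpy as np


def noise_from_differences(values):
    """Hypothetical helper: estimate per-sample white-noise sigma from first differences.

    Assumes the short-term variability is uncorrelated from sample to sample,
    in which case std(diff) = sqrt(2) * sigma.
    """
    diffs = np.diff(np.asarray(values, dtype=float))
    diffs = diffs[np.isfinite(diffs)]  # drop differences that touch gaps (NaNs)
    return float(np.std(diffs) / np.sqrt(2))  # ddof=0, like xarray's .std()


# Synthetic record: slow drift plus white noise with sigma = 0.01 pH units
rng = np.random.default_rng(0)
n = 100_000
record = 8.05 + np.linspace(0.0, 0.05, n) + rng.normal(0.0, 0.01, n)

print(np.std(record))                  # inflated by the drift
print(noise_from_differences(record))  # close to 0.01
```

For this deployment the call would be `noise_from_differences(flma_dep3.seawater_ph.values)`. Adjacent samples may be positively correlated, for example when the sensor response is slower than the sampling interval. In that case the √2 factor no longer holds and this estimate understates the noise.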

In [None]:
# Plot the time series with a ±2σ envelope, where σ is the standard deviation of the first differences
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(15, 12))

# Plot the data
ax.plot(flma_dep3["time"], flma_dep3[param], marker=".", linestyle="", color="tab:green", markersize=8, label="Data")
ax.vlines(flma_dep3["time"], flma_dep3[param]-2*dif3_std, flma_dep3[param]+2*dif3_std, color="tab:green", alpha=0.1, label=r"2$\sigma$")
ax.plot(bottles3["time"], bottles3["bottlePH"], marker="o", linestyle="", color="yellow", markersize=16, markeredgecolor="black", label="Bottle pH")
#s = flma_dep3.plot.scatter("time", param, ax=ax, hue="deployment", hue_style="discrete")
handles, labels = ax.get_legend_handles_labels()
handles[2] = matplotlib.patches.Patch(color=handles[2].get_color(), label=labels[2])
handles[2].set_edgecolor("black")
ax.legend(handles=handles, fontsize=14)
ax.grid()
ax.set_ylabel(flma_dep3[param].attrs["long_name"], fontsize=16, weight="bold")
ax.set_title("PHSEN: FLMA @ 30 m, Deployment #3", fontsize=16, weight="bold")
fig.autofmt_xdate()


for tick in ax.xaxis.get_major_ticks():
    tick.label1.set_fontsize(12)
    tick.label1.set_fontweight("bold")
    
for tick in ax.yaxis.get_major_ticks():
    tick.label1.set_fontsize(12)
    tick.label1.set_fontweight("bold")

In [None]:
fig.savefig("../results/FLMA_PHSEN_Dep3_time_series_with_bottles.png", facecolor="white", transparent=False)

#### FLMA Deployment 4

In [None]:
flma_dep4 = phsen_flma.where(phsen_flma.deployment == 4, drop=True)
bottles4 = phsen_flma_comparison[phsen_flma_comparison["deploymentNumber"]==4]

In [None]:
bottles4

In [None]:
dif4 = flma_dep4.seawater_ph.diff(dim="time").compute()
dif4

In [None]:
dif4_std = dif4.std().values
dif4_std

In [None]:
# Plot the time series with a ±2σ envelope, where σ is the standard deviation of the first differences
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(15, 12))

# Plot the data
ax.plot(flma_dep4["time"], flma_dep4[param], marker=".", linestyle="", color="tab:red", markersize=8, label="Data")
ax.vlines(flma_dep4["time"], flma_dep4[param]-2*dif4_std, flma_dep4[param]+2*dif4_std, color="tab:red", alpha=0.1, label=r"2$\sigma$")
ax.plot(bottles4["time"], bottles4["bottlePH"], marker="o", linestyle="", color="yellow", markersize=16, markeredgecolor="black", label="Bottle pH")
handles, labels = ax.get_legend_handles_labels()
handles[2] = matplotlib.patches.Patch(color=handles[2].get_color(), label=labels[2])
handles[2].set_edgecolor("black")
ax.legend(handles=handles, fontsize=14)
ax.grid()
ax.set_ylabel(flma_dep4[param].attrs["long_name"], fontsize=16, weight="bold")
ax.set_title("PHSEN: FLMA @ 30 m, Deployment #4", fontsize=16, weight="bold")
fig.autofmt_xdate()
ax.set_ylim((7.95, 8.15))


for tick in ax.xaxis.get_major_ticks():
    tick.label1.set_fontsize(12)
    tick.label1.set_fontweight("bold")
    
for tick in ax.yaxis.get_major_ticks():
    tick.label1.set_fontsize(12)
    tick.label1.set_fontweight("bold")

In [None]:
fig.savefig("../results/FLMA_PHSEN_Dep4_time_series_with_bottles.png", facecolor="white", transparent=False)
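The Deployment 3 and Deployment 4 cells differ only in the deployment number, color, and y-limits, so they could be folded into one function. The sketch below rests on a few stated assumptions:

- The helper name `plot_deployment_with_bottles` and its parameters are hypothetical.
- It swaps the per-point `vlines` for a single `fill_between` band. The band shows the same ±n·σ envelope with far fewer artists.
- The bottle points are drawn with `ax.plot`, so their marker appears in the legend as-is. No `Patch` handle needs to be substituted.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_deployment_with_bottles(time, ph, sigma, bottle_time, bottle_ph,
                                 color="tab:green", title="", ylabel="pH",
                                 n_sigma=2, ylim=None):
    """Hypothetical helper: sensor pH with a +/- n_sigma band plus bottle samples."""
    time = np.asarray(time)
    ph = np.asarray(ph, dtype=float)
    fig, ax = plt.subplots(figsize=(15, 12))

    # Sensor data and one shaded envelope instead of one vline per point
    ax.plot(time, ph, marker=".", linestyle="", color=color, markersize=8, label="Data")
    ax.fill_between(time, ph - n_sigma * sigma, ph + n_sigma * sigma,
                    color=color, alpha=0.2, label=rf"{n_sigma}$\sigma$")

    # Discrete bottle samples; the Line2D marker shows up in the legend directly
    ax.plot(bottle_time, bottle_ph, marker="o", linestyle="", color="yellow",
            markersize=16, markeredgecolor="black", label="Bottle pH")

    ax.legend(fontsize=14)
    ax.grid()
    ax.set_ylabel(ylabel, fontsize=16, weight="bold")
    ax.set_title(title, fontsize=16, weight="bold")
    if ylim is not None:
        ax.set_ylim(ylim)

    # Same bold, size-12 tick labels as the notebook cells
    for axis in (ax.xaxis, ax.yaxis):
        for tick in axis.get_major_ticks():
            tick.label1.set_fontsize(12)
            tick.label1.set_fontweight("bold")
    fig.autofmt_xdate()
    return fig, ax
```

For Deployment 4 the call would pass `flma_dep4["time"].values`, `flma_dep4[param].values`, `dif4_std`, `bottles4["time"]`, and `bottles4["bottlePH"]`, together with `color="tab:red"`, `ylabel=flma_dep4[param].attrs["long_name"]`, and `ylim=(7.95, 8.15)`.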