# Discrete data from OOI

## Introducing the Alfresco server and 'Discrete Summary'


Our objective in this section is to interpret **Discrete Summary** data files from Regional Cabled Array 
cruises, 2016 to present. This necessitates some explanation of what is meant by "**Discrete Summary**".


A cast is a deployment of a rosette framework (or sometimes an ROV) on a cable to descend through 
the water column. As cable is paid out, the rosette carries open sample bottles to pre-determined
depths. At these 'sample' depths, a mechanism is triggered to close the ends of a bottle, 
capturing a water sample. 
The rosette frame also carries continuously-operating sensors.
Continuous data are excerpted for times corresponding to the closure times of the sample bottles.
When the samples are recovered they are analyzed in the lab, during or after the cruise. 
Additionally some environmental parameters such as bicarbonate concentration are
calculated from a set of sensor measurements. The continuous, sampled,
and derived values are respectively given the prefixes 'ctd', 'discrete' and
'calculated'. Taken together in summarized tabular form they make up the 
**Discrete Summary**. 
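
As a small illustration of the prefix convention, the sketch below groups a handful of Discrete Summary column names by their first word. The column names are taken from the read function later in this notebook; the helper `group_by_prefix` is a hypothetical name introduced only for this example.

```python
from collections import defaultdict

# A few column names as they appear in the Discrete Summary CSV
columns = [
    "CTD Temperature 1 [deg C]",
    "CTD Salinity 1 [psu]",
    "Discrete Oxygen [mL/L]",
    "Discrete Nitrate [uM]",
    "Calculated pCO2 [uatm]",
    "Calculated Bicarb [umol/kg]",
]

def group_by_prefix(names):
    """Group column names by their first word, lowercased (hypothetical helper)."""
    groups = defaultdict(list)
    for name in names:
        prefix = name.split()[0].lower()
        groups[prefix].append(name)
    return dict(groups)

groups = group_by_prefix(columns)
for prefix, members in groups.items():
    print(prefix, len(members))
```

Columns without one of these prefixes (station, cast and flag columns, for example) would land in groups of their own under this scheme.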


Each rosette cast is given a unique identifier such as 'cast-001'. For Cabled Array cruises
the typical rosette carries 24 sample bottles. The cast and recovery operation takes on the order
of an hour. Some casts are timed to coincide with shallow profiler *profiles*. We are interested
in whether the cast data and the profiler data correspond well. This would indicate that the data
are viable for further analysis.


Note that the continuous data from the rosette (which is excerpted as described above) can also be
compared to profiler data. This is not done here.


* Cruise data are available from the OOI 'Alfresco' server
    * [Alfresco browse site](https://alfresco.oceanobservatories.org/alfresco/faces/jsp/browse/browse.jsp)
    * Navigate: Cabled array > Cruise data > (year of choice) > Ship Data > Water Sampling
    * Example: (year of choice) = 2021 is folder **Cabled-12_TN393_2021-07-30**
        * TN refers to the ship: Thompson. Atlantis = AT; Revelle = RR; Sikuliaq = SKQ (2016 cruise)
        * 393 refers to the cruise number
        * Cabled-12 indicates the OOI Regional Cabled Array, where 12 is simply an index number (by year)
        * The provided date 2021-07-30 indicates the year and approximate date of the cruise start
        * For the OOI cabled array the annual cruise uses Newport OR as the base of operations
        * The cruise consists of a series of sorties or 'legs', each typically 5-10 days
    * In the Water Sampling folder there are two files of interest
        * **Cabled-12_TN393_Discrete_Summary-README.pdf** is a guide to deciphering the data, particularly ephemeris flags
        * **Cabled-12_TN393_Discrete_Summary.csv** is a tabular data file 
            * This file is the **Discrete Summary** of interest
            * It summarizes cast sample data and synchronous continuous sensor data; see below

* **Discrete Summary** data are associated with Niskin bottle trigger events
    * A Niskin bottle is deployed in 'open' state on a submerged framework
        * Most commonly this framework is a [rosette](https://www.whoi.edu/what-we-do/explore/instruments/instruments-sensors-samplers/rosette-sampler/)
        * A Remotely Operated Vehicle (ROV) can also be used as a Niskin bottle framework
        * For our purposes here we just use the term 'rosette' to refer to the sample bottle framework
    * Submerged rosettes capture water samples in Niskin bottles
        * Rosettes also carry sensors that continuously record data, typically at one sample per second
            * Examples of continuous sensor data include seawater conductivity, temperature, depth, and fluorescence
    * Niskin bottles are lengths of pipe with two end caps
        * They are deployed in an open state so seawater flows through them
        * When they are triggered, caps under tension are released so as to seal the ends of the pipe
        * As a result a seawater sample from some depth is captured for later (shipboard laboratory) analysis
   
* Continuous data
    * Rosette downcast (from shipboard down through the water column) is optimal for continuous sensors
        * The water is relatively undisturbed as the rosette is lowered
    * Rosette upcast is when Niskin bottles are typically triggered
        * On ascent: the rosette is held at a trigger depth for 30 - 60 seconds before the actual trigger
            * This allows the sensors to stabilize / equilibrate
    * The continuous data is available under file extension **`.hex`** plus supporting **`.xmlcon`** files
        * These can be analyzed using (freely available) software from SeaBird: search 'seabird ctd software'
    * The Niskin bottle firing times are timestamped
        * The corresponding continuous data can be extracted, i.e. corresponding to the bottle firing times
        * This extracted continuous data is stored in 'bottle' files, extension **`.btl`**
    * A typical cast might take an hour; producing 3600 sensor values from a continuous sensor
        * That cast might include 24 Niskin bottle captures. 
        * Hence only '24' of the 3600 continuous sensor data values pertain to the **Discrete Summary**
            * The continuous profile (from the downcast) is dealt with separately.
                * Minor point: Some continuous sensors make use of 'plumbing' with intrinsic inter-sensor delays
                * Small time corrections ('alignments') are applied
    * Discrete sampling is used as a 'gold standard' to inform subsequent continuous profile data cleaning
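
The step of picking out the continuous samples that correspond to bottle closures can be sketched with a nearest-time match. This is only an illustration on synthetic data, not the procedure used to build the `.btl` files: the start time, temperature values and closure times below are all invented, and `pandas.merge_asof` stands in for whatever matching the processing software performs.

```python
import numpy as np
import pandas as pd

# Synthetic one-hour cast sampled once per second (values are illustrative only)
t0 = np.datetime64("2018-06-26T20:23:04", "ns")
seconds = np.arange(3600)
continuous = pd.DataFrame({
    "time": t0 + (seconds * 1_000_000_000).astype("timedelta64[ns]"),
    "temperature": np.linspace(12.0, 4.0, 3600),
})

# Synthetic closure times for 24 bottles, spread over the second half of the cast
closure_seconds = np.linspace(1800, 3500, 24)
bottles = pd.DataFrame({
    "niskin": np.arange(1, 25),
    "time": t0 + (closure_seconds * 1e9).astype("int64").astype("timedelta64[ns]"),
})

# For each bottle, take the continuous sample nearest in time to the closure
matched = pd.merge_asof(bottles, continuous, on="time", direction="nearest")
print(len(continuous), "continuous samples ->", len(matched), "matched to bottles")
```

Both tables must be sorted by `time` for `merge_asof`; here they are sorted by construction.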
    
    
### Additional details


* Clarify: Nitrate versus nitrite
* Clarify: Discrete versus Calculated pCO2 
* The repo folder **`DiscreteSummaries`** contains:
    * Tabular data (**`.csv`**) for 2015 - 2021
    * README metadata (**`.pdf`**) for 2015 - 2021
    * Renamed for simplicity: **`RCA2021_TN_DiscSumm.csv`** and **`README_RCA2021.pdf`** etcetera
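
The simplified file names carry the year and ship code, so they can be decoded programmatically. The sketch below assumes every file follows the `RCA<year>_<ship>_DiscSumm.csv` pattern shown above; `parse_summary_name` is a hypothetical helper and the ship table repeats the codes listed earlier in this section.

```python
import re

# Pattern for the simplified names described above, e.g. RCA2021_TN_DiscSumm.csv
NAME_PATTERN = re.compile(r"^RCA(?P<year>\d{4})_(?P<ship>[A-Z]+)_DiscSumm\.csv$")

# Ship codes as listed earlier in this section
SHIPS = {"TN": "Thompson", "AT": "Atlantis", "RR": "Revelle", "SKQ": "Sikuliaq"}

def parse_summary_name(filename):
    """Return (year, ship name) for a Discrete Summary file name, or None if it does not match."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        return None
    code = match.group("ship")
    return int(match.group("year")), SHIPS.get(code, code)

print(parse_summary_name("RCA2021_TN_DiscSumm.csv"))
```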


### Statement of the analytical problem


The high-level objective is to reach scientific insight from the OOI observational data. The 
corresponding fundamental question concerning shallow profiler data is 'How stable and reliable 
are the data from a given profile?' To approach this question we would like independent observations.
This is the value of the cruise water sample data and the associated continuous data from the 
rosette casts. To make progress here we have a series of comparatives: First comparing discrete
data from one cast to another, then secondly comparing discrete data upcasts with continuous data downcasts.
Thirdly we can compare upcast and downcast data with shallow profiler data. For brevity we adopt 
these terms:


* Upcast comparisons (discrete data between two rosette upcasts)
* Discrete-continuous comparisons (upcast versus downcast data)
* Rosette-Profiler comparisons 


These data run from 2016 through 2021 and come from month-long summer cruises. 2018 is of particular
focus here because it includes many successful casts. 

## Upcast distances


Two distinct rosette upcasts have both spatial and time separation.
This suggests coherence times and distances: How far (spatial / time) to move before cast data are uncorrelated? 
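
A first step toward coherence distances and times is simply to compute the separation between two casts. The sketch below uses the haversine great-circle formula for distance and a plain subtraction for the time gap. The two times are those listed for CTD-002 and CTD-006 in the 2018 cast list; the latitude and longitude values are placeholders, not the real cast positions, and the function names are hypothetical.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def separation(cast_a, cast_b):
    """Return (distance in km, time gap in days) between two casts."""
    dist = haversine_km(cast_a["lat"], cast_a["lon"], cast_b["lat"], cast_b["lon"])
    gap = abs(cast_b["time"] - cast_a["time"]).total_seconds() / 86400.0
    return dist, gap

# Times are from the 2018 cast list; positions are placeholders, not real coordinates
cast_002 = {"lat": 44.50, "lon": -125.40, "time": datetime(2018, 6, 26, 20, 23, 4)}
cast_006 = {"lat": 44.50, "lon": -125.39, "time": datetime(2018, 7, 10, 20, 45, 57)}

dist_km, gap_days = separation(cast_002, cast_006)
print(f"{dist_km:.2f} km apart, {gap_days:.2f} days apart")
```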

We have three shallow profiler sites (abbreviated OOS, OSB and AXB) and seven years of data: 2015 -- 2021.
The easiest place to start is to look at the cast summaries for each year to identify comparable casts.


* **`SP`** is a Shallow Profiler station. The cast location is stated relative to the station; the cast itself might go much deeper than 200 meters.
* **`DP`** is a Deep Profiler station
* **`CTD-002`** is a CTD cast with its index (skips are common)
* **`500m W`** indicates an offset from the station position
* Needed: Station differences for Deep and Shallow profilers at each site
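
Offsets such as `500m W` can be turned into east and north displacements in meters, which is convenient when estimating how far a cast was from a station. This is a minimal sketch, assuming offsets always take the form of a distance in meters followed by an optional compass point; `parse_offset` is a hypothetical helper, and any trailing text such as `(SP: 550m E)` is ignored.

```python
import math
import re

# Unit directions (east, north) for the compass points used in the cast lists
COMPASS = {
    "N": (0.0, 1.0), "S": (0.0, -1.0), "E": (1.0, 0.0), "W": (-1.0, 0.0),
    "NE": (1.0, 1.0), "NW": (-1.0, 1.0), "SE": (1.0, -1.0), "SW": (-1.0, -1.0),
}
OFFSET_PATTERN = re.compile(r"^\s*(\d+)\s*m\s*(NE|NW|SE|SW|N|S|E|W)?")

def parse_offset(text):
    """Convert an offset such as '500m W' to (east, north) displacement in meters."""
    match = OFFSET_PATTERN.match(text)
    if match is None:
        raise ValueError(f"unrecognized offset: {text!r}")
    distance = float(match.group(1))
    direction = match.group(2)
    if direction is None:
        if distance != 0.0:
            raise ValueError(f"offset has a distance but no direction: {text!r}")
        return 0.0, 0.0  # e.g. '0m': cast taken at the station position
    east, north = COMPASS[direction]
    norm = math.hypot(east, north)  # normalize diagonal directions to unit length
    return distance * east / norm, distance * north / norm

for text in ["500m W", "0m", "500m SW", "1000m E (SP: 550m E)"]:
    print(text, "->", parse_offset(text))
```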


### 2015

```
1.
```

### 2016

```
1.
```

### 2017

```
1.
```

### 2018


```
1. OSB SP, CTD-002, 2018-06-26T20:23:04.000Z, 500m W
2. AXB SP, CTD-003, 2018-07-02T20:59:24.000Z, 500m E
3. OSB SP, CTD-006, 2018-07-10T20:45:57.000Z, 250m W 
4. OSB SP, CTD-008, 2018-07-10T22:43:40.000Z, 500m W
5. OOS SP, CTD-009, 2018-07-15T04:39:48.000Z,   0m 
6. OOS SP, CTD-011, 2018-07-17T22:38:58.000Z, 250m E
7. OOS DP, CTD-013, 2018-07-23T23:20:04.000Z, 500m SW
8. OSB DP, CTD-014, 2018-07-25T17:47:23.000Z, 750m E
9. AXB SP, CTD-016, 2018-07-27T06:21:49.000Z, 500m E
```


Comparatives: 


* **`OSB 1 vs 3 vs 4 (vs 8, DP):    ['CTD-002', 'CTD-006', 'CTD-008', 'CTD-014']`**
* **`AXB we have 2 vs 9             ['CTD-003', 'CTD-016']`**
* **`OOS we have 5 vs 6 (vs 7, DP)  ['CTD-009', 'CTD-011', 'CTD-013']`**
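
The comparatives above can also be derived mechanically by grouping the cast list by site. The sketch below parses the 2018 list as plain text, assuming four comma-separated fields per line; the function name `casts_by_site` is hypothetical.

```python
from collections import defaultdict

# The 2018 cast list from above, without the leading item numbers
CAST_LIST_2018 = """\
OSB SP, CTD-002, 2018-06-26T20:23:04.000Z, 500m W
AXB SP, CTD-003, 2018-07-02T20:59:24.000Z, 500m E
OSB SP, CTD-006, 2018-07-10T20:45:57.000Z, 250m W
OSB SP, CTD-008, 2018-07-10T22:43:40.000Z, 500m W
OOS SP, CTD-009, 2018-07-15T04:39:48.000Z,   0m
OOS SP, CTD-011, 2018-07-17T22:38:58.000Z, 250m E
OOS DP, CTD-013, 2018-07-23T23:20:04.000Z, 500m SW
OSB DP, CTD-014, 2018-07-25T17:47:23.000Z, 750m E
AXB SP, CTD-016, 2018-07-27T06:21:49.000Z, 500m E
"""

def casts_by_site(text):
    """Group cast identifiers by site (OSB, AXB, OOS), keeping list order."""
    groups = defaultdict(list)
    for line in text.strip().splitlines():
        station, cast, _time, _offset = [field.strip() for field in line.split(",")]
        site = station.split()[0]  # 'OSB SP' -> 'OSB'
        groups[site].append(cast)
    return dict(groups)

site_casts = casts_by_site(CAST_LIST_2018)
for site, casts in site_casts.items():
    print(site, casts)
```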


### 2019


```
1. OOS SP, CTD-005, 2019-06-13T21:38:14.000Z,  250m W
2. OSB SP, CTD-006, 2019-06-14T11:24:21.000Z,  250m W
3. AXB DP, CTD-007, 2019-06-17T22:27:51.000Z, 1000m E (SP: 550m E)
4. OOS SP, CTD-008, 2019-06-24T05:30:10.000Z,  250m W
5. OSB SP, CTD-010, 2019-06-27T19:38:18.000Z,  250m W
6. OOS SP, CTD-012, 2019-06-28T17:31:26.000Z,  250m W
7. AXB DP, CTD-013, 2019-07-07T06:22:44.000Z, 1000m E (SP: 550m E)
8. AXB DP, CTD-014, 2019-07-07T08:17:00.000Z, 1000m E (SP: 550m E)
```


Comparatives: 


* OSB we have 2 vs 5
* AXB we have 3 vs 7 vs 8
* OOS we have 1 vs 4 vs 6



### 2020


```
1. OSB SP, CTD-001, 2020-08-03T15:17:41.000Z,    0m
2. AXB DP, CTD-003, 2020-08-08T21:53:32.000Z, 1000m E
3. AXB DP, CTD-004, 2020-08-08T23:36:18.000Z, 1000m E
4. OSB SP, CTD-006, 2020-08-11T04:28:22.000Z, 1300m NE
5. OSB SP, CTD-007, 2020-08-11T05:48:08.000Z, 1300m NE
6. OOS SP, CTD-009, 2020-08-11T23:39:18.000Z,  500m W
7. OOS SP, CTD-010, 2020-08-12T00:49:20.000Z,  500m W
```


Comparatives: 


* OSB we have 1 vs 4 vs 5
* AXB we have 2 vs 3
* OOS we have 6 vs 7


### 2021


```
1. AXB SP, CTD-001, 2021-08-03T01:25:44.000Z, 500m NW
2. AXB SP, CTD-003, 2021-08-04T13:48:19.000Z, 500m E
3. OSB SP, CTD-005, 2021-08-06T01:43:57.000Z, 500m SW
4. OOS SP, CTD-006, 2021-08-26T09:16:44.000Z, 500m E
5. AXB DP, CTD-007, 2021-09-01T05:03:07.000Z, 1000m E
```


Comparatives:


* AXB we have 1 versus 2 versus 5

The **numpy** library has a **datetime64** type used in this notebook. The Python standard library has a related 
type called **datetime**, and pandas has its own **Timestamp** type. 
There is also a common need to work out time differences, for example between two **timestamps**. For an explanation of 
Python constructs such as **`datetime`**, **`datetime64`** and **`timestamp`**: Search for a good web
resource such as
[this one](https://www.kite.com/python/answers/how-to-convert-between-datetime,-timestamp,-and-datetime64-in-python).
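
As a quick orientation, the sketch below shows one instant in three representations (standard library `datetime`, numpy `datetime64`, pandas `Timestamp`), converts between them, and computes a time gap and a day of year. The two times are those listed for CTD-002 and CTD-006 in the 2018 cast list; the variable names are chosen for this example only.

```python
from datetime import datetime

import numpy as np
import pandas as pd

# The same instant (the time listed for cast CTD-002 above) in three representations
as_datetime = datetime(2018, 6, 26, 20, 23, 4)
as_dt64 = np.datetime64("2018-06-26T20:23:04")
as_timestamp = pd.Timestamp("2018-06-26T20:23:04")

# Conversions between the three types
assert pd.Timestamp(as_dt64) == as_timestamp
assert as_timestamp.to_pydatetime() == as_datetime
assert as_timestamp.to_datetime64() == as_dt64

# Time difference between two casts, expressed in days
gap = np.datetime64("2018-07-10T20:45:57") - as_dt64
gap_days = gap / np.timedelta64(1, "D")

# Day of year (1 January is day 1)
year_start = np.datetime64(str(as_dt64)[0:4] + "-01-01")
day_of_year = 1 + int((as_dt64 - year_start) / np.timedelta64(1, "D"))
print(round(gap_days, 3), day_of_year)
```

Dividing a `timedelta64` by `np.timedelta64(1, "D")` gives a plain float number of days, which is usually the most readable form for cast-to-cast gaps.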


The following cell reads in discrete summary data (CSV files pulled from the OOI alfresco server) for use in 
intercomparison of CTD casts and comparison of those casts with shallow profiler data. 

In [1]:
import os

import numpy as np
import pandas as pd

# Map each Discrete Summary column of interest to a short label that is simpler to code against
DISCRETE_SUMMARY_COLUMNS = {
    "Station":                            "station",
    "Start Latitude [degrees]":           "lat",
    "Start Longitude [degrees]":          "lon",
    "Start Time [UTC]":                   "cast start",
    "Cast":                               "cast",
    "Cast Flag":                          "cast flag",
    "Bottom Depth at Start Position [m]": "bottom depth",
    "CTD File Flag":                      "ctd flag",
    "Niskin/Bottle Position":             "niskin",
    "Niskin Flag":                        "niskin flag",
    "CTD Bottle Closure Time [UTC]":      "closure time",
    "CTD Pressure [db]":                  "ctd pressure",
    "CTD Depth [m]":                      "ctd depth",
    "CTD Latitude [deg]":                 "ctd lat",
    "CTD Longitude [deg]":                "ctd lon",
    "CTD Temperature 1 [deg C]":          "temp1",
    "CTD Temperature 2 [deg C]":          "temp2",
    "CTD Salinity 1 [psu]":               "sal1",
    "CTD Salinity 2 [psu]":               "sal2",
    "CTD Oxygen [mL/L]":                  "oxygen",
    "CTD Oxygen Saturation [mL/L]":       "saturation",
    "CTD Fluorescence [mg/m^3]":          "fluorescence",
    "CTD Beam Attenuation [1/m]":         "attenuation",
    "CTD Beam Transmission [%]":          "transmission",
    "CTD pH":                             "pH",
    "Discrete Oxygen [mL/L]":             "discrete oxygen",
    "Discrete Chlorophyll [ug/L]":        "discrete chlorophyll",
    "Discrete Phaeopigment [ug/L]":       "discrete phaeopigment",
    "Discrete Fo/Fa Ratio":               "discrete Fo/Fa",
    "Discrete Phosphate [uM]":            "discrete phosphate",
    "Discrete Silicate [uM]":             "discrete silicate",
    "Discrete Nitrate [uM]":              "discrete nitrate",
    "Discrete Nitrite [uM]":              "discrete nitrite",
    "Discrete Ammonium [uM]":             "discrete ammonium",
    "Discrete Salinity [psu]":            "discrete salinity",
    "Discrete DIC [umol/kg]":             "discrete DIC",
    "Discrete pH [Total scale]":          "discrete pH",
    "Calculated DIC [umol/kg]":           "calculated DIC",
    "Calculated pCO2 [uatm]":             "calculated pCO2",
    "Calculated pH":                      "calculated pH",
    "Calculated CO2aq [umol/kg]":         "calculated CO2aq",
    "Calculated Bicarb [umol/kg]":        "calculated bicarb",
    "Calculated CO3 [umol/kg]":           "calculated CO3",
    "Calculated Omega-C":                 "calculated Omega-C",
    "Calculated Omega-A":                 "calculated Omega-A",
}

def ReadDiscreteSummary(fnm):
    """
    Discrete summary files bring together three data sources: Niskin bottle sample analysis data (with 
    discrete sample times), calculated (post-processed) data values, and continuous sensor data subsets
    (from times that correspond to Niskin bottle captures). The discrete summary is a CSV file with rows
    corresponding to Niskin bottle samples and columns corresponding to these three data types. The
    CSV file typically has many missing values, flagged as -9999999. The file is loaded into a DataFrame 
    which selects out a large subset of the full complement of columns. These selected columns are 
    subsequently renamed to make the data a little simpler to code against. 'CTD Temperature 1 [deg C]' 
    becomes 'temp1' for example. Time values are then converted to datetime64 (herein abbreviated dt64). 
    This applies to both cast start time -- typically when the rosette entered the water -- and the 
    Niskin closure times.
    """

    # Read the Discrete Summary CSV into a DataFrame dsDf.
    # read_csv ignores the order of usecols, so columns are renamed by name rather than by position.
    dsDf = pd.read_csv(fnm, usecols=list(DISCRETE_SUMMARY_COLUMNS)).rename(columns=DISCRETE_SUMMARY_COLUMNS)

    # Convert time strings to datetimes; entries that cannot be parsed (e.g. fill values) become NaT
    for col in ['cast start', 'closure time']:
        dsDf[col] = pd.to_datetime(dsDf[col], errors='coerce')

    # Replace the fill value with NaN.
    # Note: using df.T will transpose the DataFrame: index becomes the above attributes and columns are samples
    return dsDf.replace(-9999999.0, np.nan)

dsDf = ReadDiscreteSummary(os.path.join(os.getcwd(), "DiscreteSummaries", "RCA2018_RR_DiscSumm.csv"))

In [2]:
dsA, dsB, dsC, dsT, dsS, dsO, dsH, dsI, dsN, dsP, dsU, dsV, dsW = ReadOSB_JuneJuly2018_1min()

NameError: name 'ReadOSB_JuneJuly2018_1min' is not defined

In [3]:
# OSB 2018 (see comparatives above)
cast_list = ['CTD-002', 'CTD-006', 'CTD-008', 'CTD-014']
a = [GetDiscreteSummaryCastSubset(dsDf, c, ['closure time', 'ctd depth', 'temp1', 'temp2',  \
                                            'sal1', 'sal2', 'oxygen', 'discrete oxygen',    \
                                            'fluorescence', 'discrete chlorophyll',         \
                                            'pH', 'discrete pH', 'calculated pH',           \
                                            'discrete nitrate', 'discrete salinity',        \
                                            'discrete phaeopigment']) for c in cast_list]

print(a[0]['closure time'].iloc[0], '---', a[1]['closure time'].iloc[0], '---',\
      a[2]['closure time'].iloc[0], '---', a[3]['closure time'].iloc[0], '\n')


# notes: how to get into human-readable time gaps:
# 1 + int((theDatetime - dt64(str(theDatetime)[0:4] + '-01-01')) / td64(1, 'D'))
pIdcs_CTD002, pDf18 = GetProfileDataFrameIndicesForSomeTime('osb', '2018', a[0]['closure time'].iloc[0], 60)
pIdcs_CTD006, pDf18 = GetProfileDataFrameIndicesForSomeTime('osb', '2018', a[1]['closure time'].iloc[0], 60)

# For CTD-002 the comparative profiler Dataframe index is 617; for CTD-006 it is 640

for idx in pIdcs_CTD002: print(pDf18['ascent_start'][idx], '  ~~compare to cast~~>  ', a[0]['closure time'].iloc[0])
for idx in pIdcs_CTD006: print(pDf18['ascent_start'][idx], '  ~~compare to cast~~>  ', a[1]['closure time'].iloc[0])

NameError: name 'GetDiscreteSummaryCastSubset' is not defined

In [4]:
pDf18['ascent_start'][617]

NameError: name 'pDf18' is not defined
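
The helper `GetDiscreteSummaryCastSubset` used above is not defined in this section. As a rough idea of what such a function might do, the sketch below filters a Discrete Summary DataFrame to one cast and a chosen set of columns. It is a hypothetical stand-in, not the repository's implementation, and it is demonstrated on a tiny synthetic table that uses the renamed column labels from `ReadDiscreteSummary`.

```python
import pandas as pd

def cast_subset(df, cast, columns):
    """Hypothetical stand-in: rows of df for one cast label, restricted to the named columns."""
    return df.loc[df["cast"] == cast, columns].reset_index(drop=True)

# Tiny synthetic table (values are invented) with the renamed column labels
demo = pd.DataFrame({
    "cast": ["CTD-002", "CTD-002", "CTD-006"],
    "ctd depth": [200.0, 100.0, 150.0],
    "temp1": [7.1, 8.4, 7.6],
})

subset = cast_subset(demo, "CTD-002", ["ctd depth", "temp1"])
print(subset)
```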

## Comparatives

The next cells compare shallow profiler data with discrete summary data. This table gives correspondence terms. 

```
Temperature: dsT / Ts0.temp        compares with two sensors 'temp1' and 'temp2'
Salinity:    dsS / Ss0.salinity    compares with 'sal1', 'sal2' and 'discrete salinity' 
DOxygen:     dsO / Os0.doxygen     compares with 'oxygen' and 'discrete oxygen'
chlor-a:     dsA / As0.chlora      compares with 'fluorescence' and 'discrete chlorophyll'
backscatter: dsB / Bs0.backscatter No obvious comparison...
cdom:        dsC / Cs0.cdom        No obvious comparison...
PAR:                               No obvious comparison...
SPKIR:                             No obvious comparison...
current:                           No obvious comparison

pH:                                compares with 'pH', 'discrete pH' 
nitrate:                           compares with 'discrete nitrate'
```
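
The plots that follow compare profiles by eye. A numerical comparison is also possible by interpolating the profiler profile onto the cast's bottle depths and differencing. The sketch below is one way to do this, under the assumption that both profiles are expressed as positive depth in meters (the profiler `z` coordinate in this notebook is negative, hence the `-Ts0.z` in the plotting cells). All values here are synthetic and the function name is hypothetical.

```python
import numpy as np

def profiler_minus_cast(profiler_depth, profiler_value, cast_depth, cast_value):
    """Interpolate a profiler profile onto cast depths; return profiler minus cast values."""
    profiler_depth = np.asarray(profiler_depth, dtype=float)
    profiler_value = np.asarray(profiler_value, dtype=float)
    order = np.argsort(profiler_depth)  # np.interp needs increasing x values
    interpolated = np.interp(cast_depth, profiler_depth[order], profiler_value[order],
                             left=np.nan, right=np.nan)
    return interpolated - np.asarray(cast_value, dtype=float)

# Synthetic profile: temperature falls linearly from 12 C at 10 m to 7 C at 190 m
profiler_depth = np.linspace(190.0, 10.0, 181)  # ascending profile: deep to shallow
profiler_temp = 12.0 - 5.0 * (profiler_depth - 10.0) / 180.0

# Synthetic cast samples; the 500 m sample lies below the profiler depth range
cast_depth = np.array([20.0, 100.0, 180.0, 500.0])
cast_temp = np.array([11.8, 9.5, 7.2, 5.0])

diff = profiler_minus_cast(profiler_depth, profiler_temp, cast_depth, cast_temp)
rms = np.sqrt(np.nanmean(diff ** 2))
print(diff, rms)
```

Cast samples outside the profiler depth range come back as NaN rather than being extrapolated, so they drop out of the RMS difference.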

In [5]:
# Temperature (rosette has two temperature sensors 'temp1' and 'temp2')

# Shallow profiler data subsets
Ts0 = dsT.sel(time=slice(pDf18['ascent_start'][pIdcs_CTD002[0]], pDf18['ascent_end'][pIdcs_CTD002[0]]))
Ts1 = dsT.sel(time=slice(pDf18['ascent_start'][pIdcs_CTD006[0]], pDf18['ascent_end'][pIdcs_CTD006[0]]))

# two shallow casts 14 days apart are: a[0] and a[1]
fig, axs = plt.subplots(2, 1, figsize=(8, 10), tight_layout=True)
axs[0].plot(a[0]['temp1'], a[0]['ctd depth'], linewidth=6, color='m')
axs[0].plot(a[0]['temp2'], a[0]['ctd depth'], color='k', marker='v', ms=9.)
axs[1].plot(a[1]['temp1'], a[1]['ctd depth'], linewidth=6, color='m')
axs[1].plot(a[1]['temp2'], a[1]['ctd depth'], color='k')
axs[0].set(ylim = (240., 0.), title='First Cast + Profiler: Temperature')
axs[1].set(ylim = (240., 0.), title='Second Cast + Profiler')

axs[0].plot(Ts0.temp, -Ts0.z, marker='.', ms=12., color='k', mfc='y')
axs[1].plot(Ts1.temp, -Ts1.z, marker='.', ms=12., color='k', mfc='y')

print()

NameError: name 'dsT' is not defined

In [6]:
# Salinity: sal1, sal2, discrete salinity

# Shallow profiler data subsets
Ss0 = dsS.sel(time=slice(pDf18['ascent_start'][pIdcs_CTD002[0]], pDf18['ascent_end'][pIdcs_CTD002[0]]))
Ss1 = dsS.sel(time=slice(pDf18['ascent_start'][pIdcs_CTD006[0]], pDf18['ascent_end'][pIdcs_CTD006[0]]))

# compare the two shallow casts a[0] and a[1]
fig, axs = plt.subplots(2, 1, figsize=(8, 10), tight_layout=True)
axs[0].plot(a[0]['sal1'], a[0]['ctd depth'], linewidth=6, color='m')
axs[0].plot(a[0]['sal2'], a[0]['ctd depth'], color='k')
axs[1].plot(a[1]['sal1'], a[1]['ctd depth'], linewidth=6, color='g')
axs[1].plot(a[1]['sal2'], a[1]['ctd depth'], color='y')

axs[0].scatter(a[0]['discrete salinity'], a[0]['ctd depth'], color='g', s=400.)
axs[1].scatter(a[1]['discrete salinity'], a[1]['ctd depth'], color='k', s=400.)

axs[0].set(ylim = (240., 0.), title='First Cast + Profiler: Salinity')
axs[1].set(ylim = (240., 0.), title='Second Cast + Profiler')

axs[0].scatter(Ss0.salinity, -Ss0.z, color='b', s=49.)
axs[1].scatter(Ss1.salinity, -Ss1.z, color='b', s=49.)

print()

NameError: name 'dsS' is not defined

In [7]:
# Dissolved Oxygen

empirical_scalar = 42.

# Shallow profiler data subsets
Os0 = dsO.sel(time=slice(pDf18['ascent_start'][pIdcs_CTD002[0]], pDf18['ascent_end'][pIdcs_CTD002[0]]))
Os1 = dsO.sel(time=slice(pDf18['ascent_start'][pIdcs_CTD006[0]], pDf18['ascent_end'][pIdcs_CTD006[0]]))

# compare the two shallow casts a[0] and a[1]
fig, axs = plt.subplots(2, 1, figsize=(8, 10), tight_layout=True)
axs[0].plot(empirical_scalar*a[0]['oxygen'], a[0]['ctd depth'], linewidth=6, color='m')
axs[1].plot(empirical_scalar*a[1]['oxygen'], a[1]['ctd depth'], linewidth=6, color='g')

axs[0].scatter(empirical_scalar*a[0]['discrete oxygen'], a[0]['ctd depth'], color='g', s = 400.)
axs[1].scatter(empirical_scalar*a[1]['discrete oxygen'], a[1]['ctd depth'], color='k', s = 400.)

axs[0].set(ylim = (240., 0.), title='First Cast + Profiler: Dissolved Oxygen')
axs[1].set(ylim = (240., 0.), title='Second Cast + Profiler')

axs[0].scatter(Os0.doxygen, -Os0.z, color='b', s=49.)
axs[1].scatter(Os1.doxygen, -Os1.z, color='b', s=49.)

print()

NameError: name 'dsO' is not defined
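
The `empirical_scalar = 42.` above brings rosette oxygen, reported in mL/L, onto the scale of the profiler oxygen. If the profiler values are in umol/kg (an assumption; the units are not stated in this section), the factor can be estimated from the standard conversion 1 mL/L = 44.661 umol/L divided by seawater density in kg/L. The sketch below uses a nominal density of 1025 kg/m^3, which is also an assumption; the function name is hypothetical.

```python
ML_PER_L_TO_UMOL_PER_L = 44.661  # micromoles of O2 per milliliter of O2 gas at STP

def oxygen_ml_per_l_to_umol_per_kg(oxygen_ml_per_l, density_kg_per_m3=1025.0):
    """Convert dissolved oxygen from mL/L to umol/kg for seawater of the given density."""
    density_kg_per_l = density_kg_per_m3 / 1000.0
    return oxygen_ml_per_l * ML_PER_L_TO_UMOL_PER_L / density_kg_per_l

factor = oxygen_ml_per_l_to_umol_per_kg(1.0)
print(round(factor, 2))
```

Under these assumptions the factor comes out near 43.6, close to the empirical value of 42 used in the cell above. Using the in-situ density computed from temperature, salinity and pressure would refine it.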

In [8]:
# Chlorophyll

# Shallow profiler data subsets
As0 = dsA.sel(time=slice(pDf18['ascent_start'][pIdcs_CTD002[0]], pDf18['ascent_end'][pIdcs_CTD002[0]]))
As1 = dsA.sel(time=slice(pDf18['ascent_start'][pIdcs_CTD006[0]], pDf18['ascent_end'][pIdcs_CTD006[0]]))

# compare the two shallow casts a[0] and a[1]
fig, axs = plt.subplots(2, 1, figsize=(8, 10), tight_layout=True)
axs[0].plot(a[0]['fluorescence'], a[0]['ctd depth'], lw=3, color='m', marker='v', ms=5., mfc='k')
axs[1].plot(a[1]['fluorescence'], a[1]['ctd depth'], lw=3, color='m', marker='v', ms=5., mfc='k')
axs[0].set(ylim = (240., 0.), ylabel='depth (m)', xlabel='Chlor-A mass concentration, ug/l')
axs[0].set(title='Chlor-a: Shallow profiler (blue) versus cast data\n' + \
                  'Cast: green is sample bottle lab analysis; magenta is a rosette fluorometer')
axs[1].set(ylim=(240., 0.), ylabel='depth (m)', xlabel='Chlor-A mass concentration, ug/l')
axs[1].set(title='Chlor-a: Shallow profiler (blue) versus cast data\n' + \
                  'Cast: green is sample bottle lab analysis; magenta is a rosette fluorometer')

# shallow profiler
axs[0].plot(As0.chlora, -As0.z, marker='.', color='b', ms=10.)
axs[1].plot(As1.chlora, -As1.z, marker='.', color='b', ms=10.)

# niskin bottles
axs[0].plot(a[0]['discrete chlorophyll'], a[0]['ctd depth'], marker='^', color='g', ms=8.)
axs[1].plot(a[1]['discrete chlorophyll'], a[1]['ctd depth'], marker='^', color='g', ms=8.)


print()

NameError: name 'dsA' is not defined

In [9]:
As0.chlora.attrs

NameError: name 'As0' is not defined

In [10]:
As1.coords

NameError: name 'As1' is not defined

In [11]:
# CDOM

# Shallow profiler data subsets
Cs0 = dsC.sel(time=slice(pDf18['ascent_start'][pIdcs_CTD002[0]], pDf18['ascent_end'][pIdcs_CTD002[0]]))
Cs1 = dsC.sel(time=slice(pDf18['ascent_start'][pIdcs_CTD006[0]], pDf18['ascent_end'][pIdcs_CTD006[0]]))

# compare the two shallow casts a[0] and a[1]
fig, axs = plt.subplots(2, 1, figsize=(8, 10), tight_layout=True)
axs[0].set(ylim = (240., 0.), title='First Cast + Profiler: Fluorescence / CDOM')
axs[1].set(ylim = (240., 0.), title='Second Cast + Profiler')

axs[0].scatter(Cs0.cdom, -Cs0.z, color='b', s=49.)
axs[1].scatter(Cs1.cdom, -Cs1.z, color='b', s=49.)

print()

NameError: name 'dsC' is not defined

In [12]:
# pH

# Shallow profiler data subsets
Hs0 = dsH.sel(time=slice(pDf18['ascent_start'][pIdcs_CTD002[0]], pDf18['ascent_end'][pIdcs_CTD002[0]]))
Hs1 = dsH.sel(time=slice(pDf18['ascent_start'][pIdcs_CTD006[0]], pDf18['ascent_end'][pIdcs_CTD006[0]]))

# compare the two shallow casts a[0] and a[1]
fig, axs = plt.subplots(2, 1, figsize=(8, 10), tight_layout=True)
axs[0].plot(a[0]['pH'], a[0]['ctd depth'], linewidth=6, color='m')
# axs[0].plot(a[0]['discrete pH'], a[0]['ctd depth'], color='k')
axs[1].plot(a[1]['pH'], a[1]['ctd depth'], linewidth=6, color='g')
# axs[1].plot(a[1]['discrete pH'], a[1]['ctd depth'], color='y')
axs[0].set(ylim = (240., 0.), title='First Cast + Profiler: pH')
axs[1].set(ylim = (240., 0.), title='Second Cast + Profiler')

# Modify four elements of these two lines: Xs0, Xs0, Xs1, Xs1
# Modify the sensor type (e.g. .temp) to reflect X
axs[0].scatter(Hs0.ph, -Hs0.z, color='b', s=49.)
axs[1].scatter(Hs1.ph, -Hs1.z, color='b', s=49.)

print()

NameError: name 'dsH' is not defined

In [13]:
Hs0

NameError: name 'Hs0' is not defined

In [14]:
# Nitrate

# Shallow profiler data subsets
Ns0 = dsN.sel(time=slice(pDf18['ascent_start'][pIdcs_CTD002[0]], pDf18['ascent_end'][pIdcs_CTD002[0]]))
Ns1 = dsN.sel(time=slice(pDf18['ascent_start'][pIdcs_CTD006[0]], pDf18['ascent_end'][pIdcs_CTD006[0]]))

# compare the two shallow casts a[0] and a[1]
fig, axs = plt.subplots(2, 1, figsize=(8, 10), tight_layout=True)
axs[0].scatter(a[0]['discrete nitrate'], a[0]['ctd depth'], color='m', s=400.)
axs[1].scatter(a[1]['discrete nitrate'], a[1]['ctd depth'], color='g', s=400.)
axs[0].set(ylim = (240., 0.), title='First Cast + Profiler: Nitrate')
axs[1].set(ylim = (240., 0.), title='Second Cast + Profiler')

# Modify four elements of these two lines: Xs0, Xs0, Xs1, Xs1
# Modify the sensor type (e.g. .temp) to reflect X
axs[0].scatter(Ns0.nitrate, -Ns0.z, color='b', s=49.)
axs[1].scatter(Ns1.nitrate, -Ns1.z, color='b', s=49.)

print()

NameError: name 'dsN' is not defined

In [15]:
# compare casts a[2] (CTD-008) and a[3] (CTD-014, at the deep profiler station)
fig, axs = plt.subplots(figsize=(15, 10), tight_layout=True)
axs.plot(a[2]['temp1'], a[2]['ctd depth'], linewidth=6, color='m')
axs.plot(a[2]['temp2'], a[2]['ctd depth'], color='k')
axs.plot(a[3]['temp1'], a[3]['ctd depth'], linewidth=6, color='g')
axs.plot(a[3]['temp2'], a[3]['ctd depth'], color='y')
axs.set(ylim = (3000., 0.), title='Cast and Profiler Temperatures'); 
print()

NameError: name 'plt' is not defined

In [16]:
# This cell gives us an idea (for a given time interval, in months) how much of the time the shallow profiler was working
# Here it is configured for June+July, 2018. The shallow profiler ran for the end of June and much of July

year = '2018'
month = '06'
next_month = '08'
t0, t1 = dt64(year + '-' + month + '-01'), dt64(year + '-' + next_month + '-01')
nDays = (t1 - t0).astype(int)
nTotal, nMidn, nNoon = ProfileEvaluation(t0, t1, pDf18)

print("For " + year + " and month " + month + ': There are ')
print(nDays, 'days or', nDays*9, 'possible profiles')
print("There were, over this time, in fact:")
print(nTotal, 'profiles;', nMidn, 'at local midnight and', nNoon, 'at local noon')
print()


NameError: name 'dt64' is not defined

### Notes from a conversation with the RCA Data Manager


The goal here is to improve the discrete comparison above by looking at the continuous signals 
from the Rosette casts. 


Procedure: Navigate the alfresco server to get two files associated with each cast.


- cabled array
- cruise data
- ship data
- CTD data


All of the files that come off the CTD are here.
Recall OOI uses "streams" for fine-grain signals. 
For example DO or chlorophyll streams would be tied to the rosette CTD, i.e. the sensor is plugged in as an analog input. 
The voltages are recorded in a `.hex` file.
There are two tasks: Identifying the right voltage and converting it to a physical parameter value.
For the latter the software we use (free from SeaBird) is called SeaCalc. It runs on Windows and 
makes use of a metadata file with extension `.XMLCON`.


```
TN382_CTD009.XMLCON
TN382_CTD009.hex
```


The output will be a CSV-like file with considerable metadata at the top.


The voltages are digitized at some steady sampling rate, more on this below, so they have a
sampling interval, a UTC start time, and a sample index. 


Sidebar on Niskin bottles firing: The system is designed to clip segments of continuous data
via index that correspond to 30 seconds or so at the time of a Niskin bottle sample capture.
Hence each Niskin bottle is associated with two scan indices for a continuous sample range.
When the data are sub-sampled according to these start/stop indices a `.bl` (for bottle)
file is produced. That's what we are vaulting past with this procedure.


Sidebar on casts: The downcast is the rosette dropping into pristine water; so that is 
considered qualitatively better data from the continuous sensors (CTD, fluorometer etc).
The up-cast -- with pauses to equilibrate -- is when the Niskin bottles fire. There are 
some additional arcane procedures associated with fine tuning the data (out of scope).


- `.hex`, `.XMLCON` > SeaCalc > `out.cnv`
- Run > Data Conversion > Wizard
- One hex file per cast
- Select output variables of interest


Output file will consist of a lot of header information and a CSV-like data matrix.
`NMEA UTC time` is the data start time. Use `interval/seconds` to calculate sample
time; so for example 24 Hz corresponds to intervals of about 0.042 seconds (1/24 s) and each scan line
follows this interval. 
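
The sample-time calculation can be sketched as follows: given a start time and a steady sampling rate, each scan index maps to a timestamp. The start time below is a placeholder (in practice it would come from the file header), the 24 Hz rate is the example rate mentioned above, and `scan_times` is a hypothetical helper.

```python
import numpy as np

def scan_times(start_time, n_scans, rate_hz=24.0):
    """Return a datetime64[ns] array of scan times for a steady sampling rate."""
    start = np.datetime64(start_time, "ns")
    # Offset of each scan from the start, rounded to the nearest nanosecond
    offsets_ns = np.round(np.arange(n_scans) * 1e9 / rate_hz).astype("int64")
    return start + offsets_ns.astype("timedelta64[ns]")

# Placeholder start time; 49 scans at 24 Hz span exactly two seconds
times = scan_times("2018-06-26T20:00:00", n_scans=49)
print(times[0], times[24], times[48])
```

Computing each offset from the scan index, rather than accumulating a rounded interval, keeps rounding error from building up over a long cast.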
