<!--QC_INFORMATION-->
<img align="left" style="padding-right:10px;" width="250" src="_code_software/img/Marine_logo.jpg"><br>
<h1><center>Ocean Science - CTD Profiles Processing notebook</center></h1>
<span style='color:Blue'>  

### This Notebook is a self-contained SOP

### There are two main types of Jupyter Notebook cells:
- Markdown cells (like the one you are reading) display human readable text
- Code cells contain blocks of Python code used for specific computing tasks    
### This notebook is operated by running a series of code cells
### Users are encouraged to run each code cell individually using 'Run' in the toolbar above
### Actual code cells can be hidden (default) or displayed through the toggle that will appear when you run the first cell
### See below for some hints and tips
- No prior knowledge of Python is required to run these Notebooks
- Running cells and running the notebook itself will quickly become familiar to new users
- Hidden or invisible code cells can be selected by clicking just underneath the preceding markdown cell
- Minimal adjusting of folder directory paths within the code is required on the first run only
- When a cell has just run, the position of the Notebook may jump up or down; scrolling to reorientate yourself will quickly become routine
- When a cell is still running (some are slow) an egg-timer will replace the book icon on the very top bar of the browser
    - Also, if a cell is running it will have a * inside the cell identifier to the top left of that cell: i.e. In [*]:
    - if the cell has not yet run it will be empty: In [ ]:
    - if the cell has just completed running, it will have the latest run number displayed, e.g. In [27]:
### A small number of cells will require or accept user input through a dropdown interface
### Note: most code cells, when run, will output relevant information underneath them
        
</span>


## Checklist for CTD operator in preparation for, and during the survey:

1. Get copies of the sensor calibration sheets for all sensors on the CTD for the duration of the survey.
2. Check the master XMLCON file has been populated with the correct calibration information.
3. Populate a CTD logsheet and water sampling logsheet for every cast.
4. Follow the CTD data acquisition checklist/SOP during each cast.
5. If deployed on the CTD, download the SBE35 after each cast. Include the CTD ID/number in the filename e.g. SBE35_{CRUISEID}_CTD{#}.asc.
6. Add logsheet information to Excel template and name {CRUISEID}_Log.xlsx.
7. At the end of the survey, scan all CTD logsheets and bottle firing logsheets and save them in the logsheets folder as a PDF.
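
The naming conventions in steps 5 and 6 can be generated rather than typed by hand. This is a minimal sketch, assuming cast numbers are zero-padded to three digits (as in filenames such as `CE21003_CTD002`). The helper names are hypothetical and are not part of the notebook's `scripts` package.

```python
def sbe35_filename(cruise_id: str, cast_number: int) -> str:
    """Build an SBE35 download filename of the form SBE35_{CRUISEID}_CTD{#}.asc."""
    # Three-digit zero padding is an assumption, not a documented requirement
    return f"SBE35_{cruise_id}_CTD{cast_number:03d}.asc"


def logsheet_filename(cruise_id: str) -> str:
    """Build the digitised logsheet filename of the form {CRUISEID}_Log.xlsx."""
    return f"{cruise_id}_Log.xlsx"


print(sbe35_filename("CE21003", 2))
print(logsheet_filename("CE21003"))
```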

## Processing overview
<img align="centre" src="_code_software/img/overview.jpg">

   
## The first cell carries out a number of tasks:
- It loads all Python toolboxes needed by the Notebook as follows:

    Toolbox versions that software was developed to use:
    - json 2.0.9
    - ipywidgets 8.0.4
    - numpy 1.24.3
    - pandas 1.5.3
    - re 2.2.1
    - seawater 3.3.4
    - bokeh 3.1.0
    - xlrd 2.0.1

 
- It sets the location of the master Sea-Bird .psa templates used to create the PSA files for the different SBE processing steps. This is hardcoded; please do not change it.
- It provides a toggle switch to hide or display the Python code cells
- Once run it requests user input as described below:
   
### Set cruise name
Insert Cruise ID below. Cruise ID should be in the format CXxxxxx

### Set variable "proc_mode" to indicate file conversion mode: 
    0 = process all files through Sea-Bird application; 
    1 = only process new files through Sea-Bird application.

### Set variable "heave_mode" to indicate whether the heave function is required (strongly recommended for most cases)
    0 = Run Heave function [Default/Recommended]; 
    1 = Bypass the Heave function.
    
### Set default oxygen alignment value
As the CTD package profiles the water column, a given parcel of water passes over the oxygen sensor slightly later than the temperature and conductivity sensors. The oxygen values are therefore slightly out of step with the temperature and conductivity values recorded against the same timestamp in the data file. Experience with the MI CTD rig setup suggests that a value of 2 seconds aligns the oxygen sensor output with temperature and conductivity.
This will be visually checked during processing and can be adjusted at that stage, where appropriate.
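
As a rough illustration of what an alignment of 2 seconds means, the sketch below advances an oxygen column relative to the other channels by a whole number of scans. It assumes a pandas DataFrame with a hypothetical `oxygen` column and a known scan rate; it is not the alignment routine used later in this notebook.

```python
import pandas as pd


def align_oxygen(df, oxygen_col, advance_seconds, scan_rate_hz):
    """Return a copy of df with the oxygen column advanced in time."""
    n_scans = int(round(advance_seconds * scan_rate_hz))
    aligned = df.copy()
    # The reading logged at scan i + n_scans belongs to the water that
    # temperature and conductivity sampled at scan i, so shift it earlier.
    aligned[oxygen_col] = df[oxygen_col].shift(-n_scans)
    return aligned


# Hypothetical 2 Hz data: a 2 second advance corresponds to 4 scans
profile = pd.DataFrame({
    "timeS": [0.0, 0.5, 1.0, 1.5, 2.0, 2.5],
    "oxygen": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0],
})
aligned = align_oxygen(profile, "oxygen", advance_seconds=2, scan_rate_hz=2)
print(aligned)
```

The last scans of the shifted column have no matching oxygen reading and become NaN, which is why alignment is normally applied before the data are trimmed or binned.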

### Are there stations that have part-casts to be combined?

	If so, provide a dictionary of the form:
	combined = {'initial cast CNV filename' : {'CNV file to be merged': 'cast status', 'CNV file to be merged': 'cast status'} } See the example below.

	combined = {'CE21003_CTD002.CNV': {'CE21003_CTD002B.CNV': 'D/U','CE21003_CTD002C.CNV': 'U'}}

	The above dictionary drives the function combine_files2cast(data_all,combined) to rename the profile name for files 'CE21003_CTD002B.CNV' & 'CE21003_CTD002C.CNV' to 'CE21003_CTD002.CNV'.

	If no files require merging leave line commented out (default).
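
To make the structure of the dictionary concrete, the sketch below flattens it into a lookup from each part-cast file to its initial cast file and cast status, then renames the profile names in a small DataFrame. The function `part_cast_lookup` is hypothetical and only illustrates the renaming idea; the notebook itself relies on `combine_files2cast`, whose internals are not shown here.

```python
import pandas as pd

combined = {"CE21003_CTD002.CNV": {"CE21003_CTD002B.CNV": "D/U",
                                   "CE21003_CTD002C.CNV": "U"}}


def part_cast_lookup(combined):
    """Map each part-cast filename to (initial cast filename, cast status)."""
    lookup = {}
    for initial_file, parts in combined.items():
        for part_file, cast_status in parts.items():
            lookup[part_file] = (initial_file, cast_status)
    return lookup


lookup = part_cast_lookup(combined)

data = pd.DataFrame({"profile": ["CE21003_CTD002.CNV", "CE21003_CTD002B.CNV",
                                 "CE21003_CTD002C.CNV", "CE21003_CTD003.CNV"]})
# Files that are not listed as part-casts keep their own profile name
data["profile"] = data["profile"].map(lambda name: lookup.get(name, (name, None))[0])
print(data["profile"].tolist())
```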

### When starting a new profiles Notebook, for example for a new survey not processed before, it is good practice to clear the memory first
### Do this by clicking 'Kernel' on the toolbar at the top of the browser page, then click 'Restart & Clear Output'
### To get started with the Notebook, run this first cell below by clicking anywhere on it and clicking 'Run' on the toolbar above
- And best of luck! :)

In [None]:
                                                                        ### Cell 1 ###

# Remove warnings being displayed when dashboard in production - comment out next 2 lines during dev and testing
import warnings
warnings.filterwarnings('ignore')
import json
import numpy as np
import os
import pandas as pd
import re
import sys
import xml
import bokeh
import shutil
import yaml
import gc
import time
import platform
from pathlib import Path


from IPython.display import display, HTML
import chevron               ## If not installed open the Anaconda prompt and run >> conda install -c conda-forge chevron
import ipywidgets as widgets ## If not installed using the Anaconda prompt run >> conda install -c conda-forge ipywidgets
import seawater              ## If not installed using the Anaconda prompt run >> conda install -c conda-forge seawater
import xlrd                  ## If not installed using the Anaconda prompt run >> conda install -c conda-forge xlrd
from typing import Dict

# Import bokeh functions
from bokeh.plotting import show
from bokeh.io import output_notebook, push_notebook
from bokeh.resources import INLINE

# Import bespoke functions
import scripts.ctd as ctd
import scripts.ctd_bokeh as ctd_bokeh
import scripts.sensor_configuration as sensor_configuration
import scripts.data_processing as data_processing
import scripts.bottle_processing as bottle_processing
import scripts.calculations as calculations
import scripts.seabird_processes as seabird_processes

# Print toolbox versions to screen
#print("Toolbox version:")
#print('\n'.join(f'{m.__name__} {m.__version__}' for m in globals().values() if getattr(m, '__version__', None)))

# Print PSA folder location to screen
wd = os.getcwd()
PSA_template_folder = os.path.normpath(os.path.join(wd,'psa_templates'))
print("PSA master template folder is : %s\n" % PSA_template_folder)

# Set drop down options
proc_dict: Dict[str, int] = {'Process all files through Sea-Bird application' : 0,
             'Only process new files through Sea-Bird application' : 1}

heave_dict = {'Run Heave function [Default/Recommended]' : 0,
              'Bypass the Heave function': 1}

# Define widgets
# base data folder
base_path_widget = widgets.Text(
    placeholder='Base directory path',
    description='Base directory path',
    disabled=False,
    value = Path.cwd().joinpath("CTD").as_posix(),
    style = {'description_width': '15%'},
    layout = widgets.Layout(width='1000px')
)
# Cruise ID - string
cruiseID_widget = widgets.Text(
    placeholder='Type Cruise ID here',
    description='Cruise ID:',
    value="BU24",
    disabled=False,
    style = {'description_width': '50%'}
)

# Processing Mode - int;
proc_mode_widget = widgets.Dropdown(
    options = list(proc_dict.keys()),
    description = 'Proc Mode:',
    disabled = False,
    value = list(proc_dict.keys())[-1],
    style = {'description_width': '30%'},
    layout = widgets.Layout(width='500px'))

# Heave Mode - int;
heave_mode_widget = widgets.Dropdown(
    options = list(heave_dict.keys()),
    description = 'Heave Mode:',
    disabled = False,
    value = 'Bypass the Heave function',
    style = {'description_width': '30%'},
    layout = widgets.Layout(width='500px'))

# Oxygen Alignment - int
# Whole number of seconds from 0 to 9
oxygen_alignment_widget = widgets.Dropdown(
    options = list(str(x) for x in range(0,10)),
    description = 'Oxygen Alignment:',
    value = '2',
    disabled = False,
    style = {'description_width': '50%'}

)

# Combined Widget - this allows a user to combine casts which may have been split into multiple files due to technical difficulties
combined_widget = widgets.Text(
    placeholder='Provide dictionary of files for combining here when cast logging split',
    description='Files for combining:',
    disabled=False,
    style = {'description_width': '15%'},
    layout = widgets.Layout(width='1000px')
)

# Bin Unit - metre or decibar
bin_unit_widget = widgets.Dropdown(
    options = ['metre','decibar'],
    description = 'Binning unit:',
    disabled = False,
    style = {'description_width': '50%'})

# Combine all widgets and display them
display(widgets.VBox([base_path_widget, cruiseID_widget, proc_mode_widget, heave_mode_widget, oxygen_alignment_widget, combined_widget, bin_unit_widget]))

# The following code will hide all code from the user 
# A button will appear under this cell that will allow you to toggle on and off the code
# display(HTML("<style>.container { width:100% !important; }</style>"))
# HTML('''<script>
# code_show=true;
# function code_toggle() {
# if (code_show){
# $('div.input').hide();
# } else {
# $('div.input').show();
# }
# code_show = !code_show
# }
# $( document ).ready(code_toggle);
# </script>
# <form action="javascript:code_toggle()"><input type="submit" value="Click here to toggle on/off the raw code"></form>
# ''')

## Run next cell to implement user inputs and to set working directories

### Sets up folders within the working directory if they do not already exist and sets variables for folder paths
Folder structure contents:
<li> <b>psa</b> </li>
- Contains the PSA files to configure the SBE processing steps. These files are generated by this script from the PSA templates in the master folder named previously.

<li> <b>raw_files</b> </li>
    - Copy of the SBE files here (.HEX, .XMLCON, .BL, .ROS, etc) for the casts to be processed. Exclude any deck test files.

<li> <b>logsheets</b> </li>
    - Where a log file has been created this folder should contain an Excel file with a digitised events log, CTD logsheets and bottle sampling logsheets. File should be named "CRUISEID_Log.xlsx".

<li> <b>SBE35</b> </li>
    - When a SBE35 temperature logger has been deployed on the CTD rig, one file per cast should be placed here. Filename should follow convention "SBE35_CRUISEID_CTDxxx.txt".

<li> <b>cal_samples</b> </li>
    - Water sample salinity and Winkler oxygen data files with Bedford Numbers to identify samples.

<li> <b>output</b> </li>
    - Final binned and calibrated data files merged with metadata (datetime, position, station name, etc). Sub-folders:


- bottle - contains SBE generated .BTL files and concatenated bottle firing details with metadata as CSVs.

- screen_2Hz - contains SBE processed .CNV files at 2Hz resolution after DATACNV, FILTER, WILDEDIT, CELLTM and BINAVG routines

- all_2Hz - contains notebook processed data saved as CSV files after each major processing step (SBE processing - sensor alignment and derived measurement recalculation - auto heave flagging)

- plots - folder contains plots output after processing for cruise report section or use during the cruise.
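
The folder layout described above can be sketched with `pathlib`. The example builds the tree inside a temporary directory so that it can be run safely anywhere; the function name `create_folder_tree` is hypothetical, and the notebook's own cell creates the folders under the base directory entered in the widget.

```python
import tempfile
from pathlib import Path

TOP_LEVEL = ["psa", "raw_files", "logsheets", "SBE35", "cal_samples", "output"]
OUTPUT_SUBFOLDERS = ["bottle", "screen_2Hz", "all_2Hz", "plots"]


def create_folder_tree(base):
    """Create the working folders under base and return their paths."""
    created = []
    for name in TOP_LEVEL:
        folder = base / name
        # parents=True also creates base itself if it does not exist yet
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    for name in OUTPUT_SUBFOLDERS:
        folder = base / "output" / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


with tempfile.TemporaryDirectory() as tmp:
    folders = create_folder_tree(Path(tmp) / "CTD")
    relative = sorted(p.relative_to(tmp).as_posix() for p in folders)
    print("\n".join(relative))
```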

<span style='color:Blue'>  

### On initial setup and first run, adjust the directory or 'filepath' within the following code cell
### If running from a local machine (or network) update/enter the full directory address found between the single quotation marks on the first filepath = os.path.normpath... line
### Using the '#' symbol to comment/uncomment out a line, ensure only the correct: filepath = os.path.... line is uncommented (without a # in front).

### If running from one of the vessel's CTD PC machines, uncomment only the final filepath = os.path.... line
</span>

In [None]:
                                                                                ### Cell 2 ###
    # Set cruise name as variable here
cruiseID = cruiseID_widget.value
base_path = base_path_widget.value

if cruiseID == '':
    
    print("\033[1;31m*** You have not provided a Cruise ID, please enter Cruise ID ***\033[0m")
else:
    print("cruiseID: %s\n" % cruiseID)

    # Set variable to indicate conversion mode: 
    #    0 = process all files through Sea-Bird application; 
    #    1 = only process new files through Sea-Bird application.

    proc_mode = proc_dict[str(proc_mode_widget.value)]

    if proc_mode == 0:
        print("All profiles will be processed, even if previously converted by SBE processing.")
    elif proc_mode == 1:
        print("Only profiles that haven't been previously processed will be run through the SBE software conversion steps.")
    else:
        raise IOError("\033[1;31mError: value for proc_mode is not valid. Please update to 0 or 1 as appropriate.\033[0m")

    # Set variable to indicate if Heave function is required [Heave function is strongly recommended for almost all cases]: 
    #    0 = Run Heave function [Default/Recommended]; 
    #    1 = Bypass the Heave function.

    heave_mode = heave_dict[str(heave_mode_widget.value)]

    if heave_mode == 0:
        print("All profiles, under proc_mode selection, will undergo heave flagging.")
    elif heave_mode == 1:
        print("Heave flagging turned off for this run.")
    else:
        raise IOError("\033[1;31mError: value for heave_mode is not valid. Please update to 0 or 1 as appropriate.\033[0m")

    oxy_align_default = oxygen_alignment_widget.value

    print("Oxygen sensor alignment default value: %s" % oxy_align_default)

    combined = combined_widget.value

    if combined == '':
        print("No files listed for merging.")
        combined = None
    else:
        # Parse the text field once; safe_load only builds plain Python objects
        combined_parsed = yaml.safe_load(combined)
        if isinstance(combined_parsed, dict):
            print("Files listed for merging: %s" % combined)
            combined = combined_parsed
        else:
            raise IOError("Please update the 'Files for combining' field above. Either leave it blank or provide a dictionary.")
    print("Data will be binned to 1 %s\n" % bin_unit_widget.value)

    
    # Set cruise working directory
    pyear = '20'+cruiseID[2:4]
    pvessel = cruiseID[0:2]
    filepaths_dictionary = {'CE': os.path.join(f'{base_path}'),#, pvessel, pyear, cruiseID),
                            'TC': os.path.join(f'{base_path}')#, pyear, cruiseID)
                           } ### Need to check/change these to appropriate addresses

                        ### Change path to local Jupyter Notebook working drive directory
    filepath = os.path.normpath(os.path.join(f'{base_path}'))#,pvessel,pyear,cruiseID))  ## local directory on a local computer [adjust as required]

    print("Working directory is: %s" % filepath)

    backup_server_path = os.path.normpath(os.path.join('Z:/2.1 Oceanographic', cruiseID, f'{cruiseID}_processed_CTD_data_BACKUP'))

    print('Backup directory is: %s' % backup_server_path)

    # Set the directory filepaths as variables
    directories = {'raw': 'raw_files', 'logs': 'logsheets', 'sbe35_raw': 'SBE35', 'cals': 'cal_samples', 'psa': 'psa', 'out': 'output'} 
    directories_out = {'bottle': 'bottle', 'screen_2Hz': 'screen_2Hz', 'all_2Hz': 'all_2Hz', 'plots': 'plots'}

    def dir_path(path,name):
        d = os.path.join(path,name)
        # makedirs also creates missing parent folders and does nothing if the folder exists
        os.makedirs(d, exist_ok=True)
        return d

    #REF: this could be more efficient -> use directory directly?
    print("Processing directory folder locations:")
    for k,i in directories.items():
        directories[k] = dir_path(filepath, i)
    for k,i in directories_out.items():
        directories_out[k] = dir_path(directories.get("out"), i)
        print(directories_out[k])
    out = directories.get("out", "")
    raw = directories.get("raw", "")
    logs = directories.get("logs", "") 
    psa = directories.get("psa", "")
    sbe35_raw = directories.get("sbe35_raw", "")

    bottle = directories_out.get("bottle", "")
    screen_2Hz = directories_out.get("screen_2Hz", "")
    all_2Hz = directories_out.get("all_2Hz", "")
    plots = directories_out.get("plots", "")
    # Check if files present in the raw files working directory
    if len(os.listdir(raw))>0:
        count, countn = 0, 0
        for filename in os.listdir(raw):
            #print(filename)
            if cruiseID in filename.upper():
                count += 1
            else:
                countn += 1
               
        if countn != 0:
            print('\n\033[1;31m*** Check filenames. Sea-Bird files present in the raw_files directory do not follow the filename convention <CRUISE>_CTD<NUMBER> ***\n\033[0m')
        else:
            print('\nFiles present in raw_files directory: %s' % count)
        
    else:
        print('\n\033[1;31m*** No Sea-Bird files present in the raw_files directory. If running the notebook for the first time for this cruise, before proceeding copy across the raw SBE files to the "raw_files" folder in the working directory. ***\n\033[0m')

####################################################################################################################################        
# Get HEX filenames
####################################################################################################################################

filelist = os.listdir(raw)
hexfilelist = []

for item in filelist:
    if item.lower().endswith(".hex"):  # match .hex and .HEX alike
        item = item.split('.')
        hexfilelist.append(item[0].upper())
print('\tNumber of HEX files available in cruise folder: %s' % (len(hexfilelist)))
        
####################################################################################################################################        
# Extract metadata from HDR files
####################################################################################################################################
print("\nExtracting cast metadata from the header information for each cast for reference.")
df_NMEA = data_processing.get_NMEA_from_header(raw,'hdr')

# Check all fields populated
print('\tNumber of HDR files available in cruise folder: %s' % (len(df_NMEA)))
df_missingNMEA = df_NMEA.isnull().sum()

for item in ['Lat','Long','Upload Time','UTC Time']:
    if df_missingNMEA[item]!=0:
        counts = df_missingNMEA[item]
        print("\033[1;31mACTION *** %s missing in %s HDR files *** Ensure %s entered into logsheet from paper logs for:\033[0m" % (item, counts, item))
        print(df_NMEA[df_NMEA[item].isnull()]['CTD number'].tolist())
    else:
        print("\t%s present in all HDR files" % (item))


####################################################################################################################################
# Check if a logsheet has been populated and provided for the cruise
####################################################################################################################################

logsheets = os.path.join(logs,'%s_Log.xls' % cruiseID)
if os.path.exists(logsheets):
    # Load CTD event metadata from logsheets
    print('\nLogsheet file saved in archive. Loading metadata from the logsheet file.')
    ctd_log = pd.read_excel(logsheets, 
                               sheet_name='CTD logs',
                               usecols = "C,D,E,F,H,O,P,R")
    ctd_log.columns = ['Cruise','Event number','CTD Cast number','Standard Station Name','CTD number',                        
                       'Latitude [degrees_north]','Longitude [degrees_east]','Bot. depth [m]']
    ctd_log['Event number'] = ctd_log['Event number'].astype(int)
    ctd_log['CTD number']  = ctd_log['CTD number'].str.upper()

    print("\tNumber of CTD events in logsheet: %s" % len(ctd_log))
    
    # Check cruise matches ID provided for processing and is unique within the logsheet
    log_cruise_values = ctd_log['Cruise'].unique().tolist()
    if len(log_cruise_values)!=1:
        print("\033[1;31m\nACTION *** Multiple cruises in the logsheet. Please check for typos or split logsheet by cruises.\033[0m")
        print("\tCruises listed in logsheet: %s" % log_cruise_values)
    else:
        if log_cruise_values[0] != cruiseID:
            print("\033[1;31m\nACTION *** Cruise recorded in the logsheet does not match the cruise ID provided for processing. Please correct the logsheet.\033[0m")
            print("\tCruise listed in logsheet: %s" % log_cruise_values[0])
    
    # Check HEX filenames against CTD numbers in the logsheet
    a = list(set(hexfilelist) - set(ctd_log['CTD number'].unique().tolist()))
    b = list(set(ctd_log['CTD number'].unique().tolist()) - set(hexfilelist))
    
    if len(a) !=0 or len(b) != 0:
        print('\033[1;31m\nACTION *** Discrepancy in available CTD metadata between the Log and the header files. ***\033[0m')
        print("\tFilenames not in the CTD logsheet \t\t\t\t%s" % a)
        print("\tCTD logsheet filenames without files in the raw data folder \t%s" % b)

else:
    print('Logsheet file not provided.')


## This cell below generates a list of sensors, voltage channels and sensor calibration coefficients by cast from the configuration files

### Before starting CTD processing review list of sensors:
- Have sensors been switched between channels?
- Are serial numbers consistent, or do they show changes during the cruise?
- Are all sensors you expect to see in the list?
- Does this match what you expect to see?

### Also perform a sensor calibration coefficients check:
- Are the coefficients for each sensor consistent across the casts? 
- If they change, does this match records of instrument changes from the CTD technician?
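
The consistency checks above amount to asking which columns of a per-cast table take more than one value. This is a minimal sketch using a made-up table; the column names, serial numbers and coefficient values are hypothetical and do not come from any real configuration file.

```python
import pandas as pd

# Hypothetical per-cast oxygen sensor coefficients, for illustration only
coeffs = pd.DataFrame({
    "cast": ["CTD001", "CTD002", "CTD003"],
    "serial_number": ["0123", "0123", "0456"],
    "Soc": [0.512, 0.512, 0.498],
    "offset": [-0.50, -0.50, -0.50],
})


def changed_columns(table, ignore=("cast",)):
    """Return the columns whose value is not the same for every cast."""
    values = table.drop(columns=list(ignore))
    distinct = values.nunique(dropna=False)
    return distinct[distinct > 1].index.tolist()


changed = changed_columns(coeffs)
print(changed)
```

Any column reported here is worth comparing against the CTD technician's record of sensor swaps.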

In [None]:
                                                                                ### Cell 3 ###
sensor_configuration_and_labels = sensor_configuration.sensor_config(raw, cruiseID)
   
df_cast_sensors = sensor_configuration_and_labels['cast_sensors']
master_sensor_df = sensor_configuration_and_labels['cast_labels']

display(master_sensor_df)

master_sensor_table = os.path.join(out,'master_sensor_table.csv')
master_sensor_df.to_csv(master_sensor_table)

master_sensor_coeffs = {'OxygenSensor': ['Soc','offset','A','B','C','D0','D1','D2','E','Tau20','H1','H2','H3'],
                        'TurbidityMeter': ['ScaleFactor','DarkVoltage'],
                        'WET_LabsCStar': ['M','B','PathLength'],
                        'TransWetlabAC3Sensor': ['Ch2o','Vh2o','VDark','X'],
                        'FluoroWetlabWetstarSensor': ['ScaleFactor','Vblank'],
                        'AltimeterSensor': ['ScaleFactor','Offset'],
                        'FluoroWetlabECO_AFL_FL_Sensor': ['ScaleFactor','Vblank'],                      
                       }

sensor_counts = sensor_configuration.get_sensor_coefficients(master_sensor_coeffs = master_sensor_coeffs,
                                                          df_cast_sensors=df_cast_sensors,
                                                          raw_data_directory=raw,
                                                          output_directory=out)          
display(sensor_counts)
print("Sensor coefficient summaries saved to: %s" % out)

### Cell 10 ###
#check if psa file is present from previous run - may affect file list in driver.txt and running of Seabird software
print(psa)
psanums=[]
for fname in os.listdir(psa):
    if fname.endswith('.psa'):
        psanums.append(fname)
        os.remove(os.path.join(psa,fname))

if len(psanums) > 0:
    print('There were '+ str(len(psanums)) + ' psa files in the psa folder. These have now been removed prior to the fresh batch run.\n') 
else:
    print('No psa files in psa folder. All good to proceed.\n')

data = ctd.generate_psa_files(sensor_counts=sensor_counts,
                              PSA_template_folder = PSA_template_folder,
                              raw=raw,
                              screen_2Hz=screen_2Hz,
                              bottle=bottle,
                              psa=psa,
                              proc_mode=proc_mode)

print("\nIf hex or xmlcon files are missing further investigation and intervention required.")
print("\n\tCasts without .HEX file: %s" % len(data.hexMissing))
print("\tCasts missing .HDR file only: %s" % len(data.headerMissing))
print("\tCasts missing .BL file only or .BL file is empty: %s" % len(data.blMissing))
print("\tCasts missing both .HDR and .BL file: %s" % len(data.headerANDblMissing))
print("\tCasts with .HDR and .BL file: %s\n" % len(data.headerANDblPresent))
print("\tCasts with .CNV file: %s\n" % len(data.cnvPresent))



## Initial data conversion and bottle file creation
<img align="centre" src="_code_software/img/part1.jpg">


### The following cell creates a batch file for running SeaBird Data Processing software:

It runs the Data Conversion, Bottle Summary, WildEdit, Filter and CellTM routines from the SBE software, called from the command line. In the last step, BinAvg creates a 2Hz version of the data for provisional plotting, to identify the surface soak and end of cast and to determine the alignment of the oxygen sensor (more details in the next step).
The cell then runs the Sea-Bird data processing modules to batch process the data. The Sea-Bird GUI windows may pop up on your screen or appear as active on your taskbar.
- A recent version of the Sea-Bird Data Processing software must be installed on the machine running the Notebooks

### Below is the Sea-Bird Data Processing PSA file default setup for MI Ocean Climate cruise CTDs
#### Data Conversion
This SBE Data Processing module converts the raw files (.HEX) to ASCII and applies calibrations as provided by the instrument configuration file (.XMLCON). Before running this stage:
1. The XMLCON file must be checked for accuracy and updated as required.
2. All sensor outputs should be converted to engineering units as recommended by the manufacturer.
3. Output at full 24 Hz resolution.

The setup for the Data Conversion module applied in this script is shown here:

<img align="centre" src="_code_software/img/datconv.jpg">

#### Bottle Summary
This module calculates sensor values covering the period of bottle firing on the up cast for each bottle on the rosette.

The setup for the Bottle Summary module applied in this script is shown here:

<img align="centre" src="_code_software/img/botsumm.jpg">

#### Wild Edit
This module should be run to remove pressure spikes if present. Pressure spikes can be identified from the pressure vs time plots. Wild Edit must be run on pressure **only** and before Filter as pressure spikes can cause Filter to smooth data incorrectly.

Note that if a data file is particularly corrupted, WildEdit might need to be run more than once, with different block sizes and numbers of standard deviations.
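
The role of the block size and the number of standard deviations can be seen in a simplified two-pass outlier test. This sketch is an illustration of the general idea only, with illustrative parameter values; it is not Sea-Bird's implementation, and the function name `wild_edit_flags` is hypothetical.

```python
import numpy as np


def wild_edit_flags(values, block_size=100, n_std_pass1=2.0, n_std_pass2=20.0):
    """Flag spikes block by block using a two-pass standard deviation test."""
    values = np.asarray(values, dtype=float)
    flags = np.zeros(values.size, dtype=bool)
    for start in range(0, values.size, block_size):
        block = values[start:start + block_size]
        # Pass 1: provisional outliers relative to whole-block statistics
        provisional = np.abs(block - block.mean()) > n_std_pass1 * block.std()
        kept = block[~provisional]
        if kept.size == 0:
            continue
        # Pass 2: recompute statistics without the provisional outliers,
        # then flag scans that lie far from the cleaned mean
        flags[start:start + block_size] = (
            np.abs(block - kept.mean()) > n_std_pass2 * kept.std()
        )
    return flags


pressure = np.linspace(10.0, 11.0, 100)
pressure[50] = 500.0  # artificial pressure spike
flags = wild_edit_flags(pressure)
print(np.flatnonzero(flags))
```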

#### Filter
Filter must be run on the pressure channel before any editing is carried out. Filter smooths out response-time issues in the sensors, which may affect processing at later stages, such as for CellTM. The typical filter time constant is equal to four times the scan rate. For the SBE911<i>plus</i> pressure sensor this is 0.15 seconds.
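
To show what a time constant of 0.15 seconds does to 24 Hz data, the sketch below applies a simple first-order low-pass filter forward and then backward so that the smoothing adds no net time shift. The recursion and the forward-backward approach are assumptions chosen for illustration; Sea-Bird's Filter module may use a different formulation.

```python
import numpy as np


def low_pass_zero_phase(values, time_constant_s=0.15, scan_rate_hz=24.0):
    """Smooth a non-empty series with a first-order filter run in both directions."""
    values = np.asarray(values, dtype=float)
    dt = 1.0 / scan_rate_hz
    weight = dt / (time_constant_s + dt)

    def one_pass(series):
        out = np.empty_like(series)
        out[0] = series[0]
        for i in range(1, series.size):
            # Move a fraction of the way from the previous output to the new sample
            out[i] = out[i - 1] + weight * (series[i] - out[i - 1])
        return out

    forward = one_pass(values)
    # Filtering the reversed series cancels the lag introduced by the forward pass
    return one_pass(forward[::-1])[::-1]


spike = np.zeros(49)
spike[24] = 1.0
smoothed = low_pass_zero_phase(spike)
print(round(float(smoothed.max()), 3), int(smoothed.argmax()))
```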

#### Cell Thermal Mass
Cell Thermal Mass removes conductivity cell thermal mass effects from the measured conductivity. The recommended Sea-Bird settings should be applied to both primary and secondary sensors.

For SBE9<i>plus</i> with TC duct and 3000 rpm pump the recommendations are as follows:<br>
alpha = 0.03<br>
1/beta = 7
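
The sketch below shows how alpha and 1/beta enter a recursive thermal mass correction. It follows the form of the recursion as it is commonly given in Sea-Bird's data processing documentation, with conductivity in S/m and temperature in degrees Celsius; treat it as an assumption to be checked against the manual and not as the code run by the PSA templates here.

```python
import numpy as np


def cell_thermal_mass(conductivity, temperature, alpha=0.03, inv_beta=7.0,
                      scan_rate_hz=24.0):
    """Return conductivity [S/m] with a recursive thermal mass correction added."""
    c = np.asarray(conductivity, dtype=float)
    t = np.asarray(temperature, dtype=float)
    dt = 1.0 / scan_rate_hz
    beta = 1.0 / inv_beta
    a = 2.0 * alpha / (dt * beta + 2.0)
    b = 1.0 - 2.0 * a / alpha
    ctm = np.zeros_like(c)
    for i in range(1, c.size):
        # Sensitivity of conductivity to temperature near this scan
        dcdt = 0.1 * (1.0 + 0.006 * (t[i] - 20.0))
        ctm[i] = -b * ctm[i - 1] + a * dcdt * (t[i] - t[i - 1])
    return c + ctm


cond = np.full(10, 3.5)
temp = np.array([10.0] * 5 + [11.0] * 5)  # 1 degree step at scan 5
corrected = cell_thermal_mass(cond, temp)
print(corrected - cond)
```

The correction is zero while temperature is steady, jumps when temperature changes, and then decays slowly over the following scans.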

The setup for the respective modules applied in this script is shown here:
<img align="centre" src="_code_software/img/phase1.jpg"> 
### The order of processing steps follows previous work carried out by BODC and associated scientists

### This cell also carries out additional steps as follows:
- determine the down-cast and up-cast components of a profile and collate all profiles into a single CSV file,
- merge multiple files from a cast where data logging was interrupted, resulting in multiple files for the cast,
- determine the start of the down-cast (minimum depth/pressure of the down-cast after the pump has switched on).
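
The first and last steps above can be sketched with pandas. The example labels scans up to the maximum pressure as down-cast (`D`) and the rest as up-cast (`U`), then finds the shallowest down-cast scan recorded with the pump on. The pressure column name `prDM` matches the setting used in the code cell below; the `pumps` column and both function names are hypothetical, and the notebook's own logic lives in `data_processing`.

```python
import pandas as pd


def flag_cast_direction(profile, z_col="prDM"):
    """Label scans up to and including the deepest point 'D', later scans 'U'."""
    flagged = profile.copy()
    deepest = flagged[z_col].idxmax()
    flagged["cast"] = "U"
    # Label-based slicing is inclusive, so the deepest scan counts as down-cast
    flagged.loc[:deepest, "cast"] = "D"
    return flagged


def downcast_start(flagged, z_col="prDM", pump_col="pumps"):
    """Return the index of the shallowest down-cast scan with the pump on."""
    down = flagged[(flagged["cast"] == "D") & (flagged[pump_col] == 1)]
    return down[z_col].idxmin()


profile = pd.DataFrame({
    "prDM": [3.0, 2.0, 1.5, 5.0, 20.0, 50.0, 30.0, 10.0, 2.0],
    "pumps": [0, 0, 1, 1, 1, 1, 1, 1, 1],
})
flagged = flag_cast_direction(profile)
start = downcast_start(flagged)
print(flagged["cast"].tolist(), start)
```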

In [None]:
                                                                                ### Cell 4 ###

if len(os.listdir(psa))>0:    
    # Set the batch processing driver file
    driver1 = os.path.join(filepath,'driver1.txt')

    # Open the driver file for writing; the with-block closes it automatically
    with open(driver1, 'w') as drv:
        # Check for each psa individually. Add processing stages to the driver file
        for item in os.listdir(psa):
            if any(x in item for x in('datcnv_headerANDblPresent','datcnv_headerANDblMissing','datcnv_headerMissing','datcnv_blMissing' )):        
                drv.write(f"datcnv -Y /p{os.path.join(psa,item)}\n")
        for item in os.listdir(psa):
            if 'MI_botsum' in item:
                drv.write(f"bottlesum /p{os.path.join(psa,item)}\n")       
        for item in os.listdir(psa):
            if 'wildedit' in item:
                drv.write(f"wildedit /p{os.path.join(psa,item)}\n")
        for item in os.listdir(psa):
            if 'filter' in item:
                drv.write(f"filter /p{os.path.join(psa,item)}\n")
        for item in os.listdir(psa):
            if 'cellTM' in item:
                drv.write(f"cellTM /p{os.path.join(psa,item)}\n")
        for item in os.listdir(psa):
            if 'binavg2Hz' in item:
                drv.write(f"binavg /p{os.path.join(psa,item)}\n")

    # Run SBE batch using driver file
    platform_os = platform.system()
    if platform_os == "Linux":
        os.system("wine sbebatch %s %s %s %s" % (driver1,raw,bottle,screen_2Hz))
    else:
        os.system("sbebatch %s %s %s %s" % (driver1,raw,bottle,screen_2Hz))

    print("Check for successful completion of batch processing")
    
else:
    print("Processing mode set to new files only. No new files to be processed.")

# Concatenate profile data into a csv file and flag data as either down or up cast (adds column 'cast' to the dataframe and populates as 'D' or 'U' based on being before or after the maximum depth).
files = os.listdir(screen_2Hz)
profile_casts = [w.upper() for w in files]

# Set output file for concatenated 2Hz data
all_2Hz_csv = os.path.join(all_2Hz,'cruise_SBEproc_2Hz.csv')
# determine if 2Hz data file already exists
all_2Hz_file_exists = os.path.exists(all_2Hz_csv)

data_all = pd.DataFrame()
if proc_mode == 0 or not all_2Hz_file_exists:
    data_all = data_processing.cnv2df(cruiseID,
                                      files,
                                      params=[],
                                      raw_folder = raw,
                                      directory = screen_2Hz,
                                      ud_id = True,
                                      z_cord = 'prDM',
                                      )
    print("Number of CTD events loaded from processed files: %s" % len(data_all["profile"].unique().tolist()))
    print("Number of data rows loaded from processed files: %s" % len(data_all))
    
    # Merge split files from single cast. Uses the dictionary "combined" set up at the start of the file.
    data_all = data_processing.combine_files2cast(data_all,combined,3)

    # Save aggregated DataFrame for archive
    data_all.to_csv(all_2Hz_csv)
    df_profile = data_all.copy(deep=True)
       
elif proc_mode == 1 and all_2Hz_file_exists==True:
    all_2Hz_existing = pd.read_csv(all_2Hz_csv, index_col = 0)
    all_2Hz_casts = all_2Hz_existing['profile'].unique().tolist()
    #combined = {'CE21003_CTD002.CNV': [{'CE21003_CTD002B.CNV': 'U'}]}
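    if combined is not None:
        for item in combined.keys():
            if item in all_2Hz_casts:
                # Each entry may be a dict or a list of dicts ({split file: cast flag}), as in the example above
                parts = combined[item]
                if isinstance(parts, dict):
                    parts = [parts]
                for part in parts:
                    all_2Hz_casts.extend(part.keys())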
    new_casts = list(set(profile_casts) - set(all_2Hz_casts))
    
    if len(new_casts)>0:
        files_new = [w.lower() for w in new_casts]
        data_add = data_processing.cnv2df(cruiseID,
                                          files_new,
                                          params=[],
                                          raw_folder = raw,
                                          directory = screen_2Hz,
                                          ud_id = True,
                                          z_cord = 'prDM',
                                          )
        print("Number of new CTD events loaded from processed files: %s" % len(data_add.profile.unique().tolist()))
        print("Number of new data rows loaded from processed files: %s" % len(data_add))
        data_all = pd.concat([all_2Hz_existing, data_add], ignore_index=False, sort=False).sort_index()
        
        # Merge split files from single cast. Uses the dictionary "combined" set up at the start of the file.
        data_all = data_processing.combine_files2cast(data_all,combined,3)

        # Save aggregated DataFrame for archive
        data_all.to_csv(all_2Hz_csv)
        df_profile = data_all.copy(deep=True)
    else:
        print("No newly processed casts for concatenation to existing data file.")
        df_profile = all_2Hz_existing.copy(deep=True)
        
######## determine the start of the down-cast (minimum depth/pressure of the down-cast after the pump has switched on).

# Get minimum downcast depth after the pump switches on
pump_csv = os.path.join(out,'pump_on_time.csv')
# determine if surface soak file already exists
pump_file_exists = os.path.exists(pump_csv)

# Determine whether previously saved pump and surface soak data are to be over-written
if proc_mode == 0 or pump_file_exists==False:
    pumpdf = data_processing.start_dcast(df_profile,'profile','prDM')
    pumpdf.to_csv(pump_csv)
    print("\nSurface soak details run for all processed casts and saved to: %s" % pump_csv)
elif proc_mode == 1 and pump_file_exists==True:
    pumpdf_existing = pd.read_csv(pump_csv, index_col = 0)
    pump_casts = pumpdf_existing['profile'].unique().tolist()
    new_casts = list(set(profile_casts) - set(pump_casts))
    if len(new_casts)>0:
        pumpdf_add = data_processing.start_dcast(df_profile[df_profile['profile'].isin(new_casts)],'profile','prDM')
        pumpdf_merged = pd.concat([pumpdf_existing, pumpdf_add], ignore_index=False, sort=False).sort_index()
        pumpdf_merged.to_csv(pump_csv)
        print("\nSurface soak details for newly processed casts appended to existing records and saved to: %s" % pump_csv)
    else:
        print("No newly processed casts for surface soak identification.")


## Plot data to check surface soak identification and interactively determine oxygen sensor alignment
<img align="centre" src="_code_software/img/part2.jpg">

## Run following cell to display surface soak and oxygen alignment dashboards
- If required, adjust the point at the end of the surface soak (per cast) by selecting a single point on the plot and clicking 'Set Cast Start'.
- If required, assess the best oxygen alignment. Note that a single alignment value is applied to the whole cruise; it cannot be set on a cast-by-cast basis.
- For information: on Marine Institute cruises, with their specific oxygen sensor setup, experience shows that a 2 second alignment works for every cruise.
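
The alignment value is entered in seconds, while the processed files are at 2 Hz, so a 2 second alignment corresponds to advancing the oxygen voltage by 4 scans. The following is a minimal plain-Python sketch of that conversion. The names `advance_channel` and `SAMPLE_RATE_HZ` are illustrative assumptions and are not part of the notebook's own modules:

```python
SAMPLE_RATE_HZ = 2  # scans per second in the 2 Hz processed files


def advance_channel(values, align_seconds, sample_rate_hz=SAMPLE_RATE_HZ):
    """Advance a channel in time by align_seconds.

    Each value is moved earlier by (align_seconds * sample_rate_hz) scans
    and the tail is padded with None, so the output keeps the input length.
    """
    n_scans = int(round(align_seconds * sample_rate_hz))
    if n_scans <= 0:
        return list(values)
    return list(values[n_scans:]) + [None] * min(n_scans, len(values))


# Six scans (3 seconds of data) advanced by 2 seconds, i.e. 4 scans
print(advance_channel([0, 1, 2, 3, 4, 5], align_seconds=2))
```

The padded values at the end of the series play the same role as the missing values that a dataframe shift would leave behind.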

In [None]:
                                                                                ### Cell 5 ###
# Load 2Hz CTD data
profile_data = os.path.join(all_2Hz,'cruise_SBEproc_2Hz.csv')
df_profile = pd.read_csv(profile_data)
df_profile = df_profile.rename(columns={'Unnamed: 0': 'Cycles'})

# Load pump on time for each cast
pumpdf = pd.read_csv(os.path.join(out,'pump_on_time.csv'))
pumpdf = pumpdf.rename(columns={'Unnamed: 0': 'Cycles'})

surface_soak_bokeh = ctd_bokeh.bokeh_layout(profile_data=df_profile,
                                           pump_data=pumpdf,
                                           output_path=out,
                                           downcast_data=None)

output_notebook(INLINE)
show(surface_soak_bokeh.surface_soak_screening, notebook_handle=True)
push_notebook()

if 'sbeox0Mm/L' in surface_soak_bokeh.param_list:
    # Set oxygen alignment value
    oxygen1_align = widgets.Text(value = oxy_align_default)
    print("Update oxygen sensor 1 alignment value (in seconds) here if appropriate:")
    display(oxygen1_align)
if 'sbeox1Mm/L' in surface_soak_bokeh.param_list:
    oxygen2_align = widgets.Text(value = oxy_align_default)
    print("Update oxygen sensor 2 alignment value (in seconds) here if appropriate:")
    display(oxygen2_align)

## Complete CTD processing steps

### The next cell performs the following tasks:
- load data from the archived SBE-processed 2 Hz CSV file into a dataframe,
- if oxygen sensors are deployed, advance the oxygen voltage channel(s) by the number of seconds determined from the interactive plots above (or the set default value),
- at the end of the up-cast, flag the 'cast' where depth is less than 2 m (~height of the rig) to indicate the rig breaking the surface (flag = 'S'),
- load the down-cast start time (as seconds elapsed since data acquisition started) and flag the 'cast' to indicate the surface soak (flag = 'SS'),
- drop channels labelled 'NotInUse', 'pumps' and 'Start',
- set temperatures outside the range -5 to 40 degrees C to NaN,
- recalculate practical salinities, potential temperature, sigma-theta and sound velocity (EOS-80 toolbox),
- recalculate the oxygen concentration in umol/L and the saturation using the algorithms defined in SBE Application Notes 64 and 64-3,
- save data to CSV file.
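
The two cast flags in the list above can be illustrated with a short plain-Python sketch. It is a simplified stand-in for the dataframe masks used in the cell, not the notebook's own code; `flag_cast` and the example rows are hypothetical:

```python
def flag_cast(rows, start_seconds, surface_depth=2.0):
    """Return copies of rows with 'cast' set to 'S' or 'SS' where appropriate.

    rows: list of dicts with keys 'timeS' (s), 'depSM' (m) and 'cast' ('D' or 'U').
    start_seconds: elapsed time at which the down-cast starts (end of surface soak).
    """
    flagged = []
    for row in rows:
        new_row = dict(row)
        # Up-cast scans shallower than the rig height: rig breaking the surface
        if new_row["cast"] == "U" and new_row["depSM"] < surface_depth:
            new_row["cast"] = "S"
        # Scans recorded before the down-cast start: surface soak
        if new_row["timeS"] < start_seconds:
            new_row["cast"] = "SS"
        flagged.append(new_row)
    return flagged


example = [
    {"timeS": 10.0, "depSM": 1.0, "cast": "D"},    # before the down-cast start
    {"timeS": 130.0, "depSM": 25.0, "cast": "D"},  # down-cast
    {"timeS": 300.0, "depSM": 25.0, "cast": "U"},  # up-cast
    {"timeS": 420.0, "depSM": 1.5, "cast": "U"},   # rig near the surface
]
print([row["cast"] for row in flag_cast(example, start_seconds=120.0)])
# prints ['SS', 'D', 'U', 'S']
```

The surface soak flag is applied last, so it takes precedence if a scan meets both conditions.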


In [None]:
                                                                                ### Cell 6 ###
# Load data from csv
profile_data = os.path.join(all_2Hz,'cruise_SBEproc_2Hz.csv')

df = pd.read_csv(profile_data)

p_full_list = df.columns.tolist()

# Where oxygen sensor(s) deployed on the CTD rig align sensors based on visual check results
if 'sbeox0V' in p_full_list:
    try:
        oxy1_align_value = str(oxygen1_align.value)
    except NameError:
        oxy1_align_value = str(oxy_align_default)
    # Update the oxygen voltage alignment
    df['sbeox0V'] = df['sbeox0V'].shift(-2*int(oxy1_align_value))
    # Add the oxygen sensor voltage change {Sets the oxygen voltage dV/dt for use later in the oxygen calculations}
    df['oxy0dV/dt'] = df['sbeox0V'].diff()/df['timeS'].diff()

    print("Oxygen sensor 1 voltage alignment value: %s seconds" % oxy1_align_value)
    
if 'sbeox1V' in p_full_list:
    try:
        oxy2_align_value = str(oxygen2_align.value)
    except NameError:
        oxy2_align_value = str(oxy_align_default)
    # Update the oxygen voltage alignment
    df['sbeox1V'] = df['sbeox1V'].shift(-2*int(oxy2_align_value))
    # Add the oxygen sensor voltage change {Sets the oxygen voltage dV/dt for use later in the oxygen calculations}
    df['oxy1dV/dt'] = df['sbeox1V'].diff()/df['timeS'].diff()

    print("Oxygen sensor 2 voltage alignment value: %s seconds\n" % oxy2_align_value)

# Load profile start time from file
pumpdf = pd.read_csv(os.path.join(out,'pump_on_time.csv'))

pumpdf.columns=['Cycles','profile','Start','prDM']

print("Number of CTDs with surface soak identified: %s\n" % len(pumpdf))

# Add the start timeS for each profile from the manual inspection of the plot {Removes the surface soak and data where the frame is partially out of the water}
print("Number of rows in profile dataframe prior to start/end merge: %s" % len(df))
df = pd.merge(df, pumpdf[['profile','Start']], on = 'profile', how = 'inner')
df['profile'] = df['profile'].str.replace('.CNV','',regex=False)
print("Number of rows in profile dataframe after start/end merge: %s\n" % len(df))

# Set cast channel to indicate cycles where the time elapsed covers the surface soak ('SS') and where the rig is close to the surface ('S').
# In this script the cast is flagged 'S' where depth is less than 2 m (approx. height of the CTD frame) at the end of the upcast.
surface_depth = 2
mask = (df['depSM'] < surface_depth) & (df['cast']=='U')
df.loc[mask,'cast'] = 'S'
mask = df.timeS < df.Start
df.loc[mask,'cast'] = 'SS'

# Set the columns to be kept {analogous to SBE Strip routine}
p_full_list = df.columns.tolist()

keep_cols = ['profile','cast']
for item in p_full_list:
    if item not in ['profile','cast','NotInUse','NotInUse.1','NotInUse.2','NotInUse.3','NotInUse.4','NotInUse.5','NotInUse.6','NotInUse.7','pumps','Start','Unnamed: 0']:
        keep_cols.append(item)

df = df[keep_cols].copy(deep=True)
df = df.rename(columns={'profile': 'CTD number'})

# Set temperature values to nan for outside range -5 to 40 degrees Celsius
if "t090C" in df:
    df.loc[df['t090C']<-5,'t090C']=np.nan
    df.loc[df['t090C']>40,'t090C']=np.nan

if "t190C" in df:
    df.loc[df['t190C']<-5,'t190C']=np.nan
    df.loc[df['t190C']>40,'t190C']=np.nan

## Rederive calculated channels
# Recalculate salinities (practical)
if set(['c0S/m', 't090C', 'prDM']).issubset(p_full_list):
    df['sal00'] = seawater.eos80.salt(df['c0S/m'].div(4.2914),df['t090C'],df['prDM'])
else:
    print("Salinity (sal00) not calculated as one of c0S/m, t090C and prDM is not in file.")
if set(['c1S/m', 't190C', 'prDM']).issubset(p_full_list):
    df['sal11'] = seawater.eos80.salt(df['c1S/m'].div(4.2914),df['t190C'],df['prDM'])
else:
    print("Salinity (sal11) not calculated as one of c1S/m, t190C and prDM is not in file.")

# Generate potential temperature
# Check against the current dataframe columns so that freshly calculated salinities are seen
if set(['sal00', 't090C', 'prDM']).issubset(df.columns):
    df['potemp090C'] = seawater.eos80.ptmp(df['sal00'],df['t090C'],df['prDM'],pr=0)
else:
    print("Potential temperature (potemp090C) not calculated as one of sal00, t090C and prDM is not in file.")
if set(['sal11', 't190C', 'prDM']).issubset(df.columns):
    df['potemp190C'] = seawater.eos80.ptmp(df['sal11'],df['t190C'],df['prDM'],pr=0)
else:
    print("Potential temperature (potemp190C) not calculated as one of sal11, t190C and prDM is not in file.")

# Generate sigma-theta
if set(['sal00', 't090C', 'prDM']).issubset(df.columns):
    df['sigma-theta00'] = seawater.eos80.pden(df['sal00'],df['t090C'],df['prDM'],pr=0) - 1000
else:
    print("Potential density (sigma-theta00) not calculated as one of sal00, t090C and prDM is not in file.")
if set(['sal11', 't190C', 'prDM']).issubset(df.columns):
    df['sigma-theta11'] = seawater.eos80.pden(df['sal11'],df['t190C'],df['prDM'],pr=0) - 1000
else:
    print("Potential density (sigma-theta11) not calculated as one of sal11, t190C and prDM is not in file.")

# Generate sound velocity
if set(['sal00', 't090C', 'prDM']).issubset(df.columns):
    df['svel00'] = seawater.eos80.svel(df['sal00'],df['t090C'],df['prDM'])
else:
    print("Sound velocity (svel00) not calculated as one of sal00, t090C and prDM is not in file.")
if set(['sal11', 't190C', 'prDM']).issubset(df.columns):
    df['svel11'] = seawater.eos80.svel(df['sal11'],df['t190C'],df['prDM'])
else:
    print("Sound velocity (svel11) not calculated as one of sal11, t190C and prDM is not in file.")
    
# If oxygen sensor deployed on the CTD rig
if 'sbeox0V' in df.columns.tolist():
    # Recalculate oxygen concentration
    # Set holding columns
    df['sbeox0Mm/L'] = None
    
    # Read in oxygen sensor coefficients from the saved coefficients table
    df_oxy0_coeffs = pd.read_csv(os.path.join(out,'sensor_coeffs_fulltable_OxygenSensor0.csv'))
    df_oxy0_coeffs.rename(columns = {'Unnamed: 0':'CTD number'}, inplace = True)
    df_oxy0_coeffs = df_oxy0_coeffs.set_index('CTD number')
    
    for index,row in df_oxy0_coeffs.iterrows():
        coef = row.to_dict()
        mask = df['CTD number']==str(index).upper()
        df.loc[mask,'sbeox0Mm/L'] = calculations.sbe43_oxycalc(df.loc[mask,'sbeox0V'], df.loc[mask,'t090C'], df.loc[mask,'prDM'], df.loc[mask,'sal00'], coef, df.loc[mask,'oxy0dV/dt'], 'umol/L')
    
    # Generate oxygen saturation
    df['sbeox0PS'] = df['sbeox0Mm/L'].div(calculations.oxysol(df['t090C'],df['sal00']).mul(44.66)).mul(100)

    # If second oxygen sensor deployed on the CTD rig
    if set(['sbeox1V', 't190C']).issubset(df.columns.tolist()):
        # Recalculate oxygen concentration
        # Set holding columns
        df['sbeox1Mm/L'] = None
                                 
        # Read in oxygen sensor coefficients from the saved coefficients table
        df_oxy1_coeffs = pd.read_csv(os.path.join(out,'sensor_coeffs_fulltable_OxygenSensor1.csv'))
        df_oxy1_coeffs.rename(columns = {'Unnamed: 0':'CTD number'}, inplace = True)
        df_oxy1_coeffs = df_oxy1_coeffs.set_index('CTD number')

        for index,row in df_oxy1_coeffs.iterrows():
            coef = row.to_dict()
            mask = df['CTD number']==str(index).upper()
            df.loc[mask,'sbeox1Mm/L'] = calculations.sbe43_oxycalc(df.loc[mask,'sbeox1V'], df.loc[mask,'t190C'], df.loc[mask,'prDM'], df.loc[mask,'sal11'], coef, df.loc[mask,'oxy1dV/dt'], 'umol/L')
        # Generate oxygen saturation
        df['sbeox1PS'] = df['sbeox1Mm/L'].div(calculations.oxysol(df['t190C'],df['sal11']).mul(44.66)).mul(100)

    elif 'sbeox1V' not in df.columns.tolist() and 't190C' in df.columns.tolist():
        # Recalculate oxygen concentration
        # Set holding columns
        df['sbeox1Mm/L'] = None

        for index,row in df_oxy0_coeffs.iterrows():
            coef = row.to_dict()
            mask = df['CTD number']==str(index).upper()
            df.loc[mask,'sbeox1Mm/L'] = calculations.sbe43_oxycalc(df.loc[mask,'sbeox0V'], df.loc[mask,'t190C'], df.loc[mask,'prDM'], df.loc[mask,'sal11'], coef, df.loc[mask,'oxy0dV/dt'], 'umol/L')
        # Generate oxygen saturation
        df['sbeox1PS'] = df['sbeox1Mm/L'].div(calculations.oxysol(df['t190C'],df['sal11']).mul(44.66)).mul(100)

# Save profiles ready for screening
screen_file = os.path.join(all_2Hz,'cruise_screenready_2Hz.csv')
df.to_csv(screen_file, index=False)

print("Columns in data file:")
print(df.columns.tolist())
print('\nFile saved to: %s' % screen_file)

## Automated screen for heave entrainment features
Swell at the surface causes heave on the CTD wire while the rig is being lowered and raised through the water column. Seawater trapped in the rig frame from higher in the
water column can flush past the sensors at the base of the rig each time the rig decelerates and rises back through the water column. This is called entrainment.

Function "heave_flagging()" iterates down the pressure channel (prDM) and flags:
- down-cast rows where velocity is below a user defined threshold (default vel = 0.2 m/s),
- a user defined window of cycles prior to the velocity threshold being reached (default window = 2),
- cycles where pressure is less than the first cycle below the velocity threshold for each entrainment/heave feature.
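
The three rules above can be sketched in plain Python as follows. This is an illustrative reading of the description, not the notebook's own function: the name `heave_flag_sketch`, the 0/1 flag values and the use of the pressure rate of change (dbar/s, roughly m/s) as the velocity are all assumptions.

```python
def heave_flag_sketch(times, pressures, vel=0.2, window=2):
    """Return a list of flags (1 = flagged, 0 = kept) for down-cast scans.

    times: elapsed time per scan (s); pressures: pressure per scan (dbar).
    """
    n = len(pressures)
    flags = [0] * n
    feature_pressure = None  # pressure at the first slow scan of the current feature
    for i in range(1, n):
        dt = times[i] - times[i - 1]
        if dt <= 0:
            continue  # cannot derive a velocity for this scan
        velocity = (pressures[i] - pressures[i - 1]) / dt
        if feature_pressure is not None:
            if pressures[i] < feature_pressure:
                flags[i] = 1  # rig has not yet descended past the feature start
                continue
            feature_pressure = None  # feature finished
        if velocity < vel:
            flags[i] = 1  # descent slower than the threshold
            for j in range(max(0, i - window), i):
                flags[j] = 1  # window of scans before the threshold was reached
            feature_pressure = pressures[i]
    return flags


times = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
pressures = [0.0, 0.5, 1.0, 1.5, 1.4, 1.3, 1.6, 2.1, 2.6]
print(heave_flag_sketch(times, pressures, vel=0.2, window=2))
# prints [0, 0, 1, 1, 1, 1, 0, 0, 0]
```

In this example the rig reverses at 1.5 dbar, so the reversing scans, the two scans before them and the scan shallower than the first slow scan are flagged, while the resumed descent is kept.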

### The following input window allows users to enter preferred values, if differing from default
- If the sea swell was considerable, or if the CTD rig was lowered more slowly than recommended (e.g. < 0.6 m/s), lowering the 'Velocity minimum threshold' to, say, 0.15 or 0.1 will allow more data to be retained.
- However, this must be balanced against the risk of entrainment values remaining in the dataset. Entrainment readings are not valid and should be discarded in all but special cases.
- For special cases, e.g. cruises that do not take a profile but simply sample at the surface, it is suggested to run the Notebook with the heave function switched off (see first cell).
- At present a cruise cannot be split, so different thresholds per cast are not yet possible.

Results are then saved to file.


In [None]:
                                                                                ### Cell 7 ###

velocity_widget = widgets.BoundedFloatText(value=0.20,
                                           min=0,
                                           max=1.0,
                                           step=0.05,
                                           description='Velocity minimum threshold (m/s):',
                                           disabled=False,
                                           style = {'description_width': '75%'}
                                          )

window_widget = widgets.BoundedIntText(value=2,
                                       min=0,
                                       max=10,
                                       step=1,
                                       description='Window of cycles before threshold:',
                                       disabled=False,
                                       style = {'description_width': '75%'}
                                      )

display(widgets.VBox([velocity_widget, window_widget]))

## Visually display results of heave flagging function
Plots showing the heave flagging (marked in red) are generated and can be reviewed per CTD cast
- If the velocity threshold is not suitable, the previous cell (Cell 7) can be re-run with a new threshold, followed by a re-run of the cell below to review the plots again

N.B. The function iterates row by row, so will take some time depending on the number and depth of the casts being processed.

In [None]:
                                                                                ### Cell 8 ###
if heave_mode == 0:
    # User defined values for function
    vel = velocity_widget.value
    window = window_widget.value

    print("Velocity threshold = %s" % vel)
    print("Window = %s\n" % window)

    # Load data from file
    file_in = os.path.join(all_2Hz,'cruise_screenready_2Hz.csv')
    df = pd.read_csv(file_in)

    df = seabird_processes.heave_flagging(df,vel,window)

    file_out = os.path.join(all_2Hz,'cruise_heavescreened_2Hz.csv')
    df.to_csv(file_out, index=False)

    print("Heave flagging routine complete. File saved to: %s" % file_out)
    
    # Load 2Hz CTD data
    file_in = os.path.join(all_2Hz,'cruise_heavescreened_2Hz.csv')
    df_profile = pd.read_csv(file_in)
    df_dcasts = df_profile[df_profile['cast']=='D'][['CTD number','timeS','prDM','prDM_QC','t090C','sal00','CTDvel']]
    df_dcasts = df_dcasts.rename(columns={'Unnamed: 0': 'Cycles'})
    df_dcasts['prDM_QC'] = df_dcasts['prDM_QC'].astype(int).astype(str)
    
    heave_bokeh_layout = ctd_bokeh.bokeh_layout(profile_data=df_profile,
                                                pump_data=pumpdf,
                                                output_path=out,
                                                downcast_data=df_dcasts)
    
    output_notebook(INLINE)
    show(heave_bokeh_layout.heave_screening, notebook_handle=True)
    push_notebook()
    
elif heave_mode == 1:
    print("heave flagging turned off for this run")
    
else:
    print("\033[1;31mError: value for heave_mode is not valid. Please update to 0 or 1 as appropriate.\033[0m")

## Bin data excluding heave entrainment flagged rows 

This is the only step that erases data; from this point on, all data are retained and flagged where bad
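
As a rough illustration of binning that excludes flagged rows, the plain-Python sketch below averages one channel into bins while skipping any scan with a non-zero QC flag. The function name `bin_average` and the convention of bins centred on multiples of the bin width are assumptions; the notebook's own binning routine may define bin edges differently.

```python
import math
from collections import defaultdict
from statistics import mean


def bin_average(z_values, data_values, qc_flags, bin_width=1.0):
    """Average data_values into bins of bin_width along z, skipping flagged scans.

    A scan is used only when its QC flag is 0. Each bin is centred on a
    multiple of bin_width (an assumed convention for this sketch).
    """
    bins = defaultdict(list)
    for z, value, flag in zip(z_values, data_values, qc_flags):
        if flag != 0:
            continue  # heave-flagged scan: excluded from the average
        centre = math.floor(z / bin_width + 0.5) * bin_width
        bins[centre].append(value)
    return {centre: mean(values) for centre, values in sorted(bins.items())}


depth = [0.6, 0.9, 1.2, 1.4, 1.8, 2.2]
temperature = [10.0, 12.0, 14.0, 99.0, 16.0, 18.0]
flags = [0, 0, 0, 1, 0, 0]  # the 99.0 reading is flagged and dropped
print(bin_average(depth, temperature, flags, bin_width=1.0))
# prints {1.0: 12.0, 2.0: 17.0}
```

Because flagged scans never reach the averages, they cannot be recovered from the binned file, which is why this step is the one that discards data.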
    

In [None]:
bin_width_widget = widgets.BoundedFloatText(value=1.0,
                                           min=0,
                                           max=1000,
                                           step=0.05,
                                           description='Bin width',
                                           disabled=False,
                                           style = {'description_width': '75%'}
                                          )
display(widgets.VBox([bin_width_widget]))

In [None]:
                                                                                ### Cell 9 ###
print(f"Binning down cast to {bin_width_widget.value} {bin_unit_widget.value}")

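# Use single quotes inside the f-string: nested double quotes are a syntax error before Python 3.12
binning_info_str = f"{str(bin_width_widget.value).replace('.', '_')}{bin_unit_widget.value}"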
    
cast='D'
bin_dict = {'metre': 'depSM', 'decibar': 'prDM'}
zcord = bin_dict[str(bin_unit_widget.value)]
profile_id='CTD number'

            ### ED removing second o2 sensor variables from end of following list 21/06/2021
            ### RT re-added second O2 sensor variables 18/05/2022
params_out =[]
params_master = ['CTD number','depSM','prDM','t090C','c0S/m','t190C','c1S/m','sal00','sal11','potemp090C','potemp190C','sigma-theta00','sigma-theta11','svel00','svel11','sbeox0V','sbeox0Mm/L','sbeox0PS','sbeox1V','sbeox1Mm/L','sbeox1PS']
metadata = ['CTD number']
voltage_channels = ['FluoroWetlabWetstarSensor','TurbidityMeter','OxygenSensor','WET_LabsCStar','AltimeterSensor','FluoroWetlabECO_AFL_FL_Sensor']


if heave_mode == 0:
    # Load data from file
    file_in = os.path.join(all_2Hz,'cruise_heavescreened_2Hz.csv')
    df_profiles2Hz_screened = pd.read_csv(file_in)
else:
    file_in = os.path.join(all_2Hz,'cruise_screenready_2Hz.csv')
    df_profiles2Hz_screened = pd.read_csv(file_in)
    df_profiles2Hz_screened['prDM_QC'] = 0
    
for channel in voltage_channels:
    i = 0
    for column in df_profiles2Hz_screened.columns:
        if channel in column:
            new_label = channel+'_'+str(i)
            df_profiles2Hz_screened.rename(columns = {column:new_label}, inplace = True)
            params_master.append(new_label)
            i = i + 1
    
for item in params_master:
    if item in df_profiles2Hz_screened.columns:
        params_out.append(item)

df_binned = seabird_processes.bin_data(df_profiles2Hz_screened,cast,zcord,profile_id,params_out, bin_width=bin_width_widget.value)

print(params_out)    
    
# Convert oxygen values from umol/l to ml/l and umol/kg
if 'sbeox0Mm/L' in params_out:
    df_binned['sbeox0Mm/kg'] = df_binned['sbeox0Mm/L']/((df_binned['sigma-theta00']+1000)/1000)

    df_binned['sbeox0ml/l'] = df_binned['sbeox0Mm/L']/44.66
    
if 'sbeox1Mm/L' in params_out:
    df_binned['sbeox1Mm/kg'] = df_binned['sbeox1Mm/L']/((df_binned['sigma-theta11']+1000)/1000)

    df_binned['sbeox1ml/l'] = df_binned['sbeox1Mm/L']/44.66

uncal_cruise_file = os.path.join(out, f'cruise_data_uncal_{binning_info_str}binned.csv') 
df_binned.to_csv(uncal_cruise_file, index=False)

print(f"Pre-calibration processing completed. File saved to: {uncal_cruise_file}")

## Merge processed profiles with metadata from an electronic logsheet

<span style='color:Blue'> 

### A logsheet is not required, but without one only limited metadata will be read from the file headers and retained in the output file

</span>

## Collate bottle firing details with CTD metadata, Bedford numbers and SBE35 temperatures for the cruise

This cell creates a bottle summary file, useful to summarise sensor values and metadata at time of bottle firing

See file in folder ...\output\bottle\...

In [None]:
                                                                                ### Cell 10 ###
# Check if data frames already exist in notebook memory and, if so, clear them from memory to remove the risk of inconsistencies
if "ctd_events" in set(globals()).union(set(locals())):
    del(ctd_events) # type: ignore
    gc.collect()
    #print("Deleting ctd_events dataframe that already exists to clear history for this cell")
if "ctd_log" in set(globals()).union(set(locals())):
    del(ctd_log) # type: ignore
    gc.collect()

# Check if logsheet file exists

            ### ED change CTD_Log.xlsx file to .xls format before running when using newer versions of XLSXRD
            ### And change file extension here to use it 21/06/2021

# Load cast start seconds from the pump_on_time.csv file into a dataframe
pumpdf = pd.read_csv(os.path.join(out,'pump_on_time.csv'))
pumpdf.columns = ['Cycles','CTD number','Start','prDM']
pumpdf['CTD number'] = pumpdf['CTD number'].str.replace('.CNV','',regex=False)
pumpdf['Start'] = pd.to_timedelta(pumpdf['Start'], unit='seconds')
pumpdf = pumpdf[['CTD number','Start']]

ctd_events = data_processing.create_ctd_events(cruiseID=cruiseID,
                                               raw_directory=raw,
                                               logs=logs,
                                               pumpdf=pumpdf)

try:
    display(ctd_events)
    df = data_processing.merge_data_with_metadata(cruiseID,
                                                  output_directory=out,
                                                  ctd_events=ctd_events,
                                                  logs=logs,
                                                  binning_info=binning_info_str)
  #  data_processing.create_output_csv_for_fisheries(df=df, output_directory=out)
    bottle_processing.create_bottle_summary(bottle_directory=bottle,
                                            cruiseID=cruiseID,
                                            logs=logs,
                                            ctd_events=ctd_events,
                                            sbe35_raw=sbe35_raw,
                                            raw_folder=raw)
    
except NameError as e:
    print("\033[1;31mMetadata merge not possible without a populated logsheet XLSX file.\033[0m")
    print(e)


## Display the Visualisation dashboard:

Successful running of the following cell is a good indication that everything has worked correctly and the process is almost complete.

The output display provides an opportunity to view the processed data for the first time, e.g. while still at sea.

In [None]:
                                                                                ### Cell 11 ###
# Load binned CTD data
uncal_CTD_file = os.path.join(out,f'{cruiseID}_CTDprofiles_uncal_{binning_info_str}binned_meta.csv') 
df_plot = pd.read_csv(uncal_CTD_file, parse_dates = ['CTD_start'])

profile_list = df_plot['CTD number'].unique().tolist()

# Convert position to northings and eastings for plotting
df_plot['Eastings'], df_plot['Northings'] = calculations.merc_from_arrays(df_plot['Latitude [degrees_north]'], df_plot['Longitude [degrees_east]'])

binning_layout = ctd_bokeh.bokeh_layout(profile_data = df_plot,
                                       pump_data=None,
                                       output_path=out,
                                       downcast_data=None)

output_notebook(INLINE)
show(binning_layout.bin_screen, notebook_handle=True)
push_notebook()

## At this stage of the notebook, processing of the profiles is complete

### Now backup processed files on-board

The following cell automates the backup of processed files onboard the MI research vessels.

**Only run this cell if on an MI ship; alternatively, adjust the file directory path to use the cell for your own automated backup.**
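
For users adapting the backup to their own setup, the following is a minimal sketch of copying an output folder onto a backup location. It assumes Python 3.8 or higher, where `shutil.copytree` accepts `dirs_exist_ok=True`; the function name `backup_output` and the temporary demo paths are hypothetical and stand in for your own directories.

```python
import os
import shutil
import tempfile


def backup_output(source_dir, backup_dir):
    """Copy source_dir into backup_dir, overwriting files that already exist.

    Requires Python 3.8+ for dirs_exist_ok; missing parent folders of
    backup_dir are created by copytree.
    """
    shutil.copytree(source_dir, backup_dir, dirs_exist_ok=True)
    return sorted(os.listdir(backup_dir))


# Demonstration with temporary folders instead of a real backup server
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "output")
    os.makedirs(source)
    with open(os.path.join(source, "example.csv"), "w") as handle:
        handle.write("CTD number,prDM\n")
    target = os.path.join(tmp, "backup", "CRUISE_ID")
    print(backup_output(source, target))  # first backup
    print(backup_output(source, target))  # repeat run overwrites in place
```

Unlike deleting the backup folder before each copy, this approach leaves any earlier backup files that are no longer in the output folder untouched.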

In [None]:
# TODO: Before Python 3.8, shutil.copytree cannot overwrite an existing directory, so the directory must be deleted before copying.
#  Python 3.8 and higher supports overwriting via dirs_exist_ok=True, so after an upgrade the directory need not be removed first.

#time.sleep(2.5)

#if not os.path.exists(os.path.normpath(os.path.join('Z:/2.1 Oceanographic', cruiseID))):
#    os.mkdir(os.path.normpath(os.path.join('Z:/2.1 Oceanographic', cruiseID)))
    
#if not os.path.exists(backup_server_path):
#    os.mkdir(backup_server_path)

#if os.path.exists(backup_server_path):
#    shutil.rmtree(backup_server_path)

#shutil.copytree(os.path.join(filepath, 'output'), backup_server_path)

In [None]:
                                                                                ### Cell 11 ###
# Load binned CTD data
uncal_CTD_file = os.path.join(out,f'{cruiseID}_CTDprofiles_uncal_{binning_info_str}binned_meta.csv') 
df_plot = pd.read_csv(uncal_CTD_file, parse_dates = ['CTD_start'])

profile_list = df_plot['CTD number'].unique().tolist()

# Convert position to northings and eastings for plotting
df_plot['Eastings'], df_plot['Northings'] = calculations.merc_from_arrays(df_plot['Latitude [degrees_north]'], df_plot['Longitude [degrees_east]'])

binning_layout = ctd_bokeh.bokeh_layout(profile_data = df_plot,
                                       pump_data=None,
                                       output_path=out,
                                       downcast_data=None)

output_notebook(INLINE)
show(binning_layout.bin_screen_overlay, notebook_handle=True)
push_notebook()