# *Full* CTD processing example notebook

Example notebook showing post-processing functionality for an example dataset.

Shows a more complete processing workflow:

- Load and join all CTD profiles from `.cnv`.
- Remove unused fields.
- Assign metadata.
- Export to netCDF (and some other formats if we want). 

In this example, we will use the binned files from the **2019 Fram Strait cruise**. These files are located in the folder `cnv_files/`. I found them in `NPDATA/project/Fram_strait/SOURCE/CTD/fs_2019/`.

### Initial imports

Import the `oceanograpy.data.ctd` module. Also enable interactive visuals. 

In [None]:
from oceanograpy.data import ctd
%matplotlib widget

## 1. Load the data

- Loading `.cnv` files
- Structuring into an object `D`, containing all data variables from the file and with all the metadata we were able to retrieve.

In [None]:
D = ctd.ctds_from_cnv_dir('cnv_files/')

*NOTE*: Occasionally, the `start_time` field in the `.cnv`s has a wacky value, e.g. `1-Jan-2000`. In this case, you may want to try setting the `start_time_NMEA` flag. This will make the parser assign time based on the `NMEA UTC (Time)` field in the file header, if it exists.

*Example:*

    D = ctd.ctds_from_cnv_dir('input_files/', start_time_NMEA=True)
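
To illustrate the idea behind this flag, here is a minimal sketch of how a timestamp could be read from an `NMEA UTC (Time)` header line. The header excerpt, its exact layout, and the helper `parse_nmea_time` are assumptions made for illustration; this is not the parser `oceanograpy` actually uses.

```python
from datetime import datetime

# Hypothetical excerpt of a .cnv header (the exact layout is an assumption).
header_lines = [
    "* Sea-Bird SBE 9 Data File:",
    "* NMEA Latitude = 78 50.01 N",
    "* NMEA UTC (Time) = Sep 05 2019  14:32:10",
    "# start_time = Jan 01 2000 00:00:00",
]


def parse_nmea_time(lines):
    """Return the NMEA UTC time as a datetime, or None if the field is absent."""
    for line in lines:
        if "NMEA UTC (Time)" in line:
            raw = line.split("=", 1)[1]
            # Collapse repeated whitespace before parsing.
            cleaned = " ".join(raw.split())
            return datetime.strptime(cleaned, "%b %d %Y %H:%M:%S")
    return None


nmea_time = parse_nmea_time(header_lines)
print(nmea_time)
```

Returning `None` when the field is missing mirrors the caveat above: the fallback only helps if the header actually contains an NMEA time.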

## 2. Remove data fields we don't want to include

The function `ctd.drop_vars_pick` lets us manually remove variables. 

Remove stuff we are *sure* we don't need, e.g. `SBE_FLAG`, `LATITUDE_SAMPLE`, etc.

In [None]:
D = ctd.drop_vars_pick(D)
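
Conceptually, dropping variables just means returning the dataset without the named fields. The sketch below shows this on a plain dictionary standing in for `D`. The dictionary layout and the helper `drop_variables` are hypothetical and only illustrate the operation; `ctd.drop_vars_pick` works on the real dataset object.

```python
# Toy stand-in for the dataset: variable name -> list of values.
# (This dict is an assumption for illustration, not the real structure of D.)
data = {
    "TEMP1": [1.2, 1.1, 0.9],
    "PSAL1": [34.8, 34.9, 34.9],
    "SBE_FLAG": [0, 0, 0],
    "LATITUDE_SAMPLE": [78.8, 78.8, 78.8],
}


def drop_variables(dataset, names):
    """Return a copy of `dataset` without the variables listed in `names`."""
    unwanted = set(names)
    return {key: values for key, values in dataset.items() if key not in unwanted}


cleaned = drop_variables(data, ["SBE_FLAG", "LATITUDE_SAMPLE"])
print(sorted(cleaned))
```

The function returns a new dictionary instead of modifying the input, which matches the `D = ...` reassignment pattern used throughout this notebook.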

##### Apply some standard metadata
(Updates global and variable attributes, including some NPI-specific ones. Removes sensor numbers if we only have one sensor (e.g. `TEMP1`). Adds NMDC keywords, time and lat/lon ranges, etc. Reorders the attributes.)

In [None]:
D = ctd.metadata_auto(D, NPI=True)

## 3. Inspect the data

At this point, we want to look through the data we have, evaluate whether it looks good, and make edits if necessary. The `ctd` module contains various helper functions to facilitate this. If you want, you can of course do your own analysis/inspection here using Python tools like `matplotlib`.

In this step, we will not actually modify anything - we are *looking* at the data to see if we need to fix/change anything (see step 4).

#### Inspect the data file
Executing `D` will give a view of the dataset, which is an easy way to inspect both data and metadata. It is also good to do this whenever you have made changes, to check that they were applied in the way you intended.

In [None]:
D

#### Look at a map of the cruise stations
A useful sanity check for the lon/lat fields.

In [None]:
ctd.map(D)

#### Contour plot the data
Useful to inspect the variables in the file - usually a good way to see if something is very wrong. 

In [None]:
ctd.contour(D)

#### Plot data profiles

Look through the individual profiles. Before publishing, it is a good idea to have had at least a quick look at all profiles.

In [None]:
ctd.inspect_profiles(D)

#### Compare dual sensors

Look through individual profiles of dual sensors (e.g. primary and secondary salinity).

In [None]:
ctd.inspect_dual_sensors(D)
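
If you prefer a number to go with the visual comparison, a simple summary of the difference between the two sensors can help. This is a minimal sketch using made-up salinity readings; the values and the helper `sensor_differences` are hypothetical and not part of the `ctd` module.

```python
from statistics import mean

# Hypothetical salinity readings from a primary and a secondary sensor,
# sampled at the same pressure levels.
primary = [34.81, 34.85, 34.90, 34.92]
secondary = [34.80, 34.86, 34.88, 34.93]


def sensor_differences(first, second):
    """Return the element-wise difference (first - second)."""
    if len(first) != len(second):
        raise ValueError("Both sensors must have the same number of samples")
    return [a - b for a, b in zip(first, second)]


diffs = sensor_differences(primary, secondary)
mean_diff = mean(diffs)
max_abs_diff = max(abs(d) for d in diffs)
print(f"mean difference: {mean_diff:.4f}, largest absolute difference: {max_abs_diff:.4f}")
```

A mean difference far from zero hints at an offset between the sensors, while a large maximum with a small mean points to isolated spikes.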

## 4. Edit the data

Now, we may want to make some changes to the file. This can include removing outliers, applying offsets, setting valid range thresholds, and calculating calibrated `CHLA` from (uncalibrated) `CHLA_fluorescence`.

#### Manually remove outliers 

Let's say we found an outlier in salinity or an area of a profile where the temperature sensor went wacky. When we have identified which variable we want to edit and at what station, we can run the following:

    D = ctd.hand_remove_points(D, variable, station)

In [None]:
D = ctd.hand_remove_points(D, 'PSAL1', 'Sta0202')

#### Apply threshold cutoffs to a variable

Let's say that we see that a sensor occasionally jumps to a high, unrealistic value. We can remove (set to NaN) values outside an accepted range using the `ctd.apply_threshold()` function:

In [None]:
from matplotlib import pyplot as plt
plt.close('all')

In [None]:
D = ctd.apply_threshold(D)
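
The underlying operation is simple: anything outside the accepted range becomes NaN, and everything else is left alone. Here is a minimal sketch on a plain list; the helper `apply_range` and the example limits are assumptions for illustration, not the signature of `ctd.apply_threshold()`.

```python
import math


def apply_range(values, lower, upper):
    """Replace values outside [lower, upper] with NaN and keep the rest."""
    return [v if lower <= v <= upper else math.nan for v in values]


# Hypothetical salinity record with two unrealistic jumps.
salinity = [34.8, 34.9, 41.2, 34.9, 2.0]
masked = apply_range(salinity, lower=30.0, upper=36.0)
print(masked)
```

Values that are already NaN stay NaN, since comparisons with NaN are always false.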

#### Apply an offset to a variable

We can apply a constant offset to a variable using the `ctd.apply_offset()` function.

In [None]:
D = ctd.apply_offset(D)
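
As a rough picture of what an offset correction does, the sketch below adds a constant to the profiles of selected stations. The dictionary of profiles, the offset value, and the helper `add_offset` are all hypothetical; the real function operates on `D`.

```python
import math

# Hypothetical salinity profiles, keyed by station name.
profiles = {
    "Sta0201": [34.80, 34.85, math.nan],
    "Sta0202": [34.90, 34.95, 35.00],
}


def add_offset(profiles_by_station, offset, stations=None):
    """Return a copy with `offset` added to the chosen stations (all if None)."""
    selected = set(profiles_by_station) if stations is None else set(stations)
    return {
        station: [v + offset for v in values] if station in selected else list(values)
        for station, values in profiles_by_station.items()
    }


shifted = add_offset(profiles, offset=-0.01, stations=["Sta0202"])
print(shifted["Sta0202"])
```

Missing values are unaffected, because NaN plus a constant is still NaN.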

#### Calibrate chlorophyll
If we have coefficients from a fit of CTD Chlorophyll against water sample measurements, we can apply a calibration to chlorophyll using `ctd.calibrate_chl()`.

This will create a new variable `CHLA` with calibrated chlorophyll, calculated as *CHLA = A x CHLA_from_CTD + B*.

By default, *CHLA_from_CTD* is read from the field `CHLA_fluorescence` or `CHLA1_fluorescence`. If you want to use something else (e.g. `CHLA2_fluorescence`), you can specify it by setting `chl_name_in='CHLA2_fluorescence'`.

If you want to automatically remove the old (uncalibrated chla) variable, you can set `remove_uncal=True`. 



In [None]:
A = 1.1 # Assume we know A and B
B = 0.3

D = ctd.calibrate_chl(D, A, B)
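
For context, the coefficients *A* and *B* typically come from a straight-line fit of water-sample chlorophyll against CTD fluorescence. The sketch below shows one way to get them with an ordinary least-squares fit and then apply *CHLA = A x CHLA_from_CTD + B*. The matched sample values and the helpers `fit_line` and `calibrate` are hypothetical; how you derive your own coefficients may differ.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = A * x + B. Returns (A, B)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def calibrate(values, slope, intercept):
    """Apply the linear calibration to a list of uncalibrated values."""
    return [slope * v + intercept for v in values]


# Hypothetical matched pairs: CTD fluorescence vs. water-sample chlorophyll.
ctd_chl = [0.0, 1.0, 2.0, 3.0]
sample_chl = [0.3, 1.4, 2.5, 3.6]

fit_A, fit_B = fit_line(ctd_chl, sample_chl)
calibrated = calibrate([0.5, 1.5], fit_A, fit_B)
print(fit_A, fit_B, calibrated)
```

The made-up samples lie exactly on a line, so the fit recovers the same `A` and `B` used in the cell above, up to floating-point rounding.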

## 5. Get the file up to conventions

When we are happy with the data, we move on to the metadata. We will run a checker to see how well our dataset complies with CF/ACDD, and add or modify metadata attributes until the convention checker says we are good to go.

##### Check what required metadata are missing

This runs a convention checker ([this one](https://github.com/cedadev/cf-checker)) on our dataset. It checks against the conventions we have specified in our file metadata (default CF-1.8 and ACDD-1.3).

*Inspect the output carefully and make a note of anything that gives errors or warnings; we will need to handle these*.


In [None]:
ctd.check_metadata(D)

### (Some pointers about metadata attributes)



**Global attributes**
- `summary` - This is an important metadata field. You should think of it as the "abstract" for your dataset. It can be multiple paragraphs if you want.
- `id` - This is a unique identifier for your dataset, e.g. `fram_strait_2014_ctd_data_v1`. It should be descriptive but not awfully long. For long-term projects, you may want to stick to naming conventions of previous datasets. Including a version suffix (a la `_v1`) is a good idea in case you end up updating the dataset in the future (we can't remove uploaded files from *data.npolar.no*).
- `comment` - This is a recommended field. You can fill it with whatever you think might be useful for a user of the dataset to see. I wouldn't worry too much about this field missing if there is nothing particular to comment on.
- `processing_level` and `QC_indicator` - These can be specified on variable level or globally for the whole dataset. If you have these for each data variable, you do not need them for the whole dataset (although the convention checker will warn you about it). Both fields have pre-determined value options like `['Raw instrument data', 'Data manually reviewed'..]` and `['good data', 'probably good data'..]`.

**Variable attributes**
- `standard_name` - This is an important attribute. Standard names are strictly controlled by conventions (unlike `long_name` or variable names themselves, like `TEMP`). You can find the list of currently accepted standard names [here](http://cfconventions.org/Data/cf-standard-names/current/build/cf-standard-name-table.html).
- `long_name` - This is not strictly controlled. Should be a human-readable description of what the variable is.

**NOTE** All variables should at minimum have either `standard_name` or `long_name` (or, ideally, both). If there is no `standard_name`, be sure to include a `long_name`!
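
A quick way to spot variables that break this rule is to loop over the variable attributes and flag anything with neither name. This is a minimal sketch on a plain dictionary; the attribute values and the helper `missing_names` are hypothetical, and `ctd.check_metadata(D)` remains the proper check.

```python
# Hypothetical variable attributes: variable name -> attribute dictionary.
variable_attrs = {
    "TEMP": {"standard_name": "sea_water_temperature", "units": "degree_Celsius"},
    "PSAL": {"long_name": "Practical salinity"},
    "CHLA": {"units": "mg m-3"},
}


def missing_names(attrs_by_variable):
    """Return variables that have neither a standard_name nor a long_name."""
    return [
        name
        for name, attrs in attrs_by_variable.items()
        if not attrs.get("standard_name") and not attrs.get("long_name")
    ]


needs_attention = missing_names(variable_attrs)
print(needs_attention)
```

Empty strings count as missing here, since an empty `long_name` is no more useful than none at all.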

##### Fill in missing metadata fields

Use the helper functions below to change or add global or variable attributes.

Add/modify **global** attributes

In [None]:
D = ctd.set_attr_glob(D, 'title')

Add/modify **variable** attributes

In [None]:
D = ctd.set_attr_var(D, 'TEMP', 'units')


## 6. Export the file

When we are happy with both the data and metadata, we can export the file to netCDF. `ctd` also has helper functions for exporting some other useful formats.

(*Note*: You can always save your current file to netCDF and load it later if you want to make further changes. To load a previously saved netCDF file, use `ctd.from_netcdf(path_to_netcdf_file)`.)

#### Export to netCDF

By default, the name of the netCDF file will be set to whatever is in the `id` global attribute field of your file (plus a `.nc` suffix). If you want another file name, you can specify it by giving e.g.:

`ctd.to_netcdf(D, './nc_final/', file_name='my_file_name')`


In [None]:
ctd.to_netcdf(D, './nc_final/')
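
The naming rule described above can be sketched as follows. The helper `output_path` is hypothetical and only mirrors the described behaviour (use `id` unless a file name is given, then add `.nc`); it is not how `ctd.to_netcdf` is necessarily implemented.

```python
from pathlib import Path


def output_path(out_dir, global_attrs, file_name=None):
    """Build the netCDF path from `file_name`, falling back to the `id` attribute."""
    stem = file_name if file_name is not None else global_attrs["id"]
    return Path(out_dir) / f"{stem}.nc"


# Hypothetical global attributes for the dataset.
attrs = {"id": "fram_strait_2014_ctd_data_v1"}

default_path = output_path("./nc_final/", attrs)
custom_path = output_path("./nc_final/", attrs, file_name="my_file_name")
print(default_path, custom_path)
```

Keeping the file name tied to `id` by default means the version suffix in `id` also ends up in the file name.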

#### Export to some other formats (optional)

We can also export to some other formats:

**Matlab file**: (can export either a struct with the same structure as the netCDF, or a simplified version containing only the data)

In [None]:
ctd.to_mat(D, './output_other/matfile_full.mat')
ctd.to_mat(D, './output_other/matfile_simplified.mat', simplify=True)

**.csv**

In [None]:
ctd.to_csv(D, './output_other/csv_file.csv')

**.txt file with metadata**

In [None]:
ctd.metadata_to_txt(D, './output_other/metadata.txt')

# ..done! (maybe)

You should now, in principle, have a complete netCDF file ready for publication. *However*, you should not trust this procedure blindly. For example, there could be issues with the input files, or errors/inadequacies in the `oceanograpy.data.ctd` functions. You may also have other preferences than the default settings of the `oceanograpy.data.ctd` module. Feel free to change things up using your tool of choice (I can recommend using `xarray` and `Jupyter`), and make suggestions if you would like the `ctd` functions to do anything differently!

Bottom line: *Be sure to* **carefully inspect your netCDF file before publishing anything**! 