Method to create float_source/<WMO>.mat file for OWC #141

gmaze · 2021-11-04T10:10:36Z

Rationale for this new feature

The Matlab and Python implementations of the Argo salinity calibration method OWC take a preprocessed version of Argo profile measurements as input.
This preprocessing is not yet implemented in pyowc and is currently done in Matlab.
Argopy should be able to provide this preprocessed dataset to expert users.
This has been discussed in euroargodev/argodmqc_owc/issues/33 and in euroargodev/DMQC-PCM/issues/21.

API design

First we would fetch float data to QC:

from argopy import DataFetcher as ArgoDataFetcher
# Fetch all data of one float in expert mode
loader = ArgoDataFetcher(mode='expert').float(6902746)
ds = loader.load().data

Then we could have two methods accessible through the argopy argo accessor:

- a method to preprocess the data and return a new xarray dataset, to be used by Python software (e.g. pyowc)
- a method to save these preprocessed data in Matlab files that can be used directly by the Matlab OWC software or by the current implementation of pyowc.

This could go like this:

# Preprocessed data for OWC:
ds_processed = ds.argo.to_owc()

and then possibly create Matlab files to be used by the OWC software:

import os
wmo = 6902746  # WMO number of the float fetched above
dest = os.path.join("/", "float_source", "%i.mat" % wmo)
mat_data_file_list = ds.argo.create_float_source(dest)

The latter method would internally run ds.argo.to_owc() and then save the Matlab files.

List of tasks for the `to_owc` method:

Based on what the Matlab function does, the argopy method should:

- fetch netCDF data in expert mode
- recalculate salinity from the adjusted pressure if necessary
- calculate potential temperature
- convert dates (JULD to year with decimals)
- convert longitudes (to the 0 to 360 range)
- harmonize QC flags (pres, temp, psal) and exclude outliers
- subsample data along 10 db pressure bins
- ensure that data at about the same pressure have the same vertical index for all cycles (which is not necessarily the case when the resolution changes from one cycle to another, when profiles are incomplete, ...)
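The date and longitude conversions in the list above are simple enough to sketch in plain Python. This is only an illustration, not the argopy implementation: the function names are hypothetical, and the decimal-year convention used here (fraction of the actual calendar year elapsed, leap years included) is an assumption that may differ from the one used in the Matlab code. The only Argo-specific fact relied on is that JULD counts days since 1950-01-01.

```python
from datetime import datetime, timedelta

ARGO_REFERENCE_DATE = datetime(1950, 1, 1)  # JULD is in days since this date


def juld_to_datetime(juld: float) -> datetime:
    """Convert an Argo JULD value (days since 1950-01-01) to a datetime."""
    return ARGO_REFERENCE_DATE + timedelta(days=juld)


def to_decimal_year(t: datetime) -> float:
    """Return t as a year with decimals.

    Assumed convention: fraction of the actual calendar year elapsed.
    """
    start = datetime(t.year, 1, 1)
    end = datetime(t.year + 1, 1, 1)
    return t.year + (t - start).total_seconds() / (end - start).total_seconds()


def to_lon_360(lon: float) -> float:
    """Map a longitude in degrees east to the [0, 360) range."""
    return lon % 360.0


print(to_decimal_year(datetime(2001, 7, 2, 12)))  # middle of a non-leap year
print(to_lon_360(-98.5))  # 98.5W expressed in degrees east
```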


gmaze · 2021-11-04T10:14:08Z

I see from the OWC matlab documentation the following:

For each float, put the original float data in matrix form, with each column being a profile, in 
chronological order (i.e. column 1 contains the first profile of the float, column 2 contains the 
second profile, etc.), in the following variable names:

LAT		(1×n, in decimal degrees, −ve means south of the equator, e.g. 20.5S = −20.5)
LONG		(1×n, in decimal degrees, from 0 to 360, e.g. 98.5W in the eastern Pacific = 261.5E)
DATES 	(1×n, in decimal year, e.g. 10 Dec 2000 = 2000.939726)
*Note that this date format is different from that used in the reference database.
PRES		(m×n, in dbar, monotonically increasing; i.e. 1st element of the column is the
shallowest pressure, and subsequent values are unique and increasing)
SAL		(m×n, in PSS-78)
TEMP		(m×n, in-situ ITS-90)
PTMP  	(m×n, potential temperature referenced to zero pressure)
PROFILE_NO (1×n, this can go from 0 to n−1, or 1 to n)

m = maximum number of observed levels from the float
n = number of profiles in the float time series

PROFILE_NO usually is the same as CYCLE_NO in the Argo netcdf files, but PROFILE_NO has 
to be unique. So for floats that report two cycle 0s, I suggest you either: (a) store cycle 
number in a variable called CYCLE_NO = [0, 0, 1, 2, 3, 4, ...] for your own record-keeping, then 
store PROFILE_NO = [1, 2, 3, 4, 5, ...] correspondingly for computation in this software; or (b) 
remove the first cycle 0 if it does not need calibration by this software.

Note also that if there are missing cycles in your float series, you can either create extra columns 
with NaNs to represent the missing cycles, or you can just leave them out. For example, if a float 
is missing cycle 4 from a 7-cycle series, then you can just have PROFILE_NO = [1, 2, 3, 5, 6, 7] and 
other matrices will have the corresponding 6 entries.

Fill up the extra spaces in the columns with NaNs to make up the matrices. Bad data should also 
be denoted by NaNs. In particular, values in PRES have to be distinct and monotonically increasing. 
Save the matrices in a .mat file in MATLAB in the appropriate subdirectory in /data/float_source/. 
There should be one .mat file for each float. For example,

/data/float_source/project_xx/float0001.mat
/data/float_source/project_xx/float0002.mat
/data/float_source/jones/myfloat_a.mat
/data/float_source/jones/myfloat_b.mat

and @cabanesc further commented:

In the matlab code that produces the .mat files, we apply the requirements found in the OW documentation.
In addition, we subsample the vertical levels (max 1 level every 10db).
We also discard the data with a QC=4, and rearrange the matrices so that measurements at approximately the same pressure have the same level index (I am not sure if this last step is still necessary in OWC)
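To make the matrix layout from the documentation quoted above more concrete, here is a small dependency-free sketch. It is not the Matlab or argopy code: `pad_profiles` is a hypothetical helper, and the input format (one list of `(value, qc)` pairs per profile, already in chronological order) is made up for the example. Samples with QC=4 are dropped, shorter profiles are padded with NaN, and PROFILE_NO is simply numbered from 1 so that it is unique.

```python
import math

NAN = float("nan")


def pad_profiles(profiles, bad_qc=4):
    """Return (matrix, profile_no) where matrix[i][j] is level i of profile j."""
    # Drop samples flagged as bad and keep only the values
    cleaned = [[value for value, qc in prof if qc != bad_qc] for prof in profiles]
    # m = maximum number of levels left in any profile
    m = max((len(col) for col in cleaned), default=0)
    # One column per profile, padded with NaN at the bottom
    matrix = [[col[i] if i < len(col) else NAN for col in cleaned] for i in range(m)]
    # Unique profile numbers, here simply 1..n
    profile_no = list(range(1, len(profiles) + 1))
    return matrix, profile_no


# Two hypothetical pressure profiles; the last sample of the first one is bad
profiles = [
    [(3.0, 1), (13.0, 1), (23.0, 4)],
    [(4.0, 1), (14.0, 1), (24.0, 1)],
]
pres, profile_no = pad_profiles(profiles)
print(pres)
print(profile_no)
```

The same helper could be applied to SAL, TEMP and PTMP, provided the QC=4 filtering is done consistently across variables so that the rows stay aligned.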

gmaze · 2021-11-04T10:17:43Z

@cabanesc, @AndreaGarciaJuan, @kamwal , @quai20
I poke you here just to let you know I'm starting to work on this ⏳

gmaze · 2021-11-05T08:32:01Z

@cabanesc, I have difficulty understanding how the "1 point every 10 db" requirement is implemented.

For instance, when looking at this matlab source file: https://github.com/euroargodev/argodmqc_owc/blob/master/data/test_data/float_source/3901960.mat

The first profile pressure has the following values:

array([   3.        ,    5.        ,   15.10000038,   25.10000038,
         36.        ,   46.09999847,   55.        ,   65.        ,
         75.80000305,   85.        ,   95.69999695,  105.30000305,
        115.5       ,  125.5       ,  135.6000061 ,  145.3999939 ,
        155.6000061 ,  165.19999695,  175.5       ,  185.30000305, ....

where the 10 db increment starts only after the 5 db value. Why does it not start at 3 db?

My own implementation of the requirement leads to the following selection of pressure values:

array([   3. ,   10.1,   21. ,   31. ,   40. ,   50.2,   60. ,   70.7,
         80.3,   90.8,  101.3,  111.4,  121.5,  131.4,  141.5,  151.5,
        161.6,  171.5,  181.3,  191.3,  201.4,  211.5,  221.5,  231.6,
        241.1,  251.8,  261.8,  271.4,  281.2,  291.8,  301.6,  311.4,...

which correctly returns one value for each 10 db layer.

So I guess the question is whether the rule should be interpreted as:

- 1 point every 10 db starting from the first pressure level, or
- 1 point every 10 db starting from 0 db

The Matlab code does not follow either of these...
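The two interpretations can be written down as a minimal sketch. Neither function is taken from the Matlab code or from argopy: the names are hypothetical, the pressure values are invented for the example, and the input is assumed to be monotonically increasing with no NaN.

```python
import math


def subsample_fixed_bins(pres, width=10.0):
    """Keep the first sample of each bin [k*width, (k+1)*width), counted from 0 db."""
    selected = []
    last_bin = None
    for p in pres:
        current_bin = math.floor(p / width)
        if current_bin != last_bin:
            selected.append(p)
            last_bin = current_bin
    return selected


def subsample_from_first_level(pres, width=10.0):
    """Keep the first sample, then every sample at least `width` deeper than the last kept one."""
    selected = []
    for p in pres:
        if not selected or p - selected[-1] >= width:
            selected.append(p)
    return selected


# Hypothetical high-resolution profile near the surface
pres = [3.0, 5.0, 8.0, 10.1, 13.0, 15.1, 21.0, 25.1, 31.0]
print(subsample_fixed_bins(pres))        # [3.0, 10.1, 21.0, 31.0]
print(subsample_from_first_level(pres))  # [3.0, 13.0, 25.1]
```

With bins counted from 0 db, every 10 db layer that contains data contributes exactly one value. Counting from the first level guarantees a spacing of at least 10 db between retained samples, but the selection then depends on where each profile starts.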

kamwal · 2021-11-05T14:36:24Z

Can we also add to the "List of tasks for the to_owc method" an option to recalculate salinity with the Cell Thermal Mass correction, if applicable?

gmaze · 2021-11-05T15:07:36Z

> Can we also add to the "List of tasks for the to_owc method" an option to recalculate salinity with the Cell Thermal Mass correction, if applicable?

Why not.
Is this correction applied before OWC?
If so, I would prefer to rename to_owc to better reflect what it does (prepare the float_source data for OWC) and to put the cell thermal mass correction in another method.

Are you talking about the classic Lueck & Picklo thermal mass correction?
If so, the method could be named LP_TM_correction(), for instance.

gmaze · 2021-11-10T08:28:31Z

Update

With the last few commits, we now have a working method to create OWC float sources in Python (see #142).

I am still consolidating it against the Matlab function output, although after discussion with @cabanesc, there are some choices made in the Matlab version that could be challenged, mostly about the order of the different operations.

Examples

Sneak peek of processed profiles for misc floats

WMO 6903075

WMO 3901915

WMO 6903010

gmaze · 2021-11-17T11:04:43Z

Done ! 🎉

A satisfactory routine is now implemented. The last differences with the Matlab create_float_source.m are due to errors in that Matlab code.

gmaze · 2021-11-23T15:12:21Z

> Last differences with Matlab create_float_source.m are due to errors in there.

More precisely, the last remaining differences are due to a different handling of the binning of the data along the pressure axis.
This can't be fixed.
But the output data are sufficiently close to work with this Python output.

gmaze added the labels forQCexpert (Argo QC expertise is required), enhancement (New feature or request) and argo-core (About core variables (P, T, S)) Nov 4, 2021

gmaze self-assigned this Nov 4, 2021

gmaze added this to the Go from alpha to beta milestone Nov 4, 2021

This was referenced Nov 4, 2021

Cache .mat file reads euroargodev/argodmqc_owc#72

Merged

API to create source file and data for OWC software #142

Merged

gmaze mentioned this issue Nov 8, 2021

float_source matlab files incorrect euroargodev/argodmqc_owc#73

Closed

gmaze closed this as completed in #142 Jan 14, 2022
