Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Method to create float_source/<WMO>.mat file for OWC #141

Closed
8 tasks done
gmaze opened this issue Nov 4, 2021 · 8 comments · Fixed by #142
Closed
8 tasks done

Method to create float_source/<WMO>.mat file for OWC #141

gmaze opened this issue Nov 4, 2021 · 8 comments · Fixed by #142
Assignees
Labels
argo-core About core variables (P, T, S) enhancement New feature or request forQCexpert Argo QC expertise is required

Comments

@gmaze
Copy link
Member

gmaze commented Nov 4, 2021

Rational for this new feature

Matlab and python implementations of the Argo salinity calibration method OWC take as data input a preprocessed version of Argo profile measurements.
This preprocessing is not implemented in pyowc yet, and is done here in matlab.
Argopy should be able to provide this set of preprocessed data for expert users.
This has been discussed in here euroargodev/argodmqc_owc/issues/33 and here euroargodev/DMQC-PCM/issues/21.

API design

First we would fetch float data to QC:

loader = ArgoDataFetcher(mode='expert').float(6902746)
ds = loader.load().data

Then, we could have 2 methods accessible with the argopy argo accessor:

  1. a method to preprocessed the data and return a new xarray dataset, to be used by python softwares (eg: pyowc)
  2. a method to save these preprocessed data in Matlab files that could directly be used by the Matlab OWC software or the current implementation of pyowc.

This could go like this:

# Preprocessed data for OWC:
ds_processed = ds.argo.to_owc()

and then possibly create matlab files to be used by OWC softwares:

dest = os.path.sep.join(["/", "float_source", "%i.mat" % wmo])
mat_data_file_list = ds.argo.create_float_source(dest)

The former method would ran internally ds.argo.to_owc() and then save Matlab files.

List of tasks for the to_owc method:

Based on what is done by the matlab function, the argopy method should do:

  • fetch netcdf data in expert mode
  • recalculates salinity from the adjusted pressure if necessary.
  • calculates potential temperature
  • converts dates (juld> year with decimals)
  • converts longitudes
  • harmonizes QC (pres, temp, psal), excludes outliers
  • sub-sample data along 10db pressure bins
  • ensures that data that are at about the same pressure have the same vertical index for all cycles (which is not necessarily the case if there are changes in resolution from one cycle to the other, incomplete profiles ...)
@gmaze gmaze added forQCexpert Argo QC expertise is required enhancement New feature or request argo-core About core variables (P, T, S) labels Nov 4, 2021
@gmaze gmaze self-assigned this Nov 4, 2021
@gmaze gmaze added this to the Go from alpha to beta milestone Nov 4, 2021
@gmaze
Copy link
Member Author

gmaze commented Nov 4, 2021

I see from the OWC matlab documentation the following:

For each float, put the original float data in matrix form, with each column being a profile, in 
chronological order (i.e. column 1 contains the first profile of the float, column 2 contains the 
second profile, etc.), in the following variable names:

LAT		(1×n, in decimal degrees, −ve means south of the equator, e.g. 20.5S = −20.5)
LONG		(1×n, in decimal degrees, from 0 to 360, e.g. 98.5W in the eastern Pacific = 261.5E)
DATES 	(1×n, in decimal year, e.g. 10 Dec 2000 = 2000.939726)
*Note that this date format is different from that used in the reference database.
PRES		(m×n, in dbar, monotonically increasing; i.e. 1st element of the column is the
shallowest pressure, and subsequent values are unique and increasing)
SAL		(m×n, in PSS-78)
TEMP		(m×n, in-situ ITS-90)
PTMP  	(m×n, potential temperature referenced to zero pressure)
PROFILE_NO (1×n, this can go from 0 to n−1, or 1 to n)

m = maximum number of observed levels from the float
n = number of profiles in the float time series

PROFILE_NO usually is the same as CYCLE_NO in the Argo netcdf files, but PROFILE_NO has 
to be unique. So for floats that report two cycle 0s, I suggest you either: (a) store cycle 
number in a variable called CYCLE_NO = [0, 0, 1, 2, 3, 4, ...] for your own record-keeping, then 
store PROFILE_NO = [1, 2, 3, 4, 5, ...] correspondingly for computation in this software; or (b) 
remove the first cycle 0 if it does not need calibration by this software.

Note also that if there are missing cycles in your float series, you can either create extra columns 
with NaNs to represent the missing cycles, or you can just leave them out. For example, if a float 
is missing cycle 4 from a 7-cycle series, then you can just have PROFILE_NO = [1, 2, 3, 5, 6, 7] and 
other matrices will have the corresponding 6 entries.

Fill up the extra spaces in the columns with NaNs to make up the matrices. Bad data should also 
be denoted by NaNs. In particular, values in PRES have to be distinct and monotonically increasing. 
Save the matrices in a .mat file in MATLAB in the appropriate subdirectory in /data/float_source/. 
There should be one .mat file for each float. For example,

/data/float_source/project_xx/float0001.mat
/data/float_source/project_xx/float0002.mat
/data/float_source/jones/myfloat_a.mat
/data/float_source/jones/myfloat_b.mat

and @cabanesc further commented:

In the matlab code that produces the .mat files, we apply the requirements found in the OW documentation.
In addition, we subsample the vertical levels (max 1 level every 10db).
We also discard the data with a QC=4, and rearrange the matrices so that measurements at approximately the same pressure have the same level index (I am not sure if this last step is still necessary in OWC)

@gmaze
Copy link
Member Author

gmaze commented Nov 4, 2021

@cabanesc, @AndreaGarciaJuan, @kamwal , @quai20
I poke you here just to let you know I'm starting to work on this ⏳

@gmaze
Copy link
Member Author

gmaze commented Nov 5, 2021

@cabanesc, I have difficulties to understand how the 1 point every 10db requirement is implemented.

For instance, when looking at this matlab source file: https://github.com/euroargodev/argodmqc_owc/blob/master/data/test_data/float_source/3901960.mat

The first profile pressure has the following values:

array([   3.        ,    5.        ,   15.10000038,   25.10000038,
         36.        ,   46.09999847,   55.        ,   65.        ,
         75.80000305,   85.        ,   95.69999695,  105.30000305,
        115.5       ,  125.5       ,  135.6000061 ,  145.3999939 ,
        155.6000061 ,  165.19999695,  175.5       ,  185.30000305, ....

where I see the 10db increment starting only after the 5db values. Why is this not starting at 3db ?

My own implementation of the requirement leads to the following selection of pressure values:

array([   3. ,   10.1,   21. ,   31. ,   40. ,   50.2,   60. ,   70.7,
         80.3,   90.8,  101.3,  111.4,  121.5,  131.4,  141.5,  151.5,
        161.6,  171.5,  181.3,  191.3,  201.4,  211.5,  221.5,  231.6,
        241.1,  251.8,  261.8,  271.4,  281.2,  291.8,  301.6,  311.4,...

that correctly return 1 value for each 10db layer:
download

So I guess, should the rule be interpreted as:

  • 1 point every 10db starting from the 1st pressure level, or
  • 1 point every 10db starting from 0db

The Matlab code does not follow one of these ...

@kamwal
Copy link

kamwal commented Nov 5, 2021

Can we add to the "List of tasks for the to_owc method" also option to recalculates salinity due to Cell Thermal Mass correction if applicable?

@gmaze
Copy link
Member Author

gmaze commented Nov 5, 2021

Can we add to the "List of tasks for the to_owc method" also option to recalculates salinity due to Cell Thermal Mass correction if applicable?

why not
is this correction applied before OWC ?
in this case I would prefer to rename to_owc more appropriately to reflect what is doing here (prepare the float_source data for OWC) and then have the CTmass correction in another method

do you talk about the classic Lueck & Picklo thermal mass correction ?
in this case the method would be name LP_TM_correction() for instance

@gmaze
Copy link
Member Author

gmaze commented Nov 10, 2021

Update

With the last few commits, we now have a working method to OWC create float sources in python (see #142 )

I am still consolidating against the Matlab function output, although after discussion with @cabanesc they are some choices made in the Matlab version that could be challenged, mostly about the order of the different manipulations

Examples

Sneak peek of processed profiles for misc floats

WMO 6903075

download (2)

WMO 3901915

download (4)

WMO 6903010

download (5)

@gmaze
Copy link
Member Author

gmaze commented Nov 17, 2021

Done ! 🎉

A satisfying routine is now implemented. Last differences with Matlab create_float_source.m are due to errors in there.

@gmaze
Copy link
Member Author

gmaze commented Nov 23, 2021

Last differences with Matlab create_float_source.m are due to errors in there.

More precisely, last remaining differences are due to different handling of the binnarisation of the data along the pressure axis
This can't be fixed
But output data are sufficiently close to work with this python output

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
argo-core About core variables (P, T, S) enhancement New feature or request forQCexpert Argo QC expertise is required
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants