
Point to right directory where Hycom data is stored #206

Merged: 17 commits, Jan 10, 2022

Conversation

@Mikejmnez (Collaborator) commented Nov 17, 2021

Do not merge yet.

There are a couple of things that need fixing:

  1. Data for testing: I can provide a test file with HyCOM data output, and a test for opening/creating the ospy object with such data.

  2. Decode variables. This is an issue I have been having with this dataset when trying to make simple plots. Right now the dataset is created with the xarray argument decode_cf=False (see the intake catalog). As a result, neither the time nor the variables have the proper format; e.g. salinity and temperature are not corrected for scale_factor, offset_value and missing_value. The problem with decode_cf=True when creating the dataset (which usually applies these scale and offset values) is that it raises an error, because the time variable has encoding that xarray doesn't like...

Setting decode_cf=True when creating the dataset yields:

ValueError: Unable to parse date string 'analysis'
During handling of the above exception, another exception occurred:

ValueError: unable to decode time units 'hours since analysis' with 'the default calendar'. 
Try opening your dataset with decode_times=False or installing cftime if it is not installed.

If I manually create the dataset with decode_cf=False and then do

ds = xr.decode_cf(ds, use_cftime=True)

I get the same error:

ValueError: unable to decode time units 'hours since analysis' with 'the default calendar'. 
Try opening your dataset with decode_times=False or installing cftime if it is not installed.

If we leave decode_cf=False, the time variable is an array of float64:

time 1.731e+05 1.731e+05 ... 1.741e+05

I think this error is related to the one in pydata/xarray#521.

Do you have a suggestion for decoding the dataset correctly when reading from the intake catalog?

Reading the dataset with decode_cf=False and inspecting ds.time yields:

long_name:   Valid Time
units:       hours since 2000-01-01 00:00:00
time_origin: 2000-01-01 00:00:00
calendar:    gregorian
axis:        T
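Given those attributes, the raw float values can also be decoded by hand as a sanity check. A minimal sketch, assuming float hours since the stated time_origin (2000-01-01); the sample values are the rounded ones from the undecoded repr above:

```python
import numpy as np
import pandas as pd

# Decode raw float hours against the stated time_origin by hand.
origin = pd.Timestamp("2000-01-01 00:00:00")
hours = np.array([1.731e5, 1.741e5])  # rounded values from the float64 time axis
times = origin + pd.to_timedelta(hours, unit="h")
print(times)  # calendar dates in late 2019
```

This only confirms the time attributes themselves are decodable; it doesn't fix the decode_cf error, which comes from elsewhere.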

@codecov codecov bot commented Nov 17, 2021

Codecov Report

Merging #206 (9b0e7ac) into master (521f0ee) will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff           @@
##           master     #206   +/-   ##
=======================================
  Coverage   95.30%   95.30%           
=======================================
  Files          10       10           
  Lines        3809     3809           
  Branches      842      849    +7     
=======================================
  Hits         3630     3630           
  Misses         99       99           
  Partials       80       80           
Flag: unittests, Coverage: 95.30% <ø> (ø)

Flags with carried forward coverage won't be shown.


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@ThomasHaine (Collaborator)

Thanks Miguel. @malmans2 can you comment on the data formatting issue?

@malmans2 (Contributor)

I don't think the time coordinate is the issue. Maybe there's another variable with attributes that are not CF-compliant? The attributes of time look in good shape, so there is probably another variable carrying the attribute hours since analysis.

If that's the case, you have a few options:

  1. If the variables creating issues are not useful, ignore them using the drop_variables argument of open_dataset
  2. Edit the file attributes and make them CF-compliant (i.e., replace hours since analysis with hours since YYYY-MM-DD hh:mm:ss)
  3. If you don't want to edit the raw files, use the preprocess argument of open_dataset and update the attributes while reading
  4. If there's no way to make the attributes CF-compliant, use preprocess to decide which variables to decode (i.e., apply decode_cf only to a subset of variables)
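The failure mode and the drop-the-variable route can be sketched with a toy in-memory dataset. This is a minimal sketch, not the PR's actual data: the variable names and values are made up, with tau standing in for the variable carrying the bad units attribute:

```python
import numpy as np
import xarray as xr

# Toy dataset: one normal variable, one variable with a non-CF "units"
# attribute that looks like a time encoding, and a valid time coordinate.
ds = xr.Dataset(
    {
        "salt": ("time", np.array([30.0, 31.0])),
        "tau": ("time", np.array([0.0, 3.0]), {"units": "hours since analysis"}),
    },
    coords={
        "time": (
            "time",
            np.array([0.0, 6.0]),
            {"units": "hours since 2000-01-01 00:00:00", "calendar": "gregorian"},
        )
    },
)

# Decoding everything trips over "hours since analysis" (historically a
# ValueError, as reported above; newer xarray versions may only warn):
try:
    xr.decode_cf(ds)
except ValueError as err:
    print("decode_cf failed:", err)

# Dropping the offending variable first lets the time coordinate decode:
decoded = xr.decode_cf(ds.drop_vars("tau"))
print(decoded["time"].values[0])
```

The same effect at read time is drop_variables=["tau"] in open_dataset, which is option 1 above.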

@malmans2 (Contributor)

BTW, I don't think it would work in your case, but I often use the argument decode_timedelta=False to avoid problems with variables such as ice age. It's easy to test though!
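The decode_timedelta tip can be illustrated in isolation. A minimal sketch with an invented ice_age variable (not from this dataset): a variable whose units are a bare duration like "days" gets decoded to timedelta64, unless decode_timedelta=False keeps the raw floats:

```python
import numpy as np
import xarray as xr

# Toy variable with duration-style units, like an ice-age field.
ds = xr.Dataset({"ice_age": ("x", np.array([1.0, 2.0]), {"units": "days"})})

as_timedelta = xr.decode_cf(ds, decode_timedelta=True)   # becomes timedelta64
as_float = xr.decode_cf(ds, decode_timedelta=False)      # stays float64

print(as_timedelta["ice_age"].dtype)
print(as_float["ice_age"].dtype)
```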

@Mikejmnez (Collaborator, author) commented Nov 18, 2021

All good suggestions. I'll work on these and report back. Thanks @malmans2 !

updates:
- [github.com/psf/black: 21.10b0 → 21.11b1](psf/black@21.10b0...21.11b1)
- [github.com/nbQA-dev/nbQA: 1.1.1 → 1.2.1](nbQA-dev/nbQA@1.1.1...1.2.1)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
updates:
- [github.com/nbQA-dev/nbQA: 1.2.1 → 1.2.2](nbQA-dev/nbQA@1.2.1...1.2.2)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
updates:
- [github.com/psf/black: 21.11b1 → 21.12b0](psf/black@21.11b1...21.12b0)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@Mikejmnez (Collaborator, author)

Ok, I figured out which variable was causing trouble (tau, which only has a time dimension). I dropped it, and now I can load the dataset with decode_cf=True.

@malmans2 Now that we can proceed with this pull request, I need to send you a test dataset (nc) along with its intake file, right? This is in order to add a test that it can open and create the ospy object.

The location and format (e.g. a single zarr file) will likely change, but this should be merged in the next couple of days, after the changes are done...

@malmans2 (Contributor) commented Jan 5, 2022

> @malmans2 Now that we can proceed with this pull request, I need to send you a test dataset (nc) along with its intake file, right? This is in order to add a test that it can open and create the ospy object.

Yes, but maybe it's better if we move the data to your OneDrive (or that of @asiddi24, @ThomasHaine, or @renskegelderloos)? So basically: add your data, move it to another OneDrive, and change the url here: https://github.com/hainegroup/oceanspy/blob/master/oceanspy/tests/conftest.py

@Mikejmnez (Collaborator, author)

> Yes, but maybe it's better if we move the data to your OneDrive (or that of @asiddi24, @ThomasHaine, or @renskegelderloos)? So basically: add your data, move it to another OneDrive, and change the url here: https://github.com/hainegroup/oceanspy/blob/master/oceanspy/tests/conftest.py

I agree. For the time being we can move it to my OneDrive and figure out a more stable long-term location later.

@Mikejmnez (Collaborator, author)

@malmans2 I am getting an error during testing: oceanspy can't find the hycom_test.nc file. The problem is in the .yaml file I sent you. I think the correct location should be

urlpath: ./oceanspy/tests/Data/hycom_test.nc

Could you fix it in the hycom_test.yaml file?

@malmans2 (Contributor)

@Mikejmnez Done!

@Mikejmnez (Collaborator, author)

> @Mikejmnez Done!

Awesome! If everything looks good, you can merge now. I figured out what was causing trouble with decode_cf=True: the variable tau, which is only time-dependent and not important for now, so it has been removed.

@malmans2 (Contributor)

Great! Merging...

@malmans2 malmans2 self-requested a review January 10, 2022 14:55
@malmans2 malmans2 merged commit 0e19a23 into hainegroup:master Jan 10, 2022
@Mikejmnez Mikejmnez deleted the hycom branch July 25, 2022 19:28