
Handle mask_and_scale ourselves instead of using netCDF4 #20

Merged
shoyer merged 8 commits into master from mask-and-scale on Feb 28, 2014

Conversation

@shoyer (Member) commented Feb 26, 2014

This lets us use NaNs instead of masked arrays to indicate missing values.
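For context, the basic idea works roughly like the sketch below (a minimal illustration, not the PR's actual implementation; the function and argument names are made up here):

import numpy as np

def mask_and_scale(values, fill_value=None, scale_factor=None, add_offset=None):
    # Decode raw on-disk values: replace fill values with NaN, then apply
    # the CF scale/offset attributes. Illustrative sketch only.
    values = np.asarray(values, dtype=float)
    if fill_value is not None and not np.isnan(fill_value):
        values = np.where(values == fill_value, np.nan, values)
    if scale_factor is not None:
        values = values * scale_factor
    if add_offset is not None:
        values = values + add_offset
    return values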

@@ -59,36 +60,6 @@ def sync(self):
        pass


def convert_to_cf_variable(array):
Contributor:

This is going to conflict with my change https://github.com/akleeman/xray/pull/21, but I'm definitely in favor of moving it to conventions.

@shoyer (Member, Author) commented Feb 27, 2014

Notes:

  1. DO NOT MERGE -- some of the unit tests are currently failing (see below).
  2. I decided to rebase to use some of @ebrevdo's recent changes. The discussion on the previous commit can be found here: akleeman@d700eeb

Failing tests:

I added an "encoded_dtype" attribute to keep track of the original dtype of variables loaded from a netCDF file. Unfortunately, this means most of the round-trip tests currently fail, because no variables have "encoded_dtype" attributes until they are loaded from netCDF files. I think we will need to make some ugly trade-off to get round-tripping working in both directions, but I'm not yet sure what the best option is. We could:

  1. Ignore some specific attributes (like "encoded_dtype") when checking XArray equality (see the sketch below).
  2. Save these encoding details on XArrays outside of the attributes dict.
  3. Add special logic for the round-trip tests to ignore these attributes.

FWIW, I think it's OK if we don't preserve data types in the round-trip process, as long as the data itself is equivalent. I'm not entirely opposed to trying, but in general it is very hard to guarantee that serialized/deserialized data is exactly equivalent. There is something of a conflict between preserving the original data (netCDF-like) and representing the data in a sane format in-memory (which should not be exactly like a netCDF file). IMHO, we should focus on the latter.
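To make option 1 concrete, the comparison could look something like this (a hypothetical helper, not code from this PR; the exact set of ignored keys is an assumption):

ENCODING_ATTRS = {'encoded_dtype', 'scale_factor', 'add_offset', '_FillValue'}

def attributes_equal_ignoring_encoding(attrs_a, attrs_b):
    # Option 1 above: compare attribute dicts while skipping encoding
    # bookkeeping such as 'encoded_dtype'.
    def strip(attrs):
        return {k: v for k, v in attrs.items() if k not in ENCODING_ATTRS}
    return strip(attrs_a) == strip(attrs_b)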

if any(k in attributes for k in ['add_offset', 'scale_factor']):
    data = np.array(data, dtype=float, copy=True)
    if 'add_offset' in attributes:
Contributor:

Once the variable has been scaled and offset, should we remove those attributes (or add underscores)?

Member Author:

This is the logic for serializing to netCDF, so we still need these attributes around if we want to round-trip things properly.

On the other hand, I'm not so certain now that an exactly faithful netCDF -> xray -> netCDF round trip is behavior we want (see above).
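For illustration, the serialize direction is just the inverse of the decoding, which is why the scale/offset attributes still need to be available (a sketch only; the function name is not from the PR):

import numpy as np

def encode_cf_scaling(values, attributes):
    # Reverse the CF decoding so on-disk values match the original encoding:
    # subtract add_offset, then divide by scale_factor. Illustrative sketch.
    values = np.asarray(values, dtype=float)
    if 'add_offset' in attributes:
        values = values - attributes['add_offset']
    if 'scale_factor' in attributes:
        values = values / attributes['scale_factor']
    return values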

@shoyer (Member, Author) commented Feb 27, 2014

OK, the last commit implements the approach suggested in #26, and all tests pass.

There is not yet a test for toggling decode_mask_and_scale on/off when loading a dataset.

Please take a look when you get the chance :).

if 'scale_factor' in encoding:
    data /= encoding['scale_factor']
    attributes['add_offset'] = encoding['add_offset']
Contributor:

add_offset -> scale_factor
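Presumably the intended bookkeeping for that branch is something like the following (a sketch of the suggested fix, not the merged code):

def apply_scale_factor(data, encoding, attributes):
    # Record 'scale_factor' (not 'add_offset') on the output attributes
    # when the scale factor is applied; names follow the snippet above.
    if 'scale_factor' in encoding:
        data = data / encoding['scale_factor']
        attributes['scale_factor'] = encoding['scale_factor']
    return data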

@shoyer (Member, Author) commented Feb 27, 2014

See the new commits for more comprehensive tests of encoding/decoding.

Datasets now have an optional constructor argument which
determines whether CF-variables are converted or stored raw.
@@ -118,6 +118,7 @@ def __init__(self, variables=None, attributes=None):
        attributes : dict-like, optional
            Global attributes to save on this dataset.
        """
        self._decode_cf = decode_cf
Member Author:

_decode_cf should not add additional state to Dataset. Instead it should be passed on to _set_variables and _as_variable. If we want the option to use CF decoding later, we should expose set_variables as a public method.
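A rough sketch of what that refactor could look like (class and helper names are illustrative, based on the comment above, not the merged implementation):

def decode_cf_variable(var):
    # stand-in for the real CF decoding (mask/scale handling)
    return var

class Dataset:
    def __init__(self, variables=None, attributes=None, decode_cf=False):
        self.variables = {}
        self.attributes = dict(attributes or {})
        # decode_cf is consumed here instead of being stored on the Dataset
        self.set_variables(variables or {}, decode_cf=decode_cf)

    def set_variables(self, variables, decode_cf=False):
        # public entry point, so CF decoding can also be requested later
        for name, var in variables.items():
            self.variables[name] = self._as_variable(var, decode_cf=decode_cf)

    def _as_variable(self, var, decode_cf=False):
        return decode_cf_variable(var) if decode_cf else var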

shoyer added a commit that referenced this pull request Feb 28, 2014
Handle mask_and_scale ourselves instead of using netCDF4
@shoyer merged commit 65d62c6 into master Feb 28, 2014
@shoyer deleted the mask-and-scale branch February 28, 2014 22:33
alexamici added a commit that referenced this pull request Dec 10, 2020
* Define _get_backends_cls function inside apiv2.py to read engines from plugins.py

* Read open_backends_dataset_* from entrypoints.

* Add backend entrypoints in setup.cfg

* Pass apiv2.py isort and black formatting tests.

* add dependencies

* add backend entrypoints and check on conflicts

* black

* removed global variable ENGINES; add class for entrypoints

* black isort

* add detect_engines to __all__ in __init__.py

* removed entrypoints in py36-bare-minimum.yml and py36-min-all-deps.yml

* add entrypoints in IGNORE_DEPS

* Plugins test (#20)

- replace entrypoints with pkg_resources
- add tests

* fix typo

Co-authored-by: keewis <keewis@users.noreply.github.com>

* style

Co-authored-by: keewis <keewis@users.noreply.github.com>

* style

* Code style

* Code style

* fix: updated plugins.ENGINES with plugins.list_engines()

* fix

* One more correctness fix of the latest merge from master

Co-authored-by: TheRed86 <m.rossetti@bopen.eu>
Co-authored-by: keewis <keewis@users.noreply.github.com>
Co-authored-by: Alessandro Amici <a.amici@bopen.eu>
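For reference, entry-point discovery with pkg_resources, as described in these commits, works roughly as follows (the entry-point group name and function name are assumptions for illustration):

import pkg_resources

def detect_engines(group="xarray.backends"):
    # Map each installed backend entry point's name to its loaded
    # open_backends_dataset_* callable (group name assumed).
    engines = {}
    for entry_point in pkg_resources.iter_entry_points(group):
        engines[entry_point.name] = entry_point.load()
    return engines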
keewis pushed a commit to keewis/xarray that referenced this pull request Jan 17, 2024