
Handle mask_and_scale ourselves instead of using netCDF4 #20

Merged
shoyer merged 8 commits into master from mask-and-scale on Feb 28, 2014

Conversation

@shoyer (Member) commented Feb 26, 2014

This lets us use NaNs instead of masked arrays to indicate missing values.
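For context, the basic idea works roughly like the sketch below (a minimal illustration, not the PR's actual implementation; the function and argument names are made up here):

import numpy as np

def mask_and_scale(values, fill_value=None, scale_factor=None, add_offset=None):
    # Decode raw on-disk values: replace fill values with NaN, then apply
    # the CF scale/offset attributes. Illustrative sketch only.
    values = np.asarray(values, dtype=float)
    if fill_value is not None and not np.isnan(fill_value):
        values = np.where(values == fill_value, np.nan, values)
    if scale_factor is not None:
        values = values * scale_factor
    if add_offset is not None:
        values = values + add_offset
    return values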

@@ -59,36 +60,6 @@ def sync(self):
        pass


def convert_to_cf_variable(array):
Contributor:

This is going to conflict with my change https://github.com/akleeman/xray/pull/21, but I'm definitely in favor of moving it to conventions.

@shoyer (Member, Author) commented Feb 27, 2014

Notes:

  1. DO NOT MERGE -- some of the unit tests are currently failing (see below).
  2. I decided to rebase to use some of @ebrevdo's recent changes. The discussion on the previous commit can be found here: akleeman@d700eeb

Failing tests:

I added an "encoded_dtype" attribute to keep track of the original dtype of variables loaded from a netCDF file. Unfortunately, this means most of the round-trip tests currently fail, because no variables have "encoded_dtype" attributes until they are loaded from netCDF files. I think we will need to make some ugly trade-off to get round-tripping working in both directions, but I'm not yet sure what the best option is. We could:

  1. Ignore some specific attributes (like "encoded_dtype") when checking XArray equality (see the sketch below).
  2. Save these encoding details on XArrays outside of the attributes dict.
  3. Add special logic for the round-trip tests to ignore these attributes.

FWIW, I think it's OK if we don't preserve data types in the round-trip process, as long as the data itself is equivalent. I'm not entirely opposed to trying, but in general it is very hard to guarantee that serialized/deserialized data is exactly equivalent. There is something of a conflict between preserving the original data (netCDF-like) and representing the data in a sane format in-memory (which should not be exactly like a netCDF file). IMHO, we should focus on the latter.
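To make option 1 concrete, the comparison could look something like this (a hypothetical helper, not code from this PR; the exact set of ignored keys is an assumption):

ENCODING_ATTRS = {'encoded_dtype', 'scale_factor', 'add_offset', '_FillValue'}

def attributes_equal_ignoring_encoding(attrs_a, attrs_b):
    # Option 1 above: compare attribute dicts while skipping encoding
    # bookkeeping such as 'encoded_dtype'.
    def strip(attrs):
        return {k: v for k, v in attrs.items() if k not in ENCODING_ATTRS}
    return strip(attrs_a) == strip(attrs_b)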

if any(k in attributes for k in ['add_offset', 'scale_factor']):
    data = np.array(data, dtype=float, copy=True)
    if 'add_offset' in attributes:
Contributor:

Once the variable has been scaled and offset, should we remove those attributes (or add underscores)?

Member Author:

This is the logic for serializing to netCDF, so we still need these attributes around if we want to round-trip things properly.

On the other hand, I'm not so certain now that an exactly faithful netCDF -> xray -> netCDF round trip is behavior we want (see above).
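For illustration, the serialize direction is just the inverse of the decoding, which is why the scale/offset attributes still need to be available (a sketch only; the function name is not from the PR):

import numpy as np

def encode_cf_scaling(values, attributes):
    # Reverse the CF decoding so on-disk values match the original encoding:
    # subtract add_offset, then divide by scale_factor. Illustrative sketch.
    values = np.asarray(values, dtype=float)
    if 'add_offset' in attributes:
        values = values - attributes['add_offset']
    if 'scale_factor' in attributes:
        values = values / attributes['scale_factor']
    return values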

@shoyer (Member, Author) commented Feb 27, 2014

OK, the last commit implements the approach suggested in #26, and all tests pass.

There is not yet a test for toggling decode_mask_and_scale on/off when loading a dataset.

Please take a look when you get the chance :).

if 'scale_factor' in encoding:
    data /= encoding['scale_factor']
    attributes['add_offset'] = encoding['add_offset']
Contributor:

add_offset -> scale_factor
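Presumably the intended bookkeeping for that branch is something like the following (a sketch of the suggested fix, not the merged code):

def apply_scale_factor(data, encoding, attributes):
    # Record 'scale_factor' (not 'add_offset') on the output attributes
    # when the scale factor is applied; names follow the snippet above.
    if 'scale_factor' in encoding:
        data = data / encoding['scale_factor']
        attributes['scale_factor'] = encoding['scale_factor']
    return data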

@shoyer (Member, Author) commented Feb 27, 2014

See the new commits for more comprehensive tests of encoding/decoding.

Datasets now have an optional constructor argument which
determines whether CF-variables are converted or stored raw.
@@ -118,6 +118,7 @@ def __init__(self, variables=None, attributes=None):
        attributes : dict-like, optional
            Global attributes to save on this dataset.
        """
        self._decode_cf = decode_cf
Member Author:

_decode_cf should not add additional state to Dataset. Instead it should be passed on to _set_variables and _as_variable. If we want the option to use CF decoding later, we should expose set_variables as a public method.
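A rough sketch of what that refactor could look like (class and helper names are illustrative, based on the comment above, not the merged implementation):

def decode_cf_variable(var):
    # stand-in for the real CF decoding (mask/scale handling)
    return var

class Dataset:
    def __init__(self, variables=None, attributes=None, decode_cf=False):
        self.variables = {}
        self.attributes = dict(attributes or {})
        # decode_cf is consumed here instead of being stored on the Dataset
        self.set_variables(variables or {}, decode_cf=decode_cf)

    def set_variables(self, variables, decode_cf=False):
        # public entry point, so CF decoding can also be requested later
        for name, var in variables.items():
            self.variables[name] = self._as_variable(var, decode_cf=decode_cf)

    def _as_variable(self, var, decode_cf=False):
        return decode_cf_variable(var) if decode_cf else var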

shoyer added a commit that referenced this pull request Feb 28, 2014
Handle mask_and_scale ourselves instead of using netCDF4
@shoyer merged commit 65d62c6 into master Feb 28, 2014
@shoyer deleted the mask-and-scale branch February 28, 2014 22:33
alexamici added a commit that referenced this pull request Dec 10, 2020
* Define _get_backends_cls function inside apiv2.py to read engines from plugins.py

* Read open_backends_dataset_* from entrypoints.

* Add backend entrypoints in setup.cfg

* Pass apiv2.py isort and black formatting tests.

* add dependencies

* add backend entrypoints and check on conflicts

* black

* removed global variable ENGINES; add class for entrypoints

* black isort

* add detect_engines to __all__ in __init__.py

* removed entrypoints in py36-bare-minimum.yml and py36-min-all-deps.yml

* add entrypoints in IGNORE_DEPS

* Plugins test (#20)

- replace entrypoints with pkg_resources
- add tests

* fix typo

Co-authored-by: keewis <keewis@users.noreply.github.com>

* style

Co-authored-by: keewis <keewis@users.noreply.github.com>

* style

* Code style

* Code style

* fix: updated plugins.ENGINES with plugins.list_engines()

* fix

* One more correctness fix of the latest merge from master

Co-authored-by: TheRed86 <m.rossetti@bopen.eu>
Co-authored-by: keewis <keewis@users.noreply.github.com>
Co-authored-by: Alessandro Amici <a.amici@bopen.eu>
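For reference, entry-point discovery with pkg_resources, as described in these commits, works roughly as follows (the entry-point group name and function name are assumptions for illustration):

import pkg_resources

def detect_engines(group="xarray.backends"):
    # Map each installed backend entry point's name to its loaded
    # open_backends_dataset_* callable (group name assumed).
    engines = {}
    for entry_point in pkg_resources.iter_entry_points(group):
        engines[entry_point.name] = entry_point.load()
    return engines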
keewis pushed a commit to keewis/xarray that referenced this pull request Jan 17, 2024