propagation of encoding
#6323
Comments
See also #7686. The ideas presented here are also great!
This issue was discussed at this week's dev meeting. I will summarize what we discussed:
Specific action items that can happen now:
Longer term action items:
We should also consider a configuration option to automatically drop encoding.
In the hypothetical invocation
My expectation was that this would be a separate object, e.g., "disable all encoding propagation by discarding encoding attributes once a Dataset has been modified" would be an intermediate step, on the route to removing (As a side note, I would probably spell this as
In a future where
For your consideration, I would like to posit the following use case: From this point of view, the encoding settings of the Dataset are logically an attribute of the Dataset and its elements. It would also be a pain (and lead to a degradation in code quality) to have to add a
My two cents: As a user, I would not expect arbitrary functions applied to a Dataset to also remove all encoding attributes. In fact, it would probably send me on a debugging journey to figure out how, why, and when my Dataset suddenly lost all the encoding settings I had added to it. Arguably, the clearing of encoding would be a side effect, and one that most operations should not have.

If I understand correctly, the properties stored in the encoding attribute are ultimately meant for a backend function or library that will write the Dataset to a file (like Zarr, or NetCDF, or even some custom format through a self-defined function). The actual effect of these properties comes from the meaning that these backends assign to them. Therefore, if I were Xarray, I would not make assumptions about which functions invalidate which properties of the encoding attribute, but would leave this to the user.

So perhaps a reasonable approach could be to let the encoding attribute exist, but not have any Xarray functions add, delete, or modify it. If a user performs an operation that impacts the encoding, they should fix those values before attempting to write to a file. (For these purposes, I would consider functions like

As long as the documentation is clear on this behavior, I believe anyone encountering encoding-related issues should be able to figure out that they have to fix the encoding attributes causing the issue.

I hope this is a helpful contribution to the discussion :)
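For illustration, here is a minimal sketch of the "fix the encoding yourself before writing" workflow described in this comment. The dataset, the variable name `t`, and the encoding values are made up, and the write call is left commented out because it needs a netCDF backend installed. It only uses the `encoding` dicts on the Dataset and its variables, plus the `encoding` argument of `to_netcdf`.

```python
import numpy as np
import xarray as xr

# Toy dataset; the variable name "t" and all encoding values are illustrative.
ds = xr.Dataset({"t": ("x", np.arange(4.0))})
ds["t"].encoding.update({"dtype": "int16", "scale_factor": 0.1})
ds.encoding["unlimited_dims"] = {"x"}

# Some operation that may make the old encoding inappropriate.
result = ds * 1000.0

# The user takes responsibility: clear whatever encoding is still attached
# to the result (dataset level and per variable) before writing.
result.encoding.clear()
for var in result.variables.values():
    var.encoding.clear()

# Alternatively, override the encoding explicitly at write time
# (not executed here; requires a netCDF backend):
# result.to_netcdf("out.nc", encoding={"t": {"dtype": "float32"}})
```

Clearing by hand works regardless of whether a given operation propagated the encoding, which is why it is the usual workaround in the bug reports this issue refers to.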
Even before going through the items in #6323 (comment), would it make sense to at least remove the old (Possible we could have pushed #8069 towards this, rather than setting a new encoding attribute? Thanks again to @Metamess for starting that PR...)
What is your issue?
We frequently get bug reports related to `encoding` that can usually be fixed by clearing it or by overriding it using the `encoding` parameter of the `to_*` methods, e.g.

There are also a few discussions with more background:
We discussed this in the meeting yesterday and, as far as I can remember, agreed that the current default behavior is not ideal and decided to investigate #5336: a `keep_encoding` option, similar to `keep_attrs`, that would be `True` (propagate `encoding`) by default but will be changed to `False` (drop `encoding` on any operation) in the future.

cc @rabernat, @shoyer