If we want to mimic pandas, we should support automatic alignment of coordinate labels in:

- Mathematical operations (in-place; see also WIP: Automatic label alignment for mathematical operations #184)
- All operations that add new dataset variables (merge, update, __setitem__)
- All operations that create a new dataset (__init__, concat)

For the latter two cases, it is not clear that using an inner join on coordinate labels is the right choice, because that could lead to some surprising destructive operations. This should be considered carefully.
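To make the concern concrete, here is a small sketch using pandas (whose alignment semantics this issue proposes to mimic) showing why an inner join can be destructive: non-shared labels are silently dropped, whereas an outer join preserves them.

```python
import pandas as pd

s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])

# An inner join keeps only the shared labels: "a" and "d" are silently dropped.
inner1, inner2 = s1.align(s2, join="inner")
print(list(inner1.index))  # ['b', 'c']

# An outer join keeps every label, padding the gaps with NaN.
outer1, outer2 = s1.align(s2, join="outer")
print(list(outer1.index))  # ['a', 'b', 'c', 'd']
```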
I have most of a working implementation for this that will be up for a PR shortly. Here's some of my thinking on expected behavior.
Based on the principle that combining arrays into a new dataset should not remove information, it makes sense to use outer joins for Dataset.__init__ and Dataset.merge. This is the same behavior pandas uses.
When adding an item to an existing dataset, it would be surprising if indexes or dimension sizes changed. So I think we should be using left joins for __setitem__ and update. This is also what pandas does.
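Again for reference, the pandas behavior being described: assigning a Series into an existing DataFrame reindexes the Series to the frame's index (a left join), so the frame's shape and labels never change.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2]}, index=["a", "b"])
s = pd.Series([10, 20, 30], index=["b", "c", "d"])

# __setitem__ left-joins on the existing index: "c" and "d" are
# discarded, "a" gets NaN, and df keeps its original shape.
df["y"] = s
print(list(df.index))  # ['a', 'b']
```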
Right now, we use Dataset.merge to handle all operations that add new items to a dataset. Adding automatic alignment is turning that into even more of a kludgy mess than it already was. I think some simplification of scope for update/__setitem__/merge would help, though I'm not sure it's worth breaking existing code.