add open_datatree to xarray #8697

flamingbear · 2024-02-02T20:20:03Z

Draft: I'd like to open this up and start a discussion on a couple of things.

Here is a first stab at adding open_datatree to the backend of xarray.

- I'm not sure which tests to migrate or write with this change. The only datatree tests that use open_datatree are the ones in tests/test_io.py. Those (and all of the other datatree tests) run and pass, but they still live in datatree_, and they don't seem to fit in test_backends.py. I do see that Tom did a good job of naming the tests so that they would fit with the existing tests in many places. Per Tom: once approved and merged, the tests migrated into xarray/tests/datatree can be added to the existing tests where appropriate.
- I was able to open datatrees with each of the engines: netcdf4, h5netcdf and zarr.
- I haven't moved any documentation. I remember hearing that we could add it and mark it as experimental? Is that the correct way forward?

No check boxes checked yet.

- Closes part of first bullet of Track merging datatree into xarray #8572
- Tests added
- User visible changes (including notable bug fixes) are documented in whats-new.rst
- New functions/methods are listed in api.rst

flamingbear · 2024-02-02T20:20:44Z

xarray/backends/api.py

+    engine : str, optional
+        Xarray backend engine to us. Valid options include `{"netcdf4", "h5netcdf", "zarr"}`.
+    kwargs :
+        Additional keyword arguments passed to :py:meth:`~xarray.open_dataset` for each group.


This method doesn't exist yet at that location.

Could you explain why? This should exist if you use :py:func: instead of :py:meth: (you can use sphobjinv to find the right role).

I think it doesn't exist at the top level because we wanted to migrate over the code before allowing a direct import.

from xarray import open_datatree

Until all of the code was merged, we said it would be imported from xarray.core.datatree. From the first comment of #8572: "EDIT: We decided it should return an xarray.DataTree object, or even xarray.core.datatree.DataTree object. So we can start by just copying the basic version in datatree/io.py right now which just calls open_dataset many times."

So I had thought xarray.core.datatree.DataTree made more sense for now.

But the second part of your comment makes me think I might not have understood what you were asking.

I guess what I'm asking is: open_dataset definitely exists (and it should be possible to link to), so I suppose you meant to write xarray.open_datatree?

Suggested change

-        Additional keyword arguments passed to :py:meth:`~xarray.open_dataset` for each group.
+        Additional keyword arguments passed to :py:func:`~xarray.open_datatree` for each group.

Also, once that's fixed, does this cause the docs build to fail? If not I believe it would be fine to leave as-is.

On reflection, I think this should still be open_dataset. That is what is in the original docs. And this is describing open_datatree, where it's calling open_dataset with these **kwargs each time. So I think this commit should be reverted. As to :py:meth: vs :py:func: I don't know and will look at the difference unless you tell me before I figure it out.
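For reference, a rough sketch of the distinction between the two roles: in Sphinx's Python domain, :py:func: is the conventional role for module-level functions such as open_dataset, while :py:meth: is meant for methods defined on a class. The helper below is only an illustrative heuristic and is not part of Sphinx or sphobjinv; `sphinx_role_for`, `Store` and `open_dataset_stub` are hypothetical names made up for this example.

```python
import inspect


def sphinx_role_for(obj) -> str:
    """Guess a Python-domain role name for ``obj`` (illustrative heuristic only)."""
    if inspect.isclass(obj):
        return "class"
    if inspect.isfunction(obj) or inspect.ismethod(obj):
        # Ignore any enclosing-function prefix such as "outer.<locals>.".
        name = obj.__qualname__.rsplit("<locals>.", 1)[-1]
        # A remaining dot (e.g. "Store.open") means the function is defined
        # on a class, so it would be documented as a method.
        return "meth" if "." in name else "func"
    return "obj"  # generic fallback role


class Store:  # hypothetical class, only here to provide a method
    def open(self):
        pass


def open_dataset_stub():  # hypothetical module-level function
    pass


print(sphinx_role_for(open_dataset_stub))  # func
print(sphinx_role_for(Store.open))         # meth
print(sphinx_role_for(Store))              # class
```

For real projects, looking the target up in the built objects.inv (for example with sphobjinv, as suggested above) is the more reliable way to find the right role.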

flamingbear · 2024-02-02T21:09:36Z

~~Apparently I misunderstood this for excluding mypy from datatree_?~~

Well I understood it, but it ~~does~~ did not help me when we import from there.

Edit: I was able to ignore type errors for the imported datatree_ modules in 9f89256.

TomNicholas

> I'm not sure which tests to migrate/write with this change.

I think we can start by moving test_io.py to a new folder xarray/tests/datatree/test_io.py. This keeps a distinction between testing functionality we have collectively "okayed" and what we haven't got to yet. We can think about potentially integrating those tests into xarray/tests/test_backends.py when we actually integrate the internals of open_datatree with the backends internals.

> I was able to open datatrees with each of the engines, netcdf4, h5netcdf and zarr.

Great! These all have tests right?

> I haven't moved any documentation. I remember hearing that we could add it and mark it as experimental? Is that the correct way forward?

Yeah I'm not sure what to do for this. We could put it in the main docs but with a big experimental warning on it, but then we haven't got to the rest of the datatree functionality yet. I think I would favour moving the code first, then exposing the documentation after. Perhaps the datatree docs should be moved piecemeal to a dedicated section in the documentation (under "For developers/contributors" maybe), then moved into the main docs only once everything is done?

flamingbear · 2024-02-03T23:13:30Z

> > I was able to open datatrees with each of the engines, netcdf4, h5netcdf and zarr.
>
> Great! These all have tests right?

I was thinking no, but when I went to write them, I realized that these are the tests I would write. They make me slightly uncomfortable only in that they use the to_zarr and to_netcdf backends.

flamingbear · 2024-02-03T23:14:10Z

> I think I would favour moving the code first, then exposing the documentation after

A future-us problem, I like that. But it also makes sense to me.

flamingbear · 2024-02-05T22:31:50Z

Marking as ready for review, but mostly to get comments coming.

keewis

Seems fine to me, at least for now: we definitely need to refactor this at a later point (for example, there should be a function that converts a given group to a dataset, which then can be called from both open_datatree and open_dataset). I'd probably have started by migrating the datatree code to xarray.core, though.

I realize that this PR was created on top of #8656, but before merging it might be good to remove the commits from that (this will save us some trouble down the line).
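The refactor suggested above (one function that converts a given group to a dataset, called from both entry points) could look roughly like the toy sketch below. It uses a plain dict as a stand-in for a file with groups; `FAKE_STORE`, `group_to_dataset`, `toy_open_dataset` and `toy_open_datatree` are hypothetical names for illustration, not xarray's implementation. `drop_variables` is used only to show that keyword arguments reach every group.

```python
# Toy stand-in for a file on disk: group path -> {variable name: values}.
FAKE_STORE = {
    "/": {"time": [0, 1, 2]},
    "/ocean": {"sst": [290.1, 290.4, 290.2]},
    "/ocean/deep": {"temp": [275.0, 275.1, 275.0]},
}


def group_to_dataset(store, group="/", **kwargs):
    """Convert one group into a dataset-like dict (hypothetical shared helper)."""
    drop = set(kwargs.get("drop_variables", ()))
    return {name: vals for name, vals in store[group].items() if name not in drop}


def toy_open_dataset(store, group="/", **kwargs):
    """Open a single group by delegating to the shared helper."""
    return group_to_dataset(store, group, **kwargs)


def toy_open_datatree(store, **kwargs):
    """Open every group, forwarding the same kwargs to the helper each time."""
    return {group: group_to_dataset(store, group, **kwargs) for group in sorted(store)}


tree = toy_open_datatree(FAKE_STORE, drop_variables=["temp"])
print(list(tree))
```

With a shared helper, both entry points apply identical decoding options to a group, so the tree node for "/ocean" and a direct open of that group cannot drift apart.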

keewis · 2024-02-11T21:47:53Z

xarray/backends/api.py

+        Strings and Path objects are interpreted as a path to a netCDF file or Zarr store.
+    engine : str, optional
+        Xarray backend engine to use. Valid options include `{"netcdf4", "h5netcdf", "zarr"}`.
+    kwargs :


Suggested change

-    kwargs :
+    **kwargs

flamingbear · 2024-02-12T17:26:36Z

> I realize that this PR was created on top of #8656, but before merging it might be good to remove the commits from that (this will save us some trouble down the line).

I didn't see that originally. Will they still cause problems when this is squash merged? I assume you'd want me to get rid of the 5 commits before Jan 31st (when you merged #8688)? Do you want to give me a git-wizard hint?

keewis · 2024-02-12T17:46:59Z

The only way that I know of is interactive rebasing, but that would mean that you'd have to remove all merge commits and maybe even resolve merge conflicts again. So if you checked that github doesn't show any weird things in this PR it maybe doesn't matter.

flamingbear · 2024-02-12T20:27:33Z

> So if you checked that github doesn't show any weird things in this PR it maybe doesn't matter.

I don't see anything weird, no conflicts or anything, just those old commits with different lineage.

edit: I think it's fine if it's squash merged. If not, I did rebase this onto the current pydata/main and the resulting PR would be clean, but I don't actually want to push that on top of this.

TomNicholas · 2024-02-14T18:46:38Z

@flamingbear is there a comment thread for the mypy issue you raised earlier? I have no idea why mypy needs to be explicitly told about the types here, but mypy seems to pass CI on this branch... 🤷‍♂️

TomNicholas · 2024-02-14T18:47:42Z

Also @flamingbear this now has two approvals - is it ready to merge from your end?

flamingbear · 2024-02-14T18:50:13Z

> but mypy seems to pass CI on this branch... 🤷‍♂️

I think that's because it's excluded in this PR. I've moved that test module in the PR you linked to.

> Also @flamingbear this now has two approvals - is it ready to merge from your end?

It is ready on my end. Merge away. 🙇

TomNicholas · 2024-02-14T19:00:37Z

Let's start a habit of adding a whats-new.rst entry (under "internal changes") for each of these PRs, then I will merge.

flamingbear · 2024-02-14T19:56:17Z

> Let's start a habit of adding a whats-new.rst entry (under "internal changes") for each of these PRs, then I will merge.

I did some pattern matching, because I can no longer build the docs locally without errors.

flamingbear · 2024-02-14T21:16:28Z

@TomNicholas I was hoping to kick off the CI again by fixing the rst formatting, since the macOS 3.11 job had timed out and was hanging. But I also turned off automerge.

keewis · 2024-02-14T21:55:22Z

I'm kinda late with this, but one concern is that when merging the datatree repo we effectively removed xarray/datatree_ from packaging. With the code in xr.open_datatree we now use that code, but won't put it into build artifacts.

So my question is: should we remove that exclude and instead put the whole repo into build artifacts, or should we revert this and wait until we have done the series of PRs copying over the datatree library to its final place? To be clear, I'm mostly interested in the timing of things.

TomNicholas · 2024-02-14T21:58:04Z

If we are not advertising any of this to be imported by users until the move is complete then does the distinction matter?

keewis · 2024-02-14T22:03:20Z

If we're fine with it being in a broken state (because datatree_ would not be in the installed library), then no. We'd just have to make sure that the import of datatree_ is not at the top level, so open_dataset would still work. Looking at the code, it appears that the imports are either local or behind TYPE_CHECKING guards, so at least at runtime this would be fine. Now we'd only need to see whether this breaks type checking, but I'm not an expert on that.
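A minimal sketch of the import pattern described above, assuming a package that is missing from the installed distribution: `hypothetical_unpackaged_pkg`, `open_flat` and `open_tree` are made-up names standing in for datatree_, open_dataset and open_datatree. The module still imports cleanly and the advertised function still works; only the experimental function fails, and only when it is called.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by type checkers only; this branch never runs at runtime.
    from hypothetical_unpackaged_pkg import Tree  # hypothetical module


def open_flat(path: str) -> dict:
    """Advertised entry point: no dependency on the missing package."""
    return {"path": path}


def open_tree(path: str) -> "Tree":
    """Experimental entry point: the import happens only when called."""
    from hypothetical_unpackaged_pkg import Tree  # local import

    return Tree(path)


print(open_flat("example.nc"))

try:
    open_tree("example.nc")
except ImportError as err:
    print(f"open_tree is unavailable: {err}")
```

The string annotation "Tree" keeps the name out of runtime evaluation. A type checker would still try to resolve the guarded import, which is the open question raised above.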

TomNicholas · 2024-02-14T22:09:21Z

I personally think that's fine - we're likely going to have DataTree be in a broken state anyway whilst we have like treenode.py moved but not datatree.py moved or whatever. As long as it doesn't break advertised functionality for users by crapping out, or break code quality tools we use (like mypy) then I don't see a problem.

flamingbear added 8 commits January 29, 2024 15:33

Merge remote-tracking branch 'prepared-datatree/main' into mhs/import…

0266b63

…-datatree-attempt2

DAS-2060: Skips datatree_ CI

3899b06

Adds additional ignore to mypy
Adds additional ignore to doctests
Excludes xarray/datatree_ from all pre-commit.ci

DAS-2070: Migrate open_datatree into xarray.

d5b80f9

First stab. Will need to add/move tests.

DAS-2060: replace relative import of datatree to library

0c62960

DAS-2060: revert the exporting of NodePath from datatree

a523d50

I mistakenly thought we wanted to use the hidden version of datatree_ and we do not.

Merge branch 'main' into mhs/DAS-2060/open_datatree

1e5e433

Don't expose open_datatree at top level

e687e4a

We do not want to expose open_datatree at top level until all of the code is migrated.

Point datatree imports to xarray.datatree_.datatree

4e05d5c

TomNicholas added the topic-DataTree Related to the implementation of a DataTree class label Feb 2, 2024

flamingbear commented Feb 2, 2024

TomNicholas added this to In progress in DataTree integration via automation Feb 2, 2024

flamingbear commented Feb 2, 2024

xarray/backends/api.py (outdated, resolved)

flamingbear commented Feb 2, 2024

xarray/backends/common.py (outdated, resolved)

flamingbear commented Feb 2, 2024

xarray/backends/h5netcdf_.py (outdated, resolved)

Updates function signatures for mypy.

77405d9

TomNicholas reviewed Feb 2, 2024

Move io tests, remove undefined reference to documentation.

81b425f

Also starts fixing simple mypy errors

flamingbear added 2 commits February 5, 2024 09:23

Pass bare-minimum tests.

3c5bcda

Update pyproject.toml to exclude imported datatree_ modules.

9f89256

Add some typing for mygrated tests. Adds display_expand_groups to core options.

flamingbear commented Feb 5, 2024

xarray/core/options.py (resolved)

Adding back type ignores

a4bad61

This is cargo-cult. I wonder if there's a different CI test that wanted these and since this is now excluded at the top level. I'm putting them back until migration into main codebase.

flamingbear force-pushed the mhs/open_datatree branch from ec91d63 to a4bad61 Compare February 5, 2024 19:55

flamingbear marked this pull request as ready for review February 5, 2024 22:31

TomNicholas reviewed Feb 5, 2024

pyproject.toml (resolved)

xarray/datatree_/datatree/datatree.py (resolved)

flamingbear added 2 commits February 6, 2024 08:46

Refactor open_datatree back together.

e4f0374

puts common parts in common.

Removes TODO comment

3b1224c

keewis approved these changes Feb 11, 2024

flamingbear and others added 4 commits February 12, 2024 09:18

Merge branch 'main' into mhs/open_datatree

6498acc

Call raised exception

4280d30

Add unpacking notation to kwargs

8c54465

Use final location for DataTree doc strings

afba7ba

Co-authored-by: Justus Magin <keewis@users.noreply.github.com>

flamingbear and others added 2 commits February 12, 2024 11:02

fix comment from open_dataset to open_datatree

aab1744

Co-authored-by: Justus Magin <keewis@users.noreply.github.com>

Revert "fix comment from open_dataset to open_datatree"

5b48973

This reverts commit aab1744.

flamingbear added 2 commits February 13, 2024 08:02

Change sphynx link from meth to func

c6bb18a

Merge branch 'main' into mhs/open_datatree

4d306c0

Update whats-new.rst

d386ed3

TomNicholas enabled auto-merge (squash) February 14, 2024 19:57

Fix what-new.rst formatting.

e291587

auto-merge was automatically disabled February 14, 2024 21:15
Head branch was pushed to by a user without write access

TomNicholas merged commit fffb03c into pydata:main Feb 14, 2024
29 checks passed

DataTree integration automation moved this from In progress to Done Feb 14, 2024

flamingbear deleted the mhs/open_datatree branch February 14, 2024 23:41

TomNicholas mentioned this pull request Apr 9, 2024

Track merging datatree into xarray #8572

Open

27 tasks
