Next iteration on xarray support for intake #2

Closed
mmccarty opened this issue Apr 23, 2018 · 6 comments

Comments

@mmccarty
Member

From @martindurant on the first PR:

Notes on our conversation about the future of this PR/repo (cc @jbcrail, but note that @mmccarty will presumably return to work on this when he has time). I believe the following is a reasonable course of action.

  • Intake should accept a new container type, "xarray". The builtin functionality in the source class will be overridden.
  • We must consider carefully what this means for an xarray opened on an Intake server. Presumably communication will be the same as for an ndarray container, which doesn't exist yet (right?). Note that being able to load netCDF/HDF from a remote location would be a huge boon, and there are servers around doing only that job, because it is so useful. Can we make it happen? We would need to create each variable as a dask array, where any chunk calls the server with its multi-dimensional index, and create a local xarray that stores these dask arrays in the same arrangement and with the same metadata as the remote one.
  • The natural representation of an open xarray is the open xarray object itself, and that is what discover() should return. Also, the arrays should be chunked from the start, so to_dask() is a no-op on that object, and read() should call whichever xarray function materialises the data into memory.
  • This repo should be renamed intake-xarray and include three separate plugins: netCDF, which opens one or more files (these are separate functions in xarray); rasterIO; and zarr. The latter is the only one that can actually open files remotely, and we should take care to parse s3:, hdfs:, and gcs: URLs and create the mappers that zarr needs (I'll help with that). It would be nice if an unstructured zarr array returned an xarray DataArray rather than a Dataset, although maybe that should be a separate plugin. Note again that we have no array readers at all, not even numpy (never mind scientific formats).
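The zarr plugin has to decide, from the URL alone, which storage backend should build its mapper. A minimal sketch of that dispatch step, using only the standard library: `parse_remote_url` and the `REMOTE_BACKENDS` table are hypothetical, and the backend names (s3fs, gcsfs, hdfs3) are assumptions about which filesystem libraries would supply the actual mapper objects.

```python
from urllib.parse import urlparse

# Hypothetical table: URL scheme -> filesystem library assumed to provide
# a zarr-compatible key/value mapper for that storage backend.
REMOTE_BACKENDS = {
    "s3": "s3fs",
    "gcs": "gcsfs",
    "gs": "gcsfs",
    "hdfs": "hdfs3",
}


def parse_remote_url(url):
    """Split a URL into (backend, location) for choosing a zarr mapper.

    Paths with no scheme, or with "file:", map to the "local" backend.
    Raises ValueError for schemes not listed in REMOTE_BACKENDS.
    """
    parsed = urlparse(url)
    scheme = parsed.scheme.lower()
    if scheme in ("", "file"):
        return "local", parsed.path
    if scheme not in REMOTE_BACKENDS:
        raise ValueError(f"unsupported protocol: {scheme}:")
    # netloc holds the bucket or host; path is the key prefix inside it
    location = parsed.netloc + parsed.path
    return REMOTE_BACKENDS[scheme], location
```

For example, `parse_remote_url("s3://bucket/path/store.zarr")` gives `("s3fs", "bucket/path/store.zarr")`; the plugin would then hand that location to the chosen library to build the mapper.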
@mmccarty
Member Author

  • Rename repo - Done!

@martindurant
Member

:)

@mmccarty
Member Author

mmccarty commented May 8, 2018

@martindurant, is this issue resolved by the last two PRs, #7 and #6?

@martindurant
Member

Point 2 is outstanding, but maybe that should be a general issue on Intake main for all ndarrays. I have an idea of how to handle the server, but it's not simple.
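The core of point 2 is translating each dask chunk's multi-dimensional index into the region of the remote array the server must read and return. A minimal stdlib sketch, assuming dask-style normalized chunks (one tuple of block sizes per dimension); `chunk_slices` and `all_block_indices` are hypothetical helpers, not part of Intake or dask.

```python
from itertools import accumulate, product


def chunk_slices(chunks, index):
    """Return the tuple of slices covered by one block of a chunked array.

    chunks: per-dimension block sizes, e.g. ((2, 2, 1), (3, 3)) for a (5, 6) array.
    index:  the block's multi-dimensional index, e.g. (2, 1).
    """
    if len(chunks) != len(index):
        raise ValueError("index must have one entry per dimension")
    slices = []
    for sizes, i in zip(chunks, index):
        if not 0 <= i < len(sizes):
            raise IndexError(f"block index {i} out of range")
        # start offsets of each block along this dimension: 0, s0, s0+s1, ...
        starts = [0] + list(accumulate(sizes))
        slices.append(slice(starts[i], starts[i + 1]))
    return tuple(slices)


def all_block_indices(chunks):
    """Every block index a client would request to fetch the whole array."""
    return list(product(*(range(len(sizes)) for sizes in chunks)))
```

For example, block `(2, 1)` of a (5, 6) array chunked as `((2, 2, 1), (3, 3))` covers rows 4:5 and columns 3:6, which is the slice the server would read and send back. A client could wrap a per-block fetch like this in a dask array so that only the blocks actually computed are requested.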

@martindurant
Member

#9 addresses this, but will need changes in Intake too.

@martindurant
Member

Everything here has been done.
