xarray container type #54

Closed
mmccarty opened this issue Mar 20, 2018 · 10 comments

@mmccarty
Member

mmccarty commented Mar 20, 2018

Issue 43 lists candidate additions to the container types supported by intake. At least one project will need xarray in the near term. This issue tracks the work to add an xarray container type.

@martindurant
Member

+1 for adding, and, as you said in other places, +7 for keeping the set of containers flexible and the number of places in the code checking specific containers to a minimum.

@seibert
Collaborator

seibert commented Mar 20, 2018

Yeah, for adding a new container, the main decision points are:

  • How to merge partitions together
  • How to describe its schema
  • How to serialize it over the network
  • What is the corresponding Dask data structure? (For xarray, I'm not sure what to_dask() should return since it kind of inverts the expectations)

@mmccarty
Member Author

Thanks for the info, @seibert! As for Dask data structures, xarray supports the Dask interface, so to_dask should be a no-op. @mrocklin, is that correct?

@martindurant
Member

@mmccarty, I think Stan's point was the opposite: what should we do in the default non-dask case, exactly? I think it's OK to use dask internally anyway (i.e., chunking, in this context), but that's an opinion, not authoritative.

@seibert
Collaborator

seibert commented Mar 20, 2018

Well, it is both directions. For the other containers, there is a clear distinction between in-memory and out-of-core/distributed data structure:

| container | in memory | out of core |
|---|---|---|
| dataframe | pandas.DataFrame | dask.dataframe |
| ndarray | numpy.ndarray | dask.array |
| python | list | dask.bag |
| xarray | ? | ? |

@martindurant
Member

| container | in memory | ooc |
|---|---|---|
| xarray | xarray | xarray (with chunks) |

:)
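
The in-memory versus chunked distinction in the table above can be seen directly in xarray. This is a minimal sketch, assuming xarray, numpy, and dask are installed; the variable name `temperature` and the chunk size are made up for illustration and do not come from intake.

```python
import numpy as np
import xarray as xr

# In-memory case: the variable is backed by a plain numpy array.
ds = xr.Dataset({"temperature": (("time", "x"), np.zeros((4, 3)))})
in_memory_backing = type(ds["temperature"].data).__name__

# Out-of-core case: same Dataset type, but backed by a lazy dask array.
# (Illustrative chunk size: two time steps per chunk.)
chunked = ds.chunk({"time": 2})
chunked_backing = type(chunked["temperature"].data).__module__

# The container class is unchanged; only the backing array differs.
same_container = type(ds) is type(chunked)
```

Because both cases share one container type, a `to_dask` method could plausibly return the chunked object itself, which matches the "no-op" reading earlier in the thread.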

@mmccarty
Member Author

^ yes, that's what I was getting at.

@seibert
Collaborator

seibert commented Mar 20, 2018

Ok, that makes sense.

@martindurant
Member

Following intake/intake-xarray#1 (comment), I regard this as solved.

  • We allow for several named plugins, and minimize the assumptions in the code: plugins should in general define to_dask rather than rely on intake.catalog.dask_util
  • Plugins should either register their classes in container.container_map or otherwise provide a serialise-deserialise mechanism. Note that pickle is fine for this, as is currently done for the zero-length dataframes.
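
The registration-plus-pickle idea in the second point might look roughly like the sketch below. Everything here is hypothetical: the `container_map` dict, `register_container`, and `PickleContainer` are stand-ins written for illustration, and intake's actual `container.container_map` may be structured differently.

```python
import pickle

# Hypothetical stand-in for a container_map-style registry;
# the real intake structure may look different.
container_map = {}


def register_container(name, cls):
    """Associate a container name with the class that handles it."""
    container_map[name] = cls


class PickleContainer:
    """Hypothetical handler that round-trips any picklable object."""

    @staticmethod
    def encode(obj):
        return pickle.dumps(obj)

    @staticmethod
    def decode(payload):
        return pickle.loads(payload)


register_container("xarray", PickleContainer)

# Round-trip a small schema-like object, standing in for an empty dataset.
handler = container_map["xarray"]
schema = {"dims": {"time": 0}, "data_vars": ["temperature"]}
restored = handler.decode(handler.encode(schema))
```

Keeping the lookup in one mapping means the rest of the code only asks for a handler by name and never checks for specific container types.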
