Add from_pandas and from_xarray #2049

Closed
SimonHeybrock opened this issue Jul 12, 2021 · 1 comment · Fixed by #2054
Labels
enhancement New feature or request

Comments

@SimonHeybrock
Member

We have previously added from_dict and to_dict, but they have proven to be of limited use, since they are too specific to scipp's data schemas. For better interaction with other common libraries in the Python ecosystem, we should add from_pandas and from_xarray.

to_pandas and to_xarray may be added as a second step, but they are trickier, since there may be scipp features that cannot be represented in those libraries. So let us start with:

from_xarray:

  • Support DataArray and Dataset
  • data maps to values
  • Can we handle units? xarray has no dedicated unit support, so unless we want to rely on xarray attributes with specific names, we have to set them as dimensionless in scipp.
  • Note that xarray attributes have no direct equivalent in scipp. Attributes of xarray data arrays or datasets may be stored as scalar variables in scipp attributes.
  • xarray distinguishes coords with and without an index. Coords with an index should map to scipp coords, coords without an index to attributes.
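To make the proposed mapping concrete, here is a minimal sketch that only sorts the parts of an xarray object into the buckets listed above. It assumes xarray and NumPy are installed; `describe_scipp_mapping` and `describe_dataset` are hypothetical helpers, not scipp API, and they return plain Python objects instead of constructing scipp types, so nothing here depends on a particular scipp version.

```python
import numpy as np
import xarray as xr


def describe_scipp_mapping(da: xr.DataArray) -> dict:
    """Hypothetical helper: sort the parts of an xarray.DataArray into the
    buckets proposed for a scipp DataArray. Returns plain Python objects,
    not scipp types."""
    indexed = set(da.indexes)  # names of coords backed by an index
    # Coords with an index become scipp coords
    coords = {name: da.coords[name].values for name in da.coords if name in indexed}
    # Coords without an index become scipp attrs
    attrs = {name: da.coords[name].values for name in da.coords if name not in indexed}
    # xarray attributes become scalar scipp attrs (name clashes are not handled)
    attrs.update(da.attrs)
    return {
        "dims": da.dims,
        "values": da.values,  # data maps to values
        "unit": "dimensionless",  # xarray carries no units, so default to dimensionless
        "coords": coords,
        "attrs": attrs,
    }


def describe_dataset(ds: xr.Dataset) -> dict:
    # A Dataset is handled item by item; Dataset-level attrs are ignored here
    return {name: describe_scipp_mapping(ds[name]) for name in ds.data_vars}


# "x" and "y" are dimension coords (indexed); "label" has no index
da = xr.DataArray(
    np.arange(6.0).reshape(2, 3),
    dims=["x", "y"],
    coords={"x": [0.0, 1.0], "y": [10, 20, 30], "label": ("x", ["a", "b"])},
    attrs={"long_name": "counts"},
)
mapping = describe_scipp_mapping(da)
```

In this example `x` and `y` end up as coords, while `label` (a coord without an index) and the xarray attribute `long_name` end up as attrs. A real implementation would still need to decide how to handle name clashes between the two kinds of attrs.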

from_pandas:

  • I am not too familiar with this; need to check whether there is a way to identify columns as coords.

Overall, it is probably best to start simple. Some of the details above (such as attr or unit handling) can be done later; there is no need to get everything working in the first pull request.

See also:

SimonHeybrock added this to Selected in Development Board via automation Jul 12, 2021
SimonHeybrock added the enhancement (New feature or request) label Jul 12, 2021
@SimonHeybrock
Member Author

SimonHeybrock commented Jul 12, 2021

Instead of implementing from_pandas, we might choose to simply use from_xarray(xr.Dataset.from_dataframe(df)). The downside is that the user would need to install xarray, but if it avoids a lot of code, it may be the best choice for now.
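A minimal sketch of what xarray produces from a DataFrame, assuming pandas and xarray are installed. `xr.Dataset.from_dataframe` turns the DataFrame index into a dimension with an indexed coordinate and each column into a data variable. This also hints at an answer to the question about identifying columns as coords: a column meant as a coordinate would first have to be moved into the index (for example with `DataFrame.set_index`). The final `from_xarray` call is only the function proposed in this issue and is left as a comment.

```python
import pandas as pd
import xarray as xr

# Example table: the named index "x" plays the role of a coordinate,
# the columns "a" and "b" hold the data
df = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]},
    index=pd.Index([10, 20, 30], name="x"),
)

# from_dataframe turns the index into dimension "x" (with an indexed
# coordinate of the same name) and each column into a data variable along "x"
ds = xr.Dataset.from_dataframe(df)

# With the proposed from_xarray (not implemented here), from_pandas
# would reduce to:
# result = from_xarray(ds)
```

Because the index becomes an indexed coordinate, it would land in scipp coords under the from_xarray rules listed above.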

Development Board automation moved this from Selected to Done Jul 19, 2021
2 participants