Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ds.to_dict with data as arrays, not lists #7739

Merged
merged 19 commits into from
Apr 28, 2023

Conversation

jmccreight
Copy link
Contributor

@jmccreight jmccreight commented Apr 7, 2023

  • Closes DataArray to_dict() without converting with numpy tolist() #1599
  • Tests added
    • Added @pytest.mark.parametrize("numpy_data", [True, False]) to existing and edited existing
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
  • New functions/methods are listed in api.rst
    • Extended documentation for ds.to_dict and da.to_dict

@jmccreight
Copy link
Contributor Author

I would appreciate any edification on the Mypy failures. Looking at the indicated lines, i'm 🤷 .

@jmccreight
Copy link
Contributor Author

I solved the mypy errors in a highly dubious way. 👀

@jmccreight
Copy link
Contributor Author

In the off-hand chance this is reviewed before I push again, do not merge. I have a fix to encodings not getting properly roundtripped in Ds.from_dict(ds.to_dict). it was minor to fix but making sure it's tested will take a min

xarray/core/variable.py Outdated Show resolved Hide resolved
xarray/core/variable.py Outdated Show resolved Hide resolved
@jmccreight
Copy link
Contributor Author

@dcherian thanks! I didnt incoroprate any suggestions yet.
regarding the inequality of encodings of datasets is obscured by assert_identical(a, b) not evaluating encodings. it seems like it should have an option to also check encodings (or not).

@jmccreight
Copy link
Contributor Author

i kinda implied, but I'll just state that the extra code to test equality of encodings is not handsome.

@jmccreight
Copy link
Contributor Author

I'm happy to "fix" the mypy issues, but it's on that I suspect might be requested for changes (if I recall correctly, it's just in the tests)

@dcherian
Copy link
Contributor

Copying my comment from #1599 (comment)

Perhaps we should have array_to_list: bool instead. If False, we just preserve the underlying array type.
Then the user could do ds.as_numpy().to_dict(array_to_list=False) to always get numpy arrays as #7739

array_data or data_as_array could be other options

@dcherian dcherian changed the title first stab at ds.to_dict giving data as numpy objects ds.to_dict with data as arrays, not lists Apr 18, 2023
Copy link
Collaborator

@headtr1ck headtr1ck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good already.
Just a couple of minor comments.

And a design question/suggestion: what about instead of adding another kwarg, you could use data = True / False / "numpy"?

xarray/tests/test_dataarray.py Outdated Show resolved Hide resolved
xarray/tests/test_dataset.py Outdated Show resolved Hide resolved
xarray/tests/test_dataset.py Outdated Show resolved Hide resolved
xarray/core/variable.py Outdated Show resolved Hide resolved
xarray/core/dataarray.py Outdated Show resolved Hide resolved
@jmccreight
Copy link
Contributor Author

Making all the requested changes, the above should resolve momentarily.

I like this "trick"/suggestion:

And a design question/suggestion: what about instead of adding another kwarg, you could use data = True / False / "numpy"?

I will implement this if we are in agreement with @dcherian

@dcherian
Copy link
Contributor

what about instead of adding another kwarg, you could use data = True / False / "numpy"?

Oh yeah, I like this. Only suggestion is data = True / False / "array" / "list" where True and "list" are synonymous.

…of Python datatypes, "array" returns numpy.ndarrays, False returns only the schema
@jmccreight
Copy link
Contributor Author

I followed data = True / False / "array" / "list"

xarray/core/dataarray.py Outdated Show resolved Hide resolved
xarray/tests/test_dataset.py Outdated Show resolved Hide resolved
xarray/tests/test_dataset.py Outdated Show resolved Hide resolved
doc/whats-new.rst Outdated Show resolved Hide resolved
xarray/core/dataset.py Outdated Show resolved Hide resolved
xarray/core/dataset.py Outdated Show resolved Hide resolved
doc/whats-new.rst Outdated Show resolved Hide resolved
xarray/core/variable.py Outdated Show resolved Hide resolved
xarray/core/variable.py Outdated Show resolved Hide resolved
xarray/core/variable.py Show resolved Hide resolved
xarray/tests/test_dataarray.py Outdated Show resolved Hide resolved
xarray/tests/test_dataarray.py Show resolved Hide resolved
@dcherian
Copy link
Contributor

Thanks @jmccreight great work!

@dcherian dcherian merged commit 087ebbb into pydata:main Apr 28, 2023
dcherian added a commit to dcherian/xarray that referenced this pull request May 6, 2023
* main:
  Introduce Grouper objects internally (pydata#7561)
  [skip-ci] Add cftime groupby, resample benchmarks (pydata#7795)
  Fix groupby binary ops when grouped array is subset relative to other (pydata#7798)
  adjust the deprecation policy for python (pydata#7793)
  [pre-commit.ci] pre-commit autoupdate (pydata#7803)
  Allow the label run-upstream to run upstream CI (pydata#7787)
  Update asv links in contributing guide (pydata#7801)
  Implement DataArray.to_dask_dataframe() (pydata#7635)
  `ds.to_dict` with data as arrays, not lists (pydata#7739)
  Add lshift and rshift operators (pydata#7741)
  Use canonical name for set_horizonalalignment over alias set_ha (pydata#7786)
  Remove pandas<2 pin (pydata#7785)
  [pre-commit.ci] pre-commit autoupdate (pydata#7783)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DataArray to_dict() without converting with numpy tolist()
4 participants