Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try to coerce all objects to Numpy arrays. #1393

Merged
merged 8 commits into from Jan 17, 2019

Conversation

malmaud
Copy link
Contributor

@malmaud malmaud commented Jan 11, 2019

This lets you pass a broad array of list-like objects as data to be plotted instead of the current hard-coded list of numpy arrays, lists, tuples, and Pandas objects. Objects such as PyTorch tensors (widely used in the machine-learning world), xarrays, and others know how to convert themselves into Numpy arrays and so might as well be allowed to be passed directly as plotting arguments so users don't have to explicitly convert them.

@jonmmease
Copy link
Contributor

Hi @malmaud ,

Thanks for the PR! Here are a couple of thoughts

  1. In addition to the DataArrayValidator (which handles properties like scatter.x), several of the other validator types also accept arrays (e.g. the ColorValidator can sometimes accept numeric arrays for properties like scatter.marker.color). So to be consistent we should try to handle all array cases if we can.
  2. To support (1), I'd like to work out a way to do the coercion to numpy arrays inside the copy_to_readonly_numpy_array function, this way we can reuse the kind, force_numeric, and 'WRITEABLE' logic.
  3. In order to accomplish (1) and (2), while preserving our current error messages, I think we would need to find a way to have is_homogeneous_array determine whether it's appropriate to convert non-numpy non-pandas objects into numpy arrays.

I haven't looked into this too much yet, but do PyTorch tensors and xarray DataArrays support the numpy array interface? (https://docs.scipy.org/doc/numpy-1.15.1/reference/arrays.interface.html). If so, we might be able to handle this usecase across all array types by adding a simple check to is_homogeneous_array to check for the presence of an __array_interface__ method.

@malmaud
Copy link
Contributor Author

malmaud commented Jan 12, 2019

Thanks for the feedback! I'll implement 1) and 2).

  1. is a bit subtle - there are a few different ways that numpy.array tries to convert an object to an array - it could, as you said, use the array interface. It will fall back to just seeing if the object has an __array__ method that directly returns a numpy array. Or it can also use the Python buffer interface. Or an earlier version of the array interface. I wouldn't want to re-implement that entire logic. PyTorch tensors, for instance, have an __array__ method but don't support the array interface.

I think just checking for __array_interface__ or __array__ methods on objects is probably sufficient for 99% of cases though - the coercion could always be made more general later if we see objects in the wild that this heuristic isn't sufficient for.

@jonmmease
Copy link
Contributor

I think just checking for __array_interface__ or __array__ methods on objects is probably sufficient for 99% of cases though - the coercion could always be made more general later if we see objects in the wild that this heuristic isn't sufficient for.

👍

@malmaud
Copy link
Contributor Author

malmaud commented Jan 14, 2019

OK, I made the changes.

@jonmmease
Copy link
Contributor

Thanks @malmaud,

In terms of testing, it looks like something went wrong in plotly.tests.test_optional.test_figure_factory.test_figure_factory.TestTrisurf. Could you have a look?

======================================================================
ERROR: test_trisurf_all_args (plotly.tests.test_optional.test_figure_factory.test_figure_factory.TestTrisurf)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/circleci/project/plotly/tests/test_optional/test_figure_factory/test_figure_factory.py", line 889, in test_trisurf_all_args
    x, y, z, simplices
  File "/home/circleci/project/plotly/figure_factory/_trisurf.py", line 466, in create_trisurf
    edges_color=edges_color, plot_edges=plot_edges)
  File "/home/circleci/project/plotly/figure_factory/_trisurf.py", line 165, in trisurf
    showlegend=False
  File "/home/circleci/project/plotly/graph_objs/_scatter3d.py", line 1939, in __init__
    self['marker'] = marker if marker is not None else _v
  File "/home/circleci/project/plotly/basedatatypes.py", line 2820, in __setitem__
    self._set_compound_prop(prop, value)
  File "/home/circleci/project/plotly/basedatatypes.py", line 3129, in _set_compound_prop
    val = validator.validate_coerce(val, skip_invalid=self._skip_invalid)
  File "/home/circleci/project/_plotly_utils/basevalidators.py", line 2117, in validate_coerce
    v = self.data_class(v, skip_invalid=skip_invalid)
  File "/home/circleci/project/plotly/graph_objs/scatter3d/_marker.py", line 1064, in __init__
    self['color'] = color if color is not None else _v
  File "/home/circleci/project/plotly/basedatatypes.py", line 2829, in __setitem__
    self._set_prop(prop, value)
  File "/home/circleci/project/plotly/basedatatypes.py", line 3065, in _set_prop
    val = validator.validate_coerce(val)
  File "/home/circleci/project/_plotly_utils/basevalidators.py", line 1140, in validate_coerce
    invalid_els = self.find_invalid_els(v, validated_v)
  File "/home/circleci/project/_plotly_utils/basevalidators.py", line 1169, in find_invalid_els
    self.find_invalid_els(orig_el, validated_el, invalid_els)
  File "/home/circleci/project/_plotly_utils/basevalidators.py", line 1167, in find_invalid_els
    for orig_el, validated_el in zip(orig, validated):
TypeError: zip argument #1 must support iteration

Before merging this, we'll also need some tests that exercise the new __array_interface__ or __array__ logic. I think the easiest thing here might be to create a test file that mirrors _plotly_utils/tests/validators/test_pandas_series_input.py but for a library that plotly.py doesn't know about like xarray. To do this, we'd need to add xarray to

  • optional-requirements.txt
  • As an optional: dependency in tox.ini
  • To the conda create command in .circleci/create_conda_optional_env.sh

Are you interested in adding these tests? If not, we can just leave the PR open until I have a chance to go through and add them (which would hopefully be by the end of the month).

@malmaud
Copy link
Contributor Author

malmaud commented Jan 15, 2019

Alright, I'll take a look and add the tests.

@malmaud
Copy link
Contributor Author

malmaud commented Jan 16, 2019

Tests have been added. I had to make some other changes to make the tests pass, so apparently that was a worthwhile exercise.

@@ -43,12 +43,16 @@ def fullmatch(regex, string, flags=0):
# Utility functions
# -----------------
def to_scalar_or_list(v):
if np and np.isscalar(v) and hasattr(v, 'item'):
return np.asscalar(v)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand this branch. What kinds of objects does this handle?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a comment to the code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

tox.ini Outdated
@@ -72,7 +72,7 @@ deps=
optional: pyshp==1.2.10
optional: pillow==5.2.0
optional: matplotlib==2.2.3
optional: xarray==0.11.2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for future reference, what kind of problem did this cause?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the tests reported that 0.10.9 was the highest version of xarray available. I guess xarray added some upper bound to its dependencies in the 0.11 series that rendered it incompatible with the environment of the test - I didn't really investigate too closely.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the info. I figured it was a transitive dependency issue, just wanted to double check.

@jonmmease
Copy link
Contributor

This is great work @malmaud, thanks for taking this on and especially for adding the tests! I had two inline questions, but everything else lgtm.

@jonmmease jonmmease merged commit bc4fb46 into plotly:master Jan 17, 2019
@jonmmease
Copy link
Contributor

Merged! Thanks again for the solid PR.

@jonmmease jonmmease added this to the v3.6.0 milestone Jan 26, 2019
@malmaud malmaud deleted the array_convert branch January 31, 2019 18:21
@yurzo
Copy link

yurzo commented Feb 6, 2019

Please see #1427

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants