Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

plt.hist() fails with TensorFlow Numpy emulation #19574

Open
FlorinAndrei opened this issue Feb 24, 2021 · 28 comments
Open

plt.hist() fails with TensorFlow Numpy emulation #19574

FlorinAndrei opened this issue Feb 24, 2021 · 28 comments
Labels
keep Items to be ignored by the “Stale” Github Action topic: units and array ducktypes

Comments

@FlorinAndrei
Copy link

Bug report

Bug summary

Generating np.random.randn(1000) values, visualizing them with plt.hist(). Works fine with Numpy.

When I replace Numpy with tensorflow.experimental.numpy, Matplotlib 3.3.4 fails to display the histogram correctly. Matplotlib 3.2.2 works fine.

Code for reproduction

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
import tensorflow.experimental.numpy as tnp

# bad image
labels1 = 15 + 2 * tnp.random.randn(1000)
_ = plt.hist(labels1)

# good image
labels2 = 15 + 2 * np.random.randn(1000)
_ = plt.hist(labels2)

Actual outcome

np-bad

Expected outcome

np-good

Matplotlib version

  • Operating system: Windows 10
  • Matplotlib version (import matplotlib; print(matplotlib.__version__)): 3.3.4
  • Matplotlib backend (print(matplotlib.get_backend())): module://ipykernel.pylab.backend_inline
  • Python version: 3.8.7
  • Jupyter version (if applicable): see below
  • Other libraries: see below

TensorFlow 2.4.1

jupyter --version
jupyter core     : 4.7.0
jupyter-notebook : 6.1.6
qtconsole        : 5.0.1
ipython          : 7.20.0
ipykernel        : 5.4.2
jupyter client   : 6.1.7
jupyter lab      : not installed
nbconvert        : 6.0.7
ipywidgets       : 7.6.3
nbformat         : 5.0.8
traitlets        : 5.0.5

Python installed from python.org as an exe installer. Everything else is pip install --user

Bug opened with TensorFlow on this same issue:

tensorflow/tensorflow#46274

@jklymak
Copy link
Member

jklymak commented Feb 25, 2021

I think we are happy to help, but we make no guarantee to support all numpy-like objects that exist in the python universe. If you want help with this I suggest figuring out how to reproduce w/o tensorflow so we can advise.

@jklymak jklymak added status: downstream fix required status: needs clarification Issues that need more information to resolve. labels Feb 25, 2021
@tacaswell
Copy link
Member

This looks like a failure of the up-cast to 2D logic (and instead of 1 histogram of N elements we are seeing N histograms of 1 element).

@tacaswell
Copy link
Member

Specifically this code:

def _reshape_2D(X, name):
"""
Use Fortran ordering to convert ndarrays and lists of iterables to lists of
1D arrays.
Lists of iterables are converted by applying `numpy.asanyarray` to each of
their elements. 1D ndarrays are returned in a singleton list containing
them. 2D ndarrays are converted to the list of their *columns*.
*name* is used to generate the error message for invalid inputs.
"""
# unpack if we have a values or to_numpy method.
try:
X = X.to_numpy()
except AttributeError:
try:
if isinstance(X.values, np.ndarray):
X = X.values
except AttributeError:
pass
# Iterate over columns for ndarrays.
if isinstance(X, np.ndarray):
X = X.T
if len(X) == 0:
return [[]]
elif X.ndim == 1 and np.ndim(X[0]) == 0:
# 1D array of scalars: directly return it.
return [X]
elif X.ndim in [1, 2]:
# 2D array, or 1D array of iterables: flatten them first.
return [np.reshape(x, -1) for x in X]
else:
raise ValueError(f'{name} must have 2 or fewer dimensions')
# Iterate over list of iterables.
if len(X) == 0:
return [[]]
result = []
is_1d = True
for xi in X:
# check if this is iterable, except for strings which we
# treat as singletons.
if (isinstance(xi, collections.abc.Iterable) and
not isinstance(xi, str)):
is_1d = False
xi = np.asanyarray(xi)
nd = np.ndim(xi)
if nd > 1:
raise ValueError(f'{name} must have 2 or fewer dimensions')
result.append(xi.reshape(-1))
if is_1d:
# 1D array of scalars: directly return it.
return [np.reshape(result, -1)]
else:
# 2D array, or 1D array of iterables: use flattened version.
return result

which recently got re-worked due to changes in how numpy auto-up-casts object / ragged arrays.

@FlorinAndrei
Copy link
Author

I suggest figuring out how to reproduce w/o tensorflow so we can advise

The Tensorflow failure is the only case I know, and it's how I ended up reporting this bug. I was just trying to learn the Numpy emulation in Tensorflow.

@mwaskom
Copy link

mwaskom commented Feb 25, 2021

I think the two problems are:

  1. Matplotlib preprocesses instances of np.ndarray so they are always 2D, with vectors gaining an outer dimension. The tensorflow object doesn't get this preprocessing. (Matplotlib should probably be using __array__ here?)

  2. The introspection of the elements in X to test for iterableness (which lets a normal sequence of numbers work as expected) fails because the elements of the tensorflow object are 0D but an instance of Iterable:

from collections.abc import Iterable
print(isinstance(labels1[0], Iterable))
print(isinstance(labels2[0], Iterable))

@tacaswell tacaswell added this to the v3.4.1 milestone Feb 25, 2021
@tacaswell tacaswell added Good first issue Open a pull request against these issues if there are no active ones! and removed status: downstream fix required status: needs clarification Issues that need more information to resolve. labels Feb 25, 2021
@tacaswell
Copy link
Member

It is reasonable to expect that we will ducktype TF arrays correctly here. For now I think there is a user-space work around to cant to numpy before passing to us. That said, none of the core developers are regular TF users so any help with duck-typing them correctly would be greatly appreciated.

Internally we can not directly use np.asarray before handling units because that will break things like pint, but checking for __array__ may work better?

Labeled this as good first issue because I think all of the work that needs to be done is in _reshape_2D and requires little understanding of the rest of Matplotlib internals.

@jklymak
Copy link
Member

jklymak commented Feb 25, 2021

I think it is desirable to ducktype what we can, but we need some rules for what that means. Random object x used to work, but now it doesn't is not a good argument that random x should ever have worked. It would be preferable to try and stipulate what properties x needs to have for successful duck typing.

@mwaskom
Copy link

mwaskom commented Feb 25, 2021

Isn't that what obj.__array__ is for?

@tacaswell
Copy link
Member

Long term we can rely on the behavior defined by https://data-apis.org/blog/array_api_standard_release/ but for now I suspect we are going to have to keep playing whack-a-mole with these.

@jklymak
Copy link
Member

jklymak commented Feb 25, 2021

As discussed on the dev call: We are happy to take a patch that improves the duck typing here, and provides a mock class that behaves like the object so that we can run tests. We cannot take downstream libraries as a test dependency (except for pandas) so someone will have to go through the effort of making a mock.

In user space, please convert to a numpy array or list of lists or list of numpy arrays. Nothing else is documented or guaranteed to work. While we don't try to break things, passing other objects are not guaranteed to be stable.

Downstream libraries should be able to avoid these problems by providing a to_numpy method. Its possible we could check for __array__ as well, but it would be nice to have a test object to work with so we can test that. (In this case its not clear that will work if array elements are all going to be Iterable.)

I'll close because we don't have an immediate action, but again we very much welcome feedback either as a new self-contained issue or PR. Re-opening at @tacaswell request - we will definitely still take a PR for this.

@jklymak jklymak closed this as completed Feb 25, 2021
@jklymak jklymak reopened this Feb 25, 2021
@mwaskom
Copy link

mwaskom commented Feb 25, 2021

(In this case its not clear that will work if array elements are all going to be Iterable.)

If _reshape_2D duck-typed ndarrays, this wouldn't be an issue because the tensorflow object would be 2D when you get to that check. But it's an added wrinkle that explains why it doesn't work as you might expect from the fact that the tensorflow array looks like a vector.

@QuLogic QuLogic modified the milestones: v3.4.1, v3.4.2 Mar 30, 2021
@tacaswell tacaswell modified the milestones: v3.4.2, v3.5.0 May 5, 2021
@tacaswell
Copy link
Member

Pushed to 3.5. I think the action items here are

  1. we need to document what we expect to work or not so that in the future we can have a clear line of what we
  2. we would likely take a patch to cbook that handles the reality of what TensorFlow is doing because we do not have 1 yet.

@jklymak
Copy link
Member

jklymak commented May 5, 2021

Is this really a good first issue?

@timhoffm
Copy link
Member

timhoffm commented May 5, 2021

Probably 2. can be done as a first issue.

For 1. I don't think we even know what we expect.

@tacaswell
Copy link
Member

I think part 2 is a good first issue to Matplotlib for someone who is fluent in tensorflow and array conventions.

@sriakshata
Copy link

Hello, I need clarity on this issue more correctly I couldn't get it . The problem is only with the matplotlib 3.3.4 version, as with the latest version 3.4.3 version it shows correct output on google colab. I am new here and I attempting to first time. Please help me understand.
Screenshot (314)

@tacaswell
Copy link
Member

Are you using the same version of tensorflow with both mpl3.4.3 and mpl3.3.4?

If not, it could be that something changed on the TF side to match our expectation of what an "array" is better.

If so, then maybe we accidentally fixed this and it can be closed!

@sriakshata
Copy link

I have used mpl 3.4.3 with tensorflow 2. I haven't worked with mpl 3.3.4. It works fine with mpl 3.4.3 with tensorflow 2.

@mwaskom
Copy link

mwaskom commented Aug 20, 2021

Do you get the same results as in this comment?

from collections.abc import Iterable
print(isinstance(labels1[0], Iterable))
print(isinstance(labels2[0], Iterable))

@sriakshata
Copy link

@mwaskom I am not sure about the idea mentioned in that comment. I just tried getting the plot with the given code and it worked fine.

@mwaskom
Copy link

mwaskom commented Aug 20, 2021

Can you try running that code in the environment where the plot works?

@sriakshata
Copy link

I have tried running it. It works fine with mpl 3.4.3 and tensorflow 2

@mwaskom
Copy link

mwaskom commented Aug 21, 2021

Yes but what does it print?

@sriakshata
Copy link

See above comment https://github.com/matplotlib/matplotlib/issues/19574#issuecomment-902075023. I have attached the output I got.

Copy link

This issue has been marked "inactive" because it has been 365 days since the last comment. If this issue is still present in recent Matplotlib releases, or the feature request is still wanted, please leave a comment and this label will be removed. If there are no updates in another 30 days, this issue will be automatically closed, but you are free to re-open or create a new issue if needed. We value issue reports, and this procedure is meant to help us resurface and prioritize issues that have not been addressed yet, not make them disappear. Thanks for your help!

@github-actions github-actions bot added the status: inactive Marked by the “Stale” Github Action label Nov 20, 2023
@mwaskom
Copy link

mwaskom commented Nov 20, 2023

I don't know whether this is still an issue (I think so) but something is a little whacky about this bot: it has been a lot more than 365 days since the last comment here.

@rcomer
Copy link
Member

rcomer commented Nov 20, 2023

I guess if we wanted to be strict, the bot message could say "it has been over 365 days since the last update". It's basically just working through the issue backlog and checking update timestamps.

@jklymak
Copy link
Member

jklymak commented Nov 20, 2023

To clarify further, we set a limit of how many issues get checked and how often so that they don't all get closed at once with no chance for comment.

@jklymak jklymak added keep Items to be ignored by the “Stale” Github Action and removed status: inactive Marked by the “Stale” Github Action Good first issue Open a pull request against these issues if there are no active ones! status: downstream fix required labels Nov 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
keep Items to be ignored by the “Stale” Github Action topic: units and array ducktypes
Projects
None yet
Development

No branches or pull requests

8 participants