[Bug]: possible regression with pandas 1.4 with plt.plot when using a single column dataframe as the x argument #22330

Closed
lesteve opened this issue Jan 27, 2022 · 3 comments · Fixed by #22141

Comments

@lesteve
Contributor

lesteve commented Jan 27, 2022

Bug summary

InvalidIndexError is raised with pandas 1.4 when a single-column DataFrame is passed as the x argument to plt.plot. To be honest, this may not be a fully legitimate use of plt.plot ...

Code for reproduction

import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'col': [1, 2, 3]})

plt.plot(df, [1, 2, 3], 'o')

Actual outcome

InvalidIndexError: (slice(None, None, None), None)
Full stack-trace
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
File ~/miniconda3/envs/del/lib/python3.10/site-packages/pandas/core/indexes/base.py:3621, in Index.get_loc(self, key, method, tolerance)
   3620 try:
-> 3621     return self._engine.get_loc(casted_key)
   3622 except KeyError as err:

File ~/miniconda3/envs/del/lib/python3.10/site-packages/pandas/_libs/index.pyx:136, in pandas._libs.index.IndexEngine.get_loc()

File ~/miniconda3/envs/del/lib/python3.10/site-packages/pandas/_libs/index.pyx:142, in pandas._libs.index.IndexEngine.get_loc()

TypeError: '(slice(None, None, None), None)' is an invalid key

During handling of the above exception, another exception occurred:

InvalidIndexError                         Traceback (most recent call last)
Input In [3], in <module>
      2 import pandas as pd
      4 df = pd.DataFrame({"col": [1, 2, 3]})
----> 6 plt.plot(df, [1, 2, 3], "o")

File ~/miniconda3/envs/del/lib/python3.10/site-packages/matplotlib/pyplot.py:2757, in plot(scalex, scaley, data, *args, **kwargs)
   2755 @_copy_docstring_and_deprecators(Axes.plot)
   2756 def plot(*args, scalex=True, scaley=True, data=None, **kwargs):
-> 2757     return gca().plot(
   2758         *args, scalex=scalex, scaley=scaley,
   2759         **({"data": data} if data is not None else {}), **kwargs)

File ~/miniconda3/envs/del/lib/python3.10/site-packages/matplotlib/axes/_axes.py:1632, in Axes.plot(self, scalex, scaley, data, *args, **kwargs)
   1390 """
   1391 Plot y versus x as lines and/or markers.
   1392 
   (...)
   1629 (``'green'``) or hex strings (``'#008000'``).
   1630 """
   1631 kwargs = cbook.normalize_kwargs(kwargs, mlines.Line2D)
-> 1632 lines = [*self._get_lines(*args, data=data, **kwargs)]
   1633 for line in lines:
   1634     self.add_line(line)

File ~/miniconda3/envs/del/lib/python3.10/site-packages/matplotlib/axes/_base.py:312, in _process_plot_var_args.__call__(self, data, *args, **kwargs)
    310     this += args[0],
    311     args = args[1:]
--> 312 yield from self._plot_args(this, kwargs)

File ~/miniconda3/envs/del/lib/python3.10/site-packages/matplotlib/axes/_base.py:487, in _process_plot_var_args._plot_args(self, tup, kwargs, return_kwargs)
    484         kw[prop_name] = val
    486 if len(xy) == 2:
--> 487     x = _check_1d(xy[0])
    488     y = _check_1d(xy[1])
    489 else:

File ~/miniconda3/envs/del/lib/python3.10/site-packages/matplotlib/cbook/__init__.py:1327, in _check_1d(x)
   1321 with warnings.catch_warnings(record=True) as w:
   1322     warnings.filterwarnings(
   1323         "always",
   1324         category=Warning,
   1325         message='Support for multi-dimensional indexing')
-> 1327     ndim = x[:, None].ndim
   1328     # we have definitely hit a pandas index or series object
   1329     # cast to a numpy array.
   1330     if len(w) > 0:

File ~/miniconda3/envs/del/lib/python3.10/site-packages/pandas/core/frame.py:3506, in DataFrame.__getitem__(self, key)
   3504 if self.columns.nlevels > 1:
   3505     return self._getitem_multilevel(key)
-> 3506 indexer = self.columns.get_loc(key)
   3507 if is_integer(indexer):
   3508     indexer = [indexer]

File ~/miniconda3/envs/del/lib/python3.10/site-packages/pandas/core/indexes/base.py:3628, in Index.get_loc(self, key, method, tolerance)
   3623         raise KeyError(key) from err
   3624     except TypeError:
   3625         # If we have a listlike key, _check_indexing_error will raise
   3626         #  InvalidIndexError. Otherwise we fall through and re-raise
   3627         #  the TypeError.
-> 3628         self._check_indexing_error(key)
   3629         raise
   3631 # GH#42269

File ~/miniconda3/envs/del/lib/python3.10/site-packages/pandas/core/indexes/base.py:5637, in Index._check_indexing_error(self, key)
   5633 def _check_indexing_error(self, key):
   5634     if not is_scalar(key):
   5635         # if key is not a scalar, directly raise an error (the code below
   5636         # would convert to numpy arrays and raise later any way) - GH29926
-> 5637         raise InvalidIndexError(key)

InvalidIndexError: (slice(None, None, None), None)

Expected outcome

The plot is drawn without error, as it was with pandas < 1.4, although I am not sure whether this was ever a legitimate use of plt.plot.

Additional information

I would be totally fine with an answer along the lines of "well this was working before only by pure chance so ...".

Looking at the reason for the behaviour change:

pandas 1.4 has changed the error type raised for df[:, None]:

  • InvalidIndexError in pandas 1.4
  • TypeError in pandas < 1.4

If the snippet is deemed a legitimate use of matplotlib, a possible fix could be to add InvalidIndexError here:
https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/cbook/__init__.py#L1342
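
The effect of the changed exception type can be reproduced without pandas. The sketch below uses hypothetical stand-ins (FakeInvalidIndexError, OldFrame, NewFrame, probe) and assumes only what is described above: the x[:, None] probe is guarded by an except clause that lists TypeError but not the new exception type. The exact tuple of caught exceptions is an assumption, not matplotlib's actual code.

```python
class FakeInvalidIndexError(Exception):
    """Hypothetical stand-in for pandas' InvalidIndexError (not a TypeError)."""


class OldFrame:
    """Mimics pandas < 1.4: multi-dimensional indexing raises TypeError."""

    def __getitem__(self, key):
        raise TypeError(f"{key!r} is an invalid key")


class NewFrame:
    """Mimics pandas 1.4: the same indexing raises a different exception type."""

    def __getitem__(self, key):
        raise FakeInvalidIndexError(key)


def probe(x):
    """Try x[:, None]; fall back when the object rejects 2D indexing."""
    try:
        return x[:, None]
    except (IndexError, TypeError):  # assumed guard; the new type is not listed
        return "fallback"


old_result = probe(OldFrame())  # TypeError is caught by the guard

try:
    probe(NewFrame())
    escaped = False
except FakeInvalidIndexError:
    escaped = True  # the new exception propagates to the caller
```

With the stand-ins, the old behaviour ends in the fallback branch, while the new exception type passes straight through the guard, which matches the traceback above.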

Operating system

Ubuntu

Matplotlib Version

3.5.1

Matplotlib Backend

QtAgg

Python version

3.9.5

Jupyter version

No response

Installation

conda

@jklymak jklymak linked a pull request Jan 27, 2022 that will close this issue
@jklymak
Member

jklymak commented Jan 27, 2022

See #22141

I think we should just try to_numpy if an object has it - that gets rid of all the pandas warning checks, and seems to work for the tests I have.
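
A minimal sketch of that duck-typing idea, which assumes nothing about the actual code in #22141: the helper name unpack_if_possible and the FrameLike stand-in class are hypothetical, and the stand-in returns a nested list where a real pandas object would return a NumPy array.

```python
def unpack_if_possible(x):
    """Return x.to_numpy() when x offers that method, else x unchanged."""
    to_numpy = getattr(x, "to_numpy", None)
    if callable(to_numpy):
        return to_numpy()
    return x


class FrameLike:
    """Hypothetical stand-in for a single-column DataFrame."""

    def __init__(self, column):
        self._column = list(column)

    def __getitem__(self, key):
        # Like a DataFrame, reject tuple keys such as (slice(None), None).
        raise KeyError(key)

    def to_numpy(self):
        # A real DataFrame would return a 2D ndarray of shape (n, 1).
        return [[value] for value in self._column]


unpacked = unpack_if_possible(FrameLike([1, 2, 3]))  # never indexes the object
passthrough = unpack_if_possible([1, 2, 3])          # plain lists are left alone
```

Because the object is converted before any indexing is attempted, the exception type raised by x[:, None] no longer matters.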

We can't just use InvalidIndexError because that is pandas-only and we cannot import pandas without depending on pandas. I'm sure there is some fancy way to conditionally import pandas if it is in the environment, but I don't think we should resort to trickery here.
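
For illustration only, the conditional import alluded to above might look like the sketch below. It assumes that InvalidIndexError is importable from pandas.errors, and it falls back to a placeholder class of the same name when pandas is not installed, so the except clause stays valid either way. The probe function is a hypothetical name, not matplotlib code.

```python
try:
    # Assumption: pandas exposes the exception in pandas.errors.
    from pandas.errors import InvalidIndexError
except ImportError:
    class InvalidIndexError(Exception):
        """Placeholder used when pandas is absent; nothing ever raises it."""


def probe(x):
    """Try x[:, None]; return None if the object rejects 2D indexing."""
    try:
        return x[:, None]
    except (IndexError, TypeError, InvalidIndexError):
        return None


result = probe([1, 2, 3])  # a list rejects tuple keys with TypeError
```

This keeps pandas optional, but it is exactly the kind of environment-dependent import the comment argues against.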

@jklymak jklymak mentioned this issue Jan 27, 2022
@tacaswell
Member

"To be honest this may not be a fully legitimate use of plt.plot."

It is certainly on the edge, given the degree to which it conflates a one-column DataFrame and a Series. That said, we already do something very similar with 2D arrays, as

plt.plot(np.array([[1, 2, 3]]).T, [4, 5, 6])

works.

@okumuralab

Similarly,

import matplotlib.pyplot as plt
import pandas as pd

x = [1, 2]
df = pd.DataFrame({'A': [3, 4]})
plt.plot(x, df)

used to work, but now raises an error:

InvalidIndexError: (slice(None, None, None), None)

@QuLogic QuLogic added this to the v3.5.2 milestone Mar 3, 2022