Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Inconsistent treatment of list of labels in plot when the input is a dataframe #27762

Closed
Gattocrucco opened this issue Feb 8, 2024 · 9 comments · Fixed by #27767
Closed
Milestone

Comments

@Gattocrucco
Copy link

Bug summary

plt.plot interprets a dataframe as a sequence of series to be plotted, one per column. Labels can be specified by setting label to a list of labels matching the number of columns. If the dataframe has only one column, though, the label argument is interpreted as a single label, instead of a list with one element.

Code for reproduction

from matplotlib import pyplot as plt
import pandas as pd
import numpy as np

x = np.arange(5)
y = pd.DataFrame({'a': x, 'b': x + 1})

plt.figure('double', clear=True)
plt.plot(x, y, label=['A', 'B'])
plt.legend() # -> ok

plt.figure('single', clear=True)
plt.plot(x, y.drop('b', axis=1), label=['A'])
plt.legend() # -> legend label is "['A']"

plt.show()

Actual outcome

immagine

Expected outcome

immagine

Additional information

No response

Operating system

macOS 14.2.1

Matplotlib Version

3.8.2

Matplotlib Backend

MacOSX

Python version

3.12.1 | packaged by conda-forge | (main, Dec 23 2023, 08:01:35) [Clang 16.0.6 ]

Jupyter version

No response

Installation

conda

@rcomer
Copy link
Member

rcomer commented Feb 9, 2024

This isn't specific to dataframes. The same behaviour can be seen with

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([[0, 0], [1, 2]], label=['A', 'B'])
ax.legend()
plt.show()

image

fig, ax = plt.subplots()
ax.plot([[0], [1]], label=['A'])
ax.legend()
plt.show()

image

@timhoffm
Copy link
Member

timhoffm commented Feb 9, 2024

I'm not aware that we make any guarantees that one can pass a single-element list for single data. - I think the behavior is undefined in this case. For context, multiple label support was introduced in #16178.

But if we want to support this, that'd need to be handled here:

else:
labels = [label] * n_datasets

@rcomer
Copy link
Member

rcomer commented Feb 9, 2024

I tried poking the code that @timhoffm highlighted, but it seems the current behaviour is very much expected:

@pytest.mark.parametrize('label_array', [['low', 'high'],
('low', 'high'),
np.array(['low', 'high'])])
def test_plot_single_input_multiple_label(label_array):
# test ax.plot() with 1D array like input
# and iterable label
x = [1, 2, 3]
y = [2, 5, 6]
fig, ax = plt.subplots()
ax.plot(x, y, label=label_array)
leg = ax.legend()
assert len(leg.get_texts()) == 1
assert leg.get_texts()[0].get_text() == str(label_array)

@Gattocrucco
Copy link
Author

I think the current behavior is like if np.array([1]) yielded a 0-d array. It breaks agnostic coding patterns that do not depend on the length of things.

@timhoffm
Copy link
Member

timhoffm commented Feb 9, 2024

I tried poking the code that @timhoffm highlighted, but it seems the current behaviour is very much expected:

My vague recollection is that #16178 should only affect the multiple-data case. Changes to the single-data case were out of scope. I believe, the test is just ensurging the single-data case is kept as is; i.e. anything passed to label is converted using str. So in a way, this behavior is expected as the de-facto behavior, but I would not claim it's necessarily intended.

We have a generalization ambiguity problem. We cannot have both:

  1. For single data, convert any label object to str.
  2. For N data, support a list of labels of length N.

While I see the motivation for 2 with the N=1 case, supporting that would mean to break 1. We certainly have to go through a deprecation period with this. And we'd have to consider how to handle the case one dataset, but >1 labels (as in the test). I think, if one dataset, length-1 label list gives the one element as label, the >1 labels case should error out and not create the str anymore.

I'd fundamentally support the idea of not converting iterables to str anymore - if anybody wants to take on the deprecation effort.

@jklymak
Copy link
Member

jklymak commented Feb 9, 2024

I'd fundamentally support the idea of not converting iterables to str anymore - if anybody wants to take on the deprecation effort.

I think we should do this - I can't think of a situation where a user would want the result above, and if for some weird reason they do, they can convert to a single string themselves. I think the current behaviour is undesirable enough that I'm not even sure we need a full deprecation cycle for it?

@timhoffm
Copy link
Member

timhoffm commented Feb 9, 2024

I would bet that there are people who for some reason have a list as identifier for their dataset and they don't care enough to format that explicitly (['A', 'B'] in the legend was good enough for them). Therefore, I advocate for a regular deprecation.

@rcomer
Copy link
Member

rcomer commented Feb 9, 2024

So what should happen for the single element list during the deprecation period? Can we implement new behaviour for that right away?

@timhoffm
Copy link
Member

timhoffm commented Feb 10, 2024

Given all context, I’d say yes. - Longer explanation will follow later (don’t want to type that on the phone)

Update for completeness:

While changing the behavior for single dataset + one-element list is breaking, we would not get much from a deprecation. Since we have a change of behavior, a runtime deprecation warning could announce that, but there is no reasonable way to allow people opting into the new behavior, which should be the desired one in the majority of cases. - They see the warning, and cannot do anything with it. So a runtime heads up is not feasible. Which would leave us with just a deprecation notice in the release notes. Apart from that I there's much overlap between people who read release notes and people who know (or cared) that they have passed a list for a single label, still the above argument holds, you cannot opt into the new behavior during deprecation. So the deprecation would only help people, wo intentionally wanted a list-formatted label. That is rare and not worth the effort and waiting time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants