[Bug]: Inconsistent treatment of list of labels in `plot` when the input is a dataframe #27762

Gattocrucco · 2024-02-08T22:51:47Z

Bug summary

plt.plot interprets a dataframe as a sequence of series to be plotted, one per column. Labels can be specified by setting label to a list of labels matching the number of columns. If the dataframe has only one column, though, the label argument is interpreted as a single label, instead of a list with one element.

Code for reproduction

from matplotlib import pyplot as plt
import pandas as pd
import numpy as np

x = np.arange(5)
y = pd.DataFrame({'a': x, 'b': x + 1})

plt.figure('double', clear=True)
plt.plot(x, y, label=['A', 'B'])
plt.legend() # -> ok

plt.figure('single', clear=True)
plt.plot(x, y.drop('b', axis=1), label=['A'])
plt.legend() # -> legend label is "['A']"

plt.show()

Actual outcome

Expected outcome

Additional information

No response

Operating system

macOS 14.2.1

Matplotlib Version

3.8.2

Matplotlib Backend

MacOSX

Python version

3.12.1 | packaged by conda-forge | (main, Dec 23 2023, 08:01:35) [Clang 16.0.6 ]

Jupyter version

No response

Installation

conda

The text was updated successfully, but these errors were encountered:

rcomer · 2024-02-09T10:23:36Z

This isn't specific to dataframes. The same behaviour can be seen with

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([[0, 0], [1, 2]], label=['A', 'B'])
ax.legend()
plt.show()

fig, ax = plt.subplots()
ax.plot([[0], [1]], label=['A'])
ax.legend()
plt.show()

timhoffm · 2024-02-09T12:15:29Z

I'm not aware that we make any guarantees that one can pass a single-element list for single data. - I think the behavior is undefined in this case. For context, multiple label support was introduced in #16178.

But if we want to support this, that'd need to be handled here:

matplotlib/lib/matplotlib/axes/_base.py

Lines 522 to 523 in d2cc4d0

    
           else: 
        
               labels = [label] * n_datasets

rcomer · 2024-02-09T13:11:36Z

I tried poking the code that @timhoffm highlighted, but it seems the current behaviour is very much expected:

matplotlib/lib/matplotlib/tests/test_legend.py

Lines 1191 to 1203 in d2cc4d0

    
           @pytest.mark.parametrize('label_array', [['low', 'high'], 
        
                                                    ('low', 'high'), 
        
                                                    np.array(['low', 'high'])]) 
        
           def test_plot_single_input_multiple_label(label_array): 
        
               # test ax.plot() with 1D array like input 
        
               # and iterable label 
        
               x = [1, 2, 3] 
        
               y = [2, 5, 6] 
        
               fig, ax = plt.subplots() 
        
               ax.plot(x, y, label=label_array) 
        
               leg = ax.legend() 
        
               assert len(leg.get_texts()) == 1 
        
               assert leg.get_texts()[0].get_text() == str(label_array)

Gattocrucco · 2024-02-09T19:53:53Z

I think the current behavior is like if np.array([1]) yielded a 0-d array. It breaks agnostic coding patterns that do not depend on the length of things.

timhoffm · 2024-02-09T20:52:59Z

I tried poking the code that @timhoffm highlighted, but it seems the current behaviour is very much expected:

My vague recollection is that #16178 should only affect the multiple-data case. Changes to the single-data case were out of scope. I believe, the test is just ensurging the single-data case is kept as is; i.e. anything passed to label is converted using str. So in a way, this behavior is expected as the de-facto behavior, but I would not claim it's necessarily intended.

We have a generalization ambiguity problem. We cannot have both:

For single data, convert any label object to str.
For N data, support a list of labels of length N.

While I see the motivation for 2 with the N=1 case, supporting that would mean to break 1. We certainly have to go through a deprecation period with this. And we'd have to consider how to handle the case one dataset, but >1 labels (as in the test). I think, if one dataset, length-1 label list gives the one element as label, the >1 labels case should error out and not create the str anymore.

I'd fundamentally support the idea of not converting iterables to str anymore - if anybody wants to take on the deprecation effort.

jklymak · 2024-02-09T22:25:30Z

I'd fundamentally support the idea of not converting iterables to str anymore - if anybody wants to take on the deprecation effort.

I think we should do this - I can't think of a situation where a user would want the result above, and if for some weird reason they do, they can convert to a single string themselves. I think the current behaviour is undesirable enough that I'm not even sure we need a full deprecation cycle for it?

timhoffm · 2024-02-09T22:37:34Z

I would bet that there are people who for some reason have a list as identifier for their dataset and they don't care enough to format that explicitly (['A', 'B'] in the legend was good enough for them). Therefore, I advocate for a regular deprecation.

rcomer · 2024-02-09T23:08:27Z

So what should happen for the single element list during the deprecation period? Can we implement new behaviour for that right away?

timhoffm · 2024-02-10T08:19:28Z

Given all context, I’d say yes. - Longer explanation will follow later (don’t want to type that on the phone)

Update for completeness:

While changing the behavior for single dataset + one-element list is breaking, we would not get much from a deprecation. Since we have a change of behavior, a runtime deprecation warning could announce that, but there is no reasonable way to allow people opting into the new behavior, which should be the desired one in the majority of cases. - They see the warning, and cannot do anything with it. So a runtime heads up is not feasible. Which would leave us with just a deprecation notice in the release notes. Apart from that I there's much overlap between people who read release notes and people who know (or cared) that they have passed a list for a single label, still the above argument holds, you cannot opt into the new behavior during deprecation. So the deprecation would only help people, wo intentionally wanted a list-formatted label. That is rare and not worth the effort and waiting time.

rcomer mentioned this issue Feb 10, 2024

Update handling of sequence labels for plot #27767

Merged

4 tasks

QuLogic closed this as completed in #27767 Feb 15, 2024

rcomer added this to the v3.9.0 milestone Feb 15, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug]: Inconsistent treatment of list of labels in `plot` when the input is a dataframe #27762

[Bug]: Inconsistent treatment of list of labels in `plot` when the input is a dataframe #27762

Gattocrucco commented Feb 8, 2024

rcomer commented Feb 9, 2024 •

edited

timhoffm commented Feb 9, 2024 •

edited

rcomer commented Feb 9, 2024

Gattocrucco commented Feb 9, 2024

timhoffm commented Feb 9, 2024

jklymak commented Feb 9, 2024 •

edited

timhoffm commented Feb 9, 2024

rcomer commented Feb 9, 2024

timhoffm commented Feb 10, 2024 •

edited

[Bug]: Inconsistent treatment of list of labels in plot when the input is a dataframe #27762

[Bug]: Inconsistent treatment of list of labels in plot when the input is a dataframe #27762

Comments

Gattocrucco commented Feb 8, 2024

Bug summary

Code for reproduction

Actual outcome

Expected outcome

Additional information

Operating system

Matplotlib Version

Matplotlib Backend

Python version

Jupyter version

Installation

rcomer commented Feb 9, 2024 • edited

timhoffm commented Feb 9, 2024 • edited

rcomer commented Feb 9, 2024

Gattocrucco commented Feb 9, 2024

timhoffm commented Feb 9, 2024

jklymak commented Feb 9, 2024 • edited

timhoffm commented Feb 9, 2024

rcomer commented Feb 9, 2024

timhoffm commented Feb 10, 2024 • edited

[Bug]: Inconsistent treatment of list of labels in `plot` when the input is a dataframe #27762

[Bug]: Inconsistent treatment of list of labels in `plot` when the input is a dataframe #27762

rcomer commented Feb 9, 2024 •

edited

timhoffm commented Feb 9, 2024 •

edited

jklymak commented Feb 9, 2024 •

edited

timhoffm commented Feb 10, 2024 •

edited