warning in bar plot with multiple columns #18764

AlbertDeFusco · 2017-12-13T13:12:10Z

Code Sample, a copy-pastable example if possible

>>> import numpy as np
>>> import pandas as pd
>>> pd.__version__
'0.21.0'
>>> a = np.random.randint(1, 100, size=10)
>>> b = 100 - a
>>> i = np.arange(100, 110)
>>> 
>>> df = pd.DataFrame(dict(a=a, b=b, i=i))
>>> df.plot.bar(x='i', y=['b','a'], stacked=True)
/Users/adefusco/Applications/miniconda3/envs/projects-data-analysis/lib/python3.6/site-packages/pandas/plotting/_core.py:1714: UserWarning: Pandas doesn't allow columns to be created via a new attribute name - see https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute-access
  series.name = label
<matplotlib.axes._subplots.AxesSubplot object at 0x10dec46d8>

Problem description

The warning about series.name = label is raised because, when y is a list, the plotting code effectively does the following. I am not using the label keyword argument.

y = ['a','b'] # <=== from inputs to function
label = kwds['label'] if 'label' in kwds else y
series = data[y].copy()  # Don't modify
series.name = label

Since series is actually a DataFrame here, pandas thinks that a new column is being created with the values ['a','b'].
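The behaviour described above can be reproduced without any plotting. This is a minimal sketch, not the pandas plotting code itself: the helper name_warnings is hypothetical and only records whatever warnings an assignment to .name emits. It assumes a pandas version that still issues this UserWarning for list-like attribute assignment on a DataFrame.

```python
import warnings

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [99, 98, 97]})


def name_warnings(obj, label):
    """Hypothetical helper: set obj.name = label and return warning messages."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        obj.name = label
    return [str(w.message) for w in caught]


# A single label selects a Series, which has a real .name attribute.
series_msgs = name_warnings(df["a"].copy(), "a")

# A list of labels selects a DataFrame, which has no .name attribute, so
# assigning a list-like value looks like an attempt to create a column.
frame_msgs = name_warnings(df[["a", "b"]].copy(), ["a", "b"])

print(series_msgs)
print(frame_msgs)
```

The Series assignment should be silent, while the DataFrame assignment should produce the "columns to be created via a new attribute name" message quoted in the report.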

Expected Output

The warning message does not occur if the Index is used as the x-axis

df[['b','a']].plot.bar(stacked=True)

Proposed solution

In pandas/plotting/_core.py would the following be reasonable?

if y is not None:
    if is_scalar(y):
        if is_integer(y) and not data.columns.holds_integer():
            y = data.columns[y]
        
        label = kwds['label'] if 'label' in kwds else y
        series = data[y].copy()  # Don't modify
        series.name = label
        
        data = series
        
    elif is_dict_like(y):
        data = data[list(y.values())].copy()
        data = data.rename(columns=y)
    
    elif is_list_like(y):
        data = data[y].copy()
        
    elif not isinstance(data[y], ABCSeries):
        raise ValueError("y must be a label or position")

... continue with plot

This provides for the following options, using the DataFrame defined above.

df.plot.bar(x='i', y=1)              # <-- plot the second column
df.plot.bar(x='i', y=['b','a'])      # <-- plot multiple columns
df.plot.bar(x='i', y=dict(y='a', z='b')) # <-- plot multiple columns and with custom labels

After I teach myself how to build Pandas I'll test this change.
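As a rough illustration of the dispatch proposed above, here is a standalone sketch that can be run outside the pandas source tree. The function select_y and its label parameter are hypothetical and are not part of pandas. The dict convention of {new_label: existing_column} is an assumption about the intended behaviour, and columns.inferred_type is used here as a stand-in for the holds_integer() check.

```python
import pandas as pd
from pandas.api.types import is_dict_like, is_integer, is_list_like, is_scalar


def select_y(data, y, label=None):
    """Hypothetical helper: return the column(s) of data selected by y."""
    if is_scalar(y):
        # Treat an integer as a position unless the column labels are integers.
        integer_columns = data.columns.inferred_type in ("integer", "mixed-integer")
        if is_integer(y) and not integer_columns:
            y = data.columns[y]
        series = data[y].copy()  # copy so the caller's frame is not modified
        series.name = y if label is None else label
        return series
    if is_dict_like(y):
        # Assumed convention: keys are new labels, values are existing columns.
        selected = data[list(y.values())].copy()
        return selected.rename(columns={col: new for new, col in y.items()})
    if is_list_like(y):
        # Several columns: keep them as a DataFrame, with no .name assignment.
        return data[list(y)].copy()
    raise ValueError("y must be a label, position, list, or mapping")


df = pd.DataFrame({"a": [1, 2], "b": [99, 98], "i": [100, 101]})
print(select_y(df, 1).name)
print(list(select_y(df, ["b", "a"]).columns))
print(list(select_y(df, {"y": "a", "z": "b"}).columns))
```

Because the list and dict branches never assign to .name, they avoid the attribute-assignment warning by construction.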

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Darwin
OS-release: 17.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.21.0
pytest: None
pip: 9.0.1
setuptools: 36.5.0.post20170921
Cython: None
numpy: 1.13.3
scipy: None
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.5.0


TomAugspurger · 2017-12-13T13:33:01Z

Can you check if this was fixed by #18695?

AlbertDeFusco · 2017-12-13T14:23:23Z

Actually, #18695 breaks my plot entirely by only allowing y to be a scalar. I'll submit a PR along with tests soon.

>>> import numpy as np
>>> import pandas as pd
>>> pd.__version__
'0.22.0.dev0+356.g9705a4806'
>>> a = np.random.randint(1, 100, size=10)
>>> b = 100 - a
>>> i = np.arange(100, 110)
>>> 
>>> df = pd.DataFrame(dict(a=a, b=b, i=i))
>>> df.plot.bar(x='i', y=['b','a'], stacked=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/adefusco/Development/Continuum/pandas/pandas/plotting/_core.py", line 2701, in bar
    return self(kind='bar', x=x, y=y, **kwds)
  File "/Users/adefusco/Development/Continuum/pandas/pandas/plotting/_core.py", line 2666, in __call__
    sort_columns=sort_columns, **kwds)
  File "/Users/adefusco/Development/Continuum/pandas/pandas/plotting/_core.py", line 1905, in plot_frame
    **kwds)
  File "/Users/adefusco/Development/Continuum/pandas/pandas/plotting/_core.py", line 1716, in _plot
    raise ValueError("y must be a label or position")
ValueError: y must be a label or position

TomAugspurger · 2017-12-13T14:29:52Z

Yeah, that was the point of #18695, to raise when the user passes invalid arguments. x and y are supposed to be single labels or positions. Passing x and y sends the code down a path that's expecting all the other kwargs to deal with single values, not multiple.

If you want to plot multiple, I'd recommend df.set_index('i')[['b', 'a']].plot.bar(stacked=True).
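The recommended approach can be sketched as follows. The variable names rng and wide are illustrative, and a fixed seed is assumed so the data is reproducible. The plotting call is left as a comment because it needs matplotlib.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)  # fixed seed, chosen only for reproducibility
a = rng.randint(1, 100, size=10)
df = pd.DataFrame({"a": a, "b": 100 - a, "i": np.arange(100, 110)})

# Move the x values into the index, then pick the columns to stack, in order.
wide = df.set_index("i")[["b", "a"]]
print(wide.head())

# With matplotlib installed, the stacked bar chart is then:
# wide.plot.bar(stacked=True)
```

Here the x-axis comes from the index, so x and y are not passed at all and the single-label restriction does not apply.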

AlbertDeFusco · 2017-12-13T15:52:50Z

Ok, thanks.

AlbertDeFusco closed this as completed Dec 13, 2017

kylebarron mentioned this issue Mar 19, 2018

pd.merge() doesn't merge int and str column dtypes but no warning or error #9780

Closed
