warning in bar plot with multiple columns #18764

Closed
AlbertDeFusco opened this issue Dec 13, 2017 · 4 comments

@AlbertDeFusco

Code Sample, a copy-pastable example if possible

>>> import numpy as np
>>> import pandas as pd
>>> pd.__version__
'0.21.0'
>>> a = np.random.randint(1, 100, size=10)
>>> b = 100 - a
>>> i = np.arange(100, 110)
>>> 
>>> df = pd.DataFrame(dict(a=a, b=b, i=i))
>>> df.plot.bar(x='i', y=['b','a'], stacked=True)
/Users/adefusco/Applications/miniconda3/envs/projects-data-analysis/lib/python3.6/site-packages/pandas/plotting/_core.py:1714: UserWarning: Pandas doesn't allow columns to be created via a new attribute name - see https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute-access
  series.name = label
<matplotlib.axes._subplots.AxesSubplot object at 0x10dec46d8>

Problem description

The warning about series.name = label appears because, when no label keyword argument is passed (I am not passing one), the plotting code does the following:

y = ['a','b'] # <=== from inputs to function
label = kwds['label'] if 'label' in kwds else y
series = data[y].copy()  # Don't modify
series.name = label

Since y is a list here, series is actually a DataFrame, so Pandas thinks that a new column is being created with the values ['a','b'].
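
The mechanism can be reproduced without plotting. The following is a minimal sketch, assuming a recent pandas in which assigning a list-like value to an unknown attribute of a DataFrame still emits this UserWarning; the small frame and the variable names are illustrative and not taken from pandas/plotting/_core.py.

```python
import warnings

import pandas as pd

# Illustrative data; the values are arbitrary.
df = pd.DataFrame({"a": [1, 2, 3], "b": [99, 98, 97], "i": [100, 101, 102]})

# Scalar y: the selection is a Series, and .name is a real Series attribute.
series = df["a"].copy()
series.name = "a"

# List y: the selection is a DataFrame, which has no .name attribute,
# so the assignment looks like an attempt to create a column named "name".
frame = df[["a", "b"]].copy()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    frame.name = ["a", "b"]

# Collect the text of every UserWarning raised by the assignment.
messages = [str(w.message) for w in caught if issubclass(w.category, UserWarning)]
print(messages)
```

The assignment only sets a plain Python attribute on the object; no column is added to the frame.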

Expected Output

The warning does not occur if the index is used as the x-axis:

df[['b','a']].plot.bar(stacked=True)

Proposed solution

In pandas/plotting/_core.py would the following be reasonable?

if y is not None:
    if is_scalar(y):
        if is_integer(y) and not data.columns.holds_integer():
            y = data.columns[y]
        
        label = kwds['label'] if 'label' in kwds else y
        series = data[y].copy()  # Don't modify
        series.name = label
        
        data = series
        
    elif is_dict_like(y):
        # keys are the new labels, values are the existing column names
        data = data[list(y.values())].copy()
        data = data.rename(columns={col: lab for lab, col in y.items()})
    
    elif is_list_like(y):
        data = data[y].copy()
        
    elif not isinstance(data[y], ABCSeries):
        raise ValueError("y must be a label or position")

... continue with plot

This would allow the following calls, using the DataFrame defined above.

df.plot.bar(x='i', y=1)                   # <-- plot the second column
df.plot.bar(x='i', y=['b','a'])           # <-- plot multiple columns
df.plot.bar(x='i', y=dict(y='a', z='b'))  # <-- plot multiple columns with custom labels
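
The branching proposed above can be tried outside pandas as a standalone function. This is only a sketch under stated assumptions: select_y is a hypothetical helper, not part of pandas; the dict form is read as "new label -> existing column"; and the columns.holds_integer() check is approximated with Index.inferred_type.

```python
import pandas as pd
from pandas.api.types import is_dict_like, is_integer, is_list_like, is_scalar


def select_y(data, y, label=None):
    """Hypothetical helper mirroring the proposed branches; not pandas API."""
    if is_scalar(y):
        # An integer y is a position unless the column labels are integers.
        holds_int = data.columns.inferred_type in ("integer", "mixed-integer")
        if is_integer(y) and not holds_int:
            y = data.columns[y]
        series = data[y].copy()  # Don't modify the caller's data
        series.name = label if label is not None else y
        return series
    if is_dict_like(y):
        # Keys are the new labels, values are the existing column names.
        selected = data[list(y.values())].copy()
        return selected.rename(columns={col: lab for lab, col in y.items()})
    if is_list_like(y):
        return data[list(y)].copy()
    raise ValueError("y must be a label, position, list, or dict")


# Illustrative data; the values are arbitrary.
df = pd.DataFrame({"a": [1, 2, 3], "b": [99, 98, 97], "i": [100, 101, 102]})

by_position = select_y(df, 1)                 # second column as a Series
by_list = select_y(df, ["b", "a"])            # several columns, in the given order
by_dict = select_y(df, {"y": "a", "z": "b"})  # several columns, relabelled
```

The dict check comes before the list check because a dict is also list-like, so the order of the branches matters.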

After I teach myself how to build Pandas I'll test this change.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Darwin
OS-release: 17.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.21.0
pytest: None
pip: 9.0.1
setuptools: 36.5.0.post20170921
Cython: None
numpy: 1.13.3
scipy: None
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.5.0

@TomAugspurger
Contributor

TomAugspurger commented Dec 13, 2017

Can you check if this was fixed by #18695?

@AlbertDeFusco
Author

Actually, #18695 breaks my plot entirely by only allowing y to be a scalar. I'll submit a PR along with tests soon.

>>> import numpy as np
>>> import pandas as pd
>>> pd.__version__
'0.22.0.dev0+356.g9705a4806'
>>> a = np.random.randint(1, 100, size=10)
>>> b = 100 - a
>>> i = np.arange(100, 110)
>>> 
>>> df = pd.DataFrame(dict(a=a, b=b, i=i))
>>> df.plot.bar(x='i', y=['b','a'], stacked=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/adefusco/Development/Continuum/pandas/pandas/plotting/_core.py", line 2701, in bar
    return self(kind='bar', x=x, y=y, **kwds)
  File "/Users/adefusco/Development/Continuum/pandas/pandas/plotting/_core.py", line 2666, in __call__
    sort_columns=sort_columns, **kwds)
  File "/Users/adefusco/Development/Continuum/pandas/pandas/plotting/_core.py", line 1905, in plot_frame
    **kwds)
  File "/Users/adefusco/Development/Continuum/pandas/pandas/plotting/_core.py", line 1716, in _plot
    raise ValueError("y must be a label or position")
ValueError: y must be a label or position

@TomAugspurger
Contributor

Yeah, that was the point of #18695, to raise when the user passes invalid arguments. x and y are supposed to be single labels or positions. Passing x and y sends the code down a path that's expecting all the other kwargs to deal with single values, not multiple.

If you want to plot multiple, I'd recommend df.set_index('i')[['b', 'a']].plot.bar(stacked=True).
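
A minimal sketch of that recommendation, split into a data-preparation step and the plotting call. The helper name frame_for_bar and the sample values are assumptions made for illustration; the plotting line is left commented out because it needs matplotlib.

```python
import pandas as pd


def frame_for_bar(df, x, ys):
    """Hypothetical helper: move column `x` into the index and keep only `ys`."""
    return df.set_index(x)[list(ys)]


# Illustrative data; each row of a and b sums to 100, as in the original report.
df = pd.DataFrame({"a": [10, 40, 75], "b": [90, 60, 25], "i": [100, 101, 102]})
plot_data = frame_for_bar(df, "i", ["b", "a"])

# Plotting needs matplotlib; uncomment to draw the stacked bars.
# ax = plot_data.plot.bar(stacked=True)
```

Because the x values live in the index and every remaining column is plotted, neither x nor y has to be passed to plot.bar.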

@AlbertDeFusco
Author

Ok, thanks.
