Setting histogram weights for multiple columns fails #33173

mave240 · 2020-03-31T10:07:25Z

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np
df = pd.DataFrame(dict(zip(['A', 'B'], np.random.randn(2, 100))))
df.plot.hist(weights=0.1*np.ones(shape=(100, 2)))  # fail

Problem description

Plotting a histogram of a multi-column data frame with weights fails with ValueError: weights should have the same shape as a. There is no error for a single column. Passing one-dimensional weights fails as well:

df.plot.hist(weights=0.1*np.ones(shape=(100,)))  # fail

Expected Output

A matplotlib figure.

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 9, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None
pandas : 1.0.3
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.1.1.post20200323
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.3
numexpr : None
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : None
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : 0.48.0

TomAugspurger · 2020-03-31T13:06:59Z

Thanks for the report. Can you post the full traceback, and investigate where things are going wrong?

mave240 · 2020-03-31T13:45:25Z

> Thanks for the report. Can you post the full traceback, and investigate where things are going wrong?

Traceback (most recent call last):
  File "", line 4, in
  File "Z:\miniconda3\envs\tf\lib\site-packages\pandas\plotting\_core.py", line 1180, in hist
    return self(kind="hist", by=by, bins=bins, **kwargs)
  File "Z:\miniconda3\envs\tf\lib\site-packages\pandas\plotting\_core.py", line 847, in __call__
    return plot_backend.plot(data, kind=kind, **kwargs)
  File "Z:\miniconda3\envs\tf\lib\site-packages\pandas\plotting\_matplotlib\__init__.py", line 61, in plot
    plot_obj.generate()
  File "Z:\miniconda3\envs\tf\lib\site-packages\pandas\plotting\_matplotlib\core.py", line 260, in generate
    self._args_adjust()
  File "Z:\miniconda3\envs\tf\lib\site-packages\pandas\plotting\_matplotlib\hist.py", line 34, in _args_adjust
    weights=self.kwds.get("weights", None),
  File "<__array_function__ internals>", line 6, in histogram
  File "Z:\miniconda3\envs\tf\lib\site-packages\numpy\lib\histograms.py", line 793, in histogram
    a, weights = _ravel_and_check_weights(a, weights)
  File "Z:\miniconda3\envs\tf\lib\site-packages\numpy\lib\histograms.py", line 301, in _ravel_and_check_weights
    'weights should have the same shape as a.')
ValueError: weights should have the same shape as a.

I think there are two issues. The first arises when the bins are computed in HistPlot._args_adjust(self). There, np.histogram is called with the weights argument passed in unnecessarily. With that argument removed, the plot succeeds if the shape of weights is (100,), since that matches the (squeezed) shape of each individual column, which is what np.histogram requires.
However, my guess is that weights should have the same shape as the data frame itself, (100, 2) in this case, so that each column can be weighted individually. That is the second issue, which is less trivial to fix.
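
To illustrate both points outside of pandas, here is a minimal NumPy-only sketch. It is not the pandas implementation: the array `data` stands in for the two data frame columns, and the names `values`, `edges` and `counts` are made up for this example. It assumes that the values are flattened before the common bins are computed, which is what the traceback suggests.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((100, 2))   # stand-in for the two data frame columns
weights = 0.1 * np.ones((100, 2))      # one weight per value, per column

# Assumption: the values are flattened before the common bins are computed.
# The flattened shape (200,) no longer matches the weights' shape (100, 2),
# so passing weights at this step raises the ValueError from the traceback.
values = np.ravel(data)
try:
    np.histogram(values, bins=10, weights=weights)
    raised = False
except ValueError:
    raised = True

# With an integer bin count the edges depend only on the data range,
# so the common edges can be computed without any weights.
_, edges = np.histogram(values, bins=10)

# Second issue: weight each column with its own column of the weights array.
counts = [
    np.histogram(data[:, i], bins=edges, weights=weights[:, i])[0]
    for i in range(data.shape[1])
]
```

Each entry of `counts` holds the weighted bin totals for one column, so a fix along these lines would slice the weights per column before handing them to the histogram call.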

charlesdong1991 · 2020-04-03T21:29:35Z

take

TomAugspurger added the Visualization plotting label Mar 31, 2020

TomAugspurger added this to the Contributions Welcome milestone Mar 31, 2020

github-actions bot assigned charlesdong1991 Apr 3, 2020

charlesdong1991 mentioned this issue Apr 9, 2020

BUG: weights is not working for multiple columns in df.plot.hist #33440

Merged

jreback modified the milestones: Contributions Welcome, 1.1 Apr 10, 2020

jreback closed this as completed in #33440 Apr 10, 2020
