ENH: groupby.hist bins don't match #22222

javadnoorb · 2018-08-06T17:38:13Z

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd
N = 100
df = pd.DataFrame(np.append(np.random.randn(N), np.random.randn(N)/10), columns = ['rand'])
df['group'] = [0]*N + [1]*N
df.groupby('group')['rand'].hist(bins=20, alpha=0.7)

Problem description

When two groups have different dynamic ranges, `hist` bins each group according to its own range, so the bin widths differ between groups. Ideally the bins should match: take the union of the groups' ranges and use the same bin edges for every group.

The output of the above code is a plot in which each group is binned over its own range (image not preserved).

Expected Output

Histograms with bins of equal width (or maybe an input flag to `hist` that could do this).

Maybe something like the output of the following code:

bins = np.linspace(-3, 2.5, 20)
df[df['group']==0]['rand'].hist(bins=bins, alpha=0.7)
df[df['group']==1]['rand'].hist(bins=bins, alpha=0.7)
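
The hard-coded edges above can be derived from the data instead. The following is a minimal sketch of that workaround, not an existing pandas option: the helper names `shared_bin_edges` and `hist_with_shared_bins` are hypothetical, and a seeded generator replaces `np.random.randn` only to make the example reproducible. Note that `bins + 1` edges are needed to get `bins` bins.

```python
import numpy as np
import pandas as pd


def shared_bin_edges(values, bins=20):
    """Return bins + 1 equally spaced edges spanning the full range of values."""
    values = np.asarray(values, dtype=float)
    return np.linspace(np.nanmin(values), np.nanmax(values), bins + 1)


def hist_with_shared_bins(df, by, column, bins=20, **kwargs):
    """Draw one histogram per group on common bin edges (requires matplotlib)."""
    edges = shared_bin_edges(df[column], bins)
    ax = None
    for _, group in df.groupby(by):
        # Reuse the same axes so the groups overlay each other.
        ax = group[column].hist(bins=edges, ax=ax, **kwargs)
    return ax


# Same data layout as the code sample in this issue.
N = 100
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "rand": np.append(rng.standard_normal(N), rng.standard_normal(N) / 10),
    "group": [0] * N + [1] * N,
})

# Edges computed once from the union of both groups' ranges.
edges = shared_bin_edges(df["rand"], bins=20)

# Per-group counts on the common edges; every group shares one bin width.
counts = {key: np.histogram(g["rand"], bins=edges)[0]
          for key, g in df.groupby("group")}
```

Calling `hist_with_shared_bins(df, 'group', 'rand', bins=20, alpha=0.7)` should then overlay both groups with matching bins, which is roughly what a flag on `groupby.hist` would have to do internally.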

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-29-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.3
pytest: 3.5.0
pip: 18.0
setuptools: 39.1.0
Cython: 0.28.1
numpy: 1.14.5
scipy: 1.0.1
pyarrow: None
xarray: None
IPython: 6.3.0
sphinx: 1.7.2
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: 0.2.0
pandas_datareader: None

WillAyd · 2018-08-06T17:45:41Z

I'd be +1 on this, as I do think your proposal makes more sense. Investigation and PRs are always welcome.

javadnoorb · 2018-08-07T00:16:09Z

Thanks @WillAyd. I'll try to come up with a PR.

Closes #479. Contrary to the title, `plotting.py`, which uses Seaborn, produces consistent bin widths when the range of values differs between images. The Jupyter Notebook (`notebooks/02-Summary-statistics-and-plots.ipynb`) uses Pandas for loading and plotting data (a deliberate choice to reduce the number of packages users would encounter), and it is Pandas that produces the different bin widths (see [#22222 ENH: groupby.hist bins don't match](pandas-dev/pandas#22222)). The Notebook has been updated to show how to use `np.linspace()` across the total range of the data. Docstrings in `tests/test_plotting.py` have also been improved.

WillAyd added Groupby Visualization plotting labels Aug 6, 2018

WillAyd added this to the Contributions Welcome milestone Aug 6, 2018

javadnoorb mentioned this issue Aug 7, 2018

ENH: option for groupby.hist to match bins #22228

Closed

4 tasks

mroeschke added the Enhancement label Jun 21, 2021

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022

ns-rse mentioned this issue Apr 12, 2023

Consistent bin-width in histograms summarising multiple images AFM-SPM/TopoStats#524

Merged
