BUG: stacked histogram on datetimes #34204

Open
c-foschi opened this issue May 16, 2020 · 7 comments

@c-foschi

Hello, I want to create a stacked histogram, and I hoped to do it with the pandas hist method. The data I want to plot is in one column of my DataFrame, while another column holds the factor by which I want to color the bars of the histogram. To build the dataset on which to call hist, I tried two of the methods listed here: https://stackoverflow.com/questions/41622054/stacked-histogram-of-grouped-values-in-pandas. Both give a consistent result like this:

import numpy as np
import pandas as pd

# Two datetime columns; each row has a value in only one of them
v = np.array([['2019-10-15T07:33:50.000000000', 'NaT'],
              ['2019-10-18T09:19:58.000000000', 'NaT'],
              ['NaT', '2020-04-26T16:14:38.000000000'],
              ['NaT', '2020-04-27T04:48:30.000000000']],
             dtype='datetime64[ns]')
df = pd.DataFrame(v)

But when I call the histogram on that dataframe:

df.plot(kind= 'hist', stacked=True)

it gives me the error ValueError: view limit minimum -0.05000000000000001 is less than 1 and is an invalid Matplotlib date value. This often happens if you pass a non-datetime value to an axis that has datetime units.

If, on the other hand, I use method hist directly:

df.hist(stacked= True)

it gives me the error ValueError: hist method requires numerical columns, nothing to plot, even though either column on its own can be plotted without problems.

NaT values shouldn't be the problem here, and I assume that both snippets I posted should work, because they produce regular stacked histograms when the columns are numeric instead of datetime.

INSTALLED VERSIONS

commit : None
python : 3.7.3.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 1.0.3
numpy : 1.18.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.1.3.post20200330
Cython : 0.29.17
pytest : 5.4.1
hypothesis : 5.8.3
sphinx : 3.0.3
blosc : None
feather : None
xlsxwriter : 1.2.8
lxml.etree : 4.5.0
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.13.0
pandas_datareader: None
bs4 : 4.9.0
bottleneck : 1.3.2
fastparquet : None
gcsfs : None
lxml.etree : 4.5.0
matplotlib : 3.1.3
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.4.1
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.16
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.8
numba : 0.43.1

c-foschi added the Bug and Needs Triage labels on May 16, 2020
@charlesdong1991
Member

charlesdong1991 commented May 16, 2020

Thanks for your report, @c-foschi.

The issue here is that the values are datetimes; currently df.hist accepts only numeric values, not datetime values.

Could you check whether your example works in matplotlib directly (that is, whether matplotlib's stacked hist works with datetime values)? If so, we might think about supporting it.

charlesdong1991 added the Needs Info and Visualization labels and removed the Bug and Needs Triage labels on May 16, 2020
@c-foschi
Author

c-foschi commented May 16, 2020

thank you too.

So Series.hist accepts datetime values, but DataFrame.hist doesn't? I didn't expect that.

I have little experience with matplotlib, but this worked:

import matplotlib.pyplot as plt

# One Series per column, with the NaT rows dropped
l = [df[col][~df[col].isna()] for col in df.columns]

plt.figure()
plt.hist(l, stacked=True)
plt.legend(df.columns)
plt.show()
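
As a cross-check on the matplotlib workaround above, the counts that a stacked histogram would display can be computed without plotting. This is only a minimal sketch and not part of the original report: it assumes the two-column frame from the first post (rebuilt here with pd.to_datetime), converts the timestamps to integer nanoseconds, and bins both columns on shared edges. All variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the report (assumed equivalent to the np.array version)
df = pd.DataFrame({
    0: pd.to_datetime(["2019-10-15 07:33:50", "2019-10-18 09:19:58", None, None]),
    1: pd.to_datetime([None, None, "2020-04-26 16:14:38", "2020-04-27 04:48:30"]),
})

# Drop NaT per column and express each timestamp as integer nanoseconds
ints = {
    col: df[col].dropna().to_numpy().astype("datetime64[ns]").astype("int64")
    for col in df.columns
}

# Shared bin edges across all columns, so the bars line up when stacked
edges = np.histogram_bin_edges(np.concatenate(list(ints.values())), bins=10)
counts = np.array([np.histogram(vals, bins=edges)[0] for vals in ints.values()])

# Cumulative sum down the columns gives the top of each stacked bar
stacked_tops = counts.cumsum(axis=0)

# Convert the edges back to timestamps for labelling
edge_times = edges.astype("int64").astype("datetime64[ns]")
```

Binning on shared edges matters here: if each column were binned on its own range, the bars of the two columns would not line up and could not be stacked.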

@charlesdong1991
Member

charlesdong1991 commented May 16, 2020

Right, but the l you passed to plt.hist is a list of Series that hold integer values, which is why it is plottable.

> So Series.hist accepts datetime values, but DataFrame.hist doesn't? I didn't expect that.

Hmm, indeed. Looking at the source code, it seems Series.hist calls matplotlib's ax.hist directly, whereas DataFrame.hist first filters out the non-numeric columns:

data = data._get_numeric_data()

I am not sure why, but we could probably change something here.

thanks for reporting!
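
The effect of that numeric filter can be illustrated with the public API. The sketch below is an approximation, not the pandas source: it uses select_dtypes(include=[np.number]) as a stand-in for the private _get_numeric_data call, and the extra numeric column "n" and all variable names are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "first": pd.to_datetime(["2019-10-15 07:33:50", None]),
    "second": pd.to_datetime([None, "2020-04-26 16:14:38"]),
    "n": [1.0, 2.0],  # illustrative numeric column, not in the original report
})

# Public-API stand-in for the private numeric filter: keep numeric dtypes only
numeric_only = df.select_dtypes(include=[np.number])
kept = list(numeric_only.columns)
dropped = [col for col in df.columns if col not in kept]

# With datetime columns only, nothing survives the filter
datetime_only = df[["first", "second"]].select_dtypes(include=[np.number])
nothing_left = datetime_only.empty
```

A frame that holds only datetime columns, like the one in the report, is left with no columns after such a filter, which is consistent with the "nothing to plot" error quoted in the first post.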

charlesdong1991 removed the Needs Info label on May 16, 2020
@charlesdong1991
Member

take

@c-foschi
Author

> Right, but the l you passed to plt.hist is a list of Series that hold integer values, which is why it is plottable.

No, I don't know where you got those integer values; the data I used is the same as in my first post, and it is datetime.

>>> l
[0   2019-10-15 07:33:50
 1   2019-10-18 09:19:58
 Name: 0, dtype: datetime64[ns],
 2   2020-04-26 16:14:38
 3   2020-04-27 04:48:30
 Name: 1, dtype: datetime64[ns]]

@charlesdong1991
Member

Whoops, yeah, thanks for the correction!
I mistakenly used the data from another issue for this. As said above, matplotlib does support datetime values for hist, and so should pandas.
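
Until DataFrame.hist handles datetime columns, the matplotlib workaround from earlier in the thread could be wrapped in a small helper. This is a sketch under stated assumptions, not a pandas API: stacked_datetime_hist is a hypothetical name, the Agg backend is chosen only so that the example runs without a display, and the frame is rebuilt with pd.to_datetime.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no display is needed
import matplotlib.pyplot as plt
import pandas as pd


def stacked_datetime_hist(frame, bins=10, ax=None):
    """Hypothetical helper: stacked histogram of the datetime columns of frame."""
    if ax is None:
        _, ax = plt.subplots()
    datetime_cols = frame.select_dtypes(include=["datetime64"]).columns
    # Drop NaT per column; each column may keep a different number of rows
    data = [frame[col].dropna().to_numpy() for col in datetime_cols]
    labels = [str(col) for col in datetime_cols]
    counts, edges, _ = ax.hist(data, bins=bins, stacked=True, label=labels)
    ax.legend()
    return ax, counts, edges


df = pd.DataFrame({
    0: pd.to_datetime(["2019-10-15 07:33:50", "2019-10-18 09:19:58", None, None]),
    1: pd.to_datetime([None, None, "2020-04-26 16:14:38", "2020-04-27 04:48:30"]),
})
ax, counts, edges = stacked_datetime_hist(df)
```

The helper passes plain NumPy arrays to ax.hist rather than Series, so the result does not depend on each Series keeping its original row index.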

@c-foschi
Author

thanks!

mroeschke added the Bug label on Aug 7, 2021