
Better error message when using DataFrame.hist() without numerical columns #10444

Closed
goretkin opened this issue Jun 25, 2015 · 12 comments · Fixed by #26483
Labels
Dtype Conversions Unexpected or buggy dtype conversions Error Reporting Incorrect or improved errors from pandas good first issue Visualization plotting

Comments

@goretkin

pandas version: 0.16.2
matplotlib version: 1.4.3 (an older version produced a different error message)

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10,2))
df_o = df.astype(object)  # np.object was only an alias for the builtin object; NumPy 1.24 removed it
df_o.hist()
ValueError                                Traceback (most recent call last)
<ipython-input-1-26253737011d> in <module>()
      4 df = pd.DataFrame(np.random.rand(10,2))
      5 df_o = df.astype(np.object)
----> 6 df_o.hist()

/usr/local/lib/python2.7/dist-packages/pandas/tools/plotting.pyc in hist_frame(data, column, by, grid, xlabelsize, xrot, ylabelsize, yrot, ax, sharex, sharey, figsize, layout, bins, **kwds)
   2764     fig, axes = _subplots(naxes=naxes, ax=ax, squeeze=False,
   2765                           sharex=sharex, sharey=sharey, figsize=figsize,
-> 2766                           layout=layout)
   2767     _axes = _flatten(axes)
   2768 

/usr/local/lib/python2.7/dist-packages/pandas/tools/plotting.pyc in _subplots(naxes, sharex, sharey, squeeze, subplot_kw, ax, layout, layout_type, **fig_kw)
   3244 
   3245     # Create first subplot separately, so we can share it if requested
-> 3246     ax0 = fig.add_subplot(nrows, ncols, 1, **subplot_kw)
   3247 
   3248     if sharex:

/usr/local/lib/python2.7/dist-packages/matplotlib/figure.pyc in add_subplot(self, *args, **kwargs)
    962                     self._axstack.remove(ax)
    963 
--> 964             a = subplot_class_factory(projection_class)(self, *args, **kwargs)
    965 
    966         self._axstack.add(key, a)

/usr/local/lib/python2.7/dist-packages/matplotlib/axes/_subplots.pyc in __init__(self, fig, *args, **kwargs)
     62                     raise ValueError(
     63                         "num must be 0 <= num <= {maxn}, not {num}".format(
---> 64                             maxn=rows*cols, num=num))
     65                 if num == 0:
     66                     warnings.warn("The use of 0 (which ends up being the "

ValueError: num must be 0 <= num <= 0, not 1
@TomAugspurger
Contributor

We (only?) plot numeric types (df._get_numeric_data IIRC).
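A quick way to see why the traceback ends in matplotlib's `num must be 0 <= num <= 0` message: if `hist` keeps only numeric columns, an all-object frame leaves zero columns, and therefore zero axes in the subplot grid. This sketch uses the public `select_dtypes(include="number")` as a stand-in for the private `_get_numeric_data` mentioned above; it assumes, without verifying, that the two select the same columns for this frame.

```python
import numpy as np
import pandas as pd

# Same setup as the report: two float columns, then a copy cast to object dtype.
df = pd.DataFrame(np.random.rand(10, 2))
df_o = df.astype(object)

# Public stand-in (an assumption) for the private df._get_numeric_data().
numeric = df.select_dtypes(include="number")
numeric_o = df_o.select_dtypes(include="number")

print(df.dtypes.tolist())    # two float64 columns
print(df_o.dtypes.tolist())  # two object columns
print(numeric.shape[1], numeric_o.shape[1])  # 2 0
```

With zero numeric columns left, the plotting code would request a grid of zero axes, which is consistent with `maxn=rows*cols` evaluating to 0 in the error above.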

What's your use case here that you end up with object dtypes? Integer NaNs? You'll typically want to avoid object dtypes, since they're much slower for numeric operations.

@TomAugspurger TomAugspurger added Visualization plotting Dtype Conversions Unexpected or buggy dtype conversions labels Jun 26, 2015
@goretkin
Author

I agree with what you're saying, and I think a suitable fix would include a more explicit check for the dtypes and show an error. As it is, I spent some time trying to figure out what the issue was, especially because the string representation of the DataFrame doesn't show the dtypes.

I am using floats and integers, but by accident, when I constructed the DataFrame, all entries were NaN objects, and then I populated the DataFrame in a loop.
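The accidental path described here can be reproduced roughly as follows. This is a minimal sketch that assumes the frame was created from just an index and column labels (the comment doesn't show the exact constructor). pandas then fills it with NaN in object-dtype columns, and assigning floats cell by cell leaves those columns as object. `DataFrame.infer_objects()` is one way to re-infer better dtypes afterwards.

```python
import pandas as pd

# A frame built from labels only: every cell is NaN and every column is object dtype.
df = pd.DataFrame(index=range(3), columns=["a", "b"])

# Populate it cell by cell, as in a loop over incoming records.
for i in range(3):
    df.loc[i, "a"] = i * 0.5
    df.loc[i, "b"] = i * 1.5

print(df.dtypes)  # both columns are still object, so hist() finds nothing numeric

# Soft-convert object columns whose values are all numbers.
fixed = df.infer_objects()
print(fixed.dtypes)  # both columns are now float64
```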

@datapythonista datapythonista changed the title hist raises matplotlib ValueError on dataframe with objects Better erro rmessage when using DataFrame.hist() without numerical columns Jul 6, 2018
@datapythonista datapythonista changed the title Better erro rmessage when using DataFrame.hist() without numerical columns Better error message when using DataFrame.hist() without numerical columns Jul 6, 2018
@datapythonista
Member

An error message like "hist method requires numerical columns, nothing to plot", or anything clearer, would be useful.

@zhanwenchen

Any progress on this? It took me quite a few hours to realize it was a dtype error.

@TomAugspurger
Contributor

TomAugspurger commented Aug 21, 2018 via email

@den-run-ai

den-run-ai commented Sep 24, 2018

@TomAugspurger I get a nonsensical error here from a DataFrame with 2 columns, but not when I histogram them one at a time as Series:

ValueError: num must be 1 <= num <= 0, not 1

traceback:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-41-7cfbfac10616> in <module>()
      1 dfi['ml_data'][
----> 2     ['duration_', 'duration__']].dropna().astype('timedelta64[D]').astype(float).hist(bins=20)

/usr/local/lib/python3.6/dist-packages/pandas/plotting/_core.py in hist_frame(data, column, by, grid, xlabelsize, xrot, ylabelsize, yrot, ax, sharex, sharey, figsize, layout, bins, **kwds)
   2176     fig, axes = _subplots(naxes=naxes, ax=ax, squeeze=False,
   2177                           sharex=sharex, sharey=sharey, figsize=figsize,
-> 2178                           layout=layout)
   2179     _axes = _flatten(axes)
   2180 

/usr/local/lib/python3.6/dist-packages/pandas/plotting/_tools.py in _subplots(naxes, sharex, sharey, squeeze, subplot_kw, ax, layout, layout_type, **fig_kw)
    235 
    236     # Create first subplot separately, so we can share it if requested
--> 237     ax0 = fig.add_subplot(nrows, ncols, 1, **subplot_kw)
    238 
    239     if sharex:

/usr/local/lib/python3.6/dist-packages/matplotlib/figure.py in add_subplot(self, *args, **kwargs)
   1072                     self._axstack.remove(ax)
   1073 
-> 1074             a = subplot_class_factory(projection_class)(self, *args, **kwargs)
   1075 
   1076         self._axstack.add(key, a)

/usr/local/lib/python3.6/dist-packages/matplotlib/axes/_subplots.py in __init__(self, fig, *args, **kwargs)
     62                     raise ValueError(
     63                         "num must be 1 <= num <= {maxn}, not {num}".format(
---> 64                             maxn=rows*cols, num=num))
     65                 self._subplotspec = GridSpec(rows, cols)[int(num) - 1]
     66                 # num - 1 for converting from MATLAB to python indexing

ValueError: num must be 1 <= num <= 0, not 1

colab notebook:

INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.14.33+
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.22.0
numpy: 1.14.5
matplotlib: 2.1.2

@datapythonista
Member

@denfromufa do you want to fix it?

Here is the general documentation on how to contribute: https://pandas.pydata.org/pandas-docs/stable/contributing.html

The fix should be easy: check the dtypes and raise an exception with a useful message.
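The proposed check can be sketched as a small pre-flight helper. `check_hist_input` is a hypothetical name, not pandas API, and the message reuses the wording suggested earlier in this thread; the fix that eventually landed in pandas may differ in both placement and wording.

```python
import numpy as np
import pandas as pd


def check_hist_input(data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical pre-check for DataFrame.hist: return the numeric columns,
    or raise a readable error instead of letting matplotlib fail on a 0-cell grid."""
    numeric = data.select_dtypes(include="number")
    if numeric.shape[1] == 0:
        raise ValueError(
            "hist method requires numerical columns, nothing to plot. "
            f"Got dtypes: {sorted({str(dt) for dt in data.dtypes})}"
        )
    return numeric


ok = check_hist_input(pd.DataFrame(np.random.rand(5, 2)))
print(ok.shape)  # (5, 2)

try:
    check_hist_input(pd.DataFrame(np.random.rand(5, 2)).astype(object))
except ValueError as err:
    print(err)  # names the problem and lists the offending dtypes
```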

@den-run-ai

@datapythonista I don't agree with this solution; it should just work. Why raise an exception when the histogram works for each Series? However, I haven't had a chance to debug this yet.
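One reading of "it should just work" is to coerce object columns that hold only numbers before plotting, and skip the rest. The helper below is a hypothetical sketch of that alternative, not what pandas does. It relies on `pd.to_numeric`, which raises on values it cannot parse, so columns of genuine strings are dropped rather than silently turned into NaN.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def coerce_numeric_columns(data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep numeric columns, convert object columns that
    parse fully as numbers, and drop everything else."""
    kept = {}
    for name, col in data.items():
        if is_numeric_dtype(col):
            kept[name] = col
            continue
        try:
            kept[name] = pd.to_numeric(col)  # raises on values such as "x"
        except (ValueError, TypeError):
            pass  # genuinely non-numeric column: leave it out of the plot
    return pd.DataFrame(kept, index=data.index)


mixed = pd.DataFrame({
    "rate": pd.Series([0.5, 1.25, 2.0], dtype=object),  # numbers stored as object
    "name": ["x", "y", "z"],                            # real strings
})
plottable = coerce_numeric_columns(mixed)
print(plottable.dtypes)  # only "rate" remains, now float64
```

The trade-off versus raising an error is that this silently changes which columns get plotted; a hybrid could coerce first and still raise if nothing numeric is left.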

@datapythonista
Member

If you can make it work, even better; feel free to send a PR for it.

@wkitlasten

I am struggling with a similar issue. I am building data frames from various sources, each with 2690 rows; from one source I can get the histogram to work, from the other I get the error reported above (and below). My plan was to convert both data frames to a dict using df.to_dict() and back to a data frame using pd.DataFrame.from_dict(), to explore more and reproduce the issue here in case someone could point out what the problem was. But when I do that, they both plot just fine. E.g.

dic=df1.to_dict()
df1=pd.DataFrame.from_dict(dic)

When I examine the original data frames for NaNs, etc., I cannot tell a difference. Any idea why converting my DataFrames to a dict and back solves this issue?
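One likely explanation (not confirmed in this thread) is dtype inference: if the original frame holds numbers in object-dtype columns, `to_dict()` hands back plain Python floats and `from_dict()` infers `float64` from them, so the round-tripped frame has numeric columns that `hist` will plot. NaN counts and printed values look the same either way; comparing `df.dtypes` is what reveals the difference. A minimal sketch with made-up values:

```python
import pandas as pd

# Made-up values: the column holds floats but is stored as object dtype.
df1 = pd.DataFrame({"rate": pd.Series([0.5, 1.25, 2.0], dtype=object)})

round_tripped = pd.DataFrame.from_dict(df1.to_dict())

print(df1.dtypes)            # "rate" is object, so DataFrame.hist finds no numeric column
print(round_tripped.dtypes)  # "rate" is float64 after re-inference, so it plots
```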

df0:

print(df0.sort_values('rate',ascending=False).head(5))
print(pd.isnull(df).sum())


df1:

print(df1.sort_values('rate',ascending=False).head(5))
print(pd.isnull(df).sum())



ValueError Traceback (most recent call last)
in
9 print(df.loc[:,['obnme','rate','total_et','acres']].sort_values('obnme',ascending=False).to_dict())
10 print(pd.isnull(df).sum())
---> 11 df.hist('rate',bins=np.arange(df['rate'].min(),df['rate'].max(),0.25))
12 plt.title('ET rate for all WR')
13 plt.xlabel('ET rate (ft/yr)')

C:\conda3x64\envs\p3x64\lib\site-packages\pandas\plotting\_core.py in hist_frame(data, column, by, grid, xlabelsize, xrot, ylabelsize, yrot, ax, sharex, sharey, figsize, layout, bins, **kwds)
2406 fig, axes = _subplots(naxes=naxes, ax=ax, squeeze=False,
2407 sharex=sharex, sharey=sharey, figsize=figsize,
-> 2408 layout=layout)
2409 _axes = _flatten(axes)
2410

C:\conda3x64\envs\p3x64\lib\site-packages\pandas\plotting\_tools.py in _subplots(naxes, sharex, sharey, squeeze, subplot_kw, ax, layout, layout_type, **fig_kw)
236
237 # Create first subplot separately, so we can share it if requested
--> 238 ax0 = fig.add_subplot(nrows, ncols, 1, **subplot_kw)
239
240 if sharex:

C:\conda3x64\envs\p3x64\lib\site-packages\matplotlib\figure.py in add_subplot(self, *args, **kwargs)
1237 self._axstack.remove(ax)
1238
-> 1239 a = subplot_class_factory(projection_class)(self, *args, **kwargs)
1240 self._axstack.add(key, a)
1241 self.sca(a)

C:\conda3x64\envs\p3x64\lib\site-packages\matplotlib\axes\_subplots.py in __init__(self, fig, *args, **kwargs)
65 raise ValueError(
66 ("num must be 1 <= num <= {maxn}, not {num}"
---> 67 ).format(maxn=rows*cols, num=num))
68 self._subplotspec = GridSpec(
69 rows, cols, figure=self.figure)[int(num) - 1]

ValueError: num must be 1 <= num <= 0, not 1

@matsmaiwald
Contributor

Anyone working on this? If not, I'd like to take this up as my first issue.

@datapythonista
Member

all yours @matsmaiwald, thanks


9 participants