Pandas 0.23.4: when does "ValueError: Values falls before first bin" happen during grouping? #22619

Closed
Nemecsek opened this issue Sep 6, 2018 · 6 comments
Labels
Duplicate Report

Comments

@Nemecsek

Nemecsek commented Sep 6, 2018

I couldn't find any reference to a pandas error that is raised during grouping: "ValueError: Values falls before first bin".
Could it be due to an internal bug?

In my opinion, grouping should never fail outright like this; None should be returned instead.

The trace is:

  File "new_calc.py", line 584, in calculate_groups
    grouped = df.groupby(pd.Grouper(freq="D"))
  File "/etc/bluelikon/utilities/env/local/lib/python2.7/site-packages/pandas/core/generic.py", line 6665, in groupby
    observed=observed, **kwargs)
  File "/etc/bluelikon/utilities/env/local/lib/python2.7/site-packages/pandas/core/groupby/groupby.py", line 2152, in groupby
    return klass(obj, by, **kwds)
  File "/etc/bluelikon/utilities/env/local/lib/python2.7/site-packages/pandas/core/groupby/groupby.py", line 599, in __init__
    mutated=self.mutated)
  File "/etc/bluelikon/utilities/env/local/lib/python2.7/site-packages/pandas/core/groupby/groupby.py", line 3189, in _get_grouper
    binner, grouper, obj = key._get_grouper(obj, validate=False)
  File "/etc/bluelikon/utilities/env/local/lib/python2.7/site-packages/pandas/core/resample.py", line 1281, in _get_grouper
    r._set_binner()
  File "/etc/bluelikon/utilities/env/local/lib/python2.7/site-packages/pandas/core/resample.py", line 148, in _set_binner
    self.binner, self.grouper = self._get_binner()
  File "/etc/bluelikon/utilities/env/local/lib/python2.7/site-packages/pandas/core/resample.py", line 156, in _get_binner
    binner, bins, binlabels = self._get_binner_for_time()
  File "/etc/bluelikon/utilities/env/local/lib/python2.7/site-packages/pandas/core/resample.py", line 888, in _get_binner_for_time
    return self.groupby._get_time_bins(self.ax)
  File "/etc/bluelikon/utilities/env/local/lib/python2.7/site-packages/pandas/core/resample.py", line 1333, in _get_time_bins
    ax_values, bin_edges, self.closed, hasnans=ax.hasnans)
  File "pandas/_libs/lib.pyx", line 554, in pandas._libs.lib.generate_bins_dt64
ValueError: Values falls before first bin
@TomAugspurger
Contributor

Could you fill out the issue template, including a reproducible example and the output of pd.show_versions?

@TomAugspurger added the Needs Info label Sep 6, 2018
@Nemecsek
Author

Nemecsek commented Sep 7, 2018

I didn't fill out the template because I can't produce a reproducible example yet.

Any demo data set works, but my real data doesn't. At the moment I am trying to isolate the DataFrame rows that cause the issue. I will let you know as soon as I succeed.
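
For readers in the same position, here is a minimal sketch of one way to narrow a large frame down to the rows that trigger the error, by repeatedly halving it. The helper names `groupby_fails` and `narrow_failing_rows` are hypothetical and not part of pandas, and the approach assumes that the failure can be reproduced on a contiguous slice of the data, which may not hold for every data set.

```python
import numpy as np
import pandas as pd


def groupby_fails(frame, freq="D"):
    """Return True if grouping `frame` by `freq` raises ValueError."""
    try:
        # Iterate over the groups so that the time bins are actually built.
        for _ in frame.groupby(pd.Grouper(freq=freq)):
            pass
    except ValueError:
        return True
    return False


def narrow_failing_rows(df, fails=groupby_fails):
    """Halve `df` repeatedly, keeping a half that still fails.

    Returns None if `df` does not fail at all. Otherwise returns the
    smallest contiguous slice found; the search stops early when the
    failure only shows up with rows from both halves together.
    """
    if not fails(df):
        return None
    lo, hi = 0, len(df)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(df.iloc[lo:mid]):
            hi = mid
        elif fails(df.iloc[mid:hi]):
            lo = mid
        else:
            break  # neither half fails on its own
    return df.iloc[lo:hi]


# Demo data (UTC index), used here only to show the call.
idx = pd.date_range("2016-03-28", periods=20, freq="15min", tz="UTC")
demo = pd.DataFrame({"VALUE": np.arange(len(idx))}, index=idx)
print(narrow_failing_rows(demo))
```

Passing the failure check as a parameter makes it possible to try other frequencies, for example `fails=lambda f: groupby_fails(f, freq="W")`.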

I am working with an old Debian jessie that cannot be updated, and Python 2.7.

(I didn't know about pandas.show_versions(), really handy indeed)

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.9.final.0
python-bits: 64
OS: Linux
OS-release: 3.16.0-6-amd64
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: None.None

pandas: 0.23.4
pytest: None
pip: 18.0
setuptools: 5.5.1
Cython: None
numpy: 1.15.1
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: 1.2.11
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
None

@Nemecsek
Author

Nemecsek commented Sep 7, 2018

Got it: it has to do with the time-zone localization of the DataFrame's DatetimeIndex.

To group my data, I first need to convert the index from UTC to local time.
This sample code crashes when the data are grouped by week AND the index is localized to a local time zone.
If you localize to UTC, or use a grouping frequency other than "W", it works:

import pandas as pd
import numpy as np
from datetime import datetime

dt = pd.date_range(start=datetime(2016,3,28,0,0,0), periods=20, freq='15min')
np.random.seed(seed=1111)
data = np.random.randint(1, high=100, size=len(dt))

df = pd.DataFrame({'DTIME': dt, 'VALUE': data})
df = df.set_index('DTIME').tz_localize("Europe/Berlin")   # <==== ok with "UTC"

grouped = df.groupby(pd.Grouper(freq="W"))   # <==== ok with "H", "D", "M", "Y"
for g in grouped:
    print(g)

By contrast, this works without any problem: the DatetimeIndex is first localized to UTC and then converted to local time:

import pandas as pd
import numpy as np
from datetime import datetime

dt = pd.date_range(start=datetime(2016,3,28,0,0,0), periods=20, freq='15min')
np.random.seed(seed=1111)
data = np.random.randint(1, high=100, size=len(dt))

df = pd.DataFrame({'DTIME': dt, 'VALUE': data})
df = df.set_index('DTIME').tz_localize("UTC")
df = df.tz_convert("Europe/Berlin")

grouped = df.groupby(pd.Grouper(freq="W"))
for g in grouped:
    print(g)
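
One caveat when comparing the two snippets: they do not describe the same instants. `tz_localize("Europe/Berlin")` interprets the naive timestamps as Berlin wall-clock time, while `tz_localize("UTC")` followed by `tz_convert("Europe/Berlin")` interprets them as UTC and only changes how they are displayed. A small sketch on the same date range shows the difference; it does not by itself explain the error.

```python
import pandas as pd

naive = pd.date_range(start="2016-03-28 00:00:00", periods=20, freq="15min")

# Naive values read as Berlin wall-clock time.
localized = naive.tz_localize("Europe/Berlin")

# Naive values read as UTC, then displayed in Berlin time.
converted = naive.tz_localize("UTC").tz_convert("Europe/Berlin")

print(localized[0])                 # 2016-03-28 00:00:00+02:00
print(converted[0])                 # 2016-03-28 02:00:00+02:00
print(converted[0] - localized[0])  # 0 days 02:00:00
```

Which of the two is appropriate depends on whether the source timestamps were recorded in UTC or in local time.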

@mroeschke
Member

Thanks for the report; this looks like a duplicate of #9119 (df.groupby(pd.Grouper(freq="W")) is the same as df.resample('W'))
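
To illustrate the equivalence mentioned here, the following sketch aggregates the same frame both ways and compares the results. It uses a UTC index so that it should run regardless of the bug discussed in this thread; the column name and the random values are placeholders for illustration.

```python
import numpy as np
import pandas as pd

dt = pd.date_range(start="2016-03-28", periods=20, freq="15min", tz="UTC")
rng = np.random.RandomState(1111)
df = pd.DataFrame({"VALUE": rng.randint(1, 100, size=len(dt))}, index=dt)

# Weekly sums via groupby with a Grouper ...
by_grouper = df.groupby(pd.Grouper(freq="W")).sum()

# ... and via resample.
by_resample = df.resample("W").sum()

# Both results should carry the same weekly labels and the same sums.
print(by_grouper)
print(by_resample)
```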

@mroeschke added the Duplicate Report label and removed the Needs Info label Sep 7, 2018
@Nemecsek
Author

Nemecsek commented Sep 8, 2018

I completely missed the earlier issue. It has been open for 4 years!

@mroeschke
Member

Hoping to get to it soon; it's tricky since the resampling code has been there for a while. Extra eyes are always appreciated!


3 participants