Using resample() with groupby on this DataFrame causes Segmentation Fault #8573

Closed
ginzor opened this issue Oct 17, 2014 · 4 comments
Labels: Bug, Groupby, Resample, Testing

ginzor commented Oct 17, 2014

When I resample timestamps into 5-minute slots while grouping on an ID column, I get a memory crash, i.e. a segmentation fault. This happens on a DataFrame with time-series data, and I tried both counting and summing as the aggregation in the 'how' parameter.

I reduced the DataFrame as far as I could while still reproducing the crash. I also noticed that the segfault does not occur if I sort the index first. I don't know whether resample() requires a sorted index; I could not find any documentation saying so.

import datetime
import pandas as pd

all_wins_and_wagers =\
[(1L, datetime.datetime(2013, 10, 1, 16, 20), 1L, 0L),
 (2L, datetime.datetime(2013, 10, 1, 16, 10), 1L, 0L),
 (2L, datetime.datetime(2013, 10, 1, 18, 15), 1L, 0L),
 (2L, datetime.datetime(2013, 10, 1, 16, 10, 31), 1L, 0L)]

df = pd.DataFrame.from_records(all_wins_and_wagers, columns=("ID", "timestamp", "A", "B")).set_index("timestamp")
df_resampled = df.groupby("ID").resample("5min", "sum")

I tried this on the following pandas setups.

INSTALLED VERSIONS

commit: None
python: 2.7.8.final.0
python-bits: 64
OS: Linux
OS-release: 3.16-2-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.14.1
nose: 1.3.4
Cython: None
numpy: 1.9.0
scipy: None
statsmodels: None
IPython: 2.3.0
sphinx: None
patsy: None
scikits.timeseries: None
dateutil: 2.2
pytz: 2014.7
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.0.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: None
pymysql: None
psycopg2: None

INSTALLED VERSIONS

commit: None
python: 2.7.8.final.0
python-bits: 64
OS: Linux
OS-release: 3.16-2-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.13.1
Cython: None
numpy: 1.8.0
scipy: None
statsmodels: None
IPython: 2.3.0
sphinx: None
patsy: None
scikits.timeseries: None
dateutil: 1.5
pytz: 2012c
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.0.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
sqlalchemy: None
lxml: None
bs4: None
html5lib: 0.999
bq: None
apiclient: None

jreback commented Oct 17, 2014

pretty sure this is fixed in 0.15rc1

see pandas.pydata.org

jreback commented Oct 18, 2014

The syntax you are using is not supported and should raise an error. Use the following instead:

In [8]: df.groupby(['ID',pd.Grouper(freq='5min')]).sum()
Out[8]: 
                        A  B
ID timestamp                
1  2013-10-01 16:20:00  1  0
2  2013-10-01 16:10:00  2  0
   2013-10-01 18:15:00  1  0
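
For anyone who wants to try the suggestion above end to end, here is a minimal self-contained sketch. It assumes Python 3 and a recent pandas, so the `1L` long literals from the original report become plain ints; the variable names `records` and `result` are only illustrative. The index is deliberately left unsorted, as in the report.

```python
import datetime

import pandas as pd

# Same four rows as in the report, with Python 3 ints instead of 1L/0L.
records = [
    (1, datetime.datetime(2013, 10, 1, 16, 20), 1, 0),
    (2, datetime.datetime(2013, 10, 1, 16, 10), 1, 0),
    (2, datetime.datetime(2013, 10, 1, 18, 15), 1, 0),
    (2, datetime.datetime(2013, 10, 1, 16, 10, 31), 1, 0),
]

df = pd.DataFrame.from_records(
    records, columns=("ID", "timestamp", "A", "B")
).set_index("timestamp")

# Group by ID and by 5-minute bins of the (unsorted) DatetimeIndex,
# then sum within each (ID, bin) pair.
result = df.groupby(["ID", pd.Grouper(freq="5min")]).sum()
print(result)
```

Only the (ID, bin) pairs that actually occur in the data appear in the result, which is why the two 16:10 rows for ID 2 collapse into a single row.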

jreback added the Resample, Groupby, Error Reporting, and API Design labels Oct 18, 2014
jreback added this to the 0.15.1 milestone Oct 18, 2014
jreback commented Oct 18, 2014

hmm, I see it 'sort of' works if you sort, ok. will make this a bug then

jreback added the Bug label Oct 18, 2014
jreback modified the milestones: 0.16.0, 0.15.2 Nov 29, 2014
jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
jreback modified the milestones: 0.18.1, Next Major Release Feb 17, 2016
jreback added the Testing label Feb 17, 2016
jreback commented Feb 17, 2016

this appears good in master, just needs validation tests
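
A validation test for this could look roughly like the sketch below. This is not the test from the commit referenced in this thread; the function name `check_groupby_resample_unsorted` is hypothetical, and the sketch assumes Python 3 and a recent pandas, where the aggregation is chained as `.resample("5min").sum()` rather than passed through the old `how` argument. It checks that resampling the unsorted frame from the report gives the same result as resampling the sorted one.

```python
import datetime

import pandas as pd


def check_groupby_resample_unsorted():
    """Hypothetical check: unsorted and sorted input should agree."""
    records = [
        (1, datetime.datetime(2013, 10, 1, 16, 20), 1, 0),
        (2, datetime.datetime(2013, 10, 1, 16, 10), 1, 0),
        (2, datetime.datetime(2013, 10, 1, 18, 15), 1, 0),
        (2, datetime.datetime(2013, 10, 1, 16, 10, 31), 1, 0),
    ]
    df = pd.DataFrame.from_records(
        records, columns=("ID", "timestamp", "A", "B")
    ).set_index("timestamp")

    # Select the value columns explicitly so the grouping column is not
    # part of the resampled data.
    unsorted_result = df.groupby("ID")[["A", "B"]].resample("5min").sum()
    sorted_result = (
        df.sort_index().groupby("ID")[["A", "B"]].resample("5min").sum()
    )

    # Raises AssertionError if the two frames differ.
    pd.testing.assert_frame_equal(unsorted_result, sorted_result)
    return unsorted_result


result = check_groupby_resample_unsorted()
```

Unlike the `pd.Grouper` form, `groupby(...).resample(...)` fills in every 5-minute bin between the first and last timestamp of each group, so the result for ID 2 also contains empty bins.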

jreback added the Difficulty Novice label and removed the API Design and Error Reporting labels Feb 17, 2016
jreback added a commit to jreback/pandas that referenced this issue Apr 18, 2016