Using resample() with groupby on this DataFrame causes Segmentation Fault #8573

ginzor · 2014-10-17T13:33:26Z

When resampling timestamps into 5-minute time slots while grouping on an ID column, I get a memory crash, i.e. a Segmentation Fault. The DataFrame holds time-series data, and I tried both counting and summing as the aggregation in the 'how' parameter.

I reduced the DataFrame as far as I could while still reproducing the crash. I also noticed that it does not segfault if I sort the index first (I don't know whether resample() requires a sorted index; I could not find any documentation on this).

import datetime
import pandas as pd

all_wins_and_wagers =\
[(1L, datetime.datetime(2013, 10, 1, 16, 20), 1L, 0L),
 (2L, datetime.datetime(2013, 10, 1, 16, 10), 1L, 0L),
 (2L, datetime.datetime(2013, 10, 1, 18, 15), 1L, 0L),
 (2L, datetime.datetime(2013, 10, 1, 16, 10, 31), 1L, 0L)]

df = pd.DataFrame.from_records(all_wins_and_wagers, columns=("ID", "timestamp", "A", "B")).set_index("timestamp")
df_resampled = df.groupby("ID").resample("5min", "sum")
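The sort-the-index workaround mentioned above can be sketched as follows. This is a minimal sketch that assumes a recent pandas on Python 3, not the 0.13/0.14 setups in this report: the `L` suffixes are dropped, and since the `how` argument was removed in later pandas releases, the aggregation is chained as a method. The names `records`, `df_sorted`, and `resampled` are illustrative only.

```python
import datetime

import pandas as pd

# Same four rows as the report, written as Python 3 integers.
records = [
    (1, datetime.datetime(2013, 10, 1, 16, 20), 1, 0),
    (2, datetime.datetime(2013, 10, 1, 16, 10), 1, 0),
    (2, datetime.datetime(2013, 10, 1, 18, 15), 1, 0),
    (2, datetime.datetime(2013, 10, 1, 16, 10, 31), 1, 0),
]
df = pd.DataFrame.from_records(
    records, columns=("ID", "timestamp", "A", "B")
).set_index("timestamp")

# Workaround from the report: sort the DatetimeIndex before resampling.
df_sorted = df.sort_index()

# Resample each ID group into 5-minute bins and sum the value columns.
resampled = df_sorted.groupby("ID")[["A", "B"]].resample("5min").sum()
print(resampled["A"].head())
```

The result is indexed by (ID, timestamp). The two ID 2 rows at 16:10:00 and 16:10:31 fall into the same 16:10 bin.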

Tried on the following pandas setups.

INSTALLED VERSIONS

commit: None
python: 2.7.8.final.0
python-bits: 64
OS: Linux
OS-release: 3.16-2-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.14.1
nose: 1.3.4
Cython: None
numpy: 1.9.0
scipy: None
statsmodels: None
IPython: 2.3.0
sphinx: None
patsy: None
scikits.timeseries: None
dateutil: 2.2
pytz: 2014.7
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.0.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: None
pymysql: None
psycopg2: None

INSTALLED VERSIONS

commit: None
python: 2.7.8.final.0
python-bits: 64
OS: Linux
OS-release: 3.16-2-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.13.1
Cython: None
numpy: 1.8.0
scipy: None
statsmodels: None
IPython: 2.3.0
sphinx: None
patsy: None
scikits.timeseries: None
dateutil: 1.5
pytz: 2012c
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.0.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
sqlalchemy: None
lxml: None
bs4: None
html5lib: 0.999
bq: None
apiclient: None

jreback · 2014-10-17T15:50:13Z

pretty sure this is fixed in 0.15rc1

see pandas.pydata.org

jreback · 2014-10-18T12:41:10Z

The syntax you are using is not supported and should raise an error; use the following instead.

In [8]: df.groupby(['ID',pd.Grouper(freq='5min')]).sum()
Out[8]: 
                        A  B
ID timestamp                
1  2013-10-01 16:20:00  1  0
2  2013-10-01 16:10:00  2  0
   2013-10-01 18:15:00  1  0
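For anyone who wants to run the suggestion above as a script, here is a self-contained sketch. It assumes a recent pandas on Python 3 (so the `L` integer suffixes from the report are dropped); the names `records` and `grouped` are illustrative. Note that the index is deliberately left unsorted, as in the original report.

```python
import datetime

import pandas as pd

# Same four rows as the report, written as Python 3 integers.
records = [
    (1, datetime.datetime(2013, 10, 1, 16, 20), 1, 0),
    (2, datetime.datetime(2013, 10, 1, 16, 10), 1, 0),
    (2, datetime.datetime(2013, 10, 1, 18, 15), 1, 0),
    (2, datetime.datetime(2013, 10, 1, 16, 10, 31), 1, 0),
]
df = pd.DataFrame.from_records(
    records, columns=("ID", "timestamp", "A", "B")
).set_index("timestamp")

# Group by ID and by 5-minute bins of the DatetimeIndex in a single step.
grouped = df.groupby(["ID", pd.Grouper(freq="5min")]).sum()
print(grouped)
```

Only the (ID, bin) combinations that actually contain rows appear in the result, which is why the output above has three rows.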

jreback · 2014-10-18T12:42:26Z

hmm, I see it 'sort of' works if you sort, ok. will make this a bug then

jreback · 2016-02-17T13:25:24Z

this appears good in master, just needs validation tests

closes pandas-dev#8573

jreback added the Resample, Groupby, Error Reporting, and API Design labels Oct 18, 2014

jreback added this to the 0.15.1 milestone Oct 18, 2014

jreback added the Bug label Oct 18, 2014

jreback modified the milestones: 0.16.0, 0.15.2 Nov 29, 2014

jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015

kelvin22 mentioned this issue May 29, 2015

resample() with how=count causes Segmentation Fault #10228

Closed

jreback modified the milestones: 0.18.1, Next Major Release Feb 17, 2016

jreback added the Testing label Feb 17, 2016

jreback added the Difficulty Novice label and removed the API Design and Error Reporting labels Feb 17, 2016

jreback added a commit to jreback/pandas that referenced this issue Apr 18, 2016

TST: validation tests for resample segfault

ec2d7ad

closes pandas-dev#8573

jreback closed this as completed in 2267bd3 Apr 19, 2016
