I have tried to reduce this to the smallest example that exhibits the issue, rather than a useful example. The problem is that the operation causes Python to core dump.
In the original case where I discovered this, the core dump occurred only sometimes (and when I put the code in a loop, it occurred on different iterations). The code below has core dumped on the third iteration every time I have run it.
Code example:
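import os
import pandas as pd
# error_report.txt is the attached tab-separated file, expected in the current working directory.
df = pd.read_csv(os.path.join(os.getcwd(), 'error_report.txt'), sep='\t')
for i in range(0, 1000):
    print("Pre shift {}".format(i))
    # The grouped shift below is the call that crashes.
    df['shift_F'] = df.groupby(['B', 'C'])['F'].shift(-1)
    print("Post shift {}".format(i))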
The attached data file (tab-separated) is error_report.txt; the code assumes it is in the current working directory.
This is the output I get:
python pandas_test.py
Pre shift 0
Post shift 0
Pre shift 1
Post shift 1
Pre shift 2
Segmentation fault (core dumped)
If I modify the code to use apply and add the shifted column inside the applied function, there is no error. Similarly, if I use .shift(0), I do not get the error.
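For anyone who wants to try the apply route, here is a minimal sketch of one way to write it. The small DataFrame is a hypothetical stand-in for error_report.txt (the real file is not reproduced here), and applying a per-group function to the selected column is my own variant of the workaround, not necessarily the exact code used above.

```python
import pandas as pd

# Hypothetical stand-in for error_report.txt; only the column names
# B, C and F come from the report, the values are made up.
df = pd.DataFrame({
    "B": ["x", "x", "y", "y", "x"],
    "C": [1, 1, 2, 2, 1],
    "F": [10.0, 20.0, 30.0, 40.0, 50.0],
})


def shift_within_group(values):
    # Runs once per (B, C) group, on that group's F values only.
    return values.shift(-1)


# group_keys=False keeps the original row index on the result, so the
# assignment lines up row by row with df.
df["shift_F"] = (
    df.groupby(["B", "C"], group_keys=False)["F"].apply(shift_within_group)
)

print(df)
```

This goes through the per-group Python path instead of the grouped shift call, so it is slower on large frames, but it should give the same values as .shift(-1) on the grouped column.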
Using exception stack traces, I managed to pinpoint the problem. I believe it is in the group_shift_indexer procedure.
When I re-enabled Cython's array bounds checking (@cython.boundscheck(True)), your use case did not crash but instead raised a bounds violation error. The core of the problem is that the labels array, obtained from the groupby's grouper property, may contain so-called null keys (with value -1) in addition to the proper integer-coded group labels.
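To make the corner case concrete, here is a rough pure-Python/NumPy sketch of a grouped shift indexer that skips null keys. It is an illustration written for this thread under my own assumptions, not the pandas source and not the patch; the name shift_indexer_sketch and its internals are hypothetical.

```python
import numpy as np


def shift_indexer_sketch(labels, ngroups, periods):
    """For each row, return the position of the row `periods` steps away
    inside the same group, or -1 if there is no such row."""
    labels = np.asarray(labels, dtype=np.int64)
    n = len(labels)
    if periods == 0:
        return np.arange(n, dtype=np.int64)

    out = np.full(n, -1, dtype=np.int64)
    k = abs(periods)
    # For a negative shift, walk the rows backwards so that the rows
    # "already seen" are the later rows of each group.
    order = range(n) if periods > 0 else range(n - 1, -1, -1)

    seen = np.zeros(ngroups, dtype=np.int64)         # rows visited per group
    recent = np.zeros((ngroups, k), dtype=np.int64)  # last k positions per group

    for i in order:
        lab = labels[i]
        if lab == -1:
            # Null key: the row belongs to no group. Without this check,
            # -1 would be used as an index into `seen` and `recent`.
            continue
        seen[lab] += 1
        slot = seen[lab] % k
        if seen[lab] > k:
            out[i] = recent[lab, slot]
        recent[lab, slot] = i
    return out


labels = [0, 0, -1, 1, 0, 1]  # -1 marks a row whose group key is null
print(shift_indexer_sketch(labels, ngroups=2, periods=-1).tolist())
# prints [1, 4, -1, 5, -1, -1]
```

In plain Python an index of -1 would quietly wrap around to the last element. In compiled Cython code with bounds checking disabled, and presumably wraparound disabled as well, the same index can touch memory outside the array, which would be consistent with a crash that only shows up on some iterations.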
Lines L1358-L1359 do not properly check for this corner case. When I inject this patch:
Version info:
Regards
Stephen