Issue with 'arrange' when df has an index #47

omri374 · 2018-01-09T13:16:26Z

Hi,
Please take a look at the following example:

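import pandas as pd
from dfply import *

# Default integer index: arrange sorts as expected
utime = pd.DataFrame({"u":1,"eventTime":["01-01-1971 01:04:00","01-01-1971 02:07:00","01-01-1971 01:09:00","01-01-1971 01:10:00"]})
print(utime >> arrange(X.eventTime))

# Same data with "u" as the index, so every row has the index label 1
utime = utime.set_index("u")
print(utime >> arrange(X.eventTime))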

In the first case, the result is as expected. After setting an index, the result is incorrect: the rows are not sorted, and the output contains 4 times as many rows as before.

I'm not sure whether this is a bug or expected behavior, as I'm new to pandas and to data frame indices.

Output for the code:
             eventTime  u
0  01-01-1971 01:04:00  1
2  01-01-1971 01:09:00  1
3  01-01-1971 01:10:00  1
1  01-01-1971 02:07:00  1
             eventTime
u
1  01-01-1971 01:04:00
1  01-01-1971 02:07:00
1  01-01-1971 01:09:00
1  01-01-1971 01:10:00
1  01-01-1971 01:04:00
1  01-01-1971 02:07:00
1  01-01-1971 01:09:00
1  01-01-1971 01:10:00
1  01-01-1971 01:04:00
1  01-01-1971 02:07:00
1  01-01-1971 01:09:00
1  01-01-1971 01:10:00
1  01-01-1971 01:04:00
1  01-01-1971 02:07:00
1  01-01-1971 01:09:00
1  01-01-1971 01:10:00

kieferk · 2018-01-10T14:04:08Z

Good catch! This is in fact a bug. It happened because I was using the original DataFrame's index to sort, then re-indexing with the sorted index labels. When the index contained duplicate labels, that re-indexing step duplicated the rows.

It should be fixed now. I switched to positional indexing with .iloc instead.
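For readers who want to see the mechanism in plain pandas, here is a minimal sketch of the difference between looking rows up by label and by position when index labels repeat. It is only an illustration of the behavior described above, not the actual dfply code. The variable names (`sorted_labels`, `by_label`, `order`, `by_position`) are made up for this example.

```python
import pandas as pd

# Same data as in the report, with "u" as the index: all four rows share label 1.
df = pd.DataFrame(
    {"eventTime": ["01-01-1971 01:04:00", "01-01-1971 02:07:00",
                   "01-01-1971 01:09:00", "01-01-1971 01:10:00"]},
    index=pd.Index([1, 1, 1, 1], name="u"),
)

# Label-based lookup: sort, keep the sorted index labels, then select by label.
# Each label 1 matches all four rows, so four labels select 4 x 4 rows.
sorted_labels = df.sort_values("eventTime").index
by_label = df.loc[sorted_labels]

# Position-based lookup: compute the sorting permutation as integer positions
# and apply it with .iloc, which ignores the index labels entirely.
order = df["eventTime"].argsort(kind="stable").to_numpy()
by_position = df.iloc[order]

print(len(by_label))     # row count after the label-based lookup
print(by_position)       # rows after the position-based lookup
```

Because .iloc works on row positions, duplicate index labels cannot multiply the rows, which matches the fix described above.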

I tried the same on my machine with the new master branch:

from dfply import *
utime = pd.DataFrame({"u":1,"eventTime":["01-01-1971 01:04:00","01-01-1971 02:07:00","01-01-1971 01:09:00","01-01-1971 01:10:00"]})

print(utime >> arrange(X.eventTime))
             eventTime  u
0  01-01-1971 01:04:00  1
2  01-01-1971 01:09:00  1
3  01-01-1971 01:10:00  1
1  01-01-1971 02:07:00  1

utime = utime.set_index("u")

print(utime >> arrange(X.eventTime))
             eventTime
u                     
1  01-01-1971 01:04:00
1  01-01-1971 01:09:00
1  01-01-1971 01:10:00
1  01-01-1971 02:07:00

This is the behavior you expected. If you pull the master branch and reinstall, it should work.

kieferk closed this as completed Jan 10, 2018
