You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Good catch! This is in fact a bug. It was happening because I was using the original dataframe's index to sort, then re-indexing with the sorted indices. When there were duplicate indices it would duplicate the rows.
Should be fixed now. I just changed to indexing using .iloc instead.
I tried the same on my machine with the new master branch:
Hi,
Please take a look at the following example:
from dfply import *
utime = pd.DataFrame({"u":1,"eventTime":["01-01-1971 01:04:00","01-01-1971 02:07:00","01-01-1971 01:09:00","01-01-1971 01:10:00"]})
print(utime >> arrange(X.eventTime))
utime = utime.set_index("u")
print(utime >> d.arrange(X.eventTime))
In the first option, the result is as expected. When introducing an index, the result is incorrect and contains 4 times as many values as before.
I'm not sure if it's bug or an expected behavior, as I'm a newbie to pandas and to indices of data frames.
output for the code:
eventTime u
0 01-01-1971 01:04:00 1
2 01-01-1971 01:09:00 1
3 01-01-1971 01:10:00 1
1 01-01-1971 02:07:00 1
eventTime
u
1 01-01-1971 01:04:00
1 01-01-1971 02:07:00
1 01-01-1971 01:09:00
1 01-01-1971 01:10:00
1 01-01-1971 01:04:00
1 01-01-1971 02:07:00
1 01-01-1971 01:09:00
1 01-01-1971 01:10:00
1 01-01-1971 01:04:00
1 01-01-1971 02:07:00
1 01-01-1971 01:09:00
1 01-01-1971 01:10:00
1 01-01-1971 01:04:00
1 01-01-1971 02:07:00
1 01-01-1971 01:09:00
1 01-01-1971 01:10:00
The text was updated successfully, but these errors were encountered: