Description
- I have searched the [pandas] tag on StackOverflow for similar questions.
- I have asked my usage related question on StackOverflow.
I have asked this question on SO but did not get any answers there. Below I give a simplified version of my SO question.
I am not sure whether this is a bug or whether I misunderstand how indexing within groups works.
https://stackoverflow.com/questions/62149056/pandas-get-loc-with-ffill-gives-unexpected-results-within-group
Assume you have the following dataframe:
import pandas as pd
# pandas==1.0.4
df = pd.DataFrame({'idxDigits': [1, 1, 2, 2]},
                  index=pd.Index([0, 1, 10, 11], name='myIdx'))
print(df)
#        idxDigits
# myIdx
# 0              1
# 1              1
# 10             2
# 11             2
and you want to find, for each `idxDigits` number, the dataframe entry that is at or before a user-specified index value `idxSearchValue`.
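For reference, the "at or before" lookup can be sketched on the bare index, outside of any groupby. This is a minimal sketch and not the code from the question: it uses `Index.get_indexer` with `method='ffill'` instead of `get_loc`, because the `method` argument of `get_loc` was deprecated in later pandas releases and removed in pandas 2.0. The search values 5 and -1 are made up for illustration.

```python
import pandas as pd

# Same index as in the example dataframe.
idx = pd.Index([0, 1, 10, 11], name='myIdx')

# For each search value, get the position of the last index entry that is
# less than or equal to it. A value with no such entry yields -1.
positions = idx.get_indexer([10, 5, -1], method='ffill')
print(positions.tolist())
# [2, 1, -1]
```

An exact match (10) resolves to its own position, 5 falls back to the entry at index value 1, and -1 has no entry at or before it. The -1 marker has to be checked before it is passed to `iloc`, since `iloc[-1]` would silently select the last row.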
My approach was to define the following function
def mySelect(x, idxSearchValue):
    print('idxSearchValue: {}'.format(idxSearchValue))
    idx = x.index.get_loc(idxSearchValue, 'ffill')
    return x.iloc[[idx]].reset_index()
and apply it via groupby
res = df.groupby(['idxDigits'], as_index=False).apply(mySelect, idxSearchValue=10)
# idxSearchValue: 10
# idxSearchValue: 10
Although there is a perfect match for `idxSearchValue = 10`, we get the result
print(res.reset_index(drop=True))
#    myIdx  idxDigits
# 0      1          1
# 1     11          2
So, my question is:
Why does `get_loc` for group `idxDigits==2` return `myIdx = 11` although there is a perfect match within the group for `idxSearchValue = 10`?
BTW: Splitting the dataframe manually and applying `get_loc` gives the expected result:
df1 = df[df.idxDigits == 1]
print(df1.iloc[[df1.index.get_loc(10, 'ffill')]].reset_index())
#    myIdx  idxDigits
# 0      1          1
df2 = df[df.idxDigits == 2]
print(df2.iloc[[df2.index.get_loc(10, 'ffill')]].reset_index())
#    myIdx  idxDigits
# 0     10          2
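The manual split can be generalized to any number of groups by iterating over the groupby object instead of calling `apply`. The following is only a sketch of a possible workaround, not an explanation of the behaviour above: `select_at_or_before` and `res_loop` are hypothetical names, the lookup uses `Index.get_indexer(..., method='ffill')` rather than `get_loc` (whose `method` argument was removed in pandas 2.0), and it has not been checked against pandas 1.0.4.

```python
import pandas as pd

df = pd.DataFrame({'idxDigits': [1, 1, 2, 2]},
                  index=pd.Index([0, 1, 10, 11], name='myIdx'))


def select_at_or_before(group, idx_search_value):
    """Return the row of `group` whose index is at or before the search value.

    Hypothetical helper; assumes the group's index is unique and sorted.
    """
    pos = group.index.get_indexer([idx_search_value], method='ffill')[0]
    if pos == -1:
        # No index entry at or before the search value: return no rows.
        return group.iloc[[]].reset_index()
    return group.iloc[[pos]].reset_index()


# Iterate over the groups explicitly, mirroring the manual df1/df2 split.
parts = [select_at_or_before(group, 10) for _, group in df.groupby('idxDigits')]
res_loop = pd.concat(parts, ignore_index=True)
print(res_loop)
```

On a current pandas release this selects `myIdx = 1` for `idxDigits == 1` and `myIdx = 10` for `idxDigits == 2`, the same rows as the manual split.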