BUG: DataFrame.reindex not following limit - method parameter error #28631

Closed
french-home opened this issue Sep 26, 2019 · 10 comments · Fixed by #28671
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves Regression Functionality that used to work in a prior pandas version
Milestone

Comments

@french-home

french-home commented Sep 26, 2019

Python version: 3.6.5
Version: pandas == 0.24.2

import pandas as pd

Data = [
    ["A", "A", "A"],
    ["B", "B", "B"],
    ["C", "C", "C"],
    ["D", "D", "D"],
]
test1 = pd.DataFrame(Data)
print(test1)
print("------------")
test1 = test1.reindex([0, 1, 2, 3, 4, 5], method="ffill", limit=1)
print(test1)

[Screenshot: reindex_bug]

The limit parameter stops the forward fill from continuing.

Version: pandas==0.25.1

import pandas as pd

Data = [
    ["A", "A", "A"],
    ["B", "B", "B"],
    ["C", "C", "C"],
    ["D", "D", "D"],
]
test1 = pd.DataFrame(Data)
print(test1)
print("------------")
test1 = test1.reindex([0, 1, 2, 3, 4, 5], method="ffill", limit=1)
print(test1)

[Screenshot: reindex_bug_2]

The limit parameter does not stop the forward fill from continuing.

@datapythonista
Member

Thanks @french-home for reporting this. It would be very useful if you could provide the output from both versions. Thanks!

@datapythonista datapythonista added Bug Indexing Related to indexing on series/frames, not to indexes themselves labels Sep 26, 2019
@french-home
Author

french-home commented Sep 26, 2019 via email

@datapythonista
Member

Isn't it easier to simply copy it into a comment here? You can send it by email if that's a problem, but I think a comment would be easier.

@french-home
Author

french-home commented Sep 26, 2019 via email

@datapythonista
Member

datapythonista commented Sep 26, 2019

Sure, please send it to ************* (edited) and I'll update the description for you.

@french-home
Author

french-home commented Sep 26, 2019 via email

@datapythonista
Member

Thanks a lot @french-home, it does look like a bug. I updated the description with your output, and I will have a look to see what's wrong when I have time.

Thanks for reporting it. If you'd like to research what's the problem yourself, and you need help with that, please let me know, happy to help.

@jorisvandenbossche jorisvandenbossche changed the title BUG DataFream.rindex方法参数的错误 BUG: DataFream.reindex not following limit Sep 26, 2019
@jorisvandenbossche
Member

So, to make it a bit more explicit: older versions correctly apply the "limit":

In [8]: pd.__version__  
Out[8]: '0.24.2'

In [9]: test1.reindex([0, 1, 2, 3, 4, 5], method="ffill", limit=1)
Out[9]: 
     0    1    2
0    A    A    A
1    B    B    B
2    C    C    C
3    D    D    D
4    D    D    D
5  NaN  NaN  NaN

while 0.25 / master does not:

In [22]: pd.__version__   
Out[22]: '0.26.0.dev0+418.g1cfba0a87'

In [23]: test1.reindex([0, 1, 2, 3, 4, 5], method="ffill", limit=1)
Out[23]: 
   0  1  2
0  A  A  A
1  B  B  B
2  C  C  C
3  D  D  D
4  D  D  D
5  D  D  D

So this is a regression.
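For anyone stuck on an affected version, one possible workaround is to split the operation in two: reindex without a fill method, then forward-fill with a limit. This is only a sketch, not something proposed in this thread. It also is not equivalent to `reindex(..., method="ffill")` in general, because `ffill` additionally fills NaN values that were already present in the original data. The variable names `frame`, `new_index`, and `workaround` are made up for illustration.

```python
import pandas as pd

data = [
    ["A", "A", "A"],
    ["B", "B", "B"],
    ["C", "C", "C"],
    ["D", "D", "D"],
]
frame = pd.DataFrame(data)
new_index = [0, 1, 2, 3, 4, 5]

# Step 1: reindex without a fill method, so rows 4 and 5 are all-NaN.
# Step 2: forward-fill, letting each value propagate at most one row.
workaround = frame.reindex(new_index).ffill(limit=1)
print(workaround)
```

With the example data, row 4 is filled from row 3 and row 5 stays missing, which matches the 0.24.2 output shown above.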

@jorisvandenbossche jorisvandenbossche added the Regression Functionality that used to work in a prior pandas version label Sep 26, 2019
@jorisvandenbossche jorisvandenbossche added this to the 0.25.2 milestone Sep 26, 2019
@datapythonista datapythonista changed the title BUG: DataFream.reindex not following limit BUG: DataFrame.reindex not following limit - 方法参数的错误 Sep 26, 2019
@jorisvandenbossche
Member

So this boils down to RangeIndex.get_indexer no longer taking the limit keyword into account:

In [42]: idx = pd.Index(range(4))

In [43]: target = pd.Index([0, 1, 2, 3, 4, 5])

In [44]: idx.get_indexer(target, method='pad', limit=1)
Out[44]: array([0, 1, 2, 3, 3, 3])

while the last element in that result should be -1.
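A quick way to check a given installation is to compare the `RangeIndex` result against an integer index built from a list. This sketch assumes that the list-backed index goes through the generic `get_indexer` code path and therefore still honours `limit`. The names `range_index`, `plain_index`, `expected`, and `actual` are illustrative only.

```python
import pandas as pd

range_index = pd.Index(range(4))      # becomes a RangeIndex
plain_index = pd.Index([0, 1, 2, 3])  # integer index built from a list
target = pd.Index([0, 1, 2, 3, 4, 5])

# Positions from the list-backed index serve as the reference result.
expected = plain_index.get_indexer(target, method="pad", limit=1)
actual = range_index.get_indexer(target, method="pad", limit=1)

print(expected.tolist())
print(actual.tolist())
print("RangeIndex honours limit:", actual.tolist() == expected.tolist())
```

On an affected version the last line should print `False`, since the `RangeIndex` result ends in 3 rather than -1.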

This seems to be caused by #27119 (cc @toobaz )

@toobaz
Member

toobaz commented Sep 26, 2019

This seems to be caused by #27119 (cc @toobaz )

Definitely. I forgot to forward the limit argument here:

return super().get_indexer(target, method=method,
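The general shape of this kind of mistake can be shown with a toy example. The classes below are hypothetical stand-ins written for illustration, not pandas code, and the pad logic is a deliberately simplified approximation of how `limit` behaves. The point is only that an override which accepts `limit` but does not pass it on to `super()` silently drops it.

```python
from bisect import bisect_right


class BaseIndexSketch:
    """Hypothetical stand-in for a generic index (not pandas code)."""

    def __init__(self, values):
        self.values = sorted(values)

    def get_indexer(self, target, method=None, limit=None):
        positions = []
        consecutive_fills = 0
        for label in target:
            pos = bisect_right(self.values, label) - 1  # last value <= label
            if pos >= 0 and self.values[pos] == label:
                # Exact match: reset the run of padded labels.
                consecutive_fills = 0
                positions.append(pos)
            elif (method == "pad" and pos >= 0
                    and (limit is None or consecutive_fills < limit)):
                # Inexact match: pad from the previous value, within the limit.
                consecutive_fills += 1
                positions.append(pos)
            else:
                positions.append(-1)
        return positions


class BuggyRangeSketch(BaseIndexSketch):
    def get_indexer(self, target, method=None, limit=None):
        # `limit` is accepted but never forwarded, so it has no effect.
        return super().get_indexer(target, method=method)


class FixedRangeSketch(BaseIndexSketch):
    def get_indexer(self, target, method=None, limit=None):
        # Forwarding `limit` restores the expected behaviour.
        return super().get_indexer(target, method=method, limit=limit)


target = [0, 1, 2, 3, 4, 5]
buggy = BuggyRangeSketch(range(4)).get_indexer(target, method="pad", limit=1)
fixed = FixedRangeSketch(range(4)).get_indexer(target, method="pad", limit=1)
print(buggy)  # [0, 1, 2, 3, 3, 3]
print(fixed)  # [0, 1, 2, 3, 3, -1]
```

The two printed lists mirror the buggy and expected `get_indexer` results quoted earlier in the thread.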
