BUG: DataFrame.reindex not following limit - method parameter error #28631

Closed
french-home opened this issue Sep 26, 2019 · 10 comments · Fixed by #28671

@french-home commented Sep 26, 2019

Python version: 3.6.5
Version: pandas == 0.24.2

import pandas as pd

Data = [
    ["A", "A", "A"],
    ["B", "B", "B"],
    ["C", "C", "C"],
    ["D", "D", "D"],
]
test1 = pd.DataFrame(Data)
print(test1)
print("------------")
test1 = test1.reindex([0, 1, 2, 3, 4, 5], method="ffill", limit=1)
print(test1)

[Screenshot: reindex_bug]

The limit parameter stops the data from being filled forward any further.

Version: pandas==0.25.1

import pandas as pd

Data = [
    ["A", "A", "A"],
    ["B", "B", "B"],
    ["C", "C", "C"],
    ["D", "D", "D"],
]
test1 = pd.DataFrame(Data)
print(test1)
print("------------")
test1 = test1.reindex([0, 1, 2, 3, 4, 5], method="ffill", limit=1)
print(test1)

[Screenshot: reindex_bug_2]

The limit parameter does not stop the data from being filled forward any further.

@datapythonista (Member) commented Sep 26, 2019

Thanks @french-home for reporting this. It would be very useful if you could provide the output in both versions. Thanks!

@french-home (Author) commented Sep 26, 2019

@datapythonista (Member) commented Sep 26, 2019

Isn't it easier to simply paste it into a comment here? You can send it by email if that's a problem, but I think posting it here is easier.

@french-home (Author) commented Sep 26, 2019

@datapythonista (Member) commented Sep 26, 2019

Sure, please send it to ************* (edited) and I'll update the description for you.

@french-home (Author) commented Sep 26, 2019

@datapythonista (Member) commented Sep 26, 2019

Thanks a lot @french-home, it does look like a bug. I updated the description with your output and will look into what's wrong when I have time.

Thanks for reporting it. If you'd like to investigate the problem yourself and need help with that, please let me know; I'm happy to help.

@jorisvandenbossche changed the title from "BUG DataFream.rindex method parameter error" to "BUG: DataFream.reindex not following limit" Sep 26, 2019
@jorisvandenbossche (Member) commented Sep 26, 2019

To make it a bit more explicit: older versions correctly apply "limit":

In [8]: pd.__version__  
Out[8]: '0.24.2'

In [9]: test1.reindex([0, 1, 2, 3, 4, 5], method="ffill", limit=1)
Out[9]: 
     0    1    2
0    A    A    A
1    B    B    B
2    C    C    C
3    D    D    D
4    D    D    D
5  NaN  NaN  NaN

while 0.25 / master does not:

In [22]: pd.__version__   
Out[22]: '0.26.0.dev0+418.g1cfba0a87'

In [23]: test1.reindex([0, 1, 2, 3, 4, 5], method="ffill", limit=1)
Out[23]: 
   0  1  2
0  A  A  A
1  B  B  B
2  C  C  C
3  D  D  D
4  D  D  D
5  D  D  D

So this is a regression.
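A regression test for this could be sketched roughly as follows. It assumes a pandas release that contains the fix (0.25.2 or later, going by the milestone on this issue); the helper name reindex_with_limit is hypothetical. It rebuilds the frame from the report and checks both the DataFrame result and the RangeIndex.get_indexer call underneath it.

```python
import pandas as pd


def reindex_with_limit() -> pd.DataFrame:
    # Same 4x3 frame as in the report; it gets a default RangeIndex.
    data = [
        ["A", "A", "A"],
        ["B", "B", "B"],
        ["C", "C", "C"],
        ["D", "D", "D"],
    ]
    df = pd.DataFrame(data)
    return df.reindex([0, 1, 2, 3, 4, 5], method="ffill", limit=1)


result = reindex_with_limit()
# Label 4 is one fill past the last existing label (3): allowed by limit=1.
assert (result.loc[4] == "D").all()
# Label 5 would be a second consecutive fill: it must stay missing.
assert result.loc[5].isna().all()

# The same check one level down, on RangeIndex.get_indexer.
indexer = pd.RangeIndex(4).get_indexer([0, 1, 2, 3, 4, 5], method="pad", limit=1)
assert indexer.tolist() == [0, 1, 2, 3, 3, -1]
```

On an affected version (0.25.0 / 0.25.1) the second assertion is expected to fail, since row 5 is filled with "D".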

@jorisvandenbossche added this to the 0.25.2 milestone Sep 26, 2019
@datapythonista changed the title from "BUG: DataFream.reindex not following limit" to "BUG: DataFrame.reindex not following limit - method parameter error" Sep 26, 2019
@jorisvandenbossche (Member) commented Sep 26, 2019

So this boils down to RangeIndex.get_indexer no longer taking the limit keyword into account:

In [42]: idx = pd.Index(range(4))

In [43]: target = pd.Index([0, 1, 2, 3, 4, 5])

In [44]: idx.get_indexer(target, method='pad', limit=1)
Out[44]: array([0, 1, 2, 3, 3, 3])

while the last element in that result should be -1.
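Why the last entry should be -1: with method='pad', limit caps how many consecutive target labels may be filled from the same earlier label, and exact matches do not count toward it. The sketch below is a simplified pure-Python model of that rule, not pandas' Cython implementation; the helper pad_indexer is hypothetical, and it assumes both indexes are sorted ascending, which is the case where pandas defines limit for this method.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Sequence


def pad_indexer(old: Sequence[int], new: Sequence[int],
                limit: Optional[int] = None) -> List[int]:
    """Hypothetical model of a forward-fill ('pad') indexer with a limit.

    Assumes `old` and `new` are both sorted ascending.
    """
    indexer = [-1] * len(new)
    fills_used: Dict[int, int] = {}  # non-exact fills taken per old position
    for i, label in enumerate(new):
        pos = bisect_right(old, label) - 1  # last old label <= this label
        if pos < 0:
            continue  # nothing earlier to pad from
        if old[pos] == label:
            indexer[i] = pos  # exact match: does not use up the limit
            continue
        used = fills_used.get(pos, 0)
        if limit is None or used < limit:
            indexer[i] = pos  # forward-fill from the previous label
            fills_used[pos] = used + 1
    return indexer


old = [0, 1, 2, 3]
new = [0, 1, 2, 3, 4, 5]
print(pad_indexer(old, new, limit=1))  # [0, 1, 2, 3, 3, -1]
print(pad_indexer(old, new))           # [0, 1, 2, 3, 3, 3]
```

Dropping limit (the second call) gives exactly the array returned by the affected versions above.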

This seems to be caused by #27119 (cc @toobaz).

@toobaz (Member) commented Sep 26, 2019

> This seems to be caused by #27119 (cc @toobaz)

Definitely. I forgot to forward the limit argument here:

return super().get_indexer(target, method=method,
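The bug pattern described here, an overriding method that accepts limit but does not pass it on when it delegates to super(), can be reproduced with a few hypothetical classes. BaseIndex, RangeLike, BuggyRange and FixedRange are illustrations only, not pandas classes, and the base class supports only method=None and method='pad'.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Sequence


class BaseIndex:
    """Hypothetical generic index whose slow path honours method and limit."""

    def __init__(self, labels: Sequence[int]) -> None:
        self.labels = list(labels)  # assumed sorted ascending

    def get_indexer(self, target: Sequence[int], method: Optional[str] = None,
                    limit: Optional[int] = None) -> List[int]:
        result: List[int] = []
        fills: Dict[int, int] = {}  # non-exact fills taken from each position
        for label in target:
            pos = bisect_right(self.labels, label) - 1  # last label <= target
            if pos >= 0 and self.labels[pos] == label:
                result.append(pos)  # exact match never counts against limit
                continue
            used = fills.get(pos, 0)
            if method == "pad" and pos >= 0 and (limit is None or used < limit):
                fills[pos] = used + 1
                result.append(pos)  # forward-fill from the previous label
            else:
                result.append(-1)  # missing: no match and no fill allowed
        return result


class RangeLike(BaseIndex):
    """Labels 0..n-1, so an exact lookup needs no search (hypothetical)."""

    def __init__(self, n: int) -> None:
        super().__init__(range(n))

    def _exact(self, target: Sequence[int]) -> List[int]:
        n = len(self.labels)
        return [t if 0 <= t < n else -1 for t in target]


class BuggyRange(RangeLike):
    def get_indexer(self, target, method=None, limit=None):
        if method is None:
            return self._exact(target)  # fast path
        # Bug pattern: delegates to the parent but silently drops `limit`.
        return super().get_indexer(target, method=method)


class FixedRange(RangeLike):
    def get_indexer(self, target, method=None, limit=None):
        if method is None:
            return self._exact(target)  # fast path
        # Fix: forward every keyword the parent understands.
        return super().get_indexer(target, method=method, limit=limit)


target = [0, 1, 2, 3, 4, 5]
print(BuggyRange(4).get_indexer(target, method="pad", limit=1))  # [0, 1, 2, 3, 3, 3]
print(FixedRange(4).get_indexer(target, method="pad", limit=1))  # [0, 1, 2, 3, 3, -1]
```

Because the override's signature still accepts limit, nothing fails loudly; only a test that passes a non-default limit through the subclass catches the omission.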
