DataFrame.ffill behaves different than DataFrame.interpolate(method='ffill') along axes #12918

Closed
EVaisman opened this issue Apr 18, 2016 · 8 comments · Fixed by #33959
Labels
Bug good first issue Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate

@EVaisman

EVaisman commented Apr 18, 2016

It looks like test_df.ffill(axis=0) has the same behavior as test_df.interpolate(method='ffill', axis=1).

from pandas.util.testing import assert_frame_equal
import numpy as np
import pandas as pd

n = np.nan
test_df = pd.DataFrame([[0, 2, n, n],
                        [1, n, 4, 6],
                        [n, 3, 5, n]])

assert_frame_equal(
    test_df.interpolate(method='ffill', axis=1),
    test_df.ffill(axis=0),
)

Is this the desired behavior?

In [2]: pandas.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 15.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.0
nose: 1.3.7
pip: 8.1.1
setuptools: 20.3.1
Cython: 0.23.4
numpy: 1.11.0
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.1.2
sphinx: 1.3.5
patsy: 0.4.0
dateutil: 2.5.2
pytz: 2016.3
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5
matplotlib: 1.4.3
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.6.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext lo64)
jinja2: 2.7.3
boto: 2.39.0
@jorisvandenbossche
Member

I suppose you mean that you need axis=1 in one call and axis=0 in the other to obtain the same result? (Or is there something else you didn't expect?)
At first sight, that seems like a bug to me.

For other methods (e.g. the default 'linear'), the direction is as expected (axis=0 -> filling per column), but for method='ffill' it is swapped.

In [77]: test_df
Out[77]:
     0    1    2    3
0  0.0  2.0  NaN  NaN
1  1.0  NaN  4.0  6.0
2  NaN  3.0  5.0  NaN

In [78]: test_df.interpolate()
Out[78]:
     0    1    2    3
0  0.0  2.0  NaN  NaN
1  1.0  2.5  4.0  6.0
2  1.0  3.0  5.0  6.0

In [81]: test_df.interpolate(axis=0, method='linear')
Out[81]:
     0    1    2    3
0  0.0  2.0  NaN  NaN
1  1.0  2.5  4.0  6.0
2  1.0  3.0  5.0  6.0

In [82]: test_df.interpolate(axis=0, method='ffill')
Out[82]:
     0    1    2    3
0  0.0  2.0  2.0  2.0
1  1.0  1.0  4.0  6.0
2  NaN  3.0  5.0  5.0

Note also that this behaviour has been there since the beginning (tested with 0.13). The ability to specify fill methods in interpolate is not documented, though, which may be why there haven't been any bug reports.

@TomAugspurger Is there a reason for this behaviour, or is it just a bug?
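
For reference, here is a small sketch of the axis convention that ffill itself follows, using the same frame as above. It only calls DataFrame.ffill, so it does not exercise the interpolate code path; the variable names are just for illustration.

```python
import numpy as np
import pandas as pd

n = np.nan
test_df = pd.DataFrame([[0, 2, n, n],
                        [1, n, 4, 6],
                        [n, 3, 5, n]])

# axis=0: each column is filled downwards, from the row above.
down_columns = test_df.ffill(axis=0)

# axis=1: each row is filled rightwards, from the column to the left.
across_rows = test_df.ffill(axis=1)

print(down_columns)
print(across_rows)
```

The axis=1 result is the same frame as Out[82] above, which is what interpolate(axis=0, method='ffill') returned.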

@jorisvandenbossche jorisvandenbossche added the Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate label Apr 18, 2016
@EVaisman
Author

EVaisman commented Apr 18, 2016

You are correct, Joris, that's what I was referring to.

[edit: remove email history]

@TomAugspurger
Contributor

interpolate shares a code path with fillna here: we first try one of the fillna methods, and if the method isn't a valid fill method (e.g. linear), we try an interpolate method.

Might need to flip the axis argument here.
@EVaisman interested in submitting a fix and some tests?
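
To make the idea concrete, here is a rough sketch of that kind of dispatch with the axis passed through unchanged on both branches. This is not the pandas internals: fill_or_interpolate is a hypothetical helper written only for illustration, and it relies on the public ffill, bfill and interpolate methods.

```python
import numpy as np
import pandas as pd


def fill_or_interpolate(df, method="linear", axis=0):
    """Hypothetical dispatcher: fill methods first, interpolation otherwise."""
    if method in ("ffill", "pad"):
        # Forward fill along the requested axis (0 = down each column).
        return df.ffill(axis=axis)
    if method in ("bfill", "backfill"):
        # Backward fill along the requested axis.
        return df.bfill(axis=axis)
    # Anything else is treated as an interpolation method.
    return df.interpolate(method=method, axis=axis)


n = np.nan
test_df = pd.DataFrame([[0, 2, n, n],
                        [1, n, 4, 6],
                        [n, 3, 5, n]])

filled = fill_or_interpolate(test_df, method="ffill", axis=0)
linear = fill_or_interpolate(test_df, method="linear", axis=0)
```

With this shape, method='ffill' and method='linear' both work per column for axis=0, so the two branches agree on what the axis means.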

@EVaisman
Author

EVaisman commented Apr 18, 2016

Yes! But will probably have to wait until this weekend.

@jreback
Contributor

jreback commented Apr 18, 2016

hmm, we should not be accepting the fill methods directly in .interpolate I don't think. (or if we do, then they should be tested / listed). So let's make this an error.

@jreback jreback added the Error Reporting Incorrect or improved errors from pandas label Apr 18, 2016
@jorisvandenbossche jorisvandenbossche modified the milestones: 0.20.0, 0.19.0 Aug 29, 2016
@jreback jreback modified the milestones: 0.20.0, Next Major Release Mar 23, 2017
@JoshuaC3

JoshuaC3 commented Sep 7, 2018

@jreback It would be nice to see ffill as a supported interpolation method. The ffill() method does not accept the limit_area kwarg with its 'inside' and 'outside' options, whereas interpolate does.

Alternatively, this kwarg could be added to the .fillna, ffill and bfill methods to achieve the same result.

Which would be more desirable? Thanks.
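
In the meantime, the 'inside' behaviour can be approximated with plain ffill and bfill plus a mask. A minimal sketch for a Series follows; ffill_inside is a made-up helper name, not a pandas function.

```python
import numpy as np
import pandas as pd


def ffill_inside(s: pd.Series) -> pd.Series:
    """Forward-fill only the NaNs that have a valid value on both sides."""
    filled = s.ffill()
    # A position is "inside" when a valid value exists at or before it
    # (ffill is not NaN) and at or after it (bfill is not NaN).
    inside = filled.notna() & s.bfill().notna()
    # Keep the filled values inside; leading and trailing gaps stay NaN.
    return filled.where(inside)


n = np.nan
s = pd.Series([n, 1.0, n, n, 4.0, n])
result = ffill_inside(s)
print(result)
```

Here the two NaNs between 1.0 and 4.0 are filled with 1.0, while the leading and trailing NaNs are left alone.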

@jreback
Contributor

jreback commented Sep 7, 2018

I think it's OK to accept them here;
ffill is a kind of interpolation

(but leaving the existing API, as it's pretty functional as it is)

@jorisvandenbossche jorisvandenbossche added Bug and removed Error Reporting Incorrect or improved errors from pandas labels Sep 7, 2018
@IgorFobia

I guess I am still seeing the same bug with pandas version 0.24.2.

import numpy as np
import pandas as pd
tdf = pd.DataFrame({'a': [0, 1, np.nan], 'b': [5, np.nan, np.nan]})

This gives a dataframe tdf like this one:

     a    b
0  0.0  5.0
1  1.0  NaN
2  NaN  NaN

fillna returns what we expect:

tdf.fillna(method='pad')

     a    b
0  0.0  5.0
1  1.0  5.0
2  1.0  5.0

while

tdf.interpolate(method='pad')

applies the padding per row:

     a    b
0  0.0  5.0
1  1.0  1.0
2  NaN  NaN


7 participants