Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REGR: Fix interpolation on empty dataframe #35543

Merged
merged 8 commits into from
Aug 17, 2020
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.1.1.rst
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@ Fixed regressions

- Fixed regression where :func:`read_csv` would raise a ``ValueError`` when ``pandas.options.mode.use_inf_as_na`` was set to ``True`` (:issue:`35493`).
- Fixed regression in :class:`pandas.core.groupby.RollingGroupby` where column selection was ignored (:issue:`35486`)
- Fixed regression where :func:`interpolate` would raise a ``TypeError`` when the dataframe was empty (:issue:`35598`).
sanderland marked this conversation as resolved.
Show resolved Hide resolved

.. ---------------------------------------------------------------------------

Expand Down
3 changes: 3 additions & 0 deletions pandas/core/generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -6807,6 +6807,9 @@ def interpolate(

obj = self.T if should_transpose else self

if obj.empty:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would prefer that this happens in the internal method itself (called on L6861)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error happens before this call (on line 6825)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were a few changes to interpolate in 1.1.0. will run git bisect to ascertain why this is now failing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a subtle change in #34752 where

if self.ndim == 2 and np.all(self.dtypes == np.dtype(object)):

was changed to

if obj.ndim == 2 and np.all(obj.dtypes == np.dtype(object)):

does reverting this change restore the old behaviour.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the regression is from #33084, but since the error message is about DataFrame columns, I don't think the above should have been changed either. @jbrockmendel

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would prefer that this happens in the internal method itself (called on L6861)

As @sanderland notes, the check for all object dtype indeed happens here before calling into the internal method, so the check for empty thus also needs to happen here

However, @sanderland I think the df.empty check is not fully correct. Because this was also give True if you have columns but no rows. And for example an empty dataframe with a datetime64[ns] column will give a timedelta64[ns] column as result. That's something we should keep.

So I think we should explicitly check for no columns / no rows (alternatively could also add the check to the offending if obj.ndim == 2 and np.all(obj.dtypes == np.dtype(object)):, eg .. and obj.shape[0])

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sanderland can you try @jorisvandenbossche suggestion here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, @sanderland I think the df.empty check is not fully correct. Because this was also give True if you have columns but no rows. And for example an empty dataframe with a datetime64[ns] column will give a timedelta64[ns] column as result. That's something we should keep.

I tried this in 1.0.x and it does not return a timedelta dtype, I dont see why interpolate would do this either. Could you suggest a test which fails with my approach but passes in 1.0.x?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, you're fully correct. I might have been mixing up the review of two PRs, as I was also reviewing a regression for diff() and maybe was testing that method here as well .. ;) (since df.diff() for datetime64 gives timedelta64, also for empty dataframe)

Forget my comment!

return self

if method not in fillna_methods:
axis = self._info_axis_number

Expand Down
6 changes: 6 additions & 0 deletions pandas/tests/frame/methods/test_interpolate.py
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,12 @@ def test_interp_basic(self):
expected.loc[5, "B"] = 9
tm.assert_frame_equal(result, expected)

def test_interp_empty(self):
df = DataFrame()
sanderland marked this conversation as resolved.
Show resolved Hide resolved
result = df.interpolate()
expected = df
tm.assert_frame_equal(result, expected)

def test_interp_bad_method(self):
df = DataFrame(
{
Expand Down