[BUG] Fix interpolation for datetimelike dtypes #21915

Closed
wants to merge 4 commits into from

Conversation

Member

@jbrockmendel jbrockmendel commented Jul 14, 2018

Two bugs here, one fixed, one avoided. First is in Block._interpolate where datetimelike values were not cast correctly. Second is that DataFrame.transpose will raise in some conditions (see #19198, also motivated #21908).

This adds dtype-handling code in Block._interpolate. An alternative would be to override _interpolate in subclasses. Either way works for me.

NB: This is only implemented for method='linear'. I made that explicit in the tests as a reminder to follow up on the other methods.

(#19199 is marked as a duplicate, but it includes a bug report for the TZ-aware case that is a separate bug.)
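
For readers following along, the cast described above can be sketched outside pandas with plain NumPy. This is only an illustration under stated assumptions, not the code in this PR: `interpolate_datetimes_linear` is a hypothetical helper, it handles tz-naive `datetime64[ns]` data only, and it leaves leading and trailing NaT untouched rather than reproducing pandas' fill-direction rules.

```python
import numpy as np


def interpolate_datetimes_linear(values):
    """Linearly fill interior NaT gaps in a 1-D datetime64[ns] array.

    Hypothetical helper for illustration; not pandas' Block._interpolate.
    """
    values = np.asarray(values, dtype="datetime64[ns]")
    result = values.copy()
    mask = np.isnat(values)
    if mask.all() or not mask.any():
        return result  # nothing to anchor on, or nothing to fill

    positions = np.arange(len(values), dtype="float64")
    valid_pos = positions[~mask]

    # View timestamps as int64 nanoseconds so ordinary arithmetic applies.
    i8 = values.view("i8")
    # Work with offsets from the earliest timestamp; float64 stays exact
    # for spans under roughly 100 days (2**53 nanoseconds).
    base = i8[~mask].min()
    offsets = (i8[~mask] - base).astype("float64")

    # Only fill gaps that have a valid value on both sides.
    interior = mask & (positions > valid_pos[0]) & (positions < valid_pos[-1])
    filled = np.interp(positions[interior], valid_pos, offsets)

    # Cast the interpolated integers back to the datetime dtype.
    result[interior] = (np.round(filled).astype("i8") + base).view("datetime64[ns]")
    return result
```

The key step is the round trip through int64: interpolating on the raw datetime values is what goes wrong, while interpolating on nanosecond offsets and casting back keeps the dtype intact.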

@@ -684,6 +684,8 @@ def transpose(self, *args, **kwargs):
        new_axes = self._construct_axes_dict_from(self, [self._get_axis(x)
                                                         for x in axes_names])
        new_values = self.values.transpose(axes_numbers)
Contributor

Shouldn't you remove the previous line?

Member Author

Yah, thought I had removed this no-longer-needed bit, left a line behind. Will fix.


# only deal with floats
if not self.is_float:
Contributor

I prefer to import these directly rather than go through ct.

Member Author

Darn, I was hoping that would catch on (I don't like the giant namespaces). Will change.

        if not self.is_float:
            if ct.needs_i8_conversion(self.dtype):
                if ct.is_period_dtype(self.dtype):
                    raise NotImplementedError("PeriodDtype columns/Series don't "
Contributor

Let's move this code to subclasses as appropriate; this gets messy really fast otherwise.
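
The subclass approach suggested here can be sketched with toy classes. Everything below is hypothetical and stdlib-only: `Block`, `FloatBlock`, `DatetimeBlock`, `PeriodBlock`, and `_linear_fill` are stand-ins, not pandas internals, and missing values are modeled as `None`. The point is only that each dtype-specific cast lives in its own override instead of in branches inside one shared method.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
ONE_US = timedelta(microseconds=1)


def _linear_fill(values):
    """Fill interior None gaps in a list of numbers by linear interpolation."""
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (out[right] - out[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = out[left] + step * (i - left)
    return out


class Block:
    """Hypothetical base class: holds values, knows nothing about dtypes."""

    def __init__(self, values):
        self.values = list(values)

    def interpolate(self):
        raise NotImplementedError(f"{type(self).__name__} cannot interpolate")


class FloatBlock(Block):
    def interpolate(self):
        return FloatBlock(_linear_fill(self.values))


class DatetimeBlock(Block):
    def interpolate(self):
        # Cast to integer microseconds, interpolate, then cast back.
        as_int = [None if v is None else (v - EPOCH) // ONE_US
                  for v in self.values]
        filled = _linear_fill(as_int)
        return DatetimeBlock([None if v is None else EPOCH + round(v) * ONE_US
                              for v in filled])


class PeriodBlock(Block):
    """Inherits the base behavior, so interpolation raises NotImplementedError."""
```

With this layout the unsupported period case needs no explicit check; it simply never overrides the base method.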

@@ -1236,6 +1250,20 @@ def func(x):

# interp each column independently
interp_values = np.apply_along_axis(func, axis, data)
if ct.needs_i8_conversion(self.dtype):
Contributor

same

dti = pd.date_range('2016-01-01', periods=10, tz=tz)
index = dti if use_idx else None

# Copy to avoid corrupting dti, see GH#21907
Contributor

Can you make this more informative? This is very cryptic.

expected.iloc[0] = pd.NaT
expected.iloc[-1] = expected.iloc[-2]

df = ser.to_frame()
Contributor

Why are you converting to frames? Can you just start with frames?

@@ -1317,3 +1317,38 @@ def test_series_interpolate_intraday(self):
result = ts.reindex(new_index).interpolate(method='time')

tm.assert_numpy_array_equal(result.values, exp.values)

# TODO: De-duplicate with similar tests in test.frame.test_missing?
Contributor

This is much more complicated than it needs to be; we have generic tests that already do a lot of this.

You can also move these tests there.

Member Author

I'll take a look for a better place.

This is also the answer to the previous question about why the test doesn't start with a DataFrame: the test originally covered Series and DataFrame together.

Member Author

this is much more complicated we have generic tests that already do a lot of this

Both for this PR and the upcoming slew of Datetime/Timedelta/PeriodArray tests, there are a bunch of tests for FooArray/FooIndex/Series[foo]/DataFrame[foo-column] that are highly duplicative, and I think things are falling through the cracks. Do you think we can find a place to put them so they can be parametrized?
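
As a rough sketch of what such parametrization could look like, one test body can run against every container type. The test name and the choice of `pd.isna` as the behavior under test are assumptions made for illustration; nothing here is taken from the pandas test suite.

```python
import numpy as np
import pandas as pd
import pytest


@pytest.mark.parametrize("tz", [None, "UTC"])
@pytest.mark.parametrize("box", [pd.Index, pd.Series, pd.DataFrame])
def test_isna_flags_nat(box, tz):
    # Build the same datetime data once, then wrap it in each container.
    dti = pd.date_range("2016-01-01", periods=3, tz=tz)
    obj = box(dti.insert(1, pd.NaT))

    # Flatten so the assertion is identical for 1-D and 2-D containers.
    result = np.asarray(pd.isna(obj)).ravel()
    assert result.tolist() == [False, True, False, False]
```

Stacking the two `parametrize` decorators yields six cases from one body, which is the de-duplication being asked about.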

@@ -360,7 +360,10 @@ def test_fillna_categorical_nan(self):
cat = Categorical([np.nan, 2, np.nan])
val = Categorical([np.nan, np.nan, np.nan])
df = DataFrame({"cats": cat, "vals": val})
res = df.fillna(df.median())
with tm.assert_produces_warning(RuntimeWarning):
Member Author

Unrelated, but nice to catch.

@jbrockmendel
Member Author

I'll hold off on pushing this until #21870 (and possibly #21903) go through to avoid rebasing hell

@jschendel added the Enhancement, Datetime, and Missing-data labels on Jul 15, 2018
@jbrockmendel
Member Author

Closing temporarily to clear the queue.

@jbrockmendel jbrockmendel deleted the interp2 branch April 5, 2020 17:42
@nielsuit227
Looks like this was lost somewhere in the process. Pity, would be great to see this somewhen :)

@jbrockmendel
Member Author

@nielsuit227 you're welcome to pick this up and run with it

Labels
Datetime, Enhancement, Missing-data
Development

Successfully merging this pull request may close these issues.

ENH: interpolation of datetime values (NaT)
4 participants