[BUG] Fix interpolation for datetimelike dtypes #21915
pandas/core/generic.py
@@ -684,6 +684,8 @@ def transpose(self, *args, **kwargs):
new_axes = self._construct_axes_dict_from(self, [self._get_axis(x)
for x in axes_names])
new_values = self.values.transpose(axes_numbers)
shouldn’t u remove the previous line?
Yah, thought I had removed this no-longer-needed bit, left a line behind. Will fix.

# only deal with floats
if not self.is_float:
i prefer to import these rather than use ct
Darn, I was hoping that would catch on (I don't like the giant namespaces). Will change.
if not self.is_float:
if ct.needs_i8_conversion(self.dtype):
if ct.is_period_dtype(self.dtype):
raise NotImplementedError("PeriodDtype columns/Series don't "
let’s move code to subclasses as appropriate
this gets messy really fast otherwise
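To make the suggestion concrete, here is a rough sketch of what per-subclass interpolation could look like. It is not pandas' actual block code: `Block`, `FloatBlock`, `DatetimeLikeBlock`, and `linear_fill` are toy stand-ins invented for illustration, and plain `datetime` objects with `None` gaps stand in for datetime64 values and `NaT`.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
ONE_US = timedelta(microseconds=1)


def linear_fill(values):
    """Fill None gaps that sit between two known numbers (hypothetical helper)."""
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (out[right] - out[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = out[left] + step * (i - left)
    return out


class Block:
    """Toy base class (an assumption, not pandas' Block)."""

    def __init__(self, values):
        self.values = list(values)

    def interpolate(self):
        # Base behaviour: values that cannot be interpolated pass through.
        return type(self)(self.values)


class FloatBlock(Block):
    def interpolate(self):
        return type(self)(linear_fill(self.values))


class DatetimeLikeBlock(Block):
    def interpolate(self):
        # The dtype-specific casting lives here instead of in the base class:
        # cast to integer microseconds, interpolate, then cast back.
        ints = [None if v is None else (v - EPOCH) // ONE_US
                for v in self.values]
        filled = linear_fill(ints)
        return type(self)([None if v is None else EPOCH + round(v) * ONE_US
                           for v in filled])
```

With this layout the base method carries no dtype checks at all; each subclass owns only the conversion it needs, which is the point of the suggestion above.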
@@ -1236,6 +1250,20 @@ def func(x):

# interp each column independently
interp_values = np.apply_along_axis(func, axis, data)
if ct.needs_i8_conversion(self.dtype):
same
dti = pd.date_range('2016-01-01', periods=10, tz=tz)
index = dti if use_idx else None

# Copy to avoid corrupting dti, see GH#21907
can u make this more informative; this is very cryptic
expected.iloc[0] = pd.NaT
expected.iloc[-1] = expected.iloc[-2]

df = ser.to_frame()
why are you converting to frames? can u just start with frames?
@@ -1317,3 +1317,38 @@ def test_series_interpolate_intraday(self):
result = ts.reindex(new_index).interpolate(method='time')

tm.assert_numpy_array_equal(result.values, exp.values)

# TODO: De-duplicate with similar tests in test.frame.test_missing?
this is much more complicated we have generic tests that already do a lot of this
you can also move these tests there
I'll take a look for a better place.
This is also the answer to the previous question about why not starting with DataFrame: bc the test originally tested Series and DataFrame in the same test.
> this is much more complicated we have generic tests that already do a lot of this
Both for this PR and the upcoming slew of Datetime/Timedelta/PeriodArray tests, there are a bunch of tests for FooArray/FooIndex/Series[foo]/DataFrame[foo-column] that are highly duplicative, and I think things are falling through the cracks. Think we can find a place to put them so they can be parametrized?
@@ -360,7 +360,10 @@ def test_fillna_categorical_nan(self):
cat = Categorical([np.nan, 2, np.nan])
val = Categorical([np.nan, np.nan, np.nan])
df = DataFrame({"cats": cat, "vals": val})
res = df.fillna(df.median())
with tm.assert_produces_warning(RuntimeWarning):
Unrelated, but nice to catch.
Closing temporarily to clear the queue.
Looks like this was lost somewhere in the process. Pity, would be great to see this somewhen :)
@nielsuit227 you're welcome to pick this up and run with it
git diff upstream/master -u -- "*.py" | flake8 --diff
Two bugs here, one fixed, one avoided. First is in `Block._interpolate`, where datetimelike values were not cast correctly. Second is that `DataFrame.transpose` will raise in some conditions (see #19198, also motivated #21908).

This adds dtype-handling code in `Block._interpolate`. An alternative would be to override `_interpolate` in subclasses. Either way works for me.

NB: This is only implemented for `method='linear'`. I made that explicit in the tests as a reminder to follow up with others.

(#19199 is marked as a duplicate, but it includes a bug report for the TZ-aware case that is a separate bug.)
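
For readers who want the gist of the casting fix without reading the diff, the cast-to-int64, interpolate, cast-back idea can be sketched with NumPy alone. This is an illustration under stated assumptions, not the code in this PR: `interpolate_datetimes_linear` is a hypothetical helper, it handles only naive 1-D `datetime64[ns]` data (no TZ-aware case), and it mirrors the behaviour the tests above expect, where a leading `NaT` stays put and a trailing one takes the last valid value.

```python
import numpy as np


def interpolate_datetimes_linear(values):
    """Linearly fill NaT gaps in a 1-D datetime64[ns] array (hypothetical helper)."""
    values = np.asarray(values, dtype="datetime64[ns]")
    mask = np.isnat(values)
    valid = np.flatnonzero(~mask)
    if valid.size == 0 or not mask.any():
        return values.copy()

    # View the timestamps as int64 nanoseconds so they behave like numbers.
    as_i8 = values.view("i8")
    # Work relative to the smallest valid timestamp: raw epoch nanoseconds
    # are too large to be represented exactly in float64.
    base = as_i8[valid].min()
    positions = np.arange(len(values))
    # Leave leading NaT alone; fill only gaps after the first valid value.
    fill = mask & (positions > valid[0])

    filled = np.interp(positions[fill], valid,
                       (as_i8[valid] - base).astype("f8"))
    result = as_i8.copy()
    result[fill] = base + np.round(filled).astype("i8")
    # Cast back so callers get datetimes rather than raw integers.
    return result.view("datetime64[ns]")
```

Calling it on `np.array(["NaT", "2016-01-02", "NaT", "2016-01-04", "NaT"], dtype="datetime64[ns]")` leaves the first element as `NaT`, fills the middle gap with 2016-01-03, and repeats 2016-01-04 at the end. The input array is not modified.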