Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix docstrings, error messages #19672

Closed
wants to merge 7 commits into from

Conversation

jbrockmendel
Copy link
Member

The discussion in #19558 makes me think getting PeriodArray and DatetimeArray implemented sooner rather than later will make things easier for other goals. In preparation for that, this PR goes through (most) of the methods that will need to be ported and adapts their error messages to be class-agnostic, and to use str.format syntax.

Also fixes a couple of copy/paste typos in docstrings.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a few comments, but in general, I would defer this to when we actually changed things instead of now already trying to be "prepared" or "robust" for such changes without knowing how it will exactly be done in practice.

I would rather put effort in actually finding out how we can such a refactor to dispatch ops to Array-likes, backing both ops for Series/Block and Index, but that is something you have to sync closely with @TomAugspurger to make sure you don't conflict with what he is doing (and I think help is certainly welcome! we just need to make sure we are not doing duplicate work)

@@ -794,7 +796,7 @@ def isin(self, values):

def shift(self, n, freq=None):
"""
Specialized shift which produces a DatetimeIndex
Specialized shift which produces a new object of the same type as self
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is a useful description for users (they don't necessarily know what 'self' is)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good note. But the reason for this change isn't future-looking; this is also the docstring for TimedeltaIndex.shift.

.format(typ=type(other)))
raise TypeError("cannot add {cls} and {typ}"
.format(cls=type(self).__name__,
typ=type(other).__name__))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure this is an improvement in robustness, as if series would use the same code, this 'cls' would become either "Series" or "TimedeltaArray", and "Series" is not that informative, and "TimedeltaArray" should not bubble up to user when using a Series (that would be an internal implementation detail).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above is always a TimedeltaIndex given the elif statement above

@@ -804,7 +806,7 @@ def shift(self, n, freq=None):

Returns
-------
shifted : DatetimeIndex
shifted : type(self)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here


if isinstance(other, (datetime, compat.string_types)):
if isinstance(other, datetime):
# GH#18435 strings get a pass from tzawareness compat
self._assert_tzawareness_compat(other)

other = _to_m8(other, tz=self.tz)
result = func(other)
result = binop(self, other)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because DatetimeIndex.__mro__ has 12 entries, so with the standard super usage it isn't obvious where the method is defined.

@TomAugspurger
Copy link
Contributor

Haven't looked closely yet, but I haven't touched datetimeindex yet.

Including the correct class name in the error message has been an issue, so those changes should be worthwhile.

Should be able to look closer in a few days.

@jbrockmendel
Copy link
Member Author

Just pushed a fix for the shift docstring that re-evaluates it within TimedeltaIndex and DatetimeIndex. I'm wondering if there is a metaclass (or hopefully a lighter-weight option) way to do this without having to re-define the function in each subclass. Could that be what _decorators.docstring_wrapper (which is not used) is for?

@codecov
Copy link

codecov bot commented Feb 13, 2018

Codecov Report

Merging #19672 into master will increase coverage by <.01%.
The diff coverage is 93.1%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #19672      +/-   ##
==========================================
+ Coverage   91.58%   91.58%   +<.01%     
==========================================
  Files         150      150              
  Lines       48867    48875       +8     
==========================================
+ Hits        44755    44763       +8     
  Misses       4112     4112
Flag Coverage Δ
#multiple 89.96% <93.1%> (ø) ⬆️
#single 41.75% <41.37%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/indexes/datetimelike.py 97.05% <100%> (ø) ⬆️
pandas/core/indexes/datetimes.py 95.33% <90.9%> (+0.01%) ⬆️
pandas/core/indexes/timedeltas.py 90.63% <93.33%> (+0.07%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2fdf1e2...a39b22a. Read the comment docs.

@gfyoung gfyoung added Internals Related to non-user accessible pandas implementation Docs labels Feb 14, 2018
@jbrockmendel jbrockmendel changed the title DatetimeArray/PeriodArray prelims Fix docstrings, error messages Feb 14, 2018
@jbrockmendel
Copy link
Member Author

Thoughts on this? There are a number of broken cases to fix in and around these methods; ideally we'd close this out first.

.format(typ=type(other)))
raise TypeError("cannot add {cls} and {typ}"
.format(cls=type(self).__name__,
typ=type(other).__name__))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above is always a TimedeltaIndex given the elif statement above

.format(typ=type(other).__name__))
raise TypeError("cannot subtract {cls} and {typ}"
.format(cls=type(self).__name__,
typ=type(other).__name__))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here


def _sub_datelike(self, other):
# subtract a datetime from myself, yielding a TimedeltaIndex
from pandas import TimedeltaIndex
if isinstance(other, DatetimeIndex):
# require tz compat
if not self._has_same_tz(other):
raise TypeError("DatetimeIndex subtraction must have the same "
"timezones or no timezones")
raise TypeError("{cls} subtraction must have the same "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, this will always be subtraction of DatetimeIndexes ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is now, but one of the on-deck PRs will have ndarray[datetime64] go down this path too to fix:

dti = pd.date_range('2016-01-01', periods=3)
arr = np.array(['2015-01-01', '2015-02-02', '2015-03-03'], dtype='datetime64[ns]')
>>> dti - arr
DatetimeIndex(['1971-01-01', '1970-12-01', '1970-11-03'], dtype='datetime64[ns]', freq=None)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about instead of using type(self).__name__ we make a little function which can format these. I agree with @jorisvandenbossche point here, in the case of an Index type the dtype is clear, but not for a Series.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems worthwhile but a pretty low priority. Closing for now.

.format(typ=type(other).__name__))
raise TypeError("cannot subtract {cls} and {typ}"
.format(cls=type(self).__name__,
typ=type(other).__name__))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here?

@@ -59,23 +59,25 @@ def _td_index_cmp(opname, cls, nat_result=False):
"""

def wrapper(self, other):
msg = "cannot compare a TimedeltaIndex with type {0}"
func = getattr(super(TimedeltaIndex, self), opname)
msg = "cannot compare a {cls} with type {typ}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here? self is always a TimedeltaIndex

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TDI/DTI/PI have very similar implementations for these methods, some of which can be de-duplicated if we make these class-agnostic. More to the point, we want these messages to remain accurate if/when they get moved to the array level. Also in principle this could be subclassed and we'd like the name to come out right there too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add some tests which lock down these error messages? IOW the types that are supposed to appear

@@ -794,7 +796,7 @@ def isin(self, values):

def shift(self, n, freq=None):
"""
Specialized shift which produces a DatetimeIndex
Specialized shift which produces a %(klass)s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of doing the overriding in the subclass just to fix this name, we can maybe reword the docstring to be clear enough?
Eg mention it works for Datetime- and TimedeltaIndex?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was the idea in the earlier version. I'm open to suggestions.

@jbrockmendel jbrockmendel mentioned this pull request Feb 16, 2018
4 tasks

def _sub_datelike(self, other):
# subtract a datetime from myself, yielding a TimedeltaIndex
from pandas import TimedeltaIndex
if isinstance(other, DatetimeIndex):
# require tz compat
if not self._has_same_tz(other):
raise TypeError("DatetimeIndex subtraction must have the same "
"timezones or no timezones")
raise TypeError("{cls} subtraction must have the same "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about instead of using type(self).__name__ we make a little function which can format these. I agree with @jorisvandenbossche point here, in the case of an Index type the dtype is clear, but not for a Series.

@@ -901,6 +903,11 @@ def _sub_datelike_dti(self, other):
new_values[mask] = libts.iNaT
return new_values.view('i8')

@Substitution(klass='DatetimeIndex')
@Appender(DatetimeIndexOpsMixin.shift.__doc__)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be a shared doc-string instead (and same for the other things you added). this is the existing pattern.

@@ -59,23 +59,25 @@ def _td_index_cmp(opname, cls, nat_result=False):
"""

def wrapper(self, other):
msg = "cannot compare a TimedeltaIndex with type {0}"
func = getattr(super(TimedeltaIndex, self), opname)
msg = "cannot compare a {cls} with type {typ}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add some tests which lock down these error messages? IOW the types that are supposed to appear

@@ -409,7 +412,7 @@ def _add_datelike(self, other):
result = checked_add_with_arr(i8, other.value,
arr_mask=self._isnan)
result = self._maybe_mask_results(result, fill_value=iNaT)
return DatetimeIndex(result, name=self.name, copy=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what changed here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It just got indented (next line in the diff) b/c this branch is never reached except for in the indented case.

@jreback jreback added the Error Reporting Incorrect or improved errors from pandas label Feb 18, 2018
@jbrockmendel jbrockmendel deleted the array_prelims branch June 22, 2018 03:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Docs Error Reporting Incorrect or improved errors from pandas Internals Related to non-user accessible pandas implementation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants