Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve sub-second datetime plotting and documentation #10076

Merged
merged 20 commits into from
Jan 19, 2018

Conversation

nmartensen
Copy link
Contributor

@nmartensen nmartensen commented Dec 22, 2017

PR Summary

Users trying to plot extremely short (sub-millisecond) time intervals using datetime will run into issues, as shown in #10073. I'd like to make the limitations more explicit in the documentation, and investigate if the situation can be improved without making things much more complicated.

Feedback is welcome.

Main changes right now are:

  • the addition of a note to the MicrosecondLocator docstring, and
  • a small change to microsecond rounding.

The rounding change improves the situation when using the early-year workaround and should not have any effect anywhere else.

The remaining issue (the reason for the warning at the end of the note) is the 10-millisecond "snapping distance" around full seconds (dates.py lines 292 to 296). This is needed in order to produce correct tick labels for plots of longer time intervals, therefore it cannot simply be dropped.

Perhaps an optional microsecond_precision parameter (default: 20) could be added to _from_ordinalf() and num2date(), with microseconds then being rounded to nearest integer multiple of this? The MicrosecondLocator would be the only one to use it, and can set it to 1. Would something like this be acceptable? Then the warning in the note could be dropped.

PR Checklist

  • Has Pytest style unit tests
  • Code is PEP 8 compliant
  • New features are documented, with examples if plot related
  • Documentation is sphinx and numpydoc compliant
  • Added an entry to doc/users/next_whats_new/ if major new feature (follow instructions in README.rst there)
  • Documented in doc/api/api_changes.rst if API changed in a backward-incompatible way

@nmartensen
Copy link
Contributor Author

updated with new commits:

  • replace the original full-second snapping with musec-interval rounding
  • DateFormatter: use full musec precision tick labeling when using '%f' in strftime format string, but otherwise always round
  • add a warning when detecting years where the precision might not be sufficient.

@nmartensen
Copy link
Contributor Author

With these commits, output with year <= 20 now looks exactly like the "Expected outcome" in #10073. Higher years trigger a warning. The precision issues with modern years are clearly visible in the tick labels and are not hidden in any way.

Whenever the numeric input is small enough to provide actual microsecond
precision, we don't want to always round down, but instead to the
nearest microsecond.

The difference becomes visible when the AutoDateLocator picks the
MicrosecondLocator, and the format string for the tick labels is
configured to show microseconds. Support for both was added in
Matplotlib 2.0.
_to_ordinalf is used by num2date to actually convert floating
point numbers to datetime objects. The main purpose of this conversion
is to prepare the datetimes from which tick labels are generated. Since
datetimes have microsecond resolution, but the input (float) does not
have microsecond precision for any modern date, a 10-microsecond
"snapping distance" around full seconds was used to obtain timestamps
suitable for tick labels with per-second and lower resolution.

However, when higher resolution is needed (which is now possible after
the introduction of Microsecond support in AutoDateLocator and
-Formatter in Matplotlib 2.0), unconditional full-second distance
snapping is not appropriate: It is unable to perform rounding to
milliseconds, and in addition may lead to multiple identical tick labels
at highest plotting time resolution.

The solution implemented here is to introduce a microsecond precision
parameter (`musec_prec`) to the appropriate functions and function
calls, and perform rounding to the nearest multiple of the given number
of microseconds. The default value of 20 offers the same effect as the
previous "snapping distance" around full seconds. In addition, it can
also be turned off (when set to 1) or be used to perform rounding to
full milliseconds.
DateLocator: viewlim_to_dt and datalim_to_dt should always get the most
accurate datetimes we can offer, so turn off the newly introduced
interval rounding for them.

AutoDateFormatter: We need to decide whether the tick labels should be
rounded when formatting them or not.  Rounding is most likely always
desired, except when exact microseconds are needed.

Inclusion of microsecond resolution in the strftime format string is
used as an indication that the user really wants to see microseconds and
they are relevant, so simply do not round them in that case.
The proposed workaround now works correctly, after the musec precision
handling has been added. Therefore the warning is no longer needed.
The warning threshold of is based on the assumption that the user really
wants full microsecond resolution in the plot, and expects the plot
frame to come out nicely (no jagged corners). Besides being able to
identify accurate tick labels/positions, it requires a small "accuracy
margin" for the drawing operations.

If we assume a desired accuracy of 50ns in the underlying numerical time
data, we need to go back to at least the year 27. It cannot hurt to
start warning about the issue slightly earlier, hence the treshold is
set to 20.
@nmartensen
Copy link
Contributor Author

nmartensen commented Dec 27, 2017

more updates:

  • commit messages provide more background information
  • code changes are PEP 8 compliant
  • updated test results (test_axes.py::test_date_timezone_*, test_dates.py::test_DateFormatter, test_auto_date_locator*)

Unfortunately github does not support stacked pull requests yet. Otherwise this WIP would consist of three PRs, one on top of the previous:

  1. documentation changes (first three commits)
  2. changes to microsecond rounding (remaining commits)
  3. improved millisecond tick labeling without microseconds (not included yet)

Commits implementing (3) already exist, but that's probably too much for this single PR. If you prefer, (1) and (2) can also be split into separate pull requests.

@tacaswell tacaswell added this to the v2.2 milestone Jan 2, 2018
@tacaswell
Copy link
Member

@nmartensen Thanks for working on this!

However, I am not sure that this is the best approach here. Adding a kwarg to all of the time related functions is a bit disruptive.

How hard would it be to up the precision of the internal storage type?

@efiring
Copy link
Member

efiring commented Jan 2, 2018

It seems to me that time-handling with high precision is so fundamentally broken (e.g., it ignores leap seconds) with all calendar-related code, not to mention matplotlib's use of double precision days since an absurdly long-ago epoch, that trying to patch it up with a hack--a fake year--is just making things more confusing. Wouldn't it make more sense for the user to pick an origin (e.g., the beginning of an experiment) and label the axis with microseconds (or milliseconds, or floating-point seconds) from that origin? Maybe make custom locators and formatters?

@jklymak
Copy link
Member

jklymak commented Jan 2, 2018

datetime supports microseconds, and we purport to plot datetimes. So I guess we should be able to handle microsecond input.

  1. We could stop translating to epoch and let the unit converter carry around a time=0 internal variable (probably easiest if its a day=0 datetime).
  2. We could make datenums be np.longdouble.

I think 2 is easiest, though I'm not sure if our plotting respects np.longdouble? which you'd want to get the actual position on the axis correct.

I think 1 is most platform independent? 1) would probably just require date2num and num2date to have optional day=0 kwargs that allow an offset in calculating the floating-point value for drawing.

@efiring
Copy link
Member

efiring commented Jan 3, 2018

Using long double is not an option--essentially, there is no such thing. See https://docs.scipy.org/doc/numpy-dev/user/basics.types.html.

Changing our datenum is a big api break. If we add an alternative--like datetime64, in one or more of its variants--we would have to maintain full compatibility with existing datenums. This might be the way to go, since we should have native datetime64 support anyway.

What we have is a bit silly, but it works for normal dates and times. Datetime has microseconds, but given that it doesn't handle leap seconds, it is a mistake to think it is handling time units with high accuracy. Furthermore, what is a sensible way of labeling ticks on a time axis covering extremely short intervals? I doubt that it involves year, month, day, hour, minute, second, and microseconds.

I think the best course of action here is to start by documenting the present limitation, and then carefully think through strategies for improvement, starting with the need for our own datetime64 support (recognizing that datetime64 is flawed, but we are probably stuck with it.)

@jklymak
Copy link
Member

jklymak commented Jan 3, 2018

@efiring yeah I agree.

If thats the consensus, maybe it makes more sense to deprecate the microsecond Locator so people don't expect it to work?

WRT datetime64, we already have support: #9779

@efiring
Copy link
Member

efiring commented Jan 3, 2018 via email

@nmartensen
Copy link
Contributor Author

If thats the consensus, maybe it makes more sense to deprecate the microsecond Locator so people don't expect it to work?

Users not knowing about implementation details (such as microsecondlocators) may still expect it to work, since plotting datetime.datetime in general is supported, and datetime.datetime has microseconds.

What's wrong with documenting the limitations and making it work as good as possible? The note added to the documentation already contains advice to use some other time format.

@jklymak
Copy link
Member

jklymak commented Jan 7, 2018

If it could be done without a lot of extra code outside the locator and formatter it’s probably fine.

I think the current situation plus a better doc/info output would suffice.

addition of a new kwarg to num2date and _from_ordinalf is too intrusive.
Since plotting microsecond time ranges with current dates isn't really
possible, we don't need to try, so we can simply always do the musec
interval rounding.

Insert a hard-coded threshold making microsecond plotting still work for
users trying to use the 'early-year' workaround hack.
This reverts commit 63b9849.

Latest code changes now lead to the original result images from the axes
tests.
@efiring
Copy link
Member

efiring commented Jan 7, 2018

What is the use case for a microsecond locator? Under what circumstances does it really make sense to use datetime machinery and calendars for generating tick labels for extremely short intervals? What can a user do with such a locator that cannot be done as well or better with plain numbers, using non-date-specific locators?
Sometimes things get into mpl before we realize they shouldn't; I think the microsecond locator is in that category. Matplotlib suffers from bloat. When we have the opportunity to trim it out, we should do so.
I'm open to being convinced otherwise if there is a compelling use case.

Copy link
Member

@jklymak jklymak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, except for the few minor points below. Certainly having 20 micro-second resolution work nicely is better than having quasi random numbers spit out.

@@ -1296,6 +1304,11 @@ def get_locator(self, dmin, dmax):
locator = RRuleLocator(rrule, self.tz)
else:
locator = MicrosecondLocator(interval, tz=self.tz)
if dmin.year > 20 and interval < 1000:
warnings.warn('Plotting microsecond time intervals is not'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd do _log.warn so knowledgeable users can turn this warning off.

# Round down to the nearest microsecond.
dt += datetime.timedelta(microseconds=int(remainder * MUSECONDS_PER_DAY))
# Round to the nearest microsecond.
dt += datetime.timedelta(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be an else of the if below, couldn't it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then the code below would need to change. The above is needed to set the hours, minutes, and seconds; the stuff below changes only microseconds.

@@ -96,11 +96,11 @@
<../gallery/ticks_and_spines/date_demo_rrule.html>`_.

* :class:`AutoDateLocator`: On autoscale, this class picks the best
:class:`RRuleLocator` to set the view limits and the tick
:class:`DateLocator` to set the view limits and the tick
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in general its more useful to point to the RRuleLocator than the generic DateLocator. If you want to be pedantic, you can refer to both.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my first draft I had been referring to both, but then found it too complicated/pedantic for this description. The Yearlocator is also no RRuleLocator.

@jklymak
Copy link
Member

jklymak commented Jan 9, 2018

@efiring I think this PR improves the situation. It actually simplifies the microsecond handling, and it improves the documentation and adds a warning to the user that MPL/datetime is doing something imprecise. We can talk more about deprecation later, as better Datetime handling is still a todo item I have so it won't be forgotten.

@jklymak
Copy link
Member

jklymak commented Jan 19, 2018

@nmartensen is this still a WIP?

@nmartensen nmartensen changed the title WIP: sub-second datetime plotting improve sub-second datetime plotting and documentation Jan 19, 2018
@nmartensen
Copy link
Contributor Author

No, you are right. I don't have plans for further changes, unless you have more comments. Changed the title accordingly.

@jklymak
Copy link
Member

jklymak commented Jan 19, 2018

Great - it needs a second approver before we can merge. @efiring appears somewhat against, so that may squash other support 😉 which means it might not get merged. This happens sometimes. However, your contribution is very much aprpeciated, and if it doesn't get included, please do take advantage of having the machinery set up to send more PRs.

Copy link
Member

@efiring efiring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can agree with one of @jklymak's comments: the code addition is moderate, it is not affecting the API, and overall this is an improvement, particularly via the documentation.

@jklymak jklymak merged commit ac2da0e into matplotlib:master Jan 19, 2018
@jklymak
Copy link
Member

jklymak commented Jan 19, 2018

Thanks @efiring and thanks a lot @nmartensen !

@QuLogic QuLogic modified the milestones: needs sorting, v2.2.0 Feb 12, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants