Fix ConciseDateFormatter when plotting a range included in a second #17269

leolchat · 2020-04-30T03:39:13Z

When only the microseconds change (the plot shows a range inside a single second),
ConciseDateFormatter previously computed level 0 (years),
so it displayed only years where microsecond resolution was expected.

jklymak

Can you provide a test and/or example? Thanks!

leolchat · 2020-04-30T04:54:26Z

Fair enough. The current test_concise_formatter tests rely on some floating-point rounding errors to pass, so they are quite tricky to change. And I might also need to change the AutoDateLocator to fix those tests...

I can also simply add a separate test only for that little part of logic.

leolchat · 2020-04-30T05:10:18Z

For now I am adding just a separate test showing the bug fixed by this PR; it is called test_concise_formatter_subsecond.

jklymak

I don't see a problem with this. We didn't implement it at first, because microsecond dates were so problematic, but with the new epoch that is less of a problem. But maybe a few more cases could be tested to make sure? I'm also not sure how "concise" these dates are going to be, but not sure you can do anything about that.

I forget, what happened at millisecond resolution before?

jklymak · 2020-04-30T14:20:27Z

lib/matplotlib/tests/test_dates.py

+    year_1996 = 9861.0
+    strings = formatter.format_ticks([
+        year_1996,
+        year_1996 + 500 / mdates.MUSECONDS_PER_DAY,


Thanks for the test. What happens if it is year_1996 - 200 / us, year_1996, year_1996 + 200 / us? Also it would be good to know what the offset text says for these cases.

With -200 and +200 there are two different second values (59 and 00), so the original code computes level 5 correctly and you get ["59.9998", "00.0002"].
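A minimal standard-library sketch of why the -200/+200 case already worked. It assumes that 9861.0 is a day count from a 1970-01-01 epoch, and it only mimics a "%S.%f" label with trailing zeros stripped; it does not call matplotlib.

```python
from datetime import datetime, timedelta

# Assumption: 9861.0 is a number of days counted from 1970-01-01.
EPOCH = datetime(1970, 1, 1)
year_1996 = EPOCH + timedelta(days=9861)

ticks = [year_1996 - timedelta(microseconds=200),
         year_1996 + timedelta(microseconds=200)]

# The two ticks straddle a second boundary, so the seconds field differs.
seconds = [t.second for t in ticks]

# Mimic a "%S.%f" label with trailing zeros removed (illustration only).
labels = [t.strftime("%S.%f").rstrip("0") for t in ticks]

print(seconds)  # [59, 0]
print(labels)   # ['59.9998', '00.0002']
```

Because the seconds field takes two values, a scan that starts at the seconds level finds a varying field immediately. The failing case is the one where every tick falls inside the same second.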

leolchat · 2020-04-30T14:49:46Z

I do believe it is a bug rather than a deliberate choice by the original developer. The code is full of provisions for displaying the fractional part of seconds correctly. And the bug affects plotting milliseconds or anything sub-second, not just microseconds.

Maybe I should have made that clear in my tiny test, but a range of almost a second,
year_1996, year_1996 + 999999 / mdates.MUSECONDS_PER_DAY,
will return ["1996", "1996"] without this PR.

leolchat · 2020-04-30T14:55:07Z

As for conciseness, the result could definitely be improved with levels below level 5 (which is mostly meant to print seconds with a fractional part, "%S.%f"). I personally would greatly benefit from a 'millisecond' level.

leolchat · 2020-04-30T14:59:27Z

For the offset part, well, it is handled correctly once the level is computed correctly.
Which, by the way, made that bug even more hair-pulling: when zooming into some data, suddenly only years would be displayed, with no offset (at level 0, meaning years, the default offset format is empty), when one wanted fractions of a second and an offset showing the full date and time...

jklymak · 2020-04-30T15:01:08Z

We could consider adding milliseconds.

Again, we did have logic for smaller levels, but the round off errors due to the old epoch being so far from modern meant this largely returned nonsense anyway. The general approach was to steer folks away from fine-grained times if they need sub-second resolution (which I still think is safest. Do you really need to know when 07:00 was if you are interested in milli-second signals?)
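A rough sketch of the round-off argument, using only the standard library. It assumes dates are stored as float64 day counts, that the old epoch corresponds to Python's proleptic ordinal (days counted from year 0001), and that the new epoch is 1970-01-01; the helper name `resolution_us` is made up for this illustration.

```python
import math
from datetime import date

MUSECONDS_PER_DAY = 24 * 60 * 60 * 1_000_000

def resolution_us(days):
    """Gap between adjacent float64 values near `days`, in microseconds."""
    return math.ulp(float(days)) * MUSECONDS_PER_DAY

day = date(2020, 4, 30)
old_days = day.toordinal()                # assumed old epoch: year 0001
new_days = (day - date(1970, 1, 1)).days  # assumed new epoch: 1970-01-01

old_res = resolution_us(old_days)  # about 10 microseconds
new_res = resolution_us(new_days)  # about 0.3 microseconds

print(f"old epoch: {old_res:.2f} us, new epoch: {new_res:.2f} us")
```

Under these assumptions a modern date cannot hold single microseconds when counted from year 0001, while the same date counted from 1970 can, which is consistent with sub-second labels only becoming practical with the new epoch.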

leolchat · 2020-04-30T15:17:42Z

I have 1 kHz data spanning many days (or weeks) at a time, which I regularly plot as a whole and then zoom into sections, sometimes down to the ms scale.

But in general, if the data is timestamped, even short sections of it are nice to plot with the offset showing the full date. NumPy and pandas now handle dates in a sane way, so I guess many people are banging their heads against matplotlib's strange sub-second behaviours and issues. At least I have for many years, and I usually end up setting axis ticks by hand for plots I want to share....

jklymak · 2020-04-30T15:31:17Z

OK, so we can either put this in as is, or fix the logic to re-introduce "level == 6". I guess I'd favour the latter now that milliseconds in modern dates can be properly round-tripped in num2date.

jklymak

Happy for you to work on re-introducing level 6, or I can do it when I have time.

jklymak · 2020-04-30T15:32:33Z

lib/matplotlib/dates.py

        for level in range(5, -1, -1):
            if len(np.unique(tickdate[:, level])) > 1:
                break
+            elif level == 0:


I think we should just re-introduce level 6, and have proper format for that level.

If we reach level == 0 without calling break, it means that every level tested has a single unique value, which is what happens when the only differences between ticks are sub-second:

[ 2020-06-06 18:08:30.4, 2020-06-06 18:08:30.5, 2020-06-06 18:08:30.6, 2020-06-06 18:08:30.7]

(That is what the test I added is testing)

If it was not subsecond, then it would be the unit above years, which doesn't exist in datetime.
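The behaviour described above can be sketched without matplotlib. This is a simplified stand-in for the loop in lib/matplotlib/dates.py: it uses plain tuples and sets instead of the `tickdate` array and `np.unique`, and the function names are made up for the illustration. The fallback to level 5 follows the review discussion.

```python
from datetime import datetime

def fields(ticks):
    """Rows of (year, month, day, hour, minute, second), one per tick."""
    return [(t.year, t.month, t.day, t.hour, t.minute, t.second) for t in ticks]

def level_without_patch(ticks):
    rows = fields(ticks)
    for level in range(5, -1, -1):
        if len({row[level] for row in rows}) > 1:
            break
    # If no field varies, the loop runs out and level is left at 0 (years).
    return level

def level_with_patch(ticks):
    rows = fields(ticks)
    for level in range(5, -1, -1):
        if len({row[level] for row in rows}) > 1:
            break
        elif level == 0:
            # Every field down to seconds is identical, so only the
            # sub-second part can differ: use the most precise level.
            level = 5
    return level

# The sub-second ticks from the comment above.
ticks = [datetime(2020, 6, 6, 18, 8, 30, us)
         for us in (400_000, 500_000, 600_000, 700_000)]

print(level_without_patch(ticks))  # 0, so only years are shown
print(level_with_patch(ticks))     # 5, so seconds with a fraction are shown
```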

leolchat · 2020-04-30T15:39:01Z

I see that newer versions of matplotlib now allow setting the epoch with set_epoch, which will surely improve things too.

level == 6 meaning microseconds would not be too hard, but level 6 meaning milliseconds would be quite hackish (it would not allow easy format changes by the user), since Python's strftime sadly doesn't provide a directive for milliseconds. Handling custom formatters is probably not something matplotlib wants?
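To illustrate the strftime limitation: `%f` always produces six digits, so a millisecond label needs post-processing of the formatted string. The helper below is hypothetical and is not part of matplotlib.

```python
from datetime import datetime

def format_seconds_ms(t):
    """Hypothetical helper: seconds with three decimals, e.g. '30.400'."""
    # "%S.%f" gives e.g. "30.400500"; dropping the last three characters
    # truncates (it does not round) to millisecond precision.
    return t.strftime("%S.%f")[:-3]

t = datetime(2020, 6, 6, 18, 8, 30, 400_500)
print(t.strftime("%S.%f"))   # 30.400500
print(format_seconds_ms(t))  # 30.400
```

This kind of string slicing is the "hackish" part: a user-supplied format string cannot express it, so the formatter would have to special-case the millisecond level.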

For level 6 do you have a reference to some old code?

jklymak · 2020-04-30T15:50:13Z

> level == 6 meaning microseconds would not be too hard, but level 6 meaning milliseconds would be quite hackish (it would not allow easy format changes by the user), since Python's strftime sadly doesn't provide a directive for milliseconds. Handling custom formatters is probably not something matplotlib wants?

ConciseDateFormatter is still experimental, so we could do things if you have a reasonable proposal.

> For level 6 do you have a reference to some old code?

Sorry that was all squashed out ;-). But we could add a millisecond and microsecond level and change the formatting at those levels.

leolchat · 2020-04-30T18:07:05Z

Just a remark on your comment that ConciseDateFormatter is experimental: I would be happy to use the default AutoDateFormatter if I could easily have it display an offset. In the end the "concise" version really only adds two things: it handles the offset and it allows custom formats for zeros. And I don't feel that the custom zeros are that useful compared with well-handled major and minor ticks.

jklymak · 2020-04-30T19:01:02Z

Custom formats for zeros are the whole point of ConciseDateFormatter. They allow us not to have to write "Dec 2019", "Jan 2020", "Feb 2020", which is what AutoDateFormatter has to do, and instead write "Dec", "2020", "Feb".
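A toy illustration of the "zero format" idea for monthly ticks, written with the standard library only. The two functions are made up for this sketch and do not reproduce matplotlib's implementation; `%b` is assumed to yield English month abbreviations (the default C locale).

```python
from datetime import date

def full_labels(ticks):
    """Every label repeats the year, as in 'Dec 2019'."""
    return [t.strftime("%b %Y") for t in ticks]

def concise_labels(ticks):
    """Month names only, except January (the 'zero' month) shown as the year."""
    return [t.strftime("%Y") if t.month == 1 else t.strftime("%b")
            for t in ticks]

ticks = [date(2019, 12, 1), date(2020, 1, 1), date(2020, 2, 1)]
print(full_labels(ticks))     # ['Dec 2019', 'Jan 2020', 'Feb 2020']
print(concise_labels(ticks))  # ['Dec', '2020', 'Feb']
```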

If you are interested in kHz data I agree that may not be so useful but for normal dates it’s a huge space savings without loss of information.

If you wanted to work on either formatter to get what you want, I’m sure that would be welcome.

leolchat · 2020-04-30T21:06:29Z

Well, pandas has some auto formatters that it registers for its pd.Timestamp type, which are a little nicer by default than the default np.datetime64 ones, but they also have kinks and issues when going sub-second.

It is a bit sad to have so many parallel pieces of code trying to do the same thing. I am not sure whether I would spend more time trying to improve/fix the pandas one or the matplotlib one.

One thing is sure: there is also a lot of room for improvement when plotting np.timedelta64 and the like. That might be a bigger improvement.

jklymak · 2020-07-16T15:57:46Z

lib/matplotlib/dates.py

        for level in range(5, -1, -1):
            if len(np.unique(tickdate[:, level])) > 1:
                break
+            elif level == 0:


@leolchat did you have any interest in moving this forward, or should I convert this to an issue? I still think this should get its own level, versus assuming level==0 is level 5.

leolchat · 2020-07-16T16:20:07Z

I have fixed it differently on my local computer because the code has other bugs, one in particular: if out of luck you end up with ticks having all the same seconds but not all the same minutes, you might end up using the level matching seconds.
So locally I have

        for level in range(0, 6, 1):
            if len(np.unique(tickdate[:, level])) > 1:
                break

Which makes the definition of the level clear: level is the biggest unit which has more than one value. This makes clear what the offset needs to be (everything above that level is common to all ticks so it doesn't need to be printed more than once). Clearly the original programmer went out of his way to not write that, so there must be reasons.

I suspect I have other changes required for this clean definition of level to work. My local version of matplotlib is quite hacked.

This PR is still a good small patch in my opinion, fixing the current logic for that exact problem. Do you see any downside of this code? (an example which was working before but doesn't now?)

jklymak · 2020-07-16T16:29:33Z

Can you elaborate on that bug with an example?

I'm the original programmer 😉 but make no claims that what I did is perfect...

Level 0 is supposed to be for multiple years, so I'm not clear how setting microseconds to that level solves any problems. But I admit I'm confused why that doesn't break the tests.

leolchat · 2020-07-17T03:17:44Z

The other bug is if we have:

[ 2020-06-06 18:08:30,  2021-06-06 18:08:31]

The current code tests level 5 (seconds) first. There are two different second values, so it declares that the level that matters is seconds; the offset displayed will be 2020-06-06 18:08:30 and the ticks will show 30 seconds and 31 seconds, making you believe that only 1 second passed between the two dates, when in fact a full year (and a second) did.
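A small sketch of this two-tick case, listing which datetime fields vary. The helper `varying_fields` is made up for the illustration; only the standard library is used.

```python
from datetime import datetime

FIELDS = ("year", "month", "day", "hour", "minute", "second")

def varying_fields(ticks):
    """Names of the datetime fields that take more than one value."""
    return [name for name in FIELDS
            if len({getattr(t, name) for t in ticks}) > 1]

ticks = [datetime(2020, 6, 6, 18, 8, 30), datetime(2021, 6, 6, 18, 8, 31)]
varying = varying_fields(ticks)

finest = varying[-1]   # what a scan starting from seconds stops at
coarsest = varying[0]  # what a scan starting from years would stop at

print(varying)           # ['year', 'second']
print(finest, coarsest)  # second year
```

A scan that starts at seconds reports "second" as the level here, even though the year also differs between the two ticks.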

leolchat · 2020-07-17T03:21:02Z

I left another comment explaining this patch with maybe a better explanation below your review.

jklymak · 2020-07-17T04:17:19Z

> The current code tests level 5 (seconds) first. There are two different second values, so it declares that the level that matters is seconds; the offset displayed will be 2020-06-06 18:08:30 and the ticks will show 30 seconds and 31 seconds, making you believe that only 1 second passed between the two dates, when in fact a full year (and a second) did.

Can you post complete examples? I can't reproduce what you are saying.

import matplotlib.pyplot as plt
import numpy as np
plt.rcParams['date.converter'] = 'concise'

dates = [np.datetime64( '2020-06-06 18:08:30'),  np.datetime64('2021-06-06 18:08:31')]

fig, ax = plt.subplots()
ax.plot(dates, [0, 1])

plt.show()

gives Level 1, as expected:

leolchat · 2020-07-17T04:24:03Z

The bug happens when the ticks are only those values, not when the data points are (the code we are talking about works on tick values, not on data points).
I have had that happen very often, but I have not saved a simple example. I will try to think of a way to reproduce it with a simple command.

jklymak · 2020-07-17T04:31:53Z

Oh, OK, I see. That's strange:

import matplotlib.pyplot as plt
import numpy as np
print(plt.rcParams.keys())
plt.rcParams['date.converter'] = 'concise'
dates = [np.datetime64( '2020-06-06 18:08:30'),  np.datetime64('2021-06-06 18:08:31')]

fig, ax = plt.subplots()
ax.plot(dates, [0, 1])
ax.set_xticks(dates)

plt.show()

jklymak · 2020-07-17T04:34:17Z

... but that can only really happen if you've set the ticks manually?

leolchat · 2020-07-17T04:40:03Z

It happens in fairly common situations. I virtually never set the ticks manually, but I have seen that bug many times. I remember it being easy to trigger when zooming in and out.

leolchat · 2020-07-17T04:42:20Z

Changing the definition of a level seems to go beyond the goal of this PR. If one wants to do that, I would really push for level to be defined cleanly, along the lines of what I suggest in #17269 (comment).

jklymak · 2020-07-17T04:42:26Z

Using the AutoLocator?

leolchat · 2020-07-17T04:43:08Z

I believe so; if not, one of the other usual locators.

jklymak · 2020-07-17T04:54:48Z

The code goes backwards to test the level because you want the level to be at the smallest non-repeating level. i.e. you could have "2012, Apr, Jul, Oct, 2013, Apr" and that can't be at the annual level (0) because you want the months to be labelled (i.e. level 1). So your patch above can't work as is.
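The difference between the two scan directions can be sketched on quarterly ticks like the ones in this example. The functions are simplified stand-ins (sets over tuples instead of `np.unique` over `tickdate`) and their names are made up for the illustration.

```python
from datetime import datetime

def rows(ticks):
    return [(t.year, t.month, t.day, t.hour, t.minute, t.second) for t in ticks]

def level_backward(ticks):
    """Scan from seconds (5) down to years (0): finest varying field."""
    table = rows(ticks)
    for level in range(5, -1, -1):
        if len({r[level] for r in table}) > 1:
            break
    return level

def level_forward(ticks):
    """Scan from years (0) up to seconds (5): coarsest varying field."""
    table = rows(ticks)
    for level in range(6):
        if len({r[level] for r in table}) > 1:
            break
    return level

# Ticks that would be labelled "2012, Apr, Jul, Oct, 2013, Apr".
quarterly = [datetime(2012, 1, 1), datetime(2012, 4, 1), datetime(2012, 7, 1),
             datetime(2012, 10, 1), datetime(2013, 1, 1), datetime(2013, 4, 1)]

print(level_backward(quarterly))  # 1: months get labelled
print(level_forward(quarterly))   # 0: only years would be labelled
```

With the forward scan alone, every tick in this range would be formatted at the year level, which is why the change in scan direction would need further changes elsewhere to produce usable labels.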

Usually the seconds tick in the AutoDateLocator is zero. I don't understand why it might not be. If you have a reproducible error (ahem on an unhacked version) that would be quite helpful. Thanks!

jklymak · 2020-07-17T16:26:13Z

Just to be clear:

import matplotlib.pyplot as plt
import numpy as np

plt.rcParams['date.converter'] = 'concise'

dates = np.arange('2000-12-15', '2002-01-15', dtype='datetime64[D]')

fig, ax = plt.subplots()
ax.plot(dates, np.arange(0, len(dates)))

plt.show()

Master:

your proposed change:

        for level in range(6):
            if len(np.unique(tickdate[:, level])) > 1:
                break

leolchat · 2020-07-17T22:28:09Z

I find it confusing to have 2 parallel discussions here, this PR is not about the definition of level, it is a fairly straightforward bug fix.

Do you mind talking about your last point in a separate issue maybe? I might have a bit of time next week to play around and find you examples breaking the current definition of levels.

leolchat · 2020-07-17T22:29:41Z

PS: all your examples use plt.rcParams['date.converter'] = 'concise', which doesn't work for me in a fresh venv with matplotlib 3.2.1. Is a specific version needed?

jklymak · 2020-07-17T22:53:29Z

The discussion about levels is because I'd like us just to use a level 6 for milliseconds, rather than just falling back to level 5. But you are saying it's possible for seconds or milliseconds to not be zero if the ticks span months or years, which is a more fundamental issue with the algorithm if that can actually happen.

jklymak · 2020-07-17T22:53:49Z

master has plt.rcParams['date.converter'] = 'concise'...

leolchat · 2020-07-18T00:40:12Z

Ok, so we keep all in the same thread.

I never proposed to change only the computation of level; I proposed to change its definition, and I said I have a bunch of other changes to make it work. My goal is that it doesn't rely on some special property of the locator (in my opinion a formatter should give decent results whatever the locator, or even when users provide their own ticks).

When using AutoDateLocator(minticks=5, maxticks=8) for the locator, you get

This starts to be confusing: the ticks are misaligned (not really the formatter's fault...) and, more importantly, the offset doesn't apply to most of the range, so one can easily read the start as being in 2021.
A good level definition would say: no offset is possible here since the years vary, so do not display an offset (which matches choosing level 0). The fact that the other ticks are a bunch of '2021' is a secondary question:
if years are varying, what label should be displayed?
I suggest that it should depend on the number of ticks and their spacing.

The current algorithm is very brittle, relies on zeroes to appear, and doesn't adjust with the number of ticks.

leolchat · 2020-07-18T00:41:18Z

using AutoDateLocator(minticks=20, maxticks=30) you get

There are no zeroes helping at the year mark here (actually the zeroes are months because of the wrong level), which makes for a pretty fun read of this plot.
People wonder: is it 2020-2021 or 2021-2022? Why is there Jun in the bottom right?

leolchat · 2020-07-18T01:02:52Z

Then if one zooms in, one hits the bug that this PR fixes.

Zooming by hand, or directly using ax.set_xlim(np.datetime64('2020-06-06 18:08:30.3'), np.datetime64('2020-06-06 18:08:30.5')), gives

jklymak · 2020-07-18T02:22:22Z

For this PR I would like a level 6, instead of just falling back to level 5 if no level is found. Again, if you don't want to do that, I understand, and I will address the bug above in a different PR (#17269 (comment)).

#17269 (comment) looks to be a bug....

leolchat · 2020-07-18T02:59:34Z

The bug just above (#17269 (comment)) is fixed by this PR. I am not sure what other PR you want to do. Is there a way to fix it you would prefer?

The other one (#17269 (comment)) is not a bug, it is a consequence of how levels are currently defined/computed.

leolchat · 2020-07-18T03:01:32Z

This PR changes the last plot to

jklymak

I guess this is fine. I'd prefer a test that didn't use arbitrary tick values, but it's not too important.

dstansby

Thanks for the fix @leolchat 🎉

Fix ConciseDateFormatter when only micros change

4876bb5

When micros only are changing (we are zoomed inside a second), the formatter previously computed a level 0 : year, displaying only years... when microseconds were expected.

jklymak reviewed Apr 30, 2020

Separate test

986a0f9

flake8 fixes

2c27a02

jklymak reviewed Apr 30, 2020

jklymak added this to the v3.4.0 milestone Apr 30, 2020

jklymak added the "topic: date handling" and "topic: ticks axis labels" labels Apr 30, 2020

jklymak reviewed Apr 30, 2020

jklymak requested changes Jul 16, 2020

jklymak mentioned this pull request Jul 18, 2020

ConciseFormatter doesn't always label start of year properly #17958

Closed

jklymak approved these changes Jul 18, 2020

leolchat mentioned this pull request Jul 20, 2020

Leo concise levels2 #17977

Closed

dstansby approved these changes Jul 25, 2020

dstansby merged commit 0339042 into matplotlib:master Jul 25, 2020

leolchat deleted the patch-1 branch July 26, 2020 04:58
