Skip to content

Fix ConciseDateFormatter when plotting a range included in a second #17269

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jul 25, 2020

Conversation

leolchat
Copy link
Contributor

When micros only are changing (the plot is showing a range inside a second),
the ConciseDateFormatter previously computed a level 0 : year,
which would make it display only years... when microseconds resolution is expected.

When micros only are changing (we are zoomed inside a second),
the formatter previously computed a level 0 : year,
displaying only years... when microseconds were expected.
Copy link
Member

@jklymak jklymak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you provide a test and or example? Thanks!

@leolchat
Copy link
Contributor Author

Fair enough. The current test_concise_formatter test rely on some floating point rounding errors to be correct, they are quite tricky to change. And I might also need to change the AutoDateLocator to fix those tests...

I can also simply add a separate test only for that little part of logic.

@leolchat
Copy link
Contributor Author

I am adding just a separate test for now showing the bug fixed by this PR, it is called test_concise_formatter_subsecond

Copy link
Member

@jklymak jklymak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a problem with this. We didn't implement it at first, because microsecond dates were so problematic, but with the new epoch that is less of a problem. But maybe a few more cases could be tested to make sure? I'm also not sure how "concise" these dates are going to be, but not sure you can do anything about that.

I forget, what happened at millisecond resolution before?

year_1996 = 9861.0
strings = formatter.format_ticks([
year_1996,
year_1996 + 500 / mdates.MUSECONDS_PER_DAY,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the test. What happens if it is year_1996 - 200 / us, year_1996, year_1996 + 200 / us? Also it would be good to know what the offset text says for these cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with -200 +200 there are 2 different second values (59 and 00) so the original code compute level 5 correctly and you get ["59.9998", "00.0002"]

@leolchat
Copy link
Contributor Author

leolchat commented Apr 30, 2020

I do believe it is a bug and not the original dev not willing to do something about it. The code is full of things to allow displaying correctly the fractional part of seconds. And it affects plotting milliseconds or anything subsecond not just microseconds.

Maybe I should have made that clear in my tiny test, but a range of almost a second
year_1996, year_1996 + 999999 / mdates.MUSECONDS_PER_DAY
will return ["1996, "1996"] without this PR

@leolchat
Copy link
Contributor Author

For the conciseness part the result, it could definitely be improved with levels bellow level 5 ( mostly meant to print seconds with fractional part "%S.%f" ) I personally would greatly benefit from a 'millisecond' level.

@leolchat
Copy link
Contributor Author

For the offset part, well it is correctly done once the level is correctly computed.
Which btw made that bug even more hair pulling since when zooming some data, suddenly only years would be displayed with no offset (at level 0 meaning years, the default format of the offset is empty) when one wanted fractions of a second and the offset to show the full date and time...

@jklymak
Copy link
Member

jklymak commented Apr 30, 2020

We could consider adding milliseconds.

Again, we did have logic for smaller levels, but the round off errors due to the old epoch being so far from modern meant this largely returned nonsense anyway. The general approach was to steer folks away from fine-grained times if they need sub-second resolution (which I still think is safest. Do you really need to know when 07:00 was if you are interested in milli-second signals?)

@leolchat
Copy link
Contributor Author

I have 1khz data spanning many days (or weeks) at a time, which I regularly plot as a whole and zoom in sections, sometimes zooming in as much as to the ms scale.

But in general, if the data is timestamped even short sections of it are nice to plot and have the offset show the full date. Numpy and pandas now handle dates in a sane way so I guess many people are banging their head with matplotlib strange subsecond behaviours and issues. At least I have for many years and I usually end up doing axis ticks by hand for plots I want to share....

@jklymak
Copy link
Member

jklymak commented Apr 30, 2020

OK, so we can either put this in as is, or fix the logic to re-introduce "level == 6". I guess I'd favour the latter now that milliseconds in modern dates can be properly round-tripped in num2date.

Copy link
Member

@jklymak jklymak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy for you to work on re-introducing level 6, or I can do it when I have time.

@@ -775,6 +775,10 @@ def format_ticks(self, values):
for level in range(5, -1, -1):
if len(np.unique(tickdate[:, level])) > 1:
break
elif level == 0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should just re-introduce level 6, and have proper format for that level.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@leolchat did you have any interest in moving this forward, or should I convert to an issue. I still think this should get its own level, versus assuming level==0 is level 5.

Copy link
Contributor Author

@leolchat leolchat Jul 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we reach level == 0 without calling break, it means that every level tested has one unique value, which is what happens when the only differences between ticks are sub-second:

[ 2020-06-06 18:08:30.4,  2020-06-06 18:08:30.5,  2020-06-06 18:08:30.6,  2020-06-06 18:08:30.7]

(That is what the test I added is testing)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it was not subsecond, then it would be the unit above years, which doesn't exist in datetime.

@leolchat
Copy link
Contributor Author

I see that newer version of matplotlib now allow to set the epoch date with set_epoch, that will improve things for sure too.

level == 6 meaning microseconds would be not too hard, but level 6 meaning milliseconds will be quite hackish (not allow easy format change by the user) since python strftime doesn't provide formatters for it sadly. Handling custom formatters is probably not something matplotlib wants?

For level 6 do you have a reference to some old code?

@jklymak
Copy link
Member

jklymak commented Apr 30, 2020

level == 6 meaning microseconds would be not too hard, but level 6 meaning milliseconds will be quite hackish (not allow easy format change by the user) since python strftime doesn't provide formatters for it sadly. Handling custom formatters is probably not something matplotlib wants?

ConciseDateFormatter is still experimental, so we could do things if you have a reasonable proposal.

For level 6 do you have a reference to some old code?

Sorry that was all squashed out ;-). But we could add a millisecond and microsecond level and change the formatting at those levels.

@leolchat
Copy link
Contributor Author

leolchat commented Apr 30, 2020

Just a remark reading your comment that ConciseDateFormatter is experimental, I would be happy to use the default AutoDateFormatter if I could have it display an offset easily. In the end the "concise" version really only add 2 things: handle the offset and allow custom format for zeros. And I don't feel like the custom zeros are that useful vs having a well handled major vs minor ticks.

@jklymak
Copy link
Member

jklymak commented Apr 30, 2020

Custom formats for zeros are the whole point of ConciseDateFormatter. They allow us not to have to write "Dec 2019", "Jan 2020", "Feb 2020", which is what AutoDateFormatter has to do, and instead write "Dec", "2020", "Feb".

If you are interested in kHz data I agree that may not be so useful but for normal dates it’s a huge space savings without loss of information.

If you wanted to work on either formatter to get what you want, I’m sure that would be welcome.

@leolchat
Copy link
Contributor Author

Well, Pandas as some auto formatters they register for their pd.Timestamp type, which are a little nicer by default than the default np.datetime64 ones, but it also has kinks and issues when going subsecond.

It is a bit sad to have so many parallel piece of code trying to do the same thing. I am not sure whether I would spend more time trying to improve/fix the pandas or the matplotlib one.

One thing is sure, there is also a lot of possible improvement when plotting np.timedetla64 and such. That might be a bigger improvement.

@@ -775,6 +775,10 @@ def format_ticks(self, values):
for level in range(5, -1, -1):
if len(np.unique(tickdate[:, level])) > 1:
break
elif level == 0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@leolchat did you have any interest in moving this forward, or should I convert to an issue. I still think this should get its own level, versus assuming level==0 is level 5.

@leolchat
Copy link
Contributor Author

leolchat commented Jul 16, 2020

I have fixed it differently on my local computer because the code has other bugs, one in particular: if out of luck you end up with ticks having all the same seconds but not all the same minutes, you might end up using the level matching seconds.
So locally I have

        for level in range(0, 6, 1):
            if len(np.unique(tickdate[:, level])) > 1:
                break

Which makes the definition of the level clear: level is the biggest unit which has more than one value. This makes clear what the offset needs to be (everything above that level is common to all ticks so it doesn't need to be printed more than once). Clearly the original programmer went out of his way to not write that, so there must be reasons.

I suspect I have other changes required for this clean definition of level to work. My local version of matplotlib is quite hacked.

This PR is still a good small patch in my opinion, fixing the current logic for that exact problem. Do you see any downside of this code? (an example which was working before but doesn't now?)

@jklymak
Copy link
Member

jklymak commented Jul 16, 2020

Can you elaborate on that bug with an example?

I'm the original programmer 😉 but make no claims that what I did is perfect...

Level 0 is supposed to be for multiple years, so I'm not clear how setting microseconds to that level solves any problems. But I admit I'm confused why that doesn't break the tests.

@leolchat
Copy link
Contributor Author

leolchat commented Jul 17, 2020

The other bug is if we have:

[ 2020-06-06 18:08:30,  2021-06-06 18:08:31]

The current code is testing level 5 first (seconds), there are 2 different seconds so it will declare that the level which matters is at the seconds and the offset displayed will be 2020-06-06 18:08:30, and the ticks will show 30seconds and 31seconds, making you believe only 1 second passed between the 2 dates, when there is a full year (and a second).

@leolchat
Copy link
Contributor Author

leolchat commented Jul 17, 2020

I left another comment explaining this patch with maybe a better explanation below your review.

@jklymak
Copy link
Member

jklymak commented Jul 17, 2020

The current code is testing level 5 first (seconds), there are 2 different seconds so it will declare that the level which matters is at the seconds and the offset displayed will be 2020-06-06 18:08:30, and the ticks will show 30seconds and 31seconds, making you believe only 1 second passed between the 2 dates, when there is a full year (and a second).

Can you post complete examples? I can't reproduce what you are saying.

import matplotlib.pyplot as plt
import numpy as np
plt.rcParams['date.converter'] = 'concise'

dates = [np.datetime64( '2020-06-06 18:08:30'),  np.datetime64('2021-06-06 18:08:31')]

fig, ax = plt.subplots()
ax.plot(dates, [0, 1])

plt.show()

gives Level 1, as expected:

CD

@leolchat
Copy link
Contributor Author

The bug happens when the ticks are only those values, not when the data points are. (Since the code we are talking about works on tick values, not on data point).
I have had that happen very often, but I have not saved a simple example. I will try to think about a way to reproduce from a simple command.

@jklymak
Copy link
Member

jklymak commented Jul 17, 2020

Oh, OK, I see. that's strange:

import matplotlib.pyplot as plt
import numpy as np
print(plt.rcParams.keys())
plt.rcParams['date.converter'] = 'concise'
dates = [np.datetime64( '2020-06-06 18:08:30'),  np.datetime64('2021-06-06 18:08:31')]

fig, ax = plt.subplots()
ax.plot(dates, [0, 1])
ax.set_xticks(dates)

plt.show()

@jklymak
Copy link
Member

jklymak commented Jul 17, 2020

... but that can only really happen if you've set the ticks manually?

@leolchat
Copy link
Contributor Author

It happens in fairly common situations. I virtually never set the ticks manually, but I have seen that bug many times. I remember it being easy to trigger when zooming in and out.

@leolchat
Copy link
Contributor Author

Changing the definition of what is a level seems a bit more than the goal of this current PR. And if one wants I would really push to get level being defined cleanly like something I suggest #17269 (comment)

@jklymak
Copy link
Member

jklymak commented Jul 17, 2020

Using the AutoLocator?

@leolchat
Copy link
Contributor Author

I believe so, if not, one of the other usual locator.

@jklymak
Copy link
Member

jklymak commented Jul 17, 2020

The code goes backwards to test the level because you want the level to be at the smallest non-repeating level. i.e. you could have "2012, Apr, Jul, Oct, 2013, Apr" and that can't be at the annual level (0) because you want the months to be labelled (i.e. level 1). So your patch above can't work as is.

Usually the seconds tick in the AutoDateLocator is zero. I don't understand why it might not be. If you have a reproducible error (ahem on an unhacked version) that would be quite helpful. Thanks!

@jklymak
Copy link
Member

jklymak commented Jul 17, 2020

Just to be clear:

import matplotlib.pyplot as plt
import numpy as np

plt.rcParams['date.converter'] = 'concise'

dates = np.arange('2000-12-15', '2002-01-15', dtype='datetime64[D]')

fig, ax = plt.subplots()
ax.plot(dates, np.arange(0, len(dates)))

plt.show()

Master:

old

your proposed change:

        for level in range(6):
            if len(np.unique(tickdate[:, level])) > 1:
                break

proposed

@leolchat
Copy link
Contributor Author

I find it confusing to have 2 parallel discussions here, this PR is not about the definition of level, it is a fairly straightforward bug fix.

Do you mind talking about your last point in a separate issue maybe? I might have a bit of time next week to play around and find you examples breaking the current definition of levels.

@leolchat
Copy link
Contributor Author

PS all your examples use plt.rcParams['date.converter'] = 'concise' which doesn't work for me in a fresh venv with 3.2.1 matplotlib. Is there some specific version needed?

@jklymak
Copy link
Member

jklymak commented Jul 17, 2020

The discussion about levels is because I'd like us just to use a level 6 for milliseconds, rather than just falling back to level 5. But you are saying its possible for seconds or milliseconds to not be zero if the ticks span months or years which is a more fundamental issue with the algorithm if that can actually happen.

@jklymak
Copy link
Member

jklymak commented Jul 17, 2020

master has plt.rcParams['date.converter'] = 'concise'...

@leolchat
Copy link
Contributor Author

leolchat commented Jul 18, 2020

Ok, so we keep all in the same thread.

I never proposed to only change the computation of level, I proposed to change its definition and I said I have a bunch of other changes to make it work. My goal is that it doesn't rely on some special property of the locator (in my opinion a formatter should give decent result what ever the locator or even when users provides their ticks).

When using AutoDateLocator(minticks=5, maxticks=8) for the locator, you get
image

Which starts to be confusing, ticks are misaligned (not really the formatter fault...) and mostly the offset doesn't apply to most of the range and one can easily read that the start is in 2021.
A good level definition would say: there is no offset possible here since years are varying, so do not display an offset (which matches choosing level 0), the fact that the other ticks are a bunch of '2021' is a secondary question:
If years are varying, what label should be displayed?
I suggest that it should depends on the number of ticks and their spacing.

The current algorithm is very brittle, relies on zeroes to appear, and doesn't adjust with the number of ticks.

@leolchat
Copy link
Contributor Author

leolchat commented Jul 18, 2020

using AutoDateLocator(minticks=20, maxticks=30) you get
image

There are no zeroes helping at the year mark here (actually the zero are month because of the wrong level), which makes for a pretty fun read of this plot.
People wonder: is it 2020-2021 or 2021-2022 ? Why is there Jun in the bottom right?

@leolchat
Copy link
Contributor Author

leolchat commented Jul 18, 2020

Then if one zoom in, it will hit the bug that this PR fixes.

zoom by hand or directly using ax.set_xlim(np.datetime64( '2020-06-06 18:08:30.3'), np.datetime64( '2020-06-06 18:08:30.5')) gives
image

@jklymak
Copy link
Member

jklymak commented Jul 18, 2020

For this PR I would like a level 6, instead of just putting it to level 5 if no level is found. Again, if you dont' want to do that, I understand, and I will address the bug above in a different PR (#17269 (comment)).

#17269 (comment) looks to be a bug....

@leolchat
Copy link
Contributor Author

leolchat commented Jul 18, 2020

The bug just above (#17269 (comment)) is fixed by this PR. I am not sure what other PR you want to do. Is there a way to fix it you would prefer?

The other one (#17269 (comment)) is not a bug, it is a consequence of how levels are currently defined/computed.

@leolchat
Copy link
Contributor Author

This PR changes the last plot to
image

Copy link
Member

@jklymak jklymak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is fine. I'd prefer a test that didn't use arbitrary tick values, but its not too important.

@leolchat leolchat mentioned this pull request Jul 20, 2020
6 tasks
Copy link
Member

@dstansby dstansby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix @leolchat 🎉

@dstansby dstansby merged commit 0339042 into matplotlib:master Jul 25, 2020
@leolchat leolchat deleted the patch-1 branch July 26, 2020 04:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants