Masked array with plot_date chooses far too large time span #7787

Open
gerritholl opened this Issue Jan 10, 2017 · 3 comments

@gerritholl
gerritholl commented Jan 10, 2017 edited

Bug summary

When plotting the attached time vs. masked arrays using plot_date (I had to zip the data, or GitHub would not accept it as an attachment), matplotlib chooses a time span of 4½ years even though the time axis covers less than 4 days. This happens because most of the y data is masked; the time data are not masked at all, so it should still be possible to choose a sensible time range.

Code for reproduction

plot_date(t, x)

Actual outcome

actual outcome

Expected outcome

The expected span of time axes is obtained with

plot_date(t.data, x.data)

expected outcome

Matplotlib version

I'm using matplotlib 2.0.0rc2 installed through pip

@tacaswell tacaswell added this to the 2.1 (next point release) milestone Jan 10, 2017
@tacaswell
Member

It looks like all of the times are identical in the top plot. This is probably a duplicate of #5963

The way we deal with singular time ranges (that is, the max and the min are the same) is to expand the range by some percentage, which works pretty well for non-date values. However, we internally store dates as float days since 0001-01-01 (for historical reasons), and expanding times around 'now' (relative to that epoch) by a percentage results in very wide ranges.
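
To make the arithmetic concrete, here is a rough sketch using only the standard library. It is not matplotlib's actual code: `date.toordinal()` (days since 0001-01-01) stands in for the internal day count, and both the helper name `singular_span` and the 0.1% margin are made up for illustration.

```python
from datetime import date


def singular_span(day, fraction=0.001):
    """Expand a single date by a fraction of its own day count.

    `fraction` is a hypothetical margin, not a value taken from matplotlib.
    """
    center = day.toordinal()    # days since 0001-01-01, a stand-in epoch
    margin = fraction * center  # the margin scales with distance from the epoch
    return center - margin, center + margin


lo, hi = singular_span(date(2017, 1, 10))
span_years = (hi - lo) / 365.25
print(f"one timestamp expands to roughly {span_years:.1f} years")
```

Because the margin is proportional to the distance from the epoch rather than to the spread of the data, even a tiny percentage turns one timestamp into a multi-year axis.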

Any suggestions as to how to, from a single time value, correctly guess the range would be greatly appreciated 😄 .

I would be very surprised if this was a regression.

@gerritholl

I don't believe it's a regression. I think we can do a little better for the special case where we do have more than one available time, but it gets reduced to one due to masking on the data. Perhaps the broader issue is: should the time range to be plotted be based on the range of t, or on the range of t[~y.mask]? I would argue the former is preferable, and it would avoid the problem (I agree that it's otherwise a duplicate).
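
A small sketch of the two candidate ranges, with made-up data (the attached arrays are not reproduced here). The names `t`, `y`, `full_range` and `visible_range` are illustrative only.

```python
import numpy as np

# Eight hypothetical time values (in days); none of them is masked.
t = np.arange(0.0, 4.0, 0.5)

# Mask every y value except one, mimicking mostly-masked data.
mask = np.ones(t.shape, dtype=bool)
mask[3] = False
y = np.ma.masked_array(np.sin(t), mask=mask)

# Option 1: range of all times, ignoring the mask on y.
full_range = (t.min(), t.max())

# Option 2: range of only those times whose y value is unmasked.
visible = t[~np.ma.getmaskarray(y)]
visible_range = (visible.min(), visible.max())

print(full_range, visible_range)
```

With a single unmasked point the second range collapses to zero width, which is the singular case discussed above; the first range still spans the whole time axis.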

@tacaswell
Member

But then Line2D has to have a special case for masked data and autoscaling. There may be many cases where you do not want to show the full range of the masked data (so we would end up adding an API to control when the masked data is used in auto-scaling and it just gets messy).

I suggest you write a helper function like:

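import matplotlib.pyplot as plt
import numpy as np

def masked_date_plotting(x, y, *, ax=None, **kwargs):
    if ax is None:
        ax = plt.gca()
    ret = ax.plot_date(x, y, **kwargs)
    # Set the limits from the full (unmasked) time data, not just the visible points.
    ax.set_xlim(np.min(x.data), np.max(x.data))
    return ret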

You might want to add some checks to never shrink the current data limits.
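
One possible shape for such a check, as a sketch only: take the union of the current limits and the new data range before calling set_xlim. The helper name `union_limits` is hypothetical, and the sketch assumes a non-inverted axis (lower limit first).

```python
def union_limits(current, new):
    """Return (lo, hi) covering both intervals, so existing limits never shrink.

    Both arguments are (lo, hi) pairs with lo <= hi.
    """
    lo = min(current[0], new[0])
    hi = max(current[1], new[1])
    return lo, hi


# Hypothetical use inside the helper function, in place of the plain set_xlim call:
#     new = (np.min(x.data), np.max(x.data))
#     ax.set_xlim(*union_limits(ax.get_xlim(), new))
print(union_limits((2.0, 5.0), (3.0, 7.0)))
```

Note that an empty Axes reports default limits of (0.0, 1.0), so it is probably worth applying the union only when something has already been plotted (for example, by checking ax.has_data() before the plotting call).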
