BUG: strange timeseries plot behavior #6608

rosnfeld · 2014-03-11T23:10:06Z

After some discussion below, here's a simple repro case:

import datetime
import pandas as pd

s1 = pd.Series([1, 2, 3], index=[datetime.datetime(1995, 12, 31), datetime.datetime(2000, 12, 31), datetime.datetime(2005, 12, 31)])
s2 = pd.Series([1, 2, 3], index=[datetime.datetime(1997, 12, 31), datetime.datetime(2003, 12, 31), datetime.datetime(2008, 12, 31)])

# plot first series, then add the second series to those axes, then try adding the first series again
ax = s1.plot()
s2.plot(ax=ax)
s1.plot(ax=ax)

causes

Traceback (most recent call last):
  File "simple_repro.py", line 10, in <module>
    s1.plot(ax=ax)
  File "/home/andrew/git/pandas-rosnfeld/pandas/tools/plotting.py", line 2116, in plot_series
    plot_obj.generate()
  File "/home/andrew/git/pandas-rosnfeld/pandas/tools/plotting.py", line 920, in generate
    self._make_plot()
  File "/home/andrew/git/pandas-rosnfeld/pandas/tools/plotting.py", line 1482, in _make_plot
    self._make_ts_plot(data)
  File "/home/andrew/git/pandas-rosnfeld/pandas/tools/plotting.py", line 1577, in _make_ts_plot
    _plot(data, 0, ax, label, self.style, **kwds)
  File "/home/andrew/git/pandas-rosnfeld/pandas/tools/plotting.py", line 1553, in _plot
    style=style, **kwds)
  File "/home/andrew/git/pandas-rosnfeld/pandas/tseries/plotting.py", line 82, in tsplot
    left, right = _get_xlim(ax.get_lines())
  File "/home/andrew/git/pandas-rosnfeld/pandas/tseries/plotting.py", line 226, in _get_xlim
    left = min(x[0].ordinal, left)
AttributeError: 'datetime.datetime' object has no attribute 'ordinal'

-- ORIGINAL MESSAGE --

Here's a small dataset:

date,region,value
1996-12-31,BRA,4.5
2003-12-31,BRA,3.7
2007-12-31,BRA,2.2
1995-12-31,COL,6.3
2000-12-31,COL,4.9
2005-12-31,COL,5.1
2010-12-31,COL,3.4
1997-12-31,PAN,6.3
2003-12-31,PAN,5.1
2008-12-31,PAN,3.9
1990-12-31,VEN,6.7
1991-12-31,VEN,5.4
1992-12-31,VEN,4.5
1993-12-31,VEN,4
1994-12-31,VEN,3.9
1995-12-31,VEN,4.1
1996-12-31,VEN,4.4
1997-12-31,VEN,4.5
1998-12-31,VEN,4.6
1999-12-31,VEN,4.1
2000-12-31,VEN,3.9
2007-12-31,VEN,3.7

If I read this in using

data = pd.read_csv('./data.csv', parse_dates=['date'], index_col='date')

and then try and plot it using

data.groupby('region').value.plot(legend=True)

I get more or less what I expect (perhaps the xlim doesn't go up to 2010-12-31, but otherwise fine).

If I delete the BRA rows and try this again, I get:

Traceback (most recent call last):
  File "repro.py", line 6, in <module>
    data.groupby('region').value.plot()
  File "/home/andrew/git/pandas-rosnfeld/pandas/core/groupby.py", line 342, in wrapper
    return self.apply(curried)
  File "/home/andrew/git/pandas-rosnfeld/pandas/core/groupby.py", line 428, in apply
    return self._python_apply_general(f)
  File "/home/andrew/git/pandas-rosnfeld/pandas/core/groupby.py", line 432, in _python_apply_general
    self.axis)
  File "/home/andrew/git/pandas-rosnfeld/pandas/core/groupby.py", line 958, in apply
    res = f(group)
  File "/home/andrew/git/pandas-rosnfeld/pandas/core/groupby.py", line 426, in f
    return func(g, *args, **kwargs)
  File "/home/andrew/git/pandas-rosnfeld/pandas/core/groupby.py", line 333, in curried
    return f(x, *args, **kwargs)
  File "/home/andrew/git/pandas-rosnfeld/pandas/tools/plotting.py", line 1921, in plot_series
    plot_obj.generate()
  File "/home/andrew/git/pandas-rosnfeld/pandas/tools/plotting.py", line 912, in generate
    self._make_plot()
  File "/home/andrew/git/pandas-rosnfeld/pandas/tools/plotting.py", line 1379, in _make_plot
    self._make_ts_plot(data, **self.kwds)
  File "/home/andrew/git/pandas-rosnfeld/pandas/tools/plotting.py", line 1450, in _make_ts_plot
    _plot(data, 0, ax, label, self.style, **kwds)
  File "/home/andrew/git/pandas-rosnfeld/pandas/tools/plotting.py", line 1434, in _plot
    style=style, **kwds)
  File "/home/andrew/git/pandas-rosnfeld/pandas/tseries/plotting.py", line 82, in tsplot
    left, right = _get_xlim(ax.get_lines())
  File "/home/andrew/git/pandas-rosnfeld/pandas/tseries/plotting.py", line 226, in _get_xlim
    left = min(x[0].ordinal, left)
AttributeError: 'datetime.datetime' object has no attribute 'ordinal'

If I delete both the BRA and VEN rows, no exception is raised, but only one series is plotted and the x-axis is not formatted as dates.

One could also approach this whole exercise via something like

data = pd.read_csv('./data.csv', parse_dates=['date'])
data.pivot(index='date', columns='region', values='value').plot()

but this works even worse: I just get a truncated VEN series and nothing else.

This is with current master pandas (but also happens in 0.13.1) and matplotlib 1.3.1.

Are there known issues with plotting sparse-yet-overlapping timeseries?


rosnfeld · 2014-03-11T23:30:31Z

I guess if I do

data.pivot(index='date', columns='region', values='value').interpolate().plot()

or

pivoted = data.pivot(index='date', columns='region', values='value')
pivoted.index = pd.to_datetime(pivoted.index)
pivoted.interpolate(method='time').plot()

I get something like what I wanted, because it fills in the missing values. Interpolate is a new feature I hadn't seen before (cool!).
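For unevenly spaced dates like these, the choice of interpolation method matters. A small sketch with made-up values (not from the dataset above): the default linear interpolation treats rows as equally spaced, while method='time' weights by the actual gaps in the DatetimeIndex.

```python
import numpy as np
import pandas as pd

# Three observations with uneven gaps: 1 day, then 2 days (illustrative data).
idx = pd.to_datetime(["2000-01-01", "2000-01-02", "2000-01-04"])
s = pd.Series([1.0, np.nan, 4.0], index=idx)

# Default 'linear' ignores the index values and treats rows as equally spaced.
linear = s.interpolate()
# 'time' uses the DatetimeIndex: the gap sits 1/3 of the way through 3 days.
by_time = s.interpolate(method="time")

print(linear.iloc[1])   # 2.5
print(by_time.iloc[1])  # 2.0
```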

Maybe this is all user error, but I have been doing groupby plotting like this for a while and getting what looked like correct results. I feel there may actually be a bug here somewhere; that groupby behavior seems so bizarre.

rosnfeld · 2014-03-12T00:07:10Z

Actually, interpolation is not really what I want, as it makes it seem as if there is more data than is actually present. I basically just want to see the various region series all plotted as they would be if they were plotted individually, except all together on the same axes.

(and a loop like

import matplotlib.pyplot as plt
import pandas as pd

figure = plt.figure()
ax = figure.gca()
data = pd.read_csv('./data.csv', parse_dates=['date'])
for region in data.region.unique():
    # select one region and plot it on the shared axes
    subset = data[data.region == region]
    subset = subset.set_index('date')
    subset.value.plot(ax=ax, label=region)

seems to just overwrite the axes.)

jorisvandenbossche · 2014-03-12T14:49:05Z

I think the groupby/plot issue certainly looks like a bug. I can't fully put my finger on it, but I think it has something to do with combining regular and irregular timeseries.

The xlim not covering all the data happens because it is updated by the last group plotted (which here is a smaller group); I think this is a separate issue (you can open another issue for that).

You seem to have already figured out why you get almost no points on the plot with pivot: it is indeed because of all the NaN values. You can also deal with this by plotting points instead of lines.
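A minimal sketch of the "points instead of lines" suggestion, assuming a subset of the dataset above (COL and PAN only) read from an in-memory string and the headless Agg backend. After pivoting, each region column is NaN wherever another region has an observation, so plain lines break into few or no segments; markers keep every observation visible.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Subset of the dataset from the original report (COL and PAN rows only).
csv = """date,region,value
1995-12-31,COL,6.3
2000-12-31,COL,4.9
2005-12-31,COL,5.1
2010-12-31,COL,3.4
1997-12-31,PAN,6.3
2003-12-31,PAN,5.1
2008-12-31,PAN,3.9
"""
data = pd.read_csv(io.StringIO(csv), parse_dates=["date"])
pivoted = data.pivot(index="date", columns="region", values="value")

# No date is shared between regions, so every row has one NaN;
# marker="o" draws each observation even where no line segment can be drawn.
ax = pivoted.plot(marker="o")
plt.close(ax.figure)
```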

rosnfeld · 2014-03-13T03:04:15Z

Thanks @jorisvandenbossche, it looks like the "separate issue" is already filed as #2960 . So this one is just the groupby/plot weirdness.

Did you mean to add to your comment? (it ends in what looks like the start to some code)

jorisvandenbossche · 2014-03-13T09:22:05Z

@rosnfeld ah yes, I first wanted to add a code snippet showing an easier way to do your last example, but it hit the same bug, and I forgot to remove the start of it. Removed it now.

jreback · 2014-04-21T17:16:55Z

@rosnfeld @jorisvandenbossche

so this is the exception that's in the top section?

what is causing this?

rosnfeld · 2014-04-22T02:26:07Z

Yeah, the exception is the most alarming thing, though changing the data slightly causes some other incorrect behavior (missing/incorrect plotting, which is harder to spot/diagnose).

I don't know what's causing it without further investigation, but I can try to investigate and hopefully submit a fix. (it will be my first time digging into the plotting code, not sure how involved it is)

TomAugspurger · 2014-04-22T14:01:32Z

The error is occurring in

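import numpy as np

def _get_xlim(lines):
    # Assumes each line's x data holds Period objects (as from a PeriodIndex);
    # a line plotted from plain datetimes has no .ordinal and raises here.
    left, right = np.inf, -np.inf
    for l in lines:
        x = l.get_xdata()
        left = min(x[0].ordinal, left)
        right = max(x[-1].ordinal, right)
    return left, right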

The line left = min(x[0].ordinal, left) apparently expects the x data to be a PeriodIndex.
For whatever reason, when you select sub = data[data.region != 'BRA'] and plot that, you get an array of datetime.datetime objects at that point, instead of a PeriodIndex.
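The failure in _get_xlim can be reproduced without pandas or matplotlib. This is a minimal sketch, not pandas code: FakePeriod is a hypothetical stand-in for a Period that only carries an integer ordinal, and get_xlim takes raw x arrays instead of matplotlib lines.

```python
import datetime

import numpy as np


class FakePeriod:
    """Hypothetical stand-in for a pandas Period: only carries an integer ordinal."""

    def __init__(self, ordinal):
        self.ordinal = ordinal


def get_xlim(xdatas):
    # Same logic as the quoted _get_xlim, but over raw x arrays.
    left, right = np.inf, -np.inf
    for x in xdatas:
        left = min(x[0].ordinal, left)
        right = max(x[-1].ordinal, right)
    return left, right


# Annual period ordinals count years since 1970: 1995 -> 25, 2005 -> 35.
period_x = [FakePeriod(25), FakePeriod(35)]
# A line plotted via the regular (non-period) path keeps plain datetimes.
datetime_x = [datetime.datetime(1997, 12, 31), datetime.datetime(2008, 12, 31)]

print(get_xlim([period_x]))  # (25, 35)

error = None
try:
    get_xlim([period_x, datetime_x])
except AttributeError as exc:
    error = exc
print(error)  # 'datetime.datetime' object has no attribute 'ordinal'
```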

I'm not too familiar with our Datetime code, but does anyone know why these aren't the same?

In [40]: import datetime
In [41]: b = datetime.datetime(1997, 12, 31, 0, 0)

In [42]: y = pd.Period(year=1997, month=12, day=31, hour=0, minute=0, freq='S')

In [43]: b.toordinal()
Out[43]: 729389

In [44]: y.ordinal
Out[44]: 883526400

jorisvandenbossche · 2014-04-22T14:16:01Z

Some time ago I looked into this (for another issue), and one of the fundamental problems was that pandas' timeseries plotting is split two ways: it uses datetimes for irregular series and ordinals for regular timeseries (which are converted to a PeriodIndex), and those two representations are incompatible with each other (so combining both types in certain ways causes problems).

But I have to dig it up again to fully remember (I have an overview of the problem somewhere, but never finished it). I don't know if I will find time in the short term, but I will try.

jorisvandenbossche · 2014-04-22T14:19:25Z

And datetime.datetime.toordinal is in days, while pandas.Period.ordinal is in the frequency you specified (in this case seconds).
Plus, datetime.datetime.toordinal counts from 0001-01-01, while the pandas base is 1970-01-01.
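A short sketch of the two differences (unit and epoch), using the same date as the session above. A daily Period is used here so both ordinals are in days; multiplying by 86400 recovers the seconds-frequency ordinal shown earlier.

```python
import datetime

import pandas as pd

b = datetime.datetime(1997, 12, 31)

# Python's toordinal(): days since 0001-01-01 (which is day 1).
py_days = b.toordinal()

# A pandas Period ordinal counts units of its frequency since 1970-01-01.
day_ordinal = pd.Period(b, freq="D").ordinal

# Aligning the epochs: subtract the Python ordinal of 1970-01-01.
epoch_offset = datetime.datetime(1970, 1, 1).toordinal()

print(py_days)                 # 729389
print(day_ordinal)             # 10226
print(py_days - epoch_offset)  # 10226
print(day_ordinal * 86400)     # 883526400, the seconds-frequency ordinal above
```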

rosnfeld · 2014-04-22T14:20:08Z

Yes, I looked at this a little last night, and agree with what you're saying - when you whittle the dataset down to just COL/PAN/VEN rows and then plot, COL and VEN get converted to have PeriodIndexes, but PAN stays with a DatetimeIndex for some reason, and then plotting them all on the same axes (via groupby) blows up somehow.

TomAugspurger · 2014-04-22T15:38:18Z

Thanks.

@rosnfeld you may want the x_compat=True keyword argument to plot. That seems to "solve" the problem
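Applied to the simplified repro at the top of this thread, the workaround looks like this. The sketch assumes the headless Agg backend; x_compat=True disables the period-based time-series path described in this thread, so all three lines use the regular datetime representation.

```python
import datetime

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

s1 = pd.Series([1, 2, 3], index=[datetime.datetime(1995, 12, 31),
                                 datetime.datetime(2000, 12, 31),
                                 datetime.datetime(2005, 12, 31)])
s2 = pd.Series([1, 2, 3], index=[datetime.datetime(1997, 12, 31),
                                 datetime.datetime(2003, 12, 31),
                                 datetime.datetime(2008, 12, 31)])

# Pass x_compat=True on every call so no series takes the PeriodIndex path.
ax = s1.plot(x_compat=True)
s2.plot(ax=ax, x_compat=True)
s1.plot(ax=ax, x_compat=True)

n_lines = len(ax.get_lines())
plt.close(ax.figure)
```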

rosnfeld · 2014-04-23T02:23:51Z

Indeed, thanks! I actually hadn't seen that option before. It also fixes the "missing series" variant I mentioned in the original description. The bad xlim variant for the original dataset still remains, though.

I'll try and dig into things a bit and see what can be done - I presume that not requiring x_compat is desirable.

TomAugspurger · 2014-04-23T02:45:42Z

Yeah it would be desirable. May be tricky though. I'm guessing that argument was added for cases precisely like this one.

rosnfeld · 2014-04-23T04:25:16Z

For a bit more detail, I think this is what's going on:

pandas uses special timeseries plotting if it can infer a "periodic" frequency from a series. While it takes a bit of digging, this is part of the _use_dynamic_x() check in tools/plotting.py:

    def _make_plot(self):
        # this is slightly deceptive
        if not self.x_compat and self.use_index and self._use_dynamic_x():
            data = self._maybe_convert_index(self.data)
            self._make_ts_plot(data)
        else:
            ...  # regular plotting

This special tseries logic converts plotted series to use a PeriodIndex, and sets a "base" version of the frequency on the axes object for later reference. (Note that x_compat disables all of this and uses regular, non-tseries plotting)

The first series to be plotted in my dataset (COL) gets a frequency of '5A-DEC', which can be converted to a period. In the timeseries plotting code the "base" version of this frequency ('A-DEC') gets assigned to the axes object.
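The inferred frequency for the COL dates can be checked directly. This is a sketch; the exact alias string depends on the pandas version (older releases, as in this thread, spell it '5A-DEC', while recent ones use '5YE-DEC'), so only the multiplier and the "base" (n=1) offset are asserted.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

col_dates = pd.to_datetime(["1995-12-31", "2000-12-31", "2005-12-31", "2010-12-31"])

freq = pd.infer_freq(col_dates)
print(freq)  # '5A-DEC' in older pandas, '5YE-DEC' in recent versions

offset = to_offset(freq)
print(offset.n)     # 5: one step covers five years
print(offset.base)  # the same anchored annual offset with n=1
```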

The 3-item DatetimeIndex of 1997-12-31, 2003-12-31, and 2008-12-31 for the next series (PAN) has a surprising inferred frequency of 'WOM-5WED' since 1997, 2003, and 2008 all ended on a Wednesday (the 5th Wednesday of December). pandas can't convert frequencies like that to periods, so it uses regular plotting rather than the special timeseries plotting for that series, and its index is not converted to PeriodIndex. It doesn't try and use the axes frequency since it has already inferred a frequency for this series.
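The 'WOM-5WED' coincidence can be verified by hand with the standard library. This sketch only mirrors the week-of-month labeling (week number from the day of month, plus weekday name); it is not pandas' inference code. It also shows why no annual rule fits: the year gaps are not constant.

```python
import datetime

WEEKDAYS = ("MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN")

pan_dates = [datetime.date(1997, 12, 31),
             datetime.date(2003, 12, 31),
             datetime.date(2008, 12, 31)]

# Week of month is 1-based: days 29-31 fall in week 5.
rules = {f"WOM-{(d.day - 1) // 7 + 1}{WEEKDAYS[d.weekday()]}" for d in pan_dates}
print(rules)  # {'WOM-5WED'}: every date is the 5th Wednesday of December

# Gaps of 6 and then 5 years, so a fixed annual multiple cannot describe them.
year_gaps = [b.year - a.year for a, b in zip(pan_dates, pan_dates[1:])]
print(year_gaps)  # [6, 5]
```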

The next and final series (VEN) does not have an inferred frequency, so it inherits the axes frequency and tries to use tseries plotting again. Tseries plotting recalculates the xlim to include all data, so it looks at the lines already plotted, but it assumes all existing lines have PeriodIndex data. It blows up when it tries to access 'ordinal' on the datetime entries from the earlier (PAN) series.

I'm not sure what the right fix is here. Frequency inference clearly makes some interesting choices that other parts of the codebase rely on, so I'm not sure whether the frequency inference or the way it is used should be modified. Timeseries plotting should perhaps tolerate non-PeriodIndex data when calculating the xlim, though I don't yet understand much of that code, e.g. why PeriodIndex is desirable.

jreback · 2014-04-23T13:23:19Z

@rosnfeld so this occurs when you have multiple series overlaid on the same plot and one is converted to a PeriodIndex for display while another is not.

can you edit the top of the post to make it easily copy-pastable for the failing case?

rosnfeld · 2014-04-23T15:16:10Z

I added an even simpler example at the start of the post. No groupby or anything, just plot a timeseries with an inferred frequency that can be converted to a period, then one that can't, then the first one again, all on the same axes, and you get the same stack trace.

Hope that's along the lines of what you were looking for.

jreback · 2014-04-23T15:19:44Z

ok... I guess the solution is for the plotting routines to check whether there is already a plot on the axis with a conflicting axis/index, and then handle the current plot better.

I am not sure whether this involves too much introspection or is even possible (e.g. you would have to get the index state from the axis, and I'm not sure enough is saved there to figure out what is going on).
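The check suggested above could look something like this minimal sketch. The helper has_conflicting_x is hypothetical (not a pandas function), and types.SimpleNamespace objects stand in for Period values; with real axes one would presumably pass [line.get_xdata() for line in ax.get_lines()], though what get_xdata() returns depends on how each line was plotted.

```python
import datetime
from types import SimpleNamespace


def has_conflicting_x(xdatas):
    """Hypothetical helper: True if existing lines mix period-style x values
    (objects exposing an ``ordinal`` attribute) with plain datetimes."""
    kinds = set()
    for x in xdatas:
        if len(x) == 0:
            continue  # empty lines carry no representation information
        kinds.add("period" if hasattr(x[0], "ordinal") else "datetime")
    return len(kinds) > 1


# Stand-ins: period-style x data (annual ordinals) and plain datetime x data.
period_x = [SimpleNamespace(ordinal=25), SimpleNamespace(ordinal=35)]
datetime_x = [datetime.datetime(1997, 12, 31), datetime.datetime(2008, 12, 31)]

print(has_conflicting_x([period_x]))              # False
print(has_conflicting_x([period_x, datetime_x]))  # True
```

If such a check returned True, the plotting code could fall back to the regular (x_compat-style) path instead of assuming every line has ordinals.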

@rosnfeld give it a shot?

jreback · 2014-04-23T15:20:07Z

moving to 0.15, but if you are able to figure out soon can move back

rosnfeld · 2014-04-23T15:31:25Z

Sure, I can take a shot at it. I'm optimistic something can be done.

rosnfeld · 2014-07-02T00:20:05Z

I think this one should be re-opened - my bad for having a comment in #7322 saying "this does not fix #6608".

However, I think @sinhrks has some PRs that look to affect this behavior somewhat, changing this issue if not closing it.

sinhrks · 2014-07-05T04:40:27Z

#7459 partially fixes this so that it no longer raises AttributeError.

However, it cannot yet set the correct xlim and formatter. The result after #7459 is shown below.

[Image: plot output after #7459]

rosnfeld · 2014-07-05T08:56:30Z

Well, regular and irregular series have quite different ordinals, as in @TomAugspurger's comment above, so I think the problem unfortunately goes deeper than just the xlim/representation. A solution might be to rework _use_dynamic_x() (in tools/plotting.py) to better catch cases that might mix the two.

jreback · 2014-10-04T19:54:03Z

@TomAugspurger push?

jreback · 2014-10-05T15:26:18Z

@TomAugspurger status (pushing #7670) ok, so push this as well

jorisvandenbossche · 2016-10-01T14:37:36Z

It looks like this issue is solved in the meantime. At least the simplified example at the top now works correctly for me.

@rosnfeld Would you be able to test with your more complex example as well?

rosnfeld · 2016-10-01T16:58:40Z

Yes! I tested with the more complex example and everything works now. (as of 0.18.1)

rosnfeld · 2016-12-16T13:59:19Z

I see this is still open - should I close it?

Or do people want some unit tests to explicitly try to protect against this happening again? Unfortunately given our (or at least my) incomplete understanding of why it was happening and how it has since been fixed, perhaps the best we could do would be writing a test that would have failed against code from a couple of years ago.

Not sure what community practice is on things like this.

jorisvandenbossche · 2016-12-16T14:03:03Z

Yes, I would first like to see a test added to confirm this (and keep it working!). A PR very welcome!
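A minimal pytest-style regression test for the simplified repro at the top of this thread might look like the sketch below. The test name is hypothetical, and the Agg backend is assumed so it runs headless; the test fails if any of the three overlaid plot calls raises.

```python
import datetime

import matplotlib
matplotlib.use("Agg")  # headless backend for test runs
import matplotlib.pyplot as plt
import pandas as pd


def test_overlay_period_and_irregular_series():
    """Plot a period-convertible series, then one whose inferred frequency
    cannot become a period, then the first one again, on the same axes."""
    s1 = pd.Series([1, 2, 3], index=[datetime.datetime(1995, 12, 31),
                                     datetime.datetime(2000, 12, 31),
                                     datetime.datetime(2005, 12, 31)])
    s2 = pd.Series([1, 2, 3], index=[datetime.datetime(1997, 12, 31),
                                     datetime.datetime(2003, 12, 31),
                                     datetime.datetime(2008, 12, 31)])
    fig, ax = plt.subplots()
    try:
        s1.plot(ax=ax)
        s2.plot(ax=ax)
        s1.plot(ax=ax)  # raised AttributeError ('ordinal') in 0.13.1
        assert len(ax.get_lines()) == 3
    finally:
        plt.close(fig)
```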


jorisvandenbossche added the Bug label Mar 12, 2014

jorisvandenbossche added Visualization labels Mar 12, 2014

jreback added this to the 0.14.0 milestone Mar 13, 2014

jreback modified the milestones: 0.15.0, 0.14.0 Apr 23, 2014

rosnfeld mentioned this issue Apr 28, 2014

Plotting irregular timeseries with different indexes: last plotted timeseries dictates xaxis range #2960

Closed

rosnfeld mentioned this issue Jun 3, 2014

BUG: xlim on plots with shared axes (GH2960, GH3490) #7322

Merged

jreback closed this as completed in #7322 Jul 1, 2014

jreback reopened this Jul 2, 2014

sinhrks mentioned this issue Jul 5, 2014

(WIP) BUG/CLN: Better timeseries plotting / refactoring tsplot #7670

Closed


jreback modified the milestones: 0.15.0, 0.15.1 Jul 6, 2014

jreback modified the milestones: 0.15.1, 0.15.0 Oct 6, 2014

jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015

jorisvandenbossche mentioned this issue Jun 1, 2016

plotting mangled with DatetimeIndex to ax with sharex and dissimilar indices #13341

Closed

jorisvandenbossche mentioned this issue Oct 1, 2016

BUG in plotting timeseries data with twinx (different data representation on each ax) #14322

Open

jorisvandenbossche added Difficulty Novice and Testing labels and removed Bug label Dec 16, 2016

jorisvandenbossche mentioned this issue Jan 5, 2017

API/CLN: timeseries plotting #15071

Open

huguesv mentioned this issue May 23, 2017

BUG: strange timeseries plot behavior #16461

Merged

jreback modified the milestones: 0.21.0, Next Major Release May 24, 2017

jreback closed this as completed in b0038ac May 24, 2017

stangirala pushed a commit to stangirala/pandas that referenced this issue Jun 11, 2017

BUG: strange timeseries plot behavior (pandas-dev#16461)

b6ca76a

validation tests, closes pandas-dev#6608.
