BUG in plotting timeseries data with twinx (different data representation on each ax) #14322

Open
cygenb0ck opened this Issue Sep 29, 2016 · 8 comments

3 participants

cygenb0ck commented Sep 29, 2016 edited

Simplified the example.
During issue reporting I updated pandas from 0.13 to 0.18.1. With version 0.13 I was able to add the whole DataFrame to my plot; the error only appeared after I selected some rows between dates. After the update the behaviour became worse: adding the whole DataFrame to the plot now also produces the error.

A small, complete example of the issue

import pandas
import dateutil.parser
import matplotlib.pyplot as plt

p_vals = {
    'x_vals' : [
        "2006-12-17 00:00:00+01:00",
        "2006-12-18 00:00:00+01:00",
        "2006-12-19 00:00:00+01:00",
        "2006-12-20 00:00:00+01:00",
        "2006-12-21 00:00:00+01:00",
        "2006-12-22 00:00:00+01:00",
        "2006-12-23 00:00:00+01:00",
        "2006-12-24 00:00:00+01:00",
        "2006-12-25 00:00:00+01:00",
        "2006-12-26 00:00:00+01:00",
    ],
    'y_vals' : [
        10,9,8,7,6,5,4,3,2,1
    ]
}

p_vals2 = {
    'x_vals' : [
        "2006-12-17 00:00:00+01:00",
        "2006-12-18 00:00:00+01:00",
        "2006-12-19 00:00:00+01:00",
        "2006-12-20 00:00:00+01:00",
        "2006-12-21 00:00:00+01:00",
    ],
    'y_vals' : [
        1,2,3,4,5
    ]
}

p_vals['x_vals'] = [ dateutil.parser.parse(x) for x in p_vals['x_vals'] ]
p_vals2['x_vals'] = [ dateutil.parser.parse(x) for x in p_vals2['x_vals'] ]

df = pandas.DataFrame(data = [1,2,3,4,5], index=["2006-12-17","2006-12-18","2006-12-19","2006-12-20","2006-12-21"])
df.index = pandas.to_datetime(df.index, format="%Y-%m-%d")

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()

ax1.plot(p_vals['x_vals'], p_vals['y_vals'], color="r")
#ax2.plot(p_vals2['x_vals'], p_vals2['y_vals'], color="b") # works as intended, see second attached image
df.plot(ax=ax2, color="b") # hides data on ax1, see first image

plt.show()

Expected Output

Output of pd.show_versions()

INSTALLED VERSIONS
------------------

commit: None
python: 3.4.3.final.0
python-bits: 64
OS: Linux
OS-release: 3.19.0-69-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: 1.3.1
pip: 8.1.2
setuptools: 3.3
Cython: None
numpy: 1.11.1
scipy: 0.13.3
statsmodels: None
xarray: None
IPython: 1.2.1
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: None
tables: 3.1.1
numexpr: 2.2.2
matplotlib: 1.5.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.2.1
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None
[image: pandas_how_it_looks]
[image: pandas_how_it_should_look]

Contributor

TomAugspurger commented Sep 29, 2016

Pandas 0.13 is quite old, can you try with a more recent version? Also see if you can simplify your example a bit.

also the x label look strange.

What do you mean by strange?

I just updated to pandas 0.18.1. Sorry for not trying with an updated pandas version first.
Now both of my plot calls hide the data on the first axis.

Sorry for my bad wording. By strange I just meant that it looks different.

@cygenb0ck Can you try to simplify the example? E.g., try to make it reproducible by creating the data in code instead of reading a csv file. Also try to remove anything that is not essential to the problem.

cygenb0ck changed the title from After selecting rows between dates and plotting with matplotlib, plotted rows hide first axis to Plotting DataFrame on second axis hides data on first axis - was: Plotting a DataFrame on second axis hides data on first axis Oct 1, 2016

@jorisvandenbossche
simplified the example and changed the subject

cygenb0ck changed the title from Plotting DataFrame on second axis hides data on first axis - was: Plotting a DataFrame on second axis hides data on first axis to Plotting DataFrame on second axis hides data on first axis - was: After selecting rows between dates and plotting with matplotlib, plotted rows hide first axis Oct 1, 2016

@cygenb0ck Thanks a lot! That let me look at it, and it's a bit of a gotcha with the dates.

To start, it's not an issue with the twinx. E.g., if you try the following similar example (but without using datetimes), you will see it works as expected:

import matplotlib.pyplot as plt
import pandas as pd

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()

ax1.plot([1,3,2], color="r")

df3 = pd.DataFrame({'col': [2,5,1]})
df3.plot(ax=ax2, color="b")
# df3['col'].plot(ax=ax2, color="b") # to plot one column not full dataframe

The reason it does not work with the example data you gave is not that the plot is overwritten, but that the data on the first ax now fall outside the visible plot (if you zoom out enough, you will see both lines). This is because the dates are handled differently in the two cases.
The underlying cause is a problem in pandas' plotting machinery when combining irregular and regular time series in one plot. Because your data on ax1 have hours (although at daily freq), they are regarded as irregular, while the data on ax2 are regular. Related issues are #6608, #9053, #13341. We should definitely solve this ...
However, in this case it also seems specific to using twinx, as not using it solves the issue as well (the second data are then plotted fine).
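
To illustrate what "handled differently" can mean here, the following is a minimal sketch, not pandas' actual plotting code. It assumes (my reading, not something stated above) that matplotlib places datetimes on the axis as floating-point date numbers, while pandas' time series plotting places a regular daily index as integer period ordinals. The helper name x_representations is made up for this illustration. No concrete values are shown, because the matplotlib date epoch depends on the installed version.

```python
import matplotlib.dates as mdates
import pandas as pd


def x_representations(day):
    """Return (matplotlib date number, pandas daily period ordinal) for one day.

    Hypothetical helper for illustration only; it just calls public
    matplotlib/pandas functions and does not touch any plotting internals.
    """
    ts = pd.Timestamp(day)
    # matplotlib: float count of days since its (version-dependent) epoch.
    mpl_num = float(mdates.date2num(ts.to_pydatetime()))
    # pandas: integer ordinal of the daily Period containing the timestamp.
    period_ordinal = ts.to_period("D").ordinal
    return mpl_num, period_ordinal


if __name__ == "__main__":
    for day in ["2006-12-17", "2006-12-21"]:
        print(day, x_representations(day))
```

If the two numbers for the same day are far apart on your installation, lines drawn through the two code paths end up far apart on a shared x axis, which would match the "zoom out enough" observation.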

A workaround you can use for now is to also plot on ax2 with the matplotlib plot call:

fig, ax1 = plt.subplots()
ax1.plot(p_vals['x_vals'], p_vals['y_vals'], color="r")
ax2 = ax1.twinx()
ax2.plot(df.index, df[0].values, color="b")

jorisvandenbossche changed the title from Plotting DataFrame on second axis hides data on first axis - was: After selecting rows between dates and plotting with matplotlib, plotted rows hide first axis to BUG in plotting timeseries data with twinx (different data representation on each ax) Oct 1, 2016

Apparently, using x_compat=True is also a way to get this working:

fig, ax1 = plt.subplots()
ax1.plot(p_vals['x_vals'], p_vals['y_vals'], color="r")
ax2 = ax1.twinx()
df.plot(ax=ax2, x_compat=True, color="b")

It's mentioned in the docs: http://pandas.pydata.org/pandas-docs/stable/visualization.html#suppressing-tick-resolution-adjustment (although for another reason, I am not that familiar with this keyword)
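
For anyone who wants to try this without the earlier snippets, here is a self-contained sketch of the x_compat=True workaround. The data mirror the report (ten tz-aware daily points on ax1, five daily rows on ax2), but the function name plot_with_x_compat, the column name "col", and the Agg backend line are choices made for this illustration. Whether the plain df.plot call misbehaves at all may depend on your pandas and matplotlib versions, so treat this as a sketch and not a guaranteed reproduction.

```python
from datetime import datetime, timedelta, timezone

import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd


def plot_with_x_compat():
    """Plot plain datetimes on ax1 and a DataFrame on a twinx axis with x_compat=True."""
    # ax1 data: tz-aware timestamps at UTC+01:00, one per day.
    tz = timezone(timedelta(hours=1))
    x1 = [datetime(2006, 12, 17, tzinfo=tz) + timedelta(days=i) for i in range(10)]
    y1 = list(range(10, 0, -1))

    # ax2 data: a DataFrame with a regular daily DatetimeIndex.
    df = pd.DataFrame(
        {"col": [1, 2, 3, 4, 5]},
        index=pd.date_range("2006-12-17", periods=5, freq="D"),
    )

    fig, ax1 = plt.subplots()
    ax1.plot(x1, y1, color="r")
    ax2 = ax1.twinx()
    # x_compat=True asks pandas to skip its own time series x-axis handling.
    df.plot(ax=ax2, x_compat=True, color="b")
    return fig, ax1, ax2


if __name__ == "__main__":
    fig, ax1, ax2 = plot_with_x_compat()
    fig.savefig("twinx_x_compat.png")
```

With an interactive backend, calling plt.show() instead of savefig should display both the red and the blue line.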

@jorisvandenbossche
thank you very much for the workaround with x_compat=True. I can finally plot my data and continue my project.

This was only partly closed by #14330 (this example still does not work when the irregular series is plotted first; #14330 added the test but commented it out).
