don't register pandas mpl unit converters upon import #2579

Open
changhiskhan opened this Issue Dec 21, 2012 · 25 comments

@wesm
Member
wesm commented Jan 20, 2013

Is this critical for 0.10.1?

@changhiskhan
Contributor

no, it's also a slog. lots of separate plotting functions that need to register the unit converters

@ruidc
Contributor
ruidc commented Mar 27, 2013

What about putting them all in a single registration method that's only called when plotting?

@wesm
Member
wesm commented Apr 8, 2013

Pushing past 0.11

@jreback
Contributor
jreback commented Sep 22, 2013

@cpcloud is this still an issue?

@cpcloud
Contributor
cpcloud commented Sep 22, 2013

those are still registered...so, yes

@cpcloud
Contributor
cpcloud commented Sep 22, 2013

i can take a look

@cpcloud cpcloud was assigned Sep 22, 2013
@cpcloud
Contributor
cpcloud commented Sep 27, 2013

this is super low priority ... pushing to 0.14

@jreback jreback modified the milestone: 0.15.0, 0.14.0 Feb 18, 2014
@cpcloud cpcloud removed their assignment Feb 21, 2014
@rhattersley

this is super low priority

But also super annoying if you're on the receiving end of it. You can get away with it if you're only using pandas for your plots, but this kind of side-effect is really not pleasant when pandas is used as one component amongst several.

register it upon plotting?

This is only a small improvement - the basic problem of global side-effects still exists.


So I'd much rather make it an explicit user action (with another action available to undo it), and/or a temporary state change that only persists for the lifetime of the pandas plotting routines.

For example, an explicit approach might be:

# <user code>
with pandas.use_nice_date_formats():
    plt.plot(...)

Whereas the automatic temporary state change might look (logically) like:

# <pandas implementation>
class Series(...):
    def plot(...):
        with pandas.use_nice_date_formats():
            plt.plot(...)

# <user code>
ts.plot()
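
The temporary state change described above could be sketched roughly as follows. This is only an illustration, not pandas code: `temporary_converters` is a hypothetical name, and a plain dict stands in for `matplotlib.units.registry` (which is dict-like) so the sketch runs without matplotlib installed.

```python
from contextlib import contextmanager


@contextmanager
def temporary_converters(registry, converters):
    """Install `converters` into `registry` only for the duration of the block."""
    saved = dict(registry)           # snapshot the state before touching it
    registry.update(converters)      # overrides and additions take effect here
    try:
        yield registry
    finally:
        registry.clear()             # restore exactly what was there before,
        registry.update(saved)       # even if the block raised an exception


# Usage, with a plain dict standing in for the real registry (assumption).
registry = {"datetime": "matplotlib default"}
with temporary_converters(registry, {"datetime": "pandas", "Period": "pandas"}):
    inside = dict(registry)          # what a plotting call would see
```

Restoring from a snapshot, rather than popping the keys that were added, matters when a converter replaces an entry that already existed: the original entry comes back on exit instead of disappearing.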
@jorisvandenbossche
Member

For me personally the explicit approach (with ...:) is out of the question: being able to just plot a timeseries with ts.plot() and actually see something sensible on the xlabels is really a strength of pandas, certainly for interactive exploratory work (matplotlib really does a bad job at this).
That does not mean such a context manager could not be a useful feature in other circumstances (or a way to implement the automatic state change).

The automatic temporary state change sounds more attractive to me as a user, but I am not familiar enough with the plotting code to know if this would be easily possible to implement.

@rhattersley

My preference would be to have both. The pandas date formatters do a better job than the defaults (which is why they exist!) so it would be nice to be able to use them (in a controlled fashion!) in other circumstances.

@ruidc
Contributor
ruidc commented Jun 27, 2014

In case it helps someone else: I've been working around it with the following in our startup:


def replace_pandas_mpl_conversions():
    try:
        import matplotlib.dates #to force inbuilt type registration
        d = matplotlib.units.registry.copy() #get state
        import pandas #replaces some matplotlib entries
    except ImportError:
        return
    #reinstate previous registrations
    matplotlib.units.registry.clear()
    matplotlib.units.registry.update(d)

The first problem is not knowing that pandas is taking over - which is a no-no in my book. I agree that it should be explicit and able to be turned off in a setting - we personally do not use matplotlib via pandas but only directly.

@jreback jreback modified the milestone: 0.16.0, 0.17.0 Jan 26, 2015
@TomAugspurger
Contributor

We should be able to wrap something like @ruidc's code up in an option. I'll see if I have time today.

@ocehugo
ocehugo commented Jul 21, 2016

@TomAugspurger,

I just want to add how super annoying this is and urge a fix so we can all be happy fellas.

Every package that imports pandas causes this.

So it breaks all matplotlib calls that use datetime objects with old dates.

One might ask: why not go with the flow and use pandas to deal with dates? It is much better... OK, but wait, I can't, since pandas does not support date ranges before late 1677 (out of bounds).

import pandas as pd
xtime = pd.date_range('1/1/1976','11/1/1976',freq='M')
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1676-01-01 00:00:00

Using the kludge above works, but it is still a kludge hidden in here.

@jreback
Contributor
jreback commented Jul 21, 2016

what version of pandas are you showing above ?

@ocehugo
ocehugo commented Jul 21, 2016 edited
@jreback
Contributor
jreback commented Jul 21, 2016

your example of using date_range works fine
you have a copy paste error

@TomAugspurger
Contributor

@ocehugo I ran out of time today, so feel free to work on it. I think a pd.option to toggle the registration is the way to handle it.

@ocehugo
ocehugo commented Jul 21, 2016 edited

@jreback yeah, pasting error (it should be 1676 in xtime). I'm not a heavy pandas user, so I didn't remember the period_range method (I could use it, but that would require me to change all the related code, so it is easier to use the kludge). @TomAugspurger, as I said, I'm not a pandas guy, but if you tell me where this should go I could put something together. I believe the kludge above should be the default so that pandas does not conflict with other packages, but I don't know the consequences of that.

Just to stress the point: I have some source code that uses xarray and some that uses statsmodels. Merely importing xarray makes all my matplotlib plotting functions stop working.

@TomAugspurger
Contributor

@ocehugo thanks for taking this.

The goal is to get something like pd.options.plotting.register_converters (that name isn't set in stone). That can be set to either True or False.
The option you'll add will be in https://github.com/pydata/pandas/blob/master/pandas/core/config_init.py

For backwards compatibility the default will have to be True (register the converters).
The documentation for the config options is at the top of this file. You'll want to use a callback that calls converters.register (see below) each time the option is set / reset.

The actual converters are defined in https://github.com/pydata/pandas/blob/fc16f1fd21aee163e93e5713a0676f7a79838897/pandas/tseries/converter.py#L53

and used in https://github.com/pydata/pandas/blob/fc16f1fd21aee163e93e5713a0676f7a79838897/pandas/tools/plotting.py#L35

I would add an argument to converters.register

def register(present=True):
    pairs = [
        (lib.Timestamp, DatetimeConverter()),
        (Period, PeriodConverter()),
        ...  # the rest of the converters
    ]
    for key, value in pairs:
        if present:
            units.registry[key] = value
        else:
            units.registry.pop(key, None)

And then tests + documentation 😄 Hopefully not too much work, but let me know if you have questions.
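
To make the callback idea concrete, here is a minimal sketch of how an option could drive `register(present=...)`. Everything in it is a stand-in so that it runs without pandas or matplotlib: the `Option` class is hypothetical (it is not the pandas config machinery), the `Timestamp` and `Period` classes are empty placeholders, the converter values are just strings, and `registry` is a plain dict playing the role of `matplotlib.units.registry`.

```python
registry = {}  # stand-in for matplotlib.units.registry (assumption)


class Timestamp:  # placeholder for the real pandas type
    pass


class Period:  # placeholder for the real pandas type
    pass


PAIRS = [(Timestamp, "DatetimeConverter"), (Period, "PeriodConverter")]


def register(present=True):
    """Add the converters when present is True, remove them otherwise."""
    for key, value in PAIRS:
        if present:
            registry[key] = value
        else:
            registry.pop(key, None)


class Option:
    """Hypothetical option holder that fires a callback on every set/reset."""

    def __init__(self, default, callback):
        self.default = default
        self.callback = callback
        self.set(default)            # apply the default once at creation

    def set(self, value):
        self.value = bool(value)
        self.callback(self.value)

    def reset(self):
        self.set(self.default)


# Default is True for backwards compatibility, as described above.
register_converters = Option(default=True, callback=register)
register_converters.set(False)       # the converters are removed again
```

One thing this sketch does not handle: if a converter overrides an entry that matplotlib had already registered, `pop` removes the key entirely rather than putting the earlier converter back. Saving the previous entries at registration time would be one way around that.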

@tacaswell
Contributor

@ocehugo I am impressed by the bread-crumb trail on this issue.

@jorisvandenbossche
Member
jorisvandenbossche commented Jul 21, 2016 edited

An option to configure whether pandas converters are registered for matplotlib or not is certainly a good idea. But can't we also just fix our DatetimeConverter to actually work with all datetime.datetime values?

The following one-line change seems to fix the example from @ocehugo: 095a2ef (just converting the values themselves when to_datetime fails) (but maybe I am missing the complexity of the issue)

You still get the adapted axis formatting from pandas (which you could then turn off with the option), but at least the plots would work.
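
The fallback pattern described here can be sketched as below. This is not the code from the commit: `bounded_convert` is a hypothetical stand-in for the nanosecond-precision path (which only covers roughly the years 1678 through 2261), `OutOfBoundsError` stands in for pandas' OutOfBoundsDatetime, and the float day number is just one plausible axis value.

```python
from datetime import datetime


class OutOfBoundsError(ValueError):
    """Hypothetical stand-in for pandas' OutOfBoundsDatetime."""


def _day_number(dt):
    # Days since 0001-01-01 plus the elapsed fraction of the day, as a float.
    seconds = dt.hour * 3600 + dt.minute * 60 + dt.second + dt.microsecond / 1e6
    return dt.toordinal() + seconds / 86400.0


def bounded_convert(dt):
    # Stand-in for the fast path that cannot represent very old or very
    # distant dates and raises instead.
    if not 1678 <= dt.year <= 2261:
        raise OutOfBoundsError(dt)
    return _day_number(dt)


def convert(dt):
    """Try the bounded path first; convert the value directly if it fails."""
    try:
        return bounded_convert(dt)
    except OutOfBoundsError:
        return _day_number(dt)


old = convert(datetime(1676, 1, 1))          # no longer raises
recent = convert(datetime(2000, 1, 1, 12))   # goes through the bounded path
```

The point of the pattern is that a value the converter cannot handle degrades to a slower, more general conversion instead of turning a previously working plot into an error.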

@ocehugo
ocehugo commented Jul 22, 2016 edited

@tacaswell yeah, I ended up raising issues in almost all the related packages, since I was not expecting that importing a package would cause such a big problem across almost all my source code. Some basic packages need to support pandas, so they import it, which meant the problem was everywhere, and so I blamed matplotlib first, then statsmodels, then pandas. I should have investigated further beforehand, but the problem showed up in almost every test I did until I went back to raw jupyter without my default profile.

@TomAugspurger I will take a look at that, but not until next week. The solution above is tempting, though.

@jorisvandenbossche
Member
jorisvandenbossche commented Jul 22, 2016 edited

@ocehugo If pandas still registered its own converter, but this no longer broke code that runs fine without pandas imported, would that be OK for you?
There would still be differences from importing pandas (e.g. in the axis formatting), but I suppose it would solve the biggest problem?

@sinhrks sinhrks added a commit that referenced this issue Aug 16, 2016
jorisvandenbossche + sinhrks BUG: handle outofbounds datetimes in DatetimeConverter
xref #2579

This at least solves the direct negative consequence (erroring code by importing pandas) of registering our converters by default.

Author: Joris Van den Bossche <jorisvandenbossche@gmail.com>

Closes #13801 from jorisvandenbossche/plot-datetime-converter and squashes the following commits:

6b6b08e [Joris Van den Bossche] BUG: handle outofbounds datetimes in DatetimeConverter
5d791cc