VIS: errorbar plotting #3796

Closed
cpcloud opened this Issue Jun 7, 2013 · 12 comments

Member

cpcloud commented Jun 7, 2013

Quoting @jreback:

Maybe

s = Series

s.plot(error_bar=[-1, 1])

equivalent to:

df = DataFrame(dict(s=s, top=s.std(), bot=-s.std())).plot()

The bottom form seems the right way to do this.

cpcloud referenced this issue Jun 7, 2013: error bars #3774 (Closed)

ghost assigned cpcloud Jul 3, 2013

Contributor

alefnula commented Sep 5, 2013

It would also be great if something like this could work:

df = pd.DataFrame({'row': list('abc' * 6),
                   'col': list('xxxxxxyyyyyyzzzzzz'),
                   'value': np.random.randn(18)+5})
mean = df.pivot_table(values='value', rows='row',
                      cols='col', aggfunc='mean')
err  = df.pivot_table(values='value', rows='row',
                      cols='col', aggfunc='std')
mean.plot(kind='bar', yerr=err)

Or any variation on the subject...
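
For readers on a current pandas release, here is a minimal sketch of the same request. It assumes that `pivot_table` now spells the old `rows`/`cols` keywords as `index`/`columns`, and that `DataFrame.plot` accepts a DataFrame for `yerr`. The seeded random generator and the `Agg` backend are additions for reproducibility and are not part of the original example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only to make the sketch reproducible
df = pd.DataFrame({
    "row": list("abc" * 6),
    "col": list("xxxxxxyyyyyyzzzzzz"),
    "value": rng.standard_normal(18) + 5,
})

# index=/columns= replace the rows=/cols= keywords used in the 2013 example.
mean = df.pivot_table(values="value", index="row", columns="col", aggfunc="mean")
err = df.pivot_table(values="value", index="row", columns="col", aggfunc="std")

# The error frame has the same row and column labels as the means.
ax = mean.plot(kind="bar", yerr=err)
```

Each (row, col) cell holds two samples here, so the standard deviation is defined for every bar.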

Member

cpcloud commented Sep 5, 2013

I've been thinking a little about this API. It's a bit annoying because there are umpteen parameters to matplotlib's errorbar, so it might be worth making a separate method or a function in pandas.tools.plotting. Generally speaking, though, we don't want to pollute the namespace with a method for every kind of plot, but here it might be necessary.

@alefnula Your input on the API is most welcome.

Member

cpcloud commented Sep 5, 2013

On second thought, the most obvious approach, having plot(kind='line', yerr=yerr, xerr=xerr, **kwds), is probably the way to go. Maybe allow yerr and xerr to be column names?
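
A minimal sketch of the column-name idea, assuming a recent pandas release in which `yerr` may be a string naming a column of the plotted DataFrame. The column names `value` and `value_err` are hypothetical and chosen only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import pandas as pd

df = pd.DataFrame({
    "value": [1.0, 2.0, 3.0, 4.0],
    "value_err": [0.1, 0.2, 0.15, 0.3],  # hypothetical error column
})

# The error column is referenced by name; the remaining column is plotted.
ax = df.plot(kind="line", yerr="value_err")
```

The same form would apply to `xerr`, though that is not exercised here.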

Contributor

kdebrab commented Oct 10, 2013

I haven't fully mastered the forking, branching, and pull-request process yet, so I'll try to contribute in this rather primitive way...

alefnula's example works perfectly if one adds the following few lines at line 1488 of pandas.tools.plotting (i.e. inside the for loop of the _make_plot function):

from pandas.core.frame import DataFrame
if 'yerr' in self.kwds and isinstance(self.kwds['yerr'], DataFrame):
    kwds['yerr'] = self.kwds['yerr'][label]

With those lines added, this matplotlib gallery example also becomes straightforward:

import pandas as pd
import matplotlib.pyplot as plt

index = tuple('ABCDE')
means_men = (20, 35, 30, 35, 27)
std_men = (2, 3, 4, 1, 2)
means_women = (25, 32, 34, 20, 25)
std_women = (3, 5, 2, 3, 3)

df_means = pd.DataFrame({'Men':means_men, 'Women':means_women}, index=index)
df_yerr = pd.DataFrame({'Men':std_men, 'Women':std_women}, index=index)

fig, ax = plt.subplots()
df_means.plot(kind='bar',
              ax=ax,
              color=['b','r'],
              alpha=0.4,
              yerr=df_yerr,
              error_kw={'ecolor': '0.3'})
ax.set(xlabel='Group', ylabel='Scores', title='Scores by group and gender')
plt.show()
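
The per-label lookup in the patch above can be illustrated outside pandas. This is only a sketch in plain matplotlib, not the pandas implementation: it loops over the columns and passes each bar group the error column with the matching label. The bar width and tick placement are illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

index = list("ABCDE")
means = pd.DataFrame({"Men": [20, 35, 30, 35, 27],
                      "Women": [25, 32, 34, 20, 25]}, index=index)
errors = pd.DataFrame({"Men": [2, 3, 4, 1, 2],
                       "Women": [3, 5, 2, 3, 3]}, index=index)

fig, ax = plt.subplots()
width = 0.8 / len(means.columns)  # illustrative bar width
positions = np.arange(len(means))
for i, label in enumerate(means.columns):
    # Same idea as the patch: select the error column by the series label.
    ax.bar(positions + i * width, means[label].to_numpy(), width,
           yerr=errors[label].to_numpy(), label=label)

# Center the tick labels under each group of bars.
ax.set_xticks(positions + width * (len(means.columns) - 1) / 2)
ax.set_xticklabels(means.index)
ax.legend()
```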

athrpf commented Oct 23, 2013

kdebrab's solution is definitely the right way to go. However, if no error_kw is provided, matplotlib by default draws the error bars in the same color as the bars, which makes them impossible to see. I would suggest using a different default color for the error bars.
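
As a small sketch of setting the error bar color explicitly, matplotlib's `Axes.bar` takes `ecolor` and `capsize` keywords. The gray level `'0.3'` is the value from the example earlier in the thread; the bar color and cap size are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

means = [20, 35, 30, 35, 27]
stds = [2, 3, 4, 1, 2]

fig, ax = plt.subplots()
# ecolor sets the error bar color independently of the bar face color.
bars = ax.bar(range(len(means)), means, yerr=stds,
              color="b", ecolor="0.3", capsize=4)
```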

arnaldorusso commented Nov 20, 2013

I have tried this too and could not get it working.
I also agree with the method cited by alefnula. It was my first attempt, made without knowing that it would not work.
A method that accepts the error bars as a DataFrame or a Series could be the way to go.

r-b-g-b commented Nov 22, 2013

I've made some headway on implementing this feature (using a lot of the nice suggestions in this thread). Right now my changes allow for error bars on line and bar plots given a list/tuple/ndarray/Series of error values, a DataFrame of label-matched errors, or a column name. There are a couple of things I'm struggling with in terms of code neatness at this point.

Log scale on line plots: currently, depending on the values of the logx/logy keyword arguments, a specific matplotlib function is used (plot, semilogx, semilogy, or loglog). However, the simplest way to get error bars is with matplotlib's errorbar, which can handle log scales by setting them on the axis directly:

ax.set_xscale('log')
ax.set_yscale('log')

I think a clean (but less conservative) way to get around this is to rewrite the line plot code to use matplotlib's plot and handle log axes with ax.set_xscale and ax.set_yscale, which, as far as I can tell, is what semilogx/semilogy do under the hood. Any thoughts? I'll also give it a shot; maybe it's not going to be as disruptive as I suspect.
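
The approach described above can be sketched in plain matplotlib: draw with `errorbar`, then switch the axes to log scale. The data values and the 20% relative error are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

x = np.array([1.0, 10.0, 100.0, 1000.0])
y = np.array([2.0, 15.0, 120.0, 900.0])
yerr = 0.2 * y  # illustrative 20% relative error

fig, ax = plt.subplots()
ax.errorbar(x, y, yerr=yerr, fmt="o-")

# Set the log scales on the axes instead of calling semilogx/semilogy/loglog.
ax.set_xscale("log")
ax.set_yscale("log")
```

Because the scale is a property of the axes, the same drawing call serves linear and log plots alike.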

Contributor

TomAugspurger commented Nov 25, 2013

You can give it a shot. You'll need to be careful if you're changing things with the LinePlot code. Unless I'm looking in the wrong places, I don't think that any of it is tested.

Member

cpcloud commented Nov 25, 2013

It's not directly tested. It's only tested through the user-facing API.


r-b-g-b commented Nov 25, 2013

I'm new to contributing, so I'd welcome any advice on how to make sure this is working properly. I've written some of my own tests, which are basically just running as many different combinations of plot arguments as I can think of. It survives those. Is there anything else I should try?

Contributor

TomAugspurger commented Nov 25, 2013

Have you pushed your code to a branch yet? Try that and then we can look at it.

Here are some notes on contributing; just post if you get stuck somewhere.

r-b-g-b commented Nov 25, 2013

Thanks for the link! I'll try to get up to speed. In the meantime, I've pushed the changes to my GitHub account: https://github.com/gibbonorbiter/pandas.
