truncated violin plots #1494

Closed
ricky-r opened this Issue Mar 19, 2014 · 8 comments

@ricky-r
ricky-r commented Mar 19, 2014

Hi all,

I have a problem with violin plots when I use the log scale on the y axis. The underlying coloured area goes missing in one or two of the plots as soon as I turn on the log scale.
[Image: truncated_violin]
The warning I get on the terminal is:
/usr/local/lib/python2.7/dist-packages/matplotlib/axes.py:1171: UserWarning: aspect is not supported for Axes with xscale=linear, yscale=log
'yscale=%s' % (xscale, yscale))

Is there anything I can do to fix this?

Thanks!

@josef-pkt
Member

@phobson Do you have an idea what the warning might mean?

@ricky-r Can you provide a reproducible example, with real or made up data?
Can you check the smallest values in the first violin plot? Try to clip the smallest values to something larger.
I'm just guessing; we need to figure out whether this is the data, the way the violins are calculated, or something in matplotlib.

Is this with default settings or did you set any options?

It looks strange to me that the violins are not smooth in this case.

@phobson
Contributor
phobson commented Mar 19, 2014

My guess is that matplotlib thinks (and I agree) that setting an aspect ratio on an axes is ambiguous if the scales aren't the same.

If I had to guess, the funky drawing and warning are unrelated. What you're seeing here might be a result of the KDE reaching down into negative values. Like josef said, a working example would help greatly.
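
To illustrate the point about the KDE reaching below zero, here is a minimal sketch in plain Python. The data are made up, the `gaussian_kde` helper is hypothetical (it is not the estimator the plotting code uses), and the bandwidth rule is an assumption. It only shows that a Gaussian kernel estimate can put positive density at negative values even when every sample is strictly positive.

```python
import math
import random


def gaussian_kde(samples, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate.

    Hypothetical helper for illustration only.
    """
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(y):
        return norm * sum(
            math.exp(-0.5 * ((y - s) / bandwidth) ** 2) for s in samples
        )

    return density


random.seed(0)
# Made-up, strictly positive, right-skewed data (lognormal draws).
samples = [random.lognormvariate(0.0, 1.0) for _ in range(200)]

# Normal-reference rule of thumb for the bandwidth. This is an assumption;
# a plotting library may choose its bandwidth differently.
mean = sum(samples) / len(samples)
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / (len(samples) - 1))
bandwidth = 1.06 * std * len(samples) ** (-1.0 / 5.0)

density = gaussian_kde(samples, bandwidth)
print("smallest sample:", min(samples))
print("estimated density at y = -0.1:", density(-0.1))
```

Any part of the violin outline that falls at or below zero has no position on a log-scaled axis, which is one way the filled area could end up truncated.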

@josef-pkt
Member

Thanks Paul,

If the kde has negative values, then we would need an option for users to truncate the violin instead of using the full default range.

possible enhancement to kde:
If the non-smooth shape comes from the log-scale transformation of the original kde, then we could try to get the kde in the transformed space.
One possibility might be to call the violin plots with the transformed data, and then adjust the tick labels on the y axis.
Another possibility, which I never checked: treat the kde as a non-linear transformation of a density. The density of a non-linearly transformed variable is not just a rescaling of the axis.
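
The last point can be made concrete with the standard change-of-variables formula. The symbols $Y$, $Z$, $f_Y$ and $f_Z$ are introduced here for illustration and do not appear elsewhere in this thread; $Y > 0$ is the original variable and $Z = \log Y$ the transformed one.

```latex
Z = \log Y, \qquad Y > 0
\quad\Longrightarrow\quad
f_Z(z) = f_Y\!\left(e^{z}\right) e^{z},
\qquad\text{equivalently}\qquad
f_Y(y) = \frac{f_Z(\log y)}{y}.
```

The extra factor $e^{z}$ (or $1/y$) is why a density estimated on the original scale and then drawn on a log axis does not have the same shape as a density estimated directly from the log-transformed data.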

@ricky-r
ricky-r commented Mar 19, 2014

Can I just give you guys a pickled plot? If not, where can I upload the data from my plot?

Anyway, the data in the first violinplot should be comparable with that in the second one in my example. The ranges should be very similar.

@phobson
Contributor
phobson commented Mar 19, 2014

A pickled plot won't do anything for us. We need code that generates the data and constructs the figure demonstrating this problem. Creating a gist at https://gist.github.com/ would be great.

@phobson
Contributor
phobson commented Mar 19, 2014

I'm increasingly convinced this is the result of lognormally-distributed data:
http://nbviewer.ipython.org/gist/phobson/9650257

edit: updated the notebook with a quick example of taking the log of the data and redefining the y-labels accordingly
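
A minimal sketch of the "take the log and relabel the y axis" idea, in plain Python. It is not taken from the notebook above; the `log10_ticks` helper and the data are hypothetical. The helper returns tick positions in log10 units together with labels in the original units.

```python
import math


def log10_ticks(values):
    """Return (positions, labels) for an axis that shows log10(values).

    positions are whole powers of ten in log10 units; labels show the
    corresponding values in the original units. Hypothetical helper.
    """
    positive = [v for v in values if v > 0]
    if not positive:
        raise ValueError("need at least one positive value")
    lo = math.floor(math.log10(min(positive)))
    hi = math.ceil(math.log10(max(positive)))
    positions = list(range(lo, hi + 1))
    labels = ["1e%d" % p for p in positions]
    return positions, labels


# Made-up positive data spanning several orders of magnitude.
data = [0.02, 0.5, 3.0, 40.0, 750.0]
log_data = [math.log10(v) for v in data]  # this is what would be plotted

positions, labels = log10_ticks(data)
print(positions)
print(labels)
```

With matplotlib, `log_data` would be plotted on a linear axis and the result passed to `ax.set_yticks(positions)` and `ax.set_yticklabels(labels)`, so the violin is estimated in the transformed space while the axis still reads in the original units.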

@ricky-r
ricky-r commented Mar 19, 2014

Ok guys, it was a lot simpler than what I thought. I went through the data that I'm plotting and found out that whenever my values included a zero, the corresponding violin plots had the behaviour in the figure above. In any case the boxplot was correct, just the area around it (the violin plot) got messed up a bit.

I have no problem removing these values from my dataset, so no big deal. I'll make a wild guess here: does matplotlib remove all zeros by default when in log scale?

(The warning was indeed unrelated to this issue. Nothing appears when I plot the same data in interactive mode)
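
A zero has no finite position on a logarithmic axis, because the logarithm of zero is undefined. The sketch below shows the two workarounds mentioned in this thread, dropping such values or clipping them to a small positive floor, in plain Python. The helper names, the data and the floor value are hypothetical, and the sketch makes no claim about what matplotlib does internally with these values.

```python
import math


def split_for_log_scale(values):
    """Separate values that can be shown on a log axis from those that cannot."""
    positive = [v for v in values if v > 0]
    dropped = [v for v in values if v <= 0]
    return positive, dropped


def clip_for_log_scale(values, floor):
    """Raise every value below `floor` up to `floor` (floor must be positive)."""
    if floor <= 0:
        raise ValueError("floor must be positive")
    return [max(v, floor) for v in values]


# Made-up data containing zeros.
data = [0.0, 0.3, 1.2, 4.5, 0.0, 12.0]

positive, dropped = split_for_log_scale(data)
clipped = clip_for_log_scale(data, floor=0.1)

print("kept:", positive)
print("dropped:", dropped)
print("clipped:", clipped)

# The logarithm of zero is undefined, so the standard library raises here.
try:
    math.log10(0.0)
except ValueError as exc:
    print("log10(0) failed:", exc)
```

Dropping changes the sample size and clipping piles mass at the floor, so either choice alters the shape of the violin and is worth stating alongside the plot.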

@phobson
Contributor
phobson commented Mar 20, 2014

@ricky-r can you close this?

@ricky-r ricky-r closed this Mar 21, 2014
@josef-pkt josef-pkt added the PR label Apr 14, 2014