New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Violin plots #2116

Merged
merged 26 commits into from Nov 1, 2017

Conversation

Projects
None yet
5 participants
@etpinard
Member

etpinard commented Oct 24, 2017

Violin plots are coming to plotly.js 馃帀 馃幓 馃帀

Python users can already create violins using @cldougl's create_violin figure factory (example); this PR will bring violin plots to all plotly.js consumer with an API very similar to the box trace type.


IMPORTANT: After the first push of 2017/10/24, this PR is very much a WIP. Several API decisions remain to be made. See the first few comments on commit 3438eae for more info.

traceLayerClasses: [
'imagelayer',
'maplayer',
'barlayer',
'carpetlayer',
'violinlayer',

This comment has been minimized.

@etpinard

etpinard Oct 24, 2017

Member

Making sure violins are under boxes always.

@etpinard

etpinard Oct 24, 2017

Member

Making sure violins are under boxes always.

module.exports = {
supplyDefaults: supplyDefaults,
handleSampleDefaults: handleSampleDefaults,
handlePointsDefaults: handlePointsDefaults

This comment has been minimized.

@etpinard

etpinard Oct 24, 2017

Member

I made a separate reorganisation commit, to show a new way to factor out common trace module blocks. I think this method is a little more consistent with ES6 modules. For example here, supplyDefaults would be the default export.

@etpinard

etpinard Oct 24, 2017

Member

I made a separate reorganisation commit, to show a new way to factor out common trace module blocks. I think this method is a little more consistent with ES6 modules. For example here, supplyDefaults would be the default export.

etpinard added some commits Oct 30, 2017

factor out box/whiskers and mean/sd plotting routine
- to make them reusable in violin/plot.js
- add support for asymmetric bdPos to support one-sided violins
implement violinmode, violingroup and violingroupgap
- totally indenpendent from their box* counterpart
2nd cut violin calc/plot attributes + improve violin curve paths
- add 'kernel' enumerated
- implement 'scalegroup & 'scalemode' (instead of 'scaleby')
- implement 'spanmode' & 'span'
- make 'side' an enumerated w/ vals 'both', 'negative' and 'positive'
  to be general enough for horizontal and vertical violin
- 馃敧 'line.smoothing'
add findPointOnPath geometry2d util function
- to be used to find pt on violin bezier curves

etpinard added some commits Oct 31, 2017

implement violin 'inner' style options
- i.e. showinnerbox & showmeanline and friends.
pass hoverlayer to trace module hoverPoints
- in preparation for violin 'kde' hover handling
implement violin hover
- with three flags: 'violins', 'points' (similar to box traces)
  and 'kde' which show the point on the kde line along
  with the line to crosses the hovered-on violin
@etpinard

This comment has been minimized.

Show comment
Hide comment
@etpinard

etpinard Oct 31, 2017

Member

@alexcjohnson tagging this thing as reviewable. 馃弫

I still need to fill in a few descriptions 鉁忥笍 , add a few tests (especially for one-sided violins) 馃敀 and I think something is off in the way I'm computing scalegroup fields 馃捇 , but oh well, this PR should be in a good enough state to start reviewing 馃攷

Member

etpinard commented Oct 31, 2017

@alexcjohnson tagging this thing as reviewable. 馃弫

I still need to fill in a few descriptions 鉁忥笍 , add a few tests (especially for one-sided violins) 馃敀 and I think something is off in the way I'm computing scalegroup fields 馃捇 , but oh well, this PR should be in a good enough state to start reviewing 馃攷

@alexcjohnson

This comment has been minimized.

Show comment
Hide comment
@alexcjohnson

alexcjohnson Oct 31, 2017

Contributor

is 2x BW kernel-dependent? Presumably epanechnikov and other hard-terminated kernels don't even get that far, and we wouldn't want to autorange for that (or display zero thickness for a while at the ends...)

Contributor

alexcjohnson commented on src/traces/violin/attributes.js in ad51966 Oct 31, 2017

is 2x BW kernel-dependent? Presumably epanechnikov and other hard-terminated kernels don't even get that far, and we wouldn't want to autorange for that (or display zero thickness for a while at the ends...)

This comment has been minimized.

Show comment
Hide comment
@etpinard

etpinard Oct 31, 2017

Member

is 2x BW kernel-dependent?

We could definitevely make it kernel-dependent.

Currently with the rule-of-thumb bandwidth, gaussian:

image

epanechnikov:

image

Member

etpinard replied Oct 31, 2017

is 2x BW kernel-dependent?

We could definitevely make it kernel-dependent.

Currently with the rule-of-thumb bandwidth, gaussian:

image

epanechnikov:

image

This comment has been minimized.

Show comment
Hide comment
@alexcjohnson

alexcjohnson Oct 31, 2017

Contributor

鉂わ笍 spanmode BTW - seems like it encapsulates the common settings really nicely.

Contributor

alexcjohnson replied Oct 31, 2017

鉂わ笍 spanmode BTW - seems like it encapsulates the common settings really nicely.

@alexcjohnson

This comment has been minimized.

Show comment
Hide comment
@alexcjohnson

alexcjohnson Oct 31, 2017

Contributor

Do we need inner or would showbox etc be enough, since it's already an attribute of a violin plot? Also would it be worthwhile making sub-containers, ie:

{
  showbox,
  box: {
    width,
    fillcolor,
    line: {width, color}
  },
  showmeanline,
  meanline: {width, color}
}
Contributor

alexcjohnson commented on src/traces/violin/attributes.js in 6ffc379 Oct 31, 2017

Do we need inner or would showbox etc be enough, since it's already an attribute of a violin plot? Also would it be worthwhile making sub-containers, ie:

{
  showbox,
  box: {
    width,
    fillcolor,
    line: {width, color}
  },
  showmeanline,
  meanline: {width, color}
}

This comment has been minimized.

Show comment
Hide comment
@etpinard

etpinard Oct 31, 2017

Member

A very valid point. I was debating between the two options.

@cldougl @chriddyp what do you think?

Member

etpinard replied Oct 31, 2017

A very valid point. I was debating between the two options.

@cldougl @chriddyp what do you think?

This comment has been minimized.

Show comment
Hide comment
@chriddyp

chriddyp Oct 31, 2017

Member

@alexcjohnson 's proposal looks good to me, although I'm not quite sure what inner refers to (is there an outer?)

Member

chriddyp replied Oct 31, 2017

@alexcjohnson 's proposal looks good to me, although I'm not quite sure what inner refers to (is there an outer?)

This comment has been minimized.

Show comment
Hide comment
@etpinard

etpinard Oct 31, 2017

Member

inner as in inside the violin, not a separate box trace. I thought it would feel weird to have showbox in a violin trace: some people might think there should be a showviolin in box traces, which we won't allow. Adding inner break that unwanted symmetry.

Moreover, I think we should be careful whenever we add nested attributes. I feel most users are confused by them (especially when they used them via restyle/relayout). Nested attributes do make parts of our API very clean (line and marker nested attributes in scatter-like traces are a good example), but when there are only a few nested keys (e.g. width, fillcolor and color) they might not be worth it in my opinion. Take for example axis grid attributes, they go as showgrid, gridwidth and gridcolor, not grid.visible, grid.color and grid.width.

Member

etpinard replied Oct 31, 2017

inner as in inside the violin, not a separate box trace. I thought it would feel weird to have showbox in a violin trace: some people might think there should be a showviolin in box traces, which we won't allow. Adding inner break that unwanted symmetry.

Moreover, I think we should be careful whenever we add nested attributes. I feel most users are confused by them (especially when they used them via restyle/relayout). Nested attributes do make parts of our API very clean (line and marker nested attributes in scatter-like traces are a good example), but when there are only a few nested keys (e.g. width, fillcolor and color) they might not be worth it in my opinion. Take for example axis grid attributes, they go as showgrid, gridwidth and gridcolor, not grid.visible, grid.color and grid.width.

This comment has been minimized.

Show comment
Hide comment
@etpinard

etpinard Nov 1, 2017

Member

Updated to nested attribute syntax in 0a32b98

Member

etpinard replied Nov 1, 2017

Updated to nested attribute syntax in 0a32b98

This comment has been minimized.

Show comment
Hide comment
@etpinard

etpinard Nov 1, 2017

Member

fixup: that's commit 0a32b98

Member

etpinard replied Nov 1, 2017

fixup: that's commit 0a32b98

@alexcjohnson

This comment has been minimized.

Show comment
Hide comment
@alexcjohnson

alexcjohnson Oct 31, 2017

Contributor

These kernels don't have the same width - which I guess means RMS width? That would be the sqrt of the third column in https://en.wikipedia.org/wiki/Kernel_(statistics)#Kernel_functions_in_common_use ie 1/sqrt(5) for epanechnikov and 1 for gaussian. So changing from gaussian to epanechnikov would make the distribution look less smooth just because it's narrower, nothing to do with the detailed kernel shape. Might be worthwhile checking what other packages do, but I would have thought it would be more appropriate to normalize the widths - which I guess would make epanechnikov 3/(4*sqrt(5)) * (1 - v*v/5)

Contributor

alexcjohnson commented on src/traces/violin/helpers.js in ad51966 Oct 31, 2017

These kernels don't have the same width - which I guess means RMS width? That would be the sqrt of the third column in https://en.wikipedia.org/wiki/Kernel_(statistics)#Kernel_functions_in_common_use ie 1/sqrt(5) for epanechnikov and 1 for gaussian. So changing from gaussian to epanechnikov would make the distribution look less smooth just because it's narrower, nothing to do with the detailed kernel shape. Might be worthwhile checking what other packages do, but I would have thought it would be more appropriate to normalize the widths - which I guess would make epanechnikov 3/(4*sqrt(5)) * (1 - v*v/5)

This comment has been minimized.

Show comment
Hide comment
@etpinard

etpinard Oct 31, 2017

Member

I've never seen violins plotted with anything other than a gaussian kernel. I'm starting to think we should perhaps 馃敧 that kernel attribute.

Member

etpinard replied Oct 31, 2017

I've never seen violins plotted with anything other than a gaussian kernel. I'm starting to think we should perhaps 馃敧 that kernel attribute.

This comment has been minimized.

Show comment
Hide comment
@alexcjohnson

alexcjohnson Oct 31, 2017

Contributor

I'm starting to think we should perhaps 馃敧 that kernel attribute.

Fine by me, will @cpsievert complain that he can't reproduce all of geom_violin? Which, if I'm reading it right, has options "gaussian", "rectangular", "triangular", "epanechnikov", "biweight", "cosine" or "optcosine"? https://stat.ethz.ch/R-manual/R-devel/library/stats/html/density.html

Wouldn't be hard to add in later anyhow, so you could defer these comments about bandwidth and removing zeros at the end.

Contributor

alexcjohnson replied Oct 31, 2017

I'm starting to think we should perhaps 馃敧 that kernel attribute.

Fine by me, will @cpsievert complain that he can't reproduce all of geom_violin? Which, if I'm reading it right, has options "gaussian", "rectangular", "triangular", "epanechnikov", "biweight", "cosine" or "optcosine"? https://stat.ethz.ch/R-manual/R-devel/library/stats/html/density.html

Wouldn't be hard to add in later anyhow, so you could defer these comments about bandwidth and removing zeros at the end.

This comment has been minimized.

Show comment
Hide comment
@etpinard

etpinard Oct 31, 2017

Member

Right, that's the density geom, geom_violin doesn't have such option (I think)

Member

etpinard replied Oct 31, 2017

Right, that's the density geom, geom_violin doesn't have such option (I think)

This comment has been minimized.

Show comment
Hide comment
@alexcjohnson

alexcjohnson Oct 31, 2017

Contributor

Ah, I missed that there are two functions there with somewhat different signatures, weird. Sure, 馃敧 for now anyway.

Contributor

alexcjohnson replied Oct 31, 2017

Ah, I missed that there are two functions there with somewhat different signatures, weird. Sure, 馃敧 for now anyway.

This comment has been minimized.

Show comment
Hide comment
@cpsievert

cpsievert Oct 31, 2017

ggplot2::geom_density() does indeed eventually call stats::density()...

I wouldn't worry too much about covering all of the kernel options...similar to geom_boxplot(), the default in ggplotly() will be to pre-aggregate (and use a scatter trace in this case), but it's also possible to "opt-in" to the plotly.js aggregated version

cpsievert replied Oct 31, 2017

ggplot2::geom_density() does indeed eventually call stats::density()...

I wouldn't worry too much about covering all of the kernel options...similar to geom_boxplot(), the default in ggplotly() will be to pre-aggregate (and use a scatter trace in this case), but it's also possible to "opt-in" to the plotly.js aggregated version

This comment has been minimized.

Show comment
Hide comment
@alexcjohnson

alexcjohnson Oct 31, 2017

Contributor

the default in ggplotly() will be to pre-aggregate

Because you're concerned about sending too much data? Seems a bit of a shame, as we can do a bunch of cool things with a dedicated trace type that we can't do if you smoosh it into scatter (have you seen the hover effects??? 馃槏 but other effects like one-sided could be annoying with scatter)

So I wonder if we should provide an intermediate option to let you still use the actual violin trace type. Like a weight array, so you would pre-bin into some large-ish number of bins (1000?) and provide y and weight (which in this case would be just the count for that bin). Certainly not necessary for violins rev 1 but if we can do something simple like that to give more folks access to @etpinard's awesome work, we should! For box too I guess.

Contributor

alexcjohnson replied Oct 31, 2017

the default in ggplotly() will be to pre-aggregate

Because you're concerned about sending too much data? Seems a bit of a shame, as we can do a bunch of cool things with a dedicated trace type that we can't do if you smoosh it into scatter (have you seen the hover effects??? 馃槏 but other effects like one-sided could be annoying with scatter)

So I wonder if we should provide an intermediate option to let you still use the actual violin trace type. Like a weight array, so you would pre-bin into some large-ish number of bins (1000?) and provide y and weight (which in this case would be just the count for that bin). Certainly not necessary for violins rev 1 but if we can do something simple like that to give more folks access to @etpinard's awesome work, we should! For box too I guess.

This comment has been minimized.

Show comment
Hide comment
@etpinard

etpinard Oct 31, 2017

Member

Referencing #1059

Member

etpinard replied Oct 31, 2017

Referencing #1059

This comment has been minimized.

Show comment
Hide comment
@alexcjohnson

alexcjohnson Oct 31, 2017

Contributor

Referencing #1059

Ah right - though violins have more info than boxes so the box solution wouldn't work here. Pre-binned & weighted data, though it would slightly change the outcome, would work in either case. There are also cases where your data is explicitly weighted in some way that would just use a mechanism like this directly. Anyway not for now...

Contributor

alexcjohnson replied Oct 31, 2017

Referencing #1059

Ah right - though violins have more info than boxes so the box solution wouldn't work here. Pre-binned & weighted data, though it would slightly change the outcome, would work in either case. There are also cases where your data is explicitly weighted in some way that would just use a mechanism like this directly. Anyway not for now...

This comment has been minimized.

Show comment
Hide comment
@etpinard

etpinard Nov 1, 2017

Member

Commit 677aacc 馃敧 kernel.

Member

etpinard replied Nov 1, 2017

Commit 677aacc 馃敧 kernel.

@alexcjohnson

This comment has been minimized.

Show comment
Hide comment
@alexcjohnson

alexcjohnson Oct 31, 2017

Contributor

hmm, this is a little tricky... it's giving the v labels more digits than they warrant. Axes.hoverLabelText calls Axes.tickText with 'hover' mode. This whole machinery is getting awfully contorted at this point, but 'hover' mode gives several more digits than you need to distinguish pixels, as it's meant to label data points precisely without going overboard. But regular Axes.tickText might have too few digits to distinguish pixels. This is really a new use case - though if we put continuous labels on lines it will show up again there. I'm OK leaving it as is for now, but lets make an issue to clean up this machinery and provide a tickText variant tailored to pixel-level detail.

Contributor

alexcjohnson commented on src/traces/violin/hover.js in bc6bc02 Oct 31, 2017

hmm, this is a little tricky... it's giving the v labels more digits than they warrant. Axes.hoverLabelText calls Axes.tickText with 'hover' mode. This whole machinery is getting awfully contorted at this point, but 'hover' mode gives several more digits than you need to distinguish pixels, as it's meant to label data points precisely without going overboard. But regular Axes.tickText might have too few digits to distinguish pixels. This is really a new use case - though if we put continuous labels on lines it will show up again there. I'm OK leaving it as is for now, but lets make an issue to clean up this machinery and provide a tickText variant tailored to pixel-level detail.

This comment has been minimized.

Show comment
Hide comment
@etpinard

etpinard Nov 1, 2017

Member

Ha right. I had a feeling tickText didn't exactly do the right thing. New issue 鈫笍 #2137

Member

etpinard replied Nov 1, 2017

Ha right. I had a feeling tickText didn't exactly do the right thing. New issue 鈫笍 #2137

etpinard added some commits Oct 31, 2017

remove 'kernel' from violin attributes
- as non-gaussian kernels may require us to make soft spanmode bounds
  kernel-dependent, which will require some trial-and-error
- as most other libraries don't support non-guassian kernel, let's
  defer this.
update box and meanline attribute syntax
- use nested attribute style for box and meanline settings
- update test and mocks
add violin style mock
- that 馃敀 custom bandwidth and some box style settings.
@alexcjohnson

This comment has been minimized.

Show comment
Hide comment
@alexcjohnson

alexcjohnson Nov 1, 2017

Contributor

Beautiful! 馃幍 Now we really have some music for our 馃拑 !

Contributor

alexcjohnson commented Nov 1, 2017

Beautiful! 馃幍 Now we really have some music for our 馃拑 !

@etpinard etpinard merged commit 81ddaca into master Nov 1, 2017

1 check passed

ci/circleci Your tests passed on CircleCI!
Details

@etpinard etpinard deleted the violins-dev branch Nov 1, 2017

@etpinard etpinard referenced this pull request Jan 30, 2018

Closed

[feature] Violin plots #1024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment