Scatter plot with marginal histograms #1445

@sursu

I have tried to reproduce the following plot using plotly:
image
(Source: Fundamentals of Bayesian Data Analysis in R (Ch.2), Datacamp)

It is essentially a scatter plot with marginal histograms of n_visitors and proportion_clicks obtained from:

import numpy as np

n_samples = 100000
n_ads_shown = 100
proportion_clicks = np.random.uniform(high=0.2, size=n_samples)
n_visitors = np.random.binomial(n=n_ads_shown, p=proportion_clicks, size=n_samples)

My implementation is the following:

import plotly.graph_objs as go

fig = go.FigureWidget()
trace1 = fig.add_scatter(x=n_visitors, y=proportion_clicks, mode='markers', name='points',
                         marker = dict(size=10,
                                       opacity=.1,
                                       color='white',
                                       line = dict(width=1, color = '#1f77b4')
                                      )
                        )
trace2 = fig.add_histogram(x=n_visitors, name='x density', marker=dict(color='#1f77b4', opacity=0.7),
                      yaxis='y2'
                     )
trace3 = fig.add_histogram(y=proportion_clicks, name='y density', marker=dict(color='#1f77b4', opacity=0.7), 
                      xaxis='x2'
                     )
fig.layout = dict(xaxis=dict(domain=[0, 0.85], showgrid=False, zeroline=False),
                  yaxis=dict(domain=[0, 0.85], showgrid=False, zeroline=False),
                  showlegend=False,
                  margin=dict(t=50),
                  hovermode='closest',
                  bargap=0,
                  xaxis2=dict(domain=[0.85, 1], showgrid=False, zeroline=False),
                  yaxis2=dict(domain=[0.85, 1], showgrid=False, zeroline=False),
                  height=600,
                 )
fig

newplot

The problems with this implementation are the following:

1. It is slow (roughly 10 times slower than R)

R code:

n_samples <- 100000
n_ads_shown <- 100
proportion_clicks <- runif(n = n_samples, min = 0.0, max = 0.2)
n_visitors <- rbinom(n = n_samples, size = n_ads_shown, prob = proportion_clicks)

scatterhist = function(x, y, xlab="", ylab=""){
  zones=matrix(c(2,0,1,3), ncol=2, byrow=TRUE)
  layout(zones, widths=c(4/5,1/5), heights=c(1/5,4/5))
  xhist = hist(x, plot=FALSE)
  yhist = hist(y, plot=FALSE)
  top = max(c(xhist$counts, yhist$counts))
  par(mar=c(3,3,1,1))
  plot(x,y)
  par(mar=c(0,3,1,1))
  barplot(xhist$counts, axes=FALSE, ylim=c(0, top), space=0)
  par(mar=c(3,0,1,1))
  barplot(yhist$counts, axes=FALSE, xlim=c(0, top), space=0, horiz=TRUE)
  par(oma=c(3,3,0,0))
  mtext(xlab, side=1, line=1, outer=TRUE, adj=0, 
        at=.8 * (mean(x) - min(x))/(max(x)-min(x)))
  mtext(ylab, side=2, line=1, outer=TRUE, adj=0, 
        at=(.8 * (mean(y) - min(y))/(max(y) - min(y))))
}

scatterhist(n_visitors, proportion_clicks)

The scatterhist function was taken from here.

Also, JupyterLab crashes if I run the code a couple of times.

2. Interactivity

When I select all the data points for a single value of n_visitors (e.g. 10), I expect the histogram to the right of the scatter plot to update and show the distribution of proportion_clicks for the selected points only. However, only the histogram on top of the plot responds to the selection.

To be more precise:
I want something like this:
image
but I get something like this:
newplot 1

3. Editing the plot is not straightforward

If I try to add a frame around the scatter plot, as in the very first plot, the result is a mess. Adding zeroline=True and mirror=True to both xaxis and yaxis does not get me any closer.
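For what it is worth, `zeroline` only draws a line at the value 0, and as far as I can tell `mirror` has no visible effect unless the axis line itself is switched on with `showline=True`. A minimal sketch of a layout that frames only the scatter panel, assuming the same domains as above (`framed_axis` is a hypothetical helper, and the line colour and width are arbitrary choices):

```python
def framed_axis(domain):
    """Axis settings that draw a border on both sides of the plotting area."""
    return dict(
        domain=domain,
        showgrid=False,
        zeroline=False,   # zeroline only draws a line at the value 0
        showline=True,    # draw the axis line itself
        mirror=True,      # repeat that line on the opposite side
        linecolor='black',
        linewidth=1,
    )


layout = dict(
    xaxis=framed_axis([0, 0.85]),
    yaxis=framed_axis([0, 0.85]),
    # The histogram panels are left without a frame.
    xaxis2=dict(domain=[0.85, 1], showgrid=False, zeroline=False),
    yaxis2=dict(domain=[0.85, 1], showgrid=False, zeroline=False),
    showlegend=False,
    margin=dict(t=50),
    hovermode='closest',
    bargap=0,
    height=600,
)
```

The resulting dict can be assigned with `fig.layout = layout`, exactly as in the implementation above.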
