Description
I have tried to reproduce the following plot using plotly:
(Source: Fundamentals of Bayesian Data Analysis in R (Ch.2), Datacamp)
It is essentially a scatter plot with marginal histograms of n_visitors and proportion_clicks, obtained from:
import numpy as np

n_samples = 100000
n_ads_shown = 100
proportion_clicks = np.random.uniform(high=0.2, size=n_samples)
n_visitors = np.random.binomial(n=n_ads_shown, p=proportion_clicks, size=n_samples)
My implementation is the following:
import plotly.graph_objects as go

fig = go.FigureWidget()
# Main panel: scatter of visitors against click proportion
trace1 = fig.add_scatter(
    x=n_visitors, y=proportion_clicks, mode='markers', name='points',
    marker=dict(size=10,
                opacity=.1,
                color='white',
                line=dict(width=1, color='#1f77b4')),
)
# Top marginal: histogram of x, drawn on the secondary y axis
trace2 = fig.add_histogram(
    x=n_visitors, name='x density',
    marker=dict(color='#1f77b4', opacity=0.7),
    yaxis='y2',
)
# Right marginal: histogram of y, drawn on the secondary x axis
trace3 = fig.add_histogram(
    y=proportion_clicks, name='y density',
    marker=dict(color='#1f77b4', opacity=0.7),
    xaxis='x2',
)
fig.layout = dict(
    xaxis=dict(domain=[0, 0.85], showgrid=False, zeroline=False),
    yaxis=dict(domain=[0, 0.85], showgrid=False, zeroline=False),
    showlegend=False,
    margin=dict(t=50),
    hovermode='closest',
    bargap=0,
    xaxis2=dict(domain=[0.85, 1], showgrid=False, zeroline=False),
    yaxis2=dict(domain=[0.85, 1], showgrid=False, zeroline=False),
    height=600,
)
fig
The problems with this implementation are the following:
1. It is slow (roughly 10 times slower than R)
R code:
n_samples <- 100000
n_ads_shown <- 100
proportion_clicks <- runif(n = n_samples, min = 0.0, max = 0.2)
n_visitors <- rbinom(n = n_samples, size = n_ads_shown, prob = proportion_clicks)
scatterhist = function(x, y, xlab="", ylab=""){
  zones = matrix(c(2,0,1,3), ncol=2, byrow=TRUE)
  layout(zones, widths=c(4/5,1/5), heights=c(1/5,4/5))
  xhist = hist(x, plot=FALSE)
  yhist = hist(y, plot=FALSE)
  top = max(c(xhist$counts, yhist$counts))
  par(mar=c(3,3,1,1))
  plot(x,y)
  par(mar=c(0,3,1,1))
  barplot(xhist$counts, axes=FALSE, ylim=c(0, top), space=0)
  par(mar=c(3,0,1,1))
  barplot(yhist$counts, axes=FALSE, xlim=c(0, top), space=0, horiz=TRUE)
  par(oma=c(3,3,0,0))
  mtext(xlab, side=1, line=1, outer=TRUE, adj=0,
        at=.8 * (mean(x) - min(x))/(max(x)-min(x)))
  mtext(ylab, side=2, line=1, outer=TRUE, adj=0,
        at=(.8 * (mean(y) - min(y))/(max(y) - min(y))))
}
scatterhist(n_visitors, proportion_clicks)
The scatterhist function was taken from here.
Also, JupyterLab crashes if I run the code a couple of times.
2. Interactivity
When I select all the data points corresponding to, e.g., n_visitors = 10, I expect the histogram to the right of the scatter plot to reflect the selection. However, only the histogram on top of the plot does so.
To be more precise:
I want something like this:
but I get something like this:
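A possible workaround, sketched here as an assumption rather than a confirmed fix: register a selection callback on the scatter trace with on_selection and re-assign the data of both histograms from the selected point indices. The helper names selected_values and link_marginals are hypothetical, and the trace order (scatter, x histogram, y histogram) is assumed to match the figure built above.

```python
import numpy as np


def selected_values(values, point_inds):
    """Return the entries of `values` at the selected indices (all if none selected)."""
    values = np.asarray(values)
    if len(point_inds) == 0:
        return values
    return values[np.asarray(point_inds, dtype=int)]


def link_marginals(fig, x, y, scatter_idx=0, xhist_idx=1, yhist_idx=2):
    """Hypothetical helper: refresh both histograms when the scatter selection changes.

    Assumes `fig` is a go.FigureWidget whose traces are ordered as
    scatter, x histogram, y histogram.
    """
    def on_select(trace, points, selector):
        # batch_update sends both histogram changes to the browser together.
        with fig.batch_update():
            fig.data[xhist_idx].x = selected_values(x, points.point_inds)
            fig.data[yhist_idx].y = selected_values(y, points.point_inds)

    fig.data[scatter_idx].on_selection(on_select)
```

Calling link_marginals(fig, n_visitors, proportion_clicks) once after building the figure would wire this up. The callback only fires in a live Jupyter kernel, since it runs on the Python side.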
3. Editing the plot is not straightforward
If I try to add frames around the scatter plot, as in the very first plot, it becomes a mess. Adding zeroline=True and mirror=True to both xaxis and yaxis does not bring me closer.