# Generating random charts with `Altair` and `SciPy`

This notebook follows the stimulus-generation method described in _The Weighted Average Illusion: Biases in Perceived Mean Position in Scatterplots_, whose authors write:

> To generate the x- and y-data, we used Poisson disk sampling [50] to produce 30 uniquely distributed point grids, with minimum distance between the boundaries of any two points set at 8 pixels. This methodology is similar to Gleicher et al. [34]. Each dataset always contained 30 marks, with the number of points selected in piloting.

To integrate with the `Revisit` platform, we will use `Altair` and `SciPy`. Conveniently, `scipy.stats.qmc` provides a Poisson disk sampler.

- Note that the QMC engines only return an $n \times d$ array of numbers in $[0, 1)$, so the points have to be rescaled before they can be placed on a pixel canvas. ([source](https://docs.scipy.org/doc/scipy/reference/stats.qmc.html))
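
As a minimal sketch of the rescaling step, the snippet below draws points in the unit square and maps them to pixel coordinates with `qmc.scale`. The 500-pixel canvas and the variable names are assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import qmc

# Draw up to 30 Poisson disk points in the unit square.
engine = qmc.PoissonDisk(d=2, radius=0.08)
unit_sample = engine.random(30)

# Map the unit-square coordinates onto a hypothetical 500 x 500 pixel canvas.
canvas_size = 500  # assumed value, for illustration only
pixel_sample = qmc.scale(
    unit_sample,
    l_bounds=[0, 0],
    u_bounds=[canvas_size, canvas_size],
)

print(unit_sample.shape)                       # rows are points, columns are x and y
print(pixel_sample.min(), pixel_sample.max())  # now expressed in pixels
```

`qmc.scale` is a linear map, so distances between points are multiplied by the width of the target interval.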

In [3]:
import altair as alt
import numpy as np
import polars as pl
from scipy.stats import qmc  # quasi-Monte Carlo submodule

## Playing with the basic syntax 

In [7]:
# Poisson disk sampling 
# https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.qmc.PoissonDisk.html

rng = np.random.default_rng()
engine = qmc.PoissonDisk(d=2, radius=0.08, rng=rng)  # pass the generator created above
sample = engine.random(30).round(2)  # round coordinates to 2 decimal places

# turn the sample into a data frame for plotting (columns: column_0, column_1)
df = pl.DataFrame(sample)
# df.head(5)

# plot using altair
(
    alt.Chart(df).mark_point(filled=True).encode(
        alt.X('column_0').scale(domain=(0, 1)),
        alt.Y('column_1').scale(domain=(0, 1))
    ).properties(
        # width=300,
        # height=300
    )
)

The default width and height of an Altair chart are **300 pixels** each. The default `mark_point` size is 30, measured as **pixel area** (square pixels), NOT radius.
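
Because `size` is an area, converting to and from a radius takes a square root. The sketch below assumes a circular mark whose area is $\pi r^2$; the helper names `radius_to_size` and `size_to_radius` are hypothetical and not part of Altair.

```python
import math


def radius_to_size(radius_px: float) -> float:
    """Area in square pixels of a circle with the given radius (hypothetical helper)."""
    return math.pi * radius_px ** 2


def size_to_radius(size_px2: float) -> float:
    """Radius in pixels of a circle with the given area (hypothetical helper)."""
    return math.sqrt(size_px2 / math.pi)


# Under this assumption, the default size of 30 is a circle of roughly 3.1 px radius.
print(round(size_to_radius(30), 2))

# A 12.5 px radius mark needs a size of roughly 491 square pixels.
print(round(radius_to_size(12.5), 1))
```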

## Define plotting function 

In [208]:
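def create_plot(n=30, point_radius=12.5, point_gap=8, canvas_size=500, optimization_on=True):
    """Return (chart, points DataFrame, one-row DataFrame of column means)."""
    # Point centers are scaled into [point_radius, canvas_size - point_radius],
    # so pixel distances must be normalized by that span, not by canvas_size.
    span = canvas_size - 2 * point_radius
    # initialize poisson engine; the 1.05 factor adds a 5% margin to the minimum distance
    engine = qmc.PoissonDisk(d=2,
                             radius=(point_radius * 2 + point_gap) * 1.05 / span,
                             hypersphere="volume",
                             ncandidates=30,
                             optimization="lloyd" if optimization_on else None)
    # sample dots and scale them so every mark stays fully inside the canvas
    sample = qmc.scale(engine.random(n), [point_radius, point_radius], [canvas_size - point_radius, canvas_size - point_radius])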
    # turn it to dataframe 
    df = pl.DataFrame(sample)
    # turn it to chart 
    chart = alt.Chart(df).mark_point(filled = True, size = point_radius * point_radius * np.pi, color = "gray").encode(
        alt.X('column_0').scale(domain=(0, canvas_size)),
        alt.Y('column_1').scale(domain=(0, canvas_size))
    ).properties(
        width = canvas_size,
        height = canvas_size
    )
    return chart, df, df.mean() # df.mean().to_numpy().flatten()
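
The quoted methodology asks for at least 8 pixels between the boundaries of any two points, which means a center-to-center distance of at least `2 * point_radius + point_gap`. The sketch below is one way to check that constraint on a generated sample with `scipy.spatial.distance.pdist`. It runs without the Lloyd optimization, and the variable names are assumptions that mirror the function's defaults.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import qmc

point_radius, point_gap, canvas_size = 12.5, 8, 500

# Width of the region the centers are scaled into, in pixels.
span = canvas_size - 2 * point_radius
# Minimum allowed center-to-center distance, in pixels.
required = 2 * point_radius + point_gap

# Express the pixel distance in unit-square coordinates for the sampler.
engine = qmc.PoissonDisk(d=2, radius=required / span)
centers = qmc.scale(
    engine.random(30),
    [point_radius] * 2,
    [canvas_size - point_radius] * 2,
)

# Smallest pairwise distance between any two centers, in pixels.
min_dist = pdist(centers).min()
print(min_dist >= required - 1e-9)
```

The `optimization="lloyd"` option moves the points after they are sampled, so it is probably worth running the same check on the final coordinates when it is enabled.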

In [192]:
# optimization makes the points spread more evenly across the canvas 
create_plot()[0]

In [193]:
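# call create_plot once so the data frame and its mean come from the same sample
_, df, centroid = create_plot()
centroid.to_numpy().flatten()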

array([275.91652845, 163.21250007])

In [180]:
# without optimization, the dots look more clustered
create_plot(optimization_on=False)[0]

## Calculate mean 

In [209]:
# a: chart, b: DataFrame of point coordinates, c: one-row DataFrame of column means
a, b, c = create_plot(optimization_on=False)

In [210]:
a

In [212]:
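# Add centroid afterward: c holds one row (the column means), so this draws a single mark.
# The scale domain [0, 500] must match the canvas_size used in create_plot.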
centroid_mark = alt.Chart(c).mark_circle(
    size=15**2 * np.pi,
    color='red',
    stroke='black',
    strokeWidth=2
).encode(
    x=alt.X('column_0', scale=alt.Scale(domain=[0, 500]), axis=None),
    y=alt.Y('column_1', scale=alt.Scale(domain=[0, 500]), axis=None)
)

# Layer them together
chart_with_centroid = a + centroid_mark

In [213]:
chart_with_centroid
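
The centroid drawn above is the column-wise mean of the point coordinates. As a small worked example, the sketch below computes it with NumPy for three made-up points; the coordinates are illustrative and are not data from this notebook.

```python
import numpy as np

# Hypothetical point centers in pixel coordinates (x, y).
points = np.array([
    [100.0, 200.0],
    [300.0, 400.0],
    [200.0, 150.0],
])

# The centroid is the mean of each column: (mean of x, mean of y).
centroid = points.mean(axis=0)
print(centroid)  # [200. 250.]
```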