# Generating random charts with `Altair` and `Scipy`

In _The Weighted Average Illusion: Biases in Perceived Mean Position in Scatterplots_, the authors describe how they generated their stimuli:

> To generate the x- and y-data, we used Poisson disk sampling [50] to produce 30 uniquely distributed point grids, with minimum distance between the boundaries of any two points set at 8 pixels. This methodology is similar to Gleicher et al. [34]. Each dataset always contained 30 marks, with the number of points selected in piloting.
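
The rule in the quote can be illustrated with the simplest Poisson disk method, dart throwing. Random centres are proposed, and any centre that lands too close to an already accepted one is rejected. This is a sketch for intuition only. It is not the faster algorithm SciPy implements. The function name is hypothetical, and the 12.5 px dot radius and 500 px canvas are assumptions taken from the `create_plot` defaults further down.

```python
import numpy as np


def dart_throwing_sample(n=30, point_radius=12.5, gap=8.0, canvas_size=500.0,
                         max_attempts=100_000, seed=None):
    """Naive Poisson disk sampling by rejection ("dart throwing").

    Returns an (n, 2) array of dot centres in pixels. Two dots have boundaries
    at least `gap` apart, so their centres are at least 2 * point_radius + gap
    apart. Every dot lies fully inside the canvas.
    """
    rng = np.random.default_rng(seed)
    min_centre_dist = 2 * point_radius + gap
    points = []
    for _ in range(max_attempts):
        # propose a centre that keeps the whole dot on the canvas
        candidate = rng.uniform(point_radius, canvas_size - point_radius, size=2)
        # accept it only if it is far enough from every accepted centre
        if all(np.linalg.norm(candidate - p) >= min_centre_dist for p in points):
            points.append(candidate)
            if len(points) == n:
                return np.array(points)
    raise RuntimeError(f"placed only {len(points)} of {n} dots; try a larger canvas")


pts = dart_throwing_sample(seed=0)
print(pts.shape)  # (30, 2)
```

Dart throwing slows down sharply as the canvas fills up. That is why a library sampler is preferable once the parameters are settled.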

To integrate with the `Revisit` platform, we use `Altair` for plotting and `Scipy` for sampling. Conveniently, `scipy.stats.qmc` provides a Poisson disk sampler (`qmc.PoissonDisk`).

- Note that QMC engines only return an $n \times d$ array of numbers in $[0, 1]$, so the samples have to be rescaled to pixel coordinates (e.g. with `qmc.scale`). ([source](https://docs.scipy.org/doc/scipy/reference/stats.qmc.html))
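
Here is a quick check of that contract. The engine's output lives in the unit square, neighbouring points respect the `radius`, and `qmc.scale` stretches each column linearly onto pixel bounds. The 500 px canvas and 12.5 px dot radius are assumptions borrowed from the `create_plot` defaults below.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import qmc

radius = 0.08  # minimum distance between points, in unit-square units
engine = qmc.PoissonDisk(d=2, radius=radius)
unit_sample = engine.random(30)  # shape (30, 2), values in the unit square

# smallest pairwise distance; should not fall below `radius`
print("min pairwise distance:", pdist(unit_sample).min())

# qmc.scale maps each column linearly from [0, 1] onto [lower, upper]
canvas_size, point_radius = 500, 12.5
pixel_sample = qmc.scale(unit_sample, [point_radius] * 2, [canvas_size - point_radius] * 2)
print("pixel range:", pixel_sample.min(), pixel_sample.max())
```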

In [1]:
import altair as alt
import numpy as np
import polars as pl
from scipy.stats import qmc  # quasi-Monte Carlo submodule

## Playing with the basic syntax 

In [2]:
# Poisson disk sampling
# https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.qmc.PoissonDisk.html

rng = np.random.default_rng()
# radius is the minimum distance between points, in unit-square coordinates
engine = qmc.PoissonDisk(d=2, radius=0.08, rng=rng)
sample = engine.random(30).round(2)  # round to 2 decimal places

# turn the (30, 2) sample into a DataFrame with columns column_0 and column_1
df = pl.DataFrame(sample)

# plot using Altair; both axes span the unit square
(
    alt.Chart(df).mark_point(filled=True).encode(
        alt.X('column_0').scale(domain=(0, 1)),
        alt.Y('column_1').scale(domain=(0, 1)),
    )
)

The default width and height of an Altair chart are **300 pixels**. The default `mark_point` size is 30, and `size` is an **area in square pixels**, not a radius.
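
Because `size` is an area, a target dot radius has to be converted before it is passed to `mark_point`. The sketch below gives two readings of that area.

- **Circle-area reading (πr²):** this is the formula `create_plot` below uses.
- **Bounding-square reading ((2r)²):** this is how we read the Vega documentation's description of symbol `size` as the area of the symbol's bounding box.

If exact pixel gaps matter, it is worth checking a rendered chart to see which reading holds. The helper names are hypothetical.

```python
import math


def size_from_radius_circle_area(radius: float) -> float:
    """Marker size if `size` is read as the circle's own area, pi * r^2."""
    return math.pi * radius ** 2


def radius_from_size_circle_area(size: float) -> float:
    """Inverse of size_from_radius_circle_area."""
    return math.sqrt(size / math.pi)


def size_from_radius_bounding_box(radius: float) -> float:
    """Marker size if `size` is read as the area of the bounding square, (2r)^2."""
    return (2 * radius) ** 2


def radius_from_size_bounding_box(size: float) -> float:
    """Inverse of size_from_radius_bounding_box."""
    return math.sqrt(size) / 2


# radius implied by the default size of 30 under each reading
print(radius_from_size_circle_area(30), radius_from_size_bounding_box(30))
```

The two readings differ by a factor of √π / 2 ≈ 0.886 in radius. At 12.5 px, that gap is enough to shift the boundary-to-boundary distances the study controls.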

## Define plotting function 

In [11]:
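def create_plot(n=30, point_radius=12.5, point_gap=8, canvas_size=500, optimization_on=True):
    """Sample n non-overlapping dots and plot them on a square canvas.

    Returns (chart, DataFrame of dot centres, 1-row DataFrame of column means).
    """
    # The unit square is later stretched over [point_radius, canvas_size - point_radius],
    # so convert the pixel distance using that usable span, not the full canvas.
    usable = canvas_size - 2 * point_radius
    # minimum centre-to-centre distance: two radii plus the gap, with a 5% safety margin
    min_dist = (2 * point_radius + point_gap) * 1.05 / usable
    engine = qmc.PoissonDisk(
        d=2,
        radius=min_dist,
        hypersphere="volume",
        ncandidates=30,
        # Lloyd spreads points more evenly, but SciPy notes that optimization is a
        # post-processing step that may not preserve every property of the sample
        optimization="lloyd" if optimization_on else None,
    )
    # sample dots in [0, 1]^2 and scale them so no dot is clipped by the canvas edge
    lower, upper = [point_radius] * 2, [canvas_size - point_radius] * 2
    sample = qmc.scale(engine.random(n), lower, upper)
    df = pl.DataFrame(sample)  # columns: column_0 (x), column_1 (y)
    # mark size is an area in square pixels; here it is set to the circle area pi * r^2
    chart = alt.Chart(df).mark_point(
        filled=True, size=np.pi * point_radius**2, color="gray"
    ).encode(
        alt.X('column_0').scale(domain=(0, canvas_size)).axis(labels=False, grid=False, title=None),
        alt.Y('column_1').scale(domain=(0, canvas_size)).axis(labels=False, grid=False, title=None),
    ).properties(width=canvas_size, height=canvas_size)
    return chart, df, df.mean()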
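
Whether a sample honours the 8 px boundary gap can be checked directly from the pixel coordinates. The sketch rebuilds the sampler with Lloyd optimization off. It divides the pixel distance by the usable span `canvas_size - 2 * point_radius`, because that is the range `qmc.scale` stretches the unit square over. Dividing by the full `canvas_size` instead shrinks every pixel distance by a factor of 475/500 at the default settings. `min_boundary_gap` is a hypothetical helper name.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import qmc


def min_boundary_gap(centres, point_radius):
    """Smallest distance between the edges of any two dots (negative means overlap)."""
    return pdist(centres).min() - 2 * point_radius


n, point_radius, point_gap, canvas_size = 30, 12.5, 8, 500
usable = canvas_size - 2 * point_radius  # span the unit square is stretched over

# Poisson disk sample with a 5% margin on the centre distance, no Lloyd step
engine = qmc.PoissonDisk(d=2, radius=(2 * point_radius + point_gap) * 1.05 / usable)
centres = qmc.scale(engine.random(n), [point_radius] * 2, [canvas_size - point_radius] * 2)

print(len(centres), "dots, smallest boundary gap:", min_boundary_gap(centres, point_radius))
```

With Lloyd optimization on, the same check can be run on the returned points (for example `create_plot()[1].to_numpy()`). The post-processing step is not guaranteed to keep the minimum distance.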

In [12]:
# optimization makes the points spread more evenly across the canvas 
create_plot()[0]

In [13]:
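chart, points, means = create_plot()  # sample once so the table and the chart match
means  # 1-row DataFrame: mean x (column_0) and mean y (column_1)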

| column_0 (f64) | column_1 (f64) |
|---|---|
| 196.123503 | 357.47793 |


In [38]:
# without Lloyd optimization the dots tend to look more clustered
create_plot(optimization_on=False)[0]
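
"More evenly spread" can be put into rough numbers with nearest-neighbour (NN) distances. For a fixed number of points, a larger mean and a smaller spread of NN distances roughly indicate more even spacing. The minimum shows whether the Lloyd step kept the sampling radius. This is a sketch, assuming a unit-square radius of 0.07 as a stand-in for the `create_plot` default. The helper name is hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.stats import qmc


def nearest_neighbour_distances(points):
    """Distance from each point to its closest other point."""
    # k=2 because each point's nearest match in the tree is itself (distance 0)
    distances, _ = cKDTree(points).query(points, k=2)
    return distances[:, 1]


radius = 0.07  # roughly the create_plot default, in unit-square units
for optimization in (None, "lloyd"):
    engine = qmc.PoissonDisk(d=2, radius=radius, optimization=optimization)
    sample = engine.random(30)
    nn = nearest_neighbour_distances(sample)
    # min below `radius` would mean post-processing broke the distance guarantee
    print(f"optimization={optimization}: mean NN {nn.mean():.3f}, "
          f"std {nn.std():.3f}, min {nn.min():.3f} (radius {radius})")
```

No ordering between the two variants is asserted here. Single draws vary, so compare several samples before drawing conclusions.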

## Exporting to a Vega spec

We use `vl-convert` (https://github.com/vega/vl-convert) to compile the Vega-Lite spec that Altair produces into a Vega spec.

In [41]:
import vl_convert as vlc

In [43]:
c = create_plot(optimization_on=False)[0]
c.to_json()

'{\n  "$schema": "https://vega.github.io/schema/vega-lite/v6.1.0.json",\n  "config": {\n    "view": {\n      "continuousHeight": 300,\n      "continuousWidth": 300\n    }\n  },\n  "data": {\n    "name": "data-b30a3782f6062cf47ec6c858df7b0928"\n  },\n  "datasets": {\n    "data-b30a3782f6062cf47ec6c858df7b0928": [\n      {\n        "column_0": 241.65198664869078,\n        "column_1": 80.47699196819835\n      },\n      {\n        "column_0": 208.1645241099953,\n        "column_1": 91.83466986346522\n      },\n      {\n        "column_0": 230.92446203585322,\n        "column_1": 19.10934119202788\n      },\n      {\n        "column_0": 277.1857975892772,\n        "column_1": 57.23401227736489\n      },\n      {\n        "column_0": 288.44969483981157,\n        "column_1": 89.18687288991124\n      },\n      {\n        "column_0": 195.31783332949078,\n        "column_1": 38.896401090204094\n      },\n      {\n        "column_0": 221.81903054094263,\n        "column_1": 122.64486723792244\n  

In [45]:
vega_spec = vlc.vegalite_to_vega(c.to_json(), vl_version="5.20")
vega_spec

{'$schema': 'https://vega.github.io/schema/vega/v5.json',
 'background': 'white',
 'padding': 5,
 'width': 500,
 'height': 500,
 'style': 'cell',
 'data': [{'name': 'data-b30a3782f6062cf47ec6c858df7b0928',
   'values': [{'column_0': 241.6519866486908, 'column_1': 80.47699196819835},
    {'column_0': 208.1645241099953, 'column_1': 91.83466986346522},
    {'column_0': 230.9244620358532, 'column_1': 19.10934119202788},
    {'column_0': 277.1857975892772, 'column_1': 57.23401227736489},
    {'column_0': 288.44969483981157, 'column_1': 89.18687288991124},
    {'column_0': 195.3178333294908, 'column_1': 38.896401090204094},
    {'column_0': 221.81903054094263, 'column_1': 122.64486723792244},
    {'column_0': 167.04882851998275, 'column_1': 13.770002799070069},
    {'column_0': 293.04262758557763, 'column_1': 19.272199476799024},
    {'column_0': 156.67185019662443, 'column_1': 58.67243162746145},
    {'column_0': 135.1188418655325, 'column_1': 28.755534941499583},
    {'column_0': 94.708231
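
When each stimulus ships as a spec, the ground-truth mean can be read back from the data embedded in the spec itself. That way the stored answer always matches what is drawn. This is a minimal sketch with a hypothetical helper. It assumes the chart's points are the only dataset carrying inline `values`, as in the compiled output above, where they sit under `data[0]["values"]`. The toy spec uses made-up numbers.

```python
import json


def embedded_data_mean(vega_spec, x_field="column_0", y_field="column_1"):
    """Mean (x, y) of the inline `values` rows in a compiled Vega spec dict."""
    # compiled Vega specs list datasets under "data"; inline rows live in "values"
    rows = [row for dataset in vega_spec.get("data", []) for row in dataset.get("values", [])]
    if not rows:
        raise ValueError("no inline data values found in the spec")
    n = len(rows)
    return (sum(r[x_field] for r in rows) / n, sum(r[y_field] for r in rows) / n)


# toy spec with made-up values, shaped like the vegalite_to_vega output above
toy_spec = {
    "data": [
        {"name": "data-example", "values": [
            {"column_0": 100.0, "column_1": 200.0},
            {"column_0": 300.0, "column_1": 400.0},
        ]},
    ]
}
print(embedded_data_mean(toy_spec))  # (200.0, 300.0)

# the spec is plain JSON, so it can be written to disk as-is
spec_text = json.dumps(toy_spec, indent=2)
```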

## Calculate mean 

In [209]:
# a: chart, b: sampled points, c: 1-row DataFrame of column means (the centroid)
a, b, c = create_plot(optimization_on=False)

In [210]:
a

In [212]:
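# overlay the centroid (mean x, mean y) as a larger red dot with a black outline
centroid_mark = alt.Chart(c).mark_circle(
    size=15**2 * np.pi,  # area in square pixels (pi * 15^2)
    color='red',
    stroke='black',
    strokeWidth=2
).encode(
    # same 0-500 domains as create_plot's default canvas_size, so the layers line up
    x=alt.X('column_0', scale=alt.Scale(domain=[0, 500]), axis=None),
    y=alt.Y('column_1', scale=alt.Scale(domain=[0, 500]), axis=None)
)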

# Layer them together
chart_with_centroid = a + centroid_mark

In [213]:
chart_with_centroid
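
Suppose the Revisit task asks participants to click where they think the mean is. The centroid drawn above then has to be compared with a click position in pixels. This is a minimal sketch of that conversion, under three assumptions:

- linear scales with domain (0, canvas_size), as in `create_plot`;
- a plot area the same size as the canvas;
- Vega-Lite's convention that y increases upward.

Padding and axis offsets are not handled (the compiled spec above reports `padding: 5`), and the function names are hypothetical.

```python
def data_to_plot_pixels(x, y, canvas_size=500, width=500, height=500):
    """Map a data point to pixel coordinates measured from the plot area's top-left.

    Assumes linear scales with domain (0, canvas_size); the y range runs from
    the bottom (height) to the top (0). Padding and axis offsets are NOT added.
    """
    px = x * width / canvas_size
    py = height - y * height / canvas_size
    return px, py


def plot_pixels_to_data(px, py, canvas_size=500, width=500, height=500):
    """Inverse of data_to_plot_pixels, e.g. for turning a click into data units."""
    x = px * canvas_size / width
    y = (height - py) * canvas_size / height
    return x, y


# the data origin sits at the bottom-left corner of the plot area
print(data_to_plot_pixels(0, 0))  # (0.0, 500.0)
```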