# Individual sensor permutation tests

The heterogeneity figure in the paper is striking. Is it to be trusted, or is there just so much noise in the data that it's unsurprising? The overall numbers are based on the full dataset, but that figure is per-sensor. Under a null hypothesis of no difference between sensors, what is the variance of a similar plot?

To answer this, we run a permutation test for each sensor, computing 1000 realizations of the mean difference. From these we compute 1000 realizations of the variance of the per-sensor differences, and compare them to the variance of the observed distribution. The permutation forces the "true" value of the difference for each sensor to be identically 0. It's okay that in the data the "true" value, if there were no heterogeneity, would be -0.56: we're only looking at the variance, so a global shift drops out.
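
The procedure can be sketched on synthetic data. This is a minimal illustration, not the notebook's pipeline: the sensor counts, the noise-only `occ` values, and the helper `period_diff` are all hypothetical. The sketch also shuffles labels within each sensor, which is an assumption made here for simplicity; the loop further down shuffles the whole `period` column at once.

```julia
using Random, Statistics

# Hypothetical synthetic data: 20 sensors, 50 observations each, no real effect.
rng = MersenneTwister(42)
n_sensors, n_obs, n_perm = 20, 50, 1000
occ = [randn(rng, n_obs) for _ in 1:n_sensors]
# true = post period, false = pre period; half the observations in each.
period = [repeat([true, false], n_obs ÷ 2) for _ in 1:n_sensors]

# Difference in means between the two periods for one sensor.
period_diff(x, post) = mean(x[post]) - mean(x[.!post])

# One row per permutation, one column per sensor.
perm_diffs = zeros(n_perm, n_sensors)
for p in 1:n_perm, s in 1:n_sensors
    perm_diffs[p, s] = period_diff(occ[s], shuffle(rng, period[s]))
end

# Variance across sensors within each permutation: the null distribution of the spread.
null_variances = vec(var(perm_diffs, dims=2))

# A global shift (here the -0.56 from the text) leaves every variance unchanged.
shift_invariant = vec(var(perm_diffs .- 0.56, dims=2)) ≈ null_variances
```

Each entry of `null_variances` is the spread one would see across sensors if period labels carried no information, which is the quantity the observed variance is compared against.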

In [None]:
using KFactors, DataFrames, ThreadsX, Pipe, StatsBase, Random, Plots, StableRNGs

In [None]:
rng = StableRNG(938421345)

In [None]:
raw_data = KFactors.read_data("../data/peaks_merged.parquet");

In [None]:
data = KFactors.create_test_data(raw_data, KFactors.Periods.SPRING_2022);

In [None]:
data = data[data.period .∈ Ref(Set([:prepandemic, :postlockdown])), :];

In [None]:
# permutation test is simpler here, since we're doing it per sensor - no need to block bootstrap
# as each per-sensor observation is independent

# give sensors 1-n indices
sensors = collect(unique(data.station))
data.sensor_idx = map(id -> findfirst(==(id), sensors), data.station)
results = zeros(Float64, 1000, length(sensors))

# barrier function for type stability
function _update(stations, values, results, permutation)
    for (station, value) in zip(stations, values)
        results[permutation, station] = value
    end
end

function sensor_period_diff(occ, period)
    mean(@view occ[period .== :postlockdown]) - mean(@view occ[period .== :prepandemic])
end

for permutation in 1:1000
    # NB not thread safe
    data.permuted_period = Random.shuffle(rng, data.period)
    res = @pipe groupby(data, :sensor_idx) |>
        combine(_, [:peak_hour_occ, :permuted_period] => sensor_period_diff => :Δpeak_hour_occ)
    _update(res.sensor_idx, res.Δpeak_hour_occ, results, permutation)
end

In [None]:
# results contains 1000 permutations (rows) of per-sensor differences (columns);
# take the variance across sensors within each permutation
variances = dropdims(var(results, dims=2), dims=2)
histogram(variances)

In [None]:
variances

In [None]:
# compute variance for the observed distribution
obs_dist = @pipe groupby(data, :sensor_idx) |>
    combine(_, [:peak_hour_occ, :period] => sensor_period_diff => :obs_diff)


test_stat = var(obs_dist.obs_diff)


In [None]:
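# compute the p-value: share of permuted variances at least as large as the
# observed variance, doubled for a two-sided test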
(1 - mean(test_stat .> variances)) * 2
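
Written out, the expression in the cell above corresponds to the formula below. The symbols are labels introduced here for illustration: $T_{\text{obs}}$ is `test_stat`, $V_b$ is the $b$-th entry of `variances`, and $B = 1000$ is the number of permutations.

$$
p = 2\left(1 - \frac{1}{B}\sum_{b=1}^{B} \mathbf{1}\left[T_{\text{obs}} > V_b\right]\right)
$$

The term in parentheses is the share of permuted variances at least as large as the observed one. Because of the doubling, the value can exceed 1 when $T_{\text{obs}}$ falls below the median of the $V_b$; in that case it would normally be capped at 1.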

## Conclusion

The permutation distribution is much tighter than the observed distribution, so noise is not the (sole) explanation for the heterogeneity.