This note revisits the experiment design of the privacy and contact tracing effectiveness paper.

# The Setup (so far)

We have an $n$-dimensional lattice in which a proportion $p$ of the edges $E$ is rewired.

The contact tracing system is deployed such that $q$ of the rewired edges (_distant_) are traced, and $r$ of the original lattice (_close_) edges are traced.
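As a rough illustration of this setup, the sketch below builds a rewired lattice and marks which edges are traced. It rests on several assumptions not fixed by these notes: a one-dimensional ring lattice stands in for the $n$-dimensional one, each edge is rewired independently with probability $p$, and each edge is traced independently with probability $q$ (distant) or $r$ (close). The name `build_traced_edges` and its parameters are hypothetical.

```python
import random


def build_traced_edges(num_nodes, k, p, q, r, seed=0):
    """Return a list of (u, v, kind, traced) tuples for a rewired ring lattice.

    Assumptions (illustrative only): a 1-D ring where each node links to its
    k nearest neighbours (k even), independent rewiring with probability p,
    and independent tracing with probability q (distant) or r (close).
    """
    rng = random.Random(seed)
    edges = []
    for u in range(num_nodes):
        for step in range(1, k // 2 + 1):
            v = (u + step) % num_nodes
            if rng.random() < p:
                # Rewire the far endpoint to a uniformly random other node.
                # Duplicate edges are possible and are not removed here.
                v = rng.choice([w for w in range(num_nodes) if w != u])
                kind, trace_prob = "distant", q
            else:
                kind, trace_prob = "close", r
            traced = rng.random() < trace_prob
            edges.append((u, v, kind, traced))
    return edges
```

With `p=0` every edge is close, and with `q=1, r=0` exactly the distant edges are traced, which gives two quick sanity checks on the construction.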

Holding $p$ constant for now, the model definition implies a probability distribution $P(i | q, r)$ over the final infected ratio $i$.

This distribution has _mean_ $\mu_{q,r}$ and _standard deviation_ $\sigma_{q,r}$.

Another quantity of interest is $T$, the overall proportion of edges that are traced:

$$T = qp + r (1 - p)$$
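A minimal sketch of this quantity in code, assuming $p$, $q$ and $r$ are all proportions in $[0, 1]$. The function name `traced_fraction` is hypothetical.

```python
def traced_fraction(p, q, r):
    """Overall proportion of traced edges, T = q*p + r*(1 - p).

    p: proportion of edges that are rewired (distant)
    q: proportion of distant edges that are traced
    r: proportion of close edges that are traced
    """
    for name, value in (("p", p), ("q", q), ("r", r)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return q * p + r * (1 - p)
```

For instance, `traced_fraction(0.1, 0.0, 1.0)` gives the traced proportion when only close edges are traced, which is simply $1 - p$.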

## A complication

We believe that for low enough values of $q$ and $r$ the distribution $P(i | q, r)$ is bimodal.

The lower mode consists of cases where the epidemic does not occur because the disease dies out quickly.

The upper mode consists of epidemic cases. In the untraced graph, epidemics are of large size because the boundary of the infected region of the graph can expand geometrically as the disease traverses distant edges.

However, as the contact tracing system is more widely deployed (higher $q$ and $r$), the upper peak diminishes and epidemic sizes become more of a tail.

In all cases, with only a single index case at the start of the simulation, early extinctions are common.



# Operationalizing the research question

We are interested, ultimately, in the design of contact tracing systems and especially the sensitivity of those designs to the privacy preferences of their users.

We have asserted that the _distant_ edges are more likely to be sensitive than _close_ edges, and that a user might opt out of the entire system rather than have a distant edge traced.

For this reason, we are interested in the relative significance of distant and close edges for tracing efficacy.

We believe that distant edges are more important for the spread of the disease.

**Q1: How do we operationalize the relative importance of distant vs. close edges for the spread of the disease?**

We believe (based on preliminary results) that the distant edges are nonetheless not more important than close edges for the _tracing_ of the disease.

**Q2: How do we operationalize the relative importance of distant vs. close edges for the tracing of the disease?**

# Q1: 

In earlier tests, we found that the final infected ratio of networks _without_ the contact tracing system was increasing in $p$, with an inflection point.

In subsequent runs, we have picked a $p$ value that is on the phase boundary so that the effects of the contact tracing system would be visible. (I.e., with $p$ too high or too low, the graph topology might overwhelm any contact tracing effect.)

# Q2: 

One way to do this is to look at the effect of each additional traced edge on the mean infected ratio.

We can compare $\partial_q \mu$ with $\partial_r \mu$, the effect of a change in $q$ or $r$ on the mean infected ratio. We can then normalize these values by the number of edges of each type to determine whether an additional distant or close edge is more effective at improving the contact tracing system.

An additional distant edge is more effective if the following condition holds:

$$\frac{(1 - p) \partial_q \mu}{p \partial_r \mu}(q,r) > 1$$

However, we cannot compute this value directly. On finite graphs the functions involved are not continuous, so the partial derivatives are not well defined. We also need to work from simulated samples of $P$. Furthermore, a difference in means will not have any direct meaning in terms of statistical significance.

So instead of trying to compute a partial derivative, we look at statistical effect sizes. We consider [Cohen's d](https://en.wikiversity.org/wiki/Cohen%27s_d), the difference of means divided by the pooled standard deviation, as an effect size measure:

$$d(x_1, x_2) = \frac{\bar{\mu}_1 - \bar{\mu}_2}{\bar{\sigma}_{1,2}}$$

where $\bar{\mu}_1$ and $\bar{\mu}_2$ are the empirical means of the two cases 1 and 2, and $\bar{\sigma}_{1,2}$ is their pooled empirical standard deviation.
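The following is a minimal sketch of Cohen's d for two lists of simulated outcomes. It uses the common pooled standard deviation that weights each sample variance by its degrees of freedom. That choice of pooling, and the name `cohens_d`, are assumptions rather than something fixed by these notes.

```python
import math
import statistics


def cohens_d(sample_1, sample_2):
    """Cohen's d: difference of means divided by the pooled standard deviation."""
    n1, n2 = len(sample_1), len(sample_2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two observations")
    mean_1 = statistics.fmean(sample_1)
    mean_2 = statistics.fmean(sample_2)
    # Sample variances (n - 1 in the denominator).
    var_1 = statistics.variance(sample_1)
    var_2 = statistics.variance(sample_2)
    pooled_sd = math.sqrt(((n1 - 1) * var_1 + (n2 - 1) * var_2) / (n1 + n2 - 2))
    if pooled_sd == 0:
        raise ValueError("pooled standard deviation is zero; d is undefined")
    return (mean_1 - mean_2) / pooled_sd
```

Here each sample would be the list of final infected ratios from repeated simulation runs under one parameter setting.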

For a sampled grid of different values of $q$ and $r$ that are spaced at $\Delta_q$ and $\Delta_r$, we can compute 

$$d_q(q, r) = d(x(q, r),x(q + \Delta_q, r)) = \frac{\mu_{q,r} - \mu_{q + \Delta_q, r}}{\sigma_{q,r}}$$

and 

$$d_r(q,r) = d(x(q, r),x(q, r + \Delta_r)) = \frac{\mu_{q,r} - \mu_{q, r + \Delta_r}}{\sigma_{q,r}}$$

to compute the effect size of shifts along the grid, and normalize these values to compare the approximate estimated effect of an additional distant or close edge:

$$\frac{(1 - p) d_q (q, r)}{p d_r (q, r)} > 1$$
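The grid comparison can be sketched as follows. This is illustrative only: `samples` is a hypothetical dictionary mapping integer grid indices `(i, j)` (the $i$-th value of $q$ and $j$-th value of $r$) to lists of simulated final infected ratios, $d_r$ is taken as the effect of a one-step shift in $r$, and the denominator is the standard deviation at the base grid point, as in the grid formulas. The ratio is only comparable across directions if $\Delta_q = \Delta_r$, which is assumed here.

```python
import statistics


def grid_effect_size(base, shifted):
    """(mean(base) - mean(shifted)) / sd(base), as in the grid formulas."""
    return (statistics.fmean(base) - statistics.fmean(shifted)) / statistics.stdev(base)


def normalized_effect_ratio(samples, i, j, p):
    """Return (1 - p) * d_q / (p * d_r) at grid index (i, j).

    samples[(i, j)] holds simulated infected ratios at the i-th q value and
    j-th r value. Assumes equal grid spacing in q and r.
    """
    base = samples[(i, j)]
    d_q = grid_effect_size(base, samples[(i + 1, j)])  # one step in q
    d_r = grid_effect_size(base, samples[(i, j + 1)])  # one step in r
    if d_r == 0:
        raise ValueError("d_r is zero; the ratio is undefined")
    return ((1 - p) * d_q) / (p * d_r)
```

A returned value above 1 would correspond to the condition above, that is, an additional traced distant edge having the larger estimated effect per edge.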




## Further considerations, for design

In considering whether the effect size is great enough to warrant a change in the design of the contact tracing system, we need to explicitly model user preferences.


### A (too) simple model

A simple model is that the marginal user will decide against using the contact tracing system if it means that a distant edge is traced.

For a lattice connectivity $K$ (i.e., the number of nodes each node is connected to in the original lattice), the average number of distant edges per node is $Kp$; the average number of close edges is $K(1 - p)$.

The designer is notionally deciding between $m$ nodes adopting, with all viable close and distant edges traced, and $m + 1$ nodes adopting, but with the $(m + 1)$th node blocking the tracing on their own distant edge. (We write $m$ here because $n$ already denotes the lattice dimension.)

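In this case, it will _always_ be better to have the distant edges be optional, because there will be more traced edges in the privacy-preserving case: the edges traced among the original adopters are unchanged, and the additional adopter contributes their close edges on top.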

However, this model is almost certainly too simplistic, as it hardly interacts with the effectiveness of the contact tracing at all.

### A different model

Here, the user's utility is modeled more explicitly. The privacy cost of having a traced distant edge is weighed against other considerations, such as the desire for personal and public health. Depending on this utility function, we see either fewer adopting nodes that are fully traced (close and distant edges), or more adopting nodes with few of the distant edges among them traced.

Note that the distribution of edges in either case will be different from the case we've been studying thus far, which has 100% adoption but arbitrarily limits edge tracing in order to measure the effectiveness rates of different edges. If we had a different way of operationalizing Q2, we might approach this differently.