# Exploratory Demo of Additive Site Methods


*Goal:* infer prevalence of additive, small-effect genome sites from knockout experiments.

**Outline:**
- Generate an example genome
- Survey outcomes across a range of knockout doses (number of sites knocked out) to choose the dose levels on which to focus testing
   - want doses where detectable fitness effects occur *sometimes* (but not *always* or *never*)
- For each chosen dose level, run many knockout experiments to measure the probability of a detectable fitness effect at that dose
- Fit a negative binomial distribution to estimate the number of small-effect sites and their effect size


## Preliminaries


In [1]:
import numpy as np

from pylib.analyze_additive import (
    assay_additive_naive,
    pick_doses_extrema,
)
from pylib.modelsys_explicit import (
    CalcKnockoutEffectsAdditive,
    create_additive_array,
    GenomeExplicit,
)


Method implementations are organized as external Python source files within the local `pylib` directory.


In [2]:
np.random.seed(1234)


Ensure reproducibility.


## Set Up Sample Genome


Create a genome with 1,000 distinct sites, 5% of which have an additive fitness effect when knocked out.
Effect sizes are distributed uniformly between 0 and 0.7 (mean 0.35), relative to the detectability threshold of 1.0.


In [3]:
num_sites = 1000
# effect sizes drawn uniformly from [0, 0.7): mean effect size of 0.35
distn = lambda x: np.random.rand(x) * 0.7
# 5% of sites carry an additive knockout effect
additive_array = create_additive_array(num_sites, 0.05, distn)
genome = GenomeExplicit(
    [CalcKnockoutEffectsAdditive(additive_array)],
)


The true number of additive sites is:


In [4]:
num_additive_sites = additive_array.astype(bool).sum()
num_additive_sites


50
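
For readers without `pylib` at hand, the setup can be mimicked with a plain-Python stand-in. This is a sketch under stated assumptions, not the `pylib` implementation: the names `make_additive_effects`, `knockout_is_detectable`, and `sensitivity` are hypothetical, exactly 5% of sites are assumed to carry an effect, and a knockout is assumed to be detectable when the summed effect of the knocked-out sites reaches the threshold of 1.0.

```python
import random

NUM_SITES = 1000
THRESHOLD = 1.0  # assumed detectability threshold


def make_additive_effects(num_sites, fraction, max_effect, rng):
    """Hypothetical stand-in: per-site effects, zero for non-additive sites."""
    effects = [0.0] * num_sites
    num_additive = round(num_sites * fraction)  # assumption: exact fraction of sites
    for site in rng.sample(range(num_sites), num_additive):
        effects[site] = rng.uniform(0.0, max_effect)
    return effects


def knockout_is_detectable(effects, knockout_sites):
    """Assumed additive model: effects sum; detectable at or above the threshold."""
    return sum(effects[s] for s in knockout_sites) >= THRESHOLD


def sensitivity(effects, dose, num_replications, rng):
    """Fraction of random knockouts of `dose` sites that show a detectable effect."""
    hits = sum(
        knockout_is_detectable(effects, rng.sample(range(len(effects)), dose))
        for _ in range(num_replications)
    )
    return hits / num_replications


rng = random.Random(1234)
effects = make_additive_effects(NUM_SITES, 0.05, 0.7, rng)
low = sensitivity(effects, 25, 500, rng)    # a low dose: effects seen only sometimes
high = sensitivity(effects, 140, 500, rng)  # a high dose: effects seen most of the time
print(low, high)
```

The exact rates depend on the random draw, so they will not match the `pylib` numbers in this notebook; the point is only that sensitivity rises with dose.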

## Choose Dose Levels


Perform exploratory knockout experiments across a broad range of dose levels to decide which knockout doses to focus testing on.


In [5]:
knockout_doses = pick_doses_extrema(
    genome.test_knockout, num_sites, max_doses=5, smear_count=250
)
knockout_doses


array([ 25,  53,  82, 111, 140])
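
The returned doses are consistent with five evenly spaced points between a lowest and a highest dose, truncated to integers. The sketch below reproduces only that spacing. The helper name `evenly_spaced_doses` is hypothetical, and how `pick_doses_extrema` locates the two extremes is not covered here.

```python
def evenly_spaced_doses(lowest, highest, count):
    """Hypothetical helper: `count` evenly spaced integer doses, endpoints included."""
    step = (highest - lowest) / (count - 1)
    # truncate each point to an integer number of knocked-out sites
    return [int(lowest + i * step) for i in range(count)]


doses = evenly_spaced_doses(25, 140, 5)
print(doses)  # [25, 53, 82, 111, 140]
```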

## Estimate Additive Site Prevalence


Run 1,000 knockout tests at each dose, fit a negative binomial distribution to the observed sensitivity rates, and then estimate the per-site effect size and the number of additive sites.


In [6]:
est = assay_additive_naive(
    genome.test_knockout, num_sites, knockout_doses, num_replications=1000
)
display(est)


{'num additive sites': 30.0,
 'per-site effect size': 0.5,
 'negative binomial fit': {'r': 2,
  'p': 0.03,
  'fit quantiles': [0.17196237831183514,
   0.4747450581391982,
   0.7090592221553703,
   0.8492146234542075,
   0.9250517311316833],
  'error': 0.006826150230427061},
 'knockout doses': array([ 25,  53,  82, 111, 140]),
 'dose sensitivies': [0.141, 0.411, 0.723, 0.878, 0.953]}
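
The reported fit can be cross-checked by hand. The values above are consistent with reading `r` as the number of additive sites that must be hit for a detectable effect and `p` as the per-knockout chance of hitting one, so that the fitted sensitivity at dose `d` is the probability of at least `r` hits in `d` independent draws. That reading, and the mapping of `r` and `p` onto the two headline estimates, is inferred from the output rather than taken from the `pylib` source; the function name `prob_at_least_r_hits` is hypothetical.

```python
from math import comb


def prob_at_least_r_hits(dose, r, p):
    """P(Binomial(dose, p) >= r): chance the r-th hit arrives within `dose` draws."""
    fewer_than_r = sum(
        comb(dose, k) * p**k * (1 - p) ** (dose - k) for k in range(r)
    )
    return 1.0 - fewer_than_r


r, p, num_sites = 2, 0.03, 1000
doses = [25, 53, 82, 111, 140]
observed = [0.141, 0.411, 0.723, 0.878, 0.953]  # sensitivities reported above

fitted = [prob_at_least_r_hits(d, r, p) for d in doses]
sse = sum((f - o) ** 2 for f, o in zip(fitted, observed))

print([round(f, 4) for f in fitted])  # compare with 'fit quantiles'
print(round(sse, 6))                  # compare with 'error' (sum of squared differences)
print(p * num_sites, 1 / r)           # compare with the two headline estimates
```

Under this reading, the estimated number of additive sites is `p * num_sites` and the per-site effect size is `1 / r`, the effect needed for `r` hits to reach the threshold of 1.0.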

In [7]:
print("actual", num_additive_sites)
print("estimated", est["num additive sites"])


actual 50
estimated 30.0
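
One possible reading of the gap: the fit reports a single per-site effect size while the true effects vary, so fewer, larger sites can stand in for more, smaller ones. Comparing the product of site count and effect size gives a rough sense of this. The sketch below only does that arithmetic; the expected true total uses the mean effect size of 0.35 from the setup rather than the realized effects, so it is an approximate reference point, not a measured quantity.

```python
actual_sites, mean_effect = 50, 0.35  # ground truth from the setup
est_sites, est_effect = 30.0, 0.5     # values reported by the fit

expected_total = actual_sites * mean_effect  # expected summed effect of all additive sites
estimated_total = est_sites * est_effect     # summed effect implied by the estimate

site_error = (est_sites - actual_sites) / actual_sites
total_error = (estimated_total - expected_total) / expected_total

print(expected_total, estimated_total)
print(f"relative error in site count: {site_error:+.0%}")
print(f"relative error in total effect: {total_error:+.0%}")
```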
