# Exploratory Demo of Additive Site Methods


*Goal:* infer prevalence of additive, small-effect genome sites from knockout experiments.

**Outline:**
- Generate example genome
- Survey outcomes across a range of knockout doses (number of sites knocked out)
- Choose doses to focus on, then run many knockout experiments at each to measure the probability of a detectable fitness effect
- Fit a negative binomial distribution to estimate the number of small-effect sites and their per-site effect size


## Preliminaries


In [1]:
import numpy as np
from IPython.display import display

from pylib.analyze_additive import (
    assay_additive_naive,
    pick_doses_extrema,
)
from pylib.modelsys_explicit import (
    CalcKnockoutEffectsAdditive,
    create_additive_array,
    GenomeExplicit,
)


Method implementations are organized as external Python source files within the local `pylib` directory.


In [2]:
np.random.seed(1234)


Seed NumPy's global random number generator so that results are reproducible.
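Seeding matters here because `distn` below draws from NumPy's global generator via `np.random.rand`; we assume (not confirmed) that the `pylib` routines draw from the same global state. A minimal check that reseeding replays the same draws, alongside NumPy's newer `default_rng` API, which gives the same guarantee without hidden global state:

```python
import numpy as np

# Reseeding NumPy's legacy global generator replays the same stream of draws.
np.random.seed(1234)
first = np.random.rand(3)

np.random.seed(1234)
second = np.random.rand(3)

print(np.array_equal(first, second))

# The Generator API offers the same reproducibility with an explicit,
# locally owned generator object instead of global state.
rng_a = np.random.default_rng(1234)
rng_b = np.random.default_rng(1234)
print(np.array_equal(rng_a.random(3), rng_b.random(3)))
```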


## Set Up Sample Genome


Create a genome with 1,000 distinct sites, 5% of which have a knockout fitness effect below the detectability threshold.
Effect sizes are drawn uniformly between 0 and 0.2, expressed relative to a detectability threshold of 1.0.
Knockout effects are assumed additive.


In [3]:
num_sites = 1000
distn = lambda x: np.random.rand(x) * 0.2  # mean effect size of 0.1
additive_array = create_additive_array(num_sites, 0.05, distn)
genome = GenomeExplicit(
    [CalcKnockoutEffectsAdditive(additive_array)],
)
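To make the additive model concrete, here is a minimal self-contained sketch of what such a genome might do. The functions `make_additive_effects`, `knockout_is_detectable`, and `sensitivity_rate` are hypothetical stand-ins, not the `pylib` implementation: we assume a knockout removes `dose` distinct random sites, that their effects simply add, and that the outcome is detectable when the sum reaches the threshold of 1.0.

```python
import numpy as np


def make_additive_effects(num_sites, frac_additive, rng):
    """Hypothetical stand-in for `create_additive_array`: exactly
    round(frac_additive * num_sites) sites get a nonzero effect in (0, 0.2];
    every other site has zero effect."""
    effects = np.zeros(num_sites)
    num_additive = round(frac_additive * num_sites)
    chosen = rng.choice(num_sites, size=num_additive, replace=False)
    # 1 - U[0, 1) lies in (0, 1], so every chosen site is strictly nonzero.
    effects[chosen] = 0.2 * (1.0 - rng.random(num_additive))
    return effects


def knockout_is_detectable(effects, dose, rng, threshold=1.0):
    """Knock out `dose` distinct random sites; under additivity the combined
    fitness effect is the sum of their individual effects."""
    knocked_out = rng.choice(effects.size, size=dose, replace=False)
    return bool(effects[knocked_out].sum() >= threshold)


def sensitivity_rate(effects, dose, num_replications, rng):
    """Fraction of independent knockout tests at `dose` that are detectable."""
    hits = sum(knockout_is_detectable(effects, dose, rng) for _ in range(num_replications))
    return hits / num_replications


rng = np.random.default_rng(1234)
effects = make_additive_effects(1000, 0.05, rng)
print(np.count_nonzero(effects))
print(sensitivity_rate(effects, 200, 100, rng))
```

No single additive site is detectable on its own; only knocking out enough of them at once pushes the summed effect over the threshold, which is why the experiments below vary the dose.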


For comparison, the true number of additive sites is


In [4]:
num_additive_sites = additive_array.astype(bool).sum()
num_additive_sites


50

## Estimate Additive Sites


Run exploratory knockout experiments across a broad range of doses to choose which doses to focus testing on.


In [5]:
knockout_doses = pick_doses_extrema(
    genome.test_knockout, num_sites, max_doses=5, smear_count=250
)
knockout_doses


array([ 87, 136, 186, 236, 286])
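One plausible reading of "pick doses between extrema" is sketched below; `pick_doses_between_extrema` and `toy_test_knockout` are hypothetical, and the real `pick_doses_extrema` logic and `test_knockout` signature may differ. The sketch tests `smear_count` random doses, takes the lowest dose that ever gave a detectable effect and the highest dose that ever failed to, and spaces `max_doses` doses evenly between them. The doses printed above are evenly spaced, which fits this guess, but that alone does not confirm it.

```python
import numpy as np


def pick_doses_between_extrema(test_knockout, num_sites, max_doses, smear_count, rng):
    """Hypothetical sketch: survey random doses, then space `max_doses` doses
    evenly across the range where outcomes switch from undetectable to
    detectable. `test_knockout(dose) -> bool` is an assumed signature."""
    doses = rng.integers(1, num_sites + 1, size=smear_count)
    outcomes = np.array([test_knockout(int(d)) for d in doses], dtype=bool)
    lowest_hit = doses[outcomes].min()      # smallest dose with a detectable effect
    highest_miss = doses[~outcomes].max()   # largest dose with no detectable effect
    low, high = sorted((lowest_hit, highest_miss))
    return np.linspace(low, high, max_doses).astype(int)


# Toy additive genome: 50 of 1,000 sites with effects in (0, 0.2].
rng = np.random.default_rng(0)
effects = np.zeros(1000)
effects[rng.choice(1000, size=50, replace=False)] = 0.2 * (1.0 - rng.random(50))


def toy_test_knockout(dose):
    knocked_out = rng.choice(effects.size, size=dose, replace=False)
    return bool(effects[knocked_out].sum() >= 1.0)


picked = pick_doses_between_extrema(toy_test_knockout, 1000, max_doses=5, smear_count=250, rng=rng)
print(picked)
```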

Run 1,000 knockout tests at each chosen dose, then compute an estimate from the sensitivity rate (the fraction of tests with a detectable fitness effect) observed at each dose.


In [6]:
est = assay_additive_naive(
    genome.test_knockout, num_sites, knockout_doses, num_replications=1000
)
display(est)


{'num additive sites': 60.0,
 'per-site effect size': 0.07692307692307693,
 'negative binomial fit': {'r': 13,
  'p': 0.06,
  'fit quantiles': [0.006879833407397874,
   0.113119404469053,
   0.4169207581176875,
   0.7350341276676735,
   0.9128724581627263],
  'error': 0.0010567470760802563},
 'knockout doses': array([ 87, 136, 186, 236, 286]),
 'dose sensitivies': [0.013, 0.141, 0.43, 0.729, 0.907]}
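The fitted `r` and `p` appear to read as follows: `p` is the fraction of sites that are additive, and `r` is the number of additive sites that must be knocked out together to cross the threshold. This matches the output above, since 0.06 × 1,000 = 60 sites and 1/13 ≈ 0.0769 per-site effect. A knockout at dose $d$ is then detectable when at least $r$ of the $d$ sites are additive, $P(\mathrm{Bin}(d, p) \ge r)$, which equals the probability that the $r$-th negative-binomial "success" arrives within $d$ trials. The sketch below fits `(r, p)` by grid search on squared error. It assumes this parametrization and fitting procedure, which may differ from `assay_additive_naive`. It also treats sites as independently additive, approximating sampling without replacement.

```python
import math

import numpy as np


def prob_detectable(dose, r, p):
    """P(at least r of `dose` knocked-out sites are additive), each site being
    additive independently with probability p. Equivalently, the r-th
    negative-binomial success occurs within `dose` Bernoulli(p) trials."""
    return 1.0 - sum(math.comb(dose, k) * p**k * (1.0 - p) ** (dose - k) for k in range(r))


def fit_negative_binomial(doses, sensitivities, r_values, p_values):
    """Grid search for the (r, p) minimizing mean squared error between the
    model's detection probabilities and the observed sensitivity rates."""
    observed = np.asarray(sensitivities, dtype=float)
    best = (math.inf, None, None)
    for r in r_values:
        for p in p_values:
            predicted = np.array([prob_detectable(int(d), r, float(p)) for d in doses])
            error = float(np.mean((predicted - observed) ** 2))
            if error < best[0]:
                best = (error, r, float(p))
    return best


doses = [87, 136, 186, 236, 286]
observed = [0.013, 0.141, 0.43, 0.729, 0.907]  # sensitivities reported above
r_grid = range(1, 31)
p_grid = [k / 100 for k in range(1, 21)]

error, r, p = fit_negative_binomial(doses, observed, r_grid, p_grid)
print("estimated additive sites:", p * 1000)
print("estimated per-site effect:", 1 / r)
```

Because this sketch's probability model and grid are assumptions, its fitted values need not match the `pylib` output exactly.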

In [7]:
print("actual", num_additive_sites)
print("estimated", est["num additive sites"])


actual 50
estimated 60.0
