# Bootstrapping model fits
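The previous section describes fitting a single model.
However, we may also want confidence estimates for the fit.
We can get these by bootstrapping the data set: fitting models to many data sets resampled with replacement from the original data, and using the variation among those fits to estimate uncertainty.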
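To make the idea concrete, here is a minimal, package-independent sketch of bootstrapping a simple statistic with the Python standard library. It is not how `polyclonal` implements bootstrapping; the function `bootstrap_std` and the data values are hypothetical. Each bootstrap sample draws as many items as the original data set, with replacement, and the spread of the statistic across samples serves as the uncertainty estimate.

```python
import random
import statistics


def bootstrap_std(values, statistic, n_samples=100, seed=0):
    """Hypothetical helper: bootstrap the mean and spread of `statistic`."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_samples):
        # Each bootstrap sample has the same size as the original data,
        # drawn with replacement.
        sample = [rng.choice(values) for _ in values]
        estimates.append(statistic(sample))
    # The standard deviation across samples estimates the uncertainty.
    return statistics.mean(estimates), statistics.stdev(estimates)


# Hypothetical measurements, for illustration only.
data = [0.05, 0.14, 0.05, 0.08, 0.04, 0.11, 0.09, 0.07]
mean_est, std_est = bootstrap_std(data, statistics.mean)
print(f"bootstrap mean {mean_est:.3f}, std {std_est:.3f}")
```

Fixing the seed makes the resampling reproducible, which is convenient when comparing runs.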

The recommended workflow is to first fit models to all the data to choose the number of epitopes and the other fitting parameters.
Once those parameters are settled, you can bootstrap to get confidence estimates on the predictions.

Here we illustrate bootstrapping on the noisy simulated RBD data set with an average of 2 mutations per gene (the `avg2muts` library).

## Get a model fit to all the data
The first step is to fit a `Polyclonal` model to all the data.
We already did this for the RBD example in the previous notebook and saved the model using [pickle](https://docs.python.org/3/library/pickle.html), so here we simply load that model rather than re-fitting it.
We call this the "root" model because it is the starting point (root) for the subsequent bootstrapping.
Note that the data (which we will bootstrap) are attached to this pre-fit model:

```python
# NBVAL_IGNORE_OUTPUT

import pickle

with open("fit_RBD_model.pickle", "rb") as f:
    root_poly = pickle.load(f)

root_poly.data_to_fit
```

|  | library | aa_substitutions | concentration | prob_escape | IC90 |
|---|---|---|---|---|---|
| 0 | avg2muts |  | 0.25 | 0.05044 | 0.1128 |
| 1 | avg2muts |  | 0.25 | 0.14310 | 0.1128 |
| 2 | avg2muts |  | 0.25 | 0.05452 | 0.1128 |
| 3 | avg2muts |  | 0.25 | 0.08473 | 0.1128 |
| 4 | avg2muts |  | 0.25 | 0.04174 | 0.1128 |
| ... | ... | ... | ... | ... | ... |
| 89995 | avg2muts | Y508V | 4.00 | 0.00000 | 0.2531 |
| 89996 | avg2muts | Y508V A520L | 4.00 | 0.03180 | 0.4688 |
| 89997 | avg2muts | Y508V H519N | 4.00 | 0.10630 | 0.5528 |
| 89998 | avg2muts | Y508W | 4.00 | 0.03754 | 0.2285 |
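Saving with pickle uses `pickle.dump` with a file opened in binary write mode, the counterpart of the `pickle.load` call above. The sketch below shows a save/load round trip with the standard library; the dictionary is a hypothetical stand-in for the fitted `Polyclonal` object, and the file is written to a temporary directory rather than to `fit_RBD_model.pickle`.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a fitted model object.
model = {"epitopes": [1, 2, 3], "n_bootstrap_samples": 5}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "fit_model.pickle")
    # Write in binary mode ("wb"); reading back uses the matching "rb" mode.
    with open(path, "wb") as f:
        pickle.dump(model, f)
    with open(path, "rb") as f:
        reloaded = pickle.load(f)

print(reloaded)
```

Only unpickle files you trust, since loading a pickle can execute arbitrary code.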


## Create and fit bootstrapped models
To create the bootstrapped models, we initialize a `PolyclonalCollection`.
Here we use just 5 bootstrap samples for speed; for good error estimates you will probably want more, on the order of 20 to 100.
Note that the root model must already have been fit to the data!

```python
import polyclonal

n_bootstrap_samples = 5

bootstrap_poly = polyclonal.PolyclonalCollection(
    root_polyclonal=root_poly,
    n_bootstrap_samples=n_bootstrap_samples,
)
```
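Each bootstrap sample is a data set resampled from the root model's `data_to_fit`. We have not verified exactly how `PolyclonalCollection` draws these samples; the sketch below assumes one common scheme, resampling rows with replacement separately within each concentration so that every concentration keeps its original number of measurements. The helper `bootstrap_rows_by_group` is hypothetical, and the rows are a few copied from the output above.

```python
import random
from collections import defaultdict


def bootstrap_rows_by_group(rows, group_key, seed=None):
    """Hypothetical helper: resample rows with replacement within each group.

    Each group keeps its original size, so every concentration stays
    represented by the same number of measurements.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    sample = []
    for group_rows in groups.values():
        sample.extend(rng.choice(group_rows) for _ in group_rows)
    return sample


# A few rows copied from `root_poly.data_to_fit` above.
rows = [
    {"aa_substitutions": "", "concentration": 0.25, "prob_escape": 0.05044},
    {"aa_substitutions": "", "concentration": 0.25, "prob_escape": 0.14310},
    {"aa_substitutions": "Y508V", "concentration": 4.00, "prob_escape": 0.00000},
    {"aa_substitutions": "Y508V A520L", "concentration": 4.00, "prob_escape": 0.03180},
    {"aa_substitutions": "Y508V H519N", "concentration": 4.00, "prob_escape": 0.10630},
]

# One resampled data set per (hypothetical) bootstrap replicate.
samples = [bootstrap_rows_by_group(rows, "concentration", seed=i) for i in range(5)]
for s in samples:
    print(sorted(r["aa_substitutions"] for r in s))
```

Because sampling is with replacement, some variants appear several times in a bootstrap sample and others not at all, so a given mutation can be missing from some replicates.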

Now fit the models:

```python
# NBVAL_IGNORE_OUTPUT

import time

start = time.time()
print(f"Starting fitting bootstrap models at {time.asctime()}")
n_fit, n_failed = bootstrap_poly.fit_models()
print(f"Fitting took {time.time() - start:.3g} seconds, finished at {time.asctime()}")
assert n_failed == 0 and n_fit == n_bootstrap_samples
```

```
Starting fitting bootstrap models at Mon Mar 14 16:08:27 2022
Fitting took 99.9 seconds, finished at Mon Mar 14 16:10:07 2022
```


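## Look at summarized results

Each `*_replicates` data frame holds the estimates from every bootstrap replicate (see its `bootstrap_replicate` column), while the data frame of the same name without that suffix summarizes those estimates across replicates.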

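```python
bootstrap_poly.activity_wt_df_replicates.round(3)
```

|  | epitope | activity | bootstrap_replicate |
|---|---|---|---|
| 0 | 1 | 1.266 | 1 |
| 1 | 2 | 3.179 | 1 |
| 2 | 3 | 1.927 | 1 |
| 3 | 1 | 1.152 | 2 |
| 4 | 2 | 3.171 | 2 |
| 5 | 3 | 2.042 | 2 |
| 6 | 1 | 1.273 | 3 |
| 7 | 2 | 3.151 | 3 |
| 8 | 3 | 1.988 | 3 |
| 9 | 1 | 1.305 | 4 |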


```python
bootstrap_poly.activity_wt_df.round(3)
```

|  | epitope | mean_activity | median_activity | std_activity |
|---|---|---|---|---|
| 0 | 1 | 1.254 | 1.272 | 0.059 |
| 1 | 2 | 3.159 | 3.155 | 0.016 |
| 2 | 3 | 1.971 | 1.957 | 0.046 |
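The summary columns aggregate the per-replicate values for each epitope. The sketch below recomputes `mean_activity`, `median_activity`, and `std_activity` from replicate rows using only the standard library. The activity values are hypothetical, and using the sample standard deviation (denominator n - 1, which is also the pandas default) is an assumption about what `polyclonal` does.

```python
import statistics
from collections import defaultdict

# Hypothetical rows shaped like `activity_wt_df_replicates`:
# (epitope, activity, bootstrap_replicate).
replicates = [
    (1, 1.2, 1), (1, 1.1, 2), (1, 1.3, 3),
    (2, 3.2, 1), (2, 3.1, 2), (2, 3.0, 3),
]

# Group activities by epitope, ignoring which replicate they came from.
by_epitope = defaultdict(list)
for epitope, activity, _replicate in replicates:
    by_epitope[epitope].append(activity)

summary = {
    epitope: {
        "mean_activity": statistics.mean(values),
        "median_activity": statistics.median(values),
        # Sample standard deviation (n - 1 denominator); an assumption.
        "std_activity": statistics.stdev(values),
    }
    for epitope, values in by_epitope.items()
}
for epitope, stats in sorted(summary.items()):
    print(epitope, {k: round(v, 3) for k, v in stats.items()})
```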


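```python
bootstrap_poly.mut_escape_df_replicates.round(3)
```

|  | epitope | site | wildtype | mutant | mutation | escape | bootstrap_replicate |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 331 | N | A | N331A | 0.100 | 1 |
| 1 | 1 | 331 | N | D | N331D | 0.074 | 1 |
| 2 | 1 | 331 | N | E | N331E | 0.288 | 1 |
| 3 | 1 | 331 | N | F | N331F | 0.119 | 1 |
| 4 | 1 | 331 | N | G | N331G | 2.150 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 28975 | 3 | 531 | T | R | T531R | 0.991 | 5 |
| 28976 | 3 | 531 | T | S | T531S | 0.588 | 5 |
| 28977 | 3 | 531 | T | V | T531V | 0.775 | 5 |
| 28978 | 3 | 531 | T | W | T531W | 1.267 | 5 |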


```python
bootstrap_poly.mut_escape_df.round(3)
```

|  | epitope | site | wildtype | mutant | mutation | mean_escape | median_escape | std_escape | n_bootstrap_replicates | frac_bootstrap_replicates |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 331 | N | A | N331A | 0.076 | 0.041 | 0.078 | 5 | 1.0 |
| 1 | 1 | 331 | N | D | N331D | 0.450 | 0.147 | 0.491 | 5 | 1.0 |
| 2 | 1 | 331 | N | E | N331E | 0.030 | -0.107 | 0.393 | 5 | 1.0 |
| 3 | 1 | 331 | N | F | N331F | 0.326 | 0.216 | 0.262 | 5 | 1.0 |
| 4 | 1 | 331 | N | G | N331G | 1.954 | 2.045 | 0.480 | 5 | 1.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 5791 | 3 | 531 | T | R | T531R | 0.894 | 0.946 | 0.102 | 5 | 1.0 |
| 5792 | 3 | 531 | T | S | T531S | 0.678 | 0.733 | 0.182 | 5 | 1.0 |
| 5793 | 3 | 531 | T | V | T531V | 0.717 | 0.773 | 0.161 | 5 | 1.0 |
| 5794 | 3 | 531 | T | W | T531W | 1.406 | 1.452 | 0.285 | 5 | 1.0 |
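`mut_escape_df` summarizes `mut_escape_df_replicates` per epitope and mutation. The sketch below illustrates one reading of the last two columns: `n_bootstrap_replicates` counts the replicates that have an estimate for a mutation, and `frac_bootstrap_replicates` divides that count by the total number of replicates. That interpretation, the escape values, and the filter at the end are assumptions for illustration. Every mutation shown above has a `frac_bootstrap_replicates` of 1.0, but with sparser data a mutation can be missing from some bootstrap samples.

```python
import statistics

n_replicates = 3
# Hypothetical escape values for one epitope: replicate -> {mutation: escape}.
# N331G is absent from replicate 3, as if it never appeared in that sample.
escape_by_replicate = {
    1: {"N331A": 0.10, "N331G": 2.20},
    2: {"N331A": 0.04, "N331G": 1.80},
    3: {"N331A": 0.09},
}

all_mutations = sorted({m for reps in escape_by_replicate.values() for m in reps})
summary = {}
for mutation in all_mutations:
    # Collect estimates only from replicates where this mutation was present.
    values = [reps[mutation] for reps in escape_by_replicate.values() if mutation in reps]
    summary[mutation] = {
        "mean_escape": statistics.mean(values),
        "median_escape": statistics.median(values),
        # The sample standard deviation needs at least two values.
        "std_escape": statistics.stdev(values) if len(values) > 1 else float("nan"),
        "n_bootstrap_replicates": len(values),
        "frac_bootstrap_replicates": len(values) / n_replicates,
    }

# Hypothetical filter: keep mutations estimated in every replicate.
well_supported = [m for m, s in summary.items() if s["frac_bootstrap_replicates"] == 1.0]
print(well_supported)
```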


```python
bootstrap_poly.mut_escape_site_summary_df_replicates.round(3)
```

|  | epitope | site | wildtype | mean | total positive | max | min | total negative | bootstrap_replicate |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 331 | N | 0.731 | 11.722 | 2.150 | -0.031 | -0.031 | 1 |
| 1 | 1 | 332 | I | 0.764 | 15.631 | 2.546 | -0.618 | -1.125 | 1 |
| 2 | 1 | 333 | T | 0.353 | 7.295 | 1.429 | -0.500 | -0.941 | 1 |
| 3 | 1 | 334 | N | 0.688 | 12.558 | 2.320 | -0.170 | -0.170 | 1 |
| 4 | 1 | 335 | L | 0.088 | 5.379 | 1.404 | -1.386 | -3.701 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2590 | 3 | 527 | P | 0.800 | 13.598 | 1.661 | 0.050 | 0.000 | 5 |
| 2591 | 3 | 528 | K | 0.701 | 12.625 | 1.080 | 0.235 | 0.000 | 5 |
| 2592 | 3 | 529 | K | 0.712 | 12.819 | 1.237 | 0.026 | 0.000 | 5 |
| 2593 | 3 | 530 | S | 0.735 | 14.585 | 1.944 | -0.328 | -0.612 | 5 |
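Judging from the column names, the site summaries aggregate the per-mutation escape values at each site within an epitope and replicate: the mean, the sum of positive values, the maximum, the minimum, and the sum of negative values. The sketch below computes these for a single site; the escape values are hypothetical and the definitions are inferred from the output rather than taken from the `polyclonal` source. When a site has no negative values, `total negative` comes out as 0, which matches rows such as site 527 above, where `min` is positive.

```python
import statistics

# Hypothetical escape values for every mutation at one site in one replicate.
site_escape = [2.15, 0.29, 0.12, 0.10, -0.03]

site_summary = {
    "mean": statistics.mean(site_escape),
    # Sum over positive values only; an empty sum is 0.
    "total positive": sum(x for x in site_escape if x > 0),
    "max": max(site_escape),
    "min": min(site_escape),
    # Sum over negative values only; 0 if the site has none.
    "total negative": sum(x for x in site_escape if x < 0),
}
print({k: round(v, 3) for k, v in site_summary.items()})
```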
