# Averaging models
The best way to obtain robust inferences and estimate error is probably not to bootstrap a single model, but rather to actually perform multiple experimental replicates, ideally on different libraries.

Here we describe how to average model fits across libraries and/or replicates.

## Split data into replicates
We will use the same RBD data as in earlier examples, but split it into several libraries / replicates.

Specifically, we will fit two different libraries: `avg3muts` and `avg4muts`, which have different barcodes and also different mutation rates (although of course in real life you might sometimes want to average results from different libraries with the same mutation rates).
We will also simulate having two replicates for each library by drawing a bootstrap sample from each library (with a different seed for each replicate) and then dropping the duplicate rows from each sample:

In [1]:
import pandas as pd

import polyclonal.polyclonal
import polyclonal.polyclonal_collection


# read data
all_data = pd.read_csv("RBD_variants_escape_noisy.csv", na_filter=None)

# split by library and replicates
libraries = ["avg3muts", "avg4muts"]  # the two libraries to use
concentrations = [0.25, 1, 4]  # use these concentrations
n_replicates = 2  # number of replicates per library

data_by_replicate = {
    (library, replicate + 1): polyclonal.polyclonal_collection.create_bootstrap_sample(
        all_data.query("library == @library").query("concentration in @concentrations"),
        seed=replicate + 1,
    ).drop_duplicates()
    for library in libraries
    for replicate in range(n_replicates)
}
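
To give a feel for why bootstrapping and then dropping duplicates yields replicates that differ from one another, here is a minimal standard-library sketch. It is not the implementation of `create_bootstrap_sample`; the helper `bootstrap_unique` and the barcode names are hypothetical and used only for illustration. Sampling with replacement and then removing repeated draws should retain roughly 1 - 1/e (about 63%) of the original items, and a different subset for each seed:

```python
import random


def bootstrap_unique(items, seed):
    """Sample len(items) items with replacement, then drop repeated draws.

    Hypothetical helper for illustration only; it is not part of polyclonal.
    """
    rng = random.Random(seed)
    sample = rng.choices(items, k=len(items))  # bootstrap sample
    return sorted(set(sample))  # analogous to dropping duplicate rows


# hypothetical variant barcodes standing in for rows of a library
barcodes = [f"barcode_{i}" for i in range(1000)]

rep1 = bootstrap_unique(barcodes, seed=1)
rep2 = bootstrap_unique(barcodes, seed=2)

frac1 = len(rep1) / len(barcodes)
frac2 = len(rep2) / len(barcodes)
shared = len(set(rep1) & set(rep2)) / len(barcodes)

print(f"replicate 1 keeps {frac1:.2f} of variants, replicate 2 keeps {frac2:.2f}")
print(f"fraction of variants present in both replicates: {shared:.2f}")
```

Because each simulated replicate retains a different subset of the variants, the models fit to them are not identical, which is what makes the later averaging informative.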

## Fit models to each replicate
We now fit a `Polyclonal` model to each replicate using 3 epitopes:

In [2]:
models_by_replicate = {}
for (library, replicate), data in data_by_replicate.items():
    model = polyclonal.Polyclonal(data_to_fit=data, n_epitopes=3)
    models_by_replicate[(library, replicate)] = model

_ = polyclonal.polyclonal_collection.fit_models(
    models_by_replicate.values(),
    n_threads=2,
)

Now make a data frame with each model and other information describing it:

In [3]:
models_df = (
    pd.Series(models_by_replicate, name="model")
    .rename_axis(["library", "replicate"])
    .reset_index()
)
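
The pattern above relies on pandas turning a dictionary keyed by tuples into a `Series` with a two-level index, which `rename_axis` then names and `reset_index` converts into ordinary columns. The following small sketch shows the same steps on toy data; the placeholder strings stand in for fitted model objects and are an assumption made purely for illustration:

```python
import pandas as pd

# placeholder strings stand in for fitted model objects (illustrative assumption)
toy_models = {
    ("avg3muts", 1): "model_a",
    ("avg3muts", 2): "model_b",
    ("avg4muts", 1): "model_c",
    ("avg4muts", 2): "model_d",
}

toy_df = (
    pd.Series(toy_models, name="model")  # tuple keys become a two-level index
    .rename_axis(["library", "replicate"])  # name the two index levels
    .reset_index()  # move the index levels into regular columns
)

print(toy_df)
```

The result has one row per (library, replicate) pair, with columns `library`, `replicate`, and `model`.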

## Average the models