# Fit model to data
We will fit a `Polyclonal` model to the RBD antibody mix we simulated.

First, we read in that simulated data:

In [1]:
import pandas as pd

import polyclonal

# na_filter=False keeps the empty aa_substitutions of wildtype variants as '' rather than NaN
variants_df = pd.read_csv('RBD_variants_escape.csv', na_filter=False)

variants_df

| Unnamed: 0 | library | barcode | aa_substitutions | n_aa_substitutions | concentration | prob_escape |
|---|---|---|---|---|---|---|
| 0 | lib_avg2 | AAAAAAACGTTTTGTC | | 0 | 0.5 | 0.005594 |
| 1 | lib_avg2 | AAAAAAACTAAAAGTC | | 0 | 0.5 | 0.005594 |
| 2 | lib_avg2 | AAAAAGTGTCTAAGCC | | 0 | 0.5 | 0.005594 |
| 3 | lib_avg2 | AAAAATGGCGCGGTAT | | 0 | 0.5 | 0.005594 |
| 4 | lib_avg2 | AAAACGAGACATCACC | | 0 | 0.5 | 0.005594 |
| ... | ... | ... | ... | ... | ... | ... |
| 119995 | lib_avg3 | CGTGGTCTTGTAGCGG | Y508V S530L | 2 | 2.0 | 0.006302 |
| 119996 | lib_avg2 | CTGGGTTGGAGTGCCC | Y508V T531A | 2 | 2.0 | 0.010350 |
| 119997 | lib_avg3 | GATGAGGGGCATAGCC | Y508V T531C | 2 | 2.0 | 0.006213 |
| 119998 | lib_avg2 | GCACGTTGTGATTGGT | Y508W | 1 | 2.0 | 0.001043 |
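
Wildtype variants have an empty `aa_substitutions` field (the first rows above). By default `pd.read_csv` parses empty fields as `NaN`, whereas a falsy `na_filter` (written as `False` here) keeps them as empty strings. A small sketch with an in-memory CSV, reusing two barcodes from the table above with a reduced set of columns, shows the difference:

```python
import io

import pandas as pd

# Two rows taken from the table above, trimmed to three columns for brevity.
csv_text = (
    "barcode,aa_substitutions,n_aa_substitutions\n"
    "AAAAAAACGTTTTGTC,,0\n"
    "GCACGTTGTGATTGGT,Y508W,1\n"
)

# Default parsing: the empty field becomes NaN.
default_df = pd.read_csv(io.StringIO(csv_text))

# NA filtering disabled: the empty field stays an empty string.
no_filter_df = pd.read_csv(io.StringIO(csv_text), na_filter=False)

print(default_df["aa_substitutions"].isna().tolist())  # [True, False]
print(no_filter_df["aa_substitutions"].tolist())       # ['', 'Y508W']
```

Keeping the wildtype entry as `''` means the column holds only strings, which is convenient for any code that splits the substitutions on whitespace.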


Initialize a `Polyclonal` model with these data and three epitopes.
From [prior work](https://www.nature.com/articles/s41467-021-24435-8) we know the three most important epitopes and a key escape mutation in each, so we "seed" the initial guesses by assigning a large escape value to that mutation in its epitope:

 - E484K for the dominant class 2 epitope (`epitope 1` in the code)
 - G446V for the (usually) next most dominant class 3 epitope (`epitope 2`)
 - K417N for the class 1 epitope (`epitope 3`)

In [2]:
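# Initial wildtype activities, ordered from most to least dominant epitope
activity_wt_df = pd.DataFrame.from_records(
    [('epitope 1', 3.0), ('epitope 2', 2.0), ('epitope 3', 1.0)],
    columns=['epitope', 'activity'],
)

# Seed one key mutation per epitope with a large initial escape value
mut_escape_df = pd.DataFrame.from_records(
    [('epitope 1', 'E484K', 10.0),   # class 2
     ('epitope 2', 'G446V', 10.0),   # class 3
     ('epitope 3', 'K417N', 10.0)],  # class 1
    columns=['epitope', 'mutation', 'escape'],
)

poly_abs = polyclonal.Polyclonal(
    data_to_fit=variants_df,
    activity_wt_df=activity_wt_df,
    mut_escape_df=mut_escape_df,
    data_mut_escape_overlap='fill_to_data',
)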
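
To see why a seed escape value of 10.0 is "large", it helps to sketch the escape probability under the biophysical model as I understand it from the `polyclonal` documentation. That model form is an assumption here, not something shown in this notebook: each epitope `e` is unbound with probability `1 / (1 + c * exp(-phi_e))`, where `phi_e` is minus the wildtype activity plus the summed escape of the variant's mutations, and the variant escapes only if all epitopes are unbound. The helper `prob_escape` below is hypothetical and is not part of the `polyclonal` API.

```python
import numpy as np


def prob_escape(activities, escapes, concentration):
    """Escape probability under the assumed model (hypothetical helper).

    activities: wildtype activity per epitope.
    escapes: summed mutation escape per epitope for one variant.
    """
    phi = -np.asarray(activities, dtype=float) + np.asarray(escapes, dtype=float)
    # Probability that each epitope is left unbound at this concentration.
    unbound = 1.0 / (1.0 + concentration * np.exp(-phi))
    # The variant escapes only if every epitope is unbound.
    return float(np.prod(unbound))


activities = [3.0, 2.0, 1.0]  # initial guesses used above

p_wildtype = prob_escape(activities, [0.0, 0.0, 0.0], concentration=1.0)
p_seeded = prob_escape(activities, [10.0, 0.0, 0.0], concentration=1.0)

print(f"wildtype: {p_wildtype:.4f}, seeded mutation in epitope 1: {p_seeded:.4f}")
```

Under these assumptions, an escape of 10.0 at an epitope with activity 3.0 makes that epitope almost certainly unbound, so the seeded mutation starts out as a near-complete escape from that epitope while the other epitopes still neutralize.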

In [8]:
opt_res = poly_abs.fit(loss_type='L2',
                       regL2_mut_escape=0.1,
                       scipy_solver='CG')

KeyboardInterrupt: 

In [4]:
opt_res = poly_abs.fit(loss_type='L2')

RuntimeError: optimization failed:
      fun: 2247.882713248024
 hess_inv: <5799x5799 LbfgsInvHessProduct with dtype=float64>
      jac: array([ 1.77842068e+03,  6.56300476e+00,  1.06856533e+00, ...,
       -2.65245035e+00, -1.34150469e-02, -1.95541361e-03])
  message: 'STOP: TOTAL NO. of f AND g EVALUATIONS EXCEEDS LIMIT'
     nfev: 23200
      nit: 2
     njev: 4
   status: 1
  success: False
        x: array([ 3.4653446 , -3.79404652, -6.210072  , ...,  0.14014946,
        0.09939629,  0.07506104])

RuntimeError: optimization failed:
      fun: 7019.902270852954
 hess_inv: <5799x5799 LbfgsInvHessProduct with dtype=float64>
      jac: array([-8.62919474, 43.37616704, 57.84804642, ..., -1.26537998,
       -0.89648893, -0.6778464 ])
  message: 'STOP: TOTAL NO. of f AND g EVALUATIONS EXCEEDS LIMIT'
     nfev: 29000
      nit: 1
     njev: 5
   status: 1
  success: False
        x: array([ 0.52788379, -0.17455746, -0.63922915, ...,  0.00212278,
        0.00185298,  0.00138465])

In [6]:
poly_abs._params.shape

(5799,)
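
The two failed fits report `nfev` of 23200 and 29000 with `njev` of 4 and 5, and the model has 5799 parameters: `23200 = 4 * 5800` and `29000 = 5 * 5800`. That pattern is what SciPy's L-BFGS-B produces when no analytic gradient is supplied, because each finite-difference gradient costs one extra function call per parameter. The solver's default `maxfun` limit of 15000 is then exceeded after only a few gradient evaluations. Whether `Polyclonal.fit` calls SciPy this way is an assumption based on those numbers, not something the output confirms. A minimal sketch on a toy quadratic (`loss` and `grad` are hypothetical stand-ins, not part of `polyclonal`) compares the evaluation counts:

```python
import numpy as np
from scipy.optimize import minimize

n = 200  # number of parameters in the toy problem


def loss(x):
    """Toy quadratic loss with its minimum at x = 1."""
    return float(np.sum((x - 1.0) ** 2))


def grad(x):
    """Analytic gradient of the toy loss."""
    return 2.0 * (x - 1.0)


x0 = np.zeros(n)

# No jac given: SciPy estimates each gradient by finite differences.
res_fd = minimize(loss, x0, method="L-BFGS-B")

# Analytic gradient supplied: one gradient call per evaluation point.
res_an = minimize(loss, x0, method="L-BFGS-B", jac=grad)

print("finite differences: nfev =", res_fd.nfev, "njev =", res_fd.njev)
print("analytic gradient:  nfev =", res_an.nfev, "njev =", res_an.njev)
```

If finite differences are indeed the cause, supplying an analytic gradient or raising the solver's `maxfun` option would be the two obvious things to try.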