# An Overview of IMPALA Workflow and Options
This document gives a non-exhaustive overview of the functionality and user options in IMPALA. IMPALA is a codebase for calibrating computer model outputs (simulations) to observed data.

Estimation uses Bayesian Markov chain Monte Carlo (MCMC), implemented with a sampling method called parallel tempering. Parallel tempering allows IMPALA to navigate very complicated posterior surfaces (see the Friedman example), including surfaces with multiple local modes and/or non-identified parameters.
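
To make the idea concrete, the following is a minimal, generic sketch of parallel tempering on a toy bimodal target. It is not IMPALA's implementation: the function names (`log_target`, `parallel_tempering`), the target density, the ladder, and the step size are all assumptions chosen for illustration. Each chain samples the target raised to the power 1/temperature, and neighbouring chains occasionally swap states so that the cold chain (temperature 1) can move between modes.

```python
import numpy as np

def log_target(x):
    """Log density (up to a constant) of an equal mixture of N(-4, 1) and N(4, 1)."""
    return np.logaddexp(-0.5 * (x + 4.0) ** 2, -0.5 * (x - 4.0) ** 2)

def parallel_tempering(n_iter, temperatures, step=1.0, seed=0):
    """Return draws from the cold chain (the first entry of the ladder)."""
    rng = np.random.default_rng(seed)
    inv_temp = 1.0 / np.asarray(temperatures, dtype=float)
    n_chains = inv_temp.size
    x = np.full(n_chains, -4.0)  # start every chain in the left mode
    cold_samples = np.empty(n_iter)
    for it in range(n_iter):
        # Random-walk Metropolis update of each chain on its tempered target.
        prop = x + step * rng.standard_normal(n_chains)
        log_ratio = inv_temp * (log_target(prop) - log_target(x))
        accept = np.log(rng.uniform(size=n_chains)) < log_ratio
        x = np.where(accept, prop, x)
        # Propose swapping the states of one random pair of adjacent chains.
        i = rng.integers(n_chains - 1)
        log_swap = (inv_temp[i] - inv_temp[i + 1]) * (
            log_target(x[i + 1]) - log_target(x[i])
        )
        if np.log(rng.uniform()) < log_swap:
            x[i], x[i + 1] = x[i + 1], x[i]
        cold_samples[it] = x[0]
    return cold_samples

temperatures = 1.5 ** np.arange(8)  # illustrative ladder; first entry is 1
samples = parallel_tempering(5000, temperatures)
print("fraction of cold-chain draws in the right mode:", np.mean(samples > 0))
```

A single random-walk chain started at -4 with this step size would rarely cross to the mode at +4; the swaps with hotter chains are what let the cold chain visit both.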

IMPALA calibration analyses generally proceed through several basic steps:

### Specify Simulator

1. If the computer model you want to calibrate is fast to run, define a function f(theta; X) that takes the calibration parameter theta as input and returns the computer model output.

2. If the computer model is slow to run, generate a library of simulator runs and fit an emulator, which takes theta as input and returns an approximation of the true computer model output.

3. IMPALA has several material strength models already defined (e.g., Preston-Tonks-Wallace and Johnson-Cook). Check the impala/physics folder for pre-defined material strength models already available in IMPALA. 
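
For the fast-simulator case, the function can be ordinary Python. The sketch below uses a hypothetical two-parameter exponential decay model; the name `f`, the design points `X`, and the model form are assumptions made for illustration. The exact calling convention IMPALA expects from a user-defined function should be checked in the code documentation.

```python
import numpy as np

# Hypothetical design points (e.g., times) at which the simulator is evaluated.
X = np.linspace(0.0, 2.0, 5)

def f(theta, X=X):
    """Toy fast simulator: amplitude theta[0], decay rate theta[1]."""
    theta = np.asarray(theta, dtype=float)
    return theta[0] * np.exp(-theta[1] * X)

# One simulator output vector, the same length as X.
y_sim = f([2.0, 0.5])
print(y_sim)
```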

### Initialize Impala Model

1. **ModelMaterialStrength**:  class for pre-defined IMPALA material strength models

2. **ModelBassPca_func** or **ModelBassPca_mult**: classes for functional or multivariate emulators fit using the pyBASS library
    
3. **ModelF**: class for user-defined functions f(x), which can also be used to implement IMPALA with non-BASS emulators

### Prepare the Fit

1. **CalibSetup**: initializes an IMPALA calibration object. This is also where you specify parameter bounds and any constraints

2. **addVecExperiments**: define the observed data, corresponding computer model, discrepancy basis (if any), and several noise model prior hyperparameters. Multiple addVecExperiments calls can be used to add different experiments, possibly with different corresponding computer models. Some inputs include:
    * *yobs*: a vector (numpy array) of observed data
    * *model*: an IMPALA model object as defined above. See code documentation for details. 
    * *sd_est*: a list or numpy array of initial values for observation noise standard deviation
    * *s2_df*: a list or numpy array of initial values for s2 Inverse Gamma prior degrees of freedom
    * *s2_ind*: a list or numpy array of indices for s2 value associated with each element of yobs
    * *meas_error_cor*: (optional) correlation matrix for observation measurement errors, default = independent 
    * *D*: (optional) numpy array containing basis functions for discrepancy, possibly including intercept.
    * *discrep_tau*: (optional) fixed prior variance for discrepancy basis coefficients (discrepancy = D @ discrep_vars, discrep_vars ~ N(0,discrep_tau))

3. **setTemperatureLadder**: define how the parallel tempering should be implemented, requiring users to specify an array of exponents that will be applied to the data likelihood. An example specification is np.array(1.05 ** np.linspace(0,49,50)), which assigns a grid of 50 temperatures. Generally, more temperatures or a finer grid of temperatures are associated with longer runtime but may also be associated with better movement around the posterior surface for complicated posteriors. 

4. **setMCMC**: define how many MCMC iterations to use for the sampler. Most users can leave these settings at default values, with the exception of nmcmc (the number of iterations), which must be specified. 

5. **setHierPriors**: (optional) define hyperparameters associated with the hierarchical and clustering calibrations. These generally control the amount of shrinkage toward a common theta across experiments. Please refer to the code documentation for details. 

6. **setClusterPrior**: (optional) define hyperparameters associated with the clustering calibration, including the maximum number of clusters (nclustmax) and the rate and shape associated with the Gamma prior on the Dirichlet process concentration parameter, eta.  
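
As a quick check on the temperature ladder example given for setTemperatureLadder, the snippet below builds the same array and inspects its range. The variable names are illustrative, and the reciprocal line assumes the usual parallel-tempering convention in which the likelihood is raised to one over the temperature; confirm how IMPALA uses the ladder in the code documentation.

```python
import numpy as np

# Geometric grid of 50 temperatures: 1.05**0, 1.05**1, ..., 1.05**49.
temperature_ladder = np.array(1.05 ** np.linspace(0, 49, 50))

# Assumed convention: likelihood exponent = 1 / temperature.
inverse_temperatures = 1.0 / temperature_ladder

print(temperature_ladder.size)    # number of temperatures
print(temperature_ladder[0])      # coldest temperature
print(temperature_ladder[-1])     # hottest temperature, roughly 10.9
```

The ladder starts at 1 (the untempered posterior) and tops out near 10.9, so each step multiplies the temperature by 1.05. A larger base or more rungs reaches hotter, flatter targets at the cost of extra chains to run.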

### Run MCMC

1. **calibPool**: pooled calibration

2. **calibHier**: hierarchical calibration

3. **calibClust**: clustered calibration

### Evaluate Convergence

1. Look at trace plots, e.g., using the **parameter_trace_plot** function

2. Look at pairs plots, e.g., using the **pairs** function

3. There are many more convergence diagnostics you can explore. In the future, additional convergence evaluation functions will be added to the IMPALA repo. 
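
One common diagnostic that is easy to compute by hand is the Gelman-Rubin potential scale reduction factor. The sketch below is a plain NumPy version applied to a single trace split into two halves; it is not part of IMPALA, and the function name `gelman_rubin` and the synthetic traces are assumptions for illustration. Values close to 1 suggest the halves agree; values well above 1 suggest the chain has not settled.

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor for an array of shape (n_chains, n_draws)."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()   # mean within-chain variance
    B = n * chain_means.var(ddof=1)         # between-chain variance
    var_hat = (n - 1) / n * W + B / n       # pooled variance estimate
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(0)

# Synthetic stationary trace: both halves look alike.
good_trace = rng.standard_normal(2000)
rhat_good = gelman_rubin(good_trace.reshape(2, -1))

# Synthetic trace whose second half has shifted by 5: the halves disagree.
bad_trace = np.concatenate([rng.standard_normal(1000), 5.0 + rng.standard_normal(1000)])
rhat_bad = gelman_rubin(bad_trace.reshape(2, -1))

print(rhat_good, rhat_bad)
```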

### Evaluate Model Fit

1. Posterior predictions can be compared with the training data to evaluate goodness of fit of the calibrated computer model. See the examples elsewhere in the IMPALA repo for code. 
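
As a rough illustration of such a comparison, the sketch below computes pointwise 95% posterior predictive intervals and their empirical coverage of the observations. Everything here is synthetic: the toy simulator `f`, the observed data, and the arrays `theta_draws` and `sd_draws` stand in for real posterior samples and are assumptions for this example, not IMPALA output.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.linspace(0.0, 2.0, 20)

def f(theta):
    """Toy simulator: amplitude theta[0], decay rate theta[1]."""
    return theta[0] * np.exp(-theta[1] * X)

# Synthetic stand-ins for observed data and posterior draws.
yobs = f([2.0, 0.5]) + 0.05 * rng.standard_normal(X.size)
theta_draws = np.array([2.0, 0.5]) + 0.02 * rng.standard_normal((500, 2))
sd_draws = np.full(500, 0.05)

# Posterior predictive draws: simulator output plus observation noise.
pred = np.array([f(t) for t in theta_draws])
pred += sd_draws[:, None] * rng.standard_normal(pred.shape)

# Pointwise 95% intervals and the fraction of observations they contain.
lower, upper = np.percentile(pred, [2.5, 97.5], axis=0)
coverage = np.mean((yobs >= lower) & (yobs <= upper))
print("empirical coverage of 95% intervals:", coverage)
```

Coverage far below the nominal level points to a poorly calibrated model or an underestimated noise level, while intervals that are much wider than the scatter of the data suggest the noise estimate is too large.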

# Summary
The following figure summarizes the usual IMPALA fitting workflow: 

![IMPALA workflow diagram](./images/Impala_Diagram.png)