# Parameter Sweeps

Our parameter sweeps were executed on a high-performance computing cluster and are not well suited to reproduction on a personal computer. We nevertheless provide scripts that reproduce the simulations directly, but please note that *executing these scripts will take a very long time.* The outputs of each parameter sweep are also provided in `data/simulations/parameter_sweeps`. The sections below give an overview of how our parameter sweeps were constructed and executed, then walk the user through accessing and visualizing the output from each of our completed data sets.

## Running a New Parameter Sweep

### Constructing a parameter sweep

To see how our parameter sweeps are constructed, or to attempt to run one on your own computing cluster, refer to the `build_sweep.py` script in the `promoters/scripts` directory. This script serves as the basis for constructing each of our parameter sweeps. For example, a sweep of the linear model would be generated by executing `python build_sweep.py -m linear -N 5000 -n 2500` at the command line. The `-m` flag denotes the model of interest, e.g. `linear`, `twostate`, or `hill`. The `-N` flag denotes the number of simulated trajectories per parameter set, and the `-n` flag denotes the number of unique parameter sets. Many more optional arguments are provided via the `promoters.execution.arguments` module. Upon execution, the `build_sweep.py` script creates a "sweep directory" containing everything needed to execute the parameter sweep. The sweep directory is named using a combination of the model type and the current date and time, resulting in something similar to `MODEL_YYMMDD_hhmmss`. For simplicity, we have renamed the sweep directories in our completed results.
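To make the naming pattern concrete, the sketch below builds a name of the form `MODEL_YYMMDD_hhmmss` with the standard library. The helper `sweep_dirname` is hypothetical and is not taken from `build_sweep.py`; in particular, whether the model name is capitalized in the real directory names is an assumption left open here.

```python
from datetime import datetime


def sweep_dirname(model, timestamp=None):
    """Return a name of the form MODEL_YYMMDD_hhmmss (hypothetical helper)."""
    if timestamp is None:
        timestamp = datetime.now()
    # %y%m%d -> two-digit year, month, day; %H%M%S -> hours, minutes, seconds
    return "{}_{}".format(model, timestamp.strftime("%y%m%d_%H%M%S"))


# example with a fixed timestamp so the result is reproducible
example_name = sweep_dirname("linear", datetime(2019, 3, 7, 14, 5, 9))
print(example_name)  # linear_190307_140509
```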

### Sweep directory file structure

A schematic of the sweep directory file structure is provided below. The top level contains a serialized `promoters.sweeps.Sweep` instance under the filename `./batch.pkl`. This instance serves as the focal point for constructing and executing the parameter sweep, then aggregating its results. The sweep directory also contains a `./simulations/` subdirectory containing a serialized `ConditionSimulation` instance for each of the $n$ parameter sets to be simulated. Upon submission to the computing cluster, these simulation instances will be deserialized and executed in parallel. The `./batches/` subdirectory contains text files used to manage the parallelization of job submissions, and the `./log/` subdirectory provides log files.
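As a quick sanity check before submitting or analyzing a sweep, one could verify that a directory contains the entries described here. The following is a minimal sketch rather than part of the `promoters` package: `missing_entries` is a hypothetical helper, and the list of expected names is taken only from the descriptions in this notebook (including the `scripts` subdirectory covered in the next section).

```python
from pathlib import Path

# entries named in this notebook; the real sweep directory may contain more
EXPECTED_ENTRIES = ("batch.pkl", "simulations", "batches", "log", "scripts")


def missing_entries(sweep_dir):
    """Return the expected entries that are absent from sweep_dir."""
    root = Path(sweep_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]
```

An empty list indicates that every expected file and subdirectory is present.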

### Running a parameter sweep

The sweep directory contains a `./scripts` subdirectory that includes a pair of shell scripts. We used the `./scripts/submit.sh` script to submit these jobs to our computing cluster. We have also included a `./scripts/run.sh` script that will execute the entire parameter sweep on a personal computer. **Please note that executing this script will take a very long time.**

## Analyzing a Completed Parameter Sweep

### Aggregating results from a parameter sweep

Results are aggregated from the individual simulations using the deserialized `batch.pkl` object, which is loaded via the `Sweep.load` method. Executing `.aggregate()` compiles the results, which may optionally be saved to `./data.hdf` in the top level of the sweep directory by executing `.save()`. Any saved results are loaded by default.

In [None]:
import os
from promoters.sweep.sweep import Sweep

# specify a path to manuscript data
MANUSCRIPT_DATA_PATH = "../../manuscript/data"

In [None]:
# specify a path to the parameter sweep directory
sweep_path = os.path.join(MANUSCRIPT_DATA_PATH, "simulations/parameter_sweeps/linear_partial_synthesis")

# load the sweep object
sweep = Sweep.load(sweep_path)


# --- OPTIONAL, only need to do this once --

# aggregates results from each ConditionSimulation instance
#sweep.aggregate()               

# save aggregated results to ./data.hdf
#sweep.save()                    
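Because aggregation only needs to happen once, it can be convenient to skip it when saved results already exist. The sketch below assumes that saved results live at `data.hdf` in the top level of the sweep directory, as described above; `ensure_aggregated` is a hypothetical helper, and it relies only on the `.aggregate()` and `.save()` methods already shown.

```python
import os


def ensure_aggregated(sweep, sweep_path, filename="data.hdf"):
    """Aggregate and save results only if no saved file exists yet.

    Returns True if aggregation was performed, False if it was skipped.
    """
    data_path = os.path.join(sweep_path, filename)
    if os.path.exists(data_path):
        return False  # saved results are already available
    sweep.aggregate()  # compile results from each simulation
    sweep.save()       # write the aggregated results to disk
    return True
```

With the objects defined in the cell above, this would be called as `ensure_aggregated(sweep, sweep_path)`.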

### Accessing simulations within a sweep

Given a `Sweep` instance, individual simulations can be accessed by simple indexing.

In [None]:
# get the first simulation
simulation = sweep[0]

### Visualizing the results from a parameter sweep

We have provided two separate means of visually summarizing the results of a parameter sweep. These methods provide the user with a high level view of individual simulation outcomes such as error frequency, protein under- and over-expression, or the change in either quantity when metabolic conditions are changed. The approaches are:

##### 1. Multidimensional Projection
Individual simulation outcomes are projected onto 2D planes spanned by paired components of the sampled parameter space. All pairs are visualized, with the projections interpolated onto a continuous surface. This representation helps depict macroscopic trends in simulation outcome as a function of each of the model's key parameters. However, the projection from a high dimensional space onto a 2D grid leads to a very noisy depiction of parameter dependence. Furthermore, this visualization strategy can be cumbersome to interpret.
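The idea behind the projection can be illustrated with a small, self-contained sketch. It uses synthetic parameters and outcomes, and it substitutes simple binned averages for the interpolation described above, so it is not the method implemented by `SweepFigure`. All names in it are hypothetical.

```python
from itertools import combinations

import numpy as np

# synthetic stand-ins for sampled parameter sets and their outcomes
rng = np.random.default_rng(0)
num_sets, num_params = 500, 3
parameters = rng.uniform(0, 1, size=(num_sets, num_params))
outcomes = rng.uniform(0, 1, size=num_sets)


def project_pair(x, y, values, bins=10):
    """Mean outcome in each cell of a 2D grid over (x, y); NaN where a cell is empty."""
    extent = [[0, 1], [0, 1]]
    totals, _, _ = np.histogram2d(x, y, bins=bins, range=extent, weights=values)
    counts, _, _ = np.histogram2d(x, y, bins=bins, range=extent)
    with np.errstate(invalid="ignore", divide="ignore"):
        return totals / counts


# one 2D projection for every pair of parameter axes
projections = {
    (i, j): project_pair(parameters[:, i], parameters[:, j], outcomes)
    for i, j in combinations(range(num_params), 2)
}
```

Each grid cell averages over all remaining parameter dimensions, which is one way to see why the projected surfaces appear noisy.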

To create a multidimensional projection, we can initialize the figure using the `Sweep.build_figure` method, which returns a `SweepFigure` instance. The `condition` argument defines the metabolic conditions of interest using a shorthand notation:

    * 'normal': Normal metabolism
    * 'diabetic': Reduced energy metabolism
    * 'hyper_metabolic': Elevated energy metabolism

The `mode` argument defines the outcome of interest (see the definitions in our manuscript). Options are:

    * 'threshold_error': Frequency of developmental errors 
    * 'error': Average protein under- or over-expression

The `relative` argument determines whether the simulation outcomes are expressed as differential values relative to normal metabolic conditions. In other words, if `condition='diabetic'` and `relative=False`, error frequencies under conditions of reduced energy metabolism are displayed directly. If `relative=True`, the difference in error frequency between normal conditions and conditions of reduced energy metabolism is shown.
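The distinction between absolute and relative outcomes can be sketched with a few made-up error frequencies. The sign convention used here (altered condition minus normal) is an assumption, as is the helper `select_outcome`; neither is taken from the `promoters` package.

```python
import numpy as np

# made-up error frequencies for three parameter sets
normal = np.array([0.10, 0.25, 0.40])
diabetic = np.array([0.30, 0.25, 0.20])


def select_outcome(condition_values, normal_values, relative=False):
    """Return absolute values, or the difference from normal if relative=True."""
    if relative:
        # assumed sign convention: positive means more errors than normal
        return condition_values - normal_values
    return condition_values


absolute = select_outcome(diabetic, normal, relative=False)
difference = select_outcome(diabetic, normal, relative=True)
```

Because error frequencies lie between 0 and 1, their differences lie between -1 and 1, which is why a symmetric color scale suits the relative case.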

We then use the `SweepFigure.render` method to visualize the multidimensional projection, subject to a number of formatting parameters.

In [None]:
import matplotlib.pyplot as plt

# construct a multidimensional projection of absolute error frequencies under normal metabolic conditions
fig = sweep.build_figure(condition='normal', mode='threshold_error', relative=False)

# render the figure with some specified formatting
fig_kwargs = dict(figsize=(10, 10))
heatmap_kwargs = dict(cmap=plt.cm.copper, bad='k', vmin=0, vmax=1, rasterized=True)
fig.render(density=100, labelsize=12, fig_kwargs=fig_kwargs, heatmap_kwargs=heatmap_kwargs, include_labels=True)

In [None]:
import matplotlib.pyplot as plt

# construct a multidimensional projection of relative error frequencies between normal and reduced energy metabolism
fig = sweep.build_figure(condition='diabetic', mode='threshold_error', relative=True)

# render the figure with some specified formatting
fig_kwargs = dict(figsize=(10, 10))
heatmap_kwargs = dict(cmap=plt.cm.seismic, bad='k', vmin=-1, vmax=1, rasterized=True)
fig.render(density=100, labelsize=12, fig_kwargs=fig_kwargs, heatmap_kwargs=heatmap_kwargs, include_labels=True)

##### 2. 1D Histogram

The distribution of individual simulation outcomes is summarized by a simple histogram. This approach conveys nothing about the underlying parameter dependence, but helps provide a quick overview of global trends.

We can compile a `SweepHistogram` instance using the `Sweep.build_histogram` method. The method accepts the same arguments as `Sweep.build_figure`, as described above. We then render the histograms by executing `SweepHistogram.render`.

In [None]:
# construct a 1-D projection of absolute error frequencies under normal metabolic conditions
fig = sweep.build_histogram(condition='normal', mode='threshold_error', relative=False)
fig.render(xlim=(0, 1), cmap=plt.cm.copper, vlim=(0, 1))

In [None]:
# construct a 1-D projection of relative error frequencies between normal and reduced energy metabolism
fig = sweep.build_histogram(condition='diabetic', mode='threshold_error', relative=True)
fig.render(xlim=(-1,1), cmap=plt.cm.seismic, vlim=(-1,1))
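For readers who want the underlying numbers rather than a rendered figure, the same kind of summary can be sketched directly with NumPy. The values below are synthetic and the variable names are hypothetical; this does not reproduce what `SweepHistogram` computes internally.

```python
import numpy as np

# synthetic relative outcomes, one per parameter set
rng = np.random.default_rng(0)
relative_outcomes = rng.uniform(-1, 1, size=1000)

# bin the outcomes over the full range of possible differences
counts, edges = np.histogram(relative_outcomes, bins=20, range=(-1, 1))

# fraction of parameter sets whose outcome increased relative to normal
fraction_positive = np.mean(relative_outcomes > 0)
```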

# Completed Parameter Sweeps from Manuscript

All of the parameter sweeps conducted in support of our manuscript are listed below.

#### Linear model in which synthesis is partially reduced
``data/simulations/parameter_sweeps/linear_partial_synthesis``

#### Two-state model in which synthesis is partially reduced
``data/simulations/parameter_sweeps/twostate_partial_synthesis``

#### Hill model in which synthesis is partially reduced
``data/simulations/parameter_sweeps/hill_partial_synthesis``

#### Linear model in which synthesis is partially reduced, and a non-zero basal input level is applied
``data/simulations/parameter_sweeps/linear_partial_synthesis_with_basal_input``

#### Linear model in which synthesis is partially reduced, and both pulse amplitude and duration scale with metabolic conditions
``data/simulations/parameter_sweeps/linear_partial_synthesis_with_input_scaling``

#### Linear model in which repression is partially reduced
``data/simulations/parameter_sweeps/linear_partial_repression``
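To work through every completed sweep in turn, the directory names listed above can be collected into a mapping of paths. This sketch reuses the `MANUSCRIPT_DATA_PATH` value from earlier in the notebook; `SWEEP_NAMES` and `sweep_paths` are hypothetical names, and the loading step is left commented out because it requires the `promoters` package and the manuscript data.

```python
import os

MANUSCRIPT_DATA_PATH = "../../manuscript/data"

# directory names of the completed sweeps listed above
SWEEP_NAMES = [
    "linear_partial_synthesis",
    "twostate_partial_synthesis",
    "hill_partial_synthesis",
    "linear_partial_synthesis_with_basal_input",
    "linear_partial_synthesis_with_input_scaling",
    "linear_partial_repression",
]

# map each sweep name to its directory
sweep_paths = {
    name: os.path.join(MANUSCRIPT_DATA_PATH, "simulations", "parameter_sweeps", name)
    for name in SWEEP_NAMES
}

# with the package and data available, each sweep could then be loaded:
# from promoters.sweep.sweep import Sweep
# sweeps = {name: Sweep.load(path) for name, path in sweep_paths.items()}
```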