<a id="top"></a>
# Randomly sample parameters from a range and generate script

This notebook documents a method for generating scripts that automatically run MECSim over a range of parameters. Use this notebook when more than 2 parameters are varied at once. The generated script compares the simulated current response to experimental data and runs Bayesian analysis to determine the optimal parameters and robust error bars.

For simulations that require only 2 varying parameters where a 2D plot is required using regular spacing, use `GenerateScript.ipynb`.

To run this notebook you must have [MECSim](http://www.garethkennedy.net/MECSim.html) installed, and have a correctly set-up `Master.sk` skeleton file in the same directory as the MECSim executable. **Ideally use the docker version where this is taken care of automatically.**

For video tutorials and user guides see the [MECSim documentation](http://www.garethkennedy.net/MECSimDocs.html).


## Contents
#### Nominal usage
- <p><a href="#ref_guide">Getting started</a></p>
Quick practical guide to running the `RandomlySampleRange` notebook, where key files go and the overall workflow.
- <p><a href="#ref_paras">Parameter set up</a></p>
Set the names and ranges of all varying parameters that have been set up in `Master.sk`.
- <p><a href="#ref_postproc">Post-processing parameters</a></p>
Choose between simple least-squares comparison of MECSim output to experimental results or Bayesian analysis.


#### Advanced usage
- <p><a href="#ref_weights">Set weights for comparison</a></p>
Set here an array of weights used in results comparison. Useful for focusing on specific frequency ranges and/or ignoring others.
- <p><a href="#ref_file_locations">Set location of the Python files</a></p>
This section does not need to be changed in normal operation.
- <p><a href="#ref_settings">Output settings file</a></p>
Set bandwidth, number of harmonics and whether to use the weights set above.
- <p><a href="#ref_prepscript">Prepare script file</a></p>
- <p><a href="#ref_genscript">Generate script and parameter loop</a></p>


<a id="ref_guide"></a>
## Getting started


To simulate a range of parameters and compare the output to experimental results, a number of files must be manually edited and the following steps carried out.

1. The chemical mechanism must be set up using `Master.sk`, located in the **`script`** directory
2. The parameters must be set up using either the `GenerateScript.ipynb` or the `RandomlySampleRange.ipynb` notebook in the **`python`** directory
3. The notebook (or equivalent py file) must be run to generate the script `run_mecsim_script.sh` and the `Settings.inp` text file in the **`script`** directory
4. The script must be run from the base `./` directory, or using `run_docker_win_cmd_script.bat` for docker running in Windows (default)

After running the notebooks (step 3) the directories contain the following files. Python and Jupyter notebooks in the **`python_dir`** (default is `./python/`) should include:
1. `GenerateScript.ipynb` (manually edited)
2. `RandomlySampleRange.ipynb` (manually edited)
3. `HarmonicSplitter.py`
4. `CompareSmoothed.py`
5. `BayesianAnalysis.py` (optional)
6. `SurfacePlotter.py` (optional)

The script directory **`script_dir`** (default is `./script/`) will include two additional files made by the notebooks, so should now include:
1. `Master.sk`
2. `run_mecsim_script.sh`
3. `Settings.inp`

The structure of how MECSim is used with the analytics codes is shown below. Green indicates points where the user must either edit notebooks and text files or run the docker commands. 

![title](MECSim_ScriptingChart.PNG)


**NOTE: the skeleton input parameter file for MECSim, `Master.sk`, must be set up before running the script output from this python script.**


### Load required libraries


In [None]:
import numpy as np

<a id="ref_paras"></a>
## Parameters

Back to <a href="#top">top</a>.

### Set names and ranges of variables

Name mappings and ranges for each variable. These are used with the master template (skeleton file) called ``Master.sk`` to replace string labels like "$name" with the required values. The bash script generated by this code will run MECSim using each set of parameters written to `Master.inp`.

For the user, ensure that the x1, x2... names are correct both here and in `Master.sk`, as well as the ranges.

Unlike a regular sweep, which requires a step size ``del_x``, random sampling only requires the total number of simulations to run.
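
As a rough illustration of what drawing one random value per parameter could look like, here is a minimal sketch. It assumes that the log flag means "sample uniformly in log10 space"; the actual behaviour of `ReturnRandomExpFormat.py` is not shown in this notebook and may differ. The helper `sample_parameter` is hypothetical, and the log flag for `$kzero` is switched on here purely for demonstration.

```python
import math
import random

def sample_parameter(x_min, x_max, use_log, rng=random):
    """Draw one value between x_min and x_max (hypothetical helper)."""
    if use_log:
        # assumption: log sampling is uniform in log10 space,
        # which requires strictly positive bounds
        exponent = rng.uniform(math.log10(x_min), math.log10(x_max))
        return 10.0 ** exponent
    # linear case: uniform between the two bounds
    return rng.uniform(x_min, x_max)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
ranges = [('$kzero', 0.1e-3, 2.0e-2, True),
          ('$Ezero', -0.15, 0.15, False)]

for _ in range(3):
    draw = {name: sample_parameter(lo, hi, log, rng)
            for name, lo, hi, log in ranges}
    # exponential formatting, similar in spirit to the values placed in Master.inp
    print(', '.join('{0}={1:.6e}'.format(k, v) for k, v in draw.items()))
```

Log sampling is typically preferred when a parameter spans several orders of magnitude, since linear sampling would place very few points in the lowest decades.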

In [None]:
# total number of simulations to run
n_simulations = 100

In [None]:
# setup parameters
x_para_name = []
x_para_min = []
x_para_max = []
x_para_log = []

#### copy this for as many parameters as required ####
x_para_name.append('$kzero')
x_para_min.append(0.1e-3)
x_para_max.append(2.0e-2)
x_para_log.append(False)
#### end of code chunk to copy

#### copy this for as many parameters as required ####
x_para_name.append('$Ezero')
x_para_min.append(-0.15)
x_para_max.append(0.15)
x_para_log.append(False)
#### end of code chunk to copy


### Set fixed parameters

Often the user would like some fixed parameters in the skeleton file (`Master.sk`), for example when the resistance is known for a given experiment, or when a particular parameter should be held constant while investigating the effect of other parameters.
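
The generated bash script fills these labels in with `sed`. The sketch below shows the same idea in Python on a made-up template string; the template text is an illustration only and is not the real `Master.sk` format. The helper `fill_placeholders` is hypothetical. Because plain substring replacement lets a short label match the start of a longer one, the sketch replaces the longest labels first; whether this matters depends on the label names chosen.

```python
def fill_placeholders(template_text, values):
    """Replace each '$name' label with its value (hypothetical helper)."""
    # longest names first, so a label such as '$k' cannot clobber
    # the start of a longer label such as '$kzero'
    for name in sorted(values, key=len, reverse=True):
        template_text = template_text.replace(name, str(values[name]))
    return template_text

# made-up template lines, for illustration only
template = "resistance = $R, alpha = $alpha, capacitance = $cap0"
fixed = {'$R': 100, '$alpha': 0.5, '$cap0': 1.0e-6}

print(fill_placeholders(template, fixed))
```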

In [None]:
# setup fixed parameters
x_fixed_name = []
x_fixed_value = []

#### copy this for as many fixed parameters as required ####
x_fixed_name.append('$R')
x_fixed_value.append(100)
#### end of code chunk to copy

#### copy this for as many fixed parameters as required ####
x_fixed_name.append('$alpha')
x_fixed_value.append(0.5)
#### end of code chunk to copy

#### copy this for as many fixed parameters as required ####
x_fixed_name.append('$cap0')
x_fixed_value.append(1.0e-6)
#### end of code chunk to copy

### Experiment parameters

Set the name of the file containing the experimental data to compare with the MECSim simulation current responses.

The raw experimental data file must have the same format as MECSim output, which is based on the output from a potentiostat. An example file is given in `script/MECSim_Example.txt`.

Back to <a href="#top">top</a>.

In [None]:
Experimental_filename = 'MECSim_Example.txt'

<a id="ref_postproc"></a>
## Post-processing parameters

Select the automatic post-processing to be done on the `results.txt` output file. The corresponding commands are added to the end of the bash script made by this code.

Two options:

1. output a surface plot using the raw sum of square values in the results file
2. use Bayesian analysis on the results file automatically 

All figures and statistics are sent to the output directory.

Back to <a href="#top">top</a>.

In [None]:
# select whether surface plotter to be used
plot_surfaces = False
# select whether Bayesian statistical analysis to be used
bayesian_analysis = True
# have you run a script generating results before? do you want this run to append the results to the existing ones?
results_exists = False
# check if Master.inp file is valid and stop loop if it is invalid
check_master = True

## Script output levels

Bookkeeping for how often the script reports progress over the entire loop (`output_freqency`, spelled as in the code below) and how much output is stored during the loop.

Also control whether the best fit parameters are output to `output/Master.inp`.
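
To see which loop iterations produce a progress message, the sketch below mirrors the integer arithmetic that the generated script uses (`counter_output=$((imax/output_freqency))` followed by the test `i%counter_output==0`). It assumes the loop counter runs from 1 to `n_simulations`; the function `progress_iterations` is hypothetical.

```python
def progress_iterations(n_simulations, output_frequency):
    """List the iterations at which progress is printed (hypothetical helper)."""
    # same clamp as in the notebook: never more outputs than simulations
    output_frequency = min(output_frequency, n_simulations)
    # integer division, as in bash $((imax/output_frequency))
    counter_output = n_simulations // output_frequency
    return [i for i in range(1, n_simulations + 1) if i % counter_output == 0]

print(progress_iterations(100, 10))
print(len(progress_iterations(25, 10)))
```

Because of the integer division, the number of progress messages can exceed the requested value when `n_simulations` is not a multiple of it: 25 simulations with a requested frequency of 10 give an interval of 2 and therefore 12 messages.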

In [None]:
# number of outputs during script run (e.g. output 10 times over the n_sim)
output_freqency=10
# output warnings from MECSim to screen during script run (e.g. Thermodynamically superfluous reactions)
show_warnings = True
# automatically stop the script running if MECSim encounters an error (e.g. R<0 or concentration errors)
exit_on_error = True
# store the log files from each simulation
store_log_files = False
# output best fit Master.inp from posterior (if Bayesian is done)
output_best_fit = True

In [None]:
# ensure that output frequency is <= total number of simulations to run
output_freqency = min(output_freqency, n_simulations)

## Operating system - file format

Text files on Windows use different end-of-line characters to Mac/Unix systems. Setting this to *True* converts the output text files so that they can be read by Windows software such as DigiPot. If unsure leave as *True*.
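
When this flag is set, the generated script calls `unix2dos` on the text files in the output directory. As a minimal sketch of what that conversion amounts to, the hypothetical helper below rewrites line endings on a byte string; it is an illustration, not part of the MECSim tools.

```python
def to_dos_line_endings(data):
    """Return bytes with every line ending as CRLF (hypothetical helper)."""
    # normalise first so that existing CRLF pairs are not doubled
    unix = data.replace(b'\r\n', b'\n')
    return unix.replace(b'\n', b'\r\n')

print(to_dos_line_endings(b'a,b\nc,d\n'))
```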

In [None]:
using_windows = True

<a id="ref_weights"></a>
### Sum of squares comparison function

For each harmonic the relative sum of squares difference is calculated by
$$
S_j = \frac{ \sum_k^n \left( i^{exp}_k - i^{sim}_k \right)^2 }{ \sum_k^n \left( i^{exp}_k \right)^2 }
$$
where $n$ is the total number of current ($i$), time and voltage points in the smoothed experimental ($exp$) and simulated ($sim$) data for a given harmonic. Note that "harmonics" refer to the harmonics of a particular ac signal (e.g. 5, 10, 15, 20 Hz etc) as well as the dc ramp or "0$^{th}$ harmonic".
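
The formula for $S_j$ can be written directly with NumPy. This is a minimal sketch on made-up current values, not the code used in `CompareSmoothed.py`; the function name `relative_sum_of_squares` is hypothetical.

```python
import numpy as np

def relative_sum_of_squares(i_exp, i_sim):
    """Relative sum of squares difference S_j for one harmonic (sketch)."""
    i_exp = np.asarray(i_exp, dtype=float)
    i_sim = np.asarray(i_sim, dtype=float)
    # numerator: squared residuals; denominator: squared experimental current
    return np.sum((i_exp - i_sim) ** 2) / np.sum(i_exp ** 2)

# made-up smoothed currents for a single harmonic
i_exp = np.array([1.0, 2.0, 3.0])
i_sim = np.array([1.0, 2.0, 2.0])
print(relative_sum_of_squares(i_exp, i_sim))  # 1/14 for these values
```

Dividing by the experimental sum of squares makes $S_j$ dimensionless, so harmonics with very different current amplitudes can be compared on the same scale.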

The metric used in `CompareSmoothed.py` is to take the smoothed harmonics created by `HarmonicSplitter.py` for both the experimental and simulated data (a function of parameters run by MECSim), calculate the relative sum of squares difference for each harmonic ($S_j$) and combine them to a single metric $S_m$ via
$$
S_m = \sum_{j=0}^{n_{harm}} w_j S_j
$$
where $n_{harm}$ is the number of harmonics, $j=0$ is the dc component, $w_j$ is the weight given to harmonic $j$ specified in the `Settings.inp` file. The weights ($w_j$) for each harmonic (and dc component) are set as a vector of any sum, or left as the unweighted default of $w_j = 1$.
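
A minimal sketch of combining the per-harmonic values into $S_m$ follows, using made-up $S_j$ values for the dc component and two harmonics. The function `combined_metric` is hypothetical and is not taken from `CompareSmoothed.py`.

```python
import numpy as np

def combined_metric(s_values, weights):
    """Weighted sum S_m of the per-harmonic values S_j (sketch)."""
    s = np.asarray(s_values, dtype=float)
    w = np.asarray(weights, dtype=float)
    if s.shape != w.shape:
        raise ValueError("need one weight per harmonic (including dc)")
    return float(np.dot(w, s))

# made-up S_j values: dc component first, then two ac harmonics
s_j = [0.2, 0.1, 0.4]

print(combined_metric(s_j, np.repeat(1.0, 3)))  # unweighted default
print(combined_metric(s_j, [1.0, 0.5, 0.0]))    # ignore the last harmonic
print(','.join(map(str, s_j)))                  # separate values as a csv string
```

Setting a weight to zero removes that harmonic from the comparison entirely, which is one way to ignore a noisy frequency range.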

#### Options for comparison

The relative sums of squares are combined using the weights to calculate the final metric denoted as $S_m$ above if *output_single_metric = True* in either `GenerateScript.ipynb` or `RandomlySampleRange.ipynb`. If *output_single_metric = False* then the sum of squares values ($S_j$) are output as a comma separated string.



#### Set weights

Set *output_single_metric = True* to combine the harmonics into a single metric using the weights; otherwise all $S_j$ are output, separated by commas.

Alter the following cell for custom weights. Ensure that the array has the same length as the number of harmonics (plus dc).

**Note: for purely dc cases set *number_harmonics = 0***

Back to <a href="#top">top</a>.

In [None]:
# use custom weights? default is no and use uniform weights for the comparison
output_single_metric = False

# number of harmonics (excluding the dc component - harmonic = 0)
number_harmonics = 6
frequency_bandwidth = 1.    # only 1 needed for simulations

# set weights as a numpy array with same length as number_harmonics + 1 (default is all even)
weights = np.repeat(1.0, number_harmonics+1)

# sample way to change it: can also use functions
#weights = np.array([0.5,1,1,0.5,0.3,0.1,0.05])

## Experienced users only from here on

### Set location of the python files

Location is relative to the script that will be running. This script is required to be in the same location as the MECSim.exe file - as in the Docker image setup. 

**Note: these generally don't need to be changed and should not be changed!**

Back to <a href="#top">top</a>.

In [None]:
# python dir contains all .py files
python_dir = 'python/'
# script dir contains script.sh (output here), Master.sk (user prepared) and Settings.inp (output here)
script_dir = 'script/'
# Master.inp dir - important for docker container vs local run
input_dir = 'input/'
# output dir for results
output_dir = 'output/'

# default is running in notebook:
# if running in docker then the container structure requires this be external/
external_dir = 'external/'
# location of parent directory: typically this file will be in python/ so the parent dir is '../'
parent_dir = '../'

### File names

Manually change the name of the results file, default is `results.txt`. Also set if this file already exists, for example if a previous script was run on a different region of parameter space.

Can also change the default script name from `run_mecsim_script.sh`, although this is not recommended.

In [None]:
# results filename
results_name = 'results.txt'
# change default script name here (not recommended)
script_name = 'run_mecsim_script.sh'

### Simulation file settings

Change the default name of the simulation output file (not recommended)

Back to <a href="#top">top</a>.

In [None]:
Simulation_output_filename = 'MECSimOutput_Pot.txt'

### Comparison file FFT settings

Set whether the experimental data file set above needs to have a Fast Fourier Transform (FFT) applied to it (default is True), and if so what filename to use.


In [None]:
Experimental_FFT_output_filename = 'ExpSmoothed.txt'
needs_fft_conversion = True

Check that the number of weights matches the number of harmonics (including dc).

In [None]:
if(number_harmonics+1 != len(weights)):
    print("WARNING: Found", len(weights), "weights when there are", 
          number_harmonics+1, "harmonics (including dc)")
    if(number_harmonics+1<len(weights)):
        print("         Too many weights entered - clipping")
        weights = weights[:number_harmonics+1]
    else:
        print("         Not enough weights entered - filling with zeros")
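        # pad with zeros; in-place ndarray.resize can raise ValueError
        # when the array is referenced elsewhere
        weights = np.append(weights, np.zeros(number_harmonics + 1 - len(weights)))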

Convert the NumPy array to a comma-separated string (in case the file is not read by Python).

In [None]:
txt_weights = ','.join(map(str, weights))

### Script method

In [None]:
method_type = 'random'

<a id="ref_settings"></a>
## Output settings file

This file encodes the simulation output filename (shouldn't change), number of harmonics, bandwidth frequency and the settings for the weights.

Back to <a href="#top">top</a>.

In [None]:
# flag written to the settings file: 1 = single metric, 0 = one value per harmonic
iUseSingleMetric = 1 if output_single_metric else 0
# the with block closes the file even if a write fails
with open(parent_dir+script_dir+'Settings.inp', 'w') as f:
    f.write(Simulation_output_filename + "\t# simulation output filename\n")
    f.write(str(number_harmonics) + "\t# number of harmonics\n")
    f.write(str(frequency_bandwidth) + "\t# bandwidth frequency (Hz)\n")
    f.write(str(iUseSingleMetric) + "\t# 1=use single output metric value (else=0 and each harmonic treated separately\n")
    f.write(txt_weights)
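
The settings file written by this cell has five lines: four values each followed by a tab and a `#` comment, then the comma-separated weights. As a sketch of how such a file could be read back, the hypothetical `parse_settings` below assumes exactly that layout; the analysis scripts (`HarmonicSplitter.py`, `CompareSmoothed.py`) may parse the file differently.

```python
def parse_settings(text):
    """Parse the five-line Settings.inp layout (hypothetical helper)."""
    lines = text.splitlines()

    def value(line):
        # keep only the part before the tab-separated comment
        return line.split('\t#')[0].strip()

    return {
        'simulation_output_filename': value(lines[0]),
        'number_harmonics': int(value(lines[1])),
        'frequency_bandwidth': float(value(lines[2])),
        'use_single_metric': value(lines[3]) == '1',
        'weights': [float(w) for w in lines[4].split(',')],
    }

# example content matching the notebook defaults
example = ("MECSimOutput_Pot.txt\t# simulation output filename\n"
           "6\t# number of harmonics\n"
           "1.0\t# bandwidth frequency (Hz)\n"
           "0\t# 1=use single output metric value\n"
           "1.0,1.0,1.0,1.0,1.0,1.0,1.0")

settings = parse_settings(example)
print(settings['number_harmonics'], len(settings['weights']))
```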

<a id="ref_prepscript"></a>
## Prepare script file

Depending on the method type selected, output a text file in bash script format for running MECSim with the analysis tools.

First set any parameters by hand. For example, you may have a constant e0val=0.2 but want to keep the skeleton file general by leaving $e0val in it.

Note that you will need to integrate these into the script generation yourself.

Back to <a href="#top">top</a>.

<a id="ref_genscript"></a>
## Generate script and parameter loop

### Random sampling method

First apply `HarmonicSplitter.py` to split the experimental data into harmonics (dc and ac) then smooth the resultant current responses.

Setup a loop for the number of samples requested, each time doing:
1. Use `ReturnRandomExpFormat.py` for each input parameter with associated range
2. Take the randomly generated input parameters and pass them to the **MECSim** executable
3. Use `HarmonicSplitter.py` to split and smooth the harmonics
4. Use `CompareSmoothed.py` to compare the smoothed harmonics between the simulated current response and the experimental data.
5. Comparison is calculated as a metric (default uses the sum of squares from each harmonic) which is either a composite of all harmonic data (using weights) or a list of values, one for each harmonic
6. Append the x1, x2, ... and metric (S) values to a single file ("results_name" set above)

After the loop:
1. All input parameters (x1, x2...) as well as the sum of squares metric (S, either 1 or more values) are now stored in "results_name" (typically `results.txt`)
2. `BayesianAnalysis.py` is run on this results file to determine the probabilities of each set of parameters given that the true fit to the experimental data exists somewhere in the chosen parameter range.
3. `BayesianAnalysis.py` will output the posterior probabilities (`posterior.txt`) as well as the optimal values with error bars (`opt_parameters.txt`) and plots (png and pdf) depending on the settings in `BayesianAnalysis.py`.
4. Additional plots of the sum of squares surfaces themselves are output by `SurfacePlotter.py` if requested.
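
For a quick sanity check of the results file before or alongside the full analysis, the row with the smallest metric can be located directly. The sketch below assumes the single-metric layout (a header of parameter names followed by `S`, then one comma-separated row per simulation); the numbers are made up and `best_row` is a hypothetical helper. It does not replace `BayesianAnalysis.py`, which also provides error bars.

```python
import csv
import io

def best_row(results_text):
    """Return the row with the smallest S as a dict (hypothetical helper)."""
    reader = csv.reader(io.StringIO(results_text))
    header = next(reader)
    # convert every non-empty row to floats
    rows = [[float(v) for v in row] for row in reader if row]
    # the metric S is the last column in the single-metric layout
    best = min(rows, key=lambda r: r[-1])
    return dict(zip(header, best))

# made-up results in the layout written by the generated script
example = ("$kzero,$Ezero,S\n"
           "1.0e-3,0.05,0.42\n"
           "5.0e-3,-0.02,0.07\n"
           "1.2e-2,0.10,0.31\n")

print(best_row(example))
```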

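**With the default settings this notebook writes the script directly to `script/run_mecsim_script.sh`, alongside the `Settings.inp` created above. If `script_name` or `script_dir` were changed, rename the script to `run_mecsim_script.sh` and copy it to `script/`.**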

Back to <a href="#top">top</a>.

In [None]:
if(method_type=='random'):
    quote_symbol = '"'
    print('Using random sampling method to write to: ' + parent_dir+script_dir+script_name)
    with open(parent_dir+script_dir+script_name, "w") as text_file:
        text_file.write("#!/bin/bash\n")
        # initialize a timer
        text_file.write("start=$(date +%s)\n")
        # setup skeleton file with fixed parameters filled
        text_file.write("cp {0}Master.sk {0}Master_with_fixed.sk\n".format(script_dir))
        for i in range(len(x_fixed_name)):
            text_file.write("sed -i 's/{0}/'{1}'/g' {2}Master_with_fixed.sk\n"
                            .format(x_fixed_name[i], x_fixed_value[i], script_dir))
        # copy the filled skeleton out once, after all fixed parameters are substituted
        text_file.write("cp -p {0}Master_with_fixed.sk {1}{0}\n"
                        .format(script_dir, external_dir))
        # process harmonics for experimental data - if requested
        if(needs_fft_conversion):
            text_file.write("cp {0}{1} {2}{3}\n".format(script_dir, Experimental_filename, 
                                                        output_dir, Simulation_output_filename))
            text_file.write("python {0}HarmonicSplitter.py\n".format(python_dir))
            text_file.write("mv {0}Smoothed.txt {0}{1} \n".format(output_dir, 
                                 Experimental_FFT_output_filename))
            # output total run time to here - complicated write to get awk statement in the bash script correct
            text_file.write("end=$(date +%s)\n")
            text_file.write("seconds=$((end-start))\n")
            text_line1 = "awk -v t=$seconds 'BEGIN{t=int(t*1000)"
            time_text = "%d:%02d:%02d {0}, t/3600000, t/60000%60, t/1000%60".format(quote_symbol)
            text_line2 = "; printf {0}Time taken converting experimental data: {1}{2}'".format(quote_symbol, 
                                                                                               time_text, '}')
            text_file.write("{0}{1}; echo\n".format(text_line1, text_line2))
        # setup parameter ranges
        text_file.write("i=0\n")
        text_file.write("imax={0}\n".format(n_simulations))
        text_file.write("counter_output=$((imax/{0}))\n".format(output_freqency))
        # output summary
        text_file.write("echo \n")
        text_file.write("echo 'Random sample run with n_sim = {0}'\n".format(n_simulations))
        text_file.write("echo \n")
        # write header for results output file - else will append
        if(not results_exists):
            paraNames = ','.join(x_para_name)
            text_file.write("echo '{0},S' > {1}{2}\n".format(paraNames, output_dir, results_name))
        # construct loop over parameters
        # i starts at 0 and is incremented at the top of the loop, so -lt gives exactly imax runs
        text_file.write("while [ $i -lt $imax ]\n")
        text_file.write("do\n")
        text_file.write("  i=$((i+1))\n")
        text_file.write("  cp {0}Master.sk {1}Master.inp\n".format(script_dir, input_dir))
        # variable parameters
        for i in range(len(x_para_name)):
            # pass the log flag through as the literal text True or False
            log_flag = 'True' if x_para_log[i] else 'False'
            x_para_text = ('x=$(python ' + python_dir + 'ReturnRandomExpFormat.py '
                           + str(x_para_min[i]) + ' ' + str(x_para_max[i]) + ' ' + log_flag + ')')
            text_file.write("  {0}\n".format(x_para_text))
            text_file.write("  sed -i 's/{0}/'$x'/g' {1}Master.inp\n".format(x_para_name[i], input_dir))
            # store parameter text
            if(i==0):
                text_file.write("  paraString=$x\n")
            else:
                text_file.write("  paraString=$paraString,$x\n")
        # fixed parameters (preferentially fill variables before fixed)
        for i in range(len(x_fixed_name)):
            text_file.write("  sed -i 's/{0}/'{1}'/g' {2}Master.inp\n".format(x_fixed_name[i], 
                                                                              x_fixed_value[i], input_dir))
        # check the first Master.inp made for consistency
        if(check_master):
            text_file.write("  if ((i==1)); then\n")
            text_file.write("    python {0}InputChecker.py > checker.txt\n".format(python_dir))
            text_file.write("    if [ $(grep 'INPUT VALID False' checker.txt | wc -l) -ne 0 ]; then\n")
            text_file.write("      cat checker.txt\n")
            text_file.write("      cp -p checker.txt {0}Master.inp {1}{2}\n"
                            .format(input_dir, external_dir, output_dir))
            text_file.write("      exit\n    fi\n  fi\n")
        text_file.write("  ./MECSim.exe 2>errors.txt\n")
        if(show_warnings):
            text_file.write("  [ $(grep 'WARNING' {0}log.txt | wc -l) != '0' ] && grep 'WARNING' {0}log.txt\n".format(output_dir))
        if(exit_on_error):
            text_file.write("  if [ $(grep 'ERROR' {0}log.txt | wc -l) -ne 0 ]; then\n".format(output_dir))
            text_file.write("    grep 'ERROR' {0}log.txt\n".format(output_dir))
            text_file.write("    exit\n  fi\n")
        if(store_log_files):
            text_file.write("  mv {0}log.txt {0}log_$i.txt\n".format(output_dir))
        text_file.write("  python {0}HarmonicSplitter.py\n".format(python_dir))
        text_file.write("  z=$(python {0}CompareSmoothed.py)\n".format(python_dir))
        text_file.write("  echo $paraString,$z >> {0}{1}\n".format(output_dir, results_name))
        # continuously copy out results - internal container's output directory is empty 
        text_file.write("  cp -p {0}Master.inp {1}\n".format(input_dir, output_dir))
        text_file.write("  cp -p {0}* {1}{0}\n".format(output_dir, external_dir))
        # check if time write is required
        text_file.write("  if ((i%counter_output==0)); then\n")
        text_file.write("    end=$(date +%s)\n")
        text_file.write("    seconds=$((end-start))\n")
        text_file.write("    echo 'Completed:' $((100*i/imax))'%' \n") # time taken is printed on the next line
        text_line1 = "awk -v t=$seconds 'BEGIN{t=int(t*1000)"
        time_text = "%d:%02d:%02d {0}, t/3600000, t/60000%60, t/1000%60".format(quote_symbol)
        text_line2 = "; printf {0}Time taken: {1}{2}'".format(quote_symbol, time_text, '}')
        text_file.write("    {0}{1}; echo\n".format(text_line1, text_line2))
        text_file.write("  fi\n")
        text_file.write("done\n")
        text_file.write("\n")
        text_file.write("echo \n")
        # convert to dos if using a windows
        if(using_windows):
            # convert text files in output directory in docker
            text_file.write("unix2dos {0}*.txt\n".format(output_dir))
            # copy out the now converted files
            text_file.write("cp -p {0}* {1}{0}\n".format(output_dir, external_dir))
        # run Bayesian analysis (*.txt) and plotter (produces bayesian_plot.pdf and png) with results_name
        if(bayesian_analysis):
            text_file.write("python {0}BayesianAnalysis.py {1} {2}{3} {4}{5}\n".format(python_dir, 
                                 results_name, external_dir, output_dir, external_dir, script_dir))
            # given have posterior.txt file - make the best fit Master.inp 
            if(output_best_fit):
                text_file.write("python {0}OutputBestFit.py\n".format(python_dir))
        # run sum of squares plotter (also uses results file name as argument)
        if(plot_surfaces):
            text_file.write("python {0}SurfacePlotter.py {1} {2}{3} {4}{5}\n".format(python_dir, 
                                 results_name, external_dir, output_dir, external_dir, script_dir))
        # copy out the results from analysis scripts
        text_file.write("cp -p {0}* {1}{0}\n".format(output_dir, external_dir))
        # output final run time
        text_file.write("end=$(date +%s)\n")
        text_file.write("seconds=$((end-start))\n")
        text_line1 = "awk -v t=$seconds 'BEGIN{t=int(t*1000)"
        time_text = "%d:%02d:%02d {0}, t/3600000, t/60000%60, t/1000%60".format(quote_symbol)
        text_line2 = "; printf {0}Time taken: {1}{2}'".format(quote_symbol, time_text, '}')
        text_file.write("{0}{1}; echo\n".format(text_line1, text_line2))
        text_file.write("\n")

### Output total number of simulations

In [None]:
print("Total number of simulations to be run: " + str(n_simulations))