# Modular Virus Simulation

## Required libraries

In [1]:
using Markdown
using InteractiveUtils
using Random, Distributions, StatsBase, DataFrames, Plots
using CSV, Tables

## Constants and Parameters

### Fixed Parameters and Constants
Let's start with the physical constants that we are not seeking to change:

In [2]:
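# Universal physical constants and fixed data
# kT at 37 °C: gas constant in kcal/(mol·K) × temperature in K
const kt_boltzmann = 0.001987204118 * (273.15 + 37)
# Distribution from which each mutation's ΔΔG is drawn (mean 1.0, sd 1.7)
const ΔΔG = Normal(1.0, 1.7)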

Normal{Float64}(μ=1.0, σ=1.7)

And then let's set up the gene-level architecture. In this simulation we have assigned genes to four modules. In each module we imagine a different genetic/phenotypic architecture with implications for epistasis.

| Module Name | Architecture | Type of Epistasis | Variance Penalty |
|-------------|---------------------|-------------------|-------------|
| auxiliary | independent additive contributions | arithmetic mean | none |
| assembly | stoichiometric balance | geometric mean | moderate |
| replication | linear with rate limiting steps | minimum | high |
| host interaction | jointly necessary contributions | product | maximal |

The overall fitness is calculated as the equal-weighted product of the outputs of these four modules. This imposes a variance penalty at the organism level and is motivated by the idea that all modules are necessary (although some auxiliary genes might be more accurately described as non-essential under optimal conditions).
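
Written out, and assuming the definitions used by the module functions later in this notebook, the per-gene stability, the four module outputs and the overall fitness can be sketched as follows. The symbols $s_i$, $w$ and $W$ are notation introduced for this sketch only; $n$ is the number of genes in the module concerned.

$$
\begin{aligned}
s_i &= \frac{1}{1 + e^{\Delta G_i / kT}} \\
w_{\text{auxiliary}} &= \frac{1}{n}\sum_{i=1}^{n} s_i \\
w_{\text{assembly}} &= \Big(\prod_{i=1}^{n} s_i\Big)^{1/n} \\
w_{\text{replication}} &= \min_{i} s_i \\
w_{\text{host}} &= \prod_{i=1}^{n} s_i \\
W &= w_{\text{auxiliary}} \, w_{\text{assembly}} \, w_{\text{replication}} \, w_{\text{host}}
\end{aligned}
$$

Because every $s_i$ lies between 0 and 1, the same set of stabilities always gives arithmetic mean ≥ geometric mean ≥ minimum ≥ product, which is the ordering expressed by the Variance Penalty column of the table.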

Now let's define these module sizes and characteristics:

In [3]:
Base.@kwdef struct EpistasisParams
    # Module configuration
    module_sizes::Dict{Symbol, Int} = Dict(
        :auxiliary => 3, 
        :assembly => 3, 
        :replication => 3, 
        :host_interaction => 3
    )
    
    # Calculated properties
    module_names::Vector{Symbol} = collect(keys(module_sizes))
    G::Int = sum(values(module_sizes)) # number_of_genes
end

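# Keyword constructor that always derives G (number of genes) from module_sizes.
# It replaces the keyword constructor generated by @kwdef, so G cannot be passed in separately.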
function EpistasisParams(; module_sizes = Dict(:auxiliary => 3, :assembly => 3, :replication => 3, :host_interaction => 3),
                         module_names = collect(keys(module_sizes)))
    G = sum(values(module_sizes))
    return EpistasisParams(module_sizes, module_names, G)
end

EpistasisParams
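
As a side note, `Base.@kwdef` defaults may refer to fields declared earlier in the same struct, which is how `module_names` and `G` can follow from `module_sizes`. A minimal sketch of that behaviour, using a hypothetical `DemoParams` struct rather than the notebook's own type:

```julia
# Hypothetical stand-in for EpistasisParams, used only for this illustration
Base.@kwdef struct DemoParams
    module_sizes::Dict{Symbol, Int} = Dict(:a => 2, :b => 3)
    # The defaults below are computed from the field above when not supplied
    module_names::Vector{Symbol} = collect(keys(module_sizes))
    G::Int = sum(values(module_sizes))
end

default_demo = DemoParams()
custom_demo = DemoParams(module_sizes = Dict(:x => 1, :y => 4, :z => 2))

println(default_demo.G)                  # total genes for the default sizes
println(custom_demo.G)                   # total genes for the custom sizes
println(sort(custom_demo.module_names))  # names follow the supplied Dict
```

The explicit keyword constructor in the cell above goes one step further: since it does not accept `G` as a keyword, a caller cannot supply a gene count that disagrees with `module_sizes`.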

In [4]:
# Instantiate with equal module sizes (the default)
equal_epi_config = EpistasisParams()

# Custom values reflecting phiX174 (unequal module sizes)
ΦX174_epi_config = EpistasisParams(
    module_sizes = Dict(
        :auxiliary => 2, # A* (blocks super-inf), K (optimises burst size) - overlaps, non-essential
        :assembly => 3, # B (internal scaffolding), D (external scaffolding), F (major capsid) - but H involved in stoichiometry
        :replication => 2, # A (rolling circle init), J (ssDNA binding)
        :host_interaction => 4 # C (DNA packaging), E (host cell lysis), G (major spike), H (DNA pilot/minor spike)
    )
)

# Custom values to remove epistasis and make everything a product
null_epi_config = EpistasisParams(
    module_sizes = Dict(
        :auxiliary => 1, 
        :assembly => 1, 
        :replication => 1, 
        :host_interaction => 8
    )
)

EpistasisParams(Dict(:replication => 1, :assembly => 1, :host_interaction => 8, :auxiliary => 1), [:replication, :assembly, :host_interaction, :auxiliary], 11)

### Variable Parameters
And now for the parameters that we can change in the simulation:

In [5]:
Base.@kwdef struct SimulationParams
    # Control parameters
    sim_length::Int = 500
    F_init::Float64 = -5.0
    # Sweep parameters
    U_default::Poisson = Poisson(5.0)
    L_default::Float64 = 0.1
    N_default::Int = 5000
    K_default::Int = 10000
    R_default::Int = 9
end

SimulationParams

In [6]:
sim_config = SimulationParams()  # All defaults
#sim_config = SimulationParams(L_default = 0.05, N_default = 2000)  # Custom values

SimulationParams(500, -5.0, Poisson{Float64}(λ=5.0), 0.1, 5000, 10000, 9)

## Defining the Virus

We represent the virus as a mutable Julia struct. It contains two dictionaries keyed by module name, whose values are vectors with one entry per gene in that module: one dictionary holds mutation counts (`μ_counts`) and the other holds ΔG values (`ΔG_values`). A single scalar field holds the fitness.

In [7]:
mutable struct ModularVirus
    μ_counts::Dict{Symbol, Vector{Int64}}
    ΔG_values::Dict{Symbol, Vector{Float64}}
    fitness::Float64
end

This convenience constructor initializes a new modular virus. It preloads the dictionaries with zeros for the mutation counts and with the initial free energy (`F_init`) for every gene, and sets the fitness to the supplied starting value (nearly 1):

In [8]:
# constructor for clean initialization
function ModularVirus(sim_config::SimulationParams, epi_config::EpistasisParams, start_fitness::Float64)
    μ_counts = Dict(name => zeros(Int64, epi_config.module_sizes[name]) for name in epi_config.module_names)
    ΔG_values = Dict(name => fill(sim_config.F_init, epi_config.module_sizes[name]) for name in epi_config.module_names)
    return ModularVirus(μ_counts, ΔG_values, start_fitness)
end

ModularVirus

### Testing
Create a test virus:

In [9]:
Quentin = ModularVirus(sim_config, equal_epi_config, 1.0)

ModularVirus(Dict(:replication => [0, 0, 0], :assembly => [0, 0, 0], :host_interaction => [0, 0, 0], :auxiliary => [0, 0, 0]), Dict(:replication => [-5.0, -5.0, -5.0], :assembly => [-5.0, -5.0, -5.0], :host_interaction => [-5.0, -5.0, -5.0], :auxiliary => [-5.0, -5.0, -5.0]), 1.0)

Force a mutation into the first gene of every module:

In [10]:
for name in equal_epi_config.module_names
    Quentin.μ_counts[name][1] = 1
    Quentin.ΔG_values[name][1] = -0.4
end
Quentin

ModularVirus(Dict(:replication => [1, 0, 0], :assembly => [1, 0, 0], :host_interaction => [1, 0, 0], :auxiliary => [1, 0, 0]), Dict(:replication => [-0.4, -5.0, -5.0], :assembly => [-0.4, -5.0, -5.0], :host_interaction => [-0.4, -5.0, -5.0], :auxiliary => [-0.4, -5.0, -5.0]), 1.0)

## Epistasis and Fitness Calculations

Each module has its own function implementing its form of epistasis:

In [11]:
function auxiliary_epistasis(auxiliary_ΔGs::Vector{Float64})
    # Additive contributions
    stabilities = [1 / (1 + ℯ^(ΔG/kt_boltzmann)) for ΔG in auxiliary_ΔGs]
    # Arithmetic mean benefits from positives
    return mean(stabilities)
end

function assembly_epistasis(assembly_ΔGs::Vector{Float64})
    # Stoichiometric balance: variance in stabilities hurts fitness
    stabilities = [1 / (1 + ℯ^(ΔG/kt_boltzmann)) for ΔG in assembly_ΔGs]
    # Geometric mean penalizes imbalance
    return prod(stabilities)^(1/length(stabilities))
end

function replication_epistasis(replication_ΔGs::Vector{Float64})
    # Rate-limiting: weakest link dominates
    stabilities = [1 / (1 + ℯ^(ΔG/kt_boltzmann)) for ΔG in replication_ΔGs]
    return minimum(stabilities)
end

function host_module_epistasis(host_ΔGs::Vector{Float64})
    # Jointly necessary contributions: every destabilized gene multiplies the output down
    stabilities = [1 / (1 + ℯ^(ΔG/kt_boltzmann)) for ΔG in host_ΔGs]
    return prod(stabilities)
end

host_module_epistasis (generic function with 1 method)
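
To make the variance penalty concrete, here is a minimal sketch that applies all four rules to the same three ΔG values (one destabilized gene at -0.4 and two unmutated genes at -5.0). The names `kT_demo`, `stability` and the result variables are introduced for this illustration only; the stability expression is the one used in the functions above.

```julia
using Statistics

# kT at 37 °C, the same value as kt_boltzmann in this notebook
kT_demo = 0.001987204118 * (273.15 + 37)

# Stability of one gene given its ΔG (same expression as in the module functions)
stability(ΔG) = 1 / (1 + exp(ΔG / kT_demo))

# One destabilized gene (-0.4) alongside two unmutated genes (-5.0)
s = stability.([-0.4, -5.0, -5.0])

arithmetic = mean(s)                   # auxiliary rule
geometric  = prod(s)^(1 / length(s))   # assembly rule
weakest    = minimum(s)                # replication rule
joint      = prod(s)                   # host interaction rule

println((arithmetic, geometric, weakest, joint))
```

For any stabilities between 0 and 1 the four results are ordered arithmetic ≥ geometric ≥ weakest ≥ joint, so a single destabilized gene costs the least in the auxiliary module and the most in the host interaction module.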

Now we can define a function that recalculates a virus's fitness and updates the struct in place:

In [12]:
function update_fitness!(virus::ModularVirus)
    auxiliary_contrib = auxiliary_epistasis(virus.ΔG_values[:auxiliary])
    assembly_contrib = assembly_epistasis(virus.ΔG_values[:assembly])
    replication_contrib = replication_epistasis(virus.ΔG_values[:replication])
    host_contrib = host_module_epistasis(virus.ΔG_values[:host_interaction])
    
    virus.fitness = auxiliary_contrib * assembly_contrib * replication_contrib * host_contrib
end

update_fitness! (generic function with 1 method)

### Testing
Try it out, but take care if zeroing out any module, as the effects could be exotic:

In [13]:
println("Auxiliary: ", auxiliary_epistasis(Quentin.ΔG_values[:auxiliary]))
println("Assembly: ", assembly_epistasis(Quentin.ΔG_values[:assembly]))
println("Replication: ", replication_epistasis(Quentin.ΔG_values[:replication]))
println("Host Interaction: ", host_module_epistasis(Quentin.ΔG_values[:host_interaction]))

Auxiliary: 0.8853953603063478
Assembly: 0.8690692667366479
Replication: 0.6567854267448675
Host Interaction: 0.6563918441227488


In [14]:
update_fitness!(Quentin)

0.33172508788624777

In [15]:
Quentin.fitness

0.33172508788624777

## Making New Mutations

Draw the number of new mutations for a virus and apply them:

In [16]:
function mutate!(virus::ModularVirus, sim_config::SimulationParams, epi_config::EpistasisParams) 
    number_of_mutations = only(rand(sim_config.U_default, 1))
    ΔΔG_values = rand(ΔΔG, number_of_mutations)  # note that ΔΔG is a global
    
    # Modules are chosen with probability proportional to their size. The weights
    # must be built in module_names order (not by iterating keys(module_sizes))
    # so that they line up with the vector being sampled below.
    module_probs = Weights([epi_config.module_sizes[name] for name in epi_config.module_names] ./ epi_config.G)
    
    for i in 1:number_of_mutations
        # Select module based on proportional targeting
        module_choice = sample(epi_config.module_names, module_probs)
        
        # Select gene within module
        module_size = epi_config.module_sizes[module_choice]
        gene_index = rand(1:module_size)
        
        # Apply mutation to appropriate dictionary [and module therein][and gene]
        virus.μ_counts[module_choice][gene_index] += 1
        virus.ΔG_values[module_choice][gene_index] += ΔΔG_values[i]
        
        # Lethal mutation check
        if rand() < sim_config.L_default
            virus.fitness = 0
            return virus.fitness
        end
    end
    
    # Update fitness if virus is still viable
    (virus.fitness > 0) && (update_fitness!(virus))
    
    return virus.fitness
end

mutate! (generic function with 1 method)
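
In `mutate!`, each new mutation is independently lethal with probability `L_default`, so a virus that receives k mutations remains viable with probability (1 - L)^k. Averaged over a Poisson-distributed number of mutations with mean U, this should come to exp(-U·L). A minimal sketch that checks this numerically; `survival_probability` is a hypothetical helper and not part of the notebook:

```julia
# Probability that an offspring escapes every lethal check, summing over
# the Poisson distribution of mutation counts (truncated at kmax terms)
function survival_probability(U::Float64, L::Float64; kmax::Int = 100)
    total = 0.0
    pmf = exp(-U)              # Poisson probability of k = 0 mutations
    for k in 0:kmax
        total += pmf * (1 - L)^k
        pmf *= U / (k + 1)     # advance to the probability of k + 1 mutations
    end
    return total
end

U, L = 5.0, 0.1                # the defaults in SimulationParams
println(survival_probability(U, L))
println(exp(-U * L))           # closed form for comparison
```

With the default values this gives a viable fraction of about 0.61 per offspring, before any effect of the ΔΔG values on fitness.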

## Reproduction

Here is a reproduction function:

In [17]:
# Creates new offspring virus by copying parent + its mutations
function reproduce(parent::ModularVirus, sim_config::SimulationParams, epi_config::EpistasisParams)
    sprog = deepcopy(parent)
    mutate!(sprog, sim_config, epi_config)
    return sprog
end

reproduce (generic function with 1 method)
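
The use of `deepcopy` matters here because the virus holds dictionaries of vectors. A minimal sketch of the difference, using a hypothetical miniature of the `μ_counts` dictionary:

```julia
# Hypothetical miniature of the μ_counts dictionary
parent_counts = Dict(:replication => [0, 0], :assembly => [0, 0])

shallow = copy(parent_counts)       # new Dict, but the same Vector objects
deep    = deepcopy(parent_counts)   # new Dict with independent Vectors

shallow[:replication][1] += 1       # also changes parent_counts
deep[:assembly][2] += 1             # leaves parent_counts untouched

println(parent_counts[:replication])
println(parent_counts[:assembly])
```

A shallow copy would let an offspring's mutations leak back into its parent and siblings, which is why each offspring starts from a deep copy.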

## Creating a Population

Here is a convenience function to make a new population of viruses:

In [18]:
# Creates initial virus population
function initialize_population(sim_config::SimulationParams, epi_config::EpistasisParams)
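    # Starting fitness is the unmutated stability raised to the number of genes (a pure product).
    # Note: update_fitness! on an unmutated virus gives stability^(3 + host module size) instead,
    # so the two agree only when the non-host modules each contain a single gene.
    start_fitness = (1 / (1 + ℯ^(sim_config.F_init/kt_boltzmann)))^epi_config.G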
    return [ModularVirus(sim_config, epi_config, start_fitness) for _ in 1:sim_config.N_default]
end

initialize_population (generic function with 1 method)
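
One thing that may be worth checking is how this starting value compares with what `update_fitness!` returns for a virus that has not yet mutated. For identical stabilities s, the mean, geometric mean and minimum all equal s, so the updated fitness is s cubed times s raised to the host module size. A sketch under the ΦX174 module sizes, with variable names introduced only for this illustration:

```julia
# kT at 37 °C, the same value as kt_boltzmann in this notebook
kT_fit = 0.001987204118 * (273.15 + 37)
stability_at(ΔG) = 1 / (1 + exp(ΔG / kT_fit))

F_init = -5.0
module_sizes = Dict(:auxiliary => 2, :assembly => 3,
                    :replication => 2, :host_interaction => 4)
G = sum(values(module_sizes))
s = stability_at(F_init)

# Value assigned by initialize_population
start_fitness = s^G

# Value update_fitness! would give an unmutated virus:
# three modules contribute s each, the host module contributes s^(its size)
first_update = s * s * s * s^(module_sizes[:host_interaction])

println(start_fitness)
println(first_update)
```

Here `start_fitness` comes to roughly 0.9967, the value shown in every fitness column of the first row of the test output later in this notebook, while `first_update` is slightly higher. The two coincide for `null_epi_config`, where the three non-host modules have one gene each.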

## Helper Functions

Here are some helper functions. `get_weights` supports fitness-proportional selection, `probabilistic_round` converts an expected offspring number into an integer, and the remaining functions gather ΔG values and mutation counts for reporting:

In [19]:
# Used to implement fitness-proportional selection
function get_weights(populace)
    fitness_values = [v.fitness for v in populace]
    total = sum(fitness_values)
    fitness_values ./= total  # in-place division to avoid new array
    return Weights(fitness_values)
end

# Round a floating-point number to an integer
function probabilistic_round(number)
    base = floor(Int, number)
    return base + (rand() < number - base)
    # bracketed boolean expression evaluates to 0 or 1
end

# Helper function to get all ΔG values from a ModularVirus
function get_all_ΔG(virus::ModularVirus, epi_config::EpistasisParams)
    all_ΔG = Float64[]
    for module_name in epi_config.module_names
        append!(all_ΔG, virus.ΔG_values[module_name])
    end
    return all_ΔG
end

# Helper function to get all mutation counts from a ModularVirus  
function get_all_mutations(virus::ModularVirus, epi_config::EpistasisParams)
    return sum(sum(virus.μ_counts[module_name]) for module_name in epi_config.module_names)
end

# Helper function to get module-specific mean statistics
function get_module_stats(populace, module_name::Symbol)
    mut_counts = [sum(v.μ_counts[module_name]) for v in populace]
    ΔG_values = [mean(v.ΔG_values[module_name]) for v in populace]
    return mean(mut_counts), mean(ΔG_values)
end

get_module_stats (generic function with 1 method)
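
The point of `probabilistic_round` is that the rounded integer has the original number as its expected value. A minimal sketch of that property, using the same logic with an explicit random number generator so that the run is reproducible; `stochastic_round` and the seed are assumptions made for this illustration:

```julia
using Random, Statistics

# Same logic as probabilistic_round above, but taking the RNG as an argument
function stochastic_round(rng::AbstractRNG, number::Real)
    base = floor(Int, number)
    return base + (rand(rng) < number - base)
end

rng = MersenneTwister(2024)
draws = [stochastic_round(rng, 2.3) for _ in 1:100_000]

println(extrema(draws))   # only the two neighbouring integers occur
println(mean(draws))      # close to the unrounded value
```

In the simulation this means a parent with `R_default * fitness = 2.3` leaves two offspring about 70% of the time and three about 30% of the time.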

## Report and Plotting Functions

### Report Initialization and Filling Functions

Please note that the module names are hard-coded into these functions. Should the user choose an `epi_config` with different module names, these functions would need to be updated (or abstracted).

In [20]:
# Creates empty DataFrame with module-specific columns
function initialize_report()
    report = DataFrame(
        # Main metrics
        generation = Int[],
        psiz = Int[], 
        q1fit = Float64[], 
        meanfit = Float64[],
        q2fit = Float64[], 
        maxfit = Float64[], 
        minfree = Float64[],
        meanfree = Float64[], 
        maxfree = Float64[], 
        minmut = Float64[],
        meanmut = Float64[], 
        maxmut = Float64[],
        # Module-specific mutation counts
        auxiliary_meanmut = Float64[],
        assembly_meanmut = Float64[],
        replication_meanmut = Float64[],
        host_meanmut = Float64[],
        # Module-specific free energies  
        auxiliary_meanfree = Float64[],
        assembly_meanfree = Float64[],
        replication_meanfree = Float64[],
        host_meanfree = Float64[]
    )
    return report
end

# Updates DataFrame with both original and module-specific metrics
function report_update!(populace, report, epi_config::EpistasisParams, generation::Int)
    # Original metrics
    fitness_values = [v.fitness for v in populace]
    all_ΔG_per_virus = [get_all_ΔG(v, epi_config) for v in populace]
    all_mutations_per_virus = [get_all_mutations(v, epi_config) for v in populace]
    
    # Module-specific metrics
    aux_mut, aux_free = get_module_stats(populace, :auxiliary)
    asm_mut, asm_free = get_module_stats(populace, :assembly)
    rep_mut, rep_free = get_module_stats(populace, :replication)
    host_mut, host_free = get_module_stats(populace, :host_interaction)
    
    # Precompute ΔG statistics for efficiency
    ΔG_mins = [minimum(ΔG_list) for ΔG_list in all_ΔG_per_virus]
    ΔG_means = [mean(ΔG_list) for ΔG_list in all_ΔG_per_virus]
    ΔG_maxs = [maximum(ΔG_list) for ΔG_list in all_ΔG_per_virus]
    
    push!(report, (
        generation = generation,
        psiz = length(populace),
        q1fit = quantile(fitness_values, 0.25),
        meanfit = mean(fitness_values),
        q2fit = median(fitness_values),
        maxfit = maximum(fitness_values),
        minfree = mean(ΔG_mins),
        meanfree = mean(ΔG_means),
        maxfree = mean(ΔG_maxs),
        minmut = minimum(all_mutations_per_virus),
        meanmut = mean(all_mutations_per_virus),
        maxmut = maximum(all_mutations_per_virus),
        auxiliary_meanmut = aux_mut,
        assembly_meanmut = asm_mut,
        replication_meanmut = rep_mut,
        host_meanmut = host_mut,
        auxiliary_meanfree = aux_free,
        assembly_meanfree = asm_free,
        replication_meanfree = rep_free,
        host_meanfree = host_free
    ))
end

report_update! (generic function with 1 method)
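
If the reporting functions were to be abstracted, one possible starting point is to derive the column names from the module names. This is only a sketch: `mut_column` and `free_column` are hypothetical helpers, and the module list is assumed to come from an `epi_config`.

```julia
# Module names as they would appear in an epi_config (assumed here)
module_names = [:auxiliary, :assembly, :replication, :host_interaction]

# Hypothetical helpers: build a report column name from a module name
mut_column(name::Symbol)  = Symbol(name, :_meanmut)
free_column(name::Symbol) = Symbol(name, :_meanfree)

mut_columns  = mut_column.(module_names)
free_columns = free_column.(module_names)

println(mut_columns)
println(free_columns)
```

Note that this naming would produce `host_interaction_meanmut` rather than the current `host_meanmut`, so the plotting function would need the same change.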

### Plotting Function

In [21]:
#creates plots including module-specific panels
function plot_simulation(report)
    abscissa = 1:size(report, 1)
    
    p1 = plot(abscissa, report.psiz, 
        ylims = (0, maximum(report.psiz)),
        label = "pop size", linewidth = 3, title = "A) Population Size")
        
    p2 = plot(abscissa, [report.q1fit report.meanfit report.q2fit report.maxfit],
        label = ["Q1 fitness" "mean fitness" "median fitness" "max fitness"], 
        linewidth = 3, title = "B) Fitness Distribution")
        
    p3 = plot(abscissa, [report.minfree report.meanfree report.maxfree],
        label = ["min ΔG" "mean ΔG" "max ΔG"],
        linewidth = 3, title = "C) Free Energy Distribution")
        
    p4 = plot(abscissa, [report.minmut report.meanmut report.maxmut],
        label = ["min mutations" "mean mutations" "max mutations"],
        linewidth = 3, title = "D) Total Mutations")
        
    p5 = plot(abscissa, [report.replication_meanmut report.assembly_meanmut report.host_meanmut],
        label = ["replication" "assembly" "host interaction"],
        linewidth = 3, title = "E) Mutations by Module")
        
    p6 = plot(abscissa, [report.replication_meanfree report.assembly_meanfree report.host_meanfree],
        label = ["replication" "assembly" "host interaction"],
        linewidth = 3, title = "F) Mean ΔG by Module")
    
    plot(p1, p2, p3, p4, p5, p6, 
        titleloc = :left, 
        titlefont = font(12), 
        layout = (3, 2), 
        size = (1000, 1000))
end

plot_simulation (generic function with 1 method)

## Simulation Proper

Let's start with a function to replicate the next generation:

In [22]:
# Simulates one generation: each parent leaves about R_default × fitness offspring
# (probabilistically rounded) and only viable offspring enter the next generation
function synchronized_generation(populace, sim_config::SimulationParams, epi_config::EpistasisParams) 
    next_generation = ModularVirus[]
    # equivalent to Array{ModularVirus,1}() and eliminates type checking later
    
    for parent in populace
        offspring_count = probabilistic_round(sim_config.R_default * parent.fitness)
        for r in 1:offspring_count
            child = reproduce(parent, sim_config, epi_config)
            # Only add viable offspring (short-circuit evaluation)
            (child.fitness > 0) && push!(next_generation, child)
        end
    end
    
    return next_generation
end

synchronized_generation (generic function with 1 method)

Now let's generate a function that yields a complete simulation:

In [23]:
# Runs main sim and collects data into DataFrame
function synchronized_simulation(sim_config::SimulationParams, epi_config::EpistasisParams; 
                                report_frequency = 20)
    # Initialize population and reporting
    population = initialize_population(sim_config, epi_config)
    report = initialize_report()
    
    generation = 0
    max_generations = sim_config.sim_length
    
    while generation < max_generations
        generation += 1
        
        # Progress reporting
        if generation % report_frequency == 0
            println("Generation: $generation, Population size: $(length(population))")
        end
        
        # Collect data before selection/reproduction
        report_update!(population, report, epi_config, generation)
        
        # Generate next generation
        population = synchronized_generation(population, sim_config, epi_config)
        population_size = length(population)
        
        # Apply carrying capacity constraint
        if population_size > sim_config.K_default
            population = sample(population, sim_config.K_default, replace = false)
        elseif population_size == 0
            println("Population extinct at generation $generation")
            break
        end
    end
    
    # Final report update if simulation completed; the surviving population
    # is the one that would enter generation + 1
    if length(population) > 0
        report_update!(population, report, epi_config, generation + 1)
    end
    
    return report, population
end

synchronized_simulation (generic function with 1 method)

In [24]:
# Convenience function with default parameters
function run_simulation(; sim_config = SimulationParams(), epi_config = EpistasisParams(),
                        report_frequency = 20)
    return synchronized_simulation(sim_config, epi_config; report_frequency = report_frequency)
end

run_simulation (generic function with 1 method)

# Run the Simulation

Please uncomment if you wish to run (commented to focus on parameter sweep below):

In [25]:
#=
# All defaults
report, pop = run_simulation()

# Plot the results
plot_simulation(report)
=#

In [26]:
#=
# Custom configs
custom_sim = SimulationParams(L_default = 0.27, sim_length = 400)
report, pop = run_simulation(sim_config = custom_sim, epi_config = equal_epi_config)

# Plot the results
plot_simulation(report)
=#

In [27]:
#report

In [28]:
#=
# Another example with custom parameters
custom_sim = SimulationParams(L_default = 0.02, R_default = 10, N_default = 2000, K_default = 4000, sim_length = 200)
report, pop = run_simulation(sim_config = custom_sim, epi_config = ΦX174_epi_config, report_frequency = 10)

# Plot the results
plot_simulation(report)
=#

# Parameter Sweeps

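The following function runs a serial parameter sweep over `R` (the multiplier that converts fitness into offspring number), `U` (the mean number of mutations per replication) and `L` (the per-mutation probability of lethality), saving a CSV file and a plot for every replicate: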

In [29]:
function run_parameter_sweep(base_dir::String, epi_config::EpistasisParams; 
                           num_reps = 5,
                           R_range = 0:2:10,
                           U_range = 0:0.5:5.0, 
                           L_range = 0.0:0.2:1.0)
    
    mkpath(base_dir)
    
    # Create master results DataFrame for aggregation
    all_results = DataFrame()
    
    for R in R_range
        for U in U_range
            for L in L_range
                
                # Create parameter-specific directory
                param_dir = joinpath(base_dir, "R$(R)_U$(U)_L$(L)")
                mkpath(param_dir)
                
                println("Running parameter combination: R=$R, U=$U, L=$L")
                
                for rep in 1:num_reps
                    # Generate unique seed for this replicate
                    seed = rand(Int)
                    
                    # Create simulation config with current parameters
                    sweep_sim_config = SimulationParams(
                        R_default = R,
                        U_default = Poisson(U),
                        L_default = L
                    )
                    
                    # Run simulation
                    Random.seed!(seed)
                    report, final_pop = synchronized_simulation(sweep_sim_config, epi_config; report_frequency = 999999)

                    # Add metadata to report
                    # Insert multiple columns at the beginning
                    insertcols!(report, 1, 
                        :R => R,
                        :U => U, 
                        :L => L,
                        :N => sweep_sim_config.N_default,
                        :K => sweep_sim_config.K_default,
                        :replicate => rep,
                        :seed => seed
                    )
                    
                    # Save individual replicate files
                    csv_filename = joinpath(param_dir, "rep$(rep)_seed$(seed).csv")
                    png_filename = joinpath(param_dir, "rep$(rep)_seed$(seed).png")
                    
                    CSV.write(csv_filename, report)
                    
                    # Save plot
                    p = plot_simulation(report)
                    savefig(p, png_filename)
                    
                    # Add to master dataset
                    append!(all_results, report)
                    
                    println("  Completed replicate $rep (seed: $seed)")
                end
            end
        end
    end
    
    # Save master results
    master_csv = joinpath(base_dir, "master_results.csv")
    CSV.write(master_csv, all_results)
    
    println("Parameter sweep complete. Master results saved to: $master_csv")
    return all_results
end

run_parameter_sweep (generic function with 1 method)
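
Before launching a sweep it can help to count how many simulations the ranges imply. A minimal sketch using the default ranges and `num_reps` from the function above:

```julia
# Default sweep ranges from run_parameter_sweep
R_range = 0:2:10
U_range = 0:0.5:5.0
L_range = 0.0:0.2:1.0
num_reps = 5

n_combinations = length(R_range) * length(U_range) * length(L_range)
n_simulations  = n_combinations * num_reps

println("parameter combinations: ", n_combinations)
println("simulations in total:   ", n_simulations)
```

With the defaults this is 396 parameter combinations and 1980 simulations, each writing its own CSV file and plot.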

Here is the call for a full, serially executed parameter sweep. Note that it is commented out for now because I'd like to create a parallel execution option for efficiency:

In [30]:
#results = run_parameter_sweep("simulation_results", ΦX174_epi_config; num_reps = 3)

# Parameter Sweeps with Parallel Execution

Let's start with some system checks:

In the Julia REPL, run this before starting the notebook:
```
using IJulia
IJulia.installkernel("Julia 8 threads", env=Dict("JULIA_NUM_THREADS"=>"8"))
```
And make sure to select the appropriate kernel to run this notebook.

Note: I have a 12-thread machine (M2 Pro), but I am also experimenting with 16 threads on my M1 Ultra.

In [31]:
# Check what you actually got:
println("Threads: ", Threads.nthreads())

Threads: 8


In [32]:
# Before running, check available memory
println("Memory info: ")
println(Sys.total_memory() / 1e9, " GB total")
println(Sys.free_memory() / 1e9, " GB free")

Memory info: 
34.359738368 GB total
4.742381568 GB free


In [33]:
using Base.Threads
using ProgressMeter # making progress bar version - in progress

function run_parameter_sweep_parallel(base_dir::String, epi_config::EpistasisParams; 
                                    num_reps = 5,
                                    R_range = 0:2:10,
                                    U_range = 0:0.5:5.0, 
                                    L_range = 0.0:0.2:1.0)
    
    mkpath(base_dir)
    
    # Pre-generate all parameter combinations and seeds
    param_combinations = []
    for R in R_range, U in U_range, L in L_range, rep in 1:num_reps
        seed = rand(Int)
        push!(param_combinations, (R=R, U=U, L=L, rep=rep, seed=seed))
    end

    total_sims = length(param_combinations)
    println("Running $total_sims simulations on $(nthreads()) threads...")
    
    # Thread-safe storage for results
    all_results = Vector{DataFrame}(undef, total_sims)

    # Create progress bar
    progress = Progress(total_sims, desc="Simulations: ", barlen=50, color=:green)
    
    # Parallel execution
    @threads for i in 1:total_sims
        params = param_combinations[i]
        
        # Create parameter-specific directory (thread-safe)
        param_dir = joinpath(base_dir, "R$(params.R)_U$(params.U)_L$(params.L)")
        mkpath(param_dir)
        
        # Create simulation config
        sweep_sim_config = SimulationParams(
            R_default = params.R,
            U_default = Poisson(params.U),
            L_default = params.L
        )
        
        # Set seed for this specific simulation
        Random.seed!(params.seed)
        
        # Run simulation
        report, final_pop = synchronized_simulation(sweep_sim_config, epi_config; report_frequency = 999999)
        
        # Add metadata
        insertcols!(report, 1, 
            :R => params.R,
            :U => params.U, 
            :L => params.L,
            :N => sweep_sim_config.N_default,  # use this run's config, not the global sim_config
            :K => sweep_sim_config.K_default,
            :replicate => params.rep,
            :seed => params.seed
        )
        
        # Save CSV file
        csv_filename = joinpath(param_dir, "rep$(params.rep)_seed$(params.seed).csv")        
        CSV.write(csv_filename, report)
        
        # Store result for later aggregation
        all_results[i] = report
        
        # Update progress bar (thread-safe)
        next!(progress)
        #println("Completed: R=$(params.R), U=$(params.U), L=$(params.L), rep=$(params.rep) on thread $(threadid())")
    end
    
    # Combine all results
    master_results = reduce(vcat, all_results)  # avoids splatting a long vector of DataFrames
    
    # Save master results
    master_csv = joinpath(base_dir, "master_results.csv")
    CSV.write(master_csv, master_results)
    
    println("\nParameter sweep complete! Master results saved to: $master_csv")
    return master_results
end

run_parameter_sweep_parallel (generic function with 1 method)
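
The reproducibility of the parallel sweep rests on seeding the random number generator inside each iteration with a seed generated beforehand. A minimal sketch of that pattern, assuming a recent Julia version in which the default generator is task-local; `seeded_draws` and the seed values are hypothetical:

```julia
using Random
using Base.Threads

# Draw one random number per seed, seeding inside each threaded iteration
function seeded_draws(seeds::Vector{Int})
    out = Vector{Float64}(undef, length(seeds))
    @threads for i in eachindex(seeds)
        Random.seed!(seeds[i])   # seed the generator used by this iteration
        out[i] = rand()          # each iteration writes only to its own slot
    end
    return out
end

seeds = [11, 22, 33, 44]
first_run  = seeded_draws(seeds)
second_run = seeded_draws(seeds)

# Serial reference: the same seeds, one after another
serial_run = [(Random.seed!(s); rand()) for s in seeds]

println(first_run == second_run)
println(first_run == serial_run)
```

Writing each result into a preallocated slot, as `all_results[i] = report` does above, keeps the iterations from touching shared state.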

Note that plot generation is omitted from the parallel version, as it appeared not to be safe in parallel execution.

Now run the parameter sweep!

In [34]:
#results = run_parameter_sweep_parallel("simulation_results", ΦX174_epi_config; num_reps = 3)

# Try a small test with 8 threads
results = run_parameter_sweep_parallel("test_run", ΦX174_epi_config; 
                                      num_reps = 1,
                                      R_range = [1, 10],      # 2 values
                                      U_range = [1.0, 5.0],   # 2 values  
                                      L_range = [0.0])        # 1 value

Running 4 simulations on 8 threads...
Population extinct at generation 13
Population extinct at generation 53


Simulations: 100%|██████████████████████████████████████████████████| Time: 0:02:05



Parameter sweep complete! Master results saved to: test_run/master_results.csv


| Row | R | U | L | N | K | replicate | seed | generation | psiz | q1fit | meanfit | q2fit | maxfit | minfree | meanfree | maxfree | minmut | meanmut | maxmut | auxiliary_meanmut | assembly_meanmut | replication_meanmut | host_meanmut | auxiliary_meanfree | assembly_meanfree | replication_meanfree | host_meanfree |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Int64 | Float64 | Float64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | 1 | 1.0 | 0.0 | 5000 | 10000 | 1 | -3244438032551929959 | 1 | 5000 | 0.996709 | 0.996709 | 0.996709 | 0.996709 | -5.0 | -5.0 | -5.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -5.0 | -5.0 | -5.0 | -5.0 |
| 2 | 1 | 1.0 | 0.0 | 5000 | 10000 | 1 | -3244438032551929959 | 2 | 4987 | 0.992357 | 0.974478 | 0.997868 | 0.998444 | -5.27081 | -4.90507 | -3.89564 | 0.0 | 1.02647 | 6.0 | 0.179266 | 0.285542 | 0.182073 | 0.379587 | -4.90544 | -4.90398 | -4.90938 | -4.90354 |
| 3 | 1 | 1.0 | 0.0 | 5000 | 10000 | 1 | -3244438032551929959 | 3 | 4857 | 0.975924 | 0.956925 | 0.994744 | 0.998484 | -5.46453 | -4.8209 | -3.22102 | 0.0 | 2.00124 | 10.0 | 0.352481 | 0.55384 | 0.368128 | 0.726786 | -4.81059 | -4.81471 | -4.82836 | -4.82696 |
| 4 | 1 | 1.0 | 0.0 | 5000 | 10000 | 1 | -3244438032551929959 | 4 | 4635 | 0.956743 | 0.941461 | 0.989787 | 0.998586 | -5.62443 | -4.7452 | -2.79461 | 0.0 | 2.93657 | 10.0 | 0.520173 | 0.815318 | 0.53959 | 1.06149 | -4.73962 | -4.72506 | -4.7578 | -4.75679 |
| 5 | 1 | 1.0 | 0.0 | 5000 | 10000 | 1 | -3244438032551929959 | 5 | 4346 | 0.935487 | 0.926332 | 0.983435 | 0.998641 | -5.77405 | -4.6782 | -2.46642 | 0.0 | 3.87115 | 13.0 | 0.704556 | 1.06512 | 0.706397 | 1.39508 | -4.65855 | -4.65514 | -4.68916 | -4.69985 |
| 6 | 1 | 1.0 | 0.0 | 5000 | 10000 | 1 | -3244438032551929959 | 6 | 3991 | 0.91905 | 0.915122 | 0.977721 | 0.998641 | -5.90756 | -4.62524 | -2.26833 | 0.0 | 4.76272 | 14.0 | 0.872463 | 1.30393 | 0.865698 | 1.72062 | -4.59564 | -4.59415 | -4.65081 | -4.65058 |
| 7 | 1 | 1.0 | 0.0 | 5000 | 10000 | 1 | -3244438032551929959 | 7 | 3659 | 0.904656 | 0.905111 | 0.971102 | 0.998769 | -6.0196 | -4.5706 | -2.07621 | 0.0 | 5.65236 | 18.0 | 1.01831 | 1.56901 | 1.03116 | 2.03389 | -4.53572 | -4.53724 | -4.5946 | -4.60106 |
| 8 | 1 | 1.0 | 0.0 | 5000 | 10000 | 1 | -3244438032551929959 | 8 | 3332 | 0.883613 | 0.888708 | 0.963111 | 0.998867 | -6.11586 | -4.51375 | -1.91903 | 0.0 | 6.55552 | 16.0 | 1.18337 | 1.82863 | 1.19778 | 2.34574 | -4.47225 | -4.47456 | -4.54801 | -4.54677 |
| 9 | 1 | 1.0 | 0.0 | 5000 | 10000 | 1 | -3244438032551929959 | 9 | 2938 | 0.861376 | 0.87884 | 0.956143 | 0.998867 | -6.19955 | -4.46852 | -1.79375 | 0.0 | 7.39312 | 18.0 | 1.33799 | 2.07114 | 1.34445 | 2.63955 | -4.43539 | -4.41095 | -4.50542 | -4.50981 |
| 10 | 1 | 1.0 | 0.0 | 5000 | 10000 | 1 | -3244438032551929959 | 10 | 2563 | 0.843953 | 0.867158 | 0.950645 | 0.998692 | -6.29199 | -4.43455 | -1.69615 | 1.0 | 8.21225 | 19.0 | 1.48537 | 2.31409 | 1.49863 | 2.91416 | -4.40256 | -4.36416 | -4.46516 | -4.48804 |


Now let's try a medium-sized test:

In [35]:
#=
results = run_parameter_sweep_parallel("medium_test", ΦX174_epi_config; 
                                      num_reps = 2,
                                      R_range = 0:5:10,      # 3 values  
                                      U_range = 0:2.5:5.0,   # 3 values
                                      L_range = 0.0:0.5:1.0) # 3 values
=#

Here's the full sweep (not yet run):

In [36]:
#=
results = run_parameter_sweep_parallel("full_sweep", ΦX174_epi_config; 
                                      num_reps = 5,
                                      R_range = 0:2:10,
                                      U_range = 0:0.5:5.0, 
                                      L_range = 0.0:0.2:1.0)
=#