# Ensemble Size and Speed Benchmarking

`Ensembles` are specifically designed with usability, memory usage, and computational speed in mind. In this tutorial we explore the size- and speed-related characteristics of `Ensembles` compared with using the equivalent individual models. This is a first step toward answering the following questions:
- How much RAM does an ensemble use during an analysis session compared with the equivalent individual models?
- How much disk space is needed to store an ensemble compared with the equivalent individual models?
- How long does it take to run FBA for all members of an ensemble compared with the equivalent individual models?

## Ensemble memory requirements during use and when saved

`Ensembles` are structured to minimize the RAM required when they are loaded and the disk space required when they are saved. One of the major challenges when working with ensembles of models is keeping all of the models readily available in RAM while conducting analyses. By efficiently packaging only the features that differ between members of an ensemble, we were able to significantly reduce the RAM and hard drive space required for working with ensembles of models.
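
To make the idea concrete, here is a toy sketch in plain Python. It is not Medusa's internal data structure: it simply assumes, for illustration, that a "model" is a dictionary of reaction bounds and that members differ from a shared base model only in the bounds of a few reactions. Pickled size is used as a rough proxy for memory footprint. Storing the base once plus a small per-member diff is much cheaper than storing a full copy of the model for every member.

```python
import pickle

# Hypothetical toy "model": reaction id -> (lower bound, upper bound)
NUM_REACTIONS = 2000
NUM_MEMBERS = 100
base_bounds = {f"rxn_{i}": (-1000.0, 1000.0) for i in range(NUM_REACTIONS)}

def member_diff(member_index):
    """Hypothetical variable features: each member switches off 10 reactions."""
    start = (member_index * 10) % NUM_REACTIONS
    return {f"rxn_{i}": (0.0, 0.0) for i in range(start, start + 10)}

# Strategy 1: one full copy of the model per member
full_copies = []
for m in range(NUM_MEMBERS):
    bounds = dict(base_bounds)
    bounds.update(member_diff(m))
    full_copies.append(bounds)

# Strategy 2: one shared base plus small per-member diffs (ensemble-style)
compact = {"base": base_bounds,
           "diffs": [member_diff(m) for m in range(NUM_MEMBERS)]}

def member_bounds(store, member_index):
    """Rebuild one member's bounds on demand from the shared base and its diff."""
    bounds = dict(store["base"])
    bounds.update(store["diffs"][member_index])
    return bounds

full_size = len(pickle.dumps(full_copies))
compact_size = len(pickle.dumps(compact))
print(f"full copies: {full_size / 1024:.0f} KB, base + diffs: {compact_size / 1024:.0f} KB")
```

The trade-off is that a member has to be reconstructed (here by `member_bounds`) before it can be used, which costs a little time in exchange for a large memory saving.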

In [1]:
import sys
import os
import psutil
import medusa
from medusa.test import create_test_ensemble

In [2]:
# RAM required to load a 1000 member ensemble

# Check initial RAM usage (resident set size)
RAM_before = psutil.Process(os.getpid()).memory_info().rss/1024**2 # Units = MB

# Load the test ensemble from file
ensemble = create_test_ensemble("Staphylococcus aureus")

# Check RAM usage after loading the ensemble
RAM_after = psutil.Process(os.getpid()).memory_info().rss/1024**2 # Units = MB
RAM_used = RAM_after - RAM_before
# Print RAM usage increase due to loading the ensemble
print("%.2f" % (RAM_used), "MB")

63.58 MB
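
These cells read the process's resident set size before and after an operation (`memory_info()[0]` is the `rss` field). A reusable variant, sketched below with the standard-library `tracemalloc` module instead of `psutil`, reports the peak Python-level allocation during a single call. `measure_peak_mb` is a hypothetical helper, not part of Medusa. Note that `tracemalloc` only sees memory allocated through Python's allocator, so memory held by native solver libraries is not counted and its numbers will not match the RSS values shown here.

```python
import tracemalloc

def measure_peak_mb(func, *args, **kwargs):
    """Call func and return (result, peak MB of Python allocations during the call)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / 1024**2

# Stand-in workload; with Medusa installed this could instead be, e.g.,
# measure_peak_mb(create_test_ensemble, "Staphylococcus aureus")
data, peak_mb = measure_peak_mb(lambda: [0.0] * 1_000_000)
print(f"{peak_mb:.2f} MB")
```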


In [3]:
# The test S. aureus ensemble has 1000 members
print(len(ensemble.members),'Members')

1000 Members


In [4]:
# RAM required to hold a single individual model in memory

from copy import deepcopy
# Check initial RAM usage (resident set size)
RAM_before = psutil.Process(os.getpid()).memory_info().rss/1024**2 # Units = MB

# Deepcopy the base model to create a new instance of the model in RAM
extracted_base_model_copy = deepcopy(ensemble.base_model)

# Check RAM usage after copying the model
RAM_after = psutil.Process(os.getpid()).memory_info().rss/1024**2 # Units = MB
RAM_used = RAM_after - RAM_before
# Print RAM usage increase due to copying the model
print("%.2f" % (RAM_used), "MB")

19.23 MB


In [5]:
# If we were to load the individual base model as 1000 unique
# model variables we would use 1000x as much RAM:
RAM_used_for_1000_individual_model_variables = RAM_used * 1000
print("%.2f" % (RAM_used_for_1000_individual_model_variables), 'MB or')
print("%.2f" % (RAM_used_for_1000_individual_model_variables/1024.0), 'GB')

19230.47 MB or
18.78 GB


In [6]:
# Pickle the ensemble and the extracted base model
import pickle
path = "../medusa/test/data/benchmarking/"
# Context managers make sure each file is flushed and closed before its size is checked
with open(path + "Staphylococcus_aureus_ensemble1000.pickle", "wb") as f:
    pickle.dump(ensemble, f)
with open(path + "Staphylococcus_aureus_base_model.pickle", "wb") as f:
    pickle.dump(extracted_base_model_copy, f)

In [7]:
# Check the file size of the pickled ensemble
file_path = path + "Staphylococcus_aureus_ensemble1000.pickle"
if os.path.isfile(file_path):
    file_info = os.stat(file_path)
    mb = file_info.st_size/(1024.0**2) # Convert from bytes to MB
    print("%.2f %s" % (mb, 'MB for a 1000 member ensemble'))
else:
    print("File path doesn't point to file.")

6.61 MB for a 1000 member ensemble


In [8]:
# Check the file size of the pickled base model
file_path = path + "Staphylococcus_aureus_base_model.pickle"
if os.path.isfile(file_path):
    file_info = os.stat(file_path)
    mb = file_info.st_size/(1024.0**2) # Convert from bytes to MB
    print("%.2f %s" % (mb, 'MB per model'))
    # Extrapolate to 1000 individual model files (only possible if the file exists)
    print("%.2f" % (mb*1000),'MB for 1000 individual model files.')
    print("%.2f" % (mb*1000/1024),'GB for 1000 individual model files.')
else:
    print("File path doesn't point to file.")

1.17 MB per model
1171.96 MB for 1000 individual model files.
1.14 GB for 1000 individual model files.
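
Writing files is not strictly necessary to compare storage footprints. A minimal sketch, assuming the pickle format used above is the storage format of interest: `len(pickle.dumps(obj))` is the number of bytes `pickle.dump` writes with the same protocol, so on-disk sizes can be estimated in memory. `pickled_size_mb` is a hypothetical helper, not part of Medusa.

```python
import pickle

def pickled_size_mb(obj, protocol=pickle.DEFAULT_PROTOCOL):
    """Return the size in MB of obj once serialized with pickle."""
    return len(pickle.dumps(obj, protocol=protocol)) / 1024**2

# Stand-in object: 1 MiB of raw bytes plus a few bytes of pickle overhead
print(f"{pickled_size_mb(b'x' * 1024**2):.2f} MB")
```

With the objects from this tutorial, `pickled_size_mb(ensemble)` and `pickled_size_mb(extracted_base_model_copy)` should closely match the file sizes reported above. Using those reported sizes, 1000 individual model files would take roughly 177 times the space of the single ensemble file (1171.96 MB / 6.61 MB).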


## Flux analysis speed testing

Running FBA takes relatively little time for a single model; however, when working with ensembles of thousands of models, these simple optimization problems add up to a significant amount of time. Here we explore the expected time frames for an ensemble and how they compare with the equivalent number of individual models. Note that this benchmark assumes the computer is capable of holding all of the individual models in RAM at once (about 18.8 GB for this ensemble, as estimated above); this may not be the case for many laptop computers.
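
For reference, each FBA solve is a single linear program. In its standard form, with stoichiometric matrix $S$, flux vector $v$, objective coefficients $c$, and per-reaction lower and upper bounds $v_{\mathrm{lb}}$ and $v_{\mathrm{ub}}$, it reads:

$$
\begin{aligned}
\max_{v} \quad & c^{\top} v \\
\text{subject to} \quad & S v = 0, \\
& v_{\mathrm{lb}} \le v \le v_{\mathrm{ub}}.
\end{aligned}
$$

Solving a whole ensemble therefore means solving one such LP per member. Since members differ only in the features that vary between them (reaction bounds being a natural example), each member's LP is a small modification of the base model's LP.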

In [9]:
import time
from medusa.flux_analysis import flux_balance

In [10]:
# Time required to run FBA on a 1000 member ensemble using the innate Medusa functions.
runtimes = {}
for num_processes in [1,2,4,8]:
    t0 = time.time()
    flux_balance.optimize_ensemble(ensemble, num_processes = num_processes)
    t1 = time.time()
    runtimes[num_processes] = t1-t0
    print(str(num_processes) + ' processors: ' + str(t1-t0) + ' seconds for entire ensemble')

1 processors: 142.41587114334106 seconds for entire ensemble
2 processors: 79.16171908378601 seconds for entire ensemble
4 processors: 44.92253303527832 seconds for entire ensemble
8 processors: 34.65370845794678 seconds for entire ensemble
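
The scaling is sublinear: going from 4 to 8 processes saves only about 10 seconds. Parallel speedup $T_1/T_p$ and efficiency $T_1/(p\,T_p)$ quantify this. The sketch below recomputes both from the runtimes printed above; in a live session the `runtimes` dictionary from the previous cell could be used directly instead of the hard-coded values.

```python
# Runtimes (seconds) copied from the output above, keyed by number of processes
runtimes = {1: 142.41587114334106, 2: 79.16171908378601,
            4: 44.92253303527832, 8: 34.65370845794678}

serial = runtimes[1]
for p, t in sorted(runtimes.items()):
    speedup = serial / t          # how many times faster than a single process
    efficiency = speedup / p      # fraction of ideal linear scaling achieved
    print(f"{p} processes: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

With these values the speedups are about 1.80x, 3.17x, and 4.11x for 2, 4, and 8 processes, so efficiency falls from about 90% at 2 processes to about 51% at 8.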


In [11]:
# Time required to run FBA on 1000 individual models using a single processor.
# This is the equivalent time that would be required if all 1000 models were pre-loaded in RAM.

t_total = 0
for member in ensemble.members:
    # Set the member state 
    ensemble.set_state(member.id)
    # Start the timer to capture only time required to run FBA on each model
    t0 = time.time()
    solution = ensemble.base_model.optimize()
    t1 = time.time()
    t_total += t1 - t0
print("%.2f" % (t_total) ,'seconds for 1000 models')

79.50 seconds for 1000 models
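
The loop above starts and stops the timer around `optimize()` only, so time spent in `set_state` is excluded. The same pattern can be packaged as a small accumulating timer. `AccumulatingTimer` is a hypothetical helper, not part of Medusa; it uses `time.perf_counter`, which is intended for measuring short durations, rather than `time.time`.

```python
import time
from contextlib import contextmanager

class AccumulatingTimer:
    """Sum the wall-clock time spent inside repeated `with timer.measure():` blocks."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    @contextmanager
    def measure(self):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.total += time.perf_counter() - start
            self.count += 1

# Stand-in workload; with Medusa the untimed step would be ensemble.set_state(member.id)
# and the timed step would be ensemble.base_model.optimize()
timer = AccumulatingTimer()
for _ in range(5):
    sum(range(10_000))            # untimed setup, analogous to set_state
    with timer.measure():
        time.sleep(0.01)          # timed section, analogous to optimize()
print(f"{timer.total:.2f} seconds over {timer.count} timed calls")
```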


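Running FBA on 1000 pre-loaded individual models is about 1.8 times faster than running `optimize_ensemble` with a single process (79.50 vs. 142.42 seconds, ignoring the time it takes to load all of the models) and roughly on par with `optimize_ensemble` using 2 processes (79.16 seconds). However, it requires about 300 times as much RAM (roughly 18.8 GB vs. 63.6 MB).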

In [12]:
# Time required to run FBA on 1000 individual models with a complete solver reset
# before each optimization problem is solved.

# Load a fresh copy of the base model with a blank solver state
with open(path + "Staphylococcus_aureus_base_model.pickle", "rb") as f:
    fresh_base_model = pickle.load(f)
# Time a single FBA solve on the freshly loaded model
t0 = time.time()
fresh_base_model.optimize()
t1 = time.time()
t_total = t1-t0
# Extrapolate to 1000 unique individual models (assumes every solve takes as long as this one)
print("%.2f" % (t_total*1000), 'seconds for 1000 models')

436.84 seconds for 1000 models
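
The estimate above extrapolates from a single timed solve, which can be noisy. A somewhat more robust sketch, assuming the goal is the cost of a first solve on a freshly loaded model: `timeit.repeat` runs its `setup` before each repetition and excludes it from the timing, so passing a loader as `setup` with `number=1` times one cold call per repetition. The median of several repetitions can then be scaled to 1000 models. `load_fresh_model` and `solve_once` are hypothetical stand-ins for the `pickle.load` and `optimize()` calls above.

```python
import statistics
import timeit

state = {}

def load_fresh_model():
    """Hypothetical stand-in for loading the base model from its pickle file."""
    state["model"] = list(range(100_000))

def solve_once():
    """Hypothetical stand-in for state["model"].optimize()."""
    return sum(state["model"])

# setup runs before each repetition and is not included in the measured time
times = timeit.repeat(stmt=solve_once, setup=load_fresh_model, repeat=7, number=1)
per_solve = statistics.median(times)
print(f"{per_solve * 1000:.2f} seconds estimated for 1000 models")
```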
