SimEDC is a discrete-event simulator that characterizes the reliability of an erasure-coded data center. It reports reliability metrics based on configurable inputs: the data center topology, erasure codes, redundancy placement, and the failure/repair patterns of different subsystems, which are obtained from statistical models or production traces.
We build SimEDC on the High-Fidelity Reliability Simulator (HFRS) developed by Kevin Greenan.
Please install mpmath, numpy and scipy.
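To confirm that the three dependencies are importable before running the simulator, a check along these lines may help. It is a minimal sketch; the helper name `missing_modules` is hypothetical and not part of SimEDC.

```python
def missing_modules(names):
    """Return the names in `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            # The package is not installed for this interpreter.
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(["mpmath", "numpy", "scipy"])
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All dependencies found.")
```

Run it with the same interpreter that will run simedc.py, since packages are installed per interpreter.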
python simedc.py
-n <code_n> [--code_n <code_n>]
-k <code_k> [--code_k <code_k>]
-t <code_type> [--code_type <code_type>]
-T <place_type> [--place_type <place_type>]
-g <chunk_rack_config> [--chunk_rack_config <chunk_rack_config>]
For more details, please run python simedc.py -h.
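When scripting many runs, the options above can be assembled programmatically. The following is a sketch only: `build_command` is a hypothetical helper, not part of SimEDC. The check that the entries of chunk_rack_config sum to code_n is an assumption inferred from the hierarchical example in this README (`-n 9 ... -g 3,3,3`), not documented behavior.

```python
def build_command(code_n, code_k, code_type, place_type, chunk_rack_config=None):
    """Build the argument list for one simedc.py run (hypothetical helper)."""
    if code_k >= code_n:
        raise ValueError("code_k must be smaller than code_n")
    cmd = ["python", "simedc.py",
           "-n", str(code_n),
           "-k", str(code_k),
           "-t", code_type,
           "-T", place_type]
    if chunk_rack_config is not None:
        # Assumption: the per-rack chunk counts must add up to code_n.
        if sum(chunk_rack_config) != code_n:
            raise ValueError("chunk_rack_config must sum to code_n")
        cmd += ["-g", ",".join(str(c) for c in chunk_rack_config)]
    return cmd
```

The returned list can be passed to `subprocess.call` from the repository root.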
The default mode is regular simulation without enabling importance sampling. To enable importance sampling, please specify the type of simulator (sim_type) as "unifbfb" and configure two parameters, probability of failure biasing (fb_prob) and beta (beta). For example,
./simedc.py -A unifbfb -f 0.5 -b 0.095 -i 2 -p 1 -t rs -n 9 -k 6 -T flat
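To illustrate why failure biasing helps with rare data-loss events, the sketch below applies it to a toy chain. It is not SimEDC's unifbfb implementation and it does not model beta. The toy model is an assumption made for illustration: the state is the number of failed components, each step is a failure with probability `p_fail` or a repair otherwise, and data loss occurs at `loss_state` failures. Only the name `fb_prob` mirrors the SimEDC parameter. Under biasing, failures are drawn with probability `fb_prob` instead of `p_fail`, and each run is weighted by its likelihood ratio so the estimate stays unbiased.

```python
import random


def exact_loss_probability(p_fail, loss_state):
    """Probability of reaching loss_state before state 0, starting from 1
    (classic gambler's-ruin formula for the toy chain)."""
    r = (1.0 - p_fail) / p_fail
    if r == 1.0:
        return 1.0 / loss_state
    return (1.0 - r) / (1.0 - r ** loss_state)


def biased_loss_estimate(p_fail, loss_state, fb_prob, num_runs, seed=0):
    """Estimate the same probability by simulating with failure biasing."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_runs):
        failed = 1      # start right after the first failure
        weight = 1.0    # likelihood ratio of true vs. biased dynamics
        while 0 < failed < loss_state:
            if rng.random() < fb_prob:
                failed += 1
                weight *= p_fail / fb_prob
            else:
                failed -= 1
                weight *= (1.0 - p_fail) / (1.0 - fb_prob)
        if failed == loss_state:
            total += weight
    return total / num_runs
```

With `p_fail = 0.01` and `loss_state = 4`, the exact probability is about 1.02e-6, so unbiased simulation would need millions of runs to observe even a few losses. With `fb_prob = 0.5`, roughly one run in eight reaches the loss state.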
Set a data center with 16 racks and 8 nodes per rack.
- RS(9,6) in flat placement
  python simedc.py -n 9 -k 6 -t rs -T flat
  Results:
  - PDL = 0.000000e+00
  - RE = 0.0%
  - NOMDL (bytes/byte) = 0.000000e+00
  - BR = 4.430000e-04
- RS(9,6) in hierarchical placement
  python simedc.py -n 9 -k 6 -t rs -T hie -g 3,3,3
  Results:
  - PDL = 0.000000e+00
  - RE = 0.0%
  - NOMDL (bytes/byte) = 0.000000e+00
  - BR = 4.175000e-04
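The two placements above differ in how many whole-rack failures a stripe can survive. The sketch below computes that worst-case number under two assumptions that this README does not state: the code is MDS, so any code_n - code_k lost chunks are tolerable (true for Reed-Solomon codes), and flat placement puts each chunk of a stripe in a different rack. The function name is hypothetical.

```python
def tolerable_rack_failures(code_n, code_k, chunks_per_rack):
    """Largest number of racks that can always fail without data loss,
    given how many chunks of one stripe each rack holds."""
    if sum(chunks_per_rack) != code_n:
        raise ValueError("chunks_per_rack must sum to code_n")
    budget = code_n - code_k  # chunks an MDS code can lose
    lost = 0
    racks = 0
    # Worst case: the racks holding the most chunks fail first.
    for chunks in sorted(chunks_per_rack, reverse=True):
        if lost + chunks > budget:
            break
        lost += chunks
        racks += 1
    return racks
```

For RS(9,6), `tolerable_rack_failures(9, 6, [1] * 9)` gives 3 for flat placement, while `tolerable_rack_failures(9, 6, [3, 3, 3])` gives 1 for the hierarchical configuration.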
- simedc.py: the main command line interface of SimEDC
- README: this file
- lib: the library of SimEDC
  - simulation.py: contains class Simulation and its functions
  - regular_simulation.py: contains class RegularSimulation, which inherits from class Simulation
  - is_simulation.py: contains class UnifBFBSimulation, which inherits from class Simulation
  - network.py: contains class Network and its functions to keep track of the network bandwidth
  - placement.py: contains class Placement, including
    - different erasure codes (i.e., Reed-Solomon Codes, Locally Repairable Codes, and Double Regenerating Codes)
    - different placement policies (i.e., flat placement and hierarchical placement)
  - smp_data_structures.py: contains
    - class Disk, Node and Rack and their functions
    - class Weibull and its functions
  - state.py: encapsulates the system state
  - bm_ops.py: contains bitmap functions for different subsystems
  - sim_analysis_functions.py: contains class Samples, which encapsulates a set of statistics operations
- tracelib: the library for using traces
  - trace.py: contains class Parser and Trace to parse traces and obtain node failure/repair events (i.e., node permanent failures, node transient failures/repairs)
  - data: contains trace.csv
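For readers unfamiliar with how a Weibull model produces failure or repair times, the sketch below draws samples by inverse-transform sampling. It is an illustration only, not the Weibull class in smp_data_structures.py. The function name and the `shape`, `scale`, and `location` parameters are assumptions.

```python
import math
import random


def sample_weibull(shape, scale, rng, location=0.0):
    """Draw one Weibull-distributed time via inverse-transform sampling."""
    u = rng.random()  # uniform in [0, 1)
    # Invert the CDF F(t) = 1 - exp(-((t - location) / scale) ** shape).
    return location + scale * (-math.log(1.0 - u)) ** (1.0 / shape)


if __name__ == "__main__":
    rng = random.Random(0)
    times = [sample_weibull(1.2, 1000.0, rng) for _ in range(5)]
    print(times)
```

With `shape = 1` the distribution reduces to an exponential with mean `scale`, which is a convenient sanity check.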
Please email Mi Zhang (mzhang@cse.cuhk.edu.hk) if you have any questions.
Mi Zhang, Shujie Han, and Patrick P.C. Lee. "A Simulation Analysis of Reliability in Erasure-Coded Data Centers". SRDS 2017.