Skip to content
This repository has been archived by the owner on Feb 27, 2024. It is now read-only.

strymon-system/megaphone

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Megaphone: Latency-conscious state migration

This is the repository for Megaphone's implementation on timely dataflow together with a set of benchmarks. Megaphone is described in "Megaphone: Latency-conscious state migration for distributed streaming dataflows", read it on https://arxiv.org/abs/1812.01371.

At time of writing, the code has been tested with Rust 1.30 stable. To install Rust, visit rustup.org or obtain Rust via other means.

Code organization

The repository contains a generic implementation of a scaling mechanism for timely dataflow as well as some benchmark programs. The source code to the mechanism is in src, the benchmarks can be found in nexmark.

Benchmarks

We provide two kinds of benchmarks: A modified word-count (word_count.rs) and the NEXMark benchmarking suite (timely.rs, differential.rs). The NEXMark data generator is based on the data generator by Nicolas Hafner, original available on https://github.com/Shinmera/bsc-thesis/blob/master/benchmarks/src/nexmark.rs

  • Word count counts (key, val)-pairs where both are 64-bit integer types. Hash-count uses a HashMap to store the values. Key-count uses the key as an index into an array.

Parameters

All benchmarks share a set of parameters.

  • A rate determines how many records are produced per second per timely worker.
  • A duration sets for how long an experiment is supposed to run.
  • A migration defines a migration to be performed. Here, either one of the predefined migrations can be selected, or a filename containing a migration plan can be provided.
  • The counting benchmarks have a domain to adjust the size of data they store. During initialization, all keys from the domain are set to a default value.
  • The counting benchmarks have different backends to select between hash- and key-count as well as their native implementations.
  • The NEXMark timely implementation executes a set of queries, which can be selected from q0 through q8 and q0-flex to q8-flex, where the first is the native timely implementation, and the second -flex implementations use Megaphone. (NEXMark differential is not based on Megaphone.)

Benchmarks should always be executed with Rust's release mode by adding --release to cargo's options.

To run a key-count experiment with four workers in a single process, a rate of 1000000 keys per second and a domain of 10 million, execute:

$ cargo run --release --bin word_count -- --backend vec --rate 1000000 --duration 30 --migration none --domain 10000000 -- -w4

Parameters after the second separator (--) are passed to timely dataflow.

This will produce output similar to the following:

    Finished release [optimized + debuginfo] target(s) in 0.03s
     Running `target/release/word_count --rate 1000000 --duration 30 --migration none --domain 10000000 --backend vec -- -w4`
backend	Vector
statm_RSS	11207	1191936
bin_shift	8
instructions	(0, [Map([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0,
max_number: 65536
max_number: 65536
max_number: 65536
max_number: 65536
loading_done	3	2442
loading_done	0	2442
loading_done	2	2442
loading_done	1	2442
statm_RSS	500093138	143286272
statm_RSS	1000177255	143323136
statm_RSS	1500263478	143335424
statm_RSS	2000348862	143282176
statm_RSS	2500427146	143306752
statm_RSS	3000510903	143294464
statm_RSS	3500589805	143343616
statm_RSS	4000674417	143368192
statm_RSS	4500758623	143294464
statm_RSS	5000845824	143458304
statm_RSS	5500932658	143421440
statm_RSS	6001016329	143392768
statm_RSS	6500100357	143343616
statm_RSS	7000184078	143400960
statm_RSS	7500267736	143466496
statm_RSS	8000352564	143433728
statm_RSS	8500437906	143450112
statm_RSS	9000521337	143454208
statm_RSS	9500604198	143351808
statm_RSS	10000687008	143433728
statm_RSS	10500769241	143429632
statm_RSS	11000856297	143405056
statm_RSS	11500943404	143400960
statm_RSS	12001026310	143478784
statm_RSS	12500103922	143413248
statm_RSS	13000189306	143388672
statm_RSS	13500267298	143368192
statm_RSS	14000350281	143462400
statm_RSS	14500435223	143462400
statm_RSS	15000518534	143392768
statm_RSS	15500602761	143429632
statm_RSS	16000685690	143441920
statm_RSS	16500769981	143450112
statm_RSS	17000856149	143441920
statm_RSS	17500941156	143417344
statm_RSS	18001024801	143446016
statm_RSS	18500108779	143458304
statm_RSS	19000195481	143495168
statm_RSS	19500280029	143446016
statm_RSS	20000364139	143519744
statm_RSS	20500449183	143474688
statm_RSS	21000534631	143384576
statm_RSS	21500619807	143429632
statm_RSS	22000705706	143437824
statm_RSS	22500783532	143503360
statm_RSS	23000872966	143425536
statm_RSS	23500953048	143405056
statm_RSS	24001037920	143466496
statm_RSS	24500123394	143462400
statm_RSS	25000208318	143564800
statm_RSS	25500293209	143499264
statm_RSS	26000379004	143486976
statm_RSS	26500462933	143511552
statm_RSS	27000548406	143482880
statm_RSS	27500631475	143454208
statm_RSS	28000716668	143470592
statm_RSS	28500802035	143454208
statm_RSS	29000891828	143413248
statm_RSS	29500977316	143523840
statm_RSS	30001057364	143482880
latency_ccdf	122880	1	21
latency_ccdf	126976	0.9999997980769251	74
latency_ccdf	131072	0.9999990865384704	219
latency_ccdf	139264	0.9999969807692598	35299
latency_ccdf	147456	0.999657567310985	296241
latency_ccdf	155648	0.996809096184528	636340
latency_ccdf	163840	0.9906904423972073	793091
latency_ccdf	172032	0.9830645674705331	830141
latency_ccdf	180224	0.9750824425472842	838164
latency_ccdf	188416	0.9670231733940079	842592
latency_ccdf	196608	0.9589213273180641	846156
latency_ccdf	204800	0.9507852120116806	848651
latency_ccdf	212992	0.9426251063209125	850364
latency_ccdf	221184	0.9344485294764564	850920
latency_ccdf	229376	0.9262666064782057	851531
latency_ccdf	237568	0.9180788084800114	851461
latency_ccdf	245760	0.9098916835587338	851930
latency_ccdf	253952	0.9017000490221149	851918
latency_ccdf	262144	0.8935085298701103	851825
latency_ccdf	278528	0.8853179049488663	1703577
latency_ccdf	294912	0.8689373570294485	1703895
latency_ccdf	311296	0.8525537514177524	1703857
latency_ccdf	327680	0.8361705111906682	1703741
latency_ccdf	344064	0.8197883863481886	1703852
latency_ccdf	360448	0.803405194198027	1703642
latency_ccdf	376832	0.7870240212786151	1703881
latency_ccdf	393216	0.7706405502823024	1703869
latency_ccdf	409600	0.7542571946706039	1703739
latency_ccdf	425984	0.7378750890588934	1704149
latency_ccdf	442368	0.7214890411395285	1703737
latency_ccdf	458752	0.705106954758587	1703815
latency_ccdf	475136	0.6887241183776527	1703952
latency_ccdf	491520	0.6723399646890388	1703639
latency_ccdf	507904	0.6559588206157806	1704081
latency_ccdf	524288	0.6395734265425632	1704028
latency_ccdf	557056	0.6231885420847255	3407857
latency_ccdf	589824	0.5904206866305703	3407803
latency_ccdf	622592	0.5576533504071793	3407744
latency_ccdf	655360	0.5248865814914752	3408032
latency_ccdf	688128	0.4921170433450284	3407816
latency_ccdf	720896	0.4593495821216386	3407890
latency_ccdf	753664	0.42658140935979416	3407847
latency_ccdf	786432	0.39381365005948415	3407831
latency_ccdf	819200	0.3610460446053265	3407956
latency_ccdf	851968	0.3282772372281035	3407980
latency_ccdf	884736	0.29550819908165193	3407769
latency_ccdf	917504	0.2627411897813347	3407624
latency_ccdf	950272	0.22997557471177332	3408205
latency_ccdf	983040	0.1972043731038041	3407834
latency_ccdf	1015808	0.16443673880349288	3407742
latency_ccdf	1048576	0.1316699891185578	3407766
latency_ccdf	1114112	0.09890300866439415	6815950
latency_ccdf	1179648	0.033365028525336266	3445192
latency_ccdf	1245184	0.00023818269001747413	22930
latency_ccdf	1310720	0.00001770192290671228	758
latency_ccdf	1376256	0.000010413461438332101	524
latency_ccdf	1441792	0.000005374999948317308	314
latency_ccdf	1507328	0.000002355769208117604	242
latency_ccdf	1572864	0.000000028846153568786986	2
latency_ccdf	1638400	0.000000009615384522928995	0
summary_timeline	0	442368	720896	983040	1245184	1245184	1310720	1376256
summary_timeline	250000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	500000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	750000000	425984	688128	950272	1179648	1245184	1310720	1310720
summary_timeline	1000000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	1250000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	1500000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	1750000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	2000000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	2250000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	2500000000	442368	720896	950272	1179648	1245184	1310720	1507328
summary_timeline	2750000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	3000000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	3250000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	3500000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	3750000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	4000000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	4250000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	4500000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	4750000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	5000000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	5250000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	5500000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	5750000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	6000000000	425984	688128	950272	1179648	1245184	1245184	1245184
summary_timeline	6250000000	425984	688128	950272	1179648	1245184	1245184	1245184
summary_timeline	6500000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	6750000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	7000000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	7250000000	425984	688128	950272	1179648	1245184	1245184	1245184
summary_timeline	7500000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	7750000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	8000000000	425984	688128	950272	1179648	1245184	1245184	1245184
summary_timeline	8250000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	8500000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	8750000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	9000000000	425984	688128	950272	1179648	1245184	1245184	1245184
summary_timeline	9250000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	9500000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	9750000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	10000000000	425984	688128	950272	1179648	1245184	1245184	1245184
summary_timeline	10250000000	425984	688128	950272	1179648	1245184	1245184	1245184
summary_timeline	10500000000	425984	688128	950272	1179648	1245184	1376256	1638400
summary_timeline	10750000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	11000000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	11250000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	11500000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	11750000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	12000000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	12250000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	12500000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	12750000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	13000000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	13250000000	425984	688128	950272	1179648	1245184	1245184	1376256
summary_timeline	13500000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	13750000000	425984	688128	950272	1179648	1245184	1310720	1376256
summary_timeline	14000000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	14250000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	14500000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	14750000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	15000000000	425984	688128	950272	1179648	1245184	1245184	1245184
summary_timeline	15250000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	15500000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	15750000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	16000000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	16250000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	16500000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	16750000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	17000000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	17250000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	17500000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	17750000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	18000000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	18250000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	18500000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	18750000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	19000000000	425984	688128	950272	1179648	1245184	1245184	1245184
summary_timeline	19250000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	19500000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	19750000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	20000000000	425984	688128	950272	1179648	1245184	1245184	1245184
summary_timeline	20250000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	20500000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	20750000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	21000000000	425984	688128	950272	1179648	1245184	1245184	1245184
summary_timeline	21250000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	21500000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	21750000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	22000000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	22250000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	22500000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	22750000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	23000000000	425984	688128	950272	1179648	1245184	1245184	1245184
summary_timeline	23250000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	23500000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	23750000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	24000000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	24250000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	24500000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	24750000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	25000000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	25250000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	25500000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	25750000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	26000000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	26250000000	425984	688128	950272	1179648	1245184	1245184	1245184
summary_timeline	26500000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	26750000000	425984	720896	950272	1179648	1245184	1310720	1376256
summary_timeline	27000000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	27250000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	27500000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	27750000000	425984	688128	950272	1179648	1245184	1245184	1376256
summary_timeline	28000000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	28250000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	28500000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	28750000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	29000000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	29250000000	425984	688128	950272	1179648	1245184	1376256	1638400
summary_timeline	29500000000	425984	688128	950272	1179648	1245184	1245184	1310720
summary_timeline	29750000000	425984	688128	950272	1179648	1245184	6291456	6553600

The output is a list of tab-separated values on stdout.

  • backend Vector: What kind of backend has been selected.
  • bin_shift: The number of bins (log).
  • statm_RSS 13000189306 143388672: RSS memory consumption of 143388672 bytes at time 13000189306ns since start of application.
  • latency_ccdf 229376 0.9262666064782057 851531: latency CCDF value, latency 229376ns, smaller than 0.9262666064782057% of all measurements, 851531 measurements.
  • summary_timeline 1250000000 425984 688128 950272 1179648 1245184 1245184 1310720: Some percentiles at time 1250000000ns: 25%, 50%, 75%, 99%, 99.9%, max in nanoseconds.

Running the automated benchmarks

Benchmarks are located in experiments/nexmark and include a driver harness to run timely computations remotely with multiple processes on different machines. Also, some word-count-like experiments can be executed. The harness is located in experiments/nexmark and consists of tools to run a set of benchmarks and to plot the results. The quality of the code is debatable but it should serve as a starting point for anyone interested in Megaphone's performance.

All benchmarks are contained in bench.py and require Python 3 to run.

Directory structure

The benchmark will produce the following directories:

  • setups contains a timely dataflow hostfile and a migration pattern for each experiment.
  • results stores stdout/stderr for each experiment and process.
  • charts is the target for all plots.

In general, the harness creates a subdirectory inside each of the above directories with the current GIT revision as name to aid reproducibility.

Preparation

We recommend running them from a virtual environment to avoid clobbering the system. Setup the virtual environment using: virtualenv -p python3 venv source venv/bin/activate pip install -r requirements.txt

The source code to Megaphone (this repository) must be available locally on all machines where the benchmark is to be executed. By default, the harness assumes it is the same directory as it is on the local machine. Override the cluster directory by setting the environment variable CLUSTERPATH. The user name on the remote machines is $USER by default but can be configured by setting CLUSTERUSER to the desired value. The remote machines are configured with the SERVER symbol and have the numbers 1, 2, 3... appended.

Running experiments

To run a predefined set of exhaustive experiments, run run_paper.sh after exporting the evironment symbols as described above. This will run a set of benchmarks, if possible in parallel, on a set of remote machines. The name of the benchmarks corresponds to functions defined in bench.py. Note that the current set of benchmarks require around 2.5 days to complete!

Plotting results

Assuming no parameters are changed, the script plot_all.sh can be used to visualize the results. When changing parameters to the benchmark, care must be taken to update the parameters here as well.