C implementation of FRAMES [0]: data-driven windows
The executable takes several inputs that customize the framing operation on the input file. The program arguments, all required, are listed here in order:
- (string) input file path: path to the CSV file containing the events to be processed, formatted as "ts, key, value"
- (int) frame type: THRESHOLD = 0 | DELTA = 1 | AGGREGATE = 2
- (int) report policy: ON CLOSE = 0 | ON UPDATE = 1
- (int) order policy: IN ORDER = 0 | OUT OF ORDER = 1
- (int) buffer type: SINGLE BUFFER = 0 | MULTI BUFFER = 1
- (int) X: eviction trigger. With SINGLE BUFFER, eviction happens after X frames have been created; with MULTI BUFFER, after X ms have passed. Set X = -1 to disable eviction.
- (int) Y: number of oldest frames evicted each time eviction is triggered
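The ordered argument list above can be read into a single configuration struct with range checks on every integer code. This is a minimal sketch, not the project's actual argument handling. The names config_t, parse_args, and the enum constants are hypothetical. The sketch assumes the seven arguments arrive in exactly the listed order.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical enums mirroring the integer codes of the argument list. */
typedef enum { FRAME_THRESHOLD = 0, FRAME_DELTA = 1, FRAME_AGGREGATE = 2 } frame_type_t;
typedef enum { REPORT_ON_CLOSE = 0, REPORT_ON_UPDATE = 1 } report_policy_t;
typedef enum { ORDER_IN_ORDER = 0, ORDER_OUT_OF_ORDER = 1 } order_policy_t;
typedef enum { BUFFER_SINGLE = 0, BUFFER_MULTI = 1 } buffer_type_t;

typedef struct {
    const char *input_path;          /* CSV file with "ts, key, value" lines */
    frame_type_t frame_type;
    report_policy_t report_policy;
    order_policy_t order_policy;
    buffer_type_t buffer_type;
    long x;                          /* frames (single) or ms (multi); -1 disables eviction */
    long y;                          /* number of oldest frames evicted each time */
} config_t;

/* Parses s as a base-10 integer within [lo, hi]; returns 0 on success. */
static int parse_long(const char *s, long lo, long hi, long *out) {
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0' || v < lo || v > hi)
        return -1;
    *out = v;
    return 0;
}

/* Fills cfg from argv, which must hold exactly the 7 ordered arguments.
 * Returns 0 on success, -1 on a missing or out-of-range argument. */
int parse_args(int argc, char **argv, config_t *cfg) {
    /* Bounds for: frame type, report policy, order policy, buffer type, X, Y. */
    static const long lo[6] = {0, 0, 0, 0, -1, 0};
    static const long hi[6] = {2, 1, 1, 1, LONG_MAX, LONG_MAX};
    long v[6];

    if (argc != 8)
        return -1;
    for (int i = 0; i < 6; i++)
        if (parse_long(argv[i + 2], lo[i], hi[i], &v[i]) != 0)
            return -1;

    cfg->input_path = argv[1];
    cfg->frame_type = (frame_type_t)v[0];
    cfg->report_policy = (report_policy_t)v[1];
    cfg->order_policy = (order_policy_t)v[2];
    cfg->buffer_type = (buffer_type_t)v[3];
    cfg->x = v[4];
    cfg->y = v[5];
    return 0;
}
```

A main function would call parse_args(argc, argv, &cfg) and print a usage line listing the seven arguments when it returns -1. Rejecting an out-of-range code up front avoids silently running the wrong frame type or policy.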
Several #define constants in the single/multi_buffer.c files customize the execution:
- THRESHOLD: threshold on the event's value used to build Threshold Frames
- DELTA: a Delta Frame is emitted whenever the difference between the minimum and maximum of "value" becomes greater than this parameter
- AGGREGATE: aggregation function used to build Aggregate Frames (AVG = 0 | SUM = 1)
- AGGREGATE THRESHOLD: a new frame is reported when the aggregate value becomes greater than this parameter
- MAX CHARS: maximum number of characters allowed in a line representing an event
- MAX FRAMES: maximum size of the multi-buffer; choose an eviction policy that does not cause it to overflow
- DEBUG: set to true to print the current frame whenever the report policy is satisfied
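The frame-type parameters above can be illustrated with a small state machine that consumes one event at a time. This is a minimal sketch under stated assumptions, not the code in single/multi_buffer.c:

- The names framer_t, framer_push, and the FRAME_* constants are hypothetical.
- The #define values are placeholders.
- Only in-order events and the ON CLOSE report policy are handled.

It also assumes three frame semantics:

- A Threshold Frame is a maximal run of consecutive events whose value is above THRESHOLD.
- The event that pushes max - min past DELTA starts the next Delta Frame.
- The event that pushes the aggregate past AGGREGATE_THRESHOLD is the last one of its Aggregate Frame.

```c
#include <stdbool.h>
#include <stddef.h>

/* Placeholder values; the real ones are set in single/multi_buffer.c. */
#define THRESHOLD 50.0
#define DELTA 10.0
#define AGGREGATE 1                  /* AVG = 0 | SUM = 1 */
#define AGGREGATE_THRESHOLD 100.0

typedef enum { FRAME_THRESHOLD = 0, FRAME_DELTA = 1, FRAME_AGGREGATE = 2 } frame_type_t;

/* A frame described by the timestamps of its first and last event and its size. */
typedef struct { long start_ts, end_ts; size_t count; } frame_t;

/* State of the frame currently being built. */
typedef struct {
    frame_type_t type;
    bool open;                       /* true while a frame is being built */
    frame_t cur;
    double min, max, sum;            /* running statistics of the open frame */
} framer_t;

static void open_frame(framer_t *f, long ts, double value) {
    f->open = true;
    f->cur.start_ts = f->cur.end_ts = ts;
    f->cur.count = 1;
    f->min = f->max = f->sum = value;
}

static void add_event(framer_t *f, long ts, double value) {
    f->cur.end_ts = ts;
    f->cur.count++;
    if (value < f->min) f->min = value;
    if (value > f->max) f->max = value;
    f->sum += value;
}

/* Feeds one event (in timestamp order). Returns true and copies the closed
 * frame into *out when this event closes a frame (ON CLOSE reporting). */
bool framer_push(framer_t *f, long ts, double value, frame_t *out) {
    switch (f->type) {
    case FRAME_THRESHOLD:
        if (value > THRESHOLD) {     /* event belongs to a frame */
            if (f->open) add_event(f, ts, value);
            else open_frame(f, ts, value);
            return false;
        }
        if (!f->open) return false;
        *out = f->cur;               /* first event at or below THRESHOLD closes it */
        f->open = false;
        return true;

    case FRAME_DELTA: {
        if (!f->open) { open_frame(f, ts, value); return false; }
        double lo = value < f->min ? value : f->min;
        double hi = value > f->max ? value : f->max;
        if (hi - lo > DELTA) {
            *out = f->cur;           /* close without the current event... */
            open_frame(f, ts, value); /* ...which starts the next frame */
            return true;
        }
        add_event(f, ts, value);
        return false;
    }

    case FRAME_AGGREGATE: {
        if (f->open) add_event(f, ts, value);
        else open_frame(f, ts, value);
        double agg = (AGGREGATE == 0) ? f->sum / (double)f->cur.count : f->sum;
        if (agg > AGGREGATE_THRESHOLD) {
            *out = f->cur;           /* the crossing event is the frame's last */
            f->open = false;
            return true;
        }
        return false;
    }
    }
    return false;                    /* unknown frame type */
}
```

Consider a framer initialized as { .type = FRAME_THRESHOLD }. Pushing the values 60, 70 and 10 at timestamps 1, 2 and 3 returns true on the third call, with a frame spanning timestamps 1 to 2 and containing two events. An ON UPDATE policy would instead report the open frame after every add_event call; the sketch leaves that out.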
To perform the evaluation, we measure the execution time of the SECRET [1] methods while framing the input stream of events. We also record the number of tuples and frames created up to that point, to assess the algorithmic complexity of the program. To customize an evaluation, specify which tests will be executed. The /evaluation folder contains two .ini configuration files, one for the single buffer and one for the multi buffer. In them, list the commands for every configuration you want to run, following the Usage instructions. Then run the Python evaluation script in the main folder that corresponds to the chosen buffer structure. The output files are saved in the evaluation/results folder, in a format that can be processed with the Jupyter notebook available in the evaluation folder.
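The measurement described above amounts to timing each framing step while keeping running counts of tuples and frames. This is a minimal sketch assuming a POSIX system (clock_gettime with CLOCK_MONOTONIC). The names sample_t, step_fn, and time_step are hypothetical, and the step function stands in for whichever SECRET method is being measured. It is not the project's instrumentation and implies nothing about the output format the notebook expects.

```c
#define _POSIX_C_SOURCE 199309L      /* expose clock_gettime on strict C builds */
#include <stddef.h>
#include <time.h>

/* One measurement: elapsed time plus the counters that relate cost to input size. */
typedef struct {
    double elapsed_ms;
    size_t tuples;                   /* events processed so far */
    size_t frames;                   /* frames created so far */
} sample_t;

/* Hypothetical signature of the operation being timed: returns nonzero
 * when processing the event created a new frame. */
typedef int (*step_fn)(void *state, long ts, double value);

/* Milliseconds between two monotonic timestamps. */
double elapsed_ms(const struct timespec *start, const struct timespec *end) {
    return (double)(end->tv_sec - start->tv_sec) * 1e3 +
           (double)(end->tv_nsec - start->tv_nsec) / 1e6;
}

/* Times a single call to step and updates the running counters. */
sample_t time_step(step_fn step, void *state, long ts, double value,
                   size_t *tuples, size_t *frames) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    int created = step(state, ts, value);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    (*tuples)++;
    if (created) (*frames)++;
    sample_t s = { elapsed_ms(&t0, &t1), *tuples, *frames };
    return s;
}
```

A monotonic clock is used because wall-clock time can jump during a run. For very short steps, timing a batch of events and dividing by the batch size keeps the clock overhead from dominating the measurement.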
The resources folder contains the dataset used to perform the evaluation. The script ooo_generator/file_generator.py can also create an input CSV file containing out-of-order events. Use the config.json file to configure the new input stream, which is derived from an existing file by delaying some of its events. Make sure the output_dir field includes the name of the new file (with the .csv extension).
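Before running with the OUT OF ORDER policy, it can help to check how many events in a generated file actually arrive late. This is a minimal sketch with several assumptions:

- Each line starts with an integer ts.
- Each line fits in a buffer sized by a placeholder MAX_CHARS.
- An event counts as "delayed" when its timestamp is smaller than the largest one read before it.

count_delayed is a hypothetical helper, not part of the repository.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CHARS 256                /* placeholder; mirrors the MAX CHARS define */

/* Counts events whose timestamp is smaller than the largest timestamp read
 * before them. Lines not starting with an integer (e.g. a header) are skipped.
 * Returns -1 if a line does not fit in the MAX_CHARS buffer. */
long count_delayed(FILE *in) {
    char line[MAX_CHARS];
    long max_ts = 0, delayed = 0;
    int seen = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        size_t len = strlen(line);
        if (len > 0 && line[len - 1] != '\n' && getc(in) != EOF)
            return -1;               /* more characters remain on this line */

        char *end;
        long ts = strtol(line, &end, 10);
        if (end == line)
            continue;                /* no leading integer timestamp */

        if (seen && ts < max_ts)
            delayed++;               /* arrives after a later event */
        else
            max_ts = ts;
        seen = 1;
    }
    return delayed;
}
```

Comparing the result against the number of delays configured in config.json gives a quick sanity check on a newly generated file.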
A technical report of this project is available in the docs folder.
[0] Grossniklaus, Michael & Maier, David & Miller, James & Moorthy, Sharmadha & Tufte, Kristin. (2016). Frames: Data-driven Windows. 13-24. 10.1145/2933267.2933304.
[1] Botan, Irina & Derakhshan, Roozbeh & Dindar, Nihal & Haas, Laura & Miller, Renée & Tatbul, Nesime. (2010). SECRET: A Model for Analysis of the Execution Semantics of Stream Processing Systems. PVLDB. 3. 232-243.