Idleness optimisation in bag-of-tasks processing through cloud functions
reference-data-faas
reference-data-simulation
various-tests
.gitignore
README
faasconsumer.py
faasexperiment.sh
faasmeasure.py
faasproducer.py
faasproxy.py
gettestdata.doc
gettestdata.ds1.urls
gettestdata.ds2.urls
gettestdata.sh
shrinktestdata.sh
strategies.py

README

This small experiment investigates whether the "idleness optimisation" problem in FaaS has a feasible and practical solution. In short: cloud functions are billed in 100ms intervals, so an execution that runs just slightly past an interval boundary is charged for a full additional interval, most of which is spent idle. The idea is that when a function has many units of data to process, letting it process another one (and another one, if needed...) moves the execution time close to the end of a billing interval, thus reducing the loss at the expense of reduced parallelisation.
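The billing effect described above can be illustrated with a minimal sketch. Only the 100ms interval comes from this README; the function names and the 130ms per-unit duration are hypothetical and are not taken from strategies.py.

```python
BILLING_INTERVAL_MS = 100  # billing granularity stated in this README


def billed_ms(duration_ms, interval_ms=BILLING_INTERVAL_MS):
    """Round an integer duration up to the next full billing interval."""
    return ((duration_ms + interval_ms - 1) // interval_ms) * interval_ms


def idle_loss_ms(duration_ms, interval_ms=BILLING_INTERVAL_MS):
    """Time that is paid for but not used by one invocation."""
    return billed_ms(duration_ms, interval_ms) - duration_ms


def total_loss_ms(unit_ms, units, batch_size):
    """Total loss when `units` equally long tasks are grouped into
    invocations of `batch_size` tasks each (the last one may be smaller)."""
    full, rest = divmod(units, batch_size)
    loss = full * idle_loss_ms(unit_ms * batch_size)
    if rest:
        loss += idle_loss_ms(unit_ms * rest)
    return loss


# Hypothetical workload: 73 data units taking 130ms each.
for batch_size in (1, 3, 10):
    print(batch_size, total_loss_ms(130, 73, batch_size))
```

Under these assumed numbers, larger batches shrink the unused billed time, but each batch occupies a single function instance for longer, which is the parallelisation trade-off mentioned above.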

The datasets 'ds1' and 'ds2' consist of some nice photos (28 and 73, respectively). They are retrieved by the script 'gettestdata.sh', which defaults to the larger dataset 'ds2'. As the files are quite big, the script 'shrinktestdata.sh' produces a smaller variant. See 'gettestdata.doc' for details.

Instructions:
- run ./gettestdata.sh; it fetches some sample data (nice pictures of Poland and Switzerland)
- run python3 strategies.py; it repeatedly blurs all images, which takes some time...
- statistics are written to strategies.csv, and some more are printed directly on the terminal
- you can compare strategies.csv against the existing reference results strategies.*.csv

Interpretation:
- the saving should be close to the theoretical maximum while the maxthread value should be as small as possible

When using the FaaS execution instead of the simulation, the local script 'faasproducer.py' uploads the chosen dataset into a queue in the cloud (AWS SQS), and the cloud function script 'faasconsumer.py' then runs as a Lambda function to fake-process the data. (Actual processing would require the installation of SciPy into the function bundle.) The FaaS execution is currently incomplete. The cloud function script 'faasproxy.py' adds monetary information. The local script 'faasmeasure.py' invokes the proxy and finally retrieves statistics.
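As an illustration of the upload step only, here is a hedged sketch of how data units could be queued in SQS with boto3. It is not the code of 'faasproducer.py'. The function names, the queue URL parameter, and the choice to send file names rather than image data are assumptions. The limit of 10 entries per SendMessageBatch request is an SQS constraint.

```python
def sqs_batches(names, batch_size=10):
    """Group file names into SendMessageBatch entry lists.

    SQS accepts at most 10 entries per batch request, and the Id of
    each entry must be unique within its batch.
    """
    entries = [{"Id": str(i), "MessageBody": name}
               for i, name in enumerate(names)]
    return [entries[i:i + batch_size]
            for i in range(0, len(entries), batch_size)]


def upload(queue_url, names):
    """Send all names to the queue (hypothetical helper, needs AWS credentials)."""
    import boto3  # imported here so that sqs_batches works without boto3

    sqs = boto3.client("sqs")
    for batch in sqs_batches(names):
        sqs.send_message_batch(QueueUrl=queue_url, Entries=batch)


# Hypothetical file names standing in for the 73 data units of ds2.
batches = sqs_batches(["img%d.jpg" % i for i in range(73)])
print(len(batches), len(batches[-1]))
```

Only `sqs_batches` runs without cloud access; `upload` is shown for context and would need a real queue URL.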

Reproducibility:
0. only once: ./gettestdata.sh ds2; ./shrinktestdata.sh _testdata.ds2
1. python3 faasproducer.py _testdata.ds2-small/
   73 data units; theoretic max savings boundary = 36.50s
   ... (wait)
2. python3 faasmeasure.py 1 False
3. # inspect data/faasmeasure.1.False.log
4. # repeat from step 1. and tune the parameters in step 2., e.g. 3 False