A cloud simulation framework.
- Simulating cloud clients, providers, and resource usage.
- Processing Azure's Public Dataset to produce clients' data.
- Managing multiple datasets with different characteristics.
- Managing multiple data entries for each dataset.
- A concurrent job manager that runs a simulation on all the data entries in parallel.
- A simulation result manager.
- Various statistical methods.
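The job manager's role, running one simulation per data entry in parallel and collecting the results, can be illustrated with a minimal sketch built on Python's standard `concurrent.futures` module. This is not the framework's own job manager: `simulate_entry`, `run_all`, and the assumption that a data entry is a list of CPU-usage samples are all hypothetical placeholders.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, Dict, List, Type

# Hypothetical shape of a data entry: a list of CPU-usage samples.
Entry = List[float]
Result = Dict[str, float]


def simulate_entry(entry: Entry) -> Result:
    """Hypothetical stand-in for one simulation run over a single data entry."""
    return {"mean": sum(entry) / len(entry), "peak": max(entry)}


def run_all(
    entries: Dict[str, Entry],
    simulate: Callable[[Entry], Result] = simulate_entry,
    max_workers: int = 4,
    executor_cls: Type[Executor] = ThreadPoolExecutor,
) -> Dict[str, Result]:
    """Run `simulate` on every entry concurrently and return results keyed by entry name.

    For CPU-heavy simulations, ProcessPoolExecutor can be passed as executor_cls;
    it then needs a picklable, module-level `simulate` and an `if __name__ == "__main__":`
    guard around the caller.
    """
    keys = list(entries)
    with executor_cls(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so they line up with `keys`.
        results = pool.map(simulate, (entries[k] for k in keys))
        return dict(zip(keys, results))


if __name__ == "__main__":
    print(run_all({"client-a": [10.0, 20.0, 30.0], "client-b": [5.0, 5.0]}))
```

Threads are the default here only because they avoid pickling constraints; swapping the executor class is the one change needed to move to processes.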
Install the package in develop mode:
python setup.py develop --user
Then, download and set up Azure's Public Dataset via the provided scripts.
First, set up the framework configuration file:
cd azure-tools
python generate_azure_data.py --init
This will create a configuration file in ~/.cloudsim.
Edit this file manually to choose where to store the dataset and the framework data.
Once the configuration is set, download and preprocess the dataset. This may take a while, depending on your internet connection and CPU speed.
cd azure-tools
python generate_azure_data.py --download --convert --choose-random-ids --cpu-data
To break it down, this command does the following:
- Download the dataset (approximately 85GB).
- Convert the main data file (vmtable.csv.gz) to a faster-loading format (HDF).
- Choose a random set of client IDs to work with. The default is 24,576 clients that have at least a day's worth of data.
- For these clients, fetch the CPU data from the dataset and create a separate CPU data file for each client.
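The last two steps (choosing clients with at least a day of data, then splitting their CPU readings into per-client files) can be sketched in plain Python. This is not the code in generate_azure_data.py. It assumes each VM record has already been reduced to hypothetical `(created, deleted)` timestamps in seconds, and each CPU reading to a `(timestamp, vm_id, avg_cpu)` tuple; the function names, the `seed` parameter, and the `client_<n>.csv` file layout are illustrative only. The HDF conversion step is not covered.

```python
import csv
import os
import random
import tempfile
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

SECONDS_PER_DAY = 24 * 60 * 60


def choose_random_ids(
    lifetimes: Dict[str, Tuple[int, int]], count: int, seed: int = 0
) -> List[str]:
    """Pick `count` IDs whose assumed (created, deleted) span covers at least one day."""
    eligible = sorted(
        vm_id
        for vm_id, (created, deleted) in lifetimes.items()
        if deleted - created >= SECONDS_PER_DAY
    )
    if count > len(eligible):
        raise ValueError(f"only {len(eligible)} eligible clients, {count} requested")
    # A seeded generator keeps the selection reproducible across runs.
    return random.Random(seed).sample(eligible, count)


def write_cpu_files(
    readings: Iterable[Tuple[int, str, float]], chosen: List[str], out_dir: str
) -> Dict[str, str]:
    """Write one CSV of (timestamp, avg_cpu) rows per chosen client; return id -> path."""
    wanted = set(chosen)
    per_client: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
    for timestamp, vm_id, avg_cpu in readings:
        if vm_id in wanted:
            per_client[vm_id].append((timestamp, avg_cpu))
    paths = {}
    for index, vm_id in enumerate(chosen):
        # Index-based names avoid characters in raw IDs that are unsafe in file names.
        path = os.path.join(out_dir, f"client_{index}.csv")
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["timestamp", "avg_cpu"])
            writer.writerows(sorted(per_client[vm_id]))
        paths[vm_id] = path
    return paths


if __name__ == "__main__":
    lifetimes = {"a": (0, 90000), "b": (0, 3600), "c": (100, 100 + SECONDS_PER_DAY)}
    ids = choose_random_ids(lifetimes, 2)
    with tempfile.TemporaryDirectory() as out_dir:
        print(write_cpu_files([(300, "a", 12.5), (0, "a", 10.0)], ids, out_dir))
```

Client "b" is skipped because its one-hour lifetime is shorter than a day; sorting each client's rows puts the CPU samples in timestamp order.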