**Table of contents**<a id='toc0_'></a>    
- [Importing Libraries](#toc1_)    
- [Reading a Result](#toc2_)    
- [Reading Example of Audible](#toc3_)    
- [Reading Example of CLT](#toc4_)    
- [Reading Example of oversubscription-oracle](#toc5_)    
- [Reporting Average Utilization and Violation Rate](#toc6_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

# <a id='toc1_'></a>[Importing Libraries](#toc0_)

In [1]:
import pandas as pd
import numpy as np

# <a id='toc2_'></a>[Reading a Result](#toc0_)

The `result_df` dataframe stores server **usage** and potential **carry_over** (resulting from resource shortage) throughout the steady-state phase. Each row of the dataframe represents one server. The steady-state window is set to 2016 time points, meaning that utilization and carry-over are recorded at 2016 distinct time points towards the end of the simulation. The **deployed_times** column holds a list of triples, each consisting of the simulation time point at which a VM was deployed, the time point at which it was terminated, and the VM ID, for every VM that influences steady-state usage. In other words, every VM that was active during all or part of the steady-state period appears in this list for its server.


Additional columns in the dataframe are algorithm-dependent. To explain the columns specific to each algorithm, the sections below walk through the results of running each example in `run_simulator.ipynb`.
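
To make the layout described above concrete, here is a minimal sketch that builds a toy dataframe with the same three standard columns and flattens **deployed_times** into one row per VM. The values, the four-point steady-state window, and the names `toy_df` and `vm_df` are made up for illustration; they do not come from a real simulation run.

```python
import numpy as np
import pandas as pd

# Toy stand-in for result_df: 2 servers, 4 steady-state points (real runs use 2016).
toy_df = pd.DataFrame({
    "usage": [np.array([10.0, 12.0, 11.0, 9.0]), np.array([20.0, 22.0, 21.0, 23.0])],
    "carry_over": [np.zeros(4), np.zeros(4)],
    # Each entry: [deployment time, termination time, VM ID] (made-up values)
    "deployed_times": [[[5, 100, 111], [40, 100, 222]], [[7, 60, 333]]],
})

# One row per server, one usage sample per steady-state point.
steady_state_time = len(toy_df.loc[0, "usage"])

# Flatten deployed_times into one row per (server, VM) pair.
vm_rows = [
    {"server": server, "deployed": start, "terminated": end, "vm_id": vm_id}
    for server, vms in toy_df["deployed_times"].items()
    for start, end, vm_id in vms
]
vm_df = pd.DataFrame(vm_rows)

# Number of VMs that contributed to steady-state usage on each server.
vms_per_server = vm_df.groupby("server")["vm_id"].count()
print(vm_df)
print(vms_per_server)
```

The same flattening should apply to a real `result_df`, provided each **deployed_times** entry unpacks into three values as described above.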

In [2]:
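# Read a stored simulation result and its parameter dictionary
def read_result(location):
    """Load `<location>.feather` and `<location>_params.npy` and print a short summary."""
    # The parameter dictionary is stored inside a NumPy object array; unwrap it
    simulation_param_dict = np.load(f'{location}_params.npy', allow_pickle=True).reshape(1, )[0]
    # One row per server, with the columns described above
    result_df = pd.read_feather(f'{location}.feather')

    print('Result for the following simulation setting has been retrieved:\n', simulation_param_dict['params'])
    print(f"In this simulation {simulation_param_dict['len_dropped_vmids']} VM(s) had been rejected for placement.")
    return result_df, simulation_param_dict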
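
The `.reshape(1, )[0]` step in `read_result` may look odd. The sketch below shows why it is needed, assuming the parameter file was written by passing a dictionary to `np.save` (the writing side is not shown in this notebook, so that is an assumption). The file name `toy_params.npy` and the dictionary contents are hypothetical.

```python
import os
import tempfile

import numpy as np

# Hypothetical parameter dictionary shaped like the one read_result expects.
saved = {
    "params": {"server_capacity": 48, "steady_state_time": 2016},
    "len_dropped_vmids": 0,
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_params.npy")
    # np.save wraps the dict in a 0-dimensional object array.
    np.save(path, saved, allow_pickle=True)
    raw = np.load(path, allow_pickle=True)

# Two equivalent ways to get the dict back out of the 0-d array.
via_reshape = raw.reshape(1, )[0]  # same unwrapping as in read_result
via_item = raw.item()              # shorter alternative

print(raw.shape, type(via_reshape).__name__)
```

Either form returns the original dictionary; `.item()` is simply a more compact spelling of the same unwrapping.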

# <a id='toc3_'></a>[Reading Example of Audible](#toc0_)

In [3]:
location = 'results/audible/small_1_audible_2021_burstable_2_86400_0.95_est_worst-fit_usage_10_48_0.01_0_True_2016'
result_df, simulation_param_dict = read_result(location)
result_df.head(2)

Result for the following simulation setting has been retrieved:
 {'rand_seed': 1, 'algorithm_name': 'audible', 'ds_name': '2021_burstable', 'num_arrival_vms_per_time_idx': 2, 'time_bound': 86400, 'first_model': 0.95, 'prediction_type': 'est', 'lb_name': 'worst-fit_usage', 'number_of_servers': 10, 'server_capacity': 48, 'acceptable_violation': 0.01, 'retreat_num_samples': 0, 'drop': True, 'steady_state_time': 2016}
In this simulation 0 VM(s) had been rejected for placement.


Unnamed: 0,usage,carry_over,deployed_times
0,"[16.949999999999985, 15.539999999999981, 15.29...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[[1571, 86400, 111221], [8714, 86400, 54333], ..."
1,"[16.859999999999985, 15.099999999999982, 13.67...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[[2637, 86400, 529971], [14489, 86400, 922238]..."


# <a id='toc4_'></a>[Reading Example of CLT](#toc0_)

Besides the standard columns present in result dataframes, the dataframe specific to the CLT algorithm features additional columns: **variance** and **mean**. These columns record the variance and mean values of the Gaussian distribution that models the aggregated server usage across each time point in the steady-state period for each of the servers.
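
As an illustration of how these two columns can be read, the sketch below computes the probability that a Gaussian with a given mean and variance exceeds the server capacity, and compares it with the acceptable violation level. This is one plausible use of the columns, not necessarily the check the CLT algorithm itself performs. The helper `exceed_probability` is hypothetical; the numbers are the first mean and variance entries of server 0 in the output of this section and the capacity of 48 from its parameters.

```python
import math


def exceed_probability(mean, variance, capacity):
    """Return P(X > capacity) for X ~ Normal(mean, variance)."""
    if variance <= 0:
        # Degenerate distribution: all mass sits at the mean.
        return 1.0 if mean > capacity else 0.0
    z = (capacity - mean) / math.sqrt(variance)
    # Upper tail of the standard normal, written with the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2))


# First steady-state point of server 0 in the CLT example.
p = exceed_probability(mean=17.50222474747475, variance=2.517226809413581, capacity=48)
within_budget = p <= 0.01  # 0.01 is the acceptable_violation parameter
print(p, within_budget)
```

With a mean far below capacity and a small variance, the tail probability is negligible, which is consistent with the all-zero **carry_over** entries shown for this server.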

In [4]:
location = 'results/CLT/small_1_CLT_2021_burstable_2_86400_0.95_est_worst-fit_usage_10_48_0.01_0_True_2016'
result_df, simulation_param_dict = read_result(location)
result_df.head(2)

Result for the following simulation setting has been retrieved:
 {'rand_seed': 1, 'algorithm_name': 'CLT', 'ds_name': '2021_burstable', 'num_arrival_vms_per_time_idx': 2, 'time_bound': 86400, 'first_model': 0.95, 'prediction_type': 'est', 'lb_name': 'worst-fit_usage', 'number_of_servers': 10, 'server_capacity': 48, 'acceptable_violation': 0.01, 'retreat_num_samples': 0, 'drop': True, 'steady_state_time': 2016}
In this simulation 0 VM(s) had been rejected for placement.


Unnamed: 0,usage,variance,mean,carry_over,deployed_times
0,"[16.949999999999985, 15.539999999999981, 15.29...","[2.517226809413581, 2.517226809413581, 2.51722...","[17.50222474747475, 17.50222474747475, 17.5022...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[[1571, 86400, 111221], [8714, 86400, 54333], ..."
1,"[16.859999999999985, 15.099999999999982, 13.67...","[3.9740902683935326, 3.9740902683935326, 3.589...","[20.561474747474755, 20.561474747474755, 19.57...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[[2637, 86400, 529971], [14489, 86400, 922238]..."


# <a id='toc5_'></a>[Reading Example of oversubscription-oracle](#toc0_)

In addition to the regular columns in result dataframes, the dataframe for the oversubscription-oracle algorithm includes an extra column: **mean**. This column reflects the total allocated CPU, based on the 'first_model' algorithm parameter, at every point in the steady state for each server. For instance, if 'first_model' is set to '2X', the column would display the sum of 2X the baseline for colocated VMs at each simulation point.
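
The arithmetic behind this column can be sketched as follows for a single server at a single time point. The baseline values are made up, and `parse_multiplier` is a hypothetical helper that assumes 'first_model' is always a number followed by `X`; the simulator may parse the parameter differently.

```python
def parse_multiplier(first_model):
    """Turn a string such as '2X' or '0.5X' into a float multiplier (hypothetical helper)."""
    return float(first_model.rstrip("Xx"))


# Made-up baseline CPU of the VMs colocated on one server at one time point.
baselines = [2.0, 4.0, 1.5]

# Allocated CPU under '2X': twice the sum of the baselines.
allocated = parse_multiplier("2X") * sum(baselines)
print(allocated)
```

Repeating this sum at every steady-state point, as VMs arrive and leave, would give one list per server like those in the **mean** column.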

In [5]:
location = 'results/oversubscription-oracle/small_1_oversubscription-oracle_2021_burstable_2_86400_0.5X_oracle_worst-fit_usage_10_48_0.01_0_True_2016'
result_df, simulation_param_dict = read_result(location)
result_df.head(2)

Result for the following simulation setting has been retrieved:
 {'rand_seed': 1, 'algorithm_name': 'oversubscription-oracle', 'ds_name': '2021_burstable', 'num_arrival_vms_per_time_idx': 2, 'time_bound': 86400, 'first_model': '0.5X', 'prediction_type': 'oracle', 'lb_name': 'worst-fit_usage', 'number_of_servers': 10, 'server_capacity': 48, 'acceptable_violation': 0.01, 'retreat_num_samples': 0, 'drop': True, 'steady_state_time': 2016}
In this simulation 0 VM(s) had been rejected for placement.


Unnamed: 0,usage,mean,carry_over,deployed_times
0,"[16.949999999999985, 15.539999999999981, 15.29...","[27.47500000000001, 27.47500000000001, 27.4750...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[[1571, 86400, 111221], [8714, 86400, 54333], ..."
1,"[16.859999999999985, 15.099999999999982, 13.67...","[33.28500000000002, 33.28500000000002, 32.8350...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[[2637, 86400, 529971], [14489, 86400, 922238]..."


The dataframe corresponding to the rc algorithm does not contain any extra columns; therefore, we have chosen not to include it here.

# <a id='toc6_'></a>[Reporting Average Utilization and Violation Rate](#toc0_)

The following function reports the utilization and violation rate for each experiment result (the metrics are defined in Section 5.3 of the paper).

- Server utilization: The average CPU utilization in the steady state for each server.
- Server capacity violation rate: The fraction of all steady-state points with a server capacity violation (BVM CPU demand exceeded server capacity) for each server.
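
As a small worked example of the two metrics, the sketch below evaluates them for one server on a made-up four-point usage trace. The trace is hypothetical; the comparison uses `>=`, mirroring `report_usage_violation`.

```python
import numpy as np

server_capacity = 48
# Made-up usage trace for one server over 4 steady-state points.
usage = np.array([24.0, 48.0, 12.0, 36.0])

# Server utilization: mean usage as a percentage of capacity.
utilization_pct = 100 * usage.mean() / server_capacity

# Violation rate: share of points where usage reaches capacity, as a percentage.
violation_rate_pct = 100 * np.mean(usage >= server_capacity)

print(utilization_pct, violation_rate_pct)
```

Here the mean usage is 30 out of 48 cores (62.5% utilization), and one of the four points reaches capacity (25% violation rate).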

In [6]:
def report_usage_violation(result_df, server_capacity, acceptable_violation, steady_state_time):
    # Mean steady-state usage across servers, as a percentage of server capacity
    avg_usage = np.mean(result_df['usage'].apply(np.mean))*100/server_capacity
    print('Average utilization (%) across all servers:', avg_usage)
    # Count servers whose fraction of violating steady-state points reaches the threshold
    num_servers_with_severe_violation = np.count_nonzero(result_df['usage'].apply(lambda u: 1 if np.sum(u>=server_capacity)/steady_state_time >= acceptable_violation else 0))
    # acceptable_violation is a fraction (0.01 means 1%), so scale it before printing it as a percentage
    print('Number of servers with violation more than {}% in the last week is {}'.format(100*acceptable_violation, num_servers_with_severe_violation))
    # Mean percentage of violating steady-state points across servers
    avg_violation_rate = np.mean(result_df['usage'].apply(lambda u: 100*np.sum(u>=server_capacity)/steady_state_time))
    print('Average violation rate is {}%'.format(avg_violation_rate))
    return avg_usage, num_servers_with_severe_violation, avg_violation_rate

In [7]:
server_capacity = int(simulation_param_dict["params"]['server_capacity'])
acceptable_violation = float(simulation_param_dict["params"]["acceptable_violation"])
steady_state_time = int(simulation_param_dict["params"]['steady_state_time'])
report_usage_violation(result_df, server_capacity, acceptable_violation, steady_state_time)

Average utilization (%) across all servers: 32.84840546461638
Number of servers with violation more than 1.0% in the last week is 0
Average violation rate is 0.0%


(32.84840546461638, 0, 0.0)