# Storage class choices

Author: Sławomir Górawski

This notebook contains code supplementing my Master's thesis, "Exploring cloud application architectures: how architectural choices impact the scale-cost dynamics". It is used to calculate the costs of cloud architectures, depending on various parameters that can be customized.

This notebook corresponds to case study 4.2, "Storage class choices". For explanations of how the calculations work, please refer to the thesis.

---

How to run (in Google Colab):

1. Click "Connect" in the top-right corner. (You may be asked to log in to your Google account; this is expected, and the service should be free.)
2. Select "Runtime" > "Run all". If that doesn't work, run the cells one by one, top to bottom, using the ▶ button.
3. At the bottom, there should be inputs for the parameters. Adjust them to your liking and click "Run Interact". This should give you the results as a table and a chart. You can change the parameters and click the same button to re-run with the new values.



In [None]:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import pandas as pd
import ipywidgets as widgets
from IPython.display import display, Markdown

In [None]:
# Taken from https://cloud.google.com/storage/pricing
# Data for region Warsaw (europe-central2) as of October 2024.
# All prices in USD per GB, monthly.
STANDARD_STORAGE_PRICE = 0.023
COLDLINE_STORAGE_PRICE = 0.006
ARCHIVAL_STORAGE_PRICE = 0.0025

In [None]:
def calculate(total_data_tb: str, coldline_ratio: float, archival_ratio: float, log_scale: bool):
    # Parse the comma-separated sizes, skipping empty or whitespace-only entries
    index = [int(v.strip()) for v in total_data_tb.split(',') if v.strip()]

    # The archival share is carved out of the coldline share, so it cannot exceed it
    if archival_ratio > coldline_ratio:
        raise ValueError('archival_ratio must not exceed coldline_ratio')

    column_descs = {
        'opt1': 'Standard storage only cost [$/mo]',
        'opt2': 'Standard and coldline storage cost [$/mo]',
        'opt3': 'Standard, coldline and archival storage cost [$/mo]',
    }

    # Calculate the results and put them into a DataFrame

    df = pd.DataFrame(columns=list(column_descs.keys()), index=index)

    for data_tb in index:
        total_data_gb = data_tb * 1000
        opt1_total_cost = total_data_gb * STANDARD_STORAGE_PRICE

        opt2_standard_cost = total_data_gb * (1 - coldline_ratio) * STANDARD_STORAGE_PRICE
        opt2_coldline_cost = total_data_gb * coldline_ratio * COLDLINE_STORAGE_PRICE
        opt2_total_cost = opt2_standard_cost + opt2_coldline_cost

        opt3_standard_cost = total_data_gb * (1 - coldline_ratio) * STANDARD_STORAGE_PRICE
        opt3_coldline_cost = total_data_gb * (coldline_ratio - archival_ratio) * COLDLINE_STORAGE_PRICE
        opt3_archival_cost = total_data_gb * archival_ratio * ARCHIVAL_STORAGE_PRICE
        opt3_total_cost = opt3_standard_cost + opt3_coldline_cost + opt3_archival_cost

        df.loc[data_tb] = [opt1_total_cost, opt2_total_cost, opt3_total_cost]

    display(df.rename_axis('Total data [TB]').rename(columns=column_descs))

    # Plot the results

    # Define the width of the bars
    bar_width = 0.2

    # Set the positions of the bars on the x-axis
    index_positions = np.arange(len(df))

    # Create the figure and axes
    plt.figure(figsize=(8,6))

    # Plot one bar per option for each data size
    plt.bar(index_positions, df['opt1'], bar_width, label=column_descs['opt1'], color='lightgray', edgecolor='black', hatch='/')
    plt.bar(index_positions + bar_width, df['opt2'], bar_width, label=column_descs['opt2'], color='gray', edgecolor='black', hatch='.')
    plt.bar(index_positions + 2 * bar_width, df['opt3'], bar_width, label=column_descs['opt3'], color='darkgray', edgecolor='black', hatch='x')

    # Add labels and title
    plt.xlabel('Total data [TB]')
    plt.ylabel('Total costs [$/mo]' + (' (log scale)' if log_scale else ''))
    plt.title('Storage classes: monthly costs comparison')

    # Add tick marks for the index
    plt.xticks(index_positions + bar_width, df.index)

    if log_scale:
        # Set the y-axis to logarithmic scale
        plt.yscale('log')

    # Use plain decimal format for the y-axis labels
    ax = plt.gca()  # Get current axis
    ax.yaxis.set_major_formatter(ticker.ScalarFormatter())
    ax.yaxis.get_major_formatter().set_scientific(False)
    ax.ticklabel_format(axis='y', style='plain')  # Ensure plain decimal format

    # Add legend
    plt.legend()

    # Display the chart
    plt.show()

total_data_tb_widget = widgets.Text(value='10,100,1000', description='Data [TB]', placeholder='Add values, comma separated')
coldline_data_ratio_widget = widgets.BoundedFloatText(value=0.9, min=0, max=1, description='Coldline ratio')
archival_data_ratio_widget = widgets.BoundedFloatText(value=0.5, min=0, max=1, description='Archival ratio')
chart_log_scale_widget = widgets.Checkbox(value=True, description='Log scale (for the chart)')

display(Markdown('''
## Inputs

Adjust the values below and click "Run Interact" to run (or re-run) the calculation.

Parameters:

* Data [TB]: Values to run the calculation for, as a comma-separated list (e.g. `10,100,1000`).
* Coldline ratio: The amount of data that can be put in coldline storage, as a fraction, so e.g. `0.9` for 90%.
* Archival ratio: The amount of data that can be put in archival storage, as a fraction, so e.g. `0.5` for 50%.

Note: the archival ratio is subtracted from the coldline ratio,
so for values of coldline 0.9 and archival 0.5, the proportions of data will be:
0.1 in standard storage, 0.4 in coldline, 0.5 in archival.

Warning: The inputs may be locale-dependent, so you can try with a comma if a dot doesn't seem to work (`0,9` instead of `0.9`).
'''))

widgets.interact_manual(
    calculate,
    total_data_tb=total_data_tb_widget,
    coldline_ratio=coldline_data_ratio_widget,
    archival_ratio=archival_data_ratio_widget,
    log_scale=chart_log_scale_widget,
);
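
As a worked check of the note above (the archival ratio is subtracted from the coldline ratio), the following is a minimal standalone sketch of the same arithmetic, without the widgets or the chart. The function names `split_fractions` and `monthly_cost` are hypothetical and do not appear in the notebook; the prices are copied from the constants defined earlier.

```python
# Prices copied from the notebook's constants (USD per GB, monthly).
STANDARD_STORAGE_PRICE = 0.023
COLDLINE_STORAGE_PRICE = 0.006
ARCHIVAL_STORAGE_PRICE = 0.0025


def split_fractions(coldline_ratio, archival_ratio):
    """Return the (standard, coldline, archival) fractions of the total data.

    Hypothetical helper: the archival share is carved out of the coldline
    share, so it must not be larger than the coldline ratio.
    """
    if not 0 <= archival_ratio <= coldline_ratio <= 1:
        raise ValueError('expected 0 <= archival_ratio <= coldline_ratio <= 1')
    return 1 - coldline_ratio, coldline_ratio - archival_ratio, archival_ratio


def monthly_cost(total_data_tb, coldline_ratio, archival_ratio):
    """Monthly cost in USD for data split across the three storage classes."""
    standard, coldline, archival = split_fractions(coldline_ratio, archival_ratio)
    total_data_gb = total_data_tb * 1000  # same TB-to-GB conversion as the notebook
    price_per_gb = (
        standard * STANDARD_STORAGE_PRICE
        + coldline * COLDLINE_STORAGE_PRICE
        + archival * ARCHIVAL_STORAGE_PRICE
    )
    return total_data_gb * price_per_gb


# 10 TB with the default ratios: 0.1 standard, 0.4 coldline, 0.5 archival
print(f'{monthly_cost(10, 0.9, 0.5):.2f}')  # prints 59.50
# 10 TB kept entirely in standard storage
print(f'{monthly_cost(10, 0, 0):.2f}')      # prints 230.00
```

For the default inputs this gives 10,000 GB × (0.1 × 0.023 + 0.4 × 0.006 + 0.5 × 0.0025) = $59.50 per month, compared with $230.00 for standard storage only, which should match the first row of the table produced by `calculate`.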

