# Data Transfer Costs
Author: Sławomir Górawski

This notebook contains code supplementing my Master's thesis, "Exploring cloud application architectures: how architectural choices impact the scale-cost dynamics". It calculates the costs of cloud architectures as a function of several customizable parameters.

This notebook corresponds to case study 4.3, "Data Transfer Costs". For explanations of how the calculations work, please refer to the thesis.

---

How to run (in Google Colab):

1. Click "Connect" > "Connect to a hosted runtime" in the top-right corner. (You may be asked to log in to your Google account; this is expected, and the service should be free.)
2. Select "Runtime" > "Run all". If that doesn't work, run each cell in turn, top to bottom, using the ▶ button.
3. At the bottom, there should be inputs for the parameters. Adjust them as needed and click "Run Interact". This produces the results as a table and a chart. To re-run with different parameters, change them and click the same button again.

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import pandas as pd
import ipywidgets as widgets
from IPython.display import display, Markdown

# Taken from https://cloud.google.com/vpc/network-pricing
# All prices in USD per GB transferred (data volumes are per month)
# Europe or US
DATA_TRANSFER_COST_WITHIN_REGION = 0.02
# Averaged over different region combinations, e.g. EU - US is cheaper, EU - South America is more expensive
DATA_TRANSFER_COST_BETWEEN_REGIONS = 0.1
# TODO: Maybe parametrize the regions involved?
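To make the two rates concrete, the snippet below works through the arithmetic for 1 TB of transfer. It assumes, as the `calculate` function further down does, that 1 TB = 1000 GB (decimal units). The helper `transfer_cost_usd` and the constant `GB_PER_TB` are hypothetical names used only for illustration; they are not part of the notebook.

```python
# Rates copied from the cell above (USD per GB transferred)
DATA_TRANSFER_COST_WITHIN_REGION = 0.02
DATA_TRANSFER_COST_BETWEEN_REGIONS = 0.1

# Hypothetical constant: decimal terabytes, matching `data_tb * 1000` in `calculate`
GB_PER_TB = 1000


def transfer_cost_usd(data_tb: float, rate_per_gb: float) -> float:
    """Hypothetical helper: cost of moving `data_tb` terabytes at a per-GB rate."""
    return data_tb * GB_PER_TB * rate_per_gb


within = transfer_cost_usd(1, DATA_TRANSFER_COST_WITHIN_REGION)
between = transfer_cost_usd(1, DATA_TRANSFER_COST_BETWEEN_REGIONS)
print(f"1 TB within a region: ${within:.2f}")   # $20.00
print(f"1 TB between regions: ${between:.2f}")  # $100.00
print(f"Ratio: {between / within:.1f}x")        # 5.0x
```

With these rates, every terabyte that has to cross a region boundary costs five times as much as one that stays inside a region.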

def calculate_duplicated_cost(data_gb: int, num_regions: int, single_service_cost: int):
    # Duplicated: one service instance per region, so data is only transferred within a region
    total_services_cost = num_regions * single_service_cost
    data_transfer_costs = data_gb * DATA_TRANSFER_COST_WITHIN_REGION
    return total_services_cost + data_transfer_costs


def calculate_centralized_cost(data_gb: int, single_service_cost: int):
    # Centralized: a single service instance, so all data is transferred between regions
    data_transfer_costs = data_gb * DATA_TRANSFER_COST_BETWEEN_REGIONS
    return single_service_cost + data_transfer_costs
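Setting the two cost functions equal gives the monthly data volume at which both architectures cost the same: `num_regions * S + d * c_within = S + d * c_between`, so `d = (num_regions - 1) * S / (c_between - c_within)`. The sketch below evaluates this. `break_even_data_gb` is a hypothetical helper that only rearranges the formulas in `calculate_duplicated_cost` and `calculate_centralized_cost`, and it assumes the between-region rate is higher than the within-region rate, as with the constants above. Below the break-even volume the centralized architecture is cheaper; above it, duplication pays off.

```python
DATA_TRANSFER_COST_WITHIN_REGION = 0.02   # USD/GB, as in the notebook
DATA_TRANSFER_COST_BETWEEN_REGIONS = 0.1  # USD/GB, as in the notebook


def break_even_data_gb(num_regions: int, single_service_cost: float) -> float:
    """Hypothetical helper: data volume (GB/month) where duplicated == centralized cost.

    Derived from
        num_regions * S + d * c_within == S + d * c_between
    =>  d == (num_regions - 1) * S / (c_between - c_within)
    Assumes c_between > c_within.
    """
    rate_gap = DATA_TRANSFER_COST_BETWEEN_REGIONS - DATA_TRANSFER_COST_WITHIN_REGION
    if rate_gap <= 0:
        raise ValueError("between-region rate must exceed within-region rate")
    return (num_regions - 1) * single_service_cost / rate_gap


# Default widget values: 3 regions, $10,000/month per service
d = break_even_data_gb(num_regions=3, single_service_cost=10_000)
print(f"Break-even at about {d / 1000:,.0f} TB/month")  # about 250 TB/month
```

With a single region the break-even volume is zero: both architectures then run one service, and the duplicated variant is never more expensive.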

def calculate(data_tb_csv: str, num_regions: int, single_service_cost: int, log_scale: bool):
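    # Parse the comma-separated list, skipping empty entries (e.g. from a trailing comma or "1, ,10")
    index = [int(v.strip()) for v in data_tb_csv.split(',') if v.strip()]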

    column_descs = {
        'duplicated': 'Total duplicated cost [$/mo]',
        'centralized': 'Total centralized cost [$/mo]',
    }

    # Calculate the results and put them into a DataFrame

    df = pd.DataFrame(columns=list(column_descs.keys()), index=index, dtype=float)

    for data_tb in index:
        data_gb = data_tb * 1000
        duplicated_cost = calculate_duplicated_cost(data_gb, num_regions, single_service_cost)
        centralized_cost = calculate_centralized_cost(data_gb, single_service_cost)

        df.loc[data_tb] = [duplicated_cost, centralized_cost]

    display(df.rename_axis('Data [TB]').rename(columns=column_descs))

    # Plot the results

    # Define the width of the bars
    bar_width = 0.2

    # Set the positions of the bars on the x-axis
    index_positions = np.arange(len(df))

    # Create the figure and axes
    plt.figure(figsize=(8,6))

    # Plot the bars for both columns
    plt.bar(index_positions, df['duplicated'], bar_width, label=column_descs['duplicated'], color='lightgray', edgecolor='black', hatch='/')
    plt.bar(index_positions + bar_width, df['centralized'], bar_width, label=column_descs['centralized'], color='gray', edgecolor='black', hatch='.')

    # Add labels and title
    plt.xlabel('Data [TB]')
    plt.ylabel('Total costs [$/mo]' + (' (log scale)' if log_scale else ''))
    plt.title('Duplicated vs centralized: monthly costs comparison')

    # Add tick marks for the index
    plt.xticks(index_positions + bar_width / 2, df.index)

    if log_scale:
        # Set the y-axis to logarithmic scale
        plt.yscale('log')

    # Use plain decimal format for the y-axis labels
    ax = plt.gca()  # Get current axis
    ax.yaxis.set_major_formatter(ticker.ScalarFormatter())
    ax.yaxis.get_major_formatter().set_scientific(False)
    ax.ticklabel_format(axis='y', style='plain')  # Ensure plain decimal format

    # Add legend
    plt.legend()

    # Display the chart
    plt.show()


data_tb_csv_widget = widgets.Text(value='1,10,100,1000', description='Data [TB]', placeholder='Add values, comma separated')
num_regions_widget = widgets.BoundedIntText(value=3, min=1, max=20, description='No. regions')
single_service_cost_widget = widgets.BoundedIntText(value=10_000, min=0, max=100_000, description='Service cost')
chart_log_scale_widget = widgets.Checkbox(value=True, description='Log scale (for the chart)')

display(Markdown('''
## Inputs

Adjust the values below and click "Run Interact" to run (or re-run) the calculation.

Parameters:

* Data [TB]: Amounts of data transferred per month to run the calculation for, in TB, as a comma-separated list of whole numbers (e.g. `1,10,100,1000`).
* No. regions: Number of regions involved (the duplicated architecture runs one service instance in each).
* Service cost: Monthly operation cost of a _single_ service, in USD.
* Log scale (for the chart): Whether to plot the y-axis on a logarithmic scale.

Note: In the Data field the comma is the list separator, so decimal values (`0.9` or `0,9`) are not accepted; enter whole terabytes only.
'''))

widgets.interact_manual(
    calculate,
    data_tb_csv=data_tb_csv_widget,
    num_regions=num_regions_widget,
    single_service_cost=single_service_cost_widget,
    log_scale=chart_log_scale_widget,
);
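For scripting or batch runs, the same comparison can be produced without the widgets or the chart. The sketch below is a self-contained variant under that assumption: `cost_table` is a hypothetical function (not part of the notebook) that re-implements the two cost formulas above and returns a pandas DataFrame indexed by data volume in TB, with an extra `cheaper` column naming the lower-cost architecture for each volume.

```python
import pandas as pd

DATA_TRANSFER_COST_WITHIN_REGION = 0.02   # USD/GB, as in the notebook
DATA_TRANSFER_COST_BETWEEN_REGIONS = 0.1  # USD/GB, as in the notebook


def cost_table(data_tb_values: list[float], num_regions: int,
               single_service_cost: float) -> pd.DataFrame:
    """Hypothetical helper: monthly costs of both architectures per data volume."""
    rows = []
    for data_tb in data_tb_values:
        data_gb = data_tb * 1000  # decimal TB, as in `calculate`
        # Duplicated: one service per region, intra-region transfer only
        duplicated = num_regions * single_service_cost + data_gb * DATA_TRANSFER_COST_WITHIN_REGION
        # Centralized: one service, all data crosses region boundaries
        centralized = single_service_cost + data_gb * DATA_TRANSFER_COST_BETWEEN_REGIONS
        rows.append({'duplicated': duplicated, 'centralized': centralized})
    df = pd.DataFrame(rows, index=pd.Index(data_tb_values, name='Data [TB]'))
    # Name of the column with the lower cost in each row
    df['cheaper'] = df[['duplicated', 'centralized']].idxmin(axis=1)
    return df


print(cost_table([1, 10, 100, 1000], num_regions=3, single_service_cost=10_000))
```

With the default parameters, the centralized architecture is cheaper at 1, 10 and 100 TB, while at 1000 TB the duplicated one costs $50,000/mo against $110,000/mo for the centralized one, so the crossover lies somewhere between 100 and 1000 TB.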