# Estimating Computational Resources for MPAS Simulations  

This notebook calculates the **computational resources required** for MPAS (Model for Prediction Across Scales) simulations based on user-defined parameters. It provides estimates for:  

- **Core-Hours per Time Step per Grid Point** – A metric to assess computational efficiency.  
- **Total Core-Hours for a Simulation** – The total computing cost based on simulation parameters.


## Calculating Core-Hours per Time Step per Grid Point

This section focuses on determining the computational efficiency of our numerical simulations. We will calculate the core-hours consumed per time step per grid point, a metric that provides valuable insight into the performance of our atmospheric or climate models.

The `calculate_core_hours_per_time_step_per_grid_point` function below is designed to analyze simulation results and derive this efficiency metric. It is particularly relevant for simulations that utilize the MPAS (Model for Prediction Across Scales) mesh.

The default inputs are similar to those of the DYAMOND3 run conducted by Bill Skamarock, which used a 3.75-km MPAS mesh with 127 vertical levels on 200 Derecho nodes, each running 128 MPI processes. You can use these values as they are or substitute values from another simulation. With the defaults, the result is approximately 2.56 × 10⁻⁹ core-hours per time step per grid point. For comparison, in 2018 Michael Duda reported approximately 3.95 × 10⁻⁹ core-hours per time step per grid point on the Cheyenne computer.

We also define a dictionary, `mesh_dict`, that stores mesh configurations for various resolutions and refinement patterns. This dictionary will be used to access grid parameters for our calculations.

In [1]:
import numpy as np

In [2]:
def calculate_core_hours_per_time_step_per_grid_point(total_core_hours_used, time_step_seconds, num_grid_columns, num_vertical_levels, simulation_days):
    """
    Calculates the core-hours consumed per time step per grid point from simulation results.

    This function determines the computational efficiency of a simulation by calculating
    the core-hours used per time step per grid point. It is particularly useful for
    analyzing the performance of atmospheric or climate models, such as those based on
    the MPAS mesh.

    This calculation is based on parameters similar to the DYAMOND3 run conducted by
    Bill Skamarock, which utilized a 3.75-km MPAS mesh with 127 vertical levels,
    run on 200 Derecho nodes with 128 MPI processes per node.

    Args:
        total_core_hours_used: The total core-hours consumed by the simulation.
        time_step_seconds: The model time step in seconds.
        num_grid_columns: The number of horizontal grid columns in the MPAS mesh.
        num_vertical_levels: The number of vertical levels per grid column (e.g., 127).
        simulation_days: The total length of the simulation in days.

    Returns:
        The core-hours consumed per time step per grid point.
    """

    total_seconds = simulation_days * 24 * 3600
    total_time_steps = total_seconds / time_step_seconds
    # Use the function arguments, not the notebook-level globals
    total_grid_points = num_grid_columns * num_vertical_levels
    core_hours_per_time_step_per_grid_point = total_core_hours_used / (total_time_steps * total_grid_points)
    return core_hours_per_time_step_per_grid_point

mesh_dict = {
    "3km": [3, 65_536_002],  
    "3.75km": [3.75, 41_943_042],  
    "7.5km": [7.5, 10_485_762], 
    "15km": [15, 2_621_442],  
    "30km": [30, 655_362],  
    "120km": [120, 40_962],  
    "15-3km-ell": [3, 8_060_930], 
    "15-3km-cir": [3, 6_488_066] 
}

# mesh_dict: Dictionary storing MPAS mesh configurations.
# Keys: Mesh resolution or refinement pattern (e.g., "3km", "15-3km-ell").
# Values: Lists containing [grid spacing (km), number of horizontal grid columns].
# Refinement meshes ("15-3km-ell", "15-3km-cir") use 3km resolution within a larger 15km mesh.

mesh = '3.75km'  # key into mesh_dict
total_core_hours_used_for_example = 236_000
time_step = 20  # seconds
grid_columns = mesh_dict[mesh][1]  # nCells in MPAS terminology
vertical_levels = 127                  # nVertLevels in MPAS terminology
simulation_length = 4  # days

core_hours_per_time_step_per_grid_point_calculated = calculate_core_hours_per_time_step_per_grid_point(
    total_core_hours_used_for_example, time_step, grid_columns, vertical_levels, simulation_length
)

print(f"Calculated core hours per time step per grid point : {core_hours_per_time_step_per_grid_point_calculated}")


Calculated core hours per time step per grid point : 2.5639208763925117e-09
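To put this number in context, the short sketch below compares it with the Cheyenne figure quoted earlier and derives the wall-clock time implied by the reference run. It assumes one core per MPI process and that all 200 × 128 cores were busy for the whole run; the variable names are illustrative and are not part of the original notebook.

```python
# Values quoted in the text above.
derecho_cost = 2.5639208763925117e-09  # core-hours per time step per grid point
cheyenne_cost = 3.95e-09               # figure reported for Cheyenne in 2018

# How much more expensive one grid-point update was on Cheyenne.
cost_ratio = cheyenne_cost / derecho_cost
print(f"Cheyenne / Derecho cost ratio: {cost_ratio:.2f}")

# Implied wall-clock time of the 4-day reference run.
# ASSUMPTION: one core per MPI process, all cores in use for the full run.
total_core_hours_used = 236_000
cores = 200 * 128
wall_clock_hours = total_core_hours_used / cores
print(f"Implied wall-clock time: {wall_clock_hours:.2f} hours on {cores} cores")
```

A ratio above 1 means the same grid-point update cost more core-hours on Cheyenne than in the Derecho run.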


## Calculating Total Core-Hours for a Numerical Simulation

This code cell calculates the total core-hours required for a numerical simulation based on provided parameters. It utilizes a function, `calculate_core_hours`, which takes into account the time step, grid dimensions, simulation duration, and a pre-calculated value representing the core-hours consumed per time step per grid point.

**Function: `calculate_core_hours`**

This function performs the following steps:

1.  **Calculates Total Simulation Time:** It converts the simulation length from days to seconds.
2.  **Calculates Total Time Steps:** It determines the total number of time steps by dividing the total simulation time by the model time step.
3.  **Calculates Total Grid Points:** It calculates the total number of grid points by multiplying the number of horizontal grid columns by the number of vertical levels.
4.  **Calculates Total Core-Hours:** It computes the total core-hours by multiplying the total time steps, total grid points, and the pre-calculated core-hours per time step per grid point.
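
The four steps above reduce to a single product. The symbols are introduced here for illustration only: $D$ is the simulation length in days, $\Delta t$ the time step in seconds, $N_\text{cells}$ the number of horizontal grid columns, $N_\text{lev}$ the number of vertical levels, and $C$ the core-hours per time step per grid point.

$$
H_\text{total} = \frac{86400\, D}{\Delta t} \; N_\text{cells} \; N_\text{lev} \; C
$$

Because $H_\text{total}$ is linear in $D$, a 356-day run on the same mesh with the same time step costs $356 / 4 = 89$ times the 4-day reference run, that is, about $89 \times 236{,}000 \approx 21.0$ million core-hours.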

**Example Usage:**

The code then demonstrates an example usage of the function. It sets the simulation parameters:

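* `time_step`: The model time step in seconds (a common rule of thumb is roughly 6 × `dx`, with `dx` in km; the example uses a slightly smaller 20 s for the 3.75-km mesh).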
* `dx`: The horizontal grid spacing in kilometers, retrieved from the `mesh_dict`.
* `grid_columns`: The number of horizontal grid columns, also retrieved from `mesh_dict`.
* `vertical_levels`: The number of vertical levels per grid column.
* `simulation_length`: The total simulation length in days.
* `core_hours_per_time_step_per_grid_point`: A pre-calculated value representing the computational efficiency.

The `calculate_core_hours` function is then called with these parameters, and the result is printed in a user-friendly format, including explanations of each parameter and the final core-hours calculation, rounded to two decimal places and expressed in millions of Derecho core-hours. The `numpy` library is used for the rounding.

In [3]:
import numpy as np  # Import numpy for rounding

def calculate_core_hours(time_step_seconds, num_grid_columns, num_vertical_levels, simulation_days, core_hours_per_time_step_per_grid_point):
    """
    Calculates the total core-hours required for a numerical simulation.

    This function computes the total core-hours needed based on simulation parameters,
    including the time step, grid dimensions, simulation duration, and a pre-calculated
    value for core-hours per time step per grid point.

    Args:
        time_step_seconds: The model time step in seconds.
        num_grid_columns: The number of horizontal grid columns.
        num_vertical_levels: The number of vertical levels per grid column.
        simulation_days: The total length of the simulation in days.
        core_hours_per_time_step_per_grid_point: The core-hours consumed per time step
            per grid point. This value should be pre-calculated using a method like
            `calculate_core_hours_per_time_step_per_grid_point` (see preceding notebook cell).

    Returns:
        The total core-hours required for the simulation.
    """

    # Calculate total seconds in the simulation
    total_seconds = simulation_days * 24 * 3600

    # Calculate total number of time steps
    total_time_steps = total_seconds / time_step_seconds

    # Calculate total number of grid points
    total_grid_points = num_grid_columns * num_vertical_levels

    # Calculate total core hours
    total_core_hours = total_time_steps * total_grid_points * core_hours_per_time_step_per_grid_point

    return total_core_hours

# Example usage:
mesh =  "3.75km" 
time_step = 20  # seconds
dx = mesh_dict[mesh][0]
grid_columns = mesh_dict[mesh][1]
vertical_levels = 127
simulation_length = 356  # days
core_hours_per_time_step_per_grid_point = 2.5639208763925117e-09  # Taken from example above, can be made a variable: core_hours_per_time_step_per_grid_point_calculated

total_core_hours = calculate_core_hours(time_step, grid_columns, vertical_levels, simulation_length, core_hours_per_time_step_per_grid_point)

print('\n---- Total core hours needed for the following simulation parameters: ----')
print(f'dx (horizontal grid spacing of MPAS mesh) = {dx} km')
print(f'nCells (number of horizontal grid columns) = {grid_columns}')
print(f'dt (time step) = {time_step} s')
print(f'simulation length = {simulation_length} days')
print('--------')
print(f'{np.round((total_core_hours / 1.E6), 2)} Million Derecho core-hours')


---- Total core hours needed for the following simulation parameters: ----
dx (horizontal grid spacing of MPAS mesh) = 3.75 km
nCells (number of horizontal grid columns) = 41943042
dt (time step) = 20 s
simulation length = 356 days
--------
21.0 Million Derecho core-hours
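The same estimate can be repeated for several meshes at once. The sketch below is a rough illustration, not part of the original notebook: it copies three entries from `mesh_dict`, applies the `dx * 6` rule of thumb for the time step mentioned above, and uses a hypothetical 30-day simulation length. It also assumes that the cost per time step per grid point measured on the 3.75-km mesh carries over unchanged to other meshes and node counts, which ignores parallel-scaling and I/O effects.

```python
# Subset of mesh_dict: name -> (grid spacing in km, number of horizontal columns)
meshes = {
    "3.75km": (3.75, 41_943_042),
    "15km": (15, 2_621_442),
    "120km": (120, 40_962),
}

COST = 2.5639208763925117e-09  # core-hours per time step per grid point (from above)
NUM_LEVELS = 127
SIM_DAYS = 30                  # hypothetical simulation length, for illustration


def estimate_core_hours(dx_km, num_columns, days, levels=NUM_LEVELS, cost=COST):
    """Illustrative helper: total core-hours with dt = 6 * dx (rule of thumb)."""
    time_step_seconds = 6.0 * dx_km
    total_time_steps = days * 24 * 3600 / time_step_seconds
    return total_time_steps * num_columns * levels * cost


estimates = {
    name: estimate_core_hours(dx_km, num_columns, SIM_DAYS)
    for name, (dx_km, num_columns) in meshes.items()
}

for name, core_hours in estimates.items():
    print(f"{name:>7}: {core_hours:,.0f} core-hours for {SIM_DAYS} days")
```

Under these assumptions, refining the mesh by a factor of 4 raises the cost by about 64: roughly 16 times more columns and 4 times more time steps.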


## Calculating Simulation Storage Requirements

This code cell calculates the estimated disk space needed to store the output from a numerical simulation, separating the requirements for 2D and 3D variables.

**Variables:**

* `nvars_2d`: The number of 2-dimensional variables being saved.
* `nvars_3d`: The number of 3-dimensional variables being saved.
* `outputs_per_day_2d`: The number of times 2D variables are outputted per day.
* `outputs_per_day_3d`: The number of times 3D variables are outputted per day.
* `grid_columns`: The number of horizontal grid columns (`nCells` in MPAS terminology), defined in a previous cell.
* `vertical_levels`: The number of vertical levels of the 3D grid (`nVertLevels` in MPAS terminology), defined in a previous cell.
* `simulation_length`: The total length of the simulation in days, defined in a previous cell.

**Calculations:**

* **2D Storage:** The number of stored values is the product of the number of 2D variables, the number of horizontal grid columns, the simulation length in days, and the number of outputs per day. Multiplying by 32 gives the size in bits for single-precision floating-point storage.
* **3D Storage:** The same calculation applies to 3D variables, with an additional factor for the number of vertical levels.
* **Conversion to Terabytes (TB):** The resulting sizes in bits are converted to terabytes by multiplying by `1.25e-13` (8 bits per byte and 10¹² bytes per TB).
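
The calculation above can also be written in bytes rather than bits, which makes the conversion factor explicit. The following is a minimal sketch; the helper `estimate_storage_tb` and its parameter names are hypothetical and do not appear in the original notebook. The `bytes_per_value` argument can be set to 8 to see the effect of double-precision output.

```python
BYTES_PER_VALUE = 4   # single precision (32 bits), as assumed in the text
BYTES_PER_TB = 1e12   # decimal terabyte, which is what the 1.25e-13 factor implies


def estimate_storage_tb(nvars, num_columns, days, outputs_per_day,
                        num_levels=1, bytes_per_value=BYTES_PER_VALUE):
    """Hypothetical helper: estimated output volume in decimal terabytes."""
    n_values = nvars * num_columns * num_levels * days * outputs_per_day
    return n_values * bytes_per_value / BYTES_PER_TB


# Same inputs as the example in this notebook (3.75-km mesh, 127 levels, 356 days).
storage_2d_tb = estimate_storage_tb(15, 41_943_042, 356, 24)
storage_3d_tb = estimate_storage_tb(9, 41_943_042, 356, 8, num_levels=127)

print(f"2D fields: {storage_2d_tb:.2f} TB")
print(f"3D fields: {storage_3d_tb:.2f} TB")
```

Multiplying bits by `1.25e-13` and multiplying bytes by `1e-12` are equivalent, since `1.25e-13 = (1 / 8) * 1e-12`.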

**Output:**

The code then prints the estimated disk space needed for the 2D fields, the 3D fields, and their total, each rounded to two decimal places in terabytes.

**Note:**

This calculation assumes that all variables are stored in single-precision (32-bit) floating-point format. Ensure that `grid_columns`, `vertical_levels`, and `simulation_length` are defined in the notebook before running this cell.

In [7]:
nvars_2d = 15  # Number of 2-dimensional variables
nvars_3d = 9   # Number of 3-dimensional variables
outputs_per_day_2d = 24  # Number of 2D output times per day
outputs_per_day_3d = 8   # Number of 3D output times per day

# Assuming grid_columns, vertical_levels, and simulation_length are defined from previous cells.
storage_2d = (nvars_2d * grid_columns * simulation_length * outputs_per_day_2d) * 32  # 32 bits (single precision)
storage_3d = (nvars_3d * grid_columns * vertical_levels * simulation_length * outputs_per_day_3d) * 32 

print('Storage needed for 2D fields:', round(storage_2d * 1.25e-13, 2), 'TB')
print('Storage needed for 3D fields:', round(storage_3d * 1.25e-13, 2), 'TB')
print('--------')
print(f'Total storage required: {np.round((storage_2d + storage_3d) * 1.25e-13, 2)} TB')

Storage needed for 2D fields: 21.5 TB
Storage needed for 3D fields: 546.14 TB
--------
Total storage required: 567.64 TB
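
For planning disk quotas or transfer rates, it can help to break the total down per output time and per simulated day. The sketch below is illustrative and reuses the inputs from the cell above; the variable names are assumptions introduced here.

```python
num_columns = 41_943_042   # 3.75-km mesh
num_levels = 127
bytes_per_value = 4        # single precision

# Size of one output time, in decimal gigabytes.
size_2d_output_gb = 15 * num_columns * bytes_per_value / 1e9
size_3d_output_gb = 9 * num_columns * num_levels * bytes_per_value / 1e9

# Volume written per simulated day: 24 outputs of 2D fields, 8 outputs of 3D fields.
daily_tb = (24 * size_2d_output_gb + 8 * size_3d_output_gb) / 1e3

print(f"One 2D output time: {size_2d_output_gb:.2f} GB")
print(f"One 3D output time: {size_3d_output_gb:.2f} GB")
print(f"Output per simulated day: {daily_tb:.2f} TB")
```

Multiplying the daily volume by the 356-day simulation length recovers the total reported above.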
