### Notebook 2: Check Correctness of the Different Implementations

This is the second notebook of the series. It does not perform any stencil calculations itself. Instead, it expects that all results have already been calculated and written to the `data` folder. If the first notebook ran successfully, this should be the case now.

In this notebook, we verify that all implementations are correct. To this end, each executable was run with a reference configuration, that is, a fixed combination of the parameters `nx`, `ny` and `num_iter`. We expect the results to match up to small deviations. These are rounding errors caused by the different machine instructions that different compilers generate.

In [None]:
print('notebook_02: started.')

#### Helper Functions

First, we define a couple of convenient functions for later use.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

This function reads a 3D-field file. The binary formats used by the Fortran and C++ code are identical, so the same function can be used in both cases.

In [None]:
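def read_field_from_file(filename):
    # Header: six int32 values (rank, bits, h, x, y, z), followed by the data.
    (rank, bits, h, x, y, z) = np.fromfile(filename, dtype=np.int32, count=6)
    if rank != 3:
        raise NotImplementedError('only 3D fields are supported')
    # Size of the six-value header, expressed in elements of the field dtype.
    offset = (3 + rank) * 32 // bits
    dtype = np.float32 if bits == 32 else np.float64
    field = np.fromfile(filename, dtype=dtype, count=x * y * z + offset)
    field = field[offset:]
    # The data is stored with x varying fastest; reorder to field[x, y, z].
    field = np.reshape(field, [z, y, x])
    field = np.moveaxis(field, [0, 1, 2], [2, 1, 0])
    return field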
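
To make the file layout concrete, here is a small round-trip sketch. It assumes what `read_field_from_file` implies: a header of exactly six `int32` values (`rank`, `bits`, `h`, `x`, `y`, `z`), followed by the field data with x varying fastest. The six header values occupy 24 bytes. That is 6 elements of 32-bit data or 3 elements of 64-bit data, which matches `offset = (3 + rank) * 32 // bits`. The writer `write_field_to_file` is hypothetical and not part of the Fortran or C++ code. The reader `read_field` is a compact variant that skips the header through the byte-based `offset` argument of `np.fromfile` instead of slicing.

```python
import os
import tempfile

import numpy as np


def write_field_to_file(filename, field, h=0):
    # Hypothetical writer for the assumed layout: six int32 header values
    # (rank, bits, h, x, y, z) followed by the data with x varying fastest.
    x, y, z = field.shape
    bits = field.dtype.itemsize * 8
    header = np.array([3, bits, h, x, y, z], dtype=np.int32)
    with open(filename, 'wb') as f:
        header.tofile(f)
        field.ravel(order='F').tofile(f)  # Fortran order: first index fastest


def read_field(filename):
    # Compact variant of read_field_from_file: skip the 24-byte header by
    # passing a byte offset to np.fromfile instead of slicing afterwards.
    rank, bits, h, x, y, z = np.fromfile(filename, dtype=np.int32, count=6)
    dtype = np.float32 if bits == 32 else np.float64
    data = np.fromfile(filename, dtype=dtype, count=x * y * z, offset=6 * 4)
    return data.reshape((z, y, x)).transpose()  # -> field[x, y, z]


original = np.random.default_rng(0).random((4, 3, 2)).astype(np.float32)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'field_test.fld')
    write_field_to_file(path, original)
    restored = read_field(path)
print(restored.shape, np.array_equal(restored, original))
```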

This function compares the values of the field along the z-axis. Since the initial data and the calculation are identical along the z-axis, we expect the same values after the diffusion has been applied. If there are differences, it means that either the calculation is not deterministic or there is a problem with the implementation.

In [None]:
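def check_consistency_of_z_axis(field):
    # A field with a single z-slice is trivially consistent.
    if field.shape[2] == 1:
        return
    # Largest absolute change between neighbouring z-slices; the absolute value
    # is needed so that negative differences are not missed.
    diff = np.max(np.abs(np.diff(field, axis=2)))
    if diff != 0.0:
        raise RuntimeError('validation failed: field varies along the z-axis')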
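
The sign of the differences matters here. If one takes the maximum of the signed differences along z, a slice that is smaller than its predecessor goes unnoticed. The sketch below shows this effect. It also shows an alternative check that compares every z-slice with the first one via broadcasting. The name `z_axis_is_consistent` is hypothetical and not part of the notebook.

```python
import numpy as np


def z_axis_is_consistent(field):
    # Hypothetical alternative check: compare every z-slice with the first one.
    # field[:, :, :1] has shape (nx, ny, 1) and broadcasts against (nx, ny, nz).
    return bool(np.all(field == field[:, :, :1]))


field = np.ones((3, 3, 4), dtype=np.float32)
field[1, 1, 3] = 0.5  # last z-slice is smaller than its predecessor at (1, 1)

print(np.max(np.diff(field, axis=2)))          # 0.0: the negative step is missed
print(np.max(np.abs(np.diff(field, axis=2))))  # 0.5: detected
print(z_axis_is_consistent(field))             # False
```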

This function first checks that the calculation is deterministic along the z-axis. It then removes all but the first z-component, as the others are redundant for further analysis.

In [None]:
def collapse_z_axis(field):
    check_consistency_of_z_axis(field)
    return field[:, :, 0]

This function calculates the maximum absolute difference between two fields over all grid points.

In [None]:
def get_diff_two_fields(field1, field2):
    # Maximum absolute difference over all grid points.
    return np.max(np.abs(field1 - field2))
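
A larger-than-expected maximum deviation is easier to investigate if you also know where it occurs. One possible extension is sketched below with the hypothetical helper `locate_max_deviation`. It applies `np.argmax` to the absolute difference, which returns an index into the flattened array. `np.unravel_index` then converts that index back into a grid index.

```python
import numpy as np


def locate_max_deviation(field, reference):
    # Hypothetical helper: largest |field - reference| and its (i, j) index.
    diff = np.abs(field - reference)
    flat_index = np.argmax(diff)                      # index into diff.ravel()
    index = np.unravel_index(flat_index, diff.shape)  # back to (i, j)
    return float(diff[index]), tuple(int(i) for i in index)


reference = np.zeros((4, 5), dtype=np.float32)
field = reference.copy()
field[2, 3] = -3.0e-7  # a single deviating grid point
value, index = locate_max_deviation(field, reference)
print(f'max deviation {value:.2e} at {index}')
```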

This function checks the absolute and relative error between two fields. We use `numpy.allclose()`, which is designed for exactly this kind of comparison. Small differences due to the different compilers are expected, but they must not exceed a certain threshold.

Note that this function is not symmetric. Element-wise, `allclose()` tests $|\text{field} - \text{reference}| \le \text{atol} + \text{rtol}\cdot|\text{reference}|$. The relative tolerance is therefore scaled by the second argument, which must be the reference value.

In [None]:
def validate_field(field, reference, rtol=1.e-5, atol=1.e-8):
    check = np.allclose(field, reference, rtol=rtol, atol=atol)
    if not check:
        raise RuntimeError('validation failed')
    return True
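
The asymmetry of `allclose()` shows up clearly with deliberately large tolerances. The sketch below compares two values in both orders. It also re-implements the element-wise criterion documented for `np.allclose` as the hypothetical `allclose_by_hand`, so that the role of each argument is visible. The numbers are chosen only for illustration.

```python
import numpy as np


def allclose_by_hand(field, reference, rtol=1.e-5, atol=1.e-8):
    # Element-wise criterion documented for np.allclose:
    # |field - reference| <= atol + rtol * |reference|  (rtol scales |reference| only)
    return bool(np.all(np.abs(field - reference) <= atol + rtol * np.abs(reference)))


a = np.array([100.0])
b = np.array([110.0])

# |a - b| = 10 in both directions, but the bound is 0.095 * 110 = 10.45 when b is
# the reference and 0.095 * 100 = 9.5 when a is the reference.
print(np.allclose(a, b, rtol=0.095, atol=0.0), np.allclose(b, a, rtol=0.095, atol=0.0))            # True False
print(allclose_by_hand(a, b, rtol=0.095, atol=0.0), allclose_by_hand(b, a, rtol=0.095, atol=0.0))  # True False
```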

This function expects a 2-dimensional field and creates a simple plot.

In [None]:
def plot_field(field):
    fig, ax = plt.subplots(figsize=(6, 6))
    im = ax.imshow(field[:, :])
    fig.colorbar(im, ax=ax)
    plt.show()

#### Read Fields from Files

In [None]:
print('notebook_02: reading result fields from files ...')

Now we read the binary files produced by the first notebook, which contain the fields calculated by the different implementations based on the reference parameters.

In [None]:
field_openacc = read_field_from_file('./data/field_openacc.fld')
field_cpp = read_field_from_file('./data/field_cpp.fld')
field_cuda_shared = read_field_from_file('./data/field_cuda_shared.fld')
field_cuda_noshared = read_field_from_file('./data/field_cuda_noshared.fld')

We expect that, for each implementation, the values are at least close along the z-axis, since the stencil computation is invariant along the z-axis. In fact, all implementations turn out to be deterministic, so the values are not only close but identical. The following cell checks this property with `collapse_z_axis()` and then drops all but one z-component.

In [None]:
field_openacc = collapse_z_axis(field_openacc)
field_cpp = collapse_z_axis(field_cpp)
field_cuda_shared = collapse_z_axis(field_cuda_shared)
field_cuda_noshared = collapse_z_axis(field_cuda_noshared)

#### Validate Results

We now have four 2-dimensional fields, one for each implementation. We first validate the results graphically.

In [None]:
print('notebook_02: creating plots ...')

We create a plot that shows all four fields side by side. This plot is for demonstration purposes only. Certain bugs can cause small discrepancies that are not visible in such plots, so the actual validation must be done numerically.

In [None]:
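# Use a common color scale so that the single colorbar is valid for all panels.
fields = [field_openacc, field_cpp, field_cuda_shared, field_cuda_noshared]
vmin = min(f.min() for f in fields)
vmax = max(f.max() for f in fields)
fig, axs = plt.subplots(1, 4, figsize=(24, 6))
im0 = axs[0].imshow(field_openacc, vmin=vmin, vmax=vmax)
im1 = axs[1].imshow(field_cpp, vmin=vmin, vmax=vmax)
im2 = axs[2].imshow(field_cuda_shared, vmin=vmin, vmax=vmax)
im3 = axs[3].imshow(field_cuda_noshared, vmin=vmin, vmax=vmax)
fig.suptitle('Comparison of Output Fields')
axs[0].set_title('OpenACC')
axs[1].set_title('C++')
axs[2].set_title('CUDA Shared Memory')
axs[3].set_title('CUDA Direct Calculation')
fig.colorbar(im0, ax=axs, shrink=0.82, pad=0.02)
plt.show()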

Let us first calculate the maximum absolute deviation between the fields. We use the C++ code as the reference and compare the other three implementations against it.

In [None]:
diff_openacc = get_diff_two_fields(field_openacc, field_cpp)
diff_cuda_shared = get_diff_two_fields(field_cuda_shared, field_cpp)
diff_cuda_noshared = get_diff_two_fields(field_cuda_noshared, field_cpp)
print(f'notebook_02: maximal absolute deviation for OpenACC     : {diff_openacc:.2e}')
print(f'notebook_02: maximal absolute deviation for CUDA Shared : {diff_cuda_shared:.2e}')
print(f'notebook_02: maximal absolute deviation for CUDA Direct : {diff_cuda_noshared:.2e}')

Now we validate the results numerically. The calculation was done using 32-bit floats, so we must expect deviations on the order of the machine precision (unit roundoff) for 32-bit floats, relative to the magnitude of the values. That precision is $\varepsilon:=2^{-24}\approx6\cdot10^{-8}$. For certain calculations, the deviation can be as large as $\sqrt{\varepsilon}\approx2.4\cdot10^{-4}$. In all our tests, however, the deviation for our stencil calculation never exceeded $10^{-7}$, so we set `atol` to $10^{-7}$. The relative tolerance keeps its default of `rtol` $=10^{-5}$, so the element-wise tolerance is $10^{-7}+10^{-5}\cdot|\text{reference}|$.

In [None]:
if validate_field(field_openacc, field_cpp, atol=1.e-7):
    print('notebook_02: OpenACC implementation verified.')
if validate_field(field_cuda_shared, field_cpp, atol=1.e-7):
    print('notebook_02: CUDA Shared implementation verified.')
if validate_field(field_cuda_noshared, field_cpp, atol=1.e-7):
    print('notebook_02: CUDA Direct implementation verified.')
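
The precision constants quoted above can be reproduced with NumPy. Note that `np.finfo(np.float32).eps` is $2^{-23}$, the gap between 1.0 and the next larger float32, so the $\varepsilon=2^{-24}$ used here is half of it. The sketch also evaluates the element-wise tolerance that `validate_field` applies with `atol=1.e-7` and the default `rtol=1.e-5`. `effective_tolerance` is a hypothetical helper, and the reference magnitudes are illustrative only; they are not taken from the computed fields.

```python
import numpy as np


def effective_tolerance(reference, rtol=1.e-5, atol=1.e-7):
    # Hypothetical helper: element-wise bound atol + rtol * |reference| that
    # np.allclose applies; the defaults mirror validate_field(..., atol=1.e-7).
    return atol + rtol * np.abs(reference)


# np.finfo(np.float32).eps is the gap between 1.0 and the next float32 (2**-23);
# the epsilon in the text is the unit roundoff, half of that (2**-24).
unit_roundoff = float(np.finfo(np.float32).eps) / 2
print(f'epsilon       = {unit_roundoff:.2e}')           # ~6e-08
print(f'sqrt(epsilon) = {np.sqrt(unit_roundoff):.2e}')  # ~2.4e-04

# Illustrative reference magnitudes, not taken from the computed fields.
for magnitude in (0.0, 0.1, 1.0):
    print(f'|reference| = {magnitude:.1f}: tolerance = {effective_tolerance(magnitude):.2e}')
```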

In [None]:
print('notebook_02: completed.')