# Fortran refresher

## Introduction

Fortran was developed in the 1950s as a purpose-built language for STEM applications. It is a good language to use for high performance computing because it is **easy** to understand, highly performant, and comes with a first-rate multi-dimensional array implementation. Furthermore, the language has access to both thread-level and accelerator-assisted parallelism via extensions such as OpenMP and OpenACC, as well as access to process-level parallelism through a message passing library such as MPI.

It is true that older versions of the Fortran standard (i.e. Fortran 77 and earlier) have archaic features and idiosyncrasies that may discourage folks with training in modern coding techniques. However, like other standard programming languages such as C++, the standard is under active development, and modern Fortran programs can even be expressed in object-oriented coding styles. Some of the benefits of Fortran include:

* Longevity. Fortran will be around as a standard for a long time.
* First-rate multi-dimensional array implementation. Arrays of up to 15 dimensions are supported (seven before Fortran 2008), with easy **numpy-like** array access.
* High performance. Compilers can optimize Fortran code so that it runs very quickly.
* Compiler-optimized memory management. Compilers are free to organise and manage memory allocations in optimal ways.

Some potential drawbacks include:

* Objects are passed to subroutines and functions **by reference** by default. This potentially has impacts for memory safety.
* Poor IO functionality, string handling, and concatenation.
* Memory alignment complexities when interfacing Fortran with C.
* No central website for the Fortran standard. The site [Fortran Lang](https://fortran-lang.org/) is a good resource for news and updates on Fortran.
* Finding expertise in Fortran is difficult.

## Teaching method

This teaching module does not aim to provide an **exhaustive** introduction to Fortran. Instead we aim to cover concepts that are helpful when using Fortran code with libraries such as the [Hipfort](https://rocm.docs.amd.com/projects/hipfort/en/latest/) interface to AMD's [HIP](https://rocm.docs.amd.com/projects/HIP/en/latest/) GPU library and runtime. We do this using a simple tensor addition example. If **A**, **B**, and **C** are tensors of rank 1, then at each index $i$ the following equation holds true:

$$
\textbf{C}(i) = \textbf{A}(i) + \textbf{B}(i)
$$

We solve this problem in a number of ways by example, using Fortran programs that progressively demonstrate features of the Fortran language. Programmers from other languages will then readily be able to map concepts to their language of expertise. We choose to follow the modern 2008 standard and disregard archaic language features such as fixed source form.

## Example code

The file [tensoradd_simple.f90](tensoradd_simple.f90) contains a simple example of a Fortran program that performs tensor addition. Let's open the file by clicking the link above and go through the source line by line.

## Program

Every Fortran program has one (and only one) main program unit, which begins with the **program** statement followed by the name of the program.

```Fortran
program tensoradd
```

At the end of the program there is a corresponding **end** statement to signify the end of the program.

```Fortran
end program tensoradd
```

It is good practice to also **include the name** of that which is being ended. In this case we are ending **program tensoradd**.

## Comments

Comments in Fortran start with a `!`. Anything beyond the `!` is normally disregarded by the compiler, unless it forms part of a directive such as an OpenMP or OpenACC construct. The [FORD](https://forddocs.readthedocs.io/en/latest/) documentation generator uses a double `!!` to signify comments that should be in the documentation.

```Fortran
   !! Program to compute a 1D tensor addition
   !! Written by Dr. Toby Potter and Dr. Joseph Schoonover
```

Other comments are there to provide assistance to other programmers, most likely your **future self**. In literate programming, attributed to [Donald Knuth](https://en.wikipedia.org/wiki/Literate_programming), it is useful to think of a program as an explanation to human beings of what we want the computer to do, and comments are a large part of that explanation.

## Import external modules with `use`

A `module` is a collection of variables and routines that can be imported into any program, function, or subroutine. The `use` statement brings in the functions and variables that are defined in a Fortran module. Here we put a `use` statement to import `iso_fortran_env`, an intrinsic module that provides helpful data type definitions and constants.

```Fortran
    ! Add this to use the standard fortran environment module
    use iso_fortran_env
```

We will discuss constructing modules further in a later topic.

## Variable names

In Fortran all variables must be declared **before any executable statements**. This means declaring all the variables we are going to use at the beginning of the program, function, or subroutine, as opposed to when they are needed. Variable names must begin with a letter and may contain letters `[a-zA-Z]`, digits `[0-9]`, and underscores `_`. Fortran is also **case insensitive**, meaning that uppercase and lowercase letters in variable names and keywords are interpreted the same way.

## Implicit (weak) typing

By default Fortran also has implicit typing. This means that if the name of an undeclared variable starts with one of the letters `i, j, k, l, m, n` then the variable is interpreted as a default (typically 32-bit) integer, otherwise it is interpreted as a default (typically 32-bit) floating point number. It is good practice to switch off implicit typing with the `implicit none` statement.

```Fortran
    ! Add this to make sure that all variables must be declared
    ! and the compiler performs no type inference based on
    ! the first letter of variable names
    implicit none
```

## Basic Fortran data types

There are a number of basic data types in Fortran, shown in the table below. Each data type has a default size in bytes. The defaults are compiler dependent, but the sizes in the table are typical. Each type also has options for varying the number of bytes employed.

| Data type | Typical default number of bytes |
|:--:|:--:|
| integer | 4 |
| real | 4 |
| complex | 2x4 |
| logical | 4 |
| character | 1 |

Compiler flags can be used to vary the number of bytes used as the default for integers and reals, however this is **not a good idea** when data is being transferred between systems, such as to a GPU, or between routines that are written and compiled in another language. For these instances it is better to use predetermined types with a fixed number of bytes. The [iso_fortran_env](https://fortranwiki.org/fortran/show/iso_fortran_env) module that we included at the beginning defines kind parameters such as `int32`, `int64`, `real32`, and `real64`, which select data types with a **fixed number of bits**. It is advisable to use these data types, particularly when having a standard number of bits to represent your data is a priority.
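
As a minimal sketch of how these fixed-size types can be used, the program below declares variables with the `int32`, `int64`, `real32`, and `real64` kind parameters from `iso_fortran_env` and reports their sizes with the `storage_size` intrinsic. The program and variable names are chosen for illustration only and do not appear in the example files.

```Fortran
program kinds_demo
    ! Import only the kind parameters that we need
    use iso_fortran_env, only: int32, int64, real32, real64
    implicit none

    ! The kind parameter goes in parentheses after the type
    integer(int32) :: i32
    integer(int64) :: i64
    real(real32)   :: r32
    real(real64)   :: r64

    ! storage_size returns the size of its argument in bits
    print *, 'integer(int32) uses ', storage_size(i32), ' bits'
    print *, 'integer(int64) uses ', storage_size(i64), ' bits'
    print *, 'real(real32)   uses ', storage_size(r32), ' bits'
    print *, 'real(real64)   uses ', storage_size(r64), ' bits'

end program kinds_demo
```

On compilers that support all four kinds the reported sizes should be 32, 64, 32, and 64 bits, regardless of any flags that change the default integer or real size.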

## Declaring variables

All variables must be declared after any `use` or `implicit none` statements, and before any executable statements. A variable is declared by specifying the variable type along with any options, followed by `::`, then the variable name and optionally an equals sign `=` for variable initialisation. Here we create an integer `N` with a value of 16. The option `parameter` tells the compiler that `N` is a named constant and is therefore read-only.

```Fortran
integer, parameter :: N=16
```

Similarly, we define `eps_mult` as a real with a value of 2.0.

```Fortran
real :: eps_mult = 2.0
```

Logical values are either `.true.` or `.false.`. The variable `success` records whether or not the validation was successful. Here we set it to `.true.` unless proven otherwise.

```Fortran
logical :: success = .true.
```

### Static arrays

Arrays/tensors may be declared **statically**, typically on the stack, or allocated **dynamically** on the heap. Arrays of up to 15 dimensions (seven before Fortran 2008) are specified by putting a comma-separated tuple of dimensions after the variable name. Before declaring the tensors we declare a character string with a length of 10 characters to hold the base filename.

```Fortran
    ! Base filename
    character(len=10) :: bname = 'array_'
```

Then we declare 3 arrays **A_h**, **B_h**, and **C_h** of type **real**, each with 1 dimension and a length of N.

```Fortran
    ! Declare the tensors to use
    ! Memory for these will be allocated on the stack
    real :: A_h(N), B_h(N), C_h(N)
```

#### Array indexing convention

Fortran array indices start at 1 by default. The bounds can be modified at declaration/allocation by using a colon `:` to specify the lower and upper bounds of each dimension of the array. Both bounds are inclusive, so for the tensor indices to start at 0 and still hold N elements we could have done this instead.

```Fortran
    ! Declare the tensors to use
    ! Memory for these will be allocated on the stack
    real :: A_h(0:N-1), B_h(0:N-1), C_h(0:N-1)
```
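
The `lbound`, `ubound`, and `size` intrinsics offer a quick way to confirm the bounds of an array. The sketch below uses illustrative names that are not part of [tensoradd_simple.f90](tensoradd_simple.f90). It also shows that because both bounds are inclusive, a declaration of the form `(0:N)` holds N+1 elements.

```Fortran
program bounds_demo
    implicit none

    integer, parameter :: N = 16

    real :: A_default(N)    ! indices 1 to N
    real :: A_zero(0:N-1)   ! indices 0 to N-1, still N elements
    real :: A_extra(0:N)    ! indices 0 to N, N+1 elements

    ! Print the lower bound, upper bound, and number of elements
    print *, lbound(A_default, 1), ubound(A_default, 1), size(A_default)
    print *, lbound(A_zero, 1),    ubound(A_zero, 1),    size(A_zero)
    print *, lbound(A_extra, 1),   ubound(A_extra, 1),   size(A_extra)

end program bounds_demo
```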

## Call functions and subroutines

Fortran has **functions** and **subroutines**. Functions take any number of input variables and return a single result. The result of a function can be used on the right hand side of an assignment statement. For example, the intrinsic **spacing** function returns the distance between the input floating point value and the nearest adjacent representable value. Here we use it in the validation step to help calculate the upper and lower bounds on a validation result.

```Fortran
upper = scratch + eps_mult*spacing(abs(scratch))
lower = scratch - eps_mult*spacing(abs(scratch))
```

**Subroutines** are like functions but they do not return a value. Here we use the built-in subroutine **random_number** to fill the tensors **A_h** and **B_h** with random numbers. All subroutines are called with the `call` statement.

```Fortran
    ! Fill arrays with random numbers using the
    ! Fortran intrinsic subroutine "random_number"
    call random_number(A_h)
    call random_number(B_h)
```

> Note: For research applications of random number generation it is recommended to investigate the random number generator further and make sure that you have a high quality one.

Variables passed to functions and subroutines are generally passed **by reference** as the default. This means a reference/pointer to the variable is passed in and it is possible to modify the input arguments from within the function or subroutine. For example the call to **random_number** above modifies the tensors **A_h** and **B_h** in place. To avoid ambiguity it is **best practice** to design functions so they do not modify the input arguments and use subroutines when the contents of input arguments are to be modified.
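
A small sketch of what this means in practice is shown below. The subroutine `double_it` is hypothetical and is defined after a `contains` statement purely to keep the example in one program. Changes made to the dummy argument `val` inside the subroutine are visible in the caller's variable `x`.

```Fortran
program byref_demo
    implicit none

    real :: x

    x = 1.0
    print *, 'x before call = ', x

    ! The subroutine works on x itself, not on a copy
    call double_it(x)
    print *, 'x after call  = ', x

contains

    subroutine double_it(val)
        !! Hypothetical helper that doubles its argument in place
        real, intent(inout) :: val
        val = 2.0*val
    end subroutine double_it

end program byref_demo
```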

## Do loops and array access

Loops are made using the `do` construct. Here we loop over `i` from 1 to N inclusive and set elements in **C_h** using elements in **A_h** and **B_h**. Indexing into arrays is done by appending a comma-separated tuple of coordinates to the array name.

```Fortran
do i=1,N
    ! Kernel math
    C_h(i) = A_h(i) + B_h(i)
end do
```

Unlike C, which allows the loop variable to be declared in the loop construct, Fortran requires that we declare all variables at the beginning of the program, function, or subroutine. For loops where the number of iterations is not known in advance there is the corresponding `do while` loop. The above could then be expressed as:

```Fortran
i=1
do while (i<=N)
    ! Kernel math
    C_h(i) = A_h(i) + B_h(i)
    i = i + 1
end do
```

In Fortran we can also perform bulk operations on portions of arrays using the colon operator `:` in a way that is similar to Numpy. We can achieve the same operation on the tensors using this method: 

```Fortran
    ! Could also do it this way, (best practice)
    C_h(:) = A_h(:) + B_h(:)
```

Or we can even do it this way, though for readability it is not preferred because we lose a sense of what rank the tensors are.

```Fortran
    ! Or even this way (not best practice)
    C_h = A_h + B_h
```
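
The colon notation also accepts explicit bounds and an optional stride in the form `lower:upper:stride`, which is how portions of an array are selected. The following sketch uses a small integer array with illustrative names to show a few common slices.

```Fortran
program slice_demo
    implicit none

    integer, parameter :: N = 8
    integer :: A(N)
    integer :: i

    ! Fill A with the values 1 to N using an implied do loop
    A = [(i, i = 1, N)]

    print *, A(2:4)      ! elements 2, 3, and 4
    print *, A(1:N:2)    ! every second element, starting at index 1
    print *, A(N:1:-1)   ! all elements in reverse order

    ! Omitting a bound means "to the end" of that dimension
    A(5:) = 0
    print *, A

end program slice_demo
```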

### Loop nests and multi-dimensional indexing

Loops can of course be nested. Nesting loops is a common way to cover multi-dimensional arrays. If this were a two dimensional problem with tensors of size (M,N) it would look something like this:

```Fortran
do j=1,N
    do i=1,M
        ! Kernel math
        C_h(i,j) = A_h(i,j) + B_h(i,j)
    end do
end do
```

Or using indexing operations the above can be written as:

```Fortran
C_h(:,:) = A_h(:,:) + B_h(:,:)
```

#### Array ordering

Arrays in Fortran are arranged in **column-major** format. This means that array elements are **contiguous** along the **first** dimension of the array. In CPU code this means that you will be able to make best re-use of cache lines if the **innermost loop** traverses the first dimension of the arrays. In GPU code it is good for performance to make sure that **neighbouring threads** (usually along the first dimension of the Grid) are associated with neighbouring elements along the contiguous dimension of an array. In kernel code on a GPU you are free to then iterate along any of the other array dimensions, and neighbouring threads can share cache lines read in from memory. 
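
One way to see column-major ordering is to fill a small 2D array from a flat list with the `reshape` intrinsic, which places values in memory order. In the sketch below (illustrative names only) the first index varies fastest, so the nested loop with `i` innermost visits the elements in the order they are stored.

```Fortran
program column_major_demo
    implicit none

    integer :: A(2,3)
    integer :: i, j

    ! reshape fills A in memory order, so the first index varies fastest:
    ! A(1,1)=1, A(2,1)=2, A(1,2)=3, A(2,2)=4, A(1,3)=5, A(2,3)=6
    A = reshape([1, 2, 3, 4, 5, 6], [2, 3])

    ! The innermost loop runs over the first (contiguous) dimension
    do j = 1, 3
        do i = 1, 2
            print *, 'A(', i, ',', j, ') = ', A(i,j)
        end do
    end do

end program column_major_demo
```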

## Code validation

It is **vital** to have checks and comprehensive tests to make sure your code is running correctly. One way to do this is to loop over the elements of the computed solution and make sure that each one is within an accepted range of a **known solution**. Here we specify that the computed answer must lie within `eps_mult` floating point spacings of the validation answer.

```Fortran
    ! Check the answer
    do i=1,N

        ! Compute the answer on the CPU
        scratch = A_h(i) + B_h(i)

        ! Get upper and lower bounds on the computed solution
        ! the "spacing" intrinsic function gets the 
        ! floating point spacing from one number to the next
        upper = scratch + eps_mult*spacing(abs(scratch))
        lower = scratch - eps_mult*spacing(abs(scratch))

        ! Check to see if the number 
        ! is in floating point range of the answer
        if  ( .not. ((lower <= C_h(i)) .and. (C_h(i) <= upper))) then
            ! Demonstrate line continuation
            write(*,*) 'Error, calculated answer at index i = ', &
                i, ' was not in range'
            success = .false.
        end if

    end do
```

This step is a little redundant on CPU code but it will be useful when we perform the tensor addition on another device such as a GPU. 

## If statements and flow control 

The **if**, **then**, **else if**, **else**, **end if** sequence helps with making decisions in code. Here we use the logical **.not.** and **.and.** operators in combination with an **if** statement to detect when **C_h(i)** lies outside the closed range [lower, upper]. The logical expression for any **if** statement must be enclosed in parentheses `()`.

```Fortran
    if  ( .not. ((lower <= C_h(i)) .and. (C_h(i) <= upper))) then
        ! Demonstrate line continuation
        write(*,*) 'Error, calculated answer at index i = ', &
            i, ' was not in range'
        success = .false.
    end if
```
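
The validation code only needs a single branch. For completeness, here is a minimal sketch of the full **if**, **else if**, **else** sequence, using a made-up variable `x` that is not part of the example program.

```Fortran
program if_demo
    implicit none

    real :: x

    x = 0.5

    ! Only the first branch whose condition is true is executed
    if (x < 0.0) then
        print *, 'x is negative'
    else if (x == 0.0) then
        print *, 'x is zero'
    else
        print *, 'x is positive'
    end if

end program if_demo
```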

## IO in Fortran

### Printing to standard output

Sometimes we just need to print a value in a Fortran program or write something to disk for validation purposes. The **print** statement writes to standard output. We use it at the end of the code:

```Fortran
    print *, 'Tensor addition passed validation.'
```

You can specify a format string instead of the `*` and have any number of comma separated variables following it. The format string determines how the following variables are printed, but working through all of those options is left to other reference material. The default format string `*` is usually good enough for debugging purposes.
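
As a brief illustration of a format string, the sketch below prints the same values with the default formatter and with an explicit format. The variables are illustrative. In the format, `a` prints a character string, `i0` prints an integer using only as many characters as it needs, and `f6.3` prints a real in a field that is 6 characters wide with 3 decimal places.

```Fortran
program print_demo
    implicit none

    integer :: i = 3
    real :: x = 2.5

    ! Default (list-directed) formatting, spacing is chosen by the compiler
    print *, 'i = ', i, ', x = ', x

    ! Explicit format string with one edit descriptor per item
    print '(a, i0, a, f6.3)', 'i = ', i, ', x = ', x

end program print_demo
```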

Another statement that is useful for IO is **write**. It can send output to a number of destinations, including files and standard output. In its simplest form `write` takes two arguments. The first is the unit (destination) for the output and the second is the format string. Using a `*` for the unit selects **standard output**. Using `*` for the format string selects the default formatter, which is good enough for debugging purposes. In the file [tensoradd_simple.f90](tensoradd_simple.f90) we use `write` to print an error string in case the program did not pass validation.

```Fortran
write(*,*) 'Error, calculated answer at index i = ', &
                i, ' was not in range'
```

### Printing to standard error

The `iso_fortran_env` module that we included at the start of the program gives us the io units `input_unit`, `output_unit`, `error_unit` for standard input, standard output, and standard error. For example, instead of standard output `*` we can write to `error_unit` instead.

```Fortran
write(error_unit,*) 'Error, calculated answer at index i = ', &
                i, ' was not in range'
```

### Line continuation

You may have noticed with the **write** statements above that we continued a line with the `&` character. Unlike C and C++, Fortran doesn't require an end of statement character such as `;`, so by default the compiler interprets a new line to mean a new statement. You can spill a statement onto the next line by ending the line with the **&** line continuation character.

### File IO

Sometimes you might need to write variables to disk. Since compute architectures differ in the way they represent numbers, it is **never good practice** to share data using a simple binary dump of variables to file storage. For sharing data use a self-describing file format like [HDF5](https://docs.hdfgroup.org/archive/support/HDF5/doc/fortran/index.html). For diagnostic and debugging purposes dumping program data straight to a file may be helpful. Fortran uses **integers as file handles**, and here we **open** three files for writing the tensors **A_h**, **B_h**, and **C_h** to file storage in binary format.

```Fortran
    ! Open the files for writing
    open(10, file=trim(bname)//'A_h.dat', &
        form='unformatted', status='new', access='stream')
    open(11, file=trim(bname)//'B_h.dat', &
        form='unformatted', status='new', access='stream')
    open(12, file=trim(bname)//'C_h.dat', &
        form='unformatted', status='new', access='stream')
```

The `file` option describes the file to open. The `trim` function returns a copy of `bname` without any trailing blanks and we use the string concatenation operator `//` to construct a filename for each array. The further options `form='unformatted'` and `access='stream'` are comparable to a binary file stream in C++. With the `status='new'` option the `open` fails if the file already exists, so we don't overwrite an existing file. Once the files are open we can use `write` to write a copy of the arrays to file storage.

```Fortran
    ! Write the contents of the arrays to the open files 
    write(10) A_h(:)
    write(11) B_h(:)
    write(12) C_h(:)
```

Once the write is complete it is **good practice** to then close the open files. 

```Fortran
    ! Close the files
    close(10)
    close(11)
    close(12)
```
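
Reading a stream file back uses the same `open` options together with a `read` statement. The following sketch is not taken from the example program. It assumes a scratch file called `roundtrip_demo.dat`, uses `status='replace'` so that repeated runs do not fail, checks the `iostat` error code from `open`, and deletes the file when it is closed for the second time.

```Fortran
program stream_roundtrip
    implicit none

    integer, parameter :: N = 16
    real :: A_h(N), A_check(N)
    integer :: ierr

    call random_number(A_h)

    ! Write the array, status='replace' overwrites any existing file
    open(10, file='roundtrip_demo.dat', form='unformatted', &
        access='stream', status='replace', iostat=ierr)
    if (ierr /= 0) stop 'Error, could not open file for writing'
    write(10) A_h
    close(10)

    ! Read the array back in, status='old' requires the file to exist
    open(10, file='roundtrip_demo.dat', form='unformatted', &
        access='stream', status='old', iostat=ierr)
    if (ierr /= 0) stop 'Error, could not open file for reading'
    read(10) A_check

    ! Remove the scratch file when closing
    close(10, status='delete')

    if (all(A_check == A_h)) then
        print *, 'Round trip succeeded.'
    else
        print *, 'Round trip failed.'
    end if

end program stream_roundtrip
```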

## Run the example

The example application should have already been compiled. Here we source the `env` script to set paths and then run the `tensoradd_simple` application. Notice the creation of the `.dat` files during execution of the program.

In [1]:
!source ../env; tensoradd_simple
!ls

 Tensor addition passed validation.
array_A_h.dat	       CMakeLists.txt		  tensoradd_function.f90
array_B_h.dat	       Exercise.ipynb		  tensoradd_module.f90
array_C_h.dat	       Fortran.ipynb		  tensoradd_pointer.f90
c_functions.cpp        README.md		  tensoradd_simple.f90
chessboard_answer.f90  tensoradd_allocatable.f90  tensor_lib_c.f90
chessboard.f90	       tensoradd_cfun.f90	  tensor_lib.f90


In [2]:
!rm *.dat

## Dynamic memory allocation

Thus far we have been allocating storage for tensors **A_h**, **B_h**, and **C_h** statically. Such memory is typically allocated on the stack when the part of the program that declares it runs. By default the stack is quite small in comparison to the available memory, and the size of static allocations **must be known** at compile time. Another area in which memory can be allocated is the heap. With this form of storage, memory can be dynamically allocated and de-allocated from a much larger pool, and the size of the allocations doesn't need to be determined at compile time.

### Allocatable arrays

One way to allocate arrays dynamically is to define arrays with the `allocatable` option and defer the size to a later allocation. 

#### Example program with allocatable arrays

The file [tensoradd_allocatable.f90](tensoradd_allocatable.f90) performs tensor addition with allocatable arrays. We declare **A_h**, **B_h**, and **C_h** as allocatable arrays with this line of code:

#### Declaration

```Fortran
! Define allocatable arrays for the tensors
! Memory for these will be allocated on the heap
real, allocatable, dimension(:) :: A_h, B_h, C_h
```

The `dimension` option specifies how many dimensions there are in the array. A single colon for a dimension means that we defer the size of that dimension until allocation. If the arrays had more dimensions (e.g. 2), we would just put more colons `:` into the dimension option, like this:

```Fortran
! Define allocatable arrays for the tensors
! Memory for these will be allocated on the heap
real, allocatable, dimension(:,:) :: A_h, B_h, C_h
```

#### Allocation

The tensors may then be dynamically allocated using the `allocate` statement. Here we allocate memory for tensors **A_h**, **B_h**, and **C_h** in one go.

```Fortran
   ! Allocate tensors on the heap and check for errors
    allocate(A_h(N), B_h(N), C_h(N), stat=ierr)
```

If we wanted to change the upper and lower bounds we could set them in the allocate statement, in the same way that we did with static arrays. Both bounds are inclusive, so a range of `0:N-1` still holds N elements.

```Fortran
   ! Allocate tensors on the heap and check for errors
    allocate(A_h(0:N-1), B_h(0:N-1), C_h(0:N-1), stat=ierr)
```

Higher dimensional arrays are allocated by comma separating each dimension. If they were two dimensional then allocation looks something like this.

```Fortran
   ! Allocate tensors on the heap and check for errors
    allocate(A_h(M,N), B_h(M,N), C_h(M,N), stat=ierr)
```

Allocation may fail for reasons like there was not enough memory available, so it is always a good idea to check the status of the allocation. Note that we define the integer `ierr` and pass it in to `allocate` through the `stat` option. Then we can check for a bad allocation and `stop` the program if it failed.

```Fortran
    if (ierr /= 0) then
        write(*,*) 'Error, array allocation failed with error code = ', ierr
        stop 
    end if
```

#### Cleanup

When an array is allocated on the heap it is always good practice to de-allocate it (clean it up) when it is no longer needed.

```Fortran
! Always free heap memory when you no longer need it
deallocate(A_h, B_h, C_h)
```
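
The `allocated` intrinsic reports whether an allocatable array currently has memory attached to it, which can help guard against allocating or de-allocating twice. The following is a minimal sketch with a single array, not an excerpt from [tensoradd_allocatable.f90](tensoradd_allocatable.f90).

```Fortran
program allocated_demo
    implicit none

    integer, parameter :: N = 16
    real, allocatable, dimension(:) :: A_h
    integer :: ierr

    ! Allocatable arrays start out unallocated
    print *, 'Before allocate:  ', allocated(A_h)

    allocate(A_h(N), stat=ierr)
    if (ierr /= 0) then
        write(*,*) 'Error, array allocation failed with error code = ', ierr
        stop
    end if
    print *, 'After allocate:   ', allocated(A_h), ' size = ', size(A_h)

    ! Only deallocate memory that is currently allocated
    if (allocated(A_h)) deallocate(A_h)
    print *, 'After deallocate: ', allocated(A_h)

end program allocated_demo
```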

#### Run tensoradd_allocatable

If we run the application `tensoradd_allocatable` we get the same answer as before:

In [3]:
!source ../env; tensoradd_allocatable

 Tensor addition passed validation.


### Fortran pointers

Fortran also has pointers. Pointers in Fortran are declared with the `pointer` option in addition to a data type. Unlike void pointers in C, Fortran pointers may only "point" to objects of the same data type. There is also a restriction in that pointers can only "point" to objects with the `target` attribute set, or to memory pointed at by other pointers. Using Fortran pointers is usually not best practice because their misuse can introduce memory safety bugs, however they are useful for interoperability with C code in libraries such as HIP.

Using the `allocate` statement we can allocate memory through a Fortran pointer and treat the pointer as an array. This memory automatically has the `target` attribute set.

#### Example program with pointers

In the code [tensoradd_pointer.f90](tensoradd_pointer.f90) we declare **A_h**, **B_h**, and **C_h** as pointers to one dimensional allocations of memory as follows:

#### Declaration

```Fortran
! Define pointers to memory, initialise to null() for safety
real, pointer, dimension(:) :: A_h => null(), B_h => null(), C_h => null()
```

When pointers don't need to point to something it is good practice to set them to null. Here we initialize the pointers to `null()` at declaration.

#### Allocation

The `allocate` statement can allocate memory for a pointer. As before we use the `allocate` command to allocate arrays of `N` reals for **A_h**, **B_h**, and **C_h**.

```Fortran
! Allocate tensors on the heap and check for errors
allocate(A_h(N), B_h(N), C_h(N), stat=ierr)

if (ierr /= 0) then
    write(*,*) 'Error, array allocation failed with error code = ', ierr
    stop 
end if
```

These pointers now "point" at or are associated with the array allocations. We can use the `associated` function to check if a pointer is pointing to something.

```Fortran
if (associated(A_h)) then
    print *, "A_h is associated"
end if
```

#### Pointer remapping

A powerful feature of Fortran pointers is that they can be used to "point" at a portion of a 1D allocation and even "upgrade" the dimensions and bounds of that array access. This is called **pointer remapping**. Remapping targets can only be of rank 1 so that contiguous access is upheld. We demonstrate this in [tensoradd_pointer.f90](tensoradd_pointer.f90) by creating a 2D pointer `D_h`.

```Fortran
   ! Demonstrate pointer remapping with D_h
    real, pointer, dimension(:,:) :: D_h
```

Then we can point the 2D pointer at a 1D allocation. In doing so we must set the bounds of the pointer. With the code below we set the bounds of `D_h` to be (1:N/4, 1:N/4), which is 4x4 for N=16, and point it to the same memory pointed to by `A_h`:

```Fortran
    ! Demonstrate pointer remapping.
    ! Point a 4x4 2D pointer at the allocated memory
    D_h(1:(N/4), 1:(N/4)) => A_h
```

Access to `D_h` is then treated the same way as any array of size (N/4, N/4). Next, we again change the size of the 2D pointer to 2x2 and point it at the last four elements of the `A_h` allocation. Because array bounds are inclusive, the last four elements are at indices N-3 to N.

```Fortran
    ! Point a 2D pointer of size 2x2 
    ! at the last 4 elements of A_h
    D_h(0:1, 0:1) => A_h((N-3):N)
```

When performing pointer remapping one has to be careful that the mapped pointer does not access memory that is beyond the allocation of the target object.
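
The sketch below puts the remapping steps together in a small self-contained program so that the index arithmetic can be checked. It is not a copy of [tensoradd_pointer.f90](tensoradd_pointer.f90). As an illustrative assumption, `A_h` is filled with the values 1 to N so that each element stores its own index.

```Fortran
program remap_demo
    implicit none

    integer, parameter :: N = 16
    real, pointer, dimension(:)   :: A_h => null()
    real, pointer, dimension(:,:) :: D_h => null()
    integer :: i, ierr

    allocate(A_h(N), stat=ierr)
    if (ierr /= 0) stop 'Error, array allocation failed'

    ! Each element of A_h holds its own index
    A_h = [(real(i), i = 1, N)]

    ! View all N=16 elements as a 4x4 array
    D_h(1:(N/4), 1:(N/4)) => A_h

    ! Column-major order: D_h(i,j) is A_h(i + (j-1)*(N/4))
    ! so D_h(2,3) is A_h(10)
    print *, 'D_h(2,3) = ', D_h(2,3), ' A_h(10) = ', A_h(10)

    ! View the last 4 elements as a 2x2 array with zero-based bounds
    D_h(0:1, 0:1) => A_h((N-3):N)
    print *, 'D_h(1,1) = ', D_h(1,1), ' A_h(N) = ', A_h(N)

    ! Disassociate D_h before releasing the memory it points to
    nullify(D_h)
    deallocate(A_h)

end program remap_demo
```

Nullifying `D_h` before the `deallocate` means it is never left pointing at memory that has been released.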

#### De-allocation

As with allocatable arrays we can use the `deallocate` statement to release memory that the pointers **A_h**, **B_h**, **C_h** are associated with.

```Fortran
! Always free heap memory when you no longer need it
deallocate(A_h, B_h, C_h)
```

#### Safety issues with pointers

As mentioned before, using pointers carries the risk of introducing **memory safety bugs**. For example, once we have deallocated **A_h**, access via the associated pointer **D_h** becomes undefined because the memory it points to **no longer exists**. If we tried to use **D_h** after the underlying memory allocation disappears, the behaviour would be undefined and could result in a memory access violation. If we tried to deallocate `D_h` after `A_h` then it would result in an error. Let's imagine that this code has been worked on for many years and has gotten rather complex. What function or subroutine is responsible for deallocating **A_h**? When would we know that **A_h** is no longer safe to use? We could have also repointed **A_h** at something else, leaving the original allocation there but no longer accessible. Allocated memory that is no longer accessible is known as a **memory leak**.

For these reasons it is best practice to use Fortran pointers only when they are needed, and try to write your code so that dynamic memory allocation/deallocation is performed only in a specific part of your code.

#### Run tensoradd_pointer

Run the compiled application `tensoradd_pointer` and check that it passes validation.

In [1]:
!source ../env; tensoradd_pointer

../env: line 26: module: command not found
 Tensor addition passed validation.


## How to create functions and subroutines

Functions and subroutines allow for software reuse. 

### Example program with functions

In the code [tensoradd_function.f90](tensoradd_function.f90) we move the processes of running tensor addition and checking the answer to a subroutine called **launch_kernel** and a function called **check**. These functions may be in another source file, however we have chosen to include them at the beginning of the file [tensoradd_function.f90](tensoradd_function.f90). 

### Functions

Functions are intended to take any number of arguments and produce one value as the result. It is good practice to build functions so they **do not modify** their input arguments. Here we have a function that takes as arguments the pointers `A`, `B`, `C`, the integer `N`, and the epsilon multiplier `eps_mult` and produces a `result` or return value in the variable `success`, indicating whether or not validation was successful.

```Fortran
function check(A, B, C, N, eps_mult) result(success)
    !! Function to check if a tensor addition operation was successful

    real, pointer, dimension(:), intent(in) :: A, B, C
        !! Pointers to memory passed in

    integer, intent(in) :: N
        !! N is the total number of elements

    real, intent(in) :: eps_mult
        !! Epsilon multiplier, how many floating point spacings
        !! can the computed answer be from our benchmark answer?

    ! Scratch variables
    real :: scratch, upper, lower

    ! Loop index
    integer  :: i

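    ! Declare the result variable of the function and set it to .true.
    ! (a function result cannot be initialised in its declaration)
    logical :: success
    success = .true.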

    ! Loop over all indices and check tensor addition
    do i=1, N
        scratch = A(i) + B(i)
        ! Spacing is an intrinsic function to get the spacing from
        ! one floating point representation to the next
        upper = scratch + eps_mult*abs(spacing(scratch))
        lower = scratch - eps_mult*abs(spacing(scratch))
        if (.not. ( (lower<=C(i)) .and. (C(i)<=upper) ) ) then
            write(*,*) "Error, tensor addition did not work at index = ", i
            success = .false.
            return
        end if
    end do

    ! We got to here because we didn't return on failure
    write(*,*) 'Tensor addition passed validation.'
    
end function check
```

Functions are defined by the `function` keyword, the name of the function, a comma-separated tuple of arguments, and a `result` clause containing the return variable.

```Fortran
function check(A, B, C, N, eps_mult) result(success)
```

The data type of each argument **must be declared**. Here we declare the data types of the arguments.

```Fortran
    real, pointer, dimension(:), intent(in) :: A, B, C
        !! Pointers to memory passed in

    integer, intent(in) :: N
        !! N is the total number of elements

    real, intent(in) :: eps_mult
        !! Epsilon multiplier, how many floating point spacings
        !! can the computed answer be from our benchmark answer?
```

Arguments in Fortran are generally passed **by reference**, so modifying an argument within a function or subroutine will modify the variable passed in. Notice the `intent` option on each argument declaration. The `intent` option has three choices, `in`, `out`, and `inout` to signify that the variable passed is intended to be read-only, write-only, or read-write. It is **good practice** to provide the intent option on arguments, not only for readability but for compiler optimization as well.  

Within a function the result variable **must** be declared and then assigned a value before the function returns. Here we declare `success` as a logical. The variable is first set to `.true.` and then set to `.false.` if validation fails at any index i along **C**.

```Fortran
    ! Set the return type of the function
    logical :: success
    success = .true.

    ! Loop over all indices and check tensor addition
    do i=1, N
        scratch = A(i) + B(i)
        ! Spacing is an intrinsic function to get the spacing from
        ! one floating point representation to the next
        upper = scratch + eps_mult*abs(spacing(scratch))
        lower = scratch - eps_mult*abs(spacing(scratch))
        if (.not. ( (lower<=C(i)) .and. (C(i)<=upper) ) ) then
            write(*,*) "Error, tensor addition did not work at index = ", i
            success = .false.
            return
        end if
    end do
```
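The tolerance test in the loop can be tried in isolation. The sketch below wraps the same comparison in a hypothetical function `within_tolerance` (a name introduced here for illustration) and applies it to values that are one and eight floating point spacings away from a benchmark answer.

```Fortran
program spacing_demo
    implicit none

    real :: expected, near, far

    expected = 0.1 + 0.2

    ! One floating point spacing above the benchmark answer
    near = expected + spacing(expected)

    ! Eight floating point spacings above the benchmark answer
    far = expected + 8.0*spacing(expected)

    ! With eps_mult = 2.0, near should pass and far should fail
    write(*,*) 'near within tolerance: ', within_tolerance(expected, near, 2.0)
    write(*,*) 'far within tolerance:  ', within_tolerance(expected, far, 2.0)

contains

    function within_tolerance(benchmark, computed, eps_mult) result(ok)
        !! Hypothetical helper: is computed within eps_mult
        !! floating point spacings of benchmark?

        real, intent(in) :: benchmark, computed, eps_mult
        logical :: ok
        real :: upper, lower

        upper = benchmark + eps_mult*abs(spacing(benchmark))
        lower = benchmark - eps_mult*abs(spacing(benchmark))
        ok = (lower <= computed) .and. (computed <= upper)
    end function within_tolerance

end program spacing_demo
```

Scaling the tolerance by `spacing` means that the check adapts to the magnitude of the numbers being compared, rather than relying on a fixed absolute threshold.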

Functions must end with an `end function` statement. It is good practice to also include the name of the function being ended.

### Subroutines

Subroutines are similar to functions. However, they have **no return value** and are intended for situations where the arguments may be modified. In the file [tensoradd_function.f90](tensoradd_function.f90) we define a subroutine that performs the tensor addition (kernel) math at every point in the input arrays **A** and **B**, putting the result in **C**. The rules and best practices for subroutine arguments are similar to those for functions, except that we do not need to declare and set a return variable.

```Fortran
! Run kernel math to perform tensor addition at 
! every point in input tensors A and B. 
! Put the result in C
subroutine launch_kernel(A, B, C, N)
    !! Run a tensor addition kernel over the elements of A, B, and C

    real, pointer, dimension(:), intent(inout) :: A, B, C
        !! Pointers to memory allocations

    integer, intent(in) :: N
        !! Total length of the tensors

    integer :: i
        !! Index into tensors

    ! Now run the kernel math at every point in the array
    do i=1,N
        C(i) = A(i) + B(i)
    end do
    
end subroutine launch_kernel
```

### Interfaces

Interfaces are usually only necessary when an external function or subroutine (such as a C function) is called from a Fortran program. However, since we **pass Fortran pointers** to `check` and `launch_kernel`, and these are defined **outside the program**, we also need to define an `interface` to them within the body of the program. An interface is just a declaration of the functions and subroutines, along with the data types of their arguments and return values. In the file [tensoradd_function.f90](tensoradd_function.f90) the interface for `check` and `launch_kernel` is declared along with the variable declarations.

```Fortran
    interface
        subroutine launch_kernel(A, B, C, N)
            real, pointer, dimension(:), intent(inout) :: A, B, C
            integer, intent(in) :: N
        end subroutine launch_kernel

        function check(A, B, C, N, eps_mult) result(success)
            real, pointer, dimension(:), intent(in) :: A, B, C
            integer, intent(in) :: N
            real, intent(in) :: eps_mult
            logical :: success
        end function check
    end interface
```

Constructing an interface for an external function is rather tedious and error prone. We could have avoided this by placing the code for `launch_kernel` and `check` after a `contains` statement at the end of the main program.

```Fortran
    contains

    ! The subroutine "launch_kernel" and the checking function "check"
    ! could also have gone here after the "contains" statement.
    ! Then the interface would not be required

end program tensoradd
```    
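The following is a minimal sketch of that alternative, using a hypothetical subroutine `add_tensors` rather than the course's own routines. Because the subroutine sits after `contains`, the compiler already knows its interface and pointer arguments can be passed without an `interface` block.

```Fortran
program contains_demo
    implicit none

    integer, parameter :: N = 4
    real, pointer, dimension(:) :: A => null(), B => null(), C => null()
    integer :: i
    logical :: ok

    allocate(A(N), B(N), C(N))

    ! A = [1, 2, 3, 4] and B = 10 everywhere
    A = [(real(i), i=1,N)]
    B = 10.0

    ! No interface block is needed for an internal procedure
    call add_tensors(A, B, C, N)

    ok = all(C == [11.0, 12.0, 13.0, 14.0])
    write(*,*) 'C = ', C, ' correct = ', ok

    deallocate(A, B, C)

contains

    subroutine add_tensors(X, Y, Z, num)
        !! Hypothetical internal subroutine, Z = X + Y

        real, pointer, dimension(:), intent(in) :: X, Y
        real, pointer, dimension(:), intent(inout) :: Z
        integer, intent(in) :: num
        integer :: j

        do j=1,num
            Z(j) = X(j) + Y(j)
        end do
    end subroutine add_tensors

end program contains_demo
```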

### Calling a function and subroutine

Once this is complete, we can call the `launch_kernel` subroutine from the main program to run the tensor addition kernel math over the array allocations, and then call the `check` function to make sure the kernel passes validation.

```Fortran
! Call the tensor addition kernel for each element of the array
call launch_kernel(A_h, B_h, C_h, N)

! Call the check function to check the answer
success = check(A_h, B_h, C_h, N, eps_mult)
```

### Run the example code

Run the compiled application `tensoradd_function` and check that it passes validation.

!source ../env; tensoradd_function

 Tensor addition passed validation.


## Modules

Modules are a way to keep data, and the procedures that work on that data, in one location in your code. A module may be made available to any part of the program, so the variables and routines in a module are global in nature. Modules can be defined in any file.

### Example program with modules

The file [tensor_lib.f90](tensor_lib.f90) contains a Fortran module that holds the tensors **A_h**, **B_h**, and **C_h** and everything needed to work on them. If you open that file we can go through it line by line:

### Definition

Modules begin with the `module` statement followed by the name of the module:

```Fortran
module tensor_lib
    !! Library module to work with tensors
    !! Written by Dr. Toby Potter and Dr. Joseph Schoonover
```

As with subroutines and functions, a module is ended with the `end module` statement. 

```Fortran
end module tensor_lib
```

### Module variables

At the beginning of a module you can put any number of `use` statements, followed by declarations of variables that are accessible to every function or subroutine in the module. Here we declare a logical variable `allocd` to hold whether or not the memory for **A_h**, **B_h** and **C_h** has been allocated.

```Fortran
    implicit none

    ! Have we already allocated memory?
    logical :: allocd = .false.

    ! Number of elements in the vectors
    integer :: N

    ! Pointers to memory on the host
    real, pointer, dimension(:) :: A_h => null(), B_h => null(), C_h => null()
```

### Make objects and routines private

By default, variables and routines in a module are publicly visible from the outside (i.e. when the module is used from a program or another module). To protect variables or routines we list them in a `private` statement, like this:

```Fortran
    ! Declare private anything that we would not like 
    ! to make public
    private :: allocd, N, kernel
```
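A small sketch of the effect, using a hypothetical module `counter_lib` that is not part of the course files: the variable `n_calls` is private, so the program can only reach it through the public routines `increment` and `get_count`.

```Fortran
module counter_lib
    !! Hypothetical module to illustrate the private statement
    implicit none

    ! Module variable that we want to protect
    integer :: n_calls = 0

    ! n_calls is hidden from any code that uses this module
    private :: n_calls

contains

    subroutine increment()
        n_calls = n_calls + 1
    end subroutine increment

    function get_count() result(val)
        integer :: val
        val = n_calls
    end function get_count

end module counter_lib

program private_demo
    use counter_lib
    implicit none

    call increment()
    call increment()

    ! Referring to n_calls directly here would not compile
    write(*,*) 'number of calls = ', get_count()

end program private_demo
```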

### Module functions and subroutines

Following a `contains` statement you are free to define any functions or subroutines that are part of the module. In the rest of the module we define:

* Function `check` to check the result of tensor addition
* Subroutine called `init_mem` to allocate memory for the tensors
* Subroutine `kernel` to perform kernel math at a single index `i` in the tensors **A_h**, **B_h**, and **C_h**
* Subroutine `launch_kernel` to launch an instance of `kernel` at every point along the tensors.
* Subroutine called `free_mem` to de-allocate memory in the tensors.

With this design, the code for allocating and de-allocating memory is in one place, and one can allocate and de-allocate memory as many times as needed.

### Use the created module

In the file [tensoradd_module.f90](tensoradd_module.f90) we import the module with the `use` statement. Here we use the `only` option to import just the `check` function, the `init_mem`, `free_mem` and `launch_kernel` subroutines, and the variables `A_h` and `B_h`. During import we can rename anything using the `=>` operator. Here we say that `init_mem` from the module is to be used under the name `alloc_mem` in the program.

```Fortran
    ! The "only" option helps to know where things came from.
    ! We can use the "=>" operator to give things from a module a
    ! local name, in the form local_name => name_in_module
    use tensor_lib, only : check, alloc_mem => init_mem, &
        free_mem, launch_kernel, A_h, B_h
```

Now each of these subroutines and variables is available for use. The benefit of this design is that the tasks of allocating, storing and deallocating memory are all performed in one place in the module.

```Fortran
    ! Allocate memory 
    call alloc_mem(N)

    ! Fill arrays with random numbers using the
    ! Fortran intrinsic function "random_number"
    call random_number(A_h)
    call random_number(B_h)

    ! Run the kernel function over each element of the array
    call launch_kernel

    ! Check the answer
    success = check(eps_mult)

    ! Release resources
    call free_mem
```
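The same pattern can be sketched in a condensed form. The module `mini_tensor_lib` below is a hypothetical, cut-down stand-in for `tensor_lib` with a single tensor. It shows the private `allocd` flag guarding repeated allocation, and `init_mem` being renamed to `alloc_mem` on import.

```Fortran
module mini_tensor_lib
    !! Hypothetical, cut-down version of a tensor library
    implicit none

    ! Have we already allocated memory?
    logical :: allocd = .false.

    ! Pointer to memory on the host
    real, pointer, dimension(:) :: A_h => null()

    private :: allocd

contains

    subroutine init_mem(N_in)
        integer, intent(in) :: N_in

        ! Release any earlier allocation before making a new one
        if (allocd) call free_mem

        allocate(A_h(N_in))
        A_h = 0.0
        allocd = .true.
    end subroutine init_mem

    subroutine free_mem
        if (allocd) then
            ! deallocate also leaves the pointer disassociated
            deallocate(A_h)
            allocd = .false.
        end if
    end subroutine free_mem

end module mini_tensor_lib

program rename_demo
    ! init_mem is known as alloc_mem inside this program
    use mini_tensor_lib, only : alloc_mem => init_mem, free_mem, A_h
    implicit none

    call alloc_mem(8)
    write(*,*) 'size after first allocation  = ', size(A_h)

    ! Allocating again is safe because the module tracks its own state
    call alloc_mem(16)
    write(*,*) 'size after second allocation = ', size(A_h)

    call free_mem
    write(*,*) 'still associated = ', associated(A_h)

end program rename_demo
```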

### Run the example code

Run the compiled application `tensoradd_module` and check that it passes validation.

!source ../env; tensoradd_module

 Tensor addition passed validation.


## Integrate Fortran and C code

Integrating Fortran with C can be challenging. According to [this source](https://fftw.org/doc/Allocating-aligned-memory-in-Fortran.html#:~:text=7.5%20Allocating%20aligned%20memory%20in%20Fortran&text=Unfortunately%2C%20standard%20Fortran%20arrays%20do,the%20fftw_alloc_real%20and%20fftw_alloc_complex%20functions.), standard Fortran arrays are not guaranteed to be aligned to any particular multiple of bytes. Furthermore, aligned memory on the host is crucial when working with frameworks like OpenCL. Whenever you integrate Fortran and C code it is therefore a good idea to have a C routine allocate memory with whatever alignment you need, and then wrap a Fortran pointer around the allocated memory.

### Example program that integrates Fortran and C

In the file [tensor_lib_c.f90](tensor_lib_c.f90) is a Fortran module that provides the same functions and subroutines as [tensor_lib.f90](tensor_lib.f90), namely:

* A function called `check` to check the result of tensor addition
* Subroutine called `init_mem` to allocate memory for the tensors
* Subroutine `kernel` to perform kernel math at a single index `i` in the tensors **A_h**, **B_h**, and **C_h**
* Subroutine `launch_kernel` to launch an instance of `kernel` at every point along the tensors.
* Subroutine called `free_mem` to de-allocate memory in the tensors.

However in the `tensor_lib` module in [tensor_lib_c.f90](tensor_lib_c.f90) we are using calls to C functions to provide all the backend functionality. In the file [c_functions.cpp](c_functions.cpp) are four C functions, three of which will be called from [tensor_lib_c.f90](tensor_lib_c.f90):

* **c_kernel** runs the kernel at element $i$ of arrays `A`, `B`, and `C`
* **launch_c_kernel** runs c_kernel over every element in `A`, `B`, and `C`.
* **c_alloc** allocates memory that is aligned to a 128 Byte boundary.
* **c_free** frees memory that was previously allocated with **c_alloc**.

### Prepare C code

Functions in a C++ source file can be made callable from Fortran by wrapping them in an `extern "C"` block, as we have done in [c_functions.cpp](c_functions.cpp). This gives the functions C linkage, meaning that the function names won't be mangled during compilation and the Fortran code will be able to call them.

```C++
extern "C" {

    // Function to launch c_kernel at every point
    // in the domain of A, B, and C
    void launch_c_kernel(
        float_type* A,
        float_type* B,
        float_type* C,
        int N) {

        // Launch an instance of c_kernel
        // at every point i in tensors A, B, C
        // and run c_kernel over it
        for(int i=0; i<N; i++) {
            c_kernel(A, B, C, i, N);
        }
    }

    // Function to allocate memory
    void* c_alloc(size_t nbytes) {
        
        // Allocate memory that is aligned 
        // to a 128 byte boundary
        void* ptr = aligned_alloc(128, nbytes);

        //void* ptr = calloc(1, nbytes);
        if (!ptr) {
            printf("Memory allocation failed, exiting.\n");
            exit(1);
        }

        // If all is good, initialise the memory to 0
        std::memset(ptr, '\0', nbytes);
        
        return ptr;
    }

    // Function to free memory
    void c_free(void* ptr) {
        free(ptr);
    }
}
```

### Prepare Fortran code

Fortran code that calls C must have an `interface` block that declares the C functions along with their arguments. C functions with a `void` return type are regarded as **subroutines** in Fortran, and C functions with any other return type are regarded as **functions**. Before we begin we need to use the `iso_c_binding` module, which gives us access to C data types as well as utilities and definitions that are helpful when working with C.

```Fortran
    ! Use the ISO C binding module
    use iso_c_binding
```

#### Define an interface

Here is the Fortran interface in [tensor_lib_c.f90](tensor_lib_c.f90) for the C functions defined above. Note that each function and subroutine defined in the interface also uses the `iso_c_binding` module to define C datatypes.

```Fortran
    ! Interface to C kernel functions
    interface
    
        ! Fortran regards a C function with void return type
        ! as a subroutine 
        ! This is the fortran interface to the C function
        subroutine launch_c_kernel(A, B, C, N) bind(C)
            use iso_c_binding
            ! Fortran passes by reference as the default
            ! Must have the "value" option present to pass by value
            ! Otherwise launch_c_kernel will receive pointers of type void**
            ! instead of void*
            type(c_ptr), value :: A, B, C
            integer(c_int), value :: N
        end subroutine launch_c_kernel

        ! C function to allocate memory
        function c_alloc(nbytes) result(ptr) bind(C)
            use iso_c_binding
            ! Make sure we have the value option set
            ! to pass by value
            integer(c_size_t), intent(in), value :: nbytes
            type(c_ptr) :: ptr
        end function c_alloc

        ! C function to free memory 
        subroutine c_free(ptr) bind(C)
            use iso_c_binding
            ! Make sure we have the value option set
            ! to pass by value
            type(c_ptr), intent(in), value :: ptr
        end subroutine c_free
        
    end interface
```

Every function or subroutine defined in the interface has the same number of arguments as the corresponding C function in [c_functions.cpp](c_functions.cpp). Also notice the `bind(C)` attribute on each one, which binds the Fortran name (without mangling) to the name of the corresponding C function.

#### Standard datatypes

In Fortran the number of bytes used to represent default reals or integers may be easily changed with a compiler option. When passing memory between Fortran and C it is a **very good idea** to use standardised data types from the `iso_fortran_env` module. In [tensor_lib_c.f90](tensor_lib_c.f90) we use the **real32** kind from `iso_fortran_env` to make sure that the pointers always have a consistent datatype that is compatible with `float` in C.

```Fortran
real(kind=real32), pointer, dimension(:) :: A_h => null(), B_h => null(), C_h => null()
```

Alternatively, we could also have used the `c_float` kind from the `iso_c_binding` module. 
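A quick way to compare the two kinds on a given compiler is to query their storage size. This is a small sketch. On common platforms both kinds are expected to occupy 32 bits and to be the same kind, but it is worth checking on your own system.

```Fortran
program kind_demo
    use iso_fortran_env, only : real32
    use iso_c_binding, only : c_float
    implicit none

    real(kind=real32) :: x
    real(kind=c_float) :: y

    ! storage_size returns the size of a variable in bits
    write(*,*) 'bits in real(real32)  = ', storage_size(x)
    write(*,*) 'bits in real(c_float) = ', storage_size(y)

    ! Kind values are integers and can be compared directly
    write(*,*) 'real32 and c_float are the same kind = ', (real32 == c_float)

end program kind_demo
```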

#### Fortran equivalent of void*

The data types of arguments passed into and out of the functions and subroutines are defined in the interface. The Fortran `type(c_ptr)` is equivalent to `void*` in C/C++. Also notice that integers passed in can have a kind of `c_int` or `c_size_t`, corresponding to `int` and `size_t` in C/C++.

#### Pass by value instead

When functions are called from Fortran, remember that arguments are usually passed **by reference** instead of by **value**. This means that by default the C function will receive a **pointer** to the argument instead of the actual value of the argument. If you are trying to pass a pointer of `type(c_ptr)` then the C function by default will receive a **reference** to that pointer, which is a pointer to a pointer of type `void**`. You can modify this behaviour by adding the `value` option for each input argument in the interface. Then the **value** of each argument will be passed to the C function instead and the C function will receive a pointer of type `void*`.
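The effect of the `value` option can be seen without any C code. In this sketch the two hypothetical subroutines differ only in how their argument is passed: one receives a reference to the caller's variable, the other receives a copy.

```Fortran
program value_demo
    implicit none

    integer :: n_ref, n_val

    n_ref = 1
    n_val = 1

    call bump_by_reference(n_ref)
    call bump_by_value(n_val)

    ! n_ref was changed by the call, n_val was not
    write(*,*) 'n_ref = ', n_ref, ' n_val = ', n_val

contains

    subroutine bump_by_reference(n)
        ! Default behaviour: n refers to the caller's variable
        integer, intent(inout) :: n
        n = n + 1
    end subroutine bump_by_reference

    subroutine bump_by_value(n)
        ! With "value", n is a local copy of the caller's variable
        integer, value :: n
        n = n + 1
    end subroutine bump_by_value

end program value_demo
```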

### Call C from Fortran

In the module [tensor_lib_c.f90](tensor_lib_c.f90) we call C code from the Fortran subroutines `init_mem`, `launch_kernel` and `free_mem`. In `init_mem` we call the function `c_alloc` (defined in [c_functions.cpp](c_functions.cpp)) that allocates aligned memory and returns a C pointer. 

#### Fortran data types to C equivalents

When calling C functions it is **vital** that the input data type is the same type and precision as what the C function is expecting. We pass in an integer of type `size_t` by using the Fortran `int` function to cast the input argument (number of bytes to allocate) to an integer of type `c_size_t` defined in the `iso_c_binding` module.

```Fortran
! Allocate memory for arrays using C functions
temp_cptr = c_alloc(int(N_in*sizeof(temp_real), c_size_t))
```
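Note that `sizeof` is a compiler extension rather than part of the Fortran standard. As a sketch of a standard-conforming way to compute the same byte count, the `iso_c_binding` module provides `c_sizeof`, which already returns an integer of kind `c_size_t`. The variable names below mirror the snippet above, and the program itself is only illustrative.

```Fortran
program nbytes_demo
    use iso_c_binding, only : c_float, c_size_t, c_sizeof
    implicit none

    integer :: N_in
    real(c_float) :: temp_real
    integer(c_size_t) :: nbytes

    N_in = 1024

    ! Cast the element count to c_size_t before multiplying
    ! so that the arithmetic is carried out in the wider type
    nbytes = int(N_in, c_size_t)*c_sizeof(temp_real)

    ! Assuming a 4 byte C float, this is 4096 bytes
    write(*,*) 'bytes to allocate = ', nbytes

end program nbytes_demo
```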

#### C pointers to Fortran pointers

The returned value is of `type(c_ptr)`. We convert it to a Fortran pointer using the `c_f_pointer` subroutine. Because `A_h` is an array pointer, we also pass in a shape (`[N_in]`) for the created Fortran pointer.

```Fortran
call c_f_pointer(temp_cptr, A_h, [N_in])
```
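The conversion can be tried without a C allocator. In the sketch below the C pointer is obtained with `c_loc` from an ordinary Fortran array (named `storage` here for illustration) instead of from `c_alloc`, and `c_f_pointer` then wraps a Fortran pointer around the same memory.

```Fortran
program c_f_pointer_demo
    use iso_c_binding
    implicit none

    integer, parameter :: N = 5
    real(c_float), target :: storage(N)
    real(c_float), pointer, dimension(:) :: view => null()
    type(c_ptr) :: cptr
    integer :: i

    storage = [(real(i, c_float), i=1,N)]

    ! Stand-in for a pointer that would normally come from C
    cptr = c_loc(storage)

    ! Wrap a Fortran pointer of shape [N] around the C pointer
    call c_f_pointer(cptr, view, [N])

    ! Writing through the Fortran pointer changes the original memory
    view(2) = 42.0_c_float

    write(*,*) 'storage = ', storage

end program c_f_pointer_demo
```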

#### Fortran pointers to C pointers

The `launch_kernel` subroutine in [tensor_lib_c.f90](tensor_lib_c.f90) calls the C function `launch_c_kernel` from [c_functions.cpp](c_functions.cpp). It expects three arguments of `type(c_ptr)` and one integer of type `c_int`. We use the function `c_loc` to get a void pointer to the starting address for the Fortran pointers `A_h`, `B_h`, and `C_h`.

```Fortran
call launch_c_kernel( &
    c_loc(A_h), &
    c_loc(B_h), &
    c_loc(C_h), &
    int(N, c_int) &
)
```

One could also have called `c_loc` to get the address of the first element of `A_h`, `B_h`, and `C_h`, like this:

```Fortran
call launch_c_kernel( &
    c_loc(A_h(1)), &
    c_loc(B_h(1)), &
    c_loc(C_h(1)), &
    int(N, c_int) &
)
```

When the pointers have the default lower bound of 1 the address returned is the same. However, this is a **potential source of bugs** if the pointers are defined with a different lower bound (e.g. 0), because index 1 is then no longer the first element. In a similar manner we use `c_loc` again in the subroutine `free_mem` in [tensor_lib_c.f90](tensor_lib_c.f90) to get the C address of Fortran pointers for deallocation.

```Fortran
    ! De-allocate memory using calls to C functions
    call c_free(c_loc(A_h))
    call c_free(c_loc(B_h))
    call c_free(c_loc(C_h))
```
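The lower-bound pitfall with `c_loc` can be demonstrated with a short sketch. Here a hypothetical pointer `p` is given a lower bound of 0, and `c_associated` is used to compare the addresses that `c_loc` returns.

```Fortran
program c_loc_bounds_demo
    use iso_c_binding
    implicit none

    real(c_float), target :: storage(4)
    real(c_float), pointer, dimension(:) :: p => null()
    logical :: same_as_first, same_as_index_one

    storage = 0.0_c_float

    ! Point p at storage, but with a lower bound of 0 instead of 1
    p(0:) => storage

    ! c_loc(p) is the address of the first element, which is p(0)
    same_as_first = c_associated(c_loc(p), c_loc(p(0)))

    ! p(1) is now the second element, so its address differs
    same_as_index_one = c_associated(c_loc(p), c_loc(p(1)))

    write(*,*) 'c_loc(p) equals c_loc(p(0)) = ', same_as_first
    write(*,*) 'c_loc(p) equals c_loc(p(1)) = ', same_as_index_one

end program c_loc_bounds_demo
```

Passing the whole pointer to `c_loc` avoids having to know the lower bound at all, which is why it is the safer choice when freeing memory or launching kernels.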

#### Run the example code

Run the compiled application `tensoradd_cfun` and ensure that it passes validation.

!source ../env; tensoradd_cfun

 Tensor addition passed validation.


<address>
Written by Dr. Toby Potter of <a href="https://www.pelagos-consulting.com">Pelagos Consulting and Education</a> and Dr. Joe Schoonover from <a href="https://www.fluidnumerics.com">Fluid Numerics</a>. All trademarks mentioned in this page are the property of their respective owners.
</address> 