# Using Compiled Languages 

## First steps with python using numba

In [1]:
import numpy as np
import numba

Python is a nice object-oriented scripting language but it can run into performance issues. We will see a few examples below. To get better performance, one uses compiled languages, such as C, C++ and Fortran. We will also use the ``numba`` python library, which performs "just in time" compilation. In this lecture, however, we will mainly explore how to compile C and Fortran codes directly. In a later lecture, we will see how to interface these compiled functions with python.

Let's start with a simple example to see how badly python performs when not used properly. We define a simple function that uses an explicit python loop over the array elements, which is generally a very bad idea for numerical work in python.

In [2]:
def f_simple(X):
    Y = np.empty_like(X)
    for i in range(len(X)):
        x = X[i]
        Y[i] = x + x**2 + x**3 + x**4 + x**5 + x**6 + x**7 + x**8
    return Y

We create a random numpy array of moderate size.

In [3]:
x=np.random.normal(size=1_000_000)

We finally call the function and time it using the ``%%timeit`` cell magic provided by IPython.

In [4]:
%%timeit 
f_simple(x)

1.82 s ± 14.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


We see that the timing routine was run 7 times, each run using only 1 call to the function. We can change this to see how robust the time measurements are. The standard deviation seems indeed a bit large.

In [5]:
%%timeit -r 4 -n 4
f_simple(x)

1.84 s ± 1.23 ms per loop (mean ± std. dev. of 4 runs, 4 loops each)


We see that the measurement seems now more consistent with a smaller standard deviation. 
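
The ``-r`` and ``-n`` options correspond to the ``repeat`` and ``number`` arguments of the standard ``timeit`` module. The sketch below shows roughly how the summary line is built: each of the ``r`` runs executes the call ``n`` times, and the mean and standard deviation are taken over the per-loop times of the runs. The function ``work`` is a hypothetical stand-in for ``f_simple``, and the exact statistics used by the magic are an assumption here.

```python
import statistics
import timeit

def work(n=10_000):
    # Hypothetical stand-in for f_simple: a plain python loop.
    total = 0.0
    for i in range(n):
        total += i * 0.5
    return total

runs, loops = 4, 4  # same meaning as "-r 4 -n 4"

# Each entry is the total wall-clock time of `loops` consecutive calls.
totals = timeit.repeat(work, repeat=runs, number=loops)

# Convert to time per call, then summarize across the runs.
per_loop = [t / loops for t in totals]
mean = statistics.mean(per_loop)
std = statistics.pstdev(per_loop)

print(f"{mean * 1e3:.3f} ms ± {std * 1e3:.3f} ms per loop "
      f"(mean ± std. dev. of {runs} runs, {loops} loops each)")
```

Increasing ``loops`` averages out short-lived fluctuations inside each run, which is one reason the spread between runs can shrink.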

Let's now write more idiomatic python, replacing the explicit loop with direct numpy array notation.

In [6]:
def f_numpy(X):
    return X + X**2 + X**3 + X**4 + X**5 + X**6 + X**7 + X**8

In [7]:
%%timeit -r 4 -n 4
f_numpy(x)

357 ms ± 963 µs per loop (mean ± std. dev. of 4 runs, 4 loops each)


Wow! It is indeed much faster. Too fast even... Let's use a bigger array.

In [8]:
x=np.random.normal(size=10_000_000)

In [9]:
%%timeit -r 4 -n 4
f_numpy(x)

3.64 s ± 19 ms per loop (mean ± std. dev. of 4 runs, 4 loops each)


These multiple powers are probably slow to evaluate. Let's use a nice trick to avoid having to call these expensive operations.

In [10]:
def f_numpy_2(X):
    return X*(1 + X*(1 + X*(1 + X*(1 + X*(1 + X*(1 + X*(1 + X)))))))

In [11]:
%%timeit -r 4 -n 4
f_numpy_2(x)

143 ms ± 644 µs per loop (mean ± std. dev. of 4 runs, 4 loops each)


Wow! Another dramatic improvement!
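
The nested form used in ``f_numpy_2`` is known as Horner's scheme: it evaluates the same polynomial with only 7 multiplications and 7 additions, and no power operations at all. As a quick sanity check, the sketch below compares both forms on a few scalar values. The helper names ``poly_powers`` and ``poly_horner`` are hypothetical and plain python floats are used instead of numpy arrays.

```python
import math

def poly_powers(x):
    # Direct form: seven power evaluations and seven additions.
    return x + x**2 + x**3 + x**4 + x**5 + x**6 + x**7 + x**8

def poly_horner(x):
    # Nested (Horner) form: seven multiplications and seven additions.
    return x*(1 + x*(1 + x*(1 + x*(1 + x*(1 + x*(1 + x*(1 + x)))))))

# Both forms agree up to floating point rounding.
for x in (-1.5, -0.3, 0.0, 0.7, 2.0):
    assert math.isclose(poly_powers(x), poly_horner(x),
                        rel_tol=1e-9, abs_tol=1e-12)

print(poly_horner(2.0))  # 2 + 4 + ... + 256 = 510.0
```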

We have more or less reached the best we can do using python and numpy alone. We will now try to use a nice python package called ``numba`` that allows one to perform _just in time compilation_. What ``numba`` does is to first translate the python function into a low-level intermediate representation (it relies on the LLVM compiler library rather than on C source code) and then to compile it into machine code on the fly. The performance of the resulting function is usually much higher. Since the function is now compiled, you don't need to worry about using loops directly anymore. In fact, explicit loops are often the most natural way to write code for ``numba``.

Let's see how we can optimize our function using ``numba``.

In [12]:
@numba.jit(nopython=True)
def f_numba(X):
    Y = np.empty_like(X)
    for i in range(len(X)):
        x = X[i]
        Y[i] = x + x**2 + x**3 + x**4 + x**5 + x**6 + x**7 + x**8
#        Y[i] = x*(1 + x*(1 + x*(1 + x*(1 + x*(1 + x*(1 + x*(1 + x)))))))
    return Y

Note that we have used the _decorator_ ``@numba.jit`` that tells ``numba`` to compile the function into machine code. Using the option ``nopython=True`` forces ``numba`` to compile the whole function without going through the python interpreter. If ``numba`` fails to do it, an error will follow. Without this option, and depending on the version of ``numba``, code that cannot be compiled may instead fall back to a much slower mode that still relies on the python interpreter.

Let's now time the resulting _compiled_ function.

In [13]:
%%timeit -r 4 -n 4
f_numba(x)

The slowest run took 8.62 times longer than the fastest. This could mean that an intermediate result is being cached.
63.5 ms ± 72.1 ms per loop (mean ± std. dev. of 4 runs, 4 loops each)


This is now really fast! This is the main advantage of using a compiled language. The standard deviation, however, is quite large when compared to the mean. This is because ``numba`` compiles the function the first time it is called, so the timer is also counting this extra compilation time in the first run (hence the warning about the slowest run). To reduce the relative weight of this overhead, we can use an even bigger array. Note that we could also have used the ``cache=True`` option of ``numba`` but this is beyond the scope of this lecture.
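
To make the effect of this one-time cost concrete, here is a toy model in pure python. It is not how ``numba`` works internally: the decorator ``toy_jit`` is hypothetical and simply sleeps for 50 ms on the first call to mimic a compilation step. It only illustrates why the first timed call stands out from the following ones.

```python
import functools
import time

def toy_jit(func):
    """Toy stand-in for a JIT decorator: pays a one-time cost on first call."""
    state = {"compiled": False}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if not state["compiled"]:
            time.sleep(0.05)  # pretend to compile (hypothetical 50 ms cost)
            state["compiled"] = True
        return func(*args, **kwargs)

    return wrapper

@toy_jit
def square_sum(n):
    return sum(i * i for i in range(n))

# Time four consecutive calls; only the first one includes the fake compilation.
timings = []
for _ in range(4):
    start = time.perf_counter()
    square_sum(1000)
    timings.append(time.perf_counter() - start)

print([f"{t * 1e3:.2f} ms" for t in timings])
```

A common alternative to using a bigger array is to call the compiled function once before timing it, so that the compilation is already done when the measurement starts.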

In [14]:
x=np.random.normal(size=100_000_000)

In [15]:
%%timeit -r 4 -n 4
f_numba(x)

238 ms ± 476 µs per loop (mean ± std. dev. of 4 runs, 4 loops each)


Using a compiler also allows us to use parallel computing. We will see in future lectures how to program in parallel. For the time being, we just trust ``numba`` to do it for us. To parallelize a ``numba`` function, just add the ``parallel=True`` option and replace the ``range`` function defining the loop by the parallel function ``numba.prange`` which defines the method to divide up the loop into parallel tasks.

In [16]:
@numba.jit(nopython=True, parallel=True)
def f_numba_para(X):
    Y = np.empty_like(X)
    for i in numba.prange(len(X)):
        x = X[i]
        Y[i] = x + x**2 + x**3 + x**4 + x**5 + x**6 + x**7 + x**8
    return Y

In [19]:
%%timeit -r 4 -n 40
f_numba_para(x)

26 ms ± 276 µs per loop (mean ± std. dev. of 4 runs, 40 loops each)
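
The idea of dividing a loop into parallel tasks can be sketched in pure python. The example below splits the index range into contiguous chunks and hands each chunk to a worker thread. This is only an illustration of the principle: the helper names are hypothetical, the chunking strategy is an assumption and not necessarily what ``numba`` does, and in standard python, threads running pure python code give no real speedup. The key property is that each task writes to its own part of ``Y``, so the tasks are independent.

```python
from concurrent.futures import ThreadPoolExecutor

def poly(x):
    return x*(1 + x*(1 + x*(1 + x*(1 + x*(1 + x*(1 + x*(1 + x)))))))

def chunk_bounds(n, workers):
    """Split range(n) into contiguous, nearly equal [start, stop) pieces."""
    base, extra = divmod(n, workers)
    bounds, start = [], 0
    for w in range(workers):
        stop = start + base + (1 if w < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds

def process_chunk(X, Y, start, stop):
    # Each task only writes to its own slice of Y, so tasks never conflict.
    for i in range(start, stop):
        Y[i] = poly(X[i])

X = [0.1 * i for i in range(10)]
Y = [0.0] * len(X)

bounds = chunk_bounds(len(X), workers=3)
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(process_chunk, X, Y, a, b) for a, b in bounds]
    for f in futures:
        f.result()  # re-raise any exception from a worker

print(bounds)  # [(0, 4), (4, 7), (7, 10)]
```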


This ends our journey towards better and better performance using python. We started with an explicit loop within a pure python function and a pretty awful timing of roughly 200 s for 100 million elements (extrapolated from the 1.82 s measured for one million elements). We ended with compiled parallel machine code generated by ``numba`` and roughly 26 ms of execution time, an amazing speedup of nearly four orders of magnitude (about 7000x).
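
The figures quoted above can be checked with a few lines of arithmetic. The sketch below takes the two timings reported by ``%%timeit`` in this notebook and assumes (this is an assumption, not a measurement) that the cost of the pure python loop grows linearly with the array size.

```python
t_loop_1e6 = 1.82    # s, f_simple on 1_000_000 elements (measured above)
t_para_1e8 = 26e-3   # s, f_numba_para on 100_000_000 elements (measured above)

# Assumption: the python loop scales linearly with the number of elements.
t_loop_1e8 = t_loop_1e6 * 100

speedup = t_loop_1e8 / t_para_1e8
print(f"extrapolated loop time: {t_loop_1e8:.0f} s, speedup: {speedup:.0f}x")
```

The result is a speedup of several thousand, consistent with the order of magnitude quoted above.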

## Linux and the Terminal window

To understand better what ``numba`` is doing under the hood, we now switch to using the Terminal window. We will edit our C or Fortran codes using an editor (could be ``vim`` or ``emacs``). We will then compile our C or Fortran code using a compiler. Before we get there, let's first get some practice in the Terminal window.

In your jupyter notebook Home page, hit the **New** button, this time choosing the **Terminal** option.
You should see a prompt like ``$`` and a cursor. Just type:
``$ ls``. You should see the content of the course directory in the Terminal window.

In this jupyter notebook, you can execute the same command line instructions by putting a ``!`` in front of them.

In [21]:
!ls

Compiled_Languages.ipynb  compiled.md


Here is a list of very useful Linux commands that you have to know by heart.

| Command | Examples | Description |
| :----------- | :----------- | :----------- |
| ``ls`` | ``ls``<br> ``ls -als`` | List files in current directory <br> List in long format including hidden files and file sizes|
| ``cd`` | ``cd ..`` <br> ``cd week9`` <br> ``cd ~bob/se-for-sci/content``| Change to parent directory <br> Change to directory ``week9`` <br> Change to target directory inside Bob's SE course directory|
| ``mkdir`` | ``mkdir test``| Creating a new directory called ``test`` |
| ``rmdir`` | ``rmdir test`` | Removing the directory called ``test`` |
| ``cp`` | ``cp file1.txt file2.txt`` <br> ``cp ~bob/file1.txt .`` <br> ``cp ~bob/* .`` <br> ``cp -r ~bob/se-for-sci .`` | Copy ``file1.txt`` into a new file called ``file2.txt`` <br> Copy the file called ``file1.txt`` in Bob's home directory into a new file locally keeping the same name <br> Copy all the files in Bob's home directory locally giving them the same name <br> Copy recursively the entire content of Bob's SE course directory locally keeping the same names | 
| ``rm`` | ``rm file1.txt`` <br> ``rm -rf *`` | Remove only the file called ``file1.txt`` <br> Remove recursively all files and directories without asking permission (very dangerous) |
| ``mv`` | ``mv ~bob/file1.txt file2.txt`` | Move one file into another location and with a new name |
| ``more`` | ``more file1.txt`` | Look at the file content one page at a time |
| ``man`` | ``man more`` | Look at the manual for a given Linux command |
| ``grep`` | ``grep Hello file1.txt`` | Search for string ``Hello`` inside the file ``file1.txt`` |

Try now to play with these different commands in the Terminal window. In the remainder of the lecture, we will have to use the Terminal window again so get used to it!

## Compiling a C code

Now let's move to the core of the lecture, namely learning how to compile actual code. We will start simple with the famous ``Hello world`` example. The cell below will write a C file called ``hello.c``. 

In [7]:
%%writefile hello.c
#include <stdio.h>
int main() {
   // This is a comment 
   printf("Hello, World!\n");
   return 0;
}

Overwriting hello.c


You can check in parallel in the Terminal window that this file was properly created using the ``$ more hello.c`` command.

The first line is an **include** statement. It tells the compiler to insert at this point of the file the content of another file called ``stdio.h``, which is part of the C standard library. As the name indicates, this file declares the standard Input/Output C functions. The function we use here is ``printf``, which prints the character string ``Hello, World!`` to the screen. Comments are defined in C using the ``//`` directive.

In this lecture, we will not teach the basics of the C language. We only focus on the compilers. If you need more details on the C syntax, please use the web as a never ending source of information.  

To compile the code, we now need to use a compiler. On most Linux systems, you will find by default the GNU compiler called ``gcc``. More resources on the GNU C compiler can be found [here](https://gcc.gnu.org). The command to compile our simple ``hello.c`` code is as follows:

In [8]:
!gcc hello.c

This command creates a new file called ``a.out`` which is the **executable** of your code. You can check that it is indeed here by typing:

In [9]:
!ls

Compiled_Languages.ipynb cmake_example            hello.c
a.out                    compiled.md


You can now run this executable by typing:

In [10]:
!./a.out

Hello, World!


Congratulations! You succeeded in running your first compiled code!

In the previous cell, the symbol ``!`` is used to execute from the jupyter notebook a Linux command. In the Terminal window, you can try and execute ``$ ./a.out`` where ``$`` is the prompt (don't type the ``$`` symbol, it should be already there!). The dot-slash ``./`` means execute the code that sits here, in this directory.

Note that if you type only ``$ a.out`` it won't work.

In [11]:
!a.out

zsh:1: command not found: a.out


Indeed, the operating system wasn't able to find the executable anywhere in the system. 

For that, you need to define the ``PATH`` variable that contains the path to your executables. 

Try now to type in the Terminal window:

``export PATH=~/se-for-sci/content/week9:$PATH``

``echo $PATH``

You should see a long list of directories containing all the executables accessible to you, including:

``<yourhomedir>/se-for-sci/content/week9``

You can now type ``a.out`` in the Terminal window and it will work like a charm.

Note that this jupyter notebook inherits the ``PATH`` the system had when you launched it with the command ``jupyter notebook``. An ``export`` typed after a ``!`` only affects the temporary shell started for that single command, so it does not change the ``PATH`` seen by later cells. This is why you need to use the Terminal window for this little exercise we just did.
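
The search that the shell performs through the directories listed in ``PATH`` can be reproduced with the standard python function ``shutil.which``. The sketch below assumes a Linux or macOS system. It creates a small executable script with the hypothetical name ``hello_path_demo`` in a temporary directory, then looks it up twice: once with the current ``PATH``, and once with the temporary directory added in front, as in the ``export`` command above.

```python
import os
import shutil
import tempfile

# Throwaway directory with a tiny executable (the name is hypothetical).
workdir = tempfile.mkdtemp()
exe = os.path.join(workdir, "hello_path_demo")
with open(exe, "w") as f:
    f.write("#!/bin/sh\necho Hello from PATH\n")
os.chmod(exe, 0o755)  # mark the file as executable

current = os.environ.get("PATH", "")

# Not found: workdir is not among the directories listed in PATH.
before = shutil.which("hello_path_demo", path=current)

# Found: same search, with workdir added in front of the list.
after = shutil.which("hello_path_demo", path=workdir + os.pathsep + current)

print(before)
print(after)
```

The first lookup returns ``None`` and the second returns the full path of the script, which mirrors the ``command not found`` error and its fix.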

As you have probably guessed, ``a.out`` is the default name of the executable. If you want to give it a proper name, use the ``-o`` option.

In [16]:
!gcc hello.c -o hello

In [17]:
!ls

Compiled_Languages.ipynb cmake_example            hello
a.out                    compiled.md              hello.c


We now have a new file called ``hello`` which is our new executable.

In [19]:
!./hello

Hello, World!


Let's now move to a more complicated task. We would like to reproduce the exercise we did using python and ``numba`` but this time directly ourselves using C. 

Here is the C code that implements the power function we used before.

In [23]:
%%writefile power.c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(int argc, char *argv[])
{
    int i,n=100000000;
    float x,y;
    
    printf("%i\n",n);
    for (i=0;i<n;i++){
        x=rand();
        y=x+pow(x,2)+pow(x,3)+pow(x,4)+pow(x,5)+pow(x,6)+pow(x,7)+pow(x,8);
    }
    return 0;
}

Overwriting power.c


We will not dwell on the new C syntax introduced here: declaring integer and floating point variables, a for loop and the external functions ``rand()`` and ``pow()``.

The key point is that we now need more ``include`` statements to be able to use these external functions. The header ``stdlib.h`` declares ``rand()`` and the header ``math.h`` declares ``pow()``. The functions declared in ``stdlib.h`` belong to the standard C library, which the compiler links by default. The ``pow()`` function does not: on many systems it lives in a separate math library, and we need to tell the compiler to look there to find it.

This is done with the ``-l`` option, which tells the compiler to **link** your code with an external library of already compiled functions. In our case, the name of the **math** library is simply ``m``, so we have to type the command:

In [33]:
!gcc power.c -o power -lm

Note that the outcome of this compilation might depend on your system. Some systems need the ``-lm`` linking option, others don't.

To see what version of the compiler you use, just type:

In [34]:
!gcc --version

Apple clang version 13.1.6 (clang-1316.0.21.2)
Target: x86_64-apple-darwin21.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin


Let's now try to execute our new code using the Linux command ``time``:

In [35]:
!time ./power

100000000
./power  14.63s user 0.06s system 97% cpu 15.112 total


This is very disappointing! The reason for this poor performance is the function ``pow()`` that works for any floating point power, not just integer powers like we need here. Let's use the same trick we used above for our python code and rewrite our C code as follows:

In [36]:
%%writefile mult.c
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int i,n=100000000;
    float x,y;
    
    printf("%i\n",n);
    for (i=0;i<n;i++){
        x=rand();
        y=x*(1+x*(1+x*(1+x*(1+x*(1+x*(1+x*(1+x)))))));
    }
    return 0;
}


Overwriting mult.c


Note that we don't use the ``math`` library anymore. We can compile now using:

In [37]:
!gcc mult.c -o mult

In [38]:
!time ./mult

100000000
./mult  1.94s user 0.01s system 93% cpu 2.083 total


Much better! We can also compile the code using the optimization option ``-O`` (capital O), which lets us choose between various degrees of optimization, from ``-O0``, which corresponds to basically zero optimization, to ``-O3``, which allows the compiler to aggressively rewrite parts of your code to make it faster.

Let's try to optimize our executable using:

In [39]:
!gcc -O3 mult.c -o mult

In [40]:
!time ./mult

100000000
./mult  0.70s user 0.01s system 70% cpu 1.011 total


Indeed, much better! We have now reached a level of performance comparable to that of ``numba``, but we did it ourselves.

## Compiling a Fortran code

In [31]:
%%writefile hello.f90
program hello

write(*,*)"Hello world!"

end program hello

Writing hello.f90


In [32]:
!gfortran hello.f90 -o hello

In [33]:
!./hello

 Hello world!


In [71]:
%%writefile power.f90
program power

    real(kind=8)::x, y
    integer::i,n=100000000

    write(*,*)n
    do i=1,n
        x=rand()
        y=x*(1+x*(1+x*(1+x*(1+x*(1+x*(1+x*(1+x)))))))
    enddo
    
end program power

Overwriting power.f90


In [72]:
!gfortran -O3 power.f90 -o power

In [73]:
!time ./power

   100000000

real	0m0.484s
user	0m0.482s
sys	0m0.000s


## Compiling a C++ code


In [52]:
%%writefile hello.cpp
#include <iostream>

int main() {
// This is a comment
    std::cout << "Hello World!";
    return 0;
}

Overwriting hello.cpp


In [53]:
!gcc hello.cpp -o hello -lstdc++

In [54]:
!./hello

Hello World!

In [74]:
%%writefile mult.cpp
#include <iostream>
#include <cstdlib>  // declares rand()

int main(int argc, char *argv[])
{
    int i,n=100000000;
    float x,y;
    
    std::cout << n;
    for (i=0;i<n;i++){
        x=rand();
        y=x*(1+x*(1+x*(1+x*(1+x*(1+x*(1+x*(1+x)))))));
    }
    return 0;
}


Overwriting mult.cpp


In [75]:
!gcc -O3 mult.cpp -o mult -lstdc++

In [76]:
!time ./mult

100000000
real	0m0.613s
user	0m0.609s
sys	0m0.002s


## Building a code with more than one file

In [121]:
%%writefile hello.f90
program main
    
    integer :: i=1

    call greetings(i)

end program main

Writing hello.f90


In [94]:
%%writefile greet.f90
subroutine greetings(i)

    integer, intent(in) :: i

    write(*,*)"Hello world!",i

end subroutine greetings

Overwriting greet.f90


In [95]:
!gfortran -c hello.f90

In [96]:
!gfortran -c greet.f90

In [97]:
!ls

a.out			  ex.sh~     hello.c	mult	  power.c
Compiled_Languages.ipynb  greet.f90  hello.cpp	mult.c	  power.f90
compiled.md		  greet.o    hello.f90	mult.cpp  power_para.f90
ex.sh			  hello      hello.o	power


In [98]:
!gfortran hello.o greet.o -o hello 

In [99]:
!./hello

 Hello world!            1


## Preprocessor directives

In [117]:
%%writefile greet.f90
subroutine greetings(i)

    integer, intent(in) :: i
#ifdef FRENCH
    write(*,*)"Bonjour tout le monde !",i
#else
    write(*,*)"Hello world!",i
#endif
    
end subroutine greetings

Writing greet.f90


In [118]:
!gfortran -cpp -DFRENCH -c greet.f90

In [114]:
!gfortran hello.o greet.o -o hello 

In [115]:
!./hello

 Bonjour tout le monde !           1


## Libraries

In [119]:
!ar r libgreet.a greet.o

ar: creating libgreet.a


In [120]:
!ranlib libgreet.a

In [132]:
!gfortran hello.f90 -o hello -L. -lgreet 

In [133]:
!ls

Compiled_Languages.ipynb  greet.f90  hello	libgreet.a
compiled.md		  greet.o    hello.f90


In [134]:
!./hello

 Bonjour tout le monde !           1


## Using ``PATH``, ``LIBRARY_PATH`` and ``LD_LIBRARY_PATH``

In [9]:
!echo $PATH

/usr/licensed/anaconda3/2021.11/bin:/home/rt3504/.local/bin:/home/rt3504/bin:/usr/share/Modules/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/puppetlabs/bin:/opt/dell/srvadmin/bin


## Dealing with complex libraries and compiler versions using module

In [1]:
!module list

Currently Loaded Modulefiles:
 1) anaconda3/2021.11

In [2]:
!module avail fftw

--------------------- /usr/local/share/Modules/modulefiles ---------------------
fftw/aocc-3.0.0/3.3.9                fftw/intel-19.1/openmpi-4.1.0/2.1.5
fftw/aocc-3.0.0/openmpi-4.1.0/2.1.5  fftw/intel-19.1/openmpi-4.1.0/3.3.9
fftw/aocc-3.0.0/openmpi-4.1.0/3.3.9  fftw/intel-21.5/openmpi-4.1.1/2.1.5
fftw/gcc/3.3.9                       fftw/intel-2021.1/3.3.9
fftw/gcc/intel-mpi/2.1.5             fftw/intel-2021.1/intel-mpi/2.1.5
fftw/gcc/intel-mpi/3.3.9             fftw/intel-2021.1/intel-mpi/3.3.9
fftw/gcc/openmpi-4.1.0/2.1.5         fftw/intel-2021.1/openmpi-4.1.0/2.1.5
fftw/gcc/openmpi-4.1.0/3.3.9         fftw/intel-2021.1/openmpi-4.1.0/3.3.9
fftw/intel-19.1/3.3.9                fftw/nvhpc-21.5/3.3.9
f

## Compiling more complex codes: Makefile and CMake

In [45]:
%%writefile Makefile
hello : hello.o greet.o Makefile ; gfortran hello.o greet.o -o hello

%.o : %.f90 ; gfortran -cpp -DFRENCH -c $<

clean : ; rm *.o

Overwriting Makefile


In [47]:
!make hello

gfortran -cpp -DFRENCH -c hello.f90
gfortran -cpp -DFRENCH -c greet.f90
gfortran hello.o greet.o -o hello


In [48]:
!./hello

 Bonjour tout le monde !           1


In [10]:
!git clone https://rteyssie@bitbucket.org/rteyssie/mini-ramses.git

Cloning into 'mini-ramses'...
remote: Enumerating objects: 13026, done.
remote: Counting objects: 100% (13026/13026), done.
remote: Compressing objects: 100% (3243/3243), done.
remote: Total 13026 (delta 8827), reused 12662 (delta 8510), pack-reused 0
Receiving objects: 100% (13026/13026), 9.28 MiB | 23.57 MiB/s, done.
Resolving deltas: 100% (8827/8827), done.


In [1]:
!ls

Compiled_Languages.ipynb cmake_example            compiled.md
