# Highlights of Compiling, linking, and the C++ ecosystem

  * So far we did everything within the interactive environment.
  * To contribute to larger projects and to build optimized applications, we need to use the compiler directly
  

# C++ Compilers

Widely used compilers include:
  * GCC is the current standard in HEP
  * LLVM/Clang
  * Intel/ICC
  * Microsoft/MSVC
  
We will use GCC today. Basic usage is essentially the same with the other compilers, although some compiler options differ.

Best practice note: Building with two different compilers helps find bugs and practices that do not conform to the C++ standard, because each compiler is different.


# What do compilers do?

Inputs can be
   * source code 
   * header/interface files
   * already built libraries
   
Output can be
   * Object files
   * libraries 
   * executables
   
Typical usage is to build object files, (optionally) combine them into libraries, and build an overall executable

<center><img src="./images/compile.png" width="500" /></center>

# Essential options

Execution
  * ```-E``` stop after preprocessing 
  * ```-c``` stop after compilation (produce an object file, do not link)

Optimization
  * ```-g``` add debug symbols
  * ```-O0``` no optimization - Use ```-g -O0``` to debug
  * ```-O2``` and ```-O3``` to have the compiler (highly) optimize [```-O3``` is a good default]
  
File management
  * ```-I<directory>``` Directory in which to look for header/include files (use multiple ```-I``` for multiple paths)
  * ```-L<directory>``` Directory in which to look for (shared) libraries (use multiple ```-L``` for multiple paths)
  * ```-l<lib name>``` Library to link against (but drop the leading ```lib``` and trailing ```.so``` or ```.a```)
  * ```-o<output name>``` Output file
  

# More essential options

Compiler configuration
  * ```-std=c++14``` to set the C++ standard to use (11,14,17,20....)
  * ```-fPIC``` needed when compiling for shared libraries
  
Warnings
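  * ```-Wall``` enables a broad set of commonly useful warnings (not literally all of them; ```-Wextra``` adds more). A warning is only a message on stderr, and compilation continues
  * ```-Werror``` turns every warning into an error, so any warning stops compilation
  * You also have fine-grained control: ```-Werror=<warning>``` promotes a single type of warning to an error, and ```-Wno-<warning>``` disables one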
  
Best practice: Use ```-Wall``` - warnings are more often than not a sign of a code bug
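
A small sketch of what the warning options change in practice. The file ```warn.cpp``` and the scratch directory are hypothetical and exist only for this illustration.

```sh
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical source with a harmless but suspicious construct
cat > warn.cpp <<'EOF'
int main() {
    int unused = 42;   // never read: reported as an unused variable under -Wall
    return 0;
}
EOF

g++ -c warn.cpp -o warn.o          # no warnings requested: compiles silently
g++ -Wall -c warn.cpp -o warn.o    # prints a warning on stderr, still writes warn.o

# With -Werror the same warning makes the compilation fail
if g++ -Wall -Werror -c warn.cpp -o warn_strict.o; then
    echo "unexpected: compiled cleanly"
else
    echo "-Werror turned the warning into an error"
fi
```

Using ```-Werror``` in CI is a common way to keep warnings from piling up, at the cost of builds that can break when a newer compiler adds warnings.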

# Let's try it

  * First, let's look at the environment

In [1]:
%%sh
cd /notebook/code
ls *.cpp
echo "\nsome include files too"
ls includes

hello.cpp
long-lines.cpp
main.cpp

some include files too
hello.hpp


In [2]:
%%sh
which g++

/opt/conda/envs/2023-05-01-hsf-india-tutorial/bin/g++


In [3]:
%%sh
cd /notebook/code
g++ -c hello.cpp

hello.cpp:1:10: fatal error: hello.hpp: No such file or directory
    1 | #include "hello.hpp"
      |          ^~~~~~~~~~~
compilation terminated.


CalledProcessError: Command 'b'cd /notebook/code\ng++ -c hello.cpp\n'' returned non-zero exit status 1.

Whoops, we need to tell the compiler where to find the include file that our code uses

In [None]:
%%sh
cd /notebook/code
g++ -c hello.cpp -I./includes

And we should add recommended options

In [None]:
%%sh
cd /notebook/code
g++ -c hello.cpp -I./includes -Wall -std=c++17

Whoops, we found a bug. Let's fix it and try again

In [None]:
%%sh
cd /notebook/code
g++ -c hello.cpp -I./includes -Wall -std=c++17
ls -l hello.o

Great. Now it works. We can also optimize our program

In [None]:
%%sh
cd /notebook/code
g++ -c hello.cpp -I./includes -Wall  -std=c++17 -O3
ls -l hello.o

Great - this is a simple program, so we can build an executable just with this

In [None]:
%%sh
cd /notebook/code
g++ main.cpp hello.o -I./includes -std=c++17
ls -l a.out

Whoops - let's give our program a better name

In [None]:
%%sh
cd /notebook/code
g++ main.cpp hello.o -I./includes -o hello_world
ls -l hello_world

And run it

In [None]:
%%sh
cd /notebook/code
./hello_world

We can also make a static library and link against it

In [None]:
%%sh
cd /notebook/code
rm hello_world
ar rcs libhello.a hello.o
ls -l libhello.a
rm hello.o
g++ main.cpp -I./includes -lhello -L. -o hello_world
ls -l hello_world

Now let's try the same, but with a shared library

In [None]:
%%sh
cd /notebook/code
# -f: hello.o was already removed in the previous cell, so plain rm would report an error
rm -f libhello.a hello.o hello_world

In [None]:
%%sh
cd /notebook/code
g++ -c hello.cpp -I./includes  -Wall -std=c++17 -O3 -fPIC 
ls -l hello.o

Next build the shared library itself

In [None]:
%%sh
cd /notebook/code
g++ hello.o -shared -o libhello.so
ls -l libhello.so

And the executable

In [None]:
%%sh
cd /notebook/code
g++ main.cpp -I./includes -lhello -L. -o hello_world
ls -l hello_world

And run it

In [None]:
%%sh
cd /notebook/code
./hello_world

Whoops - that did not work, because shared libraries need to be available/findable at run time

In [None]:
%%sh
cd /notebook/code
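# Prepend our directory and keep any search path that was already set
export LD_LIBRARY_PATH="$PWD${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"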
./hello_world

Now it works.
  * We adjusted LD_LIBRARY_PATH at run time to find the library we just built. 

Shared libraries are the recommended approach for complex programs. This differs from static linking:
  * All ```.so``` must be available at run time.
  * No need to relink the program when a single translation unit is changed. Just recompile it and remake the corresponding shared library.
  * A large gain in flexibility, normally at a small performance penalty

# How to generalize this for your program?

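  * For each source translation unit (```.cpp``` file), compile it to an object file (```-c```), then combine the object files into a shared library and link the executable against it:
```g++ -c -fPIC <file> -I./includes -O3``` <br>
```g++ <list of .o files> -shared -o <output library name>``` <br>
```g++ <file with main> -l<library name> .... options```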

# Automating your compile and link steps

  * Makefiles (simple but relatively hardcoded)
  * automake (relatively obsolete now)
  * cmake -- Flexible and powerful
  
**Recommendation: Use cmake**

Learning cmake is highly useful when managing libraries and programs (I need to learn it too - so I won't try to teach it) 
  * Here is a course on it https://hsf-training.github.io/hsf-training-cmake-webpage/

# Selected tools from the C++ ecosystem

# Code formatting using ```clang-format``` 

  * ```clang-format``` automatically formats your code according to your specifications. It integrates into your CI, editor, etc. 
      * Has several predefined styles and considerable customization options
      * Likely your experiment is using one (well, hopefully!)
      * Compiler driven, so if you have problems with clang-format, first check that your code compiles
  * A standardized code style increases readability. This is especially important in collaborative projects, where you need to be able to read and understand code written by others. Even if it's not your preferred style, a unified style helps considerably 

  

In [None]:
%%sh
which clang-format

In [None]:
%%sh
clang-format --help

In [None]:
cat /notebook/code/.clang-format

In [None]:
%%sh
cd /notebook/code
cat long-lines.cpp

In [None]:
%%sh
cd /notebook/code
g++ long-lines.cpp -Wall -std=c++17 -o ll_exec
./ll_exec

In [None]:
%%sh
cd /notebook/code
clang-format long-lines.cpp

In [None]:
%%sh
cd /notebook/code
clang-format -i long-lines.cpp
git diff long-lines.cpp

# clang-tidy can find performance bugs and modernize your code

  * ```clang-tidy``` is another LLVM tool. It can find and replace outdated code constructs, or code that can be written more efficiently
  * It works much the same way as clang-format, but requires the full compilation commands (e.g., a *compilation command database*) to run.
  * Extensible, so you can write your own checks (perhaps your experiment has some of these running in its CI?)
  
I won't go through an example, but let's look at some of the ```clang-tidy``` documentation to understand what kind of code improvements it can make

https://clang.llvm.org/extra/clang-tidy/


# That's it for this notebook