# Highlights of Compiling, linking, and the C++ ecosystem

  * So far we have done everything within the interactive environment.
  * To contribute to larger projects and to build optimized applications, we need to use the compiler directly.
  

# C++ Compilers

Widely used compilers include:
  * GCC (the current standard in HEP)
  * LLVM/Clang
  * Intel/ICC
  * Microsoft/MSVC
  
We will use GCC today. The command-line syntax is essentially the same across compilers, although some options differ.

Best practice note: Building with two different compilers helps find bugs and constructs that do not conform to the C++ standard, because each compiler accepts and diagnoses slightly different things.
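
As one illustration of what a second compiler can catch: variable-length arrays are a GCC extension rather than standard C++. GCC accepts them by default, while some other compilers (MSVC, for instance) reject them. The sketch below uses a hypothetical function name; the non-conforming line is kept as a comment and the portable version uses `std::vector`.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: sum of 0, 1, ..., n-1 computed through a temporary buffer.
int sum_first(std::size_t n) {
  // Non-conforming variant (a GCC extension, not standard C++):
  //   int buffer[n];   // variable-length array
  // GCC accepts this by default; adding -pedantic makes it warn.

  // Portable variant: the size is a run-time value, so use std::vector.
  std::vector<int> buffer(n);
  std::iota(buffer.begin(), buffer.end(), 0);  // fill with 0, 1, ..., n-1
  return std::accumulate(buffer.begin(), buffer.end(), 0);
}
```

Calling `sum_first(4)` adds 0 + 1 + 2 + 3.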


# What do compilers do?

Inputs can be
   * source code 
   * header/interface files
   * already built libraries
   
Output can be
   * Object files
   * libraries 
   * executables
   
Typical usage is to build object files, (optionally) combine them into libraries, and build an overall executable

<img src="./images/compile.png" width="500" />
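
To make the roles of header, source, and object files concrete, here is a minimal sketch of how an interface and its implementation are usually split. The file names match the ones used later, but the contents and the `greeting` function are hypothetical, and everything is shown in one block only for readability.

```cpp
// ---- includes/hello.hpp (hypothetical contents): the interface ----
#ifndef HELLO_HPP
#define HELLO_HPP
#include <string>

// Declaration only: tells callers the name and signature.
std::string greeting(int index);

#endif  // HELLO_HPP

// ---- hello.cpp (hypothetical contents): the implementation ----
// In a real project this file would start with: #include "hello.hpp"
// Compiling it with -c produces hello.o, which holds the machine code for greeting().
std::string greeting(int index) {
  return "Hello, world " + std::to_string(index);
}

// ---- main.cpp would also #include "hello.hpp" and call greeting(). ----
// The linker then resolves that call against hello.o or a library built from it.
```

The compiler only needs the declaration to compile a caller; the definition is needed once, at link time.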

# Essential options

Execution
  * ```-E``` stop after preprocessing (the preprocessed source is written to stdout)
  * ```-c``` compile to an object file, but do not link
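
A small sketch of what the preprocessing stage does, which is what ```-E``` lets you inspect. The macro and function names are hypothetical; `-DUSE_FAST_PATH` refers to the standard `-D<name>` option that defines a macro on the command line.

```cpp
// Hypothetical macros, for illustration only.
#define GRID_ROWS 16
#define GRID_COLS 32

// After preprocessing, the body below reads: return 16 * 32;
int grid_cells() { return GRID_ROWS * GRID_COLS; }

// Conditional compilation: only one branch survives preprocessing.
// Building with -DUSE_FAST_PATH selects the first branch.
bool fast_path_enabled() {
#ifdef USE_FAST_PATH
  return true;
#else
  return false;
#endif
}
```

Running the compiler with ```-E``` on a file like this shows the macros already replaced and the unused branch removed.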

Optimization
  * ```-g``` add debug symbols
  * ```-O0``` no optimization - Use ```-g -O0``` to debug
  * ```-O2``` and ```-O3``` to have the compiler (highly) optimize [```-O3``` is a good default]
  
File management
  * ```-I<directory>``` Directory in which to look for header/include files (use multiple ```-I``` for multiple paths)
  * ```-L<directory>``` Directory in which to look for (shared) libraries (use multiple ```-L``` for multiple paths)
  * ```-l<lib name>``` Library to link against (drop the leading ```lib``` and the trailing ```.so``` or ```.a```, e.g. ```-lhello``` for ```libhello.so```)
  * ```-o<output name>``` Output file
  

# More essential options

Compiler configuration
  * ```-std=c++14``` to set the C++ standard to use (11,14,17,20....)
  * ```-fPIC``` needed when compiling for shared libraries
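
One way to check which standard a file was actually compiled with is the predefined `__cplusplus` macro. The helper names below are hypothetical; the numeric values are the ones the standard assigns to each revision. This is a sketch, and compilers in experimental modes may report intermediate values.

```cpp
#include <string>

// Map a value of the __cplusplus macro to a readable name.
std::string standard_name(long value) {
  if (value >= 202002L) return "C++20 or later";
  if (value >= 201703L) return "C++17";
  if (value >= 201402L) return "C++14";
  if (value >= 201103L) return "C++11";
  return "pre-C++11";
}

// The standard this translation unit was compiled with.
std::string current_standard() { return standard_name(__cplusplus); }
```

Printing `current_standard()` from a program built with different ```-std=``` values is a quick sanity check of the build configuration.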
  
Warnings
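  * ```-Wall``` enable a broad set of commonly useful warnings (despite the name, not every warning). Warnings are only messages on stderr; compilation continues
  * ```-Werror``` treat every warning as an error, so any warning stops compilation
  * ```-Werror=<warning name>``` gives fine-grained control by turning only a specific type of warning into an error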
  
Best practice: Use ```-Wall``` - warnings are more often than not a sign of a code bug
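
As a sketch of the kind of issue ```-Wall``` can point at, the hypothetical function below shows two common patterns. The problematic variants are described in comments and the code itself is the corrected form.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical function: count the entries equal to `target`.
int count_matches(const std::vector<int>& values, int target) {
  // Writing "int count;" without an initial value is a classic bug that
  // -Wall can flag (-Wuninitialized / -Wmaybe-uninitialized).
  int count = 0;

  // Using "int i" here would compare a signed index with the unsigned
  // values.size(), which -Wall reports in C++ (-Wsign-compare).
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (values[i] == target) {
      ++count;
    }
  }
  return count;
}
```

Neither pattern stops compilation on its own, which is why reading the warnings (or adding ```-Werror```) matters.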

# Let's try it

  * First let's look at the environment

In [24]:
%%sh
cd /notebook/code
ls *.cpp
echo "\nsome include files too"
ls includes

hello.cpp
main.cpp

some include files too
Untitled.ipynb
hello.hpp


In [25]:
%%sh
which g++

/opt/conda/envs/2023-05-01-hsf-india-tutorial/bin/g++


In [26]:
%%sh
cd /notebook/code
g++ -c hello.cpp

hello.cpp:1:10: fatal error: hello.hpp: No such file or directory
    1 | #include "hello.hpp"
      |          ^~~~~~~~~~~
compilation terminated.


CalledProcessError: Command 'b'g++ -c hello.cpp\n'' returned non-zero exit status 1.

Whoops, we need to tell the compiler where to find the include file that our code uses

In [28]:
%%sh
cd /notebook/code
g++ -c hello.cpp -I./includes

And we should add recommended options

In [30]:
%%sh
cd /notebook/code
g++ -c hello.cpp -I./includes -Wall -std=c++17

hello.cpp: In function 'void checkIt()':
   13 |   int i;
      |       ^


Whoops, the warning points to a bug. Let's fix it and try again

In [33]:
%%sh
cd /notebook/code
g++ -c hello.cpp -I./includes -Wall -std=c++17
ls -l hello.o

-rw-r--r-- 1 root root 15080 Apr 29 04:34 hello.o


Great. Now it works. We can also optimize our program

In [52]:
%%sh
cd /notebook/code
g++ -c hello.cpp -I./includes -Wall  -std=c++17 -O3
ls -l hello.o

-rw-r--r-- 1 root root 4432 Apr 29 04:42 hello.o


Great - this is a simple program, so we can build an executable directly from ```main.cpp``` and ```hello.o```

In [53]:
%%sh
cd /notebook/code
g++ main.cpp hello.o -I./includes -std=c++17
ls -l a.out

-rwxr-xr-x 1 root root 16896 Apr 29 04:42 a.out


Whoops - let's give our program a better name

In [54]:
%%sh
cd /notebook/code
g++ main.cpp hello.o -I./includes -o hello_world
ls -l hello_world

-rwxr-xr-x 1 root root 16896 Apr 29 04:42 hello_world


And run it

In [55]:
%%sh
cd /notebook/code
./hello_world

Hello, world 0
Hello, world 1
Hello, world 2
Bonjour
Ciao
Guten Tag
Hello


We can also make a static library and link against it

In [56]:
%%sh
cd /notebook/code
rm hello_world
ar rcs libhello.a hello.o
ls -l libhello.a
rm hello.o
g++ main.cpp -I./includes -lhello -L. -o hello_world
ls -l hello_world

-rw-r--r-- 1 root root 4632 Apr 29 04:42 libhello.a
-rwxr-xr-x 1 root root 16896 Apr 29 04:42 hello_world


Now let's try the same, but with a shared library

In [60]:
%%sh
cd /notebook/code
rm -f libhello.a hello.o hello_world

In [66]:
%%sh
cd /notebook/code
g++ -c hello.cpp -I./includes  -Wall -std=c++17 -O3 -fPIC 
ls -l hello.o

-rw-r--r-- 1 root root 4432 Apr 29 04:46 hello.o


Next build the shared library itself

In [71]:
%%sh
cd /notebook/code
g++ hello.o -shared -o libhello.so
ls -l libhello.so

-rwxr-xr-x 1 root root 16192 Apr 29 04:47 libhello.so


And the executable

In [74]:
%%sh
cd /notebook/code
g++ main.cpp -I./includes -lhello -L. -o hello_world
ls -l hello_world

-rwxr-xr-x 1 root root 16040 Apr 29 04:48 hello_world


And run it

In [76]:
%%sh
cd /notebook/code
./hello_world

./hello_world: error while loading shared libraries: libhello.so: cannot open shared object file: No such file or directory


CalledProcessError: Command 'b'./hello_world\n'' returned non-zero exit status 127.

Whoops - that did not work, because shared libraries need to be available/findable at run time

In [78]:
%%sh
cd /notebook/code
export LD_LIBRARY_PATH=$PWD
./hello_world

Hello, world 0
Hello, world 1
Hello, world 2
Bonjour
Ciao
Guten Tag
Hello


Now it works.
  * We adjusted LD_LIBRARY_PATH at run time to find the library we just built. 
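
`LD_LIBRARY_PATH` is a colon-separated list of directories that the dynamic loader searches in addition to its defaults. The sketch below, with hypothetical helper names, reads the variable and splits it, which can help when debugging why a library is not found.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Split a colon-separated search path into its directories (empty entries are skipped).
std::vector<std::string> split_search_path(const std::string& value) {
  std::vector<std::string> dirs;
  std::istringstream stream(value);
  std::string dir;
  while (std::getline(stream, dir, ':')) {
    if (!dir.empty()) dirs.push_back(dir);
  }
  return dirs;
}

// Directories currently listed in LD_LIBRARY_PATH (empty if the variable is unset).
std::vector<std::string> runtime_library_dirs() {
  const char* value = std::getenv("LD_LIBRARY_PATH");
  return value ? split_search_path(value) : std::vector<std::string>{};
}
```

After `export LD_LIBRARY_PATH=$PWD` in `/notebook/code`, `runtime_library_dirs()` would contain that single directory.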

Shared libraries are the recommended approach for complex programs. They differ from static linking in several ways:
  * All ```.so``` files must be available at run time.
  * No need to relink the program when a single translation unit changes (as long as the library's interface is unchanged). Just recompile it and rebuild the corresponding shared library.
  * A large gain in flexibility, normally at a small computational penalty

# How to generalize this for your program?

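  * Compile each source translation unit (```cpp``` file) into an object file: ```g++ -c -fPIC <file> -I./includes -O3```
  * Combine the object files into a shared library: ```g++ <list of .o files> -shared -o <output library name>```
  * Build the executable and link it against the library: ```g++ <file with main> -L<library directory> -l<library name> .... options```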

# Automating your compile and link steps

  * Makefiles (simple but relatively hardcoded)
  * automake (relatively obsolete now)
  * cmake -- Flexible and powerful
  
**Recommendation: Use cmake**

Learning cmake is highly useful when managing libraries and programs (I need to learn it too - so I won't try to teach it) 
  * Here is a course on it ADD LINK

# Selected tools from the C++ ecosystem

# Valgrind: Debugging memory issues

Signs of having memory issues in your program
  * Continued growth in RSS (resident set size, i.e. the physical memory used by the process)
  * Irreproducible results or program crashes

In [1]:
%%sh
cd /notebook/code
g++ buggy.cpp -Wall -std=c++17 -o buggy_exec
ls -l buggy_exec

buggy.cpp: In function 'int main()':
    4 |   std::array < std::array<int, 32>, 16 > my_array;
      |                                          ^~~~~~~~


-rwxr-xr-x 1 root root 16000 Apr 29 15:38 buggy_exec


In [2]:
%%sh
cd /notebook/code
./buggy_exec


In [3]:
%%sh
cd /notebook/code
cat buggy.cpp

#include <array>

int main() {
  std::array < std::array<int, 32>, 16 > my_array;

  //for ( int i=0; i<16; i++)
  //  for ( int j=0; j<32; j++)
  //    my_array[i][j]=i+j;

  //int sum=0;
  //for ( int i=0; i<32; i++ )
  //  for ( int j=0; j<32; j++)
  //    sum+=my_array[i][j];
    
  int *myints = new int[20];
  myints[0]=0;
    
  return 0;
}
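
For comparison, here is one possible cleaned-up version of the same logic, wrapped in a hypothetical function `sum_grid`. It addresses three things visible in the listing above: `myints` is allocated with `new` and never freed, `my_array` is declared but not used, and the second commented-out loop would read rows 16 to 31 of a 16-row array if it were re-enabled.

```cpp
#include <array>
#include <cstddef>
#include <vector>

constexpr std::size_t kRows = 16;
constexpr std::size_t kCols = 32;

int sum_grid() {
  std::array<std::array<int, kCols>, kRows> my_array{};  // value-initialized to zero

  for (std::size_t i = 0; i < kRows; ++i)
    for (std::size_t j = 0; j < kCols; ++j)
      my_array[i][j] = static_cast<int>(i + j);

  int sum = 0;
  for (std::size_t i = 0; i < kRows; ++i)  // same bound as the array: no out-of-range read
    for (std::size_t j = 0; j < kCols; ++j)
      sum += my_array[i][j];

  // std::vector releases its memory automatically, unlike a bare new int[20].
  std::vector<int> myints(20);
  myints[0] = 0;

  return sum + myints[0];
}
```

Sharing the bounds through `kRows` and `kCols` removes the chance of the loops and the array type drifting apart.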


In [4]:
%%sh
cd /notebook/code
which valgrind
valgrind --version
valgrind --tool=memcheck ./buggy_exec

/usr/local/bin/valgrind
valgrind-3.21.0


==1603== Memcheck, a memory error detector
==1603== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==1603== Using Valgrind-3.21.0 and LibVEX; rerun with -h for copyright info
==1603== Command: ./buggy_exec
==1603== 

valgrind:  Fatal error at startup: a function redirection
valgrind:  which is mandatory for this platform-tool combination
valgrind:  cannot be set up.  Details of the redirection are:
valgrind:  
valgrind:  A must-be-redirected function
valgrind:  whose name matches the pattern:      strlen
valgrind:  in an object with soname matching:   ld-linux-x86-64.so.2
valgrind:  was not found whilst processing
valgrind:  symbols from the object with soname: ld-linux-x86-64.so.2
valgrind:  
valgrind:  Possible fixes: (1, short term): install glibc's debuginfo
valgrind:  package on this machine.  (2, longer term): ask the packagers
valgrind:  for your Linux distribution to please in future ship a non-
valgrind:  stripped ld.so (or whatever the dynamic linker .so is c

CalledProcessError: Command 'b'cd /notebook/code\nwhich valgrind\nvalgrind --version\nvalgrind --tool=memcheck ./buggy_exec\n'' returned non-zero exit status 1.

# Code formatting using ```clang-format``` 

  * ```clang-format``` automatically formats your code according to your specifications. It integrates into your CI, editor, etc. 
      * Has several predefined styles and considerable customization options
      * Likely your experiment is using one (well, hopefully!)
      * Compiler driven, so if you have problems with clang-format, first make sure that your code compiles
  * Standardized code style increases readability. This is especially important in collaborative projects where you need to be able to read and understand code written by others. Even if it's not in your preferred style, a unified style helps considerably 
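
Occasionally a hand-aligned block is more readable than what the formatter would produce. clang-format recognizes the special comments `// clang-format off` and `// clang-format on` for this. The table and function below are hypothetical and only sketch the idea.

```cpp
#include <array>

// clang-format off
// Hand-aligned table: the off/on comments ask clang-format to leave it alone.
constexpr std::array<std::array<int, 3>, 3> kIdentity = {{
    {1, 0, 0},
    {0, 1, 0},
    {0, 0, 1},
}};
// clang-format on

// Code outside the off/on region is formatted as usual.
int trace() { return kIdentity[0][0] + kIdentity[1][1] + kIdentity[2][2]; }
```

This is best used sparingly, since every exempted region is a place where the unified style no longer applies.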

  

In [19]:
%%sh
which clang-format

/opt/conda/envs/2023-05-01-hsf-india-tutorial/bin/clang-format


In [17]:
%%sh
clang-format --help

OVERVIEW: A tool to format C/C++/Java/JavaScript/JSON/Objective-C/Protobuf/C# code.

If no arguments are specified, it formats the code from standard input
and writes the result to the standard output.
If <file>s are given, it reformats the files. If -i is specified
together with <file>s, the files are edited in-place. Otherwise, the
result is written to the standard output.

USAGE: clang-format [options] [<file> ...]

OPTIONS:

Clang-format options:

    =unknown                     -   If set, unknown format options are only warned about.
                                     This can be used to enable formatting, even if the
                                     configuration contains unknown (newer) options.
                                     Use with caution, as this might lead to dramatically
                                     differing format depending on an option being
                                     supported or not.
  --assume-filename=<string>     - Set filename used

In [20]:
cat /notebook/code/.clang-format

BasedOnStyle: Google
ColumnLimit: 120


In [21]:
%%sh
cd /notebook/code
cat long-lines.cpp

#include <array>
#include <iostream>
#include <string>

int main() {
  std::array<std::string, 10> my_strings = {
      "A first string",  "a second string",        "a third string",          "and so on",
      "and so forth",    "a testing string",       "a nother testing string", "my eighth string so far",
      "nine, one to go", "at last the last string"};

  for (const auto &str : my_strings) {
    std::cout << str << std::endl;
  }

  return 0;
}


In [22]:
%%sh
cd /notebook/code
g++ long-lines.cpp -Wall -std=c++17 -o ll_exec
./ll_exec

A first string
a second string
a third string
and so on
and so forth
a testing string
a nother testing string
my eighth string so far
nine, one to go
at last the last string


In [25]:
%%sh
cd /notebook/code
clang-format long-lines.cpp

#include <array>
#include <iostream>
#include <string>

int main() {
  std::array<std::string, 10> my_strings = {
      "A first string",  "a second string",        "a third string",          "and so on",
      "and so forth",    "a testing string",       "a nother testing string", "my eighth string so far",
      "nine, one to go", "at last the last string"};

  for (const auto &str : my_strings) {
    std::cout << str << std::endl;
  }

  return 0;
}


In [18]:
%%sh
cd /notebook/code
clang-format -i long-lines.cpp
git diff long-lines.cpp

diff --git a/code/long-lines.cpp b/code/long-lines.cpp
index f62e008..eae997f 100644
--- a/code/long-lines.cpp
+++ b/code/long-lines.cpp
@@ -1,14 +1,16 @@
 #include <array>
-#include <string>
 #include <iostream>
+#include <string>
 
 int main() {
-  std::array< std::string, 10 > my_strings = {"A first string", "a second string", "a third string", "and so on", "and so forth", "a testing string", "a nother testing string", "my eighth string so far", "nine, one to go", "at last the last string"};
+  std::array<std::string, 10> my_strings = {
+      "A first string",  "a second string",        "a third string",          "and so on",
+      "and so forth",    "a testing string",       "a nother testing string", "my eighth string so far",
+      "nine, one to go", "at last the last string"};
 
-  for ( const auto &str: my_strings ) {
+  for (const auto &str : my_strings) {
     std::cout << str << std::endl;
   }
-  
-    
+
   return 0;
 }


# clang-tidy can find performance bugs and modernize your code

  * ```clang-tidy``` is another LLVM tool. It can find and replace outdated code constructs, or code that can be written in a more efficient manner
  * It works much the same way as clang-format, but requires full compilation commands (e.g., a *compilation command database*) to run.
  * Extendable, so you can write your own checks (perhaps your experiment has some of these running in its CI?)
  
I won't go through an example, but let's look at some of the ```clang-tidy``` documentation to understand what kind of code improvements it can make

https://clang.llvm.org/extra/clang-tidy/
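
For a flavor of what such checks target, here is a before/after sketch with hypothetical function names. The first version shows two patterns covered by existing checks (`performance-unnecessary-value-param` and `modernize-loop-convert`); the second is roughly what the suggested fixes lead to. This is not actual clang-tidy output.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Older style that clang-tidy checks can flag.
std::size_t total_length_old(std::vector<std::string> words) {  // copies the whole vector
  std::size_t total = 0;
  for (std::size_t i = 0; i < words.size(); ++i) {  // index-based loop
    total += words[i].size();
  }
  return total;
}

// Roughly the modernized form.
std::size_t total_length_new(const std::vector<std::string>& words) {  // no copy
  std::size_t total = 0;
  for (const auto& word : words) {  // range-based for loop
    total += word.size();
  }
  return total;
}
```

Both functions return the same result; the second avoids copying the vector and removes the manual index handling.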
