# Topical lectures on modern C++ with a bias towards contributing to HEP codes 

Presenter: David Lange (Princeton University)

May 2023 HSF-India tutorials

# Goals for these lectures
  * Introduce concepts needed for CUDA lectures later this week
  * Introduce how the evolving C++ standard helps to write clear and performant code
  * Introduce some needed techniques for developing codes in a larger ecosystem 
  
# Topics for these lectures (today and tomorrow)
  * Language introduction
  * Declaration syntax
  * Loops
  * Standard Template Library
  * Modern pointers
  * Memory considerations
  * Parallelism
  * Compile and link
  * Optimization
  * Tools
  

# C++ language and its evolution

  * The C++ programming language was devised by Bjarne Stroustrup - then an employee of Bell Labs (AT&T). Stroustrup started working on C with Classes in 1979 and the first commercial release of the C++ language was in October 1985
    * Strengths of C++ include how it allows researchers to write efficient code without losing high-level abstraction; its support for developing both large scale (eg, HPC) applications and low-level drivers and embedded systems; and its support for high-performance/latency-critical applications.
    * Now a mature ecosystem: Many available utilities to help programmers write and debug. These include debuggers, memory checkers, coverage, static analysis, profiling tools, etc
    * The C++ language continues to evolve to support modern architectures and improved code quality/performance
  * HEP moved to C++ (from Fortran) as its primary programming language for computationally intensive applications starting in the late 1990s. Now the field has 100s of millions of lines of code in production. It would be a significant undertaking to rewrite this code base.

# C++ language and its evolution (II)
“The Evolution of C++: Past, Present, and Future”, B. Stroustrup, CppCon 2016

<img src="./dl_images/cppmap.png" width="400" />

# Resources

## These are some web-based (and open) resources that you might find useful for following up on these lectures
  * C++ Core Guidelines: https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md
  * HSF C++ course: https://github.com/hsf-training/cpluspluscourse (including videos)
  * github course I found
  * https://en.cppreference.com/w/ : Usage and syntax including how it evolves with C++ standards and including (non-trivial) examples

# C++ in notebooks

  * Much of this course will be using C++ in notebooks. This is great for learning and prototyping.
  * It does let us take shortcuts. For example, proper header file includes can often be skipped, and compiling and linking happen behind the scenes, so we do not need to worry about them.
    * Do not worry - we will come back to these topics at the end (hopefully)

# Declaring variables

  * C++ enforces type safety. This includes compile time checking

In [None]:
int16_t a = 1; // Prefer fixed-width (signed or unsigned) integers instead of native types
int16_t a2(1); // Direct initialization
int16_t a3=int(1); // Copy initialization
int16_t a4{1}; // Direct-list initialization

std::cout << a << " " << a2 << " " << a3 << " " << a4 << std::endl;

In [None]:
//Pitfalls that the compiler can help you avoid
uint16_t c = -1;

std::cout << c << std::endl;

In [None]:
// Prefer {} to = as it allows compile time checks
uint16_t c{-1}; // {} allows the compiler to find your mistake at compile time

In [None]:
//Same for floats
float f1=10e40;
std::cout << f1 << std::endl;

In [None]:
float f2{10e40};
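
Brace initialization catches narrowing only when the value is known at compile time. For values that arrive at run time, an explicit range check is one option. The following is a minimal sketch: `fits_in` is a hypothetical helper name (not a standard library function), and it assumes the target type is an integer type narrower than `long long`.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of the standard library): returns true when
// `value` can be stored in the integer type To without changing its value.
template <class To>
bool fits_in(long long value) {
    static_assert(std::numeric_limits<To>::is_integer,
                  "To must be an integer type");
    static_assert(sizeof(To) < sizeof(long long),
                  "this sketch assumes To is narrower than long long");
    return value >= static_cast<long long>(std::numeric_limits<To>::min()) &&
           value <= static_cast<long long>(std::numeric_limits<To>::max());
}
```

For example, `fits_in<std::uint16_t>(-1)` is false, so a caller can reject the value before converting it.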

# Declaring arrays has also gotten easier

In [None]:
//helper function
template<class T>
void print(T *arr, int16_t len) {
   for ( int16_t i=0; i<len; i++ ) std::cout << arr[i] << " ";
   std::cout << std::endl; 
}

In [None]:
//An old way - lots of typing, and hard to read
double a[4];
a[0]=1.0;
a[1]=1.0;
a[2]=1.0;
a[3]=1.0;
print(a,4);


In [None]:
//Another old way
double a[4];
for ( int i=0; i<4; i++ ) { a[i] = 1.0; }
print(a,4);

In [None]:
//Example newer approaches
double a[4]{1.0, 1.0, 1.0, 1.0};
print(a,4);

In [None]:
//or even shorter
double a[]{1.0,1.0,1.0,1.0};
std::cout << a[0] << " " << a[1] << " " << a[2] << " " << a[3] << std::endl;

In [None]:
//it is also easy to use the default initializer
double a[4]{0};
print(a,4);
double b[4]{};
print(b,4);

In [None]:
//maybe not the behavior that you expected?
double a[4]{1.0};
print(a,4);

  * We will come back to arrays later.
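
In the last cell, `{1.0}` sets only the first element and the remaining elements are zero-initialized. If the intent is to set every element to the same value, `std::fill` is one way to do it. The function names below are illustrative only.

```cpp
#include <algorithm>
#include <iterator>

// Only the first element is given: {1.0} sets a[0], and the remaining
// elements are zero-initialized.
double sum_partially_initialized() {
    double a[4]{1.0};
    return a[0] + a[1] + a[2] + a[3];
}

// std::fill sets every element of the array to 1.0.
double sum_filled() {
    double a[4];
    std::fill(std::begin(a), std::end(a), 1.0);
    return a[0] + a[1] + a[2] + a[3];
}
```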

# Let's talk about loops

In [None]:
//Basic old-style loop
double a[4]{1.0,2.0,4.0,9.0};

double sum_sq{0.0};
for (int i=0; i<4; i++) 
    sum_sq += a[i]*a[i];
print(&sum_sq,1);


In [None]:

//Range-based loops can simplify this syntax
double sum_sq{0.0};
for ( double v: a) 
    sum_sq += v*v;
print(&sum_sq,1);

In [None]:

//You can also inline the values to loop over
double sum_sq{0.0};
for ( double v: {1.0,2.0,4.0,9.0}) 
    sum_sq += v*v;
print(&sum_sq,1);

In [None]:

//And let the compiler understand the type using "auto"
double sum_sq{0.0};
for ( auto v: {1.0,2.0,4.0,9.0}) 
    sum_sq += v*v;
print(&sum_sq,1);
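
The loops above read each element into a copy. To modify the elements, the loop variable needs to be a reference. A minimal sketch with illustrative function names:

```cpp
#include <cstddef>

// Multiply every element by `factor` in place. `double &v` refers to the
// element itself, so the assignment changes the array.
template <std::size_t N>
void scale_in_place(double (&values)[N], double factor) {
    for (double &v : values)
        v *= factor;
}

// For comparison: `double v` is a copy of the element, so the array is
// left unchanged.
template <std::size_t N>
void scale_copy_only(double (&values)[N], double factor) {
    for (double v : values)
        v *= factor;
}
```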

# Pitfalls in loops

In [None]:
float x = 0.0f;
for (int i = 0; i < 40000000; i++) 
    x += 1.0f;
std::cout << (int)x << std::endl;



In [None]:
//Think about when to use floating point vs integer variables
int64_t x = 0;
for (int i = 0; i < 40000000; i++) 
    x += 1;
std::cout << (int)x << std::endl;



  * This pitfall isn't really about loops, it is about floating point representations
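  * Floating point precision is finite! For example, a float is typically an IEEE 754 single-precision number with 1 sign bit, an 8-bit exponent, and a 23-bit significand, so not every integer above 2^24 = 16777216 can be represented exactly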

In [None]:
std::cout << 16777217 << std::endl;
std::cout << (int) 16777217.0f << std::endl;
std::cout << (int) 16777217.0 << std::endl; 
std::cout << (int) std::pow(2,24) << std::endl;

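  * Using double precision is one solution (1 sign bit, 11-bit exponent, 52-bit significand), but it doubles the memory footprint and can reduce throughput: half as many values fit in a cache line or SIMD register, and many GPUs run double-precision arithmetic much more slowly than single precision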
  * Complex numerical calculations require care, especially when the developer does not know (or have control over) the input parameters. Examples that you need to consider:
    * Overflows/underflows
    * Precision loss due to accidental cancellation of terms
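
As an illustration of cancellation, consider (1 - cos(x)) / x^2, which approaches 0.5 as x goes to 0. This is a sketch with illustrative function names: evaluating the expression directly subtracts two nearly equal numbers for small x, while an algebraically equivalent form avoids the subtraction.

```cpp
#include <cmath>

// Direct evaluation of (1 - cos(x)) / x^2. For small x, cos(x) is so close
// to 1 that the subtraction loses most or all significant digits.
double one_minus_cos_over_x2_naive(double x) {
    return (1.0 - std::cos(x)) / (x * x);
}

// Same quantity rewritten with the identity 1 - cos(x) = 2 sin^2(x/2),
// which avoids subtracting two nearly equal numbers.
double one_minus_cos_over_x2_stable(double x) {
    const double s = std::sin(0.5 * x);
    return 2.0 * s * s / (x * x);
}
```

For x = 1e-8 the stable form stays close to 0.5, while the direct form is far off. For x of order 1 the two agree.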

# Using the Standard Template Library to reduce code complexity

  * The STL is a set of powerful, predefined abstract datatypes, functions, and algorithms designed to handle user-specified datatypes (as well as basic types).
  * By using them you can reduce the amount of code you have to write and make it easier to optimize and maintain your code in the future (as the compiler will do that for you as it improves)
  

In [None]:
// vector is a resizeable container of any type
// For example, a vector of integers
std::vector<int> a(10,1); // 10 elements, each set to 1
print(&a[0],a.size());

In [None]:
a.resize(20,2); // the 10 new elements are set to 2
print(&a[0],a.size());

In [None]:

// A common pattern to avoid when possible
std::vector<int> a;
for (int i=0; i<10; i++)
    a.push_back(i*i);
print(&a[0],a.size());


  * What is wrong with this pattern?

  * Memory is reallocated (and the existing elements are copied or moved) each time the vector outgrows its current capacity. (In this simple example the cost is negligible)
  * Repeated allocations are slow and fragment memory
  * Use reserve to pre-allocate capacity when you know the final size of the vector in advance

In [None]:

std::vector<int> a;
a.reserve(10);
for (int i=0; i<10; i++)
    a.push_back(i*i);
print(&a[0],a.size());
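
One way to see the effect of `reserve` is to watch `capacity()` while elements are appended. The sketch below uses a hypothetical helper, `count_capacity_changes`, and treats each change in capacity as a proxy for one reallocation.

```cpp
#include <cstddef>
#include <vector>

// Count how many times the vector's capacity changes while n elements are
// appended, as a proxy for the number of reallocations.
std::size_t count_capacity_changes(std::size_t n, bool reserve_first) {
    std::vector<int> v;
    if (reserve_first)
        v.reserve(n);
    std::size_t changes = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i * i));
        if (v.capacity() != last_capacity) {
            ++changes;
            last_capacity = v.capacity();
        }
    }
    return changes;
}
```

With `reserve_first` set to true the count is zero. Without it, the count depends on the growth strategy of the standard library implementation.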


# How should you loop over vectors? As with arrays, we can use recent language improvements to increase readability and performance

In [None]:
std::vector<int> a(20,3); // 20 elements, each set to 3

int sum=5;
for ( std::vector<int>::size_type i=0; i< a.size(); i++)
    sum += a[i];
std::cout << sum << std::endl;

In [None]:
int sum=5;
for ( auto i = a.begin(); i!=a.end(); i++)
    sum += *i;
std::cout << sum << std::endl;

In [None]:
int sum=5;
for ( int v: a) // v is the element value, not an index
    sum += v;
std::cout << sum << std::endl;

In [None]:
int sum=5;
for ( auto v: a) // v is the element value, not an index
    sum += v;
std::cout << sum << std::endl;

In [None]:
int sum=5;
for ( const auto &v: a) // v refers to the element itself, not an index
    sum += v;
std::cout << sum << std::endl;

  * What is the benefit of adding const &?
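
For `int` elements the difference is negligible, but for larger element types a loop by value copies every element, while `const auto &` avoids the copy and prevents accidental modification. The sketch below makes the copies visible with a hypothetical `Tracked` type that counts its own copy constructions.

```cpp
#include <vector>

// Hypothetical type that counts how often it is copied.
struct Tracked {
    static int copies;
    int value;
    explicit Tracked(int v) : value(v) {}
    Tracked(const Tracked &other) : value(other.value) { ++copies; }
};
int Tracked::copies = 0;

// Loop by value: every iteration copy-constructs one element.
int sum_by_value(const std::vector<Tracked> &items) {
    int sum = 0;
    for (auto item : items)
        sum += item.value;
    return sum;
}

// Loop by const reference: no copies, and elements cannot be modified.
int sum_by_const_ref(const std::vector<Tracked> &items) {
    int sum = 0;
    for (const auto &item : items)
        sum += item.value;
    return sum;
}
```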

# The STL also provides many helpful functions. 

In [None]:
// In this case, std::accumulate
// lets you avoid writing the loop completely
// Note: Can be used for variables of other types. Anything with a "begin()" and "end()" function
int sum = std::accumulate(a.begin(), a.end(), 5);
std::cout << sum << std::endl;

In [None]:
//It is straightforward to customize which elements are considered. 
//For example, we can skip the first and last one
int sum = std::accumulate(a.begin()+1, a.end()-1, 5);
std::cout << sum << std::endl;
//but beware: if the vector has fewer than two elements, a.begin()+1 and a.end()-1
//do not form a valid range and the behavior is undefined
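
When the length of the vector is not known in advance, a size check before the iterator arithmetic keeps the range valid. A minimal sketch, with `sum_interior` as an illustrative name:

```cpp
#include <numeric>
#include <vector>

// Sum all elements except the first and the last. Vectors with fewer than
// three elements have no interior elements, so the result is 0.
int sum_interior(const std::vector<int> &v) {
    if (v.size() < 3)
        return 0;
    return std::accumulate(v.begin() + 1, v.end() - 1, 0);
}
```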

# You can also do more customized calculations with minimal code

In [None]:
int sq(int c, int d) {return c + d*d;} 
int sum = std::accumulate(a.begin(), a.end(), 0,sq);
std::cout << sum << std::endl;

In [None]:
auto lambda = [](int c, int d){return c + d*d; }; // nothing to capture, so [] is enough
int sum = std::accumulate(a.begin(), a.end(), 0,lambda);
std::cout << sum << std::endl;

  * Prefer lambda syntax here...

  * Let's investigate other functions provided by the algorithm package https://en.cppreference.com/w/cpp/algorithm
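
As a starting point, here is a small sketch using three functions from `<algorithm>`: `std::count_if`, `std::sort`, and `std::transform`. The wrapper function names are illustrative.

```cpp
#include <algorithm>
#include <vector>

// Number of elements strictly greater than `threshold`.
int count_above(const std::vector<int> &v, int threshold) {
    return static_cast<int>(std::count_if(
        v.begin(), v.end(), [threshold](int x) { return x > threshold; }));
}

// Sorted copy of the input; the caller's vector is not modified because
// the argument is passed by value.
std::vector<int> sorted_copy(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    return v;
}

// New vector holding the square of each element.
std::vector<int> squared(const std::vector<int> &v) {
    std::vector<int> out(v.size());
    std::transform(v.begin(), v.end(), out.begin(),
                   [](int x) { return x * x; });
    return out;
}
```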

# There is lots more beyond vectors in the STL

  * array (prefer array to vector when possible)
  * string (prefer string to char*, TString, or other custom string classes)
  * map
  * unordered_map (prefer unordered_map to map when possible)

# Arrays are vectors that are not resizeable

In [None]:
//Initialization works just like before
std::array<int, 10> a{1,2,3,4,5,6,7,8,9,10};
print(&a[0],a.size());


# Why use arrays?

  * Compared to using std::vector: more memory efficient when the size of your array is known at compile time. std::array has zero memory overhead compared to raw arrays
  * Compared to using raw arrays: like vector, std::array provides bounds checking (through at()) and helper functions for things like copying and looping
  * HEP note: Last I tried, ROOT serialization has a love-hate relationship with std::array. Use with care for use cases that 
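
The bounds checking mentioned above is provided by `at()`, which throws `std::out_of_range` for an invalid index, whereas `operator[]` performs no check. A minimal sketch, with `get_or` as a hypothetical helper:

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Read element i with bounds checking. Returns `fallback` when the index
// is outside the array.
int get_or(const std::array<int, 4> &a, std::size_t i, int fallback) {
    try {
        return a.at(i);
    } catch (const std::out_of_range &) {
        return fallback;
    }
}
```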

# An example of added safety using std::array

In [None]:
int sum(const std::array<int, 10> &a) {
   return std::accumulate(a.begin(),a.end(),0);
}

std::array<int, 10> a{1,2,3,4,5,6,7,8,9,10};
std::cout << sum(a) << std::endl;

  * No need to worry about getting the wrong array length. The function interface enforces that the array has length 10 (and provides a self-documenting interface)
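
If the same function should accept arrays of several lengths, one option is to make the length a template parameter. The length is still part of the type and known at compile time. A sketch, with `sum_any_length` as an illustrative name:

```cpp
#include <array>
#include <cstddef>
#include <numeric>

// Sum a std::array of any length N. The compiler deduces N from the
// argument type, so the caller never passes a length by hand.
template <std::size_t N>
int sum_any_length(const std::array<int, N> &a) {
    return std::accumulate(a.begin(), a.end(), 0);
}
```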

# Strings

In [None]:
// Creating a string from const char*
std::string str1 = "hello";
 
// Creating a std::string directly with the s literal suffix (C++14)
using namespace std::string_literals;
auto str2 = "world"s;
 
// Concatenating strings
std::string str3 = str1 + " " + str2;
 
// Print out the result
std::cout << str3 << '\n';

In [None]:
 
std::string::size_type pos = str3.find(" ");
str1 = str3.substr(pos + 1); // the part after the space
str2 = str3.substr(0, pos);  // the part till the space
 
std::cout << str1 << ' ' << str2 << '\n';
 

In [None]:
// Accessing an element using subscript operator[]
std::cout << str1[0] << '\n';
str1[0] = 'W';
std::cout << str1 << '\n';

  * As we saw with std::vector, there are more and more helper functions that facilitate string manipulations in newer C++ standards. (But use Python if that's the primary purpose of your program...)
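
Two examples of such helpers are sketched below: `std::istringstream` for splitting on whitespace and `std::stoi` (available since C++11) for converting text to an integer. The wrapper function names and the "key=value" input format are assumptions made for illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split a string on whitespace into separate words.
std::vector<std::string> split_words(const std::string &text) {
    std::istringstream stream(text);
    std::vector<std::string> words;
    std::string word;
    while (stream >> word)
        words.push_back(word);
    return words;
}

// Parse "key=value" and return the integer value. std::stoi throws
// std::invalid_argument when the text after '=' is not a number.
int parse_value(const std::string &entry) {
    const std::string::size_type pos = entry.find('=');
    return std::stoi(entry.substr(pos + 1));
}
```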

# map and unordered_map

  * These containers do what their name suggests - store key-value pairs.
  * std::map lookup and insertion are O(log(n)), and elements are kept ordered by key
  * std::unordered_map uses a hash table: lookup and insertion are O(1) on average and O(n) in the worst case
  * Prefer std::vector to either when possible in performance sensitive applications (different from Python).
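
One common way to use a vector in place of a map is to keep it sorted by key and look entries up with a binary search. The sketch below assumes integer keys and string values; `Entry` and `find_sorted` are illustrative names.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<int, std::string>;

// Look up `key` in a vector that is sorted by key, using binary search.
// Returns a pointer to the value, or nullptr when the key is absent.
const std::string *find_sorted(const std::vector<Entry> &entries, int key) {
    const auto it = std::lower_bound(
        entries.begin(), entries.end(), key,
        [](const Entry &e, int k) { return e.first < k; });
    if (it != entries.end() && it->first == key)
        return &it->second;
    return nullptr;
}
```

The lookup is O(log(n)) like std::map, but the elements sit in contiguous memory. The trade-off is that inserting into the middle of a sorted vector is O(n).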

In [None]:
std::unordered_map<std::string, std::string> u = {
     {"RED","#FF0000"},
     {"GREEN","#00FF00"},
     {"BLUE","#0000FF"}
};
 
// Helper lambda function to print key-value pairs
auto print_key_value = [](const auto& key, const auto& value) {
    std::cout << "Key:[" << key << "] Value:[" << value << "]\n";
};
 
std::cout << "\nIterate and print key-value pairs using C++17 structured binding:\n";
for( const auto& [key, value] : u )
    print_key_value(key, value);

In [None]:

// Add two new entries to the unordered_map
u["BLACK"] = "#000000";
u["WHITE"] = "#FFFFFF";
 
std::cout << "\nCheck what we just did:\n"
             "The HEX of color RED is: " << u["RED"] << "\n"
             "The HEX of color BLACK is: " << u["BLACK"] << "\n";

In [None]:
std::cout << "\nIterate and print key-value pairs using C++17 structured binding:\n";
for( const auto& [key, value] : u )
    print_key_value(key, value);
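
One pitfall with `operator[]` on maps: looking up a key that does not exist inserts it with a default-constructed value. To check for a key without modifying the map, `find` can be used instead. A minimal sketch, with `ColorMap` and `hex_or` as illustrative names:

```cpp
#include <string>
#include <unordered_map>

using ColorMap = std::unordered_map<std::string, std::string>;

// Look up a color without modifying the map. Returns `fallback` when the
// name is not present.
std::string hex_or(const ColorMap &colors, const std::string &name,
                   const std::string &fallback) {
    const auto it = colors.find(name);
    return it != colors.end() ? it->second : fallback;
}
```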
 

# Pointers

  * A "pointer" (e.g., "int*") is a value referring to a memory location
  * "Dereference" a pointer to get the value stored in that memory location
  * Pointers support arithmetic operators (+, -, ++, --), the subscript operator [], and comparison operators (==, !=)

In [None]:
int* ptr = nullptr; // avoid ptr=0 for type safety checks

ptr = new int; // create an integer on the heap
*ptr = 10;
std::cout << *ptr << std::endl;
delete ptr; //always clean up (unless some other variable takes ownership)

In [None]:
// Similarly you can allocate an array
int* ptr = new int[10]; //create an array of integers on the heap
ptr[2]=10;
int myint = ptr[2];
std::cout << myint << std::endl;
delete [] ptr; //always clean up (unless some other variable takes ownership)

In [None]:
int vals[4] = {1, 2, 3, 4}; 
int *ptr = vals;
std::cout << ptr[1] << std::endl; 

In [None]:
// Two other ways to say the same thing
std::cout << *(ptr + 1) << std::endl; 

int* ptr1 = ptr + 2; 
std::cout << ptr1[-1] << std::endl;

In [None]:
// We can also inspect the memory location
std::cout << ptr << std::endl;

In [None]:
std::cout << ptr + 1 << std::endl; 

# Best practice - use modern pointers when possible

  * std::unique_ptr and std::shared_ptr let you avoid worrying about memory management. As it is easy to forget to delete memory you allocated, this is very convenient.
      


# std::unique_ptr

In [None]:
#include <memory>
std::unique_ptr<int> p{ new int{10} };

//Or avoid "new" completely
std::unique_ptr<int> p1 = std::make_unique<int>(11);

//Print the value and get the memory location
std::cout << *p << " " << p.get() << " " << *p1 << " " << p1.get() << std::endl;

# unique_ptrs are *unique*

In [None]:
std::unique_ptr<int> p2;
p2=p; // Compile error: unique_ptr cannot be copied (its copy assignment is deleted)

# Instead their ownership can be transferred

In [None]:
std::unique_ptr<int> p2(std::move(p));
std::cout << *p2 << " " << p2.get() << " " << std::endl;

  * When the unique_ptr goes out of scope, the memory it points to is cleaned up
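
The cleanup can be made visible with a type that counts its live instances. This is a sketch: `Counted`, `alive_inside_scope`, and `consume` are hypothetical names introduced for illustration, and `std::make_unique` requires C++14.

```cpp
#include <memory>
#include <utility>

// Hypothetical type that tracks how many instances are currently alive.
struct Counted {
    static int alive;
    Counted() { ++alive; }
    ~Counted() { --alive; }
    Counted(const Counted &) = delete;
    Counted &operator=(const Counted &) = delete;
};
int Counted::alive = 0;

// Returns the number of live objects while the unique_ptr is in scope.
// The object is deleted automatically when p is destroyed on return.
int alive_inside_scope() {
    auto p = std::make_unique<Counted>();
    return Counted::alive;
}

// Ownership moves into the function parameter, so the object is deleted
// when the parameter is destroyed at the end of the call.
void consume(std::unique_ptr<Counted> p) {
    (void)p;
}
```

After `consume(std::move(q))` the caller's `q` is empty and the object has already been deleted.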
  

# shared_ptrs are a reference counted alternative

  * Memory is cleaned up when no variable refers to the memory

In [7]:
auto sptr = std::make_shared<int>(12);
auto sptr2 = sptr; //now this works and creates another "reference"
// The printed order relies on C++17, which guarantees left-to-right evaluation of chained <<
std::cout << sptr.use_count() << " " << (*sptr) << " " << ++(*sptr) << std::endl;

2 12 13
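
The reference count also goes back down when a copy goes out of scope. A minimal sketch, with `count_with_copy` as an illustrative name:

```cpp
#include <memory>

// Returns the use count seen while a second shared_ptr to the same object
// exists. The copy is destroyed on return, which lowers the count again.
long count_with_copy(const std::shared_ptr<int> &p) {
    std::shared_ptr<int> copy = p;  // one more owner
    return p.use_count();
}
```

Starting from a single owner, the count is 2 inside the function and 1 again after it returns.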


  * That's the end of this notebook...