# Survival C++ and how to use it to work with data

To follow the course material we must first understand the C language and some basic concepts from C++. Although C++ is the language of choice for many real-world scientific applications, it is more complex than plain C and can introduce unnecessary complexity that gets in the way of grasping the course material. This course will mostly use the C subset of the C++ language; however, we use C++ compilers and tools so that more advanced C++ features are available when convenient.

## Comments

Comments in C++ begin with two forward slashes //. Anything after the two forward slashes is ignored for the rest of the line.

```C++
// This is a comment
```

## Statements and code blocks

A statement is a single instruction in the code. Because C++ does not use new lines to separate statements, every statement must be explicitly terminated, and C++ uses the semicolon ";" character for this.

```C++
// This is a statement that declares an integer with value 2
int a = 2;
```

**Code blocks** are collections of statements that start with an opening brace "{" and end with a closing brace "}". Variables declared within a code block do not exist outside it.

```C++
{ // Starting a code block
    int a = 2;
} // Ending a code block
```

## Basic data types

There are only really two kinds of data types, integers and floats, with varying numbers of bits to represent them. Every other data type is a reinterpretation or compound mixture of these basic data types.

### Integers

Integers represent whole numbers. There are two main types, signed and unsigned, each available with varying numbers of bits. For an integer of **N** bits, each bit is associated with a power of 2, and the value of the integer is the dot product between the vector of bits (1's and 0's) and a vector of decreasing powers of 2 ranging from $2^{N-1}$ to $2^{0}$. For signed integers the most common representation is **two's complement**, in which the first (most significant, or highest value) element of the vector of powers of 2 is negative, that is $-2^{N-1}$.

<figure style="margin-left:auto; margin-right:auto; width:60%;">
    <img style="vertical-align:middle" src="images/integers.svg">
    <figcaption style= "text-align:lower; margin:1em; float:bottom; vertical-align:bottom;">Signed and unsigned integers with N=8 bits.</figcaption>
</figure>
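
The dot product described above can be sketched in code for N=8. The function names `unsigned_value` and `signed_value` are hypothetical; they rebuild the value of a bit pattern one bit at a time rather than relying on the built-in integer types.

```C++
#include <cstdint>

// Value of an 8-bit pattern read as an unsigned integer:
// the sum of bit_i * 2^i for i = 0..7.
int unsigned_value(std::uint8_t bits) {
    int value = 0;
    for (int i = 0; i < 8; i++) {
        int bit = (bits >> i) & 1;   // extract bit i (0 or 1)
        value += bit * (1 << i);     // weight it by 2^i
    }
    return value;
}

// Value of the same pattern read as a two's complement signed integer:
// identical, except that the most significant bit has weight -2^7.
int signed_value(std::uint8_t bits) {
    int value = 0;
    for (int i = 0; i < 7; i++) {
        int bit = (bits >> i) & 1;
        value += bit * (1 << i);
    }
    int top = (bits >> 7) & 1;       // most significant bit
    value -= top * (1 << 7);         // contributes -128 when set
    return value;
}
```

With every bit set (`0xFF`) the unsigned reading is 255, while the signed reading is -128 + 127 = -1.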

Using the formulae for values we can derive the largest and smallest values for the integer types.

| | signed | unsigned | 
| :- | :- | :- |
| **smallest value** | $$-2^{N-1}$$ | $$0$$ | 
| **largest value** | $$2^{N-1}-1$$ | $$2^{N}-1$$ | 
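
As a quick check, the limits $-2^{N-1}$ and $2^{N-1}-1$ for signed integers and $2^{N}-1$ for unsigned integers can be compared against what the compiler reports through `std::numeric_limits`. This is a sketch: the helper names are hypothetical, and the arithmetic is done in `long long`, so it is only valid for N smaller than 63.

```C++
#include <climits>
#include <limits>

// Limits predicted by the formulas for N-bit integers (valid for N < 63).
long long signed_min(int n)   { return -(1LL << (n - 1)); }
long long signed_max(int n)   { return (1LL << (n - 1)) - 1; }
long long unsigned_max(int n) { return (1LL << n) - 1; }

// Number of bits in an int on this platform (commonly 32).
int int_bits() { return static_cast<int>(sizeof(int)) * CHAR_BIT; }

// Compare the predictions with the limits reported by the standard library.
bool formulas_match_int() {
    int n = int_bits();
    return signed_min(n) == std::numeric_limits<int>::min()
        && signed_max(n) == std::numeric_limits<int>::max()
        && unsigned_max(n) == std::numeric_limits<unsigned int>::max();
}
```

For N=8 the helpers give -128, 127 and 255, which are the familiar limits of 8-bit integers.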


There are a number of data types in C that represent integers with varying numbers of bits. The number of bits in each type is not the same on every platform; for example, `long` is 64 bits on 64-bit Linux and macOS but 32 bits on 64-bit Windows.

| Nominal number of bits (N) | name of signed form | name of unsigned form | 
| :- | :- | :- |
|8|char|unsigned char| 
|16|short|unsigned short|
|32|int|unsigned int| 
|64|long|unsigned long|
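
Because the sizes in the table are only nominal, it can help to ask the compiler directly. The sketch below uses `sizeof` and `CHAR_BIT`; the helper name `bits_in` is hypothetical. The fixed-width types in `<cstdint>`, such as `std::int32_t`, are an alternative when an exact number of bits matters.

```C++
#include <climits>
#include <cstdint>

// Number of bits used to store a type on the current platform.
template <typename T>
int bits_in() { return static_cast<int>(sizeof(T)) * CHAR_BIT; }

// Where it is available, std::int32_t has exactly 32 bits,
// unlike int and long whose widths depend on the platform.
static_assert(sizeof(std::int32_t) * CHAR_BIT == 32, "int32_t is 32 bits");
```

Calling `bits_in<long>()` is expected to give 64 on 64-bit Linux and 32 on 64-bit Windows, which is the platform dependence the table glosses over.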

The C statement to create integers is, for example:

```C++
// Creating integers
int a=10;
unsigned int b=10;
```

The character type (char) is just an 8-bit integer whose value is interpreted through a lookup table of ASCII characters. We create a character in C using the following form.

```C++
// Create a character
char c = 'a';
```
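
To see that a character really is an integer, the sketch below converts a `char` to an `int` and does arithmetic on it. The function names are hypothetical, and the values in the comments assume an ASCII-compatible character set, which is the case on mainstream platforms.

```C++
// A char holds a small integer; converting it to int reveals the character code.
int code_of(char c) { return static_cast<int>(c); }

// Arithmetic on characters is integer arithmetic on their codes.
char next_letter(char c) { return static_cast<char>(c + 1); }
```

Under ASCII, `code_of('a')` is 97 and `next_letter('a')` is `'b'`.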

### IEEE754 Floating point numbers

The IEEE754 standard for floating point numbers was established in 1985 and is the standard used in many applications, including OpenCL implementations. The bits are laid out in three sections: a Sign bit, an Exponent, and a Mantissa. The Sign bit occupies one bit, the Exponent has **NBE** bits and the Mantissa has **NBM** bits. The total number of bits for a floating point number is then NBE+NBM+1. The Exponent **E** is just an unsigned integer created by a dot product of the Exponent bits with a vector of decreasing positive powers of 2 ranging from $2^{\mathrm{NBE}-1}$ to $2^0$. The Mantissa is constructed the same way, but the value is $2^{0}==1$ plus the dot product of the Mantissa bits with a vector of decreasing powers of 2 ranging from $2^{-1}$ to $2^{-\mathrm{NBM}}$. All three components combine together to form the value, as shown below.

<figure style="margin-left:auto; margin-right:auto; width:100%;">
    <img style="vertical-align:middle" src="images/floating_point.svg">
    <figcaption style= "text-align:lower; margin:1em; float:bottom; vertical-align:bottom;">Floating point representations with differing numbers of bits.</figcaption>
</figure>

The standard reserves special meaning (such as $\pm \infty$, NaN, or subnormal numbers) for when the bits in the Exponent are either all 0 or all 1. Therefore the smallest normal Exponent is $E=1$ and the largest is $E=(2^{\mathrm{NBE}}-1)-1$. With $\mathrm{Bias}=2^{\mathrm{NBE}-1}-1$, the magnitude of a normal floating point number lies within the following range:

| | | 
| :- | :- |
| **smallest value** | $$ 2^{1-\mathrm{Bias}}$$ |
| **largest value** | $$ \left (2-2^{-\mathrm{NBM}} \right) 2^{\mathrm{Bias}}$$ |
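
These two limits can be evaluated in code and compared with the values the standard library reports for `float`, for which NBM=23 and Bias=127. The sketch below assumes `float` is a 32-bit IEEE754 type; the function names are hypothetical.

```C++
#include <cmath>
#include <limits>

// Smallest positive normal value, 2^(1 - Bias), computed in double precision.
double smallest_normal(int bias) { return std::ldexp(1.0, 1 - bias); }

// Largest finite value, (2 - 2^(-NBM)) * 2^Bias.
double largest_normal(int nbm, int bias) {
    return std::ldexp(2.0 - std::ldexp(1.0, -nbm), bias);
}
```

On such a platform `smallest_normal(127)` should equal `std::numeric_limits<float>::min()` and `largest_normal(23, 127)` should equal `std::numeric_limits<float>::max()`.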

The **frexp** function in Python and C returns the values $x=0.5 \times (-1)^{S} \times \left ( 1.0 + \sum^{\mathrm{NBM}-1}_{i=0} B_i 2^{i-\mathrm{NBM}} \right )$ and $y=E-\mathrm{Bias}+1$, such that $$\mathrm{Value} = x \times 2^{y}$$

```Python
import math
(x, y) = math.frexp(1.0)
print(x, y)
# prints: 0.5 1
```


For any given floating point number $f$ within the limits, the next representable floating point number lies at a spacing of exactly $\Delta f=2^{(E-\mathrm{Bias}-\mathrm{NBM})}=2^{(y-1-\mathrm{NBM})}$. The spacing doubles each time $f$ crosses a power of two.
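
The spacing formula can be checked in C++ with `std::frexp`, `std::ldexp` and `std::nextafter` from `<cmath>`. This is a sketch for positive, normal `float` values; the function names are hypothetical, and NBM is taken from `std::numeric_limits<float>::digits`, which counts the implicit leading 1 as well as the stored Mantissa bits.

```C++
#include <cmath>
#include <limits>

// Spacing from a positive, normal float f to the next representable float,
// predicted by 2^(y - 1 - NBM), where y is the exponent returned by frexp.
float predicted_spacing(float f) {
    int y = 0;
    std::frexp(f, &y);                                  // f = x * 2^y with 0.5 <= x < 1
    int nbm = std::numeric_limits<float>::digits - 1;   // 23 stored Mantissa bits
    return std::ldexp(1.0f, y - 1 - nbm);               // 2^(y - 1 - NBM)
}

// Spacing measured directly by asking for the next float above f.
float measured_spacing(float f) {
    return std::nextafter(f, std::numeric_limits<float>::infinity()) - f;
}
```

For `f = 1.0f` both functions give $2^{-23}$, and for `f = 3.0f` both give $2^{-22}$, since 3 lies one power of two higher.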

#### Example

For 32-bit floating point numbers the number of bits in the Exponent is $\mathrm{NBE}=8$ and the number of bits in the Mantissa is $\mathrm{NBM}=23$, therefore $\mathrm{Bias}=2^{\mathrm{NBE}-1}-1=127$. The smallest (normal) floating point representation is $2^{1-\mathrm{Bias}}=1.1754944\times 10^{-38}$ and the largest (normal) floating point representation is $3.4028235\times10^{38}$. For values around $f=1.0$, where $E-\mathrm{Bias}=0$, the spacing to the next floating point representation is $\Delta f=2^{-23} \approx 1.1920929 \times 10^{-7}$.
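
To make the bit layout concrete, the sketch below copies the bits of a `float` into a 32-bit integer, splits them into Sign, Exponent and Mantissa, and rebuilds the value from those fields. It assumes `float` is a 32-bit IEEE754 type and only handles normal numbers; the struct and function names are hypothetical.

```C++
#include <cmath>
#include <cstdint>
#include <cstring>

// The three bit fields of a 32-bit IEEE754 float.
struct FloatParts {
    std::uint32_t sign;      // 1 bit
    std::uint32_t exponent;  // NBE = 8 bits, the unsigned integer E
    std::uint32_t mantissa;  // NBM = 23 bits
};

// Copy the bits of f into an integer and split them into the three fields.
FloatParts decode(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
    std::uint32_t bits = 0;
    std::memcpy(&bits, &f, sizeof(bits));
    FloatParts p;
    p.sign     = bits >> 31;
    p.exponent = (bits >> 23) & 0xFFu;
    p.mantissa = bits & 0x7FFFFFu;
    return p;
}

// Rebuild the value of a normal float from its fields:
// (-1)^S * (1 + M / 2^23) * 2^(E - Bias), with Bias = 127.
double rebuild(const FloatParts& p) {
    double sign = p.sign ? -1.0 : 1.0;
    double significand = 1.0 + static_cast<double>(p.mantissa) / 8388608.0; // 2^23
    return sign * std::ldexp(significand, static_cast<int>(p.exponent) - 127);
}
```

For `1.0f` the fields are S=0, E=127 and an all-zero Mantissa, matching $E-\mathrm{Bias}=0$ in the example above.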

### Floating point numbers in C

The C floating point data types for varying numbers of bits are as follows:

| Nominal number of bits (NBM+NBE+1) | name |
| :- | :- |
|16|half|
|32|float|
|64|double|
|64-128|long double|

Declaring a floating point number in C may be done as follows:

```C++

// half precision (16-bit), not standard C or C++, available in OpenCL C and some compiler extensions
half a=2.0;

// single precision (32-bit)
float b=2.0;

// double precision (64-bit)
double c=2.0;

// extended precision (64 to 128 bits, depending on the platform)
long double d=2.0;

```
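
The practical difference between these types is how many Mantissa bits they carry. The sketch below reads that number from `std::numeric_limits` for `float` and `double`, and shows that a `float` cannot hold every 32-bit integer exactly. The function names are hypothetical, and the numbers in the comments assume IEEE754 `float` and `double`.

```C++
#include <limits>

// Number of stored Mantissa bits (NBM) for float or double.
// numeric_limits::digits also counts the implicit leading 1, hence the -1.
template <typename T>
int mantissa_bits() { return std::numeric_limits<T>::digits - 1; }

// True if the integer n can make a round trip through a float unchanged.
bool survives_float(int n) {
    return static_cast<int>(static_cast<float>(n)) == n;
}
```

With 23 stored Mantissa bits, `survives_float(16777216)` holds because $2^{24}$ is exactly representable, while `survives_float(16777217)` does not, because the spacing between neighbouring floats at that size is 2.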


### Pointers and the stack

Thus far we have been initialising integers and floats as statements in a program. These values require memory somewhere in which to store the bits. When the program is executed, the operating system allocates memory for the values in a pre-prepared and reserved area of memory for the program called the **stack**. In C/C++ one can get the starting address (as an integer) of the allocated memory, and a variable that stores such an address is called a **pointer**.

Pointers can store the address of any memory allocation. The pointer type determines how the memory will be accessed, i.e. as a float, as an integer, etc. We create a pointer to a particular data type by declaring a variable as usual and putting a **\*** in front of the variable name to indicate that it is a pointer to a variable of that type.

```C++
int *p;  // Create a pointer to an integer called p
int a=2; // Create an integer and give it a value of 2
```

The address operator **&** gets the address of allocated memory, which can be assigned to the pointer.

```C++
p = &a;  // Get the address of the integer and assign it to p
```

Finally, the de-referencing operator **\*** can access the value pointed to by a pointer.

```C++
int y = *p; // Access the value pointed to by p and assign to y
```
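
Putting the three steps together, the sketch below declares a pointer, assigns it the address of an integer, and then both reads and writes through it. The function names `set_to_five` and `pointer_demo` are hypothetical.

```C++
// Write to an integer through a pointer to it.
void set_to_five(int *p) {
    *p = 5;              // de-reference p and store 5 at that address
}

int pointer_demo() {
    int a = 2;           // an integer on the stack
    int *p = &a;         // p holds the address of a
    int y = *p;          // read through the pointer: y is 2
    set_to_five(p);      // a is now 5, y is still 2
    return a + y;        // 7
}
```

Note that `y` received a copy of the value, so it does not change when `a` is later modified through the pointer.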

There is a danger here: if **p** has not been assigned a valid address and we de-reference it anyway, the behaviour of the program is undefined, and in practice this often results in a memory error such as a crash.
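
One common defensive habit, sketched below, is to initialise pointers to `nullptr` and check for it before de-referencing. The function name `read_or` is hypothetical. The check only catches pointers that were explicitly set to `nullptr`; it cannot detect a pointer that was never initialised.

```C++
// Read through a pointer only if it points somewhere; otherwise return a fallback.
int read_or(const int *p, int fallback) {
    if (p == nullptr) {
        return fallback;  // nothing to read, so avoid de-referencing
    }
    return *p;
}
```

For example, with `int v = 4;`, the call `read_or(&v, -1)` gives 4, while `read_or(nullptr, -1)` gives -1.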


### Memory allocation on the heap

* Stacks and heaps, allocating and de-allocating memory
* OpenCL Data types
* Reading and writing to and from a file
* Structures
* Multi-dimensional array access
* math functions in C
* programs
* Debugging with GDB
* Exercise: debugging with GDB