# Representing Data types in Digital computers

## Natural numbers

* Fundamentally, computers can handle only 0s and 1s.

* Here, to **handle** means to store in memory, to operate on, and to output.

* We call the smallest unit that can hold a 0 or a 1 a **bit**.

* When you hear that a personal computer is, for example, 32-bit or 64-bit, recall this.

* It refers to the number of bits that the computer can process at once.

* To handle larger (natural) numbers, we need more bits.

* The following table shows decimal numbers in binary and hexadecimal format.

In [None]:
# https://stackoverflow.com/questions/35160256/how-do-i-output-lists-as-a-table-in-jupyter-notebook
# http://nbviewer.jupyter.org/github/ipython/ipython/blob/4.0.x/examples/IPython%20Kernel/Rich%20Output.ipynb

import math
import IPython

rows_list = ['''| Decimal | Binary | no. bits | no. bytes | Hexadecimal |
|:--------------:|:------------:|:------------:|:------------:|:-------------------:|''']

# Decimal number loop
for i in list(range(0, 21)) + [127, 128, 255, 256, 32767, 32768, 65535, 65536, 2**32-1, 2**32, 2**63-1, 2**63, 2**64-1, 2**64, ]:

    # Decimal format
    d_str = str(i)

    # Binary format
    b_str = f"{i:b}"

    # Number of bits
    n_bits = len(b_str)
    n_bits_str = str(len(b_str))
    
    if 17 < n_bits:
        b_str = '...'

    # Hexadecimal format
    h_str = f"{i:X}"

    # Number of bytes
    # Try `help(math.ceil)` to check what it does
    n_bytes = math.ceil(len(h_str) * 0.5)
    n_bytes_str = str(n_bytes)

    # Indicate a row of the table
    rows_list.append('|'.join(['', d_str, b_str, n_bits_str, n_bytes_str, h_str, '']))

IPython.display.display(IPython.display.Markdown('\n'.join(rows_list)))



* We can see that one hexadecimal digit represents four binary digits.

* Because of this, we frequently group the digits of a binary number in fours; for example `1101 0101`.

* We call a collection of 8 bits a **byte**.

* Q: One digit of which base would represent three binary digits?
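
* As a small sketch of the grouping described above, the snippet below pads a binary string to a multiple of four digits, splits it into groups, and maps each group to one hexadecimal digit. The helper name `group_bits` is made up for this illustration; `int.bit_length()` is a built-in Python method that counts the bits a natural number needs.

```python
def group_bits(value, group=4):
    """Return the binary digits of a natural number in groups of `group` bits."""
    bits = f"{value:b}"
    # Pad on the left so the length is a multiple of the group size
    width = -(-len(bits) // group) * group
    bits = bits.zfill(width)
    return [bits[k:k + group] for k in range(0, width, group)]


value = 0b11010101
groups = group_bits(value)
hex_digits = [f"{int(g, 2):X}" for g in groups]

print(" ".join(groups))        # grouped binary digits
print("".join(hex_digits))     # one hexadecimal digit per group
print(value.bit_length())      # number of bits needed
```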

## Negative integers

* The example above implies that computers are designed to handle 0 and positive integers.

* To represent and handle negative integers, computers convert a negative integer to its **2's complement**.

### An 8 bit example

* To keep things simple, we look at the 8-bit case first.

* The positive integer 7 is 00000111 as an 8-bit binary number.

* To find the 2's complement of the binary number 00000111, exchange all 0's and 1's (which gives 11111000) and then add 1.

* The 2's complement of the binary number 00000111 is therefore 11111001.

* Adding these two numbers gives the following.

In [None]:
a = int('00000111', base=2)
b = int('11111001', base=2)

c = a+b
print(f'c = {c:b}(binary)')
print(f'c = {c:d}(decimal)')
print(f'c = {c:x}(hexadecimal)')



* The result above is 256 in decimal; in binary it is `1 0000 0000`, whose lower 8 bits are all zeros. In an 8-bit operation the ninth bit is discarded, so we regard this result as zero.

* The following table compares 8-bit bit patterns with the corresponding unsigned (`uint8_t`) and signed (`int8_t`) values.



In [None]:
# https://stackoverflow.com/questions/35160256/how-do-i-output-lists-as-a-table-in-jupyter-notebook
# http://nbviewer.jupyter.org/github/ipython/ipython/blob/4.0.x/examples/IPython%20Kernel/Rich%20Output.ipynb

import IPython

n = 8

table = [ f''' {n} bit bit pattern | `uint{n}_t` | `int{n}_t`
:-----------------:|:--------:|:------:''']

for i in range(0, 3):
    table.append(f'{i:0{n}b} | {i} | {i}')

table.append(f' ... | ... | ... ')

for i in range(2**(n-1)-2, 2**(n-1)-1+1):
    table.append(f'{i:0{n}b} | {i} | {i}')

for i in range(2**(n-1), 2**(n-1)+2+1):
    table.append(f'{i:0{n}b} | {i} | {i-(2**n)}')

table.append(f' ... | ... | ... ')

for i in range((2**n)-2, (2**n)-1+1):
    table.append(f'{i:0{n}b} | {i} | {i-(2**n)}')

IPython.display.display(IPython.display.Markdown('\n'.join(table)))



### A 16 bit example

* The following shows a 16-bit example.

In [None]:
a = int('0000''0000''0000''0111', base=2)
b = int('1111''1111''1111''1001', base=2)

c = a+b
print(f'c = {c:b}(binary)')
print(f'c = {c:d}(decimal)')
print(f'c = {c:x}(hexadecimal)')



* The result above is 65536 in decimal; in binary it is `1 0000 0000 0000 0000`, whose lower 16 bits are all zeros.

* The following table compares 16-bit bit patterns with the corresponding unsigned (`uint16_t`) and signed (`int16_t`) values.



In [None]:
# https://stackoverflow.com/questions/35160256/how-do-i-output-lists-as-a-table-in-jupyter-notebook
# http://nbviewer.jupyter.org/github/ipython/ipython/blob/4.0.x/examples/IPython%20Kernel/Rich%20Output.ipynb

import IPython

n = 16

table = [ f''' {n} bit bit pattern | `uint{n}_t` | `int{n}_t`
:-----------------:|:--------:|:------:''']

for i in range(0, 3):
    table.append(f'{i:0{n}b} | {i} | {i}')

table.append(f' ... | ... | ... ')

for i in range(2**(n-1)-2, 2**(n-1)-1+1):
    table.append(f'{i:0{n}b} | {i} | {i}')

for i in range(2**(n-1), 2**(n-1)+2+1):
    table.append(f'{i:0{n}b} | {i} | {i-(2**n)}')

table.append(f' ... | ... | ... ')

for i in range((2**n)-2, (2**n)-1+1):
    table.append(f'{i:0{n}b} | {i} | {i-(2**n)}')

IPython.display.display(IPython.display.Markdown('\n'.join(table)))



### Summary

* Computers handle negative integers as their 2's complement, which we can find by exchanging the 0's and 1's of the binary representation of the integer's absolute value and then adding one.
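
* The rule in this summary can be sketched in a few lines of Python. The function names `twos_complement` and `to_signed` and the default width of 8 bits are assumptions made for this illustration; Python integers themselves have no fixed width, so the width is imposed with a bit mask.

```python
def twos_complement(value, n_bits=8):
    """Bit pattern (as an unsigned int) of -value in n_bits two's complement."""
    mask = (1 << n_bits) - 1      # n_bits ones, e.g. 0b11111111
    inverted = value ^ mask       # exchange 0's and 1's
    return (inverted + 1) & mask  # add one, keep the lower n_bits


def to_signed(pattern, n_bits=8):
    """Interpret an n_bits bit pattern as a signed integer."""
    if pattern >= 1 << (n_bits - 1):   # top bit set: negative value
        return pattern - (1 << n_bits)
    return pattern


p = twos_complement(7)
print(f"{p:08b}", to_signed(p))
print(f"{twos_complement(7, 16):016b}")
```

* Adding a value and its 2's complement and keeping only the lower `n_bits` gives zero, which is why the same adder circuit can serve for both signed and unsigned numbers.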

## Real numbers

### Fixed point

* We can store a number as an integer and multiply it by a fixed scale factor.

* For example, we can store all lengths in cm and multiply by 0.01 to obtain values in m.

* However, with this scheme, it is not easy to represent lengths in mm, which are finer than the stored unit.
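
* A minimal sketch of the fixed point idea follows. The scale factor `SCALE` and the sample lengths are made-up values for illustration only.

```python
SCALE = 0.01  # assumed fixed factor: one stored count is 0.01 m (1 cm)

lengths_cm = [150, 275, 1020]       # stored as plain integers

total_cm = sum(lengths_cm)          # integer arithmetic, no rounding error
total_m = total_cm * SCALE          # apply the fixed factor only when needed

print(total_cm, "cm")
print(f"{total_m:.2f} m")

# A length of 1234 mm cannot be stored exactly as a whole number of cm
length_mm = 1234
stored_cm, lost_mm = divmod(length_mm, 10)
print(stored_cm, "cm stored,", lost_mm, "mm lost")
```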

### Floating point

* In short, we can also represent a real number using a significand and an exponent.

* For example, $2.3456 \times 10^0$ m would be $2.3456 \times 10^2$ in cm, and $2.3456 \times 10^3$ in mm.

* An engineering calculator may show $2.3456 \times 10^3$ as `2.3456E3`. Here, `2.3456` is the significand (also called mantissa, coefficient, argument, or fraction) and `3` is the exponent.

* Even if the significand does not change, the location of the decimal point changes when the exponent changes.

* Conversely, $2.3456 \times 10^0$ mm would be $2.3456 \times 10^{-1}$ in cm, and $2.3456 \times 10^{-3}$ in m.

* Computers store these numbers in binary. For more information, please refer to [IEEE 754, Wikipedia](https://en.wikipedia.org/wiki/IEEE_754).

* Usually we use 4-byte (32-bit) single precision or 8-byte (64-bit) double precision; each includes a sign bit ($\pm$), an exponent, and a significand.

* The following table shows the breakdown of the 32 bits of single precision. Here, `e` and `s` indicate exponent and significand, respectively.

In [None]:
# https://stackoverflow.com/questions/35160256/how-do-i-output-lists-as-a-table-in-jupyter-notebook
# http://nbviewer.jupyter.org/github/ipython/ipython/blob/4.0.x/examples/IPython%20Kernel/Rich%20Output.ipynb

import IPython.display as disp

# number of bits
n = 32
ne = 8
ns = n - 1 - ne


disp.display(
    disp.Markdown(
        '\n'.join(
            [
                ' | '.join(str(k) for k in range(1, n+1)),
                '|'.join(':---:' for k in range(1, n+1)),
                ' | '.join([r'$\pm$'] + ['`e`']*ne + ['`s`']*ns),
            ],
        )
    )
)



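* Please note that the exponent is stored with a bias of $127$: a stored value $e$ from $1$ to $254$ means $2^{e-127}$, that is, $2^{-126}$ to $2^{127}$. The stored values $0$ and $2^{8}-1=255$ are reserved: $0$ for zero and subnormal numbers, $255$ for infinity and NaN.

* The following table shows the breakdown of the 64 bits of double precision.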

In [None]:
# https://stackoverflow.com/questions/35160256/how-do-i-output-lists-as-a-table-in-jupyter-notebook
# http://nbviewer.jupyter.org/github/ipython/ipython/blob/4.0.x/examples/IPython%20Kernel/Rich%20Output.ipynb

import IPython.display as disp

# number of bits
n = 64
ne = 11
ns = n - 1 - ne


disp.display(
    disp.Markdown(
        '\n'.join(
            [
                ' | '.join(str(k) for k in range(1, n+1)),
                '|'.join(':---:' for k in range(1, n+1)),
                ' | '.join([r'$\pm$'] + ['`e`']*ne + ['`s`']*ns),
            ],
        )
    )
)



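* Please note that the exponent is stored with a bias of $1023$: a stored value $e$ from $1$ to $2046$ means $2^{e-1023}$, that is, $2^{-1022}$ to $2^{1023}$. The stored values $0$ and $2^{11}-1=2047$ are reserved: $0$ for zero and subnormal numbers, $2047$ for infinity and NaN.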
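
* To see the sign, exponent, and significand fields of an actual number, one possible sketch uses Python's standard `struct` module to reinterpret the 4 bytes of a single precision value as an unsigned integer. The helper name `float32_fields` and the sample value `-6.5` are chosen for illustration. The reconstruction at the end assumes a normal number, with a stored exponent between 1 and 254 and an exponent bias of 127.

```python
import struct


def float32_fields(x):
    """Split a number stored as IEEE 754 single precision into its bit fields."""
    # Pack as big-endian float32, then reinterpret the 4 bytes as an unsigned int
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, stored with a bias of 127
    fraction = bits & 0x7FFFFF        # 23 bits of the significand
    return sign, exponent, fraction


sign, exponent, fraction = float32_fields(-6.5)
print(sign, exponent, f"{fraction:023b}")

# Rebuild the value of a normal number (stored exponent between 1 and 254)
value = (-1) ** sign * (1 + fraction / 2 ** 23) * 2.0 ** (exponent - 127)
print(value)
```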

## Complex numbers

* A complex number is usually stored as two real numbers, one for the real part and one for the imaginary part. [[ref](https://en.cppreference.com/w/cpp/numeric/complex)]

In [None]:
import cmath
import math

import IPython.display as disp

z1 = 1j * 1j

disp.display(disp.Latex(f'$i \\times i = {z1}$'))


z2 = 1j ** 2

disp.display(disp.Latex(f'$i^2  = {z2}$'))


# pi from the arccosine of -1, then Euler's formula
pi = math.acos(-1)
z3 = cmath.exp(1j * pi)

disp.display(disp.Latex(r'$e^{j\pi}  = '+f'{z3}$'))

z4 = 1 + 2j
z5 = 1 - 2j

disp.display(disp.Latex(f'${z4}\\times{z5} = {z4 * z5}$'))



* C++ has `std::complex`. An example follows. [[ref](https://en.cppreference.com/w/cpp/numeric/complex)]

``` C++
#include <iostream>
#include <iomanip>
#include <complex>
#include <cmath>
 
int main()
{
    using namespace std::complex_literals;
    std::cout << std::fixed << std::setprecision(1);
 
    std::complex<double> z1 = 1i * 1i;     // imaginary unit squared
    std::cout << "i * i = " << z1 << '\n';
 
    std::complex<double> z2 = std::pow(1i, 2); // imaginary unit squared
    std::cout << "pow(i, 2) = " << z2 << '\n';
 
    double PI = std::acos(-1);
    std::complex<double> z3 = std::exp(1i * PI); // Euler's formula
    std::cout << "exp(i * pi) = " << z3 << '\n';
 
    std::complex<double> z4 = 1. + 2i, z5 = 1. - 2i; // conjugates
    std::cout << "(1+2i)*(1-2i) = " << z4*z5 << '\n';
}
```

* Here, `using namespace std::complex_literals;` makes it possible to write a complex constant such as `1i` directly in the source code.

* `std::complex<double> z1` declares a C++ variable named `z1`.<br>  Its type can be read as "`complex` with real and imaginary parts stored as `double` precision floating point numbers".
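
* The "two real numbers" view can be checked in Python, whose built-in `complex` type exposes the parts as `.real` and `.imag`. The sketch below rebuilds the product of the conjugate pair used above from those parts; the variable names `a`, `b`, `c`, `d`, and `product` are introduced only for this illustration.

```python
z4 = 1 + 2j
z5 = 1 - 2j

# The two real numbers that make up each complex number
a, b = z4.real, z4.imag
c, d = z5.real, z5.imag

# (a + bi)(c + di) = (ac - bd) + (ad + bc)i
product = complex(a * c - b * d, a * d + b * c)

print(product)
print(product == z4 * z5)
```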

# Characters and Strings

* If computers can handle only 0s and 1s, how is it that we can read this page?

* It is because of a set of conventions that we call [**character encodings**](https://en.wikipedia.org/wiki/Character_encoding).

* The following table shows part of the ASCII table, one such standard.

In [None]:
# https://stackoverflow.com/questions/35160256/how-do-i-output-lists-as-a-table-in-jupyter-notebook
# http://nbviewer.jupyter.org/github/ipython/ipython/blob/4.0.x/examples/IPython%20Kernel/Rich%20Output.ipynb

import IPython

n = 8

# Python can use triple quote to represent a multi line string
table = [ f''' {n} bit bit pattern | `uint{n}_t` | `char`
:-----------------:|:--------:|:------:''']

for i in range(0, 8+1):
    table.append(f'{i:0{n}b} | {i} | {i:c}')

# Python can use '' or "" to represent a string literal
table.append(f" ... | ... | ... ")

for i in range(14, 50+1):
    table.append(f'{i:0{n}b} | {i} | {chr(i)}')

table.append(f' ... | ... | ... ')

for i in range(57, 65+1):
    table.append(f'{i:0{n}b} | {i} | {i:c}')

table.append(f' ... | ... | ... ')

for i in range(90, 97+1):
    table.append(f'{i:0{n}b} | {i} | {i:c}')

table.append(f' ... | ... | ... ')

for i in range(122, 127+1):
    table.append(f'{i:0{n}b} | {i} | {i:c}')

table.append(f' ... | ... | ... ')


IPython.display.display(IPython.display.Markdown('\n'.join(table)))



* The following C++ program writes the same table to a Markdown file.

``` C++
// http://www.cplusplus.com/reference/fstream/fstream/
#include <fstream>
// https://stackoverflow.com/questions/7349689/
#include <bitset>

using namespace std;

int main(int argn, char *argv[]){

    // C/C++ uses "" to represent a string
    ofstream ofs("ascii_table.md", std::ofstream::out);
    // '' indicates a single character
    char newline = '\n';
    char ellipsis_line[] = " ... | ... | ... \n";
    const int n = 8;

    ofs << " " << n << " bit bit pattern | `uint"
        << n << "_t` | `char`\n";
    ofs << ":-----------------:|:--------:|:------:\n";

    for (int i=0; i<(8+1); ++i){
        ofs << bitset<n>(i) << " | " << i << " | " << char(i) << newline;
    }
    ofs << ellipsis_line;

    for (int i=14; i<(50+1); ++i){
        ofs << bitset<n>(i) << " | " << i << " | " << char(i) << newline;
    }
    ofs << ellipsis_line;

    for (int i=57; i<(65+1); ++i){
        ofs << bitset<n>(i) << " | " << i << " | " << char(i) << newline;
    }
    ofs << ellipsis_line;

    for (int i=90; i<(97+1); ++i){
        ofs << bitset<n>(i) << " | " << i << " | " << char(i) << newline;
    }
    ofs << ellipsis_line;

    for (int i=122; i<(127+1); ++i){
        ofs << bitset<n>(i) << " | " << i << " | " << char(i) << newline;
    }
    ofs << ellipsis_line;

    ofs.close();

    return 0;
}
```
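
* In Python, the mapping between characters and integer codes is available through the built-in functions `ord()` and `chr()`, and `str.encode()` applies an encoding to a whole string. The sample strings below are arbitrary choices for illustration.

```python
text = "A a 0"

for ch in text:
    code = ord(ch)                    # character -> integer code
    print(f"{code:08b} | {code:3d} | {ch!r}")

# Integer code -> character
print(chr(65), chr(97), chr(48))

# The same agreement, applied to a whole string at once
encoded = "Hi!".encode("ascii")       # bytes object holding the codes
print(list(encoded))
print(encoded.decode("ascii"))
```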

### Strings

* To represent a word or a sentence, we use an ordered group of characters, called a string.

* In Python, there is no separate character type; a character is simply a string of length one.

In [None]:
s = "Hello World!"

print(s)



* In C/C++, a string always ends with the character '\0'; hence we call it a "zero-terminated string".

``` C++
#include <iomanip>   // setw
#include <iostream>

using namespace std;

int main(int argn, char * argv []){
    char s[] = "Hello World!";
    // sizeof counts the terminating '\0' as well
    int n = sizeof(s);

    for (int i = 0; i < n; ++i){
        cout << "s[" << setw(2) << i << "] = "
             << s[i] << "(" << setw(2) << (int) s[i] << ")\n";
    }
    return 0;
}
```



* In Python, we cannot change a character within a string; in C/C++ we can overwrite any character of a `char` array, although overwriting the terminating '\0' would leave the string unterminated.

In [None]:
s = "hello World!"
# Following is an exception handling block
try:
    # This will cause an error because a str is immutable
    s[0] = 'H'
except TypeError as e:
    # This line will present the error message
    print(e)



``` C++
#include <iostream>

using namespace std;

int main(int argn, char * argv []){
    char s[] = "hello World!\n";
    s[0] = 'H';
    cout << s ;    
    return 0;
}
```
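
* Since a Python string cannot be changed in place, a common workaround is to build a new string. When in-place changes are really needed, the built-in `bytearray` type behaves more like a C `char` array. The sketch below shows both; the explicit `b"\0"` is appended only to mimic the C terminator and is not something Python requires.

```python
s = "hello World!"

# Build a new string instead of changing the old one in place
t = "H" + s[1:]
print(t)
print(s)            # the original string is unchanged

# A bytearray is mutable, much like a C `char` array
buf = bytearray(s, "ascii") + b"\0"   # explicit terminating zero, as in C
buf[0] = ord("H")
print(buf)
print(len(s), len(buf))   # the terminator takes one extra byte
```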



## Converting between Types

* What if we need to multiply an integer by a floating point real number (float)?

* Python, C, and C++ all convert the integer into a float.

* What if we want to convert a float into an integer?

* The following C++ example includes some of the possible type conversions.

``` C++
#include <cstdio>

int main(int argn, char *argv[]){

    int i = 1;
    double s = 0.1, q = 2.04;

    double pi_float = i + q + s;
    int pi_int = (int)(i + q + s);

    char c = 'c';

    printf("pi_float = %f\n", pi_float);
    printf("pi_int = %d\n", pi_int);
    printf("c (in char) = %c\n", c);
    printf("c (in int) = %d\n", (int) c);

}

```
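
* For comparison, a Python sketch of similar conversions follows. It reuses the variable names of the C++ example; the extra values `-3.99`, `3.99`, and `"2.5"` are added only to show how `int()`, `round()`, and `float()` behave.

```python
i = 1
s, q = 0.1, 2.04

pi_float = i + q + s        # the int is converted to float before adding
pi_int = int(pi_float)      # int() drops the fractional part (towards zero)

print(type(pi_float).__name__, round(pi_float, 2))
print(type(pi_int).__name__, pi_int)

print(int(-3.99))           # truncation, not rounding down
print(round(3.99))          # round() goes to the nearest integer

c = "c"
print(ord(c))               # character -> integer code
print(float("2.5") * 2)     # string -> float
```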

## `list` and `tuple` from Python vs arrays from C/C++

* To store multiple entities under one variable name, we can use the following:

| type | declaration |
|:-----:|:------------:|
| Python `list` | `s = [1, 2, '3', [4]]` |
| Python `tuple` | `t = ('1', '2', 3, '4')` |
| C/C++ array | `int a[] = {1, 2, 3, 4};`<br>`char b[] = "1234";` |



* The following operations are possible:

| operation | Python | C/C++ |
|:-----:|:------------:|:------------:|
| read | `s[0]` | `a[0]` |
| write | `s[0]=5` | `a[0]=5` |
| number of elements | `len(s)` | `sizeof(a)/sizeof(a[0])` |
| size in bytes | N/A | `sizeof(a)` |
| pop last element | `s.pop()` | N/A |
| add a new last element | `s.append(5)` | N/A |
| concatenate two of them | `s0 + s1` | N/A |
| insert a new element at `i` | `s.insert(i, 'a')` | N/A |
| slicing | `s[0:2]` | N/A |



* In fact, this may not be a fair comparison.



* There is a [discussion](https://stackoverflow.com/questions/17528657/python-list-equivalent-in-c) claiming that `std::vector` or `std::deque` of C++ could be similar to the Python `list`.

* Implementing the full features of `list` may not be straightforward in C/C++.
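
* On the Python side, the standard `array` module is arguably a closer match to a C array than `list`, because every element has the same C type. The sketch below is only an illustration of that point, together with the immutability of `tuple`. The element size reported by `itemsize` depends on the platform.

```python
from array import array

a = array("i", [1, 2, 3, 4])     # every element is a C `int`

print(a[0])                      # read
a[0] = 5                         # write
print(len(a))                    # number of elements
print(a.itemsize * len(a))       # size of the elements in bytes

try:
    a.append("3")                # mixed types are rejected, unlike a list
except TypeError as e:
    print(e)

t = (1, 2, 3)
try:
    t[0] = 5                     # a tuple cannot be changed after creation
except TypeError as e:
    print(e)
```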

## `dict` of Python vs `struct` and `union` of C/C++

[[ref0](https://stackoverflow.com/questions/330793/how-to-initialize-a-struct-in-accordance-with-c-programming-language-standards), [ref1](https://en.cppreference.com/w/c/language/struct_initialization)]

* To store data with field names, the following are available:

| type | declaration |
|:-----:|:------------:|
| Python `dict` | `s = {'name': 'python', 'year': 1989, 'cmd': 'python'}` |
| C/C++ `struct` | `struct {const char *name; int year; const char *cmd;} t = {.name="C", .year=1970, .cmd="python"};` |
| C/C++ `union` | `union {int x; char c[4];} u = {1};` |



* The following operations are possible:

| operation | Python | C/C++ |
|:-----:|:------------:|:------------:|
| read | `s['name']` | `t.name` |
| write | `s['name']='Python'` | `t.year=1972;` |
| size in bytes | N/A | `sizeof(t)` |
| delete one element | `del s['cmd']` | N/A |
| add a new element | `s['ide']='idle'` | N/A |
| merge two of them | `s0.update(s1)` | N/A |



* This is not a fair comparison, either.



* Another [discussion](https://stackoverflow.com/questions/1842941/translating-python-dictionary-to-c) claims that `std::map` of C++ could be similar to the Python `dict`.
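
* The `dict` operations discussed in this section can be tried as a short runnable sketch. The second dictionary `s1` and its contents are made up for this illustration.

```python
s = {'name': 'python', 'year': 1989, 'cmd': 'python'}

print(s['name'])            # read
s['name'] = 'Python'        # write
del s['cmd']                # delete one element
s['ide'] = 'idle'           # add a new element

s1 = {'year': 1991, 'typing': 'dynamic'}   # made-up second dict
s.update(s1)                # merge: values from s1 win on shared keys

print(s)
```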