<a href="https://colab.research.google.com/github/kangwonlee/2018pycpp/blob/colab-buttons/10.data-types-and-operators/10.types.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# Representing Data in Digital computers



## Natural numbers



* Basically, computers can handle 0 or 1 only.



* Here, to **handle** means to store in memory, to operate, and to output.



* We call the smallest unit that can handle 0 or 1 a **bit**.



* When you hear that a personal computer is, for example, 32 bit or 64 bit, please recall this.



* It means the number of bits that the computer can process at once.



* To handle larger (natural) numbers, we need more bits.



* The following table shows decimal numbers in binary and hexadecimal formats.



In [None]:
import numpy as np
import pandas as pd


# make a table of decimal, binary, and hexadecimal numbers

# decimal
decimal = pd.Series(
    list(range(0,20+1)) +
    [
      2**7-1, 2**7, 2**8-1, 2**8,
      2**15-1, 2**15, 2**16-1, 2**16,
      2**32-1, 2**32,
      2**63-1, 2**63, 2**64-1, 2**64,
    ], name="Decimal"
)

# hexadecimal
hexadecimal = decimal.apply(hex)

# number of Bytes
n_bytes = np.ceil((hexadecimal.str.len() - 2).div(2)).astype(int)

# binary
binary = decimal.apply(bin)

# number of bits
n_bits = (binary.str.len() - 2).astype(int)

# if binary number too long, set "..."
binary[n_bits.index[n_bits>20]] = "..."

# Make a Pandas DataFrame
df_dec_bin_hex = pd.DataFrame(
    {
        "Decimal":decimal,
        "Binary":binary,
        "no. bits":n_bits,
        "no. bytes":n_bytes,
        "Hexadecimal":hexadecimal
     },
)

df_dec_bin_hex.set_index("Decimal", inplace=True)

# present the table
df_dec_bin_hex



* We can see that one hexadecimal digit represents four binary digits.



* Because of this, we often write a binary number in groups of four digits, for example `1101 0101`.



* We call a collection of 8 bits a **byte**.



* Q: One digit of which base would represent three binary digits?
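* The grouping of four binary digits per hexadecimal digit can be checked with a minimal sketch like the one below. The helper name `group_bits` is hypothetical and introduced only for this illustration; it assumes the width is a multiple of the group size.

```python
def group_bits(value: int, width: int = 8, group: int = 4) -> str:
    """Return value as a zero-padded binary string split into groups.

    Illustrative helper (not part of any library); width is assumed
    to be a multiple of group.
    """
    bits = f"{value:0{width}b}"
    return " ".join(bits[i:i + group] for i in range(0, len(bits), group))


value = 0b11010101

# binary digits in groups of four
print(group_bits(value))

# each hexadecimal digit corresponds to one group
print(f"{value:x}")
```

* Converting each group back with `int(group, 2)` gives the value of the matching hexadecimal digit.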



## Negative integers



* The example above implies that computers are designed to handle 0 and positive integers.



* To represent and handle negative integers, computers convert a negative integer to **2's complement**.



### An 8 bit example



* To simplify, we will look at the 8bit case first.



* Positive integer 7 is 00000111 in an 8 bit binary number.



* To find the 2's complement of the binary number 00000111, exchange 0's and 1's (giving 11111000) and add 1.



* 2's complement of binary number 00000111 would be 11111001.



* Adding these two numbers gives the following.



In [None]:
# convert strings to integers. 
# regard the base is two == binary numbers
a = int('00000111', base=2)
b = int('11111001', base=2)

# see help(int) for more information

# add two integers above
c = a+b

# print the sum c as a binary number
print(f'c = {c:b}(binary)')

# as a decimal number
print(f'c = {c:d}(decimal)')

# as a hexadecimal number
print(f'c = {c:x}(hexadecimal)')



* The result above is 256 in decimal, that is, `1 0000 0000` in binary, whose lower 8 bits are all zeros. In an 8 bit operation, the ninth bit is discarded, so we regard this result as zero.
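* Regarding the result as zero amounts to keeping only the lower 8 bits. The following is a minimal sketch, assuming we emulate an 8 bit register with a bit mask; the name `MASK_8` is made up for this example.

```python
MASK_8 = 0xFF                      # hypothetical name; keeps the lower 8 bits only

a = int('00000111', base=2)        # +7
b = int('11111001', base=2)        # 2's complement of 7
c = a + b                          # Python integers do not overflow here

c_8bit = c & MASK_8                # discard the carry into the ninth bit

print(f'c      = {c:09b}')
print(f'c_8bit = {c_8bit:08b}')
```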



* The following table compares 8 bit bit patterns with the corresponding `unsigned int8_t` and `signed int8_t` values (in C/C++, the types `uint8_t` and `int8_t`).



In [None]:
import pandas as pd


def bit_unit_sint_table(n:int):
  """
  make a table of unsigned or signed integers
  """
  uint = [0, 1, 2,] + list(range(2**(n-1)-2, 2**(n-1)+3)) + [(2**n-2), (2**n-1)]

  bit_key = f"{n} bit bit pattern"
  uint_key = f"unsigned int{n}_t"
  sint_key = f"signed int{n}_t"


  df_bit = pd.DataFrame(
    {
      bit_key: list(map(lambda i: f"{i:0{n}b}", uint)),
      uint_key: uint,
      sint_key: list(map(lambda ui: ui if ui < 2**(n-1) else (ui - 2**n), uint)),
    }
  )

  elipsis = pd.DataFrame({
        bit_key: ["..."],
        uint_key: ["..."],
        sint_key: ["..."],
      })

  df_bit = pd.concat([
      df_bit.loc[:2,:],
      elipsis,
      df_bit.loc[3:7,:],
      elipsis,
      df_bit.loc[8:,:],
  ], axis=0)

  df_bit.set_index(bit_key, inplace=True)
  return df_bit



In [None]:
# number of binary digits
n = 8

# present the table of unsigned or signed integers
bit_unit_sint_table(n)



### A 16 bit example



* The following shows a 16 bit example.



In [None]:
# Repeating the above example in 16bits
a = int('0000''0000''0000''0111', base=2)
b = int('1111''1111''1111''1001', base=2)

c = a+b
print(f'c = {c:b}(binary)')
print(f'c = {c:d}(decimal)')
print(f'c = {c:x}(hexadecimal)')



* The result above is 65536 in decimal, that is, `1 0000 0000 0000 0000` in binary, whose lower 16 bits are all zeros.



* The following table compares 16 bit bit patterns with the corresponding `unsigned int16_t` and `signed int16_t` values.



In [None]:
# number of binary digits
n = 16

# present the table of unsigned or signed integers
bit_unit_sint_table(n)



### Summary



* Computers handle negative integers as 2's complements, which we can find by exchanging the 0's and 1's of the binary representation of the integer's absolute value and then adding one.
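* The procedure in the summary can be sketched as below. The function names `twos_complement` and `as_signed` are hypothetical helpers written for this illustration, and the sketch assumes `0 <= value < 2**n_bits`.

```python
def twos_complement(value: int, n_bits: int = 8) -> int:
    """Bit pattern (as an unsigned integer) of -value in n_bits.

    Illustrative helper; assumes 0 <= value < 2**n_bits.
    """
    mask = (1 << n_bits) - 1
    inverted = value ^ mask          # exchange 0's and 1's
    return (inverted + 1) & mask     # add one and keep n_bits only


def as_signed(pattern: int, n_bits: int = 8) -> int:
    """Interpret an n_bits bit pattern as a signed integer."""
    if pattern >= 1 << (n_bits - 1):
        return pattern - (1 << n_bits)
    return pattern


pattern = twos_complement(7)

print(f"{pattern:08b}")
print(as_signed(pattern))
```

* Calling `twos_complement(7, n_bits=16)` repeats the same steps for the 16 bit case.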



## Real numbers



### Fixed point



* We can represent a real number as an integer multiplied by a fixed scale factor.



* For example, we can store all lengths in cm units and multiply by 0.01 to find the values in m units.



* However, with this scheme, it is not easy to represent anything finer than the stored unit, such as lengths in mm.
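* A minimal sketch of this fixed point idea follows, assuming lengths are stored as whole numbers of cm. The name `CM_PER_M` and the sample values are chosen only for illustration.

```python
CM_PER_M = 100                   # fixed scale factor: the stored unit is 1 cm

length_cm = 234                  # 2.34 m stored as an integer number of cm
length_m = length_cm / CM_PER_M  # convert to m only when needed
print(length_m)

# 2.3456 m is 234.56 cm, which is not a whole number of cm
stored_cm = round(2.3456 * CM_PER_M)
print(stored_cm)                 # the sub-cm (mm) detail is lost in rounding
```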



### Floating point



* In short, we can also represent a real number using a significand and an exponent.



* For example, $2.3456 \times 10^0$ m would be $2.3456 \times 10^2$ in cm, and $2.3456 \times 10^3$ in mm.



* An engineering calculator may indicate $2.3456 \times 10^3$ as `2.3456E3`. Here, we can see that `2.3456` is the significand (also mantissa, coefficient, argument or fraction) and `3` is the exponent.



* Even if the significand does not change, when the exponent changes, the location of the decimal point changes.



* Conversely, $2.3456 \times 10^0$ mm would be $2.3456 \times 10^{-1}$ in cm, and $2.3456 \times 10^{-3}$ in m.



* Computers store these numbers in binary, so the base of the exponent is 2 rather than 10. For more information, please refer to [IEEE 754, Wikipedia](https://en.wikipedia.org/wiki/IEEE_754).



* Usually we use 4 byte (32 bit) single precision or 8 byte (64 bit) double precision; each format consists of a sign bit ($\pm$), an exponent, and a significand.



* The following table shows the breakdown of the 32 bits of single precision.  Here, `e` and `s` indicate exponent and significand, respectively.



In [None]:
import pandas as pd


pd.set_option("display.max_columns", 36)

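def float_table(n:int, ne:int) -> pd.DataFrame:
    """
    make a one-row table labeling each of the n bits
    as sign (±), exponent (e), or significand (s)
    """
    df_float = pd.DataFrame(
        ['±'] + ['e'] * ne + ['s'] * (n-ne-1),
        index=range(n-1, 0-1, -1)
    ).T

    # use the sign bit (the most significant bit, n-1) as the index
    df_float.set_index(n-1, inplace=True)

    return df_float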



In [None]:
n = 32    # number of bits
ne = 8    # number of exponent bits

float_table(n, ne)



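* Please note that the stored exponent is biased by 127: a field value $e$ from $1$ to $254$ means $2^{e-127}$, that is, $2^{-126}$ to $2^{127}$. The field values $0$ and $2^{8}-1=255$ are reserved: $0$ for zero and subnormal numbers, and $255$ for infinity and NaN.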



* The following table shows the breakdown of the 64 bits of double precision.



In [None]:
n = 64    # number of bits
ne = 11   # number of exponent bits

float_table(n, ne)



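* Please note that the stored exponent is biased by 1023: a field value $e$ from $1$ to $2046$ means $2^{e-1023}$, that is, $2^{-1022}$ to $2^{1023}$. The field values $0$ and $2^{11}-1=2047$ are reserved: $0$ for zero and subnormal numbers, and $2047$ for infinity and NaN.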
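* To see the sign, exponent, and significand fields of an actual number, we can split the 32 bits of a single precision value with the standard `struct` module. This is a sketch for normal numbers only, assuming an exponent bias of 127; the helper name `float32_fields` and the sample value are made up for this example.

```python
import struct


def float32_fields(x: float):
    """Split x, stored in single precision, into (sign, exponent, fraction) fields.

    Illustrative helper; the fields are returned as raw unsigned integers.
    """
    # pack as a 4 byte float, then reread the same 4 bytes as an unsigned int
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                  # 1 bit
    exponent = (bits >> 23) & 0xFF     # 8 bits
    fraction = bits & 0x7FFFFF         # 23 bits
    return sign, exponent, fraction


sign, exponent, fraction = float32_fields(-6.25)

# for normal numbers, rebuild the value from the three fields
value = (-1) ** sign * (1 + fraction / 2 ** 23) * 2.0 ** (exponent - 127)

print(sign, exponent, fraction)
print(value)
```

* Changing the format characters to `">Q"` and `">d"` and adjusting the shifts and masks would give a similar split for double precision.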



## Complex numbers



* A complex number is usually stored as two real numbers, one for the real part and one for the imaginary part. [[ref](https://en.cppreference.com/w/cpp/numeric/complex)]
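* In Python, the two parts of a built-in `complex` value are available as the attributes `real` and `imag`. The sample value below is chosen only for illustration.

```python
z = 3 + 4j

# both parts are ordinary floating point numbers
print(type(z.real), type(z.imag))
print(z.real, z.imag)

# magnitude: sqrt(3**2 + 4**2)
print(abs(z))
```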



In [None]:
import IPython.display as disp
import numpy as np

z1 = 1j * 1j

disp.display(disp.Latex(f'$i \\times i = {z1}$'))


z2 = 1j ** 2

disp.display(disp.Latex(f'$i^2  = {z2}$'))


# find pi from cos(pi) == -1
pi = np.arccos(-1)
z3 = np.exp(1j * pi)

disp.display(disp.Latex(r'$e^{j\pi}  = '+f'{z3}$'))

z4 = 1 + 2j
z5 = 1 - 2j

disp.display(disp.Latex(f'${z4}\\times{z5} = {z4 * z5}$'))



* C++ provides `std::complex`. The following example uses it. [[ref](https://en.cppreference.com/w/cpp/numeric/complex)]<br>
Note: complex literals such as `1i` require C++14 or later.



In [None]:
%%writefile complex.cpp
#include <iostream>
#include <iomanip>
#include <complex>
#include <cmath>
 
int main(int argn, char* argv[])
{
    using namespace std::complex_literals;
    std::cout << std::fixed << std::setprecision(1);
 
    std::complex<double> z1 = 1i * 1i;     // imaginary unit squared
    std::cout << "i * i = " << z1 << '\n';
 
    std::complex<double> z2 = std::pow(1i, 2); // imaginary unit squared
    std::cout << "pow(i, 2) = " << z2 << '\n';
 
    double PI = std::acos(-1);
    std::complex<double> z3 = std::exp(1i * PI); // Euler's formula
    std::cout << "exp(i * pi) = " << z3 << '\n';
 
    std::complex<double> z4 = 1. + 2i, z5 = 1. - 2i; // conjugates
    std::cout << "(1+2i)*(1-2i) = " << z4*z5 << '\n';
    
    return 0;
}



In [None]:
!g++ -Wall -g -std=c++14 complex.cpp -o complex -Wa,-adhln=complex.s



In [None]:
!./complex



* Here, `using namespace std::complex_literals;` enables writing a complex constant such as `1i` in the source code directly.



* `std::complex<double> z1` declares a C++ variable named `z1`.<br>  Its type is `std::complex<double>`: a complex number whose real and imaginary parts are each stored as a `double` precision floating point number.



# Characters and Strings



* If computers can handle only 0s and 1s, how come we are reading this page?



* It is because of sets of agreed conventions that we call [**character encodings**](https://en.wikipedia.org/wiki/Character_encoding).
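* As a small sketch of such a convention, Python's built-in `ord()` and `chr()` convert between a character and its integer code. The character `'A'` is just a sample.

```python
ch = 'A'

# character -> integer code
code = ord(ch)

# the code in decimal and as an 8 bit pattern
print(code, f'{code:08b}')

# integer code -> character
print(chr(code))

# the same character encoded as a byte string
print(ch.encode('ascii'))
```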



* The following table shows part of ASCII, one such standard.



In [None]:
# https://stackoverflow.com/questions/35160256/how-do-i-output-lists-as-a-table-in-jupyter-notebook
# http://nbviewer.jupyter.org/github/ipython/ipython/blob/4.0.x/examples/IPython%20Kernel/Rich%20Output.ipynb

import IPython

n = 8

elipsis_line = "| ... | ... | ... |"

# build the Markdown table as a list of lines, starting with the header rows
table = [
  f"| {n} bit bit pattern | `unsigned int{n}_t` | `char` |",
   "|:-----------------:|:--------:|:------:|"
]

for i in range(0, 8+1):
    table.append(f'| {i:0{n}b} | {i} | {i:c} |')

# Python can use '' or "" to represent a string literal
table.append(elipsis_line)

for i in range(14, 27+1):
    table.append(f'| {i:0{n}b} | {i} | {chr(i)} |')

table.append(elipsis_line)

for i in range(31, 50+1):
    table.append(f'| {i:0{n}b} | {i} | {chr(i)} |')

table.append(elipsis_line)

for i in range(57, 65+1):
    table.append(f'| {i:0{n}b} | {i} | {i:c} |')

table.append(elipsis_line)

for i in range(90, 97+1):
    table.append(f'| {i:0{n}b} | {i} | {i:c} |')

table.append(elipsis_line)

for i in range(122, 127+1):
    table.append(f'| {i:0{n}b} | {i} | {i:c} |')

table.append(elipsis_line)


IPython.display.display(IPython.display.Markdown('\n'.join(table)))



* The following C++ program generates an ASCII table summary similar to the one from the Python code above.



In [None]:
%%writefile ascii_table.cpp
// http://www.cplusplus.com/reference/fstream/fstream/
#include <iostream>
// https://stackoverflow.com/questions/7349689/
#include <bitset>
#include <iomanip>


using namespace std;

int main(int argn, char *argv[]){
  // C/C++ uses "" to represent a string and '' indicates a single character

  char sep[] = " | ", endl = '\n';
  char elipsis_line[] = "| ... | ... | ... |\n";

  const int n = 8;

  cout << "| " << n << " bit bit pattern | `unsigned int" 
    << n << "_t` | `char` |" << endl;
  cout << "|:-----------------:|:--------:|:------:|" << endl;

  for (int i=0; i<(8+1); ++i){
    cout << "| "<< bitset<n>(i) << sep << i << sep << char(i) << " |" << endl;
  }

  cout << elipsis_line;

  for (int i=14; i<(27+1); ++i){
    cout << "| "<< bitset<n>(i) << sep << i << sep << (char) i << " |" << endl;
  }

  cout << elipsis_line;

  for (int i=31; i<(50+1); ++i){
    cout << "| "<< bitset<n>(i) << sep << i << sep << (char) i << " |" << endl;
  }

  cout << elipsis_line;

  for (int i=57; i<(65+1); ++i){
    cout << "| "<< bitset<n>(i) << sep << i << sep << char(i) << " |" << endl;
  }

  cout << elipsis_line;

  for (int i=90; i<(97+1); ++i){
    cout << "| "<< bitset<n>(i) << sep << i << sep << char(i) << " |" << endl;
  }

  cout << elipsis_line;

  for (int i=122; i<(127+1); ++i){
    cout << "| "<< bitset<n>(i) << sep << i << sep << char(i) << " |" << endl;
  }

  cout << elipsis_line;

  return 0;
}



In [None]:
!g++ -Wall -g -std=c++14 ascii_table.cpp -o ascii_table -Wa,-adhln=ascii_table.s



In [None]:
import IPython


table_cpp_list = !./ascii_table
IPython.display.display(IPython.display.Markdown('\n'.join(table_cpp_list)))



Compare the Python and C++ tables.



In [None]:
assert not(set(table) - set(table_cpp_list)), set(table) - set(table_cpp_list)



In [None]:
assert not(set(table_cpp_list) - set(table)), set(table_cpp_list) - set(table)



### Strings



* To represent a word or a sentence, we use a group of characters in order.



* In Python, a character is simply a string of length one.
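* A quick check of this, as a minimal sketch with a sample string:

```python
s = "Hello"

# indexing a string gives another string, of length one
first = s[0]
print(type(first), len(first))

# a string is an ordered sequence of such one-character strings
print(list(s))

# their integer codes
print([ord(ch) for ch in s])
```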



In [None]:
s = "Hello World!"

print(s)



* In C/C++, a C-style string (a `char` array) always ends with the character `'\0'`; hence we call it a "zero terminated string" (also "null terminated string").



In [None]:
%%writefile zero_terminated_string.cpp
#include <iostream>
#include <iomanip>

using namespace std;

int main(int argn, char * argv []){

    char s[] = "Hello World!";
    int n = sizeof(s);
    
    for (int i = 0; i < n; ++i){
        // setw() sets width
        cout << "s[" << setw(2) << i <<"] = " 
         << s[i] << "(" << setw(2) << (int) s[i]<< ")\n";
    }

    return 0;

}



In [None]:
!g++ -Wall -g -std=c++14 zero_terminated_string.cpp -o zero_terminated_string -Wa,-adhln=zero_terminated_string.s



In [None]:
!./zero_terminated_string



* In Python, we cannot change a character within a string; in C/C++, we can overwrite any character of a `char` array, although overwriting the final `'\0'` would leave the string without its terminator.



In [None]:
s = "hello World!"
# Following is an exception handling block
try:
    # This will cause an error, because a Python string is immutable
    s[0] = 'H'
except TypeError as e:
    # This line will present the error message
    print(e)



In [None]:
%%writefile assign_char.cpp
#include <iostream>


using namespace std;


int main(int argn, char * argv []){

    char s[] = "hello World!\n";
    s[0] = 'H';
    cout << s ;

    return 0;
}



In [None]:
!g++ -Wall -g -std=c++14 assign_char.cpp -o assign_char -Wa,-adhln=assign_char.s



In [None]:
!./assign_char



## Converting between Types



* What if we need to multiply an integer by a floating point real number (float)?



* Python, C, and C++ all convert the integer into a float.



* What if we want to convert a float into an integer?
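* Both directions can be tried quickly in Python. The following is a minimal sketch with sample values; it shows that `int()` drops the fractional part, while `round()` goes to the nearest integer.

```python
i = 3
x = 0.5

# int * float: the result is a float
product = i * x
print(type(product), product)

# float -> int: int() truncates toward zero
print(int(2.7), int(-2.7))

# round() picks the nearest integer instead
print(round(2.7), round(-2.7))
```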



* The following C++ example includes some of the possible type conversions.



In [None]:
%%writefile types.cpp
#include <cstdio>

int main(int argn, char *argv[]){

    int i = 1;
    double s = 0.1, q = 2.04;

    double pi_float = i + q + s;
    int pi_int = (int)(i + q + s);

    char c = 'c';

    printf("pi_float = %f\n", pi_float);
    printf("pi_int = %d\n", pi_int);
    printf("c (in char) = %c\n", c);
    printf("c (in int) = %d\n", (int) c);

    return 0;
}



In [None]:
!g++ -Wall -g -std=c++14 types.cpp -o types -Wa,-adhln=types.s



In [None]:
!./types



* In Python, we can try the following.



In [None]:
i = 1
s, q = 0.1, 2.04

pi_float = i + q + s

# float -> int
pi_int = int(i + q + s)

c = 'c'

# char -> int
c_int = ord(c)

print(f'pi_float = {pi_float:f}')
print(f'pi_int = {pi_int:d}')
print(f'c (in char) = {c}')
print(f'c (in int) = {c_int:d}')



## `list` and `tuple` from Python vs arrays from C/C++



* To store multiple entities under one variable name, we can use these :

| type | declaration |
|:-----:|:------------:|
| Python `list` | `s = [1, 2, '3', [4]]` |
| Python `tuple` | `t = ('1', '2', 3, '4')`<br>`u = '1', ('2', 3), 'a'` |
| C/C++ array | `int a[] = {1, 2, 3, 4};`<br>`char b[] = "1234";` |



* Following operations are possible:

| operation | `python` | `C`/`C++` |
|:-----:|:------------:|:------------:|
| read | `s[0]` | `a[0]` |
| write | `s[0]=5` | `a[0]=5` |
| number of elements | `len(s)` | `sizeof(a)/sizeof(a[0])` |
| size in bytes | N/A |  `sizeof(a)` |
| pop last element | `s.pop()` | N/A |
| add a new last element | `s.append(5)` | N/A |
| concatenate two of them | `s0 + s1` | N/A |
| insert a new element at `i` | `s.insert(i, 'a')` | N/A |
| slicing | `s[0:2]` | N/A |



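* In fact, this may not be a fair comparison: a C/C++ array has a fixed size and a single element type, while a Python `list` is a resizable container that can hold objects of any type.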
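* For a somewhat closer Python counterpart of a C array, the standard library `array` module stores elements of one type only (although, unlike a C array, it can still grow). This is a sketch with sample values; the element size reported by `itemsize` depends on the platform.

```python
from array import array

# type code 'i': signed int, similar to `int a[]` in C
a = array('i', [1, 2, 3, 4])

# read and write by index
a[0] = 5
print(a[0], len(a))

# counterpart of sizeof(a): element size times number of elements
print(a.itemsize, a.itemsize * len(a))

try:
    # an element of a different type is rejected
    a[1] = 'x'
except TypeError as e:
    print(e)
```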



In [None]:
counter = 0
s = [1, 2, '3', [4]]

counter += 1; print(f'{counter:02d}. s = {s}')
s[0] = 5

counter += 1; print(f'{counter:02d}. s = {s}')
print(f'len(s) = {len(s)}')

print(f's.pop() = {s.pop()}')

counter += 1; print(f'{counter:02d}. s = {s}')

s.append(15)
counter += 1; print(f'{counter:02d}. s = {s}')

s.insert(2, 'a')
counter += 1; print(f'{counter:02d}. s = {s}')



* There is a [discussion](https://stackoverflow.com/questions/17528657/python-list-equivalent-in-c) claiming that `std::vector` or `std::deque` of C++ could be similar to python `list`.



* The following C++ example shows [C++ `vector`](http://www.cplusplus.com/reference/vector/vector/).



In [None]:
%%writefile vector_example.cpp
#include <iostream>
#include <vector>

using namespace std;

int main ()
{
        // http://www.cplusplus.com/reference/vector/vector/
        vector<char> a ({'a', 'b', 'c', 'd'});

        for (unsigned int i=0; i < a.size(); ++i){
                cout << "a[" << i << "] = " << a[i] << "\n";
        }

        return 0;
}



In [None]:
!g++ -Wall -g -std=c++14 vector_example.cpp -o vector_example -Wa,-adhln=vector_example.s



In [None]:
!./vector_example



* Implementing the full features of `list` may not be straightforward in C/C++.



In [None]:
a = list(range(10))
print(a[4:7])



## `dict` of Python vs `struct` and `union` of C/C++



[[ref0](https://stackoverflow.com/questions/330793/how-to-initialize-a-struct-in-accordance-with-c-programming-language-standards), [ref1](https://en.cppreference.com/w/c/language/struct_initialization)]



* To store data with field names, the following are available:

| type | declaration |
|:-----:| ------------ |
| Python `dict` | <pre><code class="language-python">s = {'name' : 'python', <br>'year' : 1989,<br> 'cmd' : 'python'}</code></pre> |
| C/C++ `struct` | <pre><code class="language-c++">struct {const char *name;<br> int year;<br> const char *cmd;<br>} t = {.name="C",<br>.year=1970,<br>.cmd="python"};</code></pre> |
| C/C++ `union` | <pre><code class="language-c++">union {<br>int x;<br>char c[4];<br>} u = {1};</code></pre> |



* The following operations are possible:

| operation | `python` | `C`/`C++` |
|:-----:|:------------:|:------------:|
| read | `s['name']` | `t.name` |
| write | `s['name']='Python'` | `t.year=1972;` |
| size in bytes | N/A |  `sizeof(t)` |
| delete one element | `del s['cmd']` | N/A |
| add a new element | `s['ide']='idle'` | N/A |
| merge two of them | `s0.update(s1)` | N/A |



* In Python, we can use any type for the value and any hashable type (typically an immutable one, such as `str` or `int`) for the key.
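* The restriction on keys can be seen in a minimal sketch like this one, with sample keys and values:

```python
d = {}

# a tuple of integers is hashable, so it works as a key
d[(1, 2)] = 'a tuple key'

# values can be of any type, including a list
d['name'] = ['any', 'value', 'type']

try:
    # a list is mutable and unhashable, so it cannot be a key
    d[[1, 2]] = 'a list key'
except TypeError as e:
    print(e)

print(d)
```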



In [None]:
s =  {'name' : 'python', 'year' : 1989, 'cmd' : 'python'}

print(f"s['name'] = {s['name']}")

s['name']='Python'
print(f"s = {s}")

del s['cmd']
print(f"s = {s}")

s['ide'] = 'idle'
print(f"s = {s}")

s0 = {'ide' : 'spyder'}
print(f"s0 = {s0}")

s0.update(s)
print(f"s0 = {s0}")



* This is not a fair comparison, either.



* Another [discussion](https://stackoverflow.com/questions/1842941/translating-python-dictionary-to-c) claims that [`std::map`](http://www.cplusplus.com/reference/map/map/) of C++ could be similar to python `dict`.



In [None]:
%%writefile map_example.cpp
#include <iostream>
#include <map>
// https://en.cppreference.com/w/cpp/string/byte/strcmp
#include <cstring>

using namespace std;

// https://stackoverflow.com/questions/4157687/
struct cmp_str
{
   bool operator()(char const *a, char const *b) const
   {
      return std::strcmp(a, b) < 0;
   }
};

int main(int argn, char * argv[]){
	map<const char *, const char *, cmp_str>s;

	s["name"] = "python";
	s["year"] = "AD1989";
	s["cmd"] = "python";

	// https://stackoverflow.com/questions/14070940/
	
	for (auto& t: s){
		cout << t.first << ':' << t.second << '\n';
	}
	return 0;
}



In [None]:
!g++ -Wall -g -std=c++14 map_example.cpp -o map_example -Wa,-adhln=map_example.s



In [None]:
!./map_example



* To use any type as values in C++, some [additional arrangement](https://stackoverflow.com/questions/24702235/c-stdmap-holding-any-type-of-value) might be necessary.



## Exercises



### 00: Integers



* Please add a text file named `01_00_integers.txt` and answer these questions



1. Pick two integers : One positive, one negative
1. Convert them into binary representations
1. Add these two binary numbers
1. Convert the sum back to decimal
1. Compare with decimal result



### 01 : Integers



* Please add a text file named `01_01_integers.txt` and answer these questions



1. Pick two integers : Both positive (One bigger, the other like $2^n + 3$)
1. Convert them into binary representations
1. Multiply these two binary numbers
1. Convert them back to decimal
1. Compare with decimal results (Calculator OK)



### 02 : Floating point



* Please add a text file named `01_02_floats.txt` and answer these questions



* Can you find the largest $n \in \mathbb{N}$ satisfying $2.0 < \left(2.0 + 2^{-n}\right)$ in the 64 bit floating point representation? If so, what would it be?



### 03 : Characters



* Pick 5 characters from your keyboard
* Add a cell below and write Python lines that print each character and its ASCII code.
* Repeat in C++ using `std::cout`. Compile and run it
* Add & commit



### 04 : `list`



* Add a cell below 
* In the cell, write Python code



1. Create a list containing the 5 characters from above
1. Print the `list`
1. Pick 3 integers from your keyboard
1. Append one at the last position
1. Insert one at the first position
1. Insert one at a place of your choice.
1. Print three of the five members of the `list`



### 05 : `char` array



* Create a C++ file named `01_05_array.cpp`



1. Create a `char` array containing 5 characters from above
1. Using `std::cout`, print all 5 in one line
1. Consider 3 integers above
1. Put ASCII code of one number character at the last position
1. Put ASCII code of another number character at the first position
1. Put ASCII code of the other number character at a place of your choice.
1. Again, using `std::cout`, print all 5 in one line
1. Compile, run it, and add & commit.



### 06 : `dict`



* Add a cell below
* In the cell, write Python code



1. Pick 3 characters from above
1. Make three pairs of a character and an integer
1. Create a `dict` initializing using the pairs
1. Print the `dict`
1. Add the remaining two characters to the `dict`
1. Print the `dict` again
1. Print 2 of the pairs of the `dict`



### 07 : `union`



* Create a C++ file named `01_07_union.cpp`



1. Define a `union` type with an `unsigned int` and an array of four `uint8_t`'s
1. Create a `union` initializing the array with four 8bit integers of your choice
1. Using `printf` of `<cstdio>` print each member of the array in hexadecimal
1. Using `printf` print the integer in hexadecimal
1. Change a member of the array
1. Using `printf` print each member of the array in hexadecimal
1. Using `printf` print the integer in hexadecimal

