# Real numbers and their representations

## $ \S 1 $ The representation of a number in base $ b $

The main purpose of numerical methods is to provide computational tools to solve
engineering, scientific and mathematical problems and to gain insight into
them. Numbers are the basis of computation. It is therefore necessary that we
first understand how numbers are represented in a machine, how its finite
memory capacity and processing power inevitably lead to errors, and how these
errors can be estimated or, even better, avoided.

It is essential to differentiate a real number $ x $ from its representation in
some system (binary, decimal, etc.). Any _integer_ $ b \ge 2
$ can serve as the __base__ of such a system. Once such a choice is made, $ x $
can be expressed in the form
$$
x = \pm\, \big(\,d_mb^m + d_{m - 1}b^{m - 1} + \cdots + d_1 b + d_0 
   + d_{-1}b^{-1} + d_{-2}b^{-2} + \cdots\,\big) \tag{1}
$$

for some integers $ d_k $ satisfying $ 0 \le d_k \le b - 1 $, which are called
the __digits__ of $ x $ in base $ b $. The notation
$$
x = \pm(d_m\, d_{m - 1}\cdots d_1\, d_0\,.\,d_{-1}\, d_{-2} \cdots)_{b}
$$
is just an abbreviation of the preceding representation.

   
The number of nonzero digits to the _left_ of the point is always _finite_,
because $ \vert x \vert < b^{m + 1} $ for some integer $ m \ge 0 $. In contrast, the
number of nonzero digits to the _right_ of the point may be finite or _infinite_.

The same real number $ x $ has infinitely many representations, one for each
possible base $ b \ge 2 $. We say that $ x $ has a __finite__ representation in
base $ b $ when only finitely many of the $ d_k $ are nonzero. Otherwise, its
representation is said to be __infinite__. Irrational numbers have infinite
representations in any base $ b \ge 2 $. A rational number may have a finite
representation in one base, but an infinite one in another base.
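
For instance, the last claim can be illustrated with two familiar fractions:
$$
\frac{1}{3} = (0.1)_3 = (0.333\dots)_{10} \qquad \text{and} \qquad
\frac{1}{10} = (0.1)_{10} = (0.000110011\dots)_2 \,.
$$
The first is finite in base $ 3 $ but infinite in base $ 10 $, while the second is finite in base $ 10 $ but infinite in base $ 2 $.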

This method of representing real numbers is not the only possible one, nor
necessarily the most appropriate. As an example, it may be better to represent a
rational number as a pair of integers (its numerator and denominator) instead of
storing its decimal or binary representation.
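
As a small illustration of this alternative, Python's standard `fractions` module stores a rational number exactly as a pair of integers. The sketch below (the variable names are ours, chosen only for illustration) compares exact rational arithmetic with ordinary floating-point arithmetic:

```python
from fractions import Fraction

# Exact arithmetic: each number is stored as numerator / denominator.
exact_sum = Fraction(1, 10) + Fraction(2, 10)
exact_equal = exact_sum == Fraction(3, 10)

# Floating-point arithmetic: 0.1, 0.2 and 0.3 are stored as rounded
# binary representations, so the comparison below fails.
float_equal = 0.1 + 0.2 == 0.3

print(exact_sum, exact_equal, float_equal)
```

The price of exactness is that numerators and denominators may grow without bound during a long computation.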

⚠️ Because we use the decimal system almost exclusively to represent numbers in
daily life, we write, say, $ 592 $ instead of $ (592)_{10} $. However,
it is important to keep in mind that '$ 592 $' is not the number itself, but
rather its _representation_ in base $ 10 $. The same number could also be
written as $ (1001010000)_2 $, and were $ b = 2 $ the most commonly used base,
we would instead abuse our notation to write simply $ 1001010000 $.

__Exercise:__ Determine the number $ (1001101)_2 $.

__Exercise:__ Show that $ (12210)_3 = (156)_{10} $.

__Exercise:__ Let $ b \ge 2 $ and $ m \ge 0 $. How many numbers $ x $ are represented as in (1) by digits $ d_k $ such that:
$$ d_k = 0 \quad \textrm{ if $\  k > m \ $ or $\ k < 0 \ $?} $$

Note in particular how the number of digits required to represent an integer $ n
$ in a given base $ b $ varies _logarithmically_ with $ n $.
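
As a rough illustration of this remark, the sketch below counts the digits of a positive integer by repeated integer division. The helper name `digit_count` is introduced here for illustration only. For $ n \ge 1 $ the count equals $ \lfloor \log_b n \rfloor + 1 $, so it grows by one each time $ n $ is multiplied by $ b $.

```python
def digit_count(n: int, b: int) -> int:
    """Number of digits of the positive integer n in base b >= 2."""
    assert n >= 1 and b >= 2
    count = 0
    while n > 0:
        n //= b          # Dropping the last digit in base b.
        count += 1
    return count


# Compare the lengths of the decimal and binary representations.
for n in (9, 10, 99, 100, 10**6):
    print(n, digit_count(n, 10), digit_count(n, 2))
```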

**Exercise:** Let $ b \ge 2 $ be an integer. Prove that the representation in base $ b $ of a real number $ x $ is finite (i.e., has finitely many nonzero digits) if and only if $ x $ can be expressed as a fraction in which the numerator is an integer and the denominator is of the form $ b^r $ for some integer $ r \ge 0 \,$. 

__Exercise__: Which of the following numbers has a finite representation in base $ 2 $?

(a) $ \frac{1}{3} $;

(b) $ \frac{2}{3} $;

(c) $ \frac{3}{8} $;

(d) $ \frac{7}{56} $.


⚠️ To avoid confusion, when considering systems other than the decimal ($ b = 10 $), it is better to avoid the expressions "decimal point", "decimal digit" and "decimal part". We will speak instead of "point", "digit" and "fractional part".

⚠️ For a given base $ b $, the representation in (1) is unique except for the consequences of the fact that 
$$ (b-1)\,b^{-r} + (b-1)\,b^{-r-1} + (b-1)\,b^{-r-2} + \dots = b^{-r + 1} .$$
Thus, for instance,

$$ (0.1239999\dots)_{10} = (0.124)_{10} \quad \text{and}
\quad (1010.10101111\dots)_2 = (1010.1011)_2 .$$ 

It follows that the only real numbers which have non-unique representations in base $ b $ are those which have a representation with $ d_k = b - 1 $ for all $ k \le r $ (for some $ r \in \mathbb{Z} $), and such numbers have *exactly two* different representations.

For convenience we will assume implicitly below that the representation of a number is *not* such that every digit equals $ b - 1 $ from some point on (if this happens to be the case, we can just replace it by the other one, which is finite).
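
The identity in the warning above is simply a geometric series with ratio $ b^{-1} $:
$$
\sum_{k = 0}^{\infty} (b - 1)\, b^{-r - k}
= (b - 1)\, b^{-r} \cdot \frac{1}{1 - b^{-1}}
= (b - 1)\, b^{-r} \cdot \frac{b}{b - 1}
= b^{-r + 1} \,.
$$
For instance, taking $ b = 10 $ and $ r = 4 $ gives $ (0.0009999\dots)_{10} = 10^{-3} $, so that $ (0.1239999\dots)_{10} = 0.123 + 0.001 = (0.124)_{10} $.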

__Exercise (harder):__ Prove that a real number $ x $ is rational if and only if
its representation in any base $ b $ is periodic (i.e., repeating) from some
digit on.

_Hint:_ In one direction, suppose that $ x = n / m $ for some (positive)
integers $ n $ and $ m $. Since there are only $ m $ possible remainders upon
division by $ m $, eventually the same remainder occurs twice. This implies that
the representation of $ x $ in a given base is periodic. Conversely, if the
representation of a number $ x $ in base $ b $ is periodic with period $ p $,
show that $ b^p x - x $ has a finite representation, hence is rational.


__Exercise (harder):__ Prove that the representation of a real number $ x $ in
base $ b $ is unique except for the cases mentioned in the preceding warning.
_Hint:_ Suppose that a real number $ x > 0 $ has two distinct representations in
some base $ b $:
$$
x = (d_m\, d_{m - 1}\cdots d_1\, d_0\,.\,d_{-1}\, d_{-2} \cdots)_{b}
= (d_m'\, d_{m - 1}'\cdots d_1'\, d_0'\,.\,d_{-1}'\, d_{-2}' \cdots)_{b}
$$
If $ r $ is the leftmost index such that the two corresponding digits $ d_r $ and
$ d_r' $ do not coincide, then (assuming without loss of generality that $ d_r > d_r' $) $ 0 = x - x $ can be written in the form 
$$ 0 = b^r\sum_{k=0}^{+\infty} e_kb^{-k} \quad \text{where}\quad
e_0 > 0 \quad \text{and}\quad 0 \le \vert e_{k} \vert \le b - 1 \quad \text{for
each $ k \ge 0 \,$}.
$$
Show that the only possibility is that $ e_0 = 1 $ and $ e_k = -(b - 1) $ for all $
k > 0 $.

⚡ A real number is said to be __algebraic__ if it is a root of some nonzero polynomial
having coefficients in $ \mathbb Z $. A real number which is not algebraic is called
__transcendental__.

* Any rational number $ r = p / q $ (with $ p, q $ integers and $ q \ne 0 $) is
  algebraic since it is the root of the polynomial $ q x - p $ of degree $ 1 $.
  Equivalently, any transcendental number must be irrational.
  
* Some irrational numbers, such as $ \sqrt{3} $, $ \sqrt[3]{7} $ and
  $ \sqrt[5]{2 + \sqrt{3}} $, are also algebraic (why?).

* The familiar numbers $ \pi $ and $ e $ are transcendental; however, this was
  only established in the last decades of the 19th century. It is still an open
  problem to determine whether relatively simple numbers such as $ \pi^\pi $ are
  transcendental.

⚡ A real number $ x $ is said to be **computable** if its representation (say, in base $ 2 $) can, in principle, be determined by an *algorithm*. Intuitively, $ x $ is computable if its representation can be calculated by a computer with unlimited memory but limited processing power, provided with finitely many instructions but having infinite time to perform its computations, a so-called **Turing machine**.

It should be clear that any number whose representation consists of finitely many digits is computable. An example of a number which does not have this property and yet is computable is $ \frac{1}{3} = (.010101\dots)_2 $ in base $ 2 $. We can certainly write a simple program to print all of its digits, even though such a program will never terminate.
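
A minimal sketch of such a program is given below, assuming that we obtain the digits by long division of $ 1 $ by $ 3 $ in base $ 2 $. The function name `binary_digits_of_one_third` is ours. Since the generator never stops on its own, we only take its first few digits here.

```python
from itertools import islice


def binary_digits_of_one_third():
    """Yield, one by one and forever, the digits of 1/3 to the right of the point in base 2."""
    remainder = 1                  # Current remainder of the long division by 3.
    while True:
        remainder *= 2             # Shift one binary place to the right.
        yield remainder // 3       # Next digit (0 or 1).
        remainder %= 3             # Keep only what is left to be divided.


first_digits = list(islice(binary_digits_of_one_third(), 12))
print("." + "".join(str(d) for d in first_digits))
```

Only two remainders ever occur ($ 1 $ and $ 2 $), which is why the digits repeat with period $ 2 $.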

The class of computable numbers includes all algebraic numbers, plus some transcendental numbers, such as $ e $ and $ \pi $.

However, not every real number is computable. In fact, in a sense which can be
made precise, _almost no real number is computable_ (more precisely, the set of
computable numbers is denumerable). This result is due to A. Turing — _On
computable numbers, with an application to the Entscheidungsproblem_ (1936).
Among other results, in this paper Turing introduces the computational model
which now bears his name and also constructs a __universal__ Turing machine (a
Turing machine that can simulate any other Turing machine), which served
as a theoretical model for some of the early electronic computers.

It can be argued that the only tangible numbers in a philosophical sense are the
computable numbers, and that the remaining so-called real numbers are not "real"
at all, since even though they can in principle be defined, they cannot in fact
be "realized" as the result of some finite process.

## $ \S 2 $ The binary system and its cousins

Most computing uses the __binary__ form of representing numbers, in which the
base $ b $ equals $ 2 $ and there are only two possible digits: $ 0 $ and $ 1 $.
Such a digit is usually referred to as a __bit__ (short for _binary digit_).

There are several reasons why the binary system is prevalent. Firstly, it is the
simplest representation system. Secondly, it is also the easiest one to
implement from an engineering standpoint, because it involves devices which
need only distinguish between two stable positions (on and off). But perhaps the
main reason is the close relationship between Boolean logic and the arithmetic
of integers modulo 2, given by the correspondence below:

| Boolean expression/operation | Binary equivalent |
| :----------------- | :----------------- |
| False              | 0                 |
| True               | 1                 |
| And                | $ \times $ (multiplication) |
| _Exclusive_ or (__xor__)   | $ + $ (addition)  |
| Not                | $ x \mapsto 1 + x $ |
| Or (inclusive)     | $ (x, y) \mapsto x + y + xy $ |

Note that all arithmetic operations on the right side are performed __modulo 2__, that is,
the result of, say, $ x + y \pmod 2 $ is the _remainder_ of $ x + y $ after
division by $ 2 $, so that in this sense $ 1 + 1 = 0 $. Note also that $ n \equiv -n
\pmod 2 $ for any integer $ n $.
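
The correspondence in the table can be checked exhaustively, since there are only four pairs of truth values. The following sketch (with the hypothetical helper name `check_boolean_table`) compares Python's Boolean operators with the arithmetic expressions modulo 2:

```python
from itertools import product


def check_boolean_table() -> bool:
    """Return True if every row of the table holds for all x, y in {0, 1}."""
    for x, y in product((0, 1), repeat=2):
        p, q = bool(x), bool(y)
        rows = [
            (p and q) == bool((x * y) % 2),          # And  <->  multiplication
            (p != q) == bool((x + y) % 2),           # Xor  <->  addition
            (not p) == bool((1 + x) % 2),            # Not  <->  x -> 1 + x
            (p or q) == bool((x + y + x * y) % 2),   # Or   <->  x + y + xy
        ]
        if not all(rows):
            return False
    return True


print(check_boolean_table())
```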

📝 Besides the binary and decimal systems, the following two systems are also relatively important in applications:

* The _octal_ system, where $ b = 8 $ and the possible digits are $ 0,1,\dots,7 $;
* The _hexadecimal_ system, where $ b = 16 $ and the possible digits are
  $ 0, 1, \dots, 9 $ and $ A, B, \dots, F $, which stand for
  $ 10, 11, \dots, 15 $ in the decimal system, respectively.

__Exercise:__ Determine the number $ (ABC)_{16} $.

__Exercise:__ Show that if an integer $ n $ has a binary representation of length $ m $, then its representation in the octal system is of length at most $ \lceil\frac{m}{3}\rceil $.

Our main objective is to understand how to efficiently convert from the binary system to the more familiar decimal system and back.

For example, 

\begin{align} (1001.101)_2 &= 1 \times 2^3 + 0 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 + 1 \times 2^{-1} + 0 \times 2^{-2} + 1 \times 2^{-3}  \\
& = 8 + 1 + \frac{1}{2} + \frac{1}{8} \\ &= \frac{77}{8}.
\end{align}

However, since there is no additional difficulty, and much to be gained, in working with an arbitrary base $ b \ge 2 $, we will formulate our results and scripts to cover any base.

## $ \S 3 $ Computing a number from its representation in base $ b $<a name="section2"></a>

In this section we will consider a solution to the following problem.

<a name="Problem1"></a>**Problem 1 (representation to number):** Given a base $ b \ge 2 $ and the representation of an unknown real number $ x $ in this base, determine $ x $ with the least possible amount of computation.

📝 Even though the problem has been formulated in full generality, in practice
we can only effectively compute $ x $ when the given representation is *finite*.
Otherwise, we need to content ourselves with an approximation to $ x $.

This problem may seem trivial because (1) already gives us a way to compute $ x
$; however, our interest is in carrying out the computation more efficiently
than the obvious implementation of (1) as an algorithm.

<div class="alert alert-info"> The following observation will be useful: Any real number $ x $ can be written as the sum of a unique <i>integer</i> $ n $ and a unique <i>fractional number</i> $ t $ such that $ 0 \le t < 1 $. In Python, they are given by:
<ul>
    <li> $ t = $ <code>x % 1</code>;</li>
    <li> $ n = $ <code>math.floor(x)</code>, which coincides with <code>int(x)</code> if $ x \ge 0 $ or if $ x $ is an integer, and with <code>int(x) - 1</code> otherwise.</li>
    </ul>

In any case, $ n $ is the largest integer $ \le x $ and $ t = x - n $.
We will call $ n $ the <b>integral part</b> of $ x $ and $ t $ its <b>fractional part</b>.
</div>

📝 If $ x \ge 0 $, then the function `modf` from the **math** module returns the tuple $ (t, n) $ (with $ n $ given as a float) when applied to $ x $. If $ x < 0 $, then it returns the negatives of the two components of `modf(-x)`, which is not the same as the decomposition considered above.

In [1]:
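from math import floor, modf


def decompose(x: float) -> tuple:
    """Print and return the integral part n and the fractional part t of x."""
    n = floor(x)     # Largest integer <= x (also correct for negative integers).
    t = x % 1        # Fractional part, 0 <= t < 1.
    print(f"The decomposition of {x} into its integral and fractional part is:")
    print(f"{x} = {n} + {t}\n")
    return n, t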


x = 3.14159
y = -1.4
z = -34.9

decompose(x)
decompose(y)
decompose(z)

print(modf(z))
print((-modf(-z)[0], -modf(-z)[1]))

The decomposition of 3.14159 into its integral and fractional part is:
3.14159 = 3 + 0.14158999999999988

The decomposition of -1.4 into its integral and fractional part is:
-1.4 = -2 + 0.6000000000000001

The decomposition of -34.9 into its integral and fractional part is:
-34.9 = -35 + 0.10000000000000142

(-0.8999999999999986, -34.0)
(-0.8999999999999986, -34.0)


By means of the aforementioned decomposition, we can reduce Problem 1 to two easier problems:

**Subproblem 1.1:** Given a base $ b $ and the representation of an unknown *integer* $ n $ in this base, determine $ n $ with the least possible amount of computation.

**Subproblem 1.2:** Given a base $ b $ and the representation of an unknown *fractional number* $ 0 \le t < 1 $ in this base, determine $ t $ with the least possible amount of computation.

To solve [Problem 1](#Problem1) for a given representation, the idea is to consider separately the parts to the left and to the right of the point '.'. Applying the solutions to Subproblems 1.1 and 1.2 to each, we will obtain the integral and fractional parts $ n $ and $ t $ of $ x $. Hence $ x = n + t $.

**Example:** Determine $ x = (123.41)_5 $.

*Solution:* Let $ n = (123)_5 $ and $ t = (.41)_5 $. Then
$$ n = 1 \times 5^2 + 2 \times 5 + 3 = 25 + 10 + 3 = 38\ ,$$
while 
$$ t = \frac{4}{5} + \frac{1}{25} = \frac{21}{25}\ . $$
Therefore 
$$ x = n + t = \frac{38 \cdot 25 + 21}{25} = \frac{971}{25} .$$
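
The worked example can be double-checked by evaluating the defining expansion directly with exact rational arithmetic. This is only a verification sketch using the standard `fractions` module. The dictionary `digits`, which maps each exponent $ k $ to the digit $ d_k $, is our own ad hoc encoding.

```python
from fractions import Fraction

b = 5
# Digits of (123.41)_5, indexed by the exponent of b that they multiply.
digits = {2: 1, 1: 2, 0: 3, -1: 4, -2: 1}

# Sum of d_k * b^k over all positions, computed without rounding.
x = sum(d * Fraction(b) ** k for k, d in digits.items())
print(x)
```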

### 1.1 Computing an integer given its representation in base $ b $

**Solution to Subproblem 1.1:** Suppose that 
$$ n = (d_m\,d_{m-1}\dots d_1d_0)_b\ ,$$
where on the right side we have the given representation in base $ b $. This means that 
$$ n = d_m\, b^m + d_{m-1}\, b^{m-1} + \dots + d_1 \, b^1 + d_0\, b^0 .$$

Consider the following sequence:
* $ a_m = d_m $;
* $ a_{m - 1} = b a_m + d_{m - 1} $;
* $ a_{m - 2} = b a_{m - 1} + d_{m - 2} $;
* $ \vdots = \vdots $
* $a_{k} = b a_{k + 1} + d_k $;
* $ \vdots = \vdots $
* $ a_1 = b a_2 + d_1 $;
* $ a_0 = b a_1 + d_0 $.
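
To see the sequence in action, the sketch below records every intermediate value $ a_m, a_{m - 1}, \dots, a_0 $ for a given list of digits. The name `horner_trace` is ours, and the digits are passed as a list of integers, most significant first. Each step costs one multiplication and one addition, so only $ m $ multiplications are needed in total and no powers of $ b $ are ever computed.

```python
def horner_trace(b: int, digits: list) -> list:
    """Return [a_m, a_{m-1}, ..., a_0] for the digits d_m, ..., d_0 in base b."""
    values = []
    a = 0
    for d in digits:
        a = b * a + d        # a_k = b * a_{k+1} + d_k  (the first step gives a_m = d_m)
        values.append(a)
    return values


# Trace for (1011)_2: the last value is the represented integer.
print(horner_trace(2, [1, 0, 1, 1]))
```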

**Exercise:** Prove by induction on $ m $ that $ a_0 = n $.

The preceding steps can immediately be converted into an efficient procedure to compute $ n $.

In [3]:
def rep_to_number(b: int, digits: str) -> int:   # The type annotations are optional!
    """A function which takes a string of digits between 0 and b - 1,
    where the base b >= 2 is an integer, and returns the integer
    whose representation in base b is the given one.
    Each character is read as one digit, so digits >= 10 cannot be entered."""
    assert isinstance(b, int) and b >= 2         # Make sure b is an integer >= 2.
    allowed = [str(d) for d in range(0, b)]      # Allowed digits.
    for digit in digits:                         # Check if the representation is valid.
        if digit not in allowed:
            raise ValueError(f"Invalid representation of a number in base {b}.")
       
    n = 0
    for d in digits:
        d = int(d)       # Convert the current digit to an int.
        n = b * n + d
    
    return n

print(rep_to_number(2, "1011"))
print(rep_to_number(3, "1011"))
print(rep_to_number(10, "1011"))
print(rep_to_number(2, "100110"))
print(rep_to_number(3, "12210"))

11
31
1011
38
156


### 1.2 Obtaining a fractional number from its representation in base $ b $
**Solution to Subproblem 1.2:** Suppose that we are given the (finite) representation
$$ .d_{-1}\,d_{-2}\dots d_{-r} $$
in base $ b $ and let $ t $, $ 0 \le t < 1 $, be the (unknown) represented number. 
Then
$$
t = d_{-1}b^{-1} + d_{-2}b^{-2} + \dots + d_{-r}b^{-r}.
$$
To compute $ t $, observe that
$$
b^{r} t = d_{-1}b^{r - 1} + d_{-2}b^{r - 2} + \dots + d_{-r + 1} b + d_{-r}.
$$
Therefore we can apply the solution to Subproblem 1.1 to compute $ b^r t $ from its representation
$$ (d_{-1}\,d_{-2}\dots d_{-r})_b = b^r t $$
and then divide by $ b^r $ (where $ r $ is the length of the given representation of $ t $) to find $ t $. This is implemented below. $ \square $

In [3]:
def rep_to_fractional(b: int, digits: str) -> float:   # The type annotations are optional!
    """A function which takes a string of digits between 0 and b - 1,
    where the base b >= 2 is an integer, and returns the unique
    fractional number whose representation in base b is the given one."""
    assert isinstance(b, int) and b >= 2         # Make sure b is an integer >= 2.
    allowed = [str(d) for d in range(0, b)]      # Allowed digits.
    for digit in digits:                         # Check if the representation is valid.
        if digit not in allowed:
            raise ValueError(f"Invalid representation of a number in base {b}.")
    
    r = len(digits)
    n = rep_to_number(b, digits)
    return n * b**(-r)


print(rep_to_fractional(2, "11"))
print(rep_to_fractional(3, "11"))
print(rep_to_fractional(10, "11"))

0.75
0.4444444444444444
0.11
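
Note that the second output above is only a rounded value of $ (.11)_3 = \frac{4}{9} $, because the result is returned as a float. If an exact answer is preferred, one possible variant (a sketch, with the hypothetical name `rep_to_fraction_exact` and without the input validation performed above) returns a `Fraction` instead:

```python
from fractions import Fraction


def rep_to_fraction_exact(b: int, digits: str) -> Fraction:
    """Exact value of (.digits)_b as a Fraction; assumes valid digits and b <= 10."""
    n = 0
    for ch in digits:
        n = b * n + int(ch)                  # Same recurrence as for integers.
    return Fraction(n, b ** len(digits))     # Divide by b^r without rounding.


print(rep_to_fraction_exact(2, "11"))
print(rep_to_fraction_exact(3, "11"))
print(rep_to_fraction_exact(10, "11"))
```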


### 1.3 Obtaining a real number from its representation in base $ b $

**Exercise:** Define a function which takes two arguments:
* The base $ b \ge 2 $;
* The (finite) representation of an unknown real number $ x $, provided as a string of digits between $ 0 $ and $ b - 1 $, possibly containing a point '.';

and which returns $ x $ as output. (*Hint:* Use the functions that were defined above plus the string method `index`, which returns the index of the first occurrence of a given substring in a string, in order to find the location of the point '.' in the given representation.)

**Exercise:** How should each of the functions above be modified to include the possibility that $ x < 0 $?

## $ \S 4 $ Computing the representation of a number in base $ b $

We will now consider the converse to [Problem 1](#Problem1), namely, how to compute the representation of a number in base $ b $ given the number $ x $ itself.

<a name="Problem2"></a>**Problem 2 (number to representation):** Given a base $ b \ge 2 $ and a real number $ x $, find its representation in base $ b $.

**Subproblem 2.1:** Given a base $ b $ and an *integer* $ n $, find its representation in base $ b $.

**Subproblem 2.2:** Given a base $ b $ and a *fractional number* $ t $, $ 0 \le t < 1 $, find its representation in base $ b $.

As before, if we can solve these two subproblems, then we can obtain a solution to Problem 2 by combining their solutions. More precisely, the idea is to separately compute the representations of the integral and fractional parts $ n $ and $ t $ of $ x $, and finally to concatenate them, with a point '.' in between, to obtain the representation of $ x $.


### 2.1 Obtaining the representation in base $ b $ of an integer

**Solution to Subproblem 2.1:** We may assume without loss of generality that $ n \ge 0 $, since the representation of $ -n $ is identical to that of $ n $ except for the $ - $ sign at the beginning. Let
$$ n = (d_m\,d_{m-1}\dots d_1d_0)_b\ ,$$
be the representation of $ n $ in base $ b $, that is, 
$$ n = d_m\, b^m + d_{m-1}\, b^{m-1} + \dots + d_1 \, b^1 + d_0\, b^0 .$$
We need to compute the $ d_k $.

Now, upon (integer) division of $ n $ by $ b $, we obtain:
* $ d_0 $ as the remainder;
* $ d_m\, b^{m - 1} + d_{m - 1}\, b^{m - 2} + \dots + d_2\,b^1 + d_1 \, b^0 $ as the quotient.

Similarly, division of the latter by $ b $ yields:
* $ d_1 $ as the remainder;
* $ d_m\, b^{m - 2} + d_{m - 1}\, b^{m - 3} + \dots + d_3\,b^1 + d_2 \, b^0 $ as the quotient.

Clearly (or, more formally, by induction), continuing in this way *we can obtain the digits $ d_0,\, d_1, \dots, d_m $ as the remainders of the successive divisions by $ b $, starting from $ n $ and replacing it by the quotient at each step*, which solves the problem. $ \square $
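
As a worked illustration, take $ n = 38 $ and $ b = 5 $. The successive divisions are:
$$
\begin{aligned}
38 &= 7 \times 5 + 3 \,, \\
7 &= 1 \times 5 + 2 \,, \\
1 &= 0 \times 5 + 1 \,.
\end{aligned}
$$
The remainders are $ d_0 = 3 $, $ d_1 = 2 $ and $ d_2 = 1 $, and the process stops once the quotient is $ 0 $. Reading the remainders in reverse order gives $ 38 = (123)_5 $, in agreement with the example of the preceding section.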

This solution is implemented in the following script:

In [4]:
def integer_to_rep(b: int, n: int) -> str:
    """Given a base b and a nonnegative integer n, returns the
    representation of n in base b as a string of digits from 0 to (b - 1)."""
    assert isinstance(b, int) and b >= 2
    assert isinstance(n, int) and n >= 0
    
    if n == 0:                            # The loop below would yield '' for n = 0.
        return "0"

    list_of_digits = []                   # Will store the list of digits of n.
    while n > 0: 
        d = n % b
        n //= b
        list_of_digits.append(str(d))     # Convert d to a string and append it.

    # Now reverse the list of digits and convert it
    # into a string by joining its elements to the empty string:
    list_of_digits.reverse()
    representation = ''.join(list_of_digits)
    return representation


print(integer_to_rep(2, 63))
print(integer_to_rep(2, 64))

111111
1000000


### 2.2 Obtaining the representation in base $ b $ of a fractional number

**Solution to Subproblem 2.2:** Let 
$$ t = (.d_{-1}\,d_{-2}\dots d_{-r}\dots)_b $$
be the (unknown) representation of $ t $, that is,
\begin{align}\label{E:rep2}\tag{2}
t = d_{-1}b^{-1} + d_{-2}b^{-2} + \dots + d_{-r}b^{-r} + \dots .
\end{align}

Since our computer (and mind) has limited memory, we can only compute finitely many digits. To determine, say, the digits $ d_{-1} $ through $ d_{-r} $, we proceed as follows.

Multiplying both sides of \eqref{E:rep2} by $ b $, we obtain:
* $ d_{-1} $ as the integer part of $ bt $;
* $ d_{-2}b^{-1} + d_{-3}b^{-2} + \dots + d_{-r - 1}b^{-r} + \dots $ as the fractional part of $ b t $.

Now multiplying the fractional part of $ b t $ by $ b $, we obtain:
* $ d_{-2} $ as the integer part;
* $ d_{-3}b^{-1} + d_{-4}b^{-2} + \dots + d_{-r - 2}b^{-r} + \dots $ as the fractional part.

Continuing in the same fashion, we can compute $ d_{-1}, \dots, d_{-r} $ after $ r $ steps.
$ \square $
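
As a worked illustration, take $ t = \frac{21}{25} $ and $ b = 5 $:
$$
5 \times \frac{21}{25} = \frac{21}{5} = 4 + \frac{1}{5} \,, \qquad
5 \times \frac{1}{5} = 1 + 0 \,.
$$
Hence $ d_{-1} = 4 $ and $ d_{-2} = 1 $. Since the fractional part is now $ 0 $, all subsequent digits vanish, and $ \frac{21}{25} = (.41)_5 $, in agreement with the example of the preceding section.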

This solution is implemented in the script below.

In [5]:
def fractional_to_rep(b: int, t: float, r: int) -> str:
    """Given a base b, a fractional number t and a positive integer r,
    returns the first r digits of the representation of t in base b
    as a string of digits from 0 to (b - 1)."""
    assert isinstance(b, int) and b >= 2
    assert isinstance(t, float) and 0 <= t < 1   # t must be a fractional number.
    assert isinstance(r, int) and r >= 1
    
    digits = []
    while r > 0:
        t *= b
        d = int(t)
        t -= d
        digits.append(str(d))
        r -= 1

    representation = "." + ''.join(digits)
    return representation


print(fractional_to_rep(2, 1 / 8, 5))
print(fractional_to_rep(10, 1 / 4, 10))


.00100
.2500000000
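
A word of caution: a float such as `0.1` is itself only a rounded binary representation of $ \frac{1}{10} $, and each multiplication above may round again, so for large $ r $ the later digits may reflect these rounding errors rather than the intended number. When the input is rational, one way around this is to run the same procedure with exact arithmetic. The sketch below does so with the standard `fractions` module. The name `fraction_to_rep_exact` is hypothetical, and digits $ \ge 10 $ are not handled.

```python
from fractions import Fraction


def fraction_to_rep_exact(b: int, t: Fraction, r: int) -> str:
    """First r digits of the representation in base b of a Fraction t with 0 <= t < 1."""
    assert isinstance(b, int) and b >= 2
    assert 0 <= t < 1
    digits = []
    for _ in range(r):
        t *= b                                 # Shift one place to the right.
        d = t.numerator // t.denominator       # Integer part of b * t.
        t -= d                                 # Keep the fractional part.
        digits.append(str(d))
    return "." + "".join(digits)


print(fraction_to_rep_exact(2, Fraction(1, 10), 8))
print(fraction_to_rep_exact(5, Fraction(21, 25), 2))
```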


## $ \S 5 $ Python built-in functions for conversion between bases

Python provides the built-in functions

* `bin`
* `oct`
* `hex`

which take a single _integer_ as argument and return its binary, octal or hexadecimal representation, respectively, in the form of strings.

__Example:__ 

In [9]:
x = 12
y = 10923
z = -15

print(bin(x), oct(x), hex(x))
print(bin(y), oct(y), hex(y))
print(bin(z), oct(z), hex(z))

0b1100 0o14 0xc
0b10101010101011 0o25253 0x2aab
-0b1111 -0o17 -0xf


📝 Note how each string is prefixed with `0b`, `0o` or `0x`, respectively (after the sign, if any), to indicate the type of the representation. Note also that the hexadecimal digits $ A, \dots, F $ appear in lowercase.
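
For completeness, here is a short sketch of two related built-ins that the text above does not discuss: `int`, which accepts a string together with a base and thus converts in the opposite direction, and `format`, which can produce the same representations without the prefix. The variable names are ours.

```python
# From a representation back to the integer. The prefix is optional,
# but if present it must match the given base.
from_bin = int("0b1100", 2)
from_oct = int("0o14", 8)
from_hex = int("0xc", 16)
negative = int("-0xf", 16)
base_five = int("123", 5)      # Bases other than 2, 8 and 16 are also accepted.
print(from_bin, from_oct, from_hex, negative, base_five)

# From an integer to its representation, without the prefix.
plain = (format(12, "b"), format(12, "o"), format(12, "x"))
print(plain)
```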