# Mathematics for Machine Learning

## Session 15: Hyperreal numbers

### Gerhard Jäger


December 12, 2024

---
<br><br>

<small>Most material taken from Chapter 1 of Keisler, H. Jerome. *Elementary Calculus: An Infinitesimal Approach*. 2012.</small><br>
<small>Applets programmed with the help of ChatGPT</small>

# Calculus

I will use an unusual approach to calculus in this course. It goes by the name of *Nonstandard Analysis*. There is an excellent textbook I will use:

* Keisler, H. Jerome. *Elementary Calculus: An Infinitesimal Approach*. 2012.

It is available in Moodle.

## Minimizing a multivariate function

- very frequent task in ML
- typical example: training a neural network
- central notion: **gradient**
    - direction vector in which the target function changes fastest
    - norm of the gradient proportional to amount of change in target function
    
**Differential calculus** studies how we can compute the gradient of a function.
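To make the notion of a gradient concrete, here is a minimal sketch that approximates it with central differences. The function `bowl` and the helper `numerical_gradient` are hypothetical names chosen for this illustration; they are not part of the course material.

```python
import math

def numerical_gradient(f, x, y, h=1e-6):
    """Approximate the gradient of f at (x, y) with central differences."""
    df_dx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    df_dy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return df_dx, df_dy

def bowl(x, y):
    # Hypothetical example function; its exact gradient is (2x, 6y)
    return x**2 + 3 * y**2

gx, gy = numerical_gradient(bowl, 1.0, -2.0)
print(gx, gy)              # close to the exact gradient (2, -12)
print(math.hypot(gx, gy))  # norm of the gradient
```

The approximation only hints at what differential calculus delivers exactly: a formula for the gradient at every point.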

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interact


In [14]:
import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interactive

# Define a 2D loss function
def loss_function(x, y):
    return np.sin(3 * x) * np.cos(3 * y) + x**2 - y**2  # Multimodal, uneven landscape

# Compute gradient at a given point
def gradient(x, y):
    grad_x = 3 * np.cos(3 * x) * np.cos(3 * y) + 2 * x
    grad_y = -3 * np.sin(3 * x) * np.sin(3 * y) - 2 * y
    return grad_x, grad_y

# Generate a grid of points for the loss function
x = np.linspace(-2, 2, 100)
y = np.linspace(-2, 2, 100)
X, Y = np.meshgrid(x, y)
Z = loss_function(X, Y)

# Interactive plot function
def plot_loss_and_gradient(x_point, y_point):
    # Compute gradient at the point
    grad_x, grad_y = gradient(x_point, y_point)

    # Shorten the gradient vector
    scale_factor = 0.5
    grad_x *= scale_factor
    grad_y *= scale_factor

    # Plot the loss function and gradient vector
    plt.figure(figsize=(10, 8))

    # Contour plot of the loss function
    contour = plt.contour(X, Y, Z, levels=20, cmap='viridis')
    plt.clabel(contour, inline=True, fontsize=8)

    # Add color representation
    plt.imshow(Z, extent=(-2, 2, -2, 2), origin='lower', cmap='viridis', alpha=0.6)
    plt.colorbar(label="Loss Value")

    # Plot the gradient vector with an arrow head
    plt.quiver(x_point, y_point, grad_x, grad_y, color='red', angles='xy', scale_units='xy', scale=1, label="Gradient Vector")

    # Highlight the point
    plt.scatter(x_point, y_point, color='blue', label="Point of Interest")

    # Add labels, title, and legend
    plt.xlabel("x")
    plt.ylabel("y")
    plt.title("Loss Function and Gradient Vector")
    plt.legend()
    plt.grid(True)
    plt.show()

# Create interactive widget
interactive_plot = interactive(plot_loss_and_gradient, x_point=(-2.0, 2.0, 0.1), y_point=(-2.0, 2.0, 0.1))
output = interactive_plot.children[-1]
output.layout.height = '600px'
interactive_plot



We start with univariate functions.

In [6]:


# Define the function to be visualized
def f(x):
    return x**2  # Example function

# Function to create the interactive plot
def plot_secant_tangent(x0=1.0, dx=1.0):
    # Define the range for x
    x = np.linspace(-2, 4, 500)
    y = f(x)

    # Define the points for secant line
    x2 = x0 + dx
    y0 = f(x0)
    y2 = f(x2)

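    # Calculate the slope of the secant line; when both points coincide
    # (dx = 0) the quotient is undefined, so use the derivative 2 * x0 of x^2
    slope = (y2 - y0) / (x2 - x0) if x2 != x0 else 2 * x0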

    # Equation of the secant line
    secant_line = lambda x: y0 + slope * (x - x0)

    # Plot the function
    plt.figure(figsize=(12, 9))
    plt.plot(x, y, label="Function $f(x)=x^2$", color="blue")

    # Plot the secant line
    x_secant = np.linspace(-4, 6, 500)  # Extend the secant line beyond the plotted x-range
    plt.plot(x_secant, secant_line(x_secant), label="Secant line", color="green")

    # Plot the tangent line when dx is very small
    if abs(dx) < 0.01:
        tangent_slope = 2 * x0  # Derivative of x^2
        tangent_line = lambda x: f(x0) + tangent_slope * (x - x0)
        plt.plot(x, tangent_line(x), label="Tangent line (dx -> 0)", color="red", linestyle="--")

    # Highlight points on the curve
    plt.scatter([x0, x2], [y0, y2], color="black", label="Secant points")
    plt.scatter([x0], [f(x0)], color="orange", label="Tangent point")

    # Add labels, title, legend, and grid
    plt.xlabel("x")
    plt.ylabel("f(x)")
    plt.title("Secant and Tangent Lines")
    plt.axhline(0, color='black', linewidth=0.8, linestyle='--')
    plt.axvline(0, color='black', linewidth=0.8, linestyle='--')
    plt.xlim(-2, 4)
    plt.ylim(-1, 16)
    plt.legend()
    plt.grid(True)
    plt.show()

# Interactive widget
interact(plot_secant_tangent, x0=(-2.0, 4.0, 0.1), dx=(-2.0, 2.0, 0.01))


(code developed with ChatGPT)

- goal: compute the **slope** of the tangent at a given point $x$.
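The idea can be sketched numerically: as `dx` shrinks, the secant slope of $f(x)=x^2$ at $x_0$ approaches the tangent slope $2x_0$. The helper `secant_slope` is a hypothetical name used only for this sketch.

```python
def f(x):
    return x**2

def secant_slope(f, x0, dx):
    """Slope of the line through (x0, f(x0)) and (x0 + dx, f(x0 + dx))."""
    return (f(x0 + dx) - f(x0)) / dx

x0 = 1.0
slopes = [secant_slope(f, x0, dx) for dx in (1.0, 0.1, 0.01, 0.001)]
print(slopes)  # the values approach the tangent slope 2 * x0 = 2
```

For this particular function the secant slope is $2x_0 + dx$, so the error is exactly the step size, up to floating-point rounding.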

## Nonstandard Analysis

The idea is to introduce a new number system, the *hyperreal numbers*, which contains the real numbers as a proper subset. The hyperreal numbers contain nonzero infinitesimals: numbers whose absolute value is smaller than every positive real number. This allows us to define the derivative of a function via a ratio of infinitesimals.

The inventors of the infinitesimal calculus, Newton and Leibniz, used infinitesimals in their work. However, they did not have a rigorous foundation for their approach. It was only in the 19th century that the concept of limits was introduced, and the calculus was put on a solid foundation. The concept of limits is very powerful, but it is also somewhat abstract and difficult to understand. The concept of infinitesimals is more intuitive and easier to work with.

In the 1960s, the mathematician Abraham Robinson developed a rigorous foundation for the calculus using infinitesimals. This foundation is called *nonstandard analysis*. It is based on the *ultraproduct* construction from model theory, a branch of mathematical logic.

I will not really treat the model-theoretic foundations in this course. If you want to go deeper, check out Keisler's book and the references therein.

## Hyperreal numbers

We extend the real numbers by adding at least one *infinitesimal number*, which we denote by $\varepsilon$. 

- $\varepsilon > 0$
- For every positive real number $x$: $0 < \varepsilon < x$

All operations that we know from the real numbers can be extended to the hyperreal numbers. For example, we can add, subtract, multiply, and divide hyperreal numbers. We can also take powers, roots, and trigonometric functions of hyperreal numbers.

Therefore:

- $0 < \ldots < \varepsilon^3 < \varepsilon^2 < \varepsilon < x$, for every positive real number $x$.
- for each real number $x$: $\frac{1}{\varepsilon} > x$

$\frac{1}{\varepsilon}$ is an *infinite number*.

- A number $\varepsilon$ is called *infinitesimally small* or *infinitesimal* if for each real number $a\neq 0$:
  $$
  -|a| < \varepsilon < |a|
  $$
- A number $M$ is called *positive infinite* if for each real number $a$:
    $$
    M > a
    $$
- A number $M$ is called *negative infinite* if for each real number $a$:
    $$
    M < a
    $$
    
- A number $M$ is called *infinite* if it is either negative or positive infinite.
    
- A number $a$ is called *finite* if it is not infinite, that is, if it lies between two real numbers. In particular, every infinitesimal is finite.

<img src="_img/hyperreals.svg"  width="1000" style="display: block; margin-left: auto; margin-right: auto;">
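The three kinds of numbers can be illustrated with a toy model. The sketch below is not the hyperreal number system itself: it only handles finite sums of terms $c\,\varepsilon^k$ with integer exponents $k$, for one fixed positive infinitesimal $\varepsilon$. The class name `EpsPoly` and its method `kind` are hypothetical names introduced for this illustration.

```python
from fractions import Fraction

class EpsPoly:
    """Toy model: finite sums of c * eps**k with integer exponents k (possibly negative)."""

    def __init__(self, terms):
        # terms maps exponent k -> coefficient c; zero coefficients are dropped
        self.terms = {k: Fraction(c) for k, c in terms.items() if c != 0}

    def __add__(self, other):
        out = dict(self.terms)
        for k, c in other.terms.items():
            out[k] = out.get(k, 0) + c
        return EpsPoly(out)

    def __mul__(self, other):
        out = {}
        for k1, c1 in self.terms.items():
            for k2, c2 in other.terms.items():
                out[k1 + k2] = out.get(k1 + k2, 0) + c1 * c2
        return EpsPoly(out)

    def kind(self):
        if not self.terms:
            return "infinitesimal"  # zero is the only real infinitesimal
        lowest = min(self.terms)    # the term with the smallest exponent dominates
        if lowest > 0:
            return "infinitesimal"
        if lowest == 0:
            return "finite, not infinitesimal"
        return "infinite"

eps = EpsPoly({1: 1})   # the infinitesimal eps
one = EpsPoly({0: 1})   # the real number 1
H = EpsPoly({-1: 1})    # plays the role of 1/eps

print((eps * eps).kind())  # infinitesimal
print((one + eps).kind())  # finite, not infinitesimal
print((H + one).kind())    # infinite
print((H * eps).kind())    # finite, not infinitesimal
```

The classification looks only at the smallest exponent, because that term dominates all terms with higher powers of $\varepsilon$.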

## The Extension Principle

- The real numbers form a subset of the hyperreal numbers, and the order relation $x<y$ for the real numbers is a subset of the order relation for the hyperreal numbers.
- There is a hyperreal number that is greater than $0$ but less than every positive real number.
- For every real function $f$ of one or more variables we are given a corresponding hyperreal function $f^*$ of the same number of variables. $f^*$ is called the natural extension of $f$.

## Transfer Principle

Every real statement that holds for one or more particular real functions holds for the hyperreal natural extension of these functions.

Examples of real statements:



1. **Closure law for addition**: for any $x$ and $y$, the sum $x + y$ is defined.
2. **Commutative law for addition**: $x + y = y + x$.
3. **A rule for order**: If $0 < x < y$, then $0 < 1/y < 1/x$.
4. **Division by zero is never allowed**: $x/0$ is undefined.
5. **An algebraic identity**: $(x - y)^2 = x^2 - 2xy + y^2$.
6. **A trigonometric identity**: $\sin^2 x + \cos^2 x = 1$.
7. **A rule for logarithms**: If $x > 0$ and $y > 0$, then $\log_{10} (xy) = \log_{10} x + \log_{10} y$.
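As a small worked illustration (my own, not taken from the source), rule 3 can be transferred to the hyperreals and applied to a positive infinitesimal $\varepsilon$ and a positive real number $x$:

$$
0 < \varepsilon < x \quad\Longrightarrow\quad 0 < \frac{1}{x} < \frac{1}{\varepsilon}.
$$

Every positive real number $r$ can be written as $1/x$ with the positive real $x = 1/r$, so $\frac{1}{\varepsilon}$ exceeds every real number. This is the reasoning behind calling $\frac{1}{\varepsilon}$ an infinite number.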


## Rules for infinitesimal, finite and infinite numbers

Assume that $\varepsilon, \delta$ are infinitesimals; $b, c$ are hyperreal numbers that are finite but not infinitesimal; and $H, K$ are infinite hyperreal numbers.

1. **Real numbers**:  
   - The only infinitesimal real number is $0$.  
   - Every real number is finite.  

2. **Negatives**:  
   - $-\varepsilon$ is infinitesimal.  
   - $-b$ is finite but not infinitesimal.  
   - $-H$ is infinite.  

3. **Reciprocals**:  
   - If $\varepsilon \neq 0$, $1/\varepsilon$ is infinite.  
   - $1/b$ is finite but not infinitesimal.  
   - $1/H$ is infinitesimal.  




4. **Sums**:  
   - $\varepsilon + \delta$ is infinitesimal.  
   - $b + \varepsilon$ is finite but not infinitesimal.  
   - $b + c$ is finite (possibly infinitesimal).  
   - $H + \varepsilon$ and $H + b$ are infinite.  
5. **Products**:  
   - $\delta \cdot \varepsilon$ and $b \cdot \varepsilon$ are infinitesimal.  
   - $b \cdot c$ is finite but not infinitesimal.  
   - $H \cdot b$ and $H \cdot K$ are infinite.  


6. **Quotients**:  
   - $\varepsilon / b, \varepsilon / H$, and $b / H$ are infinitesimal.  
   - $b / c$ is finite but not infinitesimal.  
   - $b / \varepsilon, H / \varepsilon$, and $H / b$ are infinite, provided that $\varepsilon \neq 0$.  

7. **Roots**:  
   - If $\varepsilon > 0$, $\sqrt[n]{\varepsilon}$ is infinitesimal.  
   - If $b > 0$, $\sqrt[n]{b}$ is finite but not infinitesimal.  
   - If $H > 0$, $\sqrt[n]{H}$ is infinite.  







There are no rules for the following combinations:

- $\varepsilon / \delta$, the quotient of two infinitesimals.
- $H / K$, the quotient of two infinite numbers.
- $H \varepsilon$, the product of an infinite number and an infinitesimal.
- $H + K$, the sum of two infinite numbers.

Each of these can be either infinitesimal, finite but not infinitesimal, or infinite, depending on $\varepsilon$, $\delta$, $H$ and $K$.

### Examples

$\frac{\varepsilon^2}{\varepsilon}$ is infinitesimal (equal to $\varepsilon$).  
$\frac{\varepsilon}{\varepsilon}$ is finite but not infinitesimal (equal to $1$).  
$\frac{\varepsilon}{\varepsilon^2}$ is infinite (equal to $\frac{1}{\varepsilon}$).  
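The other three combinations behave the same way. As a sketch, assume $\varepsilon$ is a positive infinitesimal and take $H = \frac{1}{\varepsilon}$, which is positive infinite (this choice of $H$ is only for illustration):

$$
\begin{align}
H \cdot \varepsilon^2 &= \varepsilon & H \cdot \varepsilon &= 1 & H^2 \cdot \varepsilon &= H\\
\frac{H}{H^2} &= \frac{1}{H} & \frac{H}{H} &= 1 & \frac{H^2}{H} &= H\\
H + (-H) &= 0 & H + (1 - H) &= 1 & H + H &= 2H
\end{align}
$$

In each row the first result is infinitesimal, the second is finite but not infinitesimal, and the third is infinite, although every row combines numbers of the same kinds.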


### Examples

$$
\frac{b - 3\varepsilon}{c + 2\delta}
$$

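is finite but not infinitesimal, because both $b - 3\varepsilon$ and $c + 2\delta$ are finite but not infinitesimal.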

$$
\frac{5\varepsilon^4 - 8\varepsilon^3+\varepsilon^2}{3\varepsilon}
$$


infinitesimal if $\varepsilon \neq 0$, undefined otherwise

$$
\frac{\varepsilon^4-\varepsilon^3+2\varepsilon^2}{5\varepsilon^4 + \varepsilon^3}
$$

infinite, provided that $\varepsilon \neq 0$
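The reasoning behind the last two answers can be spelled out by cancelling powers of $\varepsilon$, assuming $\varepsilon \neq 0$:

$$
\begin{align}
\frac{5\varepsilon^4 - 8\varepsilon^3+\varepsilon^2}{3\varepsilon} &= \frac{5}{3}\varepsilon^3 - \frac{8}{3}\varepsilon^2 + \frac{1}{3}\varepsilon\\
\frac{\varepsilon^4-\varepsilon^3+2\varepsilon^2}{5\varepsilon^4 + \varepsilon^3} &= \frac{\varepsilon^2-\varepsilon+2}{5\varepsilon^2 + \varepsilon}
\end{align}
$$

The first expression is a sum of infinitesimals and hence infinitesimal. In the second, the numerator $2 + (\varepsilon^2 - \varepsilon)$ is finite but not infinitesimal, and the denominator $\varepsilon(5\varepsilon + 1)$ is a nonzero infinitesimal, so the quotient is infinite by the rule for $b/\varepsilon$.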

**THEOREM 1**

(i) Every hyperreal number which is between two infinitesimals is infinitesimal.

(ii) Every hyperreal number which is between two finite hyperreal numbers is finite.

(iii) Every hyperreal number which is greater than some positive infinite number is positive infinite.

(iv) Every hyperreal number which is less than some negative infinite number is negative infinite.


**EXAMPLE**  
If $H$ is positive infinite then, surprisingly,  

$$
\sqrt{H + 1} - \sqrt{H - 1}
$$  

is infinitesimal.  

This is shown using an algebraic trick:  

$$
\sqrt{H + 1} - \sqrt{H - 1} = \frac{\left(\sqrt{H + 1} - \sqrt{H - 1}\right)\left(\sqrt{H + 1} + \sqrt{H - 1}\right)}{\sqrt{H + 1} + \sqrt{H - 1}}
$$  

$$
= \frac{(H + 1) - (H - 1)}{\sqrt{H + 1} + \sqrt{H - 1}} = \frac{2}{\sqrt{H + 1} + \sqrt{H - 1}}.
$$  

The numbers $H + 1$, $H - 1$, and their square roots are positive infinite, and thus the sum $\sqrt{H + 1} + \sqrt{H - 1}$ is positive infinite.  

Therefore, the quotient  

$$
\sqrt{H + 1} - \sqrt{H - 1} = \frac{2}{\sqrt{H + 1} + \sqrt{H - 1}},
$$  

a finite number divided by an infinite number, is infinitesimal.
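A numerical sanity check, not a proof: the sketch below uses large real numbers as stand-ins for the infinite number $H$ and compares the original expression with its rationalized form. The function names are hypothetical.

```python
import math

def difference_direct(H):
    return math.sqrt(H + 1) - math.sqrt(H - 1)

def difference_rationalized(H):
    return 2 / (math.sqrt(H + 1) + math.sqrt(H - 1))

# Both forms shrink towards 0 as H grows
for H in (1e2, 1e4, 1e6, 1e8):
    print(H, difference_direct(H), difference_rationalized(H))
```

The rationalized form is also the numerically safer one, because the direct form subtracts two nearly equal floating-point numbers.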


## Standard parts

**DEFINITION**  

Two hyperreal numbers $b$ and $c$ are said to be infinitely close to each other, in symbols $b \approx c$, if their difference $b - c$ is infinitesimal. $b \not\approx c$ means that $b$ is not infinitely close to $c$.  

Here are three simple remarks:  

1. **If $\varepsilon$ is infinitesimal, then $b \approx b + \varepsilon$**.  
   This is true because the difference, $b - (b + \varepsilon) = -\varepsilon$, is infinitesimal.  

2. **$b$ is infinitesimal if and only if $b \approx 0$**.  
   The formula $b \approx 0$ will be used as a short way of writing "$b$ is infinitesimal."  

3. **If $b$ and $c$ are real and $b$ is infinitely close to $c$, then $b$ equals $c$**.  
   $b - c$ is real and infinitesimal, hence zero; so $b = c$.  


**THEOREM**  

Let $a$, $b$, and $c$ be hyperreal numbers.  

1. $a \approx a$.  
2. If $a \approx b$, then $b \approx a$.  
3. If $a \approx b$ and $b \approx c$, then $a \approx c$.  



**THEOREM 2**  

Assume $a \approx b$. Then:  

1. If $a$ is infinitesimal, so is $b$.  
2. If $a$ is finite, so is $b$.  
3. If $a$ is infinite, so is $b$.  


### STANDARD PART PRINCIPLE  

Every finite hyperreal number is infinitely close to exactly one real number.  

### DEFINITION  

Let $b$ be a finite hyperreal number. The **standard part** of $b$, denoted by $\text{st}(b)$, is the real number which is infinitely close to $b$. Infinite hyperreal numbers do not have standard parts.  

Here are some facts that follow at once from the definition:  

- Let $b$ be a finite hyperreal number:  

    1. $\text{st}(b)$ is a real number.  
    2. $b = \text{st}(b) + \varepsilon$ for some infinitesimal $\varepsilon$.  
    3. If $b$ is real, then $b = \text{st}(b)$.  


**THEOREM 3**  

Let $a$ and $b$ be finite hyperreal numbers. Then:  

1. $\text{st}(-a) = -\text{st}(a)$.  
2. $\text{st}(a + b) = \text{st}(a) + \text{st}(b)$.  
3. $\text{st}(a - b) = \text{st}(a) - \text{st}(b)$.  
4. $\text{st}(ab) = \text{st}(a) \cdot \text{st}(b)$.  
5. If $\text{st}(b) \neq 0$, then $\text{st}(a/b) = \text{st}(a)/\text{st}(b)$.  
6. $\text{st}(a^n) = (\text{st}(a))^n$.  
7. If $a \geq 0$, then $\text{st}(\sqrt[n]{a}) = \sqrt[n]{\text{st}(a)}$.  
8. If $a \leq b$, then $\text{st}(a) \leq \text{st}(b)$.  
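The sum and product rules can be checked in a toy model. The sketch below assumes that a finite hyperreal of the special form $a_0 + a_1\varepsilon + a_2\varepsilon^2 + \ldots$ is stored as the coefficient list `[a0, a1, a2, ...]`, so that its standard part is simply `a0`. The helpers `add`, `mul` and `st` are hypothetical and cover only such polynomial expressions in one fixed infinitesimal.

```python
def add(p, q):
    """Add two coefficient lists, padding the shorter one with zeros."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def mul(p, q):
    """Multiply two coefficient lists like polynomials in eps."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def st(p):
    """Standard part: the coefficient of eps**0."""
    return p[0]

a = [3, 2]       # 3 + 2*eps
b = [5, 0, -1]   # 5 - eps**2

print(st(add(a, b)), st(a) + st(b))  # 8 8
print(st(mul(a, b)), st(a) * st(b))  # 15 15
```

All terms that carry a positive power of $\varepsilon$ are infinitesimal, which is why they drop out when the standard part is taken.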


**EXAMPLE 1**  

When $\Delta x$ is an infinitesimal and $x$ is real, compute the standard part of  

$$
3x^2 + 3x \Delta x + (\Delta x)^2.
$$  



Using the rules in Theorem 3, we can write  

$$
\text{st}(3x^2 + 3x \Delta x + (\Delta x)^2) = \text{st}(3x^2) + \text{st}(3x \Delta x) + \text{st}((\Delta x)^2)
$$  

$$
= 3x^2 + \text{st}(3x) \cdot \text{st}(\Delta x) + \text{st}((\Delta x)^2)
$$  

$$
= 3x^2 + 3x \cdot 0 + 0^2 = 3x^2.
$$  


**EXAMPLE 2**  

If $\text{st}(c) = 4$ and $c \neq 4$, find  

$$
\text{st}\left(\frac{c^2 + 2c - 24}{c^2 - 16}\right).
$$  


$$
\begin{align}
\text{st}\left(\frac{c^2 + 2c - 24}{c^2 - 16}\right) &= \text{st}\left(\frac{(c+6)(c-4)}{(c+4)(c-4)}\right)\\
&= \text{st}\left(\frac{c+6}{c+4}\right)\\
&= \frac{\text{st}(c+6)}{\text{st}(c+4)}\\
&= \frac{\text{st}(c)+\text{st}(6)}{\text{st}(c)+\text{st}(4)}\\
&=\frac{4+6}{4+4}\\
&= \frac{10}{8} = \frac{5}{4}
\end{align}
$$

**EXAMPLE 3**  

If $H$ is a positive infinite hyperreal number, compute the standard part of  

$$
c = \frac{2H^3 + 5H^2 - 3H}{7H^3 - 2H^2 + 4H}.
$$  


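Dividing numerator and denominator by $H^3$ leaves only real numbers and infinitesimals, since $\frac{1}{H}$ and $\frac{1}{H^2}$ are infinitesimal:

$$
\begin{align}
\text{st}\left(\frac{2H^3 + 5H^2 - 3H}{7H^3 - 2H^2 + 4H}\right) &= \text{st}\left(\frac{
2 + \frac{5}{H}-\frac{3}{H^2}}{7 - \frac{2}{H} + \frac{4}{H^2}}\right)\\
&= \frac{\text{st}\left(2 + \frac{5}{H}-\frac{3}{H^2}\right)}{\text{st}\left(7 - \frac{2}{H} + \frac{4}{H^2}\right)}\\
&= \frac{2}{7}
\end{align}
$$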

**EXAMPLE 4**  

If $\varepsilon$ is infinitesimal but not zero, find the standard part of  

$$
b = \frac{\varepsilon}{5 - \sqrt{25 + \varepsilon}}.
$$  


Recall the identity

$$
(u+v)(u-v) = u^2-v^2.
$$

Using this with $u = 5$ and $v = \sqrt{25+\varepsilon}$, we get
$$
\begin{align}
b &= \frac{\varepsilon(5+\sqrt{25+\varepsilon})}{(5 - \sqrt{25 + \varepsilon})(5+\sqrt{25+\varepsilon})}\\
&=\frac{5\varepsilon+\varepsilon\sqrt{25+\varepsilon}}{25-(25+\varepsilon)}\\
&=\frac{5\varepsilon+\varepsilon\sqrt{25+\varepsilon}}{-\varepsilon}\\
&= -5 - \sqrt{25+\varepsilon}\\
\text{st}(b) &= \text{st}(-5 - \sqrt{25+\varepsilon})\\
&= \text{st}(-5) - \text{st}(\sqrt{25+\varepsilon})\\
&= -5 -\sqrt{\text{st}(25+\varepsilon)}\\
&= -5 -\sqrt{\text{st}(25)+\text{st}(\varepsilon)}\\
&= -5 -\sqrt{25+0}\\
&= -10
\end{align}
$$
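Examples 2 to 4 can be sanity-checked numerically. The sketch below replaces the infinitesimal by a tiny real number and the infinite number by a huge one, so it only approximates the standard parts and proves nothing. The function names and the stand-in values are assumptions made for this illustration.

```python
import math
from fractions import Fraction

def example_2(c):
    return (c**2 + 2 * c - 24) / (c**2 - 16)

def example_3(H):
    return (2 * H**3 + 5 * H**2 - 3 * H) / (7 * H**3 - 2 * H**2 + 4 * H)

def example_4(eps):
    return eps / (5 - math.sqrt(25 + eps))

tiny = Fraction(1, 10**9)   # real stand-in for a nonzero infinitesimal
huge = Fraction(10**9)      # real stand-in for a positive infinite number

print(float(example_2(4 + tiny)))  # close to 5/4
print(float(example_3(huge)))      # close to 2/7
print(example_4(1e-6))             # close to -10
```

Exact fractions are used where possible to avoid rounding errors; Example 4 needs a square root and therefore falls back to floating-point arithmetic.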