In [None]:
%matplotlib inline

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy.integrate as integrate

# PHYS 395 - week 2

**Matt Wiens - #301294492**

This notebook will be organized similarly to the lab script, with major headings corresponding to the headings on the lab script.

*The TA's name (Ignacio) will be shortened to "IC" whenever used.*

## Setup

In [None]:
# Set default plot size
plt.rcParams["figure.figsize"] = (10, 7)

In [None]:
%%javascript
IPython.OutputArea.auto_scroll_threshold = 9999

# Session 1

# Numerical error homework

## 1. Truncation error

Truncation error is the error involved in approximating an infinite sum by a finite sum. For example, the truncation error involved in approximating $e^x$ to the first three terms of its Taylor series is

\begin{equation}
    \left| e^x - \left(1 + x + \frac{x^2}{2}\right)\right|
    .
\end{equation}

For numerical integration, truncation error comes into play because we are approximating an integral (the limit of an infinite sum) by a finite sum.
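As a small illustration of the definition above, we can evaluate the truncation error of the three-term approximation at a few points. This is only a sketch and not part of the lab script; the helper name `taylor_exp_3` and the sample points are made up for this example.

```python
import numpy as np


def taylor_exp_3(x):
    """First three terms of the Taylor series of e^x about 0 (hypothetical helper)."""
    return 1 + x + x**2 / 2


# The truncation error shrinks rapidly as x approaches 0,
# since the first neglected term is x^3 / 6.
for x in [1.0, 0.1, 0.01]:
    truncation_error = abs(np.exp(x) - taylor_exp_3(x))
    print("x = %.2f, truncation error = %.3e" % (x, truncation_error))
```

The error here comes entirely from dropping the higher-order terms, not from the floating point arithmetic.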

## 2. Rounding error

Given any algorithm which produces a result, rounding error is the difference between producing that result using exact arithmetic and producing the result using finite-precision, rounded arithmetic (floating point arithmetic). On a computer, a number called "machine epsilon" gives you the upper bound of the relative error that occurs in floating point arithmetic due to rounding.
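To make "machine epsilon" concrete, here is a minimal sketch that reads it from NumPy for 64-bit floats and shows its effect near $1$. The choice of `eps / 4` as "something smaller than epsilon" is just an assumption for illustration.

```python
import numpy as np

# Machine epsilon for 64-bit floats: the gap between 1.0 and the
# next representable number.
eps = np.finfo(np.float64).eps
print("machine epsilon: %e" % eps)

# Adding something much smaller than eps to 1.0 is rounded away entirely...
print(1.0 + eps / 4 == 1.0)

# ...while adding eps itself produces a different number.
print(1.0 + eps == 1.0)
```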

## 3. Demonstrating floating point arithmetic error

In [None]:
print(7 / 3 - 4 / 3 - 1)

In exact arithmetic the above should be $0$, but because of rounding error in floating point arithmetic the result is instead a small non-zero number.
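One practical consequence is that floating point results are usually compared with a tolerance rather than with `==`. A minimal sketch using `np.isclose` follows; the tolerance of four times machine epsilon is an arbitrary choice for this example.

```python
import numpy as np

result = 7 / 3 - 4 / 3 - 1

# Exact comparison with zero fails because of rounding error...
print(result == 0)

# ...but the result is zero to within a few multiples of machine epsilon.
print(np.isclose(result, 0.0, atol=4 * np.finfo(float).eps))
```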

# Numerical integration 

Our goal in this section will be to investigate how we can approximate integrals according to different "rules".    

## Simple rule

Consider some function $f$ defined on an interval $[a, b]$. A simple rule to approximate the integral of $f$ over this interval is to take $N + 1$ evenly spaced points along the interval $\{x_1, x_2, \ldots, x_{N + 1}\}$ and then add up the areas of the $N$ rectangles induced by these points, each with height given by the function value at its left endpoint (see the picture in the lab script). Note that we always have $x_1 = a$ and $x_{N + 1} = b$. This gives us the following approximation:

\begin{equation}
    \int_a^b f(x) dx \approx \sum_{i = 1}^N f(x_i) h
    ,
\end{equation}

where $h$ is the width of each rectangle given by

\begin{equation}
    h = \frac{b - a}{N}
    .
\end{equation}

In [None]:
def integrate_left_reimann(y: np.ndarray, h: float) -> float:
    """Approximates an integral using the left Riemann sum."""
    # y holds all N + 1 samples; the right endpoint is not part of the sum
    return np.sum(y[:-1]) * h

Now we will approximate the integral of $\sin(x)$ over $[0, \frac{\pi}{2}]$ using logarithmically spaced values of $N$. We will plot the absolute error as a function of the bin width $h$.

In [None]:
# Generate a number of N values and the corresponding h values
num_ns = 50

# We need to be a little careful since we want the N values to
# be logarithmically spaced, but we also require that each N is an integer.
n_vals = np.round(np.logspace(1, 6, num_ns)).astype(int)
h_vals = np.pi / (2 * n_vals)

Before we plot the absolute errors, note that

\begin{equation}
    \int_0^{\frac{\pi}{2}} \sin(x) dx = 1
    .
\end{equation}

In [None]:
# For each N value calculate the absolute error
errors = np.zeros(num_ns)

for idx, n in enumerate(n_vals):
    xs = np.linspace(0, np.pi / 2, n + 1)
    ys = np.sin(xs)

    errors[idx] = abs(1 - integrate_left_reimann(ys, np.pi / (2 * n)))

In [None]:
# Make a scatter plot
_, ax = plt.subplots()

plt.loglog(h_vals, errors, "o")

# Labels
ax.set_xlabel(r"$h$")
ax.set_ylabel("abs error");

Here we see a linear relationship in the log-log plot. This means that the error $E$ is related to $h$ through a power law of the form

\begin{equation}
    E = A h^\alpha
    .
\end{equation}

This is because a linear relationship in log-log can be expressed as

\begin{align}
    &\log E = \alpha \log h + \log A \\
    &\iff \log E = \log \left( A h^\alpha \right) \\
    &\iff E = A h^\alpha
    .
\end{align}
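The equivalence above can be checked numerically with a short sketch: we generate synthetic data that obeys a power law exactly and recover its parameters with a straight-line fit in log-log space. The values `true_A = 3.0` and `true_alpha = 2.0` are made up for this example and have nothing to do with the integration errors.

```python
import numpy as np

# Synthetic data obeying E = A * h**alpha exactly, with made-up parameters
true_A = 3.0
true_alpha = 2.0
h = np.logspace(-4, -1, 20)
E = true_A * h**true_alpha

# A degree-1 fit of log E against log h returns [slope, intercept],
# i.e. [alpha, log A]
slope, intercept = np.polyfit(np.log(h), np.log(E), deg=1)

print("alpha = %.2f" % slope)
print("A = %.2f" % np.exp(intercept))
```

The slope of the fitted line is $\alpha$ and the exponential of the intercept is $A$.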

By inspection, I would guess that $\alpha = 1$ and $A = 1$.

Now we will try to fit the log values we obtained to a line.

In [None]:
# Find the linear coefficients for the loglog relationship
coeffs = np.polyfit(x=np.log(h_vals), y=np.log(errors), deg=1)

# Translate to A and alpha
print("A = %.2f" % np.exp(coeffs[1]))
print("alpha = %.2f" % coeffs[0])

My guess for $\alpha$ was correct, but the value for $A$ I guessed was not quite right (it's difficult to make out on the above plot what $A$ should be with any precision).

Using the coefficients we calculated, we can estimate that to get an error less than or equal to $10^{-8}$ we would need a width of 

\begin{align}
    h &= \left( \frac{E}{A} \right)^{\frac{1}{\alpha}} \\
      &\approx \frac{10^{-8}}{0.49} \\
      &\approx 2 \cdot 10^{-8}
      ,
\end{align}

which means we need at least $77712216$ slices using this method.
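The inversion used above can be wrapped in a small helper. This is a sketch under the assumption that the error follows $E = A h^\alpha$ exactly; the function name `required_slices` and its parameters are hypothetical, and $A$ is rounded to the two decimals quoted in the text.

```python
import math


def required_slices(A, alpha, target_error, interval_length):
    """Smallest N for which A * h**alpha <= target_error, with h = length / N."""
    h = (target_error / A) ** (1 / alpha)
    return math.ceil(interval_length / h)


# Values quoted in the text: A ~ 0.49, alpha ~ 1, interval [0, pi / 2]
n_needed = required_slices(
    A=0.49, alpha=1.0, target_error=1e-8, interval_length=math.pi / 2
)
print(n_needed)
```

Because $A$ is rounded here, the result is close to, but not exactly, the figure quoted above.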

In [None]:
# Supporting calculations for above text block
# (alpha is approximately 1 here, so (E / A)**(1 / alpha) reduces to E / A)
needed_h = 1e-8 / np.exp(coeffs[1])
needed_n = np.pi / 2 / needed_h

print("approx h required: %e" % needed_h)
print("approx N required: %f" % needed_n)

## Trapezoid rule

On the lab script we are shown a picture of how the trapezoid rule works. Suppose we are given $x$ values $\{x_0, x_1, x_2, x_3\}$ and corresponding function values $\{y_0, y_1, y_2, y_3\}$. There are many ways to compute the sum of the trapezoids: for each trapezoid I will add the rectangle whose height is the function value at the right endpoint to the remaining triangle (half of a rectangle). This gives us the formula for the sum $S$ as

\begin{align}
    S &= \sum_{i = 1}^3
        \left(
            y_i h
            + \frac{1}{2} \left( y_{i - 1} - y_i \right) h
        \right) \\
      &= h \sum_{i = 1}^3
        \left(
            \frac{1}{2} \left( y_{i - 1} + y_i \right)
        \right)
\end{align}

where $h = x_i - x_{i - 1}$ (assumed constant).
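As a quick check that the two forms of the sum agree, here is a minimal hand-written version of the rule applied to four points (three trapezoids). The helper name `trapezoid_sum` is made up for this sketch; it is not a library function.

```python
import numpy as np


def trapezoid_sum(y, h):
    """Composite trapezoid rule for evenly spaced samples y with spacing h."""
    y = np.asarray(y, dtype=float)
    # h * sum of (y[i - 1] + y[i]) / 2 over every adjacent pair
    return h * np.sum((y[:-1] + y[1:]) / 2)


# Four points, i.e. three trapezoids, as in the formula above
xs = np.linspace(0, np.pi / 2, 4)
ys = np.sin(xs)
h = xs[1] - xs[0]

# First form of the sum: rectangle at the right endpoint plus a correction
first_form = np.sum(ys[1:] * h + 0.5 * (ys[:-1] - ys[1:]) * h)

print(first_form, trapezoid_sum(ys, h))
```

Both expressions give the same number, which is already within a few percent of the exact value $1$ with only three trapezoids.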

Now, using the trapezoid rule, we will repeat the investigation we performed with the left Riemann approximation. We will reuse the same $N$ values and the corresponding $h$ values.

In [None]:
# For each N value calculate the absolute error
errors = np.zeros(num_ns)

for idx, n in enumerate(n_vals):
    xs = np.linspace(0, np.pi / 2, n + 1)
    ys = np.sin(xs)

    # integrate.trapezoid replaces the older integrate.trapz name
    errors[idx] = abs(1 - integrate.trapezoid(y=ys, dx=np.pi / (2 * n)))

In [None]:
# Make a scatter plot
_, ax = plt.subplots()

plt.loglog(h_vals, errors, "o")

# Labels
ax.set_xlabel(r"$h$")
ax.set_ylabel("abs error");

Here we can see that the log-log relationship is still linear. However, for the same values of $h$ the error in the trapezoidal rule is much lower.

We'll calculate the $A$ and $\alpha$ parameters and estimate what $h$ and $N$ we would need to keep the absolute error below $10^{-8}$.

In [None]:
# Find the linear coefficients for the loglog relationship
coeffs = np.polyfit(x=np.log(h_vals), y=np.log(errors), deg=1)

# Translate to A and alpha
print("A = %.2f" % np.exp(coeffs[1]))
print("alpha = %.2f" % coeffs[0])

In [None]:
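# What is the greatest h / least N we can use?
# Invert E = A * h**alpha for h, using the fitted exponent
needed_h = (1e-8 / np.exp(coeffs[1])) ** (1 / coeffs[0])
needed_n = np.pi / 2 / needed_h

print("approx h required: %e" % needed_h)
print("approx N required: %f" % needed_n)

Because the trapezoid error scales as $h^\alpha$ with $\alpha \approx 2$ rather than $\alpha \approx 1$, the required $h$ is roughly $\sqrt{E / A}$ instead of $E / A$. The trapezoid rule therefore needs only on the order of thousands of slices, compared with the $77712216$ needed by the left Riemann approximation: several orders of magnitude fewer steps!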

## Simpson's method

Note that for Simpson's method we need an *even* number of slices!
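To show where the even-slice requirement comes from, here is a minimal hand-written composite Simpson's rule. It is only a sketch: the name `simpson_sum` is hypothetical, and the SciPy routine used below in this notebook may treat edge cases differently.

```python
import numpy as np


def simpson_sum(y, h):
    """Composite Simpson's rule for evenly spaced samples y with spacing h."""
    y = np.asarray(y, dtype=float)
    n_slices = len(y) - 1
    if n_slices < 2 or n_slices % 2 != 0:
        raise ValueError("Simpson's rule needs an even number of slices")
    # Weights follow the pattern 1, 4, 2, 4, ..., 2, 4, 1
    return h / 3 * (y[0] + 4 * np.sum(y[1:-1:2]) + 2 * np.sum(y[2:-1:2]) + y[-1])


n = 10
xs = np.linspace(0, np.pi / 2, n + 1)
simpson_error = abs(1 - simpson_sum(np.sin(xs), np.pi / (2 * n)))
print(simpson_error)
```

Each parabola is fitted through three consecutive points and so covers two slices, which is why the slices must pair up.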

First we'll plot the errors for different values of $h$. Note that we need to be careful to make sure $N$ is even here, so we'll adjust each $N$ value so that this holds.

In [None]:
# Simpson's method requires an even number of slices,
# so bump every odd N up to the next even integer.
simps_n_vals = n_vals + n_vals % 2
simps_h_vals = np.pi / (2 * simps_n_vals)

In [None]:
# For each N value calculate the absolute error
errors = np.zeros(num_ns)

for idx, n in enumerate(simps_n_vals):
    xs = np.linspace(0, np.pi / 2, n + 1)
    ys = np.sin(xs)

    # integrate.simpson replaces the older integrate.simps name
    errors[idx] = abs(1 - integrate.simpson(y=ys, dx=np.pi / (2 * n)))

In [None]:
# Make a scatter plot
_, ax = plt.subplots()

plt.loglog(simps_h_vals, errors, "o")

# Labels
ax.set_xlabel(r"$h$")
ax.set_ylabel("abs error");

Here we can clearly see that Simpson's method outperforms both methods we looked at before. We quickly reach the limits of machine precision as we increase $N$: once the truncation error drops below the rounding error, the points stop following a straight line. The dependence is still linear on the log-log plot for larger $h$, although due to the limits of machine precision we cannot show the full relationship we would get with exact arithmetic.

If we want to fit the linear relationship, we need to restrict the fit to the values of $N$ that lead to $h \approx 10^{-3}$ and higher.

In [None]:
# Last h value index to keep
last_h_idx = 20

# Find the linear coefficients for the loglog relationship,
# using the h values that match the (even) N values used for Simpson's method
coeffs = np.polyfit(
    x=np.log(simps_h_vals[: last_h_idx + 1]),
    y=np.log(errors[: last_h_idx + 1]),
    deg=1,
)

# Translate to A and alpha
print("A = %.2f" % np.exp(coeffs[1]))
print("alpha = %.2f" % coeffs[0])

In [None]:
# What is the greatest h / least N we can use?
# Invert E = A * h**alpha for h, using the fitted exponent
needed_h = (1e-8 / np.exp(coeffs[1])) ** (1 / coeffs[0])
needed_n = np.pi / 2 / needed_h

print("approx h required: %e" % needed_h)
print("approx N required: %f" % needed_n)

As can be seen from the least $N$ required, Simpson's method is far more efficient for this integral than the trapezoid rule, which in turn is far more efficient than the left Riemann sum approximation. Comparing the $A$ values alone understates the difference: the dominant effect is the exponent $\alpha$, since the required $h$ scales as $(E / A)^{1 / \alpha}$.
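As an independent check on how the three rules compare, we can estimate the exponent $\alpha$ directly by halving $h$ and looking at the ratio of the errors. This is a sketch with hand-written versions of the three rules (the function names are made up here, and $N = 32$ and $N = 64$ are arbitrary choices that stay well above the rounding error floor).

```python
import numpy as np


def left_riemann(y, h):
    return h * np.sum(y[:-1])


def trapezoid(y, h):
    return h * np.sum((y[:-1] + y[1:]) / 2)


def simpson(y, h):
    return h / 3 * (y[0] + 4 * np.sum(y[1:-1:2]) + 2 * np.sum(y[2:-1:2]) + y[-1])


def abs_error(rule, n):
    """Absolute error of `rule` for the integral of sin over [0, pi / 2]."""
    xs = np.linspace(0, np.pi / 2, n + 1)
    return abs(1 - rule(np.sin(xs), np.pi / (2 * n)))


# Halving h divides the error by 2**alpha, so log2 of the ratio estimates alpha
for rule in (left_riemann, trapezoid, simpson):
    order = np.log2(abs_error(rule, 32) / abs_error(rule, 64))
    print("%s: observed order = %.2f" % (rule.__name__, order))
```

If the errors follow $E = A h^\alpha$, the printed orders should be close to the slopes found in the log-log fits.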