Inga Ulusoy, Computational modelling in python, SoSe2020 

# Numerical integration

Given are the following functions:

\begin{align}
    f_1(x) &= x\left(x-3\right)\left(x+3\right) \\
    f_2(x) &= \left| x \right| \\
    f_3(x) &= \sin \left(2.1x\right)\left(-\frac{x}{2}\right) \\
    f_4(x) &= 1.6^x -1.5x \\
    f_5(x,y) &= \sin\left(x+y\right)\tan\left(0.1x\right) \\
    f_6(x,y) &= \sin\left(\sqrt{5}+x\right)y 
\end{align}

\- courtesy of Anna Bardroff \- 

In [None]:
from numpy import *
import matplotlib.pyplot as plt

def function1(x):
    y = x*(x - 3)*(x + 3)
    return y

def function2(x):
    y = abs(x)
    return y

def function3(x):
    y = sin(x * 2.1) * (-x / 2.0)
    return y

def function4(x):
    y = 1.6 ** x - 1.5 * x
    return y

def function5(x,y):
    z = sin(x + y) * tan(0.1 * x)
    return z

def function6(x,y):
    z = sin(sqrt(5) + x) * y
    return z

Numerical integration (also referred to as quadrature) is the numerical evaluation of a definite integral
\begin{align}
I = \int_a^b f(x) dx
\end{align}
The direct approach to evaluating this integral is to sum values of the function $f(x)$, weighted by the grid spacing, over the given interval $[a,b]$ to obtain $I$. The goal is to obtain $I$ as accurately as possible with as few evaluations of the function as possible. More elegant ways are described below.

A closed formula includes the endpoints $a$ and $b$, whereas an open formula uses only values strictly between $a$ and $b$.

Note that this is closely connected to differential equations: the value $I \equiv y(b)$ is the solution at $x=b$ of
\begin{align}
\frac{dy} {dx} = f(x)
\end{align}
with the boundary condition
\begin{align}
y(a) = 0
\end{align}
Such ordinary differential equations ($F'(x) = f(x)$) are solved using, for example, Runge-Kutta methods and will be discussed later.
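
The connection to differential equations can be illustrated with a minimal sketch, assuming a hand-written classical fourth-order Runge-Kutta stepper. The helper name `rk4_integral` is made up for this illustration and is not part of the course code. Because the right-hand side depends only on $x$, each Runge-Kutta step reduces to Simpson's rule on one subinterval.

```python
import numpy as np

def rk4_integral(f, a, b, nsteps):
    """Integrate dy/dx = f(x) from y(a) = 0 up to x = b with classical RK4."""
    h = (b - a) / nsteps
    x = a
    y = 0.0  # boundary condition y(a) = 0
    for _ in range(nsteps):
        k1 = f(x)
        k2 = f(x + 0.5 * h)  # identical to k3 because f does not depend on y
        k3 = f(x + 0.5 * h)
        k4 = f(x + h)
        y += h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        x += h
    return y

# y(b) should approximate the integral of sin(x) from 0 to pi/2, which is 1
I_ode = rk4_integral(np.sin, 0.0, np.pi / 2, 20)
print('y(b) from the ODE formulation:', I_ode)
```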

## The direct approach: The Riemann sum

Integration by summation approximates the integral through the Riemann sum
\begin{align}
I = \int_a^b f(x) dx \approx \sum_{j=1}^n f(x_j) \cdot \Delta x_j
\end{align}
We choose the lower and upper bounds as $a=-4$ and $b=4$ and the spacing is given by $\Delta x$.

In [None]:
a=-4
b=4
npoints = 20
num,dx = linspace(a,b,npoints,retstep=True)
f1 = function1(num)
print(num,dx)
mf=16
fig, ax = plt.subplots(figsize=(8,5))
ax.plot(num,f1,label='function1')
ax.bar(num,f1,width=dx,align='edge',edgecolor='black',alpha=0.5)

plt.xticks(fontsize=mf)
plt.yticks(fontsize=mf)
ax.set_xlabel('x value',fontsize=mf)
ax.set_ylabel('y value',fontsize=mf)
legend = ax.legend(loc='lower right', shadow=False,fontsize=mf,borderpad = 0.1, labelspacing = 0, handlelength = 0.8)
plt.show()

In [None]:
def integrate_Riemann(myfunc,a,b,npoints):
    xvals, dx = linspace(a,b,npoints,retstep = True)
    ff = myfunc(xvals)
    myI = sum(ff)*dx
    return myI

#check this for a known function
#note that, for simplicity, the sum runs over all npoints values and thus includes one extra rectangle of width dx that starts at b
a=0
b=pi/2
npoints = 100
int1=integrate_Riemann(sin,a,b,npoints)
print('The Riemann integral of the test function is {}.'.format(int1))
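
To see how slowly the rectangle sum converges, the following sketch prints the error for the test integral $\int_0^{\pi/2} \sin x \, dx = 1$. It assumes a hypothetical helper `left_riemann` (not part of the course code) that uses only the left edge of each of the `npoints-1` subintervals. With ten times more subintervals, the error should shrink by roughly a factor of ten, that is, in proportion to $\Delta x$.

```python
import numpy as np

def left_riemann(myfunc, a, b, npoints):
    """Left Riemann sum on npoints equally spaced grid points."""
    xvals, dx = np.linspace(a, b, npoints, retstep=True)
    # drop the last point so that exactly npoints-1 rectangles cover [a, b]
    return np.sum(myfunc(xvals[:-1])) * dx

errors = []
for npoints in (11, 101, 1001):
    approx = left_riemann(np.sin, 0.0, np.pi / 2, npoints)
    errors.append(abs(approx - 1.0))
    print('npoints = {:5d}, error = {:.2e}'.format(npoints, errors[-1]))
```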

In [None]:
%%time
a=-4
b=4
npoints = 100
int1=integrate_Riemann(function1,a,b,npoints)
print('The Riemann integral of function1 is {}.'.format(int1))
int2=integrate_Riemann(function2,a,b,npoints)
print('The Riemann integral of function2 is {}.'.format(int2))
int3=integrate_Riemann(function3,a,b,npoints)
print('The Riemann integral of function3 is {}.'.format(int3))
int4=integrate_Riemann(function4,a,b,npoints)
print('The Riemann integral of function4 is {}.'.format(int4))

In [None]:
num = linspace(a,b,npoints)
f1 = function1(num)
f2 = function2(num)
f3 = function3(num)
f4 = function4(num)

mf=16
fig, ax = plt.subplots(4,figsize=(5,3*4))
ax[0].plot(num,f1,marker='.',label='function1')
ax[1].plot(num,f2,marker='.',label='function2')
ax[2].plot(num,f3,marker='.',label='function3')
ax[3].plot(num,f4,marker='.',label='function4')

for i in range(4):
    ax[i].xaxis.set_tick_params(labelsize=mf)
    ax[i].yaxis.set_tick_params(labelsize=mf)
    ax[i].set_xlabel('x value',fontsize=mf)
    ax[i].set_ylabel('y value',fontsize=mf)
    legend = ax[i].legend(loc='upper left', shadow=False,fontsize=mf,borderpad = 0.1, labelspacing = 0, handlelength = 0.8)
plt.show()

## Trapezoidal rule

In this approach, the function $f(x)$ is first interpolated between two known points, and the interpolating functions are then integrated. As interpolating functions, polynomials (usually of first and second order) are used.

In the simplest approach, the interpolating function is simply a constant (a polynomial of degree zero) - this is the midpoint rule, and the integral is represented through rectangles or boxes with height equal to the function value at the midpoint between $x_j$ and $x_{j+1}$.
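
A minimal sketch of the midpoint rule, assuming subintervals of equal width, could look as follows. The function name `integrate_midpoint` and its argument `nintervals` are chosen for this illustration only.

```python
import numpy as np

def integrate_midpoint(myfunc, a, b, nintervals):
    """Composite midpoint rule with nintervals boxes of equal width."""
    edges, dx = np.linspace(a, b, nintervals + 1, retstep=True)
    midpoints = edges[:-1] + 0.5 * dx  # centre of each subinterval
    return np.sum(myfunc(midpoints)) * dx

# test integral: sin(x) from 0 to pi/2, exact value 1
I_mid = integrate_midpoint(np.sin, 0.0, np.pi / 2, 100)
print('Midpoint rule for the test function:', I_mid)
```

Since only interior points are evaluated, this is an example of an open formula.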

In the trapezoidal rule, a linear function is used to connect the two neighboring points, and the integral is represented through trapezoids. Using one trapezoid to represent the integral between $a$ and $b$, this reads (trapezoidal rule)
\begin{align}
\int_a^b f(x) dx \approx (b-a) \frac{f(a)+f(b)}{2} = \frac{\Delta x}{2} \left(f(a)+f(b)\right)
\end{align}
This would only give a rather crude approximation of the integral. To improve the accuracy, the interval between $a$ and $b$ is divided into $n$ subintervals of width $\Delta x$ with grid points $x_0 = a, x_1, \dots, x_n = b$. This leads to the extended trapezoidal rule
\begin{align}
\int_a^b f(x) dx \approx \frac{\Delta x}{2} \sum_{j=1}^{n} \left( f(x_j) + f(x_{j-1}) \right)
\end{align}
For subintervals of equal length, this is the Newton-Cotes formula of degree one; Simpson's rule is the corresponding formula of degree two.

The formula can be rewritten as the average of two Riemann sums, one using the right edge and one using the left edge of each subinterval
\begin{align}
\int_a^b f(x) dx \approx \frac{1}{2} \left( \sum_{j=1}^{n} f(x_j) \Delta x + \sum_{j=1}^{n} f(x_{j-1}) \Delta x \right)
\end{align}

An advantage of this approach is that previously evaluated points can be reused: when $n$ is doubled, only the newly added points have to be evaluated, and the refinement is repeated until a specified degree of accuracy is achieved.
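
The reuse of previously evaluated points can be sketched as follows. This is an illustration under the assumption that the number of subintervals is doubled in every step. The name `trapezoid_refine` and the parameters `tol` and `max_levels` are hypothetical. Each refinement only evaluates the function at the points halfway between the old grid points.

```python
import numpy as np

def trapezoid_refine(myfunc, a, b, tol=1e-6, max_levels=20):
    """Extended trapezoidal rule, doubling the subintervals until converged."""
    n = 1  # current number of subintervals
    estimate = 0.5 * (b - a) * (myfunc(a) + myfunc(b))
    for _ in range(max_levels):
        dx = (b - a) / n
        # only the new points halfway between the old grid points are evaluated
        new_points = a + dx * (np.arange(n) + 0.5)
        refined = 0.5 * estimate + 0.5 * dx * np.sum(myfunc(new_points))
        n *= 2
        if abs(refined - estimate) < tol:
            return refined, n
        estimate = refined
    return estimate, n

I_ref, n_used = trapezoid_refine(np.sin, 0.0, np.pi / 2)
print('Refined trapezoidal estimate:', I_ref, 'with', n_used, 'subintervals')
```

The change between two successive estimates serves as a simple stopping criterion here; it is not a rigorous error bound.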

In [None]:
a=-4
b=4
npoints = 20
num,dx = linspace(a,b,npoints,retstep=True)
f1 = function1(num)

mf=16
fig, ax = plt.subplots(figsize=(8,5))
ax.plot(num,f1,marker='.',label='function1')
ax.stem(num,f1)

plt.xticks(fontsize=mf)
plt.yticks(fontsize=mf)
ax.set_xlabel('x value',fontsize=mf)
ax.set_ylabel('y value',fontsize=mf)
legend = ax.legend(loc='lower right', shadow=False,fontsize=mf,borderpad = 0.1, labelspacing = 0, handlelength = 0.8)
plt.show()

In [None]:
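def integrate_Trapezoid(myfunc,a,b,nvals):
    #nvals grid points correspond to nvals-1 trapezoids
    if nvals < 2:
        raise ValueError('nvals must be at least 2')
    xvals,deltax = linspace(a,b,nvals,retstep=True)
    yvals = myfunc(xvals)
    if nvals == 2:
        #a single trapezoid between a and b
        myI = (b-a)*0.5*(yvals[0]+yvals[-1])
    else:
        right = yvals[1:]
        left = yvals[:-1]
        myI = deltax/2 * sum(left+right)
    return myI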

#check this for a known function
a=0
b=pi/2
n=100
int1=integrate_Trapezoid(sin,a,b,n)
print('The trapezoid integral of test function is {}.'.format(int1))

In [None]:
%%time
a=-4
b=4
npoints=100
int1=integrate_Trapezoid(function1,a,b,npoints)
print('The extended trapezoidal rule gives the integral {} for function1.'.format(int1))
int2=integrate_Trapezoid(function2,a,b,npoints)
print('The extended trapezoidal rule gives the integral {} for function2.'.format(int2))
int3=integrate_Trapezoid(function3,a,b,npoints)
print('The extended trapezoidal rule gives the integral {} for function3.'.format(int3))
int4=integrate_Trapezoid(function4,a,b,npoints)
print('The extended trapezoidal rule gives the integral {} for function4.'.format(int4))

In [None]:
#Of course, there already is a numpy routine for this (trapz; it is named trapezoid from NumPy 2.0 on)
num2=linspace(0,pi/2,npoints)
ff = sin(num2)
int1=trapz(ff,num2)
print('The extended trapezoidal rule gives the integral {} for test function.'.format(int1))

In [None]:
%%time
num = linspace(a,b,npoints)
f1 = function1(num)
f2 = function2(num)
f3 = function3(num)
f4 = function4(num)
int1=trapz(f1,num)
int2=trapz(f2,num)
int3=trapz(f3,num)
int4=trapz(f4,num)
print('The extended trapezoidal rule gives the integral {} for function1.'.format(int1))
print('The extended trapezoidal rule gives the integral {} for function2.'.format(int2))
print('The extended trapezoidal rule gives the integral {} for function3.'.format(int3))
print('The extended trapezoidal rule gives the integral {} for function4.'.format(int4))

For an overview of the available functions, take a look at https://docs.scipy.org/doc/scipy/reference/tutorial/integrate.html. 

The numpy reference can be found here https://het.as.utexas.edu/HET/Software/Numpy/reference/routines.math.html#sums-products-differences.

### Multidimensional integrals

Numerical approaches to multidimensional integrals are computationally expensive (for example, approximations as repeated one-dimensional integrals require repeated function evaluations, the number of which grows exponentially with the number of dimensions $d$).
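
The cost of repeated one-dimensional integration can be made concrete with a small sketch. It assumes a product grid with `n` points per axis and a simple test integrand $f(x,y) = xy$ on $[0,1]\times[0,2]$ (exact integral 1). Both the integrand and the helper `trapezoid_weights` are chosen for this illustration only. In two dimensions the grid already needs $n^2$ function evaluations, in $d$ dimensions $n^d$.

```python
import numpy as np

def trapezoid_weights(n, a, b):
    """Weights of the extended trapezoidal rule on n equally spaced points."""
    w = np.full(n, (b - a) / (n - 1))
    w[0] *= 0.5
    w[-1] *= 0.5
    return w

n = 50
x = np.linspace(0.0, 1.0, n)
y = np.linspace(0.0, 2.0, n)
X, Y = np.meshgrid(x, y, indexing='ij')
F = X * Y  # test integrand evaluated on the full product grid

# apply the 1D rule along x and then along y
I_2d = trapezoid_weights(n, 0.0, 1.0) @ F @ trapezoid_weights(n, 0.0, 2.0)
print('2D integral:', I_2d, 'using', F.size, 'function evaluations')
```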

A method that overcomes this curse of dimensionality is Monte Carlo integration. Monte Carlo methods rely on repeated evaluations using random sampling. Thus, in Monte Carlo integration, the sampling grid is chosen randomly.

The multidimensional integral
\begin{align}
I = \int_{a_x}^{b_x}\int_{a_y}^{b_y} f(x,y) dx dy
\end{align}
over the volume $V = \int_{a_x}^{b_x}\int_{a_y}^{b_y} dx dy$ can be approximated as
\begin{align}
\int \int f dV \approx V \langle f \rangle \pm V \sqrt{\frac{\langle f^2 \rangle-\langle f \rangle^2}{N}}
\end{align}
for the $N$ sampling points, where the standard deviation is defined through the means (denoted by $\langle \rangle$)
\begin{align}
\langle f^2 \rangle &= \frac{1}{N} \sum_{i=1}^N f_i^2 \\
\langle f \rangle^2 &= \left( \frac{1}{N} \sum_{i=1}^N f_i \right)^2
\end{align}
A Monte Carlo integration consists of several steps: generation of random numbers, a test that the generated numbers lie within the integration volume, evaluation of the function, and evaluation of the integral formula.
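
The formula above can be sketched for a rectangular integration volume, where every sampled point lies inside the volume and no additional test is needed. The function name `mc_mean_value`, the fixed seed and the test integrand $f(x,y) = xy$ on $[0,1]\times[0,2]$ (exact integral 1) are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # fixed seed so that the sketch is reproducible

def mc_mean_value(myfunc, ax, bx, ay, by, N):
    """Estimate a 2D integral as V*<f> with error V*sqrt((<f^2>-<f>^2)/N)."""
    x = rng.uniform(ax, bx, N)
    y = rng.uniform(ay, by, N)
    f = myfunc(x, y)
    V = (bx - ax) * (by - ay)  # sampling volume, here a rectangle
    mean_f = np.mean(f)
    mean_f2 = np.mean(f**2)
    error = V * np.sqrt((mean_f2 - mean_f**2) / N)
    return V * mean_f, error

I_mc, err_mc = mc_mean_value(lambda x, y: x * y, 0.0, 1.0, 0.0, 2.0, 100000)
print('Monte Carlo estimate: {} +/- {}'.format(I_mc, err_mc))
```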

We will try this for one of the 1D functions, function1. Let's start with the generation of the sampling points:

In [None]:
%%time
N = 1000
a = -4
b = 4
#we need to determine the sampling volume, which lies between a and b,
#and the minimum and maximum value of the function we want to integrate
#trying an educated guess of the fmax and fmin values based on 
#preliminary evaluations of the function
xval = linspace(a, b, N)
yval = function1(xval)
fmax = max(yval)
fmin = min(yval)

#now the sampling grid is generated
x_rand = (b-a)*random.random(N)+a
y_rand = (fmax-fmin)*random.random(N)+fmin

#check if the point that is sampled lies within the integration area - this returns the index
#where the condition evaluates to true because false is interpreted as zero
#this is a numpy function and works only for variables of type array
#positive sampled values
ind_below_pos = nonzero((y_rand <= function1(x_rand)) & (y_rand >= 0))
ind_below_neg = nonzero((y_rand >= function1(x_rand)) & (y_rand < 0))
print(ind_below_pos)
#we only need the points outside the integration volume to see how 
#good the selected volume is - should be as few as possible
ind_above_pos = nonzero((y_rand > function1(x_rand)) & (y_rand >= 0))
ind_above_neg = nonzero((y_rand < function1(x_rand)) & (y_rand < 0))

#we visualize the sampling points to understand how this works
mf=16
fig, ax = plt.subplots(figsize=(8,5))
ax.scatter(x_rand[ind_below_pos], y_rand[ind_below_pos], color = "green",label='below')
ax.scatter(x_rand[ind_below_neg], y_rand[ind_below_neg], color = "red",label='negative')

ax.scatter(x_rand[ind_above_pos], y_rand[ind_above_pos], color = "blue",label='above')
ax.scatter(x_rand[ind_above_neg], y_rand[ind_above_neg], color = "blue")

ax.plot(xval, yval, color = "red",label='function')
plt.xticks(fontsize=mf)
plt.yticks(fontsize=mf)
ax.set_xlabel('x value',fontsize=mf)
ax.set_ylabel('y value',fontsize=mf)
ax.text(-3.8,28,'Area A',fontsize=mf)
legend = ax.legend(loc='lower right', shadow=False,fontsize=mf,borderpad = 0.1, labelspacing = 0, handlelength = 0.8,ncol=4)
plt.show()

Note that the sampling volume for a 1D integral corresponds to a rectangle. The Monte Carlo integral is now the area A multiplied by the fraction of points that fall between the curve and the $x$ axis, where points below the $x$ axis are counted with a negative sign. In other words, a volume is chosen over which to sample, and this volume contains the volume that should be integrated over. Each sampled point inside the integration volume contributes $+1$ (above the $x$ axis) or $-1$ (below the $x$ axis), and each point outside the integration volume contributes zero.

This immediately makes it clear that the sampling volume should contain the integration volume as closely as possible: If many sampled points lie outside the integration volume, the effective sampling rate is much lower than the number of samples, directly increasing the numerical error.

Generally, Monte Carlo integration requires many function evaluations to produce accurate integrals, with the error only decreasing as $1/\sqrt{N}$ for $N$ sampled points.
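
The slow convergence can be checked with a short sketch. It assumes the mean-value estimator with uniform sampling on $[a,b]$, a fixed seed and the test integral of $\sin x$ from $0$ to $\pi/2$, all chosen for illustration. With 100 times more samples, the estimated statistical error should drop by roughly a factor of ten.

```python
import numpy as np

rng = np.random.default_rng(seed=2)  # fixed seed so that the sketch is reproducible
a, b = 0.0, np.pi / 2
errors = {}
for N in (100, 10000):
    f = np.sin(rng.uniform(a, b, N))
    estimate = (b - a) * np.mean(f)
    # statistical error estimate from the sample variance
    errors[N] = (b - a) * np.sqrt((np.mean(f**2) - np.mean(f)**2) / N)
    print('N = {:6d}: integral = {:.4f} +/- {:.4f}'.format(N, estimate, errors[N]))

ratio = errors[100] / errors[10000]
print('Ratio of the error estimates:', ratio)
```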

In [None]:
%%time
#now we can compute the above integral
print('Number of pts above the curve = outside of integration volume:', len(ind_above_pos[0])+len(ind_above_neg[0]))
print('Number of pts below the curve = inside of integration volume:', len(ind_below_pos[0])+len(ind_below_neg[0]))
print('''Number of pts below the curve that evaluate to positive values = inside of integration volume,
      count positive:''', len(ind_below_pos[0]))
print('''Number of pts below the curve that evaluate to negative values = inside of integration volume,
      but need to be subtracted:''', len(ind_below_neg[0]))

print('Ratio of sampled points within integration area vs total sample area', (len(ind_below_pos[0])+len(ind_below_neg[0]))/N)
print('Total sample (rectangle) area:', (fmax-fmin)*(b-a))
print('Monte Carlo integral:', (fmax-fmin)*(b-a)*(len(ind_below_pos[0])-len(ind_below_neg[0]))/N)
#for simplicity, the variance / standard deviation as in the above equation is not taken into account here

Compare this to the results above. Let's see how this works with many more points:

In [None]:
def integrate_MC1D(N,myfunc,a,b):
    xval = linspace(a, b, N)
    yval = myfunc(xval)
    fmax = max(yval)
    fmin = min(yval)
    x_rand = (b-a)*random.random(N)+a
    y_rand = (fmax-fmin)*random.random(N)+fmin
    ind_below_pos = nonzero((y_rand <= myfunc(x_rand)) & (y_rand >= 0))
    ind_below_neg = nonzero((y_rand >= myfunc(x_rand)) & (y_rand < 0))
    ind_above_pos = nonzero((y_rand > myfunc(x_rand)) & (y_rand >= 0))
    ind_above_neg = nonzero((y_rand < myfunc(x_rand)) & (y_rand < 0))
    mf=16
    fig, ax = plt.subplots(figsize=(8,5))
    ax.scatter(x_rand[ind_below_pos], y_rand[ind_below_pos], color = "green",label='below,positive')
    ax.scatter(x_rand[ind_below_neg], y_rand[ind_below_neg], color = "red",label='below,negative')
    ax.scatter(x_rand[ind_above_pos], y_rand[ind_above_pos], color = "blue",label='above')
    ax.scatter(x_rand[ind_above_neg], y_rand[ind_above_neg], color = "blue")
    ax.plot(xval, yval, color = "red",label='function')
    plt.xticks(fontsize=mf)
    plt.yticks(fontsize=mf)
    ax.set_xlabel('x value',fontsize=mf)
    ax.set_ylabel('y value',fontsize=mf)
    legend = ax.legend(loc='lower right', shadow=False,fontsize=mf,borderpad = 0.1, labelspacing = 0, handlelength = 0.8,ncol=4)
    plt.show()
    outside=len(ind_above_pos[0])+len(ind_above_neg[0])
    print('Number of pts above the curve (outside of integration area):',outside)
    ins=len(ind_below_pos[0])+len(ind_below_neg[0])
    print('Number of pts below the curve (within integration area):', ins)
    sampled = ins/N
    integral = (len(ind_below_pos[0])-len(ind_below_neg[0]))/N
    print('Ratio of sampled points within integration area vs total sample area', sampled)
    area = (fmax-fmin)*(b-a)
    print('Total sample (rectangle) area:', area)
    print('Monte Carlo integral:', area*integral)
    return area*integral

#check this for a known function
N=4000
a=0
b=pi/2
int0=integrate_MC1D(N,sin,a,b)

In [None]:
N=40000
a = -4
b = 4
myfunc = function1
int1 = integrate_MC1D(N,myfunc,a,b)
#note how the value changes for same parameters but repeated evaluation of the cell, and how much it fluctuates

In [None]:
N=20000
a = -4
b = 4
myfunc = function2
int1 = integrate_MC1D(N,myfunc,a,b)

In [None]:
N=20000
a = -4
b = 4
myfunc = function3
int1 = integrate_MC1D(N,myfunc,a,b)

In [None]:
N=20000
a = -4
b = 4
myfunc = function4
int1 = integrate_MC1D(N,myfunc,a,b)

# Task 1


Write a class that contains the three different numerical approaches to integration. Compare the accuracy and speed of the different approaches. Which approach works well for which function? Which functions are difficult to integrate with the Monte Carlo approach?

Note the correct values for the integrals - $I_1=0.0$, $I_2=16.0$, $I_3 = -1.1829$, $I_4 = 13.6190$.

# Optional 

Find an appropriate function in the scipy library to integrate the 2D functions and apply it to function5 and function6.

Note that scipy has a comprehensive library of deterministic numerical integration methods:\
https://docs.scipy.org/doc/scipy/reference/tutorial/integrate.html

Upload your notebook to moodle.