## SOEE3250/SOEE5675M/SOEE5116					2024/25

Inverse Theory

# Practical 3: Best Linear Unbiased Estimator (BLUE)

## Geophysical background
A building is slowly subsiding (sinking) with time. The height of the building is assumed to follow an exponential decay of the form:

$h=h_0+A (e^{-t/150}-1 )$

where $h$ is height (in metres), $h_0$ is the height at time $t=0$, $t$ is the time in days and $A$ is a constant representing the total subsidence at infinite time. The height of the building is measured using precise GPS every 60 days for 300 days. 

The errors of the height measurements have zero mean, are assumed independent, and their standard deviations are:

| Time (days) | Standard deviation (cm) |
| :------ |---------:|
|  0  | 3 |
|  60 | 5 |
| 120 | 2 | 
| 180 | 1 |
| 240 | 3 | 
| 300 | 2 |


## Inverse theory problem

The aim is to create your own dataset then demonstrate that the BLUE is unbiased and gives better results than the unweighted least squares estimator. To achieve this, you will simulate measurements for a given set of model parameters and then use the BLUE (as well as unweighted least squares) to invert the problem. Simulation of data has the advantage that you know what the true values are for the model parameters, which enables you to assess how successful the estimator is. 


In [96]:
import matplotlib.pyplot as plt
import numpy as np
import random

Q1) In markdown, write down the model vector. 

Q2) In Python, define a vector of times and the matrix G. Display G on screen using the 'print' command.
You can implement the exponential function using np.exp(), passing it a **vector** of times so that a whole column of G is computed at once.

In [None]:
#time = np.array([.....])
# create a matrix of ones, then overwrite the entries that are not ones
#G = np.ones((6,2))
#G[:,1]=  #define the multiplier of model parameter A in one line of Python
print(G)
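One possible way to build G is sketched below. It assumes the model vector is ordered as $(h_0, A)$, so the first column of G is all ones and the second column holds the multiplier of $A$; swap the columns if you order the model vector differently.

```python
import numpy as np

# Observation epochs: every 60 days from day 0 to day 300 (6 measurements)
time = np.arange(0, 301, 60)

# Assumed model vector ordering: m = [h0, A]
# Column 0 multiplies h0 (all ones); column 1 multiplies A
G = np.ones((time.size, 2))
G[:, 1] = np.exp(-time / 150) - 1

print(G)
```

Because np.exp() acts element by element on the time vector, no loop is needed to fill the second column.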

Q3) Define a vector, m_true, assuming that $h_0=140$ m and $A=0.15$ m. Now calculate what the measurements would be if there were no measurement errors using your forward model. Define this as the vector d_true. 
Plot d_true against time as blue circles. Label both axes and provide a title. 

In [None]:
#m_true = np.array([....])
#d_true = # G * m_true. In Python, use "@" for matrix multiplication

#plt.figure()
#plt.plot(x, y, 'o', color='blue')
#plt.ylabel('')
#plt.xlabel('');
#plt.title('My title')
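As a rough check on the forward model, the sketch below computes the noise-free heights and prints them rather than plotting them. It assumes the model vector is ordered as $(h_0, A)$ with both entries in metres; the printout format is only illustrative.

```python
import numpy as np

time = np.arange(0, 301, 60)
G = np.ones((time.size, 2))
G[:, 1] = np.exp(-time / 150) - 1

# True model parameters [h0, A], both in metres (assumed ordering)
m_true = np.array([140.0, 0.15])

# Noise-free measurements from the forward model d = G m
d_true = G @ m_true

for t, h in zip(time, d_true):
    print(f"t = {t:3d} days, h = {h:.4f} m")
```

The heights should start at $h_0$ and decrease monotonically, by less than $A$ in total over the 300 days.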

Q4) Define the variance-covariance matrix $Q_{dd}$ for the observations. 

In [None]:
# Easiest to use the np.diag() function.
# Define a vector of standard deviations (converted from cm to m, to match the units of h),
# then pass the corresponding variances (sd**2) as the argument to np.diag().
# sd = np.array([....])
# Qdd = np.diag(...)
print(Qdd)
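A minimal sketch of the variance-covariance matrix follows. It assumes the standard deviations are converted from centimetres to metres so that they match the units of the heights; the name sd_cm is only illustrative.

```python
import numpy as np

# Standard deviations from the table, in centimetres
sd_cm = np.array([3.0, 5.0, 2.0, 1.0, 3.0, 2.0])

# Convert to metres so the units agree with the height measurements
sd = sd_cm / 100.0

# Independent errors: variances on the diagonal, zero covariances elsewhere
Qdd = np.diag(sd**2)

print(Qdd)
```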

Q5) Calculate some random noise to represent the error for each of the measurements, assuming the noise is drawn from a Gaussian (normal) distribution with the appropriate standard deviation according to the table above. 
To do this:
- initialise the random number generator with random.seed(10), which ensures that you get the same answer each time you run the code.
- modifying the code below, generate 6 random noise values, each scaled by the appropriate standard deviation.

Add the noise to d_true to give simulated measurements (call this "d"). Copy and paste the plotting commands from Q3, and add d to your plot as red crosses.


In [100]:
# initialise the pseudorandom number sequence:
random.seed(10)

#initialise the noise variable as zeros
noise = np.zeros(6)  

for i in range(6):
    # edit this:
    noise[i] = random.normalvariate(0,1)  #generate normally distributed random numbers with mean 0 and standard deviation 1.

d = d_true + noise
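One way to scale the noise is to pass each measurement's own standard deviation to random.normalvariate, as sketched below. This assumes the standard deviations have been converted to metres and that the model vector is ordered as $(h_0, A)$.

```python
import random
import numpy as np

time = np.arange(0, 301, 60)
G = np.ones((time.size, 2))
G[:, 1] = np.exp(-time / 150) - 1
d_true = G @ np.array([140.0, 0.15])

# Standard deviations from the table, converted from cm to m
sd = np.array([3.0, 5.0, 2.0, 1.0, 3.0, 2.0]) / 100.0

# Seed once so the same noise is generated on every run
random.seed(10)

noise = np.zeros(6)
for i in range(6):
    # zero mean, standard deviation of the i-th measurement
    noise[i] = random.normalvariate(0, sd[i])

# Simulated measurements
d = d_true + noise
print(d)
```

Drawing from a unit normal and multiplying by sd[i] would be equivalent.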


Q6) Calculate the best linear unbiased estimate. Copy and paste the plotting code from Q5 and add a **smooth** orange line giving your best-fit model.
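The sketch below assumes the BLUE takes its standard weighted least-squares form, $\hat{m} = (G^T Q_{dd}^{-1} G)^{-1} G^T Q_{dd}^{-1} d$; check this against the expression given in the lectures. The helper name blue and the variables t_fine, G_fine and h_fine are illustrative, not part of the practical. A smooth best-fit curve is obtained by evaluating the forward model on a fine time grid instead of only at the six measurement epochs.

```python
import numpy as np

def blue(G, Qdd, d):
    """Weighted least-squares (BLUE) estimate for d = G m + noise, noise covariance Qdd."""
    W = np.linalg.inv(Qdd)  # weight matrix Qdd^-1
    # Solve the normal equations (G^T W G) m = G^T W d
    return np.linalg.solve(G.T @ W @ G, G.T @ W @ d)

time = np.arange(0, 301, 60)
G = np.ones((time.size, 2))
G[:, 1] = np.exp(-time / 150) - 1
sd = np.array([3.0, 5.0, 2.0, 1.0, 3.0, 2.0]) / 100.0  # metres
Qdd = np.diag(sd**2)

# Sanity check: noise-free data should return the true model
m_true = np.array([140.0, 0.15])
d_true = G @ m_true
m_check = blue(G, Qdd, d_true)
print(m_check)

# Fine time grid for a smooth model curve (one point per day)
t_fine = np.linspace(0, 300, 301)
G_fine = np.column_stack([np.ones_like(t_fine), np.exp(-t_fine / 150) - 1])
h_fine = G_fine @ m_check
```

With noisy data, replace d_true by d in the call to blue and plot h_fine against t_fine as a line.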

Q7) Now repeat the calculations in Q5 and Q6 10000 times, with a different realisation of the noise each time.
Initialise the random number generator as before, once, before the loop starts (re-seeding inside the loop would reproduce the same noise every time). One way to achieve this is to create a loop, saving the estimated model parameters for each iteration.

In [None]:
Num = 10000
random.seed(10)
bootstrapped_models = np.zeros((Num,2))
for i in range(Num):
    # generate noise vector
    # create d
    # find BLUE, save in appropriate location of array: bootstrapped_models
    # bootstrapped_models[i,:] = ....
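A possible shape for the loop is sketched below, assuming the standard weighted least-squares form of the BLUE and standard deviations converted to metres. The names W and N are illustrative; since G and Qdd do not change between iterations, the normal matrix is built once outside the loop.

```python
import random
import numpy as np

time = np.arange(0, 301, 60)
G = np.ones((time.size, 2))
G[:, 1] = np.exp(-time / 150) - 1
m_true = np.array([140.0, 0.15])
d_true = G @ m_true
sd = np.array([3.0, 5.0, 2.0, 1.0, 3.0, 2.0]) / 100.0  # metres

W = np.diag(1.0 / sd**2)  # Qdd^-1 for independent errors
N = G.T @ W @ G           # normal matrix, identical for every iteration

Num = 10000
random.seed(10)           # seed once, outside the loop
bootstrapped_models = np.zeros((Num, 2))
for i in range(Num):
    # new noise realisation, scaled by each measurement's standard deviation
    noise = np.array([random.normalvariate(0, s) for s in sd])
    d = d_true + noise
    # BLUE for this realisation
    bootstrapped_models[i, :] = np.linalg.solve(N, G.T @ W @ d)

print("mean:", bootstrapped_models.mean(axis=0))
print("std: ", bootstrapped_models.std(axis=0))
```

If the estimator is unbiased, the printed means should lie close to the true values, with the agreement tightening as Num grows.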


Q8) Plot a histogram of each model parameter with an appropriate number of bins, adding a vertical red line indicating the true values.

In [None]:
# amend this code:
plt.figure()
plt.hist( bootstrapped_models[:,0], bins=2);
plt.plot( [m_true[0],m_true[0]],[0,1000],'-',color='red') #draw on a red line, with height 1000

Q9) Is the estimator (for both model variables) unbiased? Explain. 

Q10) Plot the 10000 values of the two model parameters against each other as a scatter plot (just use the usual plot command in Python, but plot as circles). Are they correlated? Explain briefly.

Q11) Calculate the standard deviation of each model parameter from your 10000 iterations.

In [None]:
# use np.std(....)

Q12) Is 10000 enough? Gauge how well the standard deviation has converged by plotting the standard deviation of the first n models against n, where n takes the values 100, 200, 300, etc.
Are your values converged?

In [None]:
plt.figure()
for n in range(100,10100,100):
    # plot std of the first n models, against n
    

Q13) Calculate the matrix Qmm as defined in lecture 3. Are the matrix entries consistent with what you have found?

Q14) Repeat this exercise but use unweighted least squares as the estimator instead of the BLUE. You should be able to copy/paste your code, and only need to modify the Qdd used in the estimator (the simulated noise should still be drawn with the true standard deviations). Are the variances better or worse than for the BLUE, and is this what you expect?
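The two estimators can also be compared analytically, as sketched below. This assumes that Qmm for the BLUE takes the standard form $(G^T Q_{dd}^{-1} G)^{-1}$ (the lecture definition is not reproduced here, so check it), and that the covariance of the unweighted estimate follows from error propagation through $\hat{m} = H d$ with $H = (G^T G)^{-1} G^T$. The names Qmm_blue, Qmm_ols and H are illustrative.

```python
import numpy as np

time = np.arange(0, 301, 60)
G = np.ones((time.size, 2))
G[:, 1] = np.exp(-time / 150) - 1
sd = np.array([3.0, 5.0, 2.0, 1.0, 3.0, 2.0]) / 100.0  # metres
Qdd = np.diag(sd**2)

# BLUE: assumed standard form of the model covariance
W = np.linalg.inv(Qdd)
Qmm_blue = np.linalg.inv(G.T @ W @ G)

# Unweighted least squares: m_hat = H d, so its covariance is H Qdd H^T
H = np.linalg.solve(G.T @ G, G.T)
Qmm_ols = H @ Qdd @ H.T

print("BLUE standard deviations:      ", np.sqrt(np.diag(Qmm_blue)))
print("Unweighted standard deviations:", np.sqrt(np.diag(Qmm_ols)))
```

The square roots of the diagonal entries can be compared directly with the standard deviations obtained from the 10000 simulated inversions, and the off-diagonal entries indicate whether the two parameters are correlated.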