This is the second method of Approach 2 for Transient Requirement validation (Section 4.2.2). Users need to read and run [Transient_base.ipynb](./Transient_base.ipynb) before reading and running this notebook.

In [1]:
import numpy as np
import pickle
from pathlib import Path
import pandas as pd
from scipy.stats import chi2
from matplotlib import pyplot as plt

In the second method, we evaluate the noise structure by treating each data point in every bin as a sample from a normal distribution $N(0,\sigma^2)$, where $\sigma$ is the standard deviation. Evaluating the noise level then amounts to estimating $\sigma^2$:
$$\sigma^2=\frac{\sum_{i=1}^{n}d_i^2}{n}$$
where $d_i$ is the $i$-th relative measurement in the bin and $n$ is the number of data points in the 5-km-wide bin.
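Here is a minimal sketch of this estimator on synthetic data. The helper `estimate_variance`, the seed, and the true $\sigma$ of 5 mm are illustrative assumptions and are not part of the notebook. Because the mean is assumed to be zero, the estimator divides by $n$ and does not subtract the sample mean.

```python
import numpy as np

def estimate_variance(d: np.ndarray) -> float:
    """Estimate sigma^2 of zero-mean samples as the mean of the squared values."""
    d = np.asarray(d, dtype=float)
    return float(np.sum(d**2) / d.size)

# hypothetical bin: 10,000 samples drawn from N(0, sigma^2) with sigma = 5 mm
rng = np.random.default_rng(0)
sigma_true = 5.0
d = rng.normal(loc=0.0, scale=sigma_true, size=10_000)

sigma2_hat = estimate_variance(d)
print(sigma2_hat)  # close to sigma_true**2 = 25
```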

The estimated $\sigma^2$ can be treated as an empirical variogram (i.e., twice the semivariogram), defined as 
$$E\left[(f(x)-f(x-r))^2\right]$$
where $r$ is the distance between the two points.
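The sketch below gives a rough illustration of this interpretation. It computes the mean squared difference at a few lags for a synthetic, regularly sampled 1-D profile of uncorrelated noise. The function `empirical_variogram` and the profile are hypothetical. For such noise with standard deviation $s$, $E[(f(x)-f(x-r))^2]=2s^2$ at every lag, and the conventional semivariogram $\gamma(r)$ is half of this.

```python
import numpy as np

def empirical_variogram(f: np.ndarray, lag: int) -> float:
    """Mean of squared differences f(x) - f(x - r) for a lag of `lag` samples (lag >= 1)."""
    diffs = f[lag:] - f[:-lag]
    return float(np.mean(diffs**2))

rng = np.random.default_rng(1)
noise_std = 2.0
f = rng.normal(0.0, noise_std, size=50_000)  # hypothetical spatially uncorrelated noise profile

# for uncorrelated noise, E[(f(x)-f(x-r))^2] = 2*noise_std**2 = 8 at every lag
vario = {lag: empirical_variogram(f, lag) for lag in (1, 10, 100)}
print(vario)
```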
The lower confidence bound of $\sigma^2$ is 
$$\sigma_{low}^2=\frac{\sum_{i=1}^{n}d_i^2}{\chi_n^2(1-\alpha)}$$
where $(1-\alpha)$ is the confidence level and $\chi_n^2(\cdot)$ is the percent point function (inverse CDF) of a chi-squared distribution with $n$ degrees of freedom. Each requirement sets a threshold $\sigma_0$: 2 mm/yr (Requirement 558), $4(1+L^{1/2})$ mm (Requirement 663), or $3(1+L^{1/2})$ mm (Requirement 663), where $L$ is the distance in km. If the lower confidence bound of $\sigma$ exceeds that threshold, we conclude at the $(1-\alpha)$ confidence level that the $\sigma$ of the observations lies above the threshold curve, and the measurements fail the requirement in that bin.

We judge that an observation passes the requirement if both of the following hold:

1) fewer than a certain percentage (e.g., 30%) of its bins fail the requirement;
2) over the failed bins, the mean relative deviation of the lower confidence bound ($\sigma_{low}$) from the requirement ($\sigma_0$), $(\sigma_{low}^2-\sigma_0^2)/\sigma_0^2$, is less than a certain threshold (e.g., 0.3).
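The lower confidence bound and the per-bin test can be sketched as follows, using the same `scipy.stats.chi2.ppf` call as the notebook. Several values are assumptions for illustration only: the helper `sigma2_lower_bound`, the simulated $\sigma$ of 4 mm, the bin size $n=30$, and the bin centre $L=2.6$ km. The Monte Carlo part checks that the bound falls below the true $\sigma^2$ in roughly a fraction $1-\alpha$ of trials.

```python
import numpy as np
from scipy.stats import chi2

def sigma2_lower_bound(d: np.ndarray, alpha: float = 0.05) -> float:
    """One-sided lower confidence bound of sigma^2 for zero-mean normal samples."""
    d = np.asarray(d, dtype=float)
    return float(np.sum(d**2) / chi2.ppf(1 - alpha, df=d.size))

# Monte Carlo check: the bound should lie below the true sigma^2 in about (1 - alpha) of trials
rng = np.random.default_rng(2)
sigma_true, n, alpha, n_trials = 4.0, 30, 0.05, 20_000
samples = rng.normal(0.0, sigma_true, size=(n_trials, n))
lower = np.sum(samples**2, axis=1) / chi2.ppf(1 - alpha, df=n)
coverage = np.mean(lower <= sigma_true**2)
print(coverage)  # close to 0.95

# per-bin test against the 3(1 + sqrt(L)) mm curve at a hypothetical bin centre L = 2.6 km
L = 2.6
sigma0_sq = (3 * (1 + np.sqrt(L)))**2
bin_fails = sigma2_lower_bound(samples[0], alpha) > sigma0_sq
print(bin_fails)
```

For the usual choices such as $\alpha=0.05$, $\chi_n^2(1-\alpha) > n$. The lower bound is therefore always smaller than the point estimate $\sum d_i^2/n$.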

In [2]:
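# Set Parameters
n_bins = 10 # number of distance bins
mratio = 0.3 # an interferogram passes only if fewer than this fraction of bins fail
mdev = 0.3 # ... and the mean relative deviation over the failed bins is below this value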

In [3]:
calval_dir = Path.cwd()/'calval'
calval_location = 'central_valley'
# calval_location = 'texas'
# calval_location = 'oklahoma'
# calval_location = 'purtorico'
work_dir = calval_dir/calval_location

In [4]:
with open(work_dir/'approach2.pkl','rb') as f:
    dist,rel_measure, ifgs_date = pickle.load(f)

In [5]:
n_ifgs = len(dist)

In [6]:
bins = np.linspace(0.1,50.0,num=n_bins+1)
bins_interval = bins[1:] - bins[:-1]
bins_center = bins[:-1]+bins_interval/2

In [7]:
alpha = 0.05 # significance level; the confidence level is 1-alpha

In [8]:
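n_all = np.zeros([n_ifgs,n_bins+1],dtype=int) # number of points for each ifg and bin; last column holds the total
lowbound = np.full([n_ifgs,n_bins],np.nan) # lower confidence bound of sigma^2 per bin
est = np.full([n_ifgs,n_bins],np.nan) # point estimate of sigma^2 per bin
rqmt = (3*(1+np.sqrt(bins_center)))**2 # square of the requirement curve 3(1+sqrt(L)) mm at the bin centers
for i in range(n_ifgs):
    inds = np.digitize(dist[i],bins) # bin index of each point; 0 and n_bins+1 lie outside the bins
    for j in range(1,n_bins+1):
        rem = rel_measure[i][inds==j] # relative measurements in bin j
        len_rem = len(rem)
        n_all[i,j-1] = len_rem
        if len_rem == 0:
            continue # empty bin: keep NaN estimates
        sum_sq = np.sum(rem**2)
        lowbound[i,j-1] = sum_sq/chi2.ppf(1-alpha,df=len_rem)
        est[i,j-1] = sum_sq/len_rem

    n_all[i,-1] = np.sum(n_all[i,:-1]) # total number of points over all bins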
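The binning relies on `np.digitize`. It returns 0 for points below the first edge and `n_bins+1` for points at or beyond the last edge. The loop only visits indices 1 to `n_bins`, so those points are dropped. The small check below uses hypothetical distances; note that a point at exactly 50.0 km is excluded.

```python
import numpy as np

n_bins = 10
bins = np.linspace(0.1, 50.0, num=n_bins + 1)
print(np.diff(bins))  # every bin is about 4.99 km wide

# hypothetical distances (km): below the first edge, on edges, inside, and beyond the last edge
dist_example = np.array([0.05, 0.1, 5.0, 49.9, 50.0, 60.0])
inds = np.digitize(dist_example, bins)
print(inds)  # [ 0  1  1 10 11 11]

# only indices 1..n_bins are used by the loop, so the other points are dropped
kept = dist_example[(inds >= 1) & (inds <= n_bins)]
print(kept)
```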

In [9]:
def to_str(x:bool) -> str:
    # map a pass/fail flag to the CSS class used to shade the table cell
    return 'true' if x else 'false'

In [10]:
# for i in range(n_ifgs):
#     fig, ax = plt.subplots(figsize=[18, 5.5])
#     ax.plot(bins_center,rqmt,'r')
#     ax.scatter(bins_center,est[i],c='yellow')
#     ax.scatter(bins_center,lowbound[i],c='green')

#     ax.set_xlabel('Distance (km)')
#     ax.set_ylabel(r'$\sigma^2$ ($mm^2$)')
#     plt.legend(["Mission Requirement","Estimated","Lower Bound"])

In [11]:
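dev = (lowbound-rqmt)/rqmt # relative deviation of the lower bound of sigma^2 from the squared requirement
success_or_fail = dev < 0.0 # True: the bin passes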
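Because `dev` is computed on variances, its sign agrees with the comparison of $\sigma_{low}$ and $\sigma_0$, but its magnitude does not: $(\sigma_{low}^2-\sigma_0^2)/\sigma_0^2 = (\sigma_{low}/\sigma_0)^2-1$. The hypothetical helper below converts it to a deviation on $\sigma$. For example, the threshold `mdev = 0.3` corresponds to $\sigma_{low}$ being about 14% above the requirement curve.

```python
import numpy as np

def var_dev_to_std_dev(dev_var):
    """Convert a relative deviation on sigma^2 into the relative deviation on sigma."""
    return np.sqrt(1.0 + np.asarray(dev_var)) - 1.0

# deviation on sigma^2 -> deviation on sigma; the sign (pass/fail) is unchanged
for d in (-0.5, 0.0, 0.3, 1.4632):
    print(d, var_dev_to_std_dev(d))
```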

In [12]:
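n_pos = np.empty(n_ifgs) # number of failed bins (dev >= 0) per interferogram
mean_dev = np.empty(n_ifgs) # mean relative deviation over the failed bins
success_or_fail_total = np.empty(n_ifgs,dtype=bool) # overall result per interferogram
for i in range(n_ifgs):
    dev_i = dev[i]
    dev_i_pos = dev_i[dev_i>=0.0] # deviations of the failed bins
    n_pos[i] = len(dev_i_pos)
    if n_pos[i] == 0:
        mean_dev[i] = 0.0
    else:
        mean_dev[i] = dev_i_pos.mean()
    # pass if fewer than mratio*n_bins bins fail and their mean deviation is below mdev
    if n_pos[i]<n_bins*mratio and mean_dev[i] < mdev:
        success_or_fail_total[i] = True
    else:
        success_or_fail_total[i] = False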
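The per-interferogram loop above can also be written in vectorized form. This is a sketch, not the notebook's code: `judge` is a hypothetical name. It is checked on two rows copied from the deviation table in this notebook (20190110-20190122 and 20190510-20190522).

```python
import numpy as np

def judge(dev: np.ndarray, mratio: float = 0.3, mdev: float = 0.3):
    """Vectorized pass/fail rule for a (n_ifgs, n_bins) array of relative deviations."""
    failed = dev >= 0.0                                 # failed bins
    n_pos = failed.sum(axis=1)                          # number of failed bins per interferogram
    sum_pos = np.where(failed, dev, 0.0).sum(axis=1)    # sum of deviations over failed bins
    # mean over failed bins, 0.0 when no bin fails
    mean_dev = np.divide(sum_pos, n_pos, out=np.zeros_like(sum_pos), where=n_pos > 0)
    passed = (n_pos < dev.shape[1] * mratio) & (mean_dev < mdev)
    return passed, mean_dev

dev_rows = np.array([
    [0.042383, -0.03252, -0.247413, -0.3451, -0.373684, -0.451678, -0.505772, -0.527438, -0.555006, -0.564697],
    [0.59577, 0.039126, -0.002989, -0.05704, -0.118541, -0.153945, -0.238424, -0.271476, -0.31446, -0.340247],
])
passed, mean_dev_rows = judge(dev_rows)
print(passed, mean_dev_rows)
```

The first row has one failed bin with deviation 0.042 and passes. The second row has only two failed bins, but their mean deviation (about 0.317) exceeds `mdev`, so it fails. Both results match the table.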

In [13]:
# append the overall result and the mean deviation as an extra 'mean' column
success_or_fail = np.hstack((success_or_fail,success_or_fail_total[:,None]))
dev = np.hstack((dev,mean_dev[:,None]))
success_or_fail_str = [list(map(to_str, x)) for x in success_or_fail]

In [14]:
columns = []
for i in range(n_bins):
    columns.append(f'{bins[i]:.2f}-{bins[i+1]:.2f}')
columns.append('mean')

In [15]:
index = []
for i in range(len(ifgs_date)):
    index.append(ifgs_date[i,0].strftime('%Y%m%d')+'-'+ifgs_date[i,1].strftime('%Y%m%d'))

In [16]:
dev_pd = pd.DataFrame(dev,columns=columns,index=index)
success_or_fail_pd = pd.DataFrame(success_or_fail_str,columns=columns,index=index)

In [17]:
s = dev_pd.style
s.set_table_styles([  # create internal CSS classes
    {'selector': '.true', 'props': 'background-color: #e6ffe6;'},
    {'selector': '.false', 'props': 'background-color: #ffe6e6;'},
], overwrite=False)
s.set_td_classes(success_or_fail_pd)

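Relative deviation $(\sigma_{low}^2-\sigma_0^2)/\sigma_0^2$ in each distance bin (km); a negative value means the bin passes. The `mean` column is the mean deviation over the failed bins (0.0 when no bin fails). In the styled output, cells are shaded green for pass and red for fail, and the `mean` cell shows the overall result for the interferogram.

| Interferogram | 0.10-5.09 | 5.09-10.08 | 10.08-15.07 | 15.07-20.06 | 20.06-25.05 | 25.05-30.04 | 30.04-35.03 | 35.03-40.02 | 40.02-45.01 | 45.01-50.00 | mean |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 20190110-20190122 | 0.042383 | -0.03252 | -0.247413 | -0.3451 | -0.373684 | -0.451678 | -0.505772 | -0.527438 | -0.555006 | -0.564697 | 0.042383 |
| 20190203-20190215 | 1.4632 | 0.679687 | 0.37686 | 0.223452 | 0.087698 | 0.020041 | 0.001101 | -0.031803 | -0.067558 | -0.059585 | 0.407434 |
| 20190227-20190311 | 0.490405 | 0.310682 | 0.10624 | 0.047188 | 0.035494 | 0.001446 | -0.013496 | -0.019339 | -0.015103 | -0.020117 | 0.165243 |
| 20190323-20190404 | -0.304434 | -0.346492 | -0.435029 | -0.43231 | -0.391696 | -0.42664 | -0.404205 | -0.425245 | -0.438744 | -0.452135 | 0.0 |
| 20190416-20190428 | 0.263882 | 0.091021 | 0.029934 | -0.001781 | -0.052317 | -0.102073 | -0.079825 | -0.081036 | -0.151585 | -0.172139 | 0.128279 |
| 20190510-20190522 | 0.59577 | 0.039126 | -0.002989 | -0.05704 | -0.118541 | -0.153945 | -0.238424 | -0.271476 | -0.31446 | -0.340247 | 0.317448 |
| 20190603-20190615 | -0.492431 | -0.535291 | -0.452552 | -0.376812 | -0.30172 | -0.289542 | -0.189077 | -0.149344 | -0.115572 | -0.057741 | 0.0 |
| 20190627-20190709 | -0.599747 | -0.549796 | -0.482847 | -0.389619 | -0.341016 | -0.262536 | -0.212853 | -0.202023 | -0.166449 | -0.161948 | 0.0 |
| 20190721-20190802 | -0.614176 | -0.542665 | -0.505312 | -0.514986 | -0.509668 | -0.509794 | -0.552287 | -0.575044 | -0.599795 | -0.617556 | 0.0 |
| 20190814-20190826 | -0.258049 | -0.321223 | -0.187666 | -0.111148 | -0.069076 | 0.014469 | 0.032254 | 0.095796 | 0.154068 | 0.124375 | 0.084193 |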


Fraction of interferograms that pass the requirement (confidence level = 0.95):

In [18]:
np.count_nonzero(success_or_fail_total)/len(success_or_fail_total)

0.6666666666666666

**Note**: some short-distance bins are rejected by Approach 2.2 but not by Approach 2.1. For example, in the first bin of the Central Valley interferogram 20190122-20190203, 78 percent of the points lie under the curve, yet the bin is rejected by Approach 2.2.

The most likely reason is that the mission requirement varies significantly within each short-distance bin. It is therefore not appropriate to assume that all points in such a bin follow the same distribution.
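A quick numerical check of this explanation evaluates the square of the $3(1+L^{1/2})$ mm curve at the edges of each bin, for 10 and for 50 bins. The helper `requirement_sq` is hypothetical.

```python
import numpy as np

def requirement_sq(L):
    """Square of the 3(1 + sqrt(L)) mm requirement curve, L in km."""
    return (3 * (1 + np.sqrt(L)))**2

ratios = {}
for n_bins in (10, 50):
    bins = np.linspace(0.1, 50.0, num=n_bins + 1)
    # ratio of sigma_0^2 at the upper edge to sigma_0^2 at the lower edge of each bin
    ratios[n_bins] = requirement_sq(bins[1:]) / requirement_sq(bins[:-1])
    print(n_bins, ratios[n_bins][0], ratios[n_bins][-1])
```

With 10 bins, $\sigma_0^2$ grows by a factor of about 6.1 across the first bin but only about 1.1 across the last. With 50 bins, the first-bin factor is still about 2.4, which is consistent with the phenomenon persisting when more bins are used.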

I increased the number of bins to 50 and found that the fraction of interferograms accepted by Approach 2.2 rose from 0.76 to around 0.8, but the phenomenon still exists. For Approach 2.1, there is no significant difference.