# Algorithm Theoretical Basis Document: Algorithms to Validate NISAR L2 Transient Displacement Requirements

**Original code authored by:** NISAR Science Team Members and Affiliates  

*March 08, 2022*

*NISAR Solid Earth Team*

<hr/>

## 1 Transient Deformation
Detecting and quantifying transient deformation is essential for improving our understanding of fundamental processes associated with tectonics, subsurface movement of magma, volcanic eruptions, landslides, response to changing surface loads and a wide variety of anthropogenic phenomena.

## 2 Theoretical Basis of Algorithms

### 2.1 Requirements
**L2 Requirement 663 – Deformation Transients:** *The NISAR project shall measure at least two components of the point-to-point vector displacements over at least 70% of targeted sites with accuracy of $3(1+ L^{1/2})$ mm or better, over length scales 0.1 km < L < 50 km, at 100 m resolution, and over 12-day time scales.*

NISAR must be able to constrain displacements in two look directions (left-looking on both ascending and descending tracks) for the active area of 70% of target sites with an accuracy that scales with baseline distance L between any two locations within a scene. This pertains to all directly measured or inferred 12-day interferograms over the duration of the mission. Here, accuracy is calculated using L in kilometers but with the units removed, at or better than 100 m resolution. The 12-day time scale corresponds to a repeat NISAR pass on each of the ascending and descending satellite tracks, from which interferograms can be made and displacements estimated. The NISAR mission has compiled a list of 2000 global targets covering areas of known or potential transient deformation related to the processes specified in the requirement (NISAR Handbook Appendix H, 2018). These targets include all Earth's active volcanoes above sea-level, areas of rapid glacial mass changes, selected deforming reservoirs of water, oil, gas, CO2 and steam and landslide-prone areas near major population centers, as well as sites where selected disaster-related events have occurred. 
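
As a minimal sketch of how the accuracy threshold in the requirement can be evaluated, the snippet below computes $3(1+L^{1/2})$ mm for a set of baseline distances. The function name `transient_requirement_mm` is hypothetical and is not part of the project code; distances outside the 0.1 km < L < 50 km range are simply returned as NaN.

```python
import numpy as np

def transient_requirement_mm(distance_km):
    """Accuracy threshold 3 * (1 + sqrt(L)) in mm, with L in km (units removed).

    Distances outside the 0.1 km < L < 50 km range covered by the
    requirement are returned as NaN.
    """
    distance_km = np.asarray(distance_km, dtype=float)
    threshold = 3.0 * (1.0 + np.sqrt(distance_km))
    in_range = (distance_km > 0.1) & (distance_km < 50.0)
    return np.where(in_range, threshold, np.nan)

# Thresholds for baselines of 1 km and 25 km, and for 60 km (out of range)
print(transient_requirement_mm([1.0, 25.0, 60.0]))
```

For example, two points 25 km apart must have a relative displacement accuracy of 18 mm or better.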

### 2.2 Approach to validating the L2 requirements
We use [two] approaches [in this notebook] for validating the NISAR solid earth L2 requirements. [Both] approaches require the generation of a standard set of NISAR L3 data products consisting of surface displacement for selected areas that sample a range of vegetation types, topographic relief, and strain rates. Generation of these products, as discussed in Section 3, requires a set of temporally contiguous/overlapping SAR interferograms over all time periods of interest (see description of inputs and potential preprocessing steps in Sections 3 and 5).

All the Solid Earth Science requirements specify a minimum spatial coverage component, whose validation will rely on a combination of assessing the coverage of InSAR-quality data and ensuring that the required measurement accuracy is achieved in a suite of locations that comprehensively sample different types of regions with respect to surface properties and vegetation land cover. Many of these regions will be automatically evaluated as part of the targeted sites for the transient deformation requirement.

#### 2.2.1 L2 Requirement 663 - Transient Displacements

To validate the L2 requirements on transient displacements, we will produce 12-day interferograms from both descending and ascending tracks over diverse target sites where GNSS observations are available. The two components of vector displacement, ascending and descending, will be validated separately.

For **Validation Approach 1**, we will use unwrapped interferograms at 100-m resolution to produce point-to-point relative LOS measurements (and their associated uncertainties) between GNSS sites. Position observations from the same set of GNSS sites and at the InSAR acquisition times will be projected into the LOS direction and differenced pairwise. These will be compared to the point-to-point InSAR LOS measurements using a methodology similar to that described in Section 2.2.2, except that the accuracy specification is $3(1+L^{1/2})$ mm over 0.1 km < L < 50 km.

For **Validation Approach 2**, we will utilize interferograms over the non-deforming areas discussed in Section 2.2.1. In practice, characterization of transient deformation will usually be improved by examining longer time series of interferograms. The approach described here validates the requirement that short timescale or temporally complex transients can be characterized with a single interferogram. 

For **Validation Approach 3**, which can be blended with the first two approaches, we use UAVSAR to validate the InSAR-derived motions at scales smaller than the characteristic spacing between the GNSS stations. UAVSAR has the advantage of filling in the spatial sampling between GNSS stations at a resolution approximately 10X higher than the resolution of NISAR images. Ideally, UAVSAR data will be collected during NISAR passes. Corner reflectors will further validate the accuracy and can be used to assess UAVSAR motion compensation errors. Assessment of UAVSAR motion compensation errors should be carried out prior to NISAR launch from experiments using corner reflectors that are moved between passes.

Comprehensive validation requires transient sites possessing different deformation characteristics (e.g., volcanoes, landslides, aquifers, hydrocarbons, etc.), vegetation covers (forest, shrub, bare surface, etc.), seasonality (leaf on/off, snow, etc.), and terrain slopes. The NISAR Science Team will select a set of cal/val regions to be used for this requirement and will list those sites in the NISAR cal/val plan.

### 2.3 Technical Framework for Validating Requirements
#### 2.3.1 Comparison of GNSS and InSAR measurements
The InSAR and GNSS comparisons for Requirement [663] will be performed on an interferogram-by-interferogram basis.

#### 2.3.2 Spatial Analysis of InSAR scenes
Individual interferogram analysis in non-deforming regions will be conducted based on unwrapped interferograms at the required spatial resolutions. We first estimate the covariances or semi-variogram of phase observations between points of varying distances by constructing the structure function (e.g., Lohman & Simons, 2005) (see Section 4.2). We then compare the spatial spectrum of the covariance function to the requirement(s) at distances between 0.1 and 50 km to validate that the observed noise is smaller than the threshold in the various requirements. We use ensemble statistics over many interferograms and over different terrains and seasons for this validation approach. SES members have done relevant work in the past to validate the NISAR performance tool (Hensley et al., 2016) using the 70-km swath of ALOS interferograms. Before the NISAR launch, we can use ALOS-2 wide-swath or Sentinel-1 scenes to conduct this validation.
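
The structure-function idea can be sketched as follows, assuming the pixels have already been subsampled and their coordinates converted to kilometers. The function `empirical_structure_function` and the synthetic white-noise data are hypothetical illustrations, not the project implementation; comparing the square root of the structure function with $3(1+L^{1/2})$ mm is one plausible way to relate it to the requirement.

```python
import numpy as np

def empirical_structure_function(x_km, y_km, values, bin_edges_km):
    """Mean squared difference between point pairs, binned by pair distance."""
    x_km = np.asarray(x_km, dtype=float)
    y_km = np.asarray(y_km, dtype=float)
    values = np.asarray(values, dtype=float)
    bin_edges_km = np.asarray(bin_edges_km, dtype=float)
    # Upper-triangle indices: every unordered pair of points exactly once
    i, j = np.triu_indices(x_km.size, k=1)
    dist = np.hypot(x_km[i] - x_km[j], y_km[i] - y_km[j])
    sq_diff = (values[i] - values[j]) ** 2
    # Sum of squared differences and number of pairs in each distance bin
    sums, _ = np.histogram(dist, bins=bin_edges_km, weights=sq_diff)
    counts, _ = np.histogram(dist, bins=bin_edges_km)
    with np.errstate(invalid="ignore", divide="ignore"):
        sf = sums / counts  # NaN where a bin contains no pairs
    centers = 0.5 * (bin_edges_km[:-1] + bin_edges_km[1:])
    return centers, sf, counts

# Synthetic example: spatially uncorrelated noise (mm) at random locations
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 50.0, 500)
y = rng.uniform(0.0, 50.0, 500)
noise_mm = rng.normal(0.0, 2.0, 500)
edges = np.linspace(0.1, 50.0, 26)
centers, sf, counts = empirical_structure_function(x, y, noise_mm, edges)

rms_mm = np.sqrt(sf)                              # RMS pairwise difference per bin
requirement_mm = 3.0 * (1.0 + np.sqrt(centers))   # threshold at bin centers
print(np.column_stack([centers, rms_mm, requirement_mm])[:5])
```

Forming all pairs scales quadratically with the number of points, which is why a random subsample of pixels is normally drawn first.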

## 3 Interferogram Preparation
In this initial processing step, all the necessary Level-2 unwrapped interferogram products are gathered. Ascending and descending interferograms will be prepared for independent analysis. 

### 3.1  Introduction
The project will provide sets of ascending and descending unwrapped L2 interferograms over regions of interest listed in the NISAR Solid Earth calval document. For the purpose of testing calval algorithms prior to NISAR launch, the NISAR SE team will make interferograms using SAR data from complementary missions (e.g. Sentinel-1 or ALOS-2). These will include at a minimum nearest-neighbor interferograms.

As part of L2 processing, the project will calculate and apply required and optional corrections to minimize errors due to non-geophysical sources. An example of a required correction is the removal of ionospheric propagation delays using split-band processing.

### 3.2  Configure Local Processing Environment

In [None]:
#Load Packages
import math
import numpy as np
import os
import pandas as pd
from datetime import datetime as dt
from matplotlib import pyplot as plt
from mintpy.utils import readfile, utils as ut
from mintpy.objects import gps
from mintpy.objects.gps import GPS
from scipy import special
from scipy.stats import binned_statistic
from pathlib import Path
import subprocess
import pyproj
import random
import copy
import pickle
import warnings
#from scipy.spatial.distance import pdist
#from itertools import combinations,chain

Set Custom Parameters in the cell below. Users only need to input the following parameters:
- `calval_location`: name of the study area (e.g., 'oklahoma');
- `download_region`: lat-lon box that falls into the study region (e.g., '"35.00 36.00 -100.00 -98.00"');
- `sentinel_track`: track number of the Sentinel-1 data (e.g., '135');
- `download_start_date` and `download_end_date`: data in this time interval will be downloaded and analysed (e.g., '20000101', '20230101').

**NOTE**: `download_start_date` and `download_end_date` are set to download all available data by default; users can modify them to select the data they want.

In [None]:
# calval_location = 'central_valley'
# download_region = '"36.18 36.26 -119.91 -119.77"' #download box in S,N,W,E format
# sentinel_track = '144'

# calval_location = 'texas'
# download_region = '"31.00 33.00 -106.00 -103.00"' #download box in S,N,W,E format
# sentinel_track = '151'

# calval_location = 'oklahoma'
# download_region = '"35.00 36.00 -100.00 -98.00"' #download box in S,N,W,E format
# sentinel_track = '107'

calval_location = 'purtorico'
download_region = '"18.00 18.10 -67.00 -66.00"' #download box in S,N,W,E format
sentinel_track = '135'
download_start_date = '20000101'
download_end_date = '20220801'
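
Because the bounding box is passed as a quoted string in S,N,W,E order, a quick sanity check before downloading can catch swapped values. The helper `parse_snwe` below is a hypothetical sketch and is not part of ARIA-tools or MintPy; it also assumes the box does not cross the antimeridian.

```python
def parse_snwe(region):
    """Parse a quoted 'S N W E' string into four floats and check the ordering."""
    # Remove surrounding quotes and spaces, then split on whitespace
    south, north, west, east = (float(v) for v in region.strip('"\' ').split())
    if not (-90.0 <= south < north <= 90.0):
        raise ValueError(f"invalid latitude range: {south} to {north}")
    if not (-180.0 <= west < east <= 180.0):
        raise ValueError(f"invalid longitude range: {west} to {east}")
    return south, north, west, east

print(parse_snwe('"18.00 18.10 -67.00 -66.00"'))
```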

Set up the directory structure:

In [None]:
calval_dir = Path.cwd()/'calval'

#Set Directories
print("calval directory: ", str(calval_dir))
calval_dir.mkdir(exist_ok=True)
work_dir = calval_dir/calval_location
print("Work directory: ", str(work_dir))
work_dir.mkdir(exist_ok=True)
mint_dir = work_dir/'Mintpy'
print("Mintpy directory: ", str(mint_dir))
mint_dir.mkdir(exist_ok=True)
gunw_dir = work_dir/'products'

### 3.3  Download Interferogram

Use ARIA-tools to download all interferograms that intersect `download_region` over the specified time range:

In [None]:
os.chdir(work_dir)
command = 'ariaDownload.py --bbox ' + download_region + ' --start ' + download_start_date + ' --end ' + download_end_date 
if sentinel_track != '':
    command = command + ' --track ' + sentinel_track

result = subprocess.run(command,capture_output=True,text=True,shell=True)
print(result.stdout)

#delete unnecessary files
(gunw_dir/"avg_rates.csv").unlink(missing_ok=True)
(gunw_dir/"ASFDataDload0.py").unlink(missing_ok=True)
(gunw_dir/"AvgDlSpeed.png").unlink(missing_ok=True)
(gunw_dir/"error.log").unlink(missing_ok=True)
(work_dir/"error.log").unlink(missing_ok=True)

Set up the ARIA products and mask the data with the GSHHS water mask:

In [None]:
command = 'ariaTSsetup.py -f "products/*.nc" --mask Download'
result = subprocess.run(command,capture_output=True,text=True,shell=True)
print(result.stdout)

### 3.4  Set Up MintPy Configuration file

Time series analysis is not needed for validating the transient requirement. We use MintPy here only for data loading. The configuration file required by MintPy needs to be created and written into `mint_dir`:

In [None]:
config_file_content = 'mintpy.load.processor = aria\n'
config_file_content += 'mintpy.load.unwFile = ../stack/unwrapStack.vrt\n'
config_file_content += 'mintpy.load.corFile = ../stack/cohStack.vrt\n'
config_file_content += 'mintpy.load.connCompFile = ../stack/connCompStack.vrt\n'
config_file_content += 'mintpy.load.demFile = ../DEM/SRTM_3arcsec.dem\n'
config_file_content += 'mintpy.load.incAngleFile = ../incidenceAngle/*.vrt\n'
config_file_content += 'mintpy.load.azAngleFile = ../azimuthAngle/*.vrt\n'
config_file_content += 'mintpy.load.waterMaskFile = ../mask/watermask.msk\n'
config_file_content += 'mintpy.reference.lalo = auto\n'
config_file_content += 'mintpy.topographicResidual.pixelwiseGeometry = no\n'
config_file_content += 'mintpy.troposphericDelay.method = no\n'
config_file_content += 'mintpy.topographicResidual = no\n'

In [None]:
config_file = mint_dir/(calval_location+'.cfg')
config_file.write_text(config_file_content)

### 3.5  Load data

In [None]:
os.chdir(mint_dir)

In [None]:
command = 'smallbaselineApp.py ' + str(config_file) + ' --dostep load_data'
result = subprocess.run(command,capture_output=True,text=True,shell=True)
print(result.stdout)

The output of this step is an "inputs" directory containing two HDF5 files:
- ifgramStack.h5: contains 6 dataset cubes (e.g., unwrapped phase, coherence, connected components) and the associated metadata;
- geometryGeo.h5: contains geometrical datasets (e.g., incidence/azimuth angle, masks).

In [None]:
ifgs_file = mint_dir/'inputs/ifgramStack.h5'
geom_file = mint_dir/'inputs/geometryGeo.h5'

**NOTE:** If the interferograms are sampled more finely than 100 m, the interferogram phase values need to be multilooked to 100 m before calculating the empirical semivariogram.
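
A minimal sketch of such multilooking is shown below, assuming the input is an unwrapped interferogram (wrapped phase would need complex averaging instead) with missing pixels stored as NaN. The function `multilook_nanmean` is a hypothetical helper, not a MintPy routine.

```python
import numpy as np

def multilook_nanmean(data, looks_y, looks_x):
    """Average non-overlapping looks_y x looks_x blocks, ignoring NaN pixels."""
    data = np.asarray(data, dtype=float)
    # Crop so that the array divides evenly into blocks
    rows = (data.shape[0] // looks_y) * looks_y
    cols = (data.shape[1] // looks_x) * looks_x
    blocks = data[:rows, :cols].reshape(rows // looks_y, looks_y,
                                        cols // looks_x, looks_x)
    valid = np.isfinite(blocks)
    total = np.where(valid, blocks, 0.0).sum(axis=(1, 3))
    count = valid.sum(axis=(1, 3))
    with np.errstate(invalid="ignore", divide="ignore"):
        return total / count  # NaN where a block has no valid pixel

# Example: reduce a 4 x 4 grid to 2 x 2 by averaging 2 x 2 blocks
grid = np.arange(16, dtype=float).reshape(4, 4)
print(multilook_nanmean(grid, 2, 2))
```

The number of looks in each direction would be chosen from the pixel spacing so that the output spacing is close to 100 m.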

Load the dates of the interferograms:

In [None]:
ifgs_date = readfile.read(ifgs_file,datasetName='date')[0]

In [None]:
_ifgs_date = np.empty_like(ifgs_date,dtype=dt)
for i in range(ifgs_date.shape[0]):
    start_date = str(int(ifgs_date[i,0]))
    end_date = str(int(ifgs_date[i,1]))
    start_date = dt.strptime(start_date, "%Y%m%d")
    end_date = dt.strptime(end_date, "%Y%m%d")
    _ifgs_date[i] = [start_date,end_date]
    
ifgs_date = _ifgs_date
del _ifgs_date

Remove interferograms with time interval other than 12 days:

In [None]:
del_row_index = []
for i in range(ifgs_date.shape[0]):
    time_interval = (ifgs_date[i,1]-ifgs_date[i,0]).days
    if time_interval != 12:
        del_row_index.append(i)

In [None]:
ifgs_date = np.delete(ifgs_date,del_row_index,0)

Identify independent interferograms (i.e., selected interferograms do NOT share common dates):

In [None]:
del_row_index = []
i = 0
while i<ifgs_date.shape[0]-1:
    if ifgs_date[i,1]==ifgs_date[i+1,0]:
        del_row_index.append(i+1)
        i = i+2
    else:
        i = i+1

In [None]:
ifgs_date = np.delete(ifgs_date,del_row_index,0)

Then the phase and coherence of the selected interferograms, the geometrical datasets, and their attributes are loaded into NumPy arrays:

In [None]:
unwrapPhaseName = ['unwrapPhase-'+i[0].strftime('%Y%m%d')+'_'+i[1].strftime('%Y%m%d') for i in ifgs_date]
coherenceName = ['coherence-'+i[0].strftime('%Y%m%d')+'_'+i[1].strftime('%Y%m%d') for i in ifgs_date]

In [None]:
ifgs_unw,atr = readfile.read(ifgs_file,datasetName=unwrapPhaseName)
insar_displacement = -ifgs_unw*float(atr['WAVELENGTH'])/(4*np.pi)*1000 # unit in mm

insar_coherence = readfile.read(ifgs_file,datasetName=coherenceName)[0]
del ifgs_unw

Change default missing phase values in interferograms from 0.0 to `np.nan`.

In [None]:
insar_displacement[insar_displacement==0.0] = np.nan

### 3.6 Optional Interferogram Corrections

Phase distortions related to solid earth and ocean tidal effects as well as those due to temporal variations in the vertical stratification of the atmosphere can be mitigated using the approaches described below. At this point, it is expected that these corrections will not be needed to validate the mission requirements, but they may be used to produce the highest quality data products. Typically, these are applied to the estimated time series product rather than to the individual interferograms, since they are a function of the time of each radar acquisition.

#### 3.6.A Solid Earth Tide Correction
[MintPy provides functionality for this correction.]

#### 3.6.B Tropospheric Delay Correction
Optional atmospheric correction utilizes the PyAPS (Jolivet et al., 2011; Jolivet and Agram, 2012) module within GIAnT (or eventually a merged replacement for GIAnT and MintPy). PyAPS is well documented and maintained, can be freely downloaded, and is included in the GIAnT distribution. PyAPS currently includes support for ECMWF’s ERA-Interim, NOAA’s NARR and NASA’s MERRA weather models. A final selection of atmospheric models to be used for operational NISAR processing will be done during Phase C.

[T]ropospheric delay maps are produced from atmospheric data provided by Global Atmospheric Models. This method aims to correct differential atmospheric delay correlated with the topography in interferometric phase measurements. Global Atmospheric Models (hereafter GAMs)... provide estimates of the air temperature, the atmospheric pressure and the humidity as a function of elevation on a coarse resolution latitude/longitude grid. In PyAPS, we use this 3D distribution of atmospheric variables to determine the atmospheric phase delay on each pixel of each interferogram.

The absolute atmospheric delay is computed at each SAR acquisition date. For a pixel a_i at an elevation z at acquisition date i, the four surrounding grid points are selected and the delays for their respective elevations are computed. The resulting delay at the pixel a_i is then the bilinear interpolation between the delays at the four grid points. Finally, we combine the absolute delay maps of the InSAR partner images to produce the differential delay maps used to correct the interferograms.
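
The interpolation and differencing steps described above can be sketched as follows. This is an illustration only and does not reproduce the PyAPS API: the function `bilinear_interpolate`, the grid coordinates, and the delay values are hypothetical, and the sign convention of the differential delay (second date minus first date) is an assumption.

```python
def bilinear_interpolate(x, y, x0, x1, y0, y1, d00, d10, d01, d11):
    """Bilinear interpolation inside one grid cell.

    d00 is the value at (x0, y0), d10 at (x1, y0),
    d01 at (x0, y1) and d11 at (x1, y1).
    """
    tx = (x - x0) / (x1 - x0)  # fractional position along x
    ty = (y - y0) / (y1 - y0)  # fractional position along y
    lower = d00 * (1.0 - tx) + d10 * tx
    upper = d01 * (1.0 - tx) + d11 * tx
    return lower * (1.0 - ty) + upper * ty

# Hypothetical delays (m) at the four weather-model nodes surrounding a pixel,
# already evaluated at the pixel elevation, for the two acquisition dates
delay_date1 = bilinear_interpolate(-66.40, 18.05, -66.50, -66.25, 18.00, 18.25,
                                   2.300, 2.310, 2.290, 2.300)
delay_date2 = bilinear_interpolate(-66.40, 18.05, -66.50, -66.25, 18.00, 18.25,
                                   2.320, 2.335, 2.310, 2.325)

# Differential delay used to correct the interferogram (assumed sign convention)
differential_delay = delay_date2 - delay_date1
print(differential_delay)
```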

[MintPy provides functionality for this correction.]

#### 3.6.C Topographic Residual Correction 
[MintPy provides functionality for this correction.]

**NOTE:** Phase deramping is not applied here.

**NOTE:** If the solid earth tide correction is applied to the interferograms, it should also be applied to the GNSS observations.

Preliminary summary: we have loaded all the data needed for processing:
- `atr`: metadata, including incidence angle, longitude and latitude step width, etc.;
- `insar_displacement`: LOS measurements from InSAR;
- `insar_coherence`: coherence values of the interferograms;
- `ifgs_date`: list of date pairs of the two SAR images that form each interferogram.

To avoid rerunning the preparation steps above, we save the data to disk so that they can be loaded easily next time:

In [None]:
with open(work_dir/'base.pkl','wb') as f:
    pickle.dump((atr,insar_displacement,insar_coherence,ifgs_date),f)

In [None]:
with open(work_dir/'base.pkl','rb') as f:
    atr,insar_displacement,insar_coherence,ifgs_date = pickle.load(f)

## 4 Make GNSS LOS Measurements

### 4.1 Find Collocated GNSS Stations
The project will have access to L2 position data for continuous GNSS stations in third-party networks such as NSF’s Plate Boundary Observatory, the HVO network for Hawaii, GEONET-Japan, and GEONET-New Zealand, located in target regions for NISAR solid earth calval. Station data will be post-processed by one or more analysis centers, will be freely available, and will have latencies of several days to weeks, as is the case with positions currently produced by the NSF’s GAGE Facility and separately by the University of Nevada Reno. Networks will contain one or more areas of high-density station coverage (2 to 20 km nominal station spacing over 100 x 100 km or more) to support validation of L2 NISAR requirements at a wide range of length scales.

Get the spatial and temporal ranges for the GNSS station search:

In [None]:
length, width = int(atr['LENGTH']), int(atr['WIDTH'])
lat_step = float(atr['Y_STEP'])
lon_step = float(atr['X_STEP'])
N = float(atr['Y_FIRST'])
W = float(atr['X_FIRST'])
S = N+lat_step*(length-1)
E = W+lon_step*(width-1)

In [None]:
start_date_gnss = ifgs_date[0,0]
end_date_gnss = ifgs_date[-1,-1]

inc_angle = int(float(atr.get('incidenceAngle', None)))
az_angle = int(float(atr.get('azimuthAngle', None))) 

Search for collocated GNSS stations:

In [None]:
site_names, site_lats, site_lons = gps.search_gps(SNWE=(S,N,W,E), 
                                                  start_date=start_date_gnss.strftime('%Y%m%d'),
                                                  end_date=end_date_gnss.strftime('%Y%m%d'))
site_names = [str(stn) for stn in site_names]
print("Initial list of {} stations used in analysis:".format(len(site_names)))
print(site_names)

### 4.2 Make GNSS LOS Measurements

In this step, the 3-D GNSS observations are projected into LOS direction. The InSAR observations are averaged 3 by 3 near the station positions.

**NOTE:** the number of pixels used in calculating the averaged phase values at the GPS location depends on the resolution of input data.
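
The projection of a 3-D GNSS displacement into the LOS direction can be sketched as below. The angle convention is an assumption that should be checked against the product metadata: the incidence angle is measured from the vertical, and the azimuth angle is that of the LOS vector pointing from the ground to the satellite, measured from north and positive anti-clockwise. The function `enu_to_los` is a hypothetical helper written for illustration.

```python
import numpy as np

def enu_to_los(east, north, up, inc_angle_deg, az_angle_deg):
    """Project east/north/up displacement onto the LOS unit vector.

    Assumed convention: incidence angle from vertical; azimuth angle of the
    ground-to-satellite LOS vector, from north, anti-clockwise positive.
    Positive output means motion towards the satellite.
    """
    inc = np.deg2rad(inc_angle_deg)
    az = np.deg2rad(az_angle_deg)
    return (-east * np.sin(inc) * np.sin(az)
            + north * np.sin(inc) * np.cos(az)
            + up * np.cos(inc))

# Example: 10 mm of uplift seen at a 35 degree incidence angle
print(enu_to_los(0.0, 0.0, 10.0, 35.0, -100.0))
```

With a single incidence and azimuth angle for the whole scene, the projection ignores the variation of the look vector across the swath.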

Get daily position solutions for GNSS stations:

In [None]:
displacement = {}
gnss_time_series = {}
gnss_time_series_std = {}
bad_stn = {}  #stations to toss
pixel_radius = 1   #number of InSAR pixels to average for comparison with GNSS

for counter,stn in enumerate(site_names):
    gps_obj = GPS(site = stn, data_dir = str(mint_dir/'GPS'))
    gps_obj.open()
        
    # count number of dates in time range
    gps_obj.read_displacement()
    dates = gps_obj.dates
    for i in range(insar_displacement.shape[0]):
        start_date = ifgs_date[i,0]
        end_date = ifgs_date[i,-1]
        
        range_days = (end_date - start_date).days
        gnss_count = np.histogram(dates, bins=[start_date,end_date])
        gnss_count = int(gnss_count[0])
        #print(gnss_count)

        # select GNSS stations based on data completeness, here we hope to select stations with data frequency of 1 day and no interruption
        if range_days == gnss_count-1:
        #if start_date in dates and end_date in dates:
            _,disp_gnss_time_series,disp_gnss_time_series_std,site_latlon,_ = gps_obj.read_gps_los_displacement(atr,
                                                                                                                    start_date=start_date.strftime('%Y%m%d'),
                                                                                                                    end_date=end_date.strftime('%Y%m%d'))
            x_value = round((site_latlon[1] - W)/lon_step)
            y_value = round((site_latlon[0] - N)/lat_step)

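            # InSAR displacement at the GNSS station, averaged over a window of
            # (2*pixel_radius+1) x (2*pixel_radius+1) pixels (3 x 3 for pixel_radius = 1)
            # Caution: the window must stay inside the grid; a station closer than
            # pixel_radius pixels to the scene edge gives a wrong or empty slice.
            disp_insar = insar_displacement[i,
                                            y_value-pixel_radius:y_value+1+pixel_radius, 
                                            x_value-pixel_radius:x_value+1+pixel_radius]
            # skip this interferogram (not the whole station) if the window has no valid pixel
            if np.isfinite(disp_insar).sum() == 0:
                continue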
            disp_insar = np.nanmean(disp_insar)

            disp_gnss_time_series = disp_gnss_time_series*1000 # convert unit from meter to mm
            disp_gnss_time_series_std = disp_gnss_time_series_std*1000
            gnss_time_series[(i,stn)] = disp_gnss_time_series
            gnss_time_series_std[(i,stn)] = disp_gnss_time_series_std
            displacement[(i,stn)] = list(site_latlon)
            disp_gnss = disp_gnss_time_series[-1] - disp_gnss_time_series[0]

            displacement[(i,stn)].append(disp_gnss)
            displacement[(i,stn)].append(disp_insar)
        else:
            # record stations without complete daily data for this interferogram
            bad_stn.setdefault(i, []).append(stn)

Sort the results and convert them to pandas data structures:

In [None]:
gnss_time_series = dict(sorted(gnss_time_series.items()))
gnss_time_series_std = dict(sorted(gnss_time_series_std.items()))
displacement = dict(sorted(displacement.items()))
bad_stn = dict(sorted(bad_stn.items()))

In [None]:
gnss_time_series = pd.DataFrame.from_dict(gnss_time_series)
gnss_time_series_std = pd.DataFrame.from_dict(gnss_time_series_std)

In [None]:
displacement = pd.DataFrame.from_dict(displacement,orient='index',
                                      columns=['lat','lon','gnss_disp','insar_disp'])
displacement.index = pd.MultiIndex.from_tuples(displacement.index,names=['ifg index','station'])

If an interferogram has fewer than 3 GNSS stations, exclude it from the comparison:

In [None]:
drop_index = []
for i in displacement.index.get_level_values(0).unique():
    if len(displacement.loc[i]) < 3:
        drop_index.append(i)
displacement=displacement.drop(drop_index)

**NOTE:** 
- `Mintpy get_los_displacement()` only uses the central incidence angle, which may introduce noticeable bias over large areas. Next, we will use a pixel-dependent look vector.
- A more general criterion is needed for GNSS station selection. Here, stations with uninterrupted data are selected, whereas in the Secular Requirement Validation stations are selected by data completeness and standard deviation.

### 4.3 Make GNSS and InSAR Relative Displacements

Here, the reference site is selected randomly.

In [None]:
# reference GNSS stations to GNSS reference site
for i in displacement.index.get_level_values(0).unique():
    gps_ref_site_name = random.choice(displacement.loc[i].index.unique())
    displacement.loc[i,'gnss_disp'] = displacement.loc[i,'gnss_disp'].values - displacement.loc[(i,gps_ref_site_name),'gnss_disp']
    displacement.loc[i,'insar_disp'] = displacement.loc[i,'insar_disp'].values - displacement.loc[(i,gps_ref_site_name),'insar_disp']
    ref_x_value = round((displacement.loc[(i,gps_ref_site_name),'lon'] - W)/lon_step)
    ref_y_value = round((displacement.loc[(i,gps_ref_site_name),'lat'] - N)/lat_step)

    ref_disp_insar = insar_displacement[i,
                                        ref_y_value-pixel_radius:ref_y_value+1+pixel_radius, 
                                        ref_x_value-pixel_radius:ref_x_value+1+pixel_radius]
    ref_disp_insar = np.nanmean(ref_disp_insar)
    insar_displacement[i] -= ref_disp_insar

Here is the data to be validated next. The measurements are in millimeters.

In [None]:
# plot GNSS stations on InSAR displacement field
cmap = copy.copy(plt.get_cmap('RdBu'))
#cmap.set_bad(color='black')
vmin, vmax = np.nanmin(insar_displacement), np.nanmax(insar_displacement)
for i in displacement.index.get_level_values(0).unique():
    fig, ax = plt.subplots()
    img1 = ax.imshow(insar_displacement[i], cmap=cmap,vmin=vmin,vmax=vmax, interpolation='nearest', extent=(W, E, S, N))
    ax.set_title(ifgs_date[i,0].strftime('%Y%m%d')+'-'+ifgs_date[i,1].strftime('%Y%m%d'))
    cbar1 = fig.colorbar(img1, ax=ax)
    cbar1.set_label('LOS displacement [mm]')

    for stn in displacement.loc[i].index:
        lon,lat = displacement.loc[(i,stn),'lon'],displacement.loc[(i,stn),'lat']
        color = cmap((displacement.loc[(i,stn),'gnss_disp']-vmin)/(vmax-vmin))
        ax.scatter(lon,lat,s=8**2,color=color,edgecolors='k')
        ax.annotate(stn,(lon,lat),color='black')

## 5 NISAR Validation: GNSS-InSAR Direct Comparison

In [None]:
insar_disp = {}
gnss_disp = {}
ddiff_dist = {}
ddiff_disp = {}
abs_ddiff_disp = {}
for i in displacement.index.get_level_values(0).unique():
    displacement_i = displacement.loc[i]
    insar_disp_i = []
    gnss_disp_i = []
    ddiff_dist_i = []
    ddiff_disp_i = []

    for sta1 in displacement_i.index:
        for sta2 in displacement_i.index:
            if sta2 == sta1:
                break
            insar_disp_i.append(displacement_i.loc[sta1,'insar_disp']-displacement_i.loc[sta2,'insar_disp'])
            gnss_disp_i.append(displacement_i.loc[sta1,'gnss_disp']-displacement_i.loc[sta2,'gnss_disp'])
            ddiff_disp_i.append(gnss_disp_i[-1]-insar_disp_i[-1])
            g = pyproj.Geod(ellps="WGS84")
            _,_,distance = g.inv(displacement_i.loc[sta1,'lon'],displacement_i.loc[sta1,'lat'],
                                 displacement_i.loc[sta2,'lon'],displacement_i.loc[sta2,'lat'])
            distance = distance/1000 # convert unit from m to km
            ddiff_dist_i.append(distance)
    insar_disp[i]=np.array(insar_disp_i)
    gnss_disp[i]=np.array(gnss_disp_i)
    ddiff_dist[i]=np.array(ddiff_dist_i)
    ddiff_disp[i]=np.array(ddiff_disp_i)
    abs_ddiff_disp[i]=abs(np.array(ddiff_disp_i))

### 5.1 Compare Displacement

In [None]:
for i in displacement.index.get_level_values(0).unique():
    plt.figure(figsize=(11,7))
    disp_range = (min([*insar_disp[i],*gnss_disp[i]]),max([*insar_disp[i],*gnss_disp[i]]))
    plt.hist(insar_disp[i],bins=100,range=disp_range,color = "green",label='D_InSAR')
    plt.hist(gnss_disp[i],bins=100,range=disp_range,color="orange",label='D_GNSS', alpha=0.5)
    plt.legend(loc='upper right')
    plt.title(f"Displacements \n Date range {ifgs_date[i,0].strftime('%Y%m%d')}-{ifgs_date[i,1].strftime('%Y%m%d')} \n Number of station pairs used: {len(insar_disp[i])}")
    plt.xlabel('LOS Displacement (mm)')
    plt.ylabel('Number of Station Pairs')
    plt.show()

### 5.2 Plot Displacement Residuals Distribution

In [None]:
for i in displacement.index.get_level_values(0).unique():
    plt.figure(figsize=(11,7))
    plt.hist(ddiff_disp[i],bins = 100, color="darkblue",linewidth=1,label='D_gnss - D_InSAR')
    plt.legend(loc='upper right')
    plt.title(f"Residuals \n Date range {ifgs_date[i,0].strftime('%Y%m%d')}-{ifgs_date[i,1].strftime('%Y%m%d')} \n Number of station pairs used: {len(ddiff_disp[i])}")
    plt.xlabel('Displacement Residual (mm)')
    plt.ylabel('Number of Station Pairs')
    plt.show()

### 5.3 Plot Absolute Displacement Residuals As a Function of Distance

In [None]:
for i in displacement.index.get_level_values(0).unique():
    dist_th = np.linspace(min(ddiff_dist[i]),max(ddiff_dist[i]),100)
    acpt_error = 3*(1+np.sqrt(dist_th))
    plt.figure(figsize=(11,7))
    # label each artist explicitly so the legend does not depend on drawing order
    plt.scatter(ddiff_dist[i],abs_ddiff_disp[i],s=1,label="Measurement")
    plt.plot(dist_th, acpt_error, 'r',label="Mission Requirement")
    plt.xlabel("Distance (km)")
    plt.ylabel("Amplitude of Displacement Residuals (mm)")
    plt.title(f"Residuals \n Date range {ifgs_date[i,0].strftime('%Y%m%d')}-{ifgs_date[i,1].strftime('%Y%m%d')} \n Number of station pairs used: {len(ddiff_dist[i])}")
    plt.legend()
    #plt.xlim(0,5)
    plt.show()

Save the data used for approach 1:
- `ddiff_dist`: distances between GNSS station pairs;
- `abs_ddiff_disp`: absolute values of the measurement residuals;
- `ifgs_date`: list of date pairs of the two SAR images that form each interferogram.

In [None]:
with open(work_dir/'approach1.pkl','wb') as f:
    pickle.dump((ddiff_dist,abs_ddiff_disp,ifgs_date),f)

Validation approach 1 is implemented in `Transient_approach1.ipynb`.

In that notebook, the number of GNSS pairs that meet the mission requirement is calculated as a percentage of the total number of GNSS pairs.
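
As a rough sketch of that calculation (not necessarily how `Transient_approach1.ipynb` implements it), the percentage of pairs within the requirement could be computed as below for one interferogram. The function name `percent_pairs_meeting_requirement` and the example values are hypothetical; pairs outside the 0.1 km < L < 50 km range are excluded here as an assumption.

```python
import numpy as np

def percent_pairs_meeting_requirement(dist_km, abs_residual_mm):
    """Percentage of station pairs whose |GNSS - InSAR| residual is within
    3 * (1 + sqrt(L)) mm, using only pairs with 0.1 km < L < 50 km."""
    dist_km = np.asarray(dist_km, dtype=float)
    abs_residual_mm = np.asarray(abs_residual_mm, dtype=float)
    in_range = (dist_km > 0.1) & (dist_km < 50.0) & np.isfinite(abs_residual_mm)
    if not in_range.any():
        return np.nan  # no usable pairs
    threshold_mm = 3.0 * (1.0 + np.sqrt(dist_km[in_range]))
    return 100.0 * np.mean(abs_residual_mm[in_range] <= threshold_mm)

# Hypothetical pair distances (km) and absolute residuals (mm)
example_dist = [1.0, 4.0, 25.0, 60.0]
example_resid = [5.0, 10.0, 17.0, 0.0]
print(percent_pairs_meeting_requirement(example_dist, example_resid))
```

With the dictionaries saved above, the function would be applied per interferogram, e.g. to `ddiff_dist[i]` and `abs_ddiff_disp[i]`.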

## 6 NISAR Validation: Noise Level Validation

**Note:** For now we simply assume there is no deformation in this study area and time interval. In practice, it is hard to find a sufficiently large area without any deformation. A more realistic solution is to mask out deforming regions, but this may introduce bias into the empirical variation estimation.

In [None]:
def load_geo(attr_geo) -> tuple:
    """Calculate the coordinates of the geocoded file
    and transform them from longitude and latitude to local coordinates in kilometers.
    
    Parameters:
    ######
    attr_geo: attributes of the geocoded data

    Returns:
    ######
    X: coordinates in the east direction in km
    Y: coordinates in the north direction in km
    """
    
    X0=float(attr_geo['X_FIRST'])
    Y0=float(attr_geo['Y_FIRST'])
    X_step=float(attr_geo['X_STEP'])
    Y_step=float(attr_geo['Y_STEP'])
    length=int(attr_geo['LENGTH'])
    width=int(attr_geo['WIDTH'])
    
    # Pixel index ranges relative to the scene center
    Y_local_end_ix = math.floor(length/2)
    Y_local_first_ix = -(length-Y_local_end_ix-1)
    X_local_end_ix = math.floor(width/2)
    X_local_first_ix = -(width-X_local_end_ix-1)
    
    Y_origin = Y0+Y_step*(-Y_local_first_ix)
    
    X_step_local = math.cos(math.radians(Y_origin))*X_step*111.1
    Y_step_local = Y_step*111.1
    

    X=np.linspace(X_local_first_ix*X_step_local,X_local_end_ix*X_step_local,width)
    Y=np.linspace(Y_local_first_ix*Y_step_local,Y_local_end_ix*Y_step_local,length)

    return X,Y

def rand_samp(data:np.ndarray,X:np.ndarray,Y:np.ndarray,num_samples:int=10000):
    """
    Select the points used to calculate the structure function by random point sampling.
    
    Parameters:
    data: np.ndarray
        input data array (1-D)
    X: np.ndarray
        input X location of the data points
    Y: np.ndarray
        input Y location of the data points
    num_samples: int
        number of points to be sampled; all points are used if fewer are available
        
    Returns: 
    tuple [np.ndarray, np.ndarray, np.ndarray]
        (sampled data, sampled X, sampled Y)
    """
    length=np.size(data)
    if length<num_samples:
        n_points=length
        warnings.warn(f'Using all data points: {n_points}')
    else:
        n_points=num_samples
        
    ind=np.random.choice(length,n_points,replace=False)
    sampled_data=data[ind]
    sampled_X=X[ind]
    sampled_Y=Y[ind]

    return sampled_data, sampled_X, sampled_Y
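
The degree-to-kilometer conversion used inside `load_geo` can be looked at in isolation. The helper below is a hypothetical sketch (`deg_step_to_km` is not part of the notebook); it assumes the same spherical approximation of about 111.1 km per degree, with the longitude step scaled by the cosine of the reference latitude.

```python
import math

def deg_step_to_km(x_step_deg, y_step_deg, lat_deg, km_per_deg=111.1):
    """Convert pixel steps in degrees to approximate steps in kilometers.

    lat_deg is the reference latitude (the scene center in `load_geo`).
    """
    # Meridians converge toward the poles, so east-west spacing shrinks
    # with cos(latitude); north-south spacing does not.
    x_step_km = math.cos(math.radians(lat_deg)) * x_step_deg * km_per_deg
    y_step_km = y_step_deg * km_per_deg
    return x_step_km, y_step_km

# Example: a 0.001 degree grid at 60 degrees latitude
x_km, y_km = deg_step_to_km(0.001, -0.001, 60.0)
```

A negative `Y_STEP`, as is typical for north-up rasters, yields a negative north step, so the returned `Y` coordinates decrease with the row index.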

### 6.1 Mask Pixels with Low Coherence (optional)

In [None]:
#insar_displacement[insar_coherence <0.6] = np.nan

Plot the coherence and InSAR measurements:

In [None]:
n_ifgs = insar_displacement.shape[0]

In [None]:
cmap = plt.get_cmap('gray')

for i in range(n_ifgs):
    fig, ax = plt.subplots(figsize=[18, 5.5])
    img1 = ax.imshow(insar_coherence[i],cmap=cmap, interpolation='nearest',extent=(W, E, S, N))
    ax.set_title(f"Coherence \n Date range {ifgs_date[i,0].strftime('%Y%m%d')}-{ifgs_date[i,1].strftime('%Y%m%d')}")
    cbar1 = fig.colorbar(img1, ax=ax)
    cbar1.set_label('coherence')

In [None]:
cmap = plt.get_cmap('RdBu')
for i in range(n_ifgs):
    fig, ax = plt.subplots(figsize=[18, 5.5])
    img1 = ax.imshow(insar_displacement[i], cmap=cmap, interpolation='nearest', extent=(W, E, S, N))
    ax.set_title(f"Interferogram \n Date range {ifgs_date[i,0].strftime('%Y%m%d')}-{ifgs_date[i,1].strftime('%Y%m%d')}")
    cbar1 = fig.colorbar(img1, ax=ax)
    cbar1.set_label('LOS displacement [mm]')

Calculate the coordinates of every pixel:

In [None]:
X0,Y0 = load_geo(atr)
X0_2d,Y0_2d = np.meshgrid(X0,Y0)

Flatten the data to 1-D and remove `nan` values:

In [None]:
X = []
Y = []
data = []

In [None]:
for i in range(n_ifgs):
    mask = np.isnan(insar_displacement[i])
    data_i = insar_displacement[i][~mask]
    X_i = X0_2d[~mask]
    Y_i = Y0_2d[~mask]
    data.append(data_i)
    X.append(X_i)
    Y.append(Y_i)

### 6.2 Remove Trend (Optional)

In [None]:
#for i in range(n_ifgs):
#    data[i]=variogram.remove_trend(X[i],Y[i],data[i])
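
The internals of `variogram.remove_trend` are not shown in this document. As an illustration of what a trend removal step can look like, the sketch below fits and subtracts a planar ramp by least squares. The function name `remove_planar_trend` and the choice of a first-order plane are assumptions made for this example.

```python
import numpy as np

def remove_planar_trend(x, y, values):
    """Subtract the best-fit plane a*x + b*y + c from `values`.

    x, y, values : 1-D arrays of equal length (coordinates in km,
                   displacements in mm)
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    values = np.asarray(values, dtype=float)
    # Design matrix with one column per plane coefficient
    A = np.column_stack([x, y, np.ones_like(x)])
    coeffs, *_ = np.linalg.lstsq(A, values, rcond=None)
    return values - A @ coeffs
```

Applied per interferogram, this would take the place of the commented-out call above, for example `data[i] = remove_planar_trend(X[i], Y[i], data[i])`.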

### 6.3 Randomly Sample Pixels

In [None]:
for i in range(n_ifgs):
    data[i],X[i],Y[i]=rand_samp(data[i],X[i],Y[i],num_samples=1000000)

### 6.4 Pair Up Samples

For each interferogram, the randomly selected pixels need to be paired up. To keep the measurements independent, no two pairs may share a pixel. This is achieved by pairing pixels in sequence, i.e., pixel 1 with pixel 2, pixel 3 with pixel 4, and so on.
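
The sequential pairing can be written compactly for a single interferogram. The sketch below is an illustrative variant of the cell that follows, not a replacement for it; the function name `pair_in_sequence` is hypothetical. When the number of samples is odd, the last sample is left unpaired.

```python
import numpy as np

def pair_in_sequence(x, y, values):
    """Pair samples (0,1), (2,3), ... and return distance and |difference|."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    values = np.asarray(values, dtype=float)
    n_pairs = len(values) // 2
    first = slice(0, 2 * n_pairs, 2)   # samples 0, 2, 4, ...
    second = slice(1, 2 * n_pairs, 2)  # samples 1, 3, 5, ...
    distance = np.hypot(x[second] - x[first], y[second] - y[first])
    rel_measure = np.abs(values[second] - values[first])
    return distance, rel_measure
```

Because the samples were drawn at random in the previous step, pairing neighbors in the sampled order amounts to random pairing without reusing any pixel.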

In [None]:
dist = [] # distance
rel_measure = [] # relative measurement
for i in range(n_ifgs):
    x_odd = X[i][1::2]
    y_odd = Y[i][1::2]
    data_odd = data[i][1::2]
    if (X[i].shape[0] % 2) == 0:
        x_even = X[i][0::2]
        y_even = Y[i][0::2]
        data_even = data[i][0::2]
    else:
        x_even = X[i][0:-1:2]
        y_even = Y[i][0:-1:2]
        data_even = data[i][0:-1:2]
        
    distance = np.sqrt((x_odd-x_even)**2+(y_odd-y_even)**2)
    rel_measure_i = abs(data_odd-data_even)
    
    dist.append(distance)
    rel_measure.append(rel_measure_i)

In [None]:
# dist = [] # distance
# rel_measure = [] # relative measurement
# for i in range(n_ifgs):
#     x = X[i]
#     y = Y[i]
#     data_i = data[i]
#     xy = np.stack((x,y),axis=-1)
#     distance = pdist(xy)
#     index_iter = chain.from_iterable(combinations(range(len(x)),2))
#     index = np.fromiter(index_iter,int).reshape(-1,2)
#     rel_measure_i = abs(data_i[index[:,0]] - data_i[index[:,1]])
    
#     dist.append(distance)
#     rel_measure.append(rel_measure_i)

Show the statistical properties of the selected pixel pairs:

In [None]:
for i in range(n_ifgs):
    fig, ax = plt.subplots(figsize=[18, 5.5])
    img1 = ax.hist(dist[i], bins=100)
    ax.set_title(f"Histogram of distance \n Date range {ifgs_date[i,0].strftime('%Y%m%d')}-{ifgs_date[i,1].strftime('%Y%m%d')}")
    ax.set_xlabel(r'Distance ($km$)')
    ax.set_ylabel('Frequency')
    ax.set_xlim(0,50)

In [None]:
for i in range(n_ifgs):
    fig, ax = plt.subplots(figsize=[18, 5.5])
    img1 = ax.hist(rel_measure[i], bins=100)
    ax.set_title(f"Histogram of Relative Measurement \n Date range {ifgs_date[i,0].strftime('%Y%m%d')}-{ifgs_date[i,1].strftime('%Y%m%d')}")
    ax.set_xlabel(r'Relative Measurement ($mm$)')
    ax.set_ylabel('Frequency')

In [None]:
dist_th = np.linspace(0,50,100)
rqmt = 3*(1+np.sqrt(dist_th))
for i in range(n_ifgs):
    fig, ax = plt.subplots(figsize=[18, 5.5])
    ax.plot(dist_th, rqmt, 'r')
    ax.scatter(dist[i], rel_measure[i], s=1, alpha=0.25)
    ax.set_title(f"Comparison between Relative Measurement and Requirement Curve \n Date range {ifgs_date[i,0].strftime('%Y%m%d')}-{ifgs_date[i,1].strftime('%Y%m%d')}")
    ax.set_ylabel(r'Relative Measurement ($mm$)')
    ax.set_xlabel('Distance (km)')
    ax.set_xlim(0,50)

Save the data used for approach 2:
- `dist`: distances of the pixel pairs,
- `rel_measure`: relative measurements of the pixel pairs,
- `ifgs_date`: list of date pairs of the two SAR images that form each interferogram.

In [None]:
with open(work_dir/'approach2.pkl','wb') as f:
    pickle.dump((dist,rel_measure,ifgs_date),f)

The validation is implemented in `Transient_approach2.1.ipynb` and `Transient_approach2.2.ipynb`.

`Transient_approach2.1.ipynb` computes the percentage of the selected pixel pairs that meet the mission requirement.
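
A minimal sketch of such a count is given below, assuming the requirement curve $3(1+L^{1/2})$ mm with $L$ in kilometers and the length-scale range 0.1 km < L < 50 km stated in the requirement. The function name `fraction_meeting_requirement` is hypothetical, and the actual notebook may bin the pairs by distance or apply further criteria.

```python
import numpy as np

def fraction_meeting_requirement(distance_km, rel_measure_mm,
                                 l_min=0.1, l_max=50.0):
    """Fraction of pairs with |difference| <= 3*(1+sqrt(L)) mm.

    Only pairs whose separation L lies strictly between l_min and l_max
    (in km) are counted. Returns NaN if no pair falls in that range.
    """
    distance_km = np.asarray(distance_km, dtype=float)
    rel_measure_mm = np.asarray(rel_measure_mm, dtype=float)
    in_range = (distance_km > l_min) & (distance_km < l_max)
    threshold = 3.0 * (1.0 + np.sqrt(distance_km[in_range]))
    passed = rel_measure_mm[in_range] <= threshold
    return float(passed.mean()) if passed.size else float("nan")
```

With the lists saved above, `fraction_meeting_requirement(dist[i], rel_measure[i])` would give the pass rate for the i-th interferogram.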

`Transient_approach2.2.ipynb` models the noise as a normally distributed random variable and estimates the standard deviation of this distribution together with its lower confidence bound.
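
One standard way to obtain such a bound is sketched here under stated assumptions; it is not necessarily the procedure used in the notebook. Suppose the signed pair differences $d_i$, $i = 1, \dots, n$, within a distance bin are independent draws from a zero-mean normal distribution with standard deviation $\sigma$. Then

$$
\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} d_i^2, \qquad
\frac{n\,\hat{\sigma}^2}{\sigma^2} \sim \chi^2_n, \qquad
\sigma_{\mathrm{lower}} = \sqrt{\frac{n\,\hat{\sigma}^2}{\chi^2_{n,\,1-\alpha}}},
$$

where $\chi^2_{n,\,1-\alpha}$ is the $(1-\alpha)$ quantile of the chi-square distribution with $n$ degrees of freedom, so that $\sigma \ge \sigma_{\mathrm{lower}}$ holds with confidence $1-\alpha$. Only $d_i^2$ enters the estimate, so the absolute values stored in `rel_measure` are sufficient.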

## Appendix: GPS Position Plots

In [None]:
gnss_time_series[1]

In [None]:
for i in range(len(gnss_time_series)):
    print(ifgs_date[i,0].strftime('%Y%m%d')+'-'+ifgs_date[i,1].strftime('%Y%m%d'))
    for stn in gnss_time_series[i].columns:
        series = gnss_time_series[i][stn]
        plt.figure(figsize=(15,5))
        plt.title(f"station name: {stn}")
        plt.scatter(pd.date_range(start=ifgs_date[i,0], end=ifgs_date[i,1]),series)
        plt.xlabel('Time')
        plt.ylabel('Displacement (mm)')
        plt.show()