<img src="GEOS_Logo.png" width="700" />

# Step 6: <font color=blue>"correct_steps.ipynb"</font>
#### July 30, 2021  <font color=red>(still working)</font> 
##### Jeonghyeop Kim (jeonghyeop.kim@gmail.com)

> input file(s): **`time_vector.dat`, `steps.txt`, `list_full.dat`, `list_extra.dat` & `edited_i`** \
> output file(s): **`edited_i_corrected`**

0. This code is part of the GPS2FNL process.
1. A multi-year GNSS time series usually contains a few discontinuities (steps) related to equipment maintenance.
2. The Nevada Geodetic Lab provides metadata that list two types of steps: equipment-related steps and earthquake-related (coseismic) steps.
> http://geodesy.unr.edu/NGLStationPages/steps.txt (download this file at the beginning of the master shell script)
3. This code reads the metadata and uses them to correct the listed equipment-related steps.
4. The step correction algorithm was proposed by *Johnson et al., 2021 (Earth and Space Science)*.
5. This algorithm does NOT correct for coseismic signals.
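
To make concrete what "correcting a step" means, here is a minimal sketch on synthetic data. It is NOT the Johnson et al. (2021) algorithm; it is a generic median-difference illustration, and the window length, noise level, and variable names are all assumptions chosen for the demo. It also ignores any trend inside the window, which a real correction would need to account for.

```python
import numpy as np

# Synthetic daily series with a small trend, noise, and one artificial step.
rng = np.random.default_rng(0)
t = np.arange(1, 201)              # day numbers, 1-based like 'datenum'
step_at = 100                      # hypothetical day of an equipment change
true_offset = 5.0                  # hypothetical step size (e.g. mm)
series = 0.002 * t + rng.normal(0.0, 0.2, t.size)
series[t >= step_at] += true_offset

# Estimate the offset as the difference of medians in short windows
# just before and just after the step (window length is an assumption).
window = 20
before = series[(t >= step_at - window) & (t < step_at)]
after = series[(t >= step_at) & (t < step_at + window)]
estimated_offset = np.median(after) - np.median(before)

# Remove the estimated offset from everything at or after the step.
corrected = series.copy()
corrected[t >= step_at] -= estimated_offset

print(round(float(estimated_offset), 2))
```

Medians are used here only because they are less sensitive to outliers than means; the published algorithm may estimate offsets quite differently.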

In [1]:
# 1. import python modules
import numpy as np
import pandas as pd
from datetime import datetime

In [16]:
## BUILD a list of stations (907 stations for California)
## > For some reason, two list files exist (7/30/2021)
## > Change this later, once the STEP 1 and STEP 2 codes are ready

list1 = "list_full.dat"
df_list1=pd.read_csv(list1, header=None)

list2 = "list_extra.dat"
df_list2=pd.read_csv(list2, header=None)

frames=[df_list1,df_list2]
df_list=pd.concat(frames,ignore_index=True) # combine the two DataFrames into one
N_list = len(df_list) # number of stations in the combined DataFrame
print(N_list)

907
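
Because the station list is built from two files, it may be worth checking that no station appears in both. The sketch below uses hypothetical four-character station IDs held in memory rather than the real `list_full.dat` and `list_extra.dat`; it only illustrates the concatenate-then-check pattern.

```python
import io
import pandas as pd

# Hypothetical stand-ins for the two list files (one station ID per line).
list_a = pd.read_csv(io.StringIO("AAAA\nBBBB\n"), header=None)
list_b = pd.read_csv(io.StringIO("CCCC\nAAAA\n"), header=None)

# Same pattern as above: stack both lists and renumber the index.
combined = pd.concat([list_a, list_b], ignore_index=True)

# Rows whose station ID occurs more than once across the two lists.
duplicates = combined[combined.duplicated(keep=False)]

# One row per station.
unique_stations = combined.drop_duplicates().reset_index(drop=True)

print(len(combined), len(unique_stations))  # 4 3
```

If `len(combined)` and `len(unique_stations)` differ, the count printed by the cell above includes repeated stations.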


In [17]:
## Read 'time_vector.dat' 
## > Here 'datenum' will be defined as df_time_vector.index+1.
## > This provides consecutive integers that are equivalent to all of the daily time steps within the analysis.
## > These integers serve as time flags, and they will be used in this code for regressions for functions of time.
## > For instance, 2 = 2006-01-02; 3 = 2006-01-03; 5658 = 2021-06-29; ....

inputfile = 'time_vector.dat'
df_time_vector = pd.read_csv(inputfile,header=None)
df_time_vector.columns=['date']
df_time_vector['datenum'] = df_time_vector.index + 1 #consecutive integers
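
As a small illustration of how `datenum` relates to the dates, the sketch below builds a five-day stand-in for `time_vector.dat` (assuming the file holds one `YYYYMMDD` integer per line with no missing days) and looks up the integer flag for a given date. The names `df_demo` and `date_to_datenum` are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for 'time_vector.dat': five consecutive days as YYYYMMDD integers.
dates = pd.date_range('2006-01-01', periods=5, freq='D').strftime('%Y%m%d').astype(int)
df_demo = pd.DataFrame({'date': dates})
df_demo['datenum'] = df_demo.index + 1  # consecutive integers starting at 1

# Lookup table from calendar date to its integer time flag.
date_to_datenum = df_demo.set_index('date')['datenum']

print(date_to_datenum.loc[20060103])  # 3
```

A lookup like this is one way to place a step date from the metadata on the same integer time axis used for the regressions.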


In [18]:

# first and last dates (YYYYMMDD integers) of the analysis window
# scalar access avoids calling int() on a one-element Series
earliest_time=int(df_time_vector['date'].iloc[0])
lastest_time=int(df_time_vector['date'].iloc[-1])

print("steps before %i and after %i will be ignored" % (earliest_time,lastest_time))

steps before 20060101 and after 20210630 will be ignored


In [25]:
## READ metadata and separate them into 
## >(1) equipment-related steps : 'df_steps_man_made_interest'
## >(2) coseismic steps : 'df_steps_earthquakes_interest'
## This algorithm only deals with steps within the analysis time defined in 'time_vector.dat'


metadata = "steps.txt" #file name
df_metadata=pd.read_csv(metadata, header=None, names=list('0123456'), sep=r'(?:,|\s+)', \
                        comment='#', engine='python')
    ## steps.txt is in an irregular shape
    ## 'names=list('0123456')' is to fill empty spots with NaN 


df_steps_man_made = df_metadata[df_metadata['2'] == 1]
df_steps_man_made = df_steps_man_made[['0', '1', '2', '3']].copy() # work on a copy, not a view of df_metadata
df_steps_man_made.columns=['stID','time','flag','log'] #time is in yyMMMdd format

    ## date format conversion
date_old = df_steps_man_made.time.tolist() # the 'time' column (a Series) to a list
date_new = pd.to_datetime(date_old, format='%y%b%d').strftime('%Y%m%d') # convert date format
df_steps_man_made['time'] = date_new.astype(int) # replace with the new date in YYYYMMDD, stored as int


    ## keep only the steps inside the analysis window, then renumber the rows
df_steps_man_made_interest=df_steps_man_made.loc[(df_steps_man_made['time'] >= earliest_time) & (df_steps_man_made['time'] <= lastest_time)] 
df_steps_man_made_interest=df_steps_man_made_interest.reset_index(drop=True) # reset_index returns a new DataFrame, so assign it



    ## for column names, see the readme file (http://geodesy.unr.edu/NGLStationPages/steps_readme.txt)
df_steps_earthquakes = df_metadata[df_metadata['2'] == 2].reset_index(drop=True)
df_steps_earthquakes.columns=['stID','time','flag','threshold','distance','mag','eventID'] 
    ## time is in yyMMMdd format
    ## date format conversion
date_old2 = df_steps_earthquakes.time.tolist() # the 'time' column (a Series) to a list
date_new2 = pd.to_datetime(date_old2, format='%y%b%d').strftime('%Y%m%d') # convert date format
df_steps_earthquakes['time'] = date_new2.astype(int) # replace with the new date in YYYYMMDD, stored as int

    ## keep only the steps inside the analysis window, then renumber the rows
df_steps_earthquakes_interest=df_steps_earthquakes.loc[(df_steps_earthquakes['time'] >= earliest_time) & (df_steps_earthquakes['time'] <= lastest_time)] 
df_steps_earthquakes_interest=df_steps_earthquakes_interest.reset_index(drop=True) # reset_index returns a new DataFrame, so assign it
df_steps_earthquakes_interest # display the result



|  | stID | time | flag | threshold | distance | mag | eventID |
|---|---|---|---|---|---|---|---|
| 0 | UMLH | 20060101 | 2 | 89.125 | 41.449 | 5.5 | usp000e7b6 |
| 1 | GUAM | 20060103 | 2 | 125.893 | 53.181 | 5.8 | usp000e7gy |
| 2 | GUUG | 20060103 | 2 | 125.893 | 69.244 | 5.8 | usp000e7gy |
| 3 | HER2 | 20060104 | 2 | 316.228 | 152.490 | 6.6 | usp000e7jp |
| 4 | SA27 | 20060104 | 2 | 316.228 | 152.131 | 6.6 | usp000e7jp |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 54223 | PFRJ | 20210502 | 2 | 125.893 | 59.420 | 5.8 | us7000dzfk |
| 54224 | PVCA | 20210502 | 2 | 125.893 | 14.556 | 5.8 | us7000dzfk |
| 54225 | SSIA | 20210512 | 2 | 141.254 | 125.035 | 5.9 | us7000e2c6 |
| 54226 | SSSV | 20210512 | 2 | 141.254 | 112.964 | 5.9 | us7000e2c6 |
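
The parsing, date conversion, and time-window filtering done in the cell above can be tried on a tiny in-memory sample. The sketch below assumes the row layout implied by that cell (four fields for equipment steps, seven for earthquake steps). The two `AAAA` rows and their log text are hypothetical; the earthquake rows reuse values from the output table with dates written back in `yyMMMdd` form. The helper `to_yyyymmdd` is also hypothetical.

```python
import io
import pandas as pd

# Sample in the assumed layout of steps.txt (equipment rows are made up).
sample = """# hypothetical sample
AAAA  03APR29  1  Antenna_Changed
AAAA  10MAR15  1  Receiver_Changed
UMLH  06JAN01  2  89.125  41.449  5.5  usp000e7b6
GUAM  06JAN03  2  125.893  53.181  5.8  usp000e7gy
"""

# Seven column names so that the short (four-field) rows are padded with NaN.
df = pd.read_csv(io.StringIO(sample), header=None, names=list('0123456'),
                 sep=r'(?:,|\s+)', comment='#', engine='python')

def to_yyyymmdd(series):
    """Convert yyMMMdd strings (e.g. '06JAN01') to YYYYMMDD integers."""
    return pd.to_datetime(series, format='%y%b%d').dt.strftime('%Y%m%d').astype(int)

# Equipment-related steps carry flag 1 in the third column.
equipment = df[df['2'] == 1][['0', '1', '2', '3']].copy()
equipment.columns = ['stID', 'time', 'flag', 'log']
equipment['time'] = to_yyyymmdd(equipment['time'])

# Keep only steps inside the analysis window (both ends inclusive).
earliest, latest = 20060101, 20210630
in_window = equipment[equipment['time'].between(earliest, latest)].reset_index(drop=True)

print(in_window)
```

With this sample, the 2003 equipment step falls before the window and is dropped, so only the 2010 step remains.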


In [None]:
## Correcting the steps!

## (a) READ input data 'edited_i'

for i in range(0,2):
    target_data="edited_"+str(i+1)
    df_GPS=pd.read_csv(target_data, header=None, sep=' ')
    df_GPS.columns=['time','lon','lat','e','n','z','se','sn','sz','corr_en','flag']
    time_list=df_GPS.time.to_list()