<img src="GEOS_Logo.png" width="700" />

# Step 6: <font color=blue>"correct_steps.ipynb"</font>
#### July 31, 2021  <font color=red>(still working)</font> 
##### Jeonghyeop Kim (jeonghyeop.kim@gmail.com)

> input file(s)  : **`time_vector.dat`, `steps.txt` ,`list_full.dat` , `list_extra.dat` & `edited_i`** \
> output file(s) : **`edited_i_corrected`** \
> module(s) used : **`ismember.py`**

0. This code is part of the GPS2FNL process.
1. A multi-year GNSS time series usually contains a few discontinuities (steps) caused by equipment maintenance.
2. The Nevada Geodetic Laboratory provides metadata listing two types of steps: equipment-related and earthquake-related.
> http://geodesy.unr.edu/NGLStationPages/steps.txt (download this file at the beginning of the master shell script)
3. This code reads the metadata and uses them to correct the listed equipment-related steps.
4. The step-correction algorithm was proposed by *Johnson et al., 2021 (Earth and Space Science)*.
5. This algorithm does NOT correct for coseismic signals.

In [1]:
# 1. import python modules
import numpy as np
import pandas as pd
from datetime import datetime
from ismember import ismember

## 1. Build a list of all stations (e.g., 907 stations for California)
> For some reason, two list files exist (as of 7/30/2021).
> Change this later, once the STEP 1 and STEP 2 codes are ready.

In [31]:
list1 = "list_full.dat"
df_list1=pd.read_csv(list1, header=None)

list2 = "list_extra.dat"
df_list2=pd.read_csv(list2, header=None)

frames=[df_list1,df_list2]
df_list=pd.concat(frames,ignore_index=True) # combine the two DataFrames into one

N_list = len(df_list) # number of rows (stations) in the combined DataFrame
print("the total number of stations for the analysis is %i" % N_list)
df_list.columns=['stID']

the total number of stations for the analysis is 907


## 2. Read 'time_vector.dat' & Define datenum
> Here **`datenum`** will be defined as **df_time_vector.index+1**. \
> This provides consecutive integers that are equivalent to all of the daily time steps within the analysis. \
> These integers serve as time flags, and they will be used in this code for regressions for functions of time. \
> For instance, 2 = 2006-01-02; 3 = 2006-01-03; 5658 = 2021-06-29; ....

In [42]:
inputfile = 'time_vector.dat'
df_full_time_vector = pd.read_csv(inputfile,header=None)
df_full_time_vector.columns=['date']
df_full_time_vector['datenum'] = df_full_time_vector.index + 1 #consecutive integers

earliest_time=int(df_full_time_vector.loc[0,'date']) # scalar lookup; int() on a one-element Series is deprecated in pandas
lastest_time=int(df_full_time_vector.iloc[-1,0]) # last date in time_vector.dat
print("steps before %i and after %i will be ignored" % (earliest_time,lastest_time))


full_date_list=df_full_time_vector['date'].tolist() # a list
full_date_df=df_full_time_vector['date'] # a Series (one column of the DataFrame)
full_datenum_list=df_full_time_vector['datenum'].tolist() # a list
full_datenum_df=df_full_time_vector['datenum'] # a Series

steps before 20060101 and after 20210630 will be ignored


## 3. Read the metadata and separate them into
>(1) equipment-related steps : `df_steps_man_made_interest` \
>(2) coseismic steps : `df_steps_earthquakes_interest` \
>This algorithm only deals with **steps within the analysis time** defined in `time_vector.dat`

**`This code also finds datenum of steps!`**
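
The two conversions described above (yyMMMdd to YYYYMMDD, then date to datenum) can be checked on a toy example. This is only a sketch under assumptions: the five-day `time_vector` and the `step_dates` values are made up, and the lookup uses `Series.map` from pandas instead of the project's own `ismember.py`, whose exact behavior is not shown in this notebook.

```python
import pandas as pd

# Toy stand-in for time_vector.dat: five consecutive days (made-up values).
time_vector = pd.DataFrame({"date": [20060101, 20060102, 20060103, 20060104, 20060105]})
time_vector["datenum"] = time_vector.index + 1  # consecutive integers, starting at 1

# Toy step dates in the yyMMMdd format used by steps.txt (made-up values).
step_dates = ["06JAN02", "06JAN04", "06JAN04"]

# yyMMMdd -> YYYYMMDD integers.
step_yyyymmdd = pd.Series(
    pd.to_datetime(step_dates, format="%y%b%d").strftime("%Y%m%d")
).astype(int)

# date -> datenum lookup table, indexed by date.
date_to_datenum = time_vector.set_index("date")["datenum"]

# Map every step date to its datenum (dates outside the table would become NaN).
step_datenum = step_yyyymmdd.map(date_to_datenum)

print(step_yyyymmdd.tolist())
print(step_datenum.tolist())
```

Note that `%y` resolves two-digit years 69 to 99 as 1969 to 1999 and 00 to 68 as 2000 to 2068, so steps from the 1990s are still parsed into the correct century.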

In [43]:
metadata = "steps.txt" #file name
df_metadata=pd.read_csv(metadata, header=None, names=list('0123456'), sep=r'(?:,|\s+)', \
                        comment='#', engine='python')
## steps.txt is irregular: the number of columns varies from line to line.
## names=list('0123456') pads the missing columns with NaN.

df_steps_man_made = df_metadata[df_metadata['2'] == 1]
df_steps_man_made = df_steps_man_made[['0', '1', '2', '3']].copy() # copy, so the assignments below do not act on a slice
df_steps_man_made.columns=['stID','time','flag','log'] # time is in yyMMMdd format

## date format conversion
date_old = df_steps_man_made.time.tolist() # column to list
date_new = pd.to_datetime(date_old, format='%y%b%d').strftime('%Y%m%d') # yyMMMdd to YYYYMMDD
df_steps_man_made.loc[:,'time'] = date_new # replace with the new dates in YYYYMMDD
df_steps_man_made['time']=df_steps_man_made['time'].astype(int) # str to int


df_steps_man_made_interest=df_steps_man_made.loc[(df_steps_man_made['time'] >= earliest_time) & (df_steps_man_made['time'] <= lastest_time)] 
df_steps_man_made_interest=df_steps_man_made_interest.reset_index(drop=True)

## ADD datenum to df_steps_man_made_interest

man_time_list=df_steps_man_made_interest.time.tolist()# To list 
man_time_index=ismember(man_time_list,full_date_list) # Find time index 
man_new_time_vector = df_full_time_vector.iloc[man_time_index] # Find values corresponding to the time index
man_new_time_vector = man_new_time_vector.reset_index(drop=True) # Reset index
    
df_steps_man_made_interest['datenum']=man_new_time_vector['datenum'] # add equivalent datenum 
    ## (datenum will be used to match with steps and will be used for inversions)
df_steps_man_made_interest=df_steps_man_made_interest[['stID','time','datenum','flag','log']]
    ## change column orders    
    
#----------------------------------------------------------------------------------------------#   

################################################################################################     
#######################           *v1.0.0.*       ##############################################
################################################################################################ 
###############  This algorithm does NOT correct co-seismic steps.  ############################ 
###############  But one can modify the code to correct such steps. ############################ 
###############  For now, the coseismic-step data are only saved    ############################ 
###############  in the DataFrame 'df_steps_earthquakes_interest'.  ############################ 
###############                                                     ############################ 
###############  You can make a step list made of both equipment-   ############################ 
###############  related and earthquakes, sort ascending in time,   ############################ 
###############  and then correct in the order of time later.       ############################ 
###############  Save step flag {1=man-made; 2=earthquake} together ############################ 
###############  because you may need two different ways to correct ############################ 
###############  steps depending on their types!                    ############################ 
################################################################################################ 
#######################         J.K. (yy-mm-dd)       ##########################################
################################################################################################ 

## for column names, see the readme file (http://geodesy.unr.edu/NGLStationPages/steps_readme.txt)
df_steps_earthquakes = df_metadata[df_metadata['2'] == 2].reset_index(drop=True)
df_steps_earthquakes.columns=['stID','time','flag','threshold','distance','mag','eventID'] 
## time is in yyMMMdd format
## date format conversion
date_old2 = df_steps_earthquakes.time.tolist() # A DataFrame to a list
date_new2 = pd.to_datetime(date_old2, format='%y%b%d').strftime('%Y%m%d') # convert date format
df_steps_earthquakes.loc[:,'time'] = date_new2 # replaces with the new date  in YYYYMMDD 
df_steps_earthquakes['time']=df_steps_earthquakes['time'].astype(int) #str to int

df_steps_earthquakes_interest=df_steps_earthquakes.loc[(df_steps_earthquakes['time'] >= earliest_time) & (df_steps_earthquakes['time'] <= lastest_time)] 
df_steps_earthquakes_interest=df_steps_earthquakes_interest.reset_index(drop=True)

## ADD datenum to df_steps_earthquakes_interest
EQ_time_list=df_steps_earthquakes_interest.time.tolist()# To list 
EQ_time_index=ismember(EQ_time_list,full_date_list) # Find time index 
EQ_new_time_vector = df_full_time_vector.iloc[EQ_time_index] # Find values corresponding to the time index
EQ_new_time_vector = EQ_new_time_vector.reset_index(drop=True) # Reset index
    
df_steps_earthquakes_interest['datenum']=EQ_new_time_vector['datenum'] # add equivalent datenum 
    ## (datenum will be used to match with steps and will be used for inversions)
df_steps_earthquakes_interest=df_steps_earthquakes_interest[['stID','time','datenum','flag','threshold','distance','mag','eventID']]
    ## change column orders



## 4. Correct steps! 
> (a) Read input data **`edited_i`** \
> (b) Find and add **`datenum`** for the time-axis of the input data \
> (c) Check if the target station has unwanted steps. 
>> if no, continue the for loop \
>> if yes, keep going 

> (d) Save datenum for all steps \
> (e) For loop j in range(steps) \
> (f) Corrections! \
> (g) PLOT or NOT?
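
The comments at the end of the cell below describe taking 14 days before and after each step and distinguishing four cases with a threshold of 10 points. The following is a minimal sketch of that window selection only; it does not implement the correction itself. The helper names `split_around_step` and `classify_case` are hypothetical, and it assumes that the 14 days are counted in `datenum` units and that the step day itself is excluded.

```python
import pandas as pd

WINDOW_DAYS = 14  # days taken on each side of a step (from the cell comments)
MIN_POINTS = 10   # minimum number of position estimates per side (from the cell comments)

def split_around_step(df, step_datenum, window=WINDOW_DAYS):
    """Return the rows within `window` days before and after a step day."""
    d = df["datenum"]
    before = df[(d >= step_datenum - window) & (d < step_datenum)]
    after = df[(d > step_datenum) & (d <= step_datenum + window)]
    return before, after

def classify_case(before, after, min_points=MIN_POINTS):
    """Return 'i', 'ii', 'iii' or 'iv', following the four cases listed in the cell."""
    enough_before = len(before) >= min_points
    enough_after = len(after) >= min_points
    if enough_before and enough_after:
        return "iii"
    if enough_after:
        return "i"   # too few points before, enough after
    if enough_before:
        return "ii"  # enough points before, too few after
    return "iv"      # too few points on both sides

# Toy series: data start only 5 days before a step at datenum 20 (made-up values).
toy = pd.DataFrame({"datenum": list(range(15, 41))})
toy["e"] = 0.0
before, after = split_around_step(toy, step_datenum=20)
case = classify_case(before, after)
print(len(before), len(after), case)
```

Because the windows are selected by `datenum` rather than by row position, gaps in the time series shorten a window instead of silently extending it further in time.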

In [320]:
## Correcting the steps!

for i in range(21,22): #range(N_list) later
    
## (a) Read input data 'edited_i'
    target_data="edited_"+str(i+1)
    df_GPS=pd.read_csv(target_data, header=None, sep=' ')
    df_GPS.columns=['time','lon','lat','e','n','z','se','sn','sz','corr_en','flag']
    
    station=str(df_list.loc[i,'stID'])
    SearchSt=station.strip()[:4] # 4-character station ID; does not rely on to_string() padding
    #SearchSt is the target station for corrections
    
## (b) Add datenum for the input data 'edited_i'    
    time_list=df_GPS.time.to_list() #to list    
    time_index=ismember(time_list,full_date_list) #find time index 
    
    #Check if everything is okay
    if len(time_index)-len(df_GPS) != 0:
        print("something is wrong")    
    new_time_vector =df_full_time_vector.iloc[time_index]
    
    new_time_vector=new_time_vector.reset_index() # reset index
    
    df_GPS['datenum']=new_time_vector['datenum'] # add equivalent datenum 
    #(datenum will be used to match with steps and will be used for inversions)
    df_GPS=df_GPS[['time','datenum','lon','lat','e','n','z','se','sn','sz','corr_en','flag']]
    #change column orders
    
    
    # Unit [m] to [mm]
    df_GPS.e = df_GPS.e*1000
    df_GPS.n = df_GPS.n*1000
    df_GPS.z = df_GPS.z*1000
    df_GPS.se = df_GPS.se*1000
    df_GPS.sn = df_GPS.sn*1000
    df_GPS.sz = df_GPS.sz*1000
    
    
    
## (c) Check whether the target station has unwanted step(s) 
## >> if no, continue the for loop
## >> if yes, keep going 

    itemindex = np.where(df_steps_man_made_interest['stID']==SearchSt) # similar to find() in MATLAB
    HowManySteps=itemindex[0].size  # the number of steps
    
    if HowManySteps==0:        
        continue  
    else: 
        event_idx = itemindex[0]
## (d) Save datenum for all steps 

        all_datenum = df_steps_man_made_interest.datenum
        event_datenum = all_datenum.iloc[event_idx]
        event_datenum = pd.unique(event_datenum) 
        # More than one maintenance job can be done in a day.
        # In that case, steps.txt lists each job as a separate log entry,
        # but all of those entries carry the same date.
        # Here, the code removes the duplicates.
        
## >> delete position estimates on the step days, if any exist.        
        
        data_datenum = df_GPS.datenum
        
        IDX_event_data_overlap=ismember(event_datenum,data_datenum) # find position estimates on the step days
        cleanedList = [x for x in IDX_event_data_overlap if x == x] # drop float('NaN') entries (NaN != NaN)
        if len(cleanedList)!=0:
            df_GPS.loc[cleanedList]=np.nan
            # NOTE: NaN is a float, so blanking whole rows upcasts the integer
            # columns (e.g., time, datenum) to float64; they then print as ****.0
            print("why this makes other values ****.0 ???")
        
        
        
        N_events = len(event_datenum) #without the overlaps
        
## (e) For loop j in range(N_events)
        for j in range(N_events):
            step_standards=event_datenum[j]





## (f) Obtain before_step and after_step (14 days for each)
## > 4 cases. 
## >> (  i) len(before_step) <  10 and len(after_step) >= 10
## >> ( ii) len(before_step) >= 10 and len(after_step) <  10
## >> (iii) len(before_step) >= 10 and len(after_step) >= 10
## >> ( iv) len(before_step) <  10 and len(after_step) <  10


## (g) Plot the time series and check whether the correction works properly. 
## > Add a flag to switch plotting on and off 
## >> Plotting, e.g., 100 time series would make the code very slow 


why this makes other values ****.0 ???


In [322]:
#df_GPS



In [323]:
print("why this makes other values ****.0 ???")

why this makes other values ****.0 ???
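
A likely answer to the question printed above: `NaN` is a floating-point value, so a column that receives `NaN` can no longer stay `int64` and is upcast to `float64`, which is why integers such as the dates then print with a trailing `.0`. The toy example below is a sketch with made-up values, not the notebook's data. It reproduces the effect with `Series.mask` and shows two possible ways around it: dropping the step-day rows, or blanking only the measurement columns.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df_GPS (made-up values).
toy = pd.DataFrame({
    "time": [20060101, 20060102, 20060103],
    "datenum": [1, 2, 3],
    "e": [1.5, 2.5, 3.5],
})
step_rows = [1]                      # rows that coincide with a step day
is_step = toy.index.isin(step_rows)  # boolean array, True on step days

# Blanking whole rows: every column now holds a NaN, so the integer columns become float64.
blanked = toy.apply(lambda col: col.mask(is_step))
print(blanked.dtypes)

# Option A: drop the rows instead; the integer columns keep their dtype.
dropped = toy.drop(index=step_rows)
print(dropped.dtypes)

# Option B: blank only the (already floating-point) measurement columns.
partial = toy.copy()
partial.loc[step_rows, ["e"]] = np.nan
print(partial.dtypes)
```

Option B keeps `time` and `datenum` intact for the step days, which may be convenient if those rows are still needed as time flags later in the loop.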
