# **RPNet Tutorial**
## 1. Installation
The dependencies required to run **RPNet** can be installed with **pip**.<br>
We recommend running the program in a separate Anaconda virtual environment with Python 3.9.<br>

**Note**: To use a GPU, you must install the **CUDA** library. RPNet was developed with CUDA version 11.1.74.

**In terminal:**<br>
conda create -n rpnet python=3.9<br>
conda activate rpnet<br>
pip install rpnet<br>

RPNet supports multiprocessing-based preprocessing using the **parmap** module.<br>


Before using RPNet, download the pre-trained model from the link below and place it in the "./model" directory.<br>
[click here for pre-trained models](https://drive.google.com/drive/folders/1VlhPiLEx6XKBkmLdkc9RJ6fFTcSD0-0B?usp=sharing)
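
Before running the notebook, it can help to confirm that the download actually landed in the expected place. The check below is a minimal sketch: the helper name `list_model_entries` is hypothetical, and it only lists whatever is inside "./model" without assuming any particular model file name or format.

```python
from pathlib import Path

def list_model_entries(model_dir='./model'):
    """Return the sorted names of files and folders found in model_dir.

    An empty list means the directory is missing or still empty.
    """
    directory = Path(model_dir)
    if not directory.is_dir():
        return []
    return sorted(entry.name for entry in directory.iterdir())

entries = list_model_entries()
if entries:
    print('Found in ./model:', entries)
else:
    print('No pre-trained model found in ./model yet.')
```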

---
## 2. Example (polarity prediction)

Before running RPNet, <span style="color:red"> you need to configure the variables and options in the `hyperparams.py` file.</span><br>
For detailed explanations of each variable, please refer to the comments within the file.
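
The cells in this tutorial read the names listed below from `hyperparams.py`. The following is only a sketch of what such a file could look like: the variable names are the ones used in the cells, but every value (paths, column names, thresholds, counts) is an illustrative assumption, and the comments in the shipped `hyperparams.py` remain the authoritative reference.

```python
# Hypothetical hyperparams.py sketch.
# Names match the tutorial cells; ALL values are assumed placeholders.

# --- input / output locations (assumed paths) ---
event_catalog = './data/catalog.csv'    # event catalog (CSV)
phase_metadata = './data/phase.csv'     # phase picks (CSV)
sta_metadata = './data/station.csv'     # station list (CSV)
wf_dir = './data/waveforms'             # waveform root directory
out_dir = './output01'                  # removed and recreated on every run
model = './model'                       # assumed: location of the pre-trained model
ctrl0 = './control_file0.txt'           # default SKHASH control file

# --- column names (inferred from the sample tables, not confirmed) ---
fwfid = 'data_id'    # column that identifies the waveform folder of an event
ftime = 'jst'        # origin-time column in the event catalog
fptime = 'ptime'     # P-arrival column in the phase table

# --- run options (assumed values) ---
gpu_num = 0          # value written to CUDA_VISIBLE_DEVICES
cores = 4            # worker processes for parmap
batch_size = 128
iteration = 100      # number of repeated predictions; 0 disables std/mean filtering
add_sta = False      # include stations that have no pick
change2taup = False  # replace picks with TauP arrival times

# --- post-processing thresholds (assumed values) ---
std_threshold = 0.1  # predictions with a larger std are relabelled 'K'
mean_threshold = 0.9 # predictions with a lower mean probability are dropped
rm_unknwon = True    # spelled as in the tutorial cell; drops 'K' predictions
```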

### 2.1 Load modules

In [13]:
"""
# RPNet (v.0.0.1)
https://github.com/jongwon-han/RPNet

RPNet: Robust P-wave first-motion polarity determination using deep learning (Han et al., 2025; SRL)
doi: https://doi.org/10.1785/0220240384

Example script to run the sample Hi-net dataset

- Jongwon Han (@KIGAM)
- jwhan@kigam.re.kr
- Last update: 2025. 2. 24.
"""

#########################################################################################################

import h5py
import pandas as pd
import numpy as np
import tensorflow as tf
import parmap
from keras_self_attention import SeqSelfAttention
import matplotlib.pyplot as plt
import tqdm
import matplotlib.ticker as ticker
from mpl_toolkits.axes_grid1 import make_axes_locatable
import os
import subprocess
import shutil
from obspy import Stream, Trace
from obspy import UTCDateTime
from sklearn.model_selection import train_test_split
import plotly.figure_factory as ff
import matplotlib
import glob  # used by the add_sta cell below
import fnmatch
import time
from rpnet import *
from hyperparams import *

#########################################################################################################

---
### 2.2 Preparation for the prediction

In [14]:
# set gpu number
os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_num)

stime=time.time()

# make output directory / if exist remove it
if os.path.exists(out_dir):
    shutil.rmtree(out_dir)
os.makedirs(out_dir)

# load raw data (catalog, phase, station files)
cat_df=pd.read_csv(event_catalog)
pha_df=pd.read_csv(phase_metadata)
sta_df=pd.read_csv(sta_metadata).sort_values(['sta']).reset_index(drop=True)
sta_df['sta0']=sta_df['sta'] # for reset station code of Hi-net dataset

#### Earthquake catalog

In [15]:
cat_df

|  | event_id | data_id | jst | lat | lon | dep | mag | tmag | mag2 | tmag2 | strike | dip | slip | dist |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | J2016042016013028 | D20160420001172 | 2016-04-20T16:01:30.280000Z | 32.837667 | 130.799333 | 15.23 | 4.0 | D | 4.1 | W | 267 | 35 | -91 | 9.34 |
| 1 | J2016041507465204 | D20160415000600 | 2016-04-15T07:46:52.040000Z | 32.73 | 130.797 | 10.52 | 4.4 | D | 4.4 | V | 20 | 69 | 172 | 8.54 |


#### Phase information

In [16]:
pha_df

|  | event_id | data_id | sta | ptime | stime | pol |
|---|---|---|---|---|---|---|
| 0 | J2016041507465204 | D20160415000600 | N.TYNH | 2016-04-15T07:46:54.740000Z | 2016-04-15T07:46:56.650000Z | U |
| 1 | J2016041507465204 | D20160415000600 | KU.KMP | 2016-04-15T07:46:55.820000Z | 2016-04-15T07:46:58.470000Z | U |
| 2 | J2016041507465204 | D20160415000600 | KUIZU3 | 2016-04-15T07:46:56.080000Z | 2016-04-15T07:46:59.080000Z | U |
| 3 | J2016041507465204 | D20160415000600 | KUIZU3 | 2016-04-15T07:46:56.080000Z |  | K |
| 4 | J2016041507465204 | D20160415000600 | N.YABH | 2016-04-15T07:46:56.410000Z | 2016-04-15T07:46:59.620000Z | U |
| ... | ... | ... | ... | ... | ... | ... |
| 124 | J2016042016013028 | D20160420001172 | G.SIBI | 2016-04-20T16:01:48.060000Z | 2016-04-20T16:02:01.300000Z | U |
| 125 | J2016042016013028 | D20160420001172 | TAKAZA | 2016-04-20T16:01:48.690000Z | 2016-04-20T16:02:01.250000Z | U |
| 126 | J2016042016013028 | D20160420001172 | TAKAZA | 2016-04-20T16:01:48.690000Z |  | K |
| 127 | J2016042016013028 | D20160420001172 | IKI | 2016-04-20T16:01:53.980000Z |  | K |


#### Station locations

In [17]:
sta_df

|  | net | sta | chan | lat | lon | elv | sta0 |
|---|---|---|---|---|---|---|---|
| 0 |  | AGUNI | U | 26.5927 | 127.2403 | 12.0 | AGUNI |
| 1 |  | AIDA | U | 34.9435 | 134.1653 | 170.0 | AIDA |
| 2 |  | AIOI | U | 33.7957 | 134.4488 | 165.0 | AIOI |
| 3 |  | AKAIKE | U | 33.7153 | 130.7928 | 130.0 | AKAIKE |
| 4 |  | AKKESH | U | 42.9987 | 144.6925 | 20.0 | AKKESH |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 257 |  | YONAGK | U | 24.4511 | 122.9452 | 15.0 | YONAGK |
| 258 |  | YONAGU | U | 24.4672 | 123.0113 | 32.0 | YONAGU |
| 259 |  | YORONJ | U | 27.0246 | 128.4504 | 26.0 | YORONJ |
| 260 |  | YTOYOT | U | 34.2658 | 131.0622 | 120.0 | YTOYOT |


---
Use the "add_sta" option to include data from stations that have no picking information; the P arrival times for these stations are estimated using TauP.

In [18]:
# add empty pick stations
if add_sta:
    z_files=sorted(glob.glob(wf_dir+'/*/*Z'))
    def get_add(z):
        id=z.split('/')[-2]
        sta=z.split('/')[-1].split('.')[0]
        if len(pha_df[(pha_df[fwfid]==id)&(pha_df['sta']==sta)])!=0:
            return pd.DataFrame()
        if not id in cat_df[fwfid].to_list():
            return pd.DataFrame()
        return pd.DataFrame({'pick':[id],'sta':[sta],'time':[np.nan],'pha':['P']})
    print('# get list of additional stations')
    results=parmap.map(get_add,z_files, pm_pbar=True, pm_processes=cores,pm_chunksize=1)
    pha_df=pd.concat([pha_df]+results)
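
`get_add` derives the event ID and the station code from each vertical-component file path. The parsing step on its own can be sketched as follows, assuming the waveform files are laid out as `<wf_dir>/<event id>/<station>.<suffix ending in Z>`. The helper name `parse_waveform_path` and the example file name are hypothetical.

```python
from pathlib import PurePosixPath

def parse_waveform_path(path):
    """Return (event_id, station) for a path '<wf_dir>/<event_id>/<station>.<...>Z'."""
    p = PurePosixPath(path)
    event_id = p.parent.name         # equivalent to path.split('/')[-2]
    station = p.name.split('.')[0]   # equivalent to path.split('/')[-1].split('.')[0]
    return event_id, station

# Hypothetical file name, used only to illustrate the rule.
event_id, station = parse_waveform_path('./waveforms/D20160420001172/TAKAZA.HHZ')
print(event_id, station)   # D20160420001172 TAKAZA
```

Because the station code is taken as everything before the first dot in the file name, a file named after a dotted code such as `N.TYNH` would yield `N` under this rule.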

In [19]:
# Add station metadata to phase df
print('\n# Arrange metadata')
pha_df=pha_df[pha_df['sta'].isin(sta_df['sta'].to_list())].reset_index(drop=True)
pha_df['lat']=[sta_df[sta_df.sta==i]['lat'].iloc[0] for i in pha_df['sta'].to_list()]
pha_df['lon']=[sta_df[sta_df.sta==i]['lon'].iloc[0] for i in pha_df['sta'].to_list()]
pha_df['elv']=[sta_df[sta_df.sta==i]['elv'].iloc[0] for i in pha_df['sta'].to_list()]
# pha_df['net']=[sta_df[sta_df.sta==i]['net'].iloc[0] for i in pha_df['sta'].to_list()]
# pha_df['chan']=[sta_df[sta_df.sta==i]['chan'].iloc[0] for i in pha_df['sta'].to_list()]
sta_df['net']='HI' # Renaming, just for consistency

# make UTCDateTime objects
cat_df[ftime]=[UTCDateTime(i) for i in cat_df[ftime].to_list()]
pha_df[fptime]=[UTCDateTime(i) for i in pha_df[fptime].to_list()]
print('- Done')


# Arrange metadata
- Done
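
Converting the time strings into time objects makes arithmetic such as travel-time calculation straightforward. The notebook uses ObsPy's `UTCDateTime` for this; the sketch below uses the standard-library `datetime` module as a stand-in so that it runs without ObsPy. The origin time and the picks for station `N.TYNH` are taken from the sample tables above, and the helper name `parse_time` is hypothetical.

```python
from datetime import datetime

TIME_FORMAT = '%Y-%m-%dT%H:%M:%S.%fZ'   # layout of the strings in the sample tables

def parse_time(text):
    """Parse a time string such as '2016-04-15T07:46:52.040000Z'."""
    return datetime.strptime(text, TIME_FORMAT)

origin = parse_time('2016-04-15T07:46:52.040000Z')   # event J2016041507465204
p_pick = parse_time('2016-04-15T07:46:54.740000Z')   # N.TYNH, P arrival
s_pick = parse_time('2016-04-15T07:46:56.650000Z')   # N.TYNH, S arrival

p_travel_time = (p_pick - origin).total_seconds()
s_minus_p = (s_pick - p_pick).total_seconds()
print(f'P travel time: {p_travel_time:.2f} s, S-P time: {s_minus_p:.2f} s')
# P travel time: 2.70 s, S-P time: 1.91 s
```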


---
Use the 'change2taup' option when you want to estimate the P arrival time using TauP.

In [20]:
# Change to TauP P arrival times (OPTION; considering pick uncertainty)
if change2taup:
    print('\n\n# change to TauP arrival')
    pha_df['ptime0']=pha_df[fptime]
    results=parmap.map(change2taup,[[idx,val,cat_df[cat_df[fwfid]==val[fwfid]].iloc[0],ftime] for idx,val in pha_df.iterrows()]
                       , pm_pbar=True, pm_processes=cores,pm_chunksize=1)
    pha_df[fptime]=results
    print('- Done')

---
The input data (numpy matrix) is generated in 4-second segments from continuous waveform data (SAC or MSEED).

In [21]:
# Make input data matrix
print('\n\n# Make input matrix from waveform data')
results=parmap.map(wf2matrix,[[idx,val,fwfid,fptime,wf_dir,out_dir] for idx, val in pha_df.iterrows()], pm_pbar=True, pm_processes=cores,pm_chunksize=1)
results=[i for i in results if i is not None]
a,b=zip(*results)
pha_df=pha_df.iloc[list(a)].reset_index(drop=True)
in_mat=np.vstack(b)
print('% calculation time (min): ','%.2f'%((time.time()-stime)/60))
print('- Done')



# Make input matrix from waveform data


100%|██████████| 106/106 [00:00<00:00, 1068.47it/s]

% calculation time (min):  0.16
- Done
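
The 4-second segment cut around each P arrival can be expressed in sample indices. The sketch below is not RPNet's `wf2matrix`: it assumes a 100 Hz sampling rate and a window centered on the pick, both of which are illustrative assumptions, and the helper name `window_indices` is hypothetical. It returns `None` when the window does not fit, mirroring how the cell above discards `None` results.

```python
def window_indices(pick_offset_s, sampling_rate=100.0, window_s=4.0):
    """Return (start, end) sample indices of a window centered on the pick.

    pick_offset_s is the pick time in seconds after the first sample.
    Returns None when the window would start before the trace begins.
    """
    n_samples = int(round(window_s * sampling_rate))
    center = int(round(pick_offset_s * sampling_rate))
    start = center - n_samples // 2
    if start < 0:
        return None
    return start, start + n_samples

print(window_indices(10.0))   # (800, 1200)
print(window_indices(1.0))    # None
```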





---
### 2.3 Polarity prediction (main)

In [22]:
# RPNet main prediction
print('\n\n# Predict polarity (RPNet)')
r_df=pred_rpnet(model,in_mat,pha_df,batch_size=batch_size,iteration=iteration,gpu_num=gpu_num,time_shift=0.0,mid_point=250)
print('% calculation time (min): ','%.2f'%((time.time()-stime)/60))
r_df.to_csv(out_dir+'/pol_result.csv',index=False)
print('- Done')



# Predict polarity (RPNet)
# iterate prediction


100%|██████████| 100/100 [00:08<00:00, 11.32it/s]

% calculation time (min):  0.33
- Done





---
### 2.4 From RPNet's polarity result to SKHASH input format

In [23]:
# Renaming station code (only for Hi-net)
sta_df['net']='HI'
sta_df['chan']=sta_df['chan'].replace('U','HHZ')
sta_df['sta']=['S'+str(i+1).rjust(3,'0') for i in range(len(sta_df))]
cat_df=cat_df
pha_df=pha_df[pha_df['data_id'].isin(cat_df['data_id'].to_list())].reset_index(drop=True)
r_df=pd.read_csv(out_dir+'/pol_result.csv')
r_df=r_df[r_df['data_id'].isin(cat_df['data_id'].to_list())].reset_index(drop=True)


# Make SKHASH input setting
if iteration!=0:
    r_df.loc[r_df['std'] > std_threshold, 'predict'] = 'K'
if rm_unknwon:
    r_df=r_df[r_df['predict']!='K'].reset_index(drop=True)
# make threshold for mean
if iteration!=0 and mean_threshold!=0:
    r_df=r_df[r_df['prob']>=mean_threshold].reset_index(drop=True)

r_df=r_df.drop_duplicates(['sta',fwfid]).reset_index(drop=True)
prep_skhash(cat_df=cat_df,pol_df=r_df,sta_df=sta_df,ftime=ftime,fwfid=fwfid,ctrl0=ctrl0,out_dir=out_dir)
print('% calculation time (min): ','%.2f'%((time.time()-stime)/60))
print('\n\n@ ALL DONE!')

% calculation time (min):  0.37


@ ALL DONE!
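
The filtering applied before `prep_skhash` amounts to three rules: relabel predictions with a high standard deviation as unknown (`K`), optionally drop unknown predictions, and optionally drop predictions with a low mean probability. The sketch below restates those rules on plain dictionaries instead of a DataFrame. The function name `filter_predictions`, the sample rows, and the threshold values are illustrative assumptions.

```python
def filter_predictions(rows, iteration, std_threshold, mean_threshold, remove_unknown):
    """Apply the std / unknown / mean-probability rules to a list of row dicts."""
    kept = []
    for row in rows:
        row = dict(row)  # copy so the caller's data is left untouched
        # Rule 1: unstable predictions become unknown (only with repeated prediction).
        if iteration != 0 and row['std'] > std_threshold:
            row['predict'] = 'K'
        # Rule 2: optionally discard unknown polarities.
        if remove_unknown and row['predict'] == 'K':
            continue
        # Rule 3: optionally discard low mean probabilities.
        if iteration != 0 and mean_threshold != 0 and row['prob'] < mean_threshold:
            continue
        kept.append(row)
    return kept

rows = [
    {'sta': 'S001', 'predict': 'U', 'prob': 0.98, 'std': 0.01},
    {'sta': 'S002', 'predict': 'U', 'prob': 0.95, 'std': 0.30},  # std too high
    {'sta': 'S003', 'predict': 'U', 'prob': 0.60, 'std': 0.02},  # probability too low
]
kept = filter_predictions(rows, iteration=100, std_threshold=0.1,
                          mean_threshold=0.9, remove_unknown=True)
print([row['sta'] for row in kept])   # ['S001']
```

With `iteration=0` neither the standard-deviation rule nor the mean-probability rule is applied, so only the unknown-polarity rule can remove rows.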


---
## 3. Focal Mechanism Calculation (SKHASH)

RPNet does not provide the source code for **SKHASH** directly.<br>
Please refer to the link below to download and install SKHASH.<br>

[click here for SKHASH](https://code.usgs.gov/esc/SKHASH)

However, the RPNet conda environment is configured to support SKHASH, so you can run the SKHASH.py script in the same environment without setting up an additional virtual environment.

Before running RPNet, configure the default SKHASH control file **control_file0.txt** and set its path under **"ctrl0"** in hyperparams.py. This default file is copied to the RPNet result directory, where the settings needed to run SKHASH are generated automatically.

**Run in terminal:**<br>
python SKHASH.py ./output01/hash2/control_file.txt<br>

After running SKHASH, check the "./output01/hash2/OUT" directory for the focal mechanism results.
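
The same command can also be launched from Python, for example at the end of the notebook. This is a sketch under stated assumptions: `SKHASH.py` is taken to sit in the current working directory, the results are taken to be in "./output01", and the helper name `skhash_command` is hypothetical. The run is skipped when the script is not found.

```python
import subprocess
import sys
from pathlib import Path

def skhash_command(out_dir='./output01', skhash_script='SKHASH.py'):
    """Build the argument list for running SKHASH on RPNet's control file."""
    control_file = Path(out_dir) / 'hash2' / 'control_file.txt'
    return [sys.executable, skhash_script, str(control_file)]

cmd = skhash_command()
print('Command:', ' '.join(cmd[1:]))

# Only launch SKHASH when the script is actually present (assumed location).
if Path(cmd[1]).exists():
    subprocess.run(cmd, check=True)
    print('Results directory:', Path('./output01') / 'hash2' / 'OUT')
```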

The MSEED files in the output directory contain the trimmed 4-second waveform segments that were actually used by RPNet for polarity prediction.
