# **RPNet Tutorial**
The dependencies required to run **RPNet** can be installed with **pip**.<br>
We recommend running RPNet in a dedicated Anaconda virtual environment with Python 3.9.<br>

**Note**: To use a GPU, you must install the **CUDA** library. RPNet was developed with CUDA 11.1.74.

**In terminal:**

```shell
conda create -n rpnet python=3.9
conda activate rpnet
pip install rpnet
```

RPNet supports multiprocessing-based preprocessing using the **parmap** module.<br>


Before using RPNet, download the pre-trained model from the link below and place it in the "../model" directory.<br>
[click here for pre-trained models](https://drive.google.com/drive/folders/1VlhPiLEx6XKBkmLdkc9RJ6fFTcSD0-0B?usp=sharing)

---
## 2. Example 2 (polarity prediction & S/P amplitude ratio)

Example 2 uses data from two 2024 Buan earthquakes (South Korea), with magnitudes of 4.8 and 3.1.<br>
The example waveform data were recorded at permanent seismic stations of the Korea Meteorological Administration (KMA).<br>
The original event waveforms can be downloaded from NECIS (https://necis.kma.go.kr/).<br><br>
Before running RPNet, <span style="color:red">you need to configure the variables and options in the `hyperparams.py` file.</span><br>
For a detailed explanation of each variable, refer to the comments in that file.
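The cells below read their settings through `from hyperparams import *`. Because of this, a missing or misspelled variable only shows up as a `NameError` partway through the run. The following is a minimal sketch that checks a settings module for every name the Example 2 cells use. It is not part of RPNet, and the helper `missing_hyperparams` is hypothetical. The list of names was collected from the notebook code, on the assumption that all of them are defined in `hyperparams.py`. If some come from `rpnet` instead, trim the list.

```python
import types

# Names read by the Example 2 cells; ASSUMED (not confirmed) to live in hyperparams.py.
REQUIRED_NAMES = [
    "gpu_num", "out_dir", "event_catalog", "phase_metadata", "sta_metadata",
    "wf_dir", "fwfid", "fptime", "fstime", "ftime", "add_sta", "cores",
    "change2taup", "taup_model", "keep_initial_phase", "model", "batch_size",
    "iteration", "hash_version", "sp_freq", "sp_win", "std_threshold",
    "mean_threshold", "rm_unknwon", "ctrl0",
]


def missing_hyperparams(module: types.ModuleType) -> list:
    """Return the required names that the given settings module does not define."""
    return [name for name in REQUIRED_NAMES if not hasattr(module, name)]
```

Usage: after `import hyperparams`, calling `missing_hyperparams(hyperparams)` returns an empty list when every name is defined.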

### 2.1 Load modules

```python
"""
# RPNet (v.0.1.0)
https://github.com/jongwon-han/RPNet

RPNet: Robust P-wave first-motion polarity determination using deep learning (Han et al., 2025; SRL)
doi: https://doi.org/10.1785/0220240384

Example2 script to run the sample dataset (hash driver3; S/P ratio)
2024 M4.8 and M3.1 Buan Earthquake dataset (South Korea)
Original event waveform can be downloaded from the NECIS (https://necis.kma.go.kr/)

- Jongwon Han (@KIGAM)
- jwhan@kigam.re.kr
- Last update: 2025. 5. 15.
"""

#########################################################################################################

import h5py
import pandas as pd
import numpy as np
import tensorflow as tf
import parmap
from keras_self_attention import SeqSelfAttention
import matplotlib.pyplot as plt
import tqdm
import matplotlib.ticker as ticker
from mpl_toolkits.axes_grid1 import make_axes_locatable
import os
import subprocess
import shutil
from obspy import Stream, Trace
from obspy import UTCDateTime
from sklearn.model_selection import train_test_split
import plotly.figure_factory as ff
import matplotlib
import fnmatch
import time
from rpnet import *
from hyperparams import *
import glob
#########################################################################################################
```

---
### 2.2 Preparation for the prediction

```python
# select which GPU(s) TensorFlow may use
os.environ['CUDA_VISIBLE_DEVICES'] = gpu_num

stime=time.time()  # start time, used for the elapsed-time reports below

# create the output directory
# WARNING: if it already exists, it is deleted together with its contents
if os.path.exists(out_dir):
    shutil.rmtree(out_dir)
os.makedirs(out_dir)

# load raw data (event catalog, phase picks, station metadata)
cat_df=pd.read_csv(event_catalog)
pha_df=pd.read_csv(phase_metadata)
sta_df=pd.read_csv(sta_metadata).sort_values(['sta']).reset_index(drop=True)
sta_df['sta0']=sta_df['sta']
pha_df['source']='original'  # picks from the phase file (stations added later get 'add')
```

#### Earthquake catalog

```python
cat_df
```

| | utc | lat | lon | dep | mag | pick |
|---|---|---|---|---|---|---|
| 0 | 2024-06-11T23:26:49.830000Z | 35.6989 | 126.7222 | 10.2 | 4.8 | 2024T000000013 |
| 1 | 2024-06-12T04:55:43.040000Z | 35.7006 | 126.7248 | 9.23 | 3.1 | 2024T000000056 |


#### Phase information

```python
pha_df
```

| | pick | net | sta | chan | ptime | stime | source |
|---|---|---|---|---|---|---|---|
| 0 | 2024T000000013 | KS | GIJA | HG | 2024-06-11T23:26:53.595446Z | 2024-06-11T23:26:56.205087Z | original |
| 1 | 2024T000000013 | KS | JEU2 | EL | 2024-06-11T23:26:55.221369Z | | original |
| 2 | 2024T000000013 | KS | MND | HH | | 2024-06-11T23:26:58.938927Z | original |
| 3 | 2024T000000013 | KS | MSMB | HH | 2024-06-11T23:26:52.932921Z | 2024-06-11T23:26:55.043948Z | original |
| 4 | 2024T000000013 | KS | SENA | HG | 2024-06-11T23:26:53.338576Z | 2024-06-11T23:26:55.790000Z | original |
| 5 | 2024T000000013 | KS | SMWA | HG | 2024-06-11T23:26:54.376208Z | 2024-06-11T23:26:57.504626Z | original |
| 6 | 2024T000000056 | KS | GIJA | HG | 2024-06-12T04:55:46.714401Z | 2024-06-12T04:55:49.246721Z | original |
| 7 | 2024T000000056 | KS | JEU2 | EL | 2024-06-12T04:55:48.395369Z | | original |
| 8 | 2024T000000056 | KS | MND | HH | | 2024-06-12T04:55:52.139442Z | original |
| 9 | 2024T000000056 | KS | MSMB | HH | 2024-06-12T04:55:46.114629Z | 2024-06-12T04:55:48.176712Z | original |
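The `ptime` and `stime` columns hold absolute UTC arrival times, and an empty cell means that the phase was not picked. Subtracting the catalog origin time gives travel times. The notebook itself uses ObsPy's `UTCDateTime` for time arithmetic. The minimal standard-library sketch below computes the P travel time and the S-P time for station GIJA and the M4.8 event, using values copied from the two tables above. The helper `seconds_between` is hypothetical.

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%S.%fZ"  # format of the UTC strings in the CSV files


def seconds_between(t0: str, t1: str) -> float:
    """Return t1 - t0 in seconds for two UTC strings such as '2024-06-11T23:26:49.830000Z'."""
    return (datetime.strptime(t1, FMT) - datetime.strptime(t0, FMT)).total_seconds()


origin = "2024-06-11T23:26:49.830000Z"  # M4.8 origin time (catalog)
p_gija = "2024-06-11T23:26:53.595446Z"  # GIJA P pick
s_gija = "2024-06-11T23:26:56.205087Z"  # GIJA S pick

p_travel = seconds_between(origin, p_gija)
s_minus_p = seconds_between(p_gija, s_gija)
print(f"P travel time: {p_travel:.2f} s, S-P time: {s_minus_p:.2f} s")
# prints "P travel time: 3.77 s, S-P time: 2.61 s"
```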


#### Station locations

```python
sta_df
```

| | net | sta | start | end | lat | lon | elv | digitizer | vel | acc | chan | sta0 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | KS | ADOA | 2015-09-30 | | 36.572600 | 128.700900 | 162.0 | q330hrs | | ES-DH-A | HG | ADOA |
| 1 | KS | AGSA | 2018-03-09 | | 37.091700 | 127.808000 | 114.0 | q330sp | | ES-DH-A | HG | AGSA |
| 2 | KS | AMD | 2015-09-30 | | 35.343600 | 126.030000 | 79.0 | q330hrs | STS-2.5-A | ES-T-A | HH | AMD |
| 3 | KS | ANDB | 2022-11-29 | | 33.256580 | 126.328750 | 84.0 | Centaur | Trilium120PH-A | Titan-PH-A | HH | ANDB |
| 4 | KS | ANHA | 2022-11-29 | | 37.464696 | 128.155024 | 540.0 | Centaur | | Titan-PH-A | HG | ANHA |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 285 | KS | YOW2 | 2016-05-31 | | 37.181200 | 128.456900 | 287.0 | q330hrs | CMG-40T-1-B | ES-T-A | EL | YOW2 |
| 286 | KS | YPDB | 2015-09-30 | | 37.608000 | 125.710000 | 97.0 | q330hrs | CMG-3TB-A | ES-DH-A | HH | YPDB |
| 287 | KS | YSAB | 2016-05-31 | | 36.742000 | 126.815600 | 85.0 | q330hrs | STS-2.5-A | ES-DH-A | HH | YSAB |
| 288 | KS | YSDA | 2019-03-18 | | 33.986600 | 126.920500 | 57.0 | q330sp | | ES-DH-A | HG | YSDA |


---
Enable the `add_sta` option to also use stations that have waveform data but no pick information. Their P arrival times are then estimated using TauP.

```python
# add stations that have waveforms but no picks
if add_sta:
    # waveform files are expected as <wf_dir>/<event id>/<station>.<...>Z
    z_files=sorted(glob.glob(wf_dir+'/*/*Z'))
    def get_add(z):
        event_id=z.split('/')[-2]
        sta=z.split('/')[-1].split('.')[0]
        # skip stations that already have picks for this event
        if len(pha_df[(pha_df[fwfid]==event_id)&(pha_df['sta']==sta)])!=0:
            return pd.DataFrame()
        # skip waveforms of events that are not in the catalog
        if event_id not in cat_df[fwfid].to_list():
            return pd.DataFrame()
        # new row with empty P/S times (to be filled in later, e.g. by TauP)
        return pd.DataFrame({'pick':[event_id],'sta':[sta],fptime:[np.nan],fstime:[np.nan]})
    print('# get list of additional stations')
    results=parmap.map(get_add,z_files, pm_pbar=True, pm_processes=cores,pm_chunksize=1)
    pha_df0=pd.concat(results)
    pha_df0['source']='add'
    pha_df=pd.concat([pha_df,pha_df0]).reset_index(drop=True)

# keep only stations that appear in the phase table
sta_df=sta_df[sta_df['sta'].isin(pha_df['sta'].to_list())].reset_index(drop=True)
```

```
# get list of additional stations
100%|██████████| 579/579 [00:00<00:00, 14193.71it/s]
```


```python
# add station metadata (coordinates, network, channel) to the phase table
print('\n# Arrange metadata')
# keep only phases recorded at stations listed in the station file
pha_df=pha_df[pha_df['sta'].isin(sta_df['sta'].to_list())].reset_index(drop=True)
pha_df['lat']=[sta_df[sta_df.sta==i]['lat'].iloc[0] for i in pha_df['sta'].to_list()]
pha_df['lon']=[sta_df[sta_df.sta==i]['lon'].iloc[0] for i in pha_df['sta'].to_list()]
pha_df['elv']=[sta_df[sta_df.sta==i]['elv'].iloc[0] for i in pha_df['sta'].to_list()]
pha_df['net']=[sta_df[sta_df.sta==i]['net'].iloc[0] for i in pha_df['sta'].to_list()]
pha_df['chan']=[sta_df[sta_df.sta==i]['chan'].iloc[0] for i in pha_df['sta'].to_list()]
# remove duplicated phases and keep only events that are in the catalog
pha_df=pha_df.drop_duplicates(['pick','net','sta','chan',fptime,fstime])
pha_df=pha_df[pha_df['pick'].isin(cat_df.drop_duplicates(['pick'])['pick'].to_list())]

# convert catalog origin times to UTCDateTime objects
cat_df[ftime]=[UTCDateTime(i) for i in cat_df[ftime].to_list()]
```

```
# Arrange metadata
```


---
Enable the `change2taup` option to estimate the P and S arrival times using TauP.

```python
# Change to TauP P arrival times (OPTION; considering pick uncertainty)
if change2taup:
    print('\n\n# change to TauP arrival')
    pha_df['ptime0']=pha_df[fptime]  # keep the original P times
    results=parmap.map(est_taup,
                       [[idx,val,cat_df[cat_df[fwfid]==val[fwfid]].iloc[0],ftime,'P',taup_model,keep_initial_phase]
                        for idx,val in pha_df.iterrows()],
                       pm_pbar=True, pm_processes=cores, pm_chunksize=1)
    pha_df[fptime]=results
    print('- TauP (P) Done')

    print('# change to TauP S arrival')
    pha_df['stime0']=pha_df[fstime]  # keep the original S times
    results=parmap.map(est_taup,
                       [[idx,val,cat_df[cat_df[fwfid]==val[fwfid]].iloc[0],ftime,'S',taup_model,keep_initial_phase]
                        for idx,val in pha_df.iterrows()],
                       pm_pbar=True, pm_processes=cores, pm_chunksize=1)
    pha_df[fstime]=results
    print('- TauP (S) Done')

pha_df=pha_df.sort_values(['pick','net','sta']).reset_index(drop=True)
pha_df
```

```
# change to TauP arrival
100%|██████████| 579/579 [00:04<00:00, 144.51it/s]
- TauP (P) Done
# change to TauP S arrival
100%|██████████| 579/579 [00:04<00:00, 143.54it/s]
- TauP (S) Done
```

| | pick | net | sta | chan | ptime | stime | source | lat | lon | elv | ptime0 | stime0 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2024T000000013 | KS | ADOA | HG | 2024-06-11T23:27:21.173953Z | 2024-06-11T23:27:45.127986Z | add | 36.572600 | 128.700900 | 162.0 | | |
| 1 | 2024T000000013 | KS | AGSA | HG | 2024-06-11T23:27:18.740363Z | 2024-06-11T23:27:40.750944Z | add | 37.091700 | 127.808000 | 114.0 | | |
| 2 | 2024T000000013 | KS | AMD | HH | 2024-06-11T23:27:02.709708Z | 2024-06-11T23:27:12.063004Z | add | 35.343600 | 126.030000 | 79.0 | | |
| 3 | 2024T000000013 | KS | ANDB | HH | 2024-06-11T23:27:30.010491Z | 2024-06-11T23:28:01.020741Z | add | 33.256580 | 126.328750 | 84.0 | | |
| 4 | 2024T000000013 | KS | ANHA | HG | 2024-06-11T23:27:25.111029Z | 2024-06-11T23:27:52.209054Z | add | 37.464696 | 128.155024 | 540.0 | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 574 | 2024T000000056 | KS | YOW2 | EL | 2024-06-12T04:56:17.414092Z | 2024-06-12T04:56:43.770085Z | add | 37.181200 | 128.456900 | 287.0 | | |
| 575 | 2024T000000056 | KS | YPDB | HH | 2024-06-12T04:56:17.974470Z | 2024-06-12T04:56:44.777950Z | add | 37.608000 | 125.710000 | 97.0 | | |
| 576 | 2024T000000056 | KS | YSAB | HH | 2024-06-12T04:56:03.103001Z | 2024-06-12T04:56:17.672569Z | add | 36.742000 | 126.815600 | 85.0 | | |
| 577 | 2024T000000056 | KS | YSDA | HG | 2024-06-12T04:56:13.128773Z | 2024-06-12T04:56:36.062615Z | add | 33.986600 | 126.920500 | 57.0 | | |
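TauP predicts arrival times from the source depth and the epicentral distance between event and station, measured in degrees. The sketch below illustrates only the distance input. It is not necessarily how `est_taup` works internally, which may rely on ObsPy's TauP tools instead. The sketch computes the great-circle distance on a spherical Earth with the haversine formula, for the M4.8 event and station AMD, using coordinates from the tables above. The helper `epicentral_distance_deg` is hypothetical.

```python
import math


def epicentral_distance_deg(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in degrees between two points on a sphere (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # clamp to 1.0 to guard against rounding just above 1 for antipodal points
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, a))))


# M4.8 epicenter (catalog) and station AMD (station table)
dist = epicentral_distance_deg(35.6989, 126.7222, 35.3436, 126.0300)
# about 111.19 km per degree on a sphere of radius 6371 km
print(f"AMD epicentral distance: {dist:.3f} deg ({dist * 111.19:.1f} km)")
```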


---
The input data (a NumPy matrix) are built from 4-second segments cut from the continuous waveform data (SAC or MSEED).

```python
# Make input data matrix
print('\n\n# Make input matrix from waveform data')
results=parmap.map(wf2matrix,[[idx,val,fwfid,fptime,wf_dir,out_dir] for idx, val in pha_df.iterrows()], pm_pbar=True, pm_processes=cores,pm_chunksize=1)
# drop phases whose waveform could not be processed (wf2matrix returned None)
results=[i for i in results if i is not None]
# each result is a (row index, data) pair: keep the matching phase rows and stack the data
a,b=zip(*results)
pha_df=pha_df.iloc[list(a)].reset_index(drop=True)
in_mat=np.vstack(b)
print('% calculation time (min): ','%.2f'%((time.time()-stime)/60))
print('- Done')
```

```
# Make input matrix from waveform data
100%|██████████| 579/579 [00:00<00:00, 1136.46it/s]
% calculation time (min):  0.16
- Done
```
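Conceptually, each row of the input matrix is a short waveform window around the P arrival. The sketch below illustrates that idea with plain Python lists. It assumes a `win_sec`-second window centered on the P time and peak-amplitude normalization; both are illustrative choices, and `wf2matrix` may position, filter, or normalize the window differently. Returning `None` when the window runs off the trace mirrors how the cell above discards `None` results. `cut_window` is a hypothetical helper.

```python
def cut_window(samples, start_time, sampling_rate, p_time, win_sec=4.0):
    """Return a peak-normalized window of win_sec seconds centered on p_time.

    samples: list of amplitudes; start_time and p_time: seconds on the same time base;
    sampling_rate: samples per second. Returns None if the window leaves the trace.
    """
    n_win = int(round(win_sec * sampling_rate))
    p_index = int(round((p_time - start_time) * sampling_rate))
    first = p_index - n_win // 2
    if first < 0 or first + n_win > len(samples):
        return None  # not enough data around the pick
    window = samples[first:first + n_win]
    peak = max(abs(x) for x in window)
    if peak == 0:
        return [0.0] * n_win  # flat trace: nothing to normalize
    return [x / peak for x in window]


# synthetic 10 s trace at 100 Hz with an upward first motion at t = 5 s
fs = 100.0
trace = [0.0] * 500 + [1.0, 2.0, -1.0] + [0.0] * 497
seg = cut_window(trace, start_time=0.0, sampling_rate=fs, p_time=5.0)
print(len(seg), seg[200:203])  # 400 [0.5, 1.0, -0.5]
```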





---
### 2.3 Polarity prediction (main)

```python
# RPNet main prediction
print('\n\n# Predict polarity (RPNet)')
r_df=pred_rpnet(model,in_mat,pha_df,batch_size=batch_size,iteration=iteration,gpu_num=gpu_num,time_shift=0.0,mid_point=250)
print('% calculation time (min): ','%.2f'%((time.time()-stime)/60))
r_df.to_csv(out_dir+'/pol_result.csv',index=False)  # save the polarity results
print('- Done')
```

```
# Predict polarity (RPNet)
# iterate prediction
100%|██████████| 100/100 [00:26<00:00,  3.71it/s]
% calculation time (min):  0.61
- Done
```





---
### 2.4 Estimation of S/P amplitude ratio

```python
# prepare the S/P amplitude input (only needed for hash3)
r_df['sta0'] = r_df['sta']
if hash_version=='hash3':
    print('# Prep for amplitude ratio')
    picks=r_df.drop_duplicates([fwfid]).sort_values([fwfid])[fwfid].to_list()
    # measure amplitudes event by event (see sp_freq and sp_win in hyperparams.py)
    amps=parmap.map(prepare_amplitudes,[[r_df[r_df[fwfid]==p].reset_index(drop=True),p,p,wf_dir,sp_freq,sp_win] for p in picks],
                    pm_pbar=True, pm_processes=cores,pm_chunksize=1)
    amp=sum(amps,[])  # flatten the per-event lists into one list
else:
    amp=None
```

```
# Prep for amplitude ratio
100%|██████████| 2/2 [00:05<00:00,  2.94s/it]
```
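The S/P amplitude ratio compares the size of the S wave with that of the P wave at a station. HASH-type focal mechanism inversions use these ratios, usually as log10 values, in addition to first-motion polarities. The sketch below is a simplified single-trace version. It assumes the amplitude is the peak absolute value in a fixed window after each arrival. The actual `prepare_amplitudes` uses the `sp_freq` and `sp_win` settings and may filter, window, and combine components differently. `peak_amplitude` and `log_sp_ratio` are hypothetical helpers.

```python
import math


def peak_amplitude(samples, sampling_rate, start_time, t_arrival, win_sec):
    """Peak absolute amplitude in the window [t_arrival, t_arrival + win_sec)."""
    i0 = int(round((t_arrival - start_time) * sampling_rate))
    i1 = i0 + int(round(win_sec * sampling_rate))
    window = samples[max(i0, 0):max(i1, 0)]
    return max((abs(x) for x in window), default=0.0)


def log_sp_ratio(samples, sampling_rate, start_time, p_time, s_time, win_sec=1.0):
    """Return log10(S peak / P peak), or None if either amplitude is zero."""
    amp_p = peak_amplitude(samples, sampling_rate, start_time, p_time, win_sec)
    amp_s = peak_amplitude(samples, sampling_rate, start_time, s_time, win_sec)
    if amp_p == 0 or amp_s == 0:
        return None
    return math.log10(amp_s / amp_p)


# synthetic 6 s trace at 100 Hz: P peak of 0.5 at 2 s, S peak of -5.0 at 4 s
fs = 100.0
trace = [0.0] * 600
trace[200] = 0.5
trace[400] = -5.0
print(log_sp_ratio(trace, fs, 0.0, p_time=2.0, s_time=4.0))  # 1.0
```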


---
### 2.5 Convert RPNet polarities and S/P ratios to SKHASH input format

```python
# Make SKHASH input setting
if iteration!=0:
    # mark predictions whose 'std' exceeds std_threshold as unknown ('K')
    r_df.loc[r_df['std'] > std_threshold, 'predict'] = 'K'
# mark predictions whose mean probability is below mean_threshold as unknown ('K')
if iteration!=0 and mean_threshold!=0:
    r_df.loc[r_df['prob'] < mean_threshold, 'predict'] = 'K'
# optionally drop the unknown ('K') predictions
if rm_unknwon:
    r_df=r_df[r_df['predict']!='K'].reset_index(drop=True)

print('\n\n# Final result:')
print(r_df)

r_df=r_df.drop_duplicates(['sta',fwfid]).reset_index(drop=True)
prep_skhash(cat_df=cat_df,pol_df=r_df,amp=amp,sta_df=sta_df,ftime=ftime,fwfid=fwfid,ctrl0=ctrl0,out_dir=out_dir,hash_version=hash_version)
print('% calculation time (min): ','%.2f'%((time.time()-stime)/60))
print('\n\n@ ALL DONE!')
```

```
# Final result:
               pick net   sta chan                        ptime  \
0    2024T000000013  KS  ADOA   HG  2024-06-11T23:27:21.173953Z   
1    2024T000000013  KS   AMD   HH  2024-06-11T23:27:02.709708Z   
2    2024T000000013  KS  ANHA   HG  2024-06-11T23:27:25.111029Z   
3    2024T000000013  KS  ASNA   HG  2024-06-11T23:27:11.819722Z   
4    2024T000000013  KS  BAR2   HH  2024-06-11T23:27:34.466795Z   
..              ...  ..   ...  ...                          ...   
310  2024T000000056  KS  YKDB   HH  2024-06-12T04:56:12.323451Z   
311  2024T000000056  KS  YNDB   HH  2024-06-12T04:56:10.690130Z   
312  2024T000000056  KS   YOA   EL  2024-06-12T04:56:01.025831Z   
313  2024T000000056  KS  YSAB   HH  2024-06-12T04:56:03.103001Z   
314  2024T000000056  KS  YUGA   HG  2024-06-12T04:55:59.573268Z   

                           stime source        lat         lon    elv ptime0  \
0    2024-06-11T23:27:45.127986Z    add  36.572600  128.700900  162.0    NaN   
1    2024-06-11T2
```
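The thresholding at the start of the cell above reduces to a simple rule for each prediction. A polarity becomes unknown ('K') if its `std` exceeds `std_threshold`, or if its mean probability `prob` is below `mean_threshold`. Unknowns can then optionally be dropped. The plain-Python sketch below applies the same rule to a few made-up records. The station names, the 'U'/'D' labels, and the numbers are illustrative, not RPNet output. `apply_thresholds` is a hypothetical helper, and its `rm_unknown` argument plays the role of the notebook's `rm_unknwon`.

```python
def apply_thresholds(records, iteration, std_threshold, mean_threshold, rm_unknown):
    """Mirror the notebook's rules on a list of dicts with 'predict', 'prob' and 'std' keys."""
    out = []
    for rec in records:
        rec = dict(rec)  # copy so the caller's records are not modified
        if iteration != 0 and rec["std"] > std_threshold:
            rec["predict"] = "K"
        if iteration != 0 and mean_threshold != 0 and rec["prob"] < mean_threshold:
            rec["predict"] = "K"
        out.append(rec)
    if rm_unknown:
        out = [rec for rec in out if rec["predict"] != "K"]
    return out


# made-up example records (names, labels, and numbers are illustrative only)
records = [
    {"sta": "AAA", "predict": "U", "prob": 0.95, "std": 0.02},
    {"sta": "BBB", "predict": "D", "prob": 0.90, "std": 0.30},  # std too high
    {"sta": "CCC", "predict": "U", "prob": 0.55, "std": 0.05},  # prob too low
]
kept = apply_thresholds(records, iteration=100, std_threshold=0.1, mean_threshold=0.7, rm_unknown=True)
print([r["sta"] for r in kept])  # ['AAA']
```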

---
## 3. Focal Mechanism Calculation (SKHASH)

RPNet does not include the source code for **SKHASH**.<br>
Please use the link below to download and install SKHASH.<br>

[click here for SKHASH](https://code.usgs.gov/esc/SKHASH)

However, the RPNet conda environment already supports SKHASH, so you can run the SKHASH.py script in the same environment without setting up an additional virtual environment.

Before running RPNet, prepare a properly configured SKHASH control file (**control_file0.txt**) and set it as **`ctrl0`** in `hyperparams.py`. This template is copied to the RPNet result directory, where the settings needed to run SKHASH are generated automatically.

**Run in terminal:**

```shell
python SKHASH.py ./output01/hash3/control_file.txt
```

After running SKHASH, check the `./output01/hash3/OUT` directory for the focal mechanism results.

The MSEED files in the output directory contain the trimmed 4-second waveform segments that were actually used by RPNet for polarity prediction.
