<a href="https://colab.research.google.com/github/maihao14/EQTransformer/blob/colab/EQTransformerviaGoogleColab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# EQTransformer via Google Colab
**Author:** [Hao Mai](https://github.com/maihao14)<br>
**Date created:** 2021/11/14<br>
**Last modified:** 2021/11/14<br>

## Description

EQTransformer is an AI-based earthquake signal detector and phase (P&S) picker based on a deep neural network with an attention mechanism. It has a hierarchical architecture specifically designed for earthquake signals. EQTransformer has been trained on global seismic data and can perform detection and arrival time picking simultaneously and efficiently. In addition to the prediction probabilities, it can also provide estimated model uncertainties.

The EQTransformer Python 3 package includes modules for downloading continuous seismic data, preprocessing it, performing earthquake signal detection and phase (P & S) picking with pre-trained models, building and testing new models, and performing simple phase association.

**Developer:** [S. Mostafa Mousavi](https://github.com/smousavi05/EQTransformer#Contributing) <br>

**Reference:** 

Mousavi, S.M., Ellsworth, W.L., Zhu, W., Chuang, L.Y., and Beroza, G.C. Earthquake transformer—an attentive deep-learning model for simultaneous earthquake detection and phase picking. Nat Commun 11, 3952 (2020). https://doi.org/10.1038/s41467-020-17591-w

## Installation from Source
The sources for EQTransformer can be downloaded from the [GitHub repo](https://github.com/smousavi05/EQTransformer).

### Prerequisite package: ObsPy
After ObsPy is installed, restart the runtime:

Menu -> Runtime -> Restart Runtime

In [1]:
!pip install obspy



### Clone the public repository:



In [1]:
! git clone https://github.com/smousavi05/EQTransformer

Cloning into 'EQTransformer'...
remote: Enumerating objects: 2044, done.
remote: Counting objects: 100% (276/276), done.
remote: Compressing objects: 100% (200/200), done.
remote: Total 2044 (delta 149), reused 152 (delta 75), pack-reused 1768
Receiving objects: 100% (2044/2044), 51.30 MiB | 28.20 MiB/s, done.
Resolving deltas: 100% (1106/1106), done.


### Once you have a copy of the source, `cd` into the EQTransformer directory

In [2]:
%cd EQTransformer

/Users/mmtf/p/research/plan/plan-private/EQTransformer/EQTransformer




### Edit `setup.py`
In `setup.py`, change the NumPy version pin:
```
'numpy==1.20.3' -> 'numpy==1.19.2'
```
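If you would rather make this edit from a notebook cell than in a text editor, the hypothetical helper `pin_numpy` below does a plain string replacement. It is a minimal sketch that assumes `setup.py` contains the literal text `numpy==1.20.3`; otherwise it returns `False` and leaves the file untouched.

```python
from pathlib import Path


def pin_numpy(setup_path="setup.py", old="numpy==1.20.3", new="numpy==1.19.2"):
    """Replace the NumPy pin in setup.py; return True if the file changed.

    Hypothetical helper: assumes setup.py contains the literal string `old`.
    """
    path = Path(setup_path)
    text = path.read_text()
    if old not in text:
        # Nothing to do: already edited, or setup.py pins a different version.
        return False
    path.write_text(text.replace(old, new))
    return True


# Usage, from the EQTransformer directory:
# pin_numpy()
```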

### Install

Restart the runtime again once this cell has finished running:

Menu -> Runtime -> Restart Runtime

In [4]:
!pip install -e .

Obtaining file:///content/EQTransformer
Collecting numpy==1.19.2
  Downloading numpy-1.19.2-cp37-cp37m-manylinux2010_x86_64.whl (14.5 MB)
     |████████████████████████████████| 14.5 MB 34 kB/s
Collecting keyring>=15.1
  Downloading keyring-23.2.1-py3-none-any.whl (33 kB)
Collecting pkginfo>=1.4.2
  Downloading pkginfo-1.7.1-py2.py3-none-any.whl (25 kB)
Collecting tensorflow==2.5.1
  Downloading tensorflow-2.5.1-cp37-cp37m-manylinux2010_x86_64.whl (454.4 MB)
     |████████████████████████████████| 454.4 MB 9.8 kB/s
Collecting keras==2.3.1
  Downloading Keras-2.3.1-py2.py3-none-any.whl (377 kB)
     |████████████████████████████████| 377 kB 59.6 MB/s
Collecting tqdm==4.48.0
  Downloading tqdm-4.48.0-py2.py3-none-any.whl (67 kB)
     |████████████████████████████████| 67 kB 8.0 MB/s
Collecting obspy
  Downloading obspy-1.2.2.zip (24.7 MB)
     |████████████████████████████████| 24.7 MB 1.3 MB/s
  Installing build dependencies ... done
  G
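After restarting, it can be worth confirming that the pinned versions from the pip log above are the ones actually installed. The sketch below is a hypothetical check, not part of EQTransformer; the `EXPECTED` versions are copied from the log, and it relies on `importlib.metadata`, which requires Python 3.8 or later (the wheels in the log above are for Python 3.7, where this module is not in the standard library).

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the pip log above (assumption: your setup.py matches).
EXPECTED = {"numpy": "1.19.2", "tensorflow": "2.5.1", "keras": "2.3.1", "tqdm": "4.48.0"}


def check_pins(expected=EXPECTED):
    """Return {package: (installed, expected)} for every mismatch."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        if installed != wanted:
            mismatches[name] = (installed, wanted)
    return mismatches


print(check_pins() or "All pinned versions are installed.")
```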

## Downloading Continuous Data
The following cell queries the listed FDSN client(s) (here, SCEDC) for stations that match your search criteria and saves their information to a JSON file:



In [8]:
import os

from EQTransformer.utils.downloader import makeStationList

# Path of the station list that makeStationList will write
json_basepath = os.path.join(os.getcwd(), "json/station_list.json")

makeStationList(json_path=json_basepath,
                client_list=["SCEDC"],
                min_lat=35.50, max_lat=35.60,
                min_lon=-117.80, max_lon=-117.40,
                start_time="2019-09-01 00:00:00.00",
                end_time="2019-09-03 00:00:00.00",
                channel_list=["HH[ZNE]", "HH[Z21]", "BH[ZNE]"],
                filter_network=["SY"],   # networks to filter out
                filter_station=[])

GS--CA06
GS--CA10
PB--B921
ZY--SV08


The `makeStationList` call above generates a `station_list.json` file containing the station information. Next, use this file to download two days of data (2019-09-01 to 2019-09-03, split into one-day chunks by `chunk_size=1`) for the available stations near Ridgecrest, California, from the Southern California Earthquake Data Center (SCEDC) or IRIS:

In [9]:
from EQTransformer.utils.downloader import downloadMseeds

downloadMseeds(client_list=["SCEDC", "IRIS"],
               stations_json=json_basepath,
               output_dir="downloads_mseeds",
               min_lat=35.50, max_lat=35.60,
               min_lon=-117.80, max_lon=-117.40,
               start_time="2019-09-01 00:00:00.00",
               end_time="2019-09-03 00:00:00.00",
               chunk_size=1,      # length of each downloaded chunk, in days
               channel_list=[],
               n_processor=2)

[2024-12-01 19:04:19,290] - obspy.clients.fdsn.mass_downloader - INFO: Initializing FDSN client(s) for SCEDC, IRIS.


####### There are 4 stations in the list. #######


[2024-12-01 19:04:20,363] - obspy.clients.fdsn.mass_downloader - INFO: Successfully initialized 2 client(s): SCEDC, IRIS.
[2024-12-01 19:04:20,367] - obspy.clients.fdsn.mass_downloader - INFO: Total acquired or preexisting stations: 0
[2024-12-01 19:04:20,368] - obspy.clients.fdsn.mass_downloader - INFO: Total acquired or preexisting stations: 0
[2024-12-01 19:04:20,368] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Requesting unreliable availability.
[2024-12-01 19:04:20,369] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Requesting unreliable availability.





[2024-12-01 19:06:20,572] - obspy.clients.fdsn.mass_downloader - ERROR: Client 'SCEDC' - Failed getting availability: Timed Out
[2024-12-01 19:06:20,580] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - No data available.
[2024-12-01 19:06:20,574] - obspy.clients.fdsn.mass_downloader - ERROR: Client 'SCEDC' - Failed getting availability: Timed Out
[2024-12-01 19:06:20,582] - obspy.clients.fdsn.mass_downloader - INFO: Total acquired or preexisting stations: 0
[2024-12-01 19:06:20,583] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - No data available.
[2024-12-01 19:06:20,585] - obspy.clients.fdsn.mass_downloader - INFO: Client 'IRIS' - Requesting reliable availability.
[2024-12-01 19:06:20,585] - obspy.clients.fdsn.mass_downloader - INFO: Total acquired or preexisting stations: 0
[2024-12-01 19:06:20,588] - obspy.clients.fdsn.mass_downloader - INFO: Client 'IRIS' - Requesting reliable availability.
[2024-12-01 19:06:21,455] - obspy.clients.fdsn.mass_downloader

** done with --> CA10 -- GS -- 2019-09-01


[2024-12-01 19:06:48,465] - obspy.clients.fdsn.mass_downloader - INFO: Total acquired or preexisting stations: 0
[2024-12-01 19:06:48,467] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Requesting unreliable availability.
[2024-12-01 19:06:49,009] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Successfully requested availability (0.54 seconds)
[2024-12-01 19:06:49,010] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Found 0 stations (0 channels).
[2024-12-01 19:06:49,010] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - No data available.
[2024-12-01 19:06:49,011] - obspy.clients.fdsn.mass_downloader - INFO: Total acquired or preexisting stations: 0
[2024-12-01 19:06:49,011] - obspy.clients.fdsn.mass_downloader - INFO: Client 'IRIS' - Requesting reliable availability.
[2024-12-01 19:06:49,436] - obspy.clients.fdsn.mass_downloader - INFO: Client 'IRIS' - Successfully requested availability (0.42 seconds)
[2024-12-01 19:06:49,437] 

** done with --> CA10 -- GS -- 2019-09-02


[2024-12-01 19:07:18,446] - obspy.clients.fdsn.mass_downloader - INFO: Total acquired or preexisting stations: 0
[2024-12-01 19:07:18,446] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Requesting unreliable availability.




[2024-12-01 19:07:19,047] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Successfully requested availability (0.60 seconds)
[2024-12-01 19:07:19,064] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Found 1 stations (3 channels).
[2024-12-01 19:07:19,065] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Will attempt to download data from 1 stations.
[2024-12-01 19:07:19,066] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Status for 3 time intervals/channels before downloading: NEEDS_DOWNLOADING
[2024-12-01 19:07:52,302] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Successfully downloaded 1 channels (of 1)
[2024-12-01 19:08:02,637] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Successfully downloaded 1 channels (of 1)
[2024-12-01 19:08:05,126] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Successfully downloaded 1 channels (of 1)
[2024-12-01 19:08:05,126] - obspy.clients.fdsn.mass_do

** done with --> B921 -- PB -- 2019-09-01


[2024-12-01 19:08:38,488] - obspy.clients.fdsn.mass_downloader - INFO: Total acquired or preexisting stations: 0
[2024-12-01 19:08:38,488] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Requesting unreliable availability.
[2024-12-01 19:08:39,437] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Successfully requested availability (0.95 seconds)
[2024-12-01 19:08:39,455] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Found 1 stations (3 channels).
[2024-12-01 19:08:39,455] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Will attempt to download data from 1 stations.
[2024-12-01 19:08:39,456] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Status for 3 time intervals/channels before downloading: NEEDS_DOWNLOADING
[2024-12-01 19:08:51,107] - obspy.clients.fdsn.mass_downloader - INFO: Client 'IRIS' - Successfully downloaded 1 channels (of 1)
[2024-12-01 19:08:52,466] - obspy.clients.fdsn.mass_downloader - INFO: Client 

** done with --> CA06 -- GS -- 2019-09-01


[2024-12-01 19:09:05,637] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Successfully downloaded 1 channels (of 1)
[2024-12-01 19:09:06,675] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Successfully downloaded 1 channels (of 1)
[2024-12-01 19:09:07,949] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Successfully downloaded 1 channels (of 1)
[2024-12-01 19:09:07,950] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Launching basic QC checks...
[2024-12-01 19:09:07,960] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Downloaded 27.1 MB [975.42 KB/sec] of data, 0.0 MB of which were discarded afterwards.
[2024-12-01 19:09:07,960] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Status for 3 time intervals/channels after downloading: DOWNLOADED
[2024-12-01 19:09:07,962] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - No station information to download.
[2024-12-01 19:09:07,962] - obspy.clien

** done with --> B921 -- PB -- 2019-09-02


[2024-12-01 19:09:23,330] - obspy.clients.fdsn.mass_downloader - INFO: Total acquired or preexisting stations: 0
[2024-12-01 19:09:23,331] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Requesting unreliable availability.
[2024-12-01 19:09:34,627] - obspy.clients.fdsn.mass_downloader - INFO: Total acquired or preexisting stations: 0
[2024-12-01 19:09:34,627] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Requesting unreliable availability.




[2024-12-01 19:09:35,133] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Successfully requested availability (0.51 seconds)
[2024-12-01 19:09:35,134] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Found 1 stations (3 channels).
[2024-12-01 19:09:35,134] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Will attempt to download data from 1 stations.
[2024-12-01 19:09:35,135] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Status for 3 time intervals/channels before downloading: NEEDS_DOWNLOADING
[2024-12-01 19:09:58,465] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Successfully downloaded 1 channels (of 1)
[2024-12-01 19:10:01,029] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Successfully downloaded 1 channels (of 1)
[2024-12-01 19:10:02,822] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Successfully downloaded 1 channels (of 1)
[2024-12-01 19:10:02,822] - obspy.clients.fdsn.mass_do

** done with --> SV08 -- ZY -- 2019-09-01


[2024-12-01 19:10:35,623] - obspy.clients.fdsn.mass_downloader - INFO: Total acquired or preexisting stations: 0
[2024-12-01 19:10:35,624] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Requesting unreliable availability.
[2024-12-01 19:11:23,528] - obspy.clients.fdsn.mass_downloader - ERROR: Client 'SCEDC' - Failed getting availability: Timed Out
[2024-12-01 19:11:23,531] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - No data available.
[2024-12-01 19:11:23,531] - obspy.clients.fdsn.mass_downloader - INFO: Total acquired or preexisting stations: 0
[2024-12-01 19:11:23,532] - obspy.clients.fdsn.mass_downloader - INFO: Client 'IRIS' - Requesting reliable availability.
[2024-12-01 19:11:24,364] - obspy.clients.fdsn.mass_downloader - INFO: Client 'IRIS' - Successfully requested availability (0.83 seconds)
[2024-12-01 19:11:24,366] - obspy.clients.fdsn.mass_downloader - INFO: Client 'IRIS' - Found 1 stations (3 channels).
[2024-12-01 19:11:24,366] - obspy.clie

** done with --> SV08 -- ZY -- 2019-09-02


KeyboardInterrupt: 
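The requested window spans two days and `chunk_size=1` asks for one-day chunks, so each station should end up with two MiniSeed files; the `mseed_predictor` logs further down list them as `20190901T000000Z__20190902T000000Z.mseed` and `20190902T000000Z__20190903T000000Z.mseed`. The hypothetical `chunk_file_names` function below only reproduces that chunking arithmetic and naming pattern as an illustration; it is not the code `downloadMseeds` runs internally.

```python
from datetime import datetime, timedelta


def chunk_file_names(start, end, chunk_days=1):
    """List per-chunk file names for the [start, end) window.

    Illustration only: mimics the '<start>__<end>.mseed' names seen in the
    prediction logs, not downloadMseeds' internal implementation.
    """
    fmt_in = "%Y-%m-%d %H:%M:%S.%f"   # format of start_time/end_time above
    fmt_out = "%Y%m%dT%H%M%SZ"        # format used in the file names
    t0 = datetime.strptime(start, fmt_in)
    t_end = datetime.strptime(end, fmt_in)
    step = timedelta(days=chunk_days)
    names = []
    while t0 < t_end:
        t1 = min(t0 + step, t_end)    # the last chunk may be shorter
        names.append(f"{t0.strftime(fmt_out)}__{t1.strftime(fmt_out)}.mseed")
        t0 = t1
    return names


print(chunk_file_names("2019-09-01 00:00:00.00", "2019-09-03 00:00:00.00"))
```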

# Detection and Picking
To perform detection and picking, you need a pre-trained EQTransformer model; these are provided in the `EQTransformer/ModelsAndSampleData/` folder.


EQTransformer provides two options for performing detection and picking on continuous data:
## Option (I) using pre-processed data (hdf5 files):
This option is recommended for shorter periods (a few days to a month). It lets you test the performance and explore the effects of different parameters, and the HDF5 file makes it easy to access the waveforms.

For this option, you first need to convert the MiniSeed files for each station into 1-minute NumPy arrays stored in a single HDF5 file, and generate a CSV file listing the traces in that HDF5 file. You can do this with the following command:

In [6]:
import os

from EQTransformer.utils.hdf5_maker import preprocessor

# Re-define the station-list path in case the runtime was restarted
json_basepath = os.path.join(os.getcwd(), "json/station_list.json")

preprocessor(preproc_dir="preproc", mseed_dir='downloads_mseeds', stations_json=json_basepath, overlap=0.3, n_processor=2)
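With `overlap=0.3`, consecutive 1-minute windows share 30% of their samples, so each new window starts 60 s × 0.7 = 42 s after the previous one. At 100 samples per second, a 60 s window holds 6000 samples, which matches the `(6000, 3)` array shapes later in this notebook. The sketch below (hypothetical `window_starts`, not the `preprocessor` source) illustrates that arithmetic, working in whole samples to avoid floating-point drift:

```python
def window_starts(n_samples, sampling_rate=100, window_s=60, overlap=0.3):
    """Start times (s) of fixed-length windows with fractional overlap.

    Illustrative arithmetic only; assumes 100 Hz data and 60 s windows.
    """
    window = int(window_s * sampling_rate)        # 6000 samples per window
    step = int(round(window * (1.0 - overlap)))   # 4200 samples = 42 s
    # Keep only windows that fit completely inside the trace
    return [start / sampling_rate for start in range(0, n_samples - window + 1, step)]


print(window_starts(180 * 100))  # first three minutes of data
```

For the first three minutes this gives starts at 0, 42 and 84 s, the same spacing as the `00:00:00`, `00:00:42`, `00:01:24` trace names in the HDF5 inspection output further down.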

In [1]:
%load_ext autoreload
%autoreload 2
import numpy as np
import warnings
# Workaround: some newer NumPy releases no longer expose `np.warnings`,
# which older code may still reference
np.warnings = warnings
from EQTransformer.core.predictor import predictor

predictor(input_dir='downloads_mseeds_processed_hdfs',   # output of preprocessor
          input_model='ModelsAndSampleData/EqT_original_model.h5',
          output_dir='detections',
          detection_threshold=0.3,
          P_threshold=0.1,
          S_threshold=0.1,
          number_of_plots=100,
          plot_mode='time',
          batch_size=32)

Running EqTransformer  0.1.61
 *** Loading the model ...
Model: "model_1"
__________________________________________________________________________________________________
 Layer (type)                Output Shape                 Param #   Connected to                  
 input (InputLayer)          [(None, 6000, 3)]            0         []                            
                                                                                                  
 conv1d_1 (Conv1D)           (None, 6000, 8)              272       ['input[0][0]']               
                                                                                                  
 max_pooling1d_1 (MaxPoolin  (None, 3000, 8)              0         ['conv1d_1[0][0]']            
 g1D)                                                                                             
                                                                                                  
 conv1d_2 (Conv1D)           (None,



*** Loading is complete!
 *** /Users/mmtf/p/research/plan/plan-private/EQTransformer/detections already exists!
######### There are files for 1 stations in downloads_mseeds_processed_hdfs directory. #########
  0%|                                                                        | 0/63 [00:00<?, ?it/s]batch X.shape: (32, 6000, 3)


Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/mmtf/p/research/plan/plan-private/EQTransformer/.pixi/envs/default/lib/python3.9/multiprocessing/spawn.py", line 116, in spawn_main
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/mmtf/p/research/plan/plan-private/EQTransformer/.pixi/envs/default/lib/python3.9/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/Users/mmtf/p/research/plan/plan-private/EQTransformer/.pixi/envs/default/lib/python3.9/multiprocessing/spawn.py", line 126, in _main
    exitcode = _main(fd, parent_sentinel)
  File "/Users/mmtf/p/research/plan/plan-private/EQTransformer/.pixi/envs/default/lib/python3.9/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
  File "/Users/mmtf/p/research/plan/plan-private/EQTransformer/EQTransformer/__init__.py", line 7, in <module>
    self = reduction.pickle.load(

batch X.shape: (32, 6000, 3)
batch X.shape: (32, 6000, 3)
  3%|██                                                              | 2/63 [00:56<28:43, 28.25s/it]batch X.shape: (32, 6000, 3)
batch X.shape: (32, 6000, 3)
  5%|███                                                             | 3/63 [01:05<20:13, 20.23s/it]batch X.shape: (32, 6000, 3)
batch X.shape: (32, 6000, 3)
  6%|████                                                            | 4/63 [01:15<16:02, 16.31s/it]batch X.shape: (32, 6000, 3)
batch X.shape: (32, 6000, 3)
  8%|█████                                                           | 5/63 [01:24<13:19, 13.78s/it]batch X.shape: (32, 6000, 3)
batch X.shape: (32, 6000, 3)
 10%|██████                                                          | 6/63 [01:34<11:50, 12.47s/it]batch X.shape: (32, 6000, 3)
batch X.shape: (32, 6000, 3)
 11%|███████                                                         | 7/63 [01:43<10:40, 11.45s/it]batch X.shape: (32, 6000, 3)
batch X.shape: (32, 600

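The preprocessor generates one `station_name.hdf5` and one `station_name.csv` file for each station and puts them into a directory named after `mseed_dir` with a `_processed_hdfs` suffix (here, `downloads_mseeds_processed_hdfs`). This directory, which contains all of your HDF5 and CSV files, is what you pass to `predictor` together with a model, as in the cell above. The next cell instead calls `mseed_predictor`, which works directly on the MiniSeed files (Option II below):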

In [5]:
from EQTransformer.core.mseed_predictor import mseed_predictor

mseed_predictor(input_dir='downloads_mseeds',
                input_model='EQTransformer/ModelsAndSampleData/EqT_model.h5',
                stations_json='json/station_list.json',
                output_dir='detection_results',
                detection_threshold=0.2,
                P_threshold=0.1,
                S_threshold=0.1,
                number_of_plots=10,
                plot_mode='time_frequency',
                batch_size=500,
                overlap=0.3)

ImportError: C extension: None not built. If you want to import pandas from the source directory, you may need to run 'python setup.py build_ext' to build the C extensions first.
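Both `predictor` and `mseed_predictor` take `detection_threshold`, `P_threshold` and `S_threshold`, which are compared against the model's output probabilities. To build intuition for why lowering a threshold admits more (and weaker) picks, here is a simplified sketch that keeps local maxima of a probability trace above a threshold. The function `pick_peaks` and the toy `p_prob` values are hypothetical; EQTransformer's own post-processing is more involved.

```python
def pick_peaks(prob, threshold, sampling_rate=100):
    """Return (time_s, probability) for local maxima at or above `threshold`.

    Simplified illustration of thresholding a probability trace; it is not
    EQTransformer's picking code.
    """
    picks = []
    for i in range(1, len(prob) - 1):
        is_peak = prob[i] >= prob[i - 1] and prob[i] > prob[i + 1]
        if is_peak and prob[i] >= threshold:
            picks.append((i / sampling_rate, prob[i]))
    return picks


# Toy P-phase probabilities sampled at 100 Hz
p_prob = [0.0, 0.05, 0.2, 0.6, 0.3, 0.05, 0.08, 0.09, 0.02]
print(pick_peaks(p_prob, threshold=0.1))   # [(0.03, 0.6)]
print(pick_peaks(p_prob, threshold=0.05))  # [(0.03, 0.6), (0.07, 0.09)]
```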

## Option (II) directly from mseed files:
You can also perform detection and phase picking directly on the downloaded MiniSeed files. This saves both the preprocessing time and the extra disk space needed for the HDF5 files, and is recommended for larger (longer) datasets. However, it can be more memory intensive, so it is better to keep each MiniSeed file shorter than about one month.

This option also does not let you estimate uncertainties or save the prediction probabilities, and you lose the convenience of HDF5 files, which make it easy to access the raw event waveforms based on the detection results.
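A back-of-the-envelope estimate shows why file length matters for memory. The hypothetical helper below assumes 3 components at 100 Hz (the rate implied by the 6000-sample, 60 s windows) stored as 4-byte floats; ObsPy may hold data as other types (for example int32 or float64), so treat the result as a rough lower-order estimate for one station before any processing copies.

```python
def raw_waveform_bytes(days, sampling_rate=100, n_channels=3, bytes_per_sample=4):
    """Rough in-memory size of continuous multi-channel waveform data.

    Assumes 4-byte (float32) samples; the real dtype depends on the file and
    on how it is processed.
    """
    samples_per_channel = int(days * 86_400 * sampling_rate)
    return samples_per_channel * n_channels * bytes_per_sample


for days in (1, 30):
    print(f"{days:>2} day(s): {raw_waveform_bytes(days) / 1e9:.2f} GB")
```

Under these assumptions one day of data is about 0.10 GB per station and a month is about 3.11 GB.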

In [5]:
import numpy as np
import warnings
np.warnings = warnings  # same NumPy workaround as in the predictor cell
from EQTransformer.core.mseed_predictor import mseed_predictor

In [10]:

mseed_predictor(input_dir='downloads_mseeds',
                input_model='EQTransformer/ModelsAndSampleData/EqT_original_model.h5',
                stations_json='json/station_list.json',
                output_dir='detection_results',
                detection_threshold=0.2,
                P_threshold=0.1,
                S_threshold=0.1,
                number_of_plots=10,
                plot_mode='time_frequency',
                batch_size=500,
                overlap=0.3)

11-15 01:35 [INFO] [EQTransformer] Running EqTransformer  0.1.61
11-15 01:35 [INFO] [EQTransformer] *** Loading the model ...
11-15 01:35 [INFO] [EQTransformer] *** Loading is complete!
11-15 01:35 [INFO] [EQTransformer] *** /content/detection_results already exists!


 --> Type (Yes or y) to create a new empty directory! This will erase your previous results so make a copy if you want them.y


11-15 01:35 [INFO] [EQTransformer] There are files for 3 stations in downloads_mseeds directory.
11-15 01:35 [INFO] [EQTransformer] Started working on B921, 1 out of 3 ...
11-15 01:35 [INFO] [EQTransformer] 20190901T000000Z__20190902T000000Z.mseed
11-15 01:36 [INFO] [EQTransformer] 20190902T000000Z__20190903T000000Z.mseed






11-15 01:36 [INFO] [EQTransformer] Finished the prediction in: 0 hours and 1 minutes and 32.14 seconds.
11-15 01:36 [INFO] [EQTransformer] *** Detected: 2946 events.
11-15 01:36 [INFO] [EQTransformer]  *** Wrote the results into --> " /content/detection_results/B921_outputs "
11-15 01:36 [INFO] [EQTransformer] Started working on CA06, 2 out of 3 ...
11-15 01:36 [INFO] [EQTransformer] 20190901T000000Z__20190902T000000Z.mseed
11-15 01:37 [INFO] [EQTransformer] 20190902T000000Z__20190903T000000Z.mseed






11-15 01:38 [INFO] [EQTransformer] Finished the prediction in: 0 hours and 1 minutes and 25.18 seconds.
11-15 01:38 [INFO] [EQTransformer] *** Detected: 2862 events.
11-15 01:38 [INFO] [EQTransformer]  *** Wrote the results into --> " /content/detection_results/CA06_outputs "
11-15 01:38 [INFO] [EQTransformer] Started working on SV08, 3 out of 3 ...
11-15 01:38 [INFO] [EQTransformer] 20190901T000000Z__20190902T000000Z.mseed
11-15 01:39 [INFO] [EQTransformer] 20190902T000000Z__20190903T000000Z.mseed






11-15 01:39 [INFO] [EQTransformer] Finished the prediction in: 0 hours and 1 minutes and 23.81 seconds.
11-15 01:39 [INFO] [EQTransformer] *** Detected: 1672 events.
11-15 01:39 [INFO] [EQTransformer]  *** Wrote the results into --> " /content/detection_results/SV08_outputs "


<Figure size 720x720 with 0 Axes>

In [16]:
import h5py

f = h5py.File('downloads_mseeds_processed_hdfs/B921.hdf5', 'r')
list(f.keys())

['data']

In [22]:
(f["data/B921_PB_EH_2019-09-01T00:00:00.008300Z"].shape)

(6000, 3)
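The HDF5 keys appear to follow a `<station>_<network>_<channel prefix>_<start time>` pattern (for example `B921_PB_EH_2019-09-01T00:00:00.008300Z` here, and `ADVT_KO_HH_2023-02-06T00:00:00.000000` further down). The hypothetical `parse_trace_name` helper below splits such a key into its parts; the layout is inferred from these examples rather than from EQTransformer documentation.

```python
from datetime import datetime


def parse_trace_name(trace_name):
    """Split a key like 'B921_PB_EH_2019-09-01T00:00:00.008300Z' into parts.

    Assumes the <station>_<network>_<channel prefix>_<start time> layout seen
    in the HDF5 keys printed in this notebook.
    """
    station, network, channel, start = trace_name.split("_", 3)
    start = start.rstrip("Z")  # some keys end with 'Z', others do not
    return {
        "station": station,
        "network": network,
        "channel_prefix": channel,
        "start_time": datetime.strptime(start, "%Y-%m-%dT%H:%M:%S.%f"),
    }


print(parse_trace_name("B921_PB_EH_2019-09-01T00:00:00.008300Z"))
```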

In [4]:
import h5py
import pandas as pd

# Check HDF5 file structure
def inspect_hdf5(input_dir, station_name):
    hdf5_path = f"{input_dir}/{station_name}.hdf5"
    csv_path = f"{input_dir}/{station_name}.csv"

    print("=== HDF5 File Structure ===")
    with h5py.File(hdf5_path, 'r') as f:
        # Print structure
        def print_structure(name, obj):
            print(f"{'  ' * name.count('/')}{name}: {type(obj).__name__}")
            if isinstance(obj, h5py.Dataset):
                print(f"{'  ' * (name.count('/') + 1)}Shape: {obj.shape}")
                print(f"{'  ' * (name.count('/') + 1)}Dtype: {obj.dtype}")
                if len(obj.attrs) > 0:
                    print(f"{'  ' * (name.count('/') + 1)}Attributes: {dict(obj.attrs)}")

        f.visititems(print_structure)

        # Check first dataset
        first_dataset = list(f['data'].keys())[0]
        sample_data = f[f'data/{first_dataset}'][:]
        print("\n=== Sample Data Statistics ===")
        print(f"Min value: {sample_data.min()}")
        print(f"Max value: {sample_data.max()}")
        print(f"Mean value: {sample_data.mean()}")
        print(f"Shape: {sample_data.shape}")

    # Check CSV file
    print("\n=== CSV File Structure ===")
    df = pd.read_csv(csv_path)
    print("\nColumns:", df.columns.tolist())
    print("\nFirst few rows:")
    print(df.head())
    print("\nData types:")
    print(df.dtypes)

    return df, sample_data

# Usage:
df, sample_data = inspect_hdf5('downloads_mseeds_processed_hdfs', 'ADVT')

=== HDF5 File Structure ===
data: Group
  data/ADVT_KO_HH_2023-02-06T00:00:00.000000: Dataset
    Shape: (6000, 3)
    Dtype: float32
    Attributes: {'network_code': 'KO', 'receiver_code': 'ADVT', 'receiver_elevation_m': 193.0, 'receiver_latitude': 40.4332, 'receiver_longitude': 29.7383, 'trace_name': 'ADVT_KO_HH_2023-02-06T00:00:00.000000', 'trace_start_time': '2023-02-06 00:00:00.000000'}
  data/ADVT_KO_HH_2023-02-06T00:00:42.000000: Dataset
    Shape: (6000, 3)
    Dtype: float32
    Attributes: {'network_code': 'KO', 'receiver_code': 'ADVT', 'receiver_elevation_m': 193.0, 'receiver_latitude': 40.4332, 'receiver_longitude': 29.7383, 'trace_name': 'ADVT_KO_HH_2023-02-06T00:00:42.000000', 'trace_start_time': '2023-02-06 00:00:42.000000'}
  data/ADVT_KO_HH_2023-02-06T00:01:24.000000: Dataset
    Shape: (6000, 3)
    Dtype: float32
    Attributes: {'network_code': 'KO', 'receiver_code': 'ADVT', 'receiver_elevation_m': 193.0, 'receiver_latitude': 40.4332, 'receiver_longitude': 29.7383, 

In [6]:
def validate_dataset(df, sample_data, hdf5_path='downloads_mseeds_processed_hdfs/ADVT.hdf5'):
    print("=== Dataset Validation ===")

    # 1. Check for missing values
    print("\n1. Missing values in CSV:")
    print(df.isnull().sum())

    # 2. Check data dimensions
    print("\n2. Data dimensions:")
    print(f"Number of samples in CSV: {len(df)}")
    print(f"Sample data shape: {sample_data.shape}")

    # 3. Check value ranges
    print("\n3. Value ranges in sample data:")
    for i in range(sample_data.shape[1]):
        print(f"Channel {i}:")
        print(f"  Min: {sample_data[:, i].min()}")
        print(f"  Max: {sample_data[:, i].max()}")
        print(f"  Mean: {sample_data[:, i].mean()}")
        print(f"  Std: {sample_data[:, i].std()}")

    # 4. Check trace categories
    print("\n4. Trace categories:")
    if 'trace_category' in df.columns:
        print(df['trace_category'].value_counts())

    # 5. Check for data consistency
    print("\n5. Data consistency:")
    # Open the HDF5 file that matches `df` (passed in rather than hard-coded)
    with h5py.File(hdf5_path, 'r') as f:
        # Check if all CSV entries have corresponding HDF5 data
        missing_data = []
        for trace_name in df['trace_name']:
            if f'data/{trace_name}' not in f:
                missing_data.append(trace_name)

        if missing_data:
            print(f"Warning: {len(missing_data)} traces in CSV missing from HDF5")
            print("First few missing:", missing_data[:5])
        else:
            print("All CSV entries have corresponding HDF5 data")

# Usage:
validate_dataset(df, sample_data)

=== Dataset Validation ===

1. Missing values in CSV:
trace_name    0
start_time    0
dtype: int64

2. Data dimensions:
Number of samples in CSV: 2015
Sample data shape: (6000, 3)

3. Value ranges in sample data:
Channel 0:
  Min: -10025.0
  Max: -4501.0
  Mean: -7089.3837890625
  Std: 1142.6761474609375
Channel 1:
  Min: -8247.0
  Max: -2645.0
  Mean: -5240.078125
  Std: 1160.1837158203125
Channel 2:
  Min: -10381.0
  Max: -5932.0
  Mean: -8101.54736328125
  Std: 626.5115356445312

4. Trace categories:

5. Data consistency:
All CSV entries have corresponding HDF5 data
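`validate_dataset` confirms that the CSV and HDF5 files agree, but it does not check continuity. With 60 s windows and `overlap=0.3`, consecutive windows should start 42 s apart (as in the ADVT keys printed above), so a larger spacing suggests missing data. The hypothetical `find_gaps` helper below flags such spacings; it assumes trace names end in a `YYYY-mm-ddTHH:MM:SS.ffffff` timestamp, optionally followed by `Z`.

```python
from datetime import datetime


def find_gaps(trace_names, expected_step_s=42.0):
    """Return (previous, current, seconds_between) where spacing differs.

    Assumes names end in '<YYYY-mm-ddTHH:MM:SS.ffffff>' (optionally with a
    trailing 'Z') and that windows should start `expected_step_s` apart.
    """
    def start_of(name):
        stamp = name.split("_", 3)[3].rstrip("Z")
        return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%f")

    names = sorted(trace_names, key=start_of)
    gaps = []
    for prev, curr in zip(names, names[1:]):
        delta = (start_of(curr) - start_of(prev)).total_seconds()
        if abs(delta - expected_step_s) > 1e-3:  # allow for rounding
            gaps.append((prev, curr, delta))
    return gaps


# Usage with the CSV loaded by inspect_hdf5:
# print(find_gaps(df["trace_name"].tolist())[:5])
```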


In [9]:
import h5py
import matplotlib.pyplot as plt
import pandas as pd

def plot_samples(hdf5_path, csv_path, num_samples=3):
    """
    Plot a few sample waveforms to visually inspect the data
    """
    df = pd.read_csv(csv_path)
    with h5py.File(hdf5_path, 'r') as f:
        for i in range(min(num_samples, len(df))):
            trace_name = df['trace_name'].iloc[i]
            data = f[f'data/{trace_name}'][:]

            fig, axes = plt.subplots(3, 1, figsize=(15, 10))
            fig.suptitle(f"Sample {i+1}: {trace_name}")

            for j in range(3):
                axes[j].plot(data[:, j])
                axes[j].set_ylabel(f'Channel {j}')

                # If it's an earthquake trace, plot P and S picks
                if 'p_arrival_sample' in df.columns:
                    p_sample = df['p_arrival_sample'].iloc[i]
                    s_sample = df['s_arrival_sample'].iloc[i]
                    if not pd.isna(p_sample):
                        axes[j].axvline(x=p_sample, color='r', linestyle='--', label='P arrival')
                    if not pd.isna(s_sample):
                        axes[j].axvline(x=s_sample, color='g', linestyle='--', label='S arrival')

                # Only draw a legend when labelled P/S lines exist
                if j == 0 and 'p_arrival_sample' in df.columns:
                    axes[j].legend()

            plt.tight_layout()
            plt.show()

# Usage:
plot_samples('downloads_mseeds_processed_hdfs/ADVT.hdf5',
            'downloads_mseeds_processed_hdfs/ADVT.csv')
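The value ranges printed by `validate_dataset` are all negative with large means (for example, a channel 0 mean of about -7089), so a constant offset dominates the raw counts. To compare waveform shapes across channels when plotting, you could first remove each channel's mean and scale it by its standard deviation. The hypothetical `demean_and_scale` below is a minimal sketch of that per-channel standardization; it is not guaranteed to match what EQTransformer does internally.

```python
import numpy as np


def demean_and_scale(data):
    """Remove each channel's mean and divide by its standard deviation.

    `data` is an (n_samples, n_channels) array such as one (6000, 3) trace.
    Channels with zero variance are only demeaned, to avoid dividing by 0.
    """
    data = np.asarray(data, dtype=np.float64)
    centered = data - data.mean(axis=0)
    std = centered.std(axis=0)
    std[std == 0] = 1.0
    return centered / std


# Usage inside plot_samples, before plotting:
# data = demean_and_scale(f[f'data/{trace_name}'][:])
```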

