## Tutorial 7: Building an earthquake catalogue using GaMMA

In this tutorial we show how to automatically build an earthquake catalogue from raw ocean-bottom seismometer (OBS) waveforms using SeisBench and the GaMMA associator. We use example data from the [YH](https://ds.iris.edu/gmap/#network=YH&starttime=2014-01-01&endtime=2015-06-01&planet=earth) network, recorded during the HOBITSS deployment in 2014-2015.

First, we import the seisbench and gamma packages, as well as other required packages and modules to import data and make plots.

In [None]:
# SeisBench models
import seisbench.models as sbm

# GaMMA associator
from gamma.utils import association

# ObsPy classes
from obspy.clients.fdsn import Client
from obspy import UTCDateTime, Stream

# Other ancillary packages
from pyproj import CRS, Transformer
import pandas as pd
import numpy as np
from collections import Counter
from tqdm import tqdm
import matplotlib.pyplot as plt

### Configuration

The following code cells contain all configuration settings. The first cell configures the local coordinate projection. Here we use a Transverse Mercator projection for New Zealand (NZGD49 / North Island Grid), because the data come from northern New Zealand.

In [None]:
# Projections - Hard coded for YH and North Island grid
wgs84 = CRS.from_epsg(4326)
local_crs = CRS.from_epsg(27291)  # NZGD49 / North Island Grid: https://epsg.io/27291#google_vignette
transformer = Transformer.from_crs(wgs84, local_crs)

The second, third and fourth cells configure the GaMMA associator. Please see its documentation for details.

In [None]:
# Gamma
config = {}
config["dims"] = ['x(km)', 'y(km)', 'z(km)']

# Use DBSCAN to cut a long sequence of picks into segments, which speeds up association by working on small windows.
config["use_dbscan"] = True
config["use_amplitude"] = False

# Extent of box to investigate - events occurring outside this box will localize to the edges
# Check with extent of stations to adjust these numbers
config["x(km)"] = (100, 600)
config["y(km)"] = (250, 500)
config["z(km)"] = (0, 150)

# Average crustal P- and S-wave velocities
config["vel"] = {"p": 5.0, "s": 5.0 / 1.75}  
config["method"] = "BGMM"
if config["method"] == "BGMM":
    config["oversample_factor"] = 4
if config["method"] == "GMM":
    config["oversample_factor"] = 1
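
To get a feeling for what these two velocities imply, the sketch below computes straight-ray travel times in a uniform-velocity medium, which is the kind of constant-velocity model the `vel` entry describes. The helper `travel_time` and the event and station coordinates are illustrative assumptions, not part of GaMMA.

```python
import math

# Same velocities as in the configuration above (km/s)
VEL = {"p": 5.0, "s": 5.0 / 1.75}


def travel_time(event_xyz, station_xyz, phase):
    """Straight-ray travel time (s) in a uniform medium (hypothetical helper)."""
    distance = math.dist(event_xyz, station_xyz)  # km
    return distance / VEL[phase]


# Illustrative event at 20 km depth and a station 30 km away at z = 0
event = (300.0, 400.0, 20.0)
station = (330.0, 400.0, 0.0)

tp = travel_time(event, station, "p")
ts = travel_time(event, station, "s")
print(f"P: {tp:.2f} s, S: {ts:.2f} s, S-P: {ts - tp:.2f} s")
```

With a fixed Vp/Vs ratio of 1.75, the S travel time is always 1.75 times the P travel time, so the S-P time grows linearly with distance.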

In [None]:
# BFGS search bounds and DBSCAN settings
config["bfgs_bounds"] = (
    (config["x(km)"][0] - 1, config["x(km)"][1] + 1),  # x
    (config["y(km)"][0] - 1, config["y(km)"][1] + 1),  # y
    (0, config["z(km)"][1] + 1),  # z
    (None, None),  # t
)

# The maximum time between two picks for one to be considered as a neighbor of the other. See details in DBSCAN
config["dbscan_eps"] = 25  # seconds

# The number of samples in a neighborhood for a point to be considered as a core point. See details in DBSCAN
config["dbscan_min_samples"] = 3
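
The role of `dbscan_eps` can be illustrated with a simplified stand-in: split the sorted pick times wherever the gap to the next pick exceeds `eps`, then drop segments with fewer than `min_samples` picks. This is neither the DBSCAN algorithm itself nor GaMMA's code, only a sketch of the segmentation idea. The function `split_by_gap` and the pick times are hypothetical.

```python
def split_by_gap(times, eps, min_samples):
    """Split pick times (s) into segments at gaps larger than eps (simplified sketch)."""
    segments, current = [], []
    for t in sorted(times):
        # Start a new segment when the gap to the previous pick is too large
        if current and t - current[-1] > eps:
            segments.append(current)
            current = []
        current.append(t)
    if current:
        segments.append(current)
    # Keep only segments with enough picks
    return [seg for seg in segments if len(seg) >= min_samples]


# Made-up pick times in seconds since the start of the window
pick_times = [0.0, 4.0, 9.0, 15.0, 80.0, 200.0, 210.0, 222.0, 230.0]
segments = split_by_gap(pick_times, eps=25, min_samples=3)
print(segments)
```

A larger `eps` merges more picks into one segment, which is safer for widely spaced stations but makes each association problem bigger.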

In [None]:
# Filtering
# Minimum picks per earthquake
config["min_picks_per_eq"] = 5

# Max phase time residual (s)
config["max_sigma11"] = 2.0

# Max phase amplitude residual (in log scale)
config["max_sigma22"] = 1.0

# Max covariance term. (Usually not used)
config["max_sigma12"] = 1.0

### Download data

We download 60 minutes of waveform data from the YH network offshore northern New Zealand (HOBITSS deployment). We use October 2, 2014, the day of a magnitude [4.6 earthquake](https://www.geonet.org.nz/earthquake/technical/2014p742742) in the vicinity of the HOBITSS deployment. We therefore expect to detect low-magnitude aftershocks in this region.

In [None]:
# Initialize the FDSN client
client = Client()

# Start time in UTC at the mainshock
t0 = UTCDateTime("2014-10-02T19:33:00")

# End time, 60 minutes later
t1 = t0 + 60 * 60

# Get waveform data
stream = client.get_waveforms(network="YH", station="*", location="*", channel="?H?", starttime=t0, endtime=t1)

# Get station metadata
inv = client.get_stations(network="YH", station="*", location="*", channel="?H?", starttime=t0, endtime=t1)

### Annotate waveforms

For this example, we use the PickBlue picker; the OBSTransformer model is kept as a commented-out alternative in the cell below. In principle, any picker could be used to obtain the picks with only minimal changes.

> Warning: This will take some time and requires sufficient main memory. If you are short on memory, reduce the time window in the download cell above.

In [None]:
# Load a pre-trained picker for OBS data (PickBlue is used here)
# Alternative: the pre-trained OBSTransformer model
# picker = sbm.OBSTransformer.from_pretrained('obst2024')
picker = sbm.PickBlue()

# We tuned the thresholds a bit - Feel free to play around with these values
picks = picker.classify(stream, batch_size=256, P_threshold=0.075, S_threshold=0.1).picks

# Output number of P and S picks
Counter([p.phase for p in picks])

We now convert the picks into pandas dataframes in the format required for the GaMMA associator.

In [None]:
# Initialize empty list and fill it with pick info
pick_df = []
for p in picks:
    pick_df.append({
        "id": p.trace_id,
        "timestamp": p.peak_time.datetime,
        "prob": p.peak_value,
        "type": p.phase.lower()
    })

# Store it into a DataFrame
pick_df = pd.DataFrame(pick_df)
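
Before association it can help to check how the picks are distributed across stations and phases. The following is a minimal sketch with made-up pick records that mimic the `id` and `type` columns built above; the station codes are placeholders, not real data.

```python
from collections import Counter

# Hypothetical pick records in the same layout as the rows of pick_df
example_picks = [
    {"id": "YH.STA1.", "type": "p"},
    {"id": "YH.STA1.", "type": "s"},
    {"id": "YH.STA2.", "type": "p"},
    {"id": "YH.STA1.", "type": "p"},
]

# Count picks per (station id, phase) pair
per_station = Counter((p["id"], p["type"]) for p in example_picks)
for (station_id, phase), n in sorted(per_station.items()):
    print(station_id, phase, n)
```

Stations with very many picks relative to their neighbours may indicate noisy channels and can be a reason to revisit the picking thresholds.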

Let's have a look at the picks generated by the model. Note that we retained the probabilities from the deep learning model. They will be used by the associator later on.

In [None]:
pick_df.sort_values("timestamp")

Next, we convert the station metadata into a pandas dataframe in the format required for the GaMMA associator.

In [None]:
# Initialize empty list and fill it with station info
station_df = []
for station in inv[0]:
    station_df.append({
        "id": f"YH.{station.code}.",
        "longitude": station.longitude,
        "latitude": station.latitude,
        "elevation(m)": station.elevation
    })

# Store it as a DataFrame
station_df = pd.DataFrame(station_df)

# Convert map coordinates to km
station_df["x(km)"] = station_df.apply(lambda x: transformer.transform(x["latitude"], x["longitude"])[0] / 1e3, axis=1)
station_df["y(km)"] = station_df.apply(lambda x: transformer.transform(x["latitude"], x["longitude"])[1] / 1e3, axis=1)
station_df["z(km)"] = - station_df["elevation(m)"] / 1e3

# Create station dictionary for easier handling of later plots
station_dict = {station: (x, y) for station, x, y in zip(station_df["id"], station_df["x(km)"], station_df["y(km)"])}
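
Picks and stations are linked through their `id` values, so the station identifiers built above must have the same `NET.STA.LOC` form as the `trace_id` of the picks (with an empty location code in this dataset). The sketch below shows how such a key relates to a full channel identifier. The helper `station_key` and the station code are hypothetical.

```python
def station_key(seed_id):
    """Return 'NET.STA.LOC' from a full 'NET.STA.LOC.CHA' identifier (hypothetical helper)."""
    net, sta, loc, _channel = seed_id.split(".")
    return f"{net}.{sta}.{loc}"


# With an empty location code the key ends with a trailing dot
key = station_key("YH.STA1..HHZ")
print(key)
```

Splitting on the dots is a little more robust than slicing off a fixed number of characters, because it does not depend on the length of the channel code.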

Inspect the station dataframe

In [None]:
station_df

### GaMMA association

We now run the phase association. This will take a moment. We convert the output into two dataframes, one for the catalogue and one for the assignment of picks to events in the catalogue.

In [None]:
# Run the associator
catalogs, assignments = association(pick_df, station_df, config, method=config["method"])

# Store the catalog and assignments into a dataframe
catalog = pd.DataFrame(catalogs)
assignments = pd.DataFrame(assignments, columns=["pick_idx", "event_idx", "prob_gamma"])
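
Each assignment row links one pick to one event. As a minimal sketch of how this table can be read, the example below groups made-up assignments in the same `(pick_idx, event_idx, prob_gamma)` layout by event; the numbers are illustrative, not results from this dataset.

```python
from collections import defaultdict

# Hypothetical assignments: (pick_idx, event_idx, prob_gamma)
example_assignments = [
    (0, 0, 0.90),
    (1, 0, 0.80),
    (2, 1, 0.70),
    (4, 0, 0.95),
    (5, 1, 0.60),
]

# Collect the pick indices belonging to each event
picks_by_event = defaultdict(list)
for pick_idx, event_idx, _prob in example_assignments:
    picks_by_event[event_idx].append(pick_idx)

for event_idx, pick_indices in sorted(picks_by_event.items()):
    print(f"event {event_idx}: {len(pick_indices)} picks -> {pick_indices}")
```

Picks that never appear in the assignments (index 3 in this example) were not associated with any event.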

Make a plot of catalog and stations

In [None]:
# Initialize figure
fig = plt.figure(figsize=(10, 10))
ax = fig.add_subplot(111)
ax.set_aspect("equal")

# Show catalog events in km coordinates
cb = ax.scatter(catalog["x(km)"], catalog["y(km)"], c=catalog["z(km)"], s=8, cmap="viridis")

# Add color bar for focal depth
cbar = fig.colorbar(cb)
cbar.ax.set_ylim(cbar.ax.get_ylim()[::-1])
cbar.set_label("Depth [km]")

# Add stations
ax.plot(station_df["x(km)"], station_df["y(km)"], "r^", ms=10, mew=1, mec="k")
ax.set_xlabel("Easting [km]")
ax.set_ylabel("Northing [km]")

plt.show()

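Make a plot of traces and picks for each earthquake in the catalogue. To plot a single event instead, use one of the commented-out lines and remove the loop.

In [None]:
# Loop over all events in the catalog
# To plot a single (e.g. random) event instead, use one of the lines below and drop the loop:
# event_idx = np.random.choice(catalog["event_index"])
# event_idx = catalog["event_index"][5]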
for event_idx in catalog["event_index"]:

    # Extract picks and event info for this event
    event_picks = [picks[i] for i in assignments[assignments["event_idx"] == event_idx]["pick_idx"]]
    event = catalog[catalog["event_index"] == event_idx].iloc[0]
    
    # Extract window to include min and max pick times according to all picks
    first, last = min(pick.peak_time for pick in event_picks), max(pick.peak_time for pick in event_picks)
    
    # Initialize empty stream
    sub = Stream()
    
    # Add streams for stations that contain a pick - here we extract the "Z" component
    for station in np.unique([pick.trace_id for pick in event_picks]):
        sub.append(stream.select(station=station[3:-1], channel="?HZ")[0])
    
    # Extend window by +/- 5 seconds
    sub = sub.slice(first - 5, last + 5)
    
    # Pre-process the data: detrend and high-pass filter at 2 Hz
    sub = sub.copy()
    sub.detrend()
    sub.filter("highpass", freq=2)
    
    # Initialize figure
    fig = plt.figure(figsize=(15, 10))
    ax = fig.add_subplot(111)
    
    # Plot the seismograms, each placed at its hypocentral distance from the earthquake
    for i, trace in enumerate(sub):
        # Remove mean
        normed = trace.data - np.mean(trace.data)
        # Divide by max of absolute value
        normed = normed / np.max(np.abs(normed))
        # Extract x and y position of station
        station_x, station_y = station_dict[trace.id[:-4]]
        # Calculate Euclidean hypocentral distance (station depth is neglected)
        y = np.sqrt((station_x - event["x(km)"]) ** 2 + (station_y - event["y(km)"]) ** 2 + event["z(km)"] ** 2)
        # Plot trace at corresponding distance
        ax.plot(trace.times(), 3 * normed + y)
    
    # Plot the picks as vertical bars lined up on each seismogram
    for pick in event_picks:
        # Extract x and y position of station
        station_x, station_y = station_dict[pick.trace_id]
        # Calculate Euclidean hypocentral distance (station depth is neglected)
        y = np.sqrt((station_x - event["x(km)"]) ** 2 + (station_y - event["y(km)"]) ** 2 + event["z(km)"] ** 2)
        # Calculate arrival time relative to the window start
        # (this uses the start time of the last trace plotted above)
        x = pick.peak_time - trace.stats.starttime
        if pick.phase == "P":
            ls = '-'
        else:
            ls = '--'
        # Plot as vertical bars
        ax.plot([x, x], [y - 3, y + 3], 'k', ls=ls)
        
    # ax.set_ylim(0)
    ax.set_xlim(0, np.max(trace.times()))
    ax.set_ylabel("Hypocentral distance [km]")
    ax.set_xlabel("Time [s]")
    
    # Print out event information
    print("Event information")
    print(event)
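
The distance used for the vertical axis above combines the horizontal offset with the event depth only. For OBS stations, which sit well below sea level, a variant that also accounts for the station's `z(km)` could look like the following sketch. The helper `hypocentral_distance` is hypothetical and the coordinates are illustrative.

```python
import math


def hypocentral_distance(event_xyz, station_xyz):
    """Euclidean distance (km) between event and station, including station depth."""
    dx = station_xyz[0] - event_xyz[0]
    dy = station_xyz[1] - event_xyz[1]
    dz = station_xyz[2] - event_xyz[2]
    return math.sqrt(dx ** 2 + dy ** 2 + dz ** 2)


# Illustrative event at 12 km depth and an OBS station at 2 km depth
event_xyz = (300.0, 400.0, 12.0)
station_xyz = (303.0, 404.0, 2.0)
print(f"{hypocentral_distance(event_xyz, station_xyz):.2f} km")
```

For shallow events close to the network, the few kilometres of station depth can change the ordering of traces in the record section.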