# Phase picker neural network

It's time to up our earthquake detection game: instead of only doing binary classification, we'll add phase picking capabilities. Following the approach of the [PhaseNet](https://academic.oup.com/gji/article/216/1/261/5129142) model from 2018, improved on by Andreas K and Erik last year with the [TPhaseNet](https://academic.oup.com/gji/article/239/2/862/7740467) model, we'll be a bit clever about how we encode the target labels.

But first, download some data. Again we use a selection of events from the STEAD dataset, but we're not too interested in pure-noise waveforms this time, so we use training and testing files containing only events with identified P and S arrival times.

In [None]:
! wget https://storage.googleapis.com/norsar-ml-ws/events_phases_Zonly_TRAIN.h5
! wget https://storage.googleapis.com/norsar-ml-ws/events_phases_Zonly_TEST.h5

Import the usual suspects.

In [None]:
import numpy as np
import h5py
import scipy.signal
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt

## Create target labels

As shown on today's slides, a good way of framing the phase picking problem is to make the labels be a time series too, which at any given time shows whether the seismogram contains the beginning of a P arrival, the beginning of an S arrival, or no arrival at all -- the latter we'll refer to as noise. 

Ideally, our target timeseries for a recorded earthquake will then contain almost only 'noise' labels, except for the exact sample of the P and S arrivals. This is a little strict, so instead we insert the arrival labels as distributions, centred around the labelled arrival time. A couple of cells below, we'll show graphically how this works. First, let's just write the relevant code:

In [None]:
# This thing is a *generator* -- when called, it yields one batch of events at a time.
class Hdf5DataGenerator:

    def __call__(self, filename, batchsize, normalise=True):

        if isinstance(filename, bytes):
            filename = filename.decode()    # tf.data passes string arguments as bytes

        # Open file, get datasets
        with h5py.File(filename, "r") as fin:

            waveforms = fin.get('waveforms')
            event_types = fin.get('type')
            p_start = fin.get('p_start')
            s_start = fin.get('s_start')

            waveform_length = waveforms[0].shape[0]
            istart = 0
            istop = batchsize
            exhausted = False

            # This is where we create the distribution around the pick
            pick_width = 100   # equals 1 sec
            half_pick_width = pick_width // 2
            pick = scipy.signal.windows.gaussian(pick_width, 12)

            while not exhausted:

                # Load a batch (= group) of data
                data = waveforms[istart:istop]
                targets = []

                if normalise:
                    max_vals = np.max(np.abs(data), axis=1, keepdims=True)
                    data /= (max_vals + 1e-8)

                # Create the target class waveforms 
                for i in range(len(data)):

                    p_true = np.zeros(shape=(waveform_length))    # P pick, set to [0.0, 0.0, 0.0, ...]
                    s_true = np.zeros(shape=(waveform_length))    # S pick, set to [0.0, 0.0, 0.0, ...]
                    n_true = np.ones(shape=(waveform_length))     #  Noise, set to [1.0, 1.0, 1.0, ...]

                    p_pos = p_start[istart + i]
                    s_pos = s_start[istart + i]

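                    # Ensure both picks are valid and the whole pick window fits inside the trace
                    if (half_pick_width < p_pos <= waveform_length - half_pick_width
                            and half_pick_width < s_pos <= waveform_length - half_pick_width):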
                    
                        # Insert pick 
                        p_true[p_pos - half_pick_width : p_pos + half_pick_width] = pick
                        s_true[s_pos - half_pick_width : s_pos + half_pick_width] = pick
                        
                        n_true -= p_true
                        n_true -= s_true

                    targets.append(
                        np.dstack([p_true, s_true, n_true])
                    )

                # Return this batch of data (and then continue)
                yield (data, np.vstack(targets))

                istart += batchsize
                istop += batchsize

                # No more events in file.
                if istop > len(waveforms):
                    exhausted = True
                    return 

Let's try out the fancy generator thing:

In [None]:
# Instantiate it
gen = Hdf5DataGenerator()

# Now we can loop over batches
for batch in gen('events_phases_Zonly_TRAIN.h5', 1):

    batch_data = batch[0]
    batch_targets = batch[1]
    
    print('batch_data.shape:', batch_data.shape)
    print('batch_targets.shape:', batch_targets.shape)

    # Plot the first event in the batch:
    data = batch_data[0]
    targets = batch_targets[0]

    
    _, ax = plt.subplots(4, 1, sharex=True, figsize=(12, 8))
    xvals = np.arange(data.shape[0])
    
    ax[0].plot(xvals, data[:, 0])
    ax[1].plot(xvals, targets[:, 0], color='blue')
    ax[2].plot(xvals, targets[:, 1], color='red')
    ax[3].plot(xvals, targets[:, 2], color='green')
    
    ax[0].set_ylabel('Waveform')
    ax[1].set_ylabel('P arrival')
    ax[2].set_ylabel('S arrival')
    ax[3].set_ylabel('Noise')
    

    break    # Ok let's stop it here already 

Have a look at the plot and convince yourself whether this makes conceptual sense.
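
One property worth checking is that the three label channels add up to one at every sample, so each time step can be read as a probability distribution over P, S and noise. The sketch below rebuilds the labels for a single trace with NumPy only. The trace length of 6000 samples and the arrival positions are made-up values, and `gaussian_window` is a hypothetical helper that re-implements the Gaussian window formula so the example does not depend on SciPy.

```python
import numpy as np

def gaussian_window(width, std):
    # Symmetric Gaussian window with a peak value close to 1 at the centre
    n = np.arange(width) - (width - 1) / 2.0
    return np.exp(-0.5 * (n / std) ** 2)

waveform_length = 6000      # assumed trace length in samples
pick_width = 100
half = pick_width // 2
pick = gaussian_window(pick_width, 12)

p_pos, s_pos = 1500, 2300   # hypothetical arrival samples

p_true = np.zeros(waveform_length)
s_true = np.zeros(waveform_length)
p_true[p_pos - half : p_pos + half] = pick
s_true[s_pos - half : s_pos + half] = pick
n_true = 1.0 - p_true - s_true

# Stack the channels in the order (P, S, noise): shape (6000, 3)
targets = np.stack([p_true, s_true, n_true], axis=-1)

print('targets.shape:', targets.shape)
print('channels sum to one:', np.allclose(targets.sum(axis=-1), 1.0))
print('smallest label value:', targets.min())
```

Note that this only holds while the P and S windows do not overlap. If the two arrivals were closer together than the pick width, the noise channel would dip below zero where the windows overlap.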

## Make a `tf.data.Dataset` (for performance)

Last time we saw that loading all data into memory at once _could_ lead to some problems, and this time around, it surely will. For this reason we'd like to load only one _batch_ at a time, train on it, and then remove it from memory to make room for the next batch.

Here we can also realise that while the GPU is busy training, the CPU is mostly idle between loading batches. So we can use it to construct the label timeseries on the fly: while one batch is training, the code above runs simultaneously to prepare the next batch, instead of first producing all the label timeseries and only then starting to train. This is why writing the above code as a _generator_ was a useful thing.
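
To picture the overlap, here is a plain-Python sketch that uses only the standard library. It is a conceptual illustration, not how TensorFlow implements it: `make_batches`, `prefetch` and `train` are hypothetical stand-ins, and `time.sleep` plays the role of both batch preparation and the training step.

```python
import queue
import threading
import time

def make_batches(n_batches, prep_time):
    # Stand-in for the HDF5 generator: each batch takes prep_time seconds to prepare
    for i in range(n_batches):
        time.sleep(prep_time)
        yield i

def prefetch(generator, buffer_size=1):
    # Run the generator in a background thread, keeping up to buffer_size batches ready
    buffer = queue.Queue(maxsize=buffer_size)
    sentinel = object()

    def worker():
        for item in generator:
            buffer.put(item)
        buffer.put(sentinel)    # Signal that the generator is exhausted

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = buffer.get()
        if item is sentinel:
            return
        yield item

def train(batches, step_time):
    # Stand-in for training: each step takes step_time seconds
    seen = []
    for batch in batches:
        time.sleep(step_time)
        seen.append(batch)
    return seen

start = time.perf_counter()
seen_sequential = train(make_batches(5, 0.05), 0.05)
t_sequential = time.perf_counter() - start

start = time.perf_counter()
seen_prefetched = train(prefetch(make_batches(5, 0.05)), 0.05)
t_prefetched = time.perf_counter() - start

print(f'sequential: {t_sequential:.2f} s, prefetched: {t_prefetched:.2f} s')
```

Both runs see the same batches in the same order, but the prefetched run should finish sooner because preparation and training overlap.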

But how do we schedule when to start processing which batch? Luckily TensorFlow can do this part for us, if we put everything in a `tf.data.Dataset`.

In [None]:
# Decide on a batch size
batch_size = 128

# Input file
filename_train = 'events_phases_Zonly_TRAIN.h5'

# Now build the Dataset:
train_dataset = tf.data.Dataset.from_generator(
    Hdf5DataGenerator(),
    output_signature=(
        tf.TensorSpec(shape=(None, 6000, 1), dtype=tf.float32, name='data'),
        tf.TensorSpec(shape=(None, 6000, 3), dtype=tf.float32, name='targets')
    ), 
    args=(filename_train, batch_size)
)

# Tell TensorFlow to prepare batches in parallel during training
train_dataset = train_dataset.prefetch(tf.data.AUTOTUNE)

Cool. To check that it does the job, it's always nice to make a plot, for visual verification.

In [None]:
# We can loop over batches in the dataset
for batch_data, batch_target in train_dataset:

    # Get the first event in the batch
    data = batch_data[0]
    target = batch_target[0]

    # Check the shapes
    print('data.shape:', data.shape)
    print('target.shape:', target.shape)

    # Make a plot
    _, ax = plt.subplots(4, 1, sharex=True)
    xvals = np.arange(data.shape[0])
    
    ax[0].plot(xvals, data[:, 0])
    ax[1].plot(xvals, target[:, 0])
    ax[2].plot(xvals, target[:, 1])
    ax[3].plot(xvals, target[:, 2])

    # Let's end it here
    break

Last thing: Do the same for the test dataset too.

In [None]:
filename_test = 'events_phases_Zonly_TEST.h5'

test_dataset = tf.data.Dataset.from_generator(
    Hdf5DataGenerator(),
    output_signature=(
        tf.TensorSpec(shape=(None, 6000, 1), dtype=tf.float32, name='data'),
        tf.TensorSpec(shape=(None, 6000, 3), dtype=tf.float32, name='targets')
    ), 
    args=(filename_test, batch_size)
)

test_dataset = test_dataset.prefetch(tf.data.AUTOTUNE)

## Construct a model

With data and labels in place, we are ready to construct the first model. We'll give you this nearly for free, but not entirely:

### Exercise: 

Identify how many outputs the model should have.

In [None]:
number_of_outputs = ???

In [None]:
def create_sequential_model(num_outputs):

    model = keras.Sequential([
        keras.layers.InputLayer(shape=(6000, 1)),
        
        # Downsample
        keras.layers.Conv1D(16, 3, padding='same', activation='relu'),
        keras.layers.MaxPooling1D(2),
        keras.layers.Conv1D(32, 3, padding='same', activation='relu'),
        keras.layers.MaxPooling1D(2),
        keras.layers.Conv1D(64, 3, padding='same', activation='relu'),
        keras.layers.MaxPooling1D(2),
        
        # Process
        keras.layers.Conv1D(128, 3, padding='same', activation='relu'),
        keras.layers.Conv1D(64, 3, padding='same', activation='relu'),
        
        # Upsample
        keras.layers.Conv1DTranspose(32, 3, strides=2, padding='same', activation='relu'),
        keras.layers.Conv1DTranspose(16, 3, strides=2, padding='same', activation='relu'),
        keras.layers.Conv1DTranspose(8, 3, strides=2, padding='same', activation='relu'),
        
        # Output
        keras.layers.Conv1D(num_outputs, 1, activation='softmax')
    ])

    return model

Compile and train it:

In [None]:
sequential_model = create_sequential_model(number_of_outputs)

sequential_model.compile(
    optimizer=keras.optimizers.Adam(1e-4), loss="categorical_crossentropy", metrics=['accuracy']
)

sequential_model.summary()
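
The `categorical_crossentropy` loss passed to `compile` above is evaluated at every time sample, comparing the softmax output with the soft label at that sample. The NumPy sketch below follows the textbook formula to show what is being minimised. It is not Keras' actual implementation, which differs in details such as clipping, and the label and prediction values are invented for illustration.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Per-sample loss: -sum_k y_true[k] * log(y_pred[k]); works for soft labels too
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

# Three time samples, channels ordered (P, S, noise)
y_true = np.array([[0.0, 0.0, 1.0],     # pure noise
                   [0.6, 0.0, 0.4],     # on the flank of a P pick
                   [1.0, 0.0, 0.0]])    # at the peak of a P pick

# A prediction that follows the labels closely
y_good = np.array([[0.02, 0.02, 0.96],
                   [0.55, 0.05, 0.40],
                   [0.90, 0.05, 0.05]])

# A prediction that says "noise" everywhere
y_all_noise = np.tile([0.01, 0.01, 0.98], (3, 1))

loss_good = categorical_crossentropy(y_true, y_good).mean()
loss_all_noise = categorical_crossentropy(y_true, y_all_noise).mean()

print('loss, good prediction:', loss_good)
print('loss, all-noise prediction:', loss_all_noise)
```

The all-noise prediction is penalised heavily at the samples where the label says there is a P arrival, which is what pushes the model to produce peaks at all.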

In [None]:
sequential_model.fit(
    train_dataset,
    validation_data=test_dataset,
    epochs=5,
    verbose=1
)

## Examine the model

Let's see visually how well the model can pick incoming phases. Since we want to look at different models, it's convenient to write a function for it:

In [None]:
def plot_test_event(model):
    
    for batch_data, batch_target in test_dataset.take(1):
        
        data = batch_data[1]
        data = np.expand_dims(data, axis=0)
        target = batch_target[1]
        target = np.expand_dims(target, axis=0)
    
        preds = model.predict(data)
    
        _, ax = plt.subplots(4, 1, sharex=True, figsize=(12,8))
        xvals = np.arange(data.shape[1])
    
        ax[0].plot(xvals, data[0, :, 0])
        ax[0].set_ylabel('Data')
        
        ax[1].plot(xvals, preds[0, :, 0], color='blue')
        ax[1].set_ylabel('P similarity')
        ax[1].set_ylim(0, 1)
        
        ax[2].plot(xvals, preds[0, :, 1], color='red')
        ax[2].set_ylabel('S similarity')
        ax[2].set_ylim(0, 1)
        
        ax[3].plot(xvals, preds[0, :, 2], color='green')
        ax[3].set_ylabel('Noise')
        ax[3].set_ylim(0, 1)

        

And call it:

In [None]:
plot_test_event(sequential_model)
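
The plot shows probability traces, not pick times. One simple way of turning a trace into a pick is to take its highest sample and accept it only if it exceeds a threshold. The sketch below does this on a synthetic stand-in for `model.predict(...)[0]`. The function `extract_pick`, the threshold of 0.5 and the synthetic peaks are all assumptions made for illustration, and the 100 Hz sampling rate follows from the "100 samples equals 1 sec" comment in the generator.

```python
import numpy as np

def extract_pick(prob_trace, threshold=0.5):
    # Return the sample index of the highest probability,
    # or None if the trace never reaches the threshold
    idx = int(np.argmax(prob_trace))
    if prob_trace[idx] < threshold:
        return None
    return idx

# Synthetic stand-in for one prediction: shape (n_samples, 3) = (P, S, noise)
n_samples = 6000
x = np.arange(n_samples)
p_prob = 0.9 * np.exp(-0.5 * ((x - 1500) / 12.0) ** 2)   # confident P response
s_prob = 0.3 * np.exp(-0.5 * ((x - 2300) / 12.0) ** 2)   # weak S response
preds = np.stack([p_prob, s_prob, 1.0 - p_prob - s_prob], axis=-1)

sampling_rate = 100.0   # Hz

p_pick = extract_pick(preds[:, 0])
s_pick = extract_pick(preds[:, 1])

for name, pick in [('P', p_pick), ('S', s_pick)]:
    if pick is None:
        print(f'{name}: no pick above threshold')
    else:
        print(f'{name}: sample {pick}, i.e. {pick / sampling_rate:.2f} s into the trace')
```

Comparing such picks with the labelled arrival samples gives a residual in seconds, which is often easier to interpret than the loss value.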

For the "real" test, we want to use all the events in the testing dataset, and compute the average performance:

In [None]:
sequential_model.evaluate(test_dataset)

### Exercise

See if adding more layers will considerably improve this model!

### Exercise

Is "accuracy" the correct metric to use for evaluating performance in this case?

## Construct a more complicated model 

It's time to go beyond the limits of the sequential model, and create something with more complicated flow, like this:

![](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/gji/239/2/10.1093_gji_ggae298/1/ggae298fig2.jpeg?Expires=1761155332&Signature=ELkc-i-qYmbYZuw8HU9hdyjj2VIchEEOQpjNLJVpSl8ZS2yhyBxsgIwa5hl2fWWwb6rA2cdk6I9T-DJyAVmoiyzgFacZvtZymKYY-y0D4xdymmD7BT5Z-eZQgcmxPZMdiaxhRPFbqSTbGpv-KeY1bR5a5VYobth-T0nXmKqwtt94LrzZsniGoWKJlsIkGVXW5Z6zW0OmVrsTIHTKUxlnFsPz9fRqw~A4YwdQe0arxYrltPzu5waMvynDfYfv4to6i2496ltey1~c~6QqpIU7U7wNWKNK6jlVpZU5t3XRXYYaw6JAw8XeD~cWQvuN8riTf1GEH5RGL6Ymz7TsuFKQJw__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)

This structure requires the use of Keras' [_functional API_](https://keras.io/guides/functional_api/). It's somewhat complicated to get into, so we can start from a minimal example.

In [None]:
def create_small_phasenet(num_outputs):

    inputs = keras.layers.Input(shape=(6000, 1))

    # Encoder
    # Level 1
    c1 = keras.layers.Conv1D(16, 9, padding='same', activation='relu')(inputs)
    c1 = keras.layers.Conv1D(16, 9, padding='same', activation='relu')(c1)
    p1 = keras.layers.MaxPooling1D(2)(c1)

    # Level 2
    c2 = keras.layers.Conv1D(32, 7, padding='same', activation='relu')(p1)
    c2 = keras.layers.Conv1D(32, 7, padding='same', activation='relu')(c2)
    p2 = keras.layers.MaxPooling1D(2)(c2)

    # Bottleneck
    c3 = keras.layers.Conv1D(64, 5, padding='same', activation='relu')(p2)
    c3 = keras.layers.Conv1D(64, 5, padding='same', activation='relu')(c3)

    # Decoder
    # Level 2
    u2 = keras.layers.UpSampling1D(2)(c3)
    u2 = keras.layers.Concatenate()([u2, c2])
    c4 = keras.layers.Conv1D(32, 7, padding='same', activation='relu')(u2)
    c4 = keras.layers.Conv1D(32, 7, padding='same', activation='relu')(c4)

    # Level 1
    u1 = keras.layers.UpSampling1D(2)(c4)
    u1 = keras.layers.Concatenate()([u1, c1])
    c5 = keras.layers.Conv1D(16, 9, padding='same', activation='relu')(u1)
    c5 = keras.layers.Conv1D(16, 9, padding='same', activation='relu')(c5)

    # Output
    outputs = keras.layers.Conv1D(num_outputs, 1, activation='softmax')(c5)

    model = keras.Model(inputs=inputs, outputs=outputs, name='small_phasenet')

    return model

In [None]:
small_phasenet = create_small_phasenet(number_of_outputs)

# Optional: Plot the model as a graph
keras.utils.plot_model(small_phasenet, show_shapes=True, show_layer_names=True)

Time to train it:

In [None]:
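# The new model must be compiled before it can be trained
small_phasenet.compile(
    optimizer=keras.optimizers.Adam(1e-4), loss="categorical_crossentropy", metrics=['accuracy']
)

small_phasenet.fit(
    train_dataset,
    validation_data=test_dataset,
    epochs=5,
    verbose=1
)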

### Exercise

Plot an event, and evaluate results for the testing dataset.

### Exercise

With bigger models comes a bigger risk of numerical instability during training. To mitigate this, we can introduce [`BatchNormalization`](https://keras.io/api/layers/normalization_layers/batch_normalization/) layers. Usually we would put these between a convolutional layer and the activation function, like so:

```
c1 = keras.layers.Conv1D(16, 9, padding='same')(inputs)
c1 = keras.layers.BatchNormalization()(c1)
c1 = keras.layers.Activation('relu')(c1)
```

Add such layers to the model, and see if things improve, especially if you add additional Conv1D blocks as well. 
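
To see what such a layer does to the data during training, here is a NumPy sketch of the normalisation step for a tensor of shape `(batch, time, channels)`. It is a simplified illustration, not the Keras implementation: `batch_norm_training` is a hypothetical name, the input values are random, and the moving averages that the real layer uses at inference time are left out.

```python
import numpy as np

def batch_norm_training(x, gamma, beta, eps=1e-3):
    # Mean and variance are computed per channel, over the batch and time axes
    mean = x.mean(axis=(0, 1), keepdims=True)
    var = x.var(axis=(0, 1), keepdims=True)

    # Normalise, then apply the learnable scale (gamma) and shift (beta)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# Random activations with a large offset and spread: 8 traces, 6000 samples, 16 channels
rng = np.random.default_rng(0)
x = rng.normal(loc=50.0, scale=20.0, size=(8, 6000, 16))

y = batch_norm_training(x, gamma=np.ones(16), beta=np.zeros(16))

print('mean before:', x.mean(), ' std before:', x.std())
print('mean after: ', y.mean(), ' std after: ', y.std())
```

Whatever the scale of the incoming activations, each channel leaves the layer with roughly zero mean and unit spread, which keeps the inputs to the following activation function in a well-behaved range.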

## Construct an _even_ more complicated model

To show how the _functional API_ can make it somewhat easier to build complex models, have a look at the code below. This is also a PhaseNet-type model, but where we add layers in _loops_, instead of writing each layer explicitly:

In [None]:
def create_bigger_phasenet():
    
    inputs = keras.Input(shape=(6000, 1))

    # First half of the network: downsampling inputs

    # Entry block
    x = keras.layers.Conv1D(32, 3, strides=2, padding="same")(inputs)
    x = keras.layers.BatchNormalization()(x)
    x = keras.layers.Activation("relu")(x)

    previous_block_activation = x  # Set aside for residual (aka. skip) connection

    # Blocks 1, 2, 3 are identical here; edit the list to vary the feature depth per block.
    for filters in [64, 64, 64]:
        x = keras.layers.Activation("relu")(x)
        x = keras.layers.Conv1D(filters, 3, padding="same")(x)
        x = keras.layers.BatchNormalization()(x)

        x = keras.layers.Activation("relu")(x)
        x = keras.layers.Conv1D(filters, 3, padding="same")(x)
        x = keras.layers.BatchNormalization()(x)

        x = keras.layers.MaxPooling1D(3, strides=2, padding="same")(x)

        # Project residual, and add it
        residual = keras.layers.Conv1D(filters, 1, strides=2, padding="same")(
            previous_block_activation
        )
        x = keras.layers.add([x, residual])
        previous_block_activation = x  # Set aside next residual

    # Second half: upsampling inputs

    for filters in [64, 64, 64, 32]:
        x = keras.layers.Activation("relu")(x)
        x = keras.layers.Conv1DTranspose(filters, 3, padding="same")(x)
        x = keras.layers.BatchNormalization()(x)

        x = keras.layers.Activation("relu")(x)
        x = keras.layers.Conv1DTranspose(filters, 3, padding="same")(x)
        x = keras.layers.BatchNormalization()(x)

        x = keras.layers.UpSampling1D(2)(x)

        # Project residual
        residual = keras.layers.UpSampling1D(2)(previous_block_activation)
        residual = keras.layers.Conv1D(filters, 1, padding="same")(residual)
        x = keras.layers.add([x, residual])  # Add back residual
        previous_block_activation = x  # Set aside next residual

    # Get the correct output shape
    outputs = keras.layers.Conv1D(3, 3, activation="softmax", padding="same")(x)

    # Define the model
    model = keras.Model(inputs, outputs)

    return model


In [None]:
big_phasenet = create_bigger_phasenet()
big_phasenet.summary()
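
When reading the summary, it helps to follow the length of the time axis through the network. The sketch below does the same bookkeeping by hand, assuming the usual rule that a layer with `padding="same"` and stride `s` turns a length `n` into `ceil(n / s)`, and that `UpSampling1D(2)` doubles the length. The helper `same_padding_length` is a hypothetical name.

```python
import math

def same_padding_length(n, stride):
    # Output length of a strided layer with padding="same"
    return math.ceil(n / stride)

length = 6000
trace = [length]

# Entry block: Conv1D with strides=2
length = same_padding_length(length, 2)
trace.append(length)

# Three encoder blocks, each ending in MaxPooling1D with strides=2
for _ in range(3):
    length = same_padding_length(length, 2)
    trace.append(length)

# Four decoder blocks, each ending in UpSampling1D(2)
for _ in range(4):
    length *= 2
    trace.append(length)

print(trace)
```

The four halvings are matched by four doublings, so the output has one prediction per input sample. This works out because 6000 is divisible by 16; an input length that is not would come back longer than it went in.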

### Exercise

Train the bigger model! 