
Add max-stack option to reassemble array predictions #99

Merged

merged 5 commits into seisbench:main from max-stack Jun 15, 2022

Conversation

@calum-chamberlain (Contributor) commented Jun 14, 2022

Possible enhancement/discussion point

Currently, prediction time-series from overlapping windows are combined into one prediction time-series spanning the entire stream fed to the model, by taking the average of the overlapping windows. For EQTransformer, combining by taking the mean is not ideal because the prediction value depends quite strongly on where the phase occurs within the 60 s window used for prediction.
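
To make the mechanism concrete, here is a minimal NumPy sketch of this mean-stacking (my own illustration, not the SeisBench implementation; mean_stack and stride are hypothetical names):

import numpy as np

def mean_stack(window_preds, stride):
    # Combine equal-length 1-D prediction windows whose starts are `stride`
    # samples apart, averaging wherever they overlap (assumes the windows
    # overlap, i.e. stride <= window length).
    window_len = len(window_preds[0])
    total_len = stride * (len(window_preds) - 1) + window_len
    acc = np.zeros(total_len)
    counts = np.zeros(total_len)
    for i, pred in enumerate(window_preds):
        acc[i * stride:i * stride + window_len] += pred
        counts[i * stride:i * stride + window_len] += 1
    return acc / counts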

Consider the case below for an earthquake recorded at station SNZO (and reported in the EQTransformer paper):

from obspy import UTCDateTime
from obspy.clients.fdsn import Client
import numpy as np
import seisbench.models as sbm
import matplotlib.pyplot as plt

def plot_predictions(tr, cft, p_time):
    # Plot the normalised waveform and each prediction trace, with times relative to p_time
    colors = {"Detection": "green", "P": "red", "S": "blue"}
    fig, ax = plt.subplots()
    ax.plot(tr.times(reftime=p_time), tr.data / np.max(np.abs(tr.data)), color="grey", alpha=0.4, label=tr.id)
    for _tr in cft:
        cft_type = _tr.stats.channel.split('_')[-1]
        color = colors.get(cft_type, 'k')
        ax.plot(_tr.times(reftime=p_time), _tr.data, color=color, label=cft_type)
    ax.legend()
    return fig

p_time = UTCDateTime(2001, 10, 13, 12, 47, 39, 257)
client = Client("IRIS")
st = client.get_waveforms(
    network="IU", station="SNZO", location="10", channel="BH?",
    starttime=p_time - 360, endtime=p_time + 420)
st = st.interpolate(sampling_rate=100.0)

model = sbm.EQTransformer.from_pretrained("original")

# First detect with P-arrival at 10-seconds into window
chunk = st.slice(p_time - 10, p_time + 50)
cft = model.annotate(chunk)
fig_1 = plot_predictions(chunk.select(component="Z")[0], cft, p_time)
fig_1.suptitle('P at 10s')

# Try again with the P-arrival at 20-seconds
chunk = st.slice(p_time - 20, p_time + 40)
cft = model.annotate(chunk)
fig_2 = plot_predictions(chunk.select(component="Z")[0], cft, p_time)
fig_2.suptitle('P at 20s')

# Try again with the P-arrival at 30-seconds
chunk = st.slice(p_time - 30, p_time + 30)
cft = model.annotate(chunk)
fig_3 = plot_predictions(chunk.select(component="Z")[0], cft, p_time)
fig_3.suptitle('P at 30s')
plt.show()

P at 10s

SNZO_P10

P at 20s

SNZO_P20

P at 30s

SNZO_P30

In this example the main issue is likely that the whole event does not fit in the detection window when the P arrival is much later than about 20 s into the window. When overlapping windows are stacked as the mean of the overlaps, the combined predictions are therefore degraded by windows in which the P arrival is not at the optimal position:

# Using the default 18s overlap
chunk = st.slice(p_time - 10, p_time + 68)
cft = model.annotate(chunk)
fig_1 = plot_predictions(chunk.select(component="Z")[0], cft, p_time)
fig_1.show()

SNZO_P10_overlapped

This gets worse for larger overlaps:

model.default_args['overlap'] = int(50 * model.sampling_rate)
cft = model.annotate(chunk)
fig_1 = plot_predictions(chunk.select(component="Z")[0], cft, p_time)
fig_1.show()

SNZO_P10_overlapped_50s

And even worse for a more general case where the first window is not the "ideal" window:

chunk = st.slice(p_time - 40, p_time + 68)
cft = model.annotate(chunk)
fig_1 = plot_predictions(chunk.select(component="Z")[0], cft, p_time)
fig_1.show()

SNZO_P-40_overlapped_50s

The key point arising from this is that stacking overlapping prediction windows as their average, at least for EQTransformer, degrades the combined prediction arrays, which can result in missed detections or picks. EQTransformer itself appears to circumvent this issue by picking on each window individually and combining the picks afterwards, although this seems to result in pick duplication.

I suggest that, instead of either of the above options, it might be better to pick on the combined predictions, but with the combination taking the maximum value wherever windows overlap. This assumes that each discrete window is either "correct" (if the arrivals occur at "optimal" positions in the window) or under-predicting.
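
As a concrete illustration of why this helps: if a well-placed window predicts 0.9 at the P arrival while two poorly placed overlapping windows predict 0.2 there, the mean is about 0.43 and may fall below a 0.5 picking threshold, whereas the maximum keeps 0.9. A sketch of the max-stacking counterpart to the mean_stack example above (again my own illustration, not the PR diff):

def max_stack(window_preds, stride):
    # As mean_stack above, but keep the element-wise maximum across
    # overlapping windows instead of their average (predictions assumed
    # to lie in [0, 1], so initialising with zeros is safe).
    window_len = len(window_preds[0])
    total_len = stride * (len(window_preds) - 1) + window_len
    acc = np.zeros(total_len)
    for i, pred in enumerate(window_preds):
        seg = slice(i * stride, i * stride + window_len)
        acc[seg] = np.maximum(acc[seg], pred)
    return acc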

Using the adaptation in this PR (you might prefer a different implementation, and I would suggest making max_stack the default, at least for EQTransformer), the final example above results in:

model.default_args["max_stack"] = True
chunk = st.slice(p_time - 40, p_time + 68)
cft = model.annotate(chunk)
fig_1 = plot_predictions(chunk.select(component="Z")[0], cft, p_time)
fig_1.show()

SNZO_P-40_overlapped_50s_maxstack

I think this gives a "better" result that is at least closer to the original intention of EQTransformer. The examples below, with a closer earthquake recorded on a short-period station in the SAMBA network, show that the degradation in prediction quality is not limited to events that do not fully fit in the window:

p_time = UTCDateTime(2008, 11, 11, 0, 57, 49, 324612) # rough P-arrival time, eyeballed
client = Client("IRIS")
st = client.get_waveforms(
    network="9F", station="EORO", location="*", channel="*",
    starttime=p_time - 360, endtime=p_time + 420)
st.detrend()

# First detect with P-arrival at 10-seconds into window
chunk = st.slice(p_time - 10, p_time + 50)
cft = model.annotate(chunk)
fig_1 = plot_predictions(chunk.select(component="Z")[0], cft, p_time)
fig_1.suptitle('P at 10s')

# Try again with the P-arrival at 20-seconds
chunk = st.slice(p_time - 20, p_time + 40)
cft = model.annotate(chunk)
fig_2 = plot_predictions(chunk.select(component="Z")[0], cft, p_time)
fig_2.suptitle('P at 20s')

# Try again with the P-arrival at 30-seconds
chunk = st.slice(p_time - 30, p_time + 30)
cft = model.annotate(chunk)
fig_3 = plot_predictions(chunk.select(component="Z")[0], cft, p_time)
fig_3.suptitle('P at 30s')

# Try again with the P-arrival at 40-seconds
chunk = st.slice(p_time - 40, p_time + 20)
cft = model.annotate(chunk)
fig_4 = plot_predictions(chunk.select(component="Z")[0], cft, p_time)
fig_4.suptitle('P at 40s')
plt.show()

P-arrival 10s into window:
EORO_P10

P-arrival 20s into window, degradation apparent:
EORO_P20

P-arrival 30s into window, S-pick unlikely to be made:
EORO_P30

P-arrival 40s into window, detection would not be made, and no P or S picks:
EORO_P40


An aside on overlaps in EQTransformer

I ran into this when playing with overlaps: a student at VUW (Olivia) noticed that when she placed her P arrivals at different positions in her test window she got very different results, and sometimes no pick at all. This can be seen in the video below, where a clear earthquake recorded at SAMBA site EORO is sometimes picked and sometimes missed when using an 18 s overlap. Note that using the max-stack does not solve this issue because, depending on when your data start relative to the earthquake, there may never be a window that has the arrival in the optimal position. In the video below the data and predictions are as above; each frame is the result of a prediction from a different chunk of data, with the chunk start stepped by 5 seconds per frame. A 120 s window is used to ensure that multiple overlapping windows are used. The dashed lines are the thresholds of 0.5 and 0.3 suggested at one point in the EQTransformer paper.

seisbench_EQT_overlap_18.0s.mp4

The video below shows the same for a 30s overlap:

seisbench_EQT_overlap_30.0s.mp4

And finally with a 55s overlap. In this example the frame step is 0.1 s so the video is much slower, but it shows more realistically the variability in pick skill and quality arising from uncontrollable event timing:

seisbench_EQT_overlap_55.0s.mp4

Using the max-stack option creates a more stable result with the 55s overlap:

seisbench_EQT_overlap_55.0s_max_stack.mp4

@yetinam self-assigned this Jun 14, 2022
@yetinam added the enhancement (New feature or request) label Jun 14, 2022
@yetinam (Member) commented Jun 14, 2022

Hey Calum,

thanks for the very comprehensive PR! That report is really interesting to see. I had seen some differences depending on the window position for EQTransformer myself, but I had never looked at it systematically. Out of interest, did you check what happens if you try models trained on different datasets? I could imagine the effect is less strong when using INSTANCE, because the waveform windows in that dataset are longer, increasing the natural variability in pick positions during training.

Regarding the PR, I think the max option absolutely makes sense. I agree it would be a good choice to make this the default behaviour, and I'd even set it as the default for all models. It's a breaking change, but it will only go into v0.3, and as the old behaviour is still available, I think that's acceptable.

I only have three minor points:

  • The PR needs to be reformatted to the black code style. The easiest option is the pre-commit hook described here.
  • Instead of max_stack, I'd introduce a parameter stacking with default value max, and avg as an alternative. This allows further stacking methods to be added in the future (see the sketch after this list).
  • There should be a test for the different options. It would probably be easiest to extend this one.
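
A possible shape for that interface, reusing the mean_stack/max_stack sketches above (illustrative only, not the final API):

def stack_windows(window_preds, stride, stacking="max"):
    # Dispatch on the proposed `stacking` option; further stacking methods
    # could be added here later.
    if stacking == "max":
        return max_stack(window_preds, stride)
    elif stacking == "avg":
        return mean_stack(window_preds, stride)
    raise ValueError(f"Unknown stacking method: {stacking!r}")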

Last remark: It's not necessary to overwrite the entries in model.default_args. Instead, parameters passed to annotate will automatically supersede the default_args for that invocation. To use a different overlap, just pass it to annotate. I also opened an issue to improve the documentation on this functionality: #100.
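
For example, the 50 s overlap used in the report above could be passed per invocation rather than by mutating default_args:

# Keyword arguments to annotate() supersede default_args for this call only
cft = model.annotate(chunk, overlap=int(50 * model.sampling_rate))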

@yetinam added this to the v0.3 milestone Jun 14, 2022
@calum-chamberlain (Contributor, Author) commented

Great, thanks for that feedback, and apologies for the length of the text for such a small code change! I thought I would use this as an opportunity to document the motivation for this.

I will have a play with INSTANCE now and post an update below. Related to that and to #96: the main difference between the two EQTransformer models is apparently the amount of data augmentation done. I don't know how comparable the augmentation in the seisbench-trained models is to either of the EQTransformer models, but that would be interesting to think about further.

Thanks for those comments, and apologies for not meeting the black formatting and testing requirements! That was just lazy.

Regarding the pre-commit hook: I had already created a python 3.10 env and attempted to set up the pre-commit hook in there, but the strict requirement of targeting python 3.8 in the .pre-commit-config.yml configuration meant that this was not possible in that env. I haven't tested whether it works in the python 3.9 env recommended in the contributing guide, but I wonder if you could set the language_version to python3? Happy to change my env if this isn't possible.

  1. I added a test for the different options and set the default to "max" stacking. Feel free to change that.
  2. I also added a _stack_options attribute to WaveformModel (sketched below this list). I don't know if you want this, but I thought it might make it easier to extend with other stacking options in the future, either within the base WaveformModel or in subclasses that want to alter or restrict the available options. I also hoped this could help with the documentation in #100 (Clean documentation for auxiliary parameters in annotate/classify) by reading model._stack_options when building the docs. Because of #100 I haven't made any docs changes, and I hope you can include the required changes when the docs for #100 are updated.
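
A rough sketch of the _stack_options idea (names and exact shape are illustrative and may differ from the merged code):

class WaveformModel:
    # Stacking methods available when reassembling overlapping windows;
    # subclasses may override this to restrict or extend the options.
    _stack_options = {"avg", "max"}

    def _check_stacking(self, stacking):
        # Hypothetical validation helper using the class-level registry
        if stacking not in self._stack_options:
            raise ValueError(
                f"stacking must be one of {sorted(self._stack_options)}, got {stacking!r}"
            )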

@yetinam (Member) commented Jun 15, 2022

Perfect, thanks for the comprehensive updates. I'll merge now. Thanks once more for this contribution! And don't worry about the long report, I think it's really interesting to explicitly visualize the differences and I guess the question of translation invariance for pickers will also pop up in papers eventually.

Thanks also for pointing out the pre-commit issue. I'll adjust the language version to python3 after merging your commit.

@yetinam merged commit fa6674e into seisbench:main Jun 15, 2022
yetinam added a commit that referenced this pull request Jun 15, 2022
For details see discussion in comments to #99
yetinam added a commit that referenced this pull request Jun 15, 2022
For details see discussion in comments to #99
@calum-chamberlain deleted the max-stack branch Jun 15, 2022 21:27
@calum-chamberlain (Contributor, Author) commented Jun 16, 2022

Thanks for merging that. I made a couple of videos of the INSTANCE-trained model for the same two earthquakes and stations (with my obvious NZ bias in having two NZ-based stations). For both of these, each frame shows a single 60 s window, so there is no overlap.

SNZO (INSTANCE weights) - regional, broadband sensor

For the SNZO regional earthquake there are some variations between windows, but the P and S peak predictions are mostly quite stable. In both examples the detection prediction window is quite different from the detection windows given by the original EQTransformer model. In some windows there is a secondary spike in the P prediction value around the S-phase.

INSTANCE_SNZO_shifts.mp4

EORO (INSTANCE weights) - local, short-period sensor

For the local earthquake case there is quite a bit of variation. Some windows have spikes in the P prediction value around the S-arrival despite both phases being within the window, and there are generally quite strong variations between windows. Stacking using the max-stack method for this would probably give a poor result.

INSTANCE_EORO_shifts.mp4

We are planning to do more testing of the various trained weights available using some of our well-picked manual catalogues, and I don't think these two examples are necessarily representative of the general output for these models, but they do seem to perform quite differently from the original EQTransformer models.

The "STEAD" weights retrained in seisbench show slightly different results, and still have a lot of variability. They never reach the quite the same target detection prediction shape, but the P and S picks seem robust:

SNZO (STEAD weights)

STEAD_EQT_overlap_0.0s_SNZO.mp4

EORO (STEAD weights)

STEAD_EQT_overlap_0.0sEORO.mp4

Edit (@yetinam, 22/06/16 10:23): Slightly fixed formatting

@yetinam (Member) commented Jun 16, 2022

Thanks for the follow-up. Really interesting to see these differences. At least parts of them (the better invariance to pick location in INSTANCE) match my intuition. Looking forward to the results from your systematic analysis!
