
Add max-stack option to reassemble array predictions #99

Merged

merged 5 commits into seisbench:main from max-stack Jun 15, 2022

Conversation

@calum-chamberlain (Contributor) commented Jun 14, 2022

Possible enhancement/discussion point

Currently, prediction time-series from overlapping windows are combined into one prediction time-series spanning the entire stream fed to the model, by taking the average of the overlapping windows. For EQTransformer, combining by taking the mean is not ideal because the prediction value depends quite strongly on where the phase occurs within the 60 s window used for prediction.
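
To make the mechanism concrete, here is a minimal NumPy sketch of this mean-stacking (my own illustration, not the SeisBench implementation; mean_stack and stride are hypothetical names):

import numpy as np

def mean_stack(window_preds, stride):
    # Combine equal-length 1-D prediction windows whose starts are `stride`
    # samples apart, averaging wherever they overlap (assumes the windows
    # overlap, i.e. stride <= window length).
    window_len = len(window_preds[0])
    total_len = stride * (len(window_preds) - 1) + window_len
    acc = np.zeros(total_len)
    counts = np.zeros(total_len)
    for i, pred in enumerate(window_preds):
        acc[i * stride:i * stride + window_len] += pred
        counts[i * stride:i * stride + window_len] += 1
    return acc / counts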

Consider the case below for an earthquake recorded at station SNZO (and reported in the EQTransformer paper):

from obspy import UTCDateTime
from obspy.clients.fdsn import Client
import numpy as np
import seisbench.models as sbm
import matplotlib.pyplot as plt

def plot_predictions(tr, cft, p_time):
    # Plot the normalised waveform and each prediction trace, with times relative to p_time
    colors = {"Detection": "green", "P": "red", "S": "blue"}
    fig, ax = plt.subplots()
    ax.plot(tr.times(reftime=p_time), tr.data / np.max(np.abs(tr.data)), color="grey", alpha=0.4, label=tr.id)
    for _tr in cft:
        cft_type = _tr.stats.channel.split('_')[-1]
        color = colors.get(cft_type, 'k')
        ax.plot(_tr.times(reftime=p_time), _tr.data, color=color, label=cft_type)
    ax.legend()
    return fig

p_time = UTCDateTime(2001, 10, 13, 12, 47, 39, 257)
client = Client("IRIS")
st = client.get_waveforms(
    network="IU", station="SNZO", location="10", channel="BH?",
    starttime=p_time - 360, endtime=p_time + 420)
st = st.interpolate(sampling_rate=100.0)

model = sbm.EQTransformer.from_pretrained("original")

# First detect with P-arrival at 10-seconds into window
chunk = st.slice(p_time - 10, p_time + 50)
cft = model.annotate(chunk)
fig_1 = plot_predictions(chunk.select(component="Z")[0], cft, p_time)
fig_1.suptitle('P at 10s')

# Try again with the P-arrival at 20-seconds
chunk = st.slice(p_time - 20, p_time + 40)
cft = model.annotate(chunk)
fig_2 = plot_predictions(chunk.select(component="Z")[0], cft, p_time)
fig_2.suptitle('P at 20s')

# Try again with the P-arrival at 30-seconds
chunk = st.slice(p_time - 30, p_time + 30)
cft = model.annotate(chunk)
fig_3 = plot_predictions(chunk.select(component="Z")[0], cft, p_time)
fig_3.suptitle('P at 30s')
plt.show()

P at 10s

SNZO_P10

P at 20s

SNZO_P20

P at 30s

SNZO_P30

In this example the main issue is likely that the whole event does not fit in the detection window when the P arrival is much later than about 20 s into the window. When overlapping windows are stacked as the mean of the overlaps, the combined predictions are therefore degraded by windows in which the P arrival is not at the optimal position:

# Using the default 18s overlap
chunk = st.slice(p_time - 10, p_time + 68)
cft = model.annotate(chunk)
fig_1 = plot_predictions(chunk.select(component="Z")[0], cft, p_time)
fig_1.show()

SNZO_P10_overlapped

This gets worse for larger overlaps:

model.default_args['overlap'] = int(50 * model.sampling_rate)
cft = model.annotate(chunk)
fig_1 = plot_predictions(chunk.select(component="Z")[0], cft, p_time)
fig_1.show()

SNZO_P10_overlapped_50s

And even worse for a more general case where the first window is not the "ideal" window:

chunk = st.slice(p_time - 40, p_time + 68)
cft = model.annotate(chunk)
fig_1 = plot_predictions(chunk.select(component="Z")[0], cft, p_time)
fig_1.show()

SNZO_P-40_overlapped_50s

The key point arising from this is that stacking overlapping prediction windows as their average, at least for EQTransformer, degrades the combined prediction arrays, which can result in missed detections or picks. EQTransformer itself appears to circumvent this issue by picking on each window individually and combining the picks afterwards, although this seems to result in pick duplication.

I suggest that, instead of either of the above options, it might be better to pick on the combined predictions, but with the combination taking the maximum value wherever windows overlap. This assumes that each discrete window is either "correct" (if the arrivals occur at "optimal" positions in the window) or under-predicting.
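
As a concrete illustration of why this helps: if a well-placed window predicts 0.9 at the P arrival while two poorly placed overlapping windows predict 0.2 there, the mean is about 0.43 and may fall below a 0.5 picking threshold, whereas the maximum keeps 0.9. A sketch of the max-stacking counterpart to the mean_stack example above (again my own illustration, not the PR diff):

def max_stack(window_preds, stride):
    # As mean_stack above, but keep the element-wise maximum across
    # overlapping windows instead of their average (predictions assumed
    # to lie in [0, 1], so initialising with zeros is safe).
    window_len = len(window_preds[0])
    total_len = stride * (len(window_preds) - 1) + window_len
    acc = np.zeros(total_len)
    for i, pred in enumerate(window_preds):
        seg = slice(i * stride, i * stride + window_len)
        acc[seg] = np.maximum(acc[seg], pred)
    return acc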

Using the adaptation in this PR (you might prefer a different implementation, and I would suggest making max_stack the default, at least for EQTransformer), the final example above results in:

model.default_args["max_stack"] = True
chunk = st.slice(p_time - 40, p_time + 68)
cft = model.annotate(chunk)
fig_1 = plot_predictions(chunk.select(component="Z")[0], cft, p_time)
fig_1.show()

SNZO_P-40_overlapped_50s_maxstack

I think this gives a "better" result that is at least closer to the original intention of EQTransformer. The examples below, with a closer earthquake recorded on a short-period station in the SAMBA network, show that the degradation in prediction quality is not limited to events that do not fully fit in the window:

p_time = UTCDateTime(2008, 11, 11, 0, 57, 49, 324612) # rough P-arrival time, eyeballed
client = Client("IRIS")
st = client.get_waveforms(
    network="9F", station="EORO", location="*", channel="*",
    starttime=p_time - 360, endtime=p_time + 420)
st.detrend()

# First detect with P-arrival at 10-seconds into window
chunk = st.slice(p_time - 10, p_time + 50)
cft = model.annotate(chunk)
fig_1 = plot_predictions(chunk.select(component="Z")[0], cft, p_time)
fig_1.suptitle('P at 10s')

# Try again with the P-arrival at 20-seconds
chunk = st.slice(p_time - 20, p_time + 40)
cft = model.annotate(chunk)
fig_2 = plot_predictions(chunk.select(component="Z")[0], cft, p_time)
fig_2.suptitle('P at 20s')

# Try again with the P-arrival at 30-seconds
chunk = st.slice(p_time - 30, p_time + 30)
cft = model.annotate(chunk)
fig_3 = plot_predictions(chunk.select(component="Z")[0], cft, p_time)
fig_3.suptitle('P at 30s')

# Try again with the P-arrival at 40-seconds
chunk = st.slice(p_time - 40, p_time + 20)
cft = model.annotate(chunk)
fig_4 = plot_predictions(chunk.select(component="Z")[0], cft, p_time)
fig_4.suptitle('P at 40s')
plt.show()

P-arrival 10s into window:
EORO_P10

P-arrival 20s into window, degradation apparent:
EORO_P20

P-arrival 30s into window, S-pick unlikely to be made:
EORO_P30

P-arrival 40s into window, detection would not be made, and no P or S picks:
EORO_P40


An aside on overlaps in EQTransformer

I ran into this when playing with overlaps: a student at VUW (Olivia) noticed that when she placed her P arrivals at different positions in her test window she got very different results, and sometimes no pick at all. This can be seen in the video below, where a clear earthquake recorded at SAMBA site EORO is sometimes picked and sometimes missed when using an 18 s overlap. Note that using the max-stack does not solve this issue because, depending on when your data start relative to the earthquake, there may never be a window that has the arrival in the optimal position. In the video below the data and predictions are as above; each frame is the result of a prediction from a different chunk of data, with the chunk start stepped by 5 seconds per frame. A 120 s window is used to ensure that multiple overlapping windows are used. The dashed lines are the thresholds of 0.5 and 0.3 suggested at one point in the EQTransformer paper.

seisbench_EQT_overlap_18.0s.mp4

The video below shows the same for a 30s overlap:

seisbench_EQT_overlap_30.0s.mp4

And finally with a 55s overlap. In this example the frame step is 0.1 s so the video is much slower, but it shows more realistically the variability in pick skill and quality arising from uncontrollable event timing:

seisbench_EQT_overlap_55.0s.mp4

Using the max-stack option creates a more stable result with the 55s overlap:

seisbench_EQT_overlap_55.0s_max_stack.mp4

@yetinam self-assigned this Jun 14, 2022
@yetinam added the enhancement (New feature or request) label Jun 14, 2022
@yetinam (Member) commented Jun 14, 2022

Hey Calum,

thanks for the very comprehensive PR! That report is really interesting to see. I had seen some differences depending on the window position for EQTransformer myself, but I had never looked at it systematically. Out of interest, did you check what happens if you try models trained on different datasets? I could imagine the effect is less strong when using INSTANCE, because the waveform windows in that dataset are longer, increasing the natural variability in pick positions during training.

Regarding the PR, I think the max option absolutely makes sense. I agree it would be a good choice to make this the default behaviour, and I'd even set it as the default for all models. It's a breaking change, but it will only go into v0.3, and as the old behaviour is still available, I think that's acceptable.

I only have three minor points:

  • The PR needs to be reformatted to the black code style. The easiest option is the pre-commit hook described here.
  • Instead of max_stack, I'd introduce a parameter stacking with default value max, and avg as an alternative. This allows further stacking methods to be added in the future (see the sketch after this list).
  • There should be a test for the different options. It would probably be easiest to extend this one.
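
A possible shape for that interface, reusing the mean_stack/max_stack sketches above (illustrative only, not the final API):

def stack_windows(window_preds, stride, stacking="max"):
    # Dispatch on the proposed `stacking` option; further stacking methods
    # could be added here later.
    if stacking == "max":
        return max_stack(window_preds, stride)
    elif stacking == "avg":
        return mean_stack(window_preds, stride)
    raise ValueError(f"Unknown stacking method: {stacking!r}")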

Last remark: It's not necessary to overwrite the entries in model.default_args. Instead, parameters passed to annotate will automatically supersede the default_args for that invocation. To use a different overlap, just pass it to annotate. I also opened an issue to improve the documentation on this functionality: #100.
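
For example, the 50 s overlap used in the report above could be passed per invocation rather than by mutating default_args:

# Keyword arguments to annotate() supersede default_args for this call only
cft = model.annotate(chunk, overlap=int(50 * model.sampling_rate))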

@yetinam added this to the v0.3 milestone Jun 14, 2022
@calum-chamberlain (Contributor, Author) commented

Great, thanks for that feedback, and apologies for the length of the text for such a small code change! I thought I would use this as an opportunity to document the motivation for this.

I will have a play with INSTANCE now and post an update below. Related to that and to #96: the main difference between the two EQTransformer models is apparently the amount of data augmentation done. I don't know how comparable the augmentation in the seisbench-trained models is to either of the EQTransformer models, but that would be interesting to think about further.

Thanks for those comments, and apologies for not meeting the black formatting and testing requirements! That was just lazy.

Regarding the pre-commit hook: I had already created a python 3.10 env and attempted to set up the pre-commit hook in there, but the strict requirement of targeting python 3.8 in the .pre-commit-config.yml configuration meant that this was not possible in that env. I haven't tested whether it works in the python 3.9 env recommended in the contributing guide, but I wonder if you could set the language_version to python3? Happy to change my env if this isn't possible.

  1. I added a test for the different options and set the default to "max" stacking. Feel free to change that.
  2. I also added a _stack_options attribute to WaveformModel (sketched below this list). I don't know if you want this, but I thought it might make it easier to extend with other stacking options in the future, either within the base WaveformModel or in subclasses that want to alter or restrict the available options. I also hoped this could help with the documentation in #100 (Clean documentation for auxiliary parameters in annotate/classify) by reading model._stack_options when building the docs. Because of #100 I haven't made any docs changes, and I hope you can include the required changes when the docs for #100 are updated.
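
A rough sketch of the _stack_options idea (names and exact shape are illustrative and may differ from the merged code):

class WaveformModel:
    # Stacking methods available when reassembling overlapping windows;
    # subclasses may override this to restrict or extend the options.
    _stack_options = {"avg", "max"}

    def _check_stacking(self, stacking):
        # Hypothetical validation helper using the class-level registry
        if stacking not in self._stack_options:
            raise ValueError(
                f"stacking must be one of {sorted(self._stack_options)}, got {stacking!r}"
            )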

@yetinam (Member) commented Jun 15, 2022

Perfect, thanks for the comprehensive updates. I'll merge now. Thanks once more for this contribution! And don't worry about the long report, I think it's really interesting to explicitly visualize the differences and I guess the question of translation invariance for pickers will also pop up in papers eventually.

Thanks also for pointing out the pre-commit issue. I'll adjust the language version to python3 after merging your commit.

@yetinam merged commit fa6674e into seisbench:main Jun 15, 2022
yetinam added a commit that referenced this pull request Jun 15, 2022
For details see discussion in comments to #99
yetinam added a commit that referenced this pull request Jun 15, 2022
For details see discussion in comments to #99
@calum-chamberlain deleted the max-stack branch Jun 15, 2022 21:27
@calum-chamberlain (Contributor, Author) commented Jun 16, 2022

Thanks for merging that. I made a couple of videos of the INSTANCE-trained model for the same two earthquakes and stations (with my obvious NZ bias in having two NZ-based stations). For both of these, each frame shows a single 60 s window, so there is no overlap.

SNZO (INSTANCE weights) - regional, broadband sensor

For the SNZO regional earthquake there are some variations between windows, but the P and S peak predictions are mostly quite stable. In both examples the detection prediction window is quite different from the detection windows given by the original EQTransformer model. In some windows there is a secondary spike in the P prediction value around the S-phase.

INSTANCE_SNZO_shifts.mp4

EORO (INSTANCE weights) - local, short-period sensor

For the local earthquake case there is quite a bit of variation. Some windows have spikes in the P prediction value around the S-arrival despite both phases being within the window, and there are generally quite strong variations between windows. Stacking using the max-stack method for this would probably give a poor result.

INSTANCE_EORO_shifts.mp4

We are planning to do more testing of the various trained weights available using some of our well-picked manual catalogues, and I don't think these two examples are necessarily representative of the general output for these models, but they do seem to perform quite differently from the original EQTransformer models.

The "STEAD" weights retrained in seisbench show slightly different results, and still have a lot of variability. They never reach the quite the same target detection prediction shape, but the P and S picks seem robust:

SNZO (STEAD weights)

STEAD_EQT_overlap_0.0s_SNZO.mp4

EORO (STEAD weights)

STEAD_EQT_overlap_0.0sEORO.mp4

Edit (@yetinam, 22/06/16 10:23): Slightly fixed formatting

@yetinam (Member) commented Jun 16, 2022

Thanks for the follow-up. Really interesting to see these differences. At least parts of them (the better invariance to pick location in INSTANCE) match my intuition. Looking forward to the results from your systematic analysis!
