<a href="https://colab.research.google.com/github/sgrubas/cats/blob/main/tutorials/DetectionTutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Earthquake detection via Cluster Analysis of Trimmed Spectrogram (CATS)

This notebook explains the usage of the CATS detector.

A minimalistic example would look like this:

```python
data = import_sample_data()
detector = cats.CATSDetector(**parameters)
result = detector.detect(data)
result.plot((1, 2))
```

A more detailed explanation follows.

# Content

This notebook covers some of the principles of CATS and its extensions:

1. [CATS Detector](#CATS-Detector) - explains how to start, parameters, and outputs.
2. [Joint processing of three-component data](#Joint-processing-of-three-component-data)
3. [Multi-trace detection and voting](#Multi-trace-detection-and-voting)
4. [Cluster catalogs](#Cluster-catalogs) - more details on working with time-frequency attributes of labeled clusters
5. [Pre-tuned CATS models](#Pre-tuned-CATS-models)

# Installation

Uncomment and execute the cell to install CATS

In [48]:
# pip install git+https://github.com/sgrubas/cats.git

# Imports

In [1]:
import numpy as np
import holoviews as hv

The first import may take 3-5 minutes because of JIT-compiled functions. They are cached afterwards, so subsequent imports are faster (about 10-20 seconds).

In [2]:
import cats


<hr>

# Import of synthetic dataset

The data may contain any number of traces/receivers and components, so the array can have an arbitrary shape, including a single trace of shape `(N,)`.

In this example, we consider multiple three-component receivers.

**Note**: the API accepts data as either `numpy.ndarray` or `obspy.Stream`.

In [4]:
data = cats.import_sample_data()

Dclean = data['data']
time = data['time']  # time
dt = data['dt']      # sampling time
x = data['x']        # location of receivers
dimensions = ["Component", "Receiver", "Time"]

In [5]:
print(f"Input dataset shape\t=\t{Dclean.shape}")
print(f"Dimensions correspond to\t{dimensions}")
print(*[f"{dim} : {shp}" for dim, shp in zip(dimensions, Dclean.shape)], sep='\t')

Input dataset shape	=	(3, 10, 70000)
Dimensions correspond to	['Component', 'Receiver', 'Time']
Component : 3	Receiver : 10	Time : 70000


In [6]:
# contamination with white Gaussian noise plus a 50 Hz harmonic
np.random.seed(132)
noise_scale = 0.1
Noise = np.random.randn(*Dclean.shape) * noise_scale   # white Gaussian noise
Noise += noise_scale * np.sin(time * 2 * np.pi * 50)[None, None, :]  # constant 50 Hz electrical noise
D = Dclean + Noise
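
As a quick sanity check on a noise model like the one above, the 50 Hz harmonic should dominate the amplitude spectrum of the noise. The sketch below is self-contained and uses a hypothetical sampling step `dt_demo` and trace length `n_demo` instead of the tutorial dataset.

```python
import numpy as np

# Hypothetical sampling setup (not taken from the sample dataset)
dt_demo = 0.002            # sampling step in seconds (500 Hz)
n_demo = 5000              # number of samples (10 s)
t_demo = np.arange(n_demo) * dt_demo

rng = np.random.default_rng(132)
noise_scale = 0.1
noise = rng.standard_normal(n_demo) * noise_scale        # white Gaussian noise
noise += noise_scale * np.sin(2 * np.pi * 50 * t_demo)   # 50 Hz harmonic

# Amplitude spectrum of the noise and the frequency of its largest peak
spectrum = np.abs(np.fft.rfft(noise))
freqs = np.fft.rfftfreq(n_demo, d=dt_demo)
peak_freq = freqs[np.argmax(spectrum)]
print(f"Dominant noise frequency: {peak_freq:.1f} Hz")
```

The white component spreads its energy over all frequency bins, while the harmonic concentrates in a single bin, which is why it stands out even though both have the same `noise_scale`.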

# CATS Detector

## Parameters

The detector is implemented as the operator `cats.CATSDetector(parameters)`.

Data parameters:

0. `dt_sec`                 - sampling time in **seconds**
1. `name`                   - defines the operator's name, default `'CATS'`

Main parameters:

1. `stft_window_type`       - type of STFT window like 'hann' or 'hamming'. See also `scipy.signal.get_window()` for more windows
2. `stft_window_sec`        - length of STFT window in **seconds**
3. `stft_overlap`           - overlap rate of STFT windows, range (0, 1) (e.g. `0.5` is 50%)
4. `minSNR`                 - minimum Signal-to-Noise Ratio, range ~ (4 - 12), used to estimate noise standard deviation
5. `stationary_frame_sec`   - frame length where noise is stationary, in **seconds**
6. `cluster_size_t_sec`     - minimum cluster size in time or **minimum time duration** of strongest phases in signal, in **seconds**
7. `cluster_size_f_Hz`      - minimum cluster size in frequency in **hertz** (frequency width of signal)
8. `cluster_size_f_octaves` - minimum cluster size in frequency in **octaves** (log2 scale). If > 0, it supersedes `cluster_size_f_Hz`. Default `-1`
9. `cluster_distance_t_sec`  - neighborhood distance for clustering in time or **minimum separation time** between different events, in **seconds**
10. `cluster_distance_f_Hz`   - neighborhood distance for clustering in frequency (minimum separation in frequency), in **hertz**
11. `cluster_distance_f_octaves`   - neighborhood distance for clustering in frequency in **octaves**. If > 0, it supersedes `cluster_distance_f_Hz`. Default `-1`
12. `freq_bandpass_Hz`        - bandpass frequencies in Hz (e.g. (5, 50)), everything outside will be zeroed.
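
To see why the octave-based variants can be useful, the sketch below compares the same 10 Hz width at low and high frequencies on a log2 scale. The helper `width_in_octaves` is hypothetical and only illustrates the unit; it is not part of the `cats` API.

```python
import math

def width_in_octaves(f_low_Hz, f_high_Hz):
    """Width of the band [f_low_Hz, f_high_Hz] on a log2 frequency scale."""
    return math.log2(f_high_Hz / f_low_Hz)

# The same 10 Hz width covers very different numbers of octaves
low_band = width_in_octaves(5, 15)       # about 1.58 octaves
high_band = width_in_octaves(100, 110)   # about 0.14 octaves
two_octaves = width_in_octaves(5, 20)    # two doublings: 5 -> 10 -> 20 Hz

print(f"{low_band:.2f}, {high_band:.2f}, {two_octaves:.2f}")
```

A size or distance given in octaves therefore scales with frequency, whereas one given in hertz stays constant across the spectrum.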

Multi-component extension parameters (see more in Section **[Joint processing of three-component data](#Joint-processing-of-three-component-data)**):
1. `aggr_clustering_axis`  - axis of array where components are placed, enables multi-component masking, e.g. for 3C data with shape (3, 10, 5000), this would be `0` as components are on the 0th axis. Default `None`, not applied.

Multi-trace extension parameters (see more in Section **[Multi-trace detection and voting](#Multi-trace-detection-and-voting)**):
1. `clustering_multitrace`   - multitrace clustering, `True/False`, can improve precision for arrays of receivers, default `False`
2. `cluster_size_trace`      - minimum number of traces for multitrace clustering, {1, 2, ...} (e.g. `2`)
3. `cluster_distance_trace`  - neighborhood distance for connecting traces, {1, 2, ...} (e.g. `2`)

Cluster catalog parameters (see more in Section **[Cluster catalogs](#Cluster-catalogs)**):
1. `cluster_catalogs_funcs`  - list of functions calculating time-frequency attributes 
2. `cluster_feature_distributions` - list of distributions that can be used for time-frequency attributes calculation (e.g. `["spectrogram", "spectrogram_SNR", "coefficients"]`)
3. `cluster_catalogs_opts`     - secondary options for the cluster catalog calculation
4. `cluster_catalogs_filter`   - function that can filter cluster catalog by some time-frequency attributes (e.g. `catalog.Frequency_peak_Hz > 15`)

Additional STFT parameters:
1. `stft_nfft`       - zero-padding for STFT windows, recommended a power of 2 (e.g. `512`). Default is closest power of 2, bigger than the window.  
2. `stft_backend`    - backend for computing STFT, available are `['ssqueezepy', 'ssqueezepy_gpu', 'scipy']`. Default `ssqueezepy` as the fastest; GPU acceleration is available via 'ssqueezepy_gpu' and requires GPU drivers for PyTorch.
3. `stft_kwargs`     - additional parameters for `cats.timefrequency.STFTOperator`

Additional parameters for noise estimation of BEDATE:
1. `bedate_freq_grouping_Hz`       - width in **Hz** to group frequency bins for joint noise estimation 
2. `bedate_freq_grouping_octaves`  - width in **octaves** to group frequency bins for joint noise estimation. If > 0, it supersedes `bedate_freq_grouping_Hz`, default `-1`.
3. `date_Q`  - a constant defining the number of elements with low magnitude to not use for noise estimation, based on using Bienayme-Chebyshev's inequality (DATE algorithm)
4. `date_Nmin_percentile`  - a fraction of elements with low magnitude to not use for noise estimation (DATE algorithm). It supersedes `date_Q`
5. `date_original_mode`  - whether to use the original mode of DATE, default `False`. If `True`, it is prone to underestimate noise level when no signal is present.

Phase separation parameters (details will be added later)
1. `phase_separation` - a function that can split a single cluster into several clusters based on a time-frequency distribution.

In [7]:
detector = cats.CATSDetector(dt_sec=dt,
                             stft_window_type='hann',
                             stft_window_sec=0.5, 
                             stft_overlap=0.8,
                             minSNR=5.5,
                             stationary_frame_sec=200,
                             cluster_size_t_sec=0.2,
                             cluster_size_f_Hz=10,
                             cluster_distance_t_sec=0.2,
                             cluster_distance_f_Hz=2)

The instantiated `detector` has four main methods:
1. `.detect` - performs the detection of events
2. `.detect_to_file` - performs the detection and saves to a file
3. `.detect_on_files` - performs the detection by reading from files and saving to files
4. `.STFT` - link to the corresponding STFT operator, see more in `cats.STFTOperator(...)` 

In addition, all input parameters such as `stationary_frame_sec`, `cluster_size_t_sec`, `cluster_size_f_Hz`, etc., are used to calculate the corresponding lengths in samples, `stationary_frame_len`, `cluster_size_t_len`, `cluster_size_f_len`, etc., according to the sampling defined by `dt_sec`, `stft_overlap`, and `stft_nfft`.
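
The conversion from physical units to sample counts can be sketched as follows. This is an illustration under simple rounding assumptions, not the conversion code used inside CATS; the helper `seconds_to_lengths` and the sampling step `dt_sec=0.002` are hypothetical.

```python
import math

def seconds_to_lengths(dt_sec, stft_window_sec, stft_overlap,
                       cluster_size_t_sec, cluster_size_f_Hz):
    """Hypothetical helper: convert seconds and hertz to sample counts."""
    window_len = round(stft_window_sec / dt_sec)              # STFT window in samples
    nfft = 2 ** math.ceil(math.log2(window_len))              # next power of 2 >= window
    hop_len = max(1, round(window_len * (1 - stft_overlap)))  # step between windows
    tf_dt_sec = hop_len * dt_sec                              # time sampling after STFT
    df_Hz = 1 / (nfft * dt_sec)                               # frequency sampling
    return {
        "window_len": window_len,
        "nfft": nfft,
        "hop_len": hop_len,
        "tf_dt_sec": tf_dt_sec,
        "df_Hz": df_Hz,
        "cluster_size_t_len": max(1, round(cluster_size_t_sec / tf_dt_sec)),
        "cluster_size_f_len": max(1, round(cluster_size_f_Hz / df_Hz)),
    }

lengths = seconds_to_lengths(dt_sec=0.002, stft_window_sec=0.5, stft_overlap=0.8,
                             cluster_size_t_sec=0.2, cluster_size_f_Hz=10)
print(lengths)
```

The point of the sketch is that a larger `stft_overlap` gives a finer time sampling after the STFT, so the same `cluster_size_t_sec` corresponds to more time-frequency samples.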

**Note**, other properties can be viewed via `.dict()`

To save a figure, use `hv.save(...)`

In [8]:
# dpi - increases resolution but slower to save and bigger file

# hv.save(fig_noise_psd, "../fig_noise_psd_sample.png", dpi=100)  

## Applying CATS Detector

To apply the detection, we need to pass the input data `x` to the function `detector.detect(x)`.

```python
detector.detect(
               x,  # input data (numpy.ndarray), the last axis is time
               verbose,  # True/False, print status messages (stages and timing)
               full_info, # True/False/'qc', save intermediate steps for quality control
               )
```

`x` is `numpy.ndarray` and may have any number of dimensions with shape `(..., N)`, but the last axis `N` must be Time.
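
If a recording stores time on a different axis, it can be rearranged with NumPy before detection. A minimal sketch, assuming a hypothetical array `raw` laid out as `(Time, Receiver, Component)`:

```python
import numpy as np

# Hypothetical recording stored as (Time, Receiver, Component)
raw = np.zeros((7000, 10, 3))

# Move the time axis to the end -> (Receiver, Component, Time)
x_time_last = np.moveaxis(raw, 0, -1)

# Optionally reorder to (Component, Receiver, Time), the layout used in this tutorial
x_crt = x_time_last.swapaxes(0, 1)

print(x_time_last.shape, x_crt.shape)
```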

In [9]:
print(D.shape)

(3, 10, 70000)


In [10]:
result = detector.detect(D, verbose=True, full_info=True)

1. STFT	...	Completed in 0.188 sec
2. B-E-DATE trimming	...	Completed in 0.0502 sec
3. Clustering	...	Completed in 0.00637 sec
4. Cluster catalog	...	Completed in 0.0381 sec
5. Projecting intervals	...	Completed in 0.0618 sec
Total elapsed time:	0.345 sec



The detection result is stored in a `cats.CATSDetectionResult` object, which makes it convenient to inspect the output of each detection step.

It has several attributes:
1. `.signal`  -  original input signal `x`, shape `(..., N)`
2. `.coefficients`  -  result of STFT transform, complex-valued coefficients, shape `(..., Nf, Nt)`, where `Nf` number of frequencies, `Nt` number of time samples after STFT
3. `.spectrogram`  -  absolute value of complex-valued `.coefficients`, shape `(..., Nf, Nt)`
4. `.time_frames`  -  time frames in which noise is assumed to be stationary, shape `(nt, 2)`, `[[frame1_start, frame1_end], ...]`, where `nt` is number of stationary time frames
5. `.frequency_groups_indexes`  -  frequency groups that were processed in B-E-DATE for noise estimation together, shape `(gNf, 2)`, where `gNf` number of frequency groups
6. `.noise_std`  -  noise standard deviation as a function of time and frequency, shape `(..., gNf, nt)`
7. `.noise_threshold_conversion`  -  conversion constants from `noise_std` to actual thresholds `(gNf,)`
8. `.spectrogram_SNR`  -  spectrogram of SNR values, shape `(..., Nf, Nt)`
9. `.spectrogram_trim_mask`  -  binary mask for trimming spectrogram, obtained from BEDATE, shape `(..., Nf, Nt)`
10. `.spectrogram_trim_mask_aggr`  -  aggregated binary mask for trimming spectrogram for multi-component data. If shape of `x` was `(Nc, Nr, N)`, it has shape `(1, Nr, Nf, Nt)`, unlike `.spectrogram_trim_mask` of shape `(Nc, Nr, Nf, Nt)`.
11. `.spectrogram_SNR_clustered`  -  clustered trimmed SNR spectrogram, shape `(..., Nf, Nt)`
12. `.spectrogram_event_ID`  -  cluster indexes of `.spectrogram_SNR_clustered`, shape `(..., Nf, Nt)`
13. `.spectrogram_cluster_ID`  -  alias of `.spectrogram_event_ID`; the two differ only after phase separation.
14. `.cluster_catalogs`  -  dataframe with time-frequency attributes for each detected cluster.
15. `.detection`  -  detection result as a binary projection of clusters onto time domain, shape `(..., Nt)`, wherein `1` is seismic event, `0` is noise
16. `.likelihood`  -  likelihood curve obtained by projecting `.spectrogram_SNR_clustered` onto time, shape `(..., Nt)`
17. `.detected_intervals`  -  detected intervals from `.detection`, given as an array of `[time_begin, time_end]`
18. `.frequencies`  -  frequency axis of STFT `(Nf,)`
19. `.minSNR` - minimum Signal-to-Noise Ratio used in detection
20. `.dt_sec` - sampling in seconds of the input `signal`
21. `.tf_dt_sec` - sampling in seconds of the STFT operator
22. `.tf_t0_sec` - starting time of performed STFT, almost always `0` but rarely ~`-tf_dt_sec`
23. `.time_npts` - number of time points of the input `signal` -> `N`
24. `.tf_time_npts` - number of time points after the STFT -> `Nt`
25. `.history` - recorded timing of each processing stage (in seconds)
26. `.time()` - function, giving original time axis in seconds `(N,)`
27. `.tf_time()`  -  function, giving time axis after STFT in seconds `(Nt,)`
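
The relation between a binary `.detection` vector and `[time_begin, time_end]` intervals can be illustrated on a toy trace. This is a sketch of the general idea only; the helper `binary_to_intervals` is hypothetical, and the way CATS defines interval edges may differ.

```python
import numpy as np

def binary_to_intervals(detection, tf_time):
    """Convert a 1D binary detection vector to [time_begin, time_end] pairs."""
    padded = np.concatenate(([0], detection.astype(int), [0]))
    diff = np.diff(padded)
    starts = np.flatnonzero(diff == 1)       # first sample of each run of ones
    ends = np.flatnonzero(diff == -1) - 1    # last sample of each run of ones
    return np.stack([tf_time[starts], tf_time[ends]], axis=-1)

detection = np.array([0, 1, 1, 0, 0, 1, 0])      # toy detection, 1 = event
tf_time = np.arange(detection.size) * 0.1        # toy time axis after STFT, in seconds
intervals = binary_to_intervals(detection, tf_time)
print(intervals)
```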

Additionally:
- `.save(filename)` - saves the result as a pickle file
- `.load(filename)` - loads a saved result from a pickle file

To save the result together with everything retained by the `full_info` argument, use the `save` method.

In [11]:
result.save("test_save")

We can also import a saved result via `cats.CATSDetectionResult.load`

In [12]:
result = cats.CATSDetectionResult.load("test_save.pickle")

Each attribute can be accessed as follows:

In [13]:
# in seconds
result.time_frames

array([[  0.        , 136.73995599]])

In [14]:
# print the detection result on the 'Z' component of the 5th receiver and the associated time axis after STFT

sample_ind = (2, 4)  # component 2 ('Z') and 5th receiver (indexing from 0)
result.detected_intervals[sample_ind], result.tf_time()

(array([[  3.80961987,   4.72392864],
        [  9.80342181,  11.22567989],
        [ 16.81312237,  17.42266155],
        [ 50.03300767,  51.76003535],
        [123.07611937, 124.3967876 ]]),
 array([0.00000000e+00, 1.01589863e-01, 2.03179727e-01, ...,
        1.36536776e+02, 1.36638366e+02, 1.36739956e+02]))

If only the time intervals are needed, index `.detected_intervals` directly:

In [15]:
result.detected_intervals[0, 2]  # in seconds

array([[10.1081914 , 10.9209103 ],
       [15.39086429, 16.91471224],
       [51.35367589, 53.28388329],
       [94.52936779, 94.93572724]])

### Clusters catalog
From the detected events, we can retrieve a catalog of their cluster features.

In [16]:
result.cluster_catalogs

| Trace_dim_0 | Trace_dim_1 | Event_ID | Cluster_ID | Time_start_sec | Time_end_sec | Time_peak_sec | Frequency_start_Hz | Frequency_end_Hz | Frequency_peak_Hz | Energy_peak | Energy_mean | Energy_sum | SNR_sum | SNR_mean | SNR_peak | Interval_ID | Interval_start_sec | Interval_end_sec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 1 | 15.492784 | 16.000082 | 15.746429 | 5.998384 | 19.994613 | 13.996229 | 6.812304 | 4.883996 | 97.679924 | 97.679924 | 4.883996 | 6.812304 | 0.0 | 15.492454 | 16.000403 |
| 0 | 0 | 2 | 2 | 105.196352 | 105.501024 | 105.348688 | 203.945052 | 213.942359 | 205.944514 | 3.716320 | 3.523146 | 24.662020 | 24.662020 | 3.523146 | 3.716320 | 2.0 | 105.196303 | 105.501073 |
| 0 | 0 | 3 | 3 | 82.237057 | 82.643291 | 82.389379 | 209.943436 | 217.941282 | 217.941282 | 4.171212 | 3.550797 | 24.855576 | 24.855576 | 3.550797 | 4.171212 | 1.0 | 82.236994 | 82.643354 |
| 0 | 1 | 1 | 1 | 51.658545 | 53.690147 | 52.217190 | 0.000000 | 11.996768 | 5.998384 | 9.320030 | 5.161701 | 289.055237 | 289.055237 | 5.161701 | 9.320030 | 3.0 | 51.658445 | 53.690243 |
| 0 | 1 | 2 | 2 | 10.515035 | 11.326816 | 10.666936 | 3.998923 | 11.996768 | 5.998384 | 9.859459 | 5.324536 | 122.464333 | 122.464333 | 5.324536 | 9.859459 | 0.0 | 10.514551 | 11.327270 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2 | 8 | 4 | 4 | 4.928124 | 5.230901 | 5.079493 | 79.978452 | 87.976297 | 85.976836 | 4.167203 | 3.606102 | 21.636612 | 21.636612 | 3.606102 | 4.167203 | 0.0 | 4.927108 | 5.231878 |
| 2 | 9 | 1 | 1 | 49.626752 | 51.048805 | 49.880623 | 0.000000 | 11.996768 | 5.998384 | 26.566648 | 8.476290 | 389.909363 | 389.909363 | 8.476290 | 26.566648 | 2.0 | 49.626648 | 51.048906 |
| 2 | 9 | 2 | 2 | 122.771392 | 123.584027 | 123.126914 | 0.000000 | 9.997306 | 3.998923 | 42.092949 | 14.550985 | 465.631531 | 465.631531 | 14.550985 | 42.092949 | 3.0 | 122.771350 | 123.584069 |
| 2 | 9 | 3 | 3 | 11.429305 | 12.139565 | 11.682834 | 5.998384 | 13.996229 | 7.997845 | 6.044577 | 4.147384 | 58.063374 | 58.063374 | 4.147384 | 6.044577 | 0.0 | 11.428860 | 12.139989 |


## Reset parameters
If needed, some parameters can be reset

In [17]:
detector.reset_params(minSNR=6)

## Visualization

### Single trace and workflow steps

`cats.CATSDetectionResult` also provides the method `.plot(ind)`, which displays each step of the detection workflow for a single 1D trace selected by the argument `ind`, which must be a `tuple of ints`.

**Note**, to plot the workflow stages, `full_info` must be `'qc'` or `True` in `.detect(..., full_info='qc')` (`'qc'` is equivalent to `.get_qc_keys()`)

**\*** `'qc'` saves only what the `.plot` function needs for quality control

In [18]:
# to see how `full_info` works
result = detector.detect(D, verbose=True, 
                         full_info='qc'  # to save only necessary stages for quality control plotting
                        )

1. STFT	...	Completed in 0.175 sec
2. B-E-DATE trimming	...	Completed in 0.0503 sec
3. Clustering	...	Completed in 0.00442 sec
4. Cluster catalog	...	Completed in 0.0276 sec
5. Projecting intervals	...	Completed in 0.0613 sec
Total elapsed time:	0.318 sec



In [19]:
# Testing Save and load again
result.save("test_save")
result = cats.CATSDetectionResult.load("test_save.pickle")

In [20]:
# retrieved statistics of clusters 
result.cluster_catalogs.loc[0, 5]  # component 0, station 5

| Trace_dim_0 | Trace_dim_1 | Event_ID | Cluster_ID | Time_start_sec | Time_end_sec | Time_peak_sec | Frequency_start_Hz | Frequency_end_Hz | Frequency_peak_Hz | Energy_peak | Energy_mean | Energy_sum | SNR_sum | SNR_mean | SNR_peak | Interval_ID | Interval_start_sec | Interval_end_sec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5 | 1 | 1 | 49.931521 | 51.556756 | 51.099701 | 0.0 | 11.996768 | 3.998923 | 39.854404 | 11.920083 | 631.764404 | 631.764404 | 11.920083 | 39.854404 | 2.0 | 49.931418 | 51.556856 |
| 0 | 5 | 2 | 2 | 10.210279 | 11.123628 | 10.565346 | 1.999461 | 11.996768 | 5.998384 | 15.575294 | 6.70651 | 228.021347 | 228.021347 | 6.70651 | 15.575294 | 0.0 | 10.209781 | 11.12409 |
| 0 | 5 | 3 | 3 | 17.219779 | 18.235098 | 17.778226 | 3.998923 | 23.993536 | 15.99569 | 10.559553 | 6.484961 | 440.977325 | 440.977325 | 6.484961 | 10.559553 | 1.0 | 17.219482 | 18.23538 |


In [21]:
fig = result.plot(ind=(0, 5), time_interval_sec=(7, 22))
fig

In [22]:
# hv.save(fig, 'CATS_detection_demo.png', dpi=250)

### Visualization of the detection result on multiple traces

In [23]:
fig = cats.plot_traces(data=Dclean, time=time, time_interval_sec=(7, 22), gain=0.5, per_station_scale=False)
fig = fig.opts(ylabel='Location (km)', title='Clean data')
# hv.save(fig, "../clean_traces_sample.png", dpi=250)  # use this to save figure
fig

In [24]:
# for plotting traces with the results, we can use the built-in `plot_traces` method of the `result` object
fig = result.plot_traces(intervals=True, time_interval_sec=(7, 22), gain=0.5, alpha=0.25)
fig = fig.opts(ylabel='Location (km)', title='Noisy data with detected intervals, all components')
fig

In [25]:
# for plotting traces with the results, we can use the built-in `plot_traces` method of the `result` object
comp = 0
fig = result.plot_traces(ind=comp, intervals=True, time_interval_sec=(7, 22), gain=0.5, alpha=0.25)
fig = fig.opts(ylabel='Location (km)', title=f'Noisy data with detected intervals, component {comp}')
fig

<hr>

# Joint processing of three-component data

CATS can process multi-component data jointly by combining energy information from all available components.

In [26]:
detector1C = cats.CATSDetector(dt_sec=dt,
                               stft_window_type='hann',
                               stft_window_sec=1.0, 
                               stft_overlap=0.8,
                               minSNR=6,
                               stationary_frame_sec=200,
                               cluster_size_t_sec=0.2,
                               cluster_size_f_Hz=8,
                               cluster_distance_t_sec=0.2,
                               cluster_distance_f_Hz=2,
                               aggr_clustering_axis=None  # `None` disables 3C: each component is processed separately
                               )

detector3C = cats.CATSDetector(dt_sec=dt,
                               stft_window_type='hann',
                               stft_window_sec=1.0, 
                               stft_overlap=0.8,
                               minSNR=6,
                               stationary_frame_sec=200,
                               cluster_size_t_sec=0.2,
                               cluster_size_f_Hz=8,
                               cluster_distance_t_sec=0.2,
                               cluster_distance_f_Hz=2,
                               aggr_clustering_axis=0  # '0'th axis of data array will be components
                               )

We set `aggr_clustering_axis = 0` because the components are on the 0th axis of the data array, as shown below:

In [27]:
print(D.shape)
print("[Component, Station, Time]")
print(np.arange(D.ndim))

(3, 10, 70000)
[Component, Station, Time]
[0 1 2]


We can compare the difference between 3C and 1C CATS detection

In [28]:
result_1C = detector1C ** D  # operator `**` applies and saves info for QC
result_3C = detector3C ** D

1. STFT	...	Completed in 0.214 sec
2. B-E-DATE trimming	...	Completed in 0.05 sec
3. Clustering	...	Completed in 0.00707 sec
4. Cluster catalog	...	Completed in 0.0242 sec
5. Projecting intervals	...	Completed in 0.0667 sec
Total elapsed time:	0.362 sec

1. STFT	...	Completed in 0.171 sec
2. B-E-DATE trimming	...	Completed in 0.0449 sec
3. Clustering	...	Completed in 0.00266 sec
4. Cluster catalog	...	Completed in 0.0191 sec
5. Projecting intervals	...	Completed in 0.0461 sec
Total elapsed time:	0.284 sec



**Note**, with the same parameters, CATS-3C is faster than CATS-1C, because it clusters a single aggregated spectrogram instead of each component separately.

We can see the difference in spectrograms below:

In [29]:
station = 1
fig = result_1C.plot_multi([(0, station), (1, station), (2, station)], 
                           time_interval_sec=(7, 20))
fig.opts(title=f"CATS-1C workflow for 3 components of station {station}", fontsize=dict(title=32))

In [30]:
station = 1
fig = result_3C.plot_multi([(0, station), (1, station), (2, station)], 
                           time_interval_sec=(7, 20))
fig.opts(title=f"CATS-3C workflow for 3 components of station {station}", fontsize=dict(title=32))

We see that both events were detected, including the very weak one at 11 sec, because CATS used energy information from all components instead of processing them individually. This also significantly simplifies the interpretation of the detected intervals, because CATS-3C produces a single interval per station, unlike CATS-1C, which gives separate results for each channel/component.

In [31]:
print("CATS-1C output shape:", result_1C.detected_intervals.shape)  # intervals are given for each component separately
print("CATS-3C output shape:", result_3C.detected_intervals.shape)  # intervals are given for all components jointly

CATS-1C output shape: (3, 10)
CATS-3C output shape: (1, 10)
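
The collapse of the component axis from `3` to `1` can be illustrated on per-component binary masks. The exact aggregation rule used by CATS is not described in this notebook, so the logical OR below is an assumption chosen purely for illustration, and the mask values are random.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-component trim masks, shape (Nc, Nr, Nf, Nt)
mask = rng.random((3, 10, 16, 40)) > 0.9

# Assumed aggregation rule: keep an element if ANY component keeps it
mask_aggr = mask.any(axis=0, keepdims=True)

print(mask.shape, "->", mask_aggr.shape)
```

With `keepdims=True` the component axis is kept with length 1, which mirrors the `(1, 10)` shape of the CATS-3C intervals above.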


We can plot the detected intervals of CATS-1C and CATS-3C, and compare them

In [32]:
fig = result_1C.plot_traces(intervals=True, time_interval_sec=(7, 25), gain=0.3, alpha=0.2)
fig = fig.opts(ylabel='Location (km)', title='CATS-1C')
fig

In [33]:
fig = result_3C.plot_traces(intervals=True, time_interval_sec=(7, 25), gain=0.3, alpha=0.2)
fig = fig.opts(ylabel='Location (km)', title='CATS-3C')
fig

The intervals detected by CATS-3C are unified and generally wider, because the energy information was combined from the three components. **Note**, events around 11 sec were properly detected by CATS-3C, unlike with CATS-1C.

<hr>

#  Multi-trace detection and voting
<a id="mCATS"></a>

Sometimes the data are recorded on a "regular" array of receivers, and an earthquake appears as a coherent signal across the array. This coherence across multiple stations can be used to improve detection quality. Specifically, spectrograms can be clustered across multiple stations simultaneously, in `Trace x Time x Frequency` space.

Here we apply multitrace detection to the same dataset to showcase the improvement in performance.

In [34]:
detector_mt_1C = cats.CATSDetector(dt_sec=dt,
                                   stft_window_type='hann',
                                   stft_window_sec=1, 
                                   stft_overlap=0.8,
                                   minSNR=6.5,
                                   stationary_frame_sec=1000, 
                                   cluster_size_t_sec=0.2,
                                   cluster_size_f_Hz=10,
                                   cluster_distance_t_sec=0.2,
                                   cluster_distance_f_Hz=2,
                                   aggr_clustering_axis=None,  # `None`: components are processed separately (1C)
                                   clustering_multitrace=True,  # now we set multitrace clustering as `True`
                                   cluster_size_trace=3,  # VOTING criterion: minimum number of stations to declare an event
                                   cluster_distance_trace=1,  # number of neighbor stations for association
                                   )

detector_mt_3C = cats.CATSDetector(dt_sec=dt,
                                   stft_window_type='hann',
                                   stft_window_sec=1, 
                                   stft_overlap=0.8,
                                   minSNR=6.5,
                                   stationary_frame_sec=1000, 
                                   cluster_size_t_sec=0.2,
                                   cluster_size_f_Hz=10,
                                   cluster_distance_t_sec=0.2,
                                   cluster_distance_f_Hz=2,
                                   aggr_clustering_axis=0,  # We can combine CATS-3C and multi-trace detection!
                                   clustering_multitrace=True,  # now we set multitrace clustering as `True`
                                   cluster_size_trace=3,  # VOTING criterion: minimum number of stations to declare an event
                                   cluster_distance_trace=1,  # number of neighbor stations for association
                                   )

Note that `multitrace` clustering is performed along the axis of the input array that immediately precedes the Time axis.

In [35]:
print(f"Input shape : {D.shape}")
print(f"Multitrace clustering will be done on {len(D.shape) - 2} axis with {D.shape[-2]} elements")
print(f"If we swapped axes of input data to {D.swapaxes(0, 1).shape}, \
then clustering would be done on {D.swapaxes(0, 1).shape[-2]} elements")

Input shape : (3, 10, 70000)
Multitrace clustering will be done on 1 axis with 10 elements
If we swapped axes of input data to (10, 3, 70000), then clustering would be done on 3 elements
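
The voting idea behind `cluster_size_trace` can be illustrated with a much simpler rule than the one CATS uses: count, for each time sample, how many traces are active, and keep detections only where the count reaches a threshold. This per-sample vote count is a simplification for illustration. CATS clusters connected regions in `Trace x Time x Frequency` space, which this sketch does not do.

```python
import numpy as np

# Toy binary detections, shape (Ntraces, Nt); 1 = event on that trace
detection = np.array([[0, 1, 1, 0, 0, 0],
                      [0, 1, 1, 0, 1, 0],
                      [0, 0, 1, 0, 0, 0],
                      [0, 1, 1, 0, 0, 0]], dtype=bool)

min_traces = 3  # plays the role of `cluster_size_trace` in this toy example

votes = detection.sum(axis=0)          # number of active traces per time sample
keep = votes >= min_traces             # time samples with enough votes
voted = detection & keep[None, :]      # drop detections seen on too few traces

print(votes)
print(voted.astype(int))
```

The isolated detection on the second trace (time index 4) gets a single vote and is removed, while the event seen on three or four traces is kept.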


In [36]:
result_mt_1C = detector_mt_1C ** D  # mCATS-1C
result_mt_3C = detector_mt_3C ** D  # mCATS-3C

1. STFT	...	Completed in 0.23 sec
2. B-E-DATE trimming	...	Completed in 0.051 sec
3. Clustering	...	Completed in 0.0131 sec
4. Cluster catalog	...	Completed in 0.0282 sec
5. Projecting intervals	...	Completed in 0.059 sec
Total elapsed time:	0.382 sec

1. STFT	...	Completed in 0.178 sec
2. B-E-DATE trimming	...	Completed in 0.0482 sec
3. Clustering	...	Completed in 0.015 sec
4. Cluster catalog	...	Completed in 0.0203 sec
5. Projecting intervals	...	Completed in 0.033 sec
Total elapsed time:	0.294 sec



In [37]:
fig = result_mt_1C.plot_traces(intervals=True, time_interval_sec=(7, 22), gain=0.3, alpha=0.2)
fig = fig.opts(ylabel='Location (km)', title='mCATS-1C')
fig

In [38]:
fig = result_mt_3C.plot_traces(intervals=True, time_interval_sec=(7, 22), gain=0.3, alpha=0.2)
fig = fig.opts(ylabel='Location (km)', title='mCATS-3C')
fig

<hr>

# Cluster catalogs
<a id="catalogs"></a>

The cluster catalog is a dataframe of time-frequency attributes calculated by CATS.

It enables more detailed post-processing and quality control based on the attributes of detected events, which in turn supports a finer-grained classification of detected clusters.

It supports:
- custom functions for arbitrary time-frequency attributes
- custom rules and criteria based on arbitrary attributes to filter out noise (e.g. `catalog.Frequency_peak_Hz > 5`):
    - automatically on the fly
    - manually in post-processing

## Custom time-frequency attributes

Time-frequency attributes are calculated for **each cluster individually**.
If one wants to define their own custom function for attributes, it must follow the signature:

- it takes at least these four inputs: `(freq, time, values_dict, inds)`
- it returns dictionary of calculated attributes `{"my_attribute": scalar_or_array}`

In [39]:
def my_custom_attribute(freq, time, values_dict, inds, my_input='squared'):
    """
    freq - frequency values (Hertz), 1D array always, shape (N,)
    time - time values (seconds), 1D array always, shape (N, )
    values_dict - available time-frequency distributions, 1D array (N,) or 2D array (N_components, N) if 'aggr_clustering_axis' was set
    inds - indices of 'freq' and 'time' associated with their original array, 2D array (2, N), [freq_ind, time_inds]

    my_input - custom input parameter
    """

    # let's calculate energy volume in a window defined by standard deviation

    amplitudes = values_dict['spectrogram']  # it gives amplitude values, shape (3, N) as I use CATS-3C
    area = time.std() * freq.std()  # window of standard deviation 
    volume = area * amplitudes.mean()  # height is defined by average value among 3 components

    my_attribute = volume
    
    if my_input == 'squared':
        my_attribute = my_attribute**2

    return {"my_attribute": my_attribute}
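
To check an attribute function before passing it to the detector, it can be called directly on synthetic inputs that follow the signature described above. The sketch below defines a second, hypothetical attribute function `duration_bandwidth` and a made-up four-element cluster; how CATS builds these inputs internally is not shown here.

```python
import numpy as np

def duration_bandwidth(freq, time, values_dict, inds):
    """Hypothetical attribute function: time extent and frequency extent of a cluster."""
    return {"Duration_sec": float(time.max() - time.min()),
            "Bandwidth_Hz": float(freq.max() - freq.min())}

# Made-up cluster with four time-frequency elements
freq = np.array([4.0, 6.0, 6.0, 8.0])                    # Hz
time = np.array([10.0, 10.0, 10.1, 10.2])                # seconds
values_dict = {"spectrogram": np.array([1.0, 2.0, 3.0, 1.5])}
inds = np.array([[2, 3, 3, 4],                           # frequency indices
                 [100, 100, 101, 102]])                  # time indices

attrs = duration_bandwidth(freq, time, values_dict, inds)
print(attrs)
```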

## Custom filter of clusters

We can also define a function that filters events by custom attributes automatically, on the fly:

In [40]:
def my_filter_func(catalog):  # the first argument is always 'catalog'
    # it must return a condition like below, which can be combined with other conditions
    return (catalog.Frequency_peak_Hz < 8) & (catalog.SNR_mean > 4)  # only these events will be kept
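
The same condition can also be applied manually in post-processing. The sketch below assumes the catalog behaves like a regular `pandas.DataFrame` and uses a small stand-in table with three rows; the values are copied from the catalog output shown earlier in this notebook.

```python
import pandas as pd

# Stand-in catalog with only the two columns used by the filter
catalog = pd.DataFrame({
    "Frequency_peak_Hz": [5.998384, 205.944514, 13.996229],
    "SNR_mean": [5.161701, 3.523146, 4.883996],
})

def my_filter_func(catalog):
    # Boolean condition per row; True means the cluster is kept
    return (catalog.Frequency_peak_Hz < 8) & (catalog.SNR_mean > 4)

mask = my_filter_func(catalog)
filtered = catalog[mask]
print(filtered)
```

Returning a boolean mask rather than a filtered table keeps the function composable, since several conditions can be combined with `&` and `|` before indexing.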

In [41]:
from functools import partial

In [42]:
catalog_funcs = [cats.clustering.bbox_peaks,  # default, attributes defining bounding box and peak values of a cluster
                 partial(cats.clustering.calculate_moments, order=2, interpretable_only=True),
                 partial(my_custom_attribute, my_input='no')  # if attribute function has custom params like 'my_input'
                 # we can define them via `partial`
                 ]

detector3C = cats.CATSDetector(dt_sec=dt,
             stft_window_type='hann',
             stft_window_sec=1.0, 
             stft_overlap=0.8,
             minSNR=6,
             stationary_frame_sec=200,
             cluster_size_t_sec=0.2,
             cluster_size_f_Hz=8,
             cluster_distance_t_sec=0.2,
             cluster_distance_f_Hz=2,
             aggr_clustering_axis=0,  # we will use CATS-3C
             cluster_catalogs_funcs=catalog_funcs,  # list of all attribute funcs for the catalog
             cluster_feature_distributions=['spectrogram', 'spectrogram_SNR'],  # distributions that will be available
             cluster_catalogs_filter=my_filter_func,  # this func will filter out clusters
             )

In [43]:
result = detector3C ** D

1. STFT	...	Completed in 0.178 sec
2. B-E-DATE trimming	...	Completed in 0.0633 sec
3. Clustering	...	Completed in 0.00315 sec
4. Cluster catalog	...	Completed in 0.0382 sec
5. Projecting intervals	...	Completed in 0.041 sec
Total elapsed time:	0.324 sec



In [44]:
result.cluster_catalogs

Component,Station,Event_ID,Cluster_ID,Time_centroid_sec,Time_Frequency_covar,Time_std_sec,Time_start_sec,Time_end_sec,Time_peak_sec,Frequency_centroid_Hz,Frequency_std_Hz,...,Ellipse_angle,Ellipse_eccentricity,Energy_peak,Energy_mean,SNR_sum,SNR_mean,SNR_peak,Interval_ID,Interval_start_sec,Interval_end_sec
0,0,1,1,52.789102,-0.115922,0.54729,51.816103,54.230055,52.520006,6.032063,1.622513,...,-0.049525,0.993781,8.804494,5.586019,245.784836,5.586019,8.804494,1.0,51.815714,54.230427
0,0,2,2,11.139338,-0.036972,0.199487,10.767424,11.568765,11.067434,7.220021,1.672224,...,-0.013409,0.999901,8.002988,5.301037,95.418663,5.301037,8.002988,0.0,10.765595,11.570499
0,1,1,1,52.492933,-0.178112,0.503873,51.614878,53.8276,52.31878,5.56878,2.023079,...,-0.046263,0.998205,10.026932,6.72364,336.182007,6.72364,10.026932,1.0,51.614488,53.827975
0,1,2,2,10.77733,-0.179758,0.244915,10.365041,11.367508,10.664982,7.364231,1.94831,...,-0.047968,0.999909,11.000287,6.697976,187.543335,6.697976,11.000287,0.0,10.363143,11.369273
0,2,1,1,52.218985,-0.342314,0.671912,51.212429,54.431282,51.916327,5.178782,2.11034,...,-0.084714,0.99555,14.895798,7.663951,459.837067,7.663951,14.895798,1.0,51.212036,54.431653
0,2,2,2,124.126766,0.046014,0.504031,123.452361,125.866749,123.955262,5.459458,1.945364,...,0.013031,0.997756,19.388588,8.021687,449.214478,8.021687,19.388588,2.0,123.452197,125.86691
0,2,3,3,10.450701,-0.154282,0.240539,9.962664,11.166251,10.463756,7.071323,2.309983,...,-0.029197,0.99995,21.384996,9.228666,406.06131,9.228666,21.384996,0.0,9.960691,11.168047
0,3,1,1,51.656223,-0.174033,0.466642,50.407531,53.02269,51.513875,5.347459,2.239175,...,-0.036222,0.999113,20.728031,8.864241,629.361084,8.864241,20.728031,1.0,50.407132,53.023071
0,3,2,2,10.159131,-0.045626,0.220164,9.560292,10.964992,10.061304,7.682755,2.808663,...,-0.005819,0.999981,35.583946,14.245861,769.276489,14.245861,35.583946,0.0,9.558239,10.966821
0,3,3,3,124.042722,0.121887,0.667292,123.049909,126.470428,123.55281,5.267053,1.65719,...,0.052774,0.987211,30.947525,9.835901,668.841248,9.835901,30.947525,2.0,123.049745,126.470588


In [45]:
result.cluster_catalogs.loc[0, 5]

Component,Station,Event_ID,Cluster_ID,Time_centroid_sec,Time_Frequency_covar,Time_std_sec,Time_start_sec,Time_end_sec,Time_peak_sec,Frequency_centroid_Hz,Frequency_std_Hz,...,Ellipse_angle,Ellipse_eccentricity,Energy_peak,Energy_mean,SNR_sum,SNR_mean,SNR_peak,Interval_ID,Interval_start_sec,Interval_end_sec
0,5,1,1,50.956904,-0.000856,0.390338,49.803858,52.21778,51.111423,5.452649,2.379793,...,-0.000155,0.999638,38.070206,12.079211,881.78241,12.079211,38.070206,1.0,49.803454,52.218167
0,5,2,2,123.486792,0.054633,0.711103,122.647458,126.269202,123.150358,5.160733,1.683881,...,0.023433,0.984066,45.917828,12.211625,867.025391,12.211625,45.917828,2.0,122.647293,126.269362
0,5,3,3,10.590552,0.101482,0.233682,10.163852,11.166251,10.463756,7.46912,2.276899,...,0.019773,0.999949,17.399834,8.063495,314.476288,8.063495,17.399834,0.0,10.161917,11.168047
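
The catalog is indexed by `(Component, Station)`, so rows for one receiver can be selected with standard pandas MultiIndex tools. The sketch below uses a small hand-made `DataFrame` with rounded, illustrative values rather than the real `result.cluster_catalogs`.

```python
import pandas as pd

# Hypothetical catalog indexed like the one above: (Component, Station)
index = pd.MultiIndex.from_tuples(
    [(0, 0), (0, 0), (0, 5), (0, 5), (0, 5)],
    names=["Component", "Station"],
)
catalog = pd.DataFrame(
    {"Cluster_ID": [1, 2, 1, 2, 3],
     "SNR_mean": [5.6, 5.3, 12.1, 12.2, 8.1]},
    index=index,
)

# All clusters of component 0 at station 5 (explicit tuple plus column slice)
station5 = catalog.loc[(0, 5), :]

# Same station across all components, selected by level name
station5_any_component = catalog.xs(5, level="Station")

print(station5)
print(station5_any_component)
```

Writing the row key as a tuple with an explicit `:` for the columns avoids any ambiguity between row and column labels.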


In [46]:
result.cluster_catalogs.my_attribute

Component  Station
0          0           2.674441
           0           0.993039
           1           3.788955
           1           1.825567
           2           6.303616
           2           4.862156
           2           3.460850
           3           5.671872
           3           6.182938
           3           6.733097
           4           7.906062
           4           6.589115
           4           5.403295
           5           7.483561
           5          10.584249
           5           2.538067
           6           7.530071
           6           8.681159
           6           1.569573
           7           6.105296
           7           6.614990
           8           3.886960
           8           8.757021
           9           6.573320
           9          10.884612
Name: my_attribute, dtype: float64
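
Because several clusters can share the same `(Component, Station)` index, a per-station summary is often more convenient than the raw column. The following sketch shows one way to compute it with pandas `groupby`. The Series is a hand-made stand-in with illustrative values, not the output above.

```python
import pandas as pd

# Hypothetical attribute column with a (Component, Station) MultiIndex
index = pd.MultiIndex.from_tuples(
    [(0, 0), (0, 0), (0, 1), (0, 1), (0, 2)],
    names=["Component", "Station"],
)
my_attribute = pd.Series([2.0, 1.0, 4.0, 2.0, 6.0],
                         index=index, name="my_attribute")

# Number of clusters, mean, and maximum of the attribute per station
per_station = my_attribute.groupby(level="Station").agg(["count", "mean", "max"])
print(per_station)
```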

In [47]:
fig = result.plot_traces(intervals=True, gain=0.3, alpha=0.2)
fig = fig.opts(ylabel='Location (km)', title='CATS-3C with events only < 8 Hz of peak frequency')
fig

Note that one event at around 15-20 seconds was removed because it does not satisfy our custom filtering condition.

# Pre-tuned CATS models

There are a few models with pre-tuned parameters, which are not necessarily the best for arbitrary data but may be a good starting point.

To load a pre-tuned CATS model, call `cats.load_pretuned_CATS` as shown below:

In [3]:
cats_pretuned = cats.load_pretuned_CATS(
    mode='detector',  # 'detector' or 'denoiser'
    multitrace=False,  # True or False
    time_frequency_base='STFT',  # STFT or CWT
)

Successfully loaded a pre-tuned CATS from c:\users\seraf\dropbox\github\cats\cats\data\pretuned\detector_CATS.pickle
