<a href="https://colab.research.google.com/github/sgrubas/cats/blob/main/tutorials/DenoisingTutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# !pip install git+https://github.com/sgrubas/cats.git  # CATS installation

# Earthquake denoising via Cluster Analysis of Trimmed Spectrogram (CATS)

This notebook explains the usage of the CATS denoiser.

A minimalistic example would look like this:

```python
data = import_sample_data()
denoiser = cats.CATSDenoiser(**parameters)  # or `cats.CATSDenoiserCWT(**parameters)`
result = denoiser.denoise(data)
result.plot((1, 2))
```

**Note:** this notebook only briefly explains how to run the CATS denoiser. The parameters and CATS extensions are covered in detail in [DetectionTutorial](https://github.com/sgrubas/cats/blob/main/tutorials/DetectionTutorial.ipynb).

The CATS denoiser is the same as the CATS detector, except that it also performs an inverse STFT at the end; no extra parameters are introduced. All methods and functions are the same, except for a few whose names use 'denoise' instead of 'detect'.
Only `CATSDenoiserCWT` has a slightly different set of parameters.

In [1]:
import numpy as np
import holoviews as hv

In [2]:
import cats

<hr>

# Import of synthetic dataset

In [3]:
data = cats.import_sample_data()

Dclean = data['data']
time = data['time']  # time
dt = data['dt']      # sampling time
x = data['x']        # location of receivers
dimensions = ["Component", "Receiver", "Time"]

In [4]:
# contamination with white Gaussian noise and a 50 Hz sinusoid
np.random.seed(132)
noise_scale = 0.1
Noise = np.random.randn(*Dclean.shape) * noise_scale   # white Gaussian noise
Noise += noise_scale * np.sin(time * 2 * np.pi * 50)[None, None, :]  # constant 50 Hz electrical (mains) noise
D = Dclean + Noise
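
As a rough check of how strong this contamination is, the expected noise power can be worked out by hand: the white component contributes `noise_scale**2` and a sinusoid of amplitude `noise_scale` contributes `noise_scale**2 / 2`. The sketch below confirms this on a synthetic time axis; `dt_demo`, the trace length, and the variable names are assumptions made for illustration and are not taken from the sample data.

```python
import numpy as np

noise_scale = 0.1
dt_demo = 0.004                     # hypothetical sampling interval (250 Hz)
t = np.arange(50_000) * dt_demo     # 200 s, an integer number of 50 Hz cycles

rng = np.random.default_rng(132)
white = rng.standard_normal(t.size) * noise_scale   # white Gaussian part
hum = noise_scale * np.sin(2 * np.pi * 50 * t)      # 50 Hz sinusoidal part

expected_power = noise_scale**2 + noise_scale**2 / 2   # 0.01 + 0.005
measured_power = np.mean((white + hum) ** 2)

print(f"expected {expected_power:.4f}, measured {measured_power:.4f}")
```

The two values should agree to within sampling error, so about one third of the added noise power sits in the narrow 50 Hz line.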

# CATS Denoiser

In [5]:
# NOTE: all parameters work absolutely the same way as in CATSDetector
# refer to DetectionTutorial for the details

denoiser = cats.CATSDenoiser(dt_sec=dt,
                 stft_window_type='hann',
                 stft_window_sec=1.0, 
                 stft_overlap=0.9,
                 minSNR=6,
                 stationary_frame_sec=200,
                 cluster_size_t_sec=0.2,
                 cluster_size_f_Hz=8,
                 cluster_distance_t_sec=0.2,
                 cluster_distance_f_Hz=2,

                 freq_bandpass_Hz=(0, 30),
                             
                 aggr_clustering_axis=0,  # 3C denoising
                 full_aggregated_mask=False,  # whether to use the full 3C mask; if True, noise pixels may be included
                             
                 cluster_catalogs_funcs=None,
                 cluster_feature_distributions=None,
                 cluster_catalogs_filter=None,
                             
                 clustering_multitrace=False,
                 cluster_size_trace=2,
                 cluster_distance_trace=1
                 )
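
To get a feel for the units, the second- and Hertz-based parameters can be translated into spectrogram pixels. The sketch below is a back-of-the-envelope calculation under stated assumptions: `dt_sec` is a hypothetical value (the tutorial reads `dt` from the sample data), the frequency step assumes an STFT without zero-padding, and the exact rounding CATS applies internally is not confirmed here.

```python
# Hypothetical sampling interval, for illustration only
dt_sec = 0.01

# Values copied from the CATSDenoiser call above
stft_window_sec = 1.0
stft_overlap = 0.9
cluster_size_t_sec = 0.2
cluster_size_f_Hz = 8

window_samples = round(stft_window_sec / dt_sec)            # samples per STFT window
hop_samples = round(window_samples * (1 - stft_overlap))    # shift between windows
hop_sec = hop_samples * dt_sec                              # time step of the spectrogram
freq_step_Hz = 1.0 / (window_samples * dt_sec)              # frequency step without zero-padding

min_frames = cluster_size_t_sec / hop_sec      # minimum cluster extent in time frames
min_bins = cluster_size_f_Hz / freq_step_Hz    # minimum cluster extent in frequency bins

print(window_samples, hop_samples, round(min_frames, 3), round(min_bins, 3))
```

Under these assumptions a cluster has to span roughly 2 time frames and 8 frequency bins to be kept. The roughly 1 Hz frequency step is consistent with the frequencies listed in the cluster catalog of this tutorial.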

In [6]:
result = denoiser ** D

1. STFT	...	Completed in 0.403 sec
2. B-E-DATE trimming	...	Completed in 0.0187 sec
3. Clustering	...	Completed in 0.00522 sec
4. Cluster catalog	...	Completed in 0.0675 sec
5. Inverse STFT	...	Completed in 0.209 sec
Total elapsed time:	0.703 sec



In [7]:
result.cluster_catalogs.head()

Component,Station,Event_ID,Cluster_ID,Time_start_sec,Time_end_sec,Time_peak_sec,Frequency_start_Hz,Frequency_end_Hz,Frequency_peak_Hz,Energy_peak,Energy_mean,Energy_sum,SNR_sum,SNR_mean,SNR_peak
0,0,1,1,51.760135,54.198097,52.623549,2.999192,9.997306,6.998115,9.178542,5.077018,431.546539,431.546539,5.077018,9.178542
0,0,2,2,10.718205,11.631598,11.174885,3.998923,10.997037,6.998115,7.519713,4.726258,165.419037,165.419037,4.726258,7.519713
0,0,3,3,14.984846,16.203266,15.340069,4.998653,26.992728,12.996498,13.744453,6.819213,1050.158813,1050.158813,6.819213,13.744453
0,1,1,1,51.556955,53.791737,52.31878,1.999461,10.997037,6.998115,9.948761,6.043772,640.639832,640.639832,6.043772,9.948761
0,1,2,2,10.311864,11.42841,10.666936,3.998923,10.997037,6.998115,10.778017,6.232872,342.807953,342.807953,6.232872,10.778017
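
Since the catalog is a pandas object, ordinary pandas operations can be used to post-process it. The sketch below rebuilds a few columns of the rows printed above by hand, so it runs without CATS; the assumption that the real catalog is a `DataFrame` indexed by `(Component, Station)` is inferred from the printed header, and the `Duration_sec` column and the 9.5 threshold are made up for illustration.

```python
import pandas as pd

# Subset of the catalog rows printed above, rebuilt by hand for illustration
columns = ["Component", "Station", "Event_ID",
           "Time_start_sec", "Time_end_sec", "SNR_peak"]
rows = [
    (0, 0, 1, 51.760135, 54.198097, 9.178542),
    (0, 0, 2, 10.718205, 11.631598, 7.519713),
    (0, 0, 3, 14.984846, 16.203266, 13.744453),
    (0, 1, 1, 51.556955, 53.791737, 9.948761),
    (0, 1, 2, 10.311864, 11.428410, 10.778017),
]
catalog = pd.DataFrame(rows, columns=columns).set_index(["Component", "Station"])

# Derived feature: event duration in seconds (hypothetical column name)
catalog["Duration_sec"] = catalog["Time_end_sec"] - catalog["Time_start_sec"]

# Keep only the stronger events and order them in time
strong = catalog[catalog["SNR_peak"] > 9.5].sort_values("Time_start_sec")
print(strong[["Event_ID", "Time_start_sec", "Duration_sec", "SNR_peak"]])
```

The same filtering and sorting should apply to `result.cluster_catalogs` directly, provided its layout matches the assumption above.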


In [8]:
station = 1
fig = result.plot_multi([(0, station), (1, station), (2, station)], 
                        time_interval_sec=(7, 20))
fig.opts(title=f"CATS-3C denoising on station {station}", fontsize=dict(title=32))

In [14]:
fig = cats.plot_traces(data=Dclean, time=time, time_interval_sec=(7, 22), gain=0.3)
fig = fig.opts(ylabel='Location (km)', title='Clean data')
# hv.save(fig, "../figures/clean_traces_sample.png", dpi=100)  # use this to save figure
fig

In [15]:
fig = result.plot_traces(show_denoised=False, time_interval_sec=(7, 22), gain=0.3)
fig = fig.opts(ylabel='Location (km)', title='Noisy data')
fig

In [9]:
fig = result.plot_traces(show_denoised=True, time_interval_sec=(7, 22), gain=0.3)
fig = fig.opts(ylabel='Location (km)', title='Denoised data')
fig
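
Because the dataset is synthetic and `Dclean` is known, the visual comparison can be complemented with a number. The sketch below defines a simple signal-to-noise ratio in decibels and checks it on a toy trace; `snr_db` is a hypothetical helper, not part of CATS, and the toy signal is not the tutorial data.

```python
import numpy as np

def snr_db(clean, estimate):
    """SNR of `estimate` relative to `clean` in dB, computed over all samples."""
    clean = np.asarray(clean, dtype=float)
    residual = np.asarray(estimate, dtype=float) - clean
    return 10.0 * np.log10(np.sum(clean**2) / np.sum(residual**2))

# Toy check: a 5 Hz signal plus a 50 Hz disturbance with 10x smaller amplitude
t = np.arange(1000) * 0.001
clean = np.sin(2 * np.pi * 5 * t)
noisy = clean + 0.1 * np.sin(2 * np.pi * 50 * t)

print(f"SNR of the toy noisy trace: {snr_db(clean, noisy):.1f} dB")
```

Applied to the tutorial data, `snr_db(Dclean, D)` would give the input SNR, and the same call with the denoised array in place of `D` would give the output SNR.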

<hr>

# CATS denoising with CWT

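The same workflow can be based on the CWT instead of the STFT. This can improve the denoising quality because the time-frequency resolution of the CWT varies with frequency: finer time resolution at high frequencies and finer frequency resolution at low frequencies.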

In [10]:
cats_cwt_denoiser = cats.CATSDenoiserCWT(dt_sec=dt,
                 wavelet_type=('morlet', {'mu': 5}),  # mother wavelet
                 scales_type='log',  # distribution of scales
                 nvoices=4,  # number of scales per octave; the step between scales is 2^(1/nvoices)
                 minSNR=6,
                 stationary_frame_sec=200,
                 cluster_size_t_sec=0.2,
                 cluster_size_scale_octaves=1,  # instead of freq in Hertz
                 cluster_distance_t_sec=0.2,
                 cluster_distance_scale_octaves=0,  # min step possible

                 freq_bandpass_Hz=(1, 40),  # Hertz will be transformed to scales
                 bandpass_scale_octaves=None,  # bandpass in scales, superseded by 'freq_bandpass_Hz'
                 define_scales_by_bandpass=True,  # if True, scales outside the bandpass are not calculated, which speeds up the computation
                                         
                 aggr_clustering_axis=0,  # 3C denoising
                 full_aggregated_mask=False,  # whether to use the full 3C mask; if True, noise pixels may be used as well
                                         
                 cluster_catalogs_funcs=None,
                 cluster_feature_distributions=None,
                 cluster_catalogs_filter=None,
                             
                 clustering_multitrace=False,
                 cluster_size_trace=2,
                 cluster_distance_trace=1
                 )

`CATSDenoiserCWT` has almost the same parameters as the regular `CATSDenoiser`, but it works with scales instead of frequencies. In particular, `cluster_size_scale_octaves` and `cluster_distance_scale_octaves` are given in octaves because the scales are sampled on a $\log_2$ grid.
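
To make the octave units concrete, the sketch below lays out a log-spaced frequency grid with `nvoices` steps per octave inside the `(1, 40)` Hz bandpass. It is a minimal illustration of the $2^{1/\text{nvoices}}$ step described in the comments above; how CATS actually positions and counts its scales is an assumption here, and the variable names are hypothetical.

```python
import math

nvoices = 4
f_min_Hz, f_max_Hz = 1.0, 40.0   # freq_bandpass_Hz used above

# Number of octaves covered by the bandpass and the resulting grid size
n_octaves = math.log2(f_max_Hz / f_min_Hz)
n_scales = math.floor(n_octaves * nvoices) + 1

# Each step multiplies the frequency by 2^(1/nvoices)
freqs_Hz = [f_min_Hz * 2 ** (k / nvoices) for k in range(n_scales)]
ratio = freqs_Hz[1] / freqs_Hz[0]

# A cluster size of 1 octave spans `nvoices` steps on this grid
cluster_size_scale_octaves = 1
steps_per_cluster = cluster_size_scale_octaves * nvoices

print(n_scales, round(ratio, 4), steps_per_cluster)
```

With `nvoices=4`, neighbouring grid frequencies differ by a factor of about 1.19, which matches the spacing of the frequency values in the CWT cluster catalog of this tutorial (for example 4.498788 Hz and 5.350121 Hz).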

In [12]:
# let's apply to the same data
cwt_denoised = cats_cwt_denoiser ** D

1. CWT	...	Completed in 1.52 sec
2. B-E-DATE trimming	...	Completed in 0.606 sec
3. Clustering	...	Completed in 0.105 sec
4. Cluster catalog	...	Completed in 0.586 sec
5. Inverse CWT	...	Completed in 0.384 sec
Total elapsed time:	3.2 sec



In [13]:
cwt_denoised.cluster_catalogs.loc[0, 0]

Component,Station,Event_ID,Cluster_ID,Time_start_sec,Time_end_sec,Time_peak_sec,Frequency_start_Hz,Frequency_end_Hz,Frequency_peak_Hz,Energy_peak,Energy_mean,Energy_sum,SNR_sum,SNR_mean,SNR_peak
0,0,1,1,15.128098,15.919327,15.330301,4.498788,36.002019,15.136547,23.73538,7.666949,17909.992188,17909.992188,7.666949,23.73538
0,0,2,2,66.177005,66.448562,66.408512,18.002962,30.273094,21.408295,4.00732,3.660921,259.925415,259.925415,3.660921,4.00732
0,0,3,3,51.784456,54.165957,52.580569,3.182736,12.72704,7.568273,9.711936,5.007578,21983.269531,21983.269531,5.007578,9.711936
0,0,4,4,10.800761,11.562685,10.983427,4.498788,10.704147,7.568273,7.927531,4.905537,7554.527344,7554.527344,4.905537,7.927531
0,0,5,5,123.719847,125.952871,124.289337,2.249394,9.001481,5.350121,11.50771,5.642431,21209.898438,21209.898438,5.642431,11.50771
0,0,6,6,126.09158,127.373175,127.110409,4.498788,9.001481,5.350121,7.124785,4.703844,8245.838867,8245.838867,4.703844,7.124785
0,0,7,7,67.134294,67.749694,67.430272,2.249394,3.784137,3.182736,4.586117,3.863537,2750.838379,2750.838379,3.863537,4.586117


In [14]:
station = 1
fig = cwt_denoised.plot_multi([(0, station), (1, station), (2, station)], 
                        time_interval_sec=(7, 20))
fig.opts(title=f"CATS-CWT-3C denoising on station {station}", fontsize=dict(title=32))

In [15]:
fig = cwt_denoised.plot_traces(time_interval_sec=(7, 23), gain=0.3)
fig.opts(title='CATS-CWT denoised data')