# Matched-filters

This notebook looks at using EQcorrscan's Tribe objects for matched-filter detection of earthquakes.

This notebook extends the ideas covered in the [Quick Start](quick_start.ipynb) notebook. In particular, this
notebook also covers:
1. Concurrent execution of detection workflows for more efficient compute utilisation with large datasets;
2. Use of local waveform databases using [obsplus](https://github.com/niosh-mining/obsplus);
3. Cross-correlation pick-correction using the `lag_calc` method.

## Set up

We are going to focus in this notebook on using local data. For examples of how to directly use data from online providers
see the [Quick Start](quick_start.ipynb) notebook. 

To start off we will download some data - in your case this is likely data that you have either downloaded from one or more
online providers, or data that you have collected yourself. At the moment we don't care how those data are organised, as long
as you have continuous data on disk somewhere. We will use [obsplus](https://github.com/niosh-mining/obsplus) to work out
what data are where and provide us with the data that we need when we need it.

Obsplus is great and has more functionality than we expose here - if you make use of obsplus, please cite the
paper by [Chambers et al. (2021)](https://joss.theoj.org/papers/10.21105/joss.02696).

As in the [Quick Start](quick_start.ipynb) example, we will control the output level from EQcorrscan using logging.

In [1]:
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s\t%(name)s\t%(levelname)s\t%(message)s")

Logger = logging.getLogger("TutorialLogger")
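
Because Python loggers are hierarchical, the output level can also be tuned per library rather than globally. The sketch below assumes that EQcorrscan's loggers are named after its modules (for example `eqcorrscan.core.match_filter`), following the common `logging.getLogger(__name__)` convention; treat the logger names as assumptions to check against your installed version.

```python
import logging

# Assumed logger names: libraries that call logging.getLogger(__name__)
# expose loggers named after their modules.
package_logger = logging.getLogger("eqcorrscan")
module_logger = logging.getLogger("eqcorrscan.core.match_filter")

# Quieten the whole package: child loggers without their own level
# inherit the level of the nearest ancestor that has one.
package_logger.setLevel(logging.WARNING)

print(logging.getLevelName(module_logger.getEffectiveLevel()))
print(module_logger.isEnabledFor(logging.INFO))
```

Setting the level on the package logger leaves `TutorialLogger` and any other libraries at the level chosen in `basicConfig`.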

We will use the March 2023 Kawarau swarm as our case-study for this. This was an energetic swarm that
was reported by New Zealand's GeoNet monitoring agency and discussed in a news article [here](https://www.geonet.org.nz/response/VJW80CGEPtq0JPCBHlNaR).

We will use data from ten stations over a duration of two days. The swarm lasted longer than this, but
we need to limit compute resources for this tutorial! Feel free to change the end-date below to run
for longer.

To be kind to GeoNet and not repeatedly get data from their FDSN service, we are going to get data from the AWS open data bucket. If you don't already have boto3 installed, you will need to install it for this section (`conda install boto3` or `pip install boto3`).
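
If you are unsure whether boto3 is available in your environment, a quick check along the following lines may help. This is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and not part of EQcorrscan or boto3.

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


if not is_installed("boto3"):
    print("boto3 not found: install it with "
          "`conda install boto3` or `pip install boto3`")
```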

NB: If you actually want to access the GeoNet data bucket using Python, a drop-in replacement for FDSN clients exists [here](https://github.com/calum-chamberlain/cjc-utilities/blob/main/src/cjc_utilities/get_data/geonet_aws_client.py).

In [11]:
def get_geonet_data(starttime, endtime, stations, outdir):
    """
    Download day-long miniseed files from the GeoNet open data bucket.

    Files that already exist in outdir are skipped.
    """
    import os
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    GEONET_AWS = "geonet-open-data"

    # Layout of the bucket: one directory per day, one file per channel
    DAY_STRUCT = "waveforms/miniseed/{date.year}/{date.year}.{date.julday:03d}"
    CHAN_STRUCT = ("{station}.{network}/{date.year}.{date.julday:03d}."
                   "{station}.{location}-{channel}.{network}.D")
    if not os.path.isdir(outdir):
        os.makedirs(outdir)

    # Use unsigned (anonymous) requests: no AWS credentials are needed
    s3_resource = boto3.resource('s3', config=Config(signature_version=UNSIGNED))
    bucket = s3_resource.Bucket(GEONET_AWS)

    date = starttime
    while date < endtime:
        day_path = DAY_STRUCT.format(date=date)
        for station in stations:
            # Try both HH? and EH? channel codes for every component
            for instrument in "HE":
                for component in "ZNE12":
                    channel = f"{instrument}H{component}"
                    chan_path = CHAN_STRUCT.format(
                        station=station, network="NZ",
                        date=date, location="10", channel=channel)
                    local_path = os.path.join(outdir, chan_path)
                    if os.path.isfile(local_path):
                        Logger.info(f"Skipping {local_path}: exists")
                        continue
                    os.makedirs(os.path.dirname(local_path), exist_ok=True)
                    remote = "/".join([day_path, chan_path])
                    Logger.debug(f"Downloading from {remote}")
                    try:
                        bucket.download_file(remote, local_path)
                    except Exception as e:
                        # Not every candidate channel exists for every station
                        Logger.debug(f"Could not download {remote} due to {e}")
                        continue
                    Logger.info(f"Downloaded {remote}")
        # Step forward by one day (in seconds)
        date += 86400
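
To make the bucket layout concrete, the following sketch builds the object keys that `get_geonet_data` would request, without touching the network. It uses the standard-library `datetime` module in place of ObsPy's `UTCDateTime`, so the small `Day` tuple (with `year` and `julday` fields) and the `candidate_keys` helper are hypothetical stand-ins for illustration only.

```python
from collections import namedtuple
from datetime import date, timedelta

DAY_STRUCT = "waveforms/miniseed/{date.year}/{date.year}.{date.julday:03d}"
CHAN_STRUCT = ("{station}.{network}/{date.year}.{date.julday:03d}."
               "{station}.{location}-{channel}.{network}.D")

# Hypothetical stand-in for UTCDateTime: only the attributes used above.
Day = namedtuple("Day", ["year", "julday"])


def candidate_keys(start, end, stations):
    """List every bucket key that would be tried for days in [start, end)."""
    keys = []
    current = start
    while current < end:
        day = Day(current.year, current.timetuple().tm_yday)
        day_path = DAY_STRUCT.format(date=day)
        for station in stations:
            for instrument in "HE":
                for component in "ZNE12":
                    chan_path = CHAN_STRUCT.format(
                        station=station, network="NZ", date=day,
                        location="10", channel=f"{instrument}H{component}")
                    keys.append("/".join([day_path, chan_path]))
        current += timedelta(days=1)
    return keys


keys = candidate_keys(date(2023, 3, 17), date(2023, 3, 19), ["EDRZ", "LIRZ"])
print(len(keys))
print(keys[0])
```

Each station and day produces ten candidate keys (two channel prefixes times five components). Many of these are unlikely to exist for any given station, which is presumably why the download call is wrapped in `try`/`except` and failures are only logged at debug level.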

In [12]:
%matplotlib inline

from obspy import UTCDateTime

starttime, endtime = UTCDateTime(2023, 3, 17), UTCDateTime(2023, 3, 19)
stations = ['EDRZ', 'LIRZ', 'MARZ', 'MKRZ', 'OMRZ', 'OPRZ', 'TARZ', 'WKHS', 'HNCZ', 'KARZ']

outdir = "tutorial_waveforms"

get_geonet_data(starttime=starttime, endtime=endtime, stations=stations, outdir=outdir)

2023-11-17 09:20:50,237	TutorialLogger	INFO	Skipping tutorial_waveforms/EDRZ.NZ/2023.076.EDRZ.10-EHZ.NZ.D: exists
2023-11-17 09:20:50,237	TutorialLogger	INFO	Skipping tutorial_waveforms/EDRZ.NZ/2023.076.EDRZ.10-EHN.NZ.D: exists
2023-11-17 09:20:50,238	TutorialLogger	INFO	Skipping tutorial_waveforms/EDRZ.NZ/2023.076.EDRZ.10-EHE.NZ.D: exists
2023-11-17 09:20:50,583	TutorialLogger	INFO	Skipping tutorial_waveforms/LIRZ.NZ/2023.076.LIRZ.10-EHZ.NZ.D: exists
2023-11-17 09:20:50,584	TutorialLogger	INFO	Skipping tutorial_waveforms/LIRZ.NZ/2023.076.LIRZ.10-EHN.NZ.D: exists
2023-11-17 09:20:50,585	TutorialLogger	INFO	Skipping tutorial_waveforms/LIRZ.NZ/2023.076.LIRZ.10-EHE.NZ.D: exists
2023-11-17 09:20:50,932	TutorialLogger	INFO	Skipping tutorial_waveforms/MARZ.NZ/2023.076.MARZ.10-EHZ.NZ.D: exists
2023-11-17 09:20:50,932	TutorialLogger	INFO	Skipping tutorial_waveforms/MARZ.NZ/2023.076.MARZ.10-EHN.NZ.D: exists
2023-11-17 09:20:50,933	TutorialLogger	INFO	Skipping tutorial_waveforms/MARZ.NZ/2023.076

