# Fetcher
 
The Fetcher class builds on the unified data request interfaces to provide convenient methods for working with whole datasets. Specifically, you can:
 
1. Get continuous data using channels contained in an inventory (or returned from a `station_client`).

2. Iterate over events and their corresponding waveforms, which have channels defined by the `station_client`.

More methods are possible and may be implemented in the future.

## Setup
We will use Fetchers from the TA and Crandall datasets to demonstrate different aspects of the `Fetcher` functionality.

In [None]:
import obsplus

ta_dataset = obsplus.load_dataset('ta')
crandall = obsplus.load_dataset('crandall')

ta_fetcher = ta_dataset.get_fetcher()
crandall_fetcher = crandall.get_fetcher()

Alternatively, the Fetcher can be initialized with any objects from which the appropriate client can be obtained. Commonly it is used with a `WaveBank`, a `Catalog` (or `EventBank`), and an `Inventory`. The following would also be valid:

In [None]:
cat = crandall.event_client.get_events()
inv = crandall.station_client.get_stations()
wavebank = crandall.waveform_client

crandall_fetcher = obsplus.Fetcher(waveforms=wavebank, stations=inv, events=cat)

The `Fetcher` constructor can also take obsplus-created CSV files or dataframes for the [stations](../datastructures/stations_to_pandas.ipynb) and [events](../datastructures/events_to_pandas.ipynb) arguments. However, this may limit some of the uses of the Fetcher.

## Quick fetch
The easiest way to get data out of a Fetcher is to call it. The call takes a reference time, which determines when the stream should start and can be one of several types (float, UTCDateTime, Catalog, Event). The time before and the time after the reference time must also be provided, either in the call or when constructing the Fetcher.

The fetcher uses the inventory (or `station_client`) to know which channels to request from the `waveform_client`.

In [None]:
import obspy
reference_time = obspy.UTCDateTime('2007-02-15T06')
time_before = 1
time_after = 30
stream = ta_fetcher(reference_time, time_before, time_after)
print(stream)
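
To make the roles of `time_before` and `time_after` concrete, here is a minimal sketch of the window arithmetic. It assumes the requested stream spans from the reference time minus `time_before` to the reference time plus `time_after`. The helper `reference_window` is hypothetical and not part of obsplus; it uses only the standard library, so it runs without any dataset.

```python
from datetime import datetime, timedelta


def reference_window(reference: datetime, time_before: float, time_after: float):
    """Return (start, end) of the window around a reference time.

    Hypothetical helper for illustration only; not an obsplus function.
    """
    start = reference - timedelta(seconds=time_before)
    end = reference + timedelta(seconds=time_after)
    return start, end


# same values as the quick-fetch example: 1 s before, 30 s after
start, end = reference_window(datetime(2007, 2, 15, 6), time_before=1, time_after=30)
print(start.isoformat(), end.isoformat())
```

Under this assumption, the quick-fetch call above would cover 31 seconds of data.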

## Continuous data

Continuous data can be requested from the fetcher, which uses the `station_client` to know which channels to pull from the `waveform_client`. This lets users skip much of the boilerplate associated with the normal `get_waveforms` interface.

For example, looping over all the continuous data and running a simple STA/LTA detector could be done like so:

In [None]:
from obspy.signal.trigger import classic_sta_lta

# first define a function for doing the sta/lta
def print_sta_lta(tr: obspy.Trace):
    """ prints the sta/lta """
    sr = tr.stats.sampling_rate
    cft = classic_sta_lta(tr.data, int(20 * sr), int(60 * sr))
    print(f'{tr.id} starting at {tr.stats.starttime}, has a max sta/lta of {max(cft):0.2f}')

In [None]:
# starttime for the continuous data
t1 = obspy.UTCDateTime('2007-02-16')

# endtime for the continuous data
t2 = t1 + 3600 * 10  # use 10 hours

# duration of each chunk returned (in seconds)
duration = 7200

# overlap (added to the end of the duration)
overlap = 60

# iterate over each chunk
for st in ta_fetcher.yield_waveforms(t1, t2, duration, overlap):
    # select only z component and perform preprocessing
    st = st.select(component='Z')
    st.detrend('linear')
    # do the sta/lta
    for tr in st:
        print_sta_lta(tr)
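
The chunking parameters used above can be illustrated without any waveform data. The following is a minimal sketch, assuming each chunk starts `duration` seconds after the previous one, has `overlap` seconds added to its end, and never extends past the end time. The function `chunk_windows` is hypothetical; the exact boundary handling in obsplus may differ.

```python
def chunk_windows(t1: float, t2: float, duration: float, overlap: float = 0.0):
    """Yield (start, end) windows covering the span from t1 to t2.

    Hypothetical helper for illustration only. Each window nominally lasts
    `duration` seconds, `overlap` is added to the end of each window, and
    no window extends past t2.
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    start = t1
    while start < t2:
        end = min(start + duration + overlap, t2)
        yield start, end
        start += duration


# ten hours of data, split into two-hour chunks with 60 s of overlap
windows = list(chunk_windows(0, 10 * 3600, duration=7200, overlap=60))
for start, end in windows:
    print(start, end)
```

Times here are plain offsets in seconds; the same arithmetic applies to `UTCDateTime` objects, which support adding and subtracting seconds.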
    

## Stream processors

It can be useful to define a stream_processing function that will be called on each stream before yielding it. This allows users to define flexible, custom processing functions without cluttering up the function calls with a lot of processing parameters.

In [None]:
# define a function that will be called on the stream before returning it. 
def stream_processor(st: obspy.Stream) -> obspy.Stream:
    """ select the z component, detrend, and filter a stream """
    st = st.select(component='Z')
    st.detrend('linear')
    st.filter('bandpass', freqmin=.005, freqmax=.04)
    return st

# attach the stream processor to the fetcher
ta_fetcher.stream_processor = stream_processor

for st in ta_fetcher.yield_waveforms(t1, t2, duration, overlap):
    for tr in st:
        print_sta_lta(tr)
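
The idea behind an attached processor can be sketched as a simple hook: an optional callable stored on the object and applied to each item before it is yielded. The class `HookedFetcher` and the function `demean` below are hypothetical stand-ins that operate on plain lists of numbers; this is not how obsplus is implemented internally, only an illustration of the pattern.

```python
from typing import Callable, Iterable, Iterator, List, Optional


class HookedFetcher:
    """Hypothetical stand-in for a fetcher; not the obsplus implementation."""

    def __init__(self, chunks: Iterable[List[float]]):
        self.chunks = list(chunks)
        # optional callable applied to every chunk before it is yielded
        self.stream_processor: Optional[Callable[[List[float]], List[float]]] = None

    def yield_chunks(self) -> Iterator[List[float]]:
        for chunk in self.chunks:
            if self.stream_processor is not None:
                chunk = self.stream_processor(chunk)
            yield chunk


def demean(chunk: List[float]) -> List[float]:
    """Remove the mean from a chunk of samples."""
    mean = sum(chunk) / len(chunk)
    return [x - mean for x in chunk]


fetcher = HookedFetcher([[1.0, 2.0, 3.0], [10.0, 20.0]])
fetcher.stream_processor = demean
processed = list(fetcher.yield_chunks())
print(processed)
```

Keeping the processing in one callable means the loop that consumes the data does not need to know which preprocessing steps were applied.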

## Event data

Because we provided the Fetcher with an `event_client`, we can use it to iterate through the events it contains. This should only be done with reasonably sized event clients; otherwise, limit the events in the query.

In [None]:
time_before = 1
time_after = 3
iterator = crandall_fetcher.yield_event_waveforms(time_before, time_after)
for event_id, st in iterator:
    print(f'fetched waveform data for {event_id} which has {len(st)} traces')

We can also create a dict of {event_id: stream} which is common input for converting streams to a [DataArray](../datastructures/xarray.ipynb).

In [None]:
st_dict = dict(crandall_fetcher.yield_event_waveforms(time_before, time_after))

for event_id, st in st_dict.items():
    print(event_id, len(st))
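
Building the dict this way relies on `dict()` accepting any iterable of two-item tuples. One thing to keep in mind, assuming the method returns a one-shot iterator (as its `yield_` name suggests), is that the pairs can only be consumed once. The generator `yield_pairs` below is a hypothetical stand-in with placeholder event ids and trace labels.

```python
def yield_pairs():
    """Hypothetical generator yielding (event_id, stream-like) pairs."""
    yield "event_1", ["tr1", "tr2"]
    yield "event_2", ["tr1"]


pairs = yield_pairs()
# dict() consumes the iterator and keeps each pair as key and value
st_dict = dict(pairs)
# a generator is exhausted after one pass, so a second dict() call is empty
leftover = dict(pairs)
print(st_dict, leftover)
```

If the waveforms are needed more than once, storing them in a dict as shown avoids fetching them again.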

## Different events/inventories
The clients can be swapped out on each method call. This may be useful to get a subset of the events or channels by providing a filtered catalog/inventory. For example, if we were only interested in station M11A for a single call (but didn't want to modify the original fetcher), this could be accomplished like so:

In [None]:
# get a subset of the original inventory (station M11A only)
inv = ta_dataset.station_client.get_stations()
inv2 = inv.select(station='M11A')

# iterate and print
for st in ta_fetcher.yield_waveforms(t1, t2, duration, overlap, stations=inv2):
    for tr in st:
        print_sta_lta(tr)

The same applies for swapping out events:

In [None]:
# read in the catalog and get a subset as a dataframe
cat = crandall.event_client.get_events()
cat_df = obsplus.events_to_df(cat)[:2]

# iterate the events and print 
iterator = crandall_fetcher.yield_event_waveforms(time_before, time_after, events=cat_df)
for event_id, st in iterator:
    print(f'fetching {event_id}, got {len(st)} traces')
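
The per-call swapping shown in this section follows a common pattern: a keyword argument that defaults to `None` and falls back to the value stored at construction. The sketch below illustrates that pattern with a hypothetical `OverridableFetcher` class working on plain station names. It is not obsplus code, and every station name other than M11A is a placeholder.

```python
from typing import Iterator, Optional, Sequence


class OverridableFetcher:
    """Hypothetical stand-in illustrating per-call overrides; not obsplus code."""

    def __init__(self, stations: Sequence[str]):
        self.stations = list(stations)

    def yield_station_names(
        self, stations: Optional[Sequence[str]] = None
    ) -> Iterator[str]:
        # use the per-call value if given, otherwise the one stored at construction
        active = self.stations if stations is None else stations
        yield from active


fetcher = OverridableFetcher(["M11A", "STA_B", "STA_C"])
subset = list(fetcher.yield_station_names(stations=["M11A"]))
everything = list(fetcher.yield_station_names())
print(subset, everything)
```

Because the override is only used inside the call, the object keeps its original stations for later calls.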