# WaveBank
`WaveBank` is an in-process database for accessing seismic time-series data. Any directory structure containing obspy-readable waveforms can be used as the data source. `WaveBank` uses a simple indexing scheme and the [Hierarchical Data Format](https://en.wikipedia.org/wiki/Hierarchical_Data_Format) to keep track of each `Trace` in the directory.

In this tutorial, `WaveBank` is used to access seismic data downloaded with [ObsPy's FDSN mass downloader](https://docs.obspy.org/packages/autogen/obspy.clients.fdsn.mass_downloader.html). The alternative is to navigate directories of waveform files manually and load the desired data yourself. That quickly becomes tedious and clutters application code with data-access logic. `WaveBank` provides a better way.

## Create a WaveBank object
This tutorial uses the [Crandall Canyon](https://en.wikipedia.org/wiki/Crandall_Canyon_Mine) dataset from [ObsPlus' datasets](../datasets/datasets.ipynb) to create a new `WaveBank` instance.

The cells below ensure the waveforms have been downloaded (to the obsplus installation directory), copy the dataset to a temporary directory, and initialize a `WaveBank` instance.

In [None]:
import obsplus

In [None]:
%%capture
# Make sure the Crandall Canyon dataset is loaded and suppress output.
crandall = obsplus.load_dataset('crandall_test')

In [None]:
crandall.waveform_client.bank_path

In [None]:
# create a copy of the crandall dataset, storing the copied files in a temporary directory
crandall = obsplus.copy_dataset('crandall_test')
# a directory of waveforms now lives here:
waveform_path = crandall.waveform_path
print(f"The waveform path is: {waveform_path}")

The next step is to pass the path of the waveform files to the `WaveBank` constructor.

In [None]:
bank = obsplus.WaveBank(waveform_path)

Calling the `update_index` method on the bank ensures the index is up to date. It indexes all files whose timestamps are later than the last time `update_index` was run.

Note: If the index has not yet been created or new files have been added, `update_index` needs to be called.

In [None]:
bank.update_index()
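The bookkeeping behind `update_index` is internal to obsplus. The following is only a minimal sketch of the idea of an incremental update, assuming the time of the last update is stored as a POSIX timestamp and compared against each file's modification time. The helper `files_modified_since` is hypothetical and uses only the standard library.

```python
import os
import tempfile


def files_modified_since(root, last_update):
    """Return sorted paths under ``root`` modified after ``last_update``.

    ``last_update`` is a POSIX timestamp (seconds since the epoch), for example
    the time the index was last refreshed. Hypothetical helper, not obsplus API.
    """
    new_files = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > last_update:
                new_files.append(path)
    return sorted(new_files)


# Usage: two empty files, one "old" and one "new" relative to the last update.
with tempfile.TemporaryDirectory() as tmp:
    old_file = os.path.join(tmp, "old.mseed")
    new_file = os.path.join(tmp, "new.mseed")
    for path in (old_file, new_file):
        open(path, "wb").close()
    os.utime(old_file, (1000, 1000))  # modified before the last update
    os.utime(new_file, (3000, 3000))  # modified after the last update
    print([os.path.basename(p) for p in files_modified_since(tmp, 2000)])
    # ['new.mseed']
```

Only the files returned by such a scan would need to be opened and read, which is what keeps repeated index updates cheap on a large archive.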

## Get waveforms

Waveforms can be retrieved from the directory with the `get_waveforms` method. This method has the same signature as the `get_waveforms` methods of ObsPy clients, so the two can be used interchangeably:

In [None]:
import obspy

t1 = obspy.UTCDateTime('2007-08-06T01:44:48')
t2 = t1 + 60
st = bank.get_waveforms(starttime=t1, endtime=t2)
print(st[:5])  # print first 5 traces
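Because the signatures match, code written against `get_waveforms` does not need to know which kind of client it is talking to. The sketch below assumes only that the client accepts `network`, `station`, `location`, `channel`, `starttime`, and `endtime` as keyword arguments. `fetch_window` and `RecordingClient` are hypothetical names; the stub just echoes the query so the example runs without any data.

```python
def fetch_window(client, seed_id, starttime, duration):
    """Request ``duration`` seconds of one channel from any client whose
    ``get_waveforms`` takes network, station, location, channel, starttime
    and endtime keywords (e.g. an ObsPy FDSN client or a WaveBank)."""
    network, station, location, channel = seed_id.split(".")
    return client.get_waveforms(
        network=network,
        station=station,
        location=location,
        channel=channel,
        starttime=starttime,
        endtime=starttime + duration,
    )


class RecordingClient:
    """Hypothetical stub that returns the query it receives instead of data."""

    def get_waveforms(self, network, station, location, channel, starttime, endtime):
        return (network, station, location, channel, starttime, endtime)


print(fetch_window(RecordingClient(), "UU.SRU..HHZ", 0.0, 60.0))
# ('UU', 'SRU', '', 'HHZ', 0.0, 60.0)
```

With the bank from this tutorial, `fetch_window(bank, 'UU.SRU..HHZ', t1, 60)` should return an ObsPy `Stream`, and swapping in an FDSN client would require no change to the helper.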

`WaveBank` can filter on network, station, location, and channel codes using Unix-style wildcard strings (such as `O1??` or `BH[NE]`) or regular expressions.

In [None]:
st2 = bank.get_waveforms(network='UU', starttime=t1, endtime=t2)

# ensure only UU traces were returned
for tr in st2:
    assert tr.stats.network == 'UU'

print(st2[:5])  # print first 5 traces

In [None]:
st = bank.get_waveforms(starttime=t1, endtime=t2, station='O1??', channel='BH[NE]')

# test returned traces
for tr in st:
    assert tr.stats.starttime >= t1 - .00001
    assert tr.stats.endtime <= t2 + .00001
    assert tr.stats.station.startswith('O1')
    assert tr.stats.channel.startswith('BH')
    assert tr.stats.channel[-1] in {'N', 'E'}

print(st)
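The wildcard patterns used above (`?` for one character, `*` for any run of characters, `[NE]` for a character set) follow the same rules as Python's `fnmatch` module. The sketch below applies them to each code of a `NET.STA.LOC.CHA` seed id; whether WaveBank uses `fnmatch` internally is an assumption, the regex option is not shown, and the seed ids are illustrative rather than the dataset's actual channel list.

```python
from fnmatch import fnmatchcase


def filter_seed_ids(seed_ids, network="*", station="*", location="*", channel="*"):
    """Keep 'NET.STA.LOC.CHA' ids whose four codes all match the wildcard patterns."""
    patterns = (network, station, location, channel)
    return [
        seed_id
        for seed_id in seed_ids
        if all(
            fnmatchcase(code, pattern)
            for code, pattern in zip(seed_id.split("."), patterns)
        )
    ]


seed_ids = ["TA.O15A..BHE", "TA.O15A..BHZ", "TA.O16A..BHN", "TA.P16A..BHE", "UU.SRU..HHZ"]
print(filter_seed_ids(seed_ids, station="O1??", channel="BH[NE]"))
# ['TA.O15A..BHE', 'TA.O16A..BHN']
```

Note that `*` also matches an empty location code, so leaving a parameter at its default never excludes a channel.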

Like the FDSN client, `WaveBank` also provides a `get_waveforms_bulk` method for efficiently retrieving a large number of streams. Each request is a tuple of (network, station, location, channel, starttime, endtime).

In [None]:
args = [  # in practice this list may contain hundreds or thousands of requests
    ('TA', 'O15A', '', 'BHZ', t1 - 5, t2 - 5,),
    ('UU', 'SRU', '', 'HHZ', t1, t2,),
]
st = bank.get_waveforms_bulk(args)
print(st)
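In practice the bulk list is usually generated rather than typed out. A minimal sketch, assuming you have a hypothetical mapping of seed ids to pick times and want a fixed window around each pick; `bulk_from_picks` and its `before`/`after` defaults are illustrative, not part of obsplus.

```python
def bulk_from_picks(picks, before=5.0, after=30.0):
    """Build one (net, sta, loc, cha, t1, t2) request per pick.

    ``picks`` maps 'NET.STA.LOC.CHA' seed ids to pick times; ``before`` and
    ``after`` are seconds of padding (hypothetical defaults).
    """
    bulk = []
    for seed_id, pick_time in sorted(picks.items()):
        network, station, location, channel = seed_id.split(".")
        bulk.append(
            (network, station, location, channel, pick_time - before, pick_time + after)
        )
    return bulk


picks = {"UU.SRU..HHZ": 100.0, "TA.O15A..BHZ": 95.0}
for request in bulk_from_picks(picks):
    print(request)
# ('TA', 'O15A', '', 'BHZ', 90.0, 125.0)
# ('UU', 'SRU', '', 'HHZ', 95.0, 130.0)
```

Pick times given as `obspy.UTCDateTime` objects work the same way, since they support subtraction and addition of seconds, so the result could be passed directly to `bank.get_waveforms_bulk`.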

## Yield waveforms
`WaveBank` also provides a generator for iterating over large amounts of continuous waveforms. For example, it is common to run a power detector over continuous data to identify seismic events. The following example shows how to get streams of one-hour duration with one minute of overlap between the slices.

We first need to create a bank on a dataset which has continuous data. For this we will use the TA dataset.

In [None]:
ds = obsplus.load_dataset('TA_test')
ta_bank = obsplus.WaveBank(ds.waveform_client)

In [None]:
# get one day of Kemmerer data
ta_t1 = obspy.UTCDateTime('2007-02-15')
ta_t2 = obspy.UTCDateTime('2007-02-16')

for st in ta_bank.yield_waveforms(starttime=ta_t1, endtime=ta_t2, duration=3600, overlap=60):
    print(f'got {len(st)} traces from {st[0].stats.starttime} to {st[0].stats.endtime}')
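The `duration` and `overlap` arguments split the requested day into windows. The sketch below shows one plausible scheme, assuming a new window starts every `duration` seconds and is extended by `overlap` seconds, but never past `endtime`; obsplus's exact boundary handling may differ. `time_windows` is a hypothetical helper, shown with plain seconds for readability.

```python
def time_windows(starttime, endtime, duration, overlap=0.0):
    """Yield (t1, t2) pairs starting every ``duration`` seconds.

    Each window is extended by ``overlap`` seconds so consecutive windows
    share data, but no window runs past ``endtime``.
    """
    t1 = starttime
    while t1 < endtime:
        yield t1, min(t1 + duration + overlap, endtime)
        t1 += duration


windows = list(time_windows(0, 86400, duration=3600, overlap=60))
print(len(windows), windows[0], windows[-1])
# 24 (0, 3660) (82800, 86400)
```

The overlap means an event that straddles a window boundary still appears whole in at least one window, which is why detectors are typically run on overlapping slices.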

## Put waveforms
Files can be added to the bank by passing a stream or trace to the `bank.put_waveforms` method. `WaveBank` does not merge files, so overlapping data may occur if care is not taken.

In [None]:
# show that no data for RJOB is in the bank
st = bank.get_waveforms(station='RJOB')

assert len(st) == 0

print(st)

In [None]:
# add the default stream to the archive (which contains data for RJOB)
bank.put_waveforms(obspy.read())
st_out = bank.get_waveforms(station='RJOB')

# check that only RJOB traces were returned
assert len(st_out)
for tr in st_out:
    assert tr.stats.station == 'RJOB'

print(st_out)
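Since `put_waveforms` does not merge files, it can help to check for overlap before writing. A minimal sketch, assuming the spans already in the bank are available as hypothetical `(seed_id, start, end)` tuples; two spans of the same channel overlap when each starts before the other ends. The RJOB channel names match ObsPy's default stream, but the times are illustrative.

```python
def overlapping_spans(existing, new):
    """Return spans in ``existing`` that overlap the ``new`` (seed_id, start, end) span.

    Spans that merely touch (one ends exactly where the other starts) do not count.
    """
    seed_id, start, end = new
    return [
        span
        for span in existing
        if span[0] == seed_id and span[1] < end and start < span[2]
    ]


existing = [("BW.RJOB..EHZ", 0, 30), ("BW.RJOB..EHN", 0, 30)]
print(overlapping_spans(existing, ("BW.RJOB..EHZ", 20, 50)))  # [('BW.RJOB..EHZ', 0, 30)]
print(overlapping_spans(existing, ("BW.RJOB..EHZ", 30, 60)))  # []
```

The existing spans could, for example, be built from the tuples returned by `bank.availability()` by joining the first four fields into a seed id.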

## Availability
`WaveBank` can report the availability of data. The output can be either a dataframe (`get_availability_df`) or a list of tuples (`availability`) in the form [(network, station, location, channel, min_starttime, max_endtime)].

In [None]:
# get a dataframe of availability by seed ids and timestamps
bank.get_availability_df(channel='BHE', station='[OR]*')

In [None]:
# get list of tuples of availability
bank.availability(channel='BHE', station='[OR]*')
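The tuple form can be understood as collapsing every indexed span of a channel into its earliest start and latest end. A minimal sketch of that aggregation, assuming hypothetical input rows of `(network, station, location, channel, start, end)`; it illustrates the output format, not obsplus's implementation.

```python
def availability_from_rows(rows):
    """Collapse (network, station, location, channel, start, end) rows into one
    (network, station, location, channel, min_start, max_end) tuple per channel."""
    spans = {}
    for network, station, location, channel, start, end in rows:
        key = (network, station, location, channel)
        if key in spans:
            first, last = spans[key]
            spans[key] = (min(first, start), max(last, end))
        else:
            spans[key] = (start, end)
    # One tuple per channel, sorted by seed codes.
    return sorted(key + span for key, span in spans.items())


rows = [  # hypothetical index rows
    ("TA", "O15A", "", "BHE", 0, 3600),
    ("TA", "O15A", "", "BHE", 3600, 7200),
    ("UU", "SRU", "", "BHE", 100, 200),
]
print(availability_from_rows(rows))
# [('TA', 'O15A', '', 'BHE', 0, 7200), ('UU', 'SRU', '', 'BHE', 100, 200)]
```

Because only the minimum start and maximum end survive, any gaps inside that range are hidden, which is what the gap and uptime dataframes are for.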

## Get Gaps and uptime
`WaveBank` can return a dataframe of missing data with the `get_gaps_df` method, and a dataframe of uptime statistics with the `get_uptime_df` method. These are useful for assessing the completeness of an archive of continuous data.

In [None]:
bank.get_gaps_df(channel='BHE', station='O*').head()

In [None]:
ta_bank.get_uptime_df()
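Conceptually, gaps are the stretches of a requested interval not covered by any data span, and uptime is the covered fraction. The sketch below computes both for a single channel from hypothetical `(start, end)` spans in seconds; obsplus may apply tolerances for tiny gaps or report different columns, so treat it as an illustration of the idea only.

```python
def merge_spans(spans):
    """Merge overlapping or touching (start, end) spans into sorted disjoint spans."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def gaps_and_uptime(spans, starttime, endtime):
    """Return (gaps, uptime) for one channel over [starttime, endtime].

    ``gaps`` lists the (start, end) intervals with no data and ``uptime`` is
    the fraction of the interval that is covered by at least one span.
    """
    clipped = [
        (max(start, starttime), min(end, endtime))
        for start, end in spans
        if end > starttime and start < endtime
    ]
    merged = merge_spans(clipped)
    gaps, cursor = [], starttime
    for start, end in merged:
        if start > cursor:
            gaps.append((cursor, start))
        cursor = end
    if cursor < endtime:
        gaps.append((cursor, endtime))
    covered = sum(end - start for start, end in merged)
    return gaps, covered / (endtime - starttime)


print(gaps_and_uptime([(0, 3600), (3500, 7000), (7200, 9000)], 0, 10000))
# ([(7000, 7200), (9000, 10000)], 0.88)
```

Merging first matters: overlapping files (which `put_waveforms` can create) would otherwise be double-counted and inflate the uptime.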

## Read index
`WaveBank` can read the index directly, although in most cases this shouldn't be needed.

In [None]:
ta_bank.read_index().head()
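The index records, for each trace, which file it lives in and the time span it covers, so a query only has to open files whose spans overlap the request. A minimal sketch of that lookup, assuming each index row is a plain dict with the hypothetical keys `path`, `seed_id`, `starttime`, and `endtime`; the real index is a dataframe stored in HDF5 and its column names may differ.

```python
def query_index(rows, starttime, endtime):
    """Return the sorted, de-duplicated file paths whose traces overlap
    the requested [starttime, endtime] interval."""
    return sorted({
        row["path"]
        for row in rows
        if row["starttime"] < endtime and starttime < row["endtime"]
    })


index_rows = [  # hypothetical index: one row per trace, several traces per file
    {"path": "a.mseed", "seed_id": "TA.O15A..BHZ", "starttime": 0, "endtime": 3600},
    {"path": "a.mseed", "seed_id": "TA.O15A..BHN", "starttime": 0, "endtime": 3600},
    {"path": "b.mseed", "seed_id": "TA.O15A..BHZ", "starttime": 3600, "endtime": 7200},
    {"path": "c.mseed", "seed_id": "TA.O15A..BHZ", "starttime": 7200, "endtime": 10800},
]
print(query_index(index_rows, 3000, 4000))
# ['a.mseed', 'b.mseed']
```

De-duplicating paths reflects that one file often holds several traces, so it should be read once even if multiple rows match.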

## Similar Projects
`WaveBank` is a useful tool, but it may not be a good fit for every application. Check out the following items as well:

ObsPy has a way to visualize the availability of waveform data in a directory using [obspy-scan](https://docs.obspy.org/tutorial/code_snippets/visualize_data_availability_of_local_waveform_archive.html). If you prefer a graphical option to working with `DataFrame`s, this might be for you.

ObsPy also has a [filesystem client](https://docs.obspy.org/master/packages/autogen/obspy.clients.filesystem.sds.Client.html#obspy.clients.filesystem.sds.Client) for working with SeisComP Data Structure (SDS) archives.

[IRIS](https://www.iris.edu/hq/) released a miniSEED indexing program called [mseedindex](https://github.com/iris-edu/mseedindex), which has an [ObsPy API](https://github.com/obspy/obspy/pull/2206).