# WaveBank
`WaveBank` is an in-process database for accessing seismic time-series data. Any directory structure containing ObsPy-readable waveforms can be used as the data source. `WaveBank` uses a simple indexing scheme and the [Hierarchical Data Format](https://en.wikipedia.org/wiki/Hierarchical_Data_Format) to keep track of each `Trace` in the directory. Without `WaveBank` (or a similar program), applications have to implement their own data organization and access logic, which is tedious and clutters application code. `WaveBank` provides a better way.



## Load Example Data
This tutorial demonstrates the use of `WaveBank` on two different [obsplus datasets](../datasets/datasets.ipynb).

The first dataset, [Crandall Canyon](https://en.wikipedia.org/wiki/Crandall_Canyon_Mine), only has event waveform files. The second only has continuous data from two TA stations. We start by loading these datasets, making temporary copies, and getting the paths to their waveform directories.

In [None]:
%%capture
import obsplus

# make sure datasets are downloaded and copy them to temporary
# directories to make sure no accidental changes are made
crandall_dataset = obsplus.load_dataset("crandall_test").copy()
ta_dataset = obsplus.load_dataset("ta_test").copy()

# get path to waveform directories
crandall_path = crandall_dataset.waveform_path
ta_path = ta_dataset.waveform_path

In [None]:
crandall_path

## Create a WaveBank object
To create a `WaveBank` instance, simply pass the class a path to the waveform directory.

In [None]:
bank = obsplus.WaveBank(crandall_path)

Calling the `update_index` method on the bank ensures the index is up to date. It iterates through all files whose timestamps are later than the last time `update_index` was run.

Note: If the index has not yet been created, or new files have been added, `update_index` needs to be called.

In [None]:
bank.update_index()
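
The incremental behavior described above can be pictured with a small standard-library sketch. This is not the obsplus implementation: the `find_new_files` helper, the file names, and the stored `last_updated` timestamp are hypothetical and only illustrate selecting files modified after the previous indexing pass.

```python
import os
import tempfile
from pathlib import Path


def find_new_files(directory, last_updated):
    """Return files under directory modified after last_updated (epoch seconds)."""
    new_files = []
    for path in Path(directory).rglob("*"):
        if path.is_file() and path.stat().st_mtime > last_updated:
            new_files.append(path)
    return sorted(new_files)


# build a tiny stand-in archive with one old file and one new file
archive = Path(tempfile.mkdtemp())
old_file = archive / "old.mseed"
new_file = archive / "new.mseed"
old_file.write_bytes(b"")
new_file.write_bytes(b"")

# pretend the index was last updated at time 1000 and set file times around it
last_updated = 1000.0
os.utime(old_file, (500.0, 500.0))
os.utime(new_file, (1500.0, 1500.0))

# only the file modified after the last update needs to be indexed
to_index = find_new_files(archive, last_updated)
```

Under this scheme, a second call with nothing new on disk has no files to process, which is why repeated updates can be cheap.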

## Using a custom index path

If you are working from a data directory to which you don't have write access, you can specify a custom location for the index:

In [None]:
import tempfile
from pathlib import Path

index_path = Path(tempfile.mkdtemp()) / "index.h5"
cust_ind_bank = obsplus.WaveBank(crandall_path, index_path=index_path)
cust_ind_bank.update_index()
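
One way to decide when a custom index location is needed is to check whether the data directory is writable. The sketch below uses only the standard library; `choose_index_path` is a hypothetical helper, and returning `None` is simply this sketch's way of signalling "do not pass `index_path`", not a statement about how `WaveBank` treats that value.

```python
import os
import tempfile
from pathlib import Path


def choose_index_path(data_dir, fallback_dir):
    """Return None if data_dir is writable, else an index path in fallback_dir."""
    if os.access(data_dir, os.W_OK):
        return None  # writable: no custom index location is needed
    return Path(fallback_dir) / "index.h5"


# a temporary directory stands in for the waveform directory
data_dir = tempfile.mkdtemp()
index_path = choose_index_path(data_dir, tempfile.mkdtemp())
```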

## Get waveforms

Waveforms can be retrieved from the directory with the `get_waveforms` method. This method has the same signature as the `get_waveforms` methods of the ObsPy clients, so they can be used interchangeably:

In [None]:
import obspy

t1 = obspy.UTCDateTime("2007-08-06T01-44-48")
t2 = t1 + 60
st = bank.get_waveforms(starttime=t1, endtime=t2)

`WaveBank` can filter on channels, locations, stations, networks, etc. using Unix-style wildcards (such as `?` and `*`) or regular expressions.

In [None]:
st2 = bank.get_waveforms(network="UU", starttime=t1, endtime=t2)

# ensure only UU traces were returned
for tr in st2:
    assert tr.stats.network == "UU"


In [None]:
st = bank.get_waveforms(starttime=t1, endtime=t2, station="O1??", channel="BH[NE]")

# test returned traces
for tr in st:
    assert tr.stats.starttime >= t1 - 0.00001
    assert tr.stats.endtime <= t2 + 0.00001
    assert tr.stats.station.startswith("O1")
    assert tr.stats.channel.startswith("BH")
    assert tr.stats.channel[-1] in {"N", "E"}
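
To see what the wildcard patterns in the request above select, the sketch below applies the same patterns with Python's `fnmatch` module. It only illustrates the pattern semantics and is not how `WaveBank` performs matching internally; the station codes `O16A` and `O18A` are hypothetical additions to the codes used in this tutorial.

```python
from fnmatch import fnmatchcase

stations = ["O15A", "O16A", "O18A", "SRU", "RJOB"]
channels = ["BHZ", "BHN", "BHE", "HHZ"]

# "?" matches exactly one character, so "O1??" matches four-character codes
matched_stations = [s for s in stations if fnmatchcase(s, "O1??")]

# "[NE]" matches a single character from the set {N, E}
matched_channels = [c for c in channels if fnmatchcase(c, "BH[NE]")]
```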


`WaveBank` also has a `get_waveforms_bulk` method for efficiently retrieving a large number of streams.

In [None]:
args = [  # in practice this list may contain hundreds or thousands of requests
    (
        "TA",
        "O15A",
        "",
        "BHZ",
        t1 - 5,
        t2 - 5,
    ),
    (
        "UU",
        "SRU",
        "",
        "HHZ",
        t1,
        t2,
    ),
]
st = bank.get_waveforms_bulk(args)
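
When the request list is long, it is usually built programmatically rather than typed out. The following sketch assembles tuples in the same `(network, station, location, channel, starttime, endtime)` layout from the station and channel codes used above. The times are plain `datetime` placeholders so the sketch runs without ObsPy; it is an assumption of this example that real requests would use `obspy.UTCDateTime` objects as in the cell above.

```python
from datetime import datetime, timedelta
from itertools import product

# (network, station) pairs and channels taken from the example above
stations = [("TA", "O15A"), ("UU", "SRU")]
channels = ["BHZ", "HHZ"]

# placeholder time window; real requests would use obspy.UTCDateTime objects
start = datetime(2007, 8, 6, 1, 44, 48)
end = start + timedelta(seconds=60)

# one request per station/channel combination, with an empty location code
requests = [
    (net, sta, "", cha, start, end)
    for (net, sta), cha in product(stations, channels)
]
```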

## Yield waveforms
`WaveBank` also provides a generator for iterating over large amounts of continuous waveform data. The following example shows how to get streams of one hour duration with a minute of overlap between the slices.

The first step is to create a bank on a dataset which has continuous data. The example below will use the TA dataset.

In [None]:
ta_bank = obsplus.WaveBank(ta_path)

In [None]:
# iterate over one day of TA data in one-hour chunks
ta_t1 = obspy.UTCDateTime("2007-02-15")
ta_t2 = obspy.UTCDateTime("2007-02-16")

for st in ta_bank.yield_waveforms(
    starttime=ta_t1, endtime=ta_t2, duration=3600, overlap=60
):
    pass
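
The relationship between `duration` and `overlap` can be sketched with plain `datetime` arithmetic. This is one plausible interpretation, in which windows start every `duration` seconds and each is extended by `overlap` seconds; the exact boundary handling in obsplus may differ, and `make_windows` is a hypothetical helper written for this illustration.

```python
from datetime import datetime, timedelta


def make_windows(start, end, duration, overlap=0.0):
    """Yield (t1, t2) pairs stepping by duration, each extended by overlap."""
    step = timedelta(seconds=duration)
    extra = timedelta(seconds=overlap)
    t1 = start
    while t1 < end:
        # extend the window by the overlap, but never beyond the requested end
        t2 = min(t1 + step + extra, end)
        yield t1, t2
        t1 += step


windows = list(
    make_windows(
        datetime(2007, 2, 15), datetime(2007, 2, 16), duration=3600, overlap=60
    )
)
```

With these inputs the sketch produces 24 windows, and consecutive windows share 60 seconds of data.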

## Put waveforms
Files can be added to the bank by passing a stream or trace to the `bank.put_waveforms` method. `WaveBank` does not merge files, so overlapping data may end up in the archive if care is not taken.

In [None]:
# show that no data for RJOB is in the bank
st = bank.get_waveforms(station="RJOB")

assert len(st) == 0


In [None]:
# add the default stream to the archive (which contains data for RJOB)
bank.put_waveforms(obspy.read())
st_out = bank.get_waveforms(station="RJOB")

# test output
assert len(st_out)
for tr in st_out:
    assert tr.stats.station == "RJOB"
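
Because overlapping data is not merged, it can be worth checking a new trace's time span against what is already stored before adding it. The sketch below shows the interval test on plain numbers; `overlaps_existing` and the example extents are hypothetical, and in practice the existing extents would come from the bank's index or availability.

```python
def overlaps_existing(new_start, new_end, existing):
    """Return True if the new span intersects any existing (start, end) span.

    Spans that merely touch at an endpoint are not counted as overlapping.
    """
    return any(new_start < end and start < new_end for start, end in existing)


# hypothetical extents already stored for one channel, in seconds
existing = [(0.0, 60.0), (120.0, 180.0)]

# a span that exactly fills the hole between the stored extents is safe
safe_to_add = not overlaps_existing(60.0, 120.0, existing)

# a span that reaches into stored data would duplicate samples
would_duplicate = overlaps_existing(50.0, 70.0, existing)
```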



## Availability
`WaveBank` can be used to get the availability of data. The output can be either a dataframe or a list of tuples in the form `[(network, station, location, channel, min_starttime, max_endtime)]`.

In [None]:
# get a dataframe of availability by seed ids and timestamps
bank.get_availability_df(channel="BHE", station="[OR]*")

In [None]:
# get list of tuples of availability
bank.availability(channel="BHE", station="[OR]*")
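
The tuple format can be understood as a per-channel summary of the index: for each seed id, the earliest start time and the latest end time. The sketch below reproduces that reduction on hypothetical index rows with times given as plain seconds; it illustrates the shape of the result and is not the obsplus implementation.

```python
from collections import defaultdict

# hypothetical index rows: (network, station, location, channel, start, end)
rows = [
    ("TA", "O15A", "", "BHE", 0.0, 60.0),
    ("TA", "O15A", "", "BHE", 120.0, 180.0),
    ("UU", "SRU", "", "HHZ", 30.0, 90.0),
]

# track the earliest start and latest end seen for each seed id
extent = defaultdict(lambda: [float("inf"), float("-inf")])
for net, sta, loc, cha, start, end in rows:
    bounds = extent[(net, sta, loc, cha)]
    bounds[0] = min(bounds[0], start)
    bounds[1] = max(bounds[1], end)

# flatten into (network, station, location, channel, min_start, max_end)
availability = [(*seed_id, lo, hi) for seed_id, (lo, hi) in sorted(extent.items())]
```

Note that this summary hides any gaps between the first and last sample of a channel.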

## Get Gaps and uptime
`WaveBank` can return a dataframe of missing data with the `get_gaps_df` method, and a dataframe of reliability statistics with the `get_uptime_df` method. These are useful for assessing the completeness of an archive of continuous data.

In [None]:
bank.get_gaps_df(channel="BHE", station="O*").head()

In [None]:
ta_bank.get_uptime_df()
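
The idea behind gaps and uptime can be sketched on a list of trace extents for a single channel. The definitions used here (a gap is any uncovered time between sorted extents; uptime is the covered share of the total span) are assumptions of this illustration, and the columns and exact definitions used by `get_gaps_df` and `get_uptime_df` may differ.

```python
def find_gaps(intervals, min_gap=0.0):
    """Return (gap_start, gap_end) pairs of uncovered time between intervals."""
    gaps = []
    current_end = None
    for start, end in sorted(intervals):
        if current_end is not None and start - current_end > min_gap:
            gaps.append((current_end, start))
        # overlapping extents extend the covered region instead of creating gaps
        current_end = end if current_end is None else max(current_end, end)
    return gaps


# hypothetical trace extents for one channel, in seconds
intervals = [(0.0, 100.0), (90.0, 200.0), (260.0, 400.0)]
gaps = find_gaps(intervals)

# uptime is the share of the total span that is covered by data
total = max(end for _, end in intervals) - min(start for start, _ in intervals)
gap_time = sum(end - start for start, end in gaps)
uptime_percent = 100.0 * (total - gap_time) / total
```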

## Read index
`WaveBank` can return a dataframe of the index with the `read_index` method, although in most cases this shouldn't be needed.

In [None]:
ta_bank.read_index().head()

## Similar Projects
`WaveBank` is a useful tool, but it may not be a good fit for every application. Check out the following items as well:

ObsPy has a way to visualize the availability of waveform data in a directory using [obspy-scan](https://docs.obspy.org/tutorial/code_snippets/visualize_data_availability_of_local_waveform_archive.html). If you prefer a graphical option over working with `DataFrame`s, this might be for you.

ObsPy also has a [filesystem client](https://docs.obspy.org/master/packages/autogen/obspy.clients.filesystem.sds.Client.html#obspy.clients.filesystem.sds.Client) for working with SeisComP structured archives.

[IRIS](https://www.iris.edu/hq/) released a miniSEED indexing program called [mseedindex](https://github.com/iris-edu/mseedindex), which has an [ObsPy API](https://github.com/obspy/obspy/pull/2206).