# Datasets

ObsPlus includes a few interesting datasets, which are primarily for testing purposes. The datasets are "lazy" in that all but the most essential information is downloaded only when some code requests the dataset. This keeps ObsPlus small, but it does mean you will need an internet connection the first time you use each dataset. The datasets included in ObsPlus are listed below.


## Included Datasets

1. Kemmerer:
    A few days of continuous data recorded in 2009 by TA (USArray Transportable Array) stations M17A and M18A near Kemmerer, Wyoming (USA). The stations recorded several nearby mine blasts, and I used these stations in [my MS thesis](https://academic.oup.com/gji/article-abstract/203/2/1388/584019) (shameless plug). This dataset provides continuous waveform data for the two stations, a catalog of mining blasts, and an inventory.
    
2. TA:
    A small dataset with two stations from the TA with channels that have very low sampling rates. 

3. Crandall:
    Event waveforms for the [Crandall Canyon Mine collapse](https://en.wikipedia.org/wiki/Crandall_Canyon_Mine) and associated aftershocks. This dataset includes only event waveforms, but it has them for many TA and UUSS stations. It also includes a catalog of the events and a station inventory.
    
4. Bingham:
    Event waveforms associated with the [Bingham Canyon Landslide](https://en.wikipedia.org/wiki/Bingham_Canyon_Mine#Landslides), one of the largest anthropogenic landslides ever recorded. Luckily, the situation was well managed and no one was hurt. 
    
Each of these datasets is accessed via the `obsplus.load_dataset` function, which takes the name of the dataset as its only argument and returns a `DataSet` instance. This can take a few minutes if the dataset has not yet been downloaded; otherwise it should be very quick.

In [None]:
import obsplus
crandall = obsplus.load_dataset('crandall')
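
Once a dataset is loaded, its contents can be inspected through the `DataSet` instance. The sketch below assumes the instance exposes `event_client` and `station_client` attributes with ObsPy-style `get_events` and `get_stations` methods; the source text does not confirm this, so check the ObsPlus API documentation before relying on it. The helper name `summarize_dataset` is hypothetical.

```python
def summarize_dataset(ds):
    """Return simple counts describing a loaded dataset.

    Assumes ``ds`` has ``event_client`` and ``station_client`` attributes
    (an assumption about the ObsPlus ``DataSet`` API, not stated in the text).
    """
    # Assumed to return an ObsPy Catalog; len() gives the number of events.
    catalog = ds.event_client.get_events()
    # Assumed to return an ObsPy Inventory; len() gives the number of networks.
    inventory = ds.station_client.get_stations()
    return {"events": len(catalog), "networks": len(inventory)}
```

With the dataset loaded above, this would be called as `summarize_dataset(crandall)`.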

If you plan to modify any of the data, copy the dataset first with the `copy_dataset` function.

In [None]:
from pathlib import Path

obsplus.copy_dataset('crandall', '.')
path = Path('.') / 'crandall'

In [None]:
import shutil

assert path.exists() and path.is_dir()  # ensure the directory was created
shutil.rmtree(path)  # cleanup created directory

## Dataset paths
By default, all datasets are stored in a directory called 'opsdata' in the user's home directory. Each dataset lives in a subdirectory with the same name as the dataset. If you would prefer the datasets be stored somewhere else, set the environment variable `OPSDATA_PATH` to the desired location.
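
The lookup rule described above can be sketched in a few lines. This is only an illustration of the documented behavior, not ObsPlus' actual implementation; the helper name `expected_dataset_path` and its `env` parameter are hypothetical.

```python
import os
from pathlib import Path


def expected_dataset_path(name, env=None):
    """Return the directory where a dataset called ``name`` should live.

    Hypothetical helper mirroring the rule in the text: use OPSDATA_PATH
    when it is set, otherwise fall back to ~/opsdata.
    """
    # Allow a custom mapping to be passed in so the rule is easy to test.
    env = os.environ if env is None else env
    base = Path(env.get("OPSDATA_PATH", Path.home() / "opsdata"))
    # Each dataset is stored in a subdirectory named after the dataset.
    return base / name
```

To redirect the storage location from Python, one option is to set `os.environ["OPSDATA_PATH"]` before importing `obsplus`. Whether ObsPlus reads the variable at import time or on each call is not stated here, so setting it first is the cautious choice.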

## Publishing your own datasets
ObsPlus' `DataSet` class can be used to bundle and distribute any seismological dataset. This is primarily done by creating a small Python package containing only essential (tiny) data files and instructions for downloading larger data files. The package can then be published to [PyPI](https://pypi.org/) and shared with the world! If that sounds hard, don't worry! We have made a [cookiecutter template](https://github.com/seismopy/opsdata) for just this purpose. It even includes files for testing and scripts to automate releases and data versioning.