# MsPASS Getting Started Tutorial
## *Gary L. Pavlis and Yinzhi (Ian) Wang*
## Preliminaries
This tutorial assumes you have already done the following:
1.  Installed docker.
2.  Run the command `docker pull wangyinz/mspass_tutorial`.
3.  Launched docker using the tutorial container.
4.  Connected to the container to get this tutorial running.

Our installation manual describes how to do that, so we assume you completed those steps to get this far.

Note that MsPASS can also be run from a local copy installed through pip. The only difference is in how you launch jupyter-notebook to get this tutorial running. Nothing in the tutorial should depend on which approach you are using. Further, if either installation was not done correctly, you can expect python errors at the first import of a mspasspy module.
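
If you want to check the installation before running the first cell, the following is a minimal sketch that uses only the python standard library. The helper name `check_modules` is hypothetical and is not part of MsPASS; it only reports whether each top-level package can be located by the running kernel, not whether it is configured correctly.

```python
import importlib.util

def check_modules(names):
    """Return a dict mapping each top-level module name to True if it can be located."""
    found = {}
    for name in names:
        # find_spec returns None when a top-level module is not installed
        found[name] = importlib.util.find_spec(name) is not None
    return found

# Packages this tutorial imports; anything reported MISSING will fail at import time
for name, ok in check_modules(["mspasspy", "obspy", "pymongo"]).items():
    print(name, "found" if ok else "MISSING")
```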

## Download data with obspy
### Overview of this section
MsPASS leans heavily on obspy.  In particular, in this section we will use obspy's web services functions to download waveform data, station metadata, and source metadata.  The approach we are using here is to stage these data to your local disk.   The dataset we will assemble is the mainshock and 7 days of larger aftershocks of the Tohoku earthquake.  The next section then covers how we import these data into the MsPASS framework to allow them to be processed.

### Select, download, and save source data in MongoDB
As noted we are focusing on the Tohoku earthquake and its aftershocks.  That earthquake's origin time is approximately  March 11, 2011, at 5:46:24 UTC.  The ISC epicenter is 38.30N, 142.50E.  We will then apply obspy's *get_events* function with the following time and area filters:
1.  Start time: 1 hour before the mainshock origin time.
2.  End time: 7 days after the mainshock origin time.
3.  Epicenters within 3 degrees of latitude of the mainshock epicenter.
4.  Epicenters within 3 degrees of longitude of the mainshock epicenter.
5.  Only events of magnitude 6.5 or larger.

Here is the incantation in obspy to do that:

In [4]:
from obspy import UTCDateTime
from obspy.clients.fdsn import Client
client=Client("IRIS")
t0=UTCDateTime('2011-03-11T05:46:24.0')
starttime=t0-3600.0
endtime=t0+(7.0)*(24.0)*(3600.0)
lat0=38.3
lon0=142.5
minlat=lat0-3.0
maxlat=lat0+3.0
minlon=lon0-3.0
maxlon=lon0+3.0
minmag=6.5
cat=client.get_events(starttime=starttime,endtime=endtime,
        minlatitude=minlat,minlongitude=minlon,
        maxlatitude=maxlat,maxlongitude=maxlon,
        minmagnitude=minmag)
print(cat)
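
To make the five selection criteria concrete, here is a minimal offline sketch of the same test written with the python standard library. The helper `passes_filters` is hypothetical and only mimics what the web service does on the server side; the mainshock values are the ones the catalog listing reports for that event.

```python
from datetime import datetime, timedelta, timezone

# Search parameters copied from the get_events call
T0 = datetime(2011, 3, 11, 5, 46, 24, tzinfo=timezone.utc)
START = T0 - timedelta(hours=1)
END = T0 + timedelta(days=7)
LAT0, LON0, HALF_WIDTH, MIN_MAG = 38.3, 142.5, 3.0, 6.5

def passes_filters(time, lat, lon, mag):
    """Hypothetical helper: True if an event satisfies all five criteria."""
    in_time = START <= time <= END
    in_lat = abs(lat - LAT0) <= HALF_WIDTH
    in_lon = abs(lon - LON0) <= HALF_WIDTH
    return in_time and in_lat and in_lon and mag >= MIN_MAG

# Mainshock as listed in the catalog: origin time, latitude, longitude, magnitude
mainshock = (datetime(2011, 3, 11, 5, 46, 23, 200000, tzinfo=timezone.utc),
             38.296, 142.498, 9.1)
print(passes_filters(*mainshock))
```

Note that the catalog origin time of the mainshock is slightly earlier than the approximate `t0` used to build the window, which is one reason the start time is padded by an hour.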

We can easily save these events to MongoDB for use in later processing with a single call.

In [5]:
from mspasspy.db.database import Database
from mspasspy.db.client import Client as DBClient
dbclient=DBClient()
# Database needs the client handle as its first argument
db=Database(dbclient,'getting_started')
db.save_catalog(cat)


11 Event(s) in Catalog:
2011-03-12T01:47:16.160000Z | +37.590, +142.751 | 6.5 MW
2011-03-11T19:46:35.300000Z | +38.800, +142.200 | 6.5 mb
...
2011-03-11T05:51:20.500000Z | +37.310, +142.240 | 6.8 None
2011-03-11T05:46:23.200000Z | +38.296, +142.498 | 9.1 MW
To see all events call 'print(CatalogObject.__str__(print_all=True))'


### Select, download, and save station metadata to MongoDB
We use a very similar procedure to download and save station data.   We again use obspy but in this case we use their *get_stations* function to construct what they call an "Inventory" object containing the station data. 

In [20]:
inv=client.get_stations(network='TA',starttime=starttime,endtime=endtime,format='xml',channel='*')
print("Number of stations retrieved=",len(inv))
print(inv)


Number of stations retrieved= 1
Inventory created at 2021-02-01T13:55:16.000000Z
	Created by: IRIS WEB SERVICE: fdsnws-station | version: 1.1.47
		    http://service.iris.edu/fdsnws/station/1/query?starttime=2011-03-11...
	Sending institution: IRIS-DMC (IRIS-DMC)
	Contains:
		Networks (1):
			TA
		Stations (446):
			TA.034A (Hebronville, TX, USA)
			TA.035A (Encino, TX, USA)
			TA.035Z (Hargill, TX, USA)
			TA.109C (Camp Elliot, Miramar, CA, USA)
			TA.121A (Cookes Peak, Deming, NM, USA)
			TA.133A (Hamilton Ranch, Breckenridge, TX, USA)
			TA.134A (White-Moore Ranch, Lipan, TX, USA)
			TA.135A (Vickery Place, Crowley, TX, USA)
			TA.136A (Ennis, TX, USA)
			TA.137A (Heron Place, Grand Saline, TX, USA)
			TA.138A (Matatall Enterprise, Big Sandy, TX, USA)
			TA.139A (Bunkhouse Ranch, Marshall, TX, USA)
			TA.140A (Cam and Jess, Hughton, LA, USA)
			TA.141A (Papa Simpson, Farm, Arcadia, LA, USA)
			TA.142A (Monroe, LA, USA)
			TA.143A (Socs Landing, Pioneer, LA, USA)
			TA.214A (Organ Pi

The output shows that we just downloaded metadata for 446 TA stations that were running during this time period. (The first line reports 1 because `len(inv)` counts networks, not stations.) We will now save these data to MongoDB with a command very similar to the one above:

In [17]:
db.save_inventory(inv)

Network TA (USArray Transportable Array (EarthScope_TA))
	Station Count: 446/1893 (Selected/Total)
	2003-01-01T00:00:00.000000Z - --
	Access: open
	Contains:
		Stations (446):
			TA.034A (Hebronville, TX, USA)
			TA.035A (Encino, TX, USA)
			TA.035Z (Hargill, TX, USA)
			TA.109C (Camp Elliot, Miramar, CA, USA)
			TA.121A (Cookes Peak, Deming, NM, USA)
			TA.133A (Hamilton Ranch, Breckenridge, TX, USA)
			TA.134A (White-Moore Ranch, Lipan, TX, USA)
			TA.135A (Vickery Place, Crowley, TX, USA)
			TA.136A (Ennis, TX, USA)
			TA.137A (Heron Place, Grand Saline, TX, USA)
			TA.138A (Matatall Enterprise, Big Sandy, TX, USA)
			TA.139A (Bunkhouse Ranch, Marshall, TX, USA)
			TA.140A (Cam and Jess, Hughton, LA, USA)
			TA.141A (Papa Simpson, Farm, Arcadia, LA, USA)
			TA.142A (Monroe, LA, USA)
			TA.143A (Socs Landing, Pioneer, LA, USA)
			TA.214A (Organ Pipe National Monument, Ajo, AZ, USA)
			TA.233A (Rising Star, TX, USA)
			TA.234A (Clairette, TX, USA)
			TA.236A (Katherine and Luke Keathl

### Download waveform data
The last download step for this tutorial is the one that will take the most time and consume the most disk space: downloading the waveform data.   To keep this under control we keep only a waveform segment spanning most of the body waves.   We won't burden you with the details of how we obtained the following rough numbers, which we use to define the waveform download parameters:

1.  The approximate distance from the mainshock epicenter to the center of the USArray in 2011 is 86.5 degrees.
2.  The P arrival is expected about 763 s after the origin time.
3.  The S arrival is expected about 1400 s after the origin time.
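
For readers curious about where a number like the first one comes from, the following is a rough sketch of a great-circle distance calculation using the haversine formula on a spherical Earth. The array center used here is an ASSUMED illustrative point and is not the value the authors used; with it the result lands within about a degree of the distance quoted above. The travel times themselves come from an Earth model and are not reproduced here.

```python
import math

def great_circle_degrees(lat1, lon1, lat2, lon2):
    """Angular distance in degrees between two points on a sphere (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return math.degrees(2 * math.asin(math.sqrt(a)))

epicenter = (38.30, 142.50)      # ISC epicenter quoted earlier in this tutorial
array_center = (38.0, -96.0)     # ASSUMED illustrative point, not from the tutorial
distance = great_circle_degrees(*epicenter, *array_center)
print(round(distance, 1))
```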

Since the stations span the continent, we will use the origin time of each event plus the P travel time (763 s) minus 4 minutes as the start time.  For the end time we will use the origin time plus the S travel time (1400 s) plus 10 minutes.
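
The arithmetic behind that window can be sketched as follows. The function name `download_window` and the constant names are hypothetical; the numbers are the ones given above. Relative to the origin time, the window runs from 523 s to 2000 s, so each segment is 1477 s long.

```python
P_TRAVEL_TIME = 763.0     # s, approximate P travel time
S_TRAVEL_TIME = 1400.0    # s, approximate S travel time
PRE_P_PAD = 4 * 60.0      # s, start 4 minutes before the expected P arrival
POST_S_PAD = 10 * 60.0    # s, end 10 minutes after the expected S arrival

def download_window(origin_epoch):
    """Return (start, end) of the download window as epoch times in seconds."""
    start = origin_epoch + P_TRAVEL_TIME - PRE_P_PAD
    end = origin_epoch + S_TRAVEL_TIME + POST_S_PAD
    return start, end

# With an origin time of 0 the result is the pair of offsets themselves
start, end = download_window(0.0)
print(start, end, end - start)
```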

This process will be driven by origin times from the events we downloaded earlier.   We could drive this by using the obspy *Catalog* object created above, but because we saved the event data to the database we will use this opportunity to illustrate how those data are managed in MsPASS.

First, let's go over the data we saved in MongoDB.  We saved these data in a *collection* we call *source*.   For those familiar with relational databases, a MongoDB "collection" plays a role similar to a table (relation).   A "collection" contains one or more "documents".  A "document" in MongoDB is analogous to a single tuple in a relational database.  The internal structure of a MongoDB document is, however, very different: it is stored as binary name-value pairs in a format called BSON, so named because the structure can be represented in human-readable form in the widely used JSON format.   A key point to understand for MsPASS is that the BSON (JSON) documents stored in MongoDB map directly into a python dict container.   We illustrate that in the next box by printing the event hypocenter data we downloaded above and then stored in MongoDB:

In [None]:
dbsource=db.source
cursor=dbsource.find()   # This says to retrieve an iterator over all source documents
# The Cursor object that MongoDB's find function returns is iterable
print('Events in tutorial dataset')
for doc in cursor:
    lat=doc['lat']
    lon=doc['lon']
    depth=doc['depth']
    origin_time=doc['time']
    # In MsPASS all times are stored as epoch times. obspy's UTCDateTime function easily converts these to 
    # a readable form in the print statement here, but do that only for printing or where required to 
    # interact with obspy
    print(lat,lon,depth,UTCDateTime(origin_time))

Notice that we use python dict syntax to extract attributes like latitude ('lat' key) from each "document", which acts like a python dict.
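
The mapping between a document, a python dict, and JSON can be tried without a database. The sketch below is a stand-in under stated assumptions: the dict only imitates a source document using the four keys read in the loop, the depth is a placeholder rather than a real value, and the standard `json` module is used in place of BSON. A real document returned by MongoDB also carries an `_id` field that plain `json` cannot serialize.

```python
import json
from datetime import datetime, timezone

# Mainshock origin time from the catalog listing, converted to an epoch time
origin = datetime(2011, 3, 11, 5, 46, 23, 200000, tzinfo=timezone.utc)
doc = {
    "lat": 38.296,
    "lon": 142.498,
    "depth": 0.0,                 # PLACEHOLDER value for illustration only
    "time": origin.timestamp(),   # seconds since 1970-01-01T00:00:00 UTC
}

text = json.dumps(doc)            # human-readable JSON form of the document
restored = json.loads(text)       # back to a python dict
print(text)
print(restored["lat"], restored["lon"])
# Convert the epoch time to a readable UTC string only for display
print(datetime.fromtimestamp(restored["time"], tz=timezone.utc).isoformat())
```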

With that the loop below is similar, BUT it requires an obscure parameter not usually discussed in introductory MongoDB documentation.   The problem we have to deal with is that the obspy web service downloader we call in the loop below may take some time to complete.  Long-running processes that interact with MongoDB through a "Cursor" object (the thing find returned) can fail in a confusing way because of a timeout.  That is, a job will mysteriously fail with a message that does not always make the fundamental problem clear.  The solution is to create what some books call an "immortal cursor".   You will see this in the next box, which does the waveform downloading, as this line:
```
cursor=dbsource.find({},no_cursor_timeout=True)
```
where we use an explicit "find all" with the (weird) syntax of "{}" and make the cursor immortal by setting no_cursor_timeout=True.  Because the server never cleans up an immortal cursor on its own, the script closes it explicitly when the loop finishes.

With that background, here is the script to download data.  You might want to go grab a cup of coffee while this runs, as it will take a while.  

In [21]:
# We have to redefine the client from obspy to use their so-called bulk downloader
from obspy.clients.fdsn import RoutingClient
client = RoutingClient("iris-federator")
dbsource=db.source   # defined here so this cell does not depend on the earlier box
cursor=dbsource.find({},no_cursor_timeout=True)
count=0
start_offset=763.0-4*60.0
end_offset=1400.0+10*60.0
for doc in cursor:
    origin_time=doc['time']
    print('Starting on event number',count,'with origin time=',UTCDateTime(origin_time))
    stime=origin_time+start_offset
    etime=origin_time+end_offset
    strm=client.get_waveforms(
        starttime=UTCDateTime(stime),
        endtime=UTCDateTime(etime),
        network='TA',
        channel='BH?',
        location='*'
    )
    count+=1
# An immortal cursor must be closed explicitly once we are done with it
cursor.close()
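
One way to make sure an immortal cursor is always released, even if a download raises an exception partway through, is to wrap the loop in `try`/`finally`. The following is a sketch, not MsPASS or pymongo code: `for_each_document` is a hypothetical helper, and `StandInCollection` and `StandInCursor` are stand-ins that imitate only the small part of the pymongo interface used here so the example runs without a MongoDB server. With a real collection, the same helper would pass pymongo's `no_cursor_timeout=True` option to `find`.

```python
class StandInCursor:
    """Minimal stand-in for a pymongo cursor so this sketch runs without a server."""
    def __init__(self, docs):
        self._docs = list(docs)
        self.closed = False
    def __iter__(self):
        return iter(self._docs)
    def close(self):
        self.closed = True

class StandInCollection:
    """Minimal stand-in for a pymongo collection; only find() is imitated."""
    def __init__(self, docs):
        self._docs = docs
        self.last_cursor = None
    def find(self, query=None, no_cursor_timeout=False):
        self.last_cursor = StandInCursor(self._docs)
        return self.last_cursor

def for_each_document(collection, handler):
    """Apply handler to every document, always closing the cursor afterwards."""
    cursor = collection.find({}, no_cursor_timeout=True)
    count = 0
    try:
        for doc in cursor:
            handler(doc)
            count += 1
    finally:
        # Runs on normal completion and when handler raises
        cursor.close()
    return count

fake = StandInCollection([{"time": 1.0}, {"time": 2.0}])
seen = []
n = for_each_document(fake, seen.append)
print(n, fake.last_cursor.closed)
```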
    
